# Pony Alpha Unmasked: GLM-5 is Zhipu AI's Open-Source Challenger to Claude and GPT
For the past week, the AI community has been obsessed with a mystery. A stealth model called "Pony Alpha" appeared on OpenRouter on February 6, 2026 — no branding, no company name, just raw performance that had developers losing their minds on Reddit.
Now the secret is out: Pony Alpha is GLM-5, the next-generation flagship model from Zhipu AI (Z.ai), and it's one of the most impressive open-source AI models ever released.
## The Great AI Mystery: What Was Pony Alpha?
When Pony Alpha first appeared on OpenRouter, nobody knew where it came from. But the clues were there:
- The release timing coincided with Zhipu AI's announcement that GLM-5 would launch around Chinese New Year
- The output style matched the GLM series
- When prompted about its origins, the model identified itself as a GLM model developed by Zhipu
- "Pony" references the Year of the Horse in the Chinese zodiac — a playful nod from the team
Reddit threads exploded with speculation. Was it DeepSeek V4? A secret OpenAI experiment? The reveal sent Zhipu AI's stock price surging 60% in two days, pushing its market cap past HKD 150 billion (~$19 billion USD).
## GLM-5 by the Numbers
GLM-5 is a massive leap from its predecessor GLM-4.7. Here's what's under the hood:
| Spec | GLM-5 | GLM-4.7 (Previous) |
|---|---|---|
| Total Parameters | 745B | 355B |
| Active Parameters | 44B (MoE) | 32B |
| Expert Configuration | 256 total / 8 active | — |
| Context Window | 200K tokens | 128K |
| Pre-training Data | 28.5T tokens | 23T |
| Training Hardware | Huawei Ascend | Huawei Ascend |
| License | MIT (Open Source) | MIT |
The Mixture of Experts (MoE) architecture means that while GLM-5 has 745 billion total parameters, only about 44 billion are activated for each token, which keeps inference fast and cost-efficient despite the model's massive scale.
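As a back-of-envelope check on the table above, the active-to-total ratio can be computed directly. This is a minimal sketch: the figures come from the spec table, and the compute estimate is a rough first-order approximation (per-token FLOPs scale with active parameters, not total).

```python
# Back-of-envelope: why MoE inference is cheaper than the total
# parameter count suggests. Figures are from the spec table above.
TOTAL_PARAMS_B = 745   # all experts, in billions
ACTIVE_PARAMS_B = 44   # parameters used per token, in billions

# Fraction of the network that actually runs for each token
active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B
print(f"Active fraction per token: {active_fraction:.1%}")

# Relative to a hypothetical dense 745B model, per-token compute
# drops by roughly this factor:
print(f"Compute reduction vs dense: ~{TOTAL_PARAMS_B / ACTIVE_PARAMS_B:.0f}x")
```

In other words, each token touches under 6% of the network, so GLM-5's per-token compute is closer to that of a mid-sized dense model than to a dense 745B one.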
## Benchmark Performance: Competing with the Best
GLM-5 doesn't just compete with open-source models — it goes toe-to-toe with the most expensive proprietary models on the planet.
### Coding Benchmarks
| Benchmark | GLM-5 | Claude Opus 4.5 | GPT-5.2 |
|---|---|---|---|
| SWE-bench Verified | 77.8 | 80.9 | 80.0 |
| SWE-bench Multilingual | 73.3 | 77.5 | — |
| Terminal-Bench 2.0 | 56.2 | 59.3 | 54.0 |
### Reasoning Benchmarks
| Benchmark | GLM-5 |
|---|---|
| AIME 2026 | 92.7 |
| GPQA-Diamond | 86.0 |
| Humanity's Last Exam | 30.5 (with tools: 50.4) |
### Agentic Performance
| Benchmark | GLM-5 | GPT-5.2 |
|---|---|---|
| BrowseComp | 75.9 | 65.8 |
| Vending Bench 2 | $4,432 (#1 open-source) | — |
| MCP-Atlas Public Set | 67.8 | — |
On BrowseComp (web browsing tasks), GLM-5 actually surpasses GPT-5.2 by a significant margin. It also achieved a record-low hallucination rate according to Z.ai's testing.
## What Makes GLM-5 Special?
### 1. The "Slime" Reinforcement Learning Framework
GLM-5 introduces a novel open-source RL infrastructure called Slime. Traditional RL training for large models is slow because rollout generation and policy updates run in lockstep, each waiting on the other. Slime decouples data generation from policy updates, achieving up to 3x higher throughput compared to conventional synchronous pipelines.
This isn't just a training trick — it's why GLM-5 excels at long-horizon agentic tasks. The framework specifically optimizes for task completion consistency over superficial metric gaming.
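To make the decoupling idea concrete, here is a toy producer-consumer sketch in Python. This is not Slime's actual code: real rollout workers run model inference and the trainer runs gradient updates on GPUs, but the structural point is the same, namely that generation and training communicate only through a shared buffer and never block each other.

```python
import queue
import threading

# Toy illustration of decoupled RL training (NOT Slime's real code):
# a rollout worker pushes trajectories into a shared buffer while a
# trainer thread consumes them independently.
buffer = queue.Queue(maxsize=64)

def rollout_worker(n_episodes):
    for i in range(n_episodes):
        trajectory = {"episode": i, "reward": float(i % 5)}  # fake data
        buffer.put(trajectory)
    buffer.put(None)  # sentinel: no more trajectories coming

def trainer(results):
    while True:
        item = buffer.get()
        if item is None:
            break
        # A real trainer would run a policy-update step here.
        results.append(item["reward"])

results = []
t_gen = threading.Thread(target=rollout_worker, args=(20,))
t_train = threading.Thread(target=trainer, args=(results,))
t_gen.start(); t_train.start()
t_gen.join(); t_train.join()
print(f"Trained on {len(results)} trajectories")
```

The same shape scales up: with many rollout workers feeding one buffer, the trainer is never idle waiting for fresh data, which is where the claimed throughput gains come from.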
### 2. DeepSeek Sparse Attention (DSA)
For the first time in the GLM series, GLM-5 integrates DeepSeek's Sparse Attention mechanism. Traditional transformers suffer from quadratic complexity — doubling context length quadruples computational cost. DSA breaks this ceiling, enabling GLM-5 to handle its full 200K context window without the usual performance degradation.
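A quick sketch of why the quadratic term bites at long context. This is illustrative arithmetic only, counting pairwise query-key comparisons for a single dense attention head:

```python
# Dense attention builds an n x n score matrix per head, so the
# number of query-key comparisons grows quadratically with context.
def attention_score_ops(n_tokens):
    """Pairwise query-key comparisons for one dense attention head."""
    return n_tokens * n_tokens

short_ctx, long_ctx = 100_000, 200_000
ratio = attention_score_ops(long_ctx) / attention_score_ops(short_ctx)
print(f"2x context -> {ratio:.0f}x attention cost")
```

Sparse attention schemes like DSA attend to a selected subset of positions instead of all pairs, which is what keeps the 200K window tractable.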
### 3. Built for Agentic Engineering
Unlike chat-optimized models, GLM-5 is purpose-built for multi-step engineering workflows. In the Vending Bench 2 evaluation (simulating a year-long business operation), GLM-5 demonstrated consistent resource allocation, dynamic strategy adjustment, and risk management across 365 simulated days — ranking #1 among all open-source models.
### 4. Trained Entirely on Chinese Hardware
GLM-5 was trained exclusively on Huawei Ascend chips using the MindSpore framework — achieving complete independence from US-manufactured semiconductors. This is a significant milestone for China's domestic AI infrastructure and proves that frontier-level models can be built without NVIDIA hardware.
## Pricing: 7x Cheaper Than Claude
This is where GLM-5 gets really interesting for developers:
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| GLM-5 | $0.80 – $1.00 | $2.56 – $3.20 |
| Claude Opus 4.5 | $5.00 | $25.00 |
| GPT-5.2 | $1.25 | $10.00 |
GLM-5 delivers ~90% of Claude Opus 4.5's coding performance at ~14% of the cost. For teams running high-volume AI workloads, the savings are massive.
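To see how this plays out on a concrete workload, here is a small cost calculator using the list prices from the table above. GLM-5 is taken at the top of its quoted range, and the 500M-input / 100M-output monthly workload is a made-up example:

```python
# Rough monthly-cost comparison using the table's list prices
# (USD per 1M tokens; GLM-5 at the top of its quoted range).
PRICES = {  # (input $/M, output $/M)
    "GLM-5":           (1.00, 3.20),
    "Claude Opus 4.5": (5.00, 25.00),
    "GPT-5.2":         (1.25, 10.00),
}

def monthly_cost(model, input_m_tokens, output_m_tokens):
    inp, out = PRICES[model]
    return input_m_tokens * inp + output_m_tokens * out

# Hypothetical workload: 500M input + 100M output tokens per month
for model in PRICES:
    print(f"{model:16s} ${monthly_cost(model, 500, 100):>9,.2f}")
```

On that workload the gap is stark: roughly $820/month on GLM-5 versus $5,000 on Claude Opus 4.5, about a 6x difference at these example volumes.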
Even better: GLM-5 is currently available for free on Kilo Code for a limited time — no API keys, no subscriptions.
## How to Use GLM-5
### Option 1: OpenRouter API
GLM-5 is live on OpenRouter as of February 11, 2026. You can use it with any OpenRouter-compatible tool or framework.
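A minimal sketch of calling it through OpenRouter's OpenAI-compatible chat completions endpoint, using only the Python standard library. The model slug `z-ai/glm-5` is an assumption; check OpenRouter's model list for the exact identifier.

```python
import json
import os
import urllib.request

# Minimal OpenRouter chat request (OpenAI-compatible schema).
# NOTE: the model slug "z-ai/glm-5" is assumed -- verify it against
# OpenRouter's published model list before use.
payload = {
    "model": "z-ai/glm-5",
    "messages": [
        {"role": "user", "content": "Write a binary search in Python."}
    ],
}

req = urllib.request.Request(
    "https://openrouter.ai/api/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {os.environ.get('OPENROUTER_API_KEY', '')}",
        "Content-Type": "application/json",
    },
)

# Uncomment to actually send the request (requires a valid API key):
# with urllib.request.urlopen(req) as resp:
#     reply = json.loads(resp.read())
#     print(reply["choices"][0]["message"]["content"])
```

Because the schema is OpenAI-compatible, the official OpenAI SDKs also work by pointing their base URL at `https://openrouter.ai/api/v1`.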
### Option 2: Z.ai Platform
Access GLM-5 directly through Z.ai, Zhipu AI's official platform.
### Option 3: Kilo Code (Free)
Kilo Code offers GLM-5 for free during the launch period. Just install the VS Code extension, select GLM-5 from the model dropdown, and start coding.
### Option 4: WaveSpeed API
WaveSpeed provides optimized API access with competitive pricing.
### Option 5: Self-Host (Coming Soon)
With MIT licensing and weights headed to HuggingFace and ModelScope, you'll be able to deploy GLM-5 on your own infrastructure. Given the 745B total parameter count you'll need serious hardware, but the 44B active-parameter design makes inference more manageable than the headline number suggests.
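A rough sketch of the memory math, under the assumption that, as in most MoE deployments, all 745B parameters must be resident in memory even though only 44B are active per token. KV cache and activations add more on top of these weight-only figures:

```python
# Back-of-envelope VRAM needed just to hold GLM-5's weights,
# assuming all experts must be resident (typical for MoE serving).
def weight_memory_gb(params_billions, bytes_per_param):
    # 1 billion params at 1 byte each is ~1 GB (decimal)
    return params_billions * bytes_per_param

for label, bytes_per_param in [("FP16", 2), ("FP8", 1), ("INT4", 0.5)]:
    gb = weight_memory_gb(745, bytes_per_param)
    print(f"{label}: ~{gb:,.0f} GB just for weights")
```

Even aggressively quantized, that is multi-GPU territory; the 44B active-parameter design helps with per-token latency and throughput, not with the footprint of storing the weights.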
## GLM-5 vs GPT-5.2 vs Claude Opus 4.5: The Full Picture
| Feature | GLM-5 | GPT-5.2 | Claude Opus 4.5 |
|---|---|---|---|
| Parameters | 745B (44B active) | Undisclosed | Undisclosed |
| Context Window | 200K | 400K in / 128K out | 200K |
| Open Source | ✅ MIT License | ❌ Closed | ❌ Closed |
| SWE-bench | 77.8 | 80.0 | 80.9 |
| BrowseComp | 75.9 | 65.8 | — |
| Terminal-Bench | 56.2 | 54.0 | 59.3 |
| Input Price/1M | $0.80–1.00 | $1.25 | $5.00 |
| Output Price/1M | $2.56–3.20 | $10.00 | $25.00 |
| Training Hardware | Huawei Ascend | NVIDIA | NVIDIA |
| Free Tier | ✅ (Kilo Code) | ❌ | ❌ |
The verdict: GLM-5 is the best value proposition in AI right now. It's not quite at Claude Opus 4.5 levels across the board, but it's remarkably close — and it's open-source, 7x cheaper, and available for free. For most real-world coding and reasoning tasks, the performance gap is negligible.
## Who Should Use GLM-5?
- Budget-conscious developers who want near-frontier performance without frontier pricing
- Open-source advocates who prefer models they can inspect, fine-tune, and self-host
- Agentic AI builders — GLM-5's long-horizon task performance is genuinely best-in-class for open models
- Global teams working in multiple languages (GLM-5 has excellent multilingual support, especially for Chinese + English)
- Anyone curious about the model that broke the internet as "Pony Alpha"
## The Bottom Line
Zhipu AI pulled off one of the most brilliant marketing moves in AI history. By releasing GLM-5 anonymously as "Pony Alpha," they let the model's performance speak for itself — no hype, no corporate PR, just raw capability that had the entire AI community talking.
The result? A model that proves open-source AI can compete with the best proprietary offerings at a fraction of the cost. GLM-5 isn't perfect — it trails Claude Opus 4.5 by a few points on coding benchmarks and has some reported latency issues — but at 7x cheaper with MIT licensing, it's an incredibly compelling option.
The horse has left the stable. And it's running fast.
Want to explore more AI tools? Check out our full directory for the latest in AI development tools, coding assistants, and more.
