# Pony Alpha Unmasked: GLM-5 is Zhipu AI's Open-Source Challenger to Claude and GPT
For the past week, the AI community has been obsessed with a mystery. A stealth model called "Pony Alpha" appeared on OpenRouter on February 6, 2026 — no branding, no company name, just raw performance that had developers losing their minds on Reddit.
Now the secret is out: Pony Alpha is GLM-5, the next-generation flagship model from Zhipu AI (Z.ai), and it's one of the most impressive open-source AI models ever released.
## The Great AI Mystery: What Was Pony Alpha?
When Pony Alpha first appeared on OpenRouter, nobody knew where it came from. But the clues were there:
- The release timing coincided with Zhipu AI's announcement that GLM-5 would launch around Chinese New Year
- The output style matched the GLM series
- When prompted about its origins, the model identified itself as a GLM model developed by Zhipu
- "Pony" references the Year of the Horse in the Chinese zodiac — a playful nod from the team
Reddit threads exploded with speculation. Was it DeepSeek V4? A secret OpenAI experiment? The reveal sent Zhipu AI's stock price surging 60% in two days, pushing its market cap past HKD 150 billion (~$19 billion USD).
## GLM-5 by the Numbers
GLM-5 is a massive leap from its predecessor GLM-4.7. Here's what's under the hood:
| Spec | GLM-5 | GLM-4.7 (Previous) |
|---|---|---|
| Total Parameters | 745B | 355B |
| Active Parameters | 44B (MoE) | 32B |
| Expert Configuration | 256 total / 8 active | — |
| Context Window | 200K tokens | 128K |
| Pre-training Data | 28.5T tokens | 23T |
| Training Hardware | Huawei Ascend | Huawei Ascend |
| License | MIT (Open Source) | MIT |
The Mixture of Experts (MoE) architecture means that while GLM-5 has 745 billion total parameters, only about 44 billion are activated for each token, which keeps inference fast and cost-efficient despite the model's massive scale.
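As a back-of-envelope check on the table above, the active-to-total ratio can be computed directly. This is a minimal sketch: the figures come from the spec table, and the compute estimate is a rough first-order approximation (per-token FLOPs scale with active parameters, not total).

```python
# Back-of-envelope: why MoE inference is cheaper than the total
# parameter count suggests. Figures are from the spec table above.
TOTAL_PARAMS_B = 745   # all experts, in billions
ACTIVE_PARAMS_B = 44   # parameters used per token, in billions

# Fraction of the network that actually runs for each token
active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B
print(f"Active fraction per token: {active_fraction:.1%}")

# Relative to a hypothetical dense 745B model, per-token compute
# drops by roughly this factor:
print(f"Compute reduction vs dense: ~{TOTAL_PARAMS_B / ACTIVE_PARAMS_B:.0f}x")
```

In other words, each token touches under 6% of the network, so GLM-5's per-token compute is closer to that of a mid-sized dense model than to a dense 745B one.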
## Benchmark Performance: Competing with the Best
GLM-5 doesn't just compete with open-source models — it goes toe-to-toe with the most expensive proprietary models on the planet.
### Coding Benchmarks
| Benchmark | GLM-5 | Claude Opus 4.5 | GPT-5.2 |
|---|---|---|---|
| SWE-bench Verified | 77.8 | 80.9 | 80.0 |
| SWE-bench Multilingual | 73.3 | 77.5 | — |
| Terminal-Bench 2.0 | 56.2 | 59.3 | 54.0 |
### Reasoning Benchmarks
| Benchmark | GLM-5 |
|---|---|
| AIME 2026 | 92.7 |
| GPQA-Diamond | 86.0 |
| Humanity's Last Exam | 30.5 (with tools: 50.4) |
### Agentic Performance
| Benchmark | GLM-5 | GPT-5.2 |
|---|---|---|
| BrowseComp | 75.9 | 65.8 |
| Vending Bench 2 | $4,432 (#1 open-source) | — |
| MCP-Atlas Public Set | 67.8 | — |
On BrowseComp (web browsing tasks), GLM-5 actually surpasses GPT-5.2 by a significant margin. It also achieved a record-low hallucination rate according to Z.ai's testing.
## What Makes GLM-5 Special?
### 1. The "Slime" Reinforcement Learning Framework
GLM-5 introduces a novel open-source RL infrastructure called Slime. Traditional RL training for large models is slow because rollout generation and policy updates run in lockstep, each waiting on the other. Slime decouples data generation from policy updates, achieving up to 3x higher throughput compared to conventional synchronous pipelines.
This isn't just a training trick — it's why GLM-5 excels at long-horizon agentic tasks. The framework specifically optimizes for task completion consistency over superficial metric gaming.
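To make the decoupling idea concrete, here is a toy producer-consumer sketch in Python. This is not Slime's actual code: real rollout workers run model inference and the trainer runs gradient updates on GPUs, but the structural point is the same, namely that generation and training communicate only through a shared buffer and never block each other.

```python
import queue
import threading

# Toy illustration of decoupled RL training (NOT Slime's real code):
# a rollout worker pushes trajectories into a shared buffer while a
# trainer thread consumes them independently.
buffer = queue.Queue(maxsize=64)

def rollout_worker(n_episodes):
    for i in range(n_episodes):
        trajectory = {"episode": i, "reward": float(i % 5)}  # fake data
        buffer.put(trajectory)
    buffer.put(None)  # sentinel: no more trajectories coming

def trainer(results):
    while True:
        item = buffer.get()
        if item is None:
            break
        # A real trainer would run a policy-update step here.
        results.append(item["reward"])

results = []
t_gen = threading.Thread(target=rollout_worker, args=(20,))
t_train = threading.Thread(target=trainer, args=(results,))
t_gen.start(); t_train.start()
t_gen.join(); t_train.join()
print(f"Trained on {len(results)} trajectories")
```

The same shape scales up: with many rollout workers feeding one buffer, the trainer is never idle waiting for fresh data, which is where the claimed throughput gains come from.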
### 2. DeepSeek Sparse Attention (DSA)
For the first time in the GLM series, GLM-5 integrates DeepSeek's Sparse Attention mechanism. Traditional transformers suffer from quadratic complexity — doubling context length quadruples computational cost. DSA breaks this ceiling, enabling GLM-5 to handle its full 200K context window without the usual performance degradation.
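A quick sketch of why the quadratic term bites at long context. This is illustrative arithmetic only, counting pairwise query-key comparisons for a single dense attention head:

```python
# Dense attention builds an n x n score matrix per head, so the
# number of query-key comparisons grows quadratically with context.
def attention_score_ops(n_tokens):
    """Pairwise query-key comparisons for one dense attention head."""
    return n_tokens * n_tokens

short_ctx, long_ctx = 100_000, 200_000
ratio = attention_score_ops(long_ctx) / attention_score_ops(short_ctx)
print(f"2x context -> {ratio:.0f}x attention cost")
```

Sparse attention schemes like DSA attend to a selected subset of positions instead of all pairs, which is what keeps the 200K window tractable.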
### 3. Built for Agentic Engineering
Unlike chat-optimized models, GLM-5 is purpose-built for multi-step engineering workflows. In the Vending Bench 2 evaluation (simulating a year-long business operation), GLM-5 demonstrated consistent resource allocation, dynamic strategy adjustment, and risk management across 365 simulated days — ranking #1 among all open-source models.
### 4. Trained Entirely on Chinese Hardware
GLM-5 was trained exclusively on Huawei Ascend chips using the MindSpore framework — achieving complete independence from US-manufactured semiconductors. This is a significant milestone for China's domestic AI infrastructure and proves that frontier-level models can be built without NVIDIA hardware.
## Pricing: 7x Cheaper Than Claude
This is where GLM-5 gets really interesting for developers:
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| GLM-5 | $0.80 – $1.00 | $2.56 – $3.20 |
| Claude Opus 4.5 | $5.00 | $25.00 |
| GPT-5.2 | $1.25 | $10.00 |
GLM-5 delivers ~90% of Claude Opus 4.5's coding performance at ~14% of the cost. For teams running high-volume AI workloads, the savings are massive.
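To see how this plays out on a concrete workload, here is a small cost calculator using the list prices from the table above. GLM-5 is taken at the top of its quoted range, and the 500M-input / 100M-output monthly workload is a made-up example:

```python
# Rough monthly-cost comparison using the table's list prices
# (USD per 1M tokens; GLM-5 at the top of its quoted range).
PRICES = {  # (input $/M, output $/M)
    "GLM-5":           (1.00, 3.20),
    "Claude Opus 4.5": (5.00, 25.00),
    "GPT-5.2":         (1.25, 10.00),
}

def monthly_cost(model, input_m_tokens, output_m_tokens):
    inp, out = PRICES[model]
    return input_m_tokens * inp + output_m_tokens * out

# Hypothetical workload: 500M input + 100M output tokens per month
for model in PRICES:
    print(f"{model:16s} ${monthly_cost(model, 500, 100):>9,.2f}")
```

On that workload the gap is stark: roughly $820/month on GLM-5 versus $5,000 on Claude Opus 4.5, about a 6x difference at these example volumes.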
Even better: GLM-5 is currently available for free on Kilo Code for a limited time — no API keys, no subscriptions.
## How to Use GLM-5
### Option 1: OpenRouter API
GLM-5 is live on OpenRouter as of February 11, 2026. You can use it with any OpenRouter-compatible tool or framework.
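A minimal sketch of calling it through OpenRouter's OpenAI-compatible chat completions endpoint, using only the Python standard library. The model slug `z-ai/glm-5` is an assumption; check OpenRouter's model list for the exact identifier.

```python
import json
import os
import urllib.request

# Minimal OpenRouter chat request (OpenAI-compatible schema).
# NOTE: the model slug "z-ai/glm-5" is assumed -- verify it against
# OpenRouter's published model list before use.
payload = {
    "model": "z-ai/glm-5",
    "messages": [
        {"role": "user", "content": "Write a binary search in Python."}
    ],
}

req = urllib.request.Request(
    "https://openrouter.ai/api/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {os.environ.get('OPENROUTER_API_KEY', '')}",
        "Content-Type": "application/json",
    },
)

# Uncomment to actually send the request (requires a valid API key):
# with urllib.request.urlopen(req) as resp:
#     reply = json.loads(resp.read())
#     print(reply["choices"][0]["message"]["content"])
```

Because the schema is OpenAI-compatible, the official OpenAI SDKs also work by pointing their base URL at `https://openrouter.ai/api/v1`.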
### Option 2: Z.ai Platform
Access GLM-5 directly through Z.ai, Zhipu AI's official platform.
### Option 3: Kilo Code (Free)
Kilo Code offers GLM-5 for free during the launch period. Just install the VS Code extension, select GLM-5 from the model dropdown, and start coding.
### Option 4: WaveSpeed API
WaveSpeed provides optimized API access with competitive pricing.
### Option 5: Self-Host (Coming Soon)
With MIT licensing and weights headed to HuggingFace and ModelScope, you'll be able to deploy GLM-5 on your own infrastructure. Given the 745B total parameter count you'll need serious hardware, but the 44B active-parameter design makes inference more manageable than the headline number suggests.
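A rough sketch of the memory math, under the assumption that, as in most MoE deployments, all 745B parameters must be resident in memory even though only 44B are active per token. KV cache and activations add more on top of these weight-only figures:

```python
# Back-of-envelope VRAM needed just to hold GLM-5's weights,
# assuming all experts must be resident (typical for MoE serving).
def weight_memory_gb(params_billions, bytes_per_param):
    # 1 billion params at 1 byte each is ~1 GB (decimal)
    return params_billions * bytes_per_param

for label, bytes_per_param in [("FP16", 2), ("FP8", 1), ("INT4", 0.5)]:
    gb = weight_memory_gb(745, bytes_per_param)
    print(f"{label}: ~{gb:,.0f} GB just for weights")
```

Even aggressively quantized, that is multi-GPU territory; the 44B active-parameter design helps with per-token latency and throughput, not with the footprint of storing the weights.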
## GLM-5 vs GPT-5.2 vs Claude Opus 4.5: The Full Picture
| Feature | GLM-5 | GPT-5.2 | Claude Opus 4.5 |
|---|---|---|---|
| Parameters | 745B (44B active) | Undisclosed | Undisclosed |
| Context Window | 200K | 400K in / 128K out | 200K |
| Open Source | ✅ MIT License | ❌ Closed | ❌ Closed |
| SWE-bench | 77.8 | 80.0 | 80.9 |
| BrowseComp | 75.9 | 65.8 | — |
| Terminal-Bench | 56.2 | 54.0 | 59.3 |
| Input Price/1M | $0.80–1.00 | $1.25 | $5.00 |
| Output Price/1M | $2.56–3.20 | $10.00 | $25.00 |
| Training Hardware | Huawei Ascend | NVIDIA | NVIDIA |
| Free Tier | ✅ (Kilo Code) | ❌ | ❌ |
The verdict: GLM-5 is the best value proposition in AI right now. It's not quite at Claude Opus 4.5 levels across the board, but it's remarkably close — and it's open-source, 7x cheaper, and available for free. For most real-world coding and reasoning tasks, the performance gap is negligible.
## Who Should Use GLM-5?
- Budget-conscious developers who want near-frontier performance without frontier pricing
- Open-source advocates who prefer models they can inspect, fine-tune, and self-host
- Agentic AI builders — GLM-5's long-horizon task performance is genuinely best-in-class for open models
- Global teams working in multiple languages (GLM-5 has excellent multilingual support, especially for Chinese + English)
- Anyone curious about the model that broke the internet as "Pony Alpha"
## The Bottom Line
Zhipu AI pulled off one of the most brilliant marketing moves in AI history. By releasing GLM-5 anonymously as "Pony Alpha," they let the model's performance speak for itself — no hype, no corporate PR, just raw capability that had the entire AI community talking.
The result? A model that proves open-source AI can compete with the best proprietary offerings at a fraction of the cost. GLM-5 isn't perfect — it trails Claude Opus 4.5 by a few points on coding benchmarks and has some reported latency issues — but at 7x cheaper with MIT licensing, it's an incredibly compelling option.
The horse has left the stable. And it's running fast.
Want to explore more AI tools? Check out our full directory for the latest in AI development tools, coding assistants, and more.
