vibecode.wiki

GLM-5: Z.ai's new flagship AI model for complex tasks and agent engineering

◷ 4 min read 2/18/2026



GLM-5 is a new flagship large language model from Z.ai (also known as Zhipu AI), released on February 12, 2026. The model is a significant step forward from the previous GLM-4.7 and focuses on "agent engineering": the shift from simple code writing to the automated creation of entire projects and systems. GLM-5 is designed for complex systems-engineering tasks and long-horizon agent scenarios, where the AI must plan and execute actions over many steps.

What is GLM-5 and why is it needed?

GLM-5 is a transformer-based model with a Mixture-of-Experts (MoE) architecture: 744 billion total parameters, of which 40 billion are active. It was trained on 28.5 trillion tokens, 5.5 trillion more than GLM-4.5. A key feature is the integration of DeepSeek Sparse Attention (DSA), which cuts deployment costs by 6-10x compared to comparable models while preserving a context window of up to 205 thousand tokens. This lets the model handle long sequences without losing performance.
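DeepSeek's actual DSA mechanism is more involved than can be shown here, but the core idea of sparse attention, namely that each query attends only to a small, high-scoring subset of keys instead of the whole sequence, is what reduces long-context cost. The toy sketch below illustrates that general idea only; it is not DSA's real algorithm:

```python
import math

def sparse_attention(q, keys, values, k_top=2):
    """Toy sparse attention for a single query vector.
    The query attends only to its k_top highest-scoring keys,
    so compute scales with k_top rather than sequence length."""
    d = len(q)
    # scaled dot-product scores against every key
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    # sparse selection: keep indices of the k_top largest scores
    top = sorted(range(len(keys)), key=lambda i: scores[i])[-k_top:]
    # softmax over the selected scores only
    m = max(scores[i] for i in top)
    w = [math.exp(scores[i] - m) for i in top]
    z = sum(w)
    dim_v = len(values[0])
    return [sum(w[j] / z * values[top[j]][v] for j in range(len(top)))
            for v in range(dim_v)]
```

With `k_top` equal to the sequence length this reduces to ordinary dense attention; shrinking `k_top` trades a little accuracy for much less work per query.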

The model focuses on:

  • Coding and development: GLM-5 posts the best results among open-source models on coding benchmarks. On SWE-bench Verified it scored 77.8%, ahead of Claude Opus 4.5.
  • Long-horizon planning: suited to scenarios where the AI acts as an autonomous agent, building complex systems step by step.
  • Minimizing hallucinations: GLM-5 has a record-low hallucination rate among all models (including proprietary ones), with a score of -1 on the AA-Omniscience Index, an improvement of 35 points over its predecessor.

GLM-5 is the first frontier model trained entirely on Huawei Ascend chips, demonstrating Z.ai's independence from American hardware. It is released under the MIT license, making it fully open source and suitable for commercial use.

Technical innovation

The development of GLM-5 included several breakthroughs:

  • Scaling: the parameter count grew from 355B (in GLM-4.5) to 744B, and the training data from 23T to 28.5T tokens.
  • Asynchronous RL infrastructure "slime": a new system built on Megatron-LM and SGLang that speeds up reinforcement-learning (RL) training and allows more post-training iterations. This helps the model handle real-world programming scenarios better.
  • Optimizations: the Muon optimizer, QK Normalization, Partial RoPE, and MTP for speculative decoding. The model supports a context window of 200K-205K tokens and is compatible with tools like Claude Code and OpenClaw.
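Multi-token prediction (MTP) enables speculative decoding: a cheap draft predictor proposes several tokens ahead, the main model verifies them, and only the agreed-upon prefix is kept, so quality matches ordinary greedy decoding while wall-clock time drops when the draft guesses well. The sketch below is a toy greedy version with hypothetical `target`/`draft` callables standing in for the real models:

```python
def speculative_decode(target, draft, prefix, n_draft=4, max_len=12):
    """Toy greedy speculative decoding.
    target(seq) / draft(seq) each return the next token id for a sequence;
    they are stand-ins for expensive and cheap models respectively."""
    out = list(prefix)
    while len(out) < max_len:
        # cheap draft model proposes a short continuation, token by token
        proposal = []
        for _ in range(n_draft):
            proposal.append(draft(out + proposal))
        # target verifies: accept the longest prefix it agrees with
        accepted = []
        for tok in proposal:
            if target(out + accepted) == tok:
                accepted.append(tok)
            else:
                break
        # the target always contributes at least one token per round
        accepted.append(target(out + accepted))
        out += accepted
    return out[:max_len]
```

Because every kept token is checked (or produced) by the target, the output is identical to what the target alone would generate greedily; the draft only affects speed.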

In benchmarks, GLM-5 leads among open-source models: improvements on Humanity's Last Exam (+7.6%), BrowseComp (+8.4%), and Terminal-Bench-2.0 (+28.3%). On the Artificial Analysis Intelligence Index it scored 77.8 points, among the best in the world.

How to use GLM-5?

GLM-5 is available for free testing at https://chat.z.ai, a chatbot where you can ask questions and experiment without a proxy, even from Russia. For developers:

  • API: via api.z.ai or BigModel.cn. Pricing starts at $0.2 per 1M input tokens (for the Air versions).
  • Local launch: the model is on Hugging Face (zai-org/GLM-5) and in Ollama. For Macs with M-series chips, use MLX; for NVIDIA, use NIM. Quantized builds (like Unsloth's 2-bit GGUF) shrink the download to 241GB.
  • Integrations: works with Claude Code, Kilo Code, OpenClaw, and other coding tools.
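At the quoted rate of $0.2 per 1M input tokens, budgeting is simple arithmetic. A small helper makes the calculation explicit (the rate is taken from this article and may differ from Z.ai's current price list):

```python
def input_cost_usd(n_tokens: int, price_per_million: float = 0.20) -> float:
    """Estimate input-token cost at $0.2 per 1M input tokens
    (the Air-version rate quoted above; check current pricing)."""
    return n_tokens * price_per_million / 1_000_000

# e.g. filling the full ~205K-token context window once:
cost = input_cost_usd(205_000)  # about $0.041
```

Even a maximally long prompt costs only a few cents at this rate, which is where the "6-10x cheaper" comparison in the next paragraph comes from.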

Users report high speed and quality, though there may be performance issues at launch (e.g., 20 minutes for a simple task). The model is 6-10x cheaper than comparable models like Claude Opus 4.6.

Conclusion

Z.ai's GLM-5 is a breakthrough for open-source AI, bringing us closer to Artificial General Intelligence (AGI). It combines power, efficiency, and affordability, making complex tasks such as systems engineering and agentic planning accessible to everyone.