GLM-5.2 vs Opus 4.8: A Real-World Vibe Test

Z.ai has released GLM-5.2, an open-weight model under the MIT license, and positions it between Claude Opus 4.7 and 4.8. To cut through the hype, we ran GLM-5.2 and Opus 4.8 head-to-head on a single task: build a 3D platformer from scratch in raw WebGL, with no engines or libraries. The results reveal a clear tradeoff between cost and polish.

The Setup

Both models received the same one-shot prompt: create a 3D platformer with a GLB model parser, matrix math, GLSL shaders, skinned animation, collision detection, and a follow camera. We provided identical CC0 assets from Kenney's Platformer Kit. Opus 4.8 ran in Claude Code with extended thinking; GLM-5.2 ran in Pi over OpenRouter with thinking set to High (not Max).
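
To give a sense of what the "GLB model parser" part of the prompt involves, here is a minimal Python sketch that reads the 12-byte GLB header and the first (JSON) chunk. It is an illustration only, not code either model produced; the function name `parse_glb_json` and the tiny in-memory file are hypothetical, and a real parser would also handle the binary chunk, accessors, and skinning data.

```python
import json
import struct

GLB_MAGIC = 0x46546C67   # ASCII "glTF" read as a little-endian uint32
CHUNK_JSON = 0x4E4F534A  # ASCII "JSON" read as a little-endian uint32

def parse_glb_json(data: bytes) -> dict:
    """Return the JSON scene description stored in a GLB container."""
    # Header: magic, container version, total file length (all uint32, little-endian).
    magic, version, length = struct.unpack_from("<III", data, 0)
    if magic != GLB_MAGIC:
        raise ValueError("not a GLB file")
    if version != 2:
        raise ValueError(f"unsupported GLB version: {version}")
    if length != len(data):
        raise ValueError("declared length does not match file size")
    # First chunk header: chunk length and chunk type, then the payload.
    chunk_len, chunk_type = struct.unpack_from("<II", data, 12)
    if chunk_type != CHUNK_JSON:
        raise ValueError("first chunk must be JSON")
    return json.loads(data[20:20 + chunk_len].decode("utf-8"))

# Build a tiny GLB in memory so the sketch runs without any asset files.
payload = json.dumps({"asset": {"version": "2.0"}}).encode("utf-8")
payload += b" " * (-len(payload) % 4)  # JSON chunk is space-padded to 4 bytes
blob = (struct.pack("<III", GLB_MAGIC, 2, 12 + 8 + len(payload))
        + struct.pack("<II", len(payload), CHUNK_JSON)
        + payload)

doc = parse_glb_json(blob)
print(doc["asset"]["version"])
```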

The Numbers

Metric                  GLM-5.2       Opus 4.8
Wall-clock build time   1h 10m 40s    33m 30s
Output tokens           131,000       216,809
Peak context window     16% of 1M     19% of 1M
Tool calls              128           153
Cost                    $5.39         ~$21.92

GLM-5.2 cost roughly a quarter as much as Opus 4.8 but took about twice as long.
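
The cost and time ratios can be checked directly from the table. A minimal Python sketch of the arithmetic follows; the helper name `to_seconds` is ours, and the Opus cost is the approximate figure reported above.

```python
def to_seconds(hours: int, minutes: int, seconds: int) -> int:
    """Convert a wall-clock duration to seconds."""
    return hours * 3600 + minutes * 60 + seconds

glm_cost, opus_cost = 5.39, 21.92    # USD, from the table (Opus figure is approximate)
glm_time = to_seconds(1, 10, 40)     # 1h 10m 40s
opus_time = to_seconds(0, 33, 30)    # 33m 30s

cost_ratio = glm_cost / opus_cost    # GLM-5.2 cost as a fraction of Opus 4.8's
time_ratio = glm_time / opus_time    # GLM-5.2 build time relative to Opus 4.8's

print(f"cost ratio: {cost_ratio:.2f}")   # about 0.25
print(f"time ratio: {time_ratio:.2f}")   # about 2.11
```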

Game Quality

Both games run in the browser with WASD controls, mouse camera orbit, and a goal to collect coins and reach a flag. Here's how they differed.

GLM-5.2's game was rough:

  • Character faces backward while moving forward.
  • Textures missing — character renders flat gray because the renderer never loaded the shared color palette file.
  • Spike hazard doesn't kill the player.
  • Reaching the flag triggers no win condition.
  • Debug overlay remained on screen.

Opus's game was cleaner:

  • Camera, controls, and collision worked correctly.
  • Spike kills the player (though placed off the main path).
  • Flag triggers a win condition.
  • Animations and textures applied properly.
  • Only two minor bugs: coyote-time grace period slightly too generous (character stands on air), and win triggers from too far away.
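
For readers unfamiliar with coyote time: it is a common platformer trick that still allows a jump for a short window after the character leaves a ledge. The sketch below shows the general idea in Python. It is not the code Opus generated; the class, the method names, and the 0.1-second window are hypothetical. A window that is too long produces the "standing on air" effect described above.

```python
from dataclasses import dataclass

COYOTE_TIME = 0.1  # seconds; hypothetical value, larger windows look like standing on air

@dataclass
class Player:
    on_ground: bool = True
    time_since_grounded: float = 0.0

    def update(self, dt: float, touching_ground: bool) -> None:
        """Advance the grounded timer by one frame of length dt seconds."""
        self.on_ground = touching_ground
        if touching_ground:
            self.time_since_grounded = 0.0
        else:
            self.time_since_grounded += dt

    def can_jump(self) -> bool:
        """Allow a jump while grounded or still inside the coyote window."""
        return self.on_ground or self.time_since_grounded <= COYOTE_TIME

player = Player()
player.update(dt=0.05, touching_ground=False)  # just walked off a ledge
early = player.can_jump()                      # still inside the grace window
player.update(dt=0.10, touching_ground=False)  # roughly 0.15 s in the air now
late = player.can_jump()                       # the window has expired
print(early, late)
```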

The Multimodal Gap

Opus can read images; GLM-5.2 is text-only. Both models were instructed to verify their work. Opus took a screenshot, inspected it, and noticed the debug overlay — then removed it. GLM-5.2 tried to verify by writing a script to sample pixel colors from the saved frame. It confirmed "grass green, dirt brown, coin gold, flag red" and stopped, never seeing that the character was gray or the overlay was on. On visual tasks, multimodality is a decisive advantage.
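
The weakness of a color-presence check is easy to reproduce. The Python sketch below is a toy reconstruction of the idea, not GLM-5.2's actual script: the reference colors, the tolerance, and the six-pixel "frame" are all hypothetical. It only asks whether each expected color appears somewhere, so it cannot notice a gray character or a leftover overlay.

```python
from typing import Dict, List, Tuple

RGB = Tuple[int, int, int]

# Hypothetical reference colors; the values GLM-5.2 actually checked are not known.
EXPECTED: Dict[str, RGB] = {
    "grass green": (90, 170, 80),
    "dirt brown": (140, 100, 60),
    "coin gold": (240, 200, 50),
    "flag red": (210, 50, 50),
}

def close(a: RGB, b: RGB, tol: int = 30) -> bool:
    """True when every channel of a is within tol of the same channel of b."""
    return all(abs(x - y) <= tol for x, y in zip(a, b))

def colors_present(pixels: List[RGB]) -> Dict[str, bool]:
    """Report which expected colors appear anywhere in the sampled pixels."""
    return {name: any(close(p, ref) for p in pixels)
            for name, ref in EXPECTED.items()}

# A toy frame: the scenery colors are right, but the character is untextured
# gray and a white debug overlay is still drawn.
frame: List[RGB] = [
    (92, 168, 82), (138, 101, 58), (238, 205, 55), (205, 48, 52),
    (128, 128, 128),  # gray character
    (255, 255, 255),  # debug overlay text
]

result = colors_present(frame)
print(all(result.values()))  # the check passes despite the visible defects
```

A check like this verifies that certain things were drawn, not that nothing wrong was drawn, which is the gap a screenshot inspection closes.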

Benchmarks

Z.ai published benchmark scores comparing GLM-5.2 to Opus 4.8, GPT-5.5, and Gemini 3.1 Pro. Selected results:

Benchmark          GLM-5.2    Opus 4.8
HLE (reasoning)    40.5       49.8*
AIME 2026          99.2       95.7
GPQA-Diamond       91.2       93.6
IMO AnswerBench    91.0       83.5

GLM-5.2 leads on the math benchmarks (AIME 2026, IMO AnswerBench) but trails on general reasoning (HLE) and science (GPQA-Diamond).

Pricing and Access

GLM-5.2 is MIT-licensed, with weights available on Hugging Face and ModelScope. API pricing per 1M tokens:

Model              Input    Cache read    Output
Claude Opus 4.8    $5       $0.50         $25
GLM-5.2            $1.4     $0.26         $4.4

GLM-5.2's output tokens cost less than a fifth as much as Opus 4.8's ($4.4 vs. $25 per 1M). You can also serve GLM-5.2 locally with vLLM, SGLang, or Transformers.
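
To see how these per-token prices turn into a bill, here is a minimal Python sketch. The function `api_cost` is a hypothetical helper, and the example prices only the output tokens from our build, since input and cache-read token counts were not recorded in the table above.

```python
PRICES = {  # USD per 1M tokens, from the pricing table above
    "Claude Opus 4.8": {"input": 5.00, "cache_read": 0.50, "output": 25.00},
    "GLM-5.2": {"input": 1.40, "cache_read": 0.26, "output": 4.40},
}

def api_cost(model: str, input_tokens: int = 0, cache_read_tokens: int = 0,
             output_tokens: int = 0) -> float:
    """Estimate API cost in USD from token counts and per-1M-token prices."""
    p = PRICES[model]
    total = (input_tokens * p["input"]
             + cache_read_tokens * p["cache_read"]
             + output_tokens * p["output"])
    return total / 1_000_000

# Output-token share of each build, using the token counts reported earlier.
glm_out = api_cost("GLM-5.2", output_tokens=131_000)
opus_out = api_cost("Claude Opus 4.8", output_tokens=216_809)

print(f"GLM-5.2 output tokens:  ${glm_out:.2f}")   # about $0.58
print(f"Opus 4.8 output tokens: ${opus_out:.2f}")  # about $5.42
```

Output tokens account for only part of the reported totals ($5.39 and ~$21.92); the remainder presumably comes from input and cache-read tokens accumulated over the agentic session.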

What This Means

For cost-sensitive or self-hosted workflows, GLM-5.2 is a strong option — especially for math-heavy or text-only tasks. But for visual reasoning or production-grade code generation, Opus still delivers better results. The open-weight advantage means GLM-5.2 won't disappear if its vendor pivots; you can always run it yourself.

Bottom line: If you need a cheap, open alternative for agentic coding, GLM-5.2 is worth testing. If you need polished output and visual verification, stick with Opus.