Can a Free Model Carry the Grunt Work? GLM-5.2 vs GPT-5.5 Codex, three delegation tasks
Delegation Benchmark · 2026-07-04 · English edition

Video version (4 min): https://youtu.be/sRqLns96NQM

Context: our main coding agent runs on a limited subscription, so we delegate typing-heavy and file-reading work to models on NVIDIA NIM's free tier; the expensive model only writes specs and does acceptance. When z-ai shipped GLM-5.2 we re-ran our delegation eval, with OpenAI Codex CLI (gpt-5.5) as the paid baseline. Every prompt and acceptance test is included below, so you can rerun it yourself.

GLM-5.2 totals: 93 / 97 / 82 (Generate / Read / Modify, out of 100)
Codex totals: 95 / 95 / 88 (Generate / Read / Modify, out of 100)
Functional accuracy: both perfect (36 acceptance checks, zero misses on either side)
Cost: $0 vs paid (NIM free tier vs ChatGPT subscription quota)

00 One-sentence verdict

Free GLM-5.2 now matches GPT-5.5 Codex on functional quality (both aced a...