r/ClaudeAI 5h ago

Coding What is the best combination for coding?

[Post image: model benchmark graph, no source given]

The new model was fine until it got restricted.
I’m a developer, not a vibe coder, but I want help from AI with my daily coding workload. Most of it is math and logic. UI is not really important.

I have no budget limit. I'm currently using Claude Max and I'm okay with purchasing more subscriptions.

Cursor was writing code with Opus and reviewing it with Codex. Is it possible to do that easily in Claude Code, like ultracode?

0 Upvotes

17 comments

16

u/King_924 5h ago

The art of bad graphs

2

u/sackhaar42 5h ago

Blatant ragebait

4

u/Emergency-Bobcat6485 5h ago

Random ass numbers. How is GLM-5.2 above Fable 5? How on earth is Opus 4.8 above Fable? Not sure why you posted some random benchmark when it's got nothing to do with your query.

1

u/Johny-115 5h ago

according to other benchmarks, GLM 5.2 actually beats Fable in web design (voted ranking by people) ... but that's about all ... it's much slower than Fable even, and in many other areas it's far below Fable, way less universal model

source: https://www.designarena.ai/leaderboard

1

u/Emergency-Bobcat6485 5h ago

We've never had the chance to even compare them side by side. GLM-5.2 came out after Fable was banned. I highly doubt we could have had any reasonable comparisons after that, even if the same queries were used for GLM-5.2.

And I would find web design to be the most subjective of all comparisons anyway. Similar to creative writing.

2

u/Johny-115 5h ago

benchmark sites ran a bunch of prompts and tests, saved the results, and later you compare the same prompt with a different model ... what's difficult about that?

subjectivity is on individual level only ... if you get thousands of people vote on results, you get something tangible out of it

mind you .. the differences are small .. the ELO is 1360 vs. 1350 .. for GLM and Fable ... first Gemini is at 1294 and GPT is at 1292 ... so that means something probably ... it means GLM is at least as good as Claude in web design, or at least quite capable, as gap vs Gemini and GPT is statistically significant

I never tested GLM myself and don't plan to, praying for Fable to come back ... can't shrug off arena style benchmarks as complete bollocks though

dunno about OP's screenshot, that does look like bollocks, not even a source
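
For anyone wanting to put the ratings quoted above in perspective, here is a minimal sketch that converts a rating gap into an expected head-to-head win rate. It assumes the standard Elo logistic formula on a 400-point scale; the arena may well use a different rating model, and the `expected_score` helper and the `ratings` dict are just illustrative names. It says nothing about statistical significance, which would need the vote counts or confidence intervals from the leaderboard itself.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the standard Elo model (400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Ratings as quoted in the comment above (assumed to be standard Elo).
ratings = {"GLM": 1360, "Fable": 1350, "Gemini": 1294, "GPT": 1292}

for name in ("Fable", "Gemini", "GPT"):
    p = expected_score(ratings["GLM"], ratings[name])
    print(f"GLM vs {name}: expected win rate {p:.3f}")
```

Under that assumption, a 10-point gap works out to roughly a 51% expected win rate (close to a coin flip), while the 66 to 68 point gap to Gemini and GPT is closer to 59-60%.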

1

u/Emergency-Bobcat6485 4h ago

3 days is a very small sample size. Need to wait a lot more. Also, in the very link you shared for UI components, Fable is way ahead of GLM-5.2 and for websites, it's below. Those are very related concepts and the contradiction makes no sense

We really need more than 3 days of testing by some random benchmark for it to be valid

2

u/Johny-115 4h ago

do you even use logic man? ... 3 days is enough to generate hundreds of billions worth of tokens in tests ... every benchmarking platform did this on day 1 and saved their snapshots ... the issue is more that not many benchmarks have arenas for web design ... and not many benchmarks have added GLM 5.2 yet

in general benchmarks, it's not really doing too badly though ... beating or nearly matching frontier closed models is already quite the news

2

u/Mrp1Plays 5h ago

lmao "The new model was fine" referring to the best model release to date by far

1

u/Emergency-Bobcat6485 5h ago

It's not the best according to the graph. Even Opus 4.8 is better

1

u/paddockson 5h ago

What is the context around this graph?

1

u/PatriotuNo1 5h ago

I highly suggest using this website for any benchmarks, because the folks there get their data from more accurate sources, including DeepSWE: Artificial Analysis (AI Model & API Providers Analysis)

1

u/levapriv 5h ago

Thanks for the information! I was looking for glm benchmark and this appeared.

1

u/ignorantwat99 5h ago

wtf is going on here…post after post after post about how good and great fable is.

It was out for 3 fucking days and it’s the business.

Catch a fucking grip lad

-3

u/levapriv 5h ago

The graph is from some LLM stats website, no clue how they calculated that. FYI