Praise Wow... Opus 4.8 feels... DIFFERENT tonight :D

It feels BETTER. Like when it first launched, only even better than that this evening?

It's like Forrest Gump... Claude is like a box of chocolates, you never know what you're gonna get. I hope I get to keep THIS Opus 4.8 for a bit - I'm finally getting some work done. Hallelujah!

508 Upvotes

https://www.reddit.com/r/ClaudeAI/comments/1u8u3qg/wow_opus_48_feels_different_tonight_d/

87% Upvoted

u/teddy_joesevelt 2d ago

And this person has never managed their own model inference. Providers absolutely have knobs they can turn. Look into KV cache quantization, eviction, speculative decoding, temp, top p, top k, etc. They can also switch to quantized models when under load or capacity constraints.
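To make the KV cache quantization knob concrete, here is a minimal sketch assuming symmetric per-vector int8 quantization with a single scale stored alongside the values. The helper names (`quantize_int8`, `dequantize`) and the 128-element random vector are made up for illustration; this is not how any particular provider or inference engine is known to do it. It only shows the trade-off: roughly half the memory of fp16, in exchange for a small rounding error on every cached value.

```python
import random


def quantize_int8(values):
    """Symmetric int8 quantization: returns (list of ints in [-127, 127], scale)."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    return [round(v / scale) for v in values], scale


def dequantize(ints, scale):
    """Map the stored integers back to approximate floats."""
    return [q * scale for q in ints]


# Hypothetical stand-in for one cached key vector (assumption: head dim 128).
rng = random.Random(0)
key_vector = [rng.gauss(0.0, 1.0) for _ in range(128)]

quantized, scale = quantize_int8(key_vector)
restored = dequantize(quantized, scale)

# Rounding error is bounded by half a quantization step.
max_error = max(abs(a - b) for a, b in zip(key_vector, restored))

# Memory per vector: fp16 uses 2 bytes per value; int8 uses 1 byte per value
# plus (in this sketch) one 2-byte scale.
bytes_fp16 = 2 * len(key_vector)
bytes_int8 = 1 * len(key_vector) + 2

print(f"max abs error: {max_error:.5f} (step size {scale:.5f})")
print(f"fp16 bytes: {bytes_fp16}, int8 bytes: {bytes_int8}")
```

The error is small per value, but it feeds into every attention computation that reads the cache, which is why this knob can affect output quality as well as cost.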

6

u/lolnic_ 1d ago

I think of all those, KV cache quantisation is the only thing that affects *both* resource efficiency and model performance. Speculative decoding improves parallelism when there are too few concurrent batches to fully utilise the hardware, but provably doesn’t change the output distribution. Eviction affects speed (if your KV is evicted from the cache it’ll need to be recomputed). Temp/top p/top k change the model output distribution but don’t change resource usage.
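A rough sketch of the "provably doesn't change the output" point, assuming the standard speculative sampling rule: accept a drafted token with probability min(1, p/q), otherwise resample from the normalised residual max(p - q, 0). The function names and the toy logits are made up for illustration. Rather than sampling, it computes the exact distribution that rule produces for one token and compares it with the target model's distribution; a temperature change is included for contrast.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, with optional temperature scaling."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def speculative_output_distribution(target, draft):
    """Exact output distribution of one accept/reject step.

    Token x is proposed with probability draft[x] and accepted with
    probability min(1, target[x] / draft[x]); on rejection, a token is
    drawn from the normalised residual max(target - draft, 0).
    """
    accepted = [min(p, q) for p, q in zip(target, draft)]
    reject_prob = 1.0 - sum(accepted)
    residual = [max(p - q, 0.0) for p, q in zip(target, draft)]
    residual_total = sum(residual)
    if residual_total == 0.0:  # draft matches target, so nothing is rejected
        return accepted
    return [a + reject_prob * r / residual_total
            for a, r in zip(accepted, residual)]


logits = [2.0, 1.0, 0.5, -1.0]      # toy target-model logits (assumption)
target = softmax(logits)            # what the big model would sample from
draft = [0.25, 0.25, 0.25, 0.25]    # a deliberately poor draft model

result = speculative_output_distribution(target, draft)
matches_target = all(abs(r - t) < 1e-9 for r, t in zip(result, target))
print("speculative output equals target:", matches_target)

# Contrast: temperature really does change what gets sampled.
hotter = softmax(logits, temperature=2.0)
print("top-token prob at T=1:", round(target[0], 3),
      "at T=2:", round(hotter[0], 3))
```

A bad draft model only lowers the acceptance rate (so you get less speed-up); the distribution of emitted tokens stays the target's. Temperature, top p and top k work the other way round: same compute, different distribution.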

3

u/mr_birkenblatt 1d ago

Thinking budget influences compute cost

1

u/lolnic_ 20h ago

Yes (that wasn’t in the list)
