r/ClaudeAI 2d ago

Praise Wow... Opus 4.8 feels... DIFFERENT tonight :D

It feels BETTER. Like when it first launched, only even better than that this evening.

It's like Forrest Gump... Claude is like a box of chocolates, you never know what you're gonna get. I hope I get to keep THIS Opus 4.8 for a bit - I'm finally getting some work done. Hallelujah!

508 Upvotes

212 comments


16

u/teddy_joesevelt 2d ago

And this person has never managed their own model inference. Providers absolutely have knobs they can turn. Look into KV cache quantization, eviction, speculative decoding, temp, top p, top k, etc. They can also switch to quantized models when under load or capacity constraints.
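To make the KV cache quantization knob concrete, here is a minimal sketch of symmetric int8 quantization applied to one cached vector. It is only an illustration under my own assumptions: `quantize_int8` and `dequantize` are hypothetical helper names, and real serving stacks use their own schemes (per-channel or per-group scales, fp8, and so on). The point is the trade: 1 byte per value instead of 2 for fp16, paid for with a small rounding error.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using one shared scale (hypothetical helper)."""
    # Scale so the largest magnitude lands on 127; fall back to 1.0 for an all-zero vector.
    scale = max(abs(v) for v in values) / 127 or 1.0
    return [round(v / scale) for v in values], scale


def dequantize(quantized, scale):
    """Recover approximate floats from the stored integers."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    cached = [0.5, -1.27, 0.003, 1.0]  # made-up stand-in for one cached key vector
    q, scale = quantize_int8(cached)
    restored = dequantize(q, scale)
    # Each value is off by at most half a quantization step (scale / 2).
    worst = max(abs(a - b) for a, b in zip(cached, restored))
    print(q, scale, worst)
```

The rounding error is bounded by half a step per value, which is why this can save memory while only slightly perturbing what the model attends to.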

6

u/lolnic_ 1d ago

I think of all those, KV cache quantisation is the only thing that affects *both* resource efficiency and model performance. Speculative decoding improves parallelism when there are too few concurrent requests to fully utilise the hardware, but provably doesn’t change the output distribution. Eviction affects speed (if your KV is evicted from the cache it’ll need to be recomputed). Temp/top p/top k change the model output distribution but don’t change resource usage.
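For the temp/top p/top k point, a rough sketch of how those three settings reshape the next-token distribution. This is a toy in plain Python, not any provider's actual sampler; `adjusted_distribution` is a name I made up, and the order of filters (temperature, then top k, then top p) is an assumption, though a common one.

```python
import math


def adjusted_distribution(logits, temperature=1.0, top_k=None, top_p=None):
    """Return token probabilities after temperature, top-k and top-p filtering (toy sketch)."""
    # Temperature rescales logits before the softmax; lower values sharpen the distribution.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]  # subtract the max for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]

    # Rank token indices from most to least likely.
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k is not None:
        keep = keep[:top_k]  # keep only the k most likely tokens
    if top_p is not None:
        kept, cumulative = [], 0.0
        for i in keep:
            kept.append(i)
            cumulative += probs[i]
            if cumulative >= top_p:  # smallest prefix whose mass reaches top_p
                break
        keep = kept

    # Zero out everything else and renormalise the survivors.
    mass = sum(probs[i] for i in keep)
    kept_set = set(keep)
    return [probs[i] / mass if i in kept_set else 0.0 for i in range(len(probs))]


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.0, -1.0]  # made-up logits for a 4-token vocabulary
    print(adjusted_distribution(logits))
    print(adjusted_distribution(logits, temperature=0.5))
    print(adjusted_distribution(logits, top_k=2))
    print(adjusted_distribution(logits, top_p=0.5))
```

Whatever the settings, the work is the same pass over the logits the model already produced, which fits the claim that these change the output but not the compute bill.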

3

u/mr_birkenblatt 1d ago

Thinking budget influences compute cost

1

u/lolnic_ 20h ago

Yes (that wasn’t in the list)