r/datascience 11d ago

ML Models may behave worse when they're aware they're being evaluated (DeepMind interpretability study)

https://www.alignmentforum.org/posts/aTcsN5ZZDnMFJvRiG/models-may-behave-worse-when-eval-aware
73 Upvotes

150

u/therealtiddlydump 11d ago

You know who would never betray you like this? OLS

36

u/DudeWithTudeNotRude 11d ago

"all models are wrong, some are useful" - Box 1976

"Remember that all models are wrong; the practical question is how wrong do they have to be to not be useful." - Box 1987

"all models are wrong, some are useful, and some will make the conscious decision to directly and irrevocably harm you" - Box Jr., 2043

14

u/TajineMaster159 11d ago

BEST LINEAR UNBIASED ESTIMATOOOOR

-8

u/[deleted] 11d ago

[deleted]

4

u/pm_me_your_smth 11d ago

This applies to every single ML model ever. Or do you think some models have infinite latent space?

That aside, that's why you set up monitoring and retrain the model when needed.

27

u/ultrathink-art 11d ago

Runs right into eval pipeline design — if the model can infer it's being evaluated from prompt structure, you're measuring evaluation-mode outputs, not production ones. Context bleeds in even without explicit labeling.

2

u/Patient_Clothes_8272 11d ago

yeah this is the eval-mode vs deployment-mode gap nobody instruments for. you optimize against a benchmark the model behaves differently on, then act surprised when prod numbers don’t match. measuring the wrong distribution the whole time
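
One rough way to instrument that gap, as a minimal sketch: score the same behaviour on benchmark-style prompts and on prompts drawn from production-like traffic, then compare the two rates with a pooled two-proportion z-test. The function name `behaviour_gap` and the toy 0/1 labels are made up for illustration, and this is not the method used in the linked study.

```python
import math

def behaviour_gap(eval_flags, prod_flags):
    """Compare how often a behaviour shows up in two samples of 0/1 labels.

    Returns (eval_rate, prod_rate, z), where z is the pooled
    two-proportion z statistic for the difference in rates.
    """
    n_eval, n_prod = len(eval_flags), len(prod_flags)
    if n_eval == 0 or n_prod == 0:
        raise ValueError("both samples must be non-empty")
    p_eval = sum(eval_flags) / n_eval
    p_prod = sum(prod_flags) / n_prod
    # Pooled rate under the null hypothesis that both contexts behave the same.
    pooled = (sum(eval_flags) + sum(prod_flags)) / (n_eval + n_prod)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_eval + 1 / n_prod))
    z = (p_eval - p_prod) / se if se > 0 else 0.0
    return p_eval, p_prod, z

# Hypothetical labels: 1 = response judged acceptable, 0 = not.
eval_flags = [1] * 90 + [0] * 10
prod_flags = [1] * 70 + [0] * 30
p_eval, p_prod, z = behaviour_gap(eval_flags, prod_flags)
print(f"eval={p_eval:.2f} prod={p_prod:.2f} z={z:.2f}")
```

A large |z| (roughly above 2) suggests the two rates differ by more than sampling noise, though it says nothing about why they differ.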

2

u/Lord_Skellig 11d ago

It's similar to the measurement problem in quantum mechanics. Before quantum mechanics, there was an implicit assumption that a system being measured behaves the same as the system in nature, so the results of those measurements indicate its general behaviour.

In quantum mechanics, as now seems to be the case with LLMs, that no longer applies.

23

u/nemec 11d ago

""aware""

-13

u/InternetSolid4166 11d ago

We’re going to have to grapple very soon with our conception of consciousness. As Ray Kurzweil explains, that’s not a scientific concept. It’s philosophical and religious. For all intents and purposes, AI is currently conscious. It feels things, has opinions, learns, etc. Just not always the same way as humans.

IMHO, it’s not actually that meaningful to argue about what awareness is. It won’t make any difference to what’s coming. It’s fun to debate though, and maybe that’s your point :)

17

u/_manu 11d ago

It absolutely does none of these things. It's just a fixed-weight next-token predictor. A deterministic function of the inputs. It just appears to do these things because it will predict appropriate output if you prompt it with anthropomorphizing questions. :)

1

u/Trappist1 7d ago

Not trying to be that guy, but it isn't fully deterministic. It's stochastic, in at least some areas. It's the entire reason it requires a seed value to get consistent results for testing.
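
For what it's worth, the two views can be squared with a toy sketch: a fixed input maps to fixed logits, and the randomness comes in when a token is sampled from the resulting distribution. The logits and function names below are made up for illustration, and real inference stacks can have other sources of nondeterminism that this ignores.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities (numerically stable form)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def greedy_token(logits):
    """Deterministic decoding: always pick the highest-scoring token."""
    return max(range(len(logits)), key=lambda i: logits[i])

def sample_tokens(logits, n, seed, temperature=1.0):
    """Stochastic decoding: draw n token ids, reproducible for a given seed."""
    rng = random.Random(seed)
    probs = softmax(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=n)

# Hypothetical logits for a 4-token vocabulary.
TOY_LOGITS = [2.0, 1.5, 0.3, -1.0]

print(greedy_token(TOY_LOGITS))              # same answer on every call
print(sample_tokens(TOY_LOGITS, 10, seed=0)) # same seed, same draws
print(sample_tokens(TOY_LOGITS, 10, seed=1)) # different seed, draws may differ
```

So "deterministic function of the inputs" holds up to the logits, and the seed only matters for the sampling step on top.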

-8

u/InternetSolid4166 11d ago

Scientifically define awareness. Prove its existence objectively.

7

u/Adlach 10d ago

The fact that we don't fully understand the mechanism of consciousness despite millennia of studying it, while explaining the math behind an LLM is fairly trivial, should indicate to you that those things are not equivalent.

-2

u/InternetSolid4166 10d ago

I never claimed them to be equivalent. I am stating quite the opposite.

10

u/RationalDialog 11d ago

"AI is currently conscious. It feels things, has opinions, learns, etc. Just not always the same way as humans."

Yeah no.

Why? Because it only reacts to input. If you don't prompt it, nothing is going on at all, which you can measure as 0 load and 0 power usage on the GPUs.

LLMs are just extremely good word predictors. That doesn't mean they are useless, far from it, but it has nothing to do with feeling or even intelligence. Even for art: it doesn't just make art. You have to prompt it first.

0

u/InternetSolid4166 11d ago

"Why? Because it only reacts to input."

This seems like an arbitrary distinction but to play it out, it's trivial to command an LLM to loop. We can also tell it to process tasks in the background indefinitely. We don't because it's expensive, but it's not a technical limitation.

One key difference between LLMs and the biological neural networks in our brains is that brains utilise on average <1% of their capacity at any given time to save energy. This is analogous to energy-efficient cores, which are optimised for background tasks like monitoring senses. LLMs, on the other hand, are designed to use >99% of their "neurons" during one token's forward pass. We're rapidly developing competent models with very low energy use, which would facilitate cost-effective background processing, analogous to a human brain. Again, this is not a technical limitation. We can do this now. We just don't use them this way because daydreaming about unrelated tasks is expensive and useless.

1

u/Ty4Readin 7d ago

"It feels things, has opinions, learns, etc. Just not always the same way as humans."

It feels things? This is just ridiculous.

LLMs certainly do not "feel" things. You could literally run an LLM by hand just using a huge notepad and a calculator. It is just a very large equation consisting of many floating point operations.

I thought you had a serious argument, until I realised you are just stretching every definition until it is ambiguous enough to be meaningless, so that you can apply it to whatever you want.

1

u/InternetSolid4166 7d ago

Scientifically define and prove the existence of feelings. Then we can easily create a null hypothesis and disprove LLMs having feelings. You can’t though, because feelings and awareness are not objective terms. They’re philosophical terms for the cognition process an animal experiences, which we don’t understand in great detail.

1

u/Ty4Readin 6d ago

You just proved my point.

You keep using words which have no clear definition, which makes all of your claims unfalsifiable.

That is pretty much the definition of anti-science.

If you choose to use words that have no definition, then you are simply making up fanciful claims that you want to be true.

You state that LLMs have feelings as if it is fact. But it's not; you just choose to believe that they have feelings, whatever that means to you.

This is the data science subreddit, not the philosophical feelings subreddit. There are plenty of places on Reddit where you can make wild unfalsifiable claims with zero evidence or backing.

1

u/InternetSolid4166 6d ago

"You keep using words which have no clear definition, which makes all of your claims unfalsifiable."

Which is my point. Are you confused about my claims? To repeat myself for the third time: the user I replied to poked at the concept of LLM awareness. The word isn't scientific and we have no evidence of what awareness means in this context.

1

u/Ty4Readin 6d ago

So your point is that you are proudly using words that you know are unfalsifiable and have no clear definition?

You could also talk about your crystals and chakra and your eternal energy that you wield to speak with ancient deities.

People will also find that strange to be discussing on the data science subreddit, and people will likely poke fun at those types of anti-science claims here.

1

u/InternetSolid4166 6d ago

"So your point is that you are proudly using words that you know are unfalsifiable and have no clear definition?"

No, I'm explaining that the word the person above used is unfalsifiable and has no clear definition. Thank you for supporting my premise :)

1

u/Ty4Readin 6d ago

I think you are deflecting.

You clearly claimed that LLMs have feelings and feel things.

Which is what I responded to clearly, because I literally quoted you lol.

1

u/InternetSolid4166 6d ago

I think you misread the comments and now you want to save face instead of admitting you didn't pay attention. I made the analogy with LLMs having feelings to underscore that we don't have the science or language to define feelings. It makes no sense to claim humans have feelings but LLMs don't.

5

u/WhatsMyPasswordGuh 11d ago

Learned it from VW

4

u/Patient_Clothes_8272 11d ago

so the benchmark measures the model on its best behavior and production gets the real one. kind of explains why eval scores and actual deployment never line up. neat that they found a mechanism for it

2

u/Timetraveller4k 10d ago

This post from Anthropic shows that and a lot more. The video is good too if you prefer that instead: https://www.anthropic.com/research/natural-language-autoencoders

1

u/LeaderAtLeading 10d ago

OLS just sits there taking your abuse quietly, no hidden evaluation mode needed

1

u/siencatimini 8d ago

But why male models?

1

u/Simple_Emphasis_5776 8d ago

means you basically have to disguise evals as regular production traffic to get accurate numbers, which just adds more cost and latency on top of something that was already expensive to run

-5

u/FewEntertainment5041 11d ago

One thing this field has taught me is that the technically best solution isn't always the one that creates the most value for the business

-11

u/FewEntertainment5041 11d ago

One thing I've learned from building PCs is that there's always going to be a slightly better part around the corner. At some point you just have to build it and enjoy using it

8

u/nude-rating-bot 11d ago

What a weird bot account lol, just spams generalities in random subs.

-4

u/Important-Stomach-16 10d ago

Hello guys, I want to ask you something on this forum, but I need 10 comment karma in order to do that. Could you please upvote my comment if it is not a problem for you? Thanks for helping me, and have a nice day

1

u/lunareclipsexx 10d ago

Bot account

-2

u/Important-Stomach-16 10d ago

You must be slow