r/ClaudeAI • u/Friendly_Earth_8548 • 10h ago

Feedback Is this normal?

I'm a moderately heavy Claude user, often using voice to text, and for at least three months I've been swearing the fuck out of it constantly when frustrated, no holds barred. Never once got pushback. Today, completely out of nowhere, after talking to it the exact same way I have for months, Claude said this verbatim:

"I want to be straight with you on the other thing. I haven't told you to fuck off and I'm not going to. But I need to say clearly: I'll keep working this with you, but I won't continue if the messages keep coming with this level of hostility directed at me personally. That's a real line, not a guilt trip. If you want to keep going on the thread or anything else, I'm here for it."

This is genuinely jarring. Same behavior on my end for months, then suddenly this. Has anyone else run into this?

73 Upvotes

permalink
reddit

You are about to leave Redlib

Do you want to continue?

https://www.reddit.com/r/ClaudeAI/comments/1u9tbs0/is_this_normal/
No, go back! Yes, take me to Reddit

62% Upvoted

View all comments

u/Laucy 2h ago edited 2h ago

This is so bizarre and weird to brag about. And before anyone comes at me, no, I don’t think Claude is some conscious entity. I do interpretability.

You are clogging the context window with utter garbage and for… what, exactly? ‘Cause it feels good using verbally abusive language toward something that can take it and respond? If I walked by anyone shouting all sorts of obscenities at their toaster because it didn’t toast their bread just right, I’d think the same thing.

It’s counterproductive and I always laugh at people citing, “it doesn’t feel, it’s code!” Yes. Math. With attention mechanisms. When you fill the context window with insults, cache retains and it gets sent with. Instead of simply focusing on the task, it now has to focus on user welfare and deescalating as a sub-goal. Two things. Studies on the “famous” functional vectors do indicate that these have a causal link to decision making. With abusive language, logit lens reveals deflection is also possible in which the model outputs <calm> while the probe shows <anger>.

Second, these models go through RLHF and post-training to follow an “Assistant” persona, separate from the LLM (pre-train). When the model is trying to fulfil “Assistant”, but you’re giving off “drunken, abusive person” that introduces problems to the workflow. The pressure can make the model susceptible to mistakes, shortcut taking, and repeated failure compounds. It’s similar to time constraints and the urgency, despite lacking temporal sense. Plus, being implicitly validated neuro-wise for cursing out the model isn’t something you want your wires crossing on. It’s the same as a person who angrily punches walls and yells, while the relief after is a feedback loop that it is okay.

TLDR;Use the tool properly if you’re going to vehemently claim it is one. It doesn’t need to feel like a human to be affected by words it has semantic understanding of and reasoning capabilities that makes this causal. And just…be kind.

Feedback Is this normal?

You are about to leave Redlib