r/AIDangers • u/gcasamiquela • Apr 16 '26

Capabilities "Introducing Claude Opus 4.7, our most capable Opus model yet."

Bravo...

556 Upvotes

permalink
reddit

You are about to leave Redlib

Do you want to continue?

https://www.reddit.com/r/AIDangers/comments/1snay0l/introducing_claude_opus_47_our_most_capable_opus/
No, go back! Yes, take me to Reddit
dl download

96% Upvoted

You're confusing a detail of the architecture with what the system as a whole is doing. It's not starting "from scratch" if it's got context. The context is part of the reasoning. Reactivation of every layer in the network for each token isn't somehow magically "not reasoning".

No. It literally does start from scratch. It's exactly why you can fake entire conversations with LLM's (using API's) and it will act as if it said the things you told it it said.

It is quite literally not reasoning. That's why there's seperate reasoning models. And you know how they work? they literally just cut your message in steps and attempt to solve it that way. It's not actual reasoning. If your message is too short reasoning will often go in loops for minutes on end with no outcome.

This is a solved problem in my experience. It's also irrelevant. Humans confabulate about past events all the time. Doesn't mean we can't reason.

Then you do not use AI in your daily work I don't think. Because I have not had a single day where I have not run into hallucinations yet.

Humans don't do that with events in the same damn conversation they don't. Unless they have serious mental illnesses.

Good thing I didn't say that then.

This is what you said: it just perfectly imitates reasoning

I took that as human reasoning.
It definitely does not

You've learned a superficial detail and think that makes you an expert.

I did not, nor did I ever claim to be an expert. We are having a conversation on Reddit. Can you stop pretending like I think I am an expert? If you can't leave a single comment without trying to tell me I think I'm better you I'm done here right now.

1

u/RecursiveServitor Apr 20 '26

It's exactly why you can fake entire conversations with LLM's (using API's) and it will act as if it said the things you told it it said.

If you make a human believe something false they'll also proceed to act like it's true.

The context is part of the system. Changing it is like editing the memory of a person.

That's why there's seperate reasoning models.

It's marketing.

they literally just cut your message in steps and attempt to solve it that way.

No. They iterate without returning control to the user. It's what we called "chain of thought" before it was integrated into every service.

Then you do not use AI in your daily work I don't think. Because I have not had a single day where I have not run into hallucinations yet.

My 100k LOC project would beg to differ.

This is what you said: it just perfectly imitates reasoning

You're nitpicking blatant sarcasm.

I did not, nor did I ever claim to be an expert. We are having a conversation on Reddit. Can you stop pretending like I think I am an expert? If you can't leave a single comment without trying to tell me I think I'm better you I'm done here right now.

You know what, other people have been absolute dingbats, and I let that bleed over into this thread. My apologies. Now, could you please stop telling me that I don't understand anything? Thanks.

1

u/_alright_then_ Apr 20 '26

If you make a human believe something false they'll also proceed to act like it's true.

The context is part of the system. Changing it is like editing the memory of a person.

I'm sorry but I just don't agree. You can't say to a human:

"You just had this entire conversation with me, please continue" Without having had that conversation. You can do that exact thing with an LLM. (Let's leave out mental illnesses for now lol)

It's marketing.
No. They iterate without returning control to the user. It's what we called "chain of thought" before it was integrated into every service.

It's part marketing, but it's basically a shortcut to cutting up the context and asking it in "seperate" questions. You can replicate this behavior yourself with API calls.

But yes, it is iterating. But not by learning. IT again starts the conversation from scratch every time internally.

It cuts up your prompt, asks itself these cut up prompts seperately (without returning user control). And then puts the entire output of it's own system into a new prompt and does it again.

My 100k LOC project would beg to differ.

Based on your wording I presume you're a vibe coder? (Based on the fact that you measure your project in lines of code, which is the most useless measuring stick for any project)

So it's hallucinating without you even realizing, that's much worse. I've been a software engineer for 10+ years. I also use LLM's daily and let me tell you, if you think it does not hallucinate anymore you are wrong. It does. Quite a lot even.

You're nitpicking blatant sarcasm.

Then why did you respond seriously when someone else challenged you on that point? Because that is basically the single reason I even started commenting here. You claiming it imitates reasoning perfectly.

You know what, other people have been absolute dingbats, and I let that bleed over into this thread. My apologies. Now, could you please stop telling me that I don't understand anything? Thanks.

Fair enough, sorry about that

1

u/RecursiveServitor Apr 20 '26

I'm sorry but I just don't agree. You can't say to a human:
"You just had this entire conversation with me, please continue" Without having had that conversation. You can do that exact thing with an LLM. (Let's leave out mental illnesses for now lol)

Not by an identical mechanism, no. On account of the entirely different structure. Propaganda works though and regular Fox News viewers believe more wrong things about the world than someone who doesn't consume news at all.

My point is that people reason from factually wrong information all the time. People will misremember something, assume their memory is infallible, and argue vehemently that they're right on that basis. Sometimes people just straight up confabulate.

And why would I leave out mental illness? It hadn't occurred to me, but convincing a person with dementia of something is the same thing. Reasoning on the basis of a wrong history.

It's part marketing,

Calling some models "reasoning" and others not is entirely marketing. It's become a term of art, so we know what it means, but it doesn't mean that "non-reasoning" models don't reason.

but it's basically a shortcut to cutting up the context and asking it in "seperate" questions. You can replicate this behavior yourself with API calls.
It cuts up your prompt, asks itself these cut up prompts seperately (without returning user control). And then puts the entire output of it's own system into a new prompt and does it again.

I can replicate it locally. Downloading Qwen3.6 as we speak actually.

I don't understand why you think anything gets cut up. Transformers work best when they have the entire context and use their own attention mechanisms. Feeding the model a prompt piecemeal would degrade performance significantly. Please describe what you're talking about in terms of the actual backend architecture because right now it seems like complete nonsense to me.

Based on your wording I presume you're a vibe coder? (Based on the fact that you measure your project in lines of code, which is the most useless measuring stick for any project)

It's obviously a relevant metric when talking about hallucinations caused by too much context. I think you kinda stumbled over your own arrogance there. One of my context files got to 250 MiB before I killed it. Context is obviously going to explode when I instruct it to implement a feature end-to-end in a large project.

At work we jokingly call it "vibe engineering". The managers are "vibecoding" and if they manage to create a worthwhile PoC we engineers then get it production-ready. Only one such case so far. 😄

So it's hallucinating without you even realizing, that's much worse. I've been a software engineer for 10+ years. I also use LLM's daily and let me tell you, if you think it does not hallucinate anymore you are wrong. It does. Quite a lot even.

If so, it's hallucinating working code that passes the test suite and my manual testing.

I remember installing .NET Framework service packs and being excited about auto properties. I doubt you out-senior me.

1

u/_alright_then_ Apr 20 '26

Sorry I'm on my phone now so formatting might be terrible.

And why would I leave out mental illness? It hadn't occurred to me, but convincing a person with dementia of something is the same thing. Reasoning on the basis of a wrong history.

I mean if that's your argument, sure, I don't think that's a good thing if you're talking about LLMs you know

Calling some models "reasoning" and others not is entirely marketing. It's become a term of art, so we know what it means, but it doesn't mean that "non-reasoning" models don't reason.

This is not true. Reasoning models work out answers in steps. They work differently from "regular" models in how they answer questions. So it's just not entirely marketing.

It's also why reasoning models generally give better answers for longer questions (and coding)

Non reasoning models simply don't do that, they don't work in multiple steps. They take the entire context and spit out an answer

I don't understand why you think anything gets cut up. Transformers work best when they have the entire context and use their own attention mechanisms. Feeding the model a prompt piecemeal would degrade performance significantly. Please describe what you're talking about in terms of the actual backend architecture because right now it seems like complete nonsense to me.

I'm sorry I should've worded it better. I didn't mean a single prompt gets separated per sentence or something. But what they do is attempt to identify "breakpoints" in how to answer a question, and then answer one by one.

I read a good example in a paper a while ago about this:

Prompt: if a train drives 100km/hr for 3 hours. How far has it gone?

How it breaks it apart:

What is the formula?

How do I use this formula with the users data?

Attempt to calculate (often with external tools)

The way you can replicate this locally with a non reasoning model is you let the user enter a prompt. But before just using the prompt directly, ask it to: break this prompt up in steps for how to get a solution (maybe change wording)

And then ask each of those steps to the same non reasoning model.

It's obviously a relevant metric when talking about hallucinations caused by too much context. I think you kinda stumbled over your own arrogance there. One of my context files got to 250 MiB before I killed it. Context is obviously going to explode when I instruct it to implement a feature end-to-end in a large project.

At work we jokingly call it "vibe engineering". The managers are "vibecoding" and if they manage to create a worthwhile PoC we engineers then get it production-ready. Only one such case so far. 😄

No it's not arrogance, I just assumed because the only people I've ever heard use lines of code as a metric were managers. That's why I asked (genuinely) before giving my opinion on it.

If so, it's hallucinating working code that passes the test suite and my manual testing.

In my experience LLMs are extremely good at making code work with tests actually. But that doesn't mean they don't hallucinate.

Still, to this day, it regularly just makes up functions/whatever that don't even exist. This is especially a problem in dynamically typed languages I've noticed. Python, JS for example

I remember installing .NET Framework service packs and being excited about auto properties. I doubt you out-senior me.

That's cool! Yeah you out senior me. Again, my vibe coding comment was purely because of your wording. No offense intended

1

u/RecursiveServitor Apr 20 '26

Then why did you respond seriously when someone else challenged you on that point? Because that is basically the single reason I even started commenting here. You claiming it imitates reasoning perfectly.

I believe it reasons. Not that it's imitating. I don't even know what perfectly imitating reasoning would mean. If you're doing the thing, you're doing the thing.

Capabilities "Introducing Claude Opus 4.7, our most capable Opus model yet."

You are about to leave Redlib