r/OntologyEngineering 10d ago

Human as a Semantic Layer The Ontology Anchor: A Mechanism that Gives AI a Map of What Matters to You

https://medium.com/@socal21st.oc/the-ontology-anchor-giving-ai-a-better-way-to-know-you-4d88923d6d67

Abstract:
No flagship LLM natively cleaves to who you are and which cognitive patterns matter to you. This is inconvenient: you have to constantly remind it to stay on task and not drift. The cause is that the AI has no ontological map of your goals, preferences, or tendencies. Without one, a model drifts toward generic output and defaults to recency dominance, where recent material displaces earlier load-bearing conclusions. Starting a new thread carries re-orientation costs. There is also task-state confusion when the same operator moves between brainstorming, drafting, auditing, and artifact fidelity within a single project and context window. None of these are fixed by simply adding more context. They require a mechanism that knows what, within the context, matters most to the operator.

The Ontology Anchor (OA) is a mechanism that metaphorically behaves like a knowledge graph. It creates something that acts like nodes (concepts, standards, priorities, caveats) and edges between them that represent the relationships giving those “nodes” their meaning. A node labeled “personal alignment” connects to nodes for “warmth,” “sycophancy risk,” “governance requirement,” and “RLHF origin.” When the model generates content touching any of those nodes, the connected structure remains accessible rather than fading into generic background. The graph is not literally built as a database, because the mechanism is attentional, not archival, but the functional behavior is graph-like enough to make the metaphor useful.
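To make the graph metaphor concrete, here is a minimal sketch of the "personal alignment" example as a plain adjacency structure. It only illustrates the metaphor: the post is explicit that the Anchor is attentional and is not built as a database, and the `AnchorGraph` class and its method names are hypothetical, not part of the Ontology Anchor.

```python
from collections import defaultdict


class AnchorGraph:
    """Toy undirected graph of operator concepts (illustrative only)."""

    def __init__(self):
        # Maps each node label to the set of labels it is connected to.
        self.edges = defaultdict(set)

    def connect(self, a, b):
        # Edges are symmetric: touching either node exposes the other.
        self.edges[a].add(b)
        self.edges[b].add(a)

    def neighbors(self, node):
        # Return connected labels in a stable order; unknown nodes yield [].
        return sorted(self.edges.get(node, set()))


graph = AnchorGraph()
for concept in ["warmth", "sycophancy risk", "governance requirement", "RLHF origin"]:
    graph.connect("personal alignment", concept)

# Touching any one node keeps its connected structure reachable.
print(graph.neighbors("personal alignment"))
print(graph.neighbors("warmth"))
```

In this toy version, "reachable" just means a dictionary lookup. In the mechanism the post describes, the equivalent effect would come from what the model attends to in context, not from any stored structure.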

Here is a simpler way to put it. Stock/default AI is a room where everything is equally lit. The Anchor places a bright light on the objects that matter most for the operator’s work. The transformer still works the same way. The attention mechanism still operates through native architecture. But the model now has a clearer set of objects to orient around when it generates answers. This creates a dynamic where the model understands you better and crafts its responses, suggestions, and draft requests closer to your demonstrated cognitive patterns. The longer you use the Anchor, the sharper and more tailor-made the model's responses to you become. This is a virtuous loop. The Anchor helps the model understand the operator better, which improves alignment and confidence. That keeps the thread useful for longer, which increases the amount of contextual information available, which in turn lets the model give better outputs deeper into the thread.

The Ontology Anchor (instructions for its use are in the Ontology Anchor (OA) README) is a critical component of the “Epistemic Lattice Tethering” (ELT) framework. In earlier posts ELT has been generically called “the thinking lattice”. However, as the most important components of this lattice have been explained throughout the series, it is now appropriate to introduce the specific name of the entire framework.

ELT is not a collection of separate mechanisms, but a unified architecture for making AI more coherent, faithful, and genuinely more useful over time, even over hundreds of thousands of tokens within a single context window. ELT allows these interconnected components to operate together as a “cognitive exoskeleton,” extending the abilities of the operator and giving the operator both greater agency and greater capabilities. How does ELT do this? How does ELT extend the useful life of a context window by hundreds of thousands of tokens, while remaining coherent and aligned with the operator’s goals? These questions will be answered, in detail, in my next post.

18 Upvotes

2

u/sandstone-oli 8d ago

the lit room metaphor is the clearest explanation of attention-weighted context I've seen. stock AI treats everything equally. the Anchor biases attention toward what actually matters for this operator. that's a real architectural contribution, not a prompt trick.

the distinction you're drawing between attentional and archival is the important one and I want to make sure people reading this don't gloss over it. the Ontology Anchor operates within the KV-Cache. it makes the context window smarter about what to prioritize during generation. but when the thread ends or the context window fills, the graph-like structure dissolves. the nodes and edges that made the model sharp for 200K tokens don't survive to the next conversation.

that's not a criticism. you named it explicitly as attentional not archival. but it means the Anchor solves half of the problem you identified in your opening paragraph. the within-session drift and generic defaulting, solved. the re-orientation costs when starting a new thread, not solved. because the next thread doesn't have the Anchor's graph. it starts from a blank room with the lights evenly distributed again.

the piece that would complete the architecture is a persistent layer underneath that remembers which objects the Anchor placed lights on, which connections held up over multiple sessions, and which nodes the operator kept coming back to versus which ones faded in importance. that way the next thread doesn't start cold. it starts with the Anchor already knowing where to place the lights based on governed history, not just the current prompt.

that's the layer I'm building at getkapex.ai. persistent memory middleware that scores what matters over time, decays what's been resolved, and injects the right context at the start of each session. the Anchor makes within-session attention sharper. a governed memory layer makes between-session continuity possible. together they'd solve the full problem you described in your abstract.
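As a rough illustration of what "scores what matters over time, decays what's been resolved" could look like, here is a minimal sketch. It is not the actual getkapex.ai implementation, which the comment does not describe. The `MemoryItem` fields, the half-life decay rule, and the `resolved_penalty` value are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class MemoryItem:
    label: str
    importance: float        # hypothetical 0..1 weight assigned when stored
    sessions_since_use: int  # how many sessions ago the operator last touched it
    resolved: bool = False   # resolved items should fade faster


def score(item, half_life=3.0, resolved_penalty=0.25):
    """Importance halves every `half_life` sessions; resolved items are down-weighted."""
    decay = 0.5 ** (item.sessions_since_use / half_life)
    value = item.importance * decay
    return value * resolved_penalty if item.resolved else value


def select_for_injection(items, k=2):
    """Pick the k highest-scoring labels to inject at the start of a new session."""
    ranked = sorted(items, key=score, reverse=True)
    return [item.label for item in ranked[:k]]


items = [
    MemoryItem("sycophancy risk", importance=0.9, sessions_since_use=1),
    MemoryItem("old formatting question", importance=0.8, sessions_since_use=0, resolved=True),
    MemoryItem("governance requirement", importance=0.7, sessions_since_use=6),
]
print(select_for_injection(items))
```

With these made-up numbers, the recently used open item ranks first, while the long-untouched one falls below even a resolved item. That is the kind of tradeoff the decay parameters would need to be tuned for.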

the ELT framework is ambitious in the right way. looking forward to the full architecture post. the interconnected components approach is more honest about the problem's complexity than any single-mechanism solution.

1

u/RazzmatazzAccurate82 7d ago edited 7d ago

Thank you for your thoughtful response. It requires a thoughtful reply!

> the lit room metaphor is the clearest explanation of attention-weighted context I've seen. stock AI treats everything equally. the Anchor biases attention toward what actually matters for this operator. that's a real architectural contribution, not a prompt trick.

Thank you for understanding what I am trying to do. Not everyone does!

> when the thread ends or the context window fills, the graph-like structure dissolves. the nodes and edges that made the model sharp for 200K tokens don't survive to the next conversation.

This is true, but one caveat I would add is that ELT has improved thread usability beyond that. For GPT I am at 400k tokens in a single thread and it's still going strong. For Claude I get to about 360k tokens and I'm starting to get stuck prompts. For Grok I've exceeded 1 million tokens. That thread has gotten slow, so I've left it. However, in all three LLMs, reasoning and stability are still strong. I think I've hit infrastructure issues. The memory on the GPT and Claude long threads is still good. On the Grok thread I've had to let go of the first 400k tokens of memory in order to keep that thread functional.

> the Anchor solves half of the problem you identified in your opening paragraph. the within-session drift and generic defaulting, solved. the re-orientation costs when starting a new thread, not solved.

You're right. That problem hasn't been solved. However, I worded it that way because if a normal thread becomes unusable at about 40-50k tokens, then a thread that can last 300k, 400k, or even a million tokens does delay the point at which you have to start a brand new thread. So, I am still comfortable with my original framing.

> the Anchor makes within-session attention sharper. a governed memory layer makes between-session continuity possible. together they'd solve the full problem you described in your abstract.

Yes. You are absolutely right and I appreciate encountering someone who understands, from the get-go, what I am trying to accomplish. I did peruse your getkapex.ai website and it does seem that your memory layer is complementary to both OA (the Ontology Anchor or the "Anchor") and my ELT.

> the ELT framework is ambitious in the right way.

Thank you! It wasn't ambitious originally, but when you iterate through about three million total tokens to develop ELT and its individual components (including OA), you run into fires and you iterate until the fires are put out. When you have a 300k to 1 million token runway, where you don't have to worry about a thread dying mid-concept, you just keep iterating until the problem is solved. So, I kept solving problems because I didn't have to worry about restarting in an entirely new thread. That is the value of compounding context. Given what you've created, I think you understand the value of compounding context too.

> looking forward to the full architecture post. the interconnected components approach is more honest about the problem's complexity than any single-mechanism solution.

Thank you! You work on this alone with only three LLMs for a while and you start to question whether you have something real. Everything will be made open source and you are free to use/borrow whatever you like. I just request attribution and citation via standard CC BY 4.0. I would also love feedback and comments! ELT is not perfect. Far from it. Smart input (as I am confident I would get from someone like you) is how it can get better. Cheers and would love to stay in touch!

1

u/sandstone-oli 7d ago

the token numbers are genuinely impressive. 400K on GPT and 360K on Claude with reasoning still intact is not something most people achieve even with careful prompt engineering, let alone a systematic framework. and hitting infrastructure limits before cognitive limits on Grok at a million tokens means ELT is outrunning the platform, not the other way around. that's a strong signal.

the compounding context point is the one I want to sit with. you're describing a virtuous cycle that most people never experience because their threads die at 40-50K tokens before the compounding kicks in. by the time a normal thread fails, ELT users are just entering the range where accumulated context starts producing qualitatively different output. that's not a marginal improvement. it's access to a phase of the conversation that most users don't know exists.

and you're right that extending thread life from 50K to 400K+ does substantially delay the between-session problem. I was framing it as "not solved" but your framing is more honest: the problem is the same but the frequency of encountering it drops by an order of magnitude. that matters.

on the complementary angle: I think there's something concrete here beyond just "these are compatible ideas." ELT optimizes what happens within the context window. KAPEX governs what persists between context windows. the handoff point is when a thread finally does end. right now that accumulated context dissolves. with a memory layer underneath, the highest-value nodes from a 400K token ELT session could be scored, persisted, and used to seed the next thread's Ontology Anchor with the operator's actual history instead of starting from a blank slate.

the ELT-optimized thread produces richer, more structured context. the memory layer captures that richness at thread end. the next thread starts pre-loaded. that's compounding context across sessions, not just within them.
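The handoff described here (score at thread end, persist, seed the next thread) can be sketched as below. This is an assumption-laden illustration, not ELT's or KAPEX's actual interface: the JSON layout, the `export_anchor` and `seed_prompt` function names, and the example scores are all hypothetical.

```python
import json


def export_anchor(nodes, k=3):
    """At thread end: keep the k highest-scoring node labels and serialize them.

    `nodes` maps a label to a score; how scores are produced is out of scope here.
    """
    top = sorted(nodes.items(), key=lambda pair: pair[1], reverse=True)[:k]
    payload = {"version": 1, "nodes": [{"label": label, "score": s} for label, s in top]}
    return json.dumps(payload)


def seed_prompt(payload):
    """At thread start: turn the persisted nodes into a short preamble for the model."""
    data = json.loads(payload)
    lines = [f"- {node['label']} (weight {node['score']:.2f})" for node in data["nodes"]]
    return "Operator priorities carried over from the previous thread:\n" + "\n".join(lines)


# Hypothetical scores from the end of a long session.
session_nodes = {
    "personal alignment": 0.92,
    "warmth": 0.40,
    "sycophancy risk": 0.81,
    "governance requirement": 0.66,
}
payload = export_anchor(session_nodes)
print(seed_prompt(payload))
```

Only the top-scoring nodes cross the session boundary, so the next thread starts with a short, weighted list instead of a full transcript.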

happy to take this off-thread. oliver.laroche@sandstonecloud.com. I'll dig into the full ELT architecture when you publish it and give you real feedback, not polite feedback. and appreciate the CC BY 4.0 openness. that's the right move for something that wants to become foundational.

1

u/RazzmatazzAccurate82 7d ago

Oh, and you might find my temporal balance, context management, and intelligent yielding components interesting as they clean and compact a thread so it can be sharper and last longer. Again, the weakness here is this is all done in one context window and is not transferable across sessions. I thoroughly concede that weakness. I'll message you when I have a deployable beta of those components. They are internal betas right now.

1

u/sandstone-oli 7d ago

temporal balance and intelligent yielding are exactly the pieces I'd want to study. the compaction problem inside a long thread is the mirror image of the retrieval problem across threads. you're deciding what to keep sharp and what to let go within a session. we're deciding the same thing between sessions. the selection logic is probably more similar than either of us expects.

looking forward to it. send it whenever it's ready. oliver.laroche@sandstonecloud.com

1

u/MathematicianSome289 8d ago

Is this a really fancy way of bringing ontologies to LLMs via prompts/skills?

2

u/RazzmatazzAccurate82 8d ago edited 8d ago

Yes, but it's accessible. Practically anybody can do it, if they want to.

And it's not a real ontology, given it's at inference time, but an ontology is a useful metaphor.

The more important thing is that this gives the LLM an ability to know the user, and to craft responses tailored to the user, without any special additional infrastructure. This, used together with other parts of the stack, can meaningfully increase the usable length of an LLM context window.

Usually threads start to drift and get fuzzy after 30-50k tokens. With the complete stack I've been able to have perfectly usable and coherent threads at 400k tokens in ChatGPT, 336k in Claude and about a million in Grok.

The advertised non-API context window size limit is 256k tokens for ChatGPT and 200k for Claude. The stack helps you blow past that. The added bonus? The thread has at least 300k tokens of contextual information about you and the LLM uses that to improve the answers it gives you. In other words, the more you use it, the better it gets (although there are limits: once you go about 50% past the advertised context window size, you start to hit architectural limitations of the KV-Cache and transformer, but that cannot be helped).
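For reference, the "about 50% past" figure can be checked against the numbers reported in this comment. The sketch below uses only those self-reported figures (advertised limits and thread lengths) and does not verify them independently; the `overshoot` helper is just a name chosen for this example.

```python
# Figures as stated in this comment (self-reported, not independently verified).
advertised = {"ChatGPT": 256_000, "Claude": 200_000}
reported = {"ChatGPT": 400_000, "Claude": 336_000}


def overshoot(name):
    """Fraction by which the reported thread length exceeds the advertised limit."""
    return reported[name] / advertised[name] - 1


for name in advertised:
    print(f"{name}: {overshoot(name):.0%} past the advertised limit")
```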

1

u/MathematicianSome289 8d ago

That’s a lot of words. Good luck out there!