Context is everything

Interacting with an LLM, whether as a ChatGPT user or an engineer building AI agents, inevitably brings you face-to-face with context management. Today's models are impressively intelligent, yet without clear instructions and situational details, they "hallucinate," much like a person would in similar circumstances.

Imagine asking a stranger, "What's the best way to invite John for a walk?" Without knowing John, your relationship, or your intentions, their advice would be a shot in the dark. LLMs, bound by their instructions, will attempt an answer, likely a poor one. But provide context (John is a new colleague you'd like to know better) and the advice sharpens.

Somehow, when it comes to people, even super-smart people, we do not expect them to magically know everything about our circumstances. Yet when a mathematical model in similar circumstances gives us an answer based on no contextual understanding, we are disappointed.

With LLMs, as with people, context is everything.
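
To make this concrete, here is a minimal sketch of the John example as two LLM calls, one bare and one with context. It assumes the OpenAI Python SDK and uses a placeholder model name; the context details are only illustrative:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

QUESTION = "What's the best way to invite John for a walk?"

# Without context: the model can only guess, much like the stranger.
bare = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": QUESTION}],
)

# With context: the same question, now grounded in circumstances.
grounded = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {
            "role": "system",
            "content": "John is a new colleague the user would like to know better.",
        },
        {"role": "user", "content": QUESTION},
    ],
)

print(bare.choices[0].message.content)
print(grounded.choices[0].message.content)
```

The second answer is not better because the model got smarter; it is better because it finally knows what you know.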

Bottlenecks can't be eliminated, just moved around

I was fortunate to work with Dr. Robert Al-Jaar during my first job in Canada in late 2011. At Silanis Technologies (now OneSpan), I was building my first performance tests for E-Sign Live, a product Robby, then a VP, deeply cared for. One evening he found me working late (we were the only two left) and we started chatting. It turned out that he had started in quality assurance and had a trove of experience in this domain. His approach to advice and communication remains a lesson in emotional intelligence for me.

I remember this moment because that evening he taught me something that became a foundational mental model for me as an engineer. To this day I look at systems through its prism. He said, "Bottlenecks can't be eliminated, they can only be moved around."

Let me elaborate. Consider an application running slowly. You identify a database query bottleneck, optimize it, and the app speeds up, but not enough. Next, a slow API call is the culprit. You implement caching, and performance improves again, yet it's still not optimal. This cycle continues until you hit fundamental limits, like the speed of light for data transfer or hardware constraints.
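
Here is a toy sketch of that second step, caching the slow call; the function, its latency, and the returned value are all made up for illustration:

```python
import time
from functools import lru_cache

@lru_cache(maxsize=1024)
def fetch_exchange_rate(currency: str) -> float:
    """Stand-in for the slow external API call identified as the bottleneck."""
    time.sleep(2)   # simulate network latency
    return 1.09     # dummy value

start = time.perf_counter()
fetch_exchange_rate("EUR")   # cold call: pays the full two-second cost
fetch_exchange_rate("EUR")   # warm call: served from the cache almost instantly
print(f"Two calls took {time.perf_counter() - start:.2f}s")
```

The cache does not eliminate the bottleneck; it moves it. Now you worry about cache size, stale values, and whatever component is the next slowest.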

There will always be a bottleneck.

Each domain has unique compromises and trade-offs

Engineering is a discipline of trade-offs. Ask any seasoned engineer a question, and their answer often begins with "It depends..." (context, eh?). We navigate choices like consistency versus availability, cost versus redundancy, or scalability versus simplicity. The internet is full of articles proclaiming "The Solution": "The best Database is...", "The best Framework is..."; if such absolutes existed, we'd have singular choices, not a multitude.

Each domain, in turn, has its own unsolvable problems, like the availability-versus-consistency trade-off in distributed systems mentioned above. When it comes to LLMs, that unsolvable problem is context management. Yes, there are other issues: evaluation, non-determinism, reasoning limitations, model collapse, and so on. But none of them is as immediate, as in-your-face, as context management. Worst of all, context is something you will face, as an engineer or as a user, no matter the state of LLM advancement. An LLM can be God-level omnipotent, but if you don't correctly express all the necessary circumstances, you will face the "Literal Genie Problem."

I have been building with LLMs since early 2023. The state of the art then was GPT-3.5, with a meager context window of 4k tokens. Now the norm is 128k+, some models go all the way to 1M tokens, and there are promises of even bigger windows.

Even if the context were infinite, you couldn't simply put everything into it: it would cost too much and take too long to process.
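
What does that constraint look like in practice? Here is a minimal sketch, assuming the tiktoken tokenizer and a list of documents already ranked by relevance: you spend a fixed token budget on the best candidates and drop the rest.

```python
import tiktoken  # assumes the tiktoken library is installed

enc = tiktoken.get_encoding("cl100k_base")
MAX_CONTEXT_TOKENS = 8_000  # illustrative budget, well below the model's window

def fit_to_budget(ranked_chunks: list[str], budget: int = MAX_CONTEXT_TOKENS) -> list[str]:
    """Keep the highest-ranked chunks until the token budget is spent."""
    selected, used = [], 0
    for chunk in ranked_chunks:
        cost = len(enc.encode(chunk))
        if used + cost > budget:
            break
        selected.append(chunk)
        used += cost
    return selected
```

Everything this function drops is context the model will never see, which is why deciding what goes in, not how much fits, becomes the real problem.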

Architects of Understanding

And that’s the rub, isn't it? Even with theoretically limitless context windows, the practicalities of cost and processing time bring us right back to earth. It’s like Dr. Robert Al-Jaar said all those years ago: bottlenecks don't just vanish, they shift. With LLMs, the bottleneck of raw context capacity might be easing, but it morphs into new challenges: efficient retrieval, relevance filtering, and avoiding the dreaded "Literal Genie Problem" where the model, lacking nuanced understanding, takes us a bit too literally.

This is where we, as engineers and even savvy users, step into a new role. It's not enough to just talk to these models; we have to become architects of understanding. We must design the context, curate the knowledge, and guide the conversation to bridge the gap between the LLM's vast potential and the specific, grounded intelligence our tasks demand.

So, how do we actually build these bridges of understanding? What are the practical blueprints for becoming effective architects in this new AI landscape? That’s exactly what we'll explore in the next piece. We’ll move from the 'why' of this persistent challenge to the 'how', digging into the strategies, tools, and trade-offs involved in truly mastering context. Get ready to roll up your sleeves!