Reflections on Training an LLM

How I've changed how I use LLMs

Jun 01, 2026

I recently built an LLM from scratch. This is a reflection based on own changes in how I’m interacting with language models, coding tools and other thoughts that I’ve had since building an LLM.

On context

Whilst many harp on about ‘prompt engineering’, real performance gains are found in managing context. Prompt engineering is actually context engineering. Care about context. Since writing this, someone came up with the grand idea of giving agents the skill of context engineering: see here.

Start with why

Each chat should have a role, objective or raison d’être. Avoid open chats that cover many different topics (like an actual conversation) unless that’s what you’re looking for. Keeping topics compartmentalised ensures your model is outputting tokens that are relevant to your goal. You might start with curiosity, but once you have an objective, start a more directed chat. Remember that when interacting with a chat-interface, the most useful thing it could generate is that unique string of text that will unlock something for you.

Coding agents are a little different. The expected output is usually some functionality, visual component, or digital ‘thing’. The expectations are different, but the context management is just as critical. Use skills and planning extensively - keep the objectives clear.

Have a clear role, objective, or raison d’etre for each chat or agent.
If you are unclear or exploring a concept, start with a separate exploratory chat until you are clearer.

Rephrasing > replying

I used to reply to models when they got things wrong. Training an LLM taught me this is probably one of the worst ways to manage the context window of an LLM and a surefire way to waste tokens in the long run. Now, I rewrite my messages to be clearer, more detailed, and to capture all of the right ideas in the prompt. Replying to the model when things go wrong is only worthwhile if the problem is one that has a high likelihood of recurring - the model will have the problem and solution in the chat context. Approaches such as gbrain would have this error stored in the memory. Most of the time, you’re better off going back to where the conversation branched, and continuing on the topic you started with.

Go back and edit your responses, rather than telling the AI it’s wrong or that there is an error.
Use forks to keep the conversation ‘on topic’.

Each word is precious

Every word is a token, and each token has a group of other tokens related to it in the model’s embedding space. This is why you will sometimes see prompts that have CAPITALS to emphasise - they are tokens, and they have different values in the embedding space than non-capitals, and because it does make a difference. It’s worth investing time making sure the tokens in your context window are contributing towards the goal of the chat. And remember, every single token in your context window matters.

Be as precise and specific as you can with every interaction.

If something isn’t clear, get the language model to give you a set of options.

Expansive vs contractive context engineering.

Expansive Context Engineering

Expansive context engineering is the process in which the language model is ‘exploring’ concepts. Prompts such as:

‘List me all the possibilities of X’
‘What are all the versions of Y that exist’
‘What are my options in doing X’
‘What are all the ways in that Y can go wrong’

are very useful for exploring what paths you have available. They also leverage the most valuable components of modern LLMs - their vast training data and capability to research, gather and compress information.

Exploratory processes are like a messy art studio - lots of different concepts, in different places, at different times, for different reasons. They are really good for understanding ‘what could be’. However, once you’ve explored the different options and have chosen one, it’s then important to begin a contractive process.

Contractive Context Engineering

Contractive context engineering is the opposite of expansive context engineering. This approach is used when we want to focus in, knuckle down, and close in on one idea. For this, its important to go into the details and to focus in on one topic - having your expansive context window in this process will lead to confused and unrelated outputs from the LLM. Of course, knowledge is hierarchical - there are always layers of of depth to a topic. Be clear on your why, and what a good outcome from the chat looks like.

Thinking in Diamonds

The double diamond is an approach originally born out of the world of design, but it’s utility is universally adaptable and can be though of as an approach to understand a problem and focus in on solving its root cause. It is particularly useful when treading through ambiguous territory.

I’ve found thinking in diamonds to be useful when handling agents - treat each stage as its own chat and iterate between expansive and contractive approaches until you reach your goal.

Skills

If you want a quick way of ensuring your chat models have the correct context to perform in a given task, then look at agent skills. There are many places that are now hosting these skills. They are worth looking at, using, deleting and trying to find a set of skills that work for you and your use cases.

Memory & keeping things tidy

If you have memory enabled inside your Claude, ChatGPT, Grok subscriptions, then any time you prompt the models it will be referencing past chats when it generates text. As much as it’s important to ensure you switch chats when necessary, it’s equally important to delete chats that are plain wrong or have aged. This is why Anthropic introduced a dream mode. Not all AI services have such a mode, and none do this automatically. Ensure you delete chats that have shown to be plain wrong or that have strayed off path. Keep as many chats aligned with the truth as possible.

Router files

Router files are files which describe what to do under a given condition. The most basic use of router files is to find things:

[photos] → ~/photos
[videos] → ~/videos

More complex approaches also define instructions

[user asks for a photo] → [run script-a.sh, copy output, run script-b.sh]

Although it has potential to be token intensive, creating ‘router files’ - files that hold conditional statements, becomes far more repeatable and easily extensible than building a harness to do the same.

State Management

State management has been a challenge with language models since its the release of ChatGPT, and there is still no best practice. Although we may have agentic orchestrators, passing context into sub-agents, and ensuring structures remain consistent across many processes are both problems with no clear solution.

Track states in files

Approaches such as gstack attempt to manage this through tracking different states in files - explicitly stating which states to track, and directly instructing the agent to edit the file in order to track the state.

Track states in schemas

Using orchestration frameworks such as Langgraph enable state tracking over an entire network of agents.

Concluding Remarks

The biggest unlock for me in training an LLM was understanding the underlying mechanics. I started out with two questions:

How exactly is it generating these words (tokens)
How do I influence these systems to generate text that is more closely aligned to my goals?

My biggest realisation was just how important context was in keeping system outputs aligned with my goal. Context management is fundamental to ensuring agents perform well. And, knowing when to end a chat, move to a new one or fork a new chat is now much clearer; although there are no hard-and-fast instructions.

I would recommend anyone who is working closely with language models to build one from build one from scratch. It gives you a perspective on why they work the way that they do. Taking the time to do so will lead to you using LLMs more effectively.

Seb's Substack

Discussion about this post

Ready for more?