Agent orchestration
The agent and I determined what it needed to do, and it built a plan split into multiple steps, with the intention of progressing one step at a time. The reasoning: I wanted each step to use as little of the context window as possible so quality remained high. I kept a progress file, the agent would check off each step as it was done, and I'd clear the context window between steps. This approach revealed some problems:
- While implementing any step, the agent might discover that the effort required to complete it is actually larger than expected, and it will blow the context window. The agent cannot monitor its own context usage while it is working (a tooling limitation).
- Once the context window limit is reached, quality drops, because any context engineering we did at the start and any learnings accumulated along the way get lossily compressed. The agent now has a poorer idea of the step's quality or completeness. It still has the initial prompt, however, which directs it to complete the step, so it updates the progress file noting that the step has been completed.
- Due to this, I was discovering that the guidance within the plan was not being completely followed. I had to (have the agent) distill a `RULES.md`, and whenever the agent completed a step, it would need to review whether the changes it had made abided by the `RULES.md`. However, this extra effort has the same effect on context: before this review sub-step is performed, the agent should clear the context window, for the same reasons as above.
- This is a self-reinforcing problem: the size of steps may change, the number of steps we need may change, and we can't rely on the non-deterministic nature of the LLM to perform all of the steps to the quality we expect, including dynamically updating the steps or adding new ones.
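One way to wire in that review sub-step is to make it a second, fresh-context agent execution, so the audit isn't polluted by the implementation run. A hedged sketch, with all names hypothetical:

```python
def complete_step_with_review(run_agent, step_desc):
    """Run a step, then audit it against RULES.md in a separate fresh context.

    run_agent is assumed to take a prompt and return the agent's final text;
    each call represents a fresh context window.
    """
    # First execution: do the work.
    run_agent(prompt=f"Complete this step: {step_desc}")
    # Second execution, fresh context: audit the changes against RULES.md
    # before the step is allowed to be checked off.
    verdict = run_agent(
        prompt="Review the changes just made. Do they abide by RULES.md? "
               "Answer PASS or FAIL with reasons."
    )
    return verdict.strip().startswith("PASS")
```

This only gates the checkbox on the agent's own judgment, of course; the verification still happens inside an LLM call rather than deterministically.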
The requirement of any single step, and whether it's been completed, needs to be persisted and updated outside the agent's control. When the agent indicates it has finished a step, the context window usage needs to be deterministically checked. If the window blew, the original step needs to be marked incomplete, and a new step needs to be added (primed with a summary of what occurred during that execution) to dive deeper into the expected effort of the original step. This likely results in the step being split into smaller steps, and the process repeats. The agent is responsible for generating the requirement of a step, and for indicating that it thinks the last execution completed it, but the actual status change should involve deterministic verifications that run outside of the agent's execution loop.
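A sketch of that outer, deterministic check. All names and the token budget are assumptions; `context_used` would be reported by the harness, not by the agent itself:

```python
from dataclasses import dataclass

CONTEXT_BUDGET = 100_000  # tokens; assumed per-step budget

@dataclass
class Step:
    description: str
    done: bool = False

def finalize_step(steps, i, agent_says_done, context_used, summary):
    """Runs outside the agent loop, after the agent claims step i is done.

    If context usage blew past the budget, the agent's claim is not trusted:
    the step stays incomplete and a new decomposition step is queued,
    primed with a summary of what happened during the execution.
    """
    step = steps[i]
    if agent_says_done and context_used <= CONTEXT_BUDGET:
        step.done = True  # deterministic check passed; accept completion
    else:
        step.done = False
        steps.insert(i + 1, Step(
            description=(
                f"Re-estimate and split into smaller steps: {step.description}. "
                f"Context from last attempt: {summary}"
            )
        ))
    return steps
```

The key design point is that `finalize_step` never runs inside the agent's context: the step list is the source of truth, and the agent only proposes status changes that this function accepts or rejects.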