stafford williams

How we built our multi-agent research system describes Anthropic’s experience building a multi-agent system.

We found that a multi-agent system with Claude Opus 4 as the lead agent and Claude Sonnet 4 subagents outperformed single-agent Claude Opus 4 by 90.2% on our internal research eval.

However, these architectures burn through tokens fast:

In our data, agents typically use about 4× more tokens than chat interactions, and multi-agent systems use about 15× more tokens than chats.

Cognition goes further in Don’t Build Multi-Agents:

In some cases, libraries such as swarm by OpenAI and autogen by Microsoft actively push concepts which I believe to be the wrong way of building agents.

anthropic, ai, cognition, patterns • 2025-06-19 • 8:27am

A rules-based pattern is emerging for helping agentic workflows produce better results. Examples include GreatScottyMac’s RooFlow and Geoff Huntley’s specs and stdlib approaches.

ai, code-assist, patterns • 2025-03-19 • 3:50pm

Emerging Patterns in Building GenAI Products looks at a number of GenAI patterns across evals, embeddings, RAG, guardrails, and fine-tuning.

ai, evals, patterns • 2025-02-28 • 9:14am

Mark tests that test overall behaviour other tests expand on in more detail proposes improving Jest’s test output by allowing tests to be marked as dependent on others, helping focus on root causes when failures occur in test suites with overlapping assertions. Ultimately, this was closed as not planned, and --bail was suggested as an alternative.

testing, jest, patterns • 2023-03-10 • 12:11pm

Why Most Unit Testing is Waste argues that excessive unit testing can be counterproductive and suggests focusing on integration tests that verify valuable business logic. (reddit)

testing, patterns • 2022-07-14 • 11:27am

Write tests. Not too many. Mostly integration argues that integration tests provide the best balance between confidence and speed, suggesting that teams should focus more on integration testing than unit testing, while being mindful not to over-test implementation details.

testing, patterns • 2022-07-14 • 11:27am

Unit testing vs BDD explains how BDD is essentially unit testing done right, focusing on verifying behavior rather than implementation details. It discusses the practical value of Gherkin syntax and argues that regular code can achieve similar readability.

testing, patterns, bdd • 2022-07-14 • 11:27am

UnitTest explains how unit tests are low-level tests focusing on a small part of the software system, written by programmers using testing frameworks, and designed to run quickly. It discusses the distinction between solitary unit tests, which use test doubles, and sociable tests, which allow real collaborators.

testing, patterns • 2022-07-14 • 11:27am
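
The solitary versus sociable distinction in the UnitTest entry above may be easier to see side by side. This is a minimal sketch, not taken from the linked article: the `Order` and `PriceList` classes are hypothetical, made up purely for illustration. The same behaviour is tested twice, once with the real collaborator and once with a test double from the standard library's `unittest.mock`.

```python
from unittest.mock import Mock


class PriceList:
    """Hypothetical real collaborator: looks up unit prices."""

    def __init__(self, prices):
        self._prices = prices

    def price_of(self, item):
        return self._prices[item]


class Order:
    """Hypothetical unit under test: totals its lines using a price list."""

    def __init__(self, price_list):
        self._price_list = price_list
        self._lines = []

    def add(self, item, quantity):
        self._lines.append((item, quantity))

    def total(self):
        return sum(
            self._price_list.price_of(item) * quantity
            for item, quantity in self._lines
        )


def test_total_sociable():
    # Sociable: the real PriceList takes part in the test.
    order = Order(PriceList({"apple": 3}))
    order.add("apple", 2)
    assert order.total() == 6


def test_total_solitary():
    # Solitary: a test double stands in for the collaborator.
    prices = Mock()
    prices.price_of.return_value = 3
    order = Order(prices)
    order.add("apple", 2)
    assert order.total() == 6
    # The double also lets the test check how the collaborator was used.
    prices.price_of.assert_called_once_with("apple")
```

In this sketch the sociable test would also fail if `PriceList` had a bug, while the solitary test isolates `Order` at the cost of encoding an assumption about how `PriceList` behaves.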

Practical Test Pyramid provides a comprehensive guide to structuring automated tests, explaining how to balance different types of tests from unit to end-to-end, and how to effectively implement them in a continuous delivery pipeline.

testing, patterns • 2022-07-14 • 11:27am

Strategic Domain Driven Design with Context Mapping

ddd, patterns • 2020-04-03 • 2:59pm

Ordering microservice, part of the eShopOnContainers repo

ddd, patterns • 2018-08-02 • 10:45am

The business value of using DDD, Vaughn Vernon

ddd, patterns • 2018-07-18 • 8:44am
