evals
links
Evalite - a Vitest-based eval runner by Matt Pocock.
Emerging Patterns in Building GenAI Products - a survey of GenAI patterns spanning evals, embeddings, RAG, guardrails, and fine-tuning.
Building a SNAP LLM eval - the first write-up in a series on building an “eval” (evaluation) to assess how well AI models perform on prompts.
Your AI product needs evals - How to construct domain-specific LLM evaluation systems to improve AI by iterating quickly.
Getting AI-powered features past the post-MVP slump - The non-negotiable first step in systematically improving your AI systems is establishing a solid feedback loop.