stafford williams

A summary of the security concerns with Model Context Protocol.

ai, llm, mcp • 2025-04-08 • 12:55pm

Llama 4 Scout, 109B parameters, 10M context length
Llama 4 Maverick, 400B parameters, 1M context length

The above are distilled from an unreleased, still in training, larger model:

Llama 4 Behemoth, 2T parameters

ai, llm, model, llama, meta • 2025-04-06 • 9:12am

The free lunch that was smashing GitHub Copilot’s premium models with unlimited requests is over, and rate limits are coming in May. Any request not hitting the base model (currently OpenAI GPT-4o) is considered premium, which covers Sonnet, Gemini, o1, etc. The monthly premium request limits are as follows:

Copilot Free: 50/month
Copilot Pro: 300/month
Copilot Pro+: 1500/month
Copilot Business: 300/month
Copilot Enterprise: 1000/month

The premium models also have a rate multiplier. For example, Claude Sonnet 3.5/3.7 has a rate multiplier of 1, while o1 and GPT-4.5 have rate multipliers of 10 and 50 respectively.
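To make the multipliers concrete, here is a minimal sketch of how far a monthly allowance stretches. It assumes each request simply consumes `multiplier` units of the allowance, which is my reading of the announcement rather than a confirmed billing formula. The names `ALLOWANCES`, `MULTIPLIERS` and `effective_requests` are hypothetical.

```python
# Monthly premium request allowances and rate multipliers as listed in this post.
ALLOWANCES = {
    "Copilot Free": 50,
    "Copilot Pro": 300,
    "Copilot Pro+": 1500,
    "Copilot Business": 300,
    "Copilot Enterprise": 1000,
}
MULTIPLIERS = {
    "Claude Sonnet 3.5": 1,
    "Claude Sonnet 3.7": 1,
    "o1": 10,
    "GPT-4.5": 50,
}


def effective_requests(plan: str, model: str) -> int:
    """Whole requests per month, assuming one request costs `multiplier` units."""
    return ALLOWANCES[plan] // MULTIPLIERS[model]


if __name__ == "__main__":
    for model in MULTIPLIERS:
        print(f"Copilot Pro, {model}: {effective_requests('Copilot Pro', model)}/month")
```

Under that assumption, a Copilot Pro allowance of 300 covers only 30 o1 requests or 6 GPT-4.5 requests a month.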

The news comes with an announcement of the GitHub Copilot Pro+ plan, a new individual tier priced at 39 USD/month with 1500 requests/month to premium models.

ai, llm, github-copilot, microsoft • 2025-04-05 • 11:20am

Gemini 2.5 Pro pricing is out. For prompts less than 200k tokens:

Input: $1.25/mil tokens
Output: $10/mil tokens

For prompts over 200k tokens:

Input: $2.50/mil tokens
Output: $15/mil tokens

Importantly this bumps the rate limit to at least 150RPM & 1,000RPD on Tier 1.
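A rough sketch of what a single request costs at these rates. Two assumptions of mine: the whole request is billed at the tier selected by prompt size, and a prompt of exactly 200k tokens falls in the lower tier. The function name `gemini_25_pro_cost` is hypothetical.

```python
TIER_THRESHOLD = 200_000  # prompt tokens


def gemini_25_pro_cost(prompt_tokens: int, output_tokens: int) -> float:
    """USD cost of one request at the per-million-token rates listed above."""
    if prompt_tokens <= TIER_THRESHOLD:  # assumption: exactly 200k uses the lower tier
        input_rate, output_rate = 1.25, 10.00
    else:
        input_rate, output_rate = 2.50, 15.00
    return (prompt_tokens * input_rate + output_tokens * output_rate) / 1_000_000


if __name__ == "__main__":
    print(f"${gemini_25_pro_cost(100_000, 10_000):.3f}")
    print(f"${gemini_25_pro_cost(300_000, 10_000):.3f}")
```

The jump at the threshold is worth noting: under these assumptions, going from a 100k to a 300k token prompt triples the input tokens but multiplies the input cost by six.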

ai, llm, model, gemini, google • 2025-04-05 • 10:38am

The next version of Docker (v4.40) will add native LLM capability to the Docker CLI. Docker Model Runner is not yet publicly released, but adds commands like docker model run that run LLMs outside of containers. Initial reports look promising, and it may be a nice replacement for running llama.cpp, koboldcpp or ollama locally.

docker, ai, llm • 2025-03-29 • 10:29am

Google DeepMind launch Gemini 2.5 Pro, their latest SOTA model, which debuts at #1 on the LLM Leaderboard. No pricing yet, though it’s available for free via Google AI Studio and OpenRouter.

ai, llm, model, gemini, google • 2025-03-26 • 8:30am

Claude now has web search but it’s only

available now in feature preview for all paid Claude users in the United States. Support for users on our free plan and more countries is coming soon.

ai, llm, anthropic, moat • 2025-03-21 • 7:01am

OpenAI release o1-pro and it costs $150 per million input tokens and $600 per million output tokens.

Currently, it’s only available to select developers — those who’ve spent at least $5 on OpenAI API services

ai, model, openai, llm • 2025-03-21 • 6:47am

Evalite - a vitest-based eval runner by Matt Pocock.

ai, llm, evals, vitest • 2025-03-07 • 3:11pm

Introducing GPT-4.5 - hallucinations down, accuracy up, non-reasoning. Rolling out to pro + api. Doesn’t look like anyone will be coding with it any time soon with this type of api pricing:

Input: $75.00 / 1M tokens
Cached input: $37.50 / 1M tokens
Output: $150.00 / 1M tokens

And then in a tweet from sama:

this isn’t a reasoning model and won’t crush benchmarks. it’s a different kind of intelligence and there’s a magic to it i haven’t felt before. really excited for people to try it!

ai, llm, openai • 2025-02-28 • 9:18am

open llama - A permissively licensed open source reproduction of Meta AI’s LLaMA with 3B, 7B and 13B models trained on RedPajama dataset and other sources, providing a drop-in replacement for LLaMA.

ai, llm, model • 2025-02-28 • 9:15am

What We’ve Learned From A Year of Building with LLMs is a huge overview of findings from building LLM applications, from:

starting with prompting when prototyping new applications

all the way through:

what is a completely infeasible floor demo or research paper today will become a premium feature in a few years and then a commodity shortly after

ai, llm • 2025-02-28 • 9:15am

Fuck you, show me the prompt is an investigation into extracting the actual prompt that is sent to a model by LLM abstraction libraries.

There are many libraries that aim to make the output of your LLMs better by re-writing or constructing the prompt for you. The prompts sent by these tools to the LLM is a natural language description of what these tools are doing, and is the fastest way to understand how they work.

ai, prompting, llm • 2025-02-28 • 9:13am

The Novice’s LLM Training Guide - a look at fine-tuning LLMs using Low-Rank Adaptation (LoRA)

ai, llm, lora • 2025-02-28 • 9:11am

Claude 3.7 Sonnet and Claude Code - hybrid reasoning model, same price as Claude 3.5, improved accuracy. Also a terminal-based agentic coding tool, though this requires an API key.

ai, llm, claude, anthropic • 2025-02-25 • 6:55am

ChatGPT Deep Research hallucinates

it claimed again to produce a complete dataset but in fact only produced ~7 lines, with a placeholder for the other ~3000.

ai, llm, deep-research, chatgpt, openai • 2025-02-20 • 9:15am

Grok 3 set to launch, though after the “launch” it appears that:

Not all the models and related features of Grok 3 are available yet (some are in beta), but they began rolling out on Monday.

ai, llm, model, grok • 2025-02-18 • 11:14am

Introducing Perplexity Deep Research - Perplexity undercuts OpenAI by releasing their own Deep Research, for free.

ai, llm, deep-research, perplexity • 2025-02-17 • 5:46am

Building a SNAP LLM eval - the first write-up in a series about our process of building an “eval” — evaluation — to assess how well AI models perform on prompts

ai, llm, evals • 2025-02-16 • 9:02am

Your AI product needs evals - How to construct domain-specific LLM evaluation systems to improve AI by iterating quickly.

ai, llm, evals • 2025-02-16 • 9:01am

OpenAI roadmap update for GPT-4.5 and GPT-5 from sama, which indicates that the model they’ve been cooking for some time can no longer be considered GPT-5.

We will next ship GPT-4.5, the model we called Orion internally, as our last non-chain-of-thought model.

ai, llm, openai • 2025-02-13 • 8:44am

Convert a Figma design to code - After theoretically setting up Claude to read/write from Jira & GitHub I remarked that my only job left would be to copy a screenshot from Figma into the prompt and ask it to build the UI, but it looks like that can be integrated too.

ai, llm, claude, figma, mcp • 2025-02-13 • 6:23am

Deepseek vs Claude PR Reviews - also demonstrates the value of being able to quickly switch between models.

ai, llm, claude, deepseek • 2025-02-09 • 10:29am

The End of Programming as We Know It is another argument against AI replacing programmers and for AI extending programmer capability.

ai, llm, future • 2025-02-08 • 9:17am

OpenAI’s Deep Research: Novel User Applications and Community Insights - I prompted Deep Research to research itself

Investigate latest community news of OpenAI’s Deep research function and what novel approaches people are finding it useful for.

ai, llm, chatgpt, deep-research, openai • 2025-02-07 • 5:49pm

LLM Cost Analysis 2023-2026 - I asked ChatGPT deep research to generate a report that investigates the $/million tokens over time across providers and predict the price of tokens in 2026

Prepare a report that investigates the cost per million token of LLMs since 2023, with estimations on what the cost will be in 2026.

ai, llm, chatgpt, deep-research, openai • 2025-02-07 • 5:13pm

Open-source DeepResearch – Freeing our search agents - after the release of OpenAI’s Deep Research, Hugging Face deliver an open source alternative in 24 hours.

ai, llm, deep-research, hugging-face • 2025-02-05 • 8:55pm

Getting AI-powered features past the post-MVP slump

The non-negotiable first step in systematically improving your AI systems is establishing a solid feedback loop.

ai, llm, evals • 2025-02-05 • 8:12am

On DeepSeek and Export Controls - CEO of Anthropic shares his take on DeepSeek - It’s not as good as everyone says it is, but China needs to be further restricted from chips anyway.

ai, llm, deepseek, anthropic • 2025-01-30 • 6:19am

Nvidia releases a 72b multimodal LLM. The article claims it’s open source, but it appears to only have open weights and is otherwise commercially restricted.

ai, model, llm, nvidia • 2024-10-02 • 8:08pm

Introducing OpenAI o1-preview, a thinking/reasoning model.

As an early model, it doesn’t yet have many of the features that make ChatGPT useful, like browsing the web for information and uploading files and images. For many common cases GPT‑4o will be more capable in the near term.

ai, llm, model, openai • 2024-09-13 • 9:10am

Mistral announce Mistral Large 2

Mistral Large 2 has a 128k context window and supports dozens of languages

ai, llm, model, mistral • 2024-07-25 • 8:44am

Prompting Fundamentals and How to Apply them Effectively has some really good prompting guidance.

ai, llm, prompting • 2024-05-26 • 12:00pm

I pondered whether LLMs would be any good at solving the Vehicle Routing Problem - thankfully I don’t need to investigate as arxiv.org once again delivers. TL;DR - yes, as long as you’re happy with it being wrong 30-40% of the time.

ai, vrp, llm • 2024-05-22 • 10:25am

Microsoft releases Phi-3 vision

a 4.2B parameter multimodal model with language and vision capabilities.

ai, llm, model, microsoft • 2024-05-22 • 8:38am

I’ve been running koboldcpp in WSL, but the Tcl/Tk UI is tiny. This looks interesting tho, and is already in a container.

ai, llm • 2024-05-20 • 1:57pm

Marc Andreessen on navigating a model’s latent space via prompting.

ai, llm, prompting • 2024-05-17 • 2:00pm

Hello GPT-4o

We’re announcing GPT‑4o, our new flagship model that can reason across audio, vision, and text in real time.

openai, llm, model, ai • 2024-05-14 • 8:22am

Introducing the next generation of Claude

The family includes three state-of-the-art models in ascending order of capability: Claude 3 Haiku, Claude 3 Sonnet, and Claude 3 Opus.

ai, llm, claude, anthropic, model • 2024-03-06 • 6:23pm

Open LLM leaderboard - A Hugging Face space that compares and benchmarks various open-source Large Language Models in an open and reproducible way.

ai, llm, hugging-face • 2023-08-31 • 2:37pm

Let’s build GPT: from scratch, in code, spelled out - An in-depth video tutorial demonstrating how to implement a GPT model from scratch, providing step-by-step code implementation.

ai, llm • 2023-08-01 • 10:04am

An explanation of model size including an introduction to model quantization.

ai, llm, hugging-face • 2023-07-06 • 12:42am

gpt4all

GPT4All runs large language models privately on everyday desktops & laptops

ai, llm • 2023-06-24 • 10:41pm

llm
