llm
posts
- You Might Not Need an AI Framework
- Running DeepSeek R1 Locally
- LLM Agent Assisted Coding
- Data Processing with Large Language Models
links
A summary of the security concerns with Model Context Protocol.
Meta releases first Llama 4 models:
- Llama 4 Scout, 109B parameters, 10M context length
- Llama 4 Maverick, 400B parameters, 1M context length
The above are distilled from an unreleased, still in training, larger model:
- Llama 4 Behemoth, 2T parameters
The free lunch of hammering GitHub Copilot’s premium models with unlimited requests is over: rate limits arrive in May. Any request that doesn’t hit the base model (currently OpenAI GPT-4o) counts as premium, which covers Sonnet, Gemini, o1 and so on. The monthly premium request limits are as follows:
Copilot Free: 50/month
Copilot Pro: 300/month
Copilot Pro+: 1500/month
Copilot Business: 300/month
Copilot Enterprise: 1000/month
The premium models also have a rate multiplier: for example, Claude Sonnet 3.5/3.7 have a rate multiplier of 1, while o1 and GPT-4.5 have rate multipliers of 10 and 50 respectively.
The news comes with an announcement of the GitHub Copilot Pro+ plan, a new individual tier priced at 39 USD/month with 1500 requests/month to premium models.
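To get a feel for what the multipliers mean in practice, here is a rough sketch. It assumes (my reading, not something the announcement spells out here) that each prompt deducts the model's multiplier from the monthly allowance. The allowances and multipliers are the ones quoted above; the function name is made up for illustration.

```python
# Monthly premium request allowances, as quoted above.
ALLOWANCES = {
    "Copilot Free": 50,
    "Copilot Pro": 300,
    "Copilot Pro+": 1500,
    "Copilot Business": 300,
    "Copilot Enterprise": 1000,
}

# Rate multipliers, as quoted above.
MULTIPLIERS = {
    "Claude Sonnet 3.7": 1,
    "o1": 10,
    "GPT-4.5": 50,
}


def prompts_per_month(plan: str, model: str) -> int:
    """Whole prompts per month, ASSUMING one prompt costs `multiplier` requests."""
    return ALLOWANCES[plan] // MULTIPLIERS[model]


if __name__ == "__main__":
    for plan in ("Copilot Pro", "Copilot Pro+"):
        for model in MULTIPLIERS:
            print(f"{plan:13} {model:18} {prompts_per_month(plan, model):5} prompts/month")
```

Under that assumption a Pro subscriber gets 300 Sonnet prompts a month but only 6 GPT-4.5 prompts, which is presumably the point.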
Gemini 2.5 Pro pricing is out. For prompts less than 200k tokens:
Input: $1.25/mil tokens
Output: $10/mil tokens
For prompts over 200k tokens:
Input: $2.50/mil tokens
Output: $15/mil tokens
Importantly this bumps the rate limit to at least 150RPM & 1,000RPD on Tier 1.
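A quick sketch of how the two price tiers play out for a single request. The rates are the ones listed above. The function name is hypothetical, and treating a prompt of exactly 200k tokens as the cheaper tier is my assumption, since the pricing above only says "less than" and "over".

```python
def gemini_25_pro_cost(prompt_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request, using the per-million-token rates above."""
    # ASSUMPTION: the tier is chosen by prompt size and a prompt of exactly
    # 200k tokens falls into the cheaper tier.
    if prompt_tokens <= 200_000:
        input_rate, output_rate = 1.25, 10.00
    else:
        input_rate, output_rate = 2.50, 15.00
    return (prompt_tokens * input_rate + output_tokens * output_rate) / 1_000_000


if __name__ == "__main__":
    print(f"100k prompt, 2k output: ${gemini_25_pro_cost(100_000, 2_000):.3f}")
    print(f"300k prompt, 2k output: ${gemini_25_pro_cost(300_000, 2_000):.3f}")
```

The jump is steeper than it looks: tripling the prompt from 100k to 300k tokens costs more than five times as much, because the whole request moves to the higher rates.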
The next version of Docker Desktop (v4.40) will add native LLM capability to the Docker CLI. Docker Model Runner is not yet publicly released, but adds commands like docker model run that will run LLMs outside of containers. Initial reports look promising, and it may be a nice replacement for running llama.cpp, koboldcpp or ollama locally.
Google DeepMind launch Gemini 2.5 Pro, their latest SOTA model, which debuts at #1 on the LLM Leaderboard. No pricing yet, though it’s available for free via Google AI Studio and OpenRouter.
Claude now has web search but it’s only
available now in feature preview for all paid Claude users in the United States. Support for users on our free plan and more countries is coming soon.
OpenAI release o1-pro, and it costs $150 per million input tokens and $600 per million output tokens.
Currently, it’s only available to select developers — those who’ve spent at least $5 on OpenAI API services
Evalite - a vitest-based eval runner by Matt Pocock.
Introducing GPT-4.5 - hallucinations down, accuracy up, non-reasoning. Rolling out to Pro users and the API. It doesn’t look like anyone will be coding with it any time soon at this API pricing:
Input: $75.00 / 1M tokens
Cached input: $37.50 / 1M tokens
Output: $150.00 / 1M tokens
And then in a tweet from sama:
this isn’t a reasoning model and won’t crush benchmarks. it’s a different kind of intelligence and there’s a magic to it i haven’t felt before. really excited for people to try it!
What We’ve Learned From A Year of Building with LLMs is a huge overview of findings building LLM applications from:
starting with prompting when prototyping new applications
all the way through:
what is a completely infeasible floor demo or research paper today will become a premium feature in a few years and then a commodity shortly after
Fuck you, show me the prompt is an investigation into extracting the actual prompt that is sent to a model by llm abstraction libraries.
There are many libraries that aim to make the output of your LLMs better by re-writing or constructing the prompt for you. The prompts sent by these tools to the LLM is a natural language description of what these tools are doing, and is the fastest way to understand how they work.
The Novice’s LLM Training Guide - a look at fine-tuning LLMs using Low-Rank Adaptation (LoRA)
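For context on why LoRA is attractive, here is a back-of-the-envelope sketch based on the general LoRA formulation, not on anything specific to the guide. Instead of updating a full d x k weight matrix, LoRA trains two small matrices of shape d x r and r x k. The dimensions below are illustrative values I picked, not figures from the guide.

```python
def full_finetune_params(d: int, k: int) -> int:
    """Trainable parameters when updating a d x k weight matrix directly."""
    return d * k


def lora_params(d: int, k: int, r: int) -> int:
    """Trainable parameters for a rank-r update: B is d x r, A is r x k."""
    return r * (d + k)


if __name__ == "__main__":
    d, k, r = 4096, 4096, 8  # illustrative sizes, chosen for the example
    full = full_finetune_params(d, k)
    lora = lora_params(d, k, r)
    print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.4%}")
```

With these numbers the rank-8 update trains 65,536 parameters instead of 16,777,216, or 1/256 of the full matrix.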
Claude 3.7 Sonnet and Claude Code - a hybrid reasoning model at the same price as Claude 3.5 Sonnet, with improved accuracy. Also a terminal-based agentic coding tool, though it requires an API key.
ChatGPT Deep Research hallucinates
it claimed again to produce a complete dataset but in fact only produced ~7 lines, with a placeholder for the other ~3000.
Grok 3 set to launch, though after the “launch” it appears that:
Not all the models and related features of Grok 3 are available yet (some are in beta), but they began rolling out on Monday.
Introducing Perplexity Deep Research - Perplexity undercuts OpenAI by releasing their own Deep Research, for free.
Building a SNAP LLM eval - the first write-up in a series about our process of building an “eval” — evaluation — to assess how well AI models perform on prompts
Your AI product needs evals - How to construct domain-specific LLM evaluation systems to improve AI by iterating quickly.
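As a rough idea of what the smallest possible eval looks like, here is a sketch: a list of prompts, each paired with a programmatic check, run against a model and reduced to a pass rate. This is not the approach from either article, just the bare shape of the idea. `fake_model` is a hypothetical stand-in for a real LLM call, and all names here are made up for illustration.

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    name: str
    prompt: str
    check: Callable[[str], bool]  # returns True if the output is acceptable


def is_json_with_status(output: str) -> bool:
    """Check that the output parses as a JSON object containing a 'status' key."""
    try:
        parsed = json.loads(output)
    except json.JSONDecodeError:
        return False
    return isinstance(parsed, dict) and "status" in parsed


def fake_model(prompt: str) -> str:
    """HYPOTHETICAL stand-in for a real model call; returns canned answers."""
    canned = {
        "What is 2 + 2?": "2 + 2 = 4",
        "Reply with a JSON object containing a 'status' key.": '{"status": "ok"}',
    }
    return canned.get(prompt, "I don't know.")


def run_evals(model: Callable[[str], str], cases: list[EvalCase]) -> dict[str, bool]:
    """Run every case through the model and record pass/fail by case name."""
    return {case.name: case.check(model(case.prompt)) for case in cases}


CASES = [
    EvalCase("arithmetic", "What is 2 + 2?", lambda out: "4" in out),
    EvalCase("json_format", "Reply with a JSON object containing a 'status' key.", is_json_with_status),
    EvalCase("capital", "What is the capital of France?", lambda out: "Paris" in out),
]

if __name__ == "__main__":
    results = run_evals(fake_model, CASES)
    for name, passed in results.items():
        print(f"{'PASS' if passed else 'FAIL'}  {name}")
    print(f"pass rate: {sum(results.values())}/{len(results)}")
```

Swapping `fake_model` for a function that calls a real model is the only change needed to start tracking whether prompt tweaks help or hurt.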
OpenAI roadmap update for GPT-4.5 and GPT-5 from sama, which indicates that the model they’ve been cooking for some time can no longer be considered GPT-5.
We will next ship GPT-4.5, the model we called Orion internally, as our last non-chain-of-thought model.
Convert a Figma design to code - After theoretically setting up Claude to read/write from Jira & GitHub, I remarked that my only job left would be to copy a screenshot from Figma into the prompt and ask it to build the UI, but it looks like that can be integrated too.
Deepseek vs Claude PR Reviews - also demonstrates the value of being able to quickly switch between models.
The End of Programming as We Know It is another argument against AI replacing programmers and for AI extending programmer capability.
OpenAI’s Deep Research: Novel User Applications and Community Insights - I prompted Deep Research to research itself
Investigate latest community news of OpenAI’s Deep research function and what novel approaches people are finding it useful for.
LLM Cost Analysis 2023-2026 - I asked ChatGPT deep research to generate a report that investigates the $/million tokens over time across providers and predict the price of tokens in 2026
Prepare a report that investigates the cost per million token of LLMs since 2023, with estimations on what the cost will be in 2026.
Open-source DeepResearch – Freeing our search agents - after the release of OpenAI’s Deep Research, Hugging Face deliver an open source alternative in 24 hours.
Getting AI-powered features past the post-MVP slump
The non-negotiable first step in systematically improving your AI systems is establishing a solid feedback loop.
On DeepSeek and Export Controls - CEO of Anthropic shares his take on DeepSeek - It’s not as good as everyone says it is, but China needs to be further restricted from chips anyway.
Nvidia releases a 72B multimodal LLM. The article claims it’s open source, but it appears to only have open weights and is otherwise commercially restricted.
Introducing OpenAI o1-preview, a thinking/reasoning model.
As an early model, it doesn’t yet have many of the features that make ChatGPT useful, like browsing the web for information and uploading files and images. For many common cases GPT‑4o will be more capable in the near term.
Mistral announce Mistral Large 2
Mistral Large 2 has a 128k context window and supports dozens of languages
Prompting Fundamentals and How to Apply them Effectively has some really good prompting guidance.
I pondered whether LLMs would be any good at solving the Vehicle Routing Problem - thankfully I don’t need to investigate, as arxiv.org once again delivers. TL;DR - yes, as long as you’re happy with it being wrong 30-40% of the time.
Microsoft releases Phi-3 vision
a 4.2B parameter multimodal model with language and vision capabilities.
I’ve been running koboldcpp in WSL, but the Tcl/Tk UI is tiny. This looks interesting though, and it’s already in a container.
Marc Andreessen on navigating a model’s latent space via prompting.
We’re announcing GPT‑4o, our new flagship model that can reason across audio, vision, and text in real time.
Introducing the next generation of Claude
The family includes three state-of-the-art models in ascending order of capability: Claude 3 Haiku, Claude 3 Sonnet, and Claude 3 Opus.
An explanation of model size including an introduction to model quantization.
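The core arithmetic behind model size and quantization is simple enough to sketch: weight memory is roughly parameter count times bits per weight. The example below uses the 109B parameter figure quoted for Llama 4 Scout further up this page. It assumes weights only and ignores the KV cache, activations and the per-block scale factors that real quantization formats add, so treat the results as a lower bound.

```python
def weight_memory_gb(n_params: int, bits_per_weight: int) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 10**9 bytes)."""
    # ASSUMPTION: every weight uses exactly `bits_per_weight` bits, with no
    # overhead for quantization scales, KV cache or activations.
    return n_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    n_params = 109_000_000_000  # Llama 4 Scout, as quoted on this page
    for bits in (16, 8, 4):
        print(f"{bits:2}-bit: {weight_memory_gb(n_params, bits):6.1f} GB")
```

Going from 16-bit to 4-bit weights cuts the footprint from about 218 GB to about 54.5 GB, which is the difference between needing a multi-GPU server and something closer to a high-end workstation.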
GPT4All runs large language models privately on everyday desktops & laptops