links from February 2025
Introducing GPT-4.5 - hallucinations down, accuracy up, non-reasoning. Rolling out to Pro + API. It doesn't look like anyone will be coding with it any time soon at this API pricing:
Input: $75.00 / 1M tokens
Cached input: $37.50 / 1M tokens
Output: $150.00 / 1M tokens
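To put that pricing in perspective, a quick sketch of what a single call costs at these list prices (the helper name and token counts are mine, not OpenAI's):

```python
# Estimate the USD cost of one GPT-4.5 API call from the
# per-million-token prices quoted above.
PRICES = {  # USD per 1M tokens
    "input": 75.00,
    "cached_input": 37.50,
    "output": 150.00,
}

def call_cost(input_tokens, output_tokens, cached_tokens=0):
    """Cost of one request; cached_tokens is the cached share of input."""
    uncached = input_tokens - cached_tokens
    return (
        uncached * PRICES["input"]
        + cached_tokens * PRICES["cached_input"]
        + output_tokens * PRICES["output"]
    ) / 1_000_000

# A typical coding-assistant turn: 10k tokens of context in, 1k out.
print(round(call_cost(10_000, 1_000), 2))  # → 0.9
```

Ninety cents per turn adds up fast over an agentic coding session, which is the point being made above.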
And then in a tweet from sama:
this isn’t a reasoning model and won’t crush benchmarks. it’s a different kind of intelligence and there’s a magic to it i haven’t felt before. really excited for people to try it!
What We’ve Learned From A Year of Building with LLMs is a huge overview of findings building LLM applications from:
starting with prompting when prototyping new applications
all the way through:
what is a completely infeasible floor demo or research paper today will become a premium feature in a few years and then a commodity shortly after
Emerging Patterns in Building GenAI Products - a look at a number of different gen-ai patterns across evals, embeddings, RAG, guardrails, and fine-tuning.
Fuck you, show me the prompt is an investigation into extracting the actual prompt that is sent to a model by llm abstraction libraries.
There are many libraries that aim to make the output of your LLMs better by re-writing or constructing the prompt for you. The prompts sent by these tools to the LLM are a natural language description of what these tools are doing, and are the fastest way to understand how they work.
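The article intercepts traffic with a proxy, but the same idea can be sketched in-process: wrap whatever function your abstraction library ultimately calls and log the outgoing prompt. Everything here is hypothetical (`FakeClient` stands in for a real SDK client):

```python
# Sketch of the "show me the prompt" idea: monkey-patch the client's
# completion method so every prompt is printed before it is sent.
class FakeClient:
    def complete(self, prompt):
        return f"response to: {prompt}"

def log_prompts(client):
    """Wrap client.complete to print each outgoing prompt."""
    original = client.complete
    def wrapper(prompt):
        print("PROMPT SENT >>>", prompt)
        return original(prompt)
    client.complete = wrapper
    return client

client = log_prompts(FakeClient())
client.complete("Summarise this ticket in one line.")
```

Pointing this at a real client object shows you exactly what a prompt-rewriting library built on top of it is actually doing.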
The Novice’s LLM Training Guide - a look at fine-tuning LLMs using Low-Rank Adaptation (LoRA)
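The core trick is small enough to sketch in plain Python: instead of updating a full d×d weight matrix W, you train two small matrices B (d×r) and A (r×d) and use W + (alpha/r)·BA. The shapes and scaling follow the usual LoRA convention; the numbers are illustrative, not from the guide:

```python
d, r, alpha = 4, 2, 8  # toy sizes; real layers have d in the thousands

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

W = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]  # frozen
B = [[0.1] * r for _ in range(d)]   # d×r, trainable
A = [[0.2] * d for _ in range(r)]   # r×d, trainable

delta = matmul(B, A)                # d×d, but only rank r
W_eff = [[W[i][j] + (alpha / r) * delta[i][j] for j in range(d)]
         for i in range(d)]

# The saving only shows up at realistic sizes: a 4096×4096 layer is
# ~16.8M weights to fine-tune in full, versus ~65k with r=8.
print(4096 * 4096, "vs", 2 * 4096 * 8)
```

That parameter-count gap is why LoRA makes fine-tuning feasible on consumer hardware, which is the guide's whole premise.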
Claude 3.7 Sonnet and Claude Code - hybrid reasoning model, same price as Claude 3.5, improved accuracy. Also a terminal-based agentic coding tool, though it requires an API key.
ChatGPT Deep Research hallucinates
it claimed again to produce a complete dataset but in fact only produced ~7 lines, with a placeholder for the other ~3000.
Grok 3 set to launch, though after the “launch” it appears that:
Not all the models and related features of Grok 3 are available yet (some are in beta), but they began rolling out on Monday.
Introducing Perplexity Deep Research - Perplexity undercuts OpenAI by releasing their own Deep Research, for free.
Building a SNAP LLM eval - the first write-up in a series about our process of building an “eval” — evaluation — to assess how well AI models perform on prompts
Your AI product needs evals - How to construct domain-specific LLM evaluation systems to improve AI by iterating quickly.
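Both eval write-ups reduce to the same loop: a set of prompts, a grader per prompt, a pass rate you can iterate against. A minimal sketch, with a stubbed-out model and made-up cases (nothing here comes from either post):

```python
# Minimal eval harness: run each prompt through a model, grade the
# answer with a simple check, and report the pass rate.
def fake_model(prompt):
    """Stand-in for a real API call."""
    return "yes" if "SNAP" in prompt else "no"

EVAL_CASES = [  # (prompt, grader) pairs; graders return True/False
    ("Does SNAP cover groceries?", lambda out: out == "yes"),
    ("Is the sky green?",          lambda out: out == "no"),
]

def run_eval(model, cases):
    results = [grader(model(prompt)) for prompt, grader in cases]
    return sum(results) / len(results)

print(f"pass rate: {run_eval(fake_model, EVAL_CASES):.0%}")
```

Swapping `fake_model` for a real API call gives you a number to compare across prompt changes and model upgrades, which is the quick-iteration loop both posts advocate.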
OpenAI roadmap update for GPT-4.5 and GPT-5 from sama, which indicates that the model they’ve been cooking for some time can no longer be considered GPT-5.
We will next ship GPT-4.5, the model we called Orion internally, as our last non-chain-of-thought model.
Convert a Figma design to code - After theoretically setting up Claude to read/write from Jira & GitHub, I remarked that my only job left would be to copy a screenshot from Figma into the prompt and ask it to build the UI, but it looks like that can be integrated too.
Deepseek vs Claude PR Reviews - also demonstrates the value of being able to quickly switch between models.
The End of Programming as We Know It is another argument against AI replacing programmers and for AI extending programmer capability.
OpenAI’s Deep Research: Novel User Applications and Community Insights - I prompted Deep Research to research itself
Investigate latest community news of OpenAI’s Deep research function and what novel approaches people are finding it useful for.
LLM Cost Analysis 2023-2026 - I asked ChatGPT deep research to generate a report that investigates the $/million tokens over time across providers and predict the price of tokens in 2026
Prepare a report that investigates the cost per million token of LLMs since 2023, with estimations on what the cost will be in 2026.
GitHub Copilot: The agent awakens - Just when you thought it was Cursor/Claude Desktop/Roo/Cline, m$ reminds you they’ll eat your lunch.
Open-source DeepResearch – Freeing our search agents - after the release of OpenAI’s Deep Research, Hugging Face delivered an open source alternative in 24 hours.
Getting AI-powered features past the post-MVP slump
The non-negotiable first step in systematically improving your AI systems is establishing a solid feedback loop.
Beyond the AI MVP: What it really takes
almost no one is talking about how to integrate this stuff into a normal software development lifecycle. There’s a reason no one is talking about this: it’s because most teams, even those at billion-dollar companies, just haven’t built this yet.
OpenAI o3-mini released.
This model continues our track record of driving down the cost of intelligence—reducing per-token pricing by 95% since launching GPT‑4—while maintaining top-tier reasoning capabilities.
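Taking that 95% claim at face value, a rough back-of-envelope for the annualized rate of decline (the ~22-month gap from GPT-4's March 2023 launch to o3-mini is my approximation, not OpenAI's figure):

```python
# If prices fell 95% over ~22 months, what is the equivalent
# annual price drop?
months = 22
remaining = 0.05                      # 95% reduction leaves 5%
annual_factor = remaining ** (12 / months)
print(f"~{1 - annual_factor:.0%} price drop per year")  # ~80% per year
```

Sustained, that rate would cut per-token prices by another order of magnitude well before 2027, which is consistent with the "premium feature today, commodity shortly after" trajectory quoted at the top of this post.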