links from March 2025
Not sure what tangible changes this might produce considering Elon already owned both, but xAI acquired X:
xAI and X’s futures are intertwined. Today, we officially take the step to combine the data, models, compute, distribution and talent. This combination will unlock immense potential by blending xAI’s advanced AI capability and expertise with X’s massive reach.
The next version of Docker Desktop (v4.40) will add native LLM capability to the docker CLI. Docker Model Runner is not yet publicly released, but adds commands like docker model run
that run LLMs outside of containers. Initial reports look promising and it may be a nice replacement for running llama.cpp, koboldcpp or ollama locally.
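Since it isn’t released yet, the exact shape is a guess, but based on the announced `docker model` subcommands usage presumably looks something like this (the model name is a placeholder):

```sh
# Guessed usage of the announced `docker model` subcommands
docker model pull ai/smollm2        # placeholder model name
docker model run ai/smollm2 "Write a haiku about containers"
```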
OpenAI are adopting MCP. They’ve already integrated it with their Agents SDK and note they’re:
also working on MCP support for the OpenAI API and ChatGPT desktop app
It’s been some time since I wrote a browser extension, and it couldn’t be easier to do with wxt, the next-gen Web Extension Framework. Based on Vite, it can export to both Chrome & Firefox and has an HMR dev mode that’s very familiar.
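As a taste of the dev experience, a content script is a single file under `entrypoints/`. This is a rough sketch from my reading of the wxt docs; the match pattern is illustrative:

```typescript
// entrypoints/content.ts (defineContentScript is auto-imported by wxt)
export default defineContentScript({
  matches: ['*://*.example.com/*'], // illustrative match pattern
  main() {
    // Runs in the page; `wxt dev` hot-reloads this as you edit
    console.log('Hello from the extension!');
  },
});
```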
The MCP C# SDK allows C# developers to build MCP clients and servers in the .NET ecosystem.
Google DeepMind launch Gemini 2.5 Pro, their latest SOTA model, which debuts at #1 on the LLM leaderboard. No pricing yet, though it’s available for free via Google AI Studio and OpenRouter.
Burnt a couple of nights chasing this one. Node was throwing an `ETIMEDOUT` `AggregateError` when hitting `https://api.spacetraders.io/v2`, even though `curl` to the same address had no issue. It turns out Node attempts IPv6 first, and in the case of the SpaceTraders API, IPv6 resolves but the API doesn’t support it; from Australia to wherever it’s hosted, each connection attempt then takes longer than Node’s default 250ms timeout, throwing `ETIMEDOUT`. The solution is to raise that timeout:
```javascript
import net from 'node:net';

// Raise the default per-attempt connection timeout from 250ms to 500ms
net.setDefaultAutoSelectFamilyAttemptTimeout(500);
```
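The call just needs to run before any requests are made. A minimal sketch of the fix in context, using `node:https` directly against the endpoint above:

```javascript
import net from 'node:net';
import https from 'node:https';

// Give each connection attempt (IPv6 first, then IPv4) up to 500ms
// instead of the 250ms default.
net.setDefaultAutoSelectFamilyAttemptTimeout(500);

// Previously threw an ETIMEDOUT AggregateError
https.get('https://api.spacetraders.io/v2', (res) => {
  console.log(res.statusCode);
});
```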
A couple of videos from Nvidia GTC:
- Live at NVIDIA GTC With Acquired
  - CUDA lost money for 10 years but is now a key contributor to Nvidia’s moat
  - Ex-Intel CEO said inference cost is 10,000x too expensive and QPUs (quantum compute) will be available within 5 years
- GTC March 2025 Keynote with NVIDIA CEO Jensen Huang
  - Tokens/second is everything. This is the purpose of data centres of GPUs, and we can call these AI factories.
  - Revenue & tokens per second are ultimately power-limited by how much electricity the AI factory has access to.
  - Moore’s Law now applies to energy, not hardware.
  - How big/smart a model is needs to be balanced against tokens/second per user: bigger models require more compute, taking capacity away from tokens/second/user, and serving more users at once takes capacity away from each of them. The sweet spot is somewhere in the middle, represented by the area under the curve (a toy sketch of this trade-off follows the list).
  - Nvidia’s new open source Dynamo software:
    Efficiently orchestrating and coordinating AI inference requests across a large fleet of GPUs is crucial to ensuring that AI factories run at the lowest possible cost to maximize token revenue generation.
  - Reasoning in LLMs improves accuracy with 20x the tokens and 100x the compute (Llama 3.3 70B on 8x H100 vs DeepSeek R1 on 16x H100)
  - Hopper to Blackwell = 25-40x better inference performance, obliterating previous spend on Hopper. While impressive, I don’t know how lab investors recoup this or subsequent hardware investments.
  - Short-term roadmap
    - Blackwell Ultra - 2nd half 2025
    - Vera Rubin - 2nd half 2026
    - Rubin Ultra - 2nd half 2027
    - Hopper > Rubin - 900x perf, 0.03x the cost
  - Robotics is the next trillion-dollar industry
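To make the keynote’s throughput trade-off concrete, here’s a toy model of my own (the capacity figure is invented, not Nvidia’s):

```javascript
// Toy model: an AI factory's total token throughput is fixed by its power budget.
const factoryTokensPerSecond = 1_000_000; // invented capacity figure

// Serving more concurrent users divides the fixed budget among them.
const tokensPerSecondPerUser = (users) => factoryTokensPerSecond / users;

console.log(tokensPerSecondPerUser(1_000));   // 1000 tok/s each: fast, but few users served
console.log(tokensPerSecondPerUser(100_000)); // 10 tok/s each: many users, sluggish experience
```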
Snagit has transitioned to an annual subscription and the price went up a lot; once their 5-year grandfathering of maintenance support ends, I might need to switch.
Claude now has web search, but it’s only
available now in feature preview for all paid Claude users in the United States. Support for users on our free plan and more countries is coming soon.
OpenAI release o1-pro and it costs $150 per million input tokens and $600 per million output tokens (a single request with 10k tokens in and 1k tokens out works out to about $2.10).
Currently, it’s only available to select developers — those who’ve spent at least $5 on OpenAI API services
A rules-based pattern is emerging for helping agentic workflows produce better results. Examples include GreatScottyMac’s RooFlow and Geoff Huntley’s specs and stdlib approaches.
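The common shape is a set of plain-text rules the agent loads into context on every task. A hypothetical example of my own (not taken from either project):

```markdown
<!-- rules/stdlib.md (hypothetical) -->
- Write a failing test before implementing any fix.
- Run the full test suite after each change; never leave it failing.
- Keep commits small, with a one-line rationale in the message.
- When unsure about project conventions, read the specs/ directory first.
```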
Brendan Humphrey on vibe coding aligns with my own thinking:
…these tools must be carefully supervised by skilled engineers, particularly for production tasks. Engineers need to guide, assess, correct, and ultimately own the output as if they had written every line themselves.
Smashing the Create PR button on vibe-coded output amounts to an attack on the PR process:
Generating vast amounts of code from single prompts effectively DoS attacks reviewers, overwhelming their capacity for meaningful assessment
But there is still some value:
Currently we see one narrow use case where vibe coding is exciting: spikes, proofs of concept, and prototypes. These are always throwaway code. LLM-assisted generation offers enormous value in rapidly testing and validating ideas with implementations we will ultimately discard.
Eugene Yan’s blog - Senior Applied Scientist at Amazon
Simon Willison’s blog - AI researcher, independent open source developer, co-creator of the Django Web Framework
Hamel Husain’s blog - independent AI consultant
Evalite - a vitest-based eval runner by Matt Pocock.
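From memory of the Evalite docs, an eval is a vitest-style file named `*.eval.ts` run with the `evalite` CLI, so treat the details here as approximate:

```typescript
// greeting.eval.ts: run with the `evalite` CLI
import { evalite } from "evalite";
import { Levenshtein } from "autoevals";

evalite("Capitalise greeting", {
  // The test cases
  data: async () => [{ input: "hello", expected: "Hello" }],
  // The function under test (would normally call an LLM)
  task: async (input) => input.charAt(0).toUpperCase() + input.slice(1),
  // Scorers compare output against expected
  scorers: [Levenshtein],
});
```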