
software by Stafford Williams
- 2025-03-16 Embrace Vibe Coding, Know Its Limits
- 2025-03-09 You Might Not Need an AI Framework
- 2025-03-03 Understanding AI-assisted Coding Workflows
- 2025-01-29 Running DeepSeek R1 Locally
- 2025-01-12 LLM Agent Assisted Coding
- 2024-10-14 Comparing Netlify and Azure Static Web Apps
- 2024-04-16 Evaluating ngrok
notes
- 2025-03-03 [javascript, timezones, vitest] timezones
- 2025-01-14 [bookmarklet, chrome] add text to clipboard
- 2024-12-11 [javascript, timezones] datetime libraries
- 2024-11-18 [macos] links
- 2024-03-13 [azure-b2c] limits
- 2024-03-11 [azure-b2c] phone mfa - microsoft samples
- 2024-03-06 [http] testing
devlog
- 2024-05-29 [spacetraders-v2] v2.14 - more data browser
- 2024-05-16 [spacetraders-v2] v2.13 - data browser
- 2024-05-04 [spacetraders-v2] v2.12 - reset 2024-04-09
- 2024-04-21 [spacetraders-v2] v2.11 - monitoring markets
- 2024-04-18 [spacetraders-v2] v2.10 - trading contracts
- 2024-03-31 [spacetraders-v2] v2.9 - improved waypoint monitoring
- 2024-03-28 [spacetraders-v2] v2.8 - over supply
links
- Live at NVIDIA GTC With Acquired
- CUDA lost money for 10 years but is now a key contributor to Nvidia’s moat
- The ex-Intel CEO said inference is 10,000x too expensive, and that QPUs (quantum compute) will be available within 5 years
- GTC March 2025 Keynote with NVIDIA CEO Jensen Huang
- Tokens/second is everything. This is the purpose of data centres of GPUs, and we can call these AI factories.
- Revenue and tokens per second are ultimately power limited by how much electricity the AI factory has access to.
- Moore’s Law now applies to energy, not hardware.
- How big/smart the model is needs to be managed against tokens/second per user. Bigger models require more compute, taking capacity away from tokens/second/user, while serving more users at once takes capacity away from the data centre. The sweet spot is somewhere in the middle and is represented by the area under the curve.
- Nvidia’s new open source Dynamo software: “Efficiently orchestrating and coordinating AI inference requests across a large fleet of GPUs is crucial to ensuring that AI factories run at the lowest possible cost to maximize token revenue generation.”
- Reasoning in LLMs improves accuracy at 20x the tokens and 100x the compute (Llama 3.3 70B on 8x H100 vs DeepSeek R1 on 16x H100)
- Hopper to Blackwell = 25-40x better inference performance, obliterating previous spend on Hopper. While impressive, I don’t know how lab investors recoup this or subsequent hardware investments.
- Short term roadmap
- Blackwell Ultra - 2nd half 2025
- Vera Rubin - 2nd half 2026
- Rubin Ultra - 2nd half 2027
- Hopper to Rubin: 900x the performance at 0.03x the cost
- Robotics is the next trillion-dollar industry
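The model-size vs tokens/second/user trade-off from the keynote notes above can be sketched with toy numbers. Everything in this snippet is invented for illustration — the keynote gave no formula — but it shows why total factory output peaks between “one fast user” and “many starved users”:

```javascript
// Toy model of the keynote trade-off; every number here is invented for
// illustration. Per-user speed degrades as concurrency grows, so total
// factory output (users * tokens/sec/user) peaks somewhere in the middle.
function tokensPerSecPerUser(users) {
  const peakSpeed = 100; // hypothetical tokens/sec serving a single user
  const knee = 32;       // hypothetical concurrency where contention bites
  return peakSpeed / (1 + (users / knee) ** 2);
}

function totalTokensPerSec(users) {
  return users * tokensPerSecPerUser(users);
}

// Sweep concurrency levels to find the sweet spot.
let best = { users: 0, total: 0 };
for (let users = 1; users <= 256; users++) {
  const total = totalTokensPerSec(users);
  if (total > best.total) best = { users, total };
}
console.log(best); // { users: 32, total: 1600 } with these toy numbers
```

With a purely linear model there would be no sweet spot (halving per-user speed while doubling users cancels out); the quadratic penalty past the knee is what creates the hump under the curve.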
It’s been some time since I wrote a browser extension, and it couldn’t be easier to do so with wxt, the Next-gen Web Extension Framework. Based on Vite, it can export to both Chrome & Firefox and has an HMR dev mode that’s very familiar.
MCP C# SDK allows C# developers to build MCP clients and servers in the dotnet ecosystem.
Google DeepMind launch Gemini 2.5 Pro, their latest SOTA model, which debuts at #1 on the LLM Leaderboard. No pricing yet, though it’s available for free via Google AI Studio and OpenRouter.
Burnt a couple of nights chasing this one. Node was throwing an `ETIMEDOUT` `AggregateError` when hitting `https://api.spacetraders.io/v2`, even though `curl` to the same address had no issue. It turns out Node attempts to resolve IPv6 first, and in the case of the SpaceTraders API, IPv6 resolves but the API doesn’t support it; the IPv6 attempt then takes more than 250ms from Australia to wherever it’s hosted, throwing `ETIMEDOUT`. The solution is to:

```javascript
net.setDefaultAutoSelectFamilyAttemptTimeout(500)
```
Snagit transitioned to an annual subscription and the price went up a lot, so after their 5-year grandfathering of maintenance support ends, I might need to switch.
Claude now has web search, but it’s only “available now in feature preview for all paid Claude users in the United States. Support for users on our free plan and more countries is coming soon.”
OpenAI release o1-pro, and it costs $150 per million input tokens and $600 per million output tokens. Currently, it’s “only available to select developers — those who’ve spent at least $5 on OpenAI API services”.
A rules-based pattern is emerging for helping agentic workflows produce better results. Examples include GreatScottyMac’s RooFlow and Geoff Huntley’s specs and stdlib approaches.