2025-03-25 2:54pm
A couple of videos from Nvidia GTC:
- Live at NVIDIA GTC With Acquired
- CUDA lost money for 10 years but is now a key contributor to Nvidia’s moat
- Ex-Intel CEO said inference cost is 10,000x too expensive, and that QPUs (quantum processing units) will be available within 5 years
- GTC March 2025 Keynote with NVIDIA CEO Jensen Huang
- Tokens/second is everything. It is the purpose of data centres full of GPUs, and we can call these AI factories.
- Revenue and tokens/second are ultimately power-limited: capped by how much electricity the AI factory has access to.
- Moore’s Law now applies to energy, not hardware.
- How big/smart the model is must be balanced against tokens/second per user. Bigger models require more compute, reducing tokens/second per user; serving more users at once takes capacity away from each user. The sweet spot is somewhere in the middle, represented by the area under the curve.
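The trade-off above can be sketched with a toy model. Everything here (the throughput function, the capacity and rate constants) is my own illustrative assumption, not NVIDIA's numbers:

```python
# Toy sketch of the keynote's trade-off curve (illustrative model and
# constants, not NVIDIA's): a power-limited AI factory can batch many users
# at a low tokens/s/user, or serve fewer users quickly. A proxy for the
# "area under the curve" is per-user speed x total factory throughput.

def total_throughput(per_user_rate, capacity=1_000_000, max_rate=500):
    """Toy model: total tokens/s falls as per-user tokens/s rises
    (less batching headroom). All constants are made up."""
    return capacity * (1 - per_user_rate / max_rate)

# Sweet spot: maximize (tokens/s per user) x (total tokens/s).
best = max(range(10, 500, 10), key=lambda r: r * total_throughput(r))
print(best)  # 250 -- the middle of the curve, as the keynote suggests
```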
- Nvidia’s new open source Dynamo software:
    - Efficiently orchestrating and coordinating AI inference requests across a large fleet of GPUs is crucial to ensuring that AI factories run at the lowest possible cost to maximize token revenue generation.
- Reasoning in LLMs improves accuracy using ~20x the tokens and ~100x the compute (Llama 3.3 70B on 8x H100 vs DeepSeek R1 on 16x H100)
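A quick sanity check on those figures (my inference from the 20x/100x numbers, not stated in the talk): the compute multiplier outpaces the token multiplier, implying each reasoning token is also more expensive to generate.

```python
# If reasoning uses ~20x the tokens at ~100x the total compute,
# each reasoning token costs ~5x the compute of a standard token.
token_factor = 20     # figure quoted in the keynote
compute_factor = 100  # figure quoted in the keynote
per_token_cost = compute_factor / token_factor
print(per_token_cost)  # 5.0
```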
- Hopper to Blackwell = 25-40x better inference performance, obliterating previous spend on Hopper. While impressive, I don’t know how lab investors recoup this or subsequent hardware investments.
- Short term roadmap
- Blackwell Ultra - 2nd half 2025
- Vera Rubin - 2nd half 2026
- Rubin Ultra - 2nd half 2027
- Hopper → Rubin: 900x performance at 0.03x the cost
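Reading "0.03 cost" as Rubin's per-token cost being 3% of Hopper's (my interpretation of the slide), the combined perf-per-dollar jump works out as:

```python
# Assumes 0.03 is a relative per-token cost vs Hopper (my reading).
perf_gain = 900    # Hopper -> Rubin performance, as quoted
cost_ratio = 0.03  # relative cost, as quoted
perf_per_dollar = perf_gain / cost_ratio
print(round(perf_per_dollar))  # 30000
```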
- Robotics is the next trillion-dollar industry