Model links
Google releases Gemini 2.5 Flash, allowing developers to switch between reasoning and non-reasoning modes at the API level.
$0.15/mil tokens input
$0.60/mil tokens output (no reasoning)
$3.50/mil tokens output (reasoning)
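Since the output rate depends on whether reasoning is enabled, the per-request cost difference is easy to work out. A minimal sketch using the listed rates (the token counts are made up for illustration; note that reasoning models typically also bill their thinking tokens as output, which this ignores):

```python
# Listed Gemini 2.5 Flash rates, USD per 1M tokens.
INPUT_RATE = 0.15
OUTPUT_RATE_NO_REASONING = 0.60
OUTPUT_RATE_REASONING = 3.50

def request_cost(input_tokens, output_tokens, reasoning):
    """Cost in USD for one request at the listed rates."""
    out_rate = OUTPUT_RATE_REASONING if reasoning else OUTPUT_RATE_NO_REASONING
    return (input_tokens * INPUT_RATE + output_tokens * out_rate) / 1_000_000

# Hypothetical 10k-in / 2k-out request:
print(request_cost(10_000, 2_000, reasoning=False))  # 0.0027
print(request_cost(10_000, 2_000, reasoning=True))   # 0.0085
```

Roughly 3x the total cost for the same token counts once reasoning is on, driven entirely by the output rate.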
OpenAI has released o3 and o4-mini, their most powerful reasoning models. These models outperform their predecessors while delivering answers in under a minute. OpenAI also launched Codex CLI, an open-source terminal tool, with a $1 million fund to support related projects.
o3 $10/mil tokens input, $40/mil tokens output
o4-mini $1.1/mil tokens input, $4.4/mil tokens output
OpenAI announces GPT-4.1: three developer-centric model releases available exclusively via the API. An internal instruction-following eval was used to improve the models' ability to follow instructions. Context size is 1M tokens. Pricing appears competitive if the new models perform similarly to Gemini/Sonnet:
gpt-4.1 $2/mil tokens input, $8/mil tokens output
gpt-4.1-mini $0.4/mil tokens input, $1.60/mil tokens output
gpt-4.1-nano $0.1/mil tokens input, $0.4/mil tokens output
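For a sense of scale across the three tiers, here's the same workload priced at each listed rate (the 100k-in / 5k-out request is a hypothetical, chosen to exercise the long context):

```python
# Listed GPT-4.1 family rates, USD per 1M tokens: (input, output).
RATES = {
    "gpt-4.1": (2.00, 8.00),
    "gpt-4.1-mini": (0.40, 1.60),
    "gpt-4.1-nano": (0.10, 0.40),
}

def cost(model, input_tokens, output_tokens):
    """USD cost of one request for the given model at the listed rates."""
    in_rate, out_rate = RATES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

for model in RATES:
    print(model, round(cost(model, 100_000, 5_000), 4))
# gpt-4.1 0.24, gpt-4.1-mini 0.048, gpt-4.1-nano 0.012
```

Each step down the family is a flat 5x cheaper on both input and output.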
GPT-4.5 will be deprecated in the API in a few months.
Meta releases first Llama 4 models:
- Llama 4 Scout, 109B parameters, 10M context length
- Llama 4 Maverick, 400B parameters, 1M context length
The above are distilled from a larger model that is unreleased and still in training:
- Llama 4 Behemoth, 2T parameters
Gemini 2.5 Pro pricing is out. For prompts less than 200k tokens:
Input: $1.25/mil tokens
Output: $10/mil tokens
For prompts over 200k tokens:
Input: $2.50/mil tokens
Output: $15/mil tokens
Importantly, this bumps the rate limit to at least 150 RPM and 1,000 RPD on Tier 1.
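The pricing is tiered on prompt size rather than marginal: the tier wording suggests the higher rate applies to the entire request once the prompt exceeds 200k tokens. A sketch under that assumption:

```python
# Listed Gemini 2.5 Pro rates, USD per 1M tokens, tiered on prompt size.
THRESHOLD = 200_000  # prompt tokens

def cost(prompt_tokens, output_tokens):
    """Whole-request USD cost; the tier is chosen by prompt size alone
    (assumption: the higher rate applies to the entire request)."""
    if prompt_tokens <= THRESHOLD:
        in_rate, out_rate = 1.25, 10.00
    else:
        in_rate, out_rate = 2.50, 15.00
    return (prompt_tokens * in_rate + output_tokens * out_rate) / 1_000_000

print(cost(150_000, 8_000))  # 0.2675 (below the threshold)
print(cost(250_000, 8_000))  # 0.745  (above it: both rates jump)
```

Under this reading, crossing the 200k boundary doubles the input rate on every prompt token, so a small trim below the threshold can cut the bill substantially.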
Google DeepMind launch Gemini 2.5 Pro, their latest SOTA model, which debuts at #1 on the LLM leaderboard. No pricing yet, though it's available for free via Google AI Studio and OpenRouter.
OpenAI release o1-pro at $150/mil tokens input and $600/mil tokens output.
Currently it's only available to select developers: those who've spent at least $5 on OpenAI API services.
Grok 3 is set to launch, though after the "launch" it appears that:
Not all the models and related features of Grok 3 are available yet (some are in beta), but they began rolling out on Monday.
OpenAI o3-mini released.
This model continues our track record of driving down the cost of intelligence—reducing per-token pricing by 95% since launching GPT‑4—while maintaining top-tier reasoning capabilities.
the company unveiled o3, the successor to the o1 “reasoning” model it released earlier in the year. Neither o3 nor o3-mini are widely available yet, but safety researchers can sign up for a preview for o3-mini starting today.
a new experimental model that unlocks stronger reasoning capabilities and shows its thoughts.
Llama 3.3 is a text-only 70B instruction-tuned model that provides enhanced performance.
Introducing Stable Diffusion 3.5: a nice surprise considering the flop of SD3, the emergence of Flux models, and the non-commercial license on flux-pro. That first image is next-level considering the gimped (censored) SD3 and the "woman lying in grass" prompt drama.
Early customer feedback suggests the upgraded Claude 3.5 Sonnet represents a significant leap for AI-powered coding.
Nvidia releases a 72B multimodal LLM. The article claims it's open source, but it appears to have open weights only and is otherwise commercially restricted.
Introducing OpenAI o1-preview, a thinking/reasoning model.
As an early model, it doesn’t yet have many of the features that make ChatGPT useful, like browsing the web for information and uploading files and images. For many common cases GPT‑4o will be more capable in the near term.
FLUX dropped and it blows Stable Diffusion 3 out of the water, though it has very high resource requirements. I'm running the schnell version locally. Prompt adherence is great, and text capability is incredible.
Mistral announce Mistral Large 2
Mistral Large 2 has a 128k context window and supports dozens of languages
Meta introduces Llama 3.1, including a 405B model. Zuck restates their commitment to open source. Models are up on Hugging Face, with the 405B having a 200GB+ VRAM requirement.
SD3 weights dropped last night. I gave it a shot myself with their supplied ComfyUI workflows. As a base model it looks extremely promising and details are next-level, though it still doesn't appear to know jack about hands, and faces still need hires fix.
Microsoft releases Phi-3 vision, a 4.2B-parameter multimodal model with language and vision capabilities.
We’re announcing GPT‑4o, our new flagship model that can reason across audio, vision, and text in real time.
Introducing the next generation of Claude
The family includes three state-of-the-art models in ascending order of capability: Claude 3 Haiku, Claude 3 Sonnet, and Claude 3 Opus.