ai
companies
- OpenAI - ChatGPT
- Stability AI - Stable Diffusion
- Meta AI - Llama
- Anthropic - Claude
- ElevenLabs - Voice synthesis
- Mistral AI
links
Claude now has web search, but for now it’s only available in feature preview for paid Claude users in the United States; support for users on the free plan and more countries is coming soon.
OpenAI releases o1-pro, and it costs $150 per million input tokens and $600 per million output tokens.
Currently, it’s only available to select developers — those who’ve spent at least $5 on OpenAI API services
A rules-based pattern is emerging for helping agentic workflows produce better results. Examples include GreatScottyMac’s RooFlow and Geoff Huntley’s specs and stdlib approaches.
Brendan Humphrey on Vibe Coding aligns with my own thinking:
…these tools must be carefully supervised by skilled engineers, particularly for production tasks. Engineers need to guide, assess, correct, and ultimately own the output as if they had written every line themselves.
Smashing “Create PR” with vibe-coding output amounts to an attack on the PR process:
Generating vast amounts of code from single prompts effectively DoS attacks reviewers, overwhelming their capacity for meaningful assessment
But there is still some value:
Currently we see one narrow use case where vibe coding is exciting: spikes, proofs of concept, and prototypes. These are always throwaway code. LLM-assisted generation offers enormous value in rapidly testing and validating ideas with implementations we will ultimately discard.
Eugene Yan’s blog - Senior Applied Scientist at Amazon
Simon Willison’s blog - AI researcher, independent open source developer, co-creator of the Django Web Framework
Hamel Husain’s blog - independent AI consultant
Evalite - a vitest-based eval runner by Matt Pocock.
Introducing GPT-4.5 - hallucinations down, accuracy up, non-reasoning. Rolling out to Pro + API. Doesn’t look like anyone will be coding with it any time soon with this type of API pricing (quick back-of-envelope after the price list):
Input: $75.00 / 1M tokens
Cached input: $37.50 / 1M tokens
Output: $150.00 / 1M tokens
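To put those numbers in context, a quick back-of-envelope sketch; the 100k-input / 10k-output token session is an assumed workload, not a measurement:

```python
# Hypothetical single coding session at the GPT-4.5 rates above.
# Token counts are assumptions for illustration only.
input_tokens, output_tokens = 100_000, 10_000
input_cost = input_tokens / 1_000_000 * 75.00     # $7.50
output_cost = output_tokens / 1_000_000 * 150.00  # $1.50
print(f"~${input_cost + output_cost:.2f} per session")  # ~$9.00
```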
And then in a tweet from sama:
this isn’t a reasoning model and won’t crush benchmarks. it’s a different kind of intelligence and there’s a magic to it i haven’t felt before. really excited for people to try it!
What We’ve Learned From A Year of Building with LLMs is a huge overview of findings from building LLM applications, from:
starting with prompting when prototyping new applications
all the way through:
what is a completely infeasible floor demo or research paper today will become a premium feature in a few years and then a commodity shortly after
Emerging Patterns in Building GenAI Products - a look at a number of different GenAI patterns across evals, embeddings, RAG, guardrails, and fine-tuning.
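To make the embeddings/RAG piece of that list concrete, here is a minimal retrieval sketch; the embedding model and cosine-similarity scoring are my assumptions, not recommendations from the article:

```python
# Minimal embed-and-retrieve step behind RAG: embed documents once, embed the
# query, pick the most similar document, and feed it into the prompt as context.
import numpy as np
from sentence_transformers import SentenceTransformer  # assumed embedding library

model = SentenceTransformer("all-MiniLM-L6-v2")
docs = [
    "Claude 3.7 Sonnet is a hybrid reasoning model.",
    "FLUX.1 schnell is a fast, locally runnable image model.",
]
doc_vecs = model.encode(docs, normalize_embeddings=True)

query_vec = model.encode(["which model does reasoning?"], normalize_embeddings=True)
scores = doc_vecs @ query_vec.T          # cosine similarity on unit vectors
context = docs[int(np.argmax(scores))]   # passage to stuff into the prompt
print(context)
```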
Fuck you, show me the prompt is an investigation into extracting the actual prompt that is sent to a model by LLM abstraction libraries.
There are many libraries that aim to make the output of your LLMs better by re-writing or constructing the prompt for you. The prompts sent by these tools to the LLM is a natural language description of what these tools are doing, and is the fastest way to understand how they work.
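A proxy-interception sketch along those lines, using mitmproxy; the OpenAI host filter and the focus on the "messages" field are my assumptions about a typical setup:

```python
# prompt_logger.py - print the JSON body of anything a library sends to the
# OpenAI API, so you can read the prompt it actually constructed.
import json
from mitmproxy import http

def request(flow: http.HTTPFlow) -> None:
    if "api.openai.com" in flow.request.pretty_host:
        try:
            body = json.loads(flow.request.get_text())
        except (json.JSONDecodeError, TypeError):
            return
        # Chat-completion payloads carry the prompt in "messages"; fall back to
        # the whole body for other endpoints.
        print(json.dumps(body.get("messages", body), indent=2))
```

Run it with `mitmproxy -s prompt_logger.py` and point the library at the proxy (e.g. `HTTPS_PROXY=http://localhost:8080`, with mitmproxy’s CA certificate trusted).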
The Novice’s LLM Training Guide - a look at fine-tuning LLMs using Low-Rank Adaptation (LoRA)
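A minimal sketch of what that looks like with Hugging Face peft + transformers; the base model, rank, and target modules below are illustrative assumptions, not the guide’s recipe:

```python
# LoRA freezes the base weights and trains small low-rank adapter matrices
# injected into the attention projections, so only a tiny fraction of
# parameters is updated.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")  # hypothetical base

config = LoraConfig(
    r=8,                                  # adapter rank
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # which projections get adapters
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # typically well under 1% of total params
```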
Claude 3.7 Sonnet and Claude Code - a hybrid reasoning model, same price as Claude 3.5, improved accuracy. Also a terminal-based agentic coding tool, though it requires an API key.
ChatGPT Deep Research hallucinates
it claimed again to produce a complete dataset but in fact only produced ~7 lines, with a placeholder for the other ~3000.
Grok 3 set to launch, though after the “launch” it appears that:
Not all the models and related features of Grok 3 are available yet (some are in beta), but they began rolling out on Monday.
Introducing Perplexity Deep Research - Perplexity undercuts OpenAI by releasing their own Deep Research, for free.
Building a SNAP LLM eval - the first write-up in a series about our process of building an “eval” — evaluation — to assess how well AI models perform on prompts
Your AI product needs evals - How to construct domain-specific LLM evaluation systems to improve AI by iterating quickly.
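A minimal sketch of the kind of eval loop the article argues for; the cases, model call, and scoring rule are placeholders, not the author’s:

```python
# Domain-specific eval: run a fixed set of prompts through the model and score
# the outputs with cheap assertion-style checks (graduate to LLM-as-judge later).
from dataclasses import dataclass

@dataclass
class Case:
    prompt: str
    expected: str

CASES = [
    Case("What is the capital of France?", "Paris"),
    Case("Extract the animal from: 'The cat sat on the mat.'", "cat"),
]

def call_model(prompt: str) -> str:
    raise NotImplementedError("plug in your model client here")

def score(output: str, expected: str) -> bool:
    return expected.lower() in output.lower()

def run_eval() -> float:
    passed = [score(call_model(c.prompt), c.expected) for c in CASES]
    return sum(passed) / len(passed)  # pass rate to track across iterations
```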
OpenAI roadmap update for GPT-4.5 and GPT-5 from sama, which indicates that the model they’ve been cooking for some time can no longer be considered GPT-5.
We will next ship GPT-4.5, the model we called Orion internally, as our last non-chain-of-thought model.
Convert a Figma design to code - After theoretically setting up Claude to read/write from Jira & GitHub, I remarked that my only job left would be to copy a screenshot from Figma into the prompt and ask it to build the UI, but it looks like that can be integrated too.
DeepSeek vs Claude PR Reviews - also demonstrates the value of being able to quickly switch between models.
The End of Programming as We Know It is another argument against AI replacing programmers and for AI extending programmer capability.
OpenAI’s Deep Research: Novel User Applications and Community Insights - I prompted Deep Research to research itself
Investigate latest community news of OpenAI’s Deep research function and what novel approaches people are finding it useful for.
LLM Cost Analysis 2023-2026 - I asked ChatGPT Deep Research to generate a report that investigates the $/million tokens over time across providers and predicts the price of tokens in 2026
Prepare a report that investigates the cost per million token of LLMs since 2023, with estimations on what the cost will be in 2026.
GitHub Copilot: The agent awakens - Just when you thought it was Cursor/Claude Desktop/Roo/Cline, m$ reminds you they’ll eat your lunch.
Open-source DeepResearch – Freeing our search agents - after the release of OpenAI’s Deep Research, Hugging Face deliver an open source alternative in 24 hours.
Getting AI-powered features past the post-MVP slump
The non-negotiable first step in systematically improving your AI systems is establishing a solid feedback loop.
Beyond the AI MVP: What it really takes
almost no one is talking about how to integrate this stuff into a normal software development lifecycle. There’s a reason no one is talking about this: it’s because most teams, even those at billion-dollar companies, just haven’t built this yet.
OpenAI o3-mini released.
This model continues our track record of driving down the cost of intelligence—reducing per-token pricing by 95% since launching GPT‑4—while maintaining top-tier reasoning capabilities.
DeepSeek hit with ‘large-scale’ cyber-attack after AI chatbot tops app stores - Attack forces Chinese company to temporarily limit registrations as app becomes highest rated free app in US.
I noted I could access the chat signup page after a few refreshes, but the API signup was constantly throwing 500s.
On DeepSeek and Export Controls - Anthropic’s CEO shares his take on DeepSeek: it’s not as good as everyone says it is, but China needs to be further restricted from chips anyway.
New image model family: Janus-Pro - DeepSeek’s creators just dropped a Stable Diffusion competitor.
Janus-Pro, which DeepSeek describes as a “novel autoregressive framework,” can both analyze and create new images… [and] most Janus-Pro models can only analyze small images with a resolution of up to 384 x 384.
This course will teach you about natural language processing (NLP) using libraries from the Hugging Face ecosystem
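As a taste of the ecosystem the course covers, a hello-world with transformers; the pipeline pulls a default sentiment model, so treat the exact output as illustrative:

```python
# One-liner inference with a Hugging Face pipeline.
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
print(classifier("I can't believe the weights are open!"))
# e.g. [{'label': 'POSITIVE', 'score': 0.99}]
```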
The Short Case for Nvidia Stock - a 60-minute read, but also a very good overview of the current state of AI, including Cerebras and DeepSeek.
Why everyone in AI is freaking out about DeepSeek
The open-source availability of DeepSeek-R1, its high performance, and the fact that it seemingly “came out of nowhere” to challenge the former leader of generative AI, has sent shockwaves throughout Silicon Valley and far beyond
Ignore the Grifters - AI Isn’t Going to Kill the Software Industry
It’s highly unlikely that software developers are going away any time soon. The job is definitely going to change, but I think there are going to be even more opportunities for software developers to make a comfortable living making cool stuff.
the company unveiled o3, the successor to the o1 “reasoning” model it released earlier in the year. Neither o3 nor o3-mini are widely available yet, but safety researchers can sign up for a preview for o3-mini starting today.
a new experimental model that unlocks stronger reasoning capabilities and shows its thoughts.
Llama 3.3 is a text-only 70B instruction-tuned model that provides enhanced performance
Elon Musk wanted an OpenAI for-profit
in 2017, Elon not only wanted, but actually created, a for-profit as OpenAI’s proposed new structure. When he didn’t get majority equity and full control, he walked away and told us we would fail.
Cerebras Now The Fastest LLM Inference Processor; It’s Not Even Close
To put it into perspective, Cerebras ran the 405B model nearly twice as fast as the fastest GPU cloud ran the 1B model. Twice the speed on a model that is two orders of magnitude more complex.
OpenAI and others seek new path to smarter AI as current methods hit limitations
Ilya Sutskever, co-founder of AI labs Safe Superintelligence (SSI) and OpenAI, told Reuters recently that results from scaling up pre-training - the phase of training an AI model that uses a vast amount of unlabeled data to understand language patterns and structures - have plateaued.
Then from Yann LeCun:
I don’t wanna say “I told you so”, but I told you so.
Also, from Gary Marcus:
Yann LeCun is absolute conniving thief
Introducing Stable Diffusion 3.5 - A nice surprise considering the flop of SD3, the emergence of Flux models, and the non-commercial license on Flux Pro. That first image is next level considering the gimped (censored) SD3 and the “woman lying in grass” prompt drama.
Early customer feedback suggests the upgraded Claude 3.5 Sonnet represents a significant leap for AI-powered coding.
Legacy Modernization meets GenAI - I am constantly pondering when and how AI will help me understand, maintain and/or uplift an existing codebase, and here’s an article on the subject, tho the TL;DR is: keep waiting.
Nvidia releases a 72B multimodal LLM. The article claims it’s open source, but it appears to only have open weights and is otherwise commercially restricted.
OpenAI to remove non-profit control and give Sam Altman equity
The OpenAI non-profit will continue to exist and own a minority stake in the for-profit company
Mira Murati, the CTO of OpenAI, steps down.
An open letter to European policymakers requesting improvement to AI regulations in the region.
The EU’s ability to compete with the rest of the world on AI and reap the benefits of open-source models rests on its single market and shared regulatory rulebook.
Zuck & Yann LeCun included as signatories.
Introducing OpenAI o1-preview, a thinking/reasoning model.
As an early model, it doesn’t yet have many of the features that make ChatGPT useful, like browsing the web for information and uploading files and images. For many common cases GPT‑4o will be more capable in the near term.
FLUX dropped and it blows Stable Diffusion 3 out of the water, though it has very high resource requirements. I’m running the schnell version locally. Prompt adherence is great, and text capability is incredible.
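For reference, a minimal local FLUX.1 [schnell] run via diffusers; the model id, offloading, and step count are assumptions about a typical setup rather than my exact one:

```python
# FLUX.1 [schnell] is distilled for very few denoising steps and ignores
# classifier-free guidance, hence the settings below.
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()  # trades speed for fitting in limited VRAM

image = pipe(
    "a corgi reading a newspaper, studio lighting",
    num_inference_steps=4,
    guidance_scale=0.0,
).images[0]
image.save("flux-schnell.png")
```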
Mistral announce Mistral Large 2
Mistral Large 2 has a 128k context window and supports dozens of languages
Meta introduces Llama 3.1, including a 405B model. Zuck restates their commitment to open source. Models are up on Hugging Face, with the 405B having a 200GB+ VRAM requirement.
Stability AI holds on, appointing a new CEO.
SD3 weights dropped last night. I gave it a shot myself with the supplied ComfyUI workflows; details are next level, though it still doesn’t appear to know jack about hands, and faces still need a hires fix. Very promising for a base model.
xAI gets a Series B funding round of $6 billion
Prompting Fundamentals and How to Apply them Effectively has some really good prompting guidance.
Yann LeCun says LLMs won’t achieve AGI.
I pondered whether LLMs would be any good at solving the Vehicle Routing Problem - thankfully I don’t need to investigate, as arxiv.org once again delivers. TL;DR - yes, as long as you’re happy with them being wrong 30-40% of the time.
Microsoft releases Phi-3 vision
a 4.2B parameter multimodal model with language and vision capabilities.
Scarlett Johansson issues a statement on OpenAI and OpenAI posts about How the voices for ChatGPT were chosen.
I’ve been running koboldcpp in WSL, but the Tcl/Tk UI is tiny. This looks interesting tho, and it’s already in a container.
Yann LeCun reminding us there is no such thing as a rogue super intelligence.
Doomers have lost the AI fight
When Ilya Sutskever left OpenAI this week, the firm lost its last influential leader known to question CEO Sam Altman’s push to deploy AI fast.
Marc Andreessen on navigating a model’s latent space via prompting.
In the first quarter of 2024, Stability AI generated less than $5 million in revenue and lost more than $30 million, the report said, adding that the company currently owes close to $100 million in outstanding bills to cloud computing providers and others.
glif lets you package your ComfyUI workflow into an app with no code.
Sam Altman on Ilya leaving OpenAI
We’re announcing GPT‑4o, our new flagship model that can reason across audio, vision, and text in real time.
Mark Zuckerberg - Llama 3, $10B Models, Caesar Augustus, & 1 GW Datacenters - fascinating interview - the least robot-like I’ve ever seen Zuck; he’s getting that billion-dollar media training. Highlights include:
- They got the edge in the GPU race because in 2022 they realised they were short on GPUs for training their Reels recommendation system, so they purchased double what they needed.
- They foresee the bottleneck being energy production (not chips), both in the (regulatory) time and in the tech required to produce enough energy to power the chips
- They have their own chips now, so they can lessen their reliance on more expensive Nvidia chips - they won’t train Llama 4 on their own silicon but might train Llama 5
Amazon pours additional $2.75bn into AI startup Anthropic
Extra financing will bring technology giant’s total investment in OpenAI rival to $4bn
Stability AI CEO resigns because you’re ‘not going to beat centralized AI with more centralized AI’
Stability AI, which has lost more than half a dozen key talent in recent quarters, said Mostaque is stepping down to pursue decentralized AI.
Microsoft CEO on owning OpenAI, from Elon vs OpenAI lawsuit
Microsoft’s CEO boasted that it would not matter if OpenAI disappeared tomorrow. He explained that “we have all the IP rights and all the capability. We have the people, we have the compute, we have the data, we have everything. We are below them, above them, around them.”
Stable Diffusion 3: Research Paper
Stable Diffusion 3 outperforms state-of-the-art text-to-image generation systems such as DALL·E 3, Midjourney v6, and Ideogram v1 in typography and prompt adherence, based on human preference evaluations.
Today we’re excited to introduce Devin, the first AI software engineer.
Devin is an autonomous agent that solves engineering tasks through the use of its own shell, code editor, and web browser.
Introducing the next generation of Claude
The family includes three state-of-the-art models in ascending order of capability: Claude 3 Haiku, Claude 3 Sonnet, and Claude 3 Opus.
The last 12 months have seen explosive growth in open source (Stable Diffusion, Llama, Hugging Face), open research, and open datasets, while the drive to censor/limit commercial/closed models makes them perform worse. Large companies appear ill-suited to this pace, such that there are predictions that open source will overtake closed-source offerings.
The AI Now Institute produces diagnosis and actionable policy research on artificial intelligence.
sounddraw - generate tracks with AI
GPT4All runs large language models privately on everyday desktops & laptops