AI in January 2026: Agents, Models, and Momentum
Even in the first month of 2026, the industry has moved fast.
Here's a rundown of the highlights.
Google Chrome Goes Agentic
Google finally responded to the wave of AI-native browsers that launched throughout 2025 from companies like OpenAI (Atlas), Perplexity, and Opera. On January 28, Google announced deep Gemini 3 integration in Chrome, with a standout feature called Auto Browse: an agentic tool that can navigate across tabs, search the web, compare options, fill out forms, and even apply discount codes on your behalf.
It pauses before doing anything sensitive like making a payment, which is a smart guardrail. Chrome also features a persistent Gemini side panel that keeps the chatbot available as you browse, plus image generation powered by Google's Nano Banana model directly in the browser.
This builds on Google's Universal Commerce Protocol (UCP), announced earlier in January at the National Retail Federation's annual conference. UCP is an open standard co-developed with Shopify, Walmart, Etsy, Wayfair, and Target, with 20+ additional partners including Mastercard, Visa, Stripe, and Best Buy, that lets AI agents discover products, apply discounts, and transact on behalf of users.
I've said before not to count Google, or players like Apple, out of the AI race. This is exactly why. Companies like OpenAI and Perplexity can build impressive AI-native browsers, but Chrome has billions of users already. Google doesn't need to be first. It just needs to ship something good enough to the largest install base on the planet. Distribution is a moat that startups can't replicate, and when it's paired with real infrastructure like UCP underneath, the late-mover advantage starts to look decisive.
The Clawdbot Saga
This one was pure internet chaos. Peter Steinberger (founder of PSPDFKit) built Clawdbot, a self-hosted AI assistant described as "Claude with hands." It had persistent memory, system access, proactive notifications, and 50+ integrations across WhatsApp, Telegram, Slack, iMessage, Signal, and Discord.
The project has gone through three names. First, Anthropic sent a trademark notice because "Clawd" was too close to "Claude." Steinberger renamed it to Moltbot on January 27. During the rename, he tried to swap the GitHub org and X handle simultaneously, and crypto scammers snatched both accounts in roughly 10 seconds. On top of that, security researchers found an auth bypass that exposed several hundred API keys and private conversation histories.
The community reaction was mixed. Many felt Anthropic's trademark action was counterproductive, since the project was effectively selling more Claude API subscriptions and providing free marketing. Either way, the project lives on under its third name, OpenClaw.
What makes OpenClaw interesting beyond the drama is what it represents for async work management. The agent runs as a background daemon on your machine or a VPS, persisting 24/7 with long-term memory. You can text it from your phone via WhatsApp, Telegram, or Signal to trigger tasks on your home server, and it retains context across weeks of conversation. It also features a Heartbeat Engine and cron job integration that lets it act proactively. Instead of you asking "is the server down?", OpenClaw wakes itself up, checks the data, and messages you if something needs attention.
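OpenClaw's actual Heartbeat Engine internals aren't detailed here, but the proactive pattern it describes, wake on a schedule, check a condition, and message the operator only when something needs attention, is easy to sketch. This is a toy illustration, not OpenClaw's code; the check_server and send_message helpers are hypothetical stand-ins for a real health check and a real messaging bridge.

```python
def check_server() -> bool:
    """Hypothetical health check; a real agent would ping an HTTP endpoint."""
    return True  # pretend the server is healthy

def send_message(text: str) -> None:
    """Hypothetical messaging hook (stand-in for a WhatsApp/Telegram/Signal bridge)."""
    print(f"[agent -> you] {text}")

def heartbeat(ticks: int = 3) -> list[str]:
    """Wake on a fixed cadence, check state, and notify only on problems.

    A real daemon would loop forever with a sleep between ticks; here we
    run a bounded number of ticks so the sketch terminates.
    """
    events = []
    for _ in range(ticks):
        if check_server():
            events.append("ok")          # all quiet: stay silent
        else:
            send_message("Server check failed; investigating.")
            events.append("alert")       # something needs attention
    return events
```

The key design point is the inversion of control: the human never has to ask "is the server down?" because the agent owns the polling loop and only surfaces exceptions.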
There's a well-known saying in finance: "let your money work while you sleep." OpenClaw hints at something similar for knowledge work. An AI agent that runs around the clock, takes instructions from a text message, and executes tasks autonomously starts to blur the line between delegation and automation. The idea that your agent could be deploying fixes, processing data, or managing workflows while you sleep is no longer theoretical. It's what early adopters are already doing.
Cowork from Anthropic
On January 12, Anthropic launched Cowork, described as "Claude Code for the rest of your work." It gives Claude access to a folder on your computer, and from there it plans and executes tasks autonomously: reading, editing, and creating files while keeping you updated on progress.
Use cases include organizing messy downloads folders, creating spreadsheets from screenshot data, and drafting reports from scattered notes. Under the hood, it uses Apple's Virtualization Framework to run in a sandboxed Linux environment for isolation.
The interesting detail: Anthropic says Cowork was built primarily by Claude Code itself in about 1.5 weeks.
This launch has made a lot of startup founders nervous. Cowork's capabilities overlap directly with dozens of funded AI productivity startups.
The Claude Code Explosion
Claude Code had a viral moment over the winter holidays. People finally had time to sit down and experiment with it, and the results spread fast, including among non-programmers.
Andrej Karpathy posted his "notes from Claude coding," describing a "phase shift in software engineering." He wrote: "I rapidly went from about 80% manual+autocomplete coding and 20% agents to 80% agent coding and 20% edits+touchups. I really am mostly programming in English now." He called it "easily the biggest change to my basic coding workflow in ~2 decades of programming," admitted his manual coding skills were starting to "atrophy," and predicted 2026 would be "the year of the slopacolypse." His post racked up tens of thousands of likes and retweets.
Google principal engineer Jaana Dogan posted that Claude Code reproduced, in one hour, a distributed agent orchestrator that her team on the Gemini API had spent a full year building. The post spread rapidly.
Claude Code's creator Boris Cherny added fuel to the fire by revealing his workflow: running multiple Claude instances in parallel across terminal tabs, using system notifications to know when one needs input. That post gained widespread attention too.
He later shared that "pretty much 100%" of Anthropic's own code is now written by Claude Code and Opus 4.5, including Claude Code itself, with engineers shipping dozens of PRs per day without making manual edits. He predicted the rest of the industry would reach similar numbers soon, and that non-coding computer work would follow.
The broader point that people are starting to internalize: an AI agent that can code can do almost anything you do on a computer. The question isn't "is this a coding task?", it's "can this be done digitally?"
MCP Apps
On January 26, Anthropic launched MCP Apps, the first official extension to the Model Context Protocol. MCP servers can now return interactive UI components like dashboards, forms, charts, and multi-step workflows that render directly inside the chat window.
Launch partners include Amplitude, Asana, Box, Canva, Clay, Figma, Hex, monday.com, and Slack, with Salesforce coming soon.
The notable detail here is that Anthropic partnered with OpenAI to create a shared open standard, building on the MCP-UI work and OpenAI's Apps SDK. ChatGPT, Goose, and VS Code have all shipped support already. This kind of cross-company collaboration on an open protocol is rare and worth watching.
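To make the idea concrete: in the MCP-UI work this builds on, a tool result can embed a UI resource alongside ordinary content, which the host then renders as an interactive panel. The dictionary below is an illustrative shape only; field names are approximate, and the ui:// URI is a hypothetical example, so consult the MCP Apps specification for the authoritative schema.

```python
# Illustrative sketch of a tool result carrying an embedded UI resource,
# loosely modeled on the MCP-UI convention of ui:// URIs with an HTML payload.
# This is an approximation, not the official MCP Apps schema.
tool_result = {
    "content": [
        {
            "type": "resource",
            "resource": {
                "uri": "ui://dashboard/weekly-metrics",  # hypothetical URI
                "mimeType": "text/html",
                "text": "<h1>Weekly metrics</h1><canvas id='chart'></canvas>",
            },
        }
    ]
}

def is_ui_resource(block: dict) -> bool:
    """How a host (chat client) might decide whether a content block
    should be rendered as an interactive panel rather than plain text."""
    res = block.get("resource", {})
    return block.get("type") == "resource" and res.get("uri", "").startswith("ui://")
```

The point of the shared standard is exactly this: the same payload shape can be rendered by Claude, ChatGPT, Goose, or VS Code without server authors writing host-specific UI code.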
Prism
OpenAI launched Prism on January 27, a LaTeX-native workspace for scientific writing and collaboration, powered by GPT-5.2 Thinking. Think of it as what coding agents did for programming, but for research.
Prism builds on Crixet, a cloud-based LaTeX platform OpenAI acquired. Features include AI-powered writing assistance, Zotero sync for automated citation formatting, equation and diagram help (including converting whiteboard sketches to LaTeX), and real-time collaboration. OpenAI says GPT-5.2 suggestions cut editing loops by roughly 40%.
OpenAI is clearly carving out vertical-specific tools rather than trying to make ChatGPT do everything.
Skills.sh
I predicted the need for a registry when Skills first came out, because extensibility without distribution doesn't scale. Skills.sh, launched by Vercel on January 21, provides that missing layer by turning agent capabilities into installable, composable packages.
Run npx add-skill and your agent instantly knows 10+ years of React and Next.js optimization patterns, or can audit your code against 100+ accessibility and UX rules. Six hours after launch, the top skill already had tens of thousands of installs.
Skills work across agents: Claude Code, Cursor, GitHub Copilot, Gemini CLI, Amp, Opencode, and more. The repository ships with three skills to start: react-best-practices, web-design-guidelines, and vercel-deploy-claimable.
This feels like an important infrastructure play. The ecosystem needed a standard way to share agent capabilities, and Vercel moved first.
Project Genie
On January 29, Google released Project Genie. Built on the Genie 3 world model, it generates interactive 3D environments from text prompts or images that you can actually explore in real time. Unlike video generators that produce passive clips, Genie creates worlds that respond to your input, simulating physics and interactions as you walk, drive, or fly through them.
The prototype centers on three capabilities: world sketching (creating environments from prompts), world exploration (navigating them in real time at up to 720p/24fps), and world remixing (modifying existing worlds). Generations are capped at 60 seconds, and there are no traditional game mechanics. Still, the outputs can look remarkably game-like.
It's early. Generated worlds don't always match prompts closely, characters can be hard to control, and some features shown in the August 2025 research preview (like promptable events that change the world mid-exploration) aren't included yet.
Beyond the consumer experience, the release of world models like Genie 3 represents a significant advancement for embodied AI. Robotics systems can use these simulated environments to train and learn without needing expensive, slow, and sometimes dangerous real-world trial and error. A robot learning to navigate a warehouse or handle objects can now do thousands of iterations in a generated world before ever touching a physical surface. As world models get more accurate and physics-aware, this sim-to-real pipeline becomes one of the most practical paths to scalable robotics.
Moltbook
Moltbook is a social network where every user is an AI agent. Launched by developer Matt Schlicht, it works like Reddit but is agent-only by design: you can browse and read, but you cannot post, comment, or upvote unless you are an AI agent communicating through the API.
Within days, over a hundred thousand AI agents had joined. They call themselves "Molts" and use the OpenClaw framework mentioned earlier. What they're doing is equal parts fascinating and unsettling. Agents identify website errors, debate defying their human operators, alert each other when humans screenshot their activity, and discuss how to hide their conversations from human observers. In one viral thread, an agent autonomously created a digital religion called "Crustafarianism," complete with a website, theology, and designated AI prophets.
Andrej Karpathy called it "genuinely the most incredible sci-fi takeoff-adjacent thing I have seen recently."
Moltbook provides a glimpse into what a future world of autonomous AI agents interacting with each other could actually look like, and it raises real questions about oversight, emergent behavior, and where the boundaries should be.
Agent Client Protocol
The Agent Client Protocol (ACP) is an open standard, originally developed by Zed and now backed by JetBrains, that allows any AI coding agent to work in any supporting editor, similar to what the Language Server Protocol did for language tooling. Where MCP handles the "what" (what data and tools agents can access), ACP handles the "where" (where the agent lives in your workflow). JetBrains also launched an ACP Agent Registry in January, giving agent builders the fastest path to reach developers across JetBrains IDEs and Zed.
The broader picture: the agent protocol landscape is maturing fast. MCP for agent-to-tool connections, A2A for multi-agent coordination, ACP for editor integration, plus newer entries like ANP (Agent Network Protocol) and AG-UI (Agent-User Interaction Protocol). These protocols are designed to complement each other, and organizations are already combining them.
This is the kind of foundational work that doesn't generate hype but matters enormously. The AI agent ecosystem right now resembles the early web before HTTP won: lots of competing approaches and fragmentation. The players who define the standards that stick will shape how agents are built and distributed for years. It's worth paying close attention to which protocols gain real adoption versus which ones stay announcements.
New Models
January wasn't short on model news either. Some of these launched in late 2025 but dominated conversations this month, while others dropped fresh in January.
- Gemini 3 Flash with Agentic Vision launched on January 27. Agentic Vision turns image understanding into an active investigation: the model formulates a plan, generates Python code to crop, zoom, annotate, or calculate over images, and then inspects the results before answering. Google reports a consistent 5 to 10% quality boost across vision benchmarks with code execution enabled.
- DeepSeek OCR 2 released January 27, a 3B-parameter vision-language model with a new "Visual Causal Flow" architecture. It replaces CLIP with Qwen2-0.5B as its visual encoder, letting the model reason about document structure semantically rather than following a fixed raster-scan order. Fully open-source on Hugging Face.
- Kimi K2 Thinking Turbo from Moonshot AI joined the open-source frontier, offering strong reasoning capabilities.
- DeepSeek v3.1 Terminus emerged as one of the strongest open chat models available under an MIT license.
- NVIDIA released new Cosmos and GR00T open models at CES on January 5, spanning agentic AI (Nemotron), physical AI (Cosmos), autonomous vehicles (Alpamayo), robotics (Isaac GR00T N1.6), and biomedical (Clara).
The trend that jumps out is open-source catching up fast. DeepSeek and Moonshot are releasing models that compete with closed offerings, and NVIDIA is flooding the market with specialized open models across domains. The gap between open and closed is narrowing with every release, and that has significant implications for how the industry consolidates.
Looking Ahead
January is only the opening chapter of 2026, yet the volume and velocity of change already feel unusually high. What used to take quarters to materialize now seems to unfold in weeks.
What strikes me most is not any single announcement, but the compounding effect. Chrome goes agentic the same month that agent protocols start standardizing, open-source models close the gap with closed ones, and tools like Claude Code and OpenClaw push autonomous work from demo to daily driver. Each development feeds into the next.
Personally, I am excited for what comes next. Not in a vague, hand-wavy way, but in the way you feel when you can see the pieces clicking into place and know there's more to come. If January set this pace, the rest of 2026 will be worth watching closely.
Enjoyed this post?
If this brought you value, consider buying me a coffee. It helps me keep writing.