Categories
AI

What Is a Mixture of Experts Model?

Most of the frontier AI models you use every day don’t run all their parameters when processing your message. That’s not a bug. It’s the entire point.

The short answer

A mixture of experts (MoE) model is a neural network architecture that divides its parameters into discrete subnetworks called “experts.” When the model processes a token (a word or word-piece), a small learned network called a router decides which experts activate for that token. Only a fraction of the total experts fire at once, which means the model can be enormous in total size while remaining fast and cheap to run.

The key insight: a model with 100 billion total parameters might activate only 10 to 20 billion per token. You get the knowledge encoded in a very large model with the inference cost of a much smaller one.

The long answer

Where this came from

The concept of mixture of experts predates modern deep learning by decades. Statisticians used it for combining predictions from multiple models. What changed in 2017 was the work of Noam Shazeer and colleagues at Google, who introduced the sparsely-gated MoE layer in “Outrageously Large Neural Networks.” Their paper showed you could scale to 137 billion parameters while keeping training costs manageable by routing each token to just a handful of expert subnetworks out of thousands. That paper is where modern MoE architectures trace their lineage.

Google’s 2021 Switch Transformer pushed the idea further, scaling to more than a trillion parameters (1.6 trillion in its largest configuration) while routing each token to a single expert. The trade-off for extreme sparsity is that each expert sees less total training data, so balancing expert load becomes a real engineering problem.

MoE became widely known when Mistral AI published Mixtral 8x7B in late 2023, an open-weight model with 8 experts per layer and 2 active per token. Total parameters: 46.7 billion. Active parameters per token: roughly 12.9 billion. Mixtral matched or exceeded Llama 2 70B on most benchmarks while running considerably faster at inference. That result made it hard for anyone to ignore the architecture.

How routing works

Inside each transformer layer of an MoE model, instead of a single feed-forward network, there are N expert networks, often 8, 16, or 64 depending on the design. Alongside them sits a routing network: a small learned transformation that maps each token’s internal representation to a score for each expert.

The top-K experts (usually K=1 or K=2) receive the token. Each activated expert processes it and produces an output, those outputs get combined proportionally to their routing scores, and the result flows forward to the next layer. The router itself is tiny relative to the experts, so its compute overhead is negligible.
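The routing step described above can be sketched in a few lines of plain Python. This is a minimal illustration rather than any particular model’s implementation: the function names, the linear router, and the toy experts are all assumptions made for the example, and renormalizing the kept probabilities is one common way to turn routing scores into mixing weights.

```python
import math


def softmax(scores):
    """Turn raw router scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(token, router_weights, experts, k=2):
    """Send one token through its top-k experts and mix their outputs.

    token          -- list of floats (the token's internal representation)
    router_weights -- one weight vector per expert (hypothetical linear router)
    experts        -- list of callables, each mapping a vector to a vector
    """
    # Router: one dot product per expert gives a score for each expert.
    scores = [sum(w * x for w, x in zip(weights, token)) for weights in router_weights]
    probs = softmax(scores)

    # Keep only the k highest-scoring experts.
    top = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:k]

    # Renormalize the kept probabilities so the mixing weights sum to 1.
    kept = sum(probs[i] for i in top)
    gates = {i: probs[i] / kept for i in top}

    # Only the selected experts run; the rest are skipped entirely.
    output = [0.0] * len(token)
    for i, gate in gates.items():
        expert_out = experts[i](token)
        output = [acc + gate * y for acc, y in zip(output, expert_out)]
    return output, gates


# Toy setup: four "experts" that just scale their input by 1, 2, 3, or 4.
experts = [lambda v, s=s: [s * x for x in v] for s in (1.0, 2.0, 3.0, 4.0)]
router_weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, -1.0]]
output, gates = route_token([0.5, 1.5], router_weights, experts, k=2)
```

In this toy run the router scores experts 2 and 1 highest, so only those two callables execute. That skipped work is the whole source of the compute savings.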

Load balancing is where most of the engineering complexity lives. If one expert attracts the majority of tokens during training, the others see too little data and specialize poorly. Standard practice is to add an auxiliary load-balancing term to the training loss that penalizes uneven routing. Getting this right is one of the reasons MoE training is harder than dense training at equivalent scale.
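One widely cited form of that auxiliary term comes from the Switch Transformer paper: multiply, per expert, the fraction of tokens it received by the average router probability it was given, sum over experts, and scale. The sketch below assumes top-1 routing; the function name and the alpha value of 0.01 are illustrative choices, not a prescription.

```python
def load_balancing_loss(router_probs, assignments, num_experts, alpha=0.01):
    """Auxiliary loss in the style of the Switch Transformer paper.

    router_probs -- per-token list of router probabilities over experts
    assignments  -- per-token index of the expert the token was sent to (top-1)
    alpha        -- scaling coefficient (illustrative value)
    """
    num_tokens = len(assignments)
    # f[i]: fraction of tokens actually dispatched to expert i.
    f = [assignments.count(i) / num_tokens for i in range(num_experts)]
    # p[i]: average router probability assigned to expert i.
    p = [sum(probs[i] for probs in router_probs) / num_tokens for i in range(num_experts)]
    # Equals alpha when routing is perfectly uniform; grows as routing concentrates.
    return alpha * num_experts * sum(fi * pi for fi, pi in zip(f, p))


# Four tokens, two experts: an even split versus everything piling into expert 0.
balanced = load_balancing_loss([[0.5, 0.5]] * 4, [0, 1, 0, 1], num_experts=2)
collapsed = load_balancing_loss([[0.9, 0.1]] * 4, [0, 0, 0, 0], num_experts=2)
```

The collapsed case scores higher than the balanced one, so adding this term to the training loss pushes the router back toward spreading tokens across experts.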

Memory versus compute: the key distinction

This is the part that confuses most people.

Memory: An MoE model needs all its experts resident in memory during serving. Mixtral 8x7B’s 46.7 billion parameters require roughly 93 GB in half-precision (FP16), even though only 12.9 billion parameters activate per forward pass. You need the hardware to hold all of them.
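The memory figure is simple arithmetic: total parameters times bytes per parameter. A rough sketch follows, counting weights only. It ignores the KV cache, activations, and any overhead that real quantization formats add, and the helper name and precision table are assumptions for illustration.

```python
# Bytes needed to store one parameter at each precision (weights only).
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(total_params_billions, precision="fp16"):
    """Approximate memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    # Billions of parameters times bytes per parameter gives gigabytes directly.
    return total_params_billions * BYTES_PER_PARAM[precision]


# Mixtral 8x7B: all 46.7 billion parameters must be resident, not just the active ones.
mixtral_fp16 = weight_memory_gb(46.7, "fp16")
mixtral_int8 = weight_memory_gb(46.7, "int8")
```

Note that the input is the total parameter count. The 12.9 billion active parameters never enter the memory calculation.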

Compute: Because only K experts fire per token, the actual matrix multiplications per forward pass track the active parameter count, not the total. This is why throughput is high relative to model capacity.
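The active count can be unpacked with a simplified model: treat the parameters as a shared part that every token uses (attention, embeddings) plus N expert blocks of equal size, of which K run per token. Given the published totals, that is two equations in two unknowns. The sketch below makes that simplifying assumption and ignores the router’s own small parameter count; the function name is hypothetical.

```python
def split_parameters(total_b, active_b, num_experts, top_k):
    """Solve total = shared + N * expert and active = shared + K * expert.

    All values are in billions of parameters. "expert" here means one expert
    slot summed across every layer of the model.
    """
    per_expert = (total_b - active_b) / (num_experts - top_k)
    shared = active_b - top_k * per_expert
    return shared, per_expert


# Mixtral 8x7B figures from the text: 46.7B total, 12.9B active, 8 experts, 2 active.
shared_b, per_expert_b = split_parameters(46.7, 12.9, num_experts=8, top_k=2)

# Share of the model's weights that take part in each forward pass.
active_fraction = 12.9 / 46.7
```

Under these assumptions each expert slot holds roughly 5.6 billion parameters and the shared part roughly 1.6 billion, so a little over a quarter of the weights do work on any given token.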

The practical implication: MoE models are well-suited to data centers with abundant GPU memory and high request volume, where the per-token compute savings add up fast. Running Mixtral locally still requires around 50 GB of memory even with the weights quantized to 8 bits (roughly half the FP16 footprint), which puts it out of reach for most consumer hardware. A dense model of equivalent capability would need at least as much memory and run slower. Neither option is comfortable on a 16 GB laptop.

Dense versus sparse

Dense models activate every parameter for every token. GPT-2, early Llama models, and most consumer-facing small models work this way. Dense is simpler to train, more predictable in its behavior, and typically more memory-efficient to serve per parameter. The cost is that scaling capacity requires proportionally more compute.

MoE models can achieve higher capacity per compute dollar spent, but they come with failure modes dense models don’t have. Routing collapse during training (where all tokens pile into one or two experts) is a real risk. Expert load imbalance creates uneven specialization. Serving infrastructure needs to handle variable activation patterns efficiently, which complicates deployment.

The field has largely concluded that MoE is worth the complexity at scale. xAI released Grok-1 as an open-weight MoE model, Google has described Gemini 1.5 as an MoE architecture, and GPT-4 is widely reported to use one. Neither OpenAI nor Google has published complete architecture specifications, so some of these details remain unconfirmed.

Why this matters in 2026

MoE is not a niche trick anymore. It is the dominant architectural choice for frontier labs trying to push capability without proportionally increasing inference costs. When a model improves sharply on benchmarks without a corresponding jump in API pricing, MoE is often a contributing factor.

For developers building on AI APIs from Africa or other emerging markets, where dollar costs matter more relative to local revenue, the practical effect is favorable. MoE architectures are part of why API prices have fallen so fast. A provider can deploy a much more capable model than inference costs alone would suggest, and pass some of that efficiency to customers. Access to genuinely frontier models at consumer price points would have been economically implausible without architectural improvements of this kind.

The other implication: as open-weight MoE models continue to improve, the gap between what you can run locally and what the top API providers offer is narrowing in capability terms, even if memory requirements remain a barrier.

Common misconceptions

“MoE models are faster because they have fewer parameters.” They don’t have fewer: every expert counts toward the total and has to sit in memory. They’re faster because fewer parameters activate per forward pass. The distinction matters because serving cost scales with memory as well as compute.

“Each expert specializes in a domain, like one for coding and one for math.” Intuitive but not accurate. Experts develop statistical specializations from training data, not explicit topic labels. An expert might activate heavily for certain syntactic patterns rather than for a human-legible category. The specializations are real but they don’t map neatly to subject areas.

“You can keep adding experts to make a model smarter.” More experts increase total capacity, but each expert needs enough training data to specialize effectively. The router can become a bottleneck. Load balancing gets harder. Past a point, returns diminish and training instability increases.
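A back-of-the-envelope calculation shows why the training data per expert thins out. Assuming perfectly balanced routing (the best case), each expert sees the total token count times K divided by N. The one-trillion-token budget below is a hypothetical figure chosen for illustration.

```python
def tokens_per_expert(total_tokens, num_experts, top_k):
    """Average tokens each expert sees, assuming perfectly balanced routing."""
    return total_tokens * top_k / num_experts


TRAINING_TOKENS = 1e12  # hypothetical training budget

# Same data, same top-2 routing, increasingly many experts.
share_by_expert_count = {
    n: tokens_per_expert(TRAINING_TOKENS, num_experts=n, top_k=2)
    for n in (8, 64, 512)
}
```

Going from 8 to 512 experts cuts each expert’s share of the data by a factor of 64, and real routing is never perfectly balanced, so some experts see even less.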

“MoE only makes sense for giant models.” Recent work on smaller MoE variants, including MoE adaptations of vision-language models, shows the technique applies at more modest scales. The efficiency gains are less dramatic below a few billion parameters, but the architecture is viable there.

Google DeepMind’s Gemini Robotics Model Can Now Reason About the Physical World

Google DeepMind released Gemini Robotics-ER 1.6, a model built specifically for robots that need to reason about the physical world.

ER stands for “embodied reasoning.” The model extends Gemini’s multimodal capabilities (vision, language, spatial understanding) into real-time physical interaction. Instead of a chatbot that describes what it sees, this is a system designed to understand spatial relationships, predict physical outcomes, and plan multi-step actions in environments it hasn’t seen before.

The timing matters. This landed the same week OpenAI shipped computer use for Codex (desktop apps, background agents, 111 plugins), and as Anthropic scales up Claude Code Routines for unattended coding tasks. The frontier labs are racing up the same capability ladder: text, then code, then computer use, then physical-world agents. DeepMind just jumped ahead on the physical step.

Gemini Robotics-ER 1.6 is not a consumer product. It’s a foundation model aimed at robotics researchers and companies building physical AI systems. The pitch is that general-purpose reasoning (the kind that makes Gemini good at conversation and code) transfers meaningfully to physical tasks when paired with the right sensory inputs and action spaces.

This is the embodied cognition thesis that robotics researchers have argued about for decades, now backed by a frontier-scale model and Google’s compute budget. Previous approaches to robot learning relied on massive amounts of task-specific demonstration data. Gemini Robotics-ER attempts to shortcut that with a pre-trained reasoning backbone that generalizes across physical environments.

The 1.6 version number suggests this has been iterating quietly. DeepMind published research on RT-2 (Robotic Transformer 2) in 2023, demonstrating that vision-language models could directly output robot actions. Gemini Robotics-ER is the productized evolution of that research line, now integrated into the Gemini model family.

Why We’re Watching

The frontier AI labs are converging on the same roadmap: text, code, computer, body. OpenAI ships computer use. Anthropic ships unattended coding agents. Google ships a robot reasoning model. Each company is attacking the next layer of the physical-digital stack. The question isn’t whether AI agents will operate in the physical world. It’s which foundation model will be the default brain.

Robotics has historically been a hardware-constrained field. What changes with Gemini Robotics-ER is the argument that the bottleneck has shifted from hardware to reasoning capability. If a general-purpose model can understand physics well enough to plan novel actions, the hardware becomes interchangeable.

Watch for partnerships with industrial robotics companies (Fanuc, ABB, Boston Dynamics). That’s where this model finds its first real deployments, and where the gap between demo and production will be tested.

OpenAI Just Made Codex a Desktop Agent With 111 Plugins and Background Computer Use

OpenAI turned Codex into a full desktop agent today.

The update rolling out to macOS users adds computer use (operating desktop apps in the background without interrupting your work), an in-app browser with inline commenting, 111 new plugins spanning GitLab, Atlassian Rovo, and Microsoft Suite, and a memory feature that lets the agent remember context across sessions. Multiple agents can now run in parallel on the same machine.

This is the most aggressive Codex release since the tool launched. Head of Codex Thibault Sottiaux called it building “the super app out in the open,” though he framed the current scope narrowly: developers first, everyone else later.

The timing is not subtle. OpenAI launched a $100/month ChatGPT Pro plan on April 9, price-matched exactly to Anthropic’s Claude Max. That plan includes 5x the Codex usage of the $20 Plus tier, with a launch promo bumping it to 10x through May. One week later, this feature dump arrives.

The underlying model is GPT-5.3-Codex, released February 5. OpenAI claims it’s 25% faster than its predecessor and scores 77.3% on Terminal-Bench 2.0 versus Claude’s 65.4%. On SWE-Bench Pro, the gap narrows to almost nothing: Codex scores 56.8%, with Claude in a comparable range.

“This is helpful for testing and iterating on frontend changes, testing apps, or working in apps that don’t expose an API.” (OpenAI, on computer use for developers)

The computer use approach differs meaningfully from Anthropic’s Claude Cowork. Codex runs background agents that don’t take over your screen. OpenAI claims a “secret sauce” that lets agents operate apps without bogging down the system. Claude Cowork, by contrast, leans into a step-by-step collaborative model where you watch the agent work.

Thread Automations might be the sleeper feature here. Codex can now schedule future work for itself and wake up to continue long-term tasks automatically. That’s the same territory as Claude Code Routines, which lets coding agents run on cron schedules without a human present. Both companies clearly believe unattended agent work is the next frontier.

The memory feature (opt-in, still preview) stores personal preferences, corrections, and context from past sessions. Enterprise and EU rollout is listed as “soon,” which in OpenAI’s vocabulary means anywhere from next week to next quarter.

Why We’re Watching

The AI coding tool war just became a desktop agent war. A year ago, these tools autocompleted lines of code. Today they operate your entire computer. OpenAI and Anthropic are converging on the same product vision (background agents, scheduled tasks, persistent memory) from opposite architectural directions. OpenAI builds a centralized super-app. Anthropic builds a CLI-native local tool. Both cost $100/month at the top tier.

The benchmark race matters less than the UX bet. Developers will pick whichever tool disappears into their workflow. Right now that’s a coin flip.

Watch the plugin ecosystem. 111 integrations on day one is OpenAI flexing its platform gravity. If third-party MCP servers and plugins consolidate around one platform, the other loses the distribution game regardless of model quality.

How to Set Up Claude Code From Scratch

By the end of this guide, you’ll have Claude Code running in your terminal, authenticated to your account, and ready to read, write, and run code inside any project you point it at. The whole process takes about 15 minutes if you already have a subscription. If you don’t, add 2 minutes to sign up.

What you’ll need

  • macOS 13.0 or later, Ubuntu 20.04+, Debian 10+, or Windows 10 (1809+)
  • 4 GB RAM minimum
  • A Claude Pro subscription ($20/month) or a Max, Team, or Enterprise plan. The free Claude.ai tier doesn’t include Claude Code access. Alternatively, you can authenticate through the Anthropic Console if you have API credits there.
  • A project directory. Claude Code works best when it has actual code to look at.

Step 1: Install the native binary

Anthropic provides a native installer that handles everything and sets up auto-updates. This is the recommended path. The old npm install -g @anthropic-ai/claude-code method still works but requires Node.js 18+ and doesn’t auto-update. Skip it.

On macOS or Linux (and WSL on Windows):

curl -fsSL https://claude.ai/install.sh | bash

On Windows PowerShell (native, not WSL):

irm https://claude.ai/install.ps1 | iex

If you’re on macOS and prefer Homebrew, brew install --cask claude-code also works, though you’ll need to run brew upgrade claude-code manually for updates.

The installer takes under a minute. When it finishes, open a new terminal window and confirm the binary is on your path:

claude --version

You should see a version string. If the command isn’t found, close and reopen the terminal. On some systems the PATH update only takes effect in a fresh shell.
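If you want to script that check, for example in a team onboarding script, the same test can be done from Python with the standard library. This is a minimal sketch: the function names are made up for the example, and it relies only on the claude --version command shown above.

```python
import shutil
import subprocess


def find_cli(name="claude"):
    """Return the full path of the executable if it is on PATH, else None."""
    return shutil.which(name)


def cli_version(name="claude"):
    """Run `<name> --version` and return its output, or None if not installed."""
    path = find_cli(name)
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, timeout=30
    )
    return result.stdout.strip()


if __name__ == "__main__":
    version = cli_version()
    if version is None:
        print("claude not found on PATH; open a new terminal and try again")
    else:
        print(version)
```

A result of None means the shell running the script can’t see the binary, which is the same condition as the “command not found” case described above.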

Windows users: The native installer requires Git for Windows to be installed first. WSL 2 is an alternative and skips that requirement, but comes with some feature differences in sandboxing.

Step 2: Authenticate

Run claude from any directory to trigger the browser login flow:

claude

Your default browser opens with an Anthropic authentication page. Log in with the account tied to your Pro or Max subscription. After you approve access, the browser closes and the terminal session picks up with your account connected.

You won’t see an API key anywhere in this flow. Claude Code uses OAuth, so there’s no key to copy, paste, or accidentally commit. If you’d rather use Amazon Bedrock, Google Vertex AI, or Microsoft Foundry as the backend instead of Anthropic-hosted endpoints, you can configure that in your settings after authentication.

Step 3: Navigate to your project and start a session

Claude Code’s core loop is simple: you cd into a project, type claude, and start talking.

cd ~/code/my-project
claude

On first run in a project, Claude Code reads the directory structure and picks up any CLAUDE.md file at the root if one exists. That file is how you give it persistent instructions about your project (more on that below). If there’s no CLAUDE.md, Claude Code just asks you what you want to do.

Try something concrete to confirm it’s working:

> What's the entry point for this app?

Claude Code will scan your files and answer based on what it actually finds, not what it guesses. It can also open files, make edits, run commands, and ask before doing anything destructive.

Step 4: Create a CLAUDE.md for your project

This step is optional, but it’s the highest-value thing you can do after setup.

CLAUDE.md is a plain text file you add to your project root. Claude Code reads it at the start of every session. You use it to capture things you’d otherwise repeat out loud every time: your tech stack, your naming conventions, which commands run your tests, and anything you don’t want Claude touching.

A minimal example:

# My Project

- Stack: Next.js 15, Supabase, Tailwind
- Test command: `npm run test`
- Lint: `npm run lint`
- Never modify /migrations directly. Always create new migration files.
- API routes live in /app/api/

Permissions, like which shell commands Claude is allowed to run without asking first, are configured separately in the project’s settings file (.claude/settings.json) rather than in CLAUDE.md. The official docs on CLAUDE.md cover the full spec, but the file is just freeform markdown. Write what would help a smart contractor who’s new to your codebase.
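If you set up new projects often, a small script can drop a starter file into place without clobbering one that already exists. This is a sketch only: the helper name is hypothetical, and the template simply reuses the minimal example above, so swap in your own stack and commands.

```python
from pathlib import Path

# Starter content copied from the minimal example; edit to match your project.
STARTER = """# My Project

- Stack: Next.js 15, Supabase, Tailwind
- Test command: `npm run test`
- Lint: `npm run lint`
- Never modify /migrations directly. Always create new migration files.
- API routes live in /app/api/
"""


def ensure_claude_md(project_root, content=STARTER):
    """Create CLAUDE.md in project_root unless one already exists.

    Returns True if a new file was written, False if an existing file was kept.
    """
    target = Path(project_root) / "CLAUDE.md"
    if target.exists():
        return False  # never overwrite hand-written instructions
    target.write_text(content, encoding="utf-8")
    return True
```

Calling ensure_claude_md(".") from a project root writes the file once. Later calls leave your edits alone.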

Verifying it works

Two commands help confirm everything is healthy:

claude --version    # Should print version number
claude doctor       # Runs a detailed install check

claude doctor checks auth status, binary integrity, shell integration, and a few other things. If anything is misconfigured, it tells you what’s wrong and points to the fix.

Common pitfalls

  • Using sudo npm install -g @anthropic-ai/claude-code: This is the old method. It creates permission problems, doesn’t auto-update, and ties you to Node.js version management headaches. Use the native installer. If you already have the npm version installed, uninstall it (sudo npm uninstall -g @anthropic-ai/claude-code) and reinstall with the curl script.
  • Launching from the wrong directory: Claude Code’s understanding of your project depends on what directory you’re in. If you run claude from your home folder, it won’t have context about any specific codebase. Always cd into the project first.
  • Expecting it to work on the free tier: The free Claude.ai plan does not include Claude Code. You need at least Pro. If authentication succeeds but Claude Code immediately errors out on usage, check your subscription.
  • Homebrew installs that stop updating: If you used Homebrew, auto-updates are off by default. You get no notifications that a new version exists. Add brew upgrade claude-code to a periodic maintenance routine or switch to the native installer.

Next steps

Once you have a working session, the useful directions to go from here are: learning how to write effective CLAUDE.md files, setting up Claude Code Routines for scheduled agent tasks that run without you present, and exploring permission settings that let you control exactly what Claude is and isn’t allowed to do autonomously. The official documentation covers all of these. Start with the routines feature if you want to understand where AI-assisted development is heading.

Google Just Turned Your Best Gemini Prompts Into One-Click Browser Tools

Google now lets you save your Gemini prompts and turn them into reusable one-click tools inside Chrome.

The feature is called Skills. After running a prompt in Gemini’s Chrome side panel, you can save it directly from your chat history. From there, type / or click the + button to call it up on any page, and it runs against whatever tab you’re looking at. You can even run a single Skill across multiple open tabs simultaneously, pulling information from several pages at once.

Google is also shipping a Skills library alongside the personal prompt-saving tool. The library starts with curated workflows across productivity, shopping, recipes, and budgeting, so users who aren’t sure what to save can start with prebuilt templates and modify them from there.

The rollout started April 14, 2026 for Chrome desktop users signed in to their Google account. It’s English (US) only at launch. Actions that do something consequential, like adding a calendar event or sending an email, will ask for confirmation before executing.

Why We’re Watching

This looks like a convenience feature. It’s actually a distribution play. Google is making Gemini sticky in a way that no one-size-fits-all AI product can match: it’s storing your personal vocabulary, your recurring tasks, and your saved workflows directly inside the browser you use for everything. The more Skills a user saves, the harder it becomes to switch to a different AI assistant. OpenAI and Anthropic can build excellent models, but they don’t control the browser. Google does.

There’s also something important happening at the interface layer. Saving a prompt once and reusing it across any tab is a basic workflow automation capability. It’s not agentic. But it trains users to think of AI as a persistent tool rather than a conversational novelty, and that shift in mental model is exactly what Google needs to cement Gemini as the default rather than the alternative.

The real test is whether Skills retention matches the initial hype. Most browser features have high initial activation and rapid abandonment.

Watch the engagement curve over the first 90 days. If Google reports strong retention in Q3, it means Gemini has found a stickiness mechanism that neither ChatGPT nor Claude currently has inside the browser. If it’s quiet after the launch week, Skills joins the list of browser AI experiments that looked promising and got archived.
