Google released Gemini 3.1 Ultra, its biggest model of the year. The headline feature is a 2-million-token context window that works natively across text, image, audio and video, with no separate transcription step.
The big upgrades
- 2M token context: feed entire codebases, long videos or hundreds of pages at once.
- Native multimodal: mix images, audio and video in the same prompt without converting them first.
- Sandboxed code execution: the model can write, run and test code mid-conversation and use the results.
- Better grounding: fewer hallucinations on factual queries.
How it stacks up
Against GPT-5.5 and Claude Opus 4.8, Gemini 3.1 Ultra stands out for raw context size and native video understanding. For pure coding, benchmark results remain close among the three frontier models.
What this means for you
- If you work with long documents or video, Ultra removes the chunking headaches you had with smaller context windows.
- Developers get a real “run my code” loop inside the chat instead of copy-pasting to a terminal.
- For everyday questions, the cheaper Gemini 3.5 Flash is usually enough; save Ultra for heavy context tasks.
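One rough way to tell whether a job counts as a "heavy context task" is to estimate its token count before sending it. The sketch below is only an illustration: it assumes about 4 characters per token (a common rule of thumb for English text, not the model's real tokenizer), it only handles text, and the 5% cutoff and the function names are hypothetical. The only figure taken from this article is the 2M-token window.

```python
# Rough context-budget check. All constants are illustrative assumptions,
# except the 2M-token window, which is the figure quoted in this article.
ULTRA_CONTEXT_TOKENS = 2_000_000
CHARS_PER_TOKEN = 4      # rule of thumb for English text, not a real tokenizer
HEAVY_FRACTION = 0.05    # hypothetical cutoff: above 5% of the window is "heavy"


def estimate_tokens(text: str) -> int:
    """Approximate the token count from character length (ceiling division)."""
    return -(-len(text) // CHARS_PER_TOKEN)


def suggest_tier(documents: list[str]) -> str:
    """Return 'flash', 'ultra', or 'chunk' for a batch of text documents."""
    total = sum(estimate_tokens(doc) for doc in documents)
    if total > ULTRA_CONTEXT_TOKENS:
        return "chunk"   # estimated size exceeds even the 2M window
    if total > ULTRA_CONTEXT_TOKENS * HEAVY_FRACTION:
        return "ultra"   # large enough to justify the bigger model
    return "flash"       # small input: the cheaper model is likely enough


if __name__ == "__main__":
    print(suggest_tier(["What is the capital of France?"]))
```

For real budgeting, use the provider's own token counter rather than a character heuristic, since images, audio and video are counted differently from text.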
FAQ
Is 2M context actually usable? Yes, but expect higher latency and cost on full-context prompts.
Do I need Ultra for coding? Not necessarily. Flash and Pro handle most coding well; Ultra shines on huge inputs.
