
OpenAI launches GPT-5.3-Codex-Spark, the first real-time coding model

◷ 3 min read 2/13/2026


February 12, 2026 – OpenAI announced the research-preview model GPT-5.3-Codex-Spark, a smaller version of GPT-5.3-Codex designed specifically for low-latency, real-time interaction with developers.

The gist in 30 seconds

  • More than 1,000 tokens per second on Cerebras hardware
  • 128k-token context window (for now)
  • Available to all ChatGPT Pro users
  • The first in a planned line of "ultra-fast" Codex models
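To get a feel for the headline throughput number, here is a back-of-the-envelope sketch. Only the ~1,000 tokens/sec rate comes from the announcement; the token counts are made-up examples:

```python
# Illustrative arithmetic only: what a ~1,000 tokens/sec decode
# rate means in wall-clock terms for typical coding outputs.

def generation_time(num_tokens: int, tokens_per_sec: float = 1000.0) -> float:
    """Seconds to stream num_tokens at the given decode rate."""
    return num_tokens / tokens_per_sec

# A short ~200-token inline edit streams in roughly 0.2 s,
# and even a ~5,000-token file rewrite takes about 5 s.
print(f"edit:    {generation_time(200):.2f} s")
print(f"rewrite: {generation_time(5000):.1f} s")
```

At that rate, generation time stops being the bottleneck for small edits, which is what makes interactive back-and-forth plausible.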

Why does this matter for developers?

Until now, Codex excelled at long-running tasks (hours or even days of autonomous work) but felt sluggish in real-time interaction.
Spark changes that: you can interrupt the model mid-task, change the logic on the fly, rebuild the interface, and see the result immediately.

"Codex-Spark is the first step toward two modes of operation: long horizon plus instant iteration. Over time they will merge into one seamless experience," the official announcement reads.

Results on benchmarks

SWE-Bench Pro
Spark's accuracy is close to the full-size model's, while it completes tasks many times faster.

Terminal-Bench 2.0

  • GPT-5.3-Codex-Spark → 58.4%
  • GPT-5.3-Codex → 77.3% (but much slower)
  • GPT-5.1-Codex-mini → 46.1%

Technical improvements to latency

OpenAI didn't just ship a faster model; it rebuilt the serving pipeline:

  • ~80% less per-roundtrip overhead
  • ~30% less per-token overhead
  • ~50% faster time-to-first-token

These gains come from a persistent WebSocket connection and optimizations to the Responses API. All other Codex models will receive these improvements soon.
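As a rough mental model, end-to-end latency for a streamed response is approximately time-to-first-token plus decode time. The absolute numbers below are invented placeholders; only the "50% faster time-to-first-token" factor is taken from the post:

```python
# Toy latency model: total ≈ TTFT + tokens / throughput.
# Baseline figures are hypothetical, chosen only to illustrate
# how TTFT and throughput each contribute to perceived latency.

def request_latency(ttft_s: float, num_tokens: int, tokens_per_sec: float) -> float:
    """Approximate seconds from request to last streamed token."""
    return ttft_s + num_tokens / tokens_per_sec

baseline = request_latency(ttft_s=1.0, num_tokens=300, tokens_per_sec=250)
spark    = request_latency(ttft_s=0.5, num_tokens=300, tokens_per_sec=1000)

print(f"baseline: {baseline:.2f} s")  # 2.20 s
print(f"spark:    {spark:.2f} s")     # 0.80 s
```

The point of the sketch: for short responses, time-to-first-token dominates, which is why OpenAI optimized the connection overhead and not just raw decode speed.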

Partnership with Cerebras

Spark runs on Cerebras's Wafer Scale Engine 3. This is the first time OpenAI has used a dedicated low-latency accelerator alongside its GPU clusters.

“What excites us most is the new patterns of interaction that become possible at this speed,” said Sean Lie, CTO and co-founder of Cerebras.

How to try it

Today, Spark is available at:

  • Codex app (latest version)
  • Codex CLI
  • VS Code Extension

The model has separate rate limits so it doesn't load the main infrastructure. Under high demand, queues are possible.

What's next

OpenAI calls Spark the first model in an ultra-fast family. The roadmap includes:

  • scaling up
  • multimodality
  • longer context
  • seamless switching between "long" and "instant" modes

Conclusion for Vibecoders

Where Codex used to be a "smart assistant that sometimes stops to think," it is now a real-time partner that reacts faster than you can finish typing the next line.

For those who already work with Codex every day, this is one of the most notable leaps of the past year and a half.