GPT-5.3-Codex release: working notes on the model

2/6/2026

GPT-5.3-Codex is a new version of Codex that combines:

coding capabilities of GPT-5.2-Codex
reasoning and professional knowledge of GPT-5.2

The model runs about 25% faster and is designed for long tasks: research, tool use, and multi-step execution.

It is available in the Codex app, CLI, IDE, and web. API access is in progress.

What is fundamentally new in this version

One model instead of two modes

There used to be a division:

code model
reasoning model

In GPT-5.3-Codex, the two are combined.
Code, analysis, hypothesis testing, and follow-up edits all happen in the same context, without the feeling of “switching.”

In practice, this means not so much that the answers have become “smarter” as that they have become “more consistent.”

The model's role in its own development

This is the first version of Codex that was actively used in its own development:

training
evaluation
deployment
diagnostics

This shows in its behavior: the model handles infrastructure tasks more carefully and finds its way better in complex, interconnected scenarios.

Level of change: agent architecture. Codex is no longer a tool for individual steps; it has become an agent that can handle the whole task.

Benchmark

GPT-5.3-Codex performs best on several key tests:

SWE-Bench Pro
Real engineering tasks, multiple languages, less contamination.
The result is higher than that of GPT-5.2 and GPT-5.2-Codex.
Terminal-Bench 2.0
Tests the ability to work in a terminal.
GPT-5.3-Codex reaches 77.3% accuracy versus ~64% for the previous version.
OSWorld-Verified
Work in a visual desktop environment.
64.7% is a significant increase over previous models.
GDPval
Professional tasks (documents, tables, presentations).
The level is comparable to GPT-5.2, but within a more versatile agent.
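
To put the Terminal-Bench 2.0 figures in perspective, here is a minimal sketch of the arithmetic. It uses only the two numbers quoted above; the previous score is given as approximate (~64%), so the results are approximate too. The helper name `gain` is just an illustrative choice.

```python
# Terminal-Bench 2.0 scores quoted in these notes.
# The previous score is approximate ("~64%"), so treat the output as a rough estimate.
new_score = 77.3
old_score = 64.0


def gain(new: float, old: float) -> tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = new - old
    relative = absolute / old * 100
    return absolute, relative


absolute, relative = gain(new_score, old_score)
print(f"absolute gain: {absolute:.1f} percentage points")
print(f"relative gain: {relative:.1f}%")
```

The distinction matters when comparing claims: a jump of about 13 percentage points corresponds to roughly a one-fifth relative improvement over the earlier score.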

Separately, the model achieves these results with fewer tokens, that is, it works more economically.

The Web and Long Tasks

The model was tested on long autonomous scenarios, such as game development, where the agent:

iteratively improves the result
fixes bugs
adds to what it has built
and does this over millions of tokens

In more everyday tasks (landing pages, websites), the difference shows in:

more sensible defaults
fewer "empty" solutions
a more coherent structure even without a detailed spec

Beyond coding

GPT-5.3-Codex is designed for more than code. It also handles:

PRDs
presentations
tables
analysis
user research
metrics

According to GDPval, the model handles such tasks confidently.
Importantly, this is not a separate mode: it all happens in the same agent context.

Interactive work

Codex is now reporting progress more frequently:

what it is doing
which decisions it has made
where

You can intervene while a task is running without resetting the context.
This removes the need to either wait for the final result or keep restarting the dialogue.
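
The idea of steering a running task without losing context can be pictured with a toy loop. This is a conceptual sketch only, not how Codex is implemented and not its API: `ToyAgent`, `steer`, `run`, and the example task are all hypothetical names invented for illustration. The point it shows is that a mid-task message is appended to the same history rather than starting a new one.

```python
from collections import deque
from typing import Callable


class ToyAgent:
    """Hypothetical stand-in for an agent; not Codex's real interface."""

    def __init__(self) -> None:
        self.context: list[str] = []        # one shared history, never reset
        self.pending: deque[str] = deque()  # user messages sent mid-task

    def steer(self, message: str) -> None:
        """Queue a user message to be picked up before the next step."""
        self.pending.append(message)

    def run(self, task: str, steps: list[str],
            on_progress: Callable[[str], None]) -> None:
        self.context.append(f"task: {task}")
        for number, step in enumerate(steps, start=1):
            # Fold any pending user input into the same context.
            while self.pending:
                self.context.append(f"user: {self.pending.popleft()}")
            self.context.append(f"agent: {step}")
            on_progress(f"step {number}/{len(steps)}: {step}")


agent = ToyAgent()


def watch(update: str) -> None:
    """Print each progress report and intervene once, after the schema step."""
    print(update)
    if "schema" in update:
        agent.steer("use SQLite instead of Postgres")


agent.run("build a small API",
          ["draft schema", "write handlers", "add tests"], watch)
print(agent.context)
```

After the run, the history still contains the original task and the first step, with the user's correction slotted in before the second step, which is the behavior described above in miniature.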

Personal feeling

Most notably, there is less need to control the agent manually.

Earlier:

tasks had to be split into parts
frequent
you had to keep checking whether the context had been lost

Now:

the task can be formulated more broadly
the agent stays on course
oversight is still needed, but in moderation

Work becomes less stressful, especially on long tasks.
