vibecode.wiki

GPT-5.3-Codex release: working notes on the model

◷ 3 min read 2/6/2026


GPT-5.3-Codex is a new version of Codex that combines:

  • the coding capabilities of GPT-5.2-Codex
  • the reasoning and professional knowledge of GPT-5.2

The model runs about 25% faster and is designed for long-running tasks: research, tool use, and multi-step execution.

It is available in the Codex app, CLI, IDE, and web. API access is still in progress.


What is fundamentally new in this version

One model instead of two modes

Previously there was a split between:

  • a code model
  • a reasoning model

In GPT-5.3-Codex, the two are combined.
Code, analysis, hypothesis testing, and follow-up edits all happen in the same context, with no sense of "switching."

In practice, this does not mean the answers have become "smarter"; it means they have become more consistent.

A model used in its own development

This is the first version of Codex that was actively used in its own development:

  • training
  • evaluation
  • deployment
  • diagnostics

This shows in its behavior: the model handles infrastructure tasks more carefully and finds its way better in complex scenarios that are not isolated from their surroundings.

Level of change: agent architecture. Codex is no longer a tool for individual steps; it has become an agent that can handle a whole task.

Benchmarks

GPT-5.3-Codex posts the best results on several key tests:

  • SWE-Bench Pro
    Real engineering tasks, multiple languages, less data contamination.
    The result is higher than that of GPT-5.2 and GPT-5.2-Codex.
  • Terminal-Bench 2.0
    Tests the ability to work in a terminal.
    GPT-5.3-Codex scores 77.3% versus ~64% for the previous version.
  • OSWorld-Verified
    Work in a visual desktop environment.
    64.7%, a significant increase over previous models.
  • GDPval
    Professional tasks (documents, spreadsheets, presentations).
    The level is comparable to GPT-5.2, but within a more versatile agent.

In addition, the model achieves these results with fewer tokens, so it works more economically.
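
To put the Terminal-Bench 2.0 figures in perspective, here is a small sketch that separates the gain in percentage points from the relative improvement. The function name `gain` is purely illustrative, and the 64.0 baseline is the approximate "~64%" quoted above, so the result is only as precise as that figure.

```python
def gain(new_pct: float, old_pct: float) -> tuple[float, float]:
    """Return (gain in percentage points, relative gain in percent)."""
    points = new_pct - old_pct
    relative = points / old_pct * 100
    return points, relative


# 77.3% for GPT-5.3-Codex versus roughly 64% for the previous version
# (the baseline is approximate, as quoted in the article).
points, relative = gain(77.3, 64.0)
print(f"+{points:.1f} percentage points, {relative:.1f}% relative improvement")
# prints "+13.3 percentage points, 20.8% relative improvement"
```

The two numbers answer different questions: the first says how many more tasks out of 100 are solved, the second how large the jump is compared with the old score.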

Web and long tasks

The model was tested on long autonomous scenarios, such as game development, where the agent:

  • iteratively improves the result
  • fixes bugs
  • adds to what it has built
  • does all this over millions of tokens

In more everyday tasks (landing pages, websites) the difference shows up as:

  • more sensible defaults
  • fewer "empty" solutions
  • a more coherent structure even without a detailed spec

Beyond coding

GPT-5.3-Codex is designed for more than code. It also handles:

  • PRDs
  • presentations
  • spreadsheets
  • analysis
  • user research
  • metrics

According to GDPval, the model copes with such tasks confidently.
Importantly, this is not a separate mode: it all happens in the same agent context.

Interactive work

Codex now reports progress more frequently:

  • what it is doing
  • which decisions it has made
  • where it is in the task

You can intervene in the middle of a task without resetting the context.
This means you no longer have to either wait for the final result or keep restarting the dialogue.
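
The idea of steering a running task without a reset can be pictured with a toy loop. This is only a minimal sketch of the concept, not how Codex is implemented: `run_task`, the `steering` mapping, and the step names are all hypothetical stand-ins.

```python
from collections import deque


def run_task(steps, steering):
    """Run steps in order, reporting progress and folding in user messages.

    `steps` is a list of step descriptions; `steering` maps a step index to a
    message the user sends just before that step. Both are hypothetical.
    """
    context = []            # one shared context that is never reset
    pending = deque(steps)
    index = 0
    while pending:
        if index in steering:
            # Mid-task intervention: append to the same context, no restart.
            context.append(f"user: {steering[index]}")
        step = pending.popleft()
        context.append(f"agent: {step}")
        print(f"[progress] step {index + 1}: {step}")
        index += 1
    return context


log = run_task(
    ["scaffold project", "implement login", "write tests"],
    steering={1: "use OAuth instead of passwords"},
)
```

Here the user's message lands in the same `context` list between the first and second steps, and the earlier entries stay in place, which is the property the section describes.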

Personal feeling

Most notably, there is less need to control the agent manually.

Before:

  • the task had to be split up
  • frequent intervention was needed
  • you had to keep checking whether the context had been lost

Now:

  • the task can be formulated more broadly
  • the agent keeps to the direction
  • control remains, but in moderation

Work becomes less stressful, especially on long tasks.