OpenAI’s GPT-5.3-Codex thinks deeper and wider about coding work

On Thursday, OpenAI released GPT-5.3-Codex, a new model that extends its Codex coding agent beyond writing and reviewing code to performing a much wider range of work tasks. The release comes as competition continues to heat up among AI companies vying for market share in the AI-powered coding tools space.

OpenAI says GPT-5.3-Codex combines the coding performance of GPT-5.2-Codex with the reasoning and professional-knowledge capabilities of GPT-5.2, while running 25% faster. This allows GPT-5.3-Codex to handle long-running tasks that involve research, tool use such as web search or database calls, and complex execution and planning across both general work tasks and software development.

Codex has reached over 1 million developers, OpenAI claims. And while Anthropic’s Claude Code has also seen rapid adoption, head-to-head data comparing the two tools remains scarce. SemiAnalysis reports that 4% of GitHub public commits, or new code uploaded to repositories, are currently being authored by Claude Code, and projects that figure could reach 20% or more by the end of 2026.

Benchmark one-upmanship

OpenAI says GPT-5.3-Codex now has the best score of any model on SWE-Bench Pro, which evaluates real-world software engineering across four programming languages. The same is true for Terminal-Bench 2.0, which measures the terminal skills coding agents need.

More significantly, the new model is capable of taking into account larger bodies of information while working on a task, as well as reasoning about those tasks for longer periods without human intervention. In testing, OpenAI says it observed GPT-5.3-Codex autonomously iterating on game development over millions of tokens using generic prompts like “fix the bug” or “improve the game.”

Rival companies are making similar claims. Anthropic says its new Claude Opus 4.6 model, when powering Claude Code, can also comprehend larger code bases and make more thoughtful decisions about how to add new code. In a Thursday blog post, the company said Opus 4.6 achieved top scores on several industry benchmarks, including Humanity’s Last Exam, which measures complex multidisciplinary reasoning, GDPval-AA, which focuses on economically valuable knowledge work, and BrowseComp, which tests hard-to-find information search.

Beyond coding to knowledge work

OpenAI says GPT-5.3-Codex is built to support the full software lifecycle, including debugging, deploying, and monitoring code, as well as writing product requirement documents and conducting research. The same agentic capabilities can apply to tasks well outside software development, the company says, extending to work like creating slide decks and analyzing data in spreadsheets. (Anthropic has taken Claude Code in a similar direction, positioning it to support a broader pool of information workers across a wider range of business tasks.)

On GDPval, an OpenAI evaluation measuring performance on well-specified knowledge-work tasks across 44 occupations, GPT-5.3-Codex matches GPT-5.2 while adding stronger coding capabilities. On OSWorld-Verified, which tests computer use in a visual desktop environment, GPT-5.3-Codex achieved 64.7% accuracy compared to 38.2% for its predecessor.

GPT-5.3-Codex is the first model OpenAI classifies as “High capability” for cybersecurity-related tasks under its Preparedness Framework, and the first the company has directly trained to identify software vulnerabilities. OpenAI is committing $10 million in API credits to accelerate cyber defense, particularly for open source software and critical infrastructure systems.

ChatGPT subscribers can use GPT-5.3-Codex as the model powering Codex, whether they run the coding tool through the Codex app, the IDE (integrated development environment) interface, or the command line interface.
