# Why Codex Security does not emit SAST reports
OpenAI published an article explaining why Codex Security does not emit SAST-style reports. The piece describes how AI-driven constraint reasoning is used to avoid the false-positive problem that traditional SAST tools run into.
## What SAST is
SAST, or Static Application Security Testing, analyzes source code or bytecode without running it. It is rule-based and pattern-oriented, and it is a long-established category of tooling.
Common SAST tools include:
| Tool | Method | Main target |
|---|---|---|
| Semgrep | pattern matching | multi-language |
| CodeQL | data-flow analysis | multi-language, with GitHub integration |
| Checkmarx | syntax-tree analysis | multi-language, enterprise |
| Bandit | rule set | Python |
| SonarQube | static analysis plus quality metrics | multi-language |
The best-known problem with SAST is the number of false positives. Because it mechanically reports code that matches a rule, it often flags code paths that are not actually exploitable. That leads to alert fatigue.
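To make the false-positive problem concrete, here is a minimal Python sketch (the function name is illustrative): both calls below match the same `shell=True` rule, such as Bandit's B602, but only one of them is a real finding.

```python
import subprocess

# A rule-based scanner that flags every use of `shell=True` reports the
# call below, even though the command is a hard-coded constant that no
# attacker-controlled input can ever reach -- a classic false positive.
subprocess.run("echo constant-command", shell=True, check=True)

def list_dir(path: str) -> str:
    # The same pattern *is* dangerous when `path` comes from a request:
    # here the rule would be firing on a real command-injection issue.
    result = subprocess.run(f"ls {path}", shell=True,
                           capture_output=True, text=True)
    return result.stdout
```

A scanner that only matches the pattern cannot tell these two call sites apart; it reports both, and the developer has to triage them by hand.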
## How Codex Security is different
Codex Security does not rely on SAST-style pattern matching. It reads the whole project in context and uses AI-driven constraint reasoning to decide whether something is actually exploitable.
After fetching the repository, the process runs through three core stages and ends with a proposed fix:

```mermaid
flowchart TD
    A[Fetch the whole repository] --> B[Context analysis<br/>understand security structure<br/>build a threat model]
    B --> C[Vulnerability identification<br/>rank by real-world impact]
    C --> D[Sandbox validation<br/>confirm reproducibility and exploitability]
    D --> E[Propose a fix<br/>aligned with system behavior]
```
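The stages above can be sketched as a pipeline in which only validated findings are reported. Every name and heuristic here is an illustrative stand-in, not OpenAI's actual implementation:

```python
from dataclasses import dataclass

@dataclass
class Finding:
    path: str
    description: str
    severity: str

def identify(repo: dict[str, str]) -> list[Finding]:
    """Identification stage (toy heuristic): flag candidate issues."""
    return [Finding(path, "possible command injection", "high")
            for path, src in repo.items() if "shell=True" in src]

def validate(finding: Finding, repo: dict[str, str]) -> bool:
    """Validation stage (stand-in for sandbox execution): here we merely
    check whether the flagged file takes external input at all."""
    return "input(" in repo[finding.path]

def report(repo: dict[str, str]) -> list[Finding]:
    # Only findings that survive validation are reported -- the core of
    # the false-positive reduction described above.
    return [f for f in identify(repo) if validate(f, repo)]

repo = {
    "safe.py": 'subprocess.run("ls", shell=True)',
    "risky.py": 'subprocess.run(input(), shell=True)',
}
print([f.path for f in report(repo)])  # prints ['risky.py']
```

Both files match the identification heuristic, but only the one where attacker input can reach the dangerous call makes it into the report.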
Context analysis maps the project’s security structure, such as auth flows, validation paths, and datastore access patterns, and builds an evolving threat model.
Vulnerability identification then ranks issues by real attack impact instead of by pattern match alone.
The biggest difference from old-school SAST is the validation phase. The identified issue is actually exercised in a sandbox to see whether it can be exploited. If it cannot be exploited, it is not reported. That is the core of the false-positive reduction.
## Beta results
OpenAI published beta numbers from scans covering 1.2 million commits.
| Metric | Value |
|---|---|
| Critical vulnerabilities | 792 |
| High-severity vulnerabilities | 10,561 |
| False-positive reduction | 50%+ across repositories |
SAST can look impressive by producing lots of findings, but developers only have so much attention. Codex Security is designed to report only confirmed, real issues instead of raw scan output.
## This does not make SAST obsolete
SAST is still useful for immediate feedback and for blocking known-pattern issues early in CI/CD.
What Codex Security is replacing is the habit of treating SAST output as the final report without validating exploitability. It is an answer to the workflow where developers end up triaging thousands of alerts instead of fixing real issues.
## What constraint reasoning means
AI-driven constraint reasoning formally represents the data-flow constraints in code and reasons about which attacker-controlled inputs can satisfy those constraints and reach vulnerable paths.
For SQL injection, a SAST tool might say “user input is concatenated into a query string.” A constraint-reasoning system asks whether that input is actually reachable from outside, whether escaping or validation always runs first, and whether the exploit path really exists. If not, it does not report the issue.
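A toy Python illustration of that distinction (the schema and function names are invented for this sketch): both functions contain the concatenation a SAST rule would flag, but in the second one validation always runs first, so no input can satisfy the injection constraint.

```python
import sqlite3

def get_user_unsafe(conn: sqlite3.Connection, name: str):
    # Pattern-level view: user input concatenated into a query string.
    # A SAST rule flags this line unconditionally.
    query = "SELECT name FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()

def get_user(conn: sqlite3.Connection, name: str):
    # Constraint-level view: every path to the concatenation first
    # requires `name` to be alphanumeric, so no value can both pass this
    # check and carry the quote characters an injection needs.
    if not name.isalnum():
        raise ValueError("invalid name")
    return get_user_unsafe(conn, name)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('alice')")
print(get_user(conn, "alice"))  # prints [('alice',)]
```

If `get_user_unsafe` is only ever called through `get_user`, the exploit path is unsatisfiable and a constraint-reasoning system stays silent; a pattern matcher reports it anyway.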
Frontier-model reasoning is what gives this validation step its accuracy. The model can handle complex dependencies and context-sensitive conditions that rule-based SAST has trouble expressing.
## Availability
Codex Security is available as a research preview for ChatGPT Pro, Enterprise, Business, and Edu users, and it is free for the first month after release.