
Cloudflare's bot-management script was reading React's internal state

Ikesan

When you open ChatGPT, there is a brief pause before the text box becomes usable. A reverse-engineering write-up decrypted the bytecode that runs during that pause and showed what it does.

According to the report by Buchodi, Cloudflare’s bot-management script on the ChatGPT login flow inspects not only browser hardware information but also internal React Fiber state.

The custom VM behind Turnstile

Cloudflare Turnstile is the non-interactive bot-checking system that replaces old-style CAPTCHAs. Instead of making the user solve puzzles, it runs a JavaScript challenge in the background and decides whether the browser looks genuine. It has three modes: Non-interactive, Managed, and Invisible.

In the version used by ChatGPT, the bytecode runs on a custom virtual machine with 28 opcodes. The instruction set includes ADD, XOR, CALL, BTOA, RESOLVE, BIND_METHOD, and JSON_STRINGIFY. Registers are addressed by floating-point values that are re-randomized on every request.
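To make the idea of float-addressed registers concrete, here is a minimal toy interpreter. Only the opcode names and the use of random floating-point register addresses come from the report; the tuple encoding of instructions, the operand order, and the helper names `run` and `fresh_addresses` are assumptions made for illustration, not the real Turnstile VM.

```python
import base64
import json
import random


def run(program, registers):
    """Execute (opcode, dest, *args) tuples against a dict of registers.

    Hypothetical encoding: the real instruction layout is not shown here.
    """
    for op, dest, *args in program:
        if op == "ADD":
            registers[dest] = registers[args[0]] + registers[args[1]]
        elif op == "XOR":
            registers[dest] = registers[args[0]] ^ registers[args[1]]
        elif op == "BTOA":
            # Mirrors the browser's btoa(): Base64 of a binary string.
            raw = registers[args[0]].encode("latin-1")
            registers[dest] = base64.b64encode(raw).decode("ascii")
        elif op == "JSON_STRINGIFY":
            registers[dest] = json.dumps(registers[args[0]], separators=(",", ":"))
        else:
            raise ValueError(f"unknown opcode: {op}")
    return registers


def fresh_addresses(names, rng):
    """Map readable names to random float addresses, regenerated per request."""
    return {name: rng.uniform(0, 100) for name in names}


rng = random.Random(0)  # seeded only so the demo is repeatable
addr = fresh_addresses(["a", "b", "sum", "mix", "json", "b64"], rng)
registers = {addr["a"]: 6, addr["b"]: 3}
program = [
    ("ADD", addr["sum"], addr["a"], addr["b"]),
    ("XOR", addr["mix"], addr["a"], addr["b"]),
    ("JSON_STRINGIFY", addr["json"], addr["sum"]),
    ("BTOA", addr["b64"], addr["json"]),
]
run(program, registers)
```

Because the addresses change with every request, two captures of the same logical program share no register numbers, which is what makes naive diffing of captured bytecode unhelpful.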

How the double-XOR encryption was broken

Buchodi collected 377 encrypted programs from network traffic and worked out the decryption flow. The encryption has two layers.

graph TD
    A["prepare request"] --> B["get p token"]
    C["prepare response"] --> D["read turnstile.dx<br/>~28,000 chars of Base64"]
    B --> E["outer decryption<br/>XOR base64decode dx, p"]
    D --> E
    E --> F["outer bytecode<br/>89 VM instructions"]
    F --> G["find a 5-argument instruction<br/>after the 19KB blob"]
    G --> H["extract a floating-point key<br/>from the fifth argument, e.g. 97.35"]
    H --> I["inner decryption<br/>XOR inner blob with float key"]
    I --> J["main bytecode<br/>417-580 instructions"]

The key detail is that the inner XOR key is embedded in the bytecode itself as a floating-point literal. The report says decryption succeeded 50 times in a row, so the process is reproducible. The author manually formatted and deobfuscated the SDK source file (sdk.js, 1,411 lines) and mapped all 28 opcodes.
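The two layers described above can be sketched in a few lines. This is a simplified illustration under stated assumptions: the repeating-key XOR, the use of the float's decimal string as key bytes, and the function names are guesses for demonstration, and the step that locates the inner blob and key inside the outer bytecode is skipped entirely.

```python
import base64


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR data with a repeating key. Applying it twice restores the input."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def decrypt_outer(dx_b64: str, p_token: str) -> bytes:
    """Outer layer: Base64-decode turnstile.dx, then XOR with the p token."""
    return xor_bytes(base64.b64decode(dx_b64), p_token.encode("utf-8"))


def decrypt_inner(inner_blob: bytes, float_key: float) -> bytes:
    """Inner layer: XOR with the float literal found in the outer bytecode.

    Assumption: the key bytes are the float's decimal string, e.g. b"97.35".
    """
    return xor_bytes(inner_blob, str(float_key).encode("ascii"))


# Round trip with made-up data (not a real capture).
main_bytecode = b'[["RESOLVE", 1.5]]'
inner_blob = xor_bytes(main_bytecode, b"97.35")
dx = base64.b64encode(xor_bytes(inner_blob, b"example-p-token")).decode("ascii")
recovered = decrypt_inner(decrypt_outer(dx, "example-p-token"), 97.35)
```

Both inputs needed for decryption (the p token and the float key) arrive alongside the ciphertext, so no secret outside the captured traffic is required.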

From a security point of view, the encryption provides essentially no confidentiality. The key travels in the same data stream as the ciphertext, so it only blocks casual inspection. Its real purpose is to make static analysis harder and to hide which fingerprinting items are being checked.

The 55-field fingerprint

The decrypted program collects 55 properties across three layers.

Layer 1: browser fingerprinting, 38 fields

| Category | Count | What is collected |
| --- | --- | --- |
| WebGL | 8 | UNMASKED_VENDOR_WEBGL, UNMASKED_RENDERER_WEBGL, WEBGL_debug_renderer_info, getExtension, getParameter, getContext, canvas, webgl |
| Screen | 8 | colorDepth, pixelDepth, width, height, availWidth, availHeight, availLeft, availTop |
| Hardware | 5 | hardwareConcurrency, deviceMemory, maxTouchPoints, platform, vendor |
| Font measurement | 4 | create hidden div -> apply fonts -> measure with getBoundingClientRect -> remove element |
| DOM probing | 8 | createElement, appendChild, removeChild, div, style, position, visibility, ariaHidden |
| Storage | 5 | write to localStorage with key 6f376b6560133c2c, then inspect quota.estimate and usage |

Layer 2: Cloudflare edge headers, 5 fields

These are server-side values injected by Cloudflare infrastructure: cfIpCity, cfIpLatitude, cfIpLongitude, cfConnectingIp, and userRegion. They are unavailable from the client side and are probably used to check consistency between network and application layers.

Layer 3: React application state, 3 fields

The script reads three internal React properties from the DOM:

  • __reactRouterContext
  • loaderData
  • clientBootstrap

A headless browser that only loaded the HTML and did not run the JavaScript bundle would not have those properties. So Turnstile is checking not just whether the browser is real, but whether a specific React app has fully booted.

Behavior analysis and proof of work

Besides fingerprinting, a 271-instruction Signal Orchestrator performs behavioral analysis.

It installs six event listeners: keydown, pointermove, click, scroll, paste, and wheel. Those feed 36 window.__oai_so_* properties that record typing cadence, mouse speed, scroll pattern, idle time, and paste behavior.
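The report names the signals being recorded but not how they are computed. As a purely hypothetical illustration of what a typing-cadence metric could look like, the sketch below summarizes the gaps between keydown timestamps; the function name, the output fields, and the choice of mean and standard deviation are assumptions, not the contents of any window.__oai_so_* property.

```python
from statistics import mean, pstdev


def typing_cadence(keydown_ms):
    """Summarize inter-key intervals from keydown timestamps (milliseconds)."""
    intervals = [later - earlier for earlier, later in zip(keydown_ms, keydown_ms[1:])]
    if not intervals:
        return {"count": 0, "mean_ms": 0.0, "stdev_ms": 0.0}
    return {
        "count": len(intervals),
        "mean_ms": mean(intervals),
        "stdev_ms": pstdev(intervals),
    }


# Perfectly even spacing, as a naive script might produce.
scripted = typing_cadence([0, 100, 200, 300])
# Uneven spacing, closer to what a person typing would produce.
human_like = typing_cadence([0, 140, 210, 480, 530])
```

A detector built on a metric like this might treat near-zero variance as a sign of scripted input, though whether the real system does so is not stated in the report.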

It also includes a proof-of-work challenge. A SHA-256 hashcash-style puzzle is computed over 25 fingerprint fields, with the difficulty drawn uniformly at random from a range of 400k to 500k iterations. In the test results, 72% of challenges completed in under 5 ms, so the impact on user experience is minimal.
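For readers unfamiliar with hashcash, the sketch below shows the general technique: search for a nonce whose SHA-256 digest falls below a target, then verify it with a single hash. How the real challenge maps its difficulty value to a target, and how the 25 fields are serialized, are not covered here, so the `seed` string, the `target` formula, and the function names are assumptions.

```python
import hashlib


def _digest_value(seed: str, nonce: int) -> int:
    """SHA-256 of 'seed:nonce', read as a big-endian integer."""
    digest = hashlib.sha256(f"{seed}:{nonce}".encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def solve(seed: str, difficulty: int, max_iters: int = 5_000_000):
    """Return the first nonce whose digest is below the target, or None.

    Assumption: target = 2**256 // difficulty, so roughly `difficulty`
    hashes are needed on average.
    """
    target = (1 << 256) // difficulty
    for nonce in range(max_iters):
        if _digest_value(seed, nonce) < target:
            return nonce
    return None


def verify(seed: str, nonce: int, difficulty: int) -> bool:
    """Checking a solution costs one hash, regardless of difficulty."""
    return _digest_value(seed, nonce) < (1 << 256) // difficulty


# A low difficulty keeps the demo fast; the seed stands in for the
# serialized fingerprint fields.
nonce = solve("example-fingerprint", 1000)
```

The asymmetry is the point of the technique: the client pays for the search, while the server confirms the answer almost for free.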

The PoW response includes seven binary flags: ai, createPRNG, cache, solana, dump, InstallTrigger, and data. All were zero in the 100-sample test. The solana flag appears to be aimed at crypto-wallet browser extensions, which inject a window.solana object, and InstallTrigger is a Firefox-specific property used to check browser type.

The final token generation

Once all checks finish, the last four instructions run:

graph LR
    A["JSON.stringify<br/>fingerprint"] --> B["store"]
    B --> C["XOR<br/>JSON string, key"]
    C --> D["RESOLVE<br/>return to parent frame"]

The output is attached to the login request as the OpenAI-Sentinel-Turnstile-Token header. In other words, the fingerprint is encrypted into the token itself and then verified by Cloudflare.
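The stringify-then-XOR step can be illustrated with a short round trip. This is a sketch under assumptions: the repeating-key XOR, the Base64 wrapping of the result, the sample fingerprint fields, and the function names are all illustrative, and the real key derivation is not shown.

```python
import base64
import json


def _xor(data: bytes, key: bytes) -> bytes:
    """Repeating-key XOR (assumed scheme); the same call encodes and decodes."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def build_token(fingerprint: dict, key: str) -> str:
    """JSON-serialize the fingerprint, XOR it with the key, wrap in Base64."""
    payload = json.dumps(fingerprint, separators=(",", ":")).encode("utf-8")
    return base64.b64encode(_xor(payload, key.encode("utf-8"))).decode("ascii")


def read_token(token: str, key: str) -> dict:
    """Reverse build_token: anyone holding the key can recover the fields."""
    payload = _xor(base64.b64decode(token), key.encode("utf-8"))
    return json.loads(payload.decode("utf-8"))


# Made-up values for illustration only.
sample = {"hardwareConcurrency": 8, "colorDepth": 24, "platform": "MacIntel"}
token = build_token(sample, "example-key")
```

Since XOR is symmetric, the receiving side only needs the same key to read the collected fields back out of the header value.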

What this means for web developers

The bot-checking layer has moved beyond network and browser checks and now reaches into the application layer. What used to be decided by User-Agent, IP reputation, and JavaScript execution support now extends to React internal state. Cloudflare also recently shipped AI Security for Apps, so the inspection surface around bot management keeps getting wider.

The analysis also shows that the encryption is only partially effective. If the XOR key is in the same stream, a determined analyst can read the contents. The design seems aimed less at cryptographic strength and more at making it easy to change what is checked server-side. New bot techniques can be handled without updating the client SDK.

From a privacy standpoint, persistent writes to localStorage combined with GPU, font, and screen information make for a very accurate device fingerprint. Turnstile’s policy says the data is used for bot detection, but the amount of information collected is substantial compared with a normal CAPTCHA replacement.