Async HLS Video Extraction Pipeline and Pinterest's Internal MCP Ecosystem
A design article appeared on DEV Community describing a downloader that extracts Pinterest videos from HLS streams, converts them to MP4, and delivers them asynchronously. This isn’t about Pinterest’s official media delivery infrastructure — it’s a design memo for a service that untangles HLS and React’s state tree from the outside to extract videos.
Separately, Pinterest Engineering Blog published a design article on their internal MCP ecosystem.
Extracting Video URLs from Pinterest’s HTML
Pinterest videos aren’t placed in HTML as simple MP4 links.
The page HTML embeds a JSON-LD block and a script block called PWS_DATA.
JSON-LD is structured metadata for SEO — it contains video thumbnails and basic info but not high-resolution source URLs.
The extraction target is PWS_DATA, which stores React’s Redux state tree itself.
High-bitrate media sources sit in deeply nested hierarchies, and a Schema Parser dynamically maps the state tree to identify the highest-bitrate resource.
While browser automation tools manipulate the DOM, this type of extractor focuses on analyzing embedded state within the page and network responses rather than the DOM. Unlike MCP servers such as Chrome DevTools MCP that manipulate DOM and network via the DevTools Protocol, the extractor statically pulls needed data at HTML parse time.
That said, PWS_DATA is Pinterest’s internal implementation, not a public API.
A frontend deploy that changes the structure breaks it immediately.
If you need stable integration, build within the scope of official APIs or embedded metadata — depending on internal state means shouldering the entire maintenance burden.
HLS Structure and FFmpeg Copy Mux
Pinterest videos are delivered via HLS (HTTP Live Streaming).
Handing users the .m3u8 directly doesn’t work in typical players, so the extractor needs to collect TS segments and mux them into MP4.
HLS Delivery Structure
HLS is an adaptive bitrate streaming protocol designed by Apple that splits video into small segments and delivers them over HTTP. The structure has three layers.
```mermaid
graph TD
A["Master Playlist<br/>.m3u8"] --> B["Media Playlist 1080p<br/>.m3u8"]
A --> C["Media Playlist 720p<br/>.m3u8"]
A --> D["Media Playlist 480p<br/>.m3u8"]
B --> E["segment0.ts"]
B --> F["segment1.ts"]
B --> G["segment2.ts"]
C --> H["segment0.ts"]
C --> I["segment1.ts"]
C --> J["segment2.ts"]
```
The master playlist references multiple media playlists for different resolutions and bitrates, and each media playlist points to the actual TS segments. The player switches between media playlists based on network bandwidth, so playback stays smooth even on slow connections.
TS segments contain H.264 or HEVC video streams and AAC audio streams packaged in MPEG-2 Transport Stream containers. What the extractor does is fetch these segments in order and reassemble them into a single MP4 container.
Copy Mux vs. Re-encoding
When you specify -c copy in FFmpeg, it rewrites only the container, passing the video and audio streams through without decoding or re-encoding.
Since no pixels are recalculated, processing is orders of magnitude faster with zero quality loss.
| | Copy mux (-c copy) | Re-encoding |
|---|---|---|
| Processing | Container swap only | Decode -> Filter -> Encode |
| CPU load | Low (I/O-bound) | High (CPU-bound) |
| Speed | 5-min video in seconds | 5-min video in tens of seconds to minutes |
| Quality | No degradation | Degradation from re-encoding |
| Requirement | Input codec valid in output container | Always possible |
If the TS segments contain H.264+AAC, the MP4 container accepts them directly and copy mux works.
HEVC is also supported in MP4 (ISO BMFF), but the placement of HEVC parameter sets (VPS/SPS/PPS) can cause issues during TS-to-MP4 conversion.
In that case a bitstream filter comes into play — note that -bsf:v hevc_mp4toannexb, which the article cites, converts length-prefixed NALs to Annex B start codes (the MP4-to-TS direction); going from TS to MP4, recent FFmpeg versions normally handle the parameter-set conversion in the muxer itself.
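The happy path is a one-line FFmpeg invocation. A minimal wrapper (argv builder split out so the command itself is testable; assumes ffmpeg is on PATH):

```python
import subprocess

def copy_mux_cmd(ts_path: str, mp4_path: str) -> list[str]:
    """Build the ffmpeg argv for a container-only remux: -c copy skips
    decode/encode entirely, so TS -> MP4 is I/O-bound."""
    return ["ffmpeg", "-y", "-i", ts_path, "-c", "copy", mp4_path]

def copy_mux(ts_path: str, mp4_path: str) -> None:
    # Raises CalledProcessError if ffmpeg exits non-zero
    # (e.g. a codec the MP4 container can't accept as-is).
    subprocess.run(copy_mux_cmd(ts_path, mp4_path), check=True)
```

For file output you might additionally pass `-movflags +faststart` so the moov atom is moved to the front for progressive playback; the streaming pipeline below can't use that, since it requires a seekable output.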
The design article describes a configuration that stores TS segments in an in-memory circular buffer, fetches them in parallel with a coroutine pool, and feeds them into FFmpeg.
Async Streaming Pipeline
The most in-depth part of the design article is the “zero-storage pipe” configuration that streams fetched video to users in chunks without saving to the server.
Pipeline Flow
```mermaid
graph LR
A["Client"] --> B["FastAPI"]
B --> C["Pinterest CDN<br/>TS segment fetch"]
C --> D["Memory buffer"]
D --> E["FFmpeg<br/>copy mux"]
E --> F["StreamingResponse<br/>chunked delivery"]
F --> A
```
FastAPI’s asyncio event loop asynchronously awaits TS segment fetches from Pinterest CDN, writing received chunks to an in-memory circular buffer.
The FFmpeg process reads TS from the buffer, muxes it into MP4, and its output streams to the client via FastAPI’s StreamingResponse.
Zero disk writes means zero storage cost, and parallel fetching and delivery keeps TTFB short. The article claims sub-200ms TTFB and 85% memory usage reduction compared to the previous approach.
Segment Fetch Failures
The hard part of zero-storage delivery shows up not on success but on failure.
If even some TS segments fail to fetch, the in-progress MP4 stream breaks — and because the client has already received the beginning, the response can't be rewound. When the client disconnects mid-stream, upstream segment fetching must stop as well; unless the FFmpeg process's stdin close timing is coordinated with asyncio task-cancellation propagation, zombie processes are left behind.
A Redis cluster handles distributed session management, holding short-lived credentials to reduce duplicate authentication requests to the Pinterest API. It can also cover job state management and rate control, but the minimum pipeline works without Redis: a prototype runs with just FastAPI + asyncio + an FFmpeg stdout pipe, and Redis comes in when session sharing or rate control becomes necessary.
As I wrote in the Cloudflare Workers realtime analytics article, for low-latency pipelines, “how to close a broken stream” matters more for operational quality than average throughput.
WAF Evasion Design Concerns
What bothered me most about the design article was the part that openly describes WAF evasion, TLS fingerprint mimicry, and HTTP/2 frame characteristic reproduction as part of the “architecture.”
HLS fetching, copy mux, and async streaming are general-purpose technologies that work just as well for delivering your own content. Fingerprint spoofing to bypass a target service’s bot detection and rate limiting, on the other hand, directly conflicts with terms of service and access controls.
For your own content or licensed materials, the pipeline design up to this point works as-is. For extraction from third-party services, official APIs, user data export permissions, and ToS verification come first — centering WAF evasion only makes the product’s accountability burden heavier.
Pinterest’s Internal MCP Ecosystem
The MCP article published by Pinterest Engineering Blog in March 2026 was a compelling read as a concrete example of running MCP at enterprise scale in production.
MCP (Model Context Protocol) is a standard protocol for LLMs to access external tools and data sources, published by Anthropic in late 2024. The early typical use case was individuals running stdio-based MCP servers locally from a CLI, but Pinterest deploys this as cloud-hosted HTTP across the entire organization. The direction aligns with the Cloudflare Mesh and enterprise MCP reference architecture, but Pinterest is a step more concrete in publishing actual production numbers.
MCP Server Architecture
Pinterest uses a design with multiple small, domain-specific MCP servers for different purposes. No giant all-in-one MCP server.
| MCP Server | Purpose | Characteristics |
|---|---|---|
| Presto MCP | Data analysis query execution | Highest traffic. On-demand Presto-backed data retrieval in analysis workflows |
| Spark MCP | Spark job failure diagnosis | Log aggregation and structured root cause analysis for job failures |
| Knowledge MCP | Internal knowledge search | General search endpoint for internal docs and knowledge bases |
Each server gets infrastructure from a unified deploy pipeline — only server-specific business logic needs implementation. MCP server tool calls look like any other tool call from the LLM’s perspective, providing unified access across surfaces like internal chat, IDEs, and web chat. Specific tools can also be restricted to specific communication channels.
Central Registry
MCP server management uses a central registry.
The registry centrally manages the list of approved servers, connection info, owning teams, support channels, and status. A web UI shows each server’s state and tool list, and an API handles server discovery, validation, and access control queries for the client side. Servers not registered in the registry aren’t approved for production use.
As noted in the MCP server 50-scan article, many individually published MCP servers have weak input validation and permission boundaries. Pinterest’s central registry addresses the rogue MCP problem at the organizational level.
Two-Layer Authorization
Pinterest processes MCP authorization in two layers, sidestepping the per-server OAuth consent screens the MCP spec envisions: MCP is layered on top of the existing internal auth stack.
```mermaid
graph TD
A["User"] --> B["Internal Auth Stack"]
B --> C["JWT Issuance"]
C --> D["MCP Registry"]
C --> E["Envoy Proxy"]
E -->|"JWT Verification"| F["X-Forwarded-User<br/>X-Forwarded-Groups"]
F --> G["MCP Server"]
H["Service-to-Service"] --> I["SPIFFE-based<br/>mesh identity"]
I --> G
```
Layer 1: End-User JWT
When a user accesses an MCP server from internal LLM chat or IDE integration, the internal auth stack issues a JWT.
The JWT is sent to the registry and target MCP server, where the Envoy proxy verifies it and converts it into X-Forwarded-User and X-Forwarded-Groups headers.
Fine-grained access control can be applied per tool on the MCP server side using @authorize_tool(policy='...') decorators.
For servers accessing sensitive data like Presto, sessions are gated so only users belonging to specific business groups can establish them.
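A hypothetical reconstruction of what such a decorator could look like — the article names `@authorize_tool(policy='...')` but not its implementation, so the policy table and header handling here are illustrative assumptions. By the time the request reaches the server, Envoy has already verified the JWT, so the tool only checks group membership from the forwarded headers.

```python
import functools

# Hypothetical policy table: policy name -> groups allowed to call it.
POLICIES: dict[str, set[str]] = {
    "presto:read": {"data-analysts", "data-eng"},
}

def authorize_tool(policy: str):
    """Gate an MCP tool on the X-Forwarded-Groups header that the
    Envoy proxy injects after verifying the end-user JWT."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, headers: dict[str, str], **kwargs):
            groups = set(headers.get("X-Forwarded-Groups", "").split(","))
            if not groups & POLICIES.get(policy, set()):
                raise PermissionError(f"policy {policy!r} denied")
            return fn(*args, headers=headers, **kwargs)
        return wrapper
    return decorator

@authorize_tool(policy="presto:read")
def run_presto_query(sql: str, headers: dict[str, str]) -> str:
    # Stand-in for the actual Presto call.
    return f"executed: {sql}"
```

The point of the pattern is that trust in the headers is justified only because Envoy sits in front of every MCP server and strips/overwrites them; the tool code itself never touches the JWT.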
Layer 2: Service-to-Service Mesh Identity
For low-risk read-only scenarios, service-to-service authentication uses only SPIFFE (Secure Production Identity Framework for Everyone)-based mesh identity without end-user JWTs. Used for MCP tool calls in batch processing and system-to-system integration where no human is involved.
Piggyback on existing sessions rather than implementing OAuth consent screens per server — operationally sound. Any environment with an existing internal auth stack can take the same approach.
Security Review and Production Deployment
Before deploying an MCP server to production, three teams — Security, Legal/Privacy, and GenAI — issue review tickets. Only after approval is the server registered in the production registry, with access policies set based on review results. Which surfaces can access which servers is determined by these policies.
Dangerous operations require human-in-the-loop: agents propose actions, humans approve or reject. Destructive operations like data overwrites use MCP spec elicitation (confirmation dialogs). Batch approval is also supported for bulk processing of routine operations.
66,000 Monthly Calls and 7,000 Hours Saved
Operational numbers as of January 2025 have been published.
| Metric | Value |
|---|---|
| Monthly MCP tool calls | 66,000 |
| Monthly active users | 844 |
| Monthly hours saved | ~7,000 |
Hours saved are calculated by having MCP server owners declare “minutes saved per call” and multiplying by call count. A rough estimate, but it measures the ecosystem’s value by “how much manual work was replaced” rather than server or tool counts.
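A back-of-envelope check of the published numbers: with ~66,000 calls and ~7,000 hours saved per month, the declared savings average out to roughly 6.4 minutes per call.

```python
# Published figures (January 2025): derive the implied average
# minutes saved per MCP tool call.
calls_per_month = 66_000
hours_saved = 7_000

minutes_per_call = hours_saved * 60 / calls_per_month
print(round(minutes_per_call, 1))  # → 6.4
```

That magnitude is plausible for "one chat query instead of a console session", which is a quick sanity check on self-declared per-call estimates.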
Running Presto queries from chat instead of manually typing them in the console. Getting structured failure analysis from an MCP server instead of tracing Spark logs yourself. These concrete time savings are what’s working for 844 monthly users.
Differences from stdio-based Local MCP
Design assumptions are fundamentally different between local MCP servers and Pinterest’s enterprise MCP.
| | Local MCP | Pinterest Enterprise MCP |
|---|---|---|
| Hosting | Local process (stdio) | Cloud-hosted (HTTP) |
| Server discovery | Hand-written mcp.json | Central registry API |
| Authorization | None or env var tokens | JWT + mesh identity two-layer |
| Security review | None | Security / Legal / GenAI three-party review |
| Dangerous operations | Implementation-dependent | Human-in-the-loop required |
| Monitoring | None | Call counts, MAU, hours saved tracked |
| Server ownership | Individual | Team-registered and managed |