
Streaming SSR Shrinks the Blank Time Caused by the Slowest API


Pareto’s article “Your Page Is Only as Fast as Your Slowest API: The Case for Streaming SSR,” reposted on DEV Community, was a good read.
The intro shows an HTML document stuck at 1,400 ms of “Waiting for response” in Chrome DevTools’ Network tab, while APM reveals five APIs waiting behind the scenes.

The breakdown given as an example:

Data                             Latency
Auth check                       30 ms
Cart count                       50 ms
Header navigation                80 ms
Product details                  200 ms
Personalized recommendations    1,400 ms

With traditional SSR that Promise.alls everything before returning the HTML, TTFB gets dragged down to the slowest recommendation API.
Four data sources are ready within 200 ms, yet the browser receives zero bytes of HTML for 1,400 ms.
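
In loader form, the blocking shape looks roughly like this (the getX helpers are hypothetical stand-ins for the five calls):

export async function blockingLoader(ctx) {
  // Nothing is flushed until the slowest of these resolves (~1,400 ms):
  const [user, cart, nav, product, recs] = await Promise.all([
    getUser(ctx),     // 30 ms
    getCart(ctx),     // 50 ms
    getNav(ctx),      // 80 ms
    getProduct(ctx),  // 200 ms
    getRecs(ctx),     // 1,400 ms
  ]);
  return { user, cart, nav, product, recs };
}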

The Problem Is the Max, Not the Average

The more dependencies a page has, the higher the probability that at least one of them is slow.
The original article assumes each API has a 1% chance of being 5x slower than normal.

With 5 APIs, assuming independence, the probability that at least one is slow is 1 − 0.99^5 ≈ 4.9%.
With 10, it's 1 − 0.99^10 ≈ 9.6%.
The perceived speed of the whole page is determined not by the average API latency but by the maximum in the set of concurrent waits.

This commonly happens on e-commerce product pages, post-login dashboards, admin panels, and SaaS settings pages.
As you pile user info, permissions, navigation, notification counts, A/B tests, recommendations, inventory, pricing, and external SaaS status into a single loader, the page gradually becomes “waiting for the slowest dependency.”

In Building a Real-Time Analytics Backend on Cloudflare Workers, I looked at P95 for event ingestion and dashboard update latency.
That was about the request processing pipeline; this time it’s about the time before the browser receives the first HTML byte.
Same pursuit of low latency, different vantage point.

Streaming SSR Splits the Waiting Boundary

Streaming SSR changes HTML generation from “wait for everything, then send it all at once” to “send the shell first, stream the rest later.”
React’s official renderToPipeableStream docs also explain that content can be shown to users before all data has loaded.
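
For reference, a minimal sketch of a Node handler using that API (Express-style routing assumed; the shell is everything outside Suspense):

import { renderToPipeableStream } from 'react-dom/server';

app.get('/product/:id', (req, res) => {
  const { pipe, abort } = renderToPipeableStream(<ProductPage />, {
    onShellReady() {
      // Fires as soon as everything outside <Suspense> has rendered;
      // deferred regions arrive as later chunks on the same response.
      res.setHeader('Content-Type', 'text/html');
      pipe(res);
    },
    onShellError() {
      res.statusCode = 500;
      res.end('<!doctype html><p>Something went wrong</p>');
    },
  });
  // Stop streaming if it drags on too long (threshold arbitrary):
  setTimeout(abort, 10_000);
});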

In Pareto’s article, fast data is awaited inside the loader, while the slow recommendation API is returned as a Promise via defer().

// defer comes from the framework, e.g. Remix's @remix-run/node
// (React Router exports the same helper):
import { defer } from "@remix-run/node";

export async function loader(ctx) {
  // Await only the fast data the shell needs before the first flush:
  const [user, nav, product] = await Promise.all([
    getUser(ctx),
    getNav(ctx),
    getProduct(ctx),
  ]);

  return defer({
    user,
    nav,
    product,
    // No await: the promise itself is returned and streamed when it settles.
    recs: getRecs(ctx),
  });
}

On the page side, only the slow region is wrapped in Suspense and Await.

import { Suspense } from "react";
import { Await } from "@remix-run/react"; // same import shape in React Router

export default function ProductPage({ data }) {
  return (
    <>
      {/* Rendered into the initial flush: */}
      <Header user={data.user} nav={data.nav} />
      <ProductDetail product={data.product} />

      {/* The skeleton ships in the first HTML chunk;
          the real markup streams in when data.recs settles: */}
      <Suspense fallback={<RecsSkeleton />}>
        <Await resolve={data.recs}>
          {(recs) => <Recommendations items={recs} />}
        </Await>
      </Suspense>
    </>
  );
}

In this case, the HTML shell can be flushed once the product detail’s 200 ms completes.
The recommendation API still takes 1,400 ms, but the user can already read the header and product details.

According to the original article's numbers, traditional SSR TTFB is about 1,440 ms: the 1,400 ms recommendation wait plus roughly 40 ms of network RTT.
With Streaming SSR, it drops to about 240 ms, the 200 ms product-detail wait plus the same RTT.
The backend workload doesn’t change.
What changes is whether the slow API holds the entire page hostage.

When Client-Side Fetch and Caching Aren’t Enough

Fetching only the slow parts on the client side might seem sufficient.
However, that just trades one waterfall for another: HTML, CSS, JS, hydrate, fetch, render. The slow region can end up finishing later than it would have with streaming.

If SEO is involved, there’s also the issue of content not being in the initial HTML.
When you put search-relevant information like product reviews, regional pricing, and comparison tables solely behind client-side fetching, the page’s main content looks thin.

Caching isn't a silver bullet either.
Right after a deploy, on cache-key misses, for personalized content, or for real-time data like inventory and pricing, you can't assume a hit from the start.

Streaming SSR isn’t a cache replacement.
It’s a combination: cache what you can, and move what’s hard to cache to a position that doesn’t block the entire page.

In DevTools, Check the HTML Row and Waterfall

This kind of problem starts by looking at the HTML document request in the browser.
Not JS or images — the Document row at the very top.

The main things to check:

Aspect               What to look at
HTML Waiting         How long the server takes before sending the first byte
TTFB                 Time to first response including network RTT
FCP                  Time from HTML receipt to first paint
Suspense fallback    Whether the skeleton for the slow region appears first
CLS                  Whether layout shifts occur from height differences between fallback and actual data

If the Document’s Waiting is long on the browser side and one fetch stands out on the APM side, it’s a candidate for Streaming SSR.
Conversely, if the Document returns quickly but JS download or hydration is the bottleneck, the issue is bundle splitting or client execution time, not Streaming SSR.

In HLS Video Extraction Async Pipeline and Pinterest’s Internal MCP Platform, I wrote about shortening TTFB by streaming FFmpeg output through FastAPI’s StreamingResponse.
The idea of “send what you can before processing completes” is the same here, but with SSR, the concerns shift to Suspense fallback heights, error boundaries, and how much SEO-relevant data to include in the initial HTML.

Not Everything Should Be Streamed

Streaming SSR works when the page has “parts that can be shown immediately” and “parts that can arrive late without breaking meaning.”
Splits like product details vs. recommendations, profile body vs. post list, dashboard summary vs. heavy graphs, or settings form vs. audit log work well.

Conversely, on screens where layout and meaning fall apart without all data present, fine-grained streaming just means more spinners.
Small pages where everything returns in under 50 ms aren’t worth the streaming design overhead either.

With Next.js App Router, RSC and Suspense handle this. Remix and React Router use defer() and Await. Pareto uses the same loader-plus-defer() model.
As noted in Cloudflare Rebuilt Next.js on Vite in a Week with AI: vinext in Production, modern React frameworks bundle SSR, RSC, and streaming as framework features.
The differences are in API surface and operational scope; the goal of not letting a slow dependency stall the entire HTML is shared.

As a debugging sequence: start by checking P95 for loader fetches in APM.
If a dependency significantly exceeds 200 ms and isn't essential for the initial view, it's a candidate for moving out of the awaited set.
Then check the Document row in DevTools to verify that TTFB drops to roughly "slowest remaining await + RTT."
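
As a rough probe outside DevTools, fetch resolves once response headers arrive, so timing it approximates TTFB (URL hypothetical; Node 18+ or a browser console):

const t0 = performance.now();
const res = await fetch('https://example.com/product/123');
await res.body.getReader().read(); // first body chunk = the flushed shell
console.log(`~TTFB: ${Math.round(performance.now() - t0)} ms`);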

Streaming SSR doesn’t make slow APIs faster.
It just means a slow API is no longer a reason to keep the entire page blank.

Moving to a Worker Doesn’t Eliminate the Wait

Another option is offloading slow API calls to Node.js Worker Threads.
But the bottleneck here is I/O wait, not CPU.
Running fetch(slowRecsAPI) in a Worker Thread doesn’t change the fact that the API response takes 1,400 ms.
The wait just moves from the main thread to a worker, and the timing of when HTML can be sent stays the same.

Worker Threads are effective when CPU occupies the main thread — massive JSON parsing or image resizing.
Waiting for external API responses doesn’t block the event loop, so offloading to a worker won’t shrink TTFB.
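
To make the structural point concrete, here is a minimal sketch of offloading the fetch (file name and helper hypothetical). The I/O wait simply relocates:

// fetch-worker.js (Node 18+, where fetch is global)
import { parentPort } from 'node:worker_threads';

parentPort.once('message', async (url) => {
  const res = await fetch(url); // the 1,400 ms wait now happens here
  parentPort.postMessage(await res.json());
});

// In the SSR handler, the HTML still can't flush until this resolves:
import { Worker } from 'node:worker_threads';

function getRecsViaWorker(url) {
  return new Promise((resolve, reject) => {
    const worker = new Worker('./fetch-worker.js');
    worker.once('message', resolve); // arrives ~1,400 ms later, same as before
    worker.once('error', reject);
    worker.postMessage(url);
  });
}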

If you fire off a worker and return the SSR response first, the wait disappears.
But that’s structurally identical to a client-side fetch, and the data won’t be in the initial HTML.
For SEO-relevant data, the same issues from the client-fetch section apply.

There's also a zombie concern.
A Node.js Worker Thread isn't tied to the lifecycle of the request that spawned it: nothing terminates it automatically when that request is cancelled.
It keeps running, and if it hangs on an unhandled rejection, that's hard to notice from outside.
Spawning a worker per request doesn't scale, so you end up managing a pool, but pool monitoring, restarts, and timeout handling add operational overhead.
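
A sketch of the kind of babysitting that implies (timeout value arbitrary):

const worker = new Worker('./fetch-worker.js');

// Without this, a hung worker outlives the request indefinitely:
const timer = setTimeout(() => worker.terminate(), 5_000);
worker.once('message', (data) => {
  clearTimeout(timer);
  // ...use data...
});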

Streaming SSR’s Suspense and defer() let the framework manage chunk transmission and error boundaries.
No need to babysit worker lifecycles yourself, and failure modes are more predictable.

The same structural argument applies to browser-side Web Workers fetching slow APIs in the background.
fetch() is asynchronous I/O on the main thread too, so moving it to a Web Worker doesn’t change the wait time.
If JSON.parse is huge and blocks for tens of milliseconds, splitting it off makes sense, but for typical API response sizes, the main thread load is negligible.
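
Where the split does pay off, the shape is simple (file name and the render/hugeJsonString names are placeholders). Note that postMessage structured-clones the parsed result, which has its own cost:

// parse-worker.js: moves CPU-bound JSON.parse off the main thread.
self.onmessage = (e) => {
  self.postMessage(JSON.parse(e.data));
};

// Main thread:
const parser = new Worker('./parse-worker.js');
parser.onmessage = (e) => render(e.data);
parser.postMessage(hugeJsonString);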

For zombies, Dedicated Workers are cleaned up on page unload.
But in SPAs, route transitions don’t trigger unload, so forgetting worker.terminate() leaves workers from previous routes alive.
The classic pattern of postMessage callbacks touching state on unmounted components and triggering warnings is also common.
If AbortController can cancel the fetch, there’s little reason to interpose a Worker.
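
For completeness, the no-Worker version that suggests (endpoint and renderRecs hypothetical):

const controller = new AbortController();

fetch('/api/recs', { signal: controller.signal })
  .then((res) => res.json())
  .then(renderRecs)
  .catch((err) => {
    if (err.name !== 'AbortError') throw err; // aborts are expected on navigation
  });

// On unmount or route transition:
controller.abort();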


I dug into zombie processes themselves in yesterday’s article.
Worker zombies also popped up quite a bit in my own Kana Chat, actually making things slower.
It’s a tricky problem.
