# Built a Mobile App in 3 Days. The Hard Part Was Keeping It Connected
A post appeared on DEV Community about a web developer building an AI chat app in 3 days with React Native + Expo.
The original: “I Built a Mobile App in 3 Days. The Hard Part Was Keeping It Connected.”
What caught my attention wasn’t the “AI tools let you build mobile apps fast” angle — it was how things broke afterwards.
Streaming responses that worked fine on the web fell apart the moment real usage started on an iPhone.
The user opens WhatsApp, locks the screen, browses Instagram for a bit.
That alone pushes the app to the background, and iOS kills the connection.
The AI chat response is still being generated server-side, but the phone has stopped listening.
The result: tokens get consumed, but the message comes back empty.
That’s a particularly nasty way to break.
## What 3 Days Could Build, and What 3 Days Couldn’t Reveal
Synapse, the app from the original article, is apparently a personal AI companion that the author’s wife uses daily.
It started as a Turborepo monorepo containing a web app, backend, and shared packages — then the React Native app was added on top.
With this setup, an AI coding agent can see the existing web components, Convex backend, and shared type definitions all at once.
No need to copy-paste API specs from a separate repo when building screens.
Clerk auth, Expo Router, Convex mutations/queries, existing themes — reusing all of these, the author got chat UI, real-time streaming, memory management, and persona switching working in 3 days.
Read just that part and you’d think “monorepo + AI coding is powerful” and move on.
But the real story comes after — the gap between the simulator and daily use on a real device.
For web streaming, you read from response.body via a ReadableStream from fetch.
But in the original article, the Hermes runtime on React Native for iOS couldn’t use that approach, so the author fell back to XMLHttpRequest’s onprogress to pick up incremental text.
That works during development.
But smartphone usage doesn’t mean “staring at the screen until the reply finishes.”
This is where the assumption of a maintained connection breaks down.
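The fallback splits naturally into a pure delta-extraction step plus the XHR wiring. Here is a minimal sketch, not the article’s actual code; `extractDelta` is a helper name introduced purely for illustration:

```typescript
// Pure core of the XHR fallback: responseText only ever grows, so each
// progress event just needs the suffix that hasn't been delivered yet.
export function extractDelta(
  responseText: string,
  seen: number,
): { delta: string; seen: number } {
  return { delta: responseText.slice(seen), seen: responseText.length };
}

// Wiring it to XMLHttpRequest (available in React Native) could look like:
//
//   let seen = 0;
//   xhr.onprogress = () => {
//     const step = extractDelta(xhr.responseText, seen);
//     onDelta(step.delta); // render only the new tokens
//     seen = step.seen;
//   };
```

Keeping the offset logic pure makes it trivial to test, which matters because off-by-one slicing here shows up as duplicated or missing tokens in the chat UI.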
## The Stream Is a Display Path, Not a Persistence Path
The fix in the original article was to treat Convex not as a simple relay, but as middleware that completes generation even after disconnection.
While the client is connected, deltas from the AI service get written back directly.
If a write fails, a client-disconnected flag is set, and the server stops writing to the dead connection.
But generation itself doesn’t stop.
The server accumulates the text and saves it as a completed message in the database.
When the user returns to the app, they read the finished message, not the mid-stream partial.
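As a sketch, the server-side loop described above might look like this; `writeDeltaToClient` and `saveCompletedMessage` are hypothetical stand-ins for the Convex-side operations, not real API names:

```typescript
// Server-side handler sketch: keep consuming the AI stream even after
// the client drops; only the display write-back stops.
type Sink = {
  writeDeltaToClient: (delta: string) => Promise<void>;
  saveCompletedMessage: (messageId: string, text: string) => Promise<void>;
};

export async function runGeneration(
  messageId: string,
  deltas: AsyncIterable<string>,
  sink: Sink,
): Promise<string> {
  let full = "";
  let clientDisconnected = false;
  for await (const delta of deltas) {
    full += delta; // always accumulate, regardless of connection state
    if (!clientDisconnected) {
      try {
        await sink.writeDeltaToClient(delta);
      } catch {
        clientDisconnected = true; // stop writing to the dead connection
      }
    }
  }
  // Source of truth: the completed message saved to the database.
  await sink.saveCompletedMessage(messageId, full);
  return full;
}
```

The essential asymmetry is that a failed write flips a flag but never breaks the loop; generation and persistence are decoupled from the display path.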
Here’s how the flow looks:
```mermaid
flowchart TD
  A["iPhone app<br/>sends request"]
  B["Convex HTTP endpoint"]
  C["AI generation service"]
  D["Stream display"]
  E["iOS suspends app"]
  F["Server continues generation"]
  G["Save completed message to DB"]
  H["App resumes<br/>fetches saved message"]
  A --> B --> C
  C --> B --> D
  D --> E
  E --> F --> G --> H
```
This is a pretty important concept for AI chat.
The stream is a display path for user experience — it’s not the source of truth.
The source of truth lives in the server-side message state.
On top of that, there’s a race condition in client-side error handling.
From the phone’s perspective, the XHR failed, so it wants to report “this message errored.”
But server-side, generation may have completed after the disconnection.
The original article adds a guard that prevents the client’s failure report from overwriting messages that already have completedAt set.
Without this guard, a response that completed successfully on the server gets flipped back to an error state by the returning client.
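The guard is tiny but load-bearing. A sketch, with an assumed message shape and an illustrative `markFailed` name:

```typescript
// A returning client's failure report must not clobber a message the
// server already finished. completedAt acts as the tiebreaker.
type Message = {
  id: string;
  status: "streaming" | "completed" | "failed";
  completedAt?: number;
};

export function markFailed(msg: Message): Message {
  if (msg.completedAt !== undefined) {
    // Server won the race: generation finished after the client dropped.
    return msg;
  }
  return { ...msg, status: "failed" };
}
```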
## Whether PWA or Native, Phones Leave the Screen
In my own Kana Chat v2, I turned the app into a PWA for accessing AI agents from an iPhone — adding WebSocket streaming, Web Push, and HTTPS via Tailscale Serve.
That article solved the “notifications don’t arrive when the browser isn’t open” problem with PWA push, but the DEV article addresses something one layer deeper.
Not notifications — the connection itself dies mid-generation.
Whether PWA or native, smartphone users leave the screen.
So instead of “success if the connection holds,” the design should lean toward “the job completes even if the connection drops, and the client can re-fetch state on return.”
The low-latency realtime sync article compared WebSocket, SSE, WebRTC and others, but for mobile AI chat, looking at latency alone isn’t enough.
There are situations where post-disconnection state transitions matter more than always-on connection quality.
LLM responses in particular have per-generation costs — dropping them mid-way hurts not just user experience but billing too.
How you manage message state matters more than which transport you pick.
| Concern | What happens on failure | Design approach |
|---|---|---|
| Client suspension | iOS kills XHR or WebSocket | Server continues generation |
| Partial progress | Streamed text is lost | Save completed text to DB |
| On resume | Client marks it as failed | Don’t overwrite if already completed |
| Cost | Generation runs but result disappears | Track via requestId or messageId |
“Building mobile apps fast with AI” isn’t remarkable anymore.
But once it becomes an AI app used daily, how it breaks during commutes, screen locks, and app switches matters more than how fast you can build it.
The numbers from the original article: 546 messages from web and 239 from mobile in one month.
Recently, there are days when mobile usage matches web usage.
At that point, “occasionally drops” stops being a peripheral bug and becomes a reliability issue on the main path.
Even for a personal AI, the product’s character changes the moment it goes on a phone.
It shifts from something you sit at a desk and talk with for a while, to something you tap briefly and repeatedly while cooking or commuting.
A design that gives up partial progress when the connection drops can’t survive that usage pattern.
## Not Just an iOS Problem
The original article focuses on iOS behavior, but Android has similar mechanisms.
Doze mode, introduced in Android 6.0, restricts network access and WakeLocks after the screen has been off and the device stationary for a certain period.
Android 9’s App Standby Buckets further categorize apps by usage frequency — less-used apps get longer intervals between allowed jobs and alarms.
The difference from iOS is that restrictions ramp up gradually.
iOS starts cutting network connections within about 30 seconds of the app going to the background.
Android doesn’t cut immediately, but once Doze kicks in, network access is only available during periodic maintenance windows.
iOS means “sudden disconnection.” Android means “gradual throttling.”
The timing of breakage differs, but the risk of long generations getting interrupted is the same.
```mermaid
flowchart LR
  subgraph iOS
    A1["Enters background"] --> A2["Network cut<br/>in ~30 seconds"]
  end
  subgraph Android
    B1["Screen off + stationary"] --> B2["Doze mode"] --> B3["Network only during<br/>maintenance windows"]
  end
```
Android does have an escape hatch: foreground services.
In exchange for showing a persistent icon in the notification bar, apps can maintain network communication in the background.
Messaging apps like LINE and WhatsApp use this.
AI chat apps could use the same trick, but you need to explain to users why the app keeps running in the background.
Since Android 14, foreground service type declarations have been tightened — declaring a type that doesn’t match the actual use gets rejected during Play Store review.
The tricky part is that stock Android specs aren’t the whole story.
Samsung, Xiaomi, Huawei, and OPPO layer their own power-saving mechanisms on top of Doze and App Standby Buckets.
Samsung’s Device Care kills battery-optimized apps more aggressively than Doze.
Xiaomi’s MIUI blocks background communication by default for apps that aren’t whitelisted for auto-start.
The site dontkillmyapp.com catalogs per-manufacturer behavior and workarounds — it’s a well-known headache for developers.
Tests that pass on a stock Android emulator may not reproduce on a real user’s Galaxy.
iOS at least behaves identically across all devices, so the conditions you need to handle are clear.
Android presents a gradient of manufacturer-specific differences that you have to absorb yourself.
Even if React Native promises cross-platform support, real-device testing for background restrictions has to happen per device.
On the React Native side, Android’s fetch implementation differs from iOS/Hermes and can support ReadableStream in some cases.
But the cause of disconnection isn’t streaming API support — it’s the OS killing the app’s communication.
The design of completing generation server-side and saving to the database is needed regardless of OS.
## Why It’s Less Visible on Desktop, and When It Still Happens
Desktop browsers face almost no background restrictions from the OS.
WebSocket and SSE connections survive tab switches.
Chrome throttles JavaScript timers in background tabs to 1-second intervals, but doesn’t kill the connections themselves.
So when using the same AI chat from a desktop browser, this problem rarely surfaces.
Few people have experienced “I switched tabs and my ChatGPT/Claude reply disappeared.”
But similar situations do occur on desktop.
Laptop sleep is one.
Close the lid after starting generation, and the OS takes down the network interface entirely.
On resume, the browser redraws the tab, but the stream connection is gone.
macOS Power Nap keeps some communication alive, but long-lived connections like WebSocket aren’t covered.
Windows Modern Standby machines behave similarly — the OS selectively maintains network during low-power states.
Network switching is another.
Moving from home Wi-Fi to tethering, VPN reconnection, corporate proxy re-authentication.
All of these break the TCP connection.
It happens less frequently on desktop, but when it does, the breakage is identical to mobile.
The reason desktop is less problematic isn’t just looser OS restrictions.
The usage pattern is different.
When you sit at a PC and send an AI chat message, you usually watch that tab until the reply finishes.
On a phone, you reflexively switch apps when a notification arrives, and pocketing the phone locks the screen.
Connection lifetime gets shortened by both OS restrictions and user behavior — that’s what makes mobile different.
## The Same Breakage Pattern Beyond AI Chat
The connection drops mid-way, and the server-side result is left in limbo.
This pattern isn’t unique to AI chat.
Mobile video upload is a classic example.
When an app goes to background during a multi-hundred-MB upload, the transfer is interrupted.
The Tus protocol addresses this by building chunked upload and resumption into the protocol level. AWS S3’s multipart upload follows the same idea — even if it cuts out, you resume from completed parts.
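The resumption idea common to tus and S3 multipart can be sketched in a few lines; `probeOffset` and `uploadChunk` are hypothetical transport calls, not real APIs of either:

```typescript
// Resumable-upload sketch: ask the server how far it got, then send
// only the remainder in fixed-size chunks.
export async function resumeUpload(
  data: Uint8Array,
  probeOffset: () => Promise<number>,
  uploadChunk: (offset: number, chunk: Uint8Array) => Promise<void>,
  chunkSize = 64 * 1024,
): Promise<void> {
  let offset = await probeOffset(); // server's persisted progress
  while (offset < data.length) {
    const chunk = data.slice(offset, offset + chunkSize);
    await uploadChunk(offset, chunk);
    offset += chunk.length;
  }
}
```

The structural point is the same as with AI chat: progress lives server-side, and the client resynchronizes against it instead of starting over.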
E-commerce payment flows are another dangerous spot.
If the app is suspended right after hitting the payment API, the payment succeeds but the client never receives the result.
Stripe provides Idempotency Keys precisely to absorb this situation.
Retrying after resumption is guaranteed at the API level to not double-charge.
Video transcoding and image generation share the same structure — a server-side job running for tens of seconds to minutes, with no guarantee the client will wait.
A design that processes “send request” and “receive result” on the same connection doesn’t hold up on mobile.
Real-time collaborative editing has this structure too.
When using Google Docs or Notion on a phone and the app gets suspended, local edits remain unsynced with the server.
On resume, if the server-side version has diverged, you need conflict resolution.
CRDTs and Operational Transformation exist for this purpose — their very necessity tells you connection continuity can’t be assumed.
WebRTC voice and video calls also face routine disconnection on mobile.
Switching apps during a LINE or Discord call can mute the mic or drop the session entirely.
Unlike HTTP requests, audio streams can’t be retried — after a disconnection, you have to establish a new session.
The quality of “reconnect automatically after a drop” logic significantly affects the call experience.
## Design Options That Assume Disconnection
The original article’s Convex approach was “complete generation server-side → save to DB → client fetches on return.”
Generalizing this, there are several approaches.
A job queue is the most straightforward.
The client receives a request ID and then waits for completion via polling or a realtime subscription.
Stream display becomes an optional “show it if there’s a connection” feature, with the source of truth always on the server.
Reactive backends like Firebase Realtime Database, Convex, or Supabase Realtime automatically deliver the latest state the moment the client reconnects.
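A minimal polling client under this model might look like the following sketch; `fetchStatus` is a hypothetical backend call, and in practice a reactive subscription would replace the loop:

```typescript
// Job-queue client sketch: the client holds only a request ID and polls
// until the server reports completion.
type JobStatus =
  | { state: "pending" }
  | { state: "completed"; text: string };

export async function waitForResult(
  requestId: string,
  fetchStatus: (id: string) => Promise<JobStatus>,
  intervalMs = 1000,
  maxAttempts = 60,
): Promise<string> {
  for (let i = 0; i < maxAttempts; i++) {
    const status = await fetchStatus(requestId);
    if (status.state === "completed") return status.text;
    await new Promise((r) => setTimeout(r, intervalMs));
  }
  throw new Error(`request ${requestId} did not complete in time`);
}
```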
Explicitly defining message state transitions is also effective.
Beyond just the happy path of pending → streaming → completed, you define an interruption path: pending → streaming → interrupted → completed.
When a client sees interrupted, it can distinguish between “the server is still generating” and “it actually failed.”
The completedAt guard in the original article is a lightweight version of this state machine.
```mermaid
stateDiagram-v2
  [*] --> pending
  pending --> streaming
  streaming --> completed
  streaming --> interrupted
  interrupted --> completed
  interrupted --> failed
  completed --> [*]
  failed --> [*]
```
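The same state machine expressed as a transition table, with illustrative names:

```typescript
// Allowed message-state transitions, mirroring the diagram above.
type MsgState = "pending" | "streaming" | "completed" | "interrupted" | "failed";

const TRANSITIONS: Record<MsgState, MsgState[]> = {
  pending: ["streaming"],
  streaming: ["completed", "interrupted"],
  interrupted: ["completed", "failed"], // server may still finish after a drop
  completed: [],
  failed: [],
};

export function isValidTransition(from: MsgState, to: MsgState): boolean {
  return TRANSITIONS[from].includes(to);
}
```

Rejecting invalid transitions at the persistence layer is what makes the completedAt guard generalize: `interrupted → completed` is legal, but `completed → failed` never is.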
For stream resumption, you need server-side offset management.
SSE’s Last-Event-ID header supports this use case by spec — the client sends the last received event ID on reconnect, and the server resumes from there.
WebSocket has no such mechanism, so you’d roll your own sequence numbers at the application layer.
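A sketch of the roll-your-own sequence-number approach for WebSocket; the class and method names are illustrative, not from any library:

```typescript
// Server keeps numbered events per request and replays everything after
// the client's last acknowledged sequence number on reconnect.
type NumberedEvent = { seq: number; delta: string };

export class EventLog {
  private events: NumberedEvent[] = [];

  append(delta: string): NumberedEvent {
    const ev = { seq: this.events.length + 1, delta };
    this.events.push(ev);
    return ev;
  }

  // Client sends its last seen seq (0 if none); server resumes from there.
  replayAfter(lastSeenSeq: number): NumberedEvent[] {
    return this.events.filter((e) => e.seq > lastSeenSeq);
  }
}
```

This is essentially what SSE’s Last-Event-ID gives you for free, reimplemented at the application layer.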
From a cost perspective, LLM response generation is billed per token, so generating a result that never arrives is a real-money problem on top of a UX one.
The original article touches on this too — without tracking completed results via messageId or requestId, resending the same question leads to double billing.
Stripe’s Idempotency Key concept applies here as well.
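Applied to generation, an idempotency wrapper might look like this sketch; `generate` stands in for any per-token-billed call, and the in-memory map would be a database table in practice:

```typescript
// Idempotency sketch: the same requestId never triggers generation twice;
// a retry after resume gets the original in-flight or cached result.
export function makeIdempotent<T>(
  generate: (prompt: string) => Promise<T>,
): (requestId: string, prompt: string) => Promise<T> {
  const inFlight = new Map<string, Promise<T>>();
  return (requestId, prompt) => {
    let p = inFlight.get(requestId);
    if (!p) {
      p = generate(prompt);
      inFlight.set(requestId, p);
    }
    return p;
  };
}
```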
Platform background APIs can be partially leveraged.
iOS’s BGProcessingTask allows several minutes of background processing, and NSURLSession background transfers let the OS continue downloads even after the app is suspended.
But these don’t work for maintaining stream connections like SSE or WebSocket.
Android’s WorkManager is a constrained async task scheduler that runs during Doze maintenance windows — suited for completion polling.
Not a universal solution, but useful specifically for the “detect generation completion and fetch the result” part.
Push notifications for completion is a simple and robust pattern.
When server-side generation finishes, fire a push via APNs or FCM.
The notification reaches the user even if the app is closed, and tapping it shows the completed message.
Stream display becomes “a bonus visible while connected,” with the source of truth always server-side.
The user experience doesn’t break even if the connection drops.
ChatGPT’s iOS app sometimes sends a push notification when a long response completes. That’s exactly this trade-off in action.