Paper vs tablet manga fMRI study: tablet slowed integration-question RT
This is not a story you should collapse into “paper manga is better for your brain.”
The PLOS One paper behind this ITmedia News article is not about a preference for paper versus digital. It is an experiment asking whether the medium on which you read the first half of a story leaves a trace when you later integrate the story's context.
The paper is titled “Manga reading on paper vs. digital devices.”
A team led by Professor Kuniyoshi L. Sakai of the Graduate School of Arts and Sciences at the University of Tokyo published it as a joint study with Coamix.
The University of Tokyo announcement uses the phrase “energy saving,” but the same announcement also warns that it is scientifically inaccurate to read higher brain activity directly as “good for the brain.”
Only the tablet condition stretched Set 2 response time
The participants were 25 university and graduate students.
Using a “zapping” manga that depicts the same story from multiple characters’ viewpoints, participants read the first half of each chapter either in a paper book or on a tablet before entering the MRI scanner.
In both conditions the second half was read on an LCD goggle display inside the scanner.
Because a tablet cannot be brought into the MRI scanner, the second-half reading environment was held constant by design.
For the first-half reading, page size and brightness were matched between paper and tablet, and on the tablet, pages switched instantly with a touch.
There were two kinds of questions.
Set 1 could be answered from the first half alone.
Set 2 could not be answered without integrating information from the first and second halves.
Accuracy did not differ significantly between media, but only the condition in which the first half was read on a tablet showed a longer response time on Set 2.
The flow of the experiment looks like this. Only the first-half medium was manipulated; the second half and the test were held to a common environment in both conditions.
```mermaid
flowchart TD
A[Read the first half<br/>medium manipulated] --> B[Read on the paper book]
A --> C[Read on the tablet]
B --> D[Read the second half on in-scanner goggles<br/>common to both conditions]
C --> D
D --> E[Set 1 questions<br/>solvable from the first half]
D --> F[Set 2 questions<br/>require integrating both halves]
E --> G[No medium difference in accuracy]
F --> H[Only the tablet condition<br/>showed longer response time]
```
The conditions and results are summarized in the following table.
| Aspect | Read first half on paper | Read first half on tablet |
|---|---|---|
| Set 1 (solvable from first half) accuracy | No medium difference | No medium difference |
| Set 2 (integrate both halves) accuracy | No medium difference | No medium difference |
| Set 2 response time | No increase from Set 1 | Significantly longer than Set 1 |
| Left language-area activity during second-half reading | Small | Large |
| Right hippocampus / right fusiform gyrus during second-half reading | Absent | Additional activity |
| Left language-area activity while answering | Large | Large |
What the study showed is not “you cannot comprehend with digital.”
There was no medium difference in accuracy.
But on questions that require connecting the first and second halves to understand them, the tablet condition took longer to reach the answer.
Before going further, let me pin down what “response time grew” and “brain activity is larger” mean.
fMRI (functional magnetic resonance imaging) is a method that estimates which parts of the brain are working from changes in blood flow.
When neurons fire actively, the proportion of oxygen-rich blood changes in that location.
This change (the BOLD signal, blood-oxygen-level-dependent) is imaged with MRI, and regions that responded strongly during a task are described as “activated (larger activity).”
So “larger brain activity” is not a measure of how smart you are, but a rough indicator of how much neural resource was mobilized for that processing (and only an indirect indicator mediated by blood flow, not the amount of neural activity itself).
Response time (RT) is the time from when a question appears to when the participant answers with a button press.
Even if accuracy is the same, a longer response time can be interpreted as an extra processing step inserted before reaching the answer.
In this study, the finding that “only the tablet condition stretched response time on Set 2” reads as a detour taken for integration, even though the answers were just as accurate.
“Energy saving” is about where processing decreased
The fMRI analysis also compares brain activity during second-half reading with brain activity during question answering.
While answering questions, the left-hemisphere language area was strongly active in both the paper and tablet conditions.
During the second-half manga reading, however, the tablet condition showed similarly strong activity, whereas in the paper condition that activity was small.
The University of Tokyo announcement explains this difference as “understanding the first-half content through the paper book saved language-area activity during the second-half reading.”
Furthermore, when Set 1 and Set 2 are analyzed separately, the paper condition also showed small supplementary right frontal activity on Set 1, the questions answerable from the first half alone.
This “small activity” does not mean efficiency in the sense of a battery-saving device.
Coamix’s press-conference report also explains that “energy saving” refers to suppressing extra activity, and is distinct from efficiency or convenience.
Nor does this experiment show that “the depth of thinking” itself differs between paper and digital.
On the e-book side, the takeaways are page position and ease of going back
The research team cites stable spatial and tactile cues as an advantage of paper.
In a paper book, a scene you read earlier tends to stay as “around this page” or “lower right side.”
For reading that follows complex foreshadowing or shifting character viewpoints, these cues help connect information from earlier and later in the story.
Applied to e-book design, the question goes beyond resolution or page-turn speed.
When a reader goes back to an earlier scene, can they recall what was where, and at which position?
Do two-page spreads, page numbers, thumbnails, reading history, bookmarks, and the back operation avoid interfering with the memory of the story?
If you draw product requirements directly from this study, that is roughly where they land.
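As a concrete illustration of those requirements, the Python sketch below keeps a reading-history entry as a page number plus spread side and panel index, so that a “back” operation returns the reader to the same spot on the page rather than only to the same page. `ReadingPosition` and `ReadingHistory` are hypothetical names for illustration; they are not taken from the study or from any e-reader.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ReadingPosition:
    """Hypothetical record of where a scene sits in a paged book."""
    page: int          # 1-based page number
    spread_side: str   # "left" or "right" page of a two-page spread
    panel_index: int   # panel order within the page


class ReadingHistory:
    """Minimal back-stack that restores the full spatial position, not just the page."""

    def __init__(self) -> None:
        self._stack: List[ReadingPosition] = []

    def visit(self, pos: ReadingPosition) -> None:
        self._stack.append(pos)

    def back(self) -> Optional[ReadingPosition]:
        # Leave the current position and return the previous one, if there is one.
        if len(self._stack) < 2:
            return None
        self._stack.pop()
        return self._stack[-1]


history = ReadingHistory()
history.visit(ReadingPosition(page=12, spread_side="right", panel_index=3))
history.visit(ReadingPosition(page=40, spread_side="left", panel_index=1))
print(history.back())
```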
It would also be a leap to tie this directly to evaluating smartphone vertical-scroll manga or webtoons.
What was used in the experiment was a manga-book format with conditions matched between paper and tablet, and the second half was read on LCD goggles.
For vertical-scroll works, scroll distance or continuous display can serve as cues instead of page position.
The lesson is not a simple superiority of paper over digital but a design question: where to place the cues for re-reading a long story.
The manga used in the experiment and the control of reading conditions
What was used was the first to fourth chapters of volume 1 of “Kuu Neru Futari Sumu Futari” by Kinoko Higurashi, published by Coamix.
It has a zapping structure that depicts the same events in a cohabiting couple’s life from each partner’s viewpoint. Each chapter was split into a first half (18–21 pages) and a second half (17–22 pages).
The paper book was 24.4 × 18.2 cm as a spread, with illumination matched to 11.7 EV in reflected light (about 2,100 lx at ISO 400 equivalent).
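The “about 2,100 lx” figure is consistent with a common incident-light conversion, E = C · 2^EV / S, where S is the ISO speed and C is a meter calibration constant. The Python sketch below assumes C = 250, the value behind the familiar rule of thumb lux ≈ 2.5 × 2^EV at ISO 100. The study may have derived its number differently, and `ev_to_lux` is a hypothetical helper.

```python
def ev_to_lux(ev: float, iso: float = 100.0, c: float = 250.0) -> float:
    """Approximate illuminance in lux for an exposure value at a given ISO speed.

    Uses E = C * 2**EV / S with an assumed calibration constant C = 250.
    """
    return c * (2.0 ** ev) / iso


# 11.7 EV at ISO 400 is equivalent to 9.7 EV at ISO 100 (two stops lower).
print(f"{ev_to_lux(11.7, iso=400):.0f} lx")
```

Under that assumption the result is roughly 2,080 lx, which rounds to the stated figure.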
The tablet was a Microsoft Surface Pro X 13-inch (2019 model), with a screen size of 25.5 × 18.2 cm.
It was adjusted to the same 11.7 EV with the backlight, and set so pages switched instantly by touch.
Half the participants read chapters 1 and 4 on paper and chapters 2 and 3 on the tablet, and the other half read the reverse combination, so the assignment of chapters to media was counterbalanced.
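The counterbalancing can be written down as two chapter-to-medium groups. The Python sketch below alternates participants between them. The text does not say how participants were allocated, so the alternation rule and the names `GROUP_A`, `GROUP_B`, and `assign_groups` are assumptions for illustration.

```python
from typing import Dict, List, Tuple

# Chapter assignments from the text: medium -> chapters read on that medium.
GROUP_A: Dict[str, Tuple[int, ...]] = {"paper": (1, 4), "tablet": (2, 3)}
GROUP_B: Dict[str, Tuple[int, ...]] = {"paper": (2, 3), "tablet": (1, 4)}


def assign_groups(participant_ids: List[int]) -> Dict[int, Dict[str, Tuple[int, ...]]]:
    """Alternate participants between the two counterbalanced groups (assumed rule)."""
    return {
        pid: (GROUP_A if i % 2 == 0 else GROUP_B)
        for i, pid in enumerate(participant_ids)
    }


plan = assign_groups(list(range(1, 26)))  # 25 participants
n_a = sum(1 for g in plan.values() if g is GROUP_A)
print(n_a, len(plan) - n_a)
```

With an odd N of 25, any split is necessarily 13 versus 12, so “half” is approximate.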
First-half reading time averaged 8.1 ± 2.4 seconds per page on paper and 7.5 ± 2.1 seconds on the tablet, with no significant difference (t[24] = 1.9, p = 0.07).
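The reported p-value can be checked from the t statistic alone. The stdlib-only Python sketch below integrates the Student's t density with Simpson's rule to obtain a two-sided p-value. It is a verification aid, not the authors' analysis code, and `two_sided_p` is a hypothetical helper name.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_coef) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t: float, df: int, steps: int = 2000) -> float:
    """Two-sided p-value: 1 - 2 * integral of the density from 0 to |t| (Simpson's rule)."""
    if steps % 2:
        raise ValueError("Simpson's rule needs an even number of steps")
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1.0 - 2.0 * total * h / 3.0


print(f"t(24) = 1.9 -> p = {two_sided_p(1.9, 24):.3f}")
```

Because F(1, df) = t², the same helper also covers the F[1, 24] statistics reported later in this section, for example `two_sided_p(math.sqrt(1.6), 24)` for the medium × question-set interaction.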
Paper page-turn time was measured separately with 3 people and averaged 0.7 seconds.
Inside the MRI scanner, the second half was shown on a goggle-type display (VisuaStim Digital, resolution 800 × 600) at a fixed pace of 20 seconds per spread.
After each spread, participants rated empathy on a 4-point scale.
The questions totaled 24, with 6 per chapter.
The breakdown is 13 Set 1 questions answerable from the first half alone and 11 Set 2 questions requiring integration of both halves.
Each question had four options and was scored 2 points for the best option, 1 point for the second best, and 0 points for the other two.
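The graded scoring can be sketched as follows. The answer-key format (a ranking of option indices from best to worst) and the names `score_answer` and `max_score` are assumptions for illustration; the last line only checks the point totals implied by 13 + 11 questions at up to 2 points each.

```python
from typing import Sequence

# Points for the option ranked best, second best, and the remaining two.
POINTS_BY_RANK = (2, 1, 0, 0)


def score_answer(chosen: int, ranking: Sequence[int]) -> int:
    """Score one 4-choice question; `ranking` lists option indices from best to worst."""
    if len(ranking) != len(POINTS_BY_RANK):
        raise ValueError("expected a ranking of exactly four options")
    return POINTS_BY_RANK[list(ranking).index(chosen)]


def max_score(n_questions: int) -> int:
    """Maximum attainable points when every question is worth at most 2."""
    return POINTS_BY_RANK[0] * n_questions


print(max_score(13), max_score(11), max_score(24))  # Set 1, Set 2, all 24 questions
```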
Breakdown of the response-time difference
In the two-way ANOVA on accuracy, the main effect of medium was F[1, 24] = 0.01 (p = 0.9), and the main effect of question set was F[1, 24] = 0.7 (p = 0.4); neither was significant.
For response time, the main effect of question set was F[1, 24] = 28, p < 0.0001.
Set 2 is consistently slower than Set 1.
However, the medium (paper/tablet) × question set interaction was not significant (F[1, 24] = 1.6, p = 0.2).
That is, what the ANOVA can state strongly is only the main effect of “slowing down overall on Set 2,” and “there is a difference between paper and tablet” must be read as a post-hoc-level tendency.
With that caveat, computing the Set 2 − Set 1 difference (delta-RT) separately for paper and tablet shows no difference for paper (t(24) = 0.2, p = 0.8).
For the tablet it grew significantly at t(24) = 2.9, p = 0.008.
In Tukey’s HSD test, the two significant pairs were tablet Set 2 being longer than tablet Set 1 (q[24] = 4.9, p = 0.01) and tablet Set 2 being longer than paper Set 1 (q[24] = 5.1, p = 0.007).
In the condition that read the first half on paper, the response did not slow even on questions integrating both halves.
But since the interaction is not significant, what comes out clearly is the difference between Set 1 and Set 2 within the tablet condition; a difference between the media themselves is not statistically established.
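The gap between the within-medium delta-RT tests and the interaction test is easiest to see when both are written as one-sample t-tests on per-participant scores. The Python sketch below shows that standard 2 × 2 within-participant logic; it is not the authors' code, and the function names are hypothetical. Because the interaction is tested on each participant's difference of differences, it typically carries the variability of both deltas, which is how a clear within-tablet effect can coexist with a non-significant interaction.

```python
import math
import statistics
from typing import List, Sequence


def one_sample_t(values: Sequence[float]) -> float:
    """t statistic for H0: mean(values) == 0, with df = len(values) - 1."""
    n = len(values)
    return statistics.mean(values) / (statistics.stdev(values) / math.sqrt(n))


def delta_rt(set1: Sequence[float], set2: Sequence[float]) -> List[float]:
    """Per-participant Set 2 - Set 1 response-time difference (delta-RT)."""
    return [s2 - s1 for s1, s2 in zip(set1, set2)]


def interaction_scores(paper_delta: Sequence[float],
                       tablet_delta: Sequence[float]) -> List[float]:
    """Per-participant difference of differences: tablet delta-RT minus paper delta-RT."""
    return [t - p for p, t in zip(paper_delta, tablet_delta)]


# In a 2 x 2 within-participant design:
#   simple effect within one medium -> one_sample_t(delta_rt(set1, set2)) for that medium
#   medium x set interaction        -> one_sample_t(interaction_scores(...)), and F(1, n-1) = t**2
```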
Three syntactic networks and the difference in activation patterns
The paper uses three syntactic networks defined in Professor Sakai’s prior work as the framework for analysis.
| Network | Brodmann area | Region | Role |
|---|---|---|---|
| I | BA 44/45 | Left inferior frontal gyrus (opercular and triangular parts) | Core syntactic processing |
| II | BA 6/8 | Left lateral premotor cortex (LPMC) | Temporal-sequence processing, coordination with the cerebellum |
| III | BA 45/47 | Left inferior frontal gyrus (triangular and orbital parts) | Integration of meaning and syntax |
In the fMRI contrast [question answering − manga reading], the paper condition showed significant activity in the left LPMC/IFG (regions corresponding to networks I–II).
The tablet condition had no significant region in this contrast.
In the condition that read the first half on paper, activity in the left frontal language area was low during the second-half manga reading and rose only while answering questions.
That gap is detected as a contrast.
This is the neuroscientific substance of the reported “energy saving”: in the paper condition, language-area activity was low during second-half reading, so its gap from the answering period came out large.
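What a contrast such as [question answering − manga reading] compares can be sketched with a deliberately simplified model: the mean signal in one kind of block minus the mean in another. Real fMRI analyses fit a general linear model in which regressors are convolved with a hemodynamic response function and nuisance terms are included, so this Python sketch, with its hypothetical `block_contrast` helper and arbitrary toy values, only illustrates the direction of the effect.

```python
import statistics
from typing import Sequence


def block_contrast(signal: Sequence[float], labels: Sequence[str],
                   plus: str, minus: str) -> float:
    """Mean signal in `plus` blocks minus mean signal in `minus` blocks.

    Simplified stand-in for a GLM contrast: no HRF convolution, no drift or motion terms.
    """
    def mean_of(condition: str) -> float:
        return statistics.mean(s for s, lab in zip(signal, labels) if lab == condition)

    return mean_of(plus) - mean_of(minus)


labels = ["read", "read", "answer", "answer"]
# Arbitrary toy values (not study data): low during reading vs. already high during reading.
paper_like = [1.0, 1.0, 3.0, 3.0]
tablet_like = [3.0, 3.0, 3.0, 3.0]
print(block_contrast(paper_like, labels, "answer", "read"),
      block_contrast(tablet_like, labels, "answer", "read"))
```

A large answering-minus-reading value can therefore come from low activity during reading, the pattern the paper condition showed, rather than from unusually high activity while answering.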
In the [Set 2 − Set 1] contrast, the right LPMC/IFG appeared in the paper condition.
In the paper condition, the supplementary right frontal activity on Set 1 (questions answerable from the first half alone) is low, which is what lets this region stand out in the contrast.
In the tablet condition, instead of the right LPMC/IFG, only the right angular gyrus (R. AG) was active.
The angular gyrus is considered a region involved in reconstructing spatial layout, and in the tablet condition processing to reorganize the story’s spatial memory may have been added (the paper also notes an alternative interpretation: a higher syntactic load within syntactic network I).
However, this right angular gyrus activity was detected at an exploratory threshold (uncorrected p < 0.01), and the paper itself notes that further verification is needed.
In the ROI analysis, the main effect of question set in the left LPMC/IFG was F[1, 24] = 4.6, p = 0.046.
Set 2 has larger activity than Set 1.
Re-verified with an independent ROI defined from a prior language-acquisition task, the same direction came out at F[1, 24] = 6.4, p = 0.02.
The right hippocampus and right fusiform gyrus that appeared in the tablet condition
In the reverse contrast [manga reading − question answering], the right hippocampus and right fusiform gyrus were significantly active only in the tablet condition.
They do not appear in the paper condition.
The right hippocampus is involved in encoding episodic memory, and the right fusiform gyrus in visual form processing.
This suggests that, in the tablet condition, encoding-related activity not seen in the paper condition was added during manga reading.
On the tablet condition’s Set 2 questions, a positive correlation appeared between right LPMC/IFG activity and accuracy (r = 0.40, p = 0.046).
Participants who worked the right frontal area more tended to do better on the integration questions across the two halves.
The paper condition has no such correlation.
But p = 0.046 is borderline, and one cannot go so far as to say “only participants who did extra processing kept their accuracy.”
And this is a between-participant correlation in the first place; it does not show a causal claim that increasing activity raises performance.
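How borderline r = 0.40 is with 25 participants can be seen by converting the correlation to a t statistic, t = r·√(n − 2) / √(1 − r²), with n − 2 = 23 degrees of freedom. In the Python sketch below, `r_to_t` is a hypothetical helper and 2.069 is the standard two-sided 5% critical value for df = 23 from a t table.

```python
import math


def r_to_t(r: float, n: int) -> float:
    """t statistic for a Pearson correlation r computed from n participants (df = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


T_CRIT_DF23 = 2.069  # two-sided alpha = 0.05, df = 23


for r in (0.40, 0.38):
    t = r_to_t(r, 25)
    verdict = "above" if t > T_CRIT_DF23 else "below"
    print(f"r = {r:.2f}: t(23) = {t:.2f}, {verdict} the 0.05 critical value")
```

A correlation only slightly smaller than the reported one would already fall short of p < 0.05, which is why the result calls for caution.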
The change in empathy is the same for paper and tablet
The empathy ratings taken after each spread inside the MRI scanner were also analyzed.
Empathy rose significantly as the story progressed (F[1, 24] = 13, p < 0.0001).
There was no significant difference between paper and tablet (F[1, 24] = 3.7, p = 0.07); empathy only tended to be slightly lower for paper.
There were differences in response time and brain activity, but not in subjective empathy.
The tendency for empathy toward the protagonist to rise toward the second half does not depend on the first-half medium.
The conditions of a 25-person, single-work fMRI experiment
The sample was N = 25 (from 29 recruited, excluding one left-hander and three for clinical issues, wearing corrective wires, and excessive head motion across multiple runs).
Further, at the preprocessing stage, runs in which body motion exceeded the criteria (6 runs from 5 participants in total) were removed from the fMRI analysis.
The power analysis in the paper reports that, based on a prior 16-trial multiple-choice task, an average power of 99% was obtained at FDR-corrected p < 0.05.
The reported power is high for an fMRI study, but the manga used was a single work, “Kuu Neru Futari Sumu Futari” (4 chapters).
Whether the same pattern appears for works of other genres or styles, for example battle manga or mysteries with a different distribution of information density, requires a separate experiment.
The tablet was a Microsoft Surface Pro X (13-inch, landscape orientation), a condition with screen size and light level matched to paper.
The reading experience differs from a smartphone’s small screen or vertical-scroll webtoons.
The paper also touches on McLuhan’s distinction between “light-on” (paper that receives light) and “light-through” (a screen that emits light).
But this experiment was designed to examine whether instant touch-based page switching cuts off spatial cues, and it does not separate the physiological effects of backlight versus reflected light.
The behavioral and fMRI data are publicly available in the OSF repository (osf.io/gxkvs).
The paper’s authors are three members of the Graduate School of Arts and Sciences at the University of Tokyo, with funding from a KAKENHI grant to Keisho Umejima (Early-Career Scientists 24K16045) and from Coamix.
Coamix, which is also the work’s publisher, is a funding source, but it is declared to have no involvement in the study design, data analysis, or paper writing.
At the same university, research on the “drawing side” exists separately
This experiment looked at the cognition of the “reading” side of manga, but the University of Tokyo also has research on the “drawing (generating)” side of manga.
MangaFlow, released in May 2026 by a research group including Hideki Nakayama (University of Tokyo), is an agent-style framework that generates multi-page manga from a story’s text (an arXiv preprint).
The input is the story text corresponding to a scenario, and the output is a manga page complete with paneling, dialogue, and lettering.
Why “directly generating a single image from text” does not become manga
If you make an image-generation model draw a whole page in one shot, the paneling does not come out as specified.
The paper points out that with direct generation, layout becomes an implicit outcome of image synthesis, so the number of panels does not match, panel borders melt together, and the composition the author decided is ignored.
Manga paneling is not visual decoration but a narrative device that creates the flow of the gaze and the pauses; the author designs the page composition before drawing the panels.
But a one-shot generation model mixes layout, characters, and dialogue into a single output, so they cannot be controlled individually.
MangaFlow solves this with a pipeline that separates the steps.
The six-stage pipeline
```mermaid
flowchart TD
A[Story text<br/>page and panel-count specification] --> B[1 Story planning<br/>decompose into scenes]
B --> C[2 Memory construction<br/>link references for characters, scenes, props]
C --> D[3 Layout construction<br/>generate panel frames as structural variables]
D --> E[4 Reference-conditioned panel drawing<br/>generate each panel's art]
E --> F[5 Page composition<br/>place panels per the layout]
F --> G[6 Dialogue and lettering placement<br/>speech balloons, narration]
G --> H[Finished multi-page manga]
```
The role of each stage is as follows.
| Stage | What it does |
|---|---|
| 1. Story planning | Decompose the input story into connected scenes (story sections) |
| 2. Memory construction | Link visual references for the characters, scenes, and props appearing in each scene, so they can be reused across panels |
| 3. Layout construction | Generate and reuse panel frames as geometric structural variables. Choose from manual specification, template reuse, or automatic generation |
| 4. Panel drawing | Generate each panel’s art conditioned on the memory’s character/scene references and per-panel prompts |
| 5. Page composition | Lay out the drawn panels onto the page per the target layout (placed deterministically) |
| 6. Lettering | Place speech balloons, narration boxes, and thought bubbles while watching the speaker’s position and readability |
The key is to treat the layout not as a byproduct of image generation but as an explicit, independent structure that can be specified up front and edited afterward.
Because panel frames and panel counts are treated deterministically, the user can specify panel count, page count, panel placement, character and scene reference images, art style, and language. The story can be passed as detailed JSON or as a natural-language prompt.
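The idea of holding the layout as a separate, deterministic structure can be sketched in Python as below. This is not MangaFlow's code or API: `PanelBox`, `strip_layout`, and `compose_page` are hypothetical names, and the strip template is an arbitrary stand-in for the paper's manual, template, or automatic layouts. Because the frames exist before any art is drawn, the panel count on the page is fixed by construction rather than left to the image model.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class PanelBox:
    """One panel frame in page-normalized coordinates (0 to 1)."""
    x: float
    y: float
    w: float
    h: float


def strip_layout(n_panels: int, gutter: float = 0.02) -> List[PanelBox]:
    """Hypothetical template: n full-width horizontal strips separated by gutters."""
    if n_panels < 1:
        raise ValueError("a page needs at least one panel")
    h = (1.0 - gutter * (n_panels + 1)) / n_panels
    return [PanelBox(gutter, gutter + i * (h + gutter), 1.0 - 2 * gutter, h)
            for i in range(n_panels)]


def compose_page(layout: List[PanelBox], panels: List[str]) -> List[Tuple[PanelBox, str]]:
    """Place drawn panels into fixed frames; the page's panel count comes from the layout."""
    if len(panels) != len(layout):
        raise ValueError("exactly one drawn panel per frame is required")
    return list(zip(layout, panels))


page = compose_page(strip_layout(4), ["p1.png", "p2.png", "p3.png", "p4.png"])
print(len(page))
```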
The ones actually drawing the art are FLUX and Gemini
MangaFlow itself is not a “drawing” model but an orchestration layer that calls models per step.
The one that actually produces the panel art is the swappable image-generation model (backbone).
The paper uses two backbones, FLUX.2 9B (Black Forest Labs) and Gemini 2.5 Flash Image (Google, the so-called Nano Banana), and compares them as MangaFlow-FLUX and MangaFlow-Gemini. The “Gemini / FLUX” labels in the benchmark table below refer to this swap.
MangaFlow handles the staging (decomposing the story, paneling, linking references, lettering), while the art itself is drawn by the backbone. Hence “the art quality depends on the backbone.”
Notably, the LLM driving the pipeline (the agent that makes the plan and the per-panel prompts) is described in the paper only as “LLM-based,” without specifying which model.
GPT-4.1, which appears in the evaluation, is the judge role that reads the generated manga and scores readability and comprehension; it is not involved in the drawing.
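The split between orchestration and backbone can be sketched as a function that accepts any image generator as a parameter. In the Python below, `draw_panels`, `stub_backbone`, and the `BACKBONES` registry are hypothetical, and the stubs merely return strings; nothing here calls FLUX.2 or Gemini, whose real APIs are not shown. The point is that swapping the backbone changes only one argument, while prompts, references, and layout stay the same.

```python
from typing import Callable, Dict, List, Sequence

# A backbone is modeled as any callable (panel_prompt, reference_ids) -> image handle.
Backbone = Callable[[str, Sequence[str]], str]


def draw_panels(prompts: List[str], references: Sequence[str],
                backbone: Backbone) -> List[str]:
    """Orchestration step: identical prompts and references go to whichever backbone is plugged in."""
    return [backbone(prompt, references) for prompt in prompts]


def stub_backbone(name: str) -> Backbone:
    """Stand-in for a real image model; it only returns a descriptive string."""
    def generate(prompt: str, refs: Sequence[str]) -> str:
        return f"{name}:{prompt}|refs={','.join(refs)}"
    return generate


BACKBONES: Dict[str, Backbone] = {
    "flux": stub_backbone("flux"),
    "gemini": stub_backbone("gemini"),
}

for key in ("flux", "gemini"):
    print(draw_panels(["panel 1", "panel 2"], ["heroine_ref"], BACKBONES[key]))
```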
The gap from direct generation
In the paper’s benchmark, adherence to the specification opens up widely from direct generation (direct generation comes in two kinds, the Gemini version and the FLUX version).
| Metric | MangaFlow (Gemini / FLUX) | Direct generation (Gemini / FLUX) |
|---|---|---|
| Panel-count match | 100% (deterministic) | 27.9% / 44.2% |
| Layout match (IoU) | 100% (deterministic) | 42.8% / 41.1% |
| Character identity (CIDS) | 0.643 / 0.619 | 0.562 / 0.591 |
| Balloon placement score | 97.4% | Not evaluable (could not produce readable text) |
| Readability (5-point) | 4.87 / 4.71 | 3.80 / 2.67 |
Because panel counts and panel frames are placed deterministically by separating the steps, they are 100%, and character consistency and lettering readability also exceed direct generation.
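Layout match is reported as IoU (intersection over union). The Python sketch below computes box IoU and a layout score that pairs panels in reading order and averages the result. The pairing rule, the box format, and the names `box_iou` and `layout_iou` are assumptions, since the paper's exact scoring procedure is not described here. A deterministically placed layout scores 1.0 against its own target by construction.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) with x0 < x1 and y0 < y1


def box_area(r: Box) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned rectangles."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = box_area(a) + box_area(b) - inter
    return inter / union if union > 0 else 0.0


def layout_iou(target: List[Box], produced: List[Box]) -> float:
    """Mean IoU over panels paired in reading order; missing or extra panels score 0."""
    n = max(len(target), len(produced))
    if n == 0:
        return 1.0
    return sum(box_iou(t, p) for t, p in zip(target, produced)) / n


target = [(0.0, 0.0, 1.0, 0.5), (0.0, 0.5, 1.0, 1.0)]
print(layout_iou(target, target), layout_iou(target, target[:1]))
```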
Limitations are written down too. The final art quality depends on the underlying image-generation model (backbone), and on complex panels it is hard to place balloons that identify the speaker (manga-style faces make position identification hard).
MangaGen-MetaBench, used for evaluation, is also disclaimed: it is not a real manga dataset annotated by hand for layout, balloons, and speakers, but a meta-benchmark assembled from existing story-visualization data.
The reading-side Sakai Lab and the drawing-side Nakayama Lab are different teams with different themes, and MangaFlow is not directly connected to the fMRI study discussed here. Still, research on manga from both the brain side and the AI side is coming out of the same university in parallel.