Auto-assigning piano fingerings from a score based on hand size via physics simulation

The original post is I Taught a Browser to Play Piano — Here’s How It Figures Out Which Finger Goes Where. The title makes it sound like a browser toy that plays piano.
It isn’t.
The tool reads a MusicXML score, evaluates a physical hand model against the notes, and returns the optimal fingering — a surprisingly serious piece of work.

Piano scores often don’t come with fingering marks.
Even when they do, the optimal fingering for a child’s small hand differs from an adult’s, and “the textbook fingering is impossible for my hand” is a daily problem for piano learners.
A teacher would rewrite it on the spot. Self-learners get stuck.

The demo is at audiotool.monkeymore.app/en/piano-finger, runnable directly in a browser.
Drop a MusicXML file and the fingering-annotated animation starts within seconds.
Nothing is sent to a server.

What is actually being solved

Piano fingering can be formulated as a combinatorial optimization: assign a finger number (1–5) to each note.
There are three evaluation axes.

Physical hand constraints: max span between adjacent fingers, cost of pressing black keys with the thumb, viability of crossings and overlaps
Hand size: a child’s hand and an adult’s hand physically reach different ranges in the first place
Per-finger strength: ring finger and pinky move less independently and tend to collapse on strong beats

A naive dynamic programming approach that accumulates minimum cost note by note looks like it should work.
In practice, whether you assign the thumb to the current note breaks the viable choices several bars ahead, so local optima don’t stack into global optima.

Concrete example: on a C major ascending scale, assign 1-2-3-4-5 straight through “C-D-E-F-G” and you run out of fingers the moment the pinky lands on G, with “A-B-C” still to play.
The correct move is 1-2-3 for “C-D-E”, then tuck the thumb under for “F” to return to 1, which leaves 1-2-3-4-5 for “F-G-A-B-C”.
Locally, putting the thumb on the note after “E” looks expensive because the thumb is far from the current position. Look a few notes ahead and the thumb-tuck is actually the cheaper branch.

With DP that only tracks “which finger is current” as state, behaviors that only pay off several notes later can’t be evaluated correctly.
You could add “whole hand position” to the state and DP would handle it, but the state space explodes.
The 9-note-lookahead backtracking is the pragmatic middle ground for this trade-off.
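
A toy model makes the scale example concrete. The sketch below is not the tool’s cost function: the transition costs in stepCost are invented for illustration, and all function names are hypothetical. It compares a greedy pass that always takes the cheapest next finger with an exhaustive search over the whole eight-note phrase.

```javascript
// Toy cost of moving from finger `prev` to finger `next` on an ascending,
// stepwise right-hand line. These numbers are made up for illustration.
function stepCost(prev, next) {
  if (next > prev) return next - prev;    // next finger up; skipping fingers costs more
  if (next === 1 && prev === 3) return 2; // thumb tucks under finger 3
  if (next === 1 && prev === 4) return 3; // thumb tucks under finger 4 (harder)
  return Infinity;                        // anything else is treated as impossible
}

// Greedy: start on the thumb and always take the cheapest next finger.
function greedyFingering(noteCount) {
  const fingering = [1];
  while (fingering.length < noteCount) {
    const prev = fingering[fingering.length - 1];
    let best = null;
    let bestCost = Infinity;
    for (let f = 1; f <= 5; f++) {
      const cost = stepCost(prev, f);
      if (cost < bestCost) { best = f; bestCost = cost; }
    }
    if (best === null) break; // no legal finger left: the hand has run out
    fingering.push(best);
  }
  return fingering;
}

// Lookahead: try every assignment for the whole phrase and keep the cheapest.
function lookaheadFingering(noteCount) {
  let best = null;
  let bestCost = Infinity;
  const current = [];
  const search = (cost) => {
    if (cost >= bestCost) return; // cannot beat the best complete assignment
    if (current.length === noteCount) {
      best = [...current];
      bestCost = cost;
      return;
    }
    for (let f = 1; f <= 5; f++) {
      const step = current.length === 0 ? 0 : stepCost(current[current.length - 1], f);
      if (step === Infinity) continue;
      current.push(f);
      search(cost + step);
      current.pop();
    }
  };
  search(0);
  return { fingering: best, cost: bestCost };
}

console.log(greedyFingering(8).join("-"));              // 1-2-3-4-5 (stuck after G)
console.log(lookaheadFingering(8).fingering.join("-")); // 1-2-3-1-2-3-4-5
```

Under these assumed costs the greedy pass stalls on finger 5 after five notes, while the full search accepts the locally pricier thumb-tuck after E and finishes the scale.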

The fingering algorithm as a physics simulation

The tool uses “depth-limited backtracking search + 9-note sliding window.”
It enumerates all finger assignments for the next 9 notes, picks the lowest-cost one, and slides the window forward by one note.
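
The commit-one-note-then-slide loop can be sketched as follows. This is a minimal illustration, not the tool’s code: planFingering, toySearchWindow, the key field, and the toy cost (mismatch between key movement and finger movement) are all assumptions made for the example.

```javascript
// Hypothetical window search: brute-force every finger assignment for the
// window and return the cheapest one. The cost is a toy: the mismatch between
// how far the keys move and how far the finger numbers move.
function toySearchWindow(windowNotes, prev) {
  let best = null;
  let bestCost = Infinity;
  const candidate = [];
  const recurse = (level, cost) => {
    if (level === windowNotes.length) {
      if (cost < bestCost) { best = [...candidate]; bestCost = cost; }
      return;
    }
    const before = level === 0
      ? prev
      : { key: windowNotes[level - 1].key, finger: candidate[level - 1] };
    for (let finger = 1; finger <= 5; finger++) {
      let step = 0;
      if (before) {
        const keyMove = windowNotes[level].key - before.key;
        if (keyMove !== 0 && finger === before.finger) continue; // same finger, new key
        step = Math.abs(keyMove - (finger - before.finger));
      }
      candidate[level] = finger;
      recurse(level + 1, cost + step);
    }
  };
  recurse(0, 0);
  return best;
}

// Sliding-window driver: search the next `windowSize` notes, commit only the
// first finger of the winning assignment, then move the window one note on.
function planFingering(notes, windowSize, searchWindow) {
  const committed = [];
  for (let start = 0; start < notes.length; start++) {
    const windowNotes = notes.slice(start, start + windowSize); // shrinks near the end
    const prev = start === 0
      ? null
      : { key: notes[start - 1].key, finger: committed[start - 1] };
    committed.push(searchWindow(windowNotes, prev)[0]);
  }
  return committed;
}

const cde = [{ key: 0 }, { key: 1 }, { key: 2 }]; // white-key indices for C, D, E
console.log(planFingering(cde, 9, toySearchWindow)); // [ 1, 2, 3 ]
```

Passing the previously committed note and finger into each window is one way to keep consecutive windows consistent; how the real tool carries that state across windows is not shown in the article.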

flowchart TD
  A[MusicXML input] --> B[Normalize to INote sequence<br/>pitch, time, cm position]
  B --> C[Sliding window<br/>next 9 notes]
  C --> D[Enumerate finger assignments<br/>5^9 candidates]
  D --> E[Drop impossible configurations<br/>pruning]
  E --> F[Evaluate with cost function<br/>avg finger velocity + corrections]
  F --> G[Commit the best single note]
  G --> H{End of score?}
  H -- No --> C
  H -- Yes --> I[Output finger sequence]
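
The normalization step in the flowchart turns each pitch into a horizontal position in centimetres. A rough sketch, assuming MIDI note numbers as input, a 16.5cm octave split into seven equal white keys, and black keys centred on the boundary between their neighbours; the function name toINote and the exact geometry are assumptions, not taken from the tool’s source:

```javascript
const OCTAVE_CM = 16.5;             // width of one octave on the keyboard
const WHITE_KEY_CM = OCTAVE_CM / 7; // seven white keys per octave

// Centre of each pitch class within its octave, in white-key widths.
// Index 0 is C, 1 is C#, 2 is D, and so on up to 11 for B.
const OFFSETS = [0.5, 1, 1.5, 2, 2.5, 3.5, 4, 4.5, 5, 5.5, 6, 6.5];
const BLACK_PITCH_CLASSES = new Set([1, 3, 6, 8, 10]);

// Build one note record with the fields the cost function reads: x, time, isBlack.
function toINote(midi, timeSec) {
  const pitchClass = midi % 12;
  const octave = Math.floor(midi / 12) - 1; // MIDI 60 is C4
  return {
    midi,
    time: timeSec,
    x: octave * OCTAVE_CM + OFFSETS[pitchClass] * WHITE_KEY_CM,
    isBlack: BLACK_PITCH_CLASSES.has(pitchClass),
  };
}

const c4 = toINote(60, 0.0);
const cSharp4 = toINote(61, 0.5);
console.log(c4.isBlack, cSharp4.isBlack); // false true
```

With positions in real units, the hand-size limits and finger travel distances can all be compared in the same centimetre scale.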

The core is this backtracking loop (simplified from the JavaScript source):

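// candidate, bestFingering, minvel, depth, fingers, and nseq live in the enclosing scope.
const backtrack = (level) => {
  if (level === depth) {
    // A full window is assigned: score it and keep it if it beats the best so far.
    const velocity = aveVelocity(candidate, nseq);
    if (velocity < minvel) {
      bestFingering = [...candidate];
      minvel = velocity;
    }
    return;
  }
  for (const finger of fingers) {
    // Prune transitions the hand cannot make before recursing any deeper.
    if (level > 0 && skip(candidate[level - 1], finger, ...)) continue;
    candidate[level] = finger;
    backtrack(level + 1);
  }
};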

Typical configurations the pruning (skip function) drops:

Same finger on consecutive different notes
Invalid finger crossing (e.g., placing finger 4 to the left of finger 3 on the right hand)
Physically unreachable stretches (exceeding hand size)
Thumb on a black key during an ascending passage (makes the thumb-tuck difficult)
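
The four rules above can be written as a single predicate. The article elides the real skip signature, so the parameter list here is hypothetical, the checks are a right-hand-only sketch, and treating any gap wider than the span as unreachable is a simplification:

```javascript
// Returns true when the move from (prevFinger on prevNote) to (finger on note)
// should be pruned. Notes carry x (cm) and isBlack; maxSpanCm is the hand limit.
function skip(prevFinger, finger, prevNote, note, maxSpanCm) {
  const dx = note.x - prevNote.x; // positive means moving right (ascending)

  // Stretch wider than the hand can span.
  if (Math.abs(dx) > maxSpanCm) return true;

  // Same finger on two different consecutive notes.
  if (prevFinger === finger && dx !== 0) return true;

  // Two non-thumb fingers crossing: a higher finger must sit further right.
  if (prevFinger > 1 && finger > 1 && (finger - prevFinger) * dx < 0) return true;

  // Thumb tucking under onto a black key while ascending.
  if (prevFinger > 1 && finger === 1 && note.isBlack && dx > 0) return true;

  return false;
}

const e4 = { x: 0.0, isBlack: false };
const f4 = { x: 2.36, isBlack: false };
console.log(skip(3, 1, e4, f4, 21.0)); // false: the thumb-tuck onto F stays allowed
console.log(skip(4, 3, e4, f4, 21.0)); // true: finger 3 would cross over finger 4
```

Because the predicate only looks at two adjacent notes, it can run inside the innermost loop and cut whole subtrees before any cost is computed.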

With pruning engaged, the search over a nominal 5^9 ≈ 1.95M combinations finishes in milliseconds.
The choice of 9 notes is well calibrated: 5^7 ≈ 78k, 5^9 ≈ 1.95M, 5^11 ≈ 48.8M. The candidate count grows 5x with every extra note, so pushing the lookahead to 10 notes or more multiplies the work while adding little practical benefit.
9 notes is the practical lower bound for covering chords and short local phrase movements.
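
The counts are easy to check, and the same arithmetic gives a feel for what a single pruning rule buys. The second function is an illustrative assumption: it counts only the “no finger repeats on the next note” rule and assumes every pair of consecutive notes differs.

```javascript
// Candidates with no pruning: 5 finger choices for each of `depth` notes.
function rawCandidates(depth) {
  return 5 ** depth;
}

// Candidates left if a finger may never repeat on the following note:
// 5 choices for the first note, then 4 for each note after it.
function noRepeatCandidates(depth) {
  return 5 * 4 ** (depth - 1);
}

for (const depth of [7, 9, 11]) {
  console.log(depth, rawCandidates(depth), noRepeatCandidates(depth));
}
// 7 78125 20480
// 9 1953125 327680
// 11 48828125 5242880
```

At depth 9 that one rule alone removes roughly five sixths of the tree, before the crossing and stretch rules are applied.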

What the cost function actually looks like

The article’s keyword is “average finger velocity,” but the implementation is slightly more nuanced.
At its core is per-note movement speed, with finger strength and black-key comfort layered on as correction terms.

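// Shown as an excerpt: this.weights and this.bfactor are the Hand tables
// listed further down, and depth is the window length.
function aveVelocity(fingering, notes) {
  let vmean = 0.0;
  for (let i = 1; i < depth; i++) {
    // fingerPos is not defined in this excerpt; it appears to be the current
    // position (cm) of the finger that is about to play notes[i].
    const dx = Math.abs(notes[i].x - fingerPos);
    // Time available for the move, plus 0.1 s so that dt is never zero.
    const dt = Math.abs(notes[i].time - notes[i - 1].time) + 0.1;
    let v = dx / dt;
    const weight = this.weights[fingering[i]] ?? 1.0;
    if (notes[i].isBlack) {
      // Black keys add a second, per-finger comfort factor.
      v /= weight * this.bfactor[fingering[i]];
    } else {
      v /= weight;
    }
    vmean += v;
  }
  return vmean / (depth - 1);
}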

The correction tables have these concrete values.

Finger	Name	weights (strength)	bfactor (black-key comfort)
1	Thumb	1.1	0.3
2	Index	1.0	1.0
3	Middle	1.1	1.1
4	Ring	0.9	0.8
5	Pinky	0.8	0.7

Because the update is v /= weight, a larger weight lowers the velocity cost, which makes the finger cheaper to use.
The thumb and middle finger score 1.1 as the easiest, the pinky 0.8 as the hardest.
In practice the thumb and middle finger do move quickly and independently, so the numbers are reasonable.

The interesting part is bfactor: the value drops to 0.3 only when the thumb is pressing a black key.
The thumb is shorter than the other fingers, and black keys sit further back than white keys, so using the thumb on a black key requires pushing the whole hand forward, breaking posture.
0.3 quantifies this awkwardness: dividing by 0.3 makes a thumb move onto a black key cost about 3.3x the same move onto a white key.
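
The effect of both tables can be read off as a multiplier on the raw velocity. The table values below are the ones quoted above; costMultiplier is a hypothetical helper written for this check, not a function from the tool.

```javascript
// Index 0 is unused so that finger numbers 1-5 index the tables directly.
const weights = [null, 1.1, 1.0, 1.1, 0.9, 0.8];
const bfactor = [null, 0.3, 1.0, 1.1, 0.8, 0.7];
const fingerNames = [null, "thumb", "index", "middle", "ring", "pinky"];

// Factor applied to dx / dt for a given finger and key colour.
function costMultiplier(finger, isBlack) {
  const divisor = isBlack ? weights[finger] * bfactor[finger] : weights[finger];
  return 1 / divisor;
}

for (let finger = 1; finger <= 5; finger++) {
  const white = costMultiplier(finger, false);
  const black = costMultiplier(finger, true);
  console.log(
    fingerNames[finger],
    white.toFixed(2),
    black.toFixed(2),
    (black / white).toFixed(2) // how much worse a black key is for this finger
  );
}
// The first line printed is: thumb 0.91 3.03 3.33
```

The black-to-white ratio is simply 1 / bfactor, so the middle finger (1.1) is the only finger for which a black key is slightly cheaper than a white one.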

What’s worth noticing about this cost function is that its parameters are not learned; they are hand-tuned constants.
It is a lookup table that puts numbers on a piano teacher’s heuristics, unlike RL approaches that estimate coefficients from training data.
The simplicity of “end user picks a hand size, tool just runs” is exactly what this training-free design enables.
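
To experiment with these constants outside the tool, here is a self-contained variant of the cost function. The one thing the article’s excerpt leaves undefined is where the finger starts each move; this sketch assumes the hand is relaxed around the finger that played the previous note, using the rest offsets (frest) from the Hand excerpt later in the article. That position model and the name averageVelocity are assumptions, not the tool’s actual logic.

```javascript
// Constants as quoted in the article; index 0 is unused.
const frest = [null, -7.0, -2.8, 0.0, 2.8, 5.6]; // resting fingertip offsets (cm)
const weights = [null, 1.1, 1.0, 1.1, 0.9, 0.8];
const bfactor = [null, 0.3, 1.0, 1.1, 0.8, 0.7];

function averageVelocity(fingering, notes) {
  let total = 0;
  for (let i = 1; i < notes.length; i++) {
    const finger = fingering[i];
    const prevFinger = fingering[i - 1];
    // Assumed position model: with prevFinger resting on the previous key,
    // `finger` hovers at that key plus the difference in rest offsets.
    const fingerPos = notes[i - 1].x + (frest[finger] - frest[prevFinger]);
    const dx = Math.abs(notes[i].x - fingerPos);
    const dt = Math.abs(notes[i].time - notes[i - 1].time) + 0.1; // never zero
    let v = dx / dt;
    v /= notes[i].isBlack ? weights[finger] * bfactor[finger] : weights[finger];
    total += v;
  }
  return total / (notes.length - 1);
}

// C-D-E as half-second notes, about 2.36 cm apart.
const phrase = [
  { x: 0.0, time: 0.0, isBlack: false },
  { x: 2.36, time: 0.5, isBlack: false },
  { x: 4.72, time: 1.0, isBlack: false },
];
console.log(averageVelocity([1, 2, 3], phrase) < averageVelocity([1, 1, 1], phrase)); // true
```

Hopping the thumb along all three keys forces it to travel a full key width each time, whereas 1-2-3 lets fingers that are already nearby take over, which is why the second fingering scores worse.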

Simultaneous notes (chords) collide on the time axis so dt becomes zero.
The +0.1 offset prevents division by zero, and chords get an artificial 50ms offset to spread them out along the time axis.
Chord fingerings are evaluated as the decomposed monophonic sequence.
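
A sketch of that decomposition, assuming notes carry time in seconds and x in cm, and that chord tones are spread from the lowest key upward. The ordering and the name spreadChords are assumptions; only the 50ms figure comes from the article.

```javascript
const CHORD_OFFSET_SEC = 0.05; // 50 ms between successive tones of one chord

// Turn simultaneous notes into a strictly ordered monophonic sequence.
function spreadChords(notes) {
  // Order by onset time, then left to right on the keyboard.
  const sorted = [...notes].sort((a, b) => a.time - b.time || a.x - b.x);
  let chordStart = null;
  let indexInChord = 0;
  return sorted.map((note) => {
    if (note.time === chordStart) {
      indexInChord += 1; // another tone of the same chord
    } else {
      chordStart = note.time; // a new onset begins
      indexInChord = 0;
    }
    return { ...note, time: note.time + indexInChord * CHORD_OFFSET_SEC };
  });
}

const triad = [
  { x: 9.4, time: 1.0 },
  { x: 0.0, time: 1.0 },
  { x: 4.7, time: 1.0 },
];
console.log(spreadChords(triad).map((n) => n.x)); // [ 0, 4.7, 9.4 ]
```

After spreading, adjacent chord tones are 0.05s apart, so with the +0.1 smoothing term their dt is 0.15 and any large dx between them turns into a very high velocity, which steers the search toward a different finger for each tone.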

The hand-size axis

Existing fingering tools — notably PianoPlayer, the Python predecessor that this JavaScript tool is a port of — also incorporate hand size.
A Hand class holds XXS–XXL presets, and a scale factor adjusts the following parameters in lockstep.

this.frest = [null, -7.0, -2.8, 0.0, 2.8, 5.6];  // relaxed fingertip positions (cm) for fingers 1-5
this.weights = [null, 1.1, 1.0, 1.1, 0.9, 0.8];
this.bfactor = [null, 0.3, 1.0, 1.1, 0.8, 0.7];
this.hf = Hand.size_factor(size);  // multiplier per XXS-XXL
this.max_span_cm = 21.0 * this.hf;       // max span from thumb to pinky
this.max_follow_lag_cm = 2.5 * this.hf;  // tolerated distance from target position
this.min_finger_gap_cm = 0.15 * this.hf; // minimum gap between adjacent fingers

frest gives the cm coordinate of each fingertip at rest, with the middle finger at origin (0.0), thumb 7cm to the left, pinky 5.6cm to the right.
This is the numerical model of “the hand at rest.”
At the M-size baseline, max_span_cm = 21.0 is the max thumb-to-pinky distance, a range comfortably covering a piano octave (about 16.5cm).

Changing hand size rescales hf, and every hf-multiplied span parameter above moves with it.
XXS gives hf ≈ 0.33 and a max span of about 7cm (a toddler’s hand). XXL gives hf ≈ 1.2 and a max span of about 25cm, close to the range attributed to Rachmaninoff reaching a 13th (≈26cm) with one hand.
Set the size to XXS and physically unreachable configurations all get pruned.
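
The scaling is plain multiplication, so the two presets quoted above are easy to check against an octave. The hf values (0.33 and 1.2) and the 21.0 / 2.5 / 0.15 base figures come from the article; handLimits and canSpan are hypothetical helper names.

```javascript
const OCTAVE_CM = 16.5; // approximate width of a piano octave

// Derive the size-dependent limits from a hand-size factor.
function handLimits(hf) {
  return {
    maxSpanCm: 21.0 * hf,      // thumb to pinky
    maxFollowLagCm: 2.5 * hf,  // tolerated distance from target position
    minFingerGapCm: 0.15 * hf, // minimum gap between adjacent fingers
  };
}

function canSpan(limits, distanceCm) {
  return distanceCm <= limits.maxSpanCm;
}

for (const [label, hf] of [["XXS", 0.33], ["XXL", 1.2]]) {
  const limits = handLimits(hf);
  console.log(label, limits.maxSpanCm.toFixed(2), canSpan(limits, OCTAVE_CM));
}
// XXS 6.93 false
// XXL 25.20 true
```

By the same arithmetic an octave needs hf of at least 16.5 / 21 ≈ 0.79, so any preset below that factor cannot take octave chords in one hand.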

For learners, this means measuring your own hand span, picking the closest preset between XXS and XXL, and possibly getting a realistic fingering that differs from the textbook’s.
The left hand uses the same algorithm with mirrored coordinates — write the right-hand logic and the left hand comes for free.
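
Mirroring can be as small as negating the x coordinate before calling the right-hand search. In this sketch planLeftHand and mirrorNotes are hypothetical names, and naiveRightHand is only a stand-in solver so the example runs; the real search would be passed in its place.

```javascript
// Flip the keyboard left to right so a left hand looks like a right hand.
function mirrorNotes(notes) {
  return notes.map((note) => ({ ...note, x: -note.x }));
}

// Finger numbers need no translation: the thumb is 1 on both hands.
function planLeftHand(notes, planRightHand) {
  return planRightHand(mirrorNotes(notes));
}

// Stand-in right-hand solver: thumb on the leftmost key, then one finger per
// key moving right. It exists only to make the mirroring visible.
function naiveRightHand(notes) {
  const xs = [...new Set(notes.map((n) => n.x))].sort((a, b) => a - b);
  return notes.map((n) => xs.indexOf(n.x) + 1);
}

const cde = [{ x: 1.18 }, { x: 3.54 }, { x: 5.89 }]; // C, D, E ascending
console.log(naiveRightHand(cde));               // [ 1, 2, 3 ]
console.log(planLeftHand(cde, naiveRightHand)); // [ 3, 2, 1 ]
```

The left hand ends its ascending run on the thumb, as expected, without any left-hand-specific rule in the solver.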

Where this fits in the academic lineage

Automatic piano fingering generation is a long-running topic in music information processing, with three main schools.

1. Rule-based + dynamic programming (1990s–): The origin is Parncutt 1997 (“An ergonomic model of keyboard fingering for melodic fragments”). Its 12 rules (max span between finger pairs, thumb-tuck conditions, black-key handling, etc.) are mapped to costs and minimized via dynamic programming.
It is limited to right-hand monophonic lines and short fragments, but it remains the baseline for later research.
It quantifies the historical conventions found in Couperin and Bach’s fingering notation.

2. Metaheuristics (2010s): Herremans 2015 applies tabu search and variable neighborhood search aimed at polyphonic scores.
For polyphony, DP’s state space explodes, so local search approximates the solution.
This lineage extended what’s computable in a different direction from DP.

3. Reinforcement learning (2020s): Since Ramoneda 2021 (“Piano Fingering with Reinforcement Learning”), research framing fingering as a Markov decision process and solving it via RL has proliferated.
A typical formulation takes “current hand position” as the state, “which finger for the next note” as the action, and “inverse of finger travel distance” as the reward.
A 2023 model-based RL paper replaces the Q-table with a hash table and uses prioritized sweeping.
More recently this line extends to research on having robot hands physically play piano (RP1M dataset etc.).

Where this tool sits: It’s a JavaScript port of PianoPlayer (Python, in development since the late 2010s), so in lineage it belongs to branch 1.
The difference is that it chose depth-limited backtracking rather than dynamic programming — a compromise that avoids state explosion while also dodging the local-optimum trap.
9-note lookahead is a middle-ground decision that matches neither Parncutt’s melodic-fragments assumption nor Herremans’ full-score polyphonic search.

Because it uses no learning, results come out the instant MusicXML is dropped in, and browser-only operation becomes feasible.
RL-based approaches need a pre-trained model, and model sizes are too large for convenient browser loading.
This is why the classical approach fits a web tool.

Applicability to other instruments

The same framework — “assign fingers to a note sequence under physical constraints” — isn’t specific to piano.

Guitar: Automatic guitar fingering has a long research history, with DP as the mainstream.
Implementations like guitar_dp exist.
State is a tuple of “string number + fret number + finger number + hand position (index finger position),” giving higher dimensionality than piano (string × fret is 2D plus 4 fingers).
Cost function design resembles piano’s: travel distance + finger independence + fret-jump penalty.
Methods like Path Difference Learning even use gradient descent on real tablature to learn cost function coefficients.

Other keyboard instruments: Organ adds foot pedals, making it a four-limb fingering problem across both hands and both feet.
Accordion has different physical models between left-hand bass buttons and right-hand keyboard.
All of these are extensible from the piano framework, but implementations are sparse.

String instruments (violin, etc.): Violin fingering research exists but has to couple with bowing, becoming a more complex multi-objective optimization than piano.

So the algorithmic skeleton of this tool (biomechanical cost + pruning + windowed search) isn’t instrument-dependent. Swap the physical constraints and the same framework transfers to other instruments.
Better to view it less as a refined piano-specific solution and more as one sample of a broader “performance motion optimization” framework.

Why browser-only matters

The author lists three advantages to being browser-only.

Privacy: the file is never sent to a server
No latency: no network round-trip, animation starts instantly
Zero setup: no install, just open a URL

Piano scores often include copyright-ambiguous data: scans of commercial method books, your own unpublished compositions, arrangements of copyrighted works.
Legal issues aside, these are files most people simply don’t want to upload to a server.
For the target audience, being browser-only is less a convenience than a near-necessity.

The tech stack is straightforward.
MusicXML parser (musicxml_io.js), React + Canvas 2D (with DPR-aware high-DPI rendering), Web Audio API sine-wave synthesis, requestAnimationFrame for the animation loop.
The core is a JavaScript port of the PianoPlayer engine.
No WebAssembly, no WebWorker.
The takeaway works as an informal benchmark: plain JavaScript handles the 9-note combinatorial search in under a second.
For anyone trying to build similar tooling, the confirmation that JavaScript’s performance is adequate for this kind of optimization workload is quietly important information.
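
The sine-wave synthesis in that stack needs very little code. The sketch below uses standard Web Audio API calls; midiToFrequency and playSine are hypothetical names, the gain values are arbitrary, and nothing here is taken from the tool’s source.

```javascript
// Equal-temperament pitch: MIDI note 69 is A4 at 440 Hz.
function midiToFrequency(midi) {
  return 440 * Math.pow(2, (midi - 69) / 12);
}

// Schedule one sine-wave note on a Web Audio AudioContext.
function playSine(audioCtx, midi, startTime, durationSec) {
  const osc = audioCtx.createOscillator();
  const gain = audioCtx.createGain();
  osc.type = "sine";
  osc.frequency.value = midiToFrequency(midi);
  // Start at a modest level and fade out to avoid a click at note end.
  gain.gain.setValueAtTime(0.2, startTime);
  gain.gain.exponentialRampToValueAtTime(0.001, startTime + durationSec);
  osc.connect(gain);
  gain.connect(audioCtx.destination);
  osc.start(startTime);
  osc.stop(startTime + durationSec);
}

// In a browser, after a user gesture:
//   const audioCtx = new AudioContext();
//   playSine(audioCtx, 60, audioCtx.currentTime, 0.5); // middle C for half a second
console.log(midiToFrequency(69)); // 440
```

Because notes are scheduled against the audio clock (audioCtx.currentTime) rather than animation frames, playback timing stays steady even if the canvas redraw stutters.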

Who this helps, and what to check before trying it

This tool helps:

Self-learners: people stuck because the textbook fingering doesn’t fit their hand
Piano teachers: who need to quickly show hand-size-appropriate fingerings to students
Arrangers: who find annotating their own arrangements with fingerings tedious
People deciding fingerings at the reading stage: who want fingerings visualized before sight-reading practice

Three things to check when trying it:

How hand-span selection changes the output: run the same score with XXS and XL, compare the diff
Chord handling: simultaneous notes are processed with a 50ms offset, so if chords look off, suspect the settings
Left-hand mode: implemented via coordinate mirroring, so for pieces requiring asymmetric configurations, compare outputs

There are limits too.
Backtracking only sees local optima inside the window, so it doesn’t make global judgments like “for this piece, this fingering pattern ties the whole thing together” across tens of bars.
It is probably more accurate to treat it as a different kind of tool from the phrasing-aware fingering decisions a skilled human pianist makes at the reading stage.

I actually dropped in the MusicXML of Bach’s Invention No. 1 (BWV 772) and ran it.
On playback, the pressed finger darkens on the keyboard, and the pitch name (A4, C5, etc.) of the pressed note appears at the fingertip.
The finger number (1–5) isn’t displayed directly — you have to track which finger is moving to read it.
The information is there, but it doesn’t register at a glance, and the screen reminded me of KEYBOARDMANIA in old arcades.

Which reminds me: no matter how much you grind GuitarFreaks or DrumMania, you don’t become able to play a real guitar or real drum kit. Electone players struggle with KEYBOARDMANIA too.
The skill of processing falling notes in a rhythm game and the fingering skill for a real instrument are separate abilities.
This tool sits somewhere similar. Watching the animation and being able to play with your own fingers are different skills.
What it gives you is an analysis — “for this score and this hand, this fingering is feasible” — not the motor memory, which you have to drill in separately.

It’s not a tool that’s fun to play with.
But confirmation that 9-note backtracking runs smoothly on a laptop is, for anyone looking to implement this algorithm themselves, a quietly useful piece of information.