Tested Q_rsqrt on an Apple M4 (Mac mini) and on Zen 3 (Ryzen 5800HS under WSL2). On the M4, -O2 alone already rewrites 1/sqrtf to frsqrte, which ties Q_rsqrt; on x86, clang needs -ffast-math to do the equivalent, and without it a 12x gap opens up. Hand-written NEON/SSE wrappers turn out to be slower. Also covered: the error after 0, 1, and 2 Newton iterations, and the Lomont constant.
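For readers who want to reproduce the Newton 0/1/2 comparison, here is a minimal sketch in C. It is not the benchmark code from the tests above. The helper names `q_rsqrt` and `max_rel_error` are hypothetical, and the two constants are the commonly cited values (0x5f3759df from Quake III, 0x5f375a86 from Lomont), assumed here rather than taken from the measurements. Sampling x as a perfect square keeps the reference value exact without calling into libm.

```c
#include <stdint.h>
#include <string.h>

/* Fast inverse square root with a selectable magic constant and a
 * selectable number of Newton-Raphson refinement steps (0, 1, 2, ...).
 * Hypothetical helper for illustration only. */
static float q_rsqrt(float x, uint32_t magic, int newton_steps)
{
    uint32_t bits;
    float y;

    memcpy(&bits, &x, sizeof bits);   /* reinterpret float bits without UB */
    bits = magic - (bits >> 1);       /* initial guess from the bit trick */
    memcpy(&y, &bits, sizeof y);

    for (int i = 0; i < newton_steps; ++i)
        y = y * (1.5f - 0.5f * x * y * y);   /* one Newton-Raphson step */

    return y;
}

/* Largest relative error over x in [1, 4), which spans one full period of
 * the approximation's error pattern. x is sampled as s*s with s in [1, 2),
 * so the exact answer is simply 1/s. */
static double max_rel_error(uint32_t magic, int newton_steps)
{
    double worst = 0.0;

    for (int i = 0; i < 1024; ++i) {
        float s = 1.0f + (float)i / 1024.0f;
        float x = s * s;              /* exact in float: fits in 22 bits */
        double exact = 1.0 / (double)s;
        double err = ((double)q_rsqrt(x, magic, newton_steps) - exact) / exact;

        if (err < 0.0)
            err = -err;
        if (err > worst)
            worst = err;
    }
    return worst;
}
```

Calling `max_rel_error(0x5f3759dfu, n)` for n = 0, 1, 2, and again with `0x5f375a86u`, gives a side-by-side view of how much each Newton step buys and how little the choice of constant changes the result. The grid is coarse (1024 samples), so treat the numbers as estimates rather than exact maxima.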