
Gradience: a tool that audits whether your LoRA rank is actually necessary


What Is Gradience?

Gradience is a tool that performs spectral analysis, specifically singular value decomposition, on adapter weight matrices after LoRA fine-tuning to measure how much of the allocated rank is actually being used. It is licensed under Apache 2.0.

Even if you train LoRA with rank=64, the decomposed weight matrix after training may show that only 15 dimensions are effectively being used. The remaining 49 dimensions may just be memorizing noise. Gradience makes that visible numerically.

What Does It Measure?

Gradience does not measure the loss curve during training or the training speed. Instead, it analyzes the trained adapter weight matrices after the fact. It mainly reports three metrics.

Stable Rank

This is the matrix’s effective dimensionality. It estimates how many dimensions are meaningfully used based on the energy distribution of the singular values.

stable_rank(M) = ||M||^2_F / ||M||^2_2 = (sum sigma_i^2) / sigma_1^2
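As a sketch of the definition above (my own code, not Gradience's implementation), the stable rank falls out directly from the singular values:

```python
import numpy as np

def stable_rank(M: np.ndarray) -> float:
    """Squared Frobenius norm divided by squared spectral norm."""
    s = np.linalg.svd(M, compute_uv=False)  # singular values, largest first
    return float(np.sum(s ** 2) / s[0] ** 2)

# A product of 64x8 and 8x64 factors has rank at most 8, so its
# stable rank can never exceed 8 despite the 64x64 output shape.
rng = np.random.default_rng(0)
delta_w = rng.normal(size=(64, 8)) @ rng.normal(size=(8, 64))
print(f"{stable_rank(delta_w):.2f}")  # somewhere between 1 and 8
```

Note that stable rank is always bounded above by the true rank, which is why it works as a conservative estimate of effective dimensionality.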

Energy Rank (k@90%)

This is the number of singular values needed to cover 90% of the matrix energy. In practice, it becomes a compression target: keep this many dimensions and you preserve about 90% of the information.
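A minimal way to compute k@90% from the same singular values (again a sketch, not the tool's code):

```python
import numpy as np

def energy_rank(M: np.ndarray, threshold: float = 0.90) -> int:
    """Smallest k whose top-k singular values carry `threshold`
    of the total energy (sum of squared singular values)."""
    s = np.linalg.svd(M, compute_uv=False)
    cumulative = np.cumsum(s ** 2) / np.sum(s ** 2)
    return int(np.searchsorted(cumulative, threshold) + 1)

# One dominant direction plus tiny noise -> k@90% of 1.
M = np.diag([10.0, 0.1, 0.1, 0.1])
print(energy_rank(M))  # 1
```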

Utilization

utilization = stable_rank / allocated_rank

This shows actual usage relative to the rank you allocated. If the number is 0.3, then roughly 70% of the rank is going unused and is a candidate for compression.
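Putting the three metrics together, a hypothetical per-matrix audit (the function name and return shape are mine, not Gradience's API) could look like:

```python
import numpy as np

def audit_matrix(delta_w: np.ndarray, allocated_rank: int) -> dict:
    """Report all three metrics for one adapter update Delta W = B @ A."""
    s = np.linalg.svd(delta_w, compute_uv=False)
    energy = np.cumsum(s ** 2) / np.sum(s ** 2)
    stable = float(np.sum(s ** 2) / s[0] ** 2)
    return {
        "stable_rank": stable,
        "k_at_90": int(np.searchsorted(energy, 0.90) + 1),
        "utilization": stable / allocated_rank,
    }

# Simulate a rank-64 allocation where only 8 directions carry signal.
rng = np.random.default_rng(1)
sigmas = np.concatenate([np.full(8, 5.0), np.full(56, 0.05)])
U, _ = np.linalg.qr(rng.normal(size=(256, 64)))
V, _ = np.linalg.qr(rng.normal(size=(256, 64)))
delta_w = U @ np.diag(sigmas) @ V.T
report = audit_matrix(delta_w, allocated_rank=64)
print(report)  # stable_rank ~ 8, k_at_90 = 8, utilization ~ 0.125
```

In this toy case the audit recovers exactly what was planted: about 8 useful dimensions out of 64, i.e. roughly 87% of the rank going unused.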

Experimental Results on Mistral-7B + GSM8K

The original article fine-tunes Mistral-7B on GSM8K, a math problem benchmark.

It first runs a probe training at rank=64 for 1,200 steps and audits that model with Gradience. The audit suggests that rank=32 should be enough, so the article then retrains and evaluates several compressed variants across three seeds.

Variant                   Mean accuracy   Compression
probe (r=64)              28.7%           -
uniform_median (r=32)     28.8%           50%
uniform_p90 (r=32)        33.7%           50%
per_layer                 31.8%           ~3%

uniform_p90 at rank 32 outperformed the original rank-64 probe by about five points on average. The conclusion is that reducing rank acted as a form of regularization and helped prevent noise fitting.

How the suggestions work

The audit produces two main kinds of rank recommendations:

  • Median suggestion: the median stable rank across all layers
  • P90 suggestion: the 90th percentile, leaving some margin to cover most layers

In the experiment, the median suggestion was too aggressive, while the p90 suggestion was more stable.
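The aggregation itself is simple. A sketch of how such suggestions could be derived from per-layer stable ranks (my own helper, not the tool's API):

```python
import numpy as np

def suggest_ranks(per_layer_stable_ranks) -> dict:
    """Median vs. 90th-percentile rank suggestions across layers."""
    ranks = np.asarray(per_layer_stable_ranks, dtype=float)
    return {
        "median": int(np.ceil(np.median(ranks))),
        "p90": int(np.ceil(np.percentile(ranks, 90))),
    }

# Skewed per-layer usage: the median undershoots the busiest layers,
# while p90 leaves headroom for them.
print(suggest_ranks([9, 11, 12, 14, 18, 26, 31]))  # {'median': 14, 'p90': 28}
```

When the distribution of per-layer stable ranks is skewed like this, a uniform rank set at the median starves the heaviest layers, which matches the experiment's finding that the median suggestion was too aggressive.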

Usage

pip install gradience
gradience audit --peft-dir ./your-adapter

It can also be integrated into Hugging Face Trainer as a callback.

from gradience import GradienceCallback

trainer = Trainer(
    model=model,
    args=training_args,
    callbacks=[GradienceCallback()]
)

Can It Be Used for Image LoRA?

This is the part I was most curious about, but right now the answer is unclear.

Gradience has been tested on text LLMs, specifically Mistral with a math task. There is no validation yet for image LoRA workflows such as Stable Diffusion or FLUX. The tool also assumes the Hugging Face PEFT format, so it is not clear whether it can be used directly on safetensors adapters produced by tools like kohya_ss.

That said, singular value decomposition of LoRA weight matrices is not tied to a specific modality, so the idea itself should transfer. It would be interesting if image-LoRA workflows could reveal things like “we trained with rank 128, but rank 32 would have been enough.”

The problem of image LoRA failing to generate outputs that resemble the training data has many factors besides rank, including learning rate, step count, caption quality, and whether regularization images are used. Gradience is not a silver bullet. Still, it looks like a useful way to eliminate one uncertainty in the iteration loop: whether rank is actually sufficient.

References