Password Rotation Examined Through Probability

Got into an argument at work over whether to build a periodic password rotation feature. The other side says “it’s safer if users can rotate regularly.” I think it’s probabilistically almost useless.

Of course, not changing a leaked password is bad. But that’s “change it because there’s evidence of compromise,” not “change it because 90 days have passed.”

NIST SP 800-63B takes the position that verifiers should not require periodic password changes, but should force a change when there’s evidence of authenticator compromise. Microsoft 365’s admin guidance also recommends setting cloud-only accounts to never expire passwords. The reasoning is practical: the more rules you pile on, the weaker the passwords people choose.

The Window Where Rotation Actually Helps Is Narrow

Periodic password rotation prevents roughly this one scenario.

flowchart TD
    A[Password is leaked] --> B[Not yet detected]
    B --> C[Attacker hasn't used it yet]
    C --> D[Rotation date arrives]
    D --> E[Old password is invalidated]

In other words, rotation only saves you when the attacker has obtained the password but hasn’t logged in yet, and the user happens to change it in time.

In the following scenarios, rotation doesn’t help much.

Scenario	Effect of rotation
Attacker logs in right after phishing	They’re in before rotation day
Password is reused across services	Leaked via another service anyway
Device is infected with malware	New password gets captured too
Session cookie is stolen	Entry via a different vector entirely
Leak is already detected	Immediate change needed, not scheduled

Saying “rotation is pointless” is a bit sloppy. It has value. But the window where it works is narrow: the password has leaked, it hasn’t been used yet, and it hasn’t been detected either.

The probability that rotation invalidates a password before the attacker uses it is a product of three conditions.

$$P_{\text{save}} = P_{\text{leak}} \times P_{\text{undetected}} \times P_{\text{unused until rotation}}$$

Suppose the probability of password leakage per 90-day cycle is 1%, the probability the leak goes undetected is 50%, and the probability the attacker doesn’t use it before rotation day is 10%.

$$0.01 \times 0.5 \times 0.1 = 0.0005$$

That’s 0.05% per account per 90-day cycle. Double every assumption and the product grows eightfold, to 0.02 × 1.0 × 0.2 = 0.4%, which is still well under 1%. Even with generous inputs, the result stays small.

For an organization with 10,000 accounts running 4 rotations per year, the expected number of accounts saved by rotation is:

$$10{,}000 \times 0.0005 \times 4 = 20$$

20 per year. Not zero, but considering the helpdesk calls and lockouts that come with password resets, whether it’s worth it organizationally is questionable.
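The two formulas above are easy to re-run with your own estimates. A minimal sketch, where the function names are mine and the input values are simply the assumed figures from the example, not measured data:

```python
def p_save(p_leak: float, p_undetected: float, p_unused: float) -> float:
    """Chance that one rotation cycle rescues one account (product of the three terms)."""
    return p_leak * p_undetected * p_unused


def expected_saves_per_year(accounts: int, rotations_per_year: int, p: float) -> float:
    """Expected number of accounts rescued per year across the organization."""
    return accounts * rotations_per_year * p


# Assumed inputs, taken from the example in the text.
p = p_save(p_leak=0.01, p_undetected=0.5, p_unused=0.1)
saves = expected_saves_per_year(accounts=10_000, rotations_per_year=4, p=p)

print(f"P_save per account per cycle: {p:.2%}")
print(f"Expected accounts saved per year: {saves:.0f}")
```

Swapping in your organization's own guesses for the three probabilities is a one-line change, which makes it a cheap way to ground the argument in numbers both sides can inspect.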

With 90-day rotation, the average time from leak to invalidation is 45 days, assuming a leak is equally likely on any day of the cycle. With 30-day rotation, it’s 15 days. Looking at these numbers alone, shorter seems stronger.

But if attackers use stolen passwords within minutes to hours, shortening from 45 to 15 days barely makes a difference. It works against “old leaked credential lists that might be used someday” but not against “steal now, use now” attacks.

Google’s research shows that phished credentials are used for login attempts almost immediately. With rotation cycles of 30 or 90 days set against attackers who act within minutes to hours, rotation is roughly 3 to 5 orders of magnitude slower than the attack.
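To put a rough number on the race between rotation day and the attacker, here is a toy model. Everything about it is my assumption, not something from Google's research: the leak lands at a uniformly random point in the cycle, and the attacker's delay before first use is exponentially distributed with a chosen mean.

```python
import math


def p_rotation_wins(cycle_days: float, mean_delay_days: float) -> float:
    """P(rotation day arrives before the attacker's first login).

    Assumed model: time remaining until rotation is Uniform(0, cycle_days),
    attacker delay is Exponential with the given mean. Averaging
    exp(-r / mean) over r gives (1 - exp(-x)) / x with x = cycle / mean.
    """
    x = cycle_days / mean_delay_days
    return (1.0 - math.exp(-x)) / x


for label, mean_delay in (("1 hour", 1 / 24), ("30 days", 30.0)):
    for cycle in (90, 30):
        p = p_rotation_wins(cycle, mean_delay)
        print(f"mean attacker delay {label:>7}, {cycle}-day cycle: {p:.2%}")
```

Under this model, an attacker with a mean delay of one hour loses the race well under 1% of the time for both cycle lengths, while a slow attacker with a mean delay of 30 days loses it tens of percent of the time. That matches the qualitative point: rotation only bites against credentials that sit unused for a long time.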

User Behavior Degradation Belongs in the Probability Too

What’s often missing from the rotation debate is the probability that forced changes make users weaker.

People don’t come up with entirely new strong passwords every 90 days. A significant fraction will increment a trailing number, insert a season name, recycle old patterns, write it on a sticky note, or reuse the same password on other services.

Research from UNC Chapel Hill (Zhang et al., 2010) found that in an environment with forced periodic changes, 41% of new passwords could be cracked from the previous password within 3 seconds. Most transformation patterns were mechanical: incrementing a trailing number, shifting uppercase positions, swapping symbols. If an attacker knows the rules, the search space shrinks dramatically.

In formula terms, the conditional probability of an attacker guessing the new password given the old one is $P(\text{crack} \mid \text{old\_pw}) \approx 0.41$ (offline, within 3 seconds). Even in online attacks, $P \approx 0.17$ within 5 attempts. So even after rotation invalidates the old password, an attacker who holds it still recovers the new one about 41% of the time in the offline setting.
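The same mechanical transforms can be turned around defensively: at change time the user types both the old and the new password, so the service can refuse a new password that is a trivial variant of the old one. The sketch below is hypothetical and covers only a handful of transforms I picked for illustration; it is not the algorithm from the Zhang et al. paper.

```python
import re

# Hypothetical symbol substitutions; real users have many more habits than this.
SYMBOL_SWAPS = {"!": "@", "@": "#", "#": "$", "$": "%"}


def mechanical_variants(old: str) -> set[str]:
    """Generate a few predictable variants of an old password."""
    variants: set[str] = set()

    # Increment or decrement the last run of digits (Summer2023! -> Summer2024!).
    match = re.search(r"(\d+)(\D*)$", old)
    if match:
        digits, suffix = match.group(1), match.group(2)
        stem = old[: match.start()]
        for delta in (1, -1):
            number = int(digits) + delta
            if number >= 0:
                variants.add(stem + str(number).zfill(len(digits)) + suffix)

    # Append a single digit.
    for digit in "0123456789":
        variants.add(old + digit)

    # Flip the case of the first character.
    if old:
        variants.add(old[0].swapcase() + old[1:])

    # Swap one symbol for a neighbouring one.
    for index, char in enumerate(old):
        if char in SYMBOL_SWAPS:
            variants.add(old[:index] + SYMBOL_SWAPS[char] + old[index + 1:])

    variants.discard(old)
    return variants


def is_trivial_derivative(old: str, new: str) -> bool:
    """True if the new password is one mechanical step away from the old one."""
    return new in mechanical_variants(old)


print(is_trivial_derivative("Summer2023!", "Summer2024!"))
print(is_trivial_derivative("Summer2023!", "vkrmxjqwpbtf"))
```

The candidate set for a typical password here is only a few dozen strings, which is the point: once the old password is known, a rule-following user's new one lives in a tiny search space.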

Microsoft’s guidance also states that rules requiring password changes tend to weaken password quality. NIST’s shift to “don’t force periodic changes” isn’t just about being nice to users. Forced rotation creates patterns that attackers can predict.

From a probability perspective, the comparison looks like this.

Policy	What increases	What decreases
With periodic rotation	Weak derivative passwords, support tickets, lockouts, reuse	Validity period of undetected leaks
Without periodic rotation	Duration old passwords remain valid	Change fatigue, lazy derivatives, written-down passwords

If you’re going to implement rotation, the benefit of “shortening the validity period of undetected leaks” needs to exceed the cost of “users drifting toward weaker practices.” Implementing it without estimating this tradeoff, just because “it seems safer,” is a risky spec decision.

When Passwords Should Be Changed

The case for periodic rotation is thin, but the password change feature itself is still needed.

What’s needed is event-driven changes, not calendar-driven ones.

Trigger	Response
Service-side breach	Force change for affected users
Match against known-breached password lists	Reject at login or change time
Suspicious login	Additional authentication, notification, change request if needed
User self-report	Invalidate existing sessions and change
Admin account recovery	Reset flow, not a temporary password

The OWASP Authentication Cheat Sheet also calls for credential rotation when breaches or compromises are confirmed, while discouraging periodic change requirements. NIST, Microsoft, and OWASP are consistent on this point.
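The trigger table above maps naturally onto a small dispatch structure. This is a sketch with hypothetical names (`Trigger`, `Action`, `RESPONSES`); the actions are labels only, and wiring them to real handlers is left out.

```python
from enum import Enum, auto


class Trigger(Enum):
    SERVICE_BREACH = auto()
    BREACHED_PASSWORD_MATCH = auto()
    SUSPICIOUS_LOGIN = auto()
    USER_SELF_REPORT = auto()
    ADMIN_RECOVERY = auto()


class Action(Enum):
    FORCE_CHANGE = auto()
    REJECT_PASSWORD = auto()
    STEP_UP_AUTH = auto()
    NOTIFY_USER = auto()
    INVALIDATE_SESSIONS = auto()
    START_RESET_FLOW = auto()


# One row per trigger, mirroring the table. There is intentionally no
# "N days have passed" entry: nothing here fires on the calendar.
RESPONSES: dict[Trigger, tuple[Action, ...]] = {
    Trigger.SERVICE_BREACH: (Action.FORCE_CHANGE,),
    Trigger.BREACHED_PASSWORD_MATCH: (Action.REJECT_PASSWORD,),
    Trigger.SUSPICIOUS_LOGIN: (Action.STEP_UP_AUTH, Action.NOTIFY_USER),
    Trigger.USER_SELF_REPORT: (Action.INVALIDATE_SESSIONS, Action.FORCE_CHANGE),
    Trigger.ADMIN_RECOVERY: (Action.START_RESET_FLOW,),
}


def actions_for(trigger: Trigger) -> tuple[Action, ...]:
    """Look up the ordered responses for a security event."""
    return RESPONSES[trigger]


for trigger in Trigger:
    names = ", ".join(action.name for action in actions_for(trigger))
    print(f"{trigger.name}: {names}")
```

Keeping the mapping as data makes the policy reviewable in one place, and makes it obvious in a spec discussion that adding a calendar trigger would be a deliberate new row rather than a default.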

Better Expected Value by Investing Beyond Passwords

For the same engineering effort, there are things to do before building a rotation screen.

Allow sufficient password length. Block common and breached passwords. Rate-limit login attempts. Don’t interfere with password managers. Implement MFA (multi-factor authentication). Notify on suspicious logins. Decide how to handle existing sessions on password change.
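The first two items, sufficient length and blocking breached passwords, fit in a few lines. A minimal sketch: the thresholds and the tiny in-memory blocklist are assumptions for illustration, and a real deployment would check against a much larger breached-password corpus.

```python
MIN_LENGTH = 12    # assumed threshold; pick per your own risk model
MAX_LENGTH = 128   # assumed generous cap so password managers are not blocked


def check_password(candidate: str, breached: set[str]) -> list[str]:
    """Return a list of reasons to reject the candidate (empty list = accept)."""
    problems = []
    if len(candidate) < MIN_LENGTH:
        problems.append(f"shorter than {MIN_LENGTH} characters")
    if len(candidate) > MAX_LENGTH:
        problems.append(f"longer than {MAX_LENGTH} characters")
    if candidate.lower() in breached:
        problems.append("appears in the breached-password list")
    # Deliberately no uppercase/digit/symbol rules: length and the
    # blocklist are the only gates in this sketch.
    return problems


# Tiny stand-in blocklist (lowercased); a real one would hold millions of entries.
breached = {"password1234", "qwertyuiop12"}

print(check_password("password1234", breached))
print(check_password("short", breached))
print(check_password("vkrmxjqwpbtf", breached))
```

Returning reasons rather than a bare boolean lets the change form tell the user exactly what to fix, which matters when the goal is to avoid pushing people toward lazy workarounds.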

I’ve previously written about implementing TOTP authentication in your own service. TOTP itself isn’t a phishing-resistant method, but it’s significantly better than passwords alone. Microsoft’s statistics show that over 99.9% of automated attacks against MFA-enabled accounts are blocked. Compared to rotation’s contribution at the 0.05% order, MFA is over 3 orders of magnitude more efficient for the same effort. For even stronger protection, move toward phishing-resistant methods like passkeys or WebAuthn.

This kind of spec isn’t about building an unbreakable system. It’s about where you place attack costs and user burden. As I wrote in an article on voting system design, even identity verification can’t reduce fraud to zero. Authentication is the same: you can only decide what probability of residual risk you’ll accept.

2FA and 2SA (Two-Step Authentication) Are Different Things

When MFA comes up, I often see 2FA (two-factor authentication) and 2SA (two-step authentication) treated as the same thing. In practice, these offer different levels of defense.

2FA means two different types of authentication factors are used. Authentication factors fall into three categories.

Factor	Description	Examples
Knowledge (Something you know)	Information only the user knows	Password, PIN
Possession (Something you have)	A physical item the user has	Smartphone, security key, smart card
Biometric (Something you are)	A physical characteristic of the user	Fingerprint, face, iris

2FA combines two different types from these categories. With password (knowledge) + TOTP app (possession), a leaked password alone isn’t enough. The attacker also needs access to the phone.

2SA, on the other hand, only means that there are two authentication steps. The steps do not have to use different factor types, and the second factor does not have to be strong. For example, entering a password and then an SMS confirmation code looks like knowledge plus possession, but there is a subtlety: if the SMS can be intercepted via a SIM swap attack (where the attacker takes over the phone number), the code no longer proves possession of the phone, so the possession factor is weak. NIST SP 800-63B also classifies SMS-based OTP as a “restricted authenticator” with lower priority.

Ranked by realistic strength, it looks like this.

flowchart LR
    A["Password only"] --> B["Password +<br/>SMS OTP"]
    B --> C["Password +<br/>TOTP app"]
    C --> D["Password +<br/>Security key"]
    D --> E["Passkey<br/>(passwordless)"]

Password + SMS OTP is a common 2SA configuration, but judged as 2FA its possession factor is weak: SMS can be compromised through SIM swapping (hijacking the phone number) or SS7 interception (exploiting vulnerabilities in the telephone signaling protocol to intercept SMS), neither of which requires touching the phone. With TOTP apps or security keys, the factor is tied to the device rather than to a phone number, so the factors stay independent unless the attacker physically takes the device.

When someone says “add two-step authentication” in a spec discussion, evaluate based on factor independence, not step count. Calling SMS OTP “2FA” and feeling secure is the most dangerous outcome.
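Evaluating by factor independence instead of step count can be made mechanical. The sketch below uses hypothetical authenticator names and a hand-picked list of weak possession factors; both are assumptions for illustration, not a standard taxonomy.

```python
from enum import Enum


class Factor(Enum):
    KNOWLEDGE = "knowledge"
    POSSESSION = "possession"
    BIOMETRIC = "biometric"


# Hypothetical authenticator names mapped to the three factor categories.
AUTHENTICATOR_FACTOR = {
    "password": Factor.KNOWLEDGE,
    "pin": Factor.KNOWLEDGE,
    "sms_otp": Factor.POSSESSION,
    "totp_app": Factor.POSSESSION,
    "security_key": Factor.POSSESSION,
    "fingerprint": Factor.BIOMETRIC,
}

# Possession factors that can be taken over without touching the device.
WEAK_POSSESSION = {"sms_otp"}


def classify(steps: list[str]) -> str:
    """Label a login flow by factor independence, not by number of steps."""
    if len(steps) < 2:
        return "single-step"
    factors = {AUTHENTICATOR_FACTOR[step] for step in steps}
    if len(factors) < 2:
        return "2SA (same factor type)"
    if any(step in WEAK_POSSESSION for step in steps):
        return "2FA on paper, weak possession factor"
    return "2FA"


for flow in (["password"], ["password", "pin"],
             ["password", "sms_otp"], ["password", "totp_app"]):
    print(" + ".join(flow), "->", classify(flow))
```

Two steps from the same category, such as password plus PIN, come out as 2SA, and password plus SMS OTP is flagged separately so nobody in the spec meeting files it under the same label as password plus security key.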

Character-Class Requirements Also Erode Entropy

“Must contain uppercase, lowercase, numbers, and symbols” is another common requirement. In theory, more character types expand the search space. In practice, user behavior says otherwise.

Choosing an 8-character password completely at random from 95 printable ASCII characters gives $95^8 \approx 6.6 \times 10^{15}$ combinations, about 52.6 bits of entropy. Adding a constraint requiring all 4 character types doesn’t narrow the theoretical space much if generation is truly random.

But most users forced to include all character types capitalize the first letter of an English word, append 2 digits, and add ! or @ at the end. The effective search space becomes: dictionary word count × digit combinations × symbol choices. With 20,000 common words, 100 two-digit suffixes (00-99), and 33 symbols:

$$20{,}000 \times 100 \times 33 = 6.6 \times 10^{7}$$

That’s about 26 bits of entropy, over 11 bits weaker than a random lowercase 8-character password at $26^8 \approx 2.1 \times 10^{11}$ (about 37.6 bits). In raw counts, the gap is a factor of roughly 3,000 ($2.1 \times 10^{11} / 6.6 \times 10^{7} \approx 3{,}200$).

Character-class requirements maintain the theoretical space but bias the distribution humans actually choose, lowering effective entropy. NIST SP 800-63B explicitly states that composition rules should not be imposed.

To reliably increase password strength, raising the minimum length is more effective than requiring character types.

Condition	Combinations	Entropy
4-type forced 8-char (patterned)	$6.6 \times 10^{7}$	~26 bits
Lowercase-only random 8-char	$2.1 \times 10^{11}$	~37.6 bits
Lowercase-only random 12-char	$9.5 \times 10^{16}$	~56.4 bits
Lowercase-only random 16-char	$4.4 \times 10^{22}$	~75.2 bits
95-type random 8-char	$6.6 \times 10^{15}$	~52.6 bits

A patterned 8-character password forced to include all 4 character types is $2^{30} \approx 10^{9}$ times weaker than random lowercase 12 characters. “Include uppercase and symbols” raises attacker search cost far less than “make it 12+ characters.”
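The entropy column can be reproduced directly from the combination counts. A short sketch; the 20,000-word dictionary and 33-symbol figures are the assumed values from the text, and the labels are mine.

```python
import math


def entropy_bits(combinations: int) -> float:
    """Entropy in bits of a uniform choice among `combinations` options."""
    return math.log2(combinations)


SPACES = {
    "4-type forced 8-char (patterned)": 20_000 * 100 * 33,  # word x 2 digits x symbol
    "lowercase-only random 8-char": 26 ** 8,
    "lowercase-only random 12-char": 26 ** 12,
    "lowercase-only random 16-char": 26 ** 16,
    "95-type random 8-char": 95 ** 8,
}

for label, count in SPACES.items():
    print(f"{label:<34} {count:.1e}  ~{entropy_bits(count):.1f} bits")

# Gap between the patterned password and random lowercase 12 characters.
gap = entropy_bits(26 ** 12) - entropy_bits(20_000 * 100 * 33)
print(f"gap: {gap:.1f} bits")
```

Because each extra random lowercase character adds about 4.7 bits, four more characters add more entropy than the jump from a 26-symbol to a 95-symbol alphabet at fixed length 8.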

GPU Crack Times

Here is how the entropy differences play out in real attacks, using figures converted from Hashcat benchmarks. A single RTX 4090 achieves roughly 164 billion hashes/sec for MD5 and about 5,750 hashes/sec for bcrypt (cost=10). Algorithm choice alone changes speed by 7 orders of magnitude.

Full search times for each password type at these speeds:

Password type	Entropy	MD5	bcrypt (cost=10)
4-type forced patterned 8-char	~26 bits	0.0004 sec	~3 hours
Lowercase-only random 8-char	~37.6 bits	~1.3 sec	~1.2 years
Lowercase-only random 12-char	~56.4 bits	~6.7 days	~520,000 years
95-type random 8-char	~52.6 bits	~11 hours	~37,000 years

A user told to “include uppercase, numbers, and symbols” who creates something like Password1! falls inside the patterned search space, which is exhausted in about 3 hours even with bcrypt. A random lowercase 12-character string like vkrmxjqwpbtf takes nearly a week to exhaust even when stored as MD5.
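The table values are just search-space size divided by hash rate. A sketch using the two rates quoted above as given inputs; it computes the time to exhaust the whole space on a single GPU, so the average time to a hit would be about half.

```python
MD5_RATE = 164e9      # hashes/sec, figure quoted in the text
BCRYPT_RATE = 5_750   # hashes/sec at cost=10, figure quoted in the text

SPACES = {
    "patterned 8-char": 20_000 * 100 * 33,
    "lowercase random 8-char": 26 ** 8,
    "lowercase random 12-char": 26 ** 12,
    "95-type random 8-char": 95 ** 8,
}


def full_search_seconds(combinations: int, hashes_per_second: float) -> float:
    """Time to try every candidate once at a fixed hash rate."""
    return combinations / hashes_per_second


def humanize(seconds: float) -> str:
    """Render a duration in the largest unit that fits."""
    for unit, size in (("years", 365 * 86400), ("days", 86400), ("hours", 3600)):
        if seconds >= size:
            return f"{seconds / size:,.1f} {unit}"
    return f"{seconds:.4f} sec"


for label, count in SPACES.items():
    md5 = humanize(full_search_seconds(count, MD5_RATE))
    bcrypt = humanize(full_search_seconds(count, BCRYPT_RATE))
    print(f"{label:<26} MD5: {md5:>18}   bcrypt: {bcrypt:>18}")
```

Changing `BCRYPT_RATE` to reflect a higher cost factor or a multi-GPU rig shifts every row by the same ratio, which is why the ordering between patterned and random passwords does not depend on the attacker's hardware.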

What matters isn’t the number of character types but whether the password follows a pattern attackers can predict. The constraint meant to widen the space ends up making the attacker’s job easier.

How I’d Spec This

If it were up to me, I’d scope the password spec like this.

Spec	Decision
User-initiated voluntary change	Include
Forced change on breach / suspicious login	Include
Password expiration	Don’t include by default
Admin bulk forced change	Include for incident response
Session invalidation on change	Optional, or forced for high-risk
Blocking breached passwords	Include
Character-class enforcement	Don’t include. Control via minimum length
MFA	Include wherever possible

The point is not to zero out “the ability to change passwords,” but to stop “change passwords because the clock says so.” Making this distinction makes the conversation much easier.

If regulations or contracts explicitly mandate periodic rotation, you comply within that scope. But if the spec decision is yours, the probability says to invest in detection, notification, MFA, and breached-password blocking rather than periodic rotation.