Backtest Framework for Mean Reversion Strategies

Build and validate mean reversion strategies with a rigorous backtest framework. Test entry/exit logic, z-score bands, and holding periods before risking capital.

Mean reversion strategies fail not because the statistical edge disappears but because traders deploy them without understanding the half-life of the spread. Academic research on equity pairs shows that roughly 60% of cointegrated pairs break down within 12 months of identification. A backtest framework built specifically for mean reversion surfaces that decay before it costs real capital.

The risk profile of mean reversion is structurally different from trend-following. You’re selling strength and buying weakness into an ongoing move, which means drawdown duration is your primary threat — not peak drawdown magnitude. A generic backtester that tracks Sharpe and max drawdown misses the metrics that actually govern whether a mean reversion book survives: time-in-drawdown, half-life of reversion, and hit rate across z-score thresholds.

This page covers the components a mean reversion backtest framework needs, from spread construction and z-score normalization to entry/exit logic, holding period optimization, and regime filtering. Each section lists the specific parameters to test, and several include a ready-to-use prompt for running the analysis against historical data.

Spread Construction and Stationarity Testing

The foundation of any mean reversion backtest is a stationary spread. Whether you’re trading a single-asset z-score against its rolling mean or a synthetic spread between two correlated instruments, the spread must pass an Augmented Dickey-Fuller test before any entry logic is applied. Skipping this step is the single most common reason mean reversion backtests look strong in-sample and collapse out-of-sample.

Hedge ratio stability matters as much as the initial cointegration result. Use a rolling Engle-Granger or Johansen procedure with a 60-day recalibration window. A static hedge ratio calculated on 3 years of data and applied forward assumes a structural relationship that equity pairs rarely maintain through earnings cycles, index rebalances, and sector rotation.
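
The rolling recalibration described above can be sketched as follows. This is a minimal illustration rather than a full Engle-Granger or Johansen procedure: it re-estimates a plain OLS hedge ratio on the trailing 60 bars using only the Python standard library. The function names are hypothetical, and the inputs are assumed to be aligned lists of daily prices for the two legs.

```python
from statistics import fmean

def ols_beta(x, y):
    """Slope of y regressed on x with an intercept (ordinary least squares)."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx

def rolling_hedge_ratio(x, y, window=60):
    """Hedge ratio at each bar, estimated only from the previous `window` bars.

    Entries before the first full window are None. The estimate at index t
    uses bars t-window .. t-1, so it never sees the bar it is applied to.
    """
    betas = [None] * len(x)
    for t in range(window, len(x)):
        betas[t] = ols_beta(x[t - window:t], y[t - window:t])
    return betas

def spread_series(x, y, betas):
    """Spread y - beta * x wherever a hedge ratio is available."""
    return [None if b is None else yi - b * xi
            for xi, yi, b in zip(x, y, betas)]
```

Estimating the ratio from bars strictly before t keeps the resulting spread free of look-ahead, which matters once the spread feeds the z-score and entry logic.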

For single-asset mean reversion — trading a stock against its own Bollinger Band or VWAP deviation — stationarity is assumed but should still be tested on a rolling basis. Stocks in trending regimes exhibit persistent autocorrelation that breaks the mean reversion assumption entirely.

  • Run ADF test on the spread at p < 0.05 before including any pair in the backtest universe
  • Recalculate hedge ratios on a rolling 60-day window, not statically
  • Track half-life using an Ornstein-Uhlenbeck fit — target half-lives between 5 and 30 days
  • Reject pairs where the half-life exceeds your maximum intended holding period
  • Test stationarity separately on in-sample and out-of-sample windows
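
The half-life screen in the checklist above can be sketched with a discrete AR(1) fit, which is a common way to approximate an Ornstein-Uhlenbeck process on daily data. The function names and the default 5 to 30 day bounds (taken from the list above) are illustrative. The ADF test itself is not implemented here; a statistics library would normally supply it.

```python
import math
from statistics import fmean

def half_life(spread):
    """Estimate the reversion half-life (in bars) from an AR(1) fit.

    Regresses the change in the spread on its lagged level:
        s[t] - s[t-1] = a + b * s[t-1] + noise
    and converts the implied AR(1) coefficient phi = 1 + b into a half-life.
    Returns math.inf when phi falls outside (0, 1), where the formula
    does not describe a smooth decay toward the mean.
    """
    lagged = spread[:-1]
    delta = [cur - prev for prev, cur in zip(spread[:-1], spread[1:])]
    ml, md = fmean(lagged), fmean(delta)
    sxx = sum((l - ml) ** 2 for l in lagged)
    sxy = sum((l - ml) * (d - md) for l, d in zip(lagged, delta))
    phi = 1.0 + sxy / sxx
    if not 0.0 < phi < 1.0:
        return math.inf
    return -math.log(2) / math.log(phi)

def passes_half_life_screen(spread, min_days=5, max_days=30, max_hold_days=None):
    """Apply the half-life bounds and, optionally, the maximum holding period."""
    hl = half_life(spread)
    if not min_days <= hl <= max_days:
        return False
    return max_hold_days is None or hl <= max_hold_days
```

For example, a spread that decays by 10% of its deviation per day (phi = 0.9) has a half-life of about 6.6 days, so it passes the default screen but fails if the maximum intended hold is 5 days.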

Z-Score Band Calibration and Entry Logic

Entry thresholds in mean reversion are not one-size-fits-all. A z-score of ±2.0 is a default, not a rule. The correct threshold depends on the spread’s volatility regime and the half-life. Tighter bands on fast-reverting spreads generate higher trade frequency but lower average profit per trade. Wider bands on slow-reverting spreads reduce frequency and increase holding period risk. Your backtest framework needs to sweep both dimensions simultaneously.
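
A threshold sweep starts from a rolling z-score and a count of how often each band is crossed. The sketch below is a minimal version under stated assumptions: the 20-bar lookback is a placeholder, the z-score window includes the current bar, and the function names are hypothetical.

```python
from statistics import fmean, stdev

def rolling_zscore(spread, window=20):
    """Z-score of each bar against the trailing `window` bars (current bar included).

    Entries before the first full window are None.
    """
    z = [None] * len(spread)
    for t in range(window - 1, len(spread)):
        win = spread[t - window + 1:t + 1]
        sd = stdev(win)
        z[t] = (spread[t] - fmean(win)) / sd if sd > 0 else 0.0
    return z

def count_entries(z, threshold):
    """Number of bars where |z| crosses from inside the band to outside it."""
    count = 0
    prev = None
    for cur in z:
        if cur is not None and prev is not None:
            if abs(prev) < threshold <= abs(cur):
                count += 1
        prev = cur
    return count

def sweep_thresholds(z, thresholds=(1.5, 2.0, 2.5, 3.0)):
    """Trade-frequency side of the sweep: band crossings per threshold."""
    return {th: count_entries(z, th) for th in thresholds}
```

Pairing these counts with average profit per trade at each threshold, and repeating across spreads with different half-lives, gives the two-dimensional view the paragraph above calls for.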

Scaled entry (adding to a position as the z-score widens) looks compelling in backtests but introduces severe tail risk if the spread trends rather than reverts. Model it explicitly: define a maximum position size at each z-score level, cap total exposure, and test what happens in the rare cases where the spread hits z = ±4 or beyond. Under a normal distribution that is roughly 0.006% of observations, but real spreads typically have fatter tails, so measure the frequency empirically. That tail behavior, not the median outcome, determines whether the strategy is tradeable.
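
One way to model a capped, scaled entry is a lookup from z-score level to cumulative position size, plus a stress calculation for an adverse move. The schedule, unit sizes, and function names below are hypothetical placeholders, and losses are expressed in spread standard deviations on the assumption that the z-score scale stays fixed during the move.

```python
# Hypothetical schedule: (z level, cumulative units held at or beyond that level).
# Levels must be sorted in ascending order.
SCHEDULE = [(2.0, 1.0), (2.5, 2.0), (3.0, 3.0)]
MAX_UNITS = 3.0  # hard cap on total exposure, in units

def target_units(z, schedule=SCHEDULE, max_units=MAX_UNITS):
    """Signed target position for a given z-score.

    Positive z means the spread is rich, so the position is short (negative).
    """
    size = 0.0
    for level, units in schedule:
        if abs(z) >= level:
            size = min(units, max_units)
    if size == 0.0:
        return 0.0
    return -size if z > 0 else size

def tail_loss_in_sigma(adverse_z, schedule=SCHEDULE, max_units=MAX_UNITS):
    """Mark-to-market loss, in unit-sigmas, if the spread runs straight out
    to `adverse_z` with each tranche filled exactly at its level."""
    loss = 0.0
    held = 0.0
    for level, units in schedule:
        if adverse_z >= level:
            capped = min(units, max_units)
            loss += (capped - held) * (adverse_z - level)
            held = capped
    return loss
```

With this schedule, a run to z = 4 costs 4.5 unit-sigmas, against 2.0 for a single one-unit entry at z = 2.0. That gap is the tail exposure the scaled entry adds.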

Entry logic should also condition on spread momentum. Entering at z = ±2.0 when the spread is still moving away from the mean has a materially lower hit rate than entering when momentum has turned. Add a one-day lag confirmation or a short-window RSI on the spread itself as a filter.
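
A one-day confirmation filter can be sketched as below. It is one possible reading of "momentum has turned": yesterday's z-score was beyond the threshold and today's is on the same side but closer to zero. The function names are hypothetical, and an RSI-based filter would be a separate variant.

```python
def confirmed_entry(z_prev, z_now, threshold=2.0):
    """Return -1 (short spread), +1 (long spread), or 0 (no entry).

    An entry needs yesterday's z-score beyond the threshold and today's
    z-score on the same side but closer to zero, i.e. the spread has
    started to turn back toward its mean.
    """
    if z_prev is None or z_now is None:
        return 0
    if z_prev >= threshold and 0 < z_now < z_prev:
        return -1
    if z_prev <= -threshold and z_prev < z_now < 0:
        return 1
    return 0

def entry_signals(z, threshold=2.0):
    """Raw entry signal per bar; the first bar has no prior value, so it is 0."""
    if not z:
        return []
    return [0] + [confirmed_entry(a, b, threshold)
                  for a, b in zip(z[:-1], z[1:])]
```

These are raw signals. The backtest loop would ignore them while a position in that spread is already open.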

You are a quantitative analyst building a mean reversion backtest framework.
For the spread [SPREAD NAME], with a calculated half-life of [X] days:
1. Recommend optimal z-score entry thresholds given this half-life
2. Design a scaled entry schedule with position caps at z = 2.0, 2.5, and 3.0
3. Define a momentum filter on the spread to improve entry timing
4. Specify the exit logic: fixed z-score exit, time-based stop, or both
5. Flag the conditions under which this entry logic should be suspended (regime breaks)

Exit Logic and Holding Period Optimization

Mean reversion exits are more complex than entries. Exiting at z = 0 maximizes theoretical profit per trade but ignores transaction costs and the risk of mean overshoot. In practice, exiting at z = ±0.5 on the way back captures the bulk of the reversion while reducing time-in-trade by 20-30% — a meaningful improvement when you’re running dozens of pairs simultaneously.

Time-based stops are underused in mean reversion frameworks. If a spread entered at z = 2.0 has not reverted to z = 0.5 within 1.5× the estimated half-life, the cointegration relationship may be breaking down. A forced exit at that point limits the damage from regime shifts that a purely price-based exit misses entirely.
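
A quick check on the time stop: under a noise-free exponential decay (the mean path of an Ornstein-Uhlenbeck process), the number of half-lives needed to move from the entry z-score to the exit z-score is log2(z_entry / z_exit). The sketch below, with hypothetical function names, computes that figure next to the time-stop length in bars.

```python
import math

def half_lives_to_decay(z_entry, z_exit):
    """Half-lives needed for the mean decay path to go from z_entry to z_exit."""
    return math.log2(abs(z_entry) / abs(z_exit))

def time_stop_bars(half_life_days, multiple=1.5):
    """Maximum bars to hold before a forced exit, rounded up."""
    return math.ceil(half_life_days * multiple)
```

For a 2.0 entry and a 0.5 exit the mean path takes two half-lives, which is longer than a 1.5× stop, so it may be worth sweeping the multiple in the backtest rather than fixing it. Real spreads are noisy and can reach the exit sooner or later than the mean path.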

Backtest holding period distributions, not just average holding periods. A strategy averaging 8-day holds with a standard deviation of 20 days has a completely different capital allocation profile than one averaging 8 days with a 3-day standard deviation. The former will unexpectedly tie up capital during drawdowns.
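
A small summary function makes the difference between two such strategies visible. This is a sketch using the standard library; the function name and the choice of statistics are assumptions, and `holds` is taken to be a list of holding periods in days, one per closed trade.

```python
from statistics import fmean, median, quantiles, stdev

def holding_period_summary(holds):
    """Distribution summary for a list of holding periods (needs >= 2 trades)."""
    # n=20 yields 19 cut points at 5%, 10%, ..., 95%; the last is the 95th percentile.
    p95 = quantiles(holds, n=20, method="inclusive")[18]
    return {
        "mean": fmean(holds),
        "std": stdev(holds),
        "median": median(holds),
        "p95": p95,
        "max": max(holds),
    }

# Two illustrative samples with the same 8-day mean but very different tails.
tight = [5, 6, 7, 8, 8, 8, 9, 10, 11]
wide = [1, 1, 2, 2, 3, 3, 4, 5, 51]
```

Both samples average 8 days, but the second has one 51-day hold that dominates its capital usage. The mean alone would hide that.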

  • Set primary exit at z = ±0.5, not z = 0, to reduce slippage and overshoot risk
  • Implement time-based exits at 1.5× the spread’s half-life
  • Add a stop-loss exit at z = ±3.5 to cap tail exposure on trending spreads
  • Backtest the full distribution of holding periods, not just the mean
  • Test profit targets vs. pure z-score exits — partial profit-taking often improves Sharpe
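
The three exit rules in the list above can be combined into a single check per bar. The sketch below is one possible arrangement: the function name, the priority order (stop-loss, then target, then time stop), and the string labels are assumptions, while the default thresholds come from the list.

```python
def exit_reason(z_now, side, bars_held, half_life_days,
                exit_z=0.5, stop_z=3.5, time_multiple=1.5):
    """Return "stop_loss", "target", "time_stop", or None (keep holding).

    side is -1 for a short-spread position (entered at positive z)
    and +1 for a long-spread position (entered at negative z).
    """
    # Distance from the mean measured on the entry side:
    # positive means the spread is still extended against the mean.
    extended = -side * z_now
    if extended >= stop_z:
        return "stop_loss"
    if extended <= exit_z:
        return "target"  # also covers an overshoot through the mean
    if bars_held >= time_multiple * half_life_days:
        return "time_stop"
    return None
```

Recording the returned label for every closed trade also produces the clean-exit versus forced-exit counts that are useful when reviewing the backtest.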

Regime Filtering to Protect the Strategy

Mean reversion strategies have known failure modes: trend regimes and correlation breakdowns. A backtest that runs the strategy through 2008, March 2020, and the 2022 rate shock without regime filters will show brutal drawdowns that obscure the strategy’s actual edge in normal market conditions. The framework needs explicit regime detection built into the signal generation layer.

Use a rolling 60-day realized correlation between the two legs of a pair as a regime filter. If correlation drops below 0.6, suspend new entries. For single-asset strategies, a 200-day moving average slope filter on the underlying removes the periods where momentum dominates mean reversion at the price level.
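
The pairs filter above can be sketched as a boolean series that gates new entries. The example assumes the correlation is computed on daily returns of the two legs (the text does not say whether prices or returns are meant), and the function names are hypothetical.

```python
import math
from statistics import fmean

def pearson(a, b):
    """Pearson correlation of two equal-length lists; 0.0 if either is constant."""
    ma, mb = fmean(a), fmean(b)
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    if saa == 0 or sbb == 0:
        return 0.0
    return sab / math.sqrt(saa * sbb)

def entries_allowed(ret_a, ret_b, window=60, min_corr=0.6):
    """True at bar t when the correlation of the previous `window` returns
    is at least `min_corr`. Bars before the first full window are False."""
    allowed = [False] * len(ret_a)
    for t in range(window, len(ret_a)):
        corr = pearson(ret_a[t - window:t], ret_b[t - window:t])
        allowed[t] = corr >= min_corr
    return allowed
```

The filter only blocks new entries. Whether open positions should also be closed when correlation breaks down is a separate rule to test.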

Volatility regime filtering is equally important. Mean reversion strategies entered during VIX spikes above 30 face spreads that widen faster than the position can absorb. Backtest with and without a volatility filter and measure the difference in maximum drawdown duration — that number justifies or rejects the filter cost.

Act as a quantitative risk manager reviewing a mean reversion strategy backtest.
Given a strategy trading [ASSET PAIR or SINGLE ASSET] with entries at z = ±2.0:
1. Identify the top three regime conditions where this strategy historically underperforms
2. Design a rolling correlation filter for pairs strategies with specific threshold and lookback
3. Design a volatility regime filter using VIX or realized vol with entry/suspension rules
4. Quantify the performance impact of applying each filter on a 10-year backtest window
5. Recommend whether each filter should be a hard rule or a position-sizing adjustment

Performance Metrics Specific to Mean Reversion

Standard metrics — annualized return, Sharpe ratio, max drawdown — are insufficient for evaluating mean reversion strategies. A strategy with a 1.4 Sharpe can still be untradeable if 40% of its time is spent in drawdown, which is common for mean reversion books during trending markets. Report time-in-drawdown as a primary metric alongside Sharpe.
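
Time-in-drawdown is straightforward to compute from an equity curve. The sketch below uses hypothetical function names and treats a bar as "in drawdown" when equity is strictly below its running peak.

```python
def time_in_drawdown(equity):
    """Fraction of bars where equity sits below its running peak."""
    peak = equity[0]
    below = 0
    for value in equity:
        if value > peak:
            peak = value
        elif value < peak:
            below += 1
    return below / len(equity)

def longest_drawdown(equity):
    """Longest run of consecutive bars spent below the running peak."""
    peak = equity[0]
    run = longest = 0
    for value in equity:
        if value >= peak:
            peak = value
            run = 0
        else:
            run += 1
            longest = max(longest, run)
    return longest
```

Reporting both numbers side by side separates a strategy with many short dips from one with a single long stretch under water.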

Hit rate and profit factor are more informative than average P&L for mean reversion. Target hit rates above 60% — the edge in mean reversion comes from consistency, not large winners. A profit factor below 1.5 signals that losing trades are too large relative to winners, typically caused by insufficient stop-loss discipline on spread trending events.
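
Both metrics follow directly from the list of per-trade P&L values. A minimal sketch, with hypothetical function names and breakeven trades counted as non-wins:

```python
import math

def hit_rate(pnls):
    """Share of trades with strictly positive P&L."""
    return sum(1 for p in pnls if p > 0) / len(pnls)

def profit_factor(pnls):
    """Gross profit divided by gross loss; math.inf when there are no losses."""
    gross_profit = sum(p for p in pnls if p > 0)
    gross_loss = -sum(p for p in pnls if p < 0)
    if gross_loss == 0:
        return math.inf
    return gross_profit / gross_loss
```

For example, trades of +1.0, +1.0, -0.5, +2.0, and -1.5 give a 60% hit rate and a profit factor of 2.0.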

Capacity and market impact analysis closes the loop. A mean reversion strategy that looks excellent on daily close prices may be impractical on instruments with $2M in average daily dollar volume. Build volume-constrained position sizing into the backtest from the start, not as an afterthought when deployment becomes the question.

  • Time-in-drawdown: target below 25% of total trading days
  • Hit rate: target above 60% for z-score-based entries
  • Profit factor: reject strategies below 1.5
  • Average holding period vs. half-life ratio: should be below 1.0
  • Turnover-adjusted Sharpe: account for transaction costs at realistic bid-ask spreads
  • Capacity estimate: maximum position size as a percentage of 20-day average volume
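
The capacity bullet can be enforced inside the backtest by clipping every order to a fraction of recent volume. In the sketch below the 1% participation limit is a placeholder rather than a recommendation, the function name is hypothetical, and `volumes` is assumed to be a list of daily share volumes with the most recent day last.

```python
from statistics import fmean

def capped_shares(desired_shares, volumes, lookback=20, max_participation=0.01):
    """Clip a signed share count to a fraction of average daily volume."""
    adv = fmean(volumes[-lookback:])
    cap = int(adv * max_participation)
    return max(-cap, min(cap, desired_shares))
```

With a 20-day average volume of 100,000 shares, a desired position of 5,000 shares is clipped to 1,000, while a 300-share position passes through unchanged.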

Running the Full Backtest: What to Document

A rigorous mean reversion backtest documents not just results but the decisions made at each stage: the universe construction method, the cointegration testing window, the hedge ratio recalibration frequency, the entry and exit thresholds, and the regime filters applied. Without this audit trail, you cannot identify which parameters drove the result, or which changes improve robustness and which merely overfit.

Walk-forward testing is the minimum standard for mean reversion validation. Split your data into 70% in-sample for parameter selection and 30% out-of-sample for validation. Run the walk-forward across multiple starting points — a single in/out split is insufficient to rule out a lucky out-of-sample window. Monte Carlo simulations on trade ordering add another layer of robustness beyond walk-forward.
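
Both validation steps can be sketched in a few lines. The first function generates rolling 70/30 windows from multiple starting points; the second reshuffles the order of per-trade P&L values to see how much the maximum drawdown depends on sequencing. Function names, window sizes, and the seed are illustrative assumptions.

```python
import random

def walk_forward_splits(n_bars, window, step, in_sample_frac=0.7):
    """List of (is_start, is_end, oos_end) index triples.

    Parameters are fitted on bars [is_start, is_end) and validated
    on bars [is_end, oos_end); each window starts `step` bars later.
    """
    is_len = round(window * in_sample_frac)
    splits = []
    start = 0
    while start + window <= n_bars:
        splits.append((start, start + is_len, start + window))
        start += step
    return splits

def max_drawdown(pnls):
    """Largest peak-to-trough drop of the cumulative P&L path."""
    equity = peak = worst = 0.0
    for p in pnls:
        equity += p
        peak = max(peak, equity)
        worst = max(worst, peak - equity)
    return worst

def shuffled_drawdowns(pnls, n_paths=1000, seed=0):
    """Sorted max drawdowns over randomly reordered copies of the trade list."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_paths):
        path = list(pnls)
        rng.shuffle(path)
        results.append(max_drawdown(path))
    return sorted(results)
```

Shuffling treats trades as independent, which mean reversion trades often are not during regime breaks, so the shuffled distribution is best read as an optimistic bound rather than a forecast.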

Document failure cases explicitly: which pairs failed the stationarity test, which were suspended by the regime filter, and which hit time-based stops rather than z-score exits. The ratio of clean exits to forced exits is a leading indicator of how the strategy will perform in the next drawdown period.
