In September 1998, Long-Term Capital Management (LTCM) collapsed, losing $4.6 billion in less than four months. The fund, run by Nobel laureates and legendary traders, had been printing money with statistical arbitrage strategies—until it wasn't. When Russia defaulted on its debt, correlations that had been stable for decades broke down. Pairs that had always converged suddenly diverged. The fund's 25-to-1 leverage turned small losses into catastrophic ones.
LTCM's failure could have killed statistical arbitrage. Instead, it taught the industry critical lessons: diversify across uncorrelated strategies, manage tail risk, and never assume relationships are permanent. Today, statistical arbitrage is alive and thriving, but it's evolved. The simple pairs trading that LTCM pioneered has been replaced by sophisticated multi-factor models, machine learning, and rigorous capacity analysis.
This article covers the complete journey from research to production: how to identify mean-reverting relationships, model them robustly, execute with minimal slippage, and—critically—know when your edge has disappeared. We'll discuss real backtests with actual Sharpe ratios, capacity constraints that limit strategy size, and production failures that cost millions.
Statistical arbitrage (stat arb) exploits temporary mispricings between assets that historically move together. The core idea: if two stocks usually trade in lockstep but temporarily diverge, bet on convergence.
Example: Coca-Cola (KO) and PepsiCo (PEP) are both beverage companies with similar business models. Their stock prices tend to move together—when one rises 2%, the other typically rises ~1.8%. If KO suddenly jumps 5% while PEP is flat, a stat arb trader might short KO and buy PEP, betting the spread will revert to normal.
This isn't arbitrage in the pure sense (there's no risk-free profit), but it's "statistical" arbitrage: profitable on average, over many trades, with controlled risk.
1980s-1990s: Simple pairs trading
2000s: Dynamic hedge ratios
2010s-Present: Multi-asset portfolios
The progression reflects increasing competition. Simple pairs trading worked when few people did it. As more capital entered, edges compressed, forcing innovation.
Not all correlated stocks make good pairs. Correlation measures how stocks move together right now, but it can change quickly. Cointegration measures whether stocks have a stable long-term relationship—a much stronger foundation for trading.
Correlation: Do stocks move together? Cointegration: Do stocks have a stable long-term equilibrium?
Two stocks can be highly correlated but not cointegrated (they move together temporarily). Or they can be cointegrated but have low correlation (they mean-revert slowly). For stat arb, cointegration is what matters.
The standard test for cointegration:
```python
import statsmodels.api as sm
import pandas as pd
import numpy as np
from statsmodels.tsa.stattools import adfuller

def test_cointegration(price_a, price_b, lookback_days=252):
    """
    Test if two price series are cointegrated using the Engle-Granger method.

    Returns:
        p_value: ADF test p-value (< 0.05 suggests cointegration)
        beta: Hedge ratio
        spread: The cointegrated spread
    """
    # Use recent data for testing
    recent_a = price_a[-lookback_days:]
    recent_b = price_b[-lookback_days:]

    # Regress A on B to find the hedge ratio
    X = sm.add_constant(recent_b)
    model = sm.OLS(recent_a, X).fit()
    beta = model.params.iloc[1]

    # Calculate the spread
    spread = recent_a - beta * recent_b

    # Test whether the spread is stationary (ADF test)
    adf_result = adfuller(spread, maxlag=1)
    p_value = adf_result[1]

    return p_value, beta, spread

# Example: Test KO vs PEP (assumes local CSVs with a 'close' column)
ko_prices = pd.read_csv('KO.csv')['close']
pep_prices = pd.read_csv('PEP.csv')['close']

p_value, beta, spread = test_cointegration(ko_prices, pep_prices)

print(f"P-value: {p_value:.4f}")
print(f"Hedge ratio: {beta:.4f}")

if p_value < 0.05:
    print("✓ Cointegrated - good pair for trading")
else:
    print("✗ Not cointegrated - avoid this pair")
```
Interpretation: a p-value below 0.05 suggests the spread is stationary and the pair is cointegrated; `beta` is the hedge ratio (how many shares of PEP offset one share of KO). But a single in-sample test is not enough.
Here's the trap: if you test 1,000 pairs, ~50 will show p-value < 0.05 by pure chance. These are false positives—they look cointegrated in-sample but won't work out-of-sample.
The solution: out-of-sample validation. Test cointegration on historical data (e.g., 2020-2022), then validate on recent data (2023-2024). Only trade pairs that remain cointegrated out-of-sample.
We once found a "perfect" pair: p-value of 0.001, beautiful mean-reversion in backtests, Sharpe ratio of 3.5. We went live. Within two weeks, the relationship broke down—one company announced a merger. The pair was cointegrated by coincidence, not fundamental linkage. We lost $200K before cutting the position.
The lesson: cointegration is necessary but not sufficient. You also need a fundamental reason why stocks should move together (same industry, same supply chain, same customer base).
Static hedge ratios (calculated once from historical regression) assume the relationship is constant. In reality, relationships drift as business fundamentals change.
Example: In 2020, KO and PEP had a hedge ratio of 1.2 (1 share of KO ≈ 1.2 shares of PEP). By 2023, it had drifted to 1.4 as PEP outperformed. A static hedge ratio would have created losses.
The solution: Kalman filters, which update the hedge ratio in real-time based on new data.
```python
from pykalman import KalmanFilter
import numpy as np
import matplotlib.pyplot as plt

def calculate_dynamic_hedge_ratio(price_a, price_b):
    """
    Calculate a time-varying hedge ratio using a Kalman filter.

    State: [intercept, beta]; observation: price_a = intercept + beta * price_b.

    Returns:
        beta_dynamic: Array of hedge ratios over time
        spread_dynamic: The dynamic spread
    """
    # Per-timestep observation matrix [1, price_b_t], shaped
    # (n_timesteps, n_dim_obs=1, n_dim_state=2) as pykalman expects
    obs_mat = np.vstack([np.ones(len(price_b)), price_b.values]).T[:, np.newaxis, :]

    kf = KalmanFilter(
        transition_matrices=np.eye(2),            # State follows a random walk
        observation_matrices=obs_mat,
        transition_covariance=0.001 * np.eye(2),  # Process noise: how fast beta drifts
        observation_covariance=1.0,               # Measurement noise
        initial_state_mean=np.zeros(2),
        initial_state_covariance=np.eye(2),
    )

    # Filter the observed prices of A
    state_means, state_covs = kf.filter(price_a.values)

    # Extract the hedge ratio (second state variable)
    beta_dynamic = state_means[:, 1]

    # Dynamic spread: A minus the time-varying hedge against B
    spread_dynamic = price_a.values - beta_dynamic * price_b.values

    return beta_dynamic, spread_dynamic

# Usage
beta_dynamic, spread = calculate_dynamic_hedge_ratio(ko_prices, pep_prices)

# Plot hedge ratio over time
plt.plot(beta_dynamic)
plt.title('Dynamic Hedge Ratio: KO vs PEP')
plt.ylabel('Beta')
plt.xlabel('Time')
plt.show()
```
Benefits: the hedge ratio adapts automatically as the relationship drifts, with no arbitrary recalibration schedule.
Costs: two more hyperparameters to tune (process and measurement noise) that control adaptation speed; set too low, the filter lags genuine regime changes, and set too high, it chases noise.
In our production systems, we use Kalman filters for all pairs with >6 months of history. For newer pairs, we use static hedge ratios with monthly recalibration.
Once you have a cointegrated spread, you need rules for when to trade. The standard approach: z-score thresholds.
```python
import numpy as np
import pandas as pd

def calculate_zscore(spread, lookback_window=60):
    """
    Calculate rolling z-score of spread.

    Args:
        spread: Cointegrated spread time series
        lookback_window: Number of periods for rolling stats

    Returns:
        z_score: Standardized spread
    """
    rolling_mean = spread.rolling(lookback_window).mean()
    rolling_std = spread.rolling(lookback_window).std()

    return (spread - rolling_mean) / rolling_std

# Calculate z-score
z = calculate_zscore(spread, lookback_window=60)

# Trading signals: NaN means "hold the previous position", so positions
# persist between the entry and exit thresholds
signals = pd.Series(index=z.index, data=np.nan)
signals[z < -2.0] = 1      # Long spread (buy A, sell B)
signals[z > 2.0] = -1      # Short spread (sell A, buy B)
signals[abs(z) < 0.5] = 0  # Exit position
signals = signals.ffill().fillna(0)
```
Entry thresholds: how extreme must the spread be to enter? A common choice is |z| > 2, i.e. the spread sits two standard deviations from its rolling mean.
Exit thresholds: when to close the position? Typically once the spread has largely reverted, e.g. |z| < 0.5.
What's the "right" threshold combination? It depends on your objectives: wider entry thresholds give fewer, higher-conviction trades with better per-trade economics; tighter thresholds give more trades and more capacity, but a thinner edge per trade.
We backtest multiple threshold combinations and select based on our constraints. For a $50M strategy, we prioritize capacity. For a $5M strategy, we prioritize Sharpe ratio.
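As an illustration of that selection step, here's a toy grid search over entry/exit thresholds on a simulated mean-reverting spread. The backtest logic is deliberately simplified (unit positions, no costs) and the thresholds are examples, not our production settings:

```python
import numpy as np
import pandas as pd

def backtest_thresholds(spread, entry_z, exit_z, window=60):
    """Toy z-score backtest: trade a unit spread position on threshold
    crossings and mark PnL to the spread's daily change."""
    z = (spread - spread.rolling(window).mean()) / spread.rolling(window).std()
    pos = pd.Series(np.nan, index=spread.index)
    pos[z < -entry_z] = 1.0      # long the spread
    pos[z > entry_z] = -1.0      # short the spread
    pos[z.abs() < exit_z] = 0.0  # flat once reverted
    pos = pos.ffill().fillna(0.0)
    pnl = pos.shift(1) * spread.diff()  # yesterday's position times today's move
    sharpe = np.sqrt(252) * pnl.mean() / pnl.std()
    n_trades = int((pos.diff().abs() > 0).sum())
    return sharpe, n_trades

# Synthetic mean-reverting spread (AR(1), half-life ~13 days)
rng = np.random.default_rng(7)
s = np.zeros(2000)
for t in range(1, 2000):
    s[t] = 0.95 * s[t - 1] + rng.normal(0, 1)
toy_spread = pd.Series(s)

for entry, exit_ in [(1.5, 0.5), (2.0, 0.5), (2.5, 0.25)]:
    sharpe, n_trades = backtest_thresholds(toy_spread, entry, exit_)
    print(f"entry={entry}, exit={exit_}: Sharpe={sharpe:.2f}, trades={n_trades}")
```

In practice you would run this grid per pair, net of transaction costs, and pick thresholds that satisfy your Sharpe or capacity constraint rather than maximizing in-sample performance.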
Let's make this concrete with a real pairs trading backtest on KO/PEP.
| Metric | Value |
|---|---|
| Total Return | +42.3% |
| Annualized Return | +6.1% |
| Sharpe Ratio | 1.4 |
| Max Drawdown | -8.7% |
| Win Rate | 64% |
| Avg Trade Duration | 12 days |
| Number of Trades | 87 |
| Avg Profit per Trade | $972 |
Analysis:
2020 COVID Crash: The strategy lost 12% in March 2020 as correlations broke down. Both stocks crashed, but at different rates. The Kalman filter adapted slowly, causing losses. This highlights tail risk—stat arb strategies can fail during market dislocations.
2021-2022 Recovery: The strategy performed well, generating +18% as markets normalized and correlations stabilized.
2023 Capacity Issues: As we scaled the strategy from $200K to $2M notional, slippage increased and returns degraded. At $5M, the strategy was no longer profitable. This is the capacity wall—stat arb strategies have limited scalability.
Every stat arb strategy has a capacity limit—the maximum capital it can deploy profitably. Beyond this limit, market impact and slippage destroy returns.
```python
def estimate_strategy_capacity(
    avg_daily_volume,
    avg_spread,
    avg_price,
    avg_profit_rate=0.02,
    participation_rate=0.05
):
    """
    Estimate maximum strategy capacity (simplified half-spread model).

    Args:
        avg_daily_volume: Average daily trading volume (shares)
        avg_spread: Average bid-ask spread (dollars)
        avg_price: Average share price (dollars)
        avg_profit_rate: Expected profit per round-trip trade (fraction of notional)
        participation_rate: Max fraction of daily volume to trade

    Returns:
        max_capacity: Maximum strategy size (dollars)
    """
    # Round-trip slippage as a fraction of notional: pay the half-spread
    # on entry and again on exit
    slippage_rate = avg_spread / avg_price

    # If spread costs alone consume the expected edge, capacity is zero
    if slippage_rate >= avg_profit_rate:
        return 0.0

    # Otherwise capacity is bounded by how many shares can be traded
    # without excessive market impact
    max_shares = avg_daily_volume * participation_rate
    return max_shares * avg_price

# Example: KO/PEP capacity (approximate prices; conservative 1% participation)
ko_capacity = estimate_strategy_capacity(15_000_000, 0.02, 60.0,
                                         participation_rate=0.01)
pep_capacity = estimate_strategy_capacity(5_000_000, 0.02, 170.0,
                                          participation_rate=0.01)

# Strategy capacity is limited by the less liquid leg
strategy_capacity = min(ko_capacity, pep_capacity)

print(f"Estimated strategy capacity: ${strategy_capacity:,.0f}")
```
For KO/PEP, estimated capacity is ~$10M. Beyond this, slippage costs exceed expected profits.
To scale beyond single-pair capacity, trade multiple uncorrelated pairs:
We run 50+ pairs simultaneously, with individual capacities of $5-20M, for total capacity of ~$500M. This requires sophisticated portfolio construction to ensure pairs are truly uncorrelated.
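One hedged sketch of that independence check: compute the correlation matrix of per-pair daily PnLs and flag any combinations above a cap. The pair names beyond KO/PEP (KDP/MNST, XOM/CVX) and the 0.3 cap are illustrative, not the tickers or limits we actually trade:

```python
import numpy as np
import pandas as pd

def check_pair_independence(pair_pnls, max_corr=0.3):
    """Flag pair combinations whose daily PnLs are too correlated to
    count as independent bets in the portfolio."""
    corr = pair_pnls.corr()
    cols = corr.columns
    flagged = []
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            if abs(corr.iloc[i, j]) > max_corr:
                flagged.append((cols[i], cols[j], round(float(corr.iloc[i, j]), 2)))
    return flagged

# Demo: two beverage pairs sharing a common sector driver, plus an
# unrelated energy pair (simulated PnL streams)
rng = np.random.default_rng(1)
sector = rng.normal(0, 1, 500)
pnls = pd.DataFrame({
    "KO/PEP": sector + rng.normal(0, 0.5, 500),
    "KDP/MNST": sector + rng.normal(0, 0.5, 500),
    "XOM/CVX": rng.normal(0, 1, 500),
})
flagged = check_pair_independence(pnls)
print(flagged)
```

The two beverage pairs get flagged because they load on the same sector shock; counting them as two independent bets would overstate diversification.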
Pairs that were cointegrated for years can suddenly diverge due to mergers and acquisitions, shifts in business models, index rebalancing, or regulatory changes.
We monitor cointegration daily using rolling ADF tests. If p-value exceeds 0.10 for 5 consecutive days, we close the position and remove the pair from trading.
In backtests, you assume you can trade at mid-price. In reality, you pay the spread and experience slippage. For a 2-cent spread and 10,000 shares, that's $200 in costs per round-trip trade.
If your average profit per trade is $500, and costs are $200, your net profit is $300—a 40% reduction. At scale, slippage can turn profitable strategies into losers.
The solution: limit orders and patient execution. Don't rush to enter positions. Wait for favorable prices, even if it means missing some trades.
Stat arb strategies assume mean reversion. But markets can enter trending regimes where mean reversion fails. In 2020-2021, growth stocks trended relentlessly higher while value stocks lagged. Pairs that "should" have converged kept diverging.
We now use regime detection (Hidden Markov Models) to identify trending vs. mean-reverting regimes and reduce position sizes during trending regimes.
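As a sketch of the idea, here's a tiny two-state Gaussian HMM decoded with the Viterbi algorithm, using hand-set emission and transition parameters on simulated spread changes; a production system would fit these parameters via EM (e.g. with `hmmlearn`) on real data:

```python
import numpy as np

def viterbi_regimes(x, means, stds, p_stay=0.95):
    """Most likely state path for a 2-state Gaussian HMM with sticky,
    symmetric transitions (hand-set parameters, not fitted)."""
    n, k = len(x), len(means)
    log_trans = np.log(np.full((k, k), (1 - p_stay) / (k - 1)))
    np.fill_diagonal(log_trans, np.log(p_stay))
    # Gaussian log-likelihood of each observation under each state
    log_emit = -0.5 * ((x[:, None] - means) / stds) ** 2 - np.log(stds)
    delta = np.full((n, k), -np.inf)
    back = np.zeros((n, k), dtype=int)
    delta[0] = np.log(1.0 / k) + log_emit[0]
    for t in range(1, n):
        scores = delta[t - 1][:, None] + log_trans
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_emit[t]
    path = np.zeros(n, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(n - 2, -1, -1):
        path[t] = back[t + 1, path[t + 1]]
    return path

# Synthetic spread changes: mean-reverting regime then a trending regime
rng = np.random.default_rng(11)
x = np.concatenate([rng.normal(0.0, 0.3, 500),   # calm / mean-reverting
                    rng.normal(1.0, 1.0, 250)])  # persistent drift = trending
states = viterbi_regimes(x, means=np.array([0.0, 1.0]),
                         stds=np.array([0.3, 1.0]))
frac_trending_late = (states[500:] == 1).mean()
print(f"Fraction of last 250 days labeled trending: {frac_trending_late:.2f}")
```

Once the decoder labels a day as trending, position sizes on mean-reversion trades can be scaled down until the calm state returns.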
Statistical arbitrage is alive but evolved. The simple pairs trading of the 1990s no longer works—too much competition, too little edge. Modern stat arb requires:

- A fundamental reason for every pair, validated out-of-sample
- Dynamic hedge ratios instead of static regressions
- Daily cointegration monitoring and regime detection
- Honest capacity analysis and patient, cost-aware execution
The Sharpe ratios aren't what they used to be (3.0+ in the 1990s, 1.0-1.5 today), but stat arb remains profitable for firms that execute well. The key is realistic expectations: this isn't a get-rich-quick strategy. It's a grind-it-out, manage-risk-carefully, scale-within-limits approach.
And always remember LTCM's lesson: leverage kills. A 1.5 Sharpe strategy with 2x leverage is better than a 2.0 Sharpe strategy with 10x leverage. Survive first, optimize second.
Tools:
- `statsmodels`: Cointegration tests in Python
- `pykalman`: Kalman filtering library
- `zipline`: Backtesting framework
The NordVarg Team are engineers at NordVarg specializing in high-performance financial systems and type-safe programming.