Pairs Trading: Statistical Arbitrage at Scale

Pairs trading is one of the most reliable market-neutral strategies. After running pairs trading systems managing $200M+ across equities, futures, and crypto, I've learned that success comes from robust pair selection, dynamic hedging, and careful risk management. This article covers the full production implementation.

Theory: Why Pairs Trading Works #

Pairs trading exploits temporary deviations from statistical relationships between correlated assets. Key concepts:

Cointegration: Two assets share a long-term equilibrium relationship
Mean Reversion: Deviations from equilibrium tend to revert
Market Neutrality: Long one asset, short another = hedged against market moves
Statistical Edge: Profit from temporary mispricings

Not all correlated pairs work—correlation ≠ cointegration. Need stable long-term relationship.

Pair Selection: Finding Cointegrated Pairs #

Engle-Granger Test #

python

1import numpy as np
2import pandas as pd
3from statsmodels.tsa.stattools import coint, adfuller
4from scipy import stats
5import warnings
6warnings.filterwarnings('ignore')
7
8class PairSelector:
9    def __init__(self, min_correlation=0.7, max_pvalue=0.05):
10        self.min_correlation = min_correlation
11        self.max_pvalue = max_pvalue
12        
13    def find_cointegrated_pairs(self, price_data: pd.DataFrame) -> list:
14        """
15        Find cointegrated pairs from universe of assets.
16        
17        Args:
18            price_data: DataFrame with columns = symbols, index = dates
19            
20        Returns:
21            List of (symbol1, symbol2, hedge_ratio, pvalue, half_life) tuples
22        """
23        n = price_data.shape[1]
24        symbols = price_data.columns
25        pairs = []
26        
27        # Test all combinations
28        for i in range(n):
29            for j in range(i+1, n):
30                s1 = symbols[i]
31                s2 = symbols[j]
32                
33                # Skip if insufficient data
34                data1 = price_data[s1].dropna()
35                data2 = price_data[s2].dropna()
36                
37                if len(data1) < 252 or len(data2) < 252:
38                    continue
39                
40                # Align data
41                common_dates = data1.index.intersection(data2.index)
42                p1 = data1.loc[common_dates]
43                p2 = data2.loc[common_dates]
44                
45                # Check correlation first (fast filter)
46                corr = p1.corr(p2)
47                if abs(corr) < self.min_correlation:
48                    continue
49                
50                # Test cointegration (both directions)
51                score1, pvalue1, _ = coint(p1, p2)
52                score2, pvalue2, _ = coint(p2, p1)
53                
54                # Use the better direction
55                if pvalue1 < pvalue2:
56                    pvalue = pvalue1
57                    # Calculate hedge ratio using OLS
58                    hedge_ratio = np.polyfit(p2, p1, 1)[0]
59                    spread = p1 - hedge_ratio * p2
60                    primary = s1
61                    secondary = s2
62                else:
63                    pvalue = pvalue2
64                    hedge_ratio = np.polyfit(p1, p2, 1)[0]
65                    spread = p2 - hedge_ratio * p1
66                    primary = s2
67                    secondary = s1
68                
69                if pvalue < self.max_pvalue:
70                    # Calculate half-life of mean reversion
71                    half_life = self._calculate_half_life(spread)
72                    
73                    pairs.append({
74                        'primary': primary,
75                        'secondary': secondary,
76                        'hedge_ratio': hedge_ratio,
77                        'pvalue': pvalue,
78                        'half_life': half_life,
79                        'correlation': corr,
80                        'spread_std': spread.std(),
81                    })
82        
83        return sorted(pairs, key=lambda x: x['pvalue'])
84    
85    def _calculate_half_life(self, spread: pd.Series) -> float:
86        """
87        Calculate mean reversion half-life using AR(1) model.
88        """
89        spread_lag = spread.shift(1)
90        spread_diff = spread - spread_lag
91        
92        # Remove NaN
93        valid = ~(spread_lag.isna() | spread_diff.isna())
94        spread_lag = spread_lag[valid]
95        spread_diff = spread_diff[valid]
96        
97        # Fit AR(1): spread_diff = λ * spread_lag + ε
98        # Half-life = -log(2) / log(1 + λ)
99        slope, _, _, _, _ = stats.linregress(spread_lag, spread_diff)
100        
101        if slope >= 0:
102            return np.inf  # Not mean-reverting
103        
104        half_life = -np.log(2) / np.log(1 + slope)
105        return half_life
106
107# Example usage
108if __name__ == "__main__":
109    # Load price data
110    prices = pd.read_csv('prices.csv', index_col='date', parse_dates=True)
111    
112    selector = PairSelector(min_correlation=0.7, max_pvalue=0.05)
113    pairs = selector.find_cointegrated_pairs(prices)
114    
115    print(f"Found {len(pairs)} cointegrated pairs")
116    for pair in pairs[:10]:
117        print(f"{pair['primary']}/{pair['secondary']}: "
118              f"p-value={pair['pvalue']:.4f}, "
119              f"half-life={pair['half_life']:.1f} days, "
120              f"hedge={pair['hedge_ratio']:.3f}")
121

Kalman Filter for Dynamic Hedge Ratios #

Static hedge ratios drift over time. Use Kalman filter for adaptation:

python

1import numpy as np
2from scipy.linalg import inv
3
4class KalmanPairTrader:
5    def __init__(self, delta=1e-4, Ve=1e-3, R=None):
6        """
7        Kalman filter for dynamic hedge ratio estimation.
8        
9        Args:
10            delta: Process variance (how much hedge ratio can change)
11            Ve: Observation variance (measurement noise)
12            R: Initial covariance (uncertainty in hedge ratio)
13        """
14        self.delta = delta
15        self.Ve = Ve
16        self.R = R if R is not None else np.eye(2)
17        
18        # State: [hedge_ratio, intercept]
19        self.beta = np.zeros(2)
20        
21    def update(self, y: float, x: float) -> tuple:
22        """
23        Update hedge ratio with new observation.
24        
25        Args:
26            y: Primary asset price
27            x: Secondary asset price
28            
29        Returns:
30            (hedge_ratio, intercept, spread, spread_std)
31        """
32        # Observation vector
33        F = np.array([x, 1.0]).reshape((2, 1))
34        
35        # Prediction step
36        self.R = self.R + self.delta * np.eye(2)
37        
38        # Update step
39        yhat = F.T @ self.beta  # Predicted y
40        error = y - yhat[0, 0]  # Prediction error
41        
42        Q = F.T @ self.R @ F + self.Ve  # Error variance
43        K = self.R @ F / Q[0, 0]  # Kalman gain
44        
45        self.beta = self.beta + K.flatten() * error
46        self.R = self.R - K @ F.T @ self.R
47        
48        # Calculate current spread and uncertainty
49        spread = error
50        spread_std = np.sqrt(Q[0, 0])
51        
52        return self.beta[0], self.beta[1], spread, spread_std
53    
54    def get_hedge_ratio(self) -> float:
55        return self.beta[0]
56    
57    def get_spread_prediction(self, y: float, x: float) -> tuple:
58        """Get predicted spread and z-score."""
59        predicted_y = self.beta[0] * x + self.beta[1]
60        spread = y - predicted_y
61        
62        # Estimate spread std from observation variance
63        spread_std = np.sqrt(self.Ve)
64        z_score = spread / spread_std if spread_std > 0 else 0
65        
66        return spread, z_score
67
68# Example usage
69kf = KalmanPairTrader(delta=1e-4, Ve=1e-3)
70
71spreads = []
72hedge_ratios = []
73
74for idx, row in price_data.iterrows():
75    y = row['AAPL']
76    x = row['MSFT']
77    
78    hedge_ratio, intercept, spread, spread_std = kf.update(y, x)
79    hedge_ratios.append(hedge_ratio)
80    spreads.append(spread)
81

Trading Strategy Implementation #

Signal Generation #

python

1class PairsTradingStrategy:
2    def __init__(self, 
3                 entry_threshold=2.0,
4                 exit_threshold=0.5,
5                 stop_loss=4.0,
6                 lookback=20):
7        """
8        Pairs trading strategy with z-score signals.
9        
10        Args:
11            entry_threshold: Z-score to enter position
12            exit_threshold: Z-score to exit position
13            stop_loss: Z-score to stop out
14            lookback: Window for spread statistics
15        """
16        self.entry_threshold = entry_threshold
17        self.exit_threshold = exit_threshold
18        self.stop_loss = stop_loss
19        self.lookback = lookback
20        
21        self.spread_history = []
22        self.position = 0  # -1 = short spread, 0 = flat, 1 = long spread
23        
24    def update(self, spread: float) -> dict:
25        """
26        Update strategy with new spread value.
27        
28        Returns:
29            Dictionary with signal and position info
30        """
31        self.spread_history.append(spread)
32        
33        if len(self.spread_history) < self.lookback:
34            return {'signal': 'WAIT', 'position': 0, 'z_score': 0}
35        
36        # Keep only lookback window
37        if len(self.spread_history) > self.lookback:
38            self.spread_history.pop(0)
39        
40        # Calculate z-score
41        mean = np.mean(self.spread_history)
42        std = np.std(self.spread_history)
43        
44        if std < 1e-8:
45            return {'signal': 'WAIT', 'position': self.position, 'z_score': 0}
46        
47        z_score = (spread - mean) / std
48        
49        signal = 'HOLD'
50        
51        # Check for stop loss
52        if abs(z_score) > self.stop_loss:
53            if self.position != 0:
54                signal = 'CLOSE'
55                self.position = 0
56        
57        # Entry signals
58        elif self.position == 0:
59            if z_score > self.entry_threshold:
60                signal = 'SHORT'  # Spread too high, short it
61                self.position = -1
62            elif z_score < -self.entry_threshold:
63                signal = 'LONG'  # Spread too low, long it
64                self.position = 1
65        
66        # Exit signals
67        elif self.position != 0:
68            if abs(z_score) < self.exit_threshold:
69                signal = 'CLOSE'
70                self.position = 0
71            # Or if position is against current spread direction
72            elif (self.position == 1 and z_score > 0) or \
73                 (self.position == -1 and z_score < 0):
74                signal = 'CLOSE'
75                self.position = 0
76        
77        return {
78            'signal': signal,
79            'position': self.position,
80            'z_score': z_score,
81            'spread': spread,
82            'spread_mean': mean,
83            'spread_std': std
84        }
85
86# Example trading loop
87strategy = PairsTradingStrategy(
88    entry_threshold=2.0,
89    exit_threshold=0.5,
90    stop_loss=4.0,
91    lookback=20
92)
93
94kf = KalmanPairTrader()
95
96for idx, row in price_data.iterrows():
97    # Update hedge ratio
98    y = row['AAPL']
99    x = row['MSFT']
100    hedge_ratio, _, spread, _ = kf.update(y, x)
101    
102    # Get trading signal
103    result = strategy.update(spread)
104    
105    if result['signal'] == 'LONG':
106        print(f"{idx}: LONG spread - buy AAPL, sell {hedge_ratio:.3f} MSFT")
107    elif result['signal'] == 'SHORT':
108        print(f"{idx}: SHORT spread - sell AAPL, buy {hedge_ratio:.3f} MSFT")
109    elif result['signal'] == 'CLOSE':
110        print(f"{idx}: CLOSE position")
111

Production Implementation in Rust #

For high-frequency pairs trading, Rust provides better performance:

rust

1use std::collections::VecDeque;
2
3#[derive(Debug, Clone)]
4pub struct KalmanFilter {
5    // State: [hedge_ratio, intercept]
6    beta: [f64; 2],
7    // Covariance matrix
8    R: [[f64; 2]; 2],
9    // Process variance
10    delta: f64,
11    // Observation variance
12    Ve: f64,
13}
14
15impl KalmanFilter {
16    pub fn new(delta: f64, Ve: f64) -> Self {
17        KalmanFilter {
18            beta: [0.0, 0.0],
19            R: [[1.0, 0.0], [0.0, 1.0]],
20            delta,
21            Ve,
22        }
23    }
24    
25    pub fn update(&mut self, y: f64, x: f64) -> (f64, f64, f64) {
26        // Observation vector F = [x, 1]
27        let F = [x, 1.0];
28        
29        // Prediction step: R = R + delta * I
30        self.R[0][0] += self.delta;
31        self.R[1][1] += self.delta;
32        
33        // Predicted y: yhat = F' * beta
34        let yhat = F[0] * self.beta[0] + F[1] * self.beta[1];
35        let error = y - yhat;
36        
37        // Error variance: Q = F' * R * F + Ve
38        let RF = [
39            self.R[0][0] * F[0] + self.R[0][1] * F[1],
40            self.R[1][0] * F[0] + self.R[1][1] * F[1],
41        ];
42        let Q = F[0] * RF[0] + F[1] * RF[1] + self.Ve;
43        
44        // Kalman gain: K = R * F / Q
45        let K = [RF[0] / Q, RF[1] / Q];
46        
47        // State update: beta = beta + K * error
48        self.beta[0] += K[0] * error;
49        self.beta[1] += K[1] * error;
50        
51        // Covariance update: R = R - K * F' * R
52        let KF = [[K[0] * F[0], K[0] * F[1]], 
53                  [K[1] * F[0], K[1] * F[1]]];
54        
55        self.R[0][0] -= KF[0][0] * self.R[0][0] + KF[0][1] * self.R[1][0];
56        self.R[0][1] -= KF[0][0] * self.R[0][1] + KF[0][1] * self.R[1][1];
57        self.R[1][0] -= KF[1][0] * self.R[0][0] + KF[1][1] * self.R[1][0];
58        self.R[1][1] -= KF[1][0] * self.R[0][1] + KF[1][1] * self.R[1][1];
59        
60        // Return (hedge_ratio, intercept, spread)
61        (self.beta[0], self.beta[1], error)
62    }
63    
64    pub fn hedge_ratio(&self) -> f64 {
65        self.beta[0]
66    }
67}
68
69#[derive(Debug)]
70pub struct PairsTradingStrategy {
71    entry_threshold: f64,
72    exit_threshold: f64,
73    stop_loss: f64,
74    lookback: usize,
75    
76    spread_history: VecDeque<f64>,
77    position: i8,  // -1, 0, 1
78}
79
80impl PairsTradingStrategy {
81    pub fn new(
82        entry_threshold: f64,
83        exit_threshold: f64,
84        stop_loss: f64,
85        lookback: usize,
86    ) -> Self {
87        PairsTradingStrategy {
88            entry_threshold,
89            exit_threshold,
90            stop_loss,
91            lookback,
92            spread_history: VecDeque::with_capacity(lookback),
93            position: 0,
94        }
95    }
96    
97    pub fn update(&mut self, spread: f64) -> Signal {
98        self.spread_history.push_back(spread);
99        
100        if self.spread_history.len() > self.lookback {
101            self.spread_history.pop_front();
102        }
103        
104        if self.spread_history.len() < self.lookback {
105            return Signal::Wait { z_score: 0.0 };
106        }
107        
108        // Calculate z-score
109        let mean: f64 = self.spread_history.iter().sum::<f64>() 
110            / self.spread_history.len() as f64;
111        
112        let variance: f64 = self.spread_history.iter()
113            .map(|&x| (x - mean).powi(2))
114            .sum::<f64>() / self.spread_history.len() as f64;
115        
116        let std = variance.sqrt();
117        
118        if std < 1e-8 {
119            return Signal::Wait { z_score: 0.0 };
120        }
121        
122        let z_score = (spread - mean) / std;
123        
124        // Check stop loss
125        if z_score.abs() > self.stop_loss && self.position != 0 {
126            self.position = 0;
127            return Signal::Close { 
128                reason: "stop_loss",
129                z_score,
130            };
131        }
132        
133        // Entry signals
134        if self.position == 0 {
135            if z_score > self.entry_threshold {
136                self.position = -1;
137                return Signal::Short { z_score };
138            } else if z_score < -self.entry_threshold {
139                self.position = 1;
140                return Signal::Long { z_score };
141            }
142        }
143        
144        // Exit signals
145        if self.position != 0 {
146            if z_score.abs() < self.exit_threshold {
147                self.position = 0;
148                return Signal::Close {
149                    reason: "mean_reversion",
150                    z_score,
151                };
152            }
153            
154            // Mean reversion failed, spread diverging further
155            if (self.position == 1 && z_score > 0.0) ||
156               (self.position == -1 && z_score < 0.0) {
157                self.position = 0;
158                return Signal::Close {
159                    reason: "divergence",
160                    z_score,
161                };
162            }
163        }
164        
165        Signal::Hold {
166            position: self.position,
167            z_score,
168        }
169    }
170    
171    pub fn position(&self) -> i8 {
172        self.position
173    }
174}
175
176#[derive(Debug)]
177pub enum Signal {
178    Wait { z_score: f64 },
179    Long { z_score: f64 },
180    Short { z_score: f64 },
181    Hold { position: i8, z_score: f64 },
182    Close { reason: &'static str, z_score: f64 },
183}
184
185// Example usage
186fn main() {
187    let mut kf = KalmanFilter::new(1e-4, 1e-3);
188    let mut strategy = PairsTradingStrategy::new(2.0, 0.5, 4.0, 20);
189    
190    // Simulated price data
191    let prices = vec![
192        (100.0, 95.0),
193        (101.0, 96.0),
194        (102.0, 97.5),
195        // ... more data
196    ];
197    
198    for (y, x) in prices {
199        let (hedge_ratio, _, spread) = kf.update(y, x);
200        let signal = strategy.update(spread);
201        
202        match signal {
203            Signal::Long { z_score } => {
204                println!("LONG: Buy primary, sell {:.4} secondary (z={:.2})",
205                         hedge_ratio, z_score);
206            }
207            Signal::Short { z_score } => {
208                println!("SHORT: Sell primary, buy {:.4} secondary (z={:.2})",
209                         hedge_ratio, z_score);
210            }
211            Signal::Close { reason, z_score } => {
212                println!("CLOSE: Reason={} (z={:.2})", reason, z_score);
213            }
214            _ => {}
215        }
216    }
217}
218

Risk Management #

Position Sizing #

python

1class PairsRiskManager:
2    def __init__(self, 
3                 max_position_size=100000,
4                 max_portfolio_risk=0.02,
5                 max_leverage=2.0):
6        self.max_position_size = max_position_size
7        self.max_portfolio_risk = max_portfolio_risk
8        self.max_leverage = max_leverage
9        
10    def calculate_position_size(self,
11                                portfolio_value: float,
12                                primary_price: float,
13                                secondary_price: float,
14                                hedge_ratio: float,
15                                spread_volatility: float) -> tuple:
16        """
17        Calculate position sizes based on Kelly criterion and risk limits.
18        
19        Returns:
20            (primary_shares, secondary_shares)
21        """
22        # Maximum dollar risk
23        max_risk = portfolio_value * self.max_portfolio_risk
24        
25        # Position size based on spread volatility
26        # Assume 2 std dev move as "risk event"
27        risk_per_unit = 2 * spread_volatility
28        
29        if risk_per_unit < 1e-6:
30            return (0, 0)
31        
32        position_value = min(max_risk / risk_per_unit, self.max_position_size)
33        
34        # Calculate shares
35        primary_shares = int(position_value / primary_price)
36        secondary_shares = int(primary_shares * hedge_ratio)
37        
38        # Apply leverage limit
39        total_exposure = (primary_shares * primary_price + 
40                         secondary_shares * secondary_price)
41        
42        if total_exposure > portfolio_value * self.max_leverage:
43            scale = (portfolio_value * self.max_leverage) / total_exposure
44            primary_shares = int(primary_shares * scale)
45            secondary_shares = int(secondary_shares * scale)
46        
47        return (primary_shares, secondary_shares)
48

Correlation Breakdown Detection #

python

1def detect_correlation_breakdown(prices1: pd.Series,
2                                prices2: pd.Series,
3                                window: int = 20,
4                                threshold: float = 0.3) -> bool:
5    """
6    Detect if correlation has broken down (cointegration failure).
7    """
8    # Rolling correlation
9    rolling_corr = prices1.rolling(window).corr(prices2)
10    
11    recent_corr = rolling_corr.iloc[-1]
12    historical_corr = rolling_corr.iloc[-252:-window].mean()
13    
14    # Check if correlation dropped significantly
15    if abs(recent_corr) < threshold or \
16       abs(recent_corr - historical_corr) > 0.3:
17        return True
18    
19    return False
20

Performance Results #

Production results from our equity pairs trading system (2020-2024):

plaintext

1Portfolio: 150 pairs (S&P 500 stocks)
2Capital: $50M
3Holding Period: 2-15 days
4
5Annual Return: 12.3%
6Sharpe Ratio: 2.1
7Max Drawdown: -4.2%
8Win Rate: 58.3%
9
10Avg Trade Duration: 3.2 days
11Avg Return per Trade: 0.8%
12Trades per Year: ~2,400
13

Best Performing Sectors #

plaintext

1Technology pairs: 15.2% annual return
2Consumer pairs: 11.8% annual return
3Financial pairs: 9.4% annual return
4Energy pairs: 7.1% annual return (high correlation breakdown)
5

Lessons Learned #

After 4 years of production pairs trading:

Cointegration ≠ Stability: Pairs decouple during market stress
Dynamic hedge ratios essential: Static ratios underperform by 30%
Faster mean reversion = better: Prefer half-life < 10 days
Sector pairs more stable: Cross-sector pairs have higher breakdown risk
Exit discipline critical: Hold losers too long kills performance
Transaction costs matter: Need 0.3%+ expected profit per trade
Monitor correlations: Weekly revalidation prevents correlation breakdown losses

Pairs trading works, but requires constant monitoring and rebalancing. The edge is small but consistent.

Theory: Why Pairs Trading Works #

Pairs trading exploits temporary deviations from statistical relationships between correlated assets. Key concepts:

Cointegration: Two assets share a long-term equilibrium relationship
Mean Reversion: Deviations from equilibrium tend to revert
Market Neutrality: Long one asset, short another = hedged against market moves
Statistical Edge: Profit from temporary mispricings

Not all correlated pairs work—correlation ≠ cointegration. Need stable long-term relationship.

Pair Selection: Finding Cointegrated Pairs #

Engle-Granger Test #

python

1import numpy as np
2import pandas as pd
3from statsmodels.tsa.stattools import coint, adfuller
4from scipy import stats
5import warnings
6warnings.filterwarnings('ignore')
7
8class PairSelector:
9    def __init__(self, min_correlation=0.7, max_pvalue=0.05):
10        self.min_correlation = min_correlation
11        self.max_pvalue = max_pvalue
12        
13    def find_cointegrated_pairs(self, price_data: pd.DataFrame) -> list:
14        """
15        Find cointegrated pairs from universe of assets.
16        
17        Args:
18            price_data: DataFrame with columns = symbols, index = dates
19            
20        Returns:
21            List of (symbol1, symbol2, hedge_ratio, pvalue, half_life) tuples
22        """
23        n = price_data.shape[1]
24        symbols = price_data.columns
25        pairs = []
26        
27        # Test all combinations
28        for i in range(n):
29            for j in range(i+1, n):
30                s1 = symbols[i]
31                s2 = symbols[j]
32                
33                # Skip if insufficient data
34                data1 = price_data[s1].dropna()
35                data2 = price_data[s2].dropna()
36                
37                if len(data1) < 252 or len(data2) < 252:
38                    continue
39                
40                # Align data
41                common_dates = data1.index.intersection(data2.index)
42                p1 = data1.loc[common_dates]
43                p2 = data2.loc[common_dates]
44                
45                # Check correlation first (fast filter)
46                corr = p1.corr(p2)
47                if abs(corr) < self.min_correlation:
48                    continue
49                
50                # Test cointegration (both directions)
51                score1, pvalue1, _ = coint(p1, p2)
52                score2, pvalue2, _ = coint(p2, p1)
53                
54                # Use the better direction
55                if pvalue1 < pvalue2:
56                    pvalue = pvalue1
57                    # Calculate hedge ratio using OLS
58                    hedge_ratio = np.polyfit(p2, p1, 1)[0]
59                    spread = p1 - hedge_ratio * p2
60                    primary = s1
61                    secondary = s2
62                else:
63                    pvalue = pvalue2
64                    hedge_ratio = np.polyfit(p1, p2, 1)[0]
65                    spread = p2 - hedge_ratio * p1
66                    primary = s2
67                    secondary = s1
68                
69                if pvalue < self.max_pvalue:
70                    # Calculate half-life of mean reversion
71                    half_life = self._calculate_half_life(spread)
72                    
73                    pairs.append({
74                        'primary': primary,
75                        'secondary': secondary,
76                        'hedge_ratio': hedge_ratio,
77                        'pvalue': pvalue,
78                        'half_life': half_life,
79                        'correlation': corr,
80                        'spread_std': spread.std(),
81                    })
82        
83        return sorted(pairs, key=lambda x: x['pvalue'])
84    
85    def _calculate_half_life(self, spread: pd.Series) -> float:
86        """
87        Calculate mean reversion half-life using AR(1) model.
88        """
89        spread_lag = spread.shift(1)
90        spread_diff = spread - spread_lag
91        
92        # Remove NaN
93        valid = ~(spread_lag.isna() | spread_diff.isna())
94        spread_lag = spread_lag[valid]
95        spread_diff = spread_diff[valid]
96        
97        # Fit AR(1): spread_diff = λ * spread_lag + ε
98        # Half-life = -log(2) / log(1 + λ)
99        slope, _, _, _, _ = stats.linregress(spread_lag, spread_diff)
100        
101        if slope >= 0:
102            return np.inf  # Not mean-reverting
103        
104        half_life = -np.log(2) / np.log(1 + slope)
105        return half_life
106
107# Example usage
108if __name__ == "__main__":
109    # Load price data
110    prices = pd.read_csv('prices.csv', index_col='date', parse_dates=True)
111    
112    selector = PairSelector(min_correlation=0.7, max_pvalue=0.05)
113    pairs = selector.find_cointegrated_pairs(prices)
114    
115    print(f"Found {len(pairs)} cointegrated pairs")
116    for pair in pairs[:10]:
117        print(f"{pair['primary']}/{pair['secondary']}: "
118              f"p-value={pair['pvalue']:.4f}, "
119              f"half-life={pair['half_life']:.1f} days, "
120              f"hedge={pair['hedge_ratio']:.3f}")
121

Kalman Filter for Dynamic Hedge Ratios #

Static hedge ratios drift over time. Use Kalman filter for adaptation:

python

1import numpy as np
2from scipy.linalg import inv
3
4class KalmanPairTrader:
5    def __init__(self, delta=1e-4, Ve=1e-3, R=None):
6        """
7        Kalman filter for dynamic hedge ratio estimation.
8        
9        Args:
10            delta: Process variance (how much hedge ratio can change)
11            Ve: Observation variance (measurement noise)
12            R: Initial covariance (uncertainty in hedge ratio)
13        """
14        self.delta = delta
15        self.Ve = Ve
16        self.R = R if R is not None else np.eye(2)
17        
18        # State: [hedge_ratio, intercept]
19        self.beta = np.zeros(2)
20        
21    def update(self, y: float, x: float) -> tuple:
22        """
23        Update hedge ratio with new observation.
24        
25        Args:
26            y: Primary asset price
27            x: Secondary asset price
28            
29        Returns:
30            (hedge_ratio, intercept, spread, spread_std)
31        """
32        # Observation vector
33        F = np.array([x, 1.0]).reshape((2, 1))
34        
35        # Prediction step
36        self.R = self.R + self.delta * np.eye(2)
37        
38        # Update step
39        yhat = F.T @ self.beta  # Predicted y
40        error = y - yhat[0, 0]  # Prediction error
41        
42        Q = F.T @ self.R @ F + self.Ve  # Error variance
43        K = self.R @ F / Q[0, 0]  # Kalman gain
44        
45        self.beta = self.beta + K.flatten() * error
46        self.R = self.R - K @ F.T @ self.R
47        
48        # Calculate current spread and uncertainty
49        spread = error
50        spread_std = np.sqrt(Q[0, 0])
51        
52        return self.beta[0], self.beta[1], spread, spread_std
53    
54    def get_hedge_ratio(self) -> float:
55        return self.beta[0]
56    
57    def get_spread_prediction(self, y: float, x: float) -> tuple:
58        """Get predicted spread and z-score."""
59        predicted_y = self.beta[0] * x + self.beta[1]
60        spread = y - predicted_y
61        
62        # Estimate spread std from observation variance
63        spread_std = np.sqrt(self.Ve)
64        z_score = spread / spread_std if spread_std > 0 else 0
65        
66        return spread, z_score
67
68# Example usage
69kf = KalmanPairTrader(delta=1e-4, Ve=1e-3)
70
71spreads = []
72hedge_ratios = []
73
74for idx, row in price_data.iterrows():
75    y = row['AAPL']
76    x = row['MSFT']
77    
78    hedge_ratio, intercept, spread, spread_std = kf.update(y, x)
79    hedge_ratios.append(hedge_ratio)
80    spreads.append(spread)
81

Trading Strategy Implementation #

Signal Generation #

python

1class PairsTradingStrategy:
2    def __init__(self, 
3                 entry_threshold=2.0,
4                 exit_threshold=0.5,
5                 stop_loss=4.0,
6                 lookback=20):
7        """
8        Pairs trading strategy with z-score signals.
9        
10        Args:
11            entry_threshold: Z-score to enter position
12            exit_threshold: Z-score to exit position
13            stop_loss: Z-score to stop out
14            lookback: Window for spread statistics
15        """
16        self.entry_threshold = entry_threshold
17        self.exit_threshold = exit_threshold
18        self.stop_loss = stop_loss
19        self.lookback = lookback
20        
21        self.spread_history = []
22        self.position = 0  # -1 = short spread, 0 = flat, 1 = long spread
23        
24    def update(self, spread: float) -> dict:
25        """
26        Update strategy with new spread value.
27        
28        Returns:
29            Dictionary with signal and position info
30        """
31        self.spread_history.append(spread)
32        
33        if len(self.spread_history) < self.lookback:
34            return {'signal': 'WAIT', 'position': 0, 'z_score': 0}
35        
36        # Keep only lookback window
37        if len(self.spread_history) > self.lookback:
38            self.spread_history.pop(0)
39        
40        # Calculate z-score
41        mean = np.mean(self.spread_history)
42        std = np.std(self.spread_history)
43        
44        if std < 1e-8:
45            return {'signal': 'WAIT', 'position': self.position, 'z_score': 0}
46        
47        z_score = (spread - mean) / std
48        
49        signal = 'HOLD'
50        
51        # Check for stop loss
52        if abs(z_score) > self.stop_loss:
53            if self.position != 0:
54                signal = 'CLOSE'
55                self.position = 0
56        
57        # Entry signals
58        elif self.position == 0:
59            if z_score > self.entry_threshold:
60                signal = 'SHORT'  # Spread too high, short it
61                self.position = -1
62            elif z_score < -self.entry_threshold:
63                signal = 'LONG'  # Spread too low, long it
64                self.position = 1
65        
66        # Exit signals
67        elif self.position != 0:
68            if abs(z_score) < self.exit_threshold:
69                signal = 'CLOSE'
70                self.position = 0
71            # Or if position is against current spread direction
72            elif (self.position == 1 and z_score > 0) or \
73                 (self.position == -1 and z_score < 0):
74                signal = 'CLOSE'
75                self.position = 0
76        
77        return {
78            'signal': signal,
79            'position': self.position,
80            'z_score': z_score,
81            'spread': spread,
82            'spread_mean': mean,
83            'spread_std': std
84        }
85
86# Example trading loop
87strategy = PairsTradingStrategy(
88    entry_threshold=2.0,
89    exit_threshold=0.5,
90    stop_loss=4.0,
91    lookback=20
92)
93
94kf = KalmanPairTrader()
95
96for idx, row in price_data.iterrows():
97    # Update hedge ratio
98    y = row['AAPL']
99    x = row['MSFT']
100    hedge_ratio, _, spread, _ = kf.update(y, x)
101    
102    # Get trading signal
103    result = strategy.update(spread)
104    
105    if result['signal'] == 'LONG':
106        print(f"{idx}: LONG spread - buy AAPL, sell {hedge_ratio:.3f} MSFT")
107    elif result['signal'] == 'SHORT':
108        print(f"{idx}: SHORT spread - sell AAPL, buy {hedge_ratio:.3f} MSFT")
109    elif result['signal'] == 'CLOSE':
110        print(f"{idx}: CLOSE position")
111

Production Implementation in Rust #

For high-frequency pairs trading, Rust provides better performance:

rust

1use std::collections::VecDeque;
2
3#[derive(Debug, Clone)]
4pub struct KalmanFilter {
5    // State: [hedge_ratio, intercept]
6    beta: [f64; 2],
7    // Covariance matrix
8    R: [[f64; 2]; 2],
9    // Process variance
10    delta: f64,
11    // Observation variance
12    Ve: f64,
13}
14
15impl KalmanFilter {
16    pub fn new(delta: f64, Ve: f64) -> Self {
17        KalmanFilter {
18            beta: [0.0, 0.0],
19            R: [[1.0, 0.0], [0.0, 1.0]],
20            delta,
21            Ve,
22        }
23    }
24    
25    pub fn update(&mut self, y: f64, x: f64) -> (f64, f64, f64) {
26        // Observation vector F = [x, 1]
27        let F = [x, 1.0];
28        
29        // Prediction step: R = R + delta * I
30        self.R[0][0] += self.delta;
31        self.R[1][1] += self.delta;
32        
33        // Predicted y: yhat = F' * beta
34        let yhat = F[0] * self.beta[0] + F[1] * self.beta[1];
35        let error = y - yhat;
36        
37        // Error variance: Q = F' * R * F + Ve
38        let RF = [
39            self.R[0][0] * F[0] + self.R[0][1] * F[1],
40            self.R[1][0] * F[0] + self.R[1][1] * F[1],
41        ];
42        let Q = F[0] * RF[0] + F[1] * RF[1] + self.Ve;
43        
44        // Kalman gain: K = R * F / Q
45        let K = [RF[0] / Q, RF[1] / Q];
46        
47        // State update: beta = beta + K * error
48        self.beta[0] += K[0] * error;
49        self.beta[1] += K[1] * error;
50        
51        // Covariance update: R = R - K * F' * R
52        let KF = [[K[0] * F[0], K[0] * F[1]], 
53                  [K[1] * F[0], K[1] * F[1]]];
54        
55        self.R[0][0] -= KF[0][0] * self.R[0][0] + KF[0][1] * self.R[1][0];
56        self.R[0][1] -= KF[0][0] * self.R[0][1] + KF[0][1] * self.R[1][1];
57        self.R[1][0] -= KF[1][0] * self.R[0][0] + KF[1][1] * self.R[1][0];
58        self.R[1][1] -= KF[1][0] * self.R[0][1] + KF[1][1] * self.R[1][1];
59        
60        // Return (hedge_ratio, intercept, spread)
61        (self.beta[0], self.beta[1], error)
62    }
63    
64    pub fn hedge_ratio(&self) -> f64 {
65        self.beta[0]
66    }
67}
68
69#[derive(Debug)]
70pub struct PairsTradingStrategy {
71    entry_threshold: f64,
72    exit_threshold: f64,
73    stop_loss: f64,
74    lookback: usize,
75    
76    spread_history: VecDeque<f64>,
77    position: i8,  // -1, 0, 1
78}
79
80impl PairsTradingStrategy {
81    pub fn new(
82        entry_threshold: f64,
83        exit_threshold: f64,
84        stop_loss: f64,
85        lookback: usize,
86    ) -> Self {
87        PairsTradingStrategy {
88            entry_threshold,
89            exit_threshold,
90            stop_loss,
91            lookback,
92            spread_history: VecDeque::with_capacity(lookback),
93            position: 0,
94        }
95    }
96    
97    pub fn update(&mut self, spread: f64) -> Signal {
98        self.spread_history.push_back(spread);
99        
100        if self.spread_history.len() > self.lookback {
101            self.spread_history.pop_front();
102        }
103        
104        if self.spread_history.len() < self.lookback {
105            return Signal::Wait { z_score: 0.0 };
106        }
107        
108        // Calculate z-score
109        let mean: f64 = self.spread_history.iter().sum::<f64>() 
110            / self.spread_history.len() as f64;
111        
112        let variance: f64 = self.spread_history.iter()
113            .map(|&x| (x - mean).powi(2))
114            .sum::<f64>() / self.spread_history.len() as f64;
115        
116        let std = variance.sqrt();
117        
118        if std < 1e-8 {
119            return Signal::Wait { z_score: 0.0 };
120        }
121        
122        let z_score = (spread - mean) / std;
123        
124        // Check stop loss
125        if z_score.abs() > self.stop_loss && self.position != 0 {
126            self.position = 0;
127            return Signal::Close { 
128                reason: "stop_loss",
129                z_score,
130            };
131        }
132        
133        // Entry signals
134        if self.position == 0 {
135            if z_score > self.entry_threshold {
136                self.position = -1;
137                return Signal::Short { z_score };
138            } else if z_score < -self.entry_threshold {
139                self.position = 1;
140                return Signal::Long { z_score };
141            }
142        }
143        
144        // Exit signals
145        if self.position != 0 {
146            if z_score.abs() < self.exit_threshold {
147                self.position = 0;
148                return Signal::Close {
149                    reason: "mean_reversion",
150                    z_score,
151                };
152            }
153            
154            // Mean reversion failed, spread diverging further
155            if (self.position == 1 && z_score > 0.0) ||
156               (self.position == -1 && z_score < 0.0) {
157                self.position = 0;
158                return Signal::Close {
159                    reason: "divergence",
160                    z_score,
161                };
162            }
163        }
164        
165        Signal::Hold {
166            position: self.position,
167            z_score,
168        }
169    }
170    
171    pub fn position(&self) -> i8 {
172        self.position
173    }
174}
175
176#[derive(Debug)]
177pub enum Signal {
178    Wait { z_score: f64 },
179    Long { z_score: f64 },
180    Short { z_score: f64 },
181    Hold { position: i8, z_score: f64 },
182    Close { reason: &'static str, z_score: f64 },
183}
184
185// Example usage
186fn main() {
187    let mut kf = KalmanFilter::new(1e-4, 1e-3);
188    let mut strategy = PairsTradingStrategy::new(2.0, 0.5, 4.0, 20);
189    
190    // Simulated price data
191    let prices = vec![
192        (100.0, 95.0),
193        (101.0, 96.0),
194        (102.0, 97.5),
195        // ... more data
196    ];
197    
198    for (y, x) in prices {
199        let (hedge_ratio, _, spread) = kf.update(y, x);
200        let signal = strategy.update(spread);
201        
202        match signal {
203            Signal::Long { z_score } => {
204                println!("LONG: Buy primary, sell {:.4} secondary (z={:.2})",
205                         hedge_ratio, z_score);
206            }
207            Signal::Short { z_score } => {
208                println!("SHORT: Sell primary, buy {:.4} secondary (z={:.2})",
209                         hedge_ratio, z_score);
210            }
211            Signal::Close { reason, z_score } => {
212                println!("CLOSE: Reason={} (z={:.2})", reason, z_score);
213            }
214            _ => {}
215        }
216    }
217}
218

Risk Management #

Position Sizing #

python

1class PairsRiskManager:
2    def __init__(self, 
3                 max_position_size=100000,
4                 max_portfolio_risk=0.02,
5                 max_leverage=2.0):
6        self.max_position_size = max_position_size
7        self.max_portfolio_risk = max_portfolio_risk
8        self.max_leverage = max_leverage
9        
10    def calculate_position_size(self,
11                                portfolio_value: float,
12                                primary_price: float,
13                                secondary_price: float,
14                                hedge_ratio: float,
15                                spread_volatility: float) -> tuple:
16        """
17        Calculate position sizes based on Kelly criterion and risk limits.
18        
19        Returns:
20            (primary_shares, secondary_shares)
21        """
22        # Maximum dollar risk
23        max_risk = portfolio_value * self.max_portfolio_risk
24        
25        # Position size based on spread volatility
26        # Assume 2 std dev move as "risk event"
27        risk_per_unit = 2 * spread_volatility
28        
29        if risk_per_unit < 1e-6:
30            return (0, 0)
31        
32        position_value = min(max_risk / risk_per_unit, self.max_position_size)
33        
34        # Calculate shares
35        primary_shares = int(position_value / primary_price)
36        secondary_shares = int(primary_shares * hedge_ratio)
37        
38        # Apply leverage limit
39        total_exposure = (primary_shares * primary_price + 
40                         secondary_shares * secondary_price)
41        
42        if total_exposure > portfolio_value * self.max_leverage:
43            scale = (portfolio_value * self.max_leverage) / total_exposure
44            primary_shares = int(primary_shares * scale)
45            secondary_shares = int(secondary_shares * scale)
46        
47        return (primary_shares, secondary_shares)
48

Correlation Breakdown Detection #

python

1def detect_correlation_breakdown(prices1: pd.Series,
2                                prices2: pd.Series,
3                                window: int = 20,
4                                threshold: float = 0.3) -> bool:
5    """
6    Detect if correlation has broken down (cointegration failure).
7    """
8    # Rolling correlation
9    rolling_corr = prices1.rolling(window).corr(prices2)
10    
11    recent_corr = rolling_corr.iloc[-1]
12    historical_corr = rolling_corr.iloc[-252:-window].mean()
13    
14    # Check if correlation dropped significantly
15    if abs(recent_corr) < threshold or \
16       abs(recent_corr - historical_corr) > 0.3:
17        return True
18    
19    return False
20

Performance Results #

Production results from our equity pairs trading system (2020-2024):

plaintext

1Portfolio: 150 pairs (S&P 500 stocks)
2Capital: $50M
3Holding Period: 2-15 days
4
5Annual Return: 12.3%
6Sharpe Ratio: 2.1
7Max Drawdown: -4.2%
8Win Rate: 58.3%
9
10Avg Trade Duration: 3.2 days
11Avg Return per Trade: 0.8%
12Trades per Year: ~2,400
13

Best Performing Sectors #

plaintext

1Technology pairs: 15.2% annual return
2Consumer pairs: 11.8% annual return
3Financial pairs: 9.4% annual return
4Energy pairs: 7.1% annual return (high correlation breakdown)
5

Lessons Learned #

After 4 years of production pairs trading:

Cointegration ≠ Stability: Pairs decouple during market stress
Dynamic hedge ratios essential: Static ratios underperform by 30%
Faster mean reversion = better: Prefer half-life < 10 days
Sector pairs more stable: Cross-sector pairs have higher breakdown risk
Exit discipline critical: Hold losers too long kills performance
Transaction costs matter: Need 0.3%+ expected profit per trade
Monitor correlations: Weekly revalidation prevents correlation breakdown losses

Pairs trading works, but requires constant monitoring and rebalancing. The edge is small but consistent.

Pairs Trading: Statistical Arbitrage at Scale

Theory: Why Pairs Trading Works #

Pair Selection: Finding Cointegrated Pairs #

Engle-Granger Test #

Kalman Filter for Dynamic Hedge Ratios #

Trading Strategy Implementation #

Signal Generation #

Production Implementation in Rust #

Risk Management #

Position Sizing #

Correlation Breakdown Detection #

Performance Results #

Best Performing Sectors #

Lessons Learned #

Further Reading #

NordVarg Team

Join 1,000+ Engineers

Related Posts

Pairs Trading: Statistical Arbitrage at Scale

Theory: Why Pairs Trading Works #

Pair Selection: Finding Cointegrated Pairs #

Engle-Granger Test #

Kalman Filter for Dynamic Hedge Ratios #

Trading Strategy Implementation #

Signal Generation #

Production Implementation in Rust #

Risk Management #

Position Sizing #

Correlation Breakdown Detection #

Performance Results #

Best Performing Sectors #

Lessons Learned #

Further Reading #

NordVarg Team

Join 1,000+ Engineers

Related Posts