In 2021, a quantitative hedge fund was testing a new mean-reversion strategy. They wanted to optimize across 1,000 parameter combinations (different lookback periods, entry thresholds, stop-losses) to find the best configuration. On their 16-core server, the full parameter sweep took 18.5 hours. By the time results came back, the market had moved, and the optimal parameters might have changed.
They couldn't iterate fast enough. Research velocity was their bottleneck—not ideas, not data, but compute. A researcher would submit a backtest Friday afternoon and get results Monday morning. Three-day feedback loops killed productivity.
We built a distributed backtesting system using Ray, scaling to 320 CPUs across 20 cloud instances. The same 1,000-parameter sweep now completes in 52 minutes—a 21x speedup. Researchers iterate 20+ times per day instead of once every 3 days. Research velocity increased 60x.
But distributed backtesting isn't free. The infrastructure cost $1.20/hour ($28.80/day if running continuously). The engineering effort was 4 months. And debugging distributed systems is 10x harder than debugging single-threaded code. This article covers when distributed backtesting is worth it, how to build it, and what goes wrong.
Single-threaded backtesting hits limits quickly:
Testing a strategy with 4 parameters, each with 10 possible values, creates 10,000 combinations. If each backtest takes 10 seconds, that's 27.8 hours. Add walk-forward analysis (splitting data into 10 periods), and you're at 278 hours (11.5 days).
Example: Mean-reversion strategy parameters
Total combinations: 10 × 5 × 4 × 4 = 800
At 10 seconds per backtest: 800 × 10s = 8,000s = 2.2 hours (single-threaded)
With 20 workers: 8,000s / 20 = 400s = 6.7 minutes
The speedup is linear with worker count (assuming no overhead).
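The sizing arithmetic above is easy to script before committing to a sweep. Here's a minimal sketch; the specific value lists are assumptions chosen to match the 10 × 5 × 4 × 4 counts quoted above:

```python
import math

# Illustrative grid with 10 x 5 x 4 x 4 values per parameter (values assumed)
parameter_grid = {
    'lookback_period': list(range(10, 110, 10)),   # 10 values
    'entry_threshold': [1.0, 1.5, 2.0, 2.5, 3.0],  # 5 values
    'exit_threshold':  [0.25, 0.5, 0.75, 1.0],     # 4 values
    'stop_loss_z':     [2.0, 2.5, 3.0, 3.5],       # 4 values
}

n_combinations = math.prod(len(v) for v in parameter_grid.values())
seconds_per_backtest = 10

single_threaded_hours = n_combinations * seconds_per_backtest / 3600
twenty_worker_minutes = n_combinations * seconds_per_backtest / 20 / 60

print(n_combinations)                   # 800
print(round(single_threaded_hours, 1))  # 2.2
print(round(twenty_worker_minutes, 1))  # 6.7
```

Running this before a sweep tells you whether the job belongs on your laptop or on a cluster.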
Backtesting a portfolio of 100 stocks requires simulating all 100 simultaneously, tracking correlations, rebalancing, and risk management. This is 100x more expensive than backtesting a single stock.
For a universe of 500 stocks, testing 100 different portfolio construction methods takes weeks on a single machine. Distributed systems make it feasible.
Some strategies require Monte Carlo simulation: run the same strategy 1,000+ times with different random seeds to estimate distribution of outcomes. This is embarrassingly parallel—each simulation is independent.
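The pattern can be sketched with the standard library (concurrent.futures is used here instead of Ray so the example runs standalone; the return model and per-run seeding scheme are illustrative):

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def simulate_path(seed: int, n_days: int = 252) -> float:
    """One Monte Carlo run: a year of daily returns drawn with its own seed."""
    rng = np.random.default_rng(seed)  # independent random stream per run
    daily = rng.normal(loc=0.0005, scale=0.01, size=n_days)
    return float(np.prod(1.0 + daily) - 1.0)  # compounded total return

def monte_carlo(n_runs: int, max_workers: int = 8) -> list:
    # Each seed is independent, so runs can be farmed out in any order.
    # With Ray, simulate_path would become a @ray.remote task instead.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(simulate_path, range(n_runs)))

outcomes = monte_carlo(1000)
print(f"median total return: {np.median(outcomes):.2%}")
```

Because every run is seeded deterministically, the same distribution comes back regardless of how many workers execute it — exactly the property that makes this workload embarrassingly parallel.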
We use Ray, a distributed computing framework that makes parallelization trivial. Ray handles task scheduling, worker fault tolerance, and a shared-memory object store that lets workers read large datasets without copying them.
Alternatives considered:
Ray wins because:
import ray
import pandas as pd
import numpy as np
from typing import Dict, List
from dataclasses import dataclass
import time


@dataclass
class BacktestConfig:
    """Configuration for a single backtest"""
    strategy_name: str
    parameters: Dict
    start_date: str
    end_date: str
    initial_capital: float = 1_000_000.0
    transaction_cost_bps: float = 5.0  # 5 basis points


@ray.remote
class BacktestWorker:
    """
    Ray actor that runs backtests.
    Each worker maintains its own data cache.
    """

    def __init__(self, data_ref):
        """
        Args:
            data_ref: Ray object reference to shared market data
        """
        # Load data from shared object store (zero-copy)
        self.market_data = ray.get(data_ref)
        print(f"Worker initialized with {len(self.market_data)} rows of data")

    def run_backtest(self, config: BacktestConfig) -> Dict:
        """
        Execute a single backtest.

        Returns:
            Dictionary with performance metrics
        """
        start_time = time.time()

        # Filter data for backtest period
        data = self.market_data[
            (self.market_data['date'] >= config.start_date) &
            (self.market_data['date'] <= config.end_date)
        ].copy()

        # Run strategy
        results = self._execute_strategy(data, config)

        # Calculate metrics
        metrics = self._calculate_metrics(results, config)
        metrics['backtest_time_seconds'] = time.time() - start_time
        metrics['parameters'] = config.parameters

        return metrics

    def _execute_strategy(self, data, config):
        """Execute trading strategy logic"""
        portfolio_value = [config.initial_capital]
        positions = []
        cash = config.initial_capital
        shares = 0

        params = config.parameters

        for i in range(len(data)):
            row = data.iloc[i]

            # Not enough history yet to compute indicators
            if i < params['lookback_period']:
                positions.append(0)
                portfolio_value.append(cash + shares * row['close'])
                continue

            # Mean reversion signal
            recent_prices = data.iloc[i - params['lookback_period']:i]['close']
            mean_price = recent_prices.mean()
            std_price = recent_prices.std()

            if std_price == 0:
                z_score = 0
            else:
                z_score = (row['close'] - mean_price) / std_price

            # Trading logic
            signal = 0
            if shares == 0:  # No position
                if z_score < -params['entry_threshold']:
                    signal = 1   # Buy
                elif z_score > params['entry_threshold']:
                    signal = -1  # Short
            else:  # Have position
                if abs(z_score) < params['exit_threshold']:
                    signal = -np.sign(shares)  # Close position
                elif shares > 0 and z_score < -params['stop_loss_z']:
                    signal = -1  # Stop loss on long
                elif shares < 0 and z_score > params['stop_loss_z']:
                    signal = 1   # Stop loss on short

            # Execute trade
            if signal != 0:
                if shares == 0:
                    # Open a new long or short position sized by available cash
                    trade_shares = signal * int(cash / row['close'])
                else:
                    # Any signal while holding a position closes it
                    trade_shares = -shares
                trade_cost = abs(trade_shares) * row['close'] * (config.transaction_cost_bps / 10000)

                cash -= trade_shares * row['close'] + trade_cost
                shares += trade_shares

            positions.append(shares)
            portfolio_value.append(cash + shares * row['close'])

        return {
            'portfolio_value': portfolio_value,
            'positions': positions,
            'final_value': portfolio_value[-1]
        }

    def _calculate_metrics(self, results, config):
        """Calculate performance metrics"""
        pv = pd.Series(results['portfolio_value'])
        returns = pv.pct_change().dropna()

        # Count trades (any change in position size)
        positions = results['positions']
        num_trades = sum(1 for i in range(1, len(positions))
                         if positions[i] != positions[i - 1])

        # Calculate metrics
        total_return = (results['final_value'] / config.initial_capital) - 1

        if len(returns) > 0 and returns.std() > 0:
            sharpe_ratio = returns.mean() / returns.std() * np.sqrt(252)
        else:
            sharpe_ratio = 0

        # Maximum drawdown
        cummax = pv.cummax()
        drawdown = (pv - cummax) / cummax
        max_drawdown = drawdown.min()

        return {
            'total_return': total_return,
            'sharpe_ratio': sharpe_ratio,
            'max_drawdown': max_drawdown,
            'num_trades': num_trades,
            'final_value': results['final_value']
        }


class DistributedBacktester:
    """
    Coordinator for distributed backtesting.
    """

    def __init__(self, n_workers: int = 20):
        """
        Args:
            n_workers: Number of parallel workers
        """
        # Initialize Ray
        if not ray.is_initialized():
            ray.init(ignore_reinit_error=True)

        print(f"Initializing {n_workers} workers...")

        # Load and share market data
        market_data = self._load_market_data()
        self.data_ref = ray.put(market_data)  # Store in object store

        # Create worker pool
        self.workers = [
            BacktestWorker.remote(self.data_ref)
            for _ in range(n_workers)
        ]

        print(f"Workers initialized. Data size: {len(market_data)} rows")

    def _load_market_data(self):
        """Load market data (in production: from database)"""
        # Simulated data for example
        dates = pd.date_range('2010-01-01', '2024-01-01', freq='D')
        data = pd.DataFrame({
            'date': dates,
            'close': 100 + np.cumsum(np.random.randn(len(dates)) * 0.5)
        })
        return data

    def run_parameter_sweep(self,
                            strategy_name: str,
                            parameter_grid: Dict[str, List],
                            start_date: str,
                            end_date: str) -> pd.DataFrame:
        """
        Run backtest across all parameter combinations.

        Args:
            strategy_name: Name of strategy
            parameter_grid: Dict mapping parameter names to lists of values
            start_date, end_date: Backtest period

        Returns:
            DataFrame with results sorted by Sharpe ratio
        """
        # Generate all parameter combinations
        from itertools import product

        param_names = list(parameter_grid.keys())
        param_values = list(parameter_grid.values())

        configs = []
        for values in product(*param_values):
            params = dict(zip(param_names, values))
            configs.append(BacktestConfig(
                strategy_name=strategy_name,
                parameters=params,
                start_date=start_date,
                end_date=end_date
            ))

        print(f"Running {len(configs)} backtests across {len(self.workers)} workers...")
        start_time = time.time()

        # Distribute work across workers (round-robin)
        tasks = []
        for i, config in enumerate(configs):
            worker = self.workers[i % len(self.workers)]
            task = worker.run_backtest.remote(config)
            tasks.append(task)

        # Collect results
        results = ray.get(tasks)

        elapsed = time.time() - start_time
        print(f"Completed in {elapsed:.1f} seconds ({len(configs)/elapsed:.1f} backtests/sec)")

        # Convert to DataFrame
        df = pd.DataFrame(results)
        return df.sort_values('sharpe_ratio', ascending=False)

    def shutdown(self):
        """Clean up Ray resources"""
        ray.shutdown()


# Example usage
if __name__ == "__main__":
    # Create distributed backtester
    backtester = DistributedBacktester(n_workers=20)

    # Define parameter grid
    parameter_grid = {
        'lookback_period': [20, 40, 60, 80, 100],
        'entry_threshold': [1.5, 2.0, 2.5],
        'exit_threshold': [0.25, 0.5, 0.75],
        'stop_loss_z': [2.5, 3.0, 3.5]
    }

    # Run parameter sweep
    results = backtester.run_parameter_sweep(
        strategy_name='mean_reversion',
        parameter_grid=parameter_grid,
        start_date='2015-01-01',
        end_date='2023-12-31'
    )

    print("\nTop 10 parameter combinations:")
    print(results.head(10)[['sharpe_ratio', 'total_return', 'max_drawdown', 'num_trades', 'parameters']])

    backtester.shutdown()
Key design decisions:
A $200M quantitative hedge fund runs 50+ strategies across multiple asset classes. Each strategy has 5-10 tunable parameters. They want to:
Scale: 50 strategies × 1,000 parameter combinations × 10 walk-forward periods = 500,000 backtests
Single-threaded time: 500,000 × 10 seconds = 5,000,000 seconds = 1,389 hours = 58 days
Clearly infeasible.
Infrastructure:
Results:
ROI:
Problem 1: Data transfer bottleneck
Initially, each worker loaded market data independently from S3. With 320 workers, this created 320 simultaneous S3 requests, overwhelming the bucket and causing throttling.
Solution: Load data once on the coordinator, store in Ray's object store, share references to workers. This reduced data transfer from 320× to 1×.
Problem 2: Straggler tasks
Some parameter combinations took 10x longer than others (e.g., high-frequency strategies with many trades). This created "stragglers"—the last 5% of tasks took 30% of total time.
Solution: Sort tasks by estimated runtime (based on parameter complexity), schedule longest tasks first. This improved load balancing and reduced total time by 20%.
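A sketch of that longest-first dispatch. The cost model below is hypothetical (lookback length and entry threshold standing in for measured runtimes); in practice you would fit it to observed backtest times:

```python
def estimated_cost(params: dict) -> float:
    # Hypothetical proxy: longer lookbacks and looser entry thresholds
    # tend to mean more indicator work and more trades
    return params['lookback_period'] / max(params['entry_threshold'], 0.1)

def longest_first(configs: list) -> list:
    """Dispatch expensive tasks first so no straggler starts near the end."""
    return sorted(configs, key=estimated_cost, reverse=True)

configs = [
    {'lookback_period': 20,  'entry_threshold': 2.0},
    {'lookback_period': 100, 'entry_threshold': 1.5},
    {'lookback_period': 60,  'entry_threshold': 2.5},
]
ordered = longest_first(configs)
print([c['lookback_period'] for c in ordered])  # [100, 60, 20]
```

The sort costs O(n log n) once, which is negligible next to the backtests themselves.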
Problem 3: Memory leaks
Workers accumulated memory over time, eventually crashing. Debugging showed pandas DataFrames weren't being garbage collected properly.
Solution: Restart workers every 100 backtests. Ray makes this trivial—just kill and respawn the actor.
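The recycling policy can be sketched without Ray. Below, `make_worker` stands in for something like `lambda: BacktestWorker.remote(data_ref)`, plain callables stand in for actors, and the commented `ray.kill` marks where the real actor teardown would go:

```python
class RecyclingPool:
    """Replace each worker after max_tasks jobs so leaked memory is reclaimed."""

    def __init__(self, make_worker, n_workers: int, max_tasks: int = 100):
        self.make_worker = make_worker
        self.max_tasks = max_tasks
        self.workers = [make_worker() for _ in range(n_workers)]
        self.task_counts = [0] * n_workers

    def submit(self, i: int, job):
        if self.task_counts[i] >= self.max_tasks:
            # With Ray: ray.kill(self.workers[i]) before respawning the actor
            self.workers[i] = self.make_worker()
            self.task_counts[i] = 0
        self.task_counts[i] += 1
        return self.workers[i](job)

# Toy workers: plain callables standing in for Ray actors
pool = RecyclingPool(lambda: (lambda job: job * 2), n_workers=2, max_tasks=3)
results = [pool.submit(0, j) for j in range(4)]
print(results)  # [0, 2, 4, 6]; the 4th job ran on a freshly respawned worker
```

Because each actor holds only a reference into the shared object store, respawning one is cheap: the market data never needs to be reloaded.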
Distributed systems add complexity. Sometimes, a faster single machine is better.
For 100 backtests at 10 seconds each, that's 16.7 minutes on a single machine. Setting up a distributed cluster takes longer than just running it locally.
Rule of thumb: Distribute only if single-threaded time >1 hour.
Distributed backtesting makes debugging 10x harder. Print statements don't work (output scattered across workers). Breakpoints don't work (can't attach debugger to remote workers).
For strategy development, use single-threaded backtesting. Only distribute for production parameter sweeps.
Running 320 CPUs costs $1.20/hour. For a 4-hour job, that's $4.80. If you run this daily, it's $1,752/year.
For a large fund, this is trivial. For an individual trader, it's significant. Consider whether the speed improvement justifies the cost.
We track:
Dashboards (Grafana) show real-time progress. Alerts trigger if throughput drops below 500 backtests/minute (indicates worker failures).
Workers fail. S3 throttles. Networks partition. The system must handle failures gracefully.
We implement:
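One typical building block is a retry-with-backoff wrapper around anything that touches the network. This is a generic sketch; in the real system the retried call would be a resubmitted Ray task, and the set of retryable exceptions is an assumption:

```python
import time

def with_retries(fn, max_attempts: int = 3, base_delay: float = 1.0,
                 retryable=(ConnectionError, TimeoutError)):
    """Call fn(), retrying transient failures with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retryable:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the failure
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...

# Example: a call that fails twice before succeeding
state = {'calls': 0}
def flaky_fetch():
    state['calls'] += 1
    if state['calls'] < 3:
        raise ConnectionError("transient S3 throttle")
    return "data"

print(with_retries(flaky_fetch, base_delay=0.0))  # data
```

Exponential backoff matters here: retrying 320 workers at a fixed interval would just re-trigger the same S3 throttling that caused the failure.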
We use AWS spot instances (70% cheaper than on-demand). Spot instances can be terminated with 2-minute notice, but Ray handles this gracefully—just reschedules tasks to other workers.
For non-urgent jobs, spot instances reduce cost from $1.20/hour to $0.36/hour.
Distributed backtesting transformed research velocity from days to minutes. Researchers iterate 60x faster, test more ideas, and find better strategies.
But it's not free. The infrastructure costs money, the engineering takes months, and debugging is harder. The decision depends on scale:
Distribute when: single-threaded sweeps take more than an hour, you run production parameter sweeps repeatedly, and compute cost is small relative to the value of faster research.
Don't distribute when: you're still developing and debugging a strategy, the job finishes in minutes on one machine, or the cloud bill matters more than the wait.
For the hedge fund, distributed backtesting was transformative. For an individual trader, a faster laptop might suffice. As always: measure, analyze, choose the right tool for your scale.
NordVarg Team is a software engineer at NordVarg specializing in high-performance financial systems and type-safe programming.