Deep Learning for Portfolio Optimization: Beyond Mean-Variance
Using neural networks to build adaptive portfolio optimization systems that handle non-linear dependencies and regime changes
Traditional portfolio optimization relies on mean-variance analysis, a framework developed by Harry Markowitz in 1952. While elegant, it rests on assumptions that often fail in real markets: normally distributed returns, stable correlations, and static risk profiles.
Deep learning offers a path beyond these limitations. After building portfolio optimization systems for institutional clients managing billions in AUM, we've learned how to apply neural networks effectively while avoiding common pitfalls.
Mean-variance optimization has well-known issues:
| Problem | Impact | Traditional Solution | DL Solution |
|---|---|---|---|
| Non-normal returns | Fat tails, skewness ignored | Robust optimization | Learn full distribution |
| Unstable correlations | Estimation error amplification | Shrinkage estimators | Temporal dependencies |
| Regime changes | Fixed allocations fail | Conditional optimization | Regime-aware networks |
| Non-linear dependencies | Missed diversification | Higher moments | Attention mechanisms |
| High dimensionality | Curse of dimensionality | Factor models | Representation learning |
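For reference, the mean-variance baseline in the table can be written in a few lines. The sketch below is a minimal illustration, assuming an unconstrained, fully invested portfolio with the textbook closed-form solution (weights proportional to the inverse covariance times expected returns). The function name `mean_variance_weights` and the toy inputs are hypothetical. It also shows how a one-point change in a single expected-return estimate swings the weights, which is the estimation-error amplification listed above.

```python
import numpy as np


def mean_variance_weights(mu: np.ndarray, cov: np.ndarray) -> np.ndarray:
    """Weights proportional to inv(cov) @ mu, rescaled to sum to 1.

    Hypothetical helper for illustration: no constraints, no shrinkage.
    """
    raw = np.linalg.solve(cov, mu)  # solves cov @ raw = mu
    return raw / raw.sum()


# Two highly correlated assets with similar expected returns (toy numbers)
mu = np.array([0.08, 0.10])
vol = np.array([0.20, 0.20])
corr = np.array([[1.0, 0.9], [0.9, 1.0]])
cov = np.outer(vol, vol) * corr

w_base = mean_variance_weights(mu, cov)
# Bump the first asset's expected return by one percentage point
w_bumped = mean_variance_weights(mu + np.array([0.01, 0.0]), cov)

print(w_base)    # roughly [-0.56, 1.56]
print(w_bumped)  # roughly [0.00, 1.00]
```

With a correlation of 0.9, the one-point bump moves the first asset from a 56% short position to no position at all, even though nothing about the market has changed.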
We use a transformer-based architecture that learns temporal dependencies and cross-asset relationships:
```python
import torch
import torch.nn as nn
import numpy as np


class PortfolioTransformer(nn.Module):
    """
    Transformer-based portfolio optimizer that learns:
    1. Temporal patterns in asset returns
    2. Cross-asset dependencies (beyond correlation)
    3. Regime-conditional allocations
    4. Risk-return trade-offs
    """

    def __init__(
        self,
        n_assets: int,
        n_features: int,
        d_model: int = 256,
        n_heads: int = 8,
        n_layers: int = 6,
        dropout: float = 0.1
    ):
        super().__init__()

        self.n_assets = n_assets
        self.n_features = n_features

        # Feature embedding for each asset
        self.feature_embedding = nn.Linear(n_features, d_model)

        # Positional encoding for time series
        self.temporal_encoding = TemporalEncoding(d_model)

        # Multi-head attention layers
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model,
            nhead=n_heads,
            dim_feedforward=d_model * 4,
            dropout=dropout,
            batch_first=True
        )
        self.transformer = nn.TransformerEncoder(encoder_layer, n_layers)

        # Asset attention: learn cross-asset dependencies
        self.asset_attention = nn.MultiheadAttention(
            d_model, n_heads, dropout=dropout, batch_first=True
        )

        # Regime detection head (outputs probabilities, not logits)
        self.regime_head = nn.Sequential(
            nn.Linear(d_model, 128),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(128, 4),  # 4 market regimes
            nn.Softmax(dim=-1)
        )

        # Portfolio weight head
        self.weight_head = nn.Sequential(
            nn.Linear(d_model, 128),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(128, 1)
        )

        # Risk prediction head
        self.risk_head = nn.Sequential(
            nn.Linear(d_model, 128),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(128, 1),
            nn.Softplus()  # Ensure positive risk
        )

    def forward(self, features, mask=None):
        """
        Args:
            features: (batch, n_assets, seq_len, n_features)
            mask: Optional (seq_len, seq_len) attention mask over time steps

        Returns:
            weights: (batch, n_assets) - Portfolio weights
            regime: (batch, 4) - Regime probabilities
            risk: (batch, n_assets) - Asset risk predictions
            attention_weights: (batch, n_assets, n_assets) - Cross-asset
                attention, averaged over heads
        """
        batch_size, n_assets, seq_len, n_features = features.shape

        # Fold assets into the batch dimension so each asset's history
        # is encoded independently
        x = features.reshape(batch_size * n_assets, seq_len, n_features)

        # Feature embedding
        x = self.feature_embedding(x)  # (batch*n_assets, seq_len, d_model)

        # Add temporal encoding
        x = self.temporal_encoding(x)

        # Process through transformer
        x = self.transformer(x, mask=mask)  # (batch*n_assets, seq_len, d_model)

        # Take last time step
        x = x[:, -1, :]  # (batch*n_assets, d_model)

        # Reshape back to separate assets
        x = x.reshape(batch_size, n_assets, -1)  # (batch, n_assets, d_model)

        # Cross-asset attention
        asset_features, attention_weights = self.asset_attention(x, x, x)

        # Predict regime (from pooled representation)
        pooled = torch.mean(asset_features, dim=1)  # (batch, d_model)
        regime = self.regime_head(pooled)  # (batch, 4)

        # Predict weights and risk for each asset
        raw_weights = self.weight_head(asset_features).squeeze(-1)  # (batch, n_assets)
        risk = self.risk_head(asset_features).squeeze(-1)  # (batch, n_assets)

        # Normalize weights to sum to 1 (with optional constraints)
        weights = self._normalize_weights(raw_weights)

        return weights, regime, risk, attention_weights

    def _normalize_weights(self, raw_weights):
        """
        Convert raw logits to valid portfolio weights.
        Only the long-only case is implemented here.
        """
        # Long-only: softmax
        weights = torch.softmax(raw_weights, dim=-1)

        # Could also implement:
        # - Long-short: tanh normalization
        # - Bounded: sigmoid with scaling
        # - Sector constraints: hierarchical softmax

        return weights


class TemporalEncoding(nn.Module):
    """
    Add temporal information to sequences.
    Combines sinusoidal encoding with learnable parameters.
    """

    def __init__(self, d_model: int, max_len: int = 5000):
        super().__init__()

        # Sinusoidal positional encoding (assumes an even d_model)
        position = torch.arange(max_len, dtype=torch.float32).unsqueeze(1)
        div_term = torch.exp(
            torch.arange(0, d_model, 2, dtype=torch.float32)
            * (-np.log(10000.0) / d_model)
        )

        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)

        # Buffer: moves with the module across devices but is not trained
        self.register_buffer('pe', pe)

        # Learnable temporal embeddings
        self.temporal_embedding = nn.Parameter(torch.randn(1, max_len, d_model) * 0.02)

    def forward(self, x):
        """
        Args:
            x: (batch, seq_len, d_model)
        """
        seq_len = x.size(1)
        x = x + self.pe[:seq_len, :].unsqueeze(0)
        x = x + self.temporal_embedding[:, :seq_len, :]
        return x
```
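The comments in `_normalize_weights` mention long-short and bounded alternatives to softmax without showing them. The sketch below is one possible reading of those comments, written in NumPy purely so it can be run standalone (inside the model, `torch.tanh` and `torch.sigmoid` would be used so gradients flow). The function names, the unit-gross-exposure convention for long-short, and the default bounds are assumptions for illustration.

```python
import numpy as np


def long_only(logits: np.ndarray) -> np.ndarray:
    """Softmax: non-negative weights that sum to 1."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def long_short(logits: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """tanh normalization: signed weights scaled to unit gross exposure.

    Assumed convention: sum(|w|) == 1; net exposure is left unconstrained.
    """
    signed = np.tanh(logits)
    return signed / (np.abs(signed).sum(axis=-1, keepdims=True) + eps)


def bounded(logits: np.ndarray, lower: float = 0.0, upper: float = 0.10) -> np.ndarray:
    """Scaled sigmoid: every weight lies in [lower, upper].

    This alone does not make the weights sum to 1; a projection or
    renormalization step would still be needed afterwards.
    """
    return lower + (upper - lower) / (1.0 + np.exp(-logits))


logits = np.array([[2.0, -1.0, 0.5, -3.0]])  # (batch=1, n_assets=4)
print(long_only(logits))
print(long_short(logits))
print(bounded(logits))
```

The trade-off is that softmax satisfies the budget constraint by construction, while the bounded variant satisfies the per-asset limits by construction; satisfying both at once generally needs an extra projection step.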
Portfolio optimization requires custom loss functions that balance return, risk, and constraints:
```python
import torch.nn.functional as F


class PortfolioLoss(nn.Module):
    """
    Multi-objective loss for portfolio optimization.
    Combines return prediction, risk management, and constraints.
    """

    def __init__(
        self,
        risk_aversion: float = 1.0,
        regime_weight: float = 0.1,
        turnover_penalty: float = 0.01,
        concentration_penalty: float = 0.05
    ):
        super().__init__()
        self.risk_aversion = risk_aversion
        self.regime_weight = regime_weight
        self.turnover_penalty = turnover_penalty
        self.concentration_penalty = concentration_penalty

    def forward(
        self,
        weights,
        predicted_regime,
        predicted_risk,
        returns,
        true_regime=None,
        previous_weights=None
    ):
        """
        Compute portfolio optimization loss.

        Args:
            weights: (batch, n_assets) - Predicted portfolio weights
            predicted_regime: (batch, 4) - Regime probabilities
            predicted_risk: (batch, n_assets) - Risk predictions
            returns: (batch, n_assets) - Realized returns (forward-looking)
            true_regime: (batch,) - True regime labels as class indices (optional)
            previous_weights: (batch, n_assets) - Previous period weights
        """
        # 1. Return component: maximize expected portfolio return
        #    (reported for monitoring; not part of the total below)
        portfolio_returns = torch.sum(weights * returns, dim=1)
        return_loss = -torch.mean(portfolio_returns)

        # 2. Risk component: minimize portfolio variance
        #    Use predicted risk to estimate covariance
        portfolio_variance = self._calculate_portfolio_variance(
            weights, predicted_risk, returns
        )
        risk_loss = torch.mean(portfolio_variance)

        # 3. Sharpe ratio objective (combining return and risk)
        #    The clamp guards against tiny negative values from round-off
        portfolio_vol = torch.sqrt(portfolio_variance.clamp_min(1e-12))
        sharpe_loss = -torch.mean(portfolio_returns / (portfolio_vol + 1e-8))

        # 4. Regime prediction loss (if labels available)
        #    predicted_regime holds softmax probabilities, so take the log and
        #    use NLL; CrossEntropyLoss expects raw logits and would apply
        #    softmax a second time
        regime_loss = weights.new_zeros(())
        if true_regime is not None:
            log_probs = torch.log(predicted_regime.clamp_min(1e-8))
            regime_loss = F.nll_loss(log_probs, true_regime)

        # 5. Turnover penalty (transaction costs)
        turnover_loss = weights.new_zeros(())
        if previous_weights is not None:
            turnover = torch.sum(torch.abs(weights - previous_weights), dim=1)
            turnover_loss = torch.mean(turnover)

        # 6. Concentration penalty (avoid putting all eggs in one basket)
        concentration = torch.sum(weights ** 2, dim=1)  # Herfindahl index
        concentration_loss = torch.mean(concentration)

        # Combine losses
        total_loss = (
            sharpe_loss +
            self.risk_aversion * risk_loss +
            self.regime_weight * regime_loss +
            self.turnover_penalty * turnover_loss +
            self.concentration_penalty * concentration_loss
        )

        return {
            'total': total_loss,
            'sharpe': sharpe_loss,
            'return': return_loss,
            'risk': risk_loss,
            'regime': regime_loss,
            'turnover': turnover_loss,
            'concentration': concentration_loss
        }

    def _calculate_portfolio_variance(self, weights, predicted_risk, returns):
        """
        Estimate portfolio variance using predicted risk and empirical correlation.
        """
        # Simple approach: use predicted individual risks and sample correlation
        correlation = self._estimate_correlation(returns)  # (n_assets, n_assets)

        # Construct covariance matrix: diag(risk) @ corr @ diag(risk)
        risk_matrix = torch.diag_embed(predicted_risk)  # (batch, n_assets, n_assets)
        covariance = risk_matrix @ correlation @ risk_matrix

        # Portfolio variance: w^T Σ w
        w = weights.unsqueeze(-1)  # (batch, n_assets, 1)
        portfolio_var = (w.transpose(1, 2) @ covariance @ w).squeeze(-1).squeeze(-1)

        return portfolio_var  # (batch,)

    def _estimate_correlation(self, returns):
        """Estimate correlation matrix from returns.

        Samples in the batch are treated as observations, so the estimate
        needs a batch size well above 1 to be meaningful.
        """
        # Center returns
        returns_centered = returns - torch.mean(returns, dim=0, keepdim=True)

        # Compute sample covariance
        covariance = torch.matmul(
            returns_centered.T, returns_centered
        ) / returns.shape[0]

        # Normalize to correlation
        std = torch.sqrt(torch.diag(covariance))
        correlation = covariance / (std.unsqueeze(0) * std.unsqueeze(1) + 1e-8)

        return correlation
```
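Written out, the objective assembled in `PortfolioLoss.forward` is the following. The symbols are introduced here for readability and are not from the original code: $B$ is the batch size, $w_b$ and $r_b$ are the weight and realized-return vectors of sample $b$, $D_b$ is the diagonal matrix of predicted asset risks, $C$ is the batch sample correlation, and $\lambda$, $\alpha$, $\tau$, $\kappa$ stand for `risk_aversion`, `regime_weight`, `turnover_penalty`, and `concentration_penalty`.

```latex
\sigma_b^2 = w_b^\top D_b \, C \, D_b \, w_b

\mathcal{L}
  = -\frac{1}{B}\sum_{b=1}^{B} \frac{w_b^\top r_b}{\sigma_b + \epsilon}
  + \lambda \, \frac{1}{B}\sum_{b=1}^{B} \sigma_b^2
  + \alpha \, \mathrm{CE}(\hat{p}, y)
  + \tau \, \frac{1}{B}\sum_{b=1}^{B} \lVert w_b - w_b^{\mathrm{prev}} \rVert_1
  + \kappa \, \frac{1}{B}\sum_{b=1}^{B} \lVert w_b \rVert_2^2
```

Note that the plain return term (`return_loss`) is computed and reported but does not enter the total; return only influences training through the Sharpe term. The regime and turnover terms drop out when labels or previous weights are not supplied.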
Quality features are critical for portfolio optimization:
```python
class PortfolioDataPipeline:
    """
    Feature engineering pipeline for portfolio optimization.

    Inputs are expected to be pandas objects indexed by date. The helpers
    _calculate_trend_strength, _rolling_max_drawdown, _normalize_features,
    and _create_sequences are referenced below but not shown here.
    """

    def __init__(self, lookback_days: int = 252):
        self.lookback_days = lookback_days
        self.feature_generators = [
            self._momentum_features,
            self._volatility_features,
            self._value_features,
            self._quality_features,
            self._macro_features
        ]

    def prepare_features(self, prices, fundamentals, macro_data):
        """
        Generate comprehensive feature set for each asset.
        """
        features = {}

        # Technical features from price data
        returns = prices.pct_change()

        for generator in self.feature_generators:
            feature_dict = generator(prices, returns, fundamentals, macro_data)
            features.update(feature_dict)

        # Normalize features
        normalized = self._normalize_features(features)

        # Create sequences for time-series model
        sequences = self._create_sequences(normalized, self.lookback_days)

        return sequences

    def _momentum_features(self, prices, returns, fundamentals, macro):
        """Multi-timeframe momentum indicators (windows in trading days)"""
        return {
            'momentum_1m': returns.rolling(21).mean(),
            'momentum_3m': returns.rolling(63).mean(),
            'momentum_6m': returns.rolling(126).mean(),
            'momentum_12m': returns.rolling(252).mean(),
            'momentum_acceleration': returns.rolling(21).mean() - returns.rolling(63).mean(),
            'trend_strength': self._calculate_trend_strength(returns),
        }

    def _volatility_features(self, prices, returns, fundamentals, macro):
        """Volatility and risk indicators"""
        return {
            'volatility_21d': returns.rolling(21).std(),
            'volatility_63d': returns.rolling(63).std(),
            'volatility_ratio': returns.rolling(21).std() / returns.rolling(63).std(),
            'downside_volatility': self._downside_volatility(returns, window=63),
            'max_drawdown': self._rolling_max_drawdown(prices, window=252),
            'var_95': returns.rolling(252).quantile(0.05),
            'cvar_95': self._conditional_var(returns, confidence=0.95, window=252)
        }

    def _value_features(self, prices, returns, fundamentals, macro):
        """Valuation metrics"""
        return {
            'pe_ratio': fundamentals['price'] / fundamentals['earnings_per_share'],
            'pb_ratio': fundamentals['price'] / fundamentals['book_value_per_share'],
            'dividend_yield': fundamentals['dividend'] / fundamentals['price'],
            'earnings_yield': fundamentals['earnings_per_share'] / fundamentals['price'],
            'fcf_yield': fundamentals['free_cash_flow'] / fundamentals['market_cap']
        }

    def _quality_features(self, prices, returns, fundamentals, macro):
        """Quality and fundamental strength"""
        return {
            'roa': fundamentals['net_income'] / fundamentals['total_assets'],
            'roe': fundamentals['net_income'] / fundamentals['shareholders_equity'],
            'profit_margin': fundamentals['net_income'] / fundamentals['revenue'],
            'debt_to_equity': fundamentals['total_debt'] / fundamentals['shareholders_equity'],
            'current_ratio': fundamentals['current_assets'] / fundamentals['current_liabilities'],
            # YoY growth; pct_change(4) assumes quarterly rows
            'earnings_growth': fundamentals['earnings_per_share'].pct_change(4)
        }

    def _macro_features(self, prices, returns, fundamentals, macro):
        """Macroeconomic indicators"""
        return {
            'interest_rate': macro['fed_funds_rate'],
            'yield_curve': macro['10y_yield'] - macro['2y_yield'],
            'inflation': macro['cpi'].pct_change(12),  # YoY; assumes monthly rows
            'gdp_growth': macro['gdp'].pct_change(4),  # YoY; assumes quarterly rows
            'credit_spread': macro['baa_yield'] - macro['10y_treasury'],
            'vix': macro['vix_index']
        }

    def _downside_volatility(self, returns, window):
        """Calculate downside deviation (square root of the semi-variance)"""
        # Keep only losses (gains become 0), then take the rolling
        # root-mean-square around a target return of zero
        negative_returns = returns.clip(upper=0)
        return (negative_returns ** 2).rolling(window).mean() ** 0.5

    def _conditional_var(self, returns, confidence, window):
        """Calculate Conditional Value at Risk (Expected Shortfall)"""
        # Mean of the returns at or below the (1 - confidence) quantile
        # within each window; raw=False passes each window as a Series
        cvar = returns.rolling(window).apply(
            lambda x: x[x <= x.quantile(1 - confidence)].mean(),
            raw=False
        )
        return cvar
```
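The pipeline above calls `_rolling_max_drawdown` and `_create_sequences` without showing them. The following is a minimal sketch of what such helpers might look like, assuming plain NumPy arrays rather than the pandas objects used in the class; the standalone function names and the array layouts are assumptions. The sequence builder emits the `(batch, n_assets, seq_len, n_features)` layout that `PortfolioTransformer.forward` documents.

```python
import numpy as np


def rolling_max_drawdown(prices: np.ndarray, window: int) -> np.ndarray:
    """Worst peak-to-trough decline inside each trailing window.

    prices: (T,) or (T, n_assets). Entries before the first full window are NaN.
    Values are <= 0, e.g. -0.25 means a 25% drawdown.
    """
    prices = np.asarray(prices, dtype=float)
    out = np.full(prices.shape, np.nan)
    for t in range(window - 1, len(prices)):
        chunk = prices[t - window + 1 : t + 1]
        running_peak = np.maximum.accumulate(chunk, axis=0)
        out[t] = (chunk / running_peak - 1.0).min(axis=0)
    return out


def create_sequences(features: np.ndarray, lookback: int) -> np.ndarray:
    """Slice a feature history into overlapping lookback windows.

    features: (T, n_assets, n_features)
    returns:  (T - lookback + 1, n_assets, lookback, n_features)
    Each window ends at its own time step, so it contains no future data.
    """
    windows = [features[t - lookback : t] for t in range(lookback, len(features) + 1)]
    stacked = np.stack(windows)           # (N, lookback, n_assets, n_features)
    return stacked.transpose(0, 2, 1, 3)  # (N, n_assets, lookback, n_features)


prices = np.array([100.0, 120.0, 90.0, 95.0, 130.0])
drawdowns = rolling_max_drawdown(prices, window=3)

features = np.arange(5 * 2 * 3, dtype=float).reshape(5, 2, 3)
sequences = create_sequences(features, lookback=3)
print(drawdowns)        # [nan, nan, -0.25, -0.25, 0.0]
print(sequences.shape)  # (3, 2, 3, 3)
```

The explicit Python loops favor clarity over speed; for long histories a vectorized or pandas-based version would be preferable.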
The training loop uses AdamW, a one-cycle learning-rate schedule, and gradient clipping:

```python
class PortfolioTrainer:
    """
    Training pipeline for portfolio optimization network.
    """

    def __init__(
        self,
        model: PortfolioTransformer,
        loss_fn: PortfolioLoss,
        learning_rate: float = 1e-4,
        device: str = 'cuda',
        total_steps: int = 10000
    ):
        self.model = model.to(device)
        self.loss_fn = loss_fn
        self.device = device

        # Use AdamW with weight decay for regularization
        self.optimizer = torch.optim.AdamW(
            self.model.parameters(),
            lr=learning_rate,
            weight_decay=0.01
        )

        # Learning rate scheduler. OneCycleLR raises an error if stepped more
        # than total_steps times, so set it to epochs * batches per epoch.
        self.scheduler = torch.optim.lr_scheduler.OneCycleLR(
            self.optimizer,
            max_lr=learning_rate,
            total_steps=total_steps,
            pct_start=0.1
        )

    def train_epoch(self, train_loader):
        """Train for one epoch"""
        self.model.train()
        total_loss = 0.0

        for batch in train_loader:
            features = batch['features'].to(self.device)
            returns = batch['returns'].to(self.device)

            # Optional inputs must live on the same device as the model outputs
            regime = batch.get('regime')
            if regime is not None:
                regime = regime.to(self.device)
            prev_weights = batch.get('previous_weights')
            if prev_weights is not None:
                prev_weights = prev_weights.to(self.device)

            # Forward pass
            weights, pred_regime, pred_risk, attention = self.model(features)

            # Calculate loss
            loss_dict = self.loss_fn(
                weights=weights,
                predicted_regime=pred_regime,
                predicted_risk=pred_risk,
                returns=returns,
                true_regime=regime,
                previous_weights=prev_weights
            )

            # Backward pass
            self.optimizer.zero_grad()
            loss_dict['total'].backward()

            # Gradient clipping
            torch.nn.utils.clip_grad_norm_(self.model.parameters(), max_norm=1.0)

            self.optimizer.step()
            self.scheduler.step()

            total_loss += loss_dict['total'].item()

        return total_loss / len(train_loader)

    def evaluate(self, val_loader):
        """Evaluate model performance"""
        self.model.eval()
        metrics = {
            'sharpe_ratio': [],
            'max_drawdown': [],
            'turnover': [],
            'concentration': []
        }

        with torch.no_grad():
            for batch in val_loader:
                features = batch['features'].to(self.device)
                returns = batch['returns'].to(self.device)

                weights, _, _, _ = self.model(features)

                # Calculate realized metrics
                portfolio_returns = (weights * returns).sum(dim=1)

                metrics['sharpe_ratio'].append(self._calculate_sharpe(portfolio_returns))
                metrics['max_drawdown'].append(self._calculate_max_drawdown(portfolio_returns))
                metrics['concentration'].append((weights ** 2).sum(dim=1).mean().item())

                prev_weights = batch.get('previous_weights')
                if prev_weights is not None:
                    turnover = (weights - prev_weights.to(self.device)).abs().sum(dim=1)
                    metrics['turnover'].append(turnover.mean().item())

        # Skip metrics with no observations instead of averaging an empty list
        return {k: float(np.mean(v)) for k, v in metrics.items() if v}

    # The two helpers below are not shown in the original; these are assumed,
    # minimal definitions that treat each batch as a time-ordered return series.
    @staticmethod
    def _calculate_sharpe(portfolio_returns):
        """Per-period Sharpe ratio (not annualized, zero risk-free rate)."""
        std = portfolio_returns.std(unbiased=False)
        return (portfolio_returns.mean() / (std + 1e-8)).item()

    @staticmethod
    def _calculate_max_drawdown(portfolio_returns):
        """Largest peak-to-trough decline of the compounded return path."""
        wealth = torch.cumprod(1.0 + portfolio_returns, dim=0)
        running_peak = torch.cummax(wealth, dim=0).values
        return (wealth / running_peak - 1.0).min().item()
```
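`evaluate` is only meaningful if the validation loader holds data that comes strictly after the training data, since the targets are forward-looking returns. One common way to arrange this is a rolling walk-forward split. The sketch below is a minimal illustration, not the splitting scheme used for the systems described here; `walk_forward_splits` and its `gap` parameter (samples skipped between train and validation so that overlapping forward-return labels do not leak) are hypothetical names.

```python
from typing import Iterator, Tuple


def walk_forward_splits(
    n_samples: int,
    train_size: int,
    val_size: int,
    gap: int = 0,
) -> Iterator[Tuple[range, range]]:
    """Yield (train_indices, val_indices) over a time-ordered dataset.

    Each validation window starts `gap` samples after its training window
    ends, and the whole scheme rolls forward by `val_size` per fold.
    """
    start = 0
    while start + train_size + gap + val_size <= n_samples:
        train = range(start, start + train_size)
        val_start = start + train_size + gap
        val = range(val_start, val_start + val_size)
        yield train, val
        start += val_size


splits = list(walk_forward_splits(n_samples=10, train_size=4, val_size=2, gap=1))
for train_idx, val_idx in splits:
    print(list(train_idx), list(val_idx))
# [0, 1, 2, 3] [5, 6]
# [2, 3, 4, 5] [7, 8]
```

The index ranges can be passed to `torch.utils.data.Subset` to build one train and one validation loader per fold.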
In production, the model's raw weights pass through a risk-management layer before any trades are generated:

```python
class ProductionPortfolioOptimizer:
    """
    Production-ready portfolio optimizer with risk management.

    _load_model, _prepare_features, and _generate_trades are not shown here.
    """

    def __init__(self, model_path: str, config: dict):
        self.model = self._load_model(model_path)
        self.model.eval()  # disable dropout for inference
        self.config = config
        self.risk_manager = PortfolioRiskManager(config)

    async def optimize(self, market_data, current_positions):
        """
        Generate optimal portfolio weights.
        """
        # Prepare features
        features = self._prepare_features(market_data)

        # Generate predictions
        with torch.no_grad():
            weights, regime, risk, _ = self.model(features)

        # Apply risk constraints
        constrained_weights = await self.risk_manager.apply_constraints(
            weights,
            current_positions,
            regime,
            risk
        )

        # Generate rebalancing trades
        trades = self._generate_trades(current_positions, constrained_weights)

        return {
            'target_weights': constrained_weights,
            'trades': trades,
            'regime': regime,
            'expected_risk': risk
        }


class PortfolioRiskManager:
    """Risk management for portfolio optimization.

    The individual _apply_* methods are not shown here.
    """

    def __init__(self, config: dict):
        # Limits (position, sector, turnover, ...) are read from config
        self.config = config

    async def apply_constraints(self, weights, current_positions, regime, risk):
        """Apply risk and regulatory constraints"""

        # 1. Position limits
        weights = self._apply_position_limits(weights)

        # 2. Sector limits
        weights = self._apply_sector_limits(weights)

        # 3. Liquidity constraints
        weights = await self._apply_liquidity_constraints(weights, current_positions)

        # 4. Risk budget
        weights = self._apply_risk_budget(weights, risk)

        # 5. Turnover limits
        weights = self._apply_turnover_limits(weights, current_positions)

        return weights
```
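The constraint steps in `apply_constraints` are listed but their bodies are not shown. As a rough illustration of two of them, the sketch below caps position sizes and limits turnover for a long-only portfolio, using NumPy. The standalone functions `apply_position_limits` and `apply_turnover_limit`, their parameters, and the pro-rata redistribution rule are assumptions, not the production logic.

```python
import numpy as np


def apply_position_limits(weights, max_weight: float, n_iter: int = 50) -> np.ndarray:
    """Cap each long-only weight at max_weight, redistributing excess pro rata."""
    w = np.asarray(weights, dtype=float).copy()
    if max_weight * w.size < 1.0:
        raise ValueError("max_weight is too small for the weights to sum to 1")
    for _ in range(n_iter):
        over = w > max_weight + 1e-12
        if not over.any():
            break
        excess = (w[over] - max_weight).sum()
        w[over] = max_weight
        free = w < max_weight - 1e-12  # positions that can still absorb weight
        free_total = w[free].sum()
        if free_total > 0:
            w[free] += excess * w[free] / free_total
        else:
            w[free] += excess / max(free.sum(), 1)
    return w


def apply_turnover_limit(target, current, max_turnover: float) -> np.ndarray:
    """Move from current toward target, scaling the step so that
    sum(|new - current|) does not exceed max_turnover."""
    target = np.asarray(target, dtype=float)
    current = np.asarray(current, dtype=float)
    step = target - current
    turnover = np.abs(step).sum()
    if turnover <= max_turnover:
        return target
    return current + step * (max_turnover / turnover)


capped = apply_position_limits([0.6, 0.3, 0.1], max_weight=0.4)
limited = apply_turnover_limit(capped, current=[0.5, 0.5, 0.0], max_turnover=0.2)
print(capped)   # [0.4, 0.4, 0.2]
print(limited)  # [0.45, 0.45, 0.1]
```

Scaling the step rather than clipping individual trades keeps the weights summing to 1, and because the result is a convex combination of the current and target portfolios, any per-asset cap that both satisfy still holds afterwards.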
Deep learning for portfolio optimization is powerful but requires careful engineering. The key is combining neural networks' ability to learn complex patterns with sound risk management and proper validation.
Success comes from treating this as an engineering problem, not a pure ML research project.
Need help building AI-powered portfolio optimization systems? Get in touch to discuss your needs.
Technical Writer
NordVarg Team is a software engineer at NordVarg specializing in high-performance financial systems and type-safe programming.