Deep Learning for Portfolio Optimization: Beyond Mean-Variance

Introduction #

Traditional portfolio optimization relies on mean-variance analysis, a framework developed by Harry Markowitz in 1952. While elegant, it makes strong assumptions that often fail in real markets: normal returns, stable correlations, and static risk profiles.

Deep learning offers a path beyond these limitations. After building portfolio optimization systems for institutional clients managing billions in AUM, we've learned how to apply neural networks effectively while avoiding common pitfalls.

The Problem with Traditional Approaches #

Mean-variance optimization has well-known issues:

Problem	Impact	Traditional Solution	DL Solution
Non-normal returns	Fat tails, skewness ignored	Robust optimization	Learn full distribution
Unstable correlations	Estimation error amplification	Shrinkage estimators	Temporal dependencies
Regime changes	Fixed allocations fail	Conditional optimization	Regime-aware networks
Non-linear dependencies	Missed diversification	Higher moments	Attention mechanisms
High dimensionality	Curse of dimensionality	Factor models	Representation learning
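
To make the "estimation error amplification" row concrete, here is a minimal sketch with two highly correlated assets. The inputs (20% volatility, 0.95 correlation, expected returns near 8%) are illustrative assumptions rather than data from a real portfolio, and `mean_variance_weights` is a hypothetical helper that computes unconstrained weights proportional to the inverse covariance times expected returns.

```python
import numpy as np

def mean_variance_weights(mu, cov):
    """Unconstrained mean-variance weights, w proportional to inv(cov) @ mu,
    rescaled so the weights sum to 1 (hypothetical helper)."""
    raw = np.linalg.solve(cov, mu)
    return raw / raw.sum()

# Illustrative inputs: two assets with equal volatility and 0.95 correlation
vols = np.array([0.20, 0.20])
corr = np.array([[1.0, 0.95], [0.95, 1.0]])
cov = np.outer(vols, vols) * corr

# Identical expected returns give an even split
w_equal = mean_variance_weights(np.array([0.080, 0.080]), cov)

# Lower one expected return by half a percentage point
w_tilted = mean_variance_weights(np.array([0.080, 0.075]), cov)

print(w_equal)   # [0.5, 0.5]
print(w_tilted)  # roughly [1.13, -0.13]
```

A half-point change in a single expected return moves the allocation from an even split to a leveraged long-short position. This is the instability that shrinkage estimators try to dampen.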

Architecture: Attention-Based Portfolio Network #

We use a transformer-based architecture that learns temporal dependencies and cross-asset relationships:

python

import torch
import torch.nn as nn
import numpy as np

class PortfolioTransformer(nn.Module):
    """
    Transformer-based portfolio optimizer that learns:
    1. Temporal patterns in asset returns
    2. Cross-asset dependencies (beyond correlation)
    3. Regime-conditional allocations
    4. Risk-return trade-offs
    """

    def __init__(
        self,
        n_assets: int,
        n_features: int,
        d_model: int = 256,
        n_heads: int = 8,
        n_layers: int = 6,
        dropout: float = 0.1
    ):
        super().__init__()

        self.n_assets = n_assets
        self.n_features = n_features

        # Feature embedding for each asset
        self.feature_embedding = nn.Linear(n_features, d_model)

        # Positional encoding for time series
        self.temporal_encoding = TemporalEncoding(d_model)

        # Multi-head attention layers
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model,
            nhead=n_heads,
            dim_feedforward=d_model * 4,
            dropout=dropout,
            batch_first=True
        )
        self.transformer = nn.TransformerEncoder(encoder_layer, n_layers)

        # Asset attention: learn cross-asset dependencies
        self.asset_attention = nn.MultiheadAttention(
            d_model, n_heads, dropout=dropout, batch_first=True
        )

        # Regime detection head (outputs probabilities, not logits)
        self.regime_head = nn.Sequential(
            nn.Linear(d_model, 128),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(128, 4),  # 4 market regimes
            nn.Softmax(dim=-1)
        )

        # Portfolio weight head
        self.weight_head = nn.Sequential(
            nn.Linear(d_model, 128),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(128, 1)
        )

        # Risk prediction head
        self.risk_head = nn.Sequential(
            nn.Linear(d_model, 128),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(128, 1),
            nn.Softplus()  # Ensure positive risk
        )

    def forward(self, features, mask=None):
        """
        Args:
            features: (batch, n_assets, seq_len, n_features)
            mask: Optional (seq_len, seq_len) attention mask over time steps

        Returns:
            weights: (batch, n_assets) - Portfolio weights
            regime: (batch, 4) - Regime probabilities
            risk: (batch, n_assets) - Asset risk predictions
            attention_weights: (batch, n_assets, n_assets) - Cross-asset
                attention, averaged over heads
        """
        batch_size, n_assets, seq_len, n_features = features.shape

        # Fold assets into the batch dimension so each asset's history
        # is processed as its own sequence
        x = features.reshape(batch_size * n_assets, seq_len, n_features)

        # Feature embedding
        x = self.feature_embedding(x)  # (batch*n_assets, seq_len, d_model)

        # Add temporal encoding
        x = self.temporal_encoding(x)

        # Process through transformer
        x = self.transformer(x, mask=mask)  # (batch*n_assets, seq_len, d_model)

        # Take last time step
        x = x[:, -1, :]  # (batch*n_assets, d_model)

        # Reshape back to separate assets
        x = x.reshape(batch_size, n_assets, -1)  # (batch, n_assets, d_model)

        # Cross-asset attention
        asset_features, attention_weights = self.asset_attention(x, x, x)

        # Predict regime (from pooled representation)
        pooled = torch.mean(asset_features, dim=1)  # (batch, d_model)
        regime = self.regime_head(pooled)  # (batch, 4)

        # Predict weights and risk for each asset
        raw_weights = self.weight_head(asset_features).squeeze(-1)  # (batch, n_assets)
        risk = self.risk_head(asset_features).squeeze(-1)  # (batch, n_assets)

        # Normalize weights to sum to 1
        weights = self._normalize_weights(raw_weights)

        return weights, regime, risk, attention_weights

    def _normalize_weights(self, raw_weights):
        """
        Convert raw logits to valid portfolio weights.
        Only the long-only case is implemented here; other
        constraint schemes are listed as possible extensions.
        """
        # Long-only: softmax
        weights = torch.softmax(raw_weights, dim=-1)

        # Could also implement:
        # - Long-short: tanh normalization
        # - Bounded: sigmoid with scaling
        # - Sector constraints: hierarchical softmax

        return weights


class TemporalEncoding(nn.Module):
    """
    Add temporal information to sequences.
    Combines sinusoidal encoding with learnable parameters.
    """

    def __init__(self, d_model: int, max_len: int = 5000):
        super().__init__()

        # Sinusoidal positional encoding
        position = torch.arange(max_len).unsqueeze(1)
        div_term = torch.exp(
            torch.arange(0, d_model, 2) * (-np.log(10000.0) / d_model)
        )

        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)

        # Buffer: saved with the model and moved across devices, but not trained
        self.register_buffer('pe', pe)

        # Learnable temporal embeddings
        self.temporal_embedding = nn.Parameter(torch.randn(1, max_len, d_model) * 0.02)

    def forward(self, x):
        """
        Args:
            x: (batch, seq_len, d_model)
        """
        seq_len = x.size(1)
        x = x + self.pe[:seq_len, :].unsqueeze(0)
        x = x + self.temporal_embedding[:, :seq_len, :]
        return x
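
The comments in `_normalize_weights` mention a long-short variant without showing it. Below is a minimal sketch of one possible reading, assuming "tanh normalization" means squashing the raw scores with tanh and rescaling so that gross exposure (the sum of absolute weights) hits a target. `long_short_weights` and `gross_leverage` are hypothetical names, not part of the model above.

```python
import torch

def long_short_weights(raw_weights, gross_leverage=1.0, eps=1e-8):
    """Map raw scores to signed weights whose absolute values sum to
    gross_leverage (hypothetical helper, one of several possible schemes)."""
    signed = torch.tanh(raw_weights)  # each score squashed into (-1, 1)
    gross = signed.abs().sum(dim=-1, keepdim=True)
    return gross_leverage * signed / (gross + eps)

raw = torch.tensor([[2.0, -1.0, 0.5]])
w = long_short_weights(raw)
print(w)                    # positive scores become longs, negative become shorts
print(w.abs().sum(dim=-1))  # gross exposure, approximately 1.0
```

This scheme fixes gross exposure but leaves net exposure free. A dollar-neutral book would need an extra step, such as subtracting the cross-sectional mean before rescaling.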

Training with Custom Loss Functions #

Portfolio optimization requires custom loss functions that balance return, risk, and constraints:

python

class PortfolioLoss(nn.Module):
    """
    Multi-objective loss for portfolio optimization.
    Combines return prediction, risk management, and constraints.
    """

    def __init__(
        self,
        risk_aversion: float = 1.0,
        regime_weight: float = 0.1,
        turnover_penalty: float = 0.01,
        concentration_penalty: float = 0.05
    ):
        super().__init__()
        self.risk_aversion = risk_aversion
        self.regime_weight = regime_weight
        self.turnover_penalty = turnover_penalty
        self.concentration_penalty = concentration_penalty

    def forward(
        self,
        weights,
        predicted_regime,
        predicted_risk,
        returns,
        true_regime=None,
        previous_weights=None
    ):
        """
        Compute portfolio optimization loss.

        Args:
            weights: (batch, n_assets) - Predicted portfolio weights
            predicted_regime: (batch, 4) - Regime probabilities
            predicted_risk: (batch, n_assets) - Risk predictions
            returns: (batch, n_assets) - Realized returns (forward-looking)
            true_regime: (batch,) - Integer regime labels (optional)
            previous_weights: (batch, n_assets) - Previous period weights
        """
        # 1. Return component: reported for monitoring only; the Sharpe
        # term below is what enters the total loss
        portfolio_returns = torch.sum(weights * returns, dim=1)
        return_loss = -torch.mean(portfolio_returns)

        # 2. Risk component: minimize portfolio variance
        # Use predicted risk to estimate covariance
        portfolio_variance = self._calculate_portfolio_variance(
            weights, predicted_risk, returns
        )
        risk_loss = torch.mean(portfolio_variance)

        # 3. Sharpe ratio objective (combining return and risk)
        # Clamp before the square root: the gradient of sqrt is infinite at 0
        portfolio_vol = torch.sqrt(torch.clamp(portfolio_variance, min=1e-8))
        sharpe_loss = -torch.mean(portfolio_returns / portfolio_vol)

        # 4. Regime prediction loss (if labels available)
        regime_loss = 0
        if true_regime is not None:
            # predicted_regime already holds softmax probabilities, so use
            # NLL on log-probabilities; CrossEntropyLoss expects raw logits
            regime_loss = nn.NLLLoss()(
                torch.log(predicted_regime + 1e-8), true_regime
            )

        # 5. Turnover penalty (transaction costs)
        turnover_loss = 0
        if previous_weights is not None:
            turnover = torch.sum(torch.abs(weights - previous_weights), dim=1)
            turnover_loss = torch.mean(turnover)

        # 6. Concentration penalty (avoid putting all eggs in one basket)
        concentration = torch.sum(weights ** 2, dim=1)  # Herfindahl index
        concentration_loss = torch.mean(concentration)

        # Combine losses
        total_loss = (
            sharpe_loss +
            self.risk_aversion * risk_loss +
            self.regime_weight * regime_loss +
            self.turnover_penalty * turnover_loss +
            self.concentration_penalty * concentration_loss
        )

        return {
            'total': total_loss,
            'sharpe': sharpe_loss,
            'return': return_loss,
            'risk': risk_loss,
            'regime': regime_loss,
            'turnover': turnover_loss,
            'concentration': concentration_loss
        }

    def _calculate_portfolio_variance(self, weights, predicted_risk, returns):
        """
        Estimate portfolio variance using predicted risk and empirical correlation.
        """
        # Correlation estimated across the samples in this batch
        # (shared by every sample, shape (n_assets, n_assets))
        correlation = self._estimate_correlation(returns)

        # Construct covariance matrix: diag(risk) @ correlation @ diag(risk)
        risk_matrix = torch.diag_embed(predicted_risk)  # (batch, n_assets, n_assets)
        covariance = risk_matrix @ correlation @ risk_matrix

        # Portfolio variance: w^T Σ w
        portfolio_var = torch.sum(
            weights.unsqueeze(1) @ covariance @ weights.unsqueeze(2),
            dim=[1, 2]
        )

        return portfolio_var

    def _estimate_correlation(self, returns):
        """Estimate correlation matrix from returns (needs batch size > 1)"""
        # Center returns
        returns_centered = returns - torch.mean(returns, dim=0, keepdim=True)

        # Sample covariance across the batch
        covariance = torch.matmul(
            returns_centered.T, returns_centered
        ) / returns.shape[0]

        # Normalize covariance to correlation
        std = torch.sqrt(torch.diag(covariance))
        correlation = covariance / (std.unsqueeze(0) * std.unsqueeze(1) + 1e-8)

        return correlation
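
The individual loss terms are easier to sanity-check on a toy portfolio. The sketch below works through the variance, concentration, and turnover terms in NumPy for two assets. All numbers are made up for illustration, and the variable names are assumptions that loosely mirror the loss code above.

```python
import numpy as np

# Illustrative two-asset portfolio
weights = np.array([0.5, 0.5])
prev_weights = np.array([0.7, 0.3])
vols = np.array([0.10, 0.20])              # per-asset risk
corr = np.array([[1.0, 0.5], [0.5, 1.0]])  # correlation matrix

# Covariance built as diag(vols) @ corr @ diag(vols)
cov = np.diag(vols) @ corr @ np.diag(vols)

# Portfolio variance: w^T cov w
portfolio_var = weights @ cov @ weights

# Herfindahl index: 1/n for equal weights, 1.0 for a single position
concentration = np.sum(weights ** 2)

# Turnover: total absolute change in weights
turnover = np.abs(weights - prev_weights).sum()

print(portfolio_var)   # 0.0175
print(concentration)   # 0.5
print(turnover)        # 0.4
```

With the default penalties in `PortfolioLoss` (0.05 for concentration, 0.01 for turnover), these terms contribute 0.025 and 0.004 to the total, so it is worth checking their scale against the Sharpe term before tuning.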

Data Pipeline and Feature Engineering #

Quality features are critical for portfolio optimization:

python

class PortfolioDataPipeline:
    """
    Feature engineering pipeline for portfolio optimization.

    Inputs are expected to be pandas objects indexed by date. Helpers that
    are referenced but not defined here (_normalize_features,
    _create_sequences, _calculate_trend_strength, _rolling_max_drawdown)
    are not shown.
    """

    def __init__(self, lookback_days: int = 252):
        self.lookback_days = lookback_days
        self.feature_generators = [
            self._momentum_features,
            self._volatility_features,
            self._value_features,
            self._quality_features,
            self._macro_features
        ]

    def prepare_features(self, prices, fundamentals, macro_data):
        """
        Generate comprehensive feature set for each asset.
        """
        features = {}

        # Technical features from price data
        returns = prices.pct_change()

        for generator in self.feature_generators:
            feature_dict = generator(prices, returns, fundamentals, macro_data)
            features.update(feature_dict)

        # Normalize features
        normalized = self._normalize_features(features)

        # Create sequences for time-series model
        sequences = self._create_sequences(normalized, self.lookback_days)

        return sequences

    def _momentum_features(self, prices, returns, fundamentals, macro):
        """Multi-timeframe momentum indicators (windows in trading days)"""
        return {
            'momentum_1m': returns.rolling(21).mean(),
            'momentum_3m': returns.rolling(63).mean(),
            'momentum_6m': returns.rolling(126).mean(),
            'momentum_12m': returns.rolling(252).mean(),
            'momentum_acceleration': returns.rolling(21).mean() - returns.rolling(63).mean(),
            'trend_strength': self._calculate_trend_strength(returns),
        }

    def _volatility_features(self, prices, returns, fundamentals, macro):
        """Volatility and risk indicators"""
        return {
            'volatility_21d': returns.rolling(21).std(),
            'volatility_63d': returns.rolling(63).std(),
            'volatility_ratio': returns.rolling(21).std() / returns.rolling(63).std(),
            'downside_volatility': self._downside_volatility(returns, window=63),
            'max_drawdown': self._rolling_max_drawdown(prices, window=252),
            'var_95': returns.rolling(252).quantile(0.05),
            'cvar_95': self._conditional_var(returns, confidence=0.95, window=252)
        }

    def _value_features(self, prices, returns, fundamentals, macro):
        """Valuation metrics"""
        return {
            'pe_ratio': fundamentals['price'] / fundamentals['earnings_per_share'],
            'pb_ratio': fundamentals['price'] / fundamentals['book_value_per_share'],
            'dividend_yield': fundamentals['dividend'] / fundamentals['price'],
            'earnings_yield': fundamentals['earnings_per_share'] / fundamentals['price'],
            'fcf_yield': fundamentals['free_cash_flow'] / fundamentals['market_cap']
        }

    def _quality_features(self, prices, returns, fundamentals, macro):
        """Quality and fundamental strength"""
        return {
            'roa': fundamentals['net_income'] / fundamentals['total_assets'],
            'roe': fundamentals['net_income'] / fundamentals['shareholders_equity'],
            'profit_margin': fundamentals['net_income'] / fundamentals['revenue'],
            'debt_to_equity': fundamentals['total_debt'] / fundamentals['shareholders_equity'],
            'current_ratio': fundamentals['current_assets'] / fundamentals['current_liabilities'],
            'earnings_growth': fundamentals['earnings_per_share'].pct_change(4)  # YoY, assuming quarterly data
        }

    def _macro_features(self, prices, returns, fundamentals, macro):
        """Macroeconomic indicators"""
        return {
            'interest_rate': macro['fed_funds_rate'],
            'yield_curve': macro['10y_yield'] - macro['2y_yield'],
            'inflation': macro['cpi'].pct_change(12),  # YoY, assuming monthly data
            'gdp_growth': macro['gdp'].pct_change(4),  # YoY, assuming quarterly data
            'credit_spread': macro['baa_yield'] - macro['10y_treasury'],
            'vix': macro['vix_index']
        }

    def _downside_volatility(self, returns, window):
        """Calculate downside deviation (square root of semi-variance)"""
        # Keep only losses (gains count as zero), then take the
        # root-mean-square relative to a zero target return
        losses = returns.clip(upper=0)
        return losses.pow(2).rolling(window).mean().pow(0.5)

    def _conditional_var(self, returns, confidence, window):
        """Calculate Conditional Value at Risk (Expected Shortfall)"""
        tail_quantile = 1 - confidence
        # Mean of the returns at or below the rolling VaR threshold;
        # raw=False passes each window to the lambda as a pandas Series
        return returns.rolling(window).apply(
            lambda x: x[x <= x.quantile(tail_quantile)].mean(),
            raw=False
        )
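
The pipeline calls `_normalize_features` and `_create_sequences` without showing them. The sketch below is one plausible shape for those two steps, not the original implementation: `rolling_zscore` and `create_sequences` are hypothetical standalone functions, and the choice of a trailing z-score is an assumption made to avoid look-ahead bias.

```python
import numpy as np
import pandas as pd

def rolling_zscore(frame: pd.DataFrame, window: int = 252) -> pd.DataFrame:
    """Standardize each column using trailing statistics only, so a value
    is never scaled with information from its own future."""
    mean = frame.rolling(window).mean()
    std = frame.rolling(window).std()
    return (frame - mean) / (std + 1e-8)

def create_sequences(frame: pd.DataFrame, lookback: int) -> np.ndarray:
    """Stack overlapping windows into shape (n_windows, lookback, n_columns)."""
    values = frame.dropna().to_numpy()
    # sliding_window_view appends the window axis last:
    # (n_windows, n_columns, lookback)
    windows = np.lib.stride_tricks.sliding_window_view(values, lookback, axis=0)
    return windows.transpose(0, 2, 1)

# Toy data: 10 dates, 3 features
frame = pd.DataFrame(np.arange(30, dtype=float).reshape(10, 3), columns=list("abc"))
sequences = create_sequences(frame, lookback=4)
print(sequences.shape)  # (7, 4, 3)
```

Each window ends at the date whose features the model would see last, so the matching target should be the return over the following period.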

Training Strategy #

python

class PortfolioTrainer:
    """
    Training pipeline for portfolio optimization network.
    """

    def __init__(
        self,
        model: PortfolioTransformer,
        loss_fn: PortfolioLoss,
        learning_rate: float = 1e-4,
        device: str = 'cuda',
        total_steps: int = 10000
    ):
        self.model = model.to(device)
        self.loss_fn = loss_fn
        self.device = device

        # Use AdamW with weight decay for regularization
        self.optimizer = torch.optim.AdamW(
            self.model.parameters(),
            lr=learning_rate,
            weight_decay=0.01
        )

        # Learning rate scheduler. OneCycleLR raises an error if stepped more
        # than total_steps times, so set it to epochs * len(train_loader)
        self.scheduler = torch.optim.lr_scheduler.OneCycleLR(
            self.optimizer,
            max_lr=learning_rate,
            total_steps=total_steps,
            pct_start=0.1
        )

    def train_epoch(self, train_loader):
        """Train for one epoch"""
        self.model.train()
        total_loss = 0

        for batch in train_loader:
            features = batch['features'].to(self.device)
            returns = batch['returns'].to(self.device)

            # Optional inputs must live on the same device as the model outputs
            regime = batch.get('regime')
            prev_weights = batch.get('previous_weights')
            if regime is not None:
                regime = regime.to(self.device)
            if prev_weights is not None:
                prev_weights = prev_weights.to(self.device)

            # Forward pass
            weights, pred_regime, pred_risk, attention = self.model(features)

            # Calculate loss
            loss_dict = self.loss_fn(
                weights=weights,
                predicted_regime=pred_regime,
                predicted_risk=pred_risk,
                returns=returns,
                true_regime=regime,
                previous_weights=prev_weights
            )

            # Backward pass
            self.optimizer.zero_grad()
            loss_dict['total'].backward()

            # Gradient clipping
            torch.nn.utils.clip_grad_norm_(self.model.parameters(), max_norm=1.0)

            self.optimizer.step()
            self.scheduler.step()

            total_loss += loss_dict['total'].item()

        return total_loss / len(train_loader)

    def evaluate(self, val_loader):
        """Evaluate model performance (val_loader must not be shuffled)"""
        self.model.eval()
        metrics = {
            'sharpe_ratio': [],
            'max_drawdown': [],
            'turnover': [],
            'concentration': []
        }

        with torch.no_grad():
            for batch in val_loader:
                features = batch['features'].to(self.device)
                returns = batch['returns'].to(self.device)

                weights, _, _, _ = self.model(features)

                # Calculate realized metrics
                portfolio_returns = (weights * returns).sum(dim=1)

                metrics['sharpe_ratio'].append(self._calculate_sharpe(portfolio_returns))
                metrics['max_drawdown'].append(self._calculate_max_drawdown(portfolio_returns))
                metrics['concentration'].append((weights ** 2).sum(dim=1).mean().item())

                prev_weights = batch.get('previous_weights')
                if prev_weights is not None:
                    turnover = (weights - prev_weights.to(self.device)).abs().sum(dim=1)
                    metrics['turnover'].append(turnover.mean().item())

        # Skip metrics that collected no observations
        return {k: float(np.mean(v)) for k, v in metrics.items() if v}

    # The two helpers below were not shown in the original listing; they are
    # minimal assumed implementations that treat each batch as consecutive periods.
    @staticmethod
    def _calculate_sharpe(portfolio_returns, eps: float = 1e-8):
        """Per-period (not annualized) Sharpe ratio, assuming a zero risk-free rate"""
        return (portfolio_returns.mean() / (portfolio_returns.std() + eps)).item()

    @staticmethod
    def _calculate_max_drawdown(portfolio_returns):
        """Largest peak-to-trough loss of the cumulative wealth curve (<= 0)"""
        wealth = torch.cumprod(1.0 + portfolio_returns, dim=0)
        wealth = torch.cat([wealth.new_ones(1), wealth])  # start from 1.0
        running_peak = torch.cummax(wealth, dim=0).values
        return ((wealth - running_peak) / running_peak).min().item()
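
The trainer takes its data loaders as given, but how the data is split matters a great deal for time series. A minimal sketch of a rolling walk-forward split follows, assuming samples are ordered by date. `walk_forward_splits` and its parameters are hypothetical names for illustration.

```python
from typing import Iterator, Optional, Tuple

def walk_forward_splits(
    n_samples: int,
    train_size: int,
    test_size: int,
    step: Optional[int] = None,
) -> Iterator[Tuple[range, range]]:
    """Yield (train_indices, test_indices) pairs where every test window
    starts immediately after the end of its training window."""
    step = step or test_size
    start = 0
    while start + train_size + test_size <= n_samples:
        train_end = start + train_size
        yield range(start, train_end), range(train_end, train_end + test_size)
        start += step

splits = list(walk_forward_splits(n_samples=10, train_size=4, test_size=2))
for train_idx, test_idx in splits:
    print(list(train_idx), list(test_idx))
# [0, 1, 2, 3] [4, 5]
# [2, 3, 4, 5] [6, 7]
# [4, 5, 6, 7] [8, 9]
```

Each fold would get its own model fit and its own out-of-sample evaluation. When features use long lookback windows, leaving a gap between the train and test ranges helps keep overlapping windows from leaking information.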

Production Deployment #

python

class ProductionPortfolioOptimizer:
    """
    Production-ready portfolio optimizer with risk management.

    Skeleton only: _load_model, _prepare_features, and _generate_trades
    are not shown.
    """

    def __init__(self, model_path: str, config: dict):
        self.model = self._load_model(model_path)
        self.model.eval()  # disable dropout so inference is deterministic
        self.config = config
        self.risk_manager = PortfolioRiskManager(config)

    async def optimize(self, market_data, current_positions):
        """
        Generate optimal portfolio weights.
        """
        # Prepare features
        features = self._prepare_features(market_data)

        # Generate predictions
        with torch.no_grad():
            weights, regime, risk, _ = self.model(features)

        # Apply risk constraints
        constrained_weights = await self.risk_manager.apply_constraints(
            weights,
            current_positions,
            regime,
            risk
        )

        # Generate rebalancing trades
        trades = self._generate_trades(current_positions, constrained_weights)

        return {
            'target_weights': constrained_weights,
            'trades': trades,
            'regime': regime,
            'expected_risk': risk
        }


class PortfolioRiskManager:
    """Risk management for portfolio optimization (constraint helpers not shown)"""

    def __init__(self, config: dict):
        self.config = config

    async def apply_constraints(self, weights, current_positions, regime, risk):
        """Apply risk and regulatory constraints"""

        # 1. Position limits
        weights = self._apply_position_limits(weights)

        # 2. Sector limits
        weights = self._apply_sector_limits(weights)

        # 3. Liquidity constraints
        weights = await self._apply_liquidity_constraints(weights, current_positions)

        # 4. Risk budget
        weights = self._apply_risk_budget(weights, risk)

        # 5. Turnover limits (applied last so earlier adjustments are included)
        weights = self._apply_turnover_limits(weights, current_positions)

        return weights
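
The constraint helpers are left out of the listing above. As an illustration of what one of them might do, here is a minimal sketch of a turnover limit that moves only part of the way toward the target when the full trade would exceed a budget. `limit_turnover` and `max_turnover` are hypothetical names, and this scaling rule is one assumption among several reasonable designs.

```python
import numpy as np

def limit_turnover(target, current, max_turnover):
    """Scale the trade toward `target` so that total absolute weight
    change does not exceed `max_turnover` (hypothetical helper)."""
    trade = target - current
    turnover = np.abs(trade).sum()
    if turnover <= max_turnover:
        return target.copy()
    # Shrink every trade by the same factor; weights still sum to the
    # same total because the trades themselves sum to zero
    return current + trade * (max_turnover / turnover)

current = np.array([0.5, 0.5])
target = np.array([0.9, 0.1])
constrained = limit_turnover(target, current, max_turnover=0.2)
print(constrained)  # [0.6, 0.4]
```

Proportional scaling keeps the direction of every trade intact, but it can undo a position or sector cap applied earlier, so the order of the constraint steps needs checking.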

Lessons Learned #

Feature Quality > Model Complexity: Good features with a simple model beat poor features with a complex model
Transaction Costs Matter: Model turnover explicitly; frequent rebalancing kills returns
Risk Management is Critical: Constraints prevent the model from taking excessive risks
Regime Awareness: Market conditions change; models must adapt
Validation is Hard: Walk-forward testing with realistic costs is essential
Interpretability Helps: Attention weights show what the model learned about cross-asset relationships
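
On the interpretability point, the cross-asset attention matrix can be inspected directly. The sketch below uses a standalone `nn.MultiheadAttention` layer with random inputs purely to show the shapes involved. The dimensions and variable names are assumptions, and the layer is untrained, so the values themselves carry no meaning.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
batch, n_assets, d_model, n_heads = 2, 5, 16, 4

# Same layer type as the model's asset attention, with dropout off
attn = nn.MultiheadAttention(d_model, n_heads, dropout=0.0, batch_first=True)
attn.eval()

asset_embeddings = torch.randn(batch, n_assets, d_model)
with torch.no_grad():
    _, attention_weights = attn(asset_embeddings, asset_embeddings, asset_embeddings)

# Averaged over heads by default: (batch, n_assets, n_assets)
print(attention_weights.shape)

# Row i shows how strongly asset i attends to every asset; rows sum to 1
print(attention_weights[0].sum(dim=-1))

# For each asset, the index of the asset it attends to most
most_attended = attention_weights.argmax(dim=-1)  # (batch, n_assets)
```

Attention weights are a useful diagnostic rather than a causal explanation, so we would treat them as a starting point for questions, not as evidence on their own.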

Conclusion #

Deep learning for portfolio optimization is powerful but requires careful engineering. The key is combining neural networks' ability to learn complex patterns with sound risk management and proper validation.

Success comes from treating this as an engineering problem, not a pure ML research project.


Introduction #

The Problem with Traditional Approaches #

Mean-variance optimization has well-known issues:

Problem	Impact	Traditional Solution	DL Solution
Non-normal returns	Fat tails, skewness ignored	Robust optimization	Learn full distribution
Unstable correlations	Estimation error amplification	Shrinkage estimators	Temporal dependencies
Regime changes	Fixed allocations fail	Conditional optimization	Regime-aware networks
Non-linear dependencies	Missed diversification	Higher moments	Attention mechanisms
High dimensionality	Curse of dimensionality	Factor models	Representation learning

Architecture: Attention-Based Portfolio Network #

We use a transformer-based architecture that learns temporal dependencies and cross-asset relationships:

python

1import torch
2import torch.nn as nn
3import numpy as np
4
5class PortfolioTransformer(nn.Module):
6    """
7    Transformer-based portfolio optimizer that learns:
8    1. Temporal patterns in asset returns
9    2. Cross-asset dependencies (beyond correlation)
10    3. Regime-conditional allocations
11    4. Risk-return trade-offs
12    """
13    
14    def __init__(
15        self,
16        n_assets: int,
17        n_features: int,
18        d_model: int = 256,
19        n_heads: int = 8,
20        n_layers: int = 6,
21        dropout: float = 0.1
22    ):
23        super().__init__()
24        
25        self.n_assets = n_assets
26        self.n_features = n_features
27        
28        # Feature embedding for each asset
29        self.feature_embedding = nn.Linear(n_features, d_model)
30        
31        # Positional encoding for time series
32        self.temporal_encoding = TemporalEncoding(d_model)
33        
34        # Multi-head attention layers
35        encoder_layer = nn.TransformerEncoderLayer(
36            d_model=d_model,
37            nhead=n_heads,
38            dim_feedforward=d_model * 4,
39            dropout=dropout,
40            batch_first=True
41        )
42        self.transformer = nn.TransformerEncoder(encoder_layer, n_layers)
43        
44        # Asset attention: learn cross-asset dependencies
45        self.asset_attention = nn.MultiheadAttention(
46            d_model, n_heads, dropout=dropout, batch_first=True
47        )
48        
49        # Regime detection head
50        self.regime_head = nn.Sequential(
51            nn.Linear(d_model, 128),
52            nn.ReLU(),
53            nn.Dropout(dropout),
54            nn.Linear(128, 4),  # 4 market regimes
55            nn.Softmax(dim=-1)
56        )
57        
58        # Portfolio weight head
59        self.weight_head = nn.Sequential(
60            nn.Linear(d_model, 128),
61            nn.ReLU(),
62            nn.Dropout(dropout),
63            nn.Linear(128, 1)
64        )
65        
66        # Risk prediction head
67        self.risk_head = nn.Sequential(
68            nn.Linear(d_model, 128),
69            nn.ReLU(),
70            nn.Dropout(dropout),
71            nn.Linear(128, 1),
72            nn.Softplus()  # Ensure positive risk
73        )
74    
75    def forward(self, features, mask=None):
76        """
77        Args:
78            features: (batch, n_assets, seq_len, n_features)
79            mask: Optional attention mask
80            
81        Returns:
82            weights: (batch, n_assets) - Portfolio weights
83            regime: (batch, 4) - Regime probabilities
84            risk: (batch, n_assets) - Asset risk predictions
85        """
86        batch_size, n_assets, seq_len, n_features = features.shape
87        
88        # Reshape for processing
89        x = features.reshape(batch_size * n_assets, seq_len, n_features)
90        
91        # Feature embedding
92        x = self.feature_embedding(x)  # (batch*n_assets, seq_len, d_model)
93        
94        # Add temporal encoding
95        x = self.temporal_encoding(x)
96        
97        # Process through transformer
98        x = self.transformer(x, mask=mask)  # (batch*n_assets, seq_len, d_model)
99        
100        # Take last time step
101        x = x[:, -1, :]  # (batch*n_assets, d_model)
102        
103        # Reshape back to separate assets
104        x = x.reshape(batch_size, n_assets, -1)  # (batch, n_assets, d_model)
105        
106        # Cross-asset attention
107        asset_features, attention_weights = self.asset_attention(x, x, x)
108        
109        # Predict regime (from pooled representation)
110        pooled = torch.mean(asset_features, dim=1)  # (batch, d_model)
111        regime = self.regime_head(pooled)  # (batch, 4)
112        
113        # Predict weights and risk for each asset
114        raw_weights = self.weight_head(asset_features).squeeze(-1)  # (batch, n_assets)
115        risk = self.risk_head(asset_features).squeeze(-1)  # (batch, n_assets)
116        
117        # Normalize weights to sum to 1 (with optional constraints)
118        weights = self._normalize_weights(raw_weights)
119        
120        return weights, regime, risk, attention_weights
121    
122    def _normalize_weights(self, raw_weights):
123        """
124        Convert raw logits to valid portfolio weights.
125        Supports long-only, long-short, and bounded constraints.
126        """
127        # Long-only: softmax
128        weights = torch.softmax(raw_weights, dim=-1)
129        
130        # Could also implement:
131        # - Long-short: tanh normalization
132        # - Bounded: sigmoid with scaling
133        # - Sector constraints: hierarchical softmax
134        
135        return weights
136
137class TemporalEncoding(nn.Module):
138    """
139    Add temporal information to sequences.
140    Combines sinusoidal encoding with learnable parameters.
141    """
142    
143    def __init__(self, d_model: int, max_len: int = 5000):
144        super().__init__()
145        
146        # Sinusoidal positional encoding
147        position = torch.arange(max_len).unsqueeze(1)
148        div_term = torch.exp(
149            torch.arange(0, d_model, 2) * (-np.log(10000.0) / d_model)
150        )
151        
152        pe = torch.zeros(max_len, d_model)
153        pe[:, 0::2] = torch.sin(position * div_term)
154        pe[:, 1::2] = torch.cos(position * div_term)
155        
156        self.register_buffer('pe', pe)
157        
158        # Learnable temporal embeddings
159        self.temporal_embedding = nn.Parameter(torch.randn(1, max_len, d_model) * 0.02)
160    
161    def forward(self, x):
162        """
163        Args:
164            x: (batch, seq_len, d_model)
165        """
166        seq_len = x.size(1)
167        x = x + self.pe[:seq_len, :].unsqueeze(0)
168        x = x + self.temporal_embedding[:, :seq_len, :]
169        return x
170

Training with Custom Loss Functions #

Portfolio optimization requires custom loss functions that balance return, risk, and constraints:

python

class PortfolioLoss(nn.Module):
    """
    Multi-objective loss for portfolio optimization.
    Combines a risk-adjusted return objective, risk management, and constraints.
    """

    def __init__(
        self,
        risk_aversion: float = 1.0,
        regime_weight: float = 0.1,
        turnover_penalty: float = 0.01,
        concentration_penalty: float = 0.05
    ):
        super().__init__()
        self.risk_aversion = risk_aversion
        self.regime_weight = regime_weight
        self.turnover_penalty = turnover_penalty
        self.concentration_penalty = concentration_penalty

    def forward(
        self,
        weights,
        predicted_regime,
        predicted_risk,
        returns,
        true_regime=None,
        previous_weights=None
    ):
        """
        Compute portfolio optimization loss.

        Args:
            weights: (batch, n_assets) - Predicted portfolio weights
            predicted_regime: (batch, 4) - Regime scores (see the note in step 4)
            predicted_risk: (batch, n_assets) - Predicted per-asset volatility
            returns: (batch, n_assets) - Realized returns (forward-looking)
            true_regime: (batch,) - True regime labels (optional)
            previous_weights: (batch, n_assets) - Previous period weights (optional)

        Returns:
            Dict of scalar tensors; 'total' is the value to backpropagate.
        """
        # Scalar zero on the same device and dtype as the weights, so every
        # entry of the returned dict is a tensor even when a term is skipped
        zero = weights.new_zeros(())

        # 1. Return component: negative mean portfolio return.
        # Reported for monitoring; the Sharpe term below drives the optimization.
        portfolio_returns = torch.sum(weights * returns, dim=1)
        return_loss = -torch.mean(portfolio_returns)

        # 2. Risk component: minimize portfolio variance
        # Use predicted risk to estimate covariance
        portfolio_variance = self._calculate_portfolio_variance(
            weights, predicted_risk, returns
        )
        risk_loss = torch.mean(portfolio_variance)

        # 3. Sharpe-style objective: each sample's return over its predicted volatility.
        # The epsilon sits inside the sqrt so the gradient stays finite as the
        # variance approaches zero.
        sharpe_loss = -torch.mean(
            portfolio_returns / torch.sqrt(portfolio_variance + 1e-8)
        )

        # 4. Regime prediction loss (if labels available)
        # Note: nn.CrossEntropyLoss expects unnormalized logits. If the model
        # head already applies softmax, pass its pre-softmax output instead.
        regime_loss = zero
        if true_regime is not None:
            regime_loss = nn.CrossEntropyLoss()(predicted_regime, true_regime)

        # 5. Turnover penalty (proxy for transaction costs)
        turnover_loss = zero
        if previous_weights is not None:
            turnover = torch.sum(torch.abs(weights - previous_weights), dim=1)
            turnover_loss = torch.mean(turnover)

        # 6. Concentration penalty (avoid putting all eggs in one basket)
        concentration = torch.sum(weights ** 2, dim=1)  # Herfindahl index
        concentration_loss = torch.mean(concentration)

        # Combine losses
        total_loss = (
            sharpe_loss +
            self.risk_aversion * risk_loss +
            self.regime_weight * regime_loss +
            self.turnover_penalty * turnover_loss +
            self.concentration_penalty * concentration_loss
        )

        return {
            'total': total_loss,
            'sharpe': sharpe_loss,
            'return': return_loss,
            'risk': risk_loss,
            'regime': regime_loss,
            'turnover': turnover_loss,
            'concentration': concentration_loss
        }

    def _calculate_portfolio_variance(self, weights, predicted_risk, returns):
        """
        Estimate portfolio variance using predicted risk and empirical correlation.
        """
        # Simple approach: use predicted individual risks and sample correlation.
        # Correlation is estimated across the batch, shape (n_assets, n_assets).
        correlation = self._estimate_correlation(returns)

        # Construct covariance matrix: Σ = D R D, with D = diag(predicted volatility)
        risk_matrix = torch.diag_embed(predicted_risk)       # (batch, n_assets, n_assets)
        covariance = risk_matrix @ correlation @ risk_matrix

        # Portfolio variance: w^T Σ w
        portfolio_var = torch.sum(
            weights.unsqueeze(1) @ covariance @ weights.unsqueeze(2),
            dim=[1, 2]
        )

        # Guard against tiny negative values caused by floating-point rounding
        return torch.clamp(portfolio_var, min=0.0)

    def _estimate_correlation(self, returns):
        """Estimate the correlation matrix from the returns in the batch"""
        # Center returns
        returns_centered = returns - torch.mean(returns, dim=0, keepdim=True)

        # Sample covariance across the batch dimension
        covariance = torch.matmul(
            returns_centered.T, returns_centered
        ) / returns.shape[0]

        # Normalize to correlation: R = D^-1 C D^-1 keeps the matrix positive semi-definite
        std = torch.sqrt(torch.diag(covariance)) + 1e-8
        correlation = covariance / (std.unsqueeze(0) * std.unsqueeze(1))

        return correlation
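
The Sharpe term in `PortfolioLoss` divides each sample's realized return by its predicted volatility. When every batch holds consecutive periods, one alternative is to measure a single Sharpe ratio across the batch. The following is a minimal sketch under that assumption; `batch_sharpe_loss` is a hypothetical helper, not part of the code above, and it ignores the risk-free rate and annualization.

```python
import torch


def batch_sharpe_loss(weights: torch.Tensor, returns: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Negative Sharpe ratio of portfolio returns measured across the batch.

    Assumes each row is one period (hypothetical setup, not from the article).
    """
    portfolio_returns = (weights * returns).sum(dim=1)           # (batch,)
    mean = portfolio_returns.mean()
    # Population variance; eps inside the sqrt keeps the gradient finite
    variance = ((portfolio_returns - mean) ** 2).mean()
    return -mean / torch.sqrt(variance + eps)


# Toy data: an equal-weight portfolio of two assets over four periods
weights = torch.full((4, 2), 0.5)
returns = torch.tensor([[0.01, 0.01], [0.03, 0.03], [0.01, 0.01], [0.03, 0.03]])
loss = batch_sharpe_loss(weights, returns)
```

Here the portfolio returns alternate between 1% and 3%, so the mean is 2%, the standard deviation is 1%, and the loss is close to -2. Whether this variant trains better than the per-sample ratio depends on how batches are built, so treat it as something to test rather than a drop-in replacement.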

Data Pipeline and Feature Engineering #

Quality features are critical for portfolio optimization:

python

class PortfolioDataPipeline:
    """
    Feature engineering pipeline for portfolio optimization.

    Helpers that are called but not listed here (_calculate_trend_strength,
    _rolling_max_drawdown, _normalize_features, _create_sequences) are not
    shown in this listing.
    """

    def __init__(self, lookback_days: int = 252):
        self.lookback_days = lookback_days
        self.feature_generators = [
            self._momentum_features,
            self._volatility_features,
            self._value_features,
            self._quality_features,
            self._macro_features
        ]

    def prepare_features(self, prices, fundamentals, macro_data):
        """
        Generate comprehensive feature set for each asset.
        """
        features = {}

        # Simple returns from price data (the first row is NaN)
        returns = prices.pct_change()

        for generator in self.feature_generators:
            feature_dict = generator(prices, returns, fundamentals, macro_data)
            features.update(feature_dict)

        # Normalize features
        normalized = self._normalize_features(features)

        # Create sequences for time-series model
        sequences = self._create_sequences(normalized, self.lookback_days)

        return sequences

    def _momentum_features(self, prices, returns, fundamentals, macro):
        """Multi-timeframe momentum indicators (windows in trading days)"""
        return {
            'momentum_1m': returns.rolling(21).mean(),
            'momentum_3m': returns.rolling(63).mean(),
            'momentum_6m': returns.rolling(126).mean(),
            'momentum_12m': returns.rolling(252).mean(),
            'momentum_acceleration': returns.rolling(21).mean() - returns.rolling(63).mean(),
            'trend_strength': self._calculate_trend_strength(returns),
        }

    def _volatility_features(self, prices, returns, fundamentals, macro):
        """Volatility and risk indicators"""
        return {
            'volatility_21d': returns.rolling(21).std(),
            'volatility_63d': returns.rolling(63).std(),
            # Ratio is inf or NaN when the 63-day volatility is zero
            'volatility_ratio': returns.rolling(21).std() / returns.rolling(63).std(),
            'downside_volatility': self._downside_volatility(returns, window=63),
            'max_drawdown': self._rolling_max_drawdown(prices, window=252),
            # 95% VaR expressed as the 5th percentile of returns (a negative number)
            'var_95': returns.rolling(252).quantile(0.05),
            'cvar_95': self._conditional_var(returns, confidence=0.95, window=252)
        }

    def _value_features(self, prices, returns, fundamentals, macro):
        """Valuation metrics (fundamentals should be point-in-time to avoid look-ahead)"""
        return {
            # P/E is undefined for zero earnings and flips sign for losses
            'pe_ratio': fundamentals['price'] / fundamentals['earnings_per_share'],
            'pb_ratio': fundamentals['price'] / fundamentals['book_value_per_share'],
            'dividend_yield': fundamentals['dividend'] / fundamentals['price'],
            'earnings_yield': fundamentals['earnings_per_share'] / fundamentals['price'],
            'fcf_yield': fundamentals['free_cash_flow'] / fundamentals['market_cap']
        }

    def _quality_features(self, prices, returns, fundamentals, macro):
        """Quality and fundamental strength"""
        return {
            'roa': fundamentals['net_income'] / fundamentals['total_assets'],
            'roe': fundamentals['net_income'] / fundamentals['shareholders_equity'],
            'profit_margin': fundamentals['net_income'] / fundamentals['revenue'],
            'debt_to_equity': fundamentals['total_debt'] / fundamentals['shareholders_equity'],
            'current_ratio': fundamentals['current_assets'] / fundamentals['current_liabilities'],
            'earnings_growth': fundamentals['earnings_per_share'].pct_change(4)  # YoY, assuming quarterly data
        }

    def _macro_features(self, prices, returns, fundamentals, macro):
        """Macroeconomic indicators"""
        return {
            'interest_rate': macro['fed_funds_rate'],
            'yield_curve': macro['10y_yield'] - macro['2y_yield'],
            'inflation': macro['cpi'].pct_change(12),  # YoY, assuming monthly data
            'gdp_growth': macro['gdp'].pct_change(4),  # YoY, assuming quarterly data
            'credit_spread': macro['baa_yield'] - macro['10y_treasury'],
            'vix': macro['vix_index']
        }

    def _downside_volatility(self, returns, window):
        """Downside deviation: root mean square of negative returns (target return of 0)"""
        # Positive returns count as zero shortfall
        negative_returns = returns.clip(upper=0)
        return (negative_returns ** 2).rolling(window).mean() ** 0.5

    def _conditional_var(self, returns, confidence, window):
        """Conditional Value at Risk (Expected Shortfall): mean return at or below the VaR threshold"""
        alpha = 1 - confidence
        return returns.rolling(window).apply(
            lambda x: x[x <= x.quantile(alpha)].mean()
        )
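
The pipeline calls `_normalize_features` and `_create_sequences` without showing them. The sketch below is one possible way to fill them in, not the original implementation. It assumes every feature is a DataFrame with dates as rows and assets as columns (macro series would need broadcasting to that shape first), and the function names `zscore_cross_section` and `create_sequences` are hypothetical.

```python
import numpy as np
import pandas as pd


def zscore_cross_section(feature: pd.DataFrame) -> pd.DataFrame:
    """Standardize one feature across assets on each date.

    Only same-date values are used, so no future information leaks in.
    """
    mean = feature.mean(axis=1)
    std = feature.std(axis=1, ddof=0).replace(0.0, np.nan)
    # Dates with zero dispersion (and missing inputs) are mapped to 0
    return feature.sub(mean, axis=0).div(std, axis=0).fillna(0.0)


def create_sequences(features: dict, lookback: int) -> np.ndarray:
    """Stack features into rolling windows.

    Returns an array of shape (n_windows, lookback, n_assets, n_features).
    """
    names = sorted(features)  # Fixed feature order
    cube = np.stack([features[n].to_numpy() for n in names], axis=-1)
    n_windows = cube.shape[0] - lookback + 1
    if n_windows <= 0:
        raise ValueError("lookback is longer than the available history")
    return np.stack([cube[i:i + lookback] for i in range(n_windows)])


# Toy data: five dates, two assets, one feature
dates = pd.date_range("2024-01-01", periods=5)
momentum = pd.DataFrame(
    {"A": [1.0, 2.0, 3.0, 4.0, 5.0], "B": [3.0, 2.0, 1.0, 4.0, 7.0]},
    index=dates,
)
normalized = {"momentum": zscore_cross_section(momentum)}
sequences = create_sequences(normalized, lookback=3)
```

With five dates and a lookback of three, `sequences` has shape `(3, 3, 2, 1)`. The layout the model actually expects may differ, so the final reshape should follow the model's input signature.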

Training Strategy #

python

class PortfolioTrainer:
    """
    Training pipeline for portfolio optimization network.
    """

    def __init__(
        self,
        model: PortfolioTransformer,
        loss_fn: PortfolioLoss,
        learning_rate: float = 1e-4,
        device: str = 'cuda',
        total_steps: int = 10000
    ):
        self.model = model.to(device)
        self.loss_fn = loss_fn
        self.device = device

        # Use AdamW with weight decay for regularization
        self.optimizer = torch.optim.AdamW(
            self.model.parameters(),
            lr=learning_rate,
            weight_decay=0.01
        )

        # One-cycle schedule: warm up over the first 10% of steps, then anneal.
        # total_steps must cover every optimizer step (epochs * batches per epoch);
        # OneCycleLR raises an error if it is stepped beyond that count.
        self.scheduler = torch.optim.lr_scheduler.OneCycleLR(
            self.optimizer,
            max_lr=learning_rate,
            total_steps=total_steps,
            pct_start=0.1
        )

    def train_epoch(self, train_loader):
        """Train for one epoch and return the mean total loss per batch"""
        self.model.train()
        total_loss = 0.0

        for batch in train_loader:
            features = batch['features'].to(self.device)
            returns = batch['returns'].to(self.device)

            # Optional inputs must live on the same device as the model outputs
            regime = batch.get('regime')
            if regime is not None:
                regime = regime.to(self.device)
            prev_weights = batch.get('previous_weights')
            if prev_weights is not None:
                prev_weights = prev_weights.to(self.device)

            # Forward pass
            weights, pred_regime, pred_risk, attention = self.model(features)

            # Calculate loss
            loss_dict = self.loss_fn(
                weights=weights,
                predicted_regime=pred_regime,
                predicted_risk=pred_risk,
                returns=returns,
                true_regime=regime,
                previous_weights=prev_weights
            )

            # Backward pass
            self.optimizer.zero_grad()
            loss_dict['total'].backward()

            # Gradient clipping
            torch.nn.utils.clip_grad_norm_(self.model.parameters(), max_norm=1.0)

            self.optimizer.step()
            self.scheduler.step()

            total_loss += loss_dict['total'].item()

        return total_loss / len(train_loader)

    def evaluate(self, val_loader):
        """
        Evaluate model performance.

        Sharpe ratio and drawdown are computed per batch, so each batch should
        hold consecutive periods. _calculate_sharpe and _calculate_max_drawdown
        are helpers not shown in this listing; they are assumed to return scalars.
        """
        self.model.eval()
        metrics = {
            'sharpe_ratio': [],
            'max_drawdown': [],
            'turnover': [],
            'concentration': []
        }

        with torch.no_grad():
            for batch in val_loader:
                features = batch['features'].to(self.device)
                returns = batch['returns'].to(self.device)

                weights, _, _, _ = self.model(features)

                # Calculate realized metrics
                portfolio_returns = (weights * returns).sum(dim=1)

                sharpe = self._calculate_sharpe(portfolio_returns)
                drawdown = self._calculate_max_drawdown(portfolio_returns)

                metrics['sharpe_ratio'].append(float(sharpe))
                metrics['max_drawdown'].append(float(drawdown))
                metrics['concentration'].append((weights ** 2).sum(dim=1).mean().item())

                # Turnover is only defined when previous weights are supplied
                prev_weights = batch.get('previous_weights')
                if prev_weights is not None:
                    turnover = (weights - prev_weights.to(self.device)).abs().sum(dim=1)
                    metrics['turnover'].append(turnover.mean().item())

        # A metric with no observations is reported as NaN instead of failing
        return {k: float(np.mean(v)) if v else float('nan') for k, v in metrics.items()}
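
`evaluate` relies on `_calculate_sharpe` and `_calculate_max_drawdown`, which the listing does not include. The sketch below shows one conventional way to compute both from a series of per-period portfolio returns. It assumes the returns are in chronological order and uses a zero risk-free rate; the function names and the 252-period annualization default are illustrative choices, not taken from the original code.

```python
import numpy as np


def sharpe_ratio(period_returns, periods_per_year: int = 252) -> float:
    """Annualized Sharpe ratio with a zero risk-free rate (assumption)."""
    r = np.asarray(period_returns, dtype=float)
    std = r.std(ddof=1)
    if std == 0.0:
        return 0.0  # Undefined for constant returns; report 0 by convention here
    return float(np.sqrt(periods_per_year) * r.mean() / std)


def max_drawdown(period_returns) -> float:
    """Largest peak-to-trough loss of the compounded wealth curve (<= 0)."""
    r = np.asarray(period_returns, dtype=float)
    # Prepend the starting wealth of 1.0 so a loss in the first period counts
    wealth = np.concatenate(([1.0], np.cumprod(1.0 + r)))
    running_peak = np.maximum.accumulate(wealth)
    return float((wealth / running_peak - 1.0).min())


# Toy example: up 10%, down 50%, up 20%
example_drawdown = max_drawdown([0.10, -0.50, 0.20])
example_sharpe = sharpe_ratio([0.01, 0.03], periods_per_year=1)
```

In the toy example wealth goes 1.0, 1.1, 0.55, 0.66, so the maximum drawdown is -50%. If validation batches are shuffled, the drawdown figure loses its meaning, which is one reason to keep evaluation data in time order.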

Production Deployment #

python

class ProductionPortfolioOptimizer:
    """
    Production-ready portfolio optimizer with risk management.

    _load_model, _prepare_features and _generate_trades are helpers not
    shown in this listing.
    """

    def __init__(self, model_path: str, config: dict):
        self.model = self._load_model(model_path)
        self.model.eval()  # Disable dropout so inference is deterministic
        self.config = config
        self.risk_manager = PortfolioRiskManager(config)

    async def optimize(self, market_data, current_positions):
        """
        Generate optimal portfolio weights.
        """
        # Prepare features
        features = self._prepare_features(market_data)

        # Generate predictions
        with torch.no_grad():
            weights, regime, risk, _ = self.model(features)

        # Apply risk constraints
        constrained_weights = await self.risk_manager.apply_constraints(
            weights,
            current_positions,
            regime,
            risk
        )

        # Generate rebalancing trades
        trades = self._generate_trades(current_positions, constrained_weights)

        return {
            'target_weights': constrained_weights,
            'trades': trades,
            'regime': regime,
            'expected_risk': risk
        }

class PortfolioRiskManager:
    """Risk management for portfolio optimization"""

    def __init__(self, config: dict):
        # The _apply_* helpers (not shown) are expected to read their limits from config
        self.config = config

    async def apply_constraints(self, weights, current_positions, regime, risk):
        """Apply risk and regulatory constraints"""

        # 1. Position limits
        weights = self._apply_position_limits(weights)

        # 2. Sector limits
        weights = self._apply_sector_limits(weights)

        # 3. Liquidity constraints
        weights = await self._apply_liquidity_constraints(weights, current_positions)

        # 4. Risk budget
        weights = self._apply_risk_budget(weights, risk)

        # 5. Turnover limits
        weights = self._apply_turnover_limits(weights, current_positions)

        return weights
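
The individual `_apply_*` helpers are not shown. As an illustration of what the first step might look like, here is a minimal sketch of a position limit for a long-only, fully invested portfolio: weights above the cap are clipped and the excess is redistributed pro rata among the remaining assets. The function name `apply_position_limits` and the `max_weight` default are hypothetical, and it works on a single NumPy weight vector instead of a batched tensor.

```python
import numpy as np


def apply_position_limits(weights, max_weight: float = 0.10, max_iter: int = 100) -> np.ndarray:
    """Cap long-only weights at max_weight and redistribute the excess pro rata."""
    w = np.asarray(weights, dtype=float).copy()
    if max_weight * w.size < 1.0:
        raise ValueError("max_weight is too small for a fully invested portfolio")
    w = w / w.sum()  # Normalize so the weights sum to 1

    for _ in range(max_iter):
        over = w > max_weight + 1e-12
        if not over.any():
            break
        excess = (w[over] - max_weight).sum()
        w[over] = max_weight
        free = w < max_weight  # Assets that can still absorb weight
        if not free.any():
            break
        free_total = w[free].sum()
        if free_total > 0:
            w[free] += excess * w[free] / free_total
        else:
            w[free] += excess / free.sum()
    return w


capped = apply_position_limits([0.6, 0.3, 0.1], max_weight=0.4)
```

For the input `[0.6, 0.3, 0.1]` with a 40% cap, the first pass yields `[0.4, 0.45, 0.15]` and the second pass settles at `[0.4, 0.4, 0.2]`. Because `apply_constraints` runs its steps one after another, a later step may push weights back over an earlier limit, so a final check of all limits after the last step is worth considering.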

Lessons Learned #

Feature Quality > Model Complexity: Good features with a simple model beat poor features with a complex model.
Transaction Costs Matter: Model turnover explicitly; frequent rebalancing erodes returns through trading costs.
Risk Management is Critical: Hard constraints applied after the model keep it from taking excessive risk.
Regime Awareness: Market conditions change, and the model must adapt its allocations to them.
Validation is Hard: Walk-forward testing with realistic costs is essential.
Interpretability Helps: Attention weights offer a view of the cross-asset relationships the model has learned.
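
To make the walk-forward point concrete, here is a minimal sketch of how rolling train/test windows and a cost-adjusted return could be generated. It is an illustration under stated assumptions, not the validation setup used for the results in this post: the function names, the optional `gap` between training and testing, and the proportional cost rate are all hypothetical.

```python
from typing import Iterator, Tuple


def walk_forward_splits(
    n_periods: int, train_size: int, test_size: int, gap: int = 0
) -> Iterator[Tuple[range, range]]:
    """Yield (train, test) index ranges that roll forward through time.

    The optional gap leaves unused periods between training and testing,
    which helps when labels are built from forward-looking returns.
    """
    start = 0
    while start + train_size + gap + test_size <= n_periods:
        train = range(start, start + train_size)
        test_start = start + train_size + gap
        yield train, range(test_start, test_start + test_size)
        start += test_size  # Advance by one test window


def net_return(gross_return: float, turnover: float, cost_rate: float = 0.001) -> float:
    """Subtract a proportional trading cost (cost_rate per unit of turnover)."""
    return gross_return - cost_rate * turnover


splits = list(walk_forward_splits(n_periods=10, train_size=4, test_size=2, gap=1))
example_net = net_return(gross_return=0.01, turnover=0.5, cost_rate=0.001)
```

With ten periods this produces two folds: train on periods 0-3 and test on 5-6, then train on 2-5 and test on 7-8. Every test window lies strictly after its training window, which is the property that random cross-validation does not guarantee for time series.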

Conclusion #

Success with deep learning for portfolio optimization comes from treating it as an engineering problem, not a pure ML research project.

Need help building AI-powered portfolio optimization systems? Get in touch to discuss your needs.

NordVarg Team
