November 5, 2024 • NordVarg Team

Deep Learning for Portfolio Optimization: Beyond Mean-Variance

Using neural networks to build adaptive portfolio optimization systems that handle non-linear dependencies and regime changes

Machine Learning · Deep Learning · Portfolio Management · PyTorch · Quantitative Finance

Introduction

Traditional portfolio optimization relies on mean-variance analysis, a framework developed by Harry Markowitz in 1952. While elegant, it makes strong assumptions that often fail in real markets: normally distributed returns, stable correlations, and static risk profiles.

Deep learning offers a path beyond these limitations. After building portfolio optimization systems for institutional clients managing billions in AUM, we've learned how to apply neural networks effectively while avoiding common pitfalls.

The Problem with Traditional Approaches

Mean-variance optimization has well-known issues:

| Problem | Impact | Traditional Solution | DL Solution |
|---|---|---|---|
| Non-normal returns | Fat tails, skewness ignored | Robust optimization | Learn full distribution |
| Unstable correlations | Estimation error amplification | Shrinkage estimators | Temporal dependencies |
| Regime changes | Fixed allocations fail | Conditional optimization | Regime-aware networks |
| Non-linear dependencies | Missed diversification | Higher moments | Attention mechanisms |
| High dimensionality | Curse of dimensionality | Factor models | Representation learning |
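
To make the estimation-error problem concrete, here is a minimal NumPy sketch of unconstrained mean-variance weights (proportional to the inverse covariance times expected returns). All inputs are made-up illustrative numbers and `mean_variance_weights` is a hypothetical helper, not part of the system described in this post. With two highly correlated assets, nudging one expected return by half a percentage point visibly reshuffles the allocation.

```python
import numpy as np

def mean_variance_weights(mu: np.ndarray, cov: np.ndarray) -> np.ndarray:
    """Unconstrained mean-variance weights, rescaled to sum to 1."""
    raw = np.linalg.solve(cov, mu)  # proportional to inverse(cov) @ mu
    return raw / raw.sum()

# Illustrative (made-up) inputs: two highly correlated assets and one diversifier.
vols = np.array([0.20, 0.20, 0.15])
corr = np.array([
    [1.00, 0.95, 0.20],
    [0.95, 1.00, 0.20],
    [0.20, 0.20, 1.00],
])
cov = np.outer(vols, vols) * corr
mu = np.array([0.080, 0.075, 0.050])

base = mean_variance_weights(mu, cov)
# Nudge the second asset's expected return by half a percentage point.
bumped = mean_variance_weights(mu + np.array([0.0, 0.005, 0.0]), cov)

print("base weights:  ", np.round(base, 3))
print("bumped weights:", np.round(bumped, 3))
print("L1 change:     ", np.round(np.abs(bumped - base).sum(), 3))
```

The closer the correlation between the first two assets gets to 1, the larger the swing, which is why shrinkage estimators are the usual traditional remedy.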

Architecture: Attention-Based Portfolio Network

We use a transformer-based architecture that learns temporal dependencies and cross-asset relationships:

python
import torch
import torch.nn as nn
import numpy as np

class PortfolioTransformer(nn.Module):
    """
    Transformer-based portfolio optimizer that learns:
    1. Temporal patterns in asset returns
    2. Cross-asset dependencies (beyond correlation)
    3. Regime-conditional allocations
    4. Risk-return trade-offs
    """

    def __init__(
        self,
        n_assets: int,
        n_features: int,
        d_model: int = 256,  # must be even for the sinusoidal encoding
        n_heads: int = 8,
        n_layers: int = 6,
        dropout: float = 0.1
    ):
        super().__init__()

        self.n_assets = n_assets
        self.n_features = n_features

        # Feature embedding for each asset
        self.feature_embedding = nn.Linear(n_features, d_model)

        # Positional encoding for time series
        self.temporal_encoding = TemporalEncoding(d_model)

        # Multi-head attention layers over the time dimension
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model,
            nhead=n_heads,
            dim_feedforward=d_model * 4,
            dropout=dropout,
            batch_first=True
        )
        self.transformer = nn.TransformerEncoder(encoder_layer, n_layers)

        # Asset attention: learn cross-asset dependencies
        self.asset_attention = nn.MultiheadAttention(
            d_model, n_heads, dropout=dropout, batch_first=True
        )

        # Regime detection head (outputs probabilities, not logits)
        self.regime_head = nn.Sequential(
            nn.Linear(d_model, 128),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(128, 4),  # 4 market regimes
            nn.Softmax(dim=-1)
        )

        # Portfolio weight head
        self.weight_head = nn.Sequential(
            nn.Linear(d_model, 128),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(128, 1)
        )

        # Risk prediction head
        self.risk_head = nn.Sequential(
            nn.Linear(d_model, 128),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(128, 1),
            nn.Softplus()  # Ensure positive risk
        )

    def forward(self, features, mask=None):
        """
        Args:
            features: (batch, n_assets, seq_len, n_features)
            mask: Optional (seq_len, seq_len) attention mask over time steps

        Returns:
            weights: (batch, n_assets) - Portfolio weights
            regime: (batch, 4) - Regime probabilities
            risk: (batch, n_assets) - Asset risk predictions
            attention_weights: (batch, n_assets, n_assets) - Cross-asset attention
        """
        batch_size, n_assets, seq_len, n_features = features.shape

        # Fold assets into the batch dimension so each asset's history
        # is encoded independently
        x = features.reshape(batch_size * n_assets, seq_len, n_features)

        # Feature embedding
        x = self.feature_embedding(x)  # (batch*n_assets, seq_len, d_model)

        # Add temporal encoding
        x = self.temporal_encoding(x)

        # Process through transformer
        x = self.transformer(x, mask=mask)  # (batch*n_assets, seq_len, d_model)

        # Take last time step
        x = x[:, -1, :]  # (batch*n_assets, d_model)

        # Reshape back to separate assets
        x = x.reshape(batch_size, n_assets, -1)  # (batch, n_assets, d_model)

        # Cross-asset attention
        asset_features, attention_weights = self.asset_attention(x, x, x)

        # Predict regime (from pooled representation)
        pooled = torch.mean(asset_features, dim=1)  # (batch, d_model)
        regime = self.regime_head(pooled)  # (batch, 4)

        # Predict weights and risk for each asset
        raw_weights = self.weight_head(asset_features).squeeze(-1)  # (batch, n_assets)
        risk = self.risk_head(asset_features).squeeze(-1)  # (batch, n_assets)

        # Normalize weights to sum to 1
        weights = self._normalize_weights(raw_weights)

        return weights, regime, risk, attention_weights

    def _normalize_weights(self, raw_weights):
        """
        Convert raw logits to valid portfolio weights.
        Only the long-only case is implemented here; the other
        constraint types are listed as possible extensions.
        """
        # Long-only: softmax gives non-negative weights that sum to 1
        weights = torch.softmax(raw_weights, dim=-1)

        # Could also implement:
        # - Long-short: tanh normalization
        # - Bounded: sigmoid with scaling
        # - Sector constraints: hierarchical softmax

        return weights

class TemporalEncoding(nn.Module):
    """
    Add temporal information to sequences.
    Combines sinusoidal encoding with learnable parameters.
    """

    def __init__(self, d_model: int, max_len: int = 5000):
        super().__init__()

        # Sinusoidal positional encoding
        position = torch.arange(max_len).unsqueeze(1)
        div_term = torch.exp(
            torch.arange(0, d_model, 2) * (-np.log(10000.0) / d_model)
        )

        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)

        # Buffer: moves with the module across devices but is not trained
        self.register_buffer('pe', pe)

        # Learnable temporal embeddings
        self.temporal_embedding = nn.Parameter(torch.randn(1, max_len, d_model) * 0.02)

    def forward(self, x):
        """
        Args:
            x: (batch, seq_len, d_model)
        """
        seq_len = x.size(1)
        x = x + self.pe[:seq_len, :].unsqueeze(0)
        x = x + self.temporal_embedding[:, :seq_len, :]
        return x
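
The comments in `_normalize_weights` mention a long-short variant based on tanh without showing it. A minimal sketch of one way to do that follows, next to the long-only softmax for comparison. The function names and the choice of unit gross exposure (absolute weights summing to 1) are assumptions for illustration; the original does not say which normalization it uses for long-short books.

```python
import torch

def long_only(raw: torch.Tensor) -> torch.Tensor:
    """Non-negative weights that sum to 1 (plain softmax)."""
    return torch.softmax(raw, dim=-1)

def long_short(raw: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Signed weights whose absolute values sum to (approximately) 1.

    Assumption: the book is normalized to unit gross exposure. Net
    exposure is left free, so this is not dollar-neutral by construction.
    """
    signed = torch.tanh(raw)  # squash each logit into (-1, 1)
    return signed / (signed.abs().sum(dim=-1, keepdim=True) + eps)

torch.manual_seed(0)
raw_logits = torch.randn(4, 6)  # (batch, n_assets), stand-in for weight_head output

lo = long_only(raw_logits)
ls = long_short(raw_logits)

print("long-only row sums: ", lo.sum(dim=-1))
print("long-short gross:   ", ls.abs().sum(dim=-1))
print("long-short net:     ", ls.sum(dim=-1))
```

Both functions are differentiable, so either can replace the softmax inside the model without changing the training loop.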

Training with Custom Loss Functions

Portfolio optimization requires custom loss functions that balance return, risk, and constraints:

python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PortfolioLoss(nn.Module):
    """
    Multi-objective loss for portfolio optimization.
    Combines return prediction, risk management, and constraints.
    """

    def __init__(
        self,
        risk_aversion: float = 1.0,
        regime_weight: float = 0.1,
        turnover_penalty: float = 0.01,
        concentration_penalty: float = 0.05
    ):
        super().__init__()
        self.risk_aversion = risk_aversion
        self.regime_weight = regime_weight
        self.turnover_penalty = turnover_penalty
        self.concentration_penalty = concentration_penalty

    def forward(
        self,
        weights,
        predicted_regime,
        predicted_risk,
        returns,
        true_regime=None,
        previous_weights=None
    ):
        """
        Compute portfolio optimization loss.

        Args:
            weights: (batch, n_assets) - Predicted portfolio weights
            predicted_regime: (batch, 4) - Regime probabilities
            predicted_risk: (batch, n_assets) - Risk predictions
            returns: (batch, n_assets) - Realized returns (forward-looking)
            true_regime: (batch,) - True regime labels (optional)
            previous_weights: (batch, n_assets) - Previous period weights
        """
        # 1. Return component: maximize expected portfolio return
        # (reported for monitoring; it enters the total through the Sharpe term)
        portfolio_returns = torch.sum(weights * returns, dim=1)
        return_loss = -torch.mean(portfolio_returns)

        # 2. Risk component: minimize portfolio variance
        # Use predicted risk to estimate covariance
        portfolio_variance = self._calculate_portfolio_variance(
            weights, predicted_risk, returns
        )
        risk_loss = torch.mean(portfolio_variance)

        # 3. Sharpe-style objective: mean of per-sample risk-adjusted returns.
        # The epsilon sits inside the sqrt so the gradient stays finite at zero variance.
        sharpe_loss = -torch.mean(
            portfolio_returns / torch.sqrt(portfolio_variance + 1e-8)
        )

        # 4. Regime prediction loss (if labels available)
        # The regime head already applies softmax, so use NLL on log-probabilities;
        # CrossEntropyLoss expects raw logits and would apply softmax a second time.
        regime_loss = weights.new_zeros(())
        if true_regime is not None:
            regime_loss = F.nll_loss(
                torch.log(predicted_regime.clamp_min(1e-8)), true_regime
            )

        # 5. Turnover penalty (transaction costs)
        turnover_loss = weights.new_zeros(())
        if previous_weights is not None:
            turnover = torch.sum(torch.abs(weights - previous_weights), dim=1)
            turnover_loss = torch.mean(turnover)

        # 6. Concentration penalty (avoid putting all eggs in one basket)
        concentration = torch.sum(weights ** 2, dim=1)  # Herfindahl index
        concentration_loss = torch.mean(concentration)

        # Combine losses
        total_loss = (
            sharpe_loss +
            self.risk_aversion * risk_loss +
            self.regime_weight * regime_loss +
            self.turnover_penalty * turnover_loss +
            self.concentration_penalty * concentration_loss
        )

        return {
            'total': total_loss,
            'sharpe': sharpe_loss,
            'return': return_loss,
            'risk': risk_loss,
            'regime': regime_loss,
            'turnover': turnover_loss,
            'concentration': concentration_loss
        }

    def _calculate_portfolio_variance(self, weights, predicted_risk, returns):
        """
        Estimate portfolio variance using predicted risk and empirical correlation.
        """
        # Simple approach: predicted individual risks plus a correlation matrix
        # estimated across the samples in this batch
        correlation = self._estimate_correlation(returns)  # (n_assets, n_assets)

        # Construct covariance matrix: diag(risk) @ correlation @ diag(risk)
        risk_matrix = torch.diag_embed(predicted_risk)  # (batch, n_assets, n_assets)
        covariance = risk_matrix @ correlation @ risk_matrix

        # Portfolio variance: w^T Σ w, one value per sample
        portfolio_var = torch.sum(
            weights.unsqueeze(1) @ covariance @ weights.unsqueeze(2),
            dim=[1, 2]
        )

        return portfolio_var

    def _estimate_correlation(self, returns):
        """Estimate correlation matrix from returns"""
        # Center returns
        returns_centered = returns - torch.mean(returns, dim=0, keepdim=True)

        # Sample covariance (population form, divides by the batch size)
        covariance = torch.matmul(
            returns_centered.T, returns_centered
        ) / returns.shape[0]

        # Normalize covariance to correlation
        std = torch.sqrt(torch.diag(covariance))
        correlation = covariance / (std.unsqueeze(0) * std.unsqueeze(1) + 1e-8)

        return correlation
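
The variance term rests on the identity Σ = D C D, where D is a diagonal matrix of volatilities and C the correlation matrix. The sketch below checks that identity numerically on synthetic returns, and confirms that w^T Σ w matches the variance of the portfolio's own return series. The data is random and the weights are arbitrary. In the loss itself D comes from the risk head rather than from the sample, so the match there is only as good as the risk predictions.

```python
import torch

torch.manual_seed(0)
returns = torch.randn(250, 3, dtype=torch.float64) * 0.01  # synthetic daily returns

# Sample covariance, dividing by N as the loss does.
centered = returns - returns.mean(dim=0, keepdim=True)
cov = centered.T @ centered / returns.shape[0]

# Split the covariance into volatilities and a correlation matrix.
vol = torch.sqrt(torch.diag(cov))
corr = cov / torch.outer(vol, vol)

# Rebuild the covariance as D @ C @ D with D = diag(vol).
D = torch.diag(vol)
rebuilt = D @ corr @ D

weights = torch.tensor([0.5, 0.3, 0.2], dtype=torch.float64)
portfolio_var = weights @ rebuilt @ weights

# Same quantity computed directly from the portfolio return series.
direct_var = (returns @ weights).var(unbiased=False)

print("max |rebuilt - cov|:", (rebuilt - cov).abs().max().item())
print("w^T Sigma w:        ", portfolio_var.item())
print("direct variance:    ", direct_var.item())
```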

Data Pipeline and Feature Engineering

Quality features are critical for portfolio optimization:

python
class PortfolioDataPipeline:
    """
    Feature engineering pipeline for portfolio optimization.

    Expects pandas objects indexed by date. Helpers referenced but not
    shown here (_calculate_trend_strength, _rolling_max_drawdown,
    _normalize_features, _create_sequences) must be supplied before use.
    """

    def __init__(self, lookback_days: int = 252):
        self.lookback_days = lookback_days
        self.feature_generators = [
            self._momentum_features,
            self._volatility_features,
            self._value_features,
            self._quality_features,
            self._macro_features
        ]

    def prepare_features(self, prices, fundamentals, macro_data):
        """
        Generate comprehensive feature set for each asset.
        """
        features = {}

        # Technical features from price data
        returns = prices.pct_change()

        # Every generator shares one signature so they can be looped over
        for generator in self.feature_generators:
            feature_dict = generator(prices, returns, fundamentals, macro_data)
            features.update(feature_dict)

        # Normalize features (use trailing statistics only, to avoid look-ahead)
        normalized = self._normalize_features(features)

        # Create sequences for time-series model
        sequences = self._create_sequences(normalized, self.lookback_days)

        return sequences

    def _momentum_features(self, prices, returns, fundamentals, macro):
        """Multi-timeframe momentum indicators"""
        return {
            'momentum_1m': returns.rolling(21).mean(),
            'momentum_3m': returns.rolling(63).mean(),
            'momentum_6m': returns.rolling(126).mean(),
            'momentum_12m': returns.rolling(252).mean(),
            'momentum_acceleration': returns.rolling(21).mean() - returns.rolling(63).mean(),
            'trend_strength': self._calculate_trend_strength(returns),
        }

    def _volatility_features(self, prices, returns, fundamentals, macro):
        """Volatility and risk indicators"""
        return {
            'volatility_21d': returns.rolling(21).std(),
            'volatility_63d': returns.rolling(63).std(),
            'volatility_ratio': returns.rolling(21).std() / returns.rolling(63).std(),
            'downside_volatility': self._downside_volatility(returns, window=63),
            'max_drawdown': self._rolling_max_drawdown(prices, window=252),
            'var_95': returns.rolling(252).quantile(0.05),  # 5th percentile of returns
            'cvar_95': self._conditional_var(returns, confidence=0.95, window=252)
        }

    def _value_features(self, prices, returns, fundamentals, macro):
        """Valuation metrics"""
        # Ratios with earnings in the denominator are undefined for zero
        # earnings and hard to interpret for negative earnings
        return {
            'pe_ratio': fundamentals['price'] / fundamentals['earnings_per_share'],
            'pb_ratio': fundamentals['price'] / fundamentals['book_value_per_share'],
            'dividend_yield': fundamentals['dividend'] / fundamentals['price'],
            'earnings_yield': fundamentals['earnings_per_share'] / fundamentals['price'],
            'fcf_yield': fundamentals['free_cash_flow'] / fundamentals['market_cap']
        }

    def _quality_features(self, prices, returns, fundamentals, macro):
        """Quality and fundamental strength"""
        return {
            'roa': fundamentals['net_income'] / fundamentals['total_assets'],
            'roe': fundamentals['net_income'] / fundamentals['shareholders_equity'],
            'profit_margin': fundamentals['net_income'] / fundamentals['revenue'],
            'debt_to_equity': fundamentals['total_debt'] / fundamentals['shareholders_equity'],
            'current_ratio': fundamentals['current_assets'] / fundamentals['current_liabilities'],
            # YoY growth, assuming quarterly fundamentals (4 periods = 1 year)
            'earnings_growth': fundamentals['earnings_per_share'].pct_change(4)
        }

    def _macro_features(self, prices, returns, fundamentals, macro):
        """Macroeconomic indicators"""
        # The period counts assume monthly CPI and quarterly GDP series
        return {
            'interest_rate': macro['fed_funds_rate'],
            'yield_curve': macro['10y_yield'] - macro['2y_yield'],
            'inflation': macro['cpi'].pct_change(12),
            'gdp_growth': macro['gdp'].pct_change(4),
            'credit_spread': macro['baa_yield'] - macro['10y_treasury'],
            'vix': macro['vix_index']
        }

    def _downside_volatility(self, returns, window):
        """Calculate downside deviation (square root of the semi-variance)"""
        # Keep only losses; gains count as zero deviation from the 0% target
        downside = returns.clip(upper=0)
        # Root-mean-square of losses. A plain rolling std() would subtract the
        # mean of the clipped series and understate downside risk.
        return (downside ** 2).rolling(window).mean() ** 0.5

    def _conditional_var(self, returns, confidence, window):
        """Calculate Conditional Value at Risk (Expected Shortfall)"""
        tail = 1 - confidence
        # Mean of the returns at or below the rolling VaR threshold.
        # raw=False passes each window as a Series so .quantile is available.
        return returns.rolling(window).apply(
            lambda x: x[x <= x.quantile(tail)].mean(), raw=False
        )
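
The pipeline calls `_rolling_max_drawdown`, `_normalize_features` and `_create_sequences` without showing them. The sketch below is one plausible shape for those helpers, written as free functions. It is a guess at the intent, not the original implementation: the function names, the rolling z-score normalization, and the toy price series are all assumptions. The drawdown helper is an approximation, since its peak can predate the reporting window by up to one window length.

```python
import numpy as np
import pandas as pd

def rolling_max_drawdown(prices: pd.DataFrame, window: int) -> pd.DataFrame:
    """Worst decline from a trailing peak over a trailing window (values <= 0)."""
    running_peak = prices.rolling(window, min_periods=1).max()
    drawdown = prices / running_peak - 1.0
    return drawdown.rolling(window, min_periods=1).min()

def rolling_zscore(features: pd.DataFrame, window: int) -> pd.DataFrame:
    """Standardize each column with trailing statistics only (no look-ahead)."""
    mean = features.rolling(window).mean()
    std = features.rolling(window).std()
    return (features - mean) / std.replace(0.0, np.nan)

def create_sequences(features: pd.DataFrame, lookback: int) -> np.ndarray:
    """Stack overlapping windows into (n_windows, lookback, n_features)."""
    values = features.dropna().to_numpy()
    n_windows = len(values) - lookback + 1
    if n_windows <= 0:
        raise ValueError("not enough rows for a single window")
    return np.stack([values[i:i + lookback] for i in range(n_windows)])

# Toy single-asset price series (made up).
prices = pd.DataFrame({"A": [100.0, 120.0, 90.0, 110.0, 130.0, 125.0]})

mdd = rolling_max_drawdown(prices, window=4)
seqs = create_sequences(rolling_zscore(prices, window=3), lookback=2)

print(mdd["A"].round(4).tolist())
print(seqs.shape)
```

Normalizing with trailing windows rather than full-sample statistics matters here: a global mean and standard deviation would leak information from the test period into every training sample.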

Training Strategy

python
import numpy as np
import torch

class PortfolioTrainer:
    """
    Training pipeline for portfolio optimization network.
    """

    def __init__(
        self,
        model: PortfolioTransformer,
        loss_fn: PortfolioLoss,
        learning_rate: float = 1e-4,
        device: str = 'cuda',
        total_steps: int = 10000
    ):
        self.model = model.to(device)
        self.loss_fn = loss_fn
        self.device = device

        # Use AdamW with weight decay for regularization
        self.optimizer = torch.optim.AdamW(
            model.parameters(),
            lr=learning_rate,
            weight_decay=0.01
        )

        # Learning rate scheduler. OneCycleLR raises an error if stepped more
        # than total_steps times, so set it to epochs * batches per epoch.
        self.scheduler = torch.optim.lr_scheduler.OneCycleLR(
            self.optimizer,
            max_lr=learning_rate,
            total_steps=total_steps,
            pct_start=0.1
        )

    def train_epoch(self, train_loader):
        """Train for one epoch"""
        self.model.train()
        total_loss = 0

        for batch in train_loader:
            features = batch['features'].to(self.device)
            returns = batch['returns'].to(self.device)
            # Optional tensors must live on the same device as the model outputs
            regime = batch.get('regime')
            if regime is not None:
                regime = regime.to(self.device)
            prev_weights = batch.get('previous_weights')
            if prev_weights is not None:
                prev_weights = prev_weights.to(self.device)

            # Forward pass
            weights, pred_regime, pred_risk, attention = self.model(features)

            # Calculate loss
            loss_dict = self.loss_fn(
                weights=weights,
                predicted_regime=pred_regime,
                predicted_risk=pred_risk,
                returns=returns,
                true_regime=regime,
                previous_weights=prev_weights
            )

            # Backward pass
            self.optimizer.zero_grad()
            loss_dict['total'].backward()

            # Gradient clipping
            torch.nn.utils.clip_grad_norm_(self.model.parameters(), max_norm=1.0)

            self.optimizer.step()
            self.scheduler.step()

            total_loss += loss_dict['total'].item()

        return total_loss / len(train_loader)

    def evaluate(self, val_loader):
        """Evaluate model performance"""
        self.model.eval()
        metrics = {
            'sharpe_ratio': [],
            'max_drawdown': [],
            'turnover': [],
            'concentration': []
        }

        with torch.no_grad():
            for batch in val_loader:
                features = batch['features'].to(self.device)
                returns = batch['returns'].to(self.device)
                prev_weights = batch.get('previous_weights')

                weights, _, _, _ = self.model(features)

                # Calculate realized metrics
                portfolio_returns = (weights * returns).sum(dim=1)

                metrics['sharpe_ratio'].append(self._calculate_sharpe(portfolio_returns))
                metrics['max_drawdown'].append(self._calculate_max_drawdown(portfolio_returns))
                metrics['concentration'].append((weights ** 2).sum(dim=1).mean().item())
                if prev_weights is not None:
                    turnover = (weights - prev_weights.to(self.device)).abs().sum(dim=1)
                    metrics['turnover'].append(turnover.mean().item())

        # Skip metrics with no observations (e.g. turnover without previous weights)
        return {k: float(np.mean(v)) for k, v in metrics.items() if v}

    # The two helpers below are not shown in the original post; these are
    # minimal assumed implementations so that evaluate() runs.

    @staticmethod
    def _calculate_sharpe(portfolio_returns, eps=1e-8):
        """Per-period Sharpe ratio (not annualized, zero risk-free rate assumed)"""
        return (portfolio_returns.mean() / (portfolio_returns.std() + eps)).item()

    @staticmethod
    def _calculate_max_drawdown(portfolio_returns):
        """Largest peak-to-trough loss, assuming the batch is in chronological order"""
        wealth = torch.cumprod(1 + portfolio_returns, dim=0)
        running_peak = torch.cummax(wealth, dim=0).values
        return ((wealth - running_peak) / running_peak).min().item()
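
The trainer takes its data loaders as given. For time-ordered data, one common way to build them is a walk-forward split, where each training window is followed by a later, non-overlapping test window. The generator below is a minimal sketch of that idea; `walk_forward_splits` and its parameters are hypothetical names, and the 21-sample gap is an illustrative choice to keep forward-looking return labels out of the training window.

```python
from typing import Iterator, Tuple

def walk_forward_splits(
    n_samples: int,
    train_size: int,
    test_size: int,
    gap: int = 0,
) -> Iterator[Tuple[range, range]]:
    """Yield (train, test) index ranges that roll forward through time.

    `gap` leaves unused samples between train and test, e.g. to cover the
    label horizon so forward-looking returns do not leak into training.
    """
    start = 0
    while start + train_size + gap + test_size <= n_samples:
        train = range(start, start + train_size)
        test_start = start + train_size + gap
        test = range(test_start, test_start + test_size)
        yield train, test
        start += test_size  # slide by one test block so test sets never overlap

splits = list(walk_forward_splits(n_samples=1000, train_size=500, test_size=100, gap=21))
for train, test in splits:
    print(f"train [{train.start}, {train.stop})  test [{test.start}, {test.stop})")
```

Each pair of ranges can be used to index a dataset (for example with `torch.utils.data.Subset`) before wrapping it in a loader, with the model retrained or fine-tuned at every step.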

Production Deployment

python
import torch

class ProductionPortfolioOptimizer:
    """
    Production-ready portfolio optimizer with risk management.

    _load_model, _prepare_features and _generate_trades are not shown here.
    """

    def __init__(self, model_path: str, config: dict):
        self.model = self._load_model(model_path)
        self.model.eval()  # inference mode: disables dropout
        self.config = config
        self.risk_manager = PortfolioRiskManager(config)

    async def optimize(self, market_data, current_positions):
        """
        Generate optimal portfolio weights.
        """
        # Prepare features
        features = self._prepare_features(market_data)

        # Generate predictions
        with torch.no_grad():
            weights, regime, risk, _ = self.model(features)

        # Apply risk constraints
        constrained_weights = await self.risk_manager.apply_constraints(
            weights,
            current_positions,
            regime,
            risk
        )

        # Generate rebalancing trades
        trades = self._generate_trades(current_positions, constrained_weights)

        return {
            'target_weights': constrained_weights,
            'trades': trades,
            'regime': regime,
            'expected_risk': risk
        }

class PortfolioRiskManager:
    """
    Risk management for portfolio optimization.

    The individual _apply_* constraint methods are not shown here.
    """

    def __init__(self, config: dict):
        # Accept the config that ProductionPortfolioOptimizer passes in
        self.config = config

    async def apply_constraints(self, weights, current_positions, regime, risk):
        """Apply risk and regulatory constraints"""

        # 1. Position limits
        weights = self._apply_position_limits(weights)

        # 2. Sector limits
        weights = self._apply_sector_limits(weights)

        # 3. Liquidity constraints
        weights = await self._apply_liquidity_constraints(weights, current_positions)

        # 4. Risk budget
        weights = self._apply_risk_budget(weights, risk)

        # 5. Turnover limits
        weights = self._apply_turnover_limits(weights, current_positions)

        return weights
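
The constraint methods are only named in the risk manager. As an illustration of what two of them might do, here is a sketch of a position cap and a turnover limit for long-only weights. Both functions, their names, and the redistribution rule are assumptions rather than the original implementation, and the numbers are made up.

```python
import torch

def apply_position_limits(weights: torch.Tensor, max_weight: float) -> torch.Tensor:
    """Cap long-only weights at max_weight and hand the excess to uncapped assets."""
    n_assets = weights.shape[-1]
    if max_weight * n_assets < 1.0:
        raise ValueError("max_weight too small: capped weights cannot sum to 1")
    w = weights.clone()
    for _ in range(n_assets):  # each pass caps at least one more asset per row
        over = w > max_weight
        if not over.any():
            break
        excess = (w - max_weight).clamp(min=0).sum(dim=-1, keepdim=True)
        w = torch.where(over, torch.full_like(w, max_weight), w)
        free = w < max_weight
        free_total = (w * free).sum(dim=-1, keepdim=True)
        # Redistribute the excess in proportion to the remaining uncapped weights.
        w = torch.where(free, w + excess * w / free_total.clamp(min=1e-12), w)
    return w

def limit_turnover(target: torch.Tensor, current: torch.Tensor, max_turnover: float) -> torch.Tensor:
    """Move from current toward target, scaling the trade so sum(|trade|) <= max_turnover."""
    trade = target - current
    turnover = trade.abs().sum(dim=-1, keepdim=True)
    scale = torch.clamp(max_turnover / turnover.clamp(min=1e-12), max=1.0)
    return current + scale * trade

model_weights = torch.tensor([[0.7, 0.2, 0.1]])  # stand-in for model output
current = torch.tensor([[0.3, 0.3, 0.4]])        # stand-in for current holdings

capped = apply_position_limits(model_weights, max_weight=0.4)
final = limit_turnover(capped, current, max_turnover=0.2)

print("capped:", capped)
print("final: ", final)
print("turnover:", (final - current).abs().sum().item())
```

Because the turnover step is a convex blend of two portfolios, the result still sums to 1 and respects the position cap whenever the current holdings do.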

Lessons Learned

  1. Feature Quality > Model Complexity: Good features with a simple model beat poor features with a complex model
  2. Transaction Costs Matter: Model turnover explicitly; frequent rebalancing kills returns
  3. Risk Management is Critical: Constraints prevent the model from taking excessive risks
  4. Regime Awareness: Market conditions change; models must adapt
  5. Validation is Hard: Walk-forward testing with realistic costs is essential
  6. Interpretability Helps: Attention weights show what the model learned about cross-asset relationships
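
As a small illustration of the last point, the sketch below runs a standalone `nn.MultiheadAttention` layer on random asset embeddings and reads off the attention matrix. The layer is untrained and the embeddings are random stand-ins for encoder outputs, so the numbers mean nothing by themselves; the point is the shape of the output and how one might summarize it.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
batch, n_assets, d_model, n_heads = 2, 5, 32, 4

attention = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
attention.eval()  # no dropout, so the attention rows are proper distributions

asset_embeddings = torch.randn(batch, n_assets, d_model)  # stand-in for encoder outputs
with torch.no_grad():
    _, attn = attention(asset_embeddings, asset_embeddings, asset_embeddings)

# By default the weights are averaged over heads: (batch, n_assets, n_assets).
# Row i shows how strongly asset i attends to every asset; each row sums to 1.
mean_attn = attn.mean(dim=0)            # average over the batch
top_partner = mean_attn.argmax(dim=-1)  # most-attended asset for each asset

print(attn.shape)
print(top_partner.tolist())
```

Attention weights are a useful diagnostic but not a causal explanation, so it is worth comparing them against known relationships such as sector membership before reading much into them.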

Conclusion

Deep learning for portfolio optimization is powerful but requires careful engineering. The key is combining neural networks' ability to learn complex patterns with sound risk management and proper validation.

Success comes from treating this as an engineering problem, not a pure ML research project.

