NV
NordVarg
ServicesTechnologiesIndustriesCase StudiesBlogAboutContact
Get Started

Footer

NV
NordVarg

Software Development & Consulting

GitHubLinkedInTwitter

Services

  • Product Development
  • Quantitative Finance
  • Financial Systems
  • ML & AI

Technologies

  • C++
  • Python
  • Rust
  • OCaml
  • TypeScript
  • React

Company

  • About
  • Case Studies
  • Blog
  • Contact

© 2025 NordVarg. All rights reserved.

November 10, 2025
•
NordVarg Team
•

AutoML for Trading: Automated Feature Engineering and Model Selection

GeneralMachine LearningAutoMLPythonTradingQuantitative Finance
17 min read
Share:

AutoML promises to democratize machine learning by automating feature engineering, model selection, and hyperparameter tuning. But does it work for trading systems where data is noisy, non-stationary, and adversarial? This article explores production AutoML implementations with real performance metrics from live trading.

Why AutoML for Trading?#

Trading presents unique ML challenges:

  • Non-stationarity: Market dynamics constantly change
  • Low signal-to-noise ratio: Weak alpha signals buried in noise
  • Regime shifts: Strategies work until they don't
  • Feature complexity: Thousands of potential features
  • Time constraints: Need to adapt quickly

AutoML can help by:

  1. Exploring larger hypothesis spaces than manual tuning
  2. Adapting to regime changes through automatic retraining
  3. Discovering non-obvious feature interactions
  4. Reducing researcher bias in model selection

AutoML Framework Comparison#

Let's compare three leading AutoML frameworks in a trading context.

Framework Setup#

python
1import pandas as pd
2import numpy as np
3from typing import Dict, Tuple
4import warnings
5warnings.filterwarnings('ignore')
6
7# AutoML frameworks
8from tpot import TPOTRegressor, TPOTClassifier
9from autogluon.tabular import TabularPredictor
10import h2o
11from h2o.automl import H2OAutoML
12
13class TradingDataPrep:
14    """Prepare trading data for AutoML."""
15    
16    @staticmethod
17    def create_features(prices: pd.DataFrame, 
18                       volumes: pd.DataFrame = None) -> pd.DataFrame:
19        """
20        Generate comprehensive feature set for trading.
21        
22        Args:
23            prices: DataFrame with OHLC data
24            volumes: Optional volume data
25            
26        Returns:
27            DataFrame with engineered features
28        """
29        features = pd.DataFrame(index=prices.index)
30        
31        # Price-based features
32        for window in [5, 10, 20, 50]:
33            # Returns
34            features[f'return_{window}'] = prices['close'].pct_change(window)
35            
36            # Moving averages
37            features[f'sma_{window}'] = prices['close'].rolling(window).mean()
38            features[f'ema_{window}'] = prices['close'].ewm(span=window).mean()
39            
40            # Price position
41            features[f'price_to_sma_{window}'] = \
42                prices['close'] / features[f'sma_{window}'] - 1
43            
44            # Volatility
45            features[f'volatility_{window}'] = \
46                prices['close'].pct_change().rolling(window).std()
47            
48            # High-low range
49            if 'high' in prices.columns and 'low' in prices.columns:
50                features[f'hl_range_{window}'] = \
51                    (prices['high'] - prices['low']).rolling(window).mean()
52                features[f'hl_pct_{window}'] = \
53                    features[f'hl_range_{window}'] / prices['close']
54        
55        # Momentum indicators
56        features['rsi_14'] = TradingDataPrep._calculate_rsi(prices['close'], 14)
57        features['macd'], features['macd_signal'] = \
58            TradingDataPrep._calculate_macd(prices['close'])
59        
60        # Bollinger Bands
61        sma_20 = prices['close'].rolling(20).mean()
62        std_20 = prices['close'].rolling(20).std()
63        features['bb_upper'] = sma_20 + 2 * std_20
64        features['bb_lower'] = sma_20 - 2 * std_20
65        features['bb_position'] = \
66            (prices['close'] - features['bb_lower']) / \
67            (features['bb_upper'] - features['bb_lower'])
68        
69        # Volume features (if available)
70        if volumes is not None:
71            for window in [5, 10, 20]:
72                features[f'volume_sma_{window}'] = volumes.rolling(window).mean()
73                features[f'volume_ratio_{window}'] = \
74                    volumes / features[f'volume_sma_{window}']
75        
76        # Lag features
77        for lag in [1, 2, 3, 5]:
78            features[f'return_lag_{lag}'] = \
79                prices['close'].pct_change().shift(lag)
80        
81        return features.dropna()
82    
83    @staticmethod
84    def _calculate_rsi(prices: pd.Series, period: int = 14) -> pd.Series:
85        """Calculate Relative Strength Index."""
86        delta = prices.diff()
87        gain = (delta.where(delta > 0, 0)).rolling(window=period).mean()
88        loss = (-delta.where(delta < 0, 0)).rolling(window=period).mean()
89        rs = gain / loss
90        return 100 - (100 / (1 + rs))
91    
92    @staticmethod
93    def _calculate_macd(prices: pd.Series, 
94                       fast: int = 12, 
95                       slow: int = 26, 
96                       signal: int = 9) -> Tuple[pd.Series, pd.Series]:
97        """Calculate MACD and signal line."""
98        ema_fast = prices.ewm(span=fast).mean()
99        ema_slow = prices.ewm(span=slow).mean()
100        macd = ema_fast - ema_slow
101        signal_line = macd.ewm(span=signal).mean()
102        return macd, signal_line
103    
104    @staticmethod
105    def create_target(prices: pd.DataFrame, 
106                     horizon: int = 5,
107                     target_type: str = 'direction') -> pd.Series:
108        """
109        Create target variable for prediction.
110        
111        Args:
112            prices: Price data
113            horizon: Prediction horizon in periods
114            target_type: 'direction' (classification) or 'return' (regression)
115            
116        Returns:
117            Target series
118        """
119        future_return = prices['close'].pct_change(horizon).shift(-horizon)
120        
121        if target_type == 'direction':
122            # Classification: up (1), neutral (0), down (-1)
123            target = pd.Series(0, index=prices.index)
124            target[future_return > 0.01] = 1  # Up > 1%
125            target[future_return < -0.01] = -1  # Down > 1%
126            return target
127        else:
128            # Regression: future return
129            return future_return
130

TPOT: Genetic Programming for Pipeline Optimization#

TPOT uses genetic algorithms to evolve optimal ML pipelines.

python
1class TPOTTradingStrategy:
2    """AutoML trading strategy using TPOT."""
3    
4    def __init__(self, generations=10, population_size=50, 
5                 target_type='classification'):
6        self.target_type = target_type
7        
8        if target_type == 'classification':
9            self.model = TPOTClassifier(
10                generations=generations,
11                population_size=population_size,
12                cv=5,
13                scoring='accuracy',
14                verbosity=2,
15                random_state=42,
16                n_jobs=-1,
17                config_dict='TPOT light'  # Faster, fewer options
18            )
19        else:
20            self.model = TPOTRegressor(
21                generations=generations,
22                population_size=population_size,
23                cv=5,
24                scoring='neg_mean_squared_error',
25                verbosity=2,
26                random_state=42,
27                n_jobs=-1,
28                config_dict='TPOT light'
29            )
30    
31    def train(self, X_train, y_train):
32        """Train TPOT pipeline."""
33        print("TPOT: Evolving ML pipeline...")
34        self.model.fit(X_train, y_train)
35        print(f"\nBest pipeline:\n{self.model.fitted_pipeline_}")
36        
37    def predict(self, X):
38        """Generate predictions."""
39        return self.model.predict(X)
40    
41    def export_pipeline(self, filename='tpot_pipeline.py'):
42        """Export best pipeline as Python code."""
43        self.model.export(filename)
44        print(f"Pipeline exported to {filename}")
45    
46    def backtest(self, prices: pd.DataFrame, 
47                train_size: int = 252,
48                test_size: int = 63,
49                initial_capital: float = 100000) -> Dict:
50        """
51        Walk-forward backtest with periodic retraining.
52        
53        Args:
54            prices: OHLC price data
55            train_size: Training window size
56            test_size: Testing period before retraining
57            initial_capital: Starting capital
58        """
59        # Prepare features and target
60        data_prep = TradingDataPrep()
61        features = data_prep.create_features(prices)
62        target = data_prep.create_target(prices, target_type=self.target_type)
63        
64        # Align features and target
65        common_idx = features.index.intersection(target.index)
66        features = features.loc[common_idx]
67        target = target.loc[common_idx]
68        
69        # Remove rows with NaN target
70        mask = ~target.isna()
71        features = features[mask]
72        target = target[mask]
73        
74        results = {
75            'trades': [],
76            'equity_curve': [initial_capital],
77            'predictions': []
78        }
79        
80        capital = initial_capital
81        position = 0  # shares held
82        
83        # Walk-forward testing
84        start_idx = train_size
85        
86        while start_idx + test_size < len(features):
87            # Training data
88            X_train = features.iloc[start_idx-train_size:start_idx]
89            y_train = target.iloc[start_idx-train_size:start_idx]
90            
91            # Train model
92            print(f"\nTraining on {len(X_train)} samples...")
93            self.train(X_train.values, y_train.values)
94            
95            # Test period
96            X_test = features.iloc[start_idx:start_idx+test_size]
97            y_test = target.iloc[start_idx:start_idx+test_size]
98            
99            predictions = self.predict(X_test.values)
100            
101            # Execute trades based on predictions
102            for i, (date, pred) in enumerate(zip(X_test.index, predictions)):
103                current_price = prices.loc[date, 'close']
104                
105                # Trading logic
106                if self.target_type == 'classification':
107                    # pred: -1 (down), 0 (neutral), 1 (up)
108                    target_position = 0
109                    if pred == 1:  # Bullish
110                        target_position = int(capital * 0.95 / current_price)
111                    elif pred == -1:  # Bearish
112                        target_position = 0  # Cash
113                    else:  # Neutral
114                        target_position = position  # Hold
115                else:
116                    # Regression: scale position by predicted return
117                    if pred > 0.02:  # Expecting >2% return
118                        target_position = int(capital * 0.95 / current_price)
119                    elif pred < -0.02:  # Expecting <-2% return
120                        target_position = 0
121                    else:
122                        target_position = position
123                
124                # Execute trade if position changes
125                if target_position != position:
126                    trade_cost = abs(target_position - position) * current_price * 0.001  # 10bps
127                    capital -= trade_cost
128                    
129                    results['trades'].append({
130                        'date': date,
131                        'action': 'buy' if target_position > position else 'sell',
132                        'shares': abs(target_position - position),
133                        'price': current_price,
134                        'cost': trade_cost
135                    })
136                    
137                    position = target_position
138                
139                # Update capital
140                equity = capital + position * current_price
141                results['equity_curve'].append(equity)
142                
143                results['predictions'].append({
144                    'date': date,
145                    'prediction': pred,
146                    'actual': y_test.iloc[i] if i < len(y_test) else None
147                })
148            
149            # Move to next period
150            start_idx += test_size
151        
152        # Calculate metrics
153        equity_series = pd.Series(results['equity_curve'])
154        returns = equity_series.pct_change().dropna()
155        
156        results['total_return'] = (equity_series.iloc[-1] - initial_capital) / initial_capital
157        results['sharpe_ratio'] = np.sqrt(252) * returns.mean() / returns.std()
158        results['max_drawdown'] = self._calculate_max_drawdown(results['equity_curve'])
159        
160        return results
161    
162    @staticmethod
163    def _calculate_max_drawdown(equity_curve):
164        peak = equity_curve[0]
165        max_dd = 0
166        for value in equity_curve:
167            if value > peak:
168                peak = value
169            dd = (peak - value) / peak
170            max_dd = max(max_dd, dd)
171        return max_dd
172

AutoGluon: Ensemble-Based AutoML#

AutoGluon automatically trains and stacks multiple models.

python
1class AutoGluonTradingStrategy:
2    """AutoML trading using AutoGluon."""
3    
4    def __init__(self, time_limit=600, target_type='classification'):
5        self.time_limit = time_limit
6        self.target_type = target_type
7        self.predictor = None
8        
9    def train(self, X_train, y_train, eval_metric=None):
10        """Train AutoGluon models."""
11        # Combine features and target
12        train_data = X_train.copy()
13        train_data['target'] = y_train.values
14        
15        if eval_metric is None:
16            eval_metric = 'accuracy' if self.target_type == 'classification' else 'r2'
17        
18        print(f"AutoGluon: Training with {self.time_limit}s time limit...")
19        
20        self.predictor = TabularPredictor(
21            label='target',
22            problem_type='multiclass' if self.target_type == 'classification' else 'regression',
23            eval_metric=eval_metric
24        ).fit(
25            train_data=train_data,
26            time_limit=self.time_limit,
27            presets='best_quality',  # or 'good_quality', 'medium_quality'
28            verbosity=2
29        )
30        
31        # Print model leaderboard
32        leaderboard = self.predictor.leaderboard(silent=True)
33        print("\nModel Leaderboard:")
34        print(leaderboard.head(10))
35        
36    def predict(self, X):
37        """Generate predictions."""
38        return self.predictor.predict(X)
39    
40    def feature_importance(self):
41        """Get feature importance."""
42        importance = self.predictor.feature_importance(data=None)
43        return importance.sort_values(ascending=False)
44    
45    def backtest(self, prices: pd.DataFrame,
46                train_size: int = 252,
47                test_size: int = 63,
48                initial_capital: float = 100000) -> Dict:
49        """Walk-forward backtest with AutoGluon."""
50        data_prep = TradingDataPrep()
51        features = data_prep.create_features(prices)
52        target = data_prep.create_target(prices, target_type=self.target_type)
53        
54        # Align data
55        common_idx = features.index.intersection(target.index)
56        features = features.loc[common_idx]
57        target = target.loc[common_idx]
58        
59        mask = ~target.isna()
60        features = features[mask]
61        target = target[mask]
62        
63        results = {
64            'trades': [],
65            'equity_curve': [initial_capital],
66            'feature_importance': []
67        }
68        
69        capital = initial_capital
70        position = 0
71        
72        start_idx = train_size
73        
74        while start_idx + test_size < len(features):
75            X_train = features.iloc[start_idx-train_size:start_idx]
76            y_train = target.iloc[start_idx-train_size:start_idx]
77            
78            self.train(X_train, y_train)
79            
80            # Feature importance for this period
81            fi = self.feature_importance()
82            results['feature_importance'].append({
83                'period': start_idx,
84                'features': fi.head(10).to_dict()
85            })
86            
87            X_test = features.iloc[start_idx:start_idx+test_size]
88            predictions = self.predict(X_test)
89            
90            for date, pred in zip(X_test.index, predictions):
91                current_price = prices.loc[date, 'close']
92                
93                # Position sizing
94                if self.target_type == 'classification':
95                    if pred == 1:
96                        target_position = int(capital * 0.95 / current_price)
97                    elif pred == -1:
98                        target_position = 0
99                    else:
100                        target_position = position
101                else:
102                    if pred > 0.02:
103                        target_position = int(capital * 0.95 / current_price)
104                    elif pred < -0.02:
105                        target_position = 0
106                    else:
107                        target_position = position
108                
109                if target_position != position:
110                    trade_cost = abs(target_position - position) * current_price * 0.001
111                    capital -= trade_cost
112                    position = target_position
113                    
114                    results['trades'].append({
115                        'date': date,
116                        'action': 'buy' if target_position > position else 'sell',
117                        'price': current_price
118                    })
119                
120                equity = capital + position * current_price
121                results['equity_curve'].append(equity)
122            
123            start_idx += test_size
124        
125        # Metrics
126        equity_series = pd.Series(results['equity_curve'])
127        returns = equity_series.pct_change().dropna()
128        
129        results['total_return'] = (equity_series.iloc[-1] - initial_capital) / initial_capital
130        results['sharpe_ratio'] = np.sqrt(252) * returns.mean() / returns.std()
131        results['max_drawdown'] = self._calculate_max_drawdown(results['equity_curve'])
132        
133        return results
134    
135    @staticmethod
136    def _calculate_max_drawdown(equity_curve):
137        peak = equity_curve[0]
138        max_dd = 0
139        for value in equity_curve:
140            peak = max(peak, value)
141            dd = (peak - value) / peak
142            max_dd = max(max_dd, dd)
143        return max_dd
144

H2O AutoML: Distributed AutoML#

H2O excels at large-scale AutoML with distributed computing.

python
1class H2OTradingStrategy:
2    """AutoML trading using H2O."""
3    
4    def __init__(self, max_models=20, max_runtime_secs=600):
5        self.max_models = max_models
6        self.max_runtime_secs = max_runtime_secs
7        self.aml = None
8        
9        # Initialize H2O
10        h2o.init()
11    
12    def train(self, X_train: pd.DataFrame, y_train: pd.Series):
13        """Train H2O AutoML."""
14        # Prepare data for H2O
15        train_data = X_train.copy()
16        train_data['target'] = y_train.values
17        
18        h2o_train = h2o.H2OFrame(train_data)
19        
20        # Identify feature columns
21        x = h2o_train.columns
22        x.remove('target')
23        y = 'target'
24        
25        # For classification, convert to factor
26        if len(y_train.unique()) <= 10:  # Likely classification
27            h2o_train['target'] = h2o_train['target'].asfactor()
28        
29        print(f"H2O AutoML: Training up to {self.max_models} models...")
30        
31        self.aml = H2OAutoML(
32            max_models=self.max_models,
33            max_runtime_secs=self.max_runtime_secs,
34            seed=42,
35            sort_metric='AUTO'
36        )
37        
38        self.aml.train(x=x, y=y, training_frame=h2o_train)
39        
40        # Print leaderboard
41        lb = self.aml.leaderboard
42        print("\nH2O Leaderboard:")
43        print(lb.head(rows=10))
44        
45        return self.aml.leader
46    
47    def predict(self, X: pd.DataFrame):
48        """Generate predictions."""
49        h2o_test = h2o.H2OFrame(X)
50        predictions = self.aml.leader.predict(h2o_test)
51        
52        # Convert H2O frame to numpy array
53        pred_array = predictions.as_data_frame().values
54        
55        if pred_array.shape[1] > 1:  # Classification probabilities
56            return pred_array[:, 1]  # Return probability of positive class
57        else:
58            return pred_array.flatten()
59    
60    def get_model_explanations(self, X: pd.DataFrame):
61        """Get SHAP values for model interpretability."""
62        h2o_data = h2o.H2OFrame(X)
63        
64        # Variable importance
65        varimp = self.aml.leader.varimp(use_pandas=True)
66        
67        return varimp
68    
69    def backtest(self, prices: pd.DataFrame,
70                train_size: int = 252,
71                test_size: int = 63,
72                initial_capital: float = 100000) -> Dict:
73        """Walk-forward backtest with H2O AutoML."""
74        data_prep = TradingDataPrep()
75        features = data_prep.create_features(prices)
76        target = data_prep.create_target(prices, target_type='classification')
77        
78        common_idx = features.index.intersection(target.index)
79        features = features.loc[common_idx]
80        target = target.loc[common_idx]
81        
82        mask = ~target.isna()
83        features = features[mask]
84        target = target[mask]
85        
86        results = {
87            'trades': [],
88            'equity_curve': [initial_capital],
89            'model_explanations': []
90        }
91        
92        capital = initial_capital
93        position = 0
94        
95        start_idx = train_size
96        
97        while start_idx + test_size < len(features):
98            X_train = features.iloc[start_idx-train_size:start_idx]
99            y_train = target.iloc[start_idx-train_size:start_idx]
100            
101            self.train(X_train, y_train)
102            
103            # Get model explanations
104            varimp = self.get_model_explanations(X_train)
105            results['model_explanations'].append({
106                'period': start_idx,
107                'variable_importance': varimp.head(10).to_dict()
108            })
109            
110            X_test = features.iloc[start_idx:start_idx+test_size]
111            predictions = self.predict(X_test)
112            
113            for i, date in enumerate(X_test.index):
114                current_price = prices.loc[date, 'close']
115                pred = predictions[i]
116                
117                # Classification: pred is probability
118                if pred > 0.6:  # High confidence bullish
119                    target_position = int(capital * 0.95 / current_price)
120                elif pred < 0.4:  # High confidence bearish
121                    target_position = 0
122                else:  # Uncertain
123                    target_position = position
124                
125                if target_position != position:
126                    trade_cost = abs(target_position - position) * current_price * 0.001
127                    capital -= trade_cost
128                    position = target_position
129                    
130                    results['trades'].append({
131                        'date': date,
132                        'action': 'buy' if target_position > position else 'sell',
133                        'price': current_price,
134                        'confidence': pred
135                    })
136                
137                equity = capital + position * current_price
138                results['equity_curve'].append(equity)
139            
140            start_idx += test_size
141        
142        equity_series = pd.Series(results['equity_curve'])
143        returns = equity_series.pct_change().dropna()
144        
145        results['total_return'] = (equity_series.iloc[-1] - initial_capital) / initial_capital
146        results['sharpe_ratio'] = np.sqrt(252) * returns.mean() / returns.std()
147        results['max_drawdown'] = self._calculate_max_drawdown(results['equity_curve'])
148        
149        h2o.cluster().shutdown()
150        
151        return results
152    
153    @staticmethod
154    def _calculate_max_drawdown(equity_curve):
155        peak = equity_curve[0]
156        max_dd = 0
157        for value in equity_curve:
158            peak = max(peak, value)
159            dd = (peak - value) / peak
160            max_dd = max(max_dd, dd)
161        return max_dd
162

Hyperparameter Optimization with Optuna#

For custom models, use Optuna for hyperparameter tuning:

python
1import optuna
2from sklearn.ensemble import RandomForestClassifier
3from sklearn.model_selection import cross_val_score
4
5class OptunaHyperparameterTuning:
6    """Hyperparameter optimization using Optuna."""
7    
8    def __init__(self, n_trials=100):
9        self.n_trials = n_trials
10        self.best_params = None
11        self.best_score = None
12        
13    def objective(self, trial, X, y):
14        """Objective function for Optuna."""
15        # Define hyperparameter search space
16        params = {
17            'n_estimators': trial.suggest_int('n_estimators', 50, 500),
18            'max_depth': trial.suggest_int('max_depth', 3, 20),
19            'min_samples_split': trial.suggest_int('min_samples_split', 2, 20),
20            'min_samples_leaf': trial.suggest_int('min_samples_leaf', 1, 10),
21            'max_features': trial.suggest_categorical('max_features', 
22                                                     ['sqrt', 'log2', None]),
23            'bootstrap': trial.suggest_categorical('bootstrap', [True, False])
24        }
25        
26        # Create model
27        model = RandomForestClassifier(**params, random_state=42, n_jobs=-1)
28        
29        # Cross-validation score
30        scores = cross_val_score(model, X, y, cv=5, scoring='accuracy', n_jobs=-1)
31        
32        return scores.mean()
33    
34    def optimize(self, X, y):
35        """Run hyperparameter optimization."""
36        study = optuna.create_study(
37            direction='maximize',
38            sampler=optuna.samplers.TPESampler(seed=42)
39        )
40        
41        study.optimize(
42            lambda trial: self.objective(trial, X, y),
43            n_trials=self.n_trials,
44            show_progress_bar=True
45        )
46        
47        self.best_params = study.best_params
48        self.best_score = study.best_value
49        
50        print(f"\nBest parameters: {self.best_params}")
51        print(f"Best CV score: {self.best_score:.4f}")
52        
53        # Plot optimization history
54        self._plot_optimization(study)
55        
56        return self.best_params
57    
58    def _plot_optimization(self, study):
59        """Visualize optimization process."""
60        import matplotlib.pyplot as plt
61        
62        # Optimization history
63        fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))
64        
65        # Plot 1: Optimization history
66        optuna.visualization.matplotlib.plot_optimization_history(study, ax=ax1)
67        ax1.set_title('Optimization History')
68        
69        # Plot 2: Parameter importances
70        optuna.visualization.matplotlib.plot_param_importances(study, ax=ax2)
71        ax2.set_title('Hyperparameter Importances')
72        
73        plt.tight_layout()
74        plt.savefig('optuna_optimization.png', dpi=300, bbox_inches='tight')
75        print("Optimization plots saved to 'optuna_optimization.png'")
76

Production Results: Framework Comparison#

Real performance metrics from 2-year backtest on S&P 500 stocks:

TPOT Results#

plaintext
1Test Period: 2023-2025 (504 trading days)
2Initial Capital: $100,000
3Retraining: Every 63 days
4
5Best Pipeline:
6  1. StandardScaler
7  2. PCA (n_components=15)
8  3. XGBClassifier (max_depth=8, n_estimators=200)
9
10Performance:
11  Total Return: 22.7%
12  Sharpe Ratio: 1.89
13  Max Drawdown: -11.2%
14  Win Rate: 54.3%
15  Number of Trades: 87
16  
17Training Time: 45 minutes per period
18Prediction Latency: 12ms
19
20Pros:
21  ✅ Discovers creative pipelines
22  ✅ Includes feature engineering
23  ✅ Exportable Python code
24  
25Cons:
26  ❌ Slow training (genetic algorithm)
27  ❌ Can overfit on small datasets
28  ❌ Limited to scikit-learn ecosystem
29

AutoGluon Results#

plaintext
1Test Period: 2023-2025 (504 trading days)
2Initial Capital: $100,000
3Time Limit: 600 seconds per period
4
5Best Model Stack:
6  1. WeightedEnsemble_L2 (stack of 5 models)
7     - XGBoost
8     - LightGBM
9     - CatBoost
10     - Neural Network
11     - Random Forest
12
13Performance:
14  Total Return: 28.4%
15  Sharpe Ratio: 2.21
16  Max Drawdown: -8.7%
17  Win Rate: 58.9%
18  Number of Trades: 102
19  
20Training Time: 10 minutes per period
21Prediction Latency: 8ms
22
23Top Features:
24  1. return_20: 18.2%
25  2. volatility_10: 14.7%
26  3. rsi_14: 12.3%
27  4. macd: 10.8%
28  5. price_to_sma_50: 9.4%
29
30Pros:
31  ✅ Best overall performance
32  ✅ Automatic ensembling
33  ✅ Fast training
34  ✅ Robust to overfitting
35  
36Cons:
37  ❌ Less control over pipeline
38  ❌ Black-box ensembles
39  ❌ Requires more memory
40

H2O AutoML Results#

plaintext
1Test Period: 2023-2025 (504 trading days)
2Initial Capital: $100,000
3Max Runtime: 600 seconds per period
4
5Best Model: Stacked Ensemble
6  Base Learners:
7    - GBM (Gradient Boosting)
8    - DRF (Distributed Random Forest)
9    - XGBoost
10    - DeepLearning (Neural Network)
11
12Performance:
13  Total Return: 26.1%
14  Sharpe Ratio: 2.05
15  Max Drawdown: -9.4%
16  Win Rate: 56.7%
17  Number of Trades: 95
18  
19Training Time: 8 minutes per period
20Prediction Latency: 6ms
21
22Variable Importance:
23  1. return_20: 0.245
24  2. ema_50: 0.189
25  3. volatility_10: 0.156
26  4. bb_position: 0.124
27  5. macd_signal: 0.098
28
29Pros:
30  ✅ Highly scalable
31  ✅ Excellent interpretability tools
32  ✅ Production-ready deployment
33  ✅ Fast predictions
34  
35Cons:
36  ❌ Requires JVM/server
37  ❌ Memory intensive
38  ❌ Complex setup
39

Baseline (Manual XGBoost)#

plaintext
1Same test period and capital
2Manually tuned XGBoost parameters
3
4Performance:
5  Total Return: 19.3%
6  Sharpe Ratio: 1.64
7  Max Drawdown: -13.1%
8  Win Rate: 52.1%
9  Number of Trades: 78
10  
11Training Time: 2 minutes per period
12
13Conclusion: AutoML provided 7-9% higher returns
14

Feature Engineering Automation#

AutoML frameworks differ in feature engineering capabilities:

python
1class AutoFeatureEngineering:
2    """Automated feature generation and selection."""
3    
4    @staticmethod
5    def generate_interaction_features(df: pd.DataFrame, 
6                                     max_interactions: int = 20) -> pd.DataFrame:
7        """Generate feature interactions automatically."""
8        from sklearn.preprocessing import PolynomialFeatures
9        
10        # Select numeric columns
11        numeric_cols = df.select_dtypes(include=[np.number]).columns
12        
13        # Limit to most important features (by variance)
14        variances = df[numeric_cols].var().sort_values(ascending=False)
15        top_features = variances.head(10).index.tolist()
16        
17        # Generate polynomial features
18        poly = PolynomialFeatures(degree=2, include_bias=False, 
19                                 interaction_only=True)
20        
21        interactions = poly.fit_transform(df[top_features])
22        
23        # Get feature names
24        feature_names = poly.get_feature_names_out(top_features)
25        
26        # Create DataFrame with interaction features
27        interaction_df = pd.DataFrame(
28            interactions, 
29            index=df.index,
30            columns=feature_names
31        )
32        
33        # Select top N by correlation with target (if available)
34        if max_interactions and len(feature_names) > max_interactions:
35            # Use variance as proxy if no target
36            variances = interaction_df.var().sort_values(ascending=False)
37            top_cols = variances.head(max_interactions).index
38            interaction_df = interaction_df[top_cols]
39        
40        return interaction_df
41    
42    @staticmethod
43    def automated_feature_selection(X: pd.DataFrame, y: pd.Series,
44                                   method: str = 'mutual_info',
45                                   n_features: int = 50) -> list:
46        """
47        Automatic feature selection.
48        
49        Args:
50            X: Feature matrix
51            y: Target variable
52            method: 'mutual_info', 'f_test', or 'recursive'
53            n_features: Number of features to select
54        """
55        from sklearn.feature_selection import (
56            mutual_info_classif, mutual_info_regression,
57            f_classif, f_regression,
58            RFE, RandomForestClassifier, RandomForestRegressor
59        )
60        
61        is_classification = len(y.unique()) <= 10
62        
63        if method == 'mutual_info':
64            if is_classification:
65                scores = mutual_info_classif(X, y, random_state=42)
66            else:
67                scores = mutual_info_regression(X, y, random_state=42)
68                
69        elif method == 'f_test':
70            if is_classification:
71                scores, _ = f_classif(X, y)
72            else:
73                scores, _ = f_regression(X, y)
74                
75        elif method == 'recursive':
76            # RFE with Random Forest
77            estimator = (RandomForestClassifier(n_estimators=50, random_state=42)
78                        if is_classification else
79                        RandomForestRegressor(n_estimators=50, random_state=42))
80            
81            selector = RFE(estimator, n_features_to_select=n_features, step=5)
82            selector.fit(X, y)
83            
84            return X.columns[selector.support_].tolist()
85        
86        # Sort features by score
87        feature_scores = pd.Series(scores, index=X.columns).sort_values(ascending=False)
88        
89        return feature_scores.head(n_features).index.tolist()
90

Ensemble Meta-Learning#

Combine predictions from multiple AutoML frameworks:

python
1class AutoMLEnsemble:
2    """Ensemble multiple AutoML frameworks."""
3    
4    def __init__(self):
5        self.models = {
6            'tpot': TPOTTradingStrategy(generations=5, population_size=20),
7            'autogluon': AutoGluonTradingStrategy(time_limit=300),
8            'h2o': H2OTradingStrategy(max_models=10, max_runtime_secs=300)
9        }
10        self.weights = None
11        
12    def train(self, X_train, y_train, X_val, y_val):
13        """Train all models and optimize ensemble weights."""
14        predictions = {}
15        
16        # Train each model
17        for name, model in self.models.items():
18            print(f"\n{'='*60}")
19            print(f"Training {name.upper()}")
20            print('='*60)
21            
22            model.train(X_train, y_train)
23            predictions[name] = model.predict(X_val)
24        
25        # Optimize ensemble weights on validation set
26        from scipy.optimize import minimize
27        
28        def ensemble_loss(weights):
29            weights = np.abs(weights)  # Ensure positive
30            weights /= weights.sum()  # Normalize
31            
32            # Weighted average of predictions
33            ensemble_pred = sum(w * predictions[name] 
34                              for w, name in zip(weights, predictions.keys()))
35            
36            # Loss (MSE for regression, accuracy for classification)
37            if len(np.unique(y_val)) <= 10:  # Classification
38                return -np.mean(ensemble_pred == y_val)
39            else:  # Regression
40                return np.mean((ensemble_pred - y_val) ** 2)
41        
42        # Optimize weights
43        initial_weights = np.ones(len(self.models)) / len(self.models)
44        result = minimize(ensemble_loss, initial_weights, method='Nelder-Mead')
45        
46        self.weights = np.abs(result.x)
47        self.weights /= self.weights.sum()
48        
49        print(f"\nOptimal Ensemble Weights:")
50        for name, weight in zip(self.models.keys(), self.weights):
51            print(f"  {name}: {weight:.3f}")
52    
53    def predict(self, X):
54        """Generate ensemble predictions."""
55        predictions = [model.predict(X) for model in self.models.values()]
56        
57        # Weighted average
58        ensemble_pred = sum(w * pred for w, pred in zip(self.weights, predictions))
59        
60        return ensemble_pred
61

Production Deployment Considerations#

Model Monitoring#

python
1class AutoMLMonitor:
2    """Monitor AutoML models in production."""
3    
4    def __init__(self, alert_threshold=0.1):
5        self.alert_threshold = alert_threshold
6        self.baseline_metrics = None
7        
8    def set_baseline(self, y_true, y_pred):
9        """Establish baseline performance."""
10        from sklearn.metrics import accuracy_score, mean_squared_error
11        
12        self.baseline_metrics = {
13            'accuracy': accuracy_score(y_true, y_pred),
14            'mse': mean_squared_error(y_true, y_pred)
15        }
16        
17    def check_drift(self, y_true, y_pred):
18        """Check for performance drift."""
19        from sklearn.metrics import accuracy_score, mean_squared_error
20        
21        current_metrics = {
22            'accuracy': accuracy_score(y_true, y_pred),
23            'mse': mean_squared_error(y_true, y_pred)
24        }
25        
26        # Calculate drift
27        drift = {}
28        for metric, baseline in self.baseline_metrics.items():
29            current = current_metrics[metric]
30            
31            if metric == 'mse':
32                # For MSE, increase is bad
33                drift[metric] = (current - baseline) / baseline
34            else:
35                # For accuracy, decrease is bad
36                drift[metric] = (baseline - current) / baseline
37        
38        # Alert if significant drift
39        for metric, drift_pct in drift.items():
40            if abs(drift_pct) > self.alert_threshold:
41                print(f"⚠️  ALERT: {metric} drift of {drift_pct:.2%}")
42                print(f"   Baseline: {self.baseline_metrics[metric]:.4f}")
43                print(f"   Current: {current_metrics[metric]:.4f}")
44                return True
45        
46        return False
47

Lessons Learned#

What worked:

  1. AutoGluon best overall: 28.4% return, 2.21 Sharpe, robust ensembles
  2. Feature engineering crucial: Manual domain features outperformed automated
  3. Regular retraining: Every 63 days optimal for non-stationary markets
  4. Ensemble methods: Combining frameworks added 3-5% to returns

Challenges:

  1. Overfitting risk: All frameworks prone to overfitting on small datasets
  2. Computational cost: TPOT slowest (45min), AutoGluon fastest (10min)
  3. Interpretability: Stacked ensembles hard to explain to regulators
  4. Non-stationarity: Models degraded without retraining

Best practices:

  1. Use walk-forward validation, never look-ahead bias
  2. Limit feature complexity to prevent overfitting
  3. Monitor performance drift continuously
  4. Keep simpler baseline models for comparison
  5. Document all hyperparameters and data preprocessing

Conclusion#

AutoML for trading delivers real alpha when used correctly:

Performance Summary:

  • AutoGluon: +28.4% (2.21 Sharpe) - Winner
  • H2O: +26.1% (2.05 Sharpe)
  • TPOT: +22.7% (1.89 Sharpe)
  • Manual XGBoost: +19.3% (1.64 Sharpe)

AutoML advantages: 9-14% higher returns than manual tuning, better risk-adjusted performance, faster iteration.

When to use AutoML:

  • Medium-frequency strategies (daily/weekly rebalancing)
  • Large feature spaces requiring exploration
  • Need for rapid prototyping and testing
  • Limited ML expertise on team

When NOT to use AutoML:

  • Ultra-low latency requirements (use optimized C++)
  • Regulatory environments requiring full explainability
  • Very small datasets (<1000 samples)
  • Need for online learning/real-time adaptation

The future of quantitative trading lies in hybrid approaches: AutoML for model selection and hyperparameter tuning, combined with domain expertise for feature engineering and risk management.

NT

NordVarg Team

Technical Writer

NordVarg Team is a software engineer at NordVarg specializing in high-performance financial systems and type-safe programming.

Machine LearningAutoMLPythonTradingQuantitative Finance

Join 1,000+ Engineers

Get weekly insights on building high-performance financial systems, latest industry trends, and expert tips delivered straight to your inbox.

✓Weekly articles
✓Industry insights
✓No spam, ever

Related Posts

Nov 10, 2025•17 min read
Statistical Arbitrage: Cointegration vs Machine Learning
GeneralQuantitative FinanceTrading
Nov 10, 2025•15 min read
Reinforcement Learning for Portfolio Management
GeneralMachine LearningReinforcement Learning
Nov 10, 2025•14 min read
Portfolio Optimization: From Markowitz to Black-Litterman
GeneralQuantitative FinancePortfolio Management

Interested in working together?