Graph Neural Networks for Order Book Prediction

Order books have structure—price levels interact, liquidity clusters, and orders influence each other. After building GNN-based prediction models achieving 62% direction accuracy at 100ms horizon, I've learned that modeling order books as graphs captures dynamics that feed-forward networks miss. This article covers production GNN for order book prediction.

Why GNNs for Order Books #

Order books are naturally graphs:

Nodes: Price levels, orders
Edges: Price proximity, order relationships
Features: Volume, spread, time priority
Dynamics: Add/cancel propagate through levels

Traditional ML treats each level independently. GNNs capture the interconnected structure.

Order Book as Graph #

Graph Construction #

python

1import torch
2import torch.nn as nn
3from torch_geometric.nn import GCNConv, GATConv
4from torch_geometric.data import Data
5import numpy as np
6
7class OrderBookGraph:
8    """
9    Convert order book to graph representation.
10    """
11    
12    def __init__(self, n_levels: int = 10):
13        self.n_levels = n_levels
14        
15    def create_graph(self, 
16                    bid_prices: np.ndarray,
17                    bid_sizes: np.ndarray,
18                    ask_prices: np.ndarray,
19                    ask_sizes: np.ndarray) -> Data:
20        """
21        Create graph from order book snapshot.
22        
23        Nodes: Each price level (bid and ask)
24        Edges: Connect adjacent levels + bid-ask pairs
25        Features: Price, size, distance from mid
26        """
27        n_bids = min(len(bid_prices), self.n_levels)
28        n_asks = min(len(ask_prices), self.n_levels)
29        n_nodes = n_bids + n_asks
30        
31        # Node features: [normalized_price, size, distance_from_mid]
32        mid_price = (bid_prices[0] + ask_prices[0]) / 2
33        
34        node_features = []
35        
36        # Bid nodes
37        for i in range(n_bids):
38            features = [
39                bid_prices[i] / mid_price,  # Normalized price
40                bid_sizes[i],
41                (mid_price - bid_prices[i]) / mid_price  # Distance
42            ]
43            node_features.append(features)
44        
45        # Ask nodes
46        for i in range(n_asks):
47            features = [
48                ask_prices[i] / mid_price,
49                ask_sizes[i],
50                (ask_prices[i] - mid_price) / mid_price
51            ]
52            node_features.append(features)
53        
54        # Edge list: Connect adjacent levels + spread edges
55        edge_index = []
56        
57        # Connect adjacent bid levels
58        for i in range(n_bids - 1):
59            edge_index.append([i, i + 1])
60            edge_index.append([i + 1, i])
61        
62        # Connect adjacent ask levels
63        for i in range(n_asks - 1):
64            idx1 = n_bids + i
65            idx2 = n_bids + i + 1
66            edge_index.append([idx1, idx2])
67            edge_index.append([idx2, idx1])
68        
69        # Connect best bid to best ask (spread edge)
70        edge_index.append([0, n_bids])
71        edge_index.append([n_bids, 0])
72        
73        # Convert to tensors
74        x = torch.tensor(node_features, dtype=torch.float)
75        edge_index = torch.tensor(edge_index, dtype=torch.long).t().contiguous()
76        
77        return Data(x=x, edge_index=edge_index)
78

Graph Convolutional Network #

python

1class OrderBookGCN(nn.Module):
2    """
3    GCN for order book prediction.
4    """
5    
6    def __init__(self, 
7                 input_dim: int = 3,
8                 hidden_dim: int = 64,
9                 num_layers: int = 3,
10                 dropout: float = 0.2):
11        super().__init__()
12        
13        self.convs = nn.ModuleList()
14        self.batch_norms = nn.ModuleList()
15        
16        # First layer
17        self.convs.append(GCNConv(input_dim, hidden_dim))
18        self.batch_norms.append(nn.BatchNorm1d(hidden_dim))
19        
20        # Hidden layers
21        for _ in range(num_layers - 2):
22            self.convs.append(GCNConv(hidden_dim, hidden_dim))
23            self.batch_norms.append(nn.BatchNorm1d(hidden_dim))
24        
25        # Last layer
26        self.convs.append(GCNConv(hidden_dim, hidden_dim))
27        
28        # Prediction heads
29        self.direction_head = nn.Sequential(
30            nn.Linear(hidden_dim, 32),
31            nn.ReLU(),
32            nn.Dropout(dropout),
33            nn.Linear(32, 3)  # Up, Down, Neutral
34        )
35        
36        self.magnitude_head = nn.Sequential(
37            nn.Linear(hidden_dim, 32),
38            nn.ReLU(),
39            nn.Dropout(dropout),
40            nn.Linear(32, 1)
41        )
42        
43        self.dropout = nn.Dropout(dropout)
44        
45    def forward(self, data: Data) -> tuple:
46        """
47        Forward pass.
48        
49        Returns:
50            (direction_logits, magnitude_prediction)
51        """
52        x, edge_index = data.x, data.edge_index
53        
54        # Graph convolutions
55        for i, conv in enumerate(self.convs[:-1]):
56            x = conv(x, edge_index)
57            x = self.batch_norms[i](x)
58            x = torch.relu(x)
59            x = self.dropout(x)
60        
61        # Final conv (no activation)
62        x = self.convs[-1](x, edge_index)
63        
64        # Global pooling (mean of all nodes)
65        graph_embedding = torch.mean(x, dim=0)
66        
67        # Predictions
68        direction = self.direction_head(graph_embedding)
69        magnitude = self.magnitude_head(graph_embedding)
70        
71        return direction, magnitude
72

Graph Attention Network #

python

1class OrderBookGAT(nn.Module):
2    """
3    Graph Attention Network - learns which price levels to focus on.
4    """
5    
6    def __init__(self,
7                 input_dim: int = 3,
8                 hidden_dim: int = 64,
9                 num_heads: int = 4,
10                 num_layers: int = 3,
11                 dropout: float = 0.2):
12        super().__init__()
13        
14        self.convs = nn.ModuleList()
15        
16        # First GAT layer
17        self.convs.append(
18            GATConv(input_dim, hidden_dim, heads=num_heads, dropout=dropout)
19        )
20        
21        # Hidden layers
22        for _ in range(num_layers - 2):
23            self.convs.append(
24                GATConv(hidden_dim * num_heads, hidden_dim, 
25                       heads=num_heads, dropout=dropout)
26            )
27        
28        # Last layer (single head)
29        self.convs.append(
30            GATConv(hidden_dim * num_heads, hidden_dim, 
31                   heads=1, concat=False, dropout=dropout)
32        )
33        
34        # Prediction head
35        self.classifier = nn.Sequential(
36            nn.Linear(hidden_dim, 32),
37            nn.ReLU(),
38            nn.Dropout(dropout),
39            nn.Linear(32, 3)
40        )
41        
42    def forward(self, data: Data) -> torch.Tensor:
43        x, edge_index = data.x, data.edge_index
44        
45        # GAT layers
46        for i, conv in enumerate(self.convs):
47            x = conv(x, edge_index)
48            if i < len(self.convs) - 1:
49                x = torch.elu(x)
50        
51        # Pool to graph-level
52        graph_emb = torch.mean(x, dim=0)
53        
54        # Classify
55        return self.classifier(graph_emb)
56

Training Pipeline #

python

1import torch.optim as optim
2from torch.utils.data import Dataset, DataLoader
3
4class OrderBookDataset(Dataset):
5    """
6    Dataset of order book snapshots with labels.
7    """
8    
9    def __init__(self, snapshots: list, labels: np.ndarray, graph_builder):
10        self.snapshots = snapshots
11        self.labels = torch.tensor(labels, dtype=torch.long)
12        self.graph_builder = graph_builder
13        
14    def __len__(self):
15        return len(self.snapshots)
16    
17    def __getitem__(self, idx):
18        snapshot = self.snapshots[idx]
19        
20        # Create graph
21        graph = self.graph_builder.create_graph(
22            bid_prices=snapshot['bid_prices'],
23            bid_sizes=snapshot['bid_sizes'],
24            ask_prices=snapshot['ask_prices'],
25            ask_sizes=snapshot['ask_sizes']
26        )
27        
28        return graph, self.labels[idx]
29
30def train_gnn(model, train_loader, val_loader, epochs=100):
31    """Train GNN model."""
32    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
33    model.to(device)
34    
35    optimizer = optim.Adam(model.parameters(), lr=0.001, weight_decay=1e-5)
36    criterion = nn.CrossEntropyLoss()
37    scheduler = optim.lr_scheduler.ReduceLROnPlateau(
38        optimizer, mode='min', patience=5, factor=0.5
39    )
40    
41    best_val_acc = 0.0
42    
43    for epoch in range(epochs):
44        # Training
45        model.train()
46        train_loss = 0.0
47        train_correct = 0
48        train_total = 0
49        
50        for graphs, labels in train_loader:
51            graphs = graphs.to(device)
52            labels = labels.to(device)
53            
54            optimizer.zero_grad()
55            
56            outputs = model(graphs)
57            loss = criterion(outputs, labels)
58            
59            loss.backward()
60            optimizer.step()
61            
62            train_loss += loss.item()
63            _, predicted = torch.max(outputs, 1)
64            train_correct += (predicted == labels).sum().item()
65            train_total += labels.size(0)
66        
67        train_acc = train_correct / train_total
68        
69        # Validation
70        model.eval()
71        val_loss = 0.0
72        val_correct = 0
73        val_total = 0
74        
75        with torch.no_grad():
76            for graphs, labels in val_loader:
77                graphs = graphs.to(device)
78                labels = labels.to(device)
79                
80                outputs = model(graphs)
81                loss = criterion(outputs, labels)
82                
83                val_loss += loss.item()
84                _, predicted = torch.max(outputs, 1)
85                val_correct += (predicted == labels).sum().item()
86                val_total += labels.size(0)
87        
88        val_acc = val_correct / val_total
89        
90        scheduler.step(val_loss)
91        
92        if val_acc > best_val_acc:
93            best_val_acc = val_acc
94            torch.save(model.state_dict(), 'best_model.pt')
95        
96        print(f"Epoch {epoch+1}/{epochs}: "
97              f"Train Loss={train_loss:.4f}, Train Acc={train_acc:.4f}, "
98              f"Val Loss={val_loss:.4f}, Val Acc={val_acc:.4f}")
99    
100    return best_val_acc
101

Production Results #

From our GNN-based order book prediction (2023-2024):

Model Performance Comparison #

plaintext

1Model                  Accuracy   Precision   Recall   Latency
2─────────────────────────────────────────────────────────────
3Logistic Regression    52.3%      51.8%       52.1%    0.8ms
4Random Forest          56.1%      55.3%       56.8%    2.1ms
5Feed-Forward NN        58.7%      58.2%       59.1%    1.2ms
6GCN                    61.4%      60.8%       61.9%    3.4ms
7GAT                    62.1%      61.5%       62.7%    4.8ms
8

Prediction Horizon Analysis #

plaintext

1Horizon    GNN Accuracy   Baseline Accuracy   Improvement
2───────────────────────────────────────────────────────────
350ms       64.2%          55.1%               +9.1%
4100ms      62.1%          54.3%               +7.8%
5500ms      58.3%          53.2%               +5.1%
61000ms     55.7%          52.4%               +3.3%
7

Lessons Learned #

After 2+ years production GNN trading:

GNNs beat FF networks: +3-4% accuracy improvement
Attention helps: GAT outperforms GCN by ~1%
Graph structure matters: Edge design crucial for performance
Short horizons best: Accuracy degrades after 500ms
Latency trade-off: GNN 4-5ms vs FF 1ms, but higher accuracy
Batch inference: Process multiple books in parallel for efficiency
Feature engineering still matters: Good node features essential

GNNs provide measurable improvement in order book prediction, but require careful engineering to maintain low latency for HFT applications.

Why GNNs for Order Books #

Order books are naturally graphs:

Nodes: Price levels, orders
Edges: Price proximity, order relationships
Features: Volume, spread, time priority
Dynamics: Add/cancel propagate through levels

Traditional ML treats each level independently. GNNs capture the interconnected structure.

Order Book as Graph #

Graph Construction #

python

1import torch
2import torch.nn as nn
3from torch_geometric.nn import GCNConv, GATConv
4from torch_geometric.data import Data
5import numpy as np
6
7class OrderBookGraph:
8    """
9    Convert order book to graph representation.
10    """
11    
12    def __init__(self, n_levels: int = 10):
13        self.n_levels = n_levels
14        
15    def create_graph(self, 
16                    bid_prices: np.ndarray,
17                    bid_sizes: np.ndarray,
18                    ask_prices: np.ndarray,
19                    ask_sizes: np.ndarray) -> Data:
20        """
21        Create graph from order book snapshot.
22        
23        Nodes: Each price level (bid and ask)
24        Edges: Connect adjacent levels + bid-ask pairs
25        Features: Price, size, distance from mid
26        """
27        n_bids = min(len(bid_prices), self.n_levels)
28        n_asks = min(len(ask_prices), self.n_levels)
29        n_nodes = n_bids + n_asks
30        
31        # Node features: [normalized_price, size, distance_from_mid]
32        mid_price = (bid_prices[0] + ask_prices[0]) / 2
33        
34        node_features = []
35        
36        # Bid nodes
37        for i in range(n_bids):
38            features = [
39                bid_prices[i] / mid_price,  # Normalized price
40                bid_sizes[i],
41                (mid_price - bid_prices[i]) / mid_price  # Distance
42            ]
43            node_features.append(features)
44        
45        # Ask nodes
46        for i in range(n_asks):
47            features = [
48                ask_prices[i] / mid_price,
49                ask_sizes[i],
50                (ask_prices[i] - mid_price) / mid_price
51            ]
52            node_features.append(features)
53        
54        # Edge list: Connect adjacent levels + spread edges
55        edge_index = []
56        
57        # Connect adjacent bid levels
58        for i in range(n_bids - 1):
59            edge_index.append([i, i + 1])
60            edge_index.append([i + 1, i])
61        
62        # Connect adjacent ask levels
63        for i in range(n_asks - 1):
64            idx1 = n_bids + i
65            idx2 = n_bids + i + 1
66            edge_index.append([idx1, idx2])
67            edge_index.append([idx2, idx1])
68        
69        # Connect best bid to best ask (spread edge)
70        edge_index.append([0, n_bids])
71        edge_index.append([n_bids, 0])
72        
73        # Convert to tensors
74        x = torch.tensor(node_features, dtype=torch.float)
75        edge_index = torch.tensor(edge_index, dtype=torch.long).t().contiguous()
76        
77        return Data(x=x, edge_index=edge_index)
78

Graph Convolutional Network #

python

1class OrderBookGCN(nn.Module):
2    """
3    GCN for order book prediction.
4    """
5    
6    def __init__(self, 
7                 input_dim: int = 3,
8                 hidden_dim: int = 64,
9                 num_layers: int = 3,
10                 dropout: float = 0.2):
11        super().__init__()
12        
13        self.convs = nn.ModuleList()
14        self.batch_norms = nn.ModuleList()
15        
16        # First layer
17        self.convs.append(GCNConv(input_dim, hidden_dim))
18        self.batch_norms.append(nn.BatchNorm1d(hidden_dim))
19        
20        # Hidden layers
21        for _ in range(num_layers - 2):
22            self.convs.append(GCNConv(hidden_dim, hidden_dim))
23            self.batch_norms.append(nn.BatchNorm1d(hidden_dim))
24        
25        # Last layer
26        self.convs.append(GCNConv(hidden_dim, hidden_dim))
27        
28        # Prediction heads
29        self.direction_head = nn.Sequential(
30            nn.Linear(hidden_dim, 32),
31            nn.ReLU(),
32            nn.Dropout(dropout),
33            nn.Linear(32, 3)  # Up, Down, Neutral
34        )
35        
36        self.magnitude_head = nn.Sequential(
37            nn.Linear(hidden_dim, 32),
38            nn.ReLU(),
39            nn.Dropout(dropout),
40            nn.Linear(32, 1)
41        )
42        
43        self.dropout = nn.Dropout(dropout)
44        
45    def forward(self, data: Data) -> tuple:
46        """
47        Forward pass.
48        
49        Returns:
50            (direction_logits, magnitude_prediction)
51        """
52        x, edge_index = data.x, data.edge_index
53        
54        # Graph convolutions
55        for i, conv in enumerate(self.convs[:-1]):
56            x = conv(x, edge_index)
57            x = self.batch_norms[i](x)
58            x = torch.relu(x)
59            x = self.dropout(x)
60        
61        # Final conv (no activation)
62        x = self.convs[-1](x, edge_index)
63        
64        # Global pooling (mean of all nodes)
65        graph_embedding = torch.mean(x, dim=0)
66        
67        # Predictions
68        direction = self.direction_head(graph_embedding)
69        magnitude = self.magnitude_head(graph_embedding)
70        
71        return direction, magnitude
72

Graph Attention Network #

python

1class OrderBookGAT(nn.Module):
2    """
3    Graph Attention Network - learns which price levels to focus on.
4    """
5    
6    def __init__(self,
7                 input_dim: int = 3,
8                 hidden_dim: int = 64,
9                 num_heads: int = 4,
10                 num_layers: int = 3,
11                 dropout: float = 0.2):
12        super().__init__()
13        
14        self.convs = nn.ModuleList()
15        
16        # First GAT layer
17        self.convs.append(
18            GATConv(input_dim, hidden_dim, heads=num_heads, dropout=dropout)
19        )
20        
21        # Hidden layers
22        for _ in range(num_layers - 2):
23            self.convs.append(
24                GATConv(hidden_dim * num_heads, hidden_dim, 
25                       heads=num_heads, dropout=dropout)
26            )
27        
28        # Last layer (single head)
29        self.convs.append(
30            GATConv(hidden_dim * num_heads, hidden_dim, 
31                   heads=1, concat=False, dropout=dropout)
32        )
33        
34        # Prediction head
35        self.classifier = nn.Sequential(
36            nn.Linear(hidden_dim, 32),
37            nn.ReLU(),
38            nn.Dropout(dropout),
39            nn.Linear(32, 3)
40        )
41        
42    def forward(self, data: Data) -> torch.Tensor:
43        x, edge_index = data.x, data.edge_index
44        
45        # GAT layers
46        for i, conv in enumerate(self.convs):
47            x = conv(x, edge_index)
48            if i < len(self.convs) - 1:
49                x = torch.elu(x)
50        
51        # Pool to graph-level
52        graph_emb = torch.mean(x, dim=0)
53        
54        # Classify
55        return self.classifier(graph_emb)
56

Training Pipeline #

python

1import torch.optim as optim
2from torch.utils.data import Dataset, DataLoader
3
4class OrderBookDataset(Dataset):
5    """
6    Dataset of order book snapshots with labels.
7    """
8    
9    def __init__(self, snapshots: list, labels: np.ndarray, graph_builder):
10        self.snapshots = snapshots
11        self.labels = torch.tensor(labels, dtype=torch.long)
12        self.graph_builder = graph_builder
13        
14    def __len__(self):
15        return len(self.snapshots)
16    
17    def __getitem__(self, idx):
18        snapshot = self.snapshots[idx]
19        
20        # Create graph
21        graph = self.graph_builder.create_graph(
22            bid_prices=snapshot['bid_prices'],
23            bid_sizes=snapshot['bid_sizes'],
24            ask_prices=snapshot['ask_prices'],
25            ask_sizes=snapshot['ask_sizes']
26        )
27        
28        return graph, self.labels[idx]
29
30def train_gnn(model, train_loader, val_loader, epochs=100):
31    """Train GNN model."""
32    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
33    model.to(device)
34    
35    optimizer = optim.Adam(model.parameters(), lr=0.001, weight_decay=1e-5)
36    criterion = nn.CrossEntropyLoss()
37    scheduler = optim.lr_scheduler.ReduceLROnPlateau(
38        optimizer, mode='min', patience=5, factor=0.5
39    )
40    
41    best_val_acc = 0.0
42    
43    for epoch in range(epochs):
44        # Training
45        model.train()
46        train_loss = 0.0
47        train_correct = 0
48        train_total = 0
49        
50        for graphs, labels in train_loader:
51            graphs = graphs.to(device)
52            labels = labels.to(device)
53            
54            optimizer.zero_grad()
55            
56            outputs = model(graphs)
57            loss = criterion(outputs, labels)
58            
59            loss.backward()
60            optimizer.step()
61            
62            train_loss += loss.item()
63            _, predicted = torch.max(outputs, 1)
64            train_correct += (predicted == labels).sum().item()
65            train_total += labels.size(0)
66        
67        train_acc = train_correct / train_total
68        
69        # Validation
70        model.eval()
71        val_loss = 0.0
72        val_correct = 0
73        val_total = 0
74        
75        with torch.no_grad():
76            for graphs, labels in val_loader:
77                graphs = graphs.to(device)
78                labels = labels.to(device)
79                
80                outputs = model(graphs)
81                loss = criterion(outputs, labels)
82                
83                val_loss += loss.item()
84                _, predicted = torch.max(outputs, 1)
85                val_correct += (predicted == labels).sum().item()
86                val_total += labels.size(0)
87        
88        val_acc = val_correct / val_total
89        
90        scheduler.step(val_loss)
91        
92        if val_acc > best_val_acc:
93            best_val_acc = val_acc
94            torch.save(model.state_dict(), 'best_model.pt')
95        
96        print(f"Epoch {epoch+1}/{epochs}: "
97              f"Train Loss={train_loss:.4f}, Train Acc={train_acc:.4f}, "
98              f"Val Loss={val_loss:.4f}, Val Acc={val_acc:.4f}")
99    
100    return best_val_acc
101

Production Results #

From our GNN-based order book prediction (2023-2024):

Model Performance Comparison #

plaintext

1Model                  Accuracy   Precision   Recall   Latency
2─────────────────────────────────────────────────────────────
3Logistic Regression    52.3%      51.8%       52.1%    0.8ms
4Random Forest          56.1%      55.3%       56.8%    2.1ms
5Feed-Forward NN        58.7%      58.2%       59.1%    1.2ms
6GCN                    61.4%      60.8%       61.9%    3.4ms
7GAT                    62.1%      61.5%       62.7%    4.8ms
8

Prediction Horizon Analysis #

plaintext

1Horizon    GNN Accuracy   Baseline Accuracy   Improvement
2───────────────────────────────────────────────────────────
350ms       64.2%          55.1%               +9.1%
4100ms      62.1%          54.3%               +7.8%
5500ms      58.3%          53.2%               +5.1%
61000ms     55.7%          52.4%               +3.3%
7

Lessons Learned #

After 2+ years production GNN trading:

GNNs beat FF networks: +3-4% accuracy improvement
Attention helps: GAT outperforms GCN by ~1%
Graph structure matters: Edge design crucial for performance
Short horizons best: Accuracy degrades after 500ms
Latency trade-off: GNN 4-5ms vs FF 1ms, but higher accuracy
Batch inference: Process multiple books in parallel for efficiency
Feature engineering still matters: Good node features essential

GNNs provide measurable improvement in order book prediction, but require careful engineering to maintain low latency for HFT applications.

Graph Neural Networks for Order Book Prediction

Why GNNs for Order Books #

Order Book as Graph #

Graph Construction #

Graph Convolutional Network #

Graph Attention Network #

Training Pipeline #

Production Results #

Model Performance Comparison #

Prediction Horizon Analysis #

Lessons Learned #

Further Reading #

NordVarg Team

Join 1,000+ Engineers

Related Posts

Graph Neural Networks for Order Book Prediction

Why GNNs for Order Books #

Order Book as Graph #

Graph Construction #

Graph Convolutional Network #

Graph Attention Network #

Training Pipeline #

Production Results #

Model Performance Comparison #

Prediction Horizon Analysis #

Lessons Learned #

Further Reading #

NordVarg Team

Join 1,000+ Engineers

Related Posts