In financial software, a "correct" result isn't just about passing a few example inputs. It's about guaranteeing invariants: money is never created or destroyed, orders are matched according to price-time priority, and risk limits are never breached.
Unit tests are necessary but insufficient for these guarantees. You need Property-Based Testing (PBT). In this guide, we'll move beyond the basics of Hypothesis and test a real-world component: a Limit Order Book (LOB).
An Order Book is a stateful system. The state (open orders) changes with every new order, cancellation, or trade. Testing this with hardcoded examples is a nightmare because the number of permutations is infinite.
We need to generate random—but valid—sequences of actions and verify that core invariants hold true after every step.
First, let's define what we're testing. A simple Order Book might look like this:
```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, List

class Side(Enum):
    BUY = 1
    SELL = 2

@dataclass(frozen=True)
class Order:
    id: str
    side: Side
    price: int  # Using integers for currency (cents/ticks) to avoid float issues
    quantity: int

@dataclass
class Trade:
    buy_order_id: str
    sell_order_id: str
    price: int
    quantity: int

class OrderBook:
    """Simple order book implementation."""

    def __init__(self):
        self.orders = {}
        self.trades = []

    def add_order(self, order: Order) -> List[Trade]:
        """Add order and return any trades executed."""
        trades = self._match(order)
        if order.quantity > 0:  # Remaining quantity
            self.orders[order.id] = order
        return trades

    def cancel_order(self, order_id: str) -> int:
        """Cancel order and return cancelled quantity."""
        if order_id in self.orders:
            qty = self.orders[order_id].quantity
            del self.orders[order_id]
            return qty
        return 0

    def get_all_orders(self) -> List[Order]:
        """Get all resting orders."""
        return list(self.orders.values())

    def _match(self, taker: Order) -> List[Trade]:
        """Match taker order against resting orders."""
        trades = []
        # Simplified matching logic
        # ... (implementation details)
        return trades
```

We can't just feed random bytes into an Order Book. We need valid orders. Hypothesis allows us to build custom strategies.
```python
from hypothesis import strategies as st

# Generate valid prices (e.g., ticks between 1 and 10,000)
prices = st.integers(min_value=1, max_value=10000)

# Generate valid quantities
quantities = st.integers(min_value=1, max_value=1000)

# Generate Order IDs
order_ids = st.uuids().map(str)

# Composite strategy for a valid Limit Order
@st.composite
def limit_orders(draw):
    return Order(
        id=draw(order_ids),
        side=draw(st.sampled_from(Side)),
        price=draw(prices),
        quantity=draw(quantities)
    )
```

Now we can generate thousands of valid orders in seconds.
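As a quick sanity check on the strategy itself, the sketch below feeds generated orders into an ordinary `@given` test. The dataclass and strategy are repeated so the snippet runs on its own, and the test name `test_generated_orders_are_valid` is only illustrative.

```python
from dataclasses import dataclass
from enum import Enum

from hypothesis import given, strategies as st


class Side(Enum):
    BUY = 1
    SELL = 2


@dataclass(frozen=True)
class Order:
    id: str
    side: Side
    price: int
    quantity: int


@st.composite
def limit_orders(draw):
    return Order(
        id=draw(st.uuids().map(str)),
        side=draw(st.sampled_from(Side)),
        price=draw(st.integers(min_value=1, max_value=10_000)),
        quantity=draw(st.integers(min_value=1, max_value=1_000)),
    )


@given(limit_orders())
def test_generated_orders_are_valid(order):
    # Every generated order should respect the bounds baked into the strategy.
    assert order.side in (Side.BUY, Side.SELL)
    assert 1 <= order.price <= 10_000
    assert 1 <= order.quantity <= 1_000
    assert len(order.id) == 36  # canonical UUID string length
```

Calling `test_generated_orders_are_valid()` directly, or letting pytest collect it, runs the check against a batch of generated orders.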
The real power of Hypothesis lies in Stateful Testing. We define a state machine that models our system. Hypothesis then attempts to break it by finding a sequence of actions (add order, cancel order, match) that violates our invariants.
A critical invariant in an exchange is:
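Total Volume In = Volume in Book + Volume Traded + Volume Cancelled
Here Volume Traded counts each fill on both sides: a fill of quantity q removes q from the buy order and q from the sell order.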
If this equation ever fails, we have a bug (and potentially a lawsuit).
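To make the bookkeeping concrete, here is a small worked check in plain Python. It assumes traded volume is counted once per side of each fill (a fill of q removes q from the buy order and q from the sell order). The helper name `volume_is_conserved` is hypothetical.

```python
def volume_is_conserved(total_sent, in_book, fills, cancelled):
    """Return True when every unit of submitted quantity is accounted for.

    `fills` is a list of fill quantities; each fill consumes that quantity
    from both the buy order and the sell order, so it counts twice.
    """
    traded_both_sides = 2 * sum(fills)
    return total_sent == in_book + traded_both_sides + cancelled


# BUY 100 and SELL 50 are submitted (150 in total) and 50 trades.
# 50 of the buy order is left resting in the book.
assert volume_is_conserved(total_sent=150, in_book=50, fills=[50], cancelled=0)

# Cancelling the remaining 50 empties the book and the equation still holds.
assert volume_is_conserved(total_sent=150, in_book=0, fills=[50], cancelled=50)

# A buggy cancel that reports the original 100 breaks the equation.
assert not volume_is_conserved(total_sent=150, in_book=0, fills=[50], cancelled=100)
```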
```python
from hypothesis.stateful import RuleBasedStateMachine, rule, invariant, precondition
from hypothesis import assume

class OrderBookMachine(RuleBasedStateMachine):
    def __init__(self):
        super().__init__()
        self.book = OrderBook()  # The system under test

        # Shadow state for verification
        self.orders_sent = {}
        self.volume_traded = 0
        self.volume_cancelled = 0

    @rule(order=limit_orders())
    def add_limit_order(self, order):
        # Discard the rare run where Hypothesis reuses an order ID
        assume(order.id not in self.orders_sent)

        # Update shadow state
        self.orders_sent[order.id] = order

        # Apply to system
        trades = self.book.add_order(order)

        # Update traded volume: each fill consumes quantity from both
        # the buy order and the sell order, so it counts twice
        for trade in trades:
            self.volume_traded += 2 * trade.quantity

    @precondition(lambda self: self.orders_sent)
    @rule(data=st.data())
    def cancel_order(self, data):
        # Only cancel orders we have actually sent. Drawing a fresh random
        # UUID here would almost never match a known order.
        order_id = data.draw(st.sampled_from(sorted(self.orders_sent)))

        cancelled_qty = self.book.cancel_order(order_id)
        if cancelled_qty > 0:
            self.volume_cancelled += cancelled_qty

    @invariant()
    def check_conservation_of_volume(self):
        # Calculate total volume sent to the system
        total_sent = sum(o.quantity for o in self.orders_sent.values())

        # Calculate volume currently resting in the book
        volume_in_book = sum(o.quantity for o in self.book.get_all_orders())

        # The Equation
        assert total_sent == volume_in_book + self.volume_traded + self.volume_cancelled, \
            f"Volume leak detected! Sent: {total_sent}, Book: {volume_in_book}, Traded: {self.volume_traded}, Cancelled: {self.volume_cancelled}"

    @invariant()
    def check_crossed_book(self):
        # Invariant: Best Bid must always be < Best Ask
        # (Unless we are in the middle of matching, but this runs after matching)
        bids = [o for o in self.book.get_all_orders() if o.side == Side.BUY]
        asks = [o for o in self.book.get_all_orders() if o.side == Side.SELL]

        if bids and asks:
            best_bid = max(o.price for o in bids)
            best_ask = min(o.price for o in asks)
            assert best_bid < best_ask, f"Crossed book detected! Bid: {best_bid} >= Ask: {best_ask}"

# Run the stateful test
TestOrderBook = OrderBookMachine.TestCase
```

Let's add more sophisticated invariants that real trading systems must maintain.
```python
class MarginOrderBookMachine(OrderBookMachine):
    """Order book with margin requirements."""

    def __init__(self):
        super().__init__()
        self.user_balances = {}  # user_id -> balance
        self.margin_requirements = {}  # order_id -> margin

    @rule(order=limit_orders(), user_id=st.text(min_size=1, max_size=10))
    def add_order_with_margin(self, order, user_id):
        # Calculate margin requirement
        margin_required = order.price * order.quantity

        # Check if user has sufficient balance (new users start with 1,000,000)
        current_balance = self.user_balances.get(user_id, 1_000_000)

        if current_balance >= margin_required:
            # Deduct margin
            self.user_balances[user_id] = current_balance - margin_required
            self.margin_requirements[order.id] = margin_required

            # Add order
            self.add_limit_order(order)

    @invariant()
    def check_margin_consistency(self):
        """Ensure every locked margin is positive and no balance goes negative."""
        for order_id, margin in self.margin_requirements.items():
            assert margin > 0, f"Non-positive margin for order {order_id}"

        for user_id, balance in self.user_balances.items():
            assert balance >= 0, f"Negative balance for user {user_id!r}"
```

A settlement check can be added to the same machine as another invariant:

```python
# Method of MarginOrderBookMachine, shown on its own for brevity
@invariant()
def check_settlement_balance(self):
    """Ensure settlement balances are consistent."""
    # For each trade, buyer pays seller
    for trade in self.book.trades:
        trade_value = trade.price * trade.quantity

        # Verify trade value is positive
        assert trade_value > 0, f"Invalid trade value: {trade_value}"
```

When I ran this against a naive implementation of a matching engine, Hypothesis found a bug in less than 3 seconds.
The Scenario:
1. add_order(BUY, 100 @ $10)
2. add_order(SELL, 50 @ $10) -> Matches 50. Remaining Buy: 50.
3. cancel_order(BuyID) -> Bug: The naive implementation cancelled the original quantity (100), not the remaining quantity (50).

Result: Counting the fill on both sides, Volume Traded (50 + 50) + Volume Cancelled (100) = 200, which exceeds Total Sent (150). Volume, and with it money, was created out of thin air.
Hypothesis provided the exact minimal reproduction steps to fix it.
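One way the fix might look is sketched below. This is not the implementation that was tested: `SketchBook` and `RestingOrder` are hypothetical names, and the matching loop is a deliberately small price-time priority sketch. The point is that each resting order tracks its remaining quantity, and `cancel_order` returns that remainder rather than the original size.

```python
from dataclasses import dataclass
from enum import Enum


class Side(Enum):
    BUY = 1
    SELL = 2


@dataclass
class RestingOrder:
    id: str
    side: Side
    price: int
    remaining: int  # decremented on every fill
    seq: int        # arrival sequence, used for time priority


class SketchBook:
    def __init__(self):
        self.resting = {}  # order id -> RestingOrder
        self.next_seq = 0

    def add_order(self, order_id, side, price, quantity):
        """Match against the opposite side, rest any remainder, return fills."""
        fills = []  # (maker_id, price, quantity)
        remaining = quantity
        # Best price first, then earliest arrival (price-time priority).
        if side is Side.BUY:
            makers = sorted(
                (o for o in self.resting.values()
                 if o.side is Side.SELL and o.price <= price),
                key=lambda o: (o.price, o.seq))
        else:
            makers = sorted(
                (o for o in self.resting.values()
                 if o.side is Side.BUY and o.price >= price),
                key=lambda o: (-o.price, o.seq))
        for maker in makers:
            if remaining == 0:
                break
            qty = min(remaining, maker.remaining)
            maker.remaining -= qty
            remaining -= qty
            fills.append((maker.id, maker.price, qty))
            if maker.remaining == 0:
                del self.resting[maker.id]
        if remaining > 0:
            self.resting[order_id] = RestingOrder(
                order_id, side, price, remaining, self.next_seq)
            self.next_seq += 1
        return fills

    def cancel_order(self, order_id):
        """Return the quantity still open, not the originally submitted size."""
        order = self.resting.pop(order_id, None)
        return order.remaining if order else 0


book = SketchBook()
book.add_order("buy-1", Side.BUY, price=10, quantity=100)
fills = book.add_order("sell-1", Side.SELL, price=10, quantity=50)
cancelled = book.cancel_order("buy-1")

# 150 submitted = 0 resting + 2 * 50 filled (both sides) + 50 cancelled
assert fills == [("buy-1", 10, 50)]
assert cancelled == 50
```

Replaying the three steps from the scenario now balances: the cancel reports 50, so nothing is counted twice.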
Property-based testing isn't just for correctness—it can also catch performance regressions.
```python
import time
from hypothesis import settings, Phase

class PerformanceOrderBookMachine(OrderBookMachine):
    """Test performance characteristics."""

    @rule(order=limit_orders())
    def add_order_performance(self, order):
        start = time.perf_counter()
        # Reuse the inherited rule body so the shadow state stays in sync;
        # the timing therefore includes that small bookkeeping overhead
        self.add_limit_order(order)
        duration = time.perf_counter() - start

        # Assert order processing takes < 1ms
        # (wall-clock checks like this can be flaky on shared CI runners)
        assert duration < 0.001, f"Order processing too slow: {duration*1000:.2f}ms"

    @invariant()
    def check_book_size_performance(self):
        """Ensure book doesn't grow unbounded."""
        num_orders = len(self.book.get_all_orders())

        # Book should never have more than 10,000 resting orders
        assert num_orders < 10000, f"Order book too large: {num_orders} orders"

# Settings belong on the generated TestCase, not on an individual rule
TestOrderBookPerformance = PerformanceOrderBookMachine.TestCase
TestOrderBookPerformance.settings = settings(max_examples=1000, phases=[Phase.generate])
```

Running property-based tests in CI requires some configuration to balance thoroughness with build time.
```python
# conftest.py
import os

from hypothesis import settings, Verbosity

# Default settings for local development
settings.register_profile("dev", max_examples=100, verbosity=Verbosity.verbose)

# Thorough settings for CI
settings.register_profile("ci", max_examples=1000, verbosity=Verbosity.normal)

# Quick settings for pre-commit hooks
settings.register_profile("quick", max_examples=10)

settings.load_profile(os.getenv("HYPOTHESIS_PROFILE", "dev"))
```

```yaml
# .github/workflows/test.yml
name: Tests

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v3

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.11'

      - name: Install dependencies
        run: |
          pip install pytest hypothesis pytest-cov

      - name: Run property-based tests
        env:
          HYPOTHESIS_PROFILE: ci
        run: |
          pytest tests/ -v --cov=src --cov-report=xml

      - name: Upload coverage
        uses: codecov/codecov-action@v3
```

Hypothesis can store failing examples to ensure regressions don't reoccur:
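```python
# conftest.py (extends the "ci" profile)
from hypothesis import settings
from hypothesis.database import DirectoryBasedExampleDatabase

# Hypothesis already saves failing examples under .hypothesis/ by default.
# The location is configured through settings rather than a pytest.ini option;
# this just makes it explicit.
settings.register_profile(
    "ci",
    max_examples=1000,
    database=DirectoryBasedExampleDatabase(".hypothesis/examples"),
)
```

Committing the .hypothesis directory to version control is one way to share failing examples across the team, though stored entries can be invalidated when tests or dependencies change.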
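For a failure that must never come back, an explicit `@example` is a sturdier complement to the stored database, because it lives in the test source itself. The sketch below pins the partial-fill quantities from the scenario above. `remaining_after_fill` is a hypothetical stand-in for whatever the real book uses to compute the open quantity.

```python
from hypothesis import assume, example, given, strategies as st


def remaining_after_fill(original_qty: int, filled_qty: int) -> int:
    """Hypothetical helper: quantity still open after a partial fill."""
    return original_qty - filled_qty


@given(
    original_qty=st.integers(min_value=1, max_value=1_000),
    filled_qty=st.integers(min_value=0, max_value=1_000),
)
@example(original_qty=100, filled_qty=50)  # the pinned partial-fill case
def test_cancel_reports_remaining_quantity(original_qty, filled_qty):
    # A fill can never exceed the submitted size.
    assume(filled_qty <= original_qty)

    cancelled = remaining_after_fill(original_qty, filled_qty)

    # Filled plus cancelled must add back up to what was submitted.
    assert filled_qty + cancelled == original_qty
    assert 0 <= cancelled <= original_qty
```

Explicit examples run on every invocation, before any generated inputs, regardless of what the database currently holds.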
| Feature | Hypothesis (Python) | QuickCheck (Haskell) |
|---|---|---|
| Shrinking | Automatic | Automatic |
| Stateful Testing | RuleBasedStateMachine | quickcheck-state-machine |
| Custom Generators | @st.composite | Arbitrary instances |
| Performance | ~1000 examples/sec | ~10,000 examples/sec |
| Type Safety | Runtime | Compile-time |
| Integration | pytest, unittest | HUnit, Tasty |
| Learning Curve | Moderate | Steep (requires Haskell) |
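When to use Hypothesis: your system and test suite are in Python and you want pytest or unittest integration.
When to use QuickCheck: your system is written in Haskell and you want generators checked at compile time.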
When a failure occurs, Hypothesis "shrinks" the input. It doesn't just tell you "it failed after 1000 steps." It simplifies the sequence, dropping steps and reducing values while the failure still reproduces, and reports the simplest sequence it can find that triggers the bug.
Example:
```python
from hypothesis import given, example, strategies as st

@given(st.lists(st.integers()))
@example([])  # Always test empty list
@example([0])  # Always test single element
def test_sorted_list(xs):
    sorted_xs = sorted(xs)

    # Property: sorted list should be in order
    for i in range(len(sorted_xs) - 1):
        assert sorted_xs[i] <= sorted_xs[i + 1]
```

If this fails, Hypothesis will shrink to the minimal failing list.
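Shrinking can also be observed without a failing test. The sketch below uses `hypothesis.find`, which searches a strategy for a value satisfying a predicate and returns the shrunk result. The predicate (a sum of at least 100) is an arbitrary choice for illustration.

```python
from hypothesis import find, strategies as st

# Ask for any list of integers whose sum is at least 100. Hypothesis first
# finds some (usually messy) match and then shrinks it before returning.
smallest = find(st.lists(st.integers()), lambda xs: sum(xs) >= 100)

print(smallest)  # typically a very short list
assert sum(smallest) >= 100
```

The same shrinking machinery is what reduces a long failing sequence of order book actions to a handful of steps.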
In fintech, you cannot rely on "happy path" testing. The edge cases—race conditions, partial fills, complex state transitions—are where the money is lost.
By defining Invariants and using Stateful Property-Based Testing, you move from "checking examples" toward "proving correctness": not a formal proof, but evidence gathered from thousands of generated scenarios.
- `st.composite` to generate valid domain objects.
- `RuleBasedStateMachine` for stateful systems like Order Books.

Production Checklist:

- `max_examples >= 1000`
- `.hypothesis/`

Property-based testing is an investment that pays dividends every time it catches a bug before production.
Technical Writer
NordVarg Team is a software engineer at NordVarg specializing in high-performance financial systems and type-safe programming.