November 11, 2025 • NordVarg Team

Property-Based Testing in Finance: From Hypothesis to Production


In financial software, a "correct" result isn't just about passing a few example inputs. It's about guaranteeing invariants: money is never created or destroyed, orders are matched according to price-time priority, and risk limits are never breached.

Unit tests are necessary but insufficient for these guarantees. You need Property-Based Testing (PBT). In this guide, we'll move beyond the basics of Hypothesis and test a real-world component: a Limit Order Book (LOB).
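Before tackling the order book, here is the idea at its smallest. This is a minimal sketch built around a hypothetical allocate() helper that splits an integer fill across accounts pro rata. Instead of checking a few hand-picked splits, the test states the invariant (no quantity is created or destroyed) and lets Hypothesis search for a counterexample.

```python
from typing import List

from hypothesis import given
from hypothesis import strategies as st


def allocate(total: int, weights: List[int]) -> List[int]:
    """Hypothetical helper: split an integer quantity pro rata (largest-remainder method)."""
    weight_sum = sum(weights)
    shares = [total * w // weight_sum for w in weights]
    # Integer division drops at most len(weights) - 1 units; hand them back
    # to the accounts with the largest fractional remainders.
    by_remainder = sorted(
        range(len(weights)),
        key=lambda i: (total * weights[i]) % weight_sum,
        reverse=True,
    )
    for i in by_remainder[: total - sum(shares)]:
        shares[i] += 1
    return shares


@given(
    total=st.integers(min_value=0, max_value=10**9),
    weights=st.lists(st.integers(min_value=1, max_value=1_000), min_size=1, max_size=20),
)
def test_allocation_conserves_quantity(total, weights):
    shares = allocate(total, weights)
    # Invariant: no quantity is created or destroyed
    assert sum(shares) == total
    # Each share is within one unit of its exact pro-rata value
    weight_sum = sum(weights)
    for share, w in zip(shares, weights):
        assert abs(share * weight_sum - total * w) < weight_sum
```

The test never names a specific split; Hypothesis generates the totals and weights, including awkward ones such as a zero total or a single account.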

The Challenge: Testing an Order Book

An Order Book is a stateful system: its state (the resting orders) changes with every new order, cancellation, or trade. Testing this with hardcoded examples is a nightmare because the number of possible action sequences grows combinatorially and is, for practical purposes, unbounded.

We need to generate random—but valid—sequences of actions and verify that core invariants hold true after every step.

1. Defining the Domain Models

First, let's define what we're testing. A simple Order Book might look like this:

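```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Dict, List


class Side(Enum):
    BUY = 1
    SELL = 2


@dataclass(frozen=True)
class Order:
    id: str
    side: Side
    price: int  # Using integers for currency (cents/ticks) to avoid float issues
    quantity: int


@dataclass(frozen=True)
class Trade:
    buy_order_id: str
    sell_order_id: str
    price: int
    quantity: int


class OrderBook:
    """Simple price-time priority order book implementation."""

    def __init__(self):
        # Dicts preserve insertion order, which gives us time priority.
        self.orders: Dict[str, Order] = {}
        self.trades: List[Trade] = []

    def add_order(self, order: Order) -> List[Trade]:
        """Add order and return any trades executed."""
        trades = self._match(order)
        self.trades.extend(trades)
        remaining = order.quantity - sum(t.quantity for t in trades)
        if remaining > 0:
            # Order is frozen, so rest a copy carrying the *remaining* quantity.
            self.orders[order.id] = replace(order, quantity=remaining)
        return trades

    def cancel_order(self, order_id: str) -> int:
        """Cancel order and return the cancelled (remaining) quantity."""
        order = self.orders.pop(order_id, None)
        return order.quantity if order is not None else 0

    def get_all_orders(self) -> List[Order]:
        """Get all resting orders."""
        return list(self.orders.values())

    def _match(self, taker: Order) -> List[Trade]:
        """Match taker order against resting orders (one simple matcher)."""
        if taker.side == Side.BUY:
            # Asks at or below the bid, cheapest first
            makers = [o for o in self.orders.values()
                      if o.side == Side.SELL and o.price <= taker.price]
            makers.sort(key=lambda o: o.price)
        else:
            # Bids at or above the ask, highest first
            makers = [o for o in self.orders.values()
                      if o.side == Side.BUY and o.price >= taker.price]
            makers.sort(key=lambda o: -o.price)
        # list.sort is stable, so orders at the same price keep arrival order.

        trades = []
        remaining = taker.quantity
        for maker in makers:
            if remaining == 0:
                break
            qty = min(remaining, maker.quantity)
            # Trades execute at the resting (maker) order's price
            if taker.side == Side.BUY:
                trades.append(Trade(taker.id, maker.id, maker.price, qty))
            else:
                trades.append(Trade(maker.id, taker.id, maker.price, qty))
            remaining -= qty
            if qty == maker.quantity:
                del self.orders[maker.id]
            else:
                # Reassigning an existing key keeps its position (time priority)
                self.orders[maker.id] = replace(maker, quantity=maker.quantity - qty)
        return trades
```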

2. Custom Strategies for Valid Data

We can't just feed random bytes into an Order Book. We need valid orders. Hypothesis allows us to build custom strategies.

```python
from hypothesis import strategies as st

# Order and Side are the domain models defined in the previous snippet.

# Generate valid prices (e.g., ticks between 1 and 10,000)
prices = st.integers(min_value=1, max_value=10_000)

# Generate valid quantities
quantities = st.integers(min_value=1, max_value=1_000)

# Generate Order IDs
order_ids = st.uuids().map(str)

# Composite strategy for a valid Limit Order
@st.composite
def limit_orders(draw):
    return Order(
        id=draw(order_ids),
        side=draw(st.sampled_from(Side)),
        price=draw(prices),
        quantity=draw(quantities),
    )
```

Now we can generate thousands of valid orders in seconds.
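@st.composite earns its keep when one field depends on another. As a sketch (self-contained, so it redefines minimal Side and Order types that mirror the models above), crossing_pair() below is a hypothetical strategy that generates a resting buy and an incoming sell guaranteed to cross it, which is handy for exercising the matching path directly. For independent fields, st.builds is a lighter alternative.

```python
from dataclasses import dataclass
from enum import Enum

from hypothesis import given
from hypothesis import strategies as st


class Side(Enum):
    BUY = 1
    SELL = 2


@dataclass(frozen=True)
class Order:
    id: str
    side: Side
    price: int
    quantity: int


# Independent fields: st.builds calls Order(...) with one draw per argument.
any_order = st.builds(
    Order,
    id=st.uuids().map(str),
    side=st.sampled_from(Side),
    price=st.integers(min_value=1, max_value=10_000),
    quantity=st.integers(min_value=1, max_value=1_000),
)


@st.composite
def crossing_pair(draw):
    """A resting BUY plus an incoming SELL priced at or below it."""
    buy_price = draw(st.integers(min_value=1, max_value=10_000))
    # Dependent draw: the sell limit is bounded by the buy limit drawn above.
    sell_price = draw(st.integers(min_value=1, max_value=buy_price))
    buy = Order(draw(st.uuids().map(str)), Side.BUY, buy_price, draw(st.integers(1, 1_000)))
    sell = Order(draw(st.uuids().map(str)), Side.SELL, sell_price, draw(st.integers(1, 1_000)))
    return buy, sell


@given(any_order)
def test_orders_are_in_range(order):
    assert 1 <= order.price <= 10_000 and 1 <= order.quantity <= 1_000


@given(crossing_pair())
def test_pairs_always_cross(pair):
    buy, sell = pair
    assert sell.price <= buy.price
```

The dependent draw is what st.builds cannot express on its own: the second price's range is only known after the first one has been drawn.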

3. Stateful Testing: The Holy Grail

The real power of Hypothesis lies in Stateful Testing. We define a state machine that models our system. Hypothesis then attempts to break it by searching for a sequence of actions (add order, cancel order) that violates our invariants. Matching is not a separate action: it happens inside add order.
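To see the moving parts in isolation, here is a minimal, self-contained state machine for a hypothetical CashHolds service that reserves and releases funds. Rules are the actions Hypothesis may take, @invariant methods are checked after every step, and a Bundle carries hold IDs produced by one rule into another, so releases only target holds that actually exist.

```python
from typing import Dict, Optional

from hypothesis import strategies as st
from hypothesis.stateful import (
    Bundle,
    RuleBasedStateMachine,
    consumes,
    invariant,
    multiple,
    rule,
)

INITIAL_CASH = 10_000  # Illustrative starting balance, in cents


class CashHolds:
    """Hypothetical system under test: reserve funds and release them later."""

    def __init__(self, cash: int) -> None:
        self.available = cash
        self.holds: Dict[int, int] = {}  # hold_id -> amount
        self._next_id = 0

    def place_hold(self, amount: int) -> Optional[int]:
        """Reserve funds; return a hold ID, or None if funds are insufficient."""
        if amount > self.available:
            return None
        self._next_id += 1
        self.holds[self._next_id] = amount
        self.available -= amount
        return self._next_id

    def release(self, hold_id: int) -> None:
        self.available += self.holds.pop(hold_id)


class CashHoldsMachine(RuleBasedStateMachine):
    # Bundles carry values produced by one rule into the arguments of another.
    hold_ids = Bundle("hold_ids")

    def __init__(self) -> None:
        super().__init__()
        self.system = CashHolds(INITIAL_CASH)

    @rule(target=hold_ids, amount=st.integers(min_value=1, max_value=5_000))
    def place_hold(self, amount):
        hold_id = self.system.place_hold(amount)
        # multiple() with no arguments adds nothing to the bundle
        return multiple() if hold_id is None else hold_id

    @rule(hold_id=consumes(hold_ids))
    def release(self, hold_id):
        # consumes() removes the ID from the bundle, so each hold is released once
        self.system.release(hold_id)

    @invariant()
    def cash_is_conserved(self):
        assert self.system.available >= 0
        assert self.system.available + sum(self.system.holds.values()) == INITIAL_CASH


TestCashHolds = CashHoldsMachine.TestCase
```

The order book machine follows the same shape, with a richer system under test and more invariants.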

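The Invariant: Conservation of Volume

A critical invariant in an exchange is that every unit of submitted quantity is accounted for exactly once:

Total Volume In = Volume in Book + 2 × Volume Traded + Volume Cancelled

Volume Traded counts each trade once; the factor of 2 is needed because every trade consumes the same quantity from one buy order and one sell order.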

If this equation ever fails, we have a bug (and potentially a lawsuit).
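A worked check on a small case: a buy of 100 followed by a sell of 50 at the same price produces one trade of 50 and leaves 50 of the buy resting. That single trade consumed 50 from each side, so traded volume is counted twice:

$$
\underbrace{100 + 50}_{V_{\text{in}}} = \underbrace{50}_{V_{\text{book}}} + 2 \times \underbrace{50}_{V_{\text{traded}}} + \underbrace{0}_{V_{\text{cancelled}}} = 150
$$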

```python
from hypothesis import assume
from hypothesis import strategies as st
from hypothesis.stateful import RuleBasedStateMachine, invariant, precondition, rule

# OrderBook, Side and limit_orders() come from the snippets above.


class OrderBookMachine(RuleBasedStateMachine):
    def __init__(self):
        super().__init__()
        self.book = OrderBook()  # The system under test

        # Shadow state for verification
        self.orders_sent = {}
        self.volume_traded = 0  # Each trade counted once
        self.volume_cancelled = 0

    @rule(order=limit_orders())
    def add_limit_order(self, order):
        # Shrinking can make two generated UUIDs collide; skip duplicate IDs.
        assume(order.id not in self.orders_sent)

        # Update shadow state, then apply the action to the system
        self.orders_sent[order.id] = order
        trades = self.book.add_order(order)

        # Update traded volume
        for trade in trades:
            self.volume_traded += trade.quantity

    @precondition(lambda self: bool(self.orders_sent))
    @rule(data=st.data())
    def cancel_order(self, data):
        # Only cancel IDs we actually sent (the order may already be filled)
        order_id = data.draw(st.sampled_from(sorted(self.orders_sent)))
        self.volume_cancelled += self.book.cancel_order(order_id)

    @invariant()
    def check_conservation_of_volume(self):
        # Total volume sent to the system
        total_sent = sum(o.quantity for o in self.orders_sent.values())

        # Volume currently resting in the book
        volume_in_book = sum(o.quantity for o in self.book.get_all_orders())

        # The Equation: each trade fills quantity on both a buy and a sell order
        filled = 2 * self.volume_traded
        assert total_sent == volume_in_book + filled + self.volume_cancelled, (
            f"Volume leak detected! Sent: {total_sent}, Book: {volume_in_book}, "
            f"Traded: {self.volume_traded}, Cancelled: {self.volume_cancelled}"
        )

    @invariant()
    def check_crossed_book(self):
        # Invariants run between rules, i.e. after matching has completed,
        # so the best bid must always be strictly below the best ask.
        orders = self.book.get_all_orders()
        bids = [o.price for o in orders if o.side == Side.BUY]
        asks = [o.price for o in orders if o.side == Side.SELL]

        if bids and asks:
            best_bid, best_ask = max(bids), min(asks)
            assert best_bid < best_ask, f"Crossed book detected! Bid: {best_bid} >= Ask: {best_ask}"


# Run the stateful test (collected by pytest or unittest)
TestOrderBook = OrderBookMachine.TestCase
```

4. Additional Financial Invariants

Let's add more sophisticated invariants that real trading systems must maintain.

Margin Requirements

```python
from hypothesis import strategies as st
from hypothesis.stateful import invariant, rule

# Builds on OrderBookMachine and limit_orders() from the snippets above.
INITIAL_BALANCE = 1_000_000  # Illustrative starting balance for every user


class MarginOrderBookMachine(OrderBookMachine):
    """Order book with margin requirements."""

    def __init__(self):
        super().__init__()
        self.user_balances = {}  # user_id -> free balance
        self.locked_margin = {}  # order_id -> (user_id, margin)

    @rule(order=limit_orders(), user_id=st.text(min_size=1, max_size=10))
    def add_order_with_margin(self, order, user_id):
        # Simplified margin: full notional, locked for the rest of the run
        margin_required = order.price * order.quantity
        current_balance = self.user_balances.get(user_id, INITIAL_BALANCE)

        if current_balance >= margin_required:
            self.add_limit_order(order)
            self.user_balances[user_id] = current_balance - margin_required
            self.locked_margin[order.id] = (user_id, margin_required)

    @invariant()
    def check_margin_consistency(self):
        """Free balance plus locked margin must equal each user's starting balance."""
        locked_by_user = {}
        for user_id, margin in self.locked_margin.values():
            assert margin > 0, f"Non-positive margin locked for user {user_id!r}"
            locked_by_user[user_id] = locked_by_user.get(user_id, 0) + margin

        for user_id, balance in self.user_balances.items():
            assert balance >= 0, f"Negative balance for user {user_id!r}"
            assert balance + locked_by_user.get(user_id, 0) == INITIAL_BALANCE, \
                f"Margin leak for user {user_id!r}"


TestMarginOrderBook = MarginOrderBookMachine.TestCase
```

Settlement Invariant

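```python
from hypothesis.stateful import invariant


class SettlementOrderBookMachine(OrderBookMachine):
    @invariant()
    def check_settlement_balance(self):
        """Every trade must have positive value and respect both limit prices."""
        for trade in self.book.trades:
            # For each trade, the buyer pays price * quantity to the seller
            trade_value = trade.price * trade.quantity
            assert trade_value > 0, f"Invalid trade value: {trade_value}"

            # Neither side may be filled at a worse price than its limit
            buyer = self.orders_sent[trade.buy_order_id]
            seller = self.orders_sent[trade.sell_order_id]
            assert seller.price <= trade.price <= buyer.price, \
                f"Trade at {trade.price} outside limits [{seller.price}, {buyer.price}]"


TestSettlement = SettlementOrderBookMachine.TestCase
```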

5. Finding the "Black Swan"

When I ran this against a naive implementation of a matching engine, Hypothesis found a bug in less than 3 seconds.

The Scenario:

  1. add_order(BUY, 100 @ $10)
  2. add_order(SELL, 50 @ $10) -> Matches 50. Remaining Buy: 50.
  3. cancel_order(BuyID) -> Bug: The naive implementation cancelled the original quantity (100), not the remaining quantity (50).

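Result: for the buy order alone, Volume Traded (50) + Volume Cancelled (100) = 150 units accounted for, out of only 100 sent. Volume was created out of thin air, and the conservation invariant failed.

Hypothesis provided the exact minimal reproduction steps, which made the bug straightforward to fix.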

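6. Performance Regression Testing

Property-based testing isn't just for correctness: the same generated workloads can also catch performance regressions, although wall-clock assertions are noisier than logical invariants and need generous thresholds.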

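```python
import time

from hypothesis import Phase, assume, settings
from hypothesis.stateful import invariant, rule


class PerformanceOrderBookMachine(OrderBookMachine):
    """Test performance characteristics."""

    @rule(order=limit_orders())
    def add_order_performance(self, order):
        assume(order.id not in self.orders_sent)
        self.orders_sent[order.id] = order  # Keep the shadow state in sync

        start = time.perf_counter()
        trades = self.book.add_order(order)
        duration = time.perf_counter() - start
        self.volume_traded += sum(t.quantity for t in trades)

        # Assert order processing takes < 1ms (noisy on shared CI runners)
        assert duration < 0.001, f"Order processing too slow: {duration * 1000:.2f}ms"

    @invariant()
    def check_book_size_performance(self):
        """Ensure book doesn't grow unbounded."""
        num_orders = len(self.book.get_all_orders())

        # Book should always hold fewer than 10,000 resting orders
        assert num_orders < 10_000, f"Order book too large: {num_orders} orders"


# @settings on an individual rule does not configure a state machine; attach
# settings to the TestCase. deadline=None because the rule does its own timing.
TestPerformance = PerformanceOrderBookMachine.TestCase
TestPerformance.settings = settings(max_examples=1000, phases=[Phase.generate], deadline=None)
```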
7. CI/CD Integration

Running property-based tests in CI requires some configuration to balance thoroughness with build time.

pytest Configuration

```python
# conftest.py
import os

from hypothesis import Verbosity, settings

# Default settings for local development
settings.register_profile("dev", max_examples=100, verbosity=Verbosity.verbose)

# Thorough settings for CI
settings.register_profile("ci", max_examples=1000, verbosity=Verbosity.normal)

# Quick settings for pre-commit hooks
settings.register_profile("quick", max_examples=10)

# Select a profile with the HYPOTHESIS_PROFILE environment variable (default: dev)
settings.load_profile(os.getenv("HYPOTHESIS_PROFILE", "dev"))
```

GitHub Actions Workflow

```yaml
# .github/workflows/test.yml
name: Tests

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v3

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.11'

      - name: Install dependencies
        run: |
          pip install pytest hypothesis pytest-cov

      - name: Run property-based tests
        env:
          HYPOTHESIS_PROFILE: ci
        run: |
          pytest tests/ -v --cov=src --cov-report=xml

      - name: Upload coverage
        uses: codecov/codecov-action@v3
```

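Storing Examples

Hypothesis saves failing examples to its example database and replays them first on the next run, so a regression it has already found is retried before any new random inputs. By default the database lives in .hypothesis/examples/ under the working directory; the location can be made explicit with the database setting in a profile:

```python
# conftest.py (the "ci" profile from above, with an explicit example database)
from hypothesis import Verbosity, settings
from hypothesis.database import DirectoryBasedExampleDatabase

settings.register_profile(
    "ci",
    max_examples=1000,
    verbosity=Verbosity.normal,
    # Same location Hypothesis uses by default, made explicit
    database=DirectoryBasedExampleDatabase(".hypothesis/examples"),
)
```

Commit .hypothesis/examples/ to version control to share failing examples across the team, and ignore the rest of .hypothesis/, which holds local caches. The examples directory is designed to handle merges.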

8. Comparison: Hypothesis vs QuickCheck

| Feature | Hypothesis (Python) | QuickCheck (Haskell) |
|---|---|---|
| Shrinking | Automatic | Per type (shrink in Arbitrary instances) |
| Stateful Testing | RuleBasedStateMachine | quickcheck-state-machine |
| Custom Generators | @st.composite | Arbitrary instances |
| Performance (rough, workload-dependent) | ~1,000 examples/sec | ~10,000 examples/sec |
| Type Safety | Runtime | Compile-time |
| Integration | pytest, unittest | Tasty, Hspec |
| Learning Curve | Moderate | Steep (requires Haskell) |

When to use Hypothesis:

  • Python codebase
  • Need integration with pytest
  • Team familiar with Python

When to use QuickCheck:

  • Haskell codebase
  • Want compile-time guarantees
  • Performance-critical testing

9. Advanced Strategies: Shrinking

When a failure occurs, Hypothesis "shrinks" the input. It doesn't just tell you "it failed after 1000 steps." It simplifies the sequence:

  • "Remove step 5 (it didn't affect the bug)"
  • "Reduce price from 9283 to 1"
  • "Reduce quantity from 500 to 1"

It gives you a shrunk sequence that still triggers the bug: not guaranteed to be the simplest possible one, but usually very close.

Example:

```python
from collections import Counter

from hypothesis import example, given
from hypothesis import strategies as st


@given(st.lists(st.integers()))
@example([])   # Always test empty list
@example([0])  # Always test single element
def test_sorted_list(xs):
    sorted_xs = sorted(xs)

    # Property 1: sorted list should be in order
    for i in range(len(sorted_xs) - 1):
        assert sorted_xs[i] <= sorted_xs[i + 1]

    # Property 2: sorting must not add or drop elements
    assert Counter(sorted_xs) == Counter(xs)
```

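Because Python's sorted is correct, this particular test passes. Point the same properties at your own sorting routine, and if it has a bug, Hypothesis will shrink the failing input to a minimal list.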
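To watch shrinking on a property that does fail, this sketch uses hypothesis.find(), which searches for an input satisfying a condition and shrinks it before returning. The allocate_naive() helper is hypothetical and deliberately buggy: integer division silently drops the remainder of a fill.

```python
from typing import List, Tuple

from hypothesis import find
from hypothesis import strategies as st


def allocate_naive(total: int, n_accounts: int) -> List[int]:
    """Hypothetical buggy allocator: integer division drops the remainder."""
    return [total // n_accounts] * n_accounts


def loses_quantity(case: Tuple[int, int]) -> bool:
    total, n_accounts = case
    return sum(allocate_naive(total, n_accounts)) != total


cases = st.tuples(
    st.integers(min_value=1, max_value=1_000_000),  # Quantity to allocate
    st.integers(min_value=1, max_value=50),         # Number of accounts
)

# find() returns the shrunk counterexample, not the first random one it hit
counterexample = find(cases, loses_quantity)
print(counterexample)
```

The printed pair is already shrunk toward small values, so the failure can be read off at a glance instead of being buried in a seven-digit quantity.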

Conclusion

In fintech, you cannot rely on "happy path" testing. The edge cases (race conditions, partial fills, complex state transitions) are where the money is lost.

By defining Invariants and using Stateful Property-Based Testing, you move from "checking examples" to systematically searching for counterexamples. That is not a proof of correctness, but it explores far more of the state space than hand-written cases can.

Key Takeaways

  • Use st.composite to generate valid domain objects.
  • Use RuleBasedStateMachine for stateful systems like Order Books.
  • Define invariants (Conservation of Volume, Crossed Book, Margin Requirements) that must always be true.
  • Let the fuzzer find the edge cases for you.
  • Integrate with CI/CD using profiles for different thoroughness levels.
  • Use performance testing to catch regressions.
  • Store failing examples in version control.

Production Checklist:

  • All critical invariants have property-based tests
  • Stateful testing covers main state transitions
  • CI runs with max_examples >= 1000
  • Failing examples are committed to .hypothesis/examples/
  • Performance bounds are tested
  • Team understands how to read shrunk examples

Property-based testing is an investment that pays dividends every time it catches a bug before production.
