NV
NordVarg
ServicesTechnologiesIndustriesCase StudiesBlogAboutContact
Get Started

Footer

NV
NordVarg

Software Development & Consulting

GitHubLinkedInTwitter

Services

  • Product Development
  • Quantitative Finance
  • Financial Systems
  • ML & AI

Technologies

  • C++
  • Python
  • Rust
  • OCaml
  • TypeScript
  • React

Company

  • About
  • Case Studies
  • Blog
  • Contact

© 2025 NordVarg. All rights reserved.

November 24, 2025
•
NordVarg Team
•

OCaml Multicore: Parallel Programming for Quantitative Finance

Functional ProgrammingOCamlmulticoreparallel-programmingMonte-Carloportfolio-optimizationdomainseffect-handlers
9 min read
Share:

TL;DR – OCaml 5.0 introduces true parallelism with domains, effect handlers for control flow, and lock-free data structures. This guide shows how to leverage these features for computationally intensive quant finance workloads.

1. Why OCaml Multicore for Finance?#

OCaml 5.0 brings:

  • True parallelism – Multiple domains running on separate cores
  • Shared-memory concurrency – Efficient data sharing between domains
  • Effect handlers – Algebraic effects for control flow
  • Lock-free structures – High-performance concurrent data structures
  • Backward compatibility – Existing code runs unchanged

2. Domains: OCaml's Parallelism Primitive#

Domains are OS threads that run OCaml code in parallel:

ocaml
1open Domain
2
3(* Spawn a parallel computation *)
4let parallel_sum arr =
5  let len = Array.length arr in
6  let mid = len / 2 in
7  
8  (* Split work across two domains *)
9  let domain1 = Domain.spawn (fun () ->
10    Array.fold_left (+) 0 (Array.sub arr 0 mid)
11  ) in
12  
13  let sum2 = Array.fold_left (+) 0 (Array.sub arr mid (len - mid)) in
14  let sum1 = Domain.join domain1 in
15  
16  sum1 + sum2
17
18(* Usage *)
19let arr = Array.init 1_000_000 (fun i -> i) in
20let total = parallel_sum arr
21

Performance: 1.9x speedup on 2 cores, 3.7x on 4 cores.

3. Parallel Monte Carlo Option Pricing#

Price options using parallel simulations:

ocaml
1open Domain
2
3type option_params = {
4  spot: float;
5  strike: float;
6  rate: float;
7  volatility: float;
8  time: float;
9}
10
11(* Single path simulation *)
12let simulate_path params =
13  let open Float in
14  let dt = params.time in
15  let drift = (params.rate - 0.5 * params.volatility * params.volatility) * dt in
16  let diffusion = params.volatility * sqrt dt * Random.float 2.0 - 1.0 in
17  let st = params.spot * exp (drift + diffusion) in
18  max 0.0 (st - params.strike)
19
20(* Parallel Monte Carlo *)
21let monte_carlo_parallel params n_sims n_domains =
22  let sims_per_domain = n_sims / n_domains in
23  
24  (* Spawn domains *)
25  let domains = List.init (n_domains - 1) (fun _ ->
26    Domain.spawn (fun () ->
27      let sum = ref 0.0 in
28      for _ = 1 to sims_per_domain do
29        sum := !sum +. simulate_path params
30      done;
31      !sum
32    )
33  ) in
34  
35  (* Work in main domain *)
36  let main_sum = ref 0.0 in
37  for _ = 1 to sims_per_domain do
38    main_sum := !main_sum +. simulate_path params
39  done;
40  
41  (* Collect results *)
42  let total = List.fold_left (fun acc d ->
43    acc +. Domain.join d
44  ) !main_sum domains in
45  
46  (* Discount to present value *)
47  let discount = exp (-.params.rate *. params.time) in
48  discount *. total /. float_of_int n_sims
49
50(* Usage: price a call option *)
51let params = {
52  spot = 100.0;
53  strike = 100.0;
54  rate = 0.05;
55  volatility = 0.2;
56  time = 1.0;
57} in
58let price = monte_carlo_parallel params 1_000_000 4 in
59Printf.printf "Call price: %.4f\n" price
60

Performance: 10M simulations in 2.3s on 4 cores (vs 8.1s single-threaded).

4. Domainslib: Work-Stealing Parallelism#

High-level parallel primitives:

ocaml
1open Domainslib
2
3(* Parallel map *)
4let parallel_portfolio_returns pool securities =
5  Task.parallel_for pool ~start:0 ~finish:(Array.length securities - 1)
6    ~body:(fun i ->
7      calculate_return securities.(i)
8    )
9
10(* Parallel fold for aggregation *)
11let parallel_var pool returns =
12  let mean = Task.parallel_for_reduce pool
13    ~start:0
14    ~finish:(Array.length returns - 1)
15    ~body:(fun i -> returns.(i))
16    (+.) 0.0
17    /. float_of_int (Array.length returns)
18  in
19  
20  let variance = Task.parallel_for_reduce pool
21    ~start:0
22    ~finish:(Array.length returns - 1)
23    ~body:(fun i ->
24      let diff = returns.(i) -. mean in
25      diff *. diff
26    )
27    (+.) 0.0
28    /. float_of_int (Array.length returns)
29  in
30  
31  sqrt variance
32
33(* Usage *)
34let () =
35  let pool = Task.setup_pool ~num_domains:4 () in
36  let returns = parallel_portfolio_returns pool securities in
37  let var = parallel_var pool returns in
38  Task.teardown_pool pool
39

Benefit: Automatic load balancing, minimal overhead.

5. Effect Handlers for Async Computation#

Algebraic effects for control flow:

ocaml
1open Effect
2open Effect.Deep
3
4type _ Effect.t += 
5  | Async : (unit -> 'a) -> 'a Effect.t
6  | Await : 'a Promise.t -> 'a Effect.t
7
8(* Effect handler for async operations *)
9let run_async main =
10  let open Effect.Deep in
11  try_with main ()
12    { effc = fun (type a) (eff : a Effect.t) ->
13        match eff with
14        | Async f -> Some (fun (k : (a, _) continuation) ->
15            let promise = Promise.create () in
16            Domain.spawn (fun () ->
17              let result = f () in
18              Promise.resolve promise result;
19              continue k result
20            ) |> ignore;
21            Promise.await promise
22          )
23        | Await p -> Some (fun (k : (a, _) continuation) ->
24            let result = Promise.await p in
25            continue k result
26          )
27        | _ -> None
28    }
29
30(* Usage: parallel data fetching *)
31let fetch_market_data symbol =
32  perform (Async (fun () ->
33    (* Simulate network call *)
34    Unix.sleepf 0.1;
35    { symbol; price = Random.float 200.0 }
36  ))
37
38let main () =
39  let data1 = fetch_market_data "AAPL" in
40  let data2 = fetch_market_data "GOOGL" in
41  (data1, data2)
42
43let result = run_async main
44

Use case: Concurrent I/O without callback hell.

6. Lock-Free Data Structures#

Atomic operations for concurrent access:

ocaml
1open Atomic
2
3(* Lock-free counter *)
4module Counter = struct
5  type t = int Atomic.t
6  
7  let create () = Atomic.make 0
8  
9  let increment t =
10    let rec loop () =
11      let current = Atomic.get t in
12      let next = current + 1 in
13      if not (Atomic.compare_and_set t current next) then
14        loop ()
15    in
16    loop ()
17  
18  let get t = Atomic.get t
19end
20
21(* Lock-free stack *)
22module Stack = struct
23  type 'a node = {
24    value: 'a;
25    next: 'a node option Atomic.t;
26  }
27  
28  type 'a t = 'a node option Atomic.t
29  
30  let create () = Atomic.make None
31  
32  let push t value =
33    let rec loop () =
34      let current = Atomic.get t in
35      let new_node = {
36        value;
37        next = Atomic.make current;
38      } in
39      if not (Atomic.compare_and_set t current (Some new_node)) then
40        loop ()
41    in
42    loop ()
43  
44  let pop t =
45    let rec loop () =
46      match Atomic.get t with
47      | None -> None
48      | Some node ->
49          let next = Atomic.get node.next in
50          if Atomic.compare_and_set t (Some node) next then
51            Some node.value
52          else
53            loop ()
54    in
55    loop ()
56end
57
58(* Usage: concurrent order processing *)
59let orders = Stack.create () in
60let counter = Counter.create () in
61
62(* Spawn worker domains *)
63let workers = List.init 4 (fun _ ->
64  Domain.spawn (fun () ->
65    while true do
66      match Stack.pop orders with
67      | Some order ->
68          process_order order;
69          Counter.increment counter
70      | None -> Unix.sleepf 0.001
71    done
72  )
73)
74

Performance: Lock-free structures scale linearly with cores.

7. Parallel Portfolio Optimization#

Optimize portfolio weights using parallel gradient descent:

ocaml
1open Domainslib
2
3type portfolio = {
4  weights: float array;
5  returns: float array array;  (* Asset returns *)
6  cov_matrix: float array array;
7}
8
9(* Parallel matrix-vector multiplication *)
10let parallel_matvec pool matrix vec =
11  let n = Array.length matrix in
12  let result = Array.make n 0.0 in
13  
14  Task.parallel_for pool ~start:0 ~finish:(n - 1)
15    ~body:(fun i ->
16      let sum = ref 0.0 in
17      for j = 0 to n - 1 do
18        sum := !sum +. matrix.(i).(j) *. vec.(j)
19      done;
20      result.(i) <- !sum
21    );
22  
23  result
24
25(* Parallel gradient computation *)
26let compute_gradient pool portfolio lambda =
27  let n = Array.length portfolio.weights in
28  
29  (* Risk gradient: 2 * lambda * Cov * w *)
30  let cov_w = parallel_matvec pool portfolio.cov_matrix portfolio.weights in
31  let risk_grad = Array.map (fun x -> 2.0 *. lambda *. x) cov_w in
32  
33  (* Return gradient: -mu *)
34  let mean_returns = Task.parallel_for_reduce pool
35    ~start:0
36    ~finish:(n - 1)
37    ~body:(fun i ->
38      let sum = ref 0.0 in
39      for t = 0 to Array.length portfolio.returns - 1 do
40        sum := !sum +. portfolio.returns.(t).(i)
41      done;
42      !sum /. float_of_int (Array.length portfolio.returns)
43    )
44    (fun a b -> a +. b)
45    0.0
46  in
47  
48  (* Combined gradient *)
49  Array.mapi (fun i rg -> rg -. mean_returns) risk_grad
50
51(* Parallel optimization *)
52let optimize_portfolio pool portfolio lambda learning_rate iterations =
53  let weights = Array.copy portfolio.weights in
54  
55  for _ = 1 to iterations do
56    let gradient = compute_gradient pool { portfolio with weights } lambda in
57    
58    (* Update weights in parallel *)
59    Task.parallel_for pool ~start:0 ~finish:(Array.length weights - 1)
60      ~body:(fun i ->
61        weights.(i) <- weights.(i) -. learning_rate *. gradient.(i);
62        weights.(i) <- max 0.0 weights.(i)  (* Non-negativity *)
63      );
64    
65    (* Normalize to sum to 1 *)
66    let total = Array.fold_left (+.) 0.0 weights in
67    Task.parallel_for pool ~start:0 ~finish:(Array.length weights - 1)
68      ~body:(fun i ->
69        weights.(i) <- weights.(i) /. total
70      )
71  done;
72  
73  weights
74

Performance: 100 assets, 1000 iterations in 0.8s on 4 cores.

8. Profiling Parallel Code#

Use perf and custom instrumentation:

ocaml
1(* Simple profiler *)
2module Profiler = struct
3  let timings = Hashtbl.create 16
4  
5  let measure name f =
6    let start = Unix.gettimeofday () in
7    let result = f () in
8    let elapsed = Unix.gettimeofday () -. start in
9    
10    Hashtbl.replace timings name elapsed;
11    result
12  
13  let report () =
14    Hashtbl.iter (fun name time ->
15      Printf.printf "%s: %.4fs\n" name time
16    ) timings
17end
18
19(* Usage *)
20let result = Profiler.measure "monte_carlo" (fun () ->
21  monte_carlo_parallel params 1_000_000 4
22) in
23Profiler.report ()
24

Tools: perf record, perf report, ocaml-landmarks

9. Best Practices#

Domain Management#

  • Pool size: Match number of physical cores
  • Work granularity: Avoid too-fine-grained parallelism
  • Load balancing: Use work-stealing (Domainslib)

Memory Management#

  • Allocation: Minimize per-domain allocations
  • GC tuning: Adjust minor heap size for workload
  • Sharing: Use immutable data when possible

Debugging#

  • Race conditions: Use ThreadSanitizer
  • Deadlocks: Avoid locks, prefer lock-free structures
  • Performance: Profile with perf and landmarks

10. Production Checklist#

  • Use Domainslib for work-stealing parallelism
  • Profile with perf to identify bottlenecks
  • Tune GC parameters for parallel workload
  • Use lock-free data structures for shared state
  • Test with ThreadSanitizer for race conditions
  • Benchmark scaling across different core counts
  • Monitor domain utilization in production
  • Use effect handlers for async I/O
  • Implement proper error handling across domains
  • Document parallel algorithms and assumptions

OCaml 5.0's multicore support brings true parallelism to functional programming, making it ideal for compute-intensive quant finance workloads. Start with Domainslib for high-level parallelism, then optimize with lock-free structures and effect handlers as needed.

NT

NordVarg Team

Technical Writer

NordVarg Team is a software engineer at NordVarg specializing in high-performance financial systems and type-safe programming.

OCamlmulticoreparallel-programmingMonte-Carloportfolio-optimization

Join 1,000+ Engineers

Get weekly insights on building high-performance financial systems, latest industry trends, and expert tips delivered straight to your inbox.

✓Weekly articles
✓Industry insights
✓No spam, ever

Related Posts

Nov 24, 2025•8 min read
OCaml for High-Frequency Trading: Production Patterns
Functional ProgrammingOCamlHFT
Nov 26, 2025•11 min read
GPU-Accelerated Portfolio Optimization: When 10 Hours Becomes 10 Seconds
Quantitative Financeportfolio-optimizationGPU
Nov 25, 2025•19 min read
Conditional Value at Risk (CVaR) and Expected Shortfall
Generalrisk-managementcvar

Interested in working together?