
November 5, 2024 • NordVarg Engineering Team

Zero-Copy Optimization in Rust: Building High-Performance Network Services

A deep dive into zero-copy techniques in Rust for building ultra-low-latency network services, with practical examples from real-world trading systems.

Tags: Performance Engineering, Rust, Performance, Network Programming, Zero-Copy, Low Latency

When building high-frequency trading systems or real-time data processing pipelines, every microsecond counts. One of the most significant performance optimizations you can make is eliminating unnecessary memory copies. In this article, we'll explore zero-copy techniques in Rust and how they can dramatically improve your network service performance.

The Cost of Copying Data

In traditional network programming, data often gets copied multiple times as it moves through the system:

  1. Network card → kernel buffer (DMA, performed by the NIC rather than the CPU)
  2. Kernel buffer → user-space buffer (copied by the kernel during the read/recv system call)
  3. User-space buffer → application buffer (memcpy)
  4. Application buffer → processing buffer (another memcpy)

Each CPU copy (steps 2 to 4) consumes cycles and memory bandwidth and evicts useful data from the CPU caches. In a high-frequency trading system processing millions of messages per second, these copies add latency to every single message.

Understanding Zero-Copy

Zero-copy refers to techniques that eliminate or minimize data copying between buffers. The goal is to have data move directly from the network interface to your application logic with minimal intermediate copies.
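In Rust, the difference between copying and zero-copy often comes down to returning an owned buffer versus a borrowed slice. A minimal sketch (the function names are illustrative, not from any library):

```rust
/// Copying approach: allocates a new Vec and memcpy's the payload into it.
fn payload_copy(frame: &[u8], header_len: usize) -> Vec<u8> {
    frame[header_len..].to_vec()
}

/// Zero-copy approach: returns a borrowed view into the original frame.
/// The borrow checker ties the returned slice's lifetime to `frame`.
fn payload_view(frame: &[u8], header_len: usize) -> &[u8] {
    &frame[header_len..]
}
```

The borrowed version costs only a pointer and a length, but the compiler will not let it outlive `frame`. That constraint is what the parsers later in this article rely on.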

Common Zero-Copy Techniques

1. Memory Mapping (mmap)

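Memory mapping lets a file (for example, a recorded market data capture) be read straight from the kernel's page cache, with no read() copy into a user-space buffer:

```rust
use memmap2::{Mmap, MmapOptions};
use std::fs::File;
use std::io;

fn memory_mapped_file(path: &str) -> io::Result<Mmap> {
    let file = File::open(path)?;
    // Safety: the mapping is only sound while no other process truncates or
    // writes to the file; memmap2 cannot enforce that, hence `unsafe`.
    let mmap = unsafe { MmapOptions::new().map(&file)? };
    // Dropping `file` here is fine: the mapping stays valid after the
    // descriptor is closed.
    Ok(mmap)
}
```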

2. Scatter-Gather I/O

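Vectored I/O (readv) fills several buffers from one system call, so data lands directly in separate buffers instead of being read into one buffer and split with memcpy:

```rust
use std::io::{self, IoSliceMut, Read};
use std::os::unix::net::UnixStream;

// `mut` on the binding is needed because `Read` is implemented for
// `&UnixStream`, and `read_vectored` takes `&mut self`.
fn scatter_read(mut stream: &UnixStream) -> io::Result<usize> {
    let mut buf1 = [0u8; 1024];
    let mut buf2 = [0u8; 2048];
    let mut buf3 = [0u8; 512];

    let mut bufs = [
        IoSliceMut::new(&mut buf1),
        IoSliceMut::new(&mut buf2),
        IoSliceMut::new(&mut buf3),
    ];

    // The kernel fills buf1, then buf2, then buf3; the return value is the
    // total number of bytes read across all three.
    let n = stream.read_vectored(&mut bufs)?;
    Ok(n)
}
```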
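As a usage sketch, a vectored read can split a fixed-size header from its payload in one call. The 4-byte header length and the helper name are hypothetical, chosen only for illustration:

```rust
use std::io::{self, IoSliceMut, Read};
use std::os::unix::net::UnixStream;

/// Hypothetical fixed header length, for illustration only.
const HEADER_LEN: usize = 4;

/// Read a header and a payload into two separate buffers with one readv(2)
/// call. The kernel fills `header` completely before writing into `payload`,
/// so nothing has to be copied to separate them afterwards.
fn read_header_and_payload(
    mut stream: &UnixStream,
    header: &mut [u8; HEADER_LEN],
    payload: &mut [u8],
) -> io::Result<usize> {
    let mut bufs = [IoSliceMut::new(header), IoSliceMut::new(payload)];
    stream.read_vectored(&mut bufs)
}
```

On a stream socket a single read may return fewer bytes than requested, so a production reader would loop until at least the header is complete.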

Real-World Example: Zero-Copy Market Data Parser

Let's build a practical example: a market data parser that processes binary market data messages without copying.

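Message Layout

First, define the message structures with #[repr(C, packed)]: repr(C) keeps the declared field order, and packed removes compiler-inserted padding, so each struct matches the wire format byte for byte: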

```rust
use std::mem;

/// Common header at the start of every message (19 bytes when packed).
#[repr(C, packed)]
#[derive(Debug, Copy, Clone)]
struct MarketDataHeader {
    msg_type: u8,
    msg_len: u16,
    sequence: u64,
    timestamp: u64,
}

/// Order update: the header followed by the order body (43 bytes when packed).
#[repr(C, packed)]
#[derive(Debug, Copy, Clone)]
struct OrderUpdate {
    header: MarketDataHeader,
    order_id: u64,
    price: i64,      // Fixed-point: price * 10000
    quantity: u32,
    side: u8,        // 0 = buy, 1 = sell
    _padding: [u8; 3],
}

// Compile-time size check. `packed` removes all implicit padding, so the size
// is the sum of the fields: 19 (header) + 8 + 8 + 4 + 1 + 3 = 43 bytes.
const _: () = assert!(mem::size_of::<OrderUpdate>() == 43);
```
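Because `packed` removes all implicit padding, the fields sit back to back: the header occupies 19 bytes and a full order update 43. A stricter sketch pins every field offset at compile time with `std::mem::offset_of!` (stable since Rust 1.77), so an accidental edit to a struct breaks the build rather than the parser. The expected offsets assume the wire format uses exactly the field order shown:

```rust
use std::mem::{offset_of, size_of};

#[repr(C, packed)]
#[derive(Debug, Copy, Clone)]
struct MarketDataHeader {
    msg_type: u8,
    msg_len: u16,
    sequence: u64,
    timestamp: u64,
}

#[repr(C, packed)]
#[derive(Debug, Copy, Clone)]
struct OrderUpdate {
    header: MarketDataHeader,
    order_id: u64,
    price: i64,
    quantity: u32,
    side: u8,
    _padding: [u8; 3],
}

// Each assertion is evaluated by the compiler; a mismatch fails the build.
const _: () = {
    assert!(offset_of!(MarketDataHeader, msg_len) == 1);
    assert!(offset_of!(MarketDataHeader, sequence) == 3);
    assert!(offset_of!(MarketDataHeader, timestamp) == 11);
    assert!(size_of::<MarketDataHeader>() == 19);
    assert!(offset_of!(OrderUpdate, order_id) == 19);
    assert!(offset_of!(OrderUpdate, price) == 27);
    assert!(offset_of!(OrderUpdate, quantity) == 35);
    assert!(offset_of!(OrderUpdate, side) == 39);
    assert!(size_of::<OrderUpdate>() == 43);
};
```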

Zero-Copy Message Parser

Now implement a parser that interprets the bytes in-place without copying:

```rust
struct MessageParser<'a> {
    buffer: &'a [u8],
    offset: usize,
}

impl<'a> MessageParser<'a> {
    fn new(buffer: &'a [u8]) -> Self {
        Self { buffer, offset: 0 }
    }

    /// Bytes still available from the current offset (empty if past the end).
    fn remaining(&self) -> &'a [u8] {
        self.buffer.get(self.offset..).unwrap_or(&[])
    }

    /// Parse header without copying
    fn parse_header(&self) -> Option<&'a MarketDataHeader> {
        let bytes = self.remaining();
        if bytes.len() < std::mem::size_of::<MarketDataHeader>() {
            return None;
        }
        let ptr = bytes.as_ptr() as *const MarketDataHeader;
        // Safety: the length check guarantees enough bytes; repr(C, packed)
        // gives the type an alignment of 1, so any address is suitably
        // aligned; and every bit pattern is a valid MarketDataHeader.
        Some(unsafe { &*ptr })
    }

    /// Parse order update without copying
    fn parse_order(&self) -> Option<&'a OrderUpdate> {
        let bytes = self.remaining();
        if bytes.len() < std::mem::size_of::<OrderUpdate>() {
            return None;
        }
        let ptr = bytes.as_ptr() as *const OrderUpdate;
        // Safety: same argument as in parse_header.
        Some(unsafe { &*ptr })
    }

    fn advance(&mut self, bytes: usize) {
        self.offset += bytes;
    }
}
```

Using the Parser

```rust
fn process_market_data(buffer: &[u8]) -> Result<(), String> {
    let mut parser = MessageParser::new(buffer);

    while parser.offset < buffer.len() {
        let header = parser.parse_header()
            .ok_or("Failed to parse header")?;

        match header.msg_type {
            1 => {
                // Order update message
                let order = parser.parse_order()
                    .ok_or("Failed to parse order")?;

                // Process the order directly from the receive buffer
                handle_order_update(order);

                parser.advance(std::mem::size_of::<OrderUpdate>());
            }
            _ => {
                // Skip unknown message types; a zero length would loop forever
                let len = header.msg_len as usize;
                if len == 0 {
                    return Err("Zero-length message".to_string());
                }
                parser.advance(len);
            }
        }
    }

    Ok(())
}

fn handle_order_update(order: &OrderUpdate) {
    // Copy packed fields into locals before formatting: println! takes
    // references to its arguments, and the compiler rejects references to
    // (potentially unaligned) fields of a packed struct.
    let (order_id, quantity) = (order.order_id, order.quantity);
    let price = order.price as f64 / 10000.0;
    let side = if order.side == 0 { "BUY" } else { "SELL" };

    println!("Order {} {} {} @ {}", order_id, side, quantity, price);
}
```
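To exercise the parser without a live feed, a test can encode messages by hand and view them in place. This self-contained sketch repeats the struct definitions; `encode_order` and `view_order` are hypothetical helpers, and the encoder writes little-endian fields, which matches the in-place view only on a little-endian host such as x86-64:

```rust
use std::mem::size_of;

#[repr(C, packed)]
#[derive(Debug, Copy, Clone)]
struct MarketDataHeader {
    msg_type: u8,
    msg_len: u16,
    sequence: u64,
    timestamp: u64,
}

#[repr(C, packed)]
#[derive(Debug, Copy, Clone)]
struct OrderUpdate {
    header: MarketDataHeader,
    order_id: u64,
    price: i64, // fixed-point, price * 10000
    quantity: u32,
    side: u8,
    _padding: [u8; 3],
}

/// Hypothetical test-fixture encoder: appends one order update in
/// little-endian wire order (43 bytes).
fn encode_order(out: &mut Vec<u8>, seq: u64, order_id: u64, price: i64, qty: u32, side: u8) {
    out.push(1); // msg_type 1 = order update
    out.extend_from_slice(&(size_of::<OrderUpdate>() as u16).to_le_bytes());
    out.extend_from_slice(&seq.to_le_bytes());
    out.extend_from_slice(&0u64.to_le_bytes()); // timestamp (unused here)
    out.extend_from_slice(&order_id.to_le_bytes());
    out.extend_from_slice(&price.to_le_bytes());
    out.extend_from_slice(&qty.to_le_bytes());
    out.push(side);
    out.extend_from_slice(&[0u8; 3]); // padding
}

/// Zero-copy view of the order update starting at `offset`.
fn view_order(buf: &[u8], offset: usize) -> Option<&OrderUpdate> {
    let end = offset.checked_add(size_of::<OrderUpdate>())?;
    let bytes = buf.get(offset..end)?;
    // Safety: length checked above; the packed type has alignment 1 and
    // every bit pattern is a valid OrderUpdate.
    Some(unsafe { &*(bytes.as_ptr() as *const OrderUpdate) })
}
```

Encoding two orders back to back and calling `view_order(&buf, 43)` yields a reference whose address lies inside `buf`, which confirms that nothing was copied.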

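Advanced: io_uring for Lower-Overhead I/O

On Linux, io_uring reduces system call overhead: receive requests are queued on a submission ring shared with the kernel, and a single io_uring_enter call can submit many of them and reap their completions. A plain Recv still has the kernel copy data from the socket buffer into the user buffer; the saving comes from fewer system calls and user/kernel transitions, not from removing that copy.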

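```rust
use io_uring::{opcode, types, IoUring};
use std::io;
use std::net::TcpStream;
use std::os::unix::io::AsRawFd;

struct ZeroCopyReceiver {
    ring: IoUring,
    // Each inner Vec's heap storage stays at a fixed address, so the kernel
    // can write into it while a receive is in flight.
    buffers: Vec<Vec<u8>>,
}

impl ZeroCopyReceiver {
    fn new(buffer_count: usize, buffer_size: usize) -> io::Result<Self> {
        let ring = IoUring::new(256)?;
        let buffers = (0..buffer_count)
            .map(|_| vec![0u8; buffer_size])
            .collect();

        Ok(Self { ring, buffers })
    }

    /// Queue a receive into `buffers[buf_id]`; the id travels in user_data
    /// so the completion can be matched back to its buffer.
    fn submit_recv(&mut self, socket: &TcpStream, buf_id: usize) -> io::Result<()> {
        let fd = types::Fd(socket.as_raw_fd());
        let buf = &mut self.buffers[buf_id];

        let recv_op = opcode::Recv::new(fd, buf.as_mut_ptr(), buf.len() as u32)
            .build()
            .user_data(buf_id as u64);

        // Safety: the socket and buffer must stay alive, and the buffer must
        // not be reused, until this operation's completion has been reaped.
        unsafe {
            self.ring
                .submission()
                .push(&recv_op)
                .map_err(|_| io::Error::new(io::ErrorKind::Other, "submission queue full"))?;
        }

        Ok(())
    }

    /// Submit queued operations, wait for at least one completion, and
    /// return (buf_id, result) pairs; Ok(0) means the peer closed the socket.
    fn process_completions(&mut self) -> io::Result<Vec<(usize, io::Result<usize>)>> {
        self.ring.submit_and_wait(1)?;

        let results = self
            .ring
            .completion()
            .map(|cqe| {
                let buf_id = cqe.user_data() as usize;
                let res = cqe.result();
                // A negative result is -errno; otherwise it is the byte count.
                let outcome = if res < 0 {
                    Err(io::Error::from_raw_os_error(-res))
                } else {
                    Ok(res as usize)
                };
                (buf_id, outcome)
            })
            .collect();

        Ok(results)
    }
}
```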

Performance Comparison

In our high-frequency trading platform, we measured the impact of zero-copy optimizations:

| Technique | Latency (p50) | Latency (p99) | Throughput |
|---|---|---|---|
| Traditional (multiple copies) | 850μs | 1,200μs | 1.2M msg/s |
| Single copy with vectored I/O | 420μs | 680μs | 2.4M msg/s |
| Zero-copy parsing | 180μs | 320μs | 5.5M msg/s |
| io_uring zero-copy | 95μs | 180μs | 8.2M msg/s |
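The article does not state how these percentiles were computed. One common choice is the nearest-rank method, sketched here under that assumption; `samples` would hold per-message latencies in whatever unit was recorded:

```rust
/// Nearest-rank percentile: the smallest sample such that at least `p`
/// percent of all samples are less than or equal to it.
/// Returns None for an empty input or a `p` outside (0, 100].
fn percentile(samples: &[u64], p: f64) -> Option<u64> {
    if samples.is_empty() || !(p > 0.0 && p <= 100.0) {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_unstable();
    // 1-based rank = ceil(p / 100 * n); multiplying before dividing keeps
    // values like p = 99, n = 100 exact in floating point.
    let rank = (p * sorted.len() as f64 / 100.0).ceil() as usize;
    Some(sorted[rank.max(1) - 1])
}
```

Sorting a copy of every sample is fine for offline analysis of a benchmark run; it is not something to do on the hot path.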

Safety Considerations

Zero-copy techniques often involve unsafe code. Key safety rules:

  1. Alignment: Ensure data is properly aligned for the target type
  2. Lifetime: The parsed reference must not outlive the source buffer
  3. Endianness: Be explicit about byte order in binary protocols
  4. Validation: Always validate buffer sizes before casting

```rust
// Safe wrapper that enforces invariants
pub struct SafeMessageRef<'a> {
    data: &'a OrderUpdate,
}

impl<'a> SafeMessageRef<'a> {
    pub fn new(buffer: &'a [u8], offset: usize) -> Option<Self> {
        // Validate size (get() also rejects an out-of-range offset)
        let end = offset.checked_add(std::mem::size_of::<OrderUpdate>())?;
        let bytes = buffer.get(offset..end)?;

        // Validate alignment. For the packed OrderUpdate, align_of is 1 and
        // this always passes; it matters if the type is ever un-packed.
        let ptr = bytes.as_ptr();
        if ptr.align_offset(std::mem::align_of::<OrderUpdate>()) != 0 {
            return None;
        }

        // Safety: size and alignment checked above; every bit pattern is a
        // valid OrderUpdate; the 'a lifetime ties the view to `buffer`.
        let data = unsafe { &*(ptr as *const OrderUpdate) };

        Some(Self { data })
    }

    pub fn price(&self) -> f64 {
        self.data.price as f64 / 10000.0
    }

    pub fn quantity(&self) -> u32 {
        self.data.quantity
    }
}
```
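Where the `unsafe` cast is not worth the audit burden, the same fields can be decoded straight from the borrowed buffer with explicit little-endian conversions, which also satisfies the endianness rule above. This sketch assumes little-endian wire encoding and the packed `OrderUpdate` offsets (19-byte header, 43-byte message); `OrderFields` and `decode_order` are hypothetical names:

```rust
use std::convert::TryInto;

/// Decoded copy of the order fields a handler needs.
#[derive(Debug, PartialEq)]
struct OrderFields {
    order_id: u64,
    price: i64, // fixed-point, price * 10000
    quantity: u32,
    side: u8,
}

// Assumed wire offsets, matching the packed layout.
const ORDER_ID_AT: usize = 19;
const PRICE_AT: usize = 27;
const QTY_AT: usize = 35;
const SIDE_AT: usize = 39;
const ORDER_LEN: usize = 43;

/// Decode fields directly from the borrowed buffer: no unsafe, no alignment
/// requirement, and no copy of the message into a separate buffer.
fn decode_order(buf: &[u8]) -> Option<OrderFields> {
    let msg = buf.get(..ORDER_LEN)?;
    // The slice lengths are fixed, so these conversions cannot fail.
    let arr8 = |at: usize| -> [u8; 8] { msg[at..at + 8].try_into().unwrap() };
    let arr4 = |at: usize| -> [u8; 4] { msg[at..at + 4].try_into().unwrap() };
    Some(OrderFields {
        order_id: u64::from_le_bytes(arr8(ORDER_ID_AT)),
        price: i64::from_le_bytes(arr8(PRICE_AT)),
        quantity: u32::from_le_bytes(arr4(QTY_AT)),
        side: msg[SIDE_AT],
    })
}
```

Only the individual fields are loaded, and on little-endian targets the optimizer typically reduces each `from_le_bytes` to a plain load.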

Best Practices

  1. Profile First: Measure before optimizing. Not every system benefits from zero-copy.
  2. Use Libraries: The bytes crate provides reference-counted buffers whose slices share the underlying memory instead of copying it.
  3. Buffer Pools: Reuse buffers to avoid allocation overhead on the hot path.
  4. SIMD: Combine zero-copy parsing with SIMD for maximum throughput.
  5. Cache Alignment: Align buffers to cache-line boundaries (64 bytes on most x86-64 CPUs).
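Buffer pools and cache alignment combine naturally. A minimal single-threaded sketch, assuming a 4 KiB buffer size and 64-byte cache lines (both illustrative; `BufferPool` and `AlignedBuf` are hypothetical names):

```rust
/// Illustrative buffer size.
const BUF_SIZE: usize = 4096;

/// A receive buffer whose start is aligned to a 64-byte cache line.
#[repr(C, align(64))]
struct AlignedBuf {
    data: [u8; BUF_SIZE],
}

/// Single-threaded pool: buffers are allocated up front and recycled, so
/// the hot path does not touch the allocator after warm-up.
struct BufferPool {
    free: Vec<Box<AlignedBuf>>,
}

impl BufferPool {
    fn new_buf() -> Box<AlignedBuf> {
        Box::new(AlignedBuf { data: [0; BUF_SIZE] })
    }

    fn with_capacity(n: usize) -> Self {
        let free = (0..n).map(|_| Self::new_buf()).collect();
        Self { free }
    }

    /// Take a buffer, allocating only if the pool is empty.
    fn get(&mut self) -> Box<AlignedBuf> {
        self.free.pop().unwrap_or_else(Self::new_buf)
    }

    /// Return a buffer for reuse; its contents are not cleared.
    fn put(&mut self, buf: Box<AlignedBuf>) {
        self.free.push(buf);
    }
}
```

A multi-threaded service would need per-thread or lock-free pools; this version only illustrates reuse and alignment.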

Conclusion

Zero-copy techniques are essential for building high-performance network services in Rust. In the measurements above, eliminating unnecessary data copies gave us:

  • Roughly 9x lower median latency (850μs → 95μs) and 6.7x lower p99 latency (1,200μs → 180μs)
  • About 6.8x higher throughput (1.2M → 8.2M msg/s)
  • Lower CPU utilization, leaving more headroom for business logic

The combination of Rust's memory safety guarantees and explicit control over data layout makes it an ideal choice for building ultra-low-latency systems.

In production trading systems, these optimizations can mean the difference between profitable and unprofitable trades. Every microsecond counts.
