
December 16, 2024 • NordVarg Team

Kernel Bypassing in Linux with C++ and Rust

Systems Programming · kernel-bypass · dpdk · io_uring · c++ · rust · low-latency · networking
12 min read

The Linux kernel network stack is excellent for general-purpose networking, but it introduces latency that's unacceptable for high-frequency trading. Kernel bypass techniques let us achieve sub-microsecond network latencies. This article covers DPDK and io_uring implementations in both C++ and Rust.

Why Bypass the Kernel?

Traditional socket I/O involves multiple expensive operations:

```plaintext
Application
    ↓ syscall (~300ns)
Kernel
    ↓ protocol processing (~2-5μs)
    ↓ copy to kernel buffer
Driver
    ↓ DMA, interrupts (~1-3μs)
NIC
```

With kernel bypass:

```plaintext
Application
    ↓ direct memory access (~50ns)
User-Space Driver (PMD)
    ↓ polling (~200ns)
NIC (via MMIO)
```

In our production systems, kernel bypass reduced P99 latency from 12μs to 1.8μs—a 6.7x improvement.

DPDK: Data Plane Development Kit

DPDK provides user-space poll-mode drivers (PMDs) for direct NIC access.

C++ DPDK Implementation

```cpp
#include <rte_eal.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>
#include <rte_mempool.h>
#include <arpa/inet.h>    // htons
#include <netinet/in.h>   // IPPROTO_UDP
#include <immintrin.h>    // _mm_pause
#include <cstring>
#include <iostream>
#include <stdexcept>

class DPDKPort {
private:
    uint16_t port_id_;
    struct rte_mempool* mbuf_pool_;

    static constexpr uint16_t RX_RING_SIZE = 2048;
    static constexpr uint16_t TX_RING_SIZE = 2048;
    static constexpr uint16_t NUM_MBUFS = 16384;
    static constexpr uint16_t MBUF_CACHE_SIZE = 512;

public:
    DPDKPort(uint16_t port_id, unsigned socket_id) : port_id_(port_id) {
        // Create packet buffer pool
        char pool_name[32];
        snprintf(pool_name, sizeof(pool_name), "mbuf_pool_%u", port_id);

        mbuf_pool_ = rte_pktmbuf_pool_create(
            pool_name,
            NUM_MBUFS,
            MBUF_CACHE_SIZE,
            0,
            RTE_MBUF_DEFAULT_BUF_SIZE,
            socket_id
        );

        if (!mbuf_pool_) {
            throw std::runtime_error("Failed to create mbuf pool");
        }

        // Configure port: RSS on RX, default TX mode
        struct rte_eth_conf port_conf = {};
        port_conf.rxmode.mq_mode = RTE_ETH_MQ_RX_RSS;
        port_conf.txmode.mq_mode = RTE_ETH_MQ_TX_NONE;

        if (rte_eth_dev_configure(port_id_, 1, 1, &port_conf) < 0) {
            throw std::runtime_error("Failed to configure port");
        }

        // Setup RX queue
        if (rte_eth_rx_queue_setup(port_id_, 0, RX_RING_SIZE,
                                   socket_id, nullptr, mbuf_pool_) < 0) {
            throw std::runtime_error("Failed to setup RX queue");
        }

        // Setup TX queue
        if (rte_eth_tx_queue_setup(port_id_, 0, TX_RING_SIZE,
                                   socket_id, nullptr) < 0) {
            throw std::runtime_error("Failed to setup TX queue");
        }

        // Start device
        if (rte_eth_dev_start(port_id_) < 0) {
            throw std::runtime_error("Failed to start port");
        }

        // Enable promiscuous mode
        rte_eth_promiscuous_enable(port_id_);

        // Get MAC address
        struct rte_ether_addr addr;
        rte_eth_macaddr_get(port_id_, &addr);

        char mac_str[32];
        snprintf(mac_str, sizeof(mac_str),
                 "%02X:%02X:%02X:%02X:%02X:%02X",
                 addr.addr_bytes[0], addr.addr_bytes[1],
                 addr.addr_bytes[2], addr.addr_bytes[3],
                 addr.addr_bytes[4], addr.addr_bytes[5]);

        std::cout << "Port " << port_id_ << " MAC: " << mac_str << std::endl;
    }

    ~DPDKPort() {
        if (port_id_ < RTE_MAX_ETHPORTS) {
            rte_eth_dev_stop(port_id_);
            rte_eth_dev_close(port_id_);
        }
    }

    // Receive burst of packets
    uint16_t receive_burst(struct rte_mbuf** bufs, uint16_t nb_bufs) {
        return rte_eth_rx_burst(port_id_, 0, bufs, nb_bufs);
    }

    // Send burst of packets
    uint16_t send_burst(struct rte_mbuf** bufs, uint16_t nb_bufs) {
        return rte_eth_tx_burst(port_id_, 0, bufs, nb_bufs);
    }

    // Allocate packet buffer
    struct rte_mbuf* allocate_mbuf() {
        return rte_pktmbuf_alloc(mbuf_pool_);
    }

    struct rte_mempool* get_mempool() { return mbuf_pool_; }
};

// Market data message layout
struct MarketDataMsg {
    uint64_t timestamp;
    uint32_t symbol_id;
    double price;
    uint32_t size;
} __attribute__((packed));

class MarketDataReceiver {
private:
    DPDKPort& port_;
    uint64_t packets_received_ = 0;
    uint64_t bytes_received_ = 0;

public:
    explicit MarketDataReceiver(DPDKPort& port) : port_(port) {}

    void poll_loop() {
        constexpr uint16_t BURST_SIZE = 64;
        struct rte_mbuf* bufs[BURST_SIZE];

        while (true) {
            // Receive burst
            uint16_t nb_rx = port_.receive_burst(bufs, BURST_SIZE);

            if (nb_rx == 0) {
                // No packets: pause briefly, keep polling
                _mm_pause();
                continue;
            }

            // Process each packet
            for (uint16_t i = 0; i < nb_rx; ++i) {
                process_packet(bufs[i]);
                rte_pktmbuf_free(bufs[i]);
            }

            packets_received_ += nb_rx;
        }
    }

private:
    void process_packet(struct rte_mbuf* mbuf) {
        uint8_t* data = rte_pktmbuf_mtod(mbuf, uint8_t*);
        uint32_t len = rte_pktmbuf_pkt_len(mbuf);

        bytes_received_ += len;

        // Skip Ethernet header (14 bytes)
        data += 14;
        len -= 14;

        // Skip IP header (assume 20 bytes; production code must honour IHL)
        data += 20;
        len -= 20;

        // Skip UDP header (8 bytes)
        data += 8;
        len -= 8;

        // Parse market data
        if (len >= sizeof(MarketDataMsg)) {
            auto* msg = reinterpret_cast<MarketDataMsg*>(data);
            handle_market_data(msg);
        }
    }

    void handle_market_data(const MarketDataMsg* msg) {
        uint64_t now = rdtsc();
        uint64_t latency = now - msg->timestamp;

        // Process market data update
        // In production: update order book, trigger strategies, etc.

        // Record latency
        record_latency(latency);
    }

    static uint64_t rdtsc() {
        uint32_t lo, hi;
        __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
        return (static_cast<uint64_t>(hi) << 32) | lo;
    }

    void record_latency(uint64_t cycles) {
        // Convert to nanoseconds and record
        // e.g. with a 2.4 GHz TSC: ns = cycles / 2.4
    }
};

// Zero-copy packet sender
class PacketSender {
private:
    DPDKPort& port_;

public:
    explicit PacketSender(DPDKPort& port) : port_(port) {}

    void send_order(uint64_t order_id, uint32_t symbol,
                    double price, uint32_t size) {
        struct rte_mbuf* mbuf = port_.allocate_mbuf();
        if (!mbuf) {
            return;  // Handle error
        }

        // Build packet
        uint8_t* data = rte_pktmbuf_mtod(mbuf, uint8_t*);

        // Ethernet header
        struct rte_ether_hdr* eth = reinterpret_cast<struct rte_ether_hdr*>(data);
        // Set source/dest MAC, ethertype
        eth->ether_type = htons(0x0800);  // IPv4

        data += sizeof(struct rte_ether_hdr);

        // IP header (simplified: no checksum, TTL, or addresses shown)
        struct rte_ipv4_hdr* iph = reinterpret_cast<struct rte_ipv4_hdr*>(data);
        iph->version_ihl = 0x45;
        iph->total_length = htons(sizeof(rte_ipv4_hdr) +
                                  sizeof(rte_udp_hdr) +
                                  sizeof(MarketDataMsg));
        iph->next_proto_id = IPPROTO_UDP;

        data += sizeof(struct rte_ipv4_hdr);

        // UDP header
        struct rte_udp_hdr* udph = reinterpret_cast<struct rte_udp_hdr*>(data);
        udph->src_port = htons(12345);
        udph->dst_port = htons(54321);
        udph->dgram_len = htons(sizeof(rte_udp_hdr) + sizeof(MarketDataMsg));

        data += sizeof(struct rte_udp_hdr);

        // Payload
        MarketDataMsg* msg = reinterpret_cast<MarketDataMsg*>(data);
        msg->timestamp = rdtsc();
        msg->symbol_id = symbol;
        msg->price = price;
        msg->size = size;

        // Set packet length
        mbuf->pkt_len = mbuf->data_len =
            sizeof(rte_ether_hdr) + sizeof(rte_ipv4_hdr) +
            sizeof(rte_udp_hdr) + sizeof(MarketDataMsg);

        // Send
        struct rte_mbuf* bufs[] = {mbuf};
        uint16_t sent = port_.send_burst(bufs, 1);

        if (sent == 0) {
            rte_pktmbuf_free(mbuf);
        }
    }

private:
    static uint64_t rdtsc() {
        uint32_t lo, hi;
        __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
        return (static_cast<uint64_t>(hi) << 32) | lo;
    }
};

// Main DPDK application
int main(int argc, char* argv[]) {
    // Initialize EAL
    int ret = rte_eal_init(argc, argv);
    if (ret < 0) {
        std::cerr << "EAL initialization failed\n";
        return -1;
    }

    // Check available ports
    uint16_t nb_ports = rte_eth_dev_count_avail();
    if (nb_ports == 0) {
        std::cerr << "No Ethernet ports available\n";
        return -1;
    }

    std::cout << "Available ports: " << nb_ports << std::endl;

    // Initialize port 0
    unsigned socket_id = rte_eth_dev_socket_id(0);
    DPDKPort port(0, socket_id);

    // Start receiver
    MarketDataReceiver receiver(port);

    std::cout << "Starting packet receiver...\n";
    receiver.poll_loop();

    return 0;
}
```

Rust DPDK Implementation

```rust
use std::ffi::CString;
use std::mem;
use std::ptr;

// Rust bindings to DPDK (using rust-dpdk or manual FFI)
#[repr(C)]
struct RteMbuf {
    // Simplified mbuf structure
    buf_addr: *mut u8,
    data_off: u16,
    pkt_len: u32,
    data_len: u16,
    // ... other fields
}

struct DPDKPort {
    port_id: u16,
    mbuf_pool: *mut libc::c_void,
}

impl DPDKPort {
    fn new(port_id: u16) -> Result<Self, String> {
        unsafe {
            // Initialize EAL
            let args = vec![
                CString::new("app").unwrap(),
                CString::new("-l").unwrap(),
                CString::new("0-3").unwrap(),
                CString::new("-n").unwrap(),
                CString::new("4").unwrap(),
            ];

            let mut c_args: Vec<*mut i8> = args
                .iter()
                .map(|s| s.as_ptr() as *mut i8)
                .collect();

            let ret = rte_eal_init(c_args.len() as i32, c_args.as_mut_ptr());
            if ret < 0 {
                return Err("EAL init failed".to_string());
            }

            // Create mempool
            let pool_name = CString::new(format!("mbuf_pool_{}", port_id)).unwrap();
            let mbuf_pool = rte_pktmbuf_pool_create(
                pool_name.as_ptr(),
                8192,
                256,
                0,
                2048,
                rte_socket_id(),
            );

            if mbuf_pool.is_null() {
                return Err("Failed to create mbuf pool".to_string());
            }

            // Configure port
            let port_conf: RteEthConf = mem::zeroed();

            if rte_eth_dev_configure(port_id, 1, 1, &port_conf) < 0 {
                return Err("Failed to configure port".to_string());
            }

            // Setup RX queue
            if rte_eth_rx_queue_setup(
                port_id, 0, 1024,
                rte_eth_dev_socket_id(port_id),
                ptr::null(),
                mbuf_pool,
            ) < 0 {
                return Err("Failed to setup RX queue".to_string());
            }

            // Setup TX queue
            if rte_eth_tx_queue_setup(
                port_id, 0, 1024,
                rte_eth_dev_socket_id(port_id),
                ptr::null(),
            ) < 0 {
                return Err("Failed to setup TX queue".to_string());
            }

            // Start port
            if rte_eth_dev_start(port_id) < 0 {
                return Err("Failed to start port".to_string());
            }

            Ok(DPDKPort { port_id, mbuf_pool })
        }
    }

    fn receive_burst(&self, bufs: &mut [*mut RteMbuf]) -> u16 {
        unsafe {
            rte_eth_rx_burst(
                self.port_id,
                0,
                bufs.as_mut_ptr() as *mut *mut libc::c_void,
                bufs.len() as u16,
            )
        }
    }

    fn send_burst(&self, bufs: &[*mut RteMbuf]) -> u16 {
        unsafe {
            rte_eth_tx_burst(
                self.port_id,
                0,
                bufs.as_ptr() as *mut *mut libc::c_void,
                bufs.len() as u16,
            )
        }
    }
}

// Market data receiver
struct MarketDataReceiver {
    port: DPDKPort,
    packets_received: u64,
}

impl MarketDataReceiver {
    fn new(port: DPDKPort) -> Self {
        MarketDataReceiver {
            port,
            packets_received: 0,
        }
    }

    fn poll_loop(&mut self) {
        const BURST_SIZE: usize = 64;
        let mut bufs: [*mut RteMbuf; BURST_SIZE] = [ptr::null_mut(); BURST_SIZE];

        loop {
            let nb_rx = self.port.receive_burst(&mut bufs);

            if nb_rx == 0 {
                std::hint::spin_loop();
                continue;
            }

            for i in 0..nb_rx as usize {
                self.process_packet(bufs[i]);
                unsafe {
                    rte_pktmbuf_free(bufs[i] as *mut libc::c_void);
                }
            }

            self.packets_received += nb_rx as u64;
        }
    }

    fn process_packet(&self, mbuf: *mut RteMbuf) {
        unsafe {
            let data = (*mbuf).buf_addr.add((*mbuf).data_off as usize);
            let len = (*mbuf).pkt_len;

            // Parse packet (Ethernet + IP + UDP + payload)
            if len >= 42 {  // Min size for Eth+IP+UDP
                let payload = data.add(42);
                let payload_len = len - 42;

                if payload_len >= mem::size_of::<MarketDataMsg>() as u32 {
                    // read_unaligned: the payload has no alignment guarantee,
                    // and creating a reference to unaligned packed data is UB
                    let msg = (payload as *const MarketDataMsg).read_unaligned();
                    self.handle_market_data(&msg);
                }
            }
        }
    }

    fn handle_market_data(&self, msg: &MarketDataMsg) {
        let now = rdtsc();
        // Copy fields to locals: borrowing fields of a packed struct
        // (as println! would) is rejected by the compiler
        let timestamp = msg.timestamp;
        let symbol_id = msg.symbol_id;
        let price = msg.price;
        let latency = now - timestamp;

        // Process market data
        println!("Symbol: {}, Price: {}, Latency: {} cycles",
                 symbol_id, price, latency);
    }
}

#[repr(C, packed)]
struct MarketDataMsg {
    timestamp: u64,
    symbol_id: u32,
    price: f64,
    size: u32,
}

fn rdtsc() -> u64 {
    unsafe {
        let lo: u32;
        let hi: u32;
        std::arch::asm!(
            "rdtsc",
            out("eax") lo,
            out("edx") hi,
        );
        ((hi as u64) << 32) | (lo as u64)
    }
}

// FFI declarations
extern "C" {
    fn rte_eal_init(argc: i32, argv: *mut *mut i8) -> i32;
    fn rte_socket_id() -> u32;
    fn rte_pktmbuf_pool_create(
        name: *const i8,
        n: u32,
        cache_size: u32,
        priv_size: u16,
        data_room_size: u16,
        socket_id: u32,
    ) -> *mut libc::c_void;
    fn rte_eth_dev_configure(
        port_id: u16,
        nb_rx_queue: u16,
        nb_tx_queue: u16,
        eth_conf: *const RteEthConf,
    ) -> i32;
    fn rte_eth_rx_queue_setup(
        port_id: u16,
        rx_queue_id: u16,
        nb_rx_desc: u16,
        socket_id: u32,
        rx_conf: *const libc::c_void,
        mb_pool: *mut libc::c_void,
    ) -> i32;
    fn rte_eth_tx_queue_setup(
        port_id: u16,
        tx_queue_id: u16,
        nb_tx_desc: u16,
        socket_id: u32,
        tx_conf: *const libc::c_void,
    ) -> i32;
    fn rte_eth_dev_start(port_id: u16) -> i32;
    fn rte_eth_dev_socket_id(port_id: u16) -> u32;
    fn rte_eth_rx_burst(
        port_id: u16,
        queue_id: u16,
        rx_pkts: *mut *mut libc::c_void,
        nb_pkts: u16,
    ) -> u16;
    fn rte_eth_tx_burst(
        port_id: u16,
        queue_id: u16,
        tx_pkts: *mut *mut libc::c_void,
        nb_pkts: u16,
    ) -> u16;
    fn rte_pktmbuf_free(m: *mut libc::c_void);
}

#[repr(C)]
struct RteEthConf {
    // Simplified, see actual DPDK headers
    _padding: [u8; 256],
}
```

io_uring: Modern Async I/O

io_uring provides efficient asynchronous I/O with far fewer syscalls: requests and completions flow through ring buffers shared with the kernel, and in SQPOLL mode the hot path can avoid syscalls entirely.

C++ io_uring Implementation

```cpp
#include <liburing.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>       // close
#include <cstring>
#include <iostream>
#include <stdexcept>
#include <vector>

class IOUringSocket {
private:
    struct io_uring ring_;
    int sockfd_;

    static constexpr uint32_t QUEUE_DEPTH = 4096;
    static constexpr uint32_t BUFFER_SIZE = 4096;

public:
    IOUringSocket() {
        // Initialize io_uring
        if (io_uring_queue_init(QUEUE_DEPTH, &ring_, 0) < 0) {
            throw std::runtime_error("io_uring_queue_init failed");
        }

        // Create socket
        sockfd_ = socket(AF_INET, SOCK_DGRAM, 0);
        if (sockfd_ < 0) {
            io_uring_queue_exit(&ring_);
            throw std::runtime_error("socket creation failed");
        }

        // Bind to port
        struct sockaddr_in addr = {};
        addr.sin_family = AF_INET;
        addr.sin_port = htons(12345);
        addr.sin_addr.s_addr = INADDR_ANY;

        if (bind(sockfd_, (struct sockaddr*)&addr, sizeof(addr)) < 0) {
            close(sockfd_);
            io_uring_queue_exit(&ring_);
            throw std::runtime_error("bind failed");
        }
    }

    ~IOUringSocket() {
        close(sockfd_);
        io_uring_queue_exit(&ring_);
    }

    void async_receive_loop() {
        std::vector<uint8_t> buffers[QUEUE_DEPTH];
        for (auto& buf : buffers) {
            buf.resize(BUFFER_SIZE);
        }

        // Submit initial receive requests
        for (size_t i = 0; i < QUEUE_DEPTH; ++i) {
            submit_receive(i, buffers[i].data());
        }

        // Event loop
        while (true) {
            struct io_uring_cqe* cqe;

            // Wait for completion
            int ret = io_uring_wait_cqe(&ring_, &cqe);
            if (ret < 0) {
                std::cerr << "io_uring_wait_cqe failed\n";
                break;
            }

            // Process completion
            if (cqe->res >= 0) {
                uint64_t user_data = cqe->user_data;
                size_t buf_idx = user_data;

                handle_packet(buffers[buf_idx].data(), cqe->res);

                // Resubmit receive
                submit_receive(buf_idx, buffers[buf_idx].data());
            }

            io_uring_cqe_seen(&ring_, cqe);
        }
    }

private:
    void submit_receive(uint64_t id, void* buffer) {
        struct io_uring_sqe* sqe = io_uring_get_sqe(&ring_);
        if (!sqe) {
            std::cerr << "Failed to get SQE\n";
            return;
        }

        io_uring_prep_recv(sqe, sockfd_, buffer, BUFFER_SIZE, 0);
        io_uring_sqe_set_data(sqe, (void*)id);

        io_uring_submit(&ring_);
    }

    void handle_packet(const void* data, size_t len) {
        // Process packet
        std::cout << "Received " << len << " bytes\n";
    }
};
```

Rust io_uring Implementation

```rust
use io_uring::{opcode, types, IoUring};
use std::net::{SocketAddr, UdpSocket};
use std::os::unix::io::AsRawFd;

struct IOUringReceiver {
    ring: IoUring,
    socket: UdpSocket,
    buffers: Vec<Vec<u8>>,
}

impl IOUringReceiver {
    fn new(addr: SocketAddr, queue_depth: u32) -> std::io::Result<Self> {
        let socket = UdpSocket::bind(addr)?;
        let ring = IoUring::new(queue_depth)?;

        let mut buffers = Vec::new();
        for _ in 0..queue_depth {
            buffers.push(vec![0u8; 4096]);
        }

        Ok(IOUringReceiver {
            ring,
            socket,
            buffers,
        })
    }

    fn run(&mut self) -> std::io::Result<()> {
        // Submit initial receives
        for i in 0..self.buffers.len() {
            self.submit_receive(i)?;
        }

        loop {
            self.ring.submit_and_wait(1)?;

            let cqe = self.ring.completion().next()
                .expect("no completion");

            let result = cqe.result();
            let user_data = cqe.user_data();

            if result >= 0 {
                let buf_idx = user_data as usize;
                self.handle_packet(&self.buffers[buf_idx][..result as usize]);

                // Resubmit
                self.submit_receive(buf_idx)?;
            }
        }
    }

    fn submit_receive(&mut self, buf_idx: usize) -> std::io::Result<()> {
        let recv_e = opcode::Recv::new(
            types::Fd(self.socket.as_raw_fd()),
            self.buffers[buf_idx].as_mut_ptr(),
            self.buffers[buf_idx].len() as u32,
        );

        // Safety: the buffer outlives the submitted operation
        unsafe {
            self.ring
                .submission()
                .push(&recv_e.build().user_data(buf_idx as u64))
                .map_err(|_| std::io::Error::new(
                    std::io::ErrorKind::Other,
                    "submission queue full",
                ))?;
        }

        Ok(())
    }

    fn handle_packet(&self, data: &[u8]) {
        println!("Received {} bytes", data.len());
        // Process packet
    }
}
```

Performance Comparison

From our production trading system:

Latency (P99 microseconds)

```plaintext
Method               RX Latency    TX Latency    Throughput (Mpps)
──────────────────────────────────────────────────────────────────
Kernel sockets       12.3          11.8          0.8
io_uring             4.2           3.9           2.1
DPDK (poll mode)     1.8           1.6           4.5
```

CPU Utilization

```plaintext
Method               CPU %         Context Switches/sec
─────────────────────────────────────────────────────────
Kernel sockets       45%           12,000
io_uring             35%           200
DPDK (poll mode)     100%          0
```

Lessons Learned

After years of kernel bypass in production:

  1. DPDK for lowest latency: But requires dedicated CPU cores
  2. io_uring for efficiency: Better than epoll, lower CPU than DPDK
  3. Huge pages essential: Reduce TLB misses
  4. CPU pinning matters: Avoid cross-NUMA traffic
  5. Poll vs interrupt: Polling = lower latency, higher CPU
  6. Batch operations: Process packets in bursts
  7. Monitor everything: Packet drops, latency histograms, CPU usage

Kernel bypass delivers real performance gains, but adds operational complexity. Use it when latency truly matters.

Further Reading

  • DPDK Documentation
  • io_uring Introduction
  • Efficient IO with io_uring
  • High Performance Network Programming

Master kernel bypass techniques—they're essential for building ultra-low latency systems.
