Binghao Yue

10.0NIApr 13

Programmable Packet Scheduling with Dynamic Reordering at Line Rate

Zekun Wang, Binghao Yue, Yichen Deng et al.

High-speed switch packet scheduling demands both line-rate performance and programmability. Existing programmable hardware scheduling models, such as PIFO and PIEO, can express a broad range of scheduling algorithms; however, their semantics are restricted to packet-level ordering and cannot dynamically reorder buffered packets, which limits the support for dynamic-ordering algorithms such as pFabric. To overcome this limitation, we propose UIFO (Update-In-First-Out), a new programmable scheduling model that introduces a two-level abstraction over classes and packets. UIFO enables dynamic updates to the scheduling order at the class level while preserving in-order packet scheduling within each class, thereby supporting dynamic reordering of already-buffered packets. Furthermore, UIFO remains fully compatible with and generalizes existing PIFO and PIEO models. We implement a hardware prototype of UIFO based on priority-queue designs and evaluate it on an FPGA platform and in a 28 nm ASIC process. Overall, UIFO significantly enhances scheduling expressiveness and maintains favorable scalability while sustaining 100 Gbps line-rate throughput.

DSJan 14

A Grouped Sorting Queue Supporting Dynamic Updates for Timer Management in High-Speed Network Interface Cards

Zekun Wang, Binghao Yue, Weitao Pan et al.

With the hardware offloading of network functions, network interface cards (NICs) undertake massive stateful, high-precision, and high-throughput tasks, where timers serve as a critical enabling component. However, existing timer management schemes suffer from heavy software load, low precision, lack of hardware update support, and overflow. This paper proposes two novel operations for priority queues--update and group sorting--to enable hardware timer management. To the best of our knowledge, this work presents the first hardware priority queue to support an update operation through the composition and propagation of basic operations to modify the priorities of elements within the queue. The group sorting mechanism ensures correct timing behavior post-overflow by establishing a group boundary priority to alter the sorting process and element insertion positions. Implemented with a hybrid architecture of a one-dimension (1D) systolic array and shift registers, our design is validated through packet-level simulations for flow table timeout management. Results demonstrate that a 4K-depth, 16-bit timer queue achieves over 500 MHz (175 Mpps, 12 ns precision) in a 28nm process and over 300 MHz (116 Mpps) on an FPGA. Critically, it reduces LUTs and FFs usage by 31% and 25%, respectively, compared to existing designs.

Binghao Yue

2 Papers