Falcon: A Reliable, Low Latency Hardware Transport

  • Arjun Singhvi
  • Nandita Dukkipati
  • Prashant Chandra
  • Hassan M. G. Wassel
  • Naveen Kr. Sharma
  • Anthony Rebello
  • Henry Schuh
  • Praveen Kumar
  • Behnam Montazeri
  • Neelesh Bansod
  • Sarin Thomas
  • Inho Cho
  • Hyojeong Lee Seibert
  • Baijun Wu
  • Rui Yang
  • Yuliang Li
  • Kai Huang
  • Qianwen Yin
  • Abhishek Agarwal
  • Srinivas Vaduvatha
  • Weihuang Wang
  • Masoud Moshref
  • Tao Ji
  • David Wetherall
  • Amin Vahdat

SIGCOMM 2025

Hardware transports such as RoCE deliver high performance with minimal host CPU, but are best suited to special-purpose deployments that limit their use, e.g., backend networks or Ethernet with Priority Flow Control (PFC). We introduce Falcon, the first hardware transport that supports multiple Upper Layer Protocols (ULPs) and heterogeneous application workloads in general-purpose Ethernet datacenter environments (with losses and without special switch support). Key design elements include: delay-based congestion control with multipath load balancing; a layered design with a simple request-response transaction interface for multi-ULP support; hardware-based retransmissions and error-handling for scalability; and a programmable engine for flexibility. The first Falcon hardware implementation delivers a peak performance of 200 Gbps, 120 Mops/sec, with near-optimal operation completion times that are up to 8× lower than CX-7 RoCE under network congestion, and up to 65% higher goodput under lossy conditions.