Programmable network devices have seen much attention from both academia and industry in recent years, and affordable hardware is becoming increasingly available. We envision that technologies such as eBPF or P4 can be used to improve the performance of DNS resolvers and nameservers. In 2020, we want to assess the possibilities eBPF has to offer in this area.
While our research will be exploratory in nature, and tangible goals and results will crystallize along the way, we see two overarching activities: firstly, designing and implementing proof-of-concepts (see below) using eBPF, and secondly, measuring and thus quantifying how these implementations improve on the status quo. In other words, we aim for research with a hands-on approach and clear, applicable benefits to existing DNS software. As eBPF runs within the Linux kernel and needs no special hardware, any result will be usable by a large share of DNS operators and thus be widely beneficial for the community. Kernel-based eBPF will therefore be our first target when implementing the proof-of-concepts, but if time allows we also want to explore hardware offloading possibilities and measure performance in such a setup. This does, however, require access to a testbed with special hardware, e.g. two servers connected via Netronome NFP NICs.
Examples of proof-of-concepts we would like to explore include:
- Optimisations for specific query loads on the root and/or TLD level in nameservers
- Very hot caches in resolvers
- Load-balancing DNS traffic, e.g. based on the qname in queries
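As an illustration of the last item, qname-based load balancing could work roughly as sketched below. This is a user-space Python model of the decision logic only, not eBPF code; an in-kernel version would have to be written in restricted C with bounded loops. The function names, the backend labels and the choice of SHA-256 as the hash are assumptions made for this example.

```python
import hashlib
import struct
from typing import Sequence


def parse_qname(packet: bytes) -> str:
    """Extract the qname of the first question from a raw DNS query (RFC 1035 wire format)."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the 12-byte DNS header")
    qdcount = struct.unpack("!H", packet[4:6])[0]
    if qdcount < 1:
        raise ValueError("query has no question section")
    labels = []
    offset = 12  # the question section starts right after the header
    while True:
        if offset >= len(packet):
            raise ValueError("truncated qname")
        length = packet[offset]
        if length == 0:  # the zero-length root label terminates the name
            break
        if length > 63:
            raise ValueError("unsupported label type (compression pointer or reserved)")
        offset += 1
        label = packet[offset:offset + length]
        if len(label) != length:
            raise ValueError("truncated label")
        # DNS names are case-insensitive, so normalise before hashing.
        labels.append(label.decode("ascii", errors="replace").lower())
        offset += length
    return ".".join(labels) + "."


def pick_backend(qname: str, backends: Sequence[str]) -> str:
    """Map a qname to one backend via a stable hash (hypothetical policy)."""
    digest = hashlib.sha256(qname.encode("ascii", errors="replace")).digest()
    return backends[int.from_bytes(digest[:8], "big") % len(backends)]


def build_query(qname: str, qtype: int = 1) -> bytes:
    """Build a minimal DNS query, used here only to exercise the parser."""
    header = struct.pack("!HHHHHH", 0x1234, 0x0100, 1, 0, 0, 0)
    name = b"".join(
        bytes([len(part)]) + part.encode("ascii")
        for part in qname.strip(".").split(".") if part
    )
    return header + name + b"\x00" + struct.pack("!HH", qtype, 1)


if __name__ == "__main__":
    backends = ["ns-a", "ns-b", "ns-c"]  # hypothetical backend labels
    qname = parse_qname(build_query("www.Example.nl"))
    print(qname, "->", pick_backend(qname, backends))
```

Because the hash depends only on the normalised qname, all queries for the same name reach the same backend, which would also keep that backend's cache warm for the name.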
In this project, a programmable integrated telemetry system will be composed of a combination of domain-specific telemetry and diagnostic solutions. The foundations are domain operating systems that control the configuration and collection of monitoring data from virtual functions, their physical hosts and switching fabrics. Using open interfaces to these domain operating systems, the collection and analysis of end-to-end monitoring data will be orchestrated by a data analytics control layer.
In this research project we focus on:
- What is the state of the art in programmable, data-plane telemetry in terms of features as well as practical deployability?
- How can it be applied in an integrated (multidomain, cross-layer) telemetry system?
- How can a programmable telemetry system be orchestrated and reconfigured at run time?
- What are the limits in terms of scalability, granularity and domain constraints of programmable, integrated telemetry?
WP1: Programmable data plane telemetry
The research will start with desk research into P4 INT and related programmable telemetry technology, and with composing an integrated telemetry architecture. In the course of the research project, this work-in-progress architecture will be discussed and refined with the consortium partners. Based on this desk research and the initial architecture, we will investigate the feasibility of composing an integrated telemetry system that collects end-to-end monitoring data from the switching fabric, programmable network interfaces on physical host(s) and virtual functions. In this exploration phase we use programmable network equipment in TNO’s research cloud and an emulated provider network.
WP2: Integrated monitoring analytics
Building on the WP1 research results, WP2 focuses on the upper, integrated monitoring & analytics part. We will investigate how the data plane telemetry technology can be applied for integrated monitoring of the cloud VR use case. The first, brief exercise is to break down the service-level non-functional requirements for the use case into data plane metrics, i.e. to determine how to extract 'actionable information' from the collectable telemetry data.
Based on this breakdown and the programmable data plane telemetry options identified in WP1, design options for monitoring orchestration and analytics will be investigated. Particular attention will be paid to how actionable monitoring information can be extracted from multiple domains in a way that is aligned with relevant developments in the IETF IPPM working group. Another challenge for the integrated monitoring & analytics layer is to confine the data collection strategy so that it remains scalable in provider networks. We will explore how the integrated telemetry system and the data it can collect can best be used to develop “zoom-in” monitoring functionality.
WP3: Validation & dissemination
Based on the results of the first two work packages, the integrated telemetry system will be validated in partners’ testbed environments. The nature of this research is experimental and exploratory, up to a TRL-3 prototype of key components of the integrated telemetry system for demonstration and validation purposes. Details of the validation will be discussed with the consortium partners after completion of WP2.
Making Congestion Control Fair Again
We plan to extend our congestion control fingerprinting algorithm (developed in last year’s SURFnet project) to also include MPTCP and MPQUIC. Furthermore, we plan to investigate how employing different actions at the bottleneck (e.g., delaying a packet, dropping a packet) affects different algorithms. As different algorithm groups (loss-based, delay-based, hybrid, coupled) use different metrics to detect congestion and react to it, a queue management algorithm targeting their specifics might enable the fair co-existence of different congestion control groups on the same link. Through this project, we would like to test that hypothesis and build a corresponding queue management algorithm. For our experiments, we would also like to make use of the 2STiC test network, which contains < 10 Gbps links. Finally, as our fingerprinting solution keeps track of all the flows that pass through the switch, we plan to investigate ways to reduce its memory footprint.
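To make "fair co-existence" measurable in such experiments, one commonly used option is Jain's fairness index over the per-flow throughputs. The text above does not commit to a metric, so the sketch below is only an illustrative choice; the function name and the sample throughput values are assumptions.

```python
def jain_fairness_index(throughputs):
    """Jain's index: (sum x)^2 / (n * sum x^2), ranging from 1/n (one flow takes all) to 1 (equal shares)."""
    values = [float(x) for x in throughputs]
    if not values:
        raise ValueError("at least one flow is required")
    if any(v < 0 for v in values):
        raise ValueError("throughputs must be non-negative")
    sum_of_squares = sum(v * v for v in values)
    if sum_of_squares == 0:
        # All flows idle: treated as trivially fair (an assumption of this sketch).
        return 1.0
    total = sum(values)
    return (total * total) / (len(values) * sum_of_squares)


if __name__ == "__main__":
    # Hypothetical per-flow throughputs in Mbps on a shared bottleneck.
    print(jain_fairness_index([250, 250, 250, 250]))  # equal shares
    print(jain_fairness_index([700, 100, 100, 100]))  # one dominant flow
```

Comparing the index with and without a candidate queue management algorithm, for the same mix of congestion control groups, would give a single number per experiment to track.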
eBPF/XDP/DPDK/P4 offloading of transport features
The goal of this project is to investigate how data-plane techniques such as eBPF (extended Berkeley Packet Filter), XDP (eXpress Data Path), DPDK (Data Plane Development Kit), and P4 (Programming Protocol-independent Packet Processors) can enable a more modular approach in the design of transport protocols, taking into account their respective differences and limitations. For example, while offloading to a P4-programmable SmartNIC offers a huge performance gain, P4 has inherent limitations that make certain features non-implementable. We will evaluate all the data-plane techniques in terms of latency gains and implementation suitability by deploying different TCP features (e.g. a custom congestion control algorithm, an explicit congestion notification extension). The result of this project would be an increased understanding of which data-plane technique is suitable in which situation.
Work Package 1: Testing, Management and Control of QKD Platforms
Several stakeholders developing QKD continue to advance their offerings; Toshiba and ID Quantique are the main leaders in end-to-end QKD systems. In this work package, we will test separate QKD systems, establish the methodology to operate them and explore the potential for interoperability. This work package also provides scope to test single QKD transmitters or components to assess their performance. We will also develop software to generate keys on demand.
Work Package 2: Exploiting standard SM fibers for co-propagation
Demonstration of co-propagation of QKD and classical signals by exploiting components developed in WP1, developing novel DSP schemes, and exploiting novel fiber technologies and amplification schemes (e.g., Raman) to extend QKD performance over specific reach and at specific key rates.
Work Package 3: Laboratory and Field Trials
Transmission evaluation in a laboratory setting as a Phase 1 demonstrator, followed by expansion to a field trial through the SURFnet infrastructure (currently with fiber spans that are not EDFA amplified but only Raman amplified), studying the impact of Raman scattering on co-propagated QKD and classical signals.
Work Package 4: Dissemination
Project reporting and submission of results for publication at the OFC, ECOC and QCRYPT conferences and in IEEE/OSA/Nature journals.
Over the past two years, DACS has gained experience in the use of the P4 language for programmable network hardware in the context of Research on Networks. In this project, we want to build on this knowledge to develop what we call a hybrid router. Traditionally there are two classes of routers: those that rely on specialised hardware that can forward traffic at ultra-high speeds, with a simple general-purpose CPU for managing and configuring the router, and those that are implemented in software on commodity general-purpose computing platforms. The former (pure hardware routers) are generally closed in design and implementation, whereas the latter (software routing platforms) are typically open source. While this makes the latter category attractive for experimentation, software-only platforms are limited in terms of performance, as they typically lack the capability to process ultra-high-speed traffic. We intend to overcome this limitation by leveraging our experience with P4 to transfer the speed-critical functions of an open source routing platform to hardware, using P4 on commodity open hardware switches. We plan to base this work on an open source routing platform such as Free Range Routing (FRR) or BIRD and on open hardware by, e.g., Barefoot. Ultimately, we envisage that such a hybrid router can be used to experiment with new routing protocols and, for example, secure routing paradigms (BGPsec, RPKI ASPA).
To summarise our goal: we want to design and prototype a hybrid router platform for use in (secure) routing experimentation.
Segment Routing performance
SURFnet8 will use Segment Routing in its core. SR and its applications are therefore not only interesting from a research perspective but also have a clear applicability for SURFnet.
In 2019 we explored the use of SR and VNFs over IPv4/MPLS and implemented a working prototype demonstrating the feasibility of the technology. In 2020 we propose to continue our efforts around SR, both by working with the existing IPv4/MPLS implementation and by extending it to IPv6. We intend to focus our research efforts on two aspects related to SR, comparing our prototype with an IPv6-based implementation.
We want to investigate the use and performance of anycast. A particular service could be instantiated in multiple locations and announce the same SID to other nodes in the SR network.
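To make the anycast idea concrete, the sketch below models which instance of a service each ingress node would reach when several nodes announce the same SID and traffic follows the lowest-cost IGP path. The topology, node names and link costs are hypothetical, and the tie-break by node name is an assumption of this sketch (real routers may load-balance over equal-cost paths instead).

```python
import heapq


def shortest_costs(graph, source):
    """Dijkstra: lowest path cost from source to every reachable node."""
    costs = {source: 0}
    queue = [(0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if cost > costs.get(node, float("inf")):
            continue  # stale queue entry
        for neighbour, weight in graph.get(node, {}).items():
            new_cost = cost + weight
            if new_cost < costs.get(neighbour, float("inf")):
                costs[neighbour] = new_cost
                heapq.heappush(queue, (new_cost, neighbour))
    return costs


def nearest_anycast_instance(graph, source, instances):
    """Return (cost, node) of the closest node announcing the anycast SID, or None if unreachable."""
    costs = shortest_costs(graph, source)
    reachable = [(costs[node], node) for node in instances if node in costs]
    # Ties are broken by node name (assumption, see above).
    return min(reachable) if reachable else None


# Hypothetical linear topology A - B - C - D with link cost 10 each.
TOPOLOGY = {
    "A": {"B": 10},
    "B": {"A": 10, "C": 10},
    "C": {"B": 10, "D": 10},
    "D": {"C": 10},
}
ANYCAST_INSTANCES = {"A", "D"}  # both nodes announce the same SID

if __name__ == "__main__":
    for ingress in sorted(TOPOLOGY):
        print(ingress, "->", nearest_anycast_instance(TOPOLOGY, ingress, ANYCAST_INSTANCES))
```

A model like this can help predict the expected instance per ingress node before measuring how an actual SR network distributes traffic across the instances.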
One of the reasons for writing our VNF code in eBPF was the potential to achieve good packet-processing performance. We intend to quantify the performance of our implementation more clearly.
P4 telemetry: processing and adapting
We have carried out extensive joint research with SURFnet on P4, and our activity in 2020 builds upon this solid basis. Our focus in 2020 will be primarily on the following aspects:
Consuming P4 telemetry.
We are interested in determining the best way to do telemetry and consume this information.
Using programmable networks, we are able to generate a significant amount of telemetry data. This puts pressure on the collectors of telemetry data, which must at least be able to store the data as it comes in. Ideally, collectors would be able to do real-time analysis of incoming telemetry data so that immediate action could be taken to improve network health. We would like to research how technologies such as eBPF, DPDK, and RDMA can be used to store and process high-resolution network telemetry in real time. eBPF allows for high-performance packet processing, but it makes decisions based on the headers and does not process the data. RDMA allows for low-latency, high-bandwidth data transfers with reduced CPU overhead.
Analysis of network state and behavior with P4 in order to decide on possible actions; in particular in regard to DTN operations, where parameters can be changed for better performance.
Given high-resolution telemetry data, we would like to establish whether it is possible to identify which flows in a network belong to data transfers and whether they are functioning optimally. In essence, we look at whether it is possible to suggest corrective actions for end hosts to improve the performance of data transfers.