At the GPU Technology Conference 2020, Jensen Huang, NVIDIA’s CEO, unveiled a new family of processors branded as the BlueField-2 Data Processing Unit (DPU). Developers access the DPU through a software platform, the DOCA SDK. Together, the DPU and the DOCA SDK are comparable to NVIDIA’s powerful combination of GPU hardware and CUDA software.
Having dominated the AI accelerator market, NVIDIA is now aiming to expand into data center infrastructure acceleration and optimization.
Why is Jensen Huang bullish about the DPU market, and why does it matter to the enterprise data center? Here is an attempt to explain the evolution of the DPU in simple terms.
The Aggregation and Disaggregation of Enterprise Infrastructure
During the 90s, the combination of Intel x86 CPU and OS software offered unmatched power to enterprises. The rise of client/server computing, followed by n-tier computing, paved the way for distributed computing. Enterprises ran databases, application servers and custom line of business software on a fleet of x86 servers.
During the early 2000s, VMware introduced ESX, a hypervisor that brought the ability to virtualize the x86 CPU. Enterprises could run multiple virtual machines on a single powerful server. CPU virtualization was the first step towards the aggregation of enterprise infrastructure.
The hypervisor made the hardware programmable. Developers could write code to define and provision a virtual machine without manual intervention. This programmability aspect of infrastructure became the foundation of the modern cloud computing paradigm.
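To make the idea of programmable infrastructure concrete, here is a minimal Python sketch. It does not call any real hypervisor or cloud API: `VMSpec`, `Host` and `provision` are hypothetical names invented for illustration, and the placement logic is a simple first-fit search. The point is only that a virtual machine can be described as data and placed by code, with no manual step.

```python
from dataclasses import dataclass, field


@dataclass
class VMSpec:
    """Hypothetical description of a virtual machine request."""
    name: str
    vcpus: int
    memory_gb: int


@dataclass
class Host:
    """Hypothetical physical server with a fixed capacity."""
    name: str
    vcpus: int
    memory_gb: int
    vms: list = field(default_factory=list)

    def free_vcpus(self) -> int:
        return self.vcpus - sum(vm.vcpus for vm in self.vms)

    def free_memory_gb(self) -> int:
        return self.memory_gb - sum(vm.memory_gb for vm in self.vms)


def provision(spec: VMSpec, hosts: list) -> str:
    """Place the VM on the first host with enough free capacity (first-fit)."""
    for host in hosts:
        if host.free_vcpus() >= spec.vcpus and host.free_memory_gb() >= spec.memory_gb:
            host.vms.append(spec)
            return host.name
    raise RuntimeError(f"no host has capacity for {spec.name}")


# Made-up capacities and requests, purely for illustration.
hosts = [
    Host("server-a", vcpus=16, memory_gb=64),
    Host("server-b", vcpus=32, memory_gb=128),
]
requests = [
    VMSpec("db", vcpus=12, memory_gb=48),
    VMSpec("app", vcpus=8, memory_gb=16),
    VMSpec("web", vcpus=4, memory_gb=8),
]
placements = {spec.name: provision(spec, hosts) for spec in requests}
print(placements)
```

In this sketch the "db" and "web" machines land on the first server and "app" spills over to the second. A real hypervisor or cloud platform adds scheduling policy, networking and storage on top, but the define-then-provision pattern is the same.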
Based on the success of ESX, VMware moved towards network and storage virtualization. Traditional infrastructure players such as Cisco and EMC started to build virtualized network and storage services to compete with VMware. In 2012, VMware acquired Nicira, the software-defined networking startup, for $1.26 billion, and its technology was later branded as NSX. In March 2014, VMware announced vSAN, its virtualized software storage platform tightly integrated with vSphere, the management platform for the ESX hypervisor. With vSphere, NSX and vSAN, VMware had the complete infrastructure virtualization stack for enterprises.
The integration of virtualized compute, storage and networking led to a new breed of converged infrastructure offered as a software-defined data center (SDDC). Microsoft, Nutanix, VCE and, of course, VMware were in the race to win the SDDC market.
The SDDC became a significant milestone in the aggregation of enterprise data centers. Instead of running hundreds of mid-sized servers, customers could consolidate their infrastructure onto fewer hyper-converged appliances.
The exciting aspect of the SDDC is that a powerful x86 processor is used to emulate multiple CPUs, network cards, and storage interfaces. The core building blocks of the SDDC are available to developers as APIs, bringing the highest level of automation.
The rise of software-defined infrastructure put the x86 processor under tremendous pressure. The CPU has to deal with everything: running the operating system and applications, processing network traffic, handling storage I/O, enforcing security and more. Since every component of the SDDC is just a piece of software running on top of the processor, the role of the CPU has changed dramatically. In an SDDC scenario, the infrastructure services that run the control plane of the system aggressively compete with the applications that run in the data plane for CPU resources.
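A back-of-the-envelope calculation shows why this contention matters. The Python sketch below uses made-up fractions, not measurements; the function name `cores_left_for_apps` and the service names are hypothetical. It simply subtracts the share of CPU consumed by infrastructure services from the total core count.

```python
def cores_left_for_apps(total_cores: int, infra_share: dict) -> float:
    """Return the cores left for applications after infrastructure services take their share.

    infra_share maps a service name to the fraction of total CPU it consumes.
    """
    used = sum(infra_share.values())
    if not 0 <= used <= 1:
        raise ValueError("infrastructure shares must sum to a value between 0 and 1")
    return total_cores * (1 - used)


# Illustrative, made-up fractions; real overhead depends on the workload.
infra_share = {
    "virtual_switching": 0.125,
    "storage_io": 0.125,
    "security": 0.0625,
}

app_cores = cores_left_for_apps(32, infra_share)
print(app_cores)  # 22.0 of 32 cores remain for applications

# If the same services ran on dedicated hardware, the host CPU share would be empty.
print(cores_left_for_apps(32, {}))  # 32.0
```

With these assumed numbers, almost a third of a 32-core server is spent on infrastructure rather than on the applications it was bought to run.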
Alongside the SDDC, two more trends started to take place. The first one is the rise of specialized artificial intelligence (AI) hardware and the second is the evolution of programmable hardware.
AI demands extreme parallelism, which cannot be delivered by a general-purpose CPU. GPUs, which were originally developed to accelerate graphics, started to become the co-processors for running complex mathematical operations in parallel. NVIDIA was quick to address this opportunity by shipping GPUs targeting the training and inference use cases of AI workloads.
Just as GPUs complement CPUs by offloading mathematical operations, a new breed of specialized chips started to become available in the market. These include Application-Specific Integrated Circuits (ASICs), which are built for one fixed purpose, and Field-Programmable Gate Arrays (FPGAs), which can be reprogrammed after manufacturing. Both can be targeted at a specific task such as network traffic optimization or storage I/O acceleration.
Companies such as Broadcom, Ethernity Networks, Intel, Marvell, Mellanox (which NVIDIA agreed to acquire in 2019 and formally acquired in 2020), Napatech, Netronome, Pensando, and Xilinx started to add FPGAs to network interface cards to accelerate network traffic. Sold as SmartNICs, these cards offload network functions from the CPU, thus freeing up processing power.
Similar to the SmartNIC, the NVMe storage controller also benefits from the addition of an FPGA. It brings massive parallelism and customization to the data path by making the controller programmable.
The rise of the GPU, the SmartNIC and the programmable storage controller brings disaggregation to the SDDC. They free up the CPU by intelligently delegating the network and storage functions to specialized hardware. This enables the applications and workloads to get the best performance from the underlying CPU and hardware resources.
With containers and Kubernetes, each SDDC appliance can run tens of thousands of microservices, generating a heavy load of network and storage I/O that impacts CPU performance. With dedicated hardware for network and storage acceleration, the CPU can focus on the application while delegating routine tasks to these specialized controllers.
From the hypervisor to the SDDC to FPGA-based network and storage controllers, the industry has come full circle. Beyond network and storage acceleration, there is an opportunity to push network security, firewall functionality, encryption, and even infrastructure management to specialized hardware.
Through the acquisition of Mellanox, NVIDIA is eyeing the next big opportunity in the enterprise data center driven by purpose-built, specialized hardware for the network, security, storage and infrastructure management.
What are DPU and DOCA?
NVIDIA’s DPU is the new avatar of Mellanox’s SmartNIC. According to NVIDIA, the BlueField-2 DPU is the world’s first data-center-infrastructure-on-a-chip architecture optimized for modern enterprise data centers.
The BlueField-2 DPU has programmable network and storage interfaces that take these functions off the CPU. The applications and workloads get the major share of the processor, which no longer has to handle the mundane network and storage functions.
The DPUs are accessible through the DOCA SDK, which exposes programmable APIs for the underlying hardware platform.
Just as CUDA enables developers to program accelerated computing applications, DOCA enables them to program the acceleration of data processing for moving data into and out of servers, VMs, and containers. DOCA sits alongside CUDA to leverage the entire range of NVIDIA AI applications in a secure, accelerated data center.
DOCA is fully integrated into NVIDIA GPU Cloud (NGC), a software catalog offering a containerized software environment for developers to build advanced DPU data-center-accelerated services.
When it comes to networking, the DPU accelerates the most advanced data center SDN and network function virtualization (NFV). It handles the east-west traffic associated with the virtual machines and containers and the north-south traffic flowing in and out of the software-defined data center. It effectively accelerates the network path for both the control plane and the data plane.
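The terms east-west and north-south can be illustrated with a short Python sketch. This is not how the DPU or DOCA classifies traffic; the address ranges and the `classify_flow` helper are hypothetical, and the rule is deliberately simple: a flow is east-west when both endpoints sit inside the data center, and north-south otherwise.

```python
import ipaddress

# Hypothetical address ranges used inside the data center.
DATA_CENTER_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_internal(address: str) -> bool:
    """Return True if the address belongs to one of the data center ranges."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in DATA_CENTER_NETWORKS)


def classify_flow(src: str, dst: str) -> str:
    """Label a flow by whether it stays inside the data center.

    Simplification: any flow with at least one external endpoint
    is treated as north-south.
    """
    if is_internal(src) and is_internal(dst):
        return "east-west"
    return "north-south"


flows = [
    ("10.1.2.3", "10.4.5.6"),          # VM to VM inside the data center
    ("10.1.2.3", "203.0.113.9"),       # outbound to an external client
    ("198.51.100.7", "192.168.1.20"),  # inbound from an external client
]
labels = [classify_flow(src, dst) for src, dst in flows]
print(labels)  # ['east-west', 'north-south', 'north-south']
```

In a microservices-heavy environment most flows fall into the first category, which is why offloading east-west traffic from the host CPU matters.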
The DPU is designed to optimize software-defined elastic storage, NVMe over Fabrics (NVMe-oF), RoCE, data-at-rest encryption, data deduplication, distributed error correction, and data compression. Enterprises and cloud service providers can connect remote NVMe storage pools to the BlueField DPU without compromising throughput and performance. The NVMe SNAP technology delivers elastic block storage functionality and presents remote block storage to the host as if it were local NVMe block storage or a virtio-blk device, with low latency, high throughput, and high IOPS.
According to NVIDIA, the DPU also offloads, accelerates, and isolates all essential data center security services. This includes support for next-generation firewalls, micro-segmentation, data-in-motion inline encryption with transparent IPSec and TLS, and intrusion protection. The DPU has a set of dedicated security engines that provide the building blocks of a security solution. Traditionally, this functionality was embedded into the software-defined networking stack running on top of the x86 CPU. By moving this layer to the DPU, enterprises benefit from efficient, responsive, zero-trust security functionality running at the lowest level of the stack, which frees up the CPU.
DPUs come with management agents that provide unmatched visibility into the network layer and host. Based on DOCA, the DPU-based agents can perform in-band or out-of-band management without burdening the server CPU. If the server needs a reset, or even if the tenant or business application requires a bare metal server with no agents, a DOCA-programmed DPU can still send telemetry, perform a remote reset, or allow the secure boot of the server, all without running an agent on the server CPU. Essentially, the DPU takes over the monitoring and network observability function without explicitly running an agent at the operating system level.
Huang has already announced the roadmap for the BlueField family. The BlueField-4 DPU, scheduled to be launched in 2023, will have an embedded GPU, bringing the DPU and GPU into the same hardware interface.
But for customers willing to try this functionality today, the BlueField-2X DPU already has an Ampere GPU, which brings AI capabilities that can be applied to data center security, networking and storage tasks.
NVIDIA is partnering with VMware, Red Hat, Canonical and Check Point Software Technologies to integrate the BlueField DPU with their platforms.
With the DPU, NVIDIA aims to bring to the enterprise data center the efficiency and optimization that were once available only to cloud service providers. By adding a DPU to their servers, customers will be able to devote more of the CPU to applications rather than burdening it with mundane network and storage access.