As a next-generation fabric, the PCIe fabric introduces multi-host computing and I/O sharing.
It gives system builders the ability to compose a cloud of resources that are all interconnected through PCIe ports at native PCIe speeds.
Native PCIe transactions (TLPs) are forwarded automatically over the fabric with no protocol conversion. Standard PCIe non-transparent (NT) bridging technology is used to route the PCIe traffic from the host computer to the NVMe device, and device interrupts are automatically routed through the PCIe fabric in the same way. The image below illustrates the difference between a PCIe fabric and other fabrics: a PCIe fabric eliminates the transport abstraction, providing much lower latency, while still supporting features such as RDMA.
By building on the natural strengths of PCI Express (PCIe) — it’s everywhere, it’s fast, it’s low power, it’s affordable — and by adding some straightforward, standards-compliant extensions that address multi-host communication and I/O sharing capabilities, a universal interconnect now exists that substantially improves on the status quo in high-performance cloud and data center installations.
One application for these installations now receiving considerable attention is that of replacing small InfiniBand clusters with a PCIe-based alternative. This implementation approach for high-speed data center applications was addressed at the Super Computing 2012 Conference (SC12) in Salt Lake City, where the high-performance computing (HPC) community began to really sit up and take notice.
The system is designed with the belief that in cloud and data-center environments, PCIe-based fabrics can replace small InfiniBand and other complex, ultra-pricey clusters, offering Quad Data Rate (QDR)-like performance when communicating between CPUs, enabling straightforward sharing of I/O devices, and doing so at a much lower cost and power envelope. InfiniBand doesn’t do this anywhere near as easily or cost-effectively. Figure 1 illustrates the simplicity of a PCIe-based fabric compared to InfiniBand.
PCIe can also allow sharing of I/O devices using standard multifunction networking and telecommunications hardware and software, something InfiniBand can’t easily do.
The native sharing of I/O devices and the ability to enable high-speed communication between the CPUs in a system are not part of the current specification for PCIe. However, that specification does provide a mechanism for vendors to add their own extensions while still remaining compatible with the overall specification. Using these vendor-defined extensions allows the enhanced implementation to be used with existing test and analysis equipment, but with a more robust feature set.
So, a PCIe-based fabric can achieve InfiniBand-like performance, but does it add anything to data centers and cloud computing environments? PCIe delivers a range of advantages that accrue from its comparative strengths.
First is its ability to scale linearly for different bandwidth requirements — from x1 connections on PC motherboards, to x2 connections to high-speed storage, to x4 and x8 connections on backplanes, and up to x16 for graphics applications.
Another key advantage of PCIe is its simple, low-overhead protocol, and the fact that PCIe builds on the legacy architecture of PCI, so that it has been quick and easy to migrate to the newer, faster connections. While InfiniBand has achieved very low latency with a relatively complex protocol through special-purpose hardware and software drivers that have been tuned over many years, PCIe starts out with low latency and simplicity based on its construction, giving it an advantage in adaptability.
But the most powerful advantage for PCIe is that it’s already a near-universal interconnection technology.
In summary, PCIe has grown from its original use as a high-speed, board-level graphics interconnect to a popular general-purpose solution. This has enabled it to penetrate every market segment: enterprise, servers, storage, embedded and consumer. And with some simple extensions it is a highly attractive solution for high-speed clustering in data center fabrics. It can offer performance comparable to a QDR InfiniBand solution, with a much lower cost and power envelope. InfiniBand technology has its place in those applications that require high data transfer rates without regard to power or cost, and those willing to pay a premium for this requirement will continue to use the technology. But for cloud and data center applications that need and value the three P’s — performance, power and price — PCIe is the superior option.
Benefits of PCIe fabrics
The NVMe over PCIe fabric solution includes no software abstraction layers: the remote system and the NVMe device communicate without consuming CPU or memory resources in the system where the NVMe device is physically located.
For other NVMe over Fabrics solutions, the dominant part of the latency originates in the software emulation layers and amounts to around 10 microseconds end to end. The NVMe over PCIe fabric solution has no software emulation layer and adds no more than around 700 nanoseconds (depending on the computer and I/O system). This latency comes from crossing a few PCIe bridges in the fabric rather than from a software emulation layer.
The NVMe over PCIe fabric supports device DMA engines. These engines typically move data between the device and memory in the system borrowing the device; no network-specific RDMA is needed.
The NVMe over PCIe fabric is resilient to errors. The Vortex Express driver implements full error containment and ensures that no user data is lost due to transient errors. PCIe Vfusion Vortex managed cables can be hot plugged, and the remote NVMe device is only unavailable while the cable is disconnected.
Wide compatibility and future expandability
- Extend PCIe between systems and I/O using cables or backplanes
- Supports copper and fiber cables
  - Copper cables up to 9 meters
  - Fiber cables from 10 to 100 meters
- Two types of bridging models
  - Transparent bridging to I/O devices: no software needed, supported in hardware
  - Non-transparent bridging to connect two or more root complexes, such as processors and I/O devices: Vfusion Vortex software required to manage data transfers
- No changes to the PCIe protocol: standard PCIe transactions
- Combination of two elements
  - PCIe Clustering
  - PCIe SmartIO technology
- PCIe Clustering: designed for tightly coupled distributed systems
  - Low latency
  - High throughput
  - Scale-out capability: node scaling from 2 to 128 nodes (128 nodes based on new technology)
  - Performance scaling from Gen3 x4 PCIe to x16 PCIe
- PCIe SmartIO technology
  - Create a pool of devices: device lending enables devices to be shared in a PCIe fabric
  - Direct peer-to-peer communication
  - Bypass local CPU and system memory
  - Enhanced capabilities: create MR-IOV capabilities with SR-IOV devices
  - Use standard PCIe technology to remove boundaries between local and remote devices
  - Applications can dynamically access PCIe devices independent of their location in the PCIe fabric
    - Locally attached devices
    - Devices attached to remote storage nodes
    - Devices attached to the PCIe fabric
- Plug and Play
  - Full support for hot add of Vfusion NVMe expansion shelves
  - No power-on sequencing requirements
  - Fail over to an alternative path
  - Device Address Resolution Protocol: PCIe fabric ID
- Device lending: hosts on a PCIe network can borrow regular PCIe devices attached to remote hosts and lend PCIe devices between systems
- Supports GPUs, FPGAs, NVMe drives, and other PCIe devices
- Scale out to multiple systems with PCIe switches
- No Linux kernel patches
- No application software modifications necessary
- Virtually no performance difference between local and remote resources
- Supports hot-pluggable devices
- Supports run-time reconfiguration and bring-up; no power-on sequencing required between systems
- All PCIe devices connected to a separate server are logically available at one server, with no changes to device drivers
- Lending and borrowing software on multiple hosts
  - The lending system makes the borrowing system aware of available devices
  - The borrowing system borrows devices; the new device is hot added to the borrowing system
- Supports MSI and MSI-X interrupts
- No changes to device drivers; standard transparent drivers are used with the Vfusion Vortex Smart-Disk setup
- Devices look like part of the borrowing system and act like attached devices
- Remote access without any software overhead: PCIe end to end
- Easy setup, managed by Vfusion Vortex software
- Low-latency solution: remote PCIe transport adds less than 500 ns
- No power-on sequencing: hot plug support
- No application or library changes
- Flexible dynamic use: all types of devices can be passed around between nodes
- Native I/O performance