In-Storage Distributed Machine Learning for the Edge

V. Alves
NGD Systems, Inc.,
United States

Keywords: computational storage, distributed machine learning, solid state drive

Summary:

Cloud-only architectures will soon be unable to keep up with the volume and velocity of data across the network, gradually reducing the value that can be created from these investments. Edge computing can help overcome the limitations of current infrastructure and enable mission-critical, data-dense IoT and other advanced digital use cases by reducing or eliminating data movement and by addressing latency and energy-efficiency bottlenecks. To address these problems in the context of ML applications, it is necessary to perform training and inference at the edge, transmitting only processed data (metadata), and full data only when necessary. This approach, however, is limited by the fact that most edge devices lack strong computing capabilities and, even when they have them, the energy cost of exercising them is prohibitive.

Big data analytics solutions such as Hadoop have addressed the performance challenge with a distributed architecture built on a paradigm that moves computation closer to the data. Similarly, by pushing the “move computation to data” paradigm to its ultimate limit, we enable highly efficient and flexible in-storage processing in solid state drives, i.e., computational storage. By moving data processing tasks closer to where the data resides, we dramatically reduce the storage bandwidth bottleneck and the cost of data movement, and improve overall energy efficiency, creating an ideal platform for machine learning at the edge.

NGD’s computational storage device (CSD) provides a seamless programming model based on a Linux OS and high-level programming languages, thanks to a complete, standard network software and protocol stack. It is the first commercially available SSD that can be configured to run a server-like operating system (e.g., Linux), allowing general application developers to fully leverage existing tools and libraries and minimizing the effort required to create and maintain applications that run in-storage (see the deployment sketch below). The device can be configured in two modes:

- a very high capacity, enterprise-grade PCIe/NVMe solid state drive operating at low power, or
- a full-blown computational storage device capable of processing data in situ in addition to providing its storage capabilities.

Finding a method for multimedia similarity search that combines high performance with low energy consumption is one of the critical issues in big-data and ML processing. To address the problems of scalability and the memory bottleneck, we investigate in-storage processing on computational storage devices, which enables scalable data processing that meets the fast-I/O and low-storage-cost requirements of data-intensive applications (see the similarity-search sketch below).

This paper proposes a framework for distributed, in-storage training of neural networks on heterogeneous clusters of computational storage devices. Such devices contain multi-core application processors as well as SIMD engines, and they virtually eliminate data movement between the host and storage, resulting in both improved performance and power savings. More importantly, this in-storage style of training ensures that private data never leaves the storage device, while the sharing of public data remains fully controlled (see the training sketch below). Experimental results show up to a 2.7x speedup and a 69% reduction in energy consumption, with no significant loss in accuracy.
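
Because the CSD runs a standard Linux environment with a full network stack, applications can be written and deployed with ordinary tools rather than a device-specific SDK. The following is a minimal sketch of the “ship results, not raw data” pattern described above; the file paths, host address, and metadata format are illustrative assumptions, not NGD-specific APIs.

```python
# Minimal sketch of the "ship results, not raw data" pattern on a Linux-capable CSD.
# Assumption: the drive is reachable as an ordinary networked Linux host; the data
# directory and host address below are illustrative, not an NGD-specific API.
import json
import os
import socket

DATA_DIR = "/mnt/flash/images"    # data stored on the drive itself (illustrative path)
HOST_ADDR = ("10.0.0.1", 9000)    # host that receives metadata only (illustrative)

def summarize(path):
    """Compute lightweight metadata for one file instead of shipping the file."""
    return {"name": os.path.basename(path), "bytes": os.path.getsize(path)}

def main():
    # Scan the data in situ: the full files never leave the drive.
    metadata = [summarize(os.path.join(DATA_DIR, f)) for f in os.listdir(DATA_DIR)]
    with socket.create_connection(HOST_ADDR) as sock:
        sock.sendall(json.dumps(metadata).encode())  # transmit metadata only

if __name__ == "__main__":
    main()
```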
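
For the multimedia similarity search use case, the sketch below shows an in-storage k-nearest-neighbor query, assuming feature vectors have already been extracted and stored on the drive; the file name, vector layout, and choice of Euclidean distance are illustrative assumptions.

```python
# Sketch of in-storage similarity search: the CSD holds the feature vectors
# locally and returns only the top-k matches (indices and distances) to the host.
import numpy as np

def topk_similar(query, vectors, k=10):
    """Return indices and distances of the k vectors nearest to `query`."""
    dists = np.linalg.norm(vectors - query, axis=1)  # Euclidean distances
    idx = np.argpartition(dists, k)[:k]              # k smallest, unordered
    order = idx[np.argsort(dists[idx])]              # sort just those k
    return order, dists[order]

# Features extracted from media stored on this drive (illustrative file name).
vectors = np.load("/mnt/flash/features.npy")         # shape (N, d), resides in-storage
query = np.random.rand(vectors.shape[1]).astype(vectors.dtype)
ids, dists = topk_similar(query, vectors, k=10)
# Only `ids` and `dists` (a few hundred bytes) travel back to the host,
# instead of the full (N, d) feature matrix.
```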
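
Finally, a minimal sketch of the distributed training loop, assuming a simple weighted parameter-averaging scheme in the spirit of federated learning: each device updates the model on its private shard, and only parameters are exchanged. The linear model, learning rate, and averaging rule are illustrative assumptions; the paper’s actual framework trains neural networks on heterogeneous CSD clusters.

```python
# Sketch of privacy-preserving distributed training across CSDs: local data
# never leaves its drive; only model parameters cross the device boundary.
import numpy as np

def local_sgd_step(w, X, y, lr=0.1):
    """One gradient step of linear regression on a shard held by one device."""
    grad = 2.0 * X.T @ (X @ w - y) / len(y)
    return w - lr * grad

def federated_round(w, shards, lr=0.1):
    """Each device updates locally; the host averages parameters, not data."""
    updates = [local_sgd_step(w, X, y, lr) for X, y in shards]
    sizes = np.array([len(y) for _, y in shards], dtype=float)
    return np.average(updates, axis=0, weights=sizes)

rng = np.random.default_rng(0)
d = 8
w_true = rng.normal(size=d)
# Three heterogeneous devices with differently sized private shards (illustrative).
shards = []
for n in (200, 500, 1000):
    X = rng.normal(size=(n, d))
    shards.append((X, X @ w_true + 0.01 * rng.normal(size=n)))

w = np.zeros(d)
for _ in range(100):
    w = federated_round(w, shards)
print("parameter error:", np.linalg.norm(w - w_true))
```

The weighting by shard size mirrors the common federated-averaging heuristic; any aggregation rule that exchanges only parameters would preserve the same privacy property described in the summary.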