AI Post Transformers

Meta's solution to massive DLRM inference through software defined memory



In November 2021, Meta (then Facebook), in collaboration with George Mason University and the University of Illinois Chicago, published the paper "Supporting Massive DLRM Inference Through Software Defined Memory". Meta addressed the infrastructure challenge of serving massive Deep Learning Recommendation Models (DLRMs) by extending the memory hierarchy to include NVMe Storage Class Memory.

Because standard storage devices read in large logical blocks that far exceed the small size of an embedding row, the company faced significant read amplification and wasted bandwidth. To resolve this, the engineering team built a solution around the NVMe SGL Bit Bucket feature within a software defined memory stack. Their modifications to the Linux kernel and NVMe driver allow applications to issue direct I/O requests for specific chunks of data, down to four bytes, rather than transferring full logical blocks.

Bit buckets enable the device to transfer only the requested portion of a block, which significantly optimizes link bandwidth and reduces memory utilization. This granular approach saves approximately 75 percent of bus bandwidth and lowers individual read latency by 3 to 5 percent by eliminating unnecessary data transfers and memory copies.

Applied to production environments, this architecture allows data centers to replace expensive DRAM with efficient flash storage for specific model components. These optimizations yield up to 20 percent power savings on simpler hardware and a projected 29 percent increase in performance per watt for multi-tenant serving scenarios.

Sources:
- https://arxiv.org/pdf/2110.11489
- https://lore.kernel.org/linux-nvme/[email protected]/

By mcgrof