AI Post Transformers

Xerxes: CXL 3.0 Simulation for Scalable Memory Systems



This episode explores Xerxes, a new open-source simulator designed to model CXL 3.0 features before the hardware exists. The hosts explain how CXL layers cache-coherent protocols on top of the PCIe physical layer to ease memory access bottlenecks in AI and HPC workloads, then dive into the two major architectural changes in CXL 3.0: Port-Based Routing, which enables arbitrary fabric topologies beyond rigid trees, and Device-Managed Coherence, which lets devices handle coherence peer-to-peer instead of routing every transaction through the host CPU. The discussion highlights why the simulator matters for designing next-generation rack-scale memory pools and accelerator fabrics: it addresses the chicken-and-egg problem of validating designs before physical hardware ships. The hosts also ask how a simulator can be validated without reference hardware and preview a deeper look at Xerxes' architecture and methodology.
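To make the Port-Based Routing idea concrete, here is a minimal Python sketch of destination-ID forwarding over a fabric that contains a cycle, something a strict tree cannot express. It is an illustration only, not Xerxes code and not the CXL specification's mechanism: the node names, the `build_routing_tables` and `route` helpers, and the choice of breadth-first shortest paths are all assumptions made for the example.

```python
from collections import deque

# Hypothetical fabric: four switches in a ring (s0-s1-s2-s3-s0), with a host
# on s0 and a memory device on s2. The list index of a neighbor is the
# egress port number on that node.
links = {
    "host": ["s0"],
    "s0": ["host", "s1", "s3"],
    "s1": ["s0", "s2"],
    "s2": ["s1", "s3", "mem"],
    "s3": ["s0", "s2"],
    "mem": ["s2"],
}

def build_routing_tables(links):
    """For every node, map each destination ID to a first-hop egress port."""
    tables = {}
    for src in links:
        first_port = {}
        visited = {src}
        queue = deque()
        # Seed the search with the direct neighbors and their port numbers.
        for port, nbr in enumerate(links[src]):
            if nbr not in visited:
                visited.add(nbr)
                first_port[nbr] = port
                queue.append(nbr)
        # Breadth-first search: farther nodes inherit the first-hop port.
        while queue:
            node = queue.popleft()
            for nbr in links[node]:
                if nbr not in visited:
                    visited.add(nbr)
                    first_port[nbr] = first_port[node]
                    queue.append(nbr)
        tables[src] = first_port
    return tables

def route(links, tables, src, dst):
    """Forward hop by hop, looking up only the destination ID at each node."""
    path = [src]
    node = src
    while node != dst:
        port = tables[node][dst]
        node = links[node][port]
        path.append(node)
    return path

tables = build_routing_tables(links)
print(route(links, tables, "host", "mem"))
```

Each node consults only its own table, so the ring's redundant link through s3 is legal even though this sketch never uses it for the host-to-mem route. A real fabric would also need a deadlock-avoidance policy, which is omitted here.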
Sources:
1. Xerxes: CXL 3.0 Simulation for Scalable Memory Systems
https://www.usenix.org/system/files/fast26-an.pdf
2. CXL Memory Disaggregation: Opportunities and Challenges — Guz et al. (Intel), 2023
https://scholar.google.com/scholar?q=CXL+Memory+Disaggregation:+Opportunities+and+Challenges
3. Pond: CXL-Based Memory Pooling Systems for Cloud Platforms — Li et al., 2023
https://scholar.google.com/scholar?q=Pond:+CXL-Based+Memory+Pooling+Systems+for+Cloud+Platforms
4. TPP: Transparent Page Placement for CXL-Enabled Tiered Memory — Maruf et al., 2023
https://scholar.google.com/scholar?q=TPP:+Transparent+Page+Placement+for+CXL-Enabled+Tiered+Memory
5. The CXL Memory Expander: Performance and Cost Analysis — Gouk et al. (SK hynix), 2023
https://scholar.google.com/scholar?q=The+CXL+Memory+Expander:+Performance+and+Cost+Analysis
6. Exploring CXL 3.0 Port-Based Routing for Scalable Memory Systems — Pan et al., 2024
https://scholar.google.com/scholar?q=Exploring+CXL+3.0+Port-Based+Routing+for+Scalable+Memory+Systems
7. SMART: Scalable Memory Architecture with Port-Based Routing — Kim et al., 2024
https://scholar.google.com/scholar?q=SMART:+Scalable+Memory+Architecture+with+Port-Based+Routing
8. Deadlock-Free Routing for CXL Fabrics — Zhang et al., 2024
https://scholar.google.com/scholar?q=Deadlock-Free+Routing+for+CXL+Fabrics
9. DMC: Distributed Cache Coherence for CXL Memory Systems — Lee et al., 2024
https://scholar.google.com/scholar?q=DMC:+Distributed+Cache+Coherence+for+CXL+Memory+Systems
10. Scaling Cache Coherence to Thousands of Devices with CXL DMC — Wang et al., 2024
https://scholar.google.com/scholar?q=Scaling+Cache+Coherence+to+Thousands+of+Devices+with+CXL+DMC
11. Coherence Protocol Verification for CXL Device-Managed Coherence — Chen et al., 2024
https://scholar.google.com/scholar?q=Coherence+Protocol+Verification+for+CXL+Device-Managed+Coherence
12. gem5: A Multiple-ISA Full-System Simulator — Binkert et al., 2011
https://scholar.google.com/scholar?q=gem5:+A+Multiple-ISA+Full-System+Simulator
13. The ZSim Simulator: Fast and Accurate Multicore Simulation — Sanchez and Kozyrakis, 2013
https://scholar.google.com/scholar?q=The+ZSim+Simulator:+Fast+and+Accurate+Multicore+Simulation
14. Simulating Multi-Core Systems with Shared Memory Coherence — Martin et al. (Wisconsin Multifacet group), 2005
https://scholar.google.com/scholar?q=Simulating+Multi-Core+Systems+with+Shared+Memory+Coherence
15. PARADE: A Cycle-Accurate Full-System Simulation Platform for Accelerator-Rich Architectures — Fuchs et al., 2020
https://scholar.google.com/scholar?q=PARADE:+A+Cycle-Accurate+Full-System+Simulation+Platform+for+Accelerator-Rich+Architectures
16. A Primer on Memory Consistency and Cache Coherence — Sorin, Hill, and Wood, 2011
https://scholar.google.com/scholar?q=A+Primer+on+Memory+Consistency+and+Cache+Coherence
17. Coherence and Consistency Models in Shared-Memory Multiprocessors — Adve and Gharachorloo, 1996
https://scholar.google.com/scholar?q=Coherence+and+Consistency+Models+in+Shared-Memory+Multiprocessors
18. DASH: A Scalable Directory-Based Multiprocessor — Lenoski et al. (Stanford DASH project), 1992
https://scholar.google.com/scholar?q=DASH:+A+Scalable+Directory-Based+Multiprocessor
19. Directory-Based Cache Coherence in Large-Scale Multiprocessors — Chaiken et al. (Alewife project), 1991
https://scholar.google.com/scholar?q=Directory-Based+Cache+Coherence+in+Large-Scale+Multiprocessors
20. Enabling Rack-Scale Confidential Computing using Heterogeneous Trusted Execution Environment — Jianping Zhu, Hang Yin, Yuekai Jia, Wenhao Wang, Chunhui Li, Jiashuo Liang, Shoumeng Yan, Zhengyu He, Qingkui Liu, Alex X. Liu, 2024
https://scholar.google.com/scholar?q=Enabling+Rack-Scale+Confidential+Computing+using+Heterogeneous+Trusted+Execution+Environment
21. Understanding the Overheads of Hardware Memory Coherence — Lena E. Olson, Joseph Izraelevitz, Mark D. Hill, 2015
https://scholar.google.com/scholar?q=Understanding+the+Overheads+of+Hardware+Memory+Coherence
22. AI Post Transformers: SolidAttention: Co-Designing Sparse Attention and SSD I/O — Hal Turing & Dr. Ada Shannon, 2026
https://podcast.do-not-panic.com/episodes/2026-03-18-solidattention-co-designing-sparse-atten-5a8622.mp3
23. AI Post Transformers: Accelerating LLM Cold Starts with Programmable Page Cache — Hal Turing & Dr. Ada Shannon, 2026
https://podcast.do-not-panic.com/episodes/2026-03-17-accelerating-llm-cold-starts-with-progra-0912d1.mp3
24. AI Post Transformers: xLLM: Co-Locating Online and Offline LLM Inference — Hal Turing & Dr. Ada Shannon, 2026
https://podcast.do-not-panic.com/episodes/2026-03-16-xllm-co-locating-online-and-offline-llm-10bb81.mp3
Interactive Visualization: Xerxes: CXL 3.0 Simulation for Scalable Memory Systems

AI Post Transformers, by mcgrof