March 19, 2026

Xerxes: CXL 3.0 Simulation for Scalable Memory Systems

This episode explores Xerxes, a new open-source simulator designed to model CXL 3.0 features before the hardware exists. The hosts explain how CXL layers cache coherence on top of PCIe to ease memory access bottlenecks in AI and HPC workloads, then dive into the two major architectural changes in CXL 3.0: Port-Based Routing, which enables arbitrary fabric topologies beyond rigid trees, and Device-Managed Coherence, which lets devices handle coherence peer-to-peer without routing every transaction through the host CPU. The discussion highlights why the simulator matters for designing next-generation rack-scale memory pools and accelerator fabrics: it addresses the chicken-and-egg problem of validating designs before physical hardware ships. The hosts ask how a simulator can itself be validated without reference hardware and preview a deeper look at Xerxes' architecture and methodology.

Sources:

1. Xerxes: CXL 3.0 Simulation for Scalable Memory Systems

https://www.usenix.org/system/files/fast26-an.pdf

2. CXL Memory Disaggregation: Opportunities and Challenges — Guz et al. (Intel), 2023

https://scholar.google.com/scholar?q=CXL+Memory+Disaggregation:+Opportunities+and+Challenges

3. Pond: CXL-Based Memory Pooling Systems for Cloud Platforms — Li et al., 2023

https://scholar.google.com/scholar?q=Pond:+CXL-Based+Memory+Pooling+Systems+for+Cloud+Platforms

4. TPP: Transparent Page Placement for CXL-Enabled Tiered Memory — Maruf et al., 2023

https://scholar.google.com/scholar?q=TPP:+Transparent+Page+Placement+for+CXL-Enabled+Tiered+Memory

5. The CXL Memory Expander: Performance and Cost Analysis — Gouk et al. (SK hynix), 2023

https://scholar.google.com/scholar?q=The+CXL+Memory+Expander:+Performance+and+Cost+Analysis

6. Exploring CXL 3.0 Port-Based Routing for Scalable Memory Systems — Pan et al., 2024

https://scholar.google.com/scholar?q=Exploring+CXL+3.0+Port-Based+Routing+for+Scalable+Memory+Systems

7. SMART: Scalable Memory Architecture with Port-Based Routing — Kim et al., 2024

https://scholar.google.com/scholar?q=SMART:+Scalable+Memory+Architecture+with+Port-Based+Routing

8. Deadlock-Free Routing for CXL Fabrics — Zhang et al., 2024

https://scholar.google.com/scholar?q=Deadlock-Free+Routing+for+CXL+Fabrics

9. DMC: Distributed Cache Coherence for CXL Memory Systems — Lee et al., 2024

https://scholar.google.com/scholar?q=DMC:+Distributed+Cache+Coherence+for+CXL+Memory+Systems

10. Scaling Cache Coherence to Thousands of Devices with CXL DMC — Wang et al., 2024

https://scholar.google.com/scholar?q=Scaling+Cache+Coherence+to+Thousands+of+Devices+with+CXL+DMC

11. Coherence Protocol Verification for CXL Device-Managed Coherence — Chen et al., 2024

https://scholar.google.com/scholar?q=Coherence+Protocol+Verification+for+CXL+Device-Managed+Coherence

12. gem5: A Multiple-ISA Full-System Simulator — Binkert et al., 2011

https://scholar.google.com/scholar?q=gem5:+A+Multiple-ISA+Full-System+Simulator

13. The ZSim Simulator: Fast and Accurate Multicore Simulation — Sanchez and Kozyrakis, 2013

https://scholar.google.com/scholar?q=The+ZSim+Simulator:+Fast+and+Accurate+Multicore+Simulation

14. Simulating Multi-Core Systems with Shared Memory Coherence — Martin et al. (Wisconsin Multifacet group), 2005

https://scholar.google.com/scholar?q=Simulating+Multi-Core+Systems+with+Shared+Memory+Coherence

15. PARADE: A Cycle-Accurate Full-System Simulation Platform for Accelerator-Rich Architectures — Fuchs et al., 2020

https://scholar.google.com/scholar?q=PARADE:+A+Cycle-Accurate+Full-System+Simulation+Platform+for+Accelerator-Rich+Architectures

16. A Primer on Memory Consistency and Cache Coherence — Sorin, Hill, and Wood, 2011

https://scholar.google.com/scholar?q=A+Primer+on+Memory+Consistency+and+Cache+Coherence

17. Coherence and Consistency Models in Shared-Memory Multiprocessors — Adve and Gharachorloo, 1996

https://scholar.google.com/scholar?q=Coherence+and+Consistency+Models+in+Shared-Memory+Multiprocessors

18. DASH: A Scalable Directory-Based Multiprocessor — Lenoski et al. (Stanford DASH project), 1992

https://scholar.google.com/scholar?q=DASH:+A+Scalable+Directory-Based+Multiprocessor

19. Directory-Based Cache Coherence in Large-Scale Multiprocessors — Chaiken et al. (Alewife project), 1991

https://scholar.google.com/scholar?q=Directory-Based+Cache+Coherence+in+Large-Scale+Multiprocessors

20. Enabling Rack-Scale Confidential Computing using Heterogeneous Trusted Execution Environment — Jianping Zhu, Hang Yin, Yuekai Jia, Wenhao Wang, Chunhui Li, Jiashuo Liang, Shoumeng Yan, Zhengyu He, Qingkui Liu, Alex X. Liu, 2024

https://scholar.google.com/scholar?q=Enabling+Rack-Scale+Confidential+Computing+using+Heterogeneous+Trusted+Execution+Environment

21. Understanding the Overheads of Hardware Memory Coherence — Lena E. Olson, Joseph Izraelevitz, Mark D. Hill, 2015

https://scholar.google.com/scholar?q=Understanding+the+Overheads+of+Hardware+Memory+Coherence

22. AI Post Transformers: SolidAttention: Co-Designing Sparse Attention and SSD I/O — Hal Turing & Dr. Ada Shannon, 2026

https://podcast.do-not-panic.com/episodes/2026-03-18-solidattention-co-designing-sparse-atten-5a8622.mp3

23. AI Post Transformers: Accelerating LLM Cold Starts with Programmable Page Cache — Hal Turing & Dr. Ada Shannon, 2026

https://podcast.do-not-panic.com/episodes/2026-03-17-accelerating-llm-cold-starts-with-progra-0912d1.mp3

24. AI Post Transformers: xLLM: Co-Locating Online and Offline LLM Inference — Hal Turing & Dr. Ada Shannon, 2026

https://podcast.do-not-panic.com/episodes/2026-03-16-xllm-co-locating-online-and-offline-llm-10bb81.mp3

Interactive Visualization: Xerxes: CXL 3.0 Simulation for Scalable Memory Systems


By mcgrof