The June 5, 2025 research paper introduces HALoS: Hierarchical Asynchronous Local SGD, a novel optimization framework for training large language models (LLMs) across geographically distributed accelerators connected by slow, high-latency networks. The core challenge it addresses is the inefficiency of standard synchronous training methods caused by slow inter-region communication and heterogeneous hardware speeds. HALoS mitigates these issues with a two-tier architecture of local parameter servers (LPSs) and a global parameter server (GPS), which exploits fast intra-region links and asynchronous updates to reduce communication overhead and minimize straggler effects. The authors provide a rigorous convergence analysis for the non-convex objective and demonstrate empirically that HALoS converges significantly faster (up to 7.5x faster than synchronous baselines) while matching or exceeding model quality.

Sources: https://arxiv.org/pdf/2506.04531
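
To make the two-tier asynchronous update pattern concrete, the sketch below simulates it in a single Python process on a toy least-squares objective: workers run local SGD steps and push their deltas to a regional LPS without waiting for one another, and each LPS periodically merges its accumulated delta into the GPS. This is a minimal illustration under stated assumptions, not the paper's implementation; class names, the `push_every` cadence, and the simple additive merge rules are all assumptions made here for clarity.

```python
# Minimal sketch of a hierarchical asynchronous local SGD pattern (illustrative only;
# merge rules, staleness handling, and naming are assumptions, not HALoS itself).
import threading
import numpy as np

rng = np.random.default_rng(0)

# Toy objective shared by all workers: f(w) = 0.5 * ||A w - b||^2.
A = rng.normal(size=(64, 8))
b = rng.normal(size=64)

def grad(w, batch_idx):
    """Stochastic gradient of the toy objective on a mini-batch of rows."""
    Ab, bb = A[batch_idx], b[batch_idx]
    return Ab.T @ (Ab @ w - bb) / len(batch_idx)

class GlobalParameterServer:
    """Top tier: merges deltas from local parameter servers asynchronously."""
    def __init__(self, dim):
        self.w = np.zeros(dim)
        self.lock = threading.Lock()

    def apply_delta(self, delta):
        with self.lock:
            self.w += delta          # no barrier across regions
            return self.w.copy()

class LocalParameterServer:
    """Bottom tier: serves workers in one region over fast intra-region links."""
    def __init__(self, gps):
        self.gps = gps
        self.w = gps.w.copy()
        self.anchor = self.w.copy()  # snapshot from the last sync with the GPS
        self.lock = threading.Lock()
        self.pushes = 0

    def pull(self):
        with self.lock:
            return self.w.copy()

    def push(self, delta, push_every=2):
        """Apply a worker's delta; every `push_every` pushes, sync with the GPS."""
        with self.lock:
            self.w += delta
            self.pushes += 1
            if self.pushes % push_every == 0:
                global_w = self.gps.apply_delta(self.w - self.anchor)
                self.w = global_w
                self.anchor = global_w.copy()

def worker(lps, rounds=50, local_steps=4, lr=0.05):
    """One accelerator: run local SGD steps, then push asynchronously to its LPS."""
    local_rng = np.random.default_rng()
    for _ in range(rounds):
        w = lps.pull()
        w0 = w.copy()
        for _ in range(local_steps):
            batch = local_rng.integers(0, len(A), size=8)
            w -= lr * grad(w, batch)
        lps.push(w - w0)             # no synchronization with other workers

gps = GlobalParameterServer(dim=8)
regions = [LocalParameterServer(gps) for _ in range(2)]   # two simulated regions
threads = [threading.Thread(target=worker, args=(lps,))
           for lps in regions for _ in range(2)]          # two workers per region
for t in threads:
    t.start()
for t in threads:
    t.join()

print("final loss:", 0.5 * np.linalg.norm(A @ gps.w - b) ** 2)
```

The key property the sketch tries to convey is that slow inter-region traffic (LPS-to-GPS pushes) happens far less often than fast intra-region traffic (worker-to-LPS pushes), and neither tier waits on stragglers, which is the intuition behind the communication and straggler savings reported in the paper.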