This academic paper (September 25, 2025) evaluates the performance and portability of the novel Mojo programming language for high-performance computing (HPC) scientific kernels on modern GPUs. The researchers compare Mojo's performance against vendor-specific baselines (CUDA on the NVIDIA H100 and HIP on the AMD MI300A) across four workloads: two memory-bound (a seven-point stencil and BabelStream) and two compute-bound (miniBUDE and Hartree–Fock). The paper finds that Mojo is highly competitive on the memory-bound kernels, particularly on the AMD GPU, but identifies performance gaps on the compute-bound kernels, attributed to the current lack of fast-math optimizations and limitations with atomic operations. Overall, the work suggests that Mojo, with its MLIR-based compile-time architecture for GPU programming, has significant potential to close the performance and productivity gaps in the fragmented Python ecosystem. Source: https://www.arxiv.org/pdf/2509.21039
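For context, a seven-point stencil (one of the paper's memory-bound workloads) updates each interior point of a 3-D grid from itself and its six face-adjacent neighbors, so performance is dominated by memory traffic rather than arithmetic. The following NumPy sketch illustrates the access pattern only; the coefficients `c0` and `c1` and the boundary handling are illustrative assumptions, not the paper's exact kernel:

```python
import numpy as np

def seven_point_stencil(u, c0=1.0, c1=0.1):
    """Apply one seven-point stencil sweep to a 3-D grid.

    Each interior point becomes a weighted sum of itself (weight c0)
    and its six face neighbors (weight c1 each). Boundary points are
    left unchanged here for simplicity; coefficients are illustrative.
    """
    out = u.copy()
    out[1:-1, 1:-1, 1:-1] = (
        c0 * u[1:-1, 1:-1, 1:-1]
        + c1 * (u[:-2, 1:-1, 1:-1] + u[2:, 1:-1, 1:-1]
                + u[1:-1, :-2, 1:-1] + u[1:-1, 2:, 1:-1]
                + u[1:-1, 1:-1, :-2] + u[1:-1, 1:-1, 2:])
    )
    return out
```

Each sweep reads seven values and writes one per interior point, which is why the achievable throughput of such a kernel is bounded by GPU memory bandwidth, the quantity the paper compares between Mojo and the CUDA/HIP baselines.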