

This episode introduces Jamba-1.5, a new series of instruction-tuned large language models built on the Jamba hybrid Transformer-Mamba mixture-of-experts architecture. The models, available in Large (94B active parameters) and Mini (12B active parameters) sizes, are highlighted for their high efficiency, superior throughput, and remarkably low memory usage at long context lengths of up to 256K tokens. A key technical innovation is ExpertsInt8, a novel quantization technique that lets the Large model run efficiently on standard GPU hardware without compromising quality. Evaluations show that Jamba-1.5 models achieve competitive performance on academic and chatbot benchmarks while excelling at long-context tasks compared to other similarly sized open-weight models. The authors also share insights into the models' training stages, multilingual capabilities, and safety and alignment considerations.
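For listeners curious what int8 quantization of mixture-of-experts weights looks like in code, here is a minimal, hypothetical sketch. It is not the paper's ExpertsInt8 kernel (which the authors describe as dequantizing inside the fused MoE GPU kernel); it only illustrates the general idea of storing expert weight matrices in int8 with per-channel scales and dequantizing them at matmul time. All function names and shapes below are illustrative assumptions.

```python
# Toy illustration of int8 expert-weight quantization (NOT the ExpertsInt8 implementation).
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-output-channel int8 quantization of a weight matrix."""
    scale = np.abs(w).max(axis=0) / 127.0          # one scale per output column
    scale = np.where(scale == 0, 1.0, scale)       # avoid division by zero
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale.astype(np.float32)

def dequant_matmul(x: np.ndarray, q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Dequantize on the fly and multiply: x @ (q * scale)."""
    return (x @ q.astype(np.float32)) * scale

# Example: quantize the weights of each expert in a toy MoE layer.
rng = np.random.default_rng(0)
experts = [rng.standard_normal((64, 256)).astype(np.float32) for _ in range(4)]
quantized = [quantize_int8(w) for w in experts]

x = rng.standard_normal((2, 64)).astype(np.float32)
for w, (q, s) in zip(experts, quantized):
    err = np.abs(x @ w - dequant_matmul(x, q, s)).max()
    print(f"max abs error after int8 round-trip: {err:.4f}")
```

Storing expert weights in int8 roughly halves their memory footprint relative to 16-bit formats, which is the kind of saving that helps a large MoE model fit on standard GPU hardware.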
By kw