
Source Link: https://prismml.com/news/bonsai-8b
Summary:
PrismML has announced 1-bit Bonsai, a family of large language models (LLMs) designed to bring high-capability inference to consumer-grade edge devices. The flagship 8B model features a "true" 1-bit architecture in which the entire network (embeddings, attention, and MLP layers) operates at 1-bit precision. This results in a footprint of just 1.15 GB, roughly 14x smaller than standard 16-bit models in its class, while remaining competitive on benchmarks.
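The "roughly 14x" figure can be sanity-checked with back-of-the-envelope arithmetic. The sketch below is only an estimate: it assumes a nominal parameter count of exactly 8 billion and decimal gigabytes, neither of which the announcement summary states, and the function name is made up for illustration.

```python
def weight_footprint_gb(n_params: float, bits_per_weight: float) -> float:
    """Storage for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 8e9       # assumption: nominal "8B" parameter count
REPORTED_GB = 1.15   # footprint quoted in the summary

fp16_gb = weight_footprint_gb(N_PARAMS, 16)       # 16-bit baseline
ideal_1bit_gb = weight_footprint_gb(N_PARAMS, 1)  # pure 1 bit per weight

# Size ratio between the 16-bit baseline and the reported footprint.
ratio = fp16_gb / REPORTED_GB

# Average bits actually stored per weight, implied by the reported size.
effective_bits = REPORTED_GB * 1e9 * 8 / N_PARAMS

print(f"16-bit baseline: {fp16_gb:.2f} GB")
print(f"ideal 1-bit:     {ideal_1bit_gb:.2f} GB")
print(f"size ratio:      {ratio:.1f}x")
print(f"effective bits:  {effective_bits:.2f} per weight")
```

Under these assumptions the ratio comes out near 14x, and the reported size implies slightly more than one bit per weight on average. The summary does not say what accounts for that difference.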
Key highlights of the announcement include:
• Intelligence Density: PrismML defines this metric as a model's capability per unit of size (GB). Bonsai 8B scores 1.06/GB, roughly ten times the 0.10/GB of comparable models such as Qwen3 8B.
• Local Performance: The models enable high-throughput local inference, reaching 40+ tokens per second on an iPhone 17 Pro and 131 tokens per second on an M4 Pro Mac. This speed allows for more efficient long-horizon agentic tasks.
• Efficiency: Bonsai delivers 4–5x better energy efficiency than full-precision counterparts, even on standard hardware not yet optimized for 1-bit arithmetic.
• Wider Availability: PrismML also released 4B and 1.7B variants, all of which are available under the Apache 2.0 License to support the development of private, responsive, and offline AI-native products.
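To put the density and throughput figures above in concrete terms, here is a small arithmetic sketch. The 10,000-token task size is a hypothetical chosen for illustration, and "40+" tokens per second is treated as exactly 40, so the iPhone time is an upper bound.

```python
def generation_seconds(n_tokens: int, tokens_per_second: float) -> float:
    """Time to generate n_tokens at a steady decode rate (ignores prompt processing)."""
    return n_tokens / tokens_per_second


# Intelligence density figures quoted in the summary (capability per GB).
BONSAI_DENSITY = 1.06
QWEN3_DENSITY = 0.10
density_gap = BONSAI_DENSITY / QWEN3_DENSITY

# Hypothetical long-horizon agent run that generates 10,000 tokens in total.
TASK_TOKENS = 10_000
iphone_seconds = generation_seconds(TASK_TOKENS, 40)   # "40+" tok/s, taken as 40
mac_seconds = generation_seconds(TASK_TOKENS, 131)     # M4 Pro Mac figure

print(f"density gap: {density_gap:.1f}x")
print(f"iPhone 17 Pro: {iphone_seconds:.0f} s for {TASK_TOKENS} tokens")
print(f"M4 Pro Mac:    {mac_seconds:.0f} s for {TASK_TOKENS} tokens")
```

At these rates the hypothetical run takes a little over four minutes on the phone and under a minute and a half on the Mac.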
By Yun Wu