Learning GenAI via SOTA Papers

EP079: DBRX Beats GPT-3.5



DBRX is a new state-of-the-art, open, general-purpose large language model (LLM) developed by Databricks.

Key Highlights:

  • Architecture and Scale: DBRX uses a fine-grained mixture-of-experts (MoE) architecture. It contains 132 billion total parameters, but only 36 billion are active for any given input. The model was pre-trained on 12 trillion tokens of carefully curated text and code, supporting a maximum context length of 32k tokens.
  • Superior Performance: DBRX establishes a new standard for open models, outperforming peers like LLaMA2-70B, Mixtral, and Grok-1 across composite benchmarks, with particular strengths in programming (HumanEval) and mathematics (GSM8k). It also exceeds the capabilities of GPT-3.5 and is highly competitive with closed models like Gemini 1.0 Pro and Mistral Medium.
  • High Efficiency: Thanks to its MoE architecture, DBRX achieves significant efficiency gains. It is highly compute-efficient to train and delivers inference throughput that is up to 2x faster than LLaMA2-70B. Databricks notes that their overall end-to-end training pipeline has become nearly 4x more compute-efficient compared to their previous MPT models.
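The routing described above — many small experts, of which only a few run per token — can be sketched in a few lines. The snippet below is a minimal, illustrative mixture-of-experts forward pass for a single token, assuming 16 experts with 4 active (the configuration Databricks describes for DBRX); the plain matrix-multiply "experts" stand in for real feed-forward sub-networks, and all names are hypothetical:

```python
import numpy as np

def moe_forward(x, gate_w, expert_ws, top_k=4):
    """Toy fine-grained MoE routing for one token.

    x: (d,) token hidden state
    gate_w: (n_experts, d) router weights
    expert_ws: list of (d, d) matrices standing in for expert FFNs
    Only top_k experts run, so active parameters are a fraction of the
    total -- the same idea behind DBRX's 36B active of 132B total.
    """
    logits = gate_w @ x                     # router score per expert
    top = np.argsort(logits)[-top_k:]       # indices of the top-k experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                # softmax over selected experts only
    # combine the chosen experts' outputs, weighted by the router
    return sum(w * (expert_ws[i] @ x) for w, i in zip(weights, top))

rng = np.random.default_rng(0)
d, n_experts = 8, 16                        # 16 experts, 4 active per token
x = rng.standard_normal(d)
gate_w = rng.standard_normal((n_experts, d))
expert_ws = [rng.standard_normal((d, d)) for _ in range(n_experts)]
y = moe_forward(x, gate_w, expert_ws)
print(y.shape)
```

Because only 4 of the 16 experts execute per token, compute per token scales with active rather than total parameters — which is why an MoE model can train and serve faster than a dense model of comparable total size.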

Ultimately, DBRX is designed to provide the open community and enterprises with the capability to build and control their own world-class foundation models, matching the quality of closed APIs.


By Yun Wu