The article explores Mixture of Experts (MoE) models, an architecture that prioritizes computational efficiency by activating only a small subset of a model's parameters for any given input. This selective "forgetting" of unused knowledge, while seemingly a limitation, is presented as the key feature that enables scaling to massive models such as (reportedly) GPT-4. However, the article also cautions against potential downsides, such as the emergence of an "expert oligarchy" in which a few experts dominate routing, leading to bias and reduced adaptability. The author ultimately questions whether this approach truly maximizes intelligence or merely optimizes for cost-effective performance, sacrificing holistic thinking for efficiency. A case study of DeepSeek-V3 and its attempt to address this imbalance through load balancing is included.
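
To make the routing and load-balancing mechanism concrete, here is a minimal sketch of a top-k MoE layer with an auxiliary balancing loss. It is an illustrative assumption, not the article's code or DeepSeek-V3's actual implementation: DeepSeek-V3 is reported to use an auxiliary-loss-free bias-adjustment scheme, whereas the sketch below uses the more common Switch-Transformer-style auxiliary loss; all names (`TinyMoELayer`, `num_experts`, `top_k`) are hypothetical.

```python
# Hypothetical sketch of top-k expert routing with a load-balancing auxiliary
# loss. Not the article's or DeepSeek-V3's implementation; names are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyMoELayer(nn.Module):
    """Toy MoE layer: a router picks top_k experts per token, and an
    auxiliary loss discourages any single expert from dominating."""

    def __init__(self, d_model=64, d_hidden=256, num_experts=8, top_k=2):
        super().__init__()
        self.num_experts = num_experts
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)  # gating network
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x):
        # x: (num_tokens, d_model). Only top_k of num_experts experts run per
        # token, so most parameters stay dormant for any given input.
        probs = F.softmax(self.router(x), dim=-1)        # (tokens, experts)
        gate, idx = probs.topk(self.top_k, dim=-1)       # chosen experts per token
        gate = gate / gate.sum(dim=-1, keepdim=True)     # renormalize gate weights

        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            token_ids, slot = (idx == e).nonzero(as_tuple=True)
            if token_ids.numel() == 0:
                continue                                 # expert idle this batch
            w = gate[token_ids, slot].unsqueeze(-1)
            out[token_ids] += w * expert(x[token_ids])

        # Load-balancing auxiliary loss (Switch-Transformer style): penalize
        # routers that funnel most tokens to a few favored experts, i.e. the
        # "expert oligarchy" failure mode the article describes.
        importance = probs.mean(dim=0)                   # avg gate prob per expert
        load = torch.bincount(idx.reshape(-1),
                              minlength=self.num_experts).float() / idx.numel()
        aux_loss = self.num_experts * torch.sum(importance * load)
        return out, aux_loss


if __name__ == "__main__":
    layer = TinyMoELayer()
    tokens = torch.randn(32, 64)                         # 32 tokens, d_model=64
    y, aux = layer(tokens)
    print(y.shape, float(aux))                           # torch.Size([32, 64]) ...
```

The `aux_loss` term is what nudges the router toward even expert utilization; without some balancing mechanism, training can collapse into the dominance pattern the article warns about, with a handful of experts receiving nearly all tokens.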