Louise Ai agent - David S. Nishimoto

Louise AI agent: the DeepSeek R1 mystery



DeepSeek was probably using OpenAI's o1 in its reinforcement learning pipeline to accelerate the learning curve; o1 then became the teacher model, says Hans Nelson. They used American technology to fast-track their way to the top.

DeepSeek R1 is an advanced language model that improves efficiency by reducing numerical precision, storing values in 8-bit floating point rather than the standard 32-bit format, which cuts memory usage by roughly 75%. Additionally, its multi-token prediction scheme lets the model process several tokens, effectively whole phrases, at once, roughly doubling processing speed while reportedly retaining about 90% of the accuracy.
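As a rough illustration of what that precision change means for weight storage, here is a back-of-the-envelope sketch; the 671-billion-parameter figure is the count reported for DeepSeek R1 and is used here only for illustration:

```python
# Back-of-the-envelope memory comparison: storing model weights in
# 32-bit vs 8-bit floating point, the precision trick described above.

params = 671e9              # reported DeepSeek R1 parameter count (illustrative)
bytes_fp32 = params * 4     # 4 bytes per weight at 32-bit precision
bytes_fp8  = params * 1     # 1 byte per weight at 8-bit precision

print(f"FP32 weights: {bytes_fp32 / 1e12:.2f} TB")
print(f"FP8 weights:  {bytes_fp8 / 1e12:.2f} TB")
print(f"Memory saved: {1 - bytes_fp8 / bytes_fp32:.0%}")   # -> 75%
```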

When comparing the costs of DeepSeek R1 with OpenAI's o1 model, the differences are striking. OpenAI's o1 charges $15 per million input tokens and $60 per million output tokens, while DeepSeek R1 charges $0.55 per million input tokens and $2.19 per million output tokens. At those published rates, DeepSeek R1 costs roughly 4% of what users would spend on o1, a cost reduction of about 96%, making it a compelling choice for those looking to leverage AI without incurring high expenses.
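A quick check of that pricing arithmetic, using the per-million-token rates quoted above:

```python
# Reproducing the pricing comparison quoted above (USD per million tokens).
prices = {
    "openai_o1":   {"input": 15.00, "output": 60.00},
    "deepseek_r1": {"input": 0.55,  "output": 2.19},
}

for kind in ("input", "output"):
    ratio = prices["deepseek_r1"][kind] / prices["openai_o1"][kind]
    print(f"{kind}: DeepSeek R1 is {ratio:.1%} of o1's price "
          f"(a {1 - ratio:.0%} reduction)")

# input: DeepSeek R1 is 3.7% of o1's price (a 96% reduction)
# output: DeepSeek R1 is 3.7% of o1's price (a 96% reduction)
```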

DeepSeek is a Chinese artificial intelligence company that has gained significant attention for its development of open-source large language models. Founded and funded by the hedge fund High-Flyer, it is based in Hangzhou, Zhejiang, and is led by Liang Wenfeng.

Since its inception, DeepSeek has made waves in the AI community, particularly in Silicon Valley, by demonstrating AI models that rival the performance of leading chatbots at a much lower cost. This has led to considerable market disruption, with reports indicating that the success of DeepSeek has caused stock prices of US and European tech firms to decline.

DeepSeek's AI assistant recently became the top-rated free application on the Apple App Store in the United States, which resulted in website outages due to the influx of users. This surge in popularity highlights the growing interest and potential of DeepSeek's technology in the competitive AI landscape.

DeepSeek R1 could have been trained using 2,048 Nvidia H800 GPUs, which collectively accounted for approximately 2.788 million GPU-hours at a cost of roughly $5.58 million. The Nvidia H800 is a high-performance data-center GPU designed for AI workloads (a China-market variant of the H100), making it a suitable candidate for such extensive training tasks.

If DeepSeek R1 were to use Apple M-series chips for training at a total cost of $6 million, approximately 3,000 M-series chips would be required. This calculation assumes a cost of $2,000 per chip, which may vary based on the specific model and market conditions.
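A quick sanity check of those two hardware estimates; the figures are the ones quoted in this article (2,048 H800s, 2.788 million GPU-hours, a $5.58 million budget, and an assumed $2,000 per M-series chip against a $6 million budget):

```python
# Sanity-checking the two hardware cost estimates quoted above.

# Nvidia H800 scenario: reported GPU count, GPU-hours, and budget.
h800_gpus, h800_gpu_hours, h800_budget = 2048, 2.788e6, 5.58e6
print(f"Hours per H800: {h800_gpu_hours / h800_gpus:,.0f} "
      f"(~{h800_gpu_hours / h800_gpus / 24:.0f} days)")
print(f"Implied cost per GPU-hour: ${h800_budget / h800_gpu_hours:.2f}")

# Apple M-series scenario: assumed $6M budget at $2,000 per chip.
m_budget, m_chip_price = 6e6, 2000
print(f"M-series chips for the same budget: {m_budget / m_chip_price:,.0f}")
```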

The M-series chips feature powerful integrated GPUs that are optimized for machine learning workloads. This integration allows for faster processing of the matrix operations that are fundamental to training neural networks. The ability to leverage both CPU and GPU resources effectively could make the M-series capable of managing the computational demands of an LLM.

One of the standout features of the M-series chips is their energy efficiency. This could lead to lower operational costs during training, which is particularly beneficial for extensive training sessions that LLMs typically require. The efficiency could also allow for longer training runs without overheating or requiring extensive cooling solutions.

Each Nvidia H800 GPU draws around 300 watts on average when fully utilized. With 2,048 GPUs running for a total of 2.788 million GPU-hours, we can calculate the total energy consumption in kilowatt-hours (kWh); a short script reproducing this arithmetic follows the summary below.

1. Total Energy Consumption for Nvidia H800:

- Power per GPU: 300 watts = 0.3 kW

- Total power for 2048 GPUs: 2048 GPUs * 0.3 kW = 614.4 kW

- Total training hours: 2.788 million GPU hours

- Total energy consumption in kWh: 614.4 kW * (2.788 million GPU-hours / 2,048 GPUs) ≈ 836,400 kWh

Now, let's look at the Apple M-series chips. Assuming they are more energy-efficient, we can estimate their power consumption at around 150 watts per chip.

2. Total Energy Consumption for Apple M-series:

- Power per M-series chip: 150 watts = 0.15 kW

- Total power for 3,000 M-series chips: 3,000 * 0.15 kW = 450 kW

- Assuming the M-series chips would need roughly the same number of chip-hours as the H800s (a generous assumption, since their per-chip training throughput is lower), the total energy consumption would be:

- Total energy consumption in kWh: 450 kW * (2.788 million chip-hours / 3,000 chips) ≈ 418,200 kWh

3. Cost of Electricity:

Assuming an average electricity cost of $0.10 per kWh, we can calculate the operational costs:

- For Nvidia H800: 836,400 kWh * $0.10 = $83,640

- For Apple M-series: 418,200 kWh * $0.10 = $41,820

In summary:

- Nvidia H800: total energy consumption ≈ 836,400 kWh; operational cost ≈ $83,640

- Apple M-series: total energy consumption ≈ 418,200 kWh; operational cost ≈ $41,820
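The whole estimate can be reproduced with a short script. This is a minimal sketch using the same assumed figures (2.788 million chip-hours, 300 W per H800, 150 W per M-series chip, $0.10 per kWh); note that once the total chip-hours are fixed, the chip count cancels out and only the per-chip power draw matters:

```python
# Reproducing the energy and electricity-cost estimates above, using the
# same assumed figures: 2.788 million chip-hours, 300 W per Nvidia H800,
# 150 W per Apple M-series chip, and $0.10 per kWh.

CHIP_HOURS = 2.788e6       # total chip-hours assumed for the training run
PRICE_PER_KWH = 0.10       # assumed average electricity price, USD

def training_energy_and_cost(watts_per_chip):
    """Return (energy in kWh, electricity cost in USD) for the full run."""
    kwh = CHIP_HOURS * (watts_per_chip / 1000.0)   # kW x hours = kWh
    return kwh, kwh * PRICE_PER_KWH

for name, watts in [("Nvidia H800", 300), ("Apple M-series", 150)]:
    kwh, cost = training_energy_and_cost(watts)
    print(f"{name}: ~{kwh:,.0f} kWh, ~${cost:,.0f} in electricity")

# Expected output:
# Nvidia H800: ~836,400 kWh, ~$83,640 in electricity
# Apple M-series: ~418,200 kWh, ~$41,820 in electricity
```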
