Large Language Models on Small Computers
As technology advances, we typically expect increased processing power, faster speeds, more memory, and lower costs each year. However, software improvements can also enable us to run complex tasks on hardware that might otherwise be considered inadequate. In an unconventional approach to this idea, DaveBben is experimenting with scaling down large language models (LLMs) like GPT by running them on minimal hardware, such as the ESP32, rather than the powerful systems these models typically require.
To achieve this, significant compromises are necessary. The ESP32 microcontroller lacks the capacity to handle models with hundreds of billions or even trillions of parameters, like GPT-3 or GPT-4. Instead, DaveBben uses a drastically scaled-down model with only 260,000 parameters, taken from the tinyllamas checkpoints. The chosen implementation, llama2.c, is a minimal, dependency-free C program, which makes it practical to run on a constrained platform like the ESP32. Specifically, the ESP32-S3FH4R2 variant was selected for its comparatively large RAM, which is critical since even this scaled-down LLM requires at least 1 MB to operate. That chip also provides two cores, clocked at up to 240 MHz, that can be pushed to their limits.
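To get a feel for why the RAM and the second core matter, here is a minimal, hypothetical ESP-IDF sketch (not DaveBben's actual code): roughly 260,000 float32 weights come to about 1 MB, which fits in the S3's external PSRAM but not comfortably in its internal SRAM, and the inference loop can be pinned to one of the two 240 MHz cores with FreeRTOS. The task body, buffer size, and core assignment are illustrative assumptions.

```c
#include <stdio.h>
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#include "esp_heap_caps.h"

/* Illustrative figure: ~260K float32 parameters is roughly 1 MB of
 * weight data, which is why the PSRAM on the ESP32-S3FH4R2 matters. */
#define N_PARAMS 260000u

static float *weights; /* model weights, kept in external PSRAM */

/* Placeholder inference task; a llama2.c-style port would run its
 * transformer forward pass here, one token at a time. */
static void inference_task(void *arg)
{
    for (;;) {
        /* ... generate the next token using `weights` ... */
        vTaskDelay(pdMS_TO_TICKS(10));
    }
}

void app_main(void)
{
    /* Allocate the ~1 MB weight buffer in PSRAM rather than the much
     * smaller internal SRAM. */
    weights = heap_caps_malloc(N_PARAMS * sizeof(float), MALLOC_CAP_SPIRAM);
    if (weights == NULL) {
        printf("not enough PSRAM for model weights\n");
        return;
    }

    /* Pin the heavy work to core 1, leaving core 0 free for housekeeping. */
    xTaskCreatePinnedToCore(inference_task, "llm", 8192, NULL, 5, NULL, 1);
}
```

In this arrangement the weights live entirely in PSRAM, so the limiting factors become memory bandwidth and the 240 MHz cores rather than raw capacity, which is consistent with the very modest token rates such a setup can achieve.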
While this experiment isn’t about achieving practical, high-performance LLM capabilities on an ESP32, it does demonstrate that it is technically feasible to run such models on highly constrained hardware. The ESP32, with processing capabilities comparable to a 486 or early Pentium chip, represents an extreme in terms of minimal computational resources. Yet, with software optimization, it manages to execute a simplified LLM, which is quite remarkable considering the limitations.
Ultimately, DaveBben’s work is more about the novelty and challenge of running an LLM on such limited hardware rather than solving real-world problems. It serves as a fascinating proof of concept that, with clever software adaptations, even hardware that seems insufficient can handle tasks traditionally reserved for much more powerful systems. This experimentation pushes the boundaries of what is possible with LLMs, albeit in a non-conventional way.
Read more: Large Language Models on Small Computers