Someone managed to get GPT-OSS 120B running locally on a mini PC!

Running GPT-OSS 120B Locally: A Mini PC Story
So, I stumbled across something absolutely wild the other day, and I just had to share it. A friend of mine, let’s call him “spoilt999” (because that’s the handle he used on Reddit!), managed to get the GPT-OSS 120B model running directly on his mini PC. Seriously, my jaw dropped. And the best part? It’s not some super-expensive, high-powered server. It’s a fairly affordable mini PC that’s surprisingly capable.

I’ve been messing around with smaller language models lately, and they’re impressive, but the thought of running a model as massive as GPT-OSS 120B – which is roughly 120 billion parameters – on everyday hardware felt, well, improbable. It’s a fantastic achievement, and it opens up a whole new world of possibilities for experimentation and self-hosting.

The Specs – It’s Not as Crazy as You Think

Let’s break down the hardware. spoilt999’s setup consists of a Minisforum UH125 Pro mini PC, which he snagged for around $300. He then added 96GB of DDR5 RAM for an additional $160. That’s a total investment of about $460. Honestly, that’s a very reasonable price point for something that can run a model of this size, especially when you consider the potential cost of cloud-based API access.

The PC has an Intel Core Ultra 5 125H CPU. It’s not the latest and greatest, but it’s powerful enough to handle the workload. The key, I think, is the RAM: GPT-OSS 120B ships with 4-bit (MXFP4) quantized weights, which still come to roughly 60GB, so the 96GB is what really makes this possible. It also helps that it’s a mixture-of-experts model, meaning only a small fraction of those 120 billion parameters are active for any given token. He’s running it through Ollama, which simplifies the whole process of serving large language models locally.
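
If you want to poke at a setup like this yourself, Ollama serves an HTTP API on port 11434 by default. Here’s a minimal Python sketch against that API; the model tag gpt-oss:120b is an assumption on my part, so check `ollama list` on your own machine:

```python
# Minimal sketch: query a local Ollama server over its HTTP API.
# Assumes the model was pulled as "gpt-oss:120b" (verify with `ollama list`).
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

payload = {
    "model": "gpt-oss:120b",   # assumed tag; adjust to your install
    "prompt": "In one sentence, what is a mixture-of-experts model?",
    "stream": False,           # return a single JSON object instead of a stream
}

req = urllib.request.Request(
    OLLAMA_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    result = json.load(resp)

print(result["response"])  # the generated text
```

The same response object also carries the timing fields quoted below (Ollama reports them in nanoseconds).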

What Does It Actually Do?

So, what can you *do* with GPT-OSS 120B running on a mini PC? Well, it’s surprisingly usable. The model’s training data has a knowledge cutoff of June 2024, so it’s reasonably current (as of this writing). spoilt999 shared some sample output, including timings for loading, prompt evaluation, and overall execution. Let’s look at the numbers:

  • Total Duration: 33.3516897 seconds
  • Load Duration: 91.5095 ms
  • Prompt Eval Count: 72 token(s)
  • Prompt Eval Duration: 2.2618922 seconds
  • Prompt Eval Rate: 31.83 tokens/s
  • Eval Count: 86 token(s)
  • Eval Duration: 30.9972121 seconds
  • Eval Rate: 2.77 tokens/s

These numbers aren’t going to blow your mind if you’re used to enterprise-level performance, but they’re impressive considering the hardware. The 91ms load duration almost certainly means the model was already resident in memory from a previous run (a 60GB model can’t be read from disk that fast), and the prompt eval rate of 31.83 tokens/s is genuinely good. Generation at 2.77 tokens/s is slow, but workable for non-interactive tasks.
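
Those two rates fall straight out of the counts and durations above; here’s the arithmetic as a quick Python sanity check, using the post’s numbers converted to seconds:

```python
# Sanity-check the reported throughput figures from the post.
prompt_eval_count = 72           # tokens in the prompt
prompt_eval_duration_s = 2.2618922
eval_count = 86                  # tokens generated
eval_duration_s = 30.9972121

print(f"prompt eval rate: {prompt_eval_count / prompt_eval_duration_s:.2f} tokens/s")  # 31.83
print(f"generation rate:  {eval_count / eval_duration_s:.2f} tokens/s")               # 2.77
```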

Why This Matters

This isn’t just about a cool experiment. It has significant implications. Firstly, it demonstrates that you don’t *need* a massive, expensive server to run powerful language models. This lowers the barrier to entry for experimentation and development.

Secondly, it opens the door to self-hosting. Imagine being able to run your own AI assistant, chatbot, or creative tool without relying on external APIs. It’s becoming increasingly possible, and this mini PC setup is a fantastic example.
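
To make that concrete: Ollama also exposes an OpenAI-compatible endpoint under /v1, so client code written for a hosted API can be pointed at the local machine instead. A short sketch, again assuming the gpt-oss:120b tag and the stock openai Python client:

```python
# Sketch: reuse OpenAI-style client code against a self-hosted Ollama server.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # local Ollama, not a cloud API
    api_key="ollama",                      # the client requires a key; Ollama ignores it
)

reply = client.chat.completions.create(
    model="gpt-oss:120b",  # assumed tag; match your local install
    messages=[{"role": "user", "content": "Write a haiku about mini PCs."}],
)
print(reply.choices[0].message.content)
```

Nothing changes beyond the base URL, which is exactly what makes self-hosting so attractive.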

Finally, it’s a fantastic learning opportunity. If you’re interested in AI, this project provides a tangible way to understand the technical aspects of running these models – from memory management to prompt engineering.

Resources and Further Reading

I encourage you to check out spoilt999’s original Reddit post, where he shared the full details. He also has a YouTube video showing the setup in action, which I highly recommend.



