
Managed to get GPT-OSS 120B running locally on my mini PC!

Running GPT-OSS 120B Locally: Seriously?

Okay, so I just had to share this with you guys. I’ve been messing around with some AI models, and I actually managed to get GPT-OSS 120B running on my mini PC. Seriously! It’s not exactly a powerhouse, but it works, and it’s pretty impressive.

I know, I know, everyone’s talking about huge AI models, needing massive servers, and frankly, it all seems a little intimidating. But this shows that you don’t *need* all that to experiment with some really powerful AI. I’ve been on the hunt for a way to play with LLMs without breaking the bank or requiring a data center. This felt like a potential solution, and I’m thrilled with the outcome.

The Setup – It’s Surprisingly Simple

Let’s break down how I did it. The model itself is GPT-OSS 120B, and I’m running it with Ollama, which is designed to make local setups like this a bit more straightforward. I’d heard about Ollama before and figured it could be the key to getting this done.

My mini PC is a Minisforum UH125 Pro, a fairly standard little machine: an Intel Core Ultra 5 125H CPU and a surprising 96GB of RAM. The total cost was $460: $300 for the mini PC and $160 for the RAM. It’s a lot less than renting server time, that’s for sure.

I put it all together using Ollama, and it honestly wasn’t as complicated as I expected. The instructions were pretty clear, and it took me a couple of hours to get everything set up. The biggest surprise was just how quickly it loaded.
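To make that workflow a bit more concrete, here’s a minimal sketch of how you might query the model once it’s pulled. It assumes Ollama is serving on its default local port (11434) and that the model tag is something like gpt-oss:120b; the exact tag is whatever `ollama list` shows on your machine, so treat those names as placeholders for my setup rather than gospel.

```python
# Minimal sketch: send one prompt to a locally running Ollama server.
# Assumes the default Ollama endpoint and an already-pulled model tag.
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "gpt-oss:120b"  # assumed tag; adjust to match your local install

def ask(prompt: str) -> dict:
    """Send a single non-streaming request and return the full JSON response."""
    resp = requests.post(
        OLLAMA_URL,
        json={"model": MODEL, "prompt": prompt, "stream": False},
        timeout=600,  # CPU-only inference on a 120B model can take a while
    )
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    answer = ask("Explain what a mini PC is in one sentence.")
    print(answer["response"])
```

I keep streaming off here just to keep the sketch simple; for interactive use you’d probably want streaming so you can watch tokens arrive instead of staring at a blank terminal.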

Performance – It’s Not Blazing Fast, But It Works

Okay, let’s be real – this isn’t going to replace a dedicated GPU setup. The response times aren’t lightning fast. But, for a CPU-only setup, it’s genuinely impressive. I’m running it on a machine that cost less than $500 and it’s generating responses in a reasonable amount of time.

I ran a few tests, and the output times varied a little depending on the prompt, but a response time of around 30 seconds to a minute isn’t terrible. The wait is definitely noticeable compared to a hosted API call, but it’s workable.
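If you want to measure this yourself, something like the rough sketch below is enough: it just wall-clocks a couple of prompts against the same local endpoint. As before, the port and model tag are assumptions based on my setup.

```python
# Rough timing sketch: wall-clock a few prompts against a local Ollama server.
import time
import requests

MODEL = "gpt-oss:120b"  # assumed tag; adjust to your install
PROMPTS = [
    "Summarize the benefits of running an LLM locally.",
    "Write a haiku about mini PCs.",
]

for prompt in PROMPTS:
    start = time.perf_counter()
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": MODEL, "prompt": prompt, "stream": False},
        timeout=600,
    )
    resp.raise_for_status()
    elapsed = time.perf_counter() - start
    print(f"{elapsed:6.1f}s  {prompt!r}")
```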

The Training Data – June 2024 Cutoff

Just a quick note on the data it was trained on. According to the model, its training data cut off in June 2024. So, while it can generate pretty complex and coherent text, it doesn’t have any knowledge of events that happened after that date.

Here’s the timing output from one of my tests (don’t worry, I’ve sanitized it a bit!):

total duration: 33.3516897s
load duration: 91.5095ms
prompt eval count: 72 token(s)
prompt eval duration: 2.2618922s
prompt eval rate: 31.83 tokens/s
eval count: 86 token(s)
eval duration: 30.9972121s
eval rate: 2.77 tokens/s

The numbers are just a snapshot, but they give you an idea of the processing involved. It’s not instantaneous, but it’s functional.
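If you’re curious where those rates come from, they’re just the token count divided by the elapsed time, so you can sanity-check them straight from the stats above. A tiny sketch using my numbers:

```python
# Tiny sketch: reproduce Ollama's reported rates from the raw counts/durations.
def tokens_per_second(tokens: int, seconds: float) -> float:
    return tokens / seconds

print(f"prompt eval rate: {tokens_per_second(72, 2.2618922):.2f} tokens/s")   # ~31.83
print(f"eval rate:        {tokens_per_second(86, 30.9972121):.2f} tokens/s")  # ~2.77
```

If you query through the HTTP API instead of the CLI, the response JSON includes equivalent count and duration fields, so the same arithmetic applies there too.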

Why This Matters (Maybe?)

I think this is more than just a cool gadget. It demonstrates that advanced AI isn’t *always* locked behind expensive hardware, and it opens up possibilities for experimentation, learning, and even small-scale development on consumer hardware anyone can buy.

I don’t know where this will lead, but it’s definitely an exciting development. I’m curious to see what other models can be run locally like this. And who knows – maybe this will encourage more people to explore the world of self-hosted AI!



