Exploring Ollama: Running Local LLMs on Your Machine

Exploring Ollama

Full Transcript

00:00
Hello, this is Jim from Augusto. I recently did a video that was a deep dive into local LLMs, and a few people reached back out and asked if I could just show what Ollama looks like running on a local machine, maybe a developer machine,
00:15
or a machine they have in the back of their home office, or maybe at the office. So I wanted to showcase Ollama. It's a great tool. You can download it; there are downloads for macOS, Linux, and Windows, and the install is pretty straightforward.
00:31
You install it, and it ends up running in the background. On Windows, Linux, and Mac alike, you can run it through the command line, and it also runs a web service, so you can connect to it remotely if you've opened a firewall to that machine or otherwise allowed access.
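As a rough sketch, once the background service is running you can talk to it locally. This assumes the default port of 11434 and that a model called tinyllama has already been pulled; adjust for your own setup:

    # confirm the background service is up (Ollama listens on localhost:11434 by default)
    curl http://localhost:11434/api/version

    # send a prompt to the local REST API; "tinyllama" is just an example model name
    curl http://localhost:11434/api/generate \
      -d '{"model": "tinyllama", "prompt": "Why is the sky blue?", "stream": false}'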
00:47
But here's what it looks like. I'm going to go ahead and open Ollama, and it looks very similar to ChatGPT. In fact, you can see my chat history over here, and there's a chat window. But let's go ahead and start a new chat and talk about what we're looking at here.
01:03
I'm running this locally right now, and it has different models installed. One of the nice things about this interface is that you can find a model by searching; it lists a few that I may not have locally, but if I select one, it will be downloaded.
01:16
It's kind of difficult to see in this interface which ones are installed. I'm going to pick TinyLlama, but over in the command line I can quickly run ollama list, and you can see the models I have installed.
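For reference, that command and the kind of table it prints look roughly like this (the names, IDs, and sizes below are illustrative, not copied from the machine in the video):

    ollama list
    # NAME               ID              SIZE      MODIFIED
    # deepseek-r1:1.5b   e0979632db5a    1.1 GB    3 weeks ago
    # tinyllama:latest   2644915ede35    637 MB    3 weeks ago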
01:27
Some are different sizes. For example, this DeepSeek model has 1.5 billion parameters; it's about a gigabyte in size on my machine. But then there's this OpenAI open-source GPT model at 20 billion parameters, and you'll see that's 13 gigabytes of space it's gonna take up in my GPU,
01:44
which handles the processing for these models. The machine I'm on is an M1 MacBook. It's not gonna be able to run that one very well, but it will run some of these others very, very well, like DeepSeek, for example. What I'll do is run a couple of queries, and we'll take a look at what happens on my GPU. So let's start
02:01
with TinyLlama. I'm gonna ask it some simple questions. Let's do: why is the sky blue? It should give me back a pretty quick response. In my last video, I showed how long that takes with more verbose output. But here you can see it gave a simple answer without much detail,
02:18
because TinyLlama doesn't really have that much information, right? It's super small, it's concise, it gives a high-level summary. But let's switch to DeepSeek and see what that same answer looks like. DeepSeek has more knowledge; it's a larger model.
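If you'd rather run this same comparison from the terminal instead of the chat window, a minimal sketch looks like this (the tags tinyllama and deepseek-r1:1.5b are assumptions; use whatever ollama list shows on your machine):

    # ask the small model first, then the larger one, with the same prompt
    ollama run tinyllama "Why is the sky blue?"
    ollama run deepseek-r1:1.5b "Why is the sky blue?"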
02:35
It does more thinking. So: why is the sky blue? Now there are a couple of things to notice. You can see the GPU is running, and right here it's doing its thinking, right? There's a lot of lookup and processing being done on my question. And then I get a much more complete answer,
02:56
probably more depth inside of that answer, and it gives me a summary, right? So these two different models are giving me two different types of answers. One thing we can look at: when you run Ollama, I'm able to fit DeepSeek and TinyLlama both into the memory on
03:10
my GPU. It's showing both models are using 100% of my GPU, meaning the GPU has the capacity to hold them entirely. If I were to run the OpenAI model, it would definitely consume all of my GPU as well as some of my system memory.
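You can check the same thing from the command line: ollama ps lists the models currently loaded and how they're split between GPU and system memory (the output below is illustrative, not copied from my machine):

    ollama ps
    # NAME               ID              SIZE      PROCESSOR    UNTIL
    # deepseek-r1:1.5b   e0979632db5a    2.0 GB    100% GPU     4 minutes from now
    # tinyllama:latest   2644915ede35    1.2 GB    100% GPU     4 minutes from now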
03:31
One other thing to note is that in the interface you don't really get to see much more detail than this. Let's go back to DeepSeek and ask a different question, one with a little more detail: who was the 21st president, and who was in his cabinet?
03:51
All right, let's let it run and do some thinking. It's digging back to the 21st president, looking at names, talking about dates, so it's working through more detailed information. You can see over here that my GPU usage is starting to rise. Again, I have the two models still loaded, and
04:09
it came back. It said it thought for nine seconds, and it said the 21st president was John Adams, who served until 1801, and that his cabinet included himself and others of the time. It's interesting that it didn't actually pull out those people; a larger model maybe could have done that. Again,
04:27
this model is limited to the knowledge that's been embedded into it. So I wanted you to see what it looks like to run locally, some of the benefits and some of the limitations. Really quick, I probably should show one more thing. I did list the models that I have; for example, I have DeepSeek
04:44
1.5 billion, so let's go ahead and update that. It's ollama pull, and then you give it the model you want, and it pulls down the most current version of that model. You can see that previously it was last updated about three weeks ago. I'll just do a list again, and you can see it was updated
05:00
about nine seconds ago. So that's how you get the most recent version of a model if it's being republished. And again, if you look at the help, you can see there are many other things you can do.
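A rough sketch of those last couple of steps (the model tag deepseek-r1:1.5b is an assumption; substitute whatever you actually have installed):

    # re-pull a model to get the most recently published version
    ollama pull deepseek-r1:1.5b

    # confirm the MODIFIED column now shows it was updated moments ago
    ollama list

    # see the other available subcommands
    ollama help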
05:15
So if you have questions about running LLMs locally, reach out and let us know. I just wanted people to see what that looks like.

Let's work together.

Partner with Augusto to streamline your digital operations, improve scalability, and enhance user experience. Whether you're facing infrastructure challenges or looking to elevate your digital strategy, our team is ready to help.

Schedule a Consult