A Beginner’s Journey into AI: Guide to Running Large Language Models Locally
Background
I am a complete noob in the AI space, but I have a personal goal of becoming a proficient user of AI in 2024. I have a project in mind that I want to build, but it is still too early to share details. I can share that it will be a rather complex one, which will require:
- Fine-tuning an LLM for a local language
- Fine-tuning an LLM for a specific context
- Using RAG to utilize a knowledge base
- Maybe even combining it with text2sql
I do have technical knowledge, but not much in the AI space. I would like to share my learning journey with you, since I think it might be useful for other fellow explorers as well.
Video
For those who prefer to read, continue below; for those who prefer video, here is the video tutorial:
Day 1: Let's run an LLM locally
Requirements
These requirements apply to this tutorial. If you have a different setup, you might need to look for alternative approaches, as the methods described here might not work:
- NVIDIA GPU with at least 8 GB of VRAM.
- Decent remaining PC components (CPU, RAM). The hardware requirements for AI are high; you can run this on slower machines, but don't expect fast performance. My setup: an i5-13500, 64 GB of DDR4 memory, and a Samsung 990 Pro SSD.
- Fast internet. On average, a 7B model takes ~13 GB of disk space.
I will be running this on Windows 11.
Step 1: Install MiniConda. This will allow us to create and drop isolated Python environments. Technically, you can do this without MiniConda, but the next time you want to run something else, e.g. Stable Diffusion, you might get into trouble due to compatibility issues. So the easiest way is to create an isolated playground. Remember where you have installed it.
Step 2: Add MiniConda directory to the PATH variable
This step is not mandatory, but it personally improved my experience: I could use Conda in the standard Command Prompt without needing to launch Anaconda PowerShell each time.
In the taskbar start typing System Environment Variables…
Click on ‘Edit the system environment variables’.
Then click ‘Environment variables’:
Find the ‘Path’ variable and click Edit:
Add the folder where you have installed Miniconda, as well as its 'Scripts' sub-folder.
Open the command prompt and test if this worked by entering:
conda -V
You should see the installed Conda version:
Step 3. Let's create a new Conda environment. Do not close the Command Prompt from before, and execute this command:
conda create -n my_llm_instance python=3.11.2
'my_llm_instance' can be any name for your environment. During creation you will be asked a few questions; answer Y to each:
Now let’s launch our environment:
conda activate my_llm_instance
You should see ‘(my_llm_instance)’ before your home path, indicating you did great!
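A quick housekeeping note: if you ever want to leave this environment or delete it entirely later, the standard Conda commands are:
conda deactivate
conda env remove -n my_llm_instance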
Step 4. If you still do not have it, install Nvidia CUDA. In my case, the newest CUDA version, 12.3, was not compatible with PyTorch, so I rolled back to 12.1; try and see what works for you. 12.1 CUDA download link.
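To verify the installation, you can run these standard NVIDIA tools in the Command Prompt; nvcc reports the installed CUDA toolkit version, and nvidia-smi shows the driver version and GPU status:
nvcc --version
nvidia-smi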
Step 5. Create a folder where all files will be stored and navigate in CMD to that directory. In my case, it’s ‘C:\local_llm’
cd C:\local_llm
Step 6. Clone the text-generation-webui Git repository.
git clone https://github.com/oobabooga/text-generation-webui
Step 7. Install the required libraries.
pip install torchvision torchaudio torch==2.1.2+cu121 -f https://download.pytorch.org/whl/torch_stable.html
Check what the latest version is and specify your Nvidia CUDA version; in my case, it was 12.1.
These libraries are large, so the installation might take a while.
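Before moving on, it is worth a quick sanity check that PyTorch actually sees your GPU. A minimal snippet using the standard PyTorch API (launch python inside the activated environment and type):
import torch
print(torch.__version__)              # should end with +cu121
print(torch.cuda.is_available())      # should print True
print(torch.cuda.get_device_name(0))  # should print your GPU name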
Step 8. Install the libraries required by text-generation-webui.
First, enter the text-generation-webui folder:
cd text-generation-webui
And install the requirements:
pip install -r requirements.txt
This will also take some time.
Step 9. Let’s run text-generation-webui.
Just execute the command:
python server.py
Open the URL that will be shown in the console in your browser:
You should see something like this:
This is now an empty LLM user interface. It does not have any models yet, so don't expect it to work :)
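As a side note, server.py accepts optional flags. At the time of writing, for example, --listen makes the UI reachable from other devices on your network, and --api enables the built-in API extension; check the project's README for the current list, since flags change between versions:
python server.py --listen --api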
Step 10. Find some LLM model you want to run.
Go to Hugging Face and choose one. This is not an easy task: there are thousands of models, and you will need to do your own research on what fits your needs.
A few things to pay attention to:
- Model size: usually expressed as 7B, 13B, 30B, 70B, etc. This indicates how many BILLION parameters the model has. Bigger is generally better, but you are limited by the GPU you have. With 8 GB of VRAM you can run up to 7B models; with 24 GB you can run a 30B model. For 70B you will need a very powerful GPU with a lot of VRAM. You can offload part of the model to system RAM and the CPU, but you will not like the performance (see the back-of-the-envelope sketch after this list).
- Model type: base, instruct, chat. This reflects how the model was trained and how it responds to your requests. A base model will complete your sentences: type 'The capital of Lithuania is' and the model should respond with 'Vilnius'. Chat models are tuned for conversational interaction: ask 'What is the capital of Lithuania?' and the model will respond 'The capital of Lithuania is Vilnius'.
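Here is a rough way to reason about those VRAM limits: the model weights alone take about parameters × bits-per-parameter / 8 bytes, plus overhead for activations and context. A minimal back-of-the-envelope sketch in Python (the 20% overhead factor is my own rough assumption, not an exact rule):
def estimate_vram_gb(params_billion, bits_per_param, overhead=1.2):
    # weights in GB: billions of parameters times bytes per parameter
    weights_gb = params_billion * bits_per_param / 8
    return weights_gb * overhead

print(estimate_vram_gb(7, 16))  # 16-bit 7B model: ~16.8 GB, won't fit in 8 GB
print(estimate_vram_gb(7, 4))   # 4-bit quantized 7B model: ~4.2 GB, fits in 8 GB
This is why quantized versions of models (e.g. GPTQ or GGUF files) are so popular for consumer GPUs.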
At the moment of writing, the most popular models are Mixtral, Llama 2, and Phi-2.
If you don’t know where to start, I suggest checking out these models:
There is also valuable content on Reddit.
Step 11. Let’s download the model.
When you find the model you want to try, copy its name:
Now jump back to text-generation-webui and click on 'Model'. In the Download section, paste the copied name and click 'Download'.
Depending on the model size (most 7B models are ~13 GB), the download will take a while; you can monitor its status in CMD:
If you get any errors, the model is likely not compatible with your system or libraries. The simplest way out is to just try another model, although you can also try to debug.
If all is good, it should show ‘Done’:
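As an alternative to the UI, you can also fetch models directly with the huggingface_hub Python library (the repo id and folder below are hypothetical placeholders; substitute the model you actually picked):
from huggingface_hub import snapshot_download

# hypothetical example id; replace with the model you chose on Hugging Face
snapshot_download(
    repo_id="TheBloke/Some-Model-7B-GPTQ",
    local_dir="C:/local_llm/text-generation-webui/models/Some-Model-7B-GPTQ",
)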
Step 12. Let’s load the model.
First, click the Refresh button:
Then, in the drop-down, choose the downloaded model and click 'Load':
If all is good, it will say ‘Successfully loaded’:
That's it! Let's test it.
Jump to the “Chat” tab and start asking questions:
For my prompt 'Give me a Python script which would calculate the distance between two coordinates', it gave this code:
import math

def haversine(lat1, lon1, lat2, lon2):
    # convert decimal degrees to radians
    lat1, lon1, lat2, lon2 = map(math.radians, [lat1, lon1, lat2, lon2])
    # haversine formula
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = (sin(dlat/2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon/2) ** 2)
    c = 2 * atan2(sqrt(a), sqrt(1-a))
    return 6371 * c  # 6371 is the radius of earth in kilometers
It was almost correct: it should be math.sin(), math.cos(), etc., and it should print the result. My final adjusted code calculates the distance between the coordinates of Vilnius and Kaunas (cities in Lithuania):
import math

def haversine(lat1, lon1, lat2, lon2):
    # convert decimal degrees to radians
    lat1, lon1, lat2, lon2 = map(math.radians, [lat1, lon1, lat2, lon2])
    # haversine formula
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = (math.sin(dlat/2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon/2) ** 2)
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1-a))
    return 6371 * c  # 6371 is the radius of earth in kilometers

# Kaunas (54.9036, 23.90889) to Vilnius (54.682, 25.3284)
result = haversine(54.9036, 23.90889, 54.682, 25.3284)
print(f'Distance is: {result}')
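Running the script prints a distance of roughly 94 km, which matches the actual straight-line distance between the two cities.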
The logic and the calculation are correct. Impressive! GPT-4 level stuff!
Summary
That concludes my first day with AI. This is just the first step: being able to load and run LLM models. Next time, I will explore some image generation by loading and using Stable Diffusion.
If you have any questions, ping me in the comments.