The Experiment: A Private AI Chatbot on Your Own Machine
What if you could run a capable AI language model entirely on your own hardware — no internet connection required, no API costs, no data leaving your machine? With Ollama, that's not only possible, it's surprisingly straightforward. This lab experiment walks through installing Ollama on Linux, pulling a model, and interacting with it via the terminal and a local web UI.
What Is Ollama?
Ollama is an open-source tool that makes running large language models (LLMs) locally as simple as running a Docker container. It handles model management, hardware acceleration (CPU and GPU), and exposes a local API compatible with the OpenAI API format — meaning many existing tools work with it out of the box.
Ollama supports a growing library of open-source models including Llama 3, Mistral, Phi-3, Gemma, CodeLlama, and more.
Hardware Requirements
You don't need a top-tier GPU to get started. Here's a rough guide:
| Model Size | Min RAM (CPU) | Recommended GPU VRAM |
|---|---|---|
| 7B parameters | 8 GB | 6–8 GB VRAM |
| 13B parameters | 16 GB | 10–12 GB VRAM |
| 34B+ parameters | 32 GB+ | 24 GB+ VRAM |
CPU-only inference works fine for smaller models — it's slower but functional for experimentation.
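A quick way to sanity-check the table: 4-bit quantized weights (the default for most models Ollama ships) take roughly half a byte per parameter, plus a couple of GB for the runtime and context cache. The constants below are rough assumptions for a back-of-the-envelope estimate, not Ollama internals:

```shell
# Very rough lower bound on memory for a 4-bit quantized model:
# ~0.5 bytes per parameter for weights, plus ~2 GB of overhead.
estimate_gb() {
  local params_b=$1             # parameter count in billions
  echo $(( params_b / 2 + 2 ))  # integer GB
}
estimate_gb 7   # prints 5: a 7B model wants at least ~5 GB
```

The table's minimums are higher because you also want headroom for the OS and a usable context window.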
Step 1: Install Ollama
The official one-liner installer works on most Linux distros:
```shell
curl -fsSL https://ollama.com/install.sh | sh
```
This installs the Ollama binary and registers it as a systemd service. Verify the installation:
```shell
ollama --version
```
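Since the installer registers a systemd service, it is worth confirming the server actually came up. A small check like the following works on systemd-based distros (the fallback message is just a sketch):

```shell
# Confirm the systemd service the installer registered is running;
# start it manually if it is not:
if systemctl is-active --quiet ollama 2>/dev/null; then
  echo "ollama service is running"
else
  echo "ollama service is not active (try: sudo systemctl start ollama)"
fi
```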
Step 2: Pull a Model
Let's start with Llama 3.2 (3B — fast and capable on modest hardware):
```shell
ollama pull llama3.2
```
Or try Mistral 7B for a great balance of quality and speed:
```shell
ollama pull mistral
```
Models are stored under `~/.ollama/models` when you run Ollama as your own user (the systemd service keeps its own store) and are reused across sessions.
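Model files are large, so it can be useful to check how much disk they occupy. A small sketch (the `OLLAMA_MODELS` override and fallback message are assumptions for illustration):

```shell
# Report disk usage of the local model store; falls back to a message
# if nothing has been downloaded to that path yet:
du -sh "${OLLAMA_MODELS:-$HOME/.ollama/models}" 2>/dev/null \
  || echo "no models found at that path"
```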
Step 3: Start Chatting in the Terminal
```shell
ollama run mistral
```
You'll get an interactive chat prompt. Type your message and hit Enter. Type /bye to exit.
```
>>> Explain the difference between a process and a thread in simple terms.
A process is an independent program running in its own memory space...
```
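`ollama run` also accepts the prompt as an argument, which skips the interactive prompt entirely — handy for scripting. A guarded sketch (the guard is only there so the snippet degrades gracefully on machines without Ollama):

```shell
# One-shot, non-interactive use: pass the prompt as an argument
# instead of opening the chat REPL.
if command -v ollama >/dev/null 2>&1; then
  ollama run mistral "Summarize what a symbolic link is in one sentence."
else
  echo "ollama not installed yet"
fi
```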
Step 4: Use the REST API
Ollama runs a local HTTP server on http://localhost:11434. You can interact with it via curl:
```shell
curl http://localhost:11434/api/generate \
  -d '{
    "model": "mistral",
    "prompt": "Write a bash script that monitors disk usage and sends an alert when it exceeds 80%",
    "stream": false
  }'
```
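The reply is a JSON object whose generated text sits in the `response` field, alongside timing and token-count metadata. A canned sample shows the shape — parsed here with stock `python3` to avoid a `jq` dependency (the sample text is made up; real replies also carry fields like `total_duration` and `eval_count`):

```shell
# Extract the generated text from a non-streaming /api/generate reply:
sample='{"model":"mistral","response":"Hello!","done":true}'
echo "$sample" \
  | python3 -c 'import sys, json; print(json.load(sys.stdin)["response"])'
# prints: Hello!
```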
Ollama also exposes OpenAI-compatible endpoints under `/v1`, so you can use it with tools like Aider, Continue.dev, or LangChain by simply pointing them at `http://localhost:11434/v1`.
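You can exercise the OpenAI-style surface directly with curl. A sketch (the timeout and fallback message are assumptions so the snippet fails politely when no server is listening):

```shell
# The chat completions route accepts the familiar OpenAI messages array:
curl -s --max-time 5 http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "mistral",
        "messages": [{"role": "user", "content": "Hello!"}]
      }' || echo "Ollama server not reachable on localhost:11434"
```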
Step 5: Add a Web UI (Optional)
For a more polished chat interface, Open WebUI is the most popular option and runs in Docker:
```shell
docker run -d \
  -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui \
  ghcr.io/open-webui/open-webui:main
```
Visit http://localhost:3000 in your browser for a full ChatGPT-style interface connected to your local Ollama instance.
Useful Ollama Commands
- `ollama list` — see all downloaded models
- `ollama rm modelname` — delete a model
- `ollama show mistral` — display model info and parameters
- `ollama ps` — see currently loaded models
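These commands compose well in scripts. For example, `ollama list` prints a table with a header row, so skipping the first line yields clean model names for bulk operations — a sketch, again guarded for machines without Ollama:

```shell
# Print just the model names from ollama list (first column, header skipped),
# e.g. as input for bulk updates or deletion:
if command -v ollama >/dev/null 2>&1; then
  ollama list | tail -n +2 | awk '{print $1}'
else
  echo "ollama not installed yet"
fi
```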
Lab Takeaways
Running LLMs locally with Ollama is genuinely practical for developers. The use cases are compelling:
- 🔒 Privacy-sensitive queries — your prompts never leave your machine
- 💻 Code generation and review — use CodeLlama for offline coding assistance
- ⚙️ Automation — integrate local LLM calls into scripts and pipelines
- 🧪 Model experimentation — easily swap and compare models
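As a sketch of the automation case: building the request payload with `python3` sidesteps shell-quoting bugs when the prompt contains arbitrary text. The model name and prompt here are placeholders, and the final curl is left commented out because it assumes a running server:

```shell
# Wrap an arbitrary string into a /api/generate payload safely:
prompt="Classify this log line: ERROR disk full"
payload=$(python3 - "$prompt" <<'EOF'
import json, sys
print(json.dumps({"model": "mistral", "prompt": sys.argv[1], "stream": False}))
EOF
)
echo "$payload"
# Then pipe it to a running server:
#   curl -s http://localhost:11434/api/generate -d "$payload"
```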
This experiment proved that you don't need a cloud subscription to access powerful AI capabilities. Your local machine — even a mid-range one — is more capable than you might think.