The Experiment: A Private AI Chatbot on Your Own Machine

What if you could run a capable AI language model entirely on your own hardware — no internet connection required, no API costs, no data leaving your machine? With Ollama, that's not only possible, it's surprisingly straightforward. This lab experiment walks through installing Ollama on Linux, pulling a model, and interacting with it via the terminal and a local web UI.

What Is Ollama?

Ollama is an open-source tool that makes running large language models (LLMs) locally as simple as running a Docker container. It handles model management and hardware acceleration (CPU and GPU), and exposes a local REST API, including an OpenAI-compatible endpoint, meaning many existing tools work with it out of the box.

Ollama supports a growing library of open-source models including Llama 3, Mistral, Phi-3, Gemma, CodeLlama, and more.

Hardware Requirements

You don't need a top-tier GPU to get started. Here's a rough guide:

Model Size         Min RAM (CPU)    Recommended GPU VRAM
7B parameters      8 GB             6–8 GB
13B parameters     16 GB            10–12 GB
34B+ parameters    32 GB+           24 GB+

CPU-only inference works fine for smaller models — it's slower but functional for experimentation.
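Where do those numbers come from? A model's weight footprint is roughly parameter count times bytes per weight, and the models Ollama serves by default are typically 4-bit quantized. A back-of-the-envelope sketch (the function name is mine, and real usage adds overhead for the KV cache and runtime buffers on top of the weights):

```python
def estimate_weights_gb(params_billion: float, bits_per_weight: int = 4) -> float:
    """Rough weight footprint: parameters x bytes per weight.

    bits_per_weight=4 matches the 4-bit quantized models commonly
    served by default; 16 would correspond to full fp16 weights.
    """
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9  # decimal GB

# A 7B model at 4-bit quantization is about 3.5 GB of weights,
# which is why it fits comfortably in 8 GB of system RAM.
print(f"7B  @ 4-bit: ~{estimate_weights_gb(7):.1f} GB")
print(f"13B @ 4-bit: ~{estimate_weights_gb(13):.1f} GB")
```

This also explains the jump in the table: at 4-bit, a 34B model's weights alone approach 17 GB before any working memory is counted.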

Step 1: Install Ollama

The official one-liner installer works on most Linux distros:

curl -fsSL https://ollama.com/install.sh | sh

This installs the Ollama binary and, on systemd-based distros, registers it as a service so the server starts automatically. Verify the installation:

ollama --version

Step 2: Pull a Model

Let's start with Llama 3.2 (3B — fast and capable on modest hardware):

ollama pull llama3.2

Or try Mistral 7B for a great balance of quality and speed:

ollama pull mistral

Models are stored in ~/.ollama/models and are reused across sessions.

Step 3: Start Chatting in the Terminal

ollama run mistral

You'll get an interactive chat prompt. Type your message and hit Enter. Type /bye to exit.

>>> Explain the difference between a process and a thread in simple terms.

A process is an independent program running in its own memory space...

Step 4: Use the REST API

Ollama runs a local HTTP server on http://localhost:11434. You can interact with it via curl:

curl http://localhost:11434/api/generate \
  -d '{
    "model": "mistral",
    "prompt": "Write a bash script that monitors disk usage and sends an alert when it exceeds 80%",
    "stream": false
  }'

Ollama also serves an OpenAI-compatible API under the /v1 path, so you can use it with tools like Aider, Continue.dev, or LangChain by pointing their base URL at http://localhost:11434/v1.
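That compatibility can be sketched with nothing but the Python standard library. The helper names below are mine; the request and response shapes are the standard OpenAI chat format, which Ollama serves under /v1:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/v1/chat/completions"

def build_chat_request(model: str, user_prompt: str) -> bytes:
    """Encode an OpenAI-style chat completion request body."""
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": user_prompt}],
    }).encode()

def extract_reply(response: dict) -> str:
    """Pull the assistant's text out of an OpenAI-style response."""
    return response["choices"][0]["message"]["content"]

def chat(model: str, prompt: str) -> str:
    """Send one prompt to the local Ollama server, return the reply."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=build_chat_request(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return extract_reply(json.loads(resp.read()))
```

With the Ollama server running, chat("mistral", "Explain a mutex in one sentence.") returns the model's answer as a plain string.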

Step 5: Add a Web UI (Optional)

For a more polished chat interface, Open WebUI is the most popular option and runs in Docker:

docker run -d \
  -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui \
  ghcr.io/open-webui/open-webui:main

Visit http://localhost:3000 in your browser for a full ChatGPT-style interface connected to your local Ollama instance.

Useful Ollama Commands

  • ollama list — see all downloaded models
  • ollama rm modelname — delete a model
  • ollama show mistral — display model info and parameters
  • ollama ps — see currently loaded models

Lab Takeaways

Running LLMs locally with Ollama is genuinely practical for developers. The use cases are compelling:

  • 🔒 Privacy-sensitive queries — your prompts never leave your machine
  • 💻 Code generation and review — use CodeLlama for offline coding assistance
  • ⚙️ Automation — integrate local LLM calls into scripts and pipelines
  • 🧪 Model experimentation — easily swap and compare models
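For the automation case, one lightweight pattern is shelling out to the CLI: ollama run accepts a prompt as a final argument and prints the completion to stdout. A minimal sketch (the function names are mine):

```python
import subprocess

def ollama_command(model: str, prompt: str) -> list[str]:
    """Assemble the non-interactive `ollama run` invocation."""
    return ["ollama", "run", model, prompt]

def ask(model: str, prompt: str) -> str:
    """Run a one-shot prompt against a local model, return its reply."""
    result = subprocess.run(
        ollama_command(model, prompt),
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```

In a pipeline, something like ask("mistral", f"Summarize these errors:\n{log_text}") turns a local model into a scriptable building block, with no API key or network dependency.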

This experiment showed that you don't need a cloud subscription to access powerful AI capabilities. Your local machine — even a mid-range one — is more capable than you might think.