Learn how to host a private, local LLM on your home Linux server with this CPU-optimized guide. Step-by-step instructions for setting up Ollama and Open WebUI for a full AI experience without a dedicated GPU.
Running a local Large Language Model (LLM) ensures your data stays private and removes reliance on subscription-based APIs. If you have a home Linux server with modest hardware and no dedicated GPU, you can still achieve a responsive, professional AI experience. This guide provides the exact steps to deploy a CPU-optimized AI instance accessible to any device on your local network (LAN).
Step 1: Install and Configure the Ollama Backend
Ollama is the engine that handles model execution. It is highly optimized for CPUs with AVX/AVX2 support.
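Before installing, it can be worth checking which AVX instruction sets your CPU actually advertises. The following is a minimal sketch for x86 Linux; the function name `list_avx_flags` and its optional file argument are illustrative, not part of Ollama.

```shell
# Print the AVX-family flags (avx, avx2, avx512f) the CPU reports.
# The optional argument lets you point the function at a saved copy of
# /proc/cpuinfo; by default it reads the live file.
list_avx_flags() {
  awk '/^flags/ {
         for (i = 1; i <= NF; i++)
           if ($i ~ /^avx(2|512f)?$/) print $i
         exit
       }' "${1:-/proc/cpuinfo}" | sort -u
}

list_avx_flags
```

If nothing is printed, the CPU does not report AVX support and inference will likely be much slower.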
1.1 Installation
Run the official install script to get the latest binary and set up the systemd service:
curl -fsSL https://ollama.com/install.sh | sh
1.2 Enable LAN Access (Service Configuration)
By default, Ollama listens only on localhost (127.0.0.1), port 11434. To allow your frontend (and other devices) to communicate with it, you must bind it to 0.0.0.0, which means all network interfaces.
- Open the systemd override file:
sudo systemctl edit ollama.service
- Paste the following lines between the comments:
[Service]
Environment="OLLAMA_HOST=0.0.0.0"
- Save and exit, then reload and restart the service:
sudo systemctl daemon-reload
sudo systemctl restart ollama
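If you would rather script this step than use the interactive editor, note that `systemctl edit` stores the override as a drop-in file, by default /etc/systemd/system/ollama.service.d/override.conf. The sketch below writes the same content; the function name `write_ollama_override` is hypothetical, and the example deliberately targets a scratch directory so it can be previewed without root.

```shell
# Write the same override that the interactive editor would save.
# With no argument the standard systemd drop-in directory is used;
# pass another directory to preview the file without root access.
write_ollama_override() {
  dropin_dir="${1:-/etc/systemd/system/ollama.service.d}"
  mkdir -p "$dropin_dir"
  printf '[Service]\nEnvironment="OLLAMA_HOST=0.0.0.0"\n' > "$dropin_dir/override.conf"
}

# Preview in a scratch directory first.
preview_dir="$(mktemp -d)"
write_ollama_override "$preview_dir"
cat "$preview_dir/override.conf"
```

To apply it for real, call the function with no argument from a root shell, then run the daemon-reload and restart commands as usual.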
1.3 Verification
Confirm the service is active and listening correctly:
systemctl status ollama
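To see which address the API port is actually bound to, you can inspect the listening sockets. This is a small sketch that assumes the `ss` tool from iproute2 is installed; the helper name `listening_on_port` is illustrative.

```shell
# Print the local address(es) listening on a given TCP port,
# reading `ss -tln` style output from standard input.
listening_on_port() {
  awk -v port="$1" '$1 == "LISTEN" && $4 ~ (":" port "$") { print $4 }'
}

# Live check; prints nothing if no socket is listening on 11434.
if command -v ss >/dev/null 2>&1; then
  ss -tln 2>/dev/null | listening_on_port 11434
fi
```

An address of 127.0.0.1:11434 suggests the override has not taken effect yet, while a wildcard address such as 0.0.0.0:11434 or *:11434 indicates the service is reachable on all interfaces.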
Step 2: Deploy the Open WebUI Frontend
Open WebUI provides a ChatGPT-like interface in your browser. Choose the method that best fits your workflow.
Option A: Docker Deployment (Recommended)
This is the fastest method for most users. It automatically handles dependencies and keeps your base system clean.
docker run -d -p 8080:8080 \
--add-host=host.docker.internal:host-gateway \
-v open-webui:/app/backend/data \
--name open-webui \
ghcr.io/open-webui/open-webui:main
Option B: Native Python Installation
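Use this if you prefer to avoid containers. Requires Python 3.11+ and, because a source checkout does not include the compiled web interface, Node.js with npm to build it.
# Clone the repository
git clone https://github.com/open-webui/open-webui.git
cd open-webui
# Build the frontend (without this step the backend serves the API only)
npm install
npm run build
cd backend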
# Create and activate environment
python3 -m venv venv
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt -U
# Set network bindings and start
export HOST=0.0.0.0
export PORT=8080
bash start.sh
Step 3: Configure Firewall and Network
To access the UI from your smartphone or laptop on the same Wi-Fi, you must open the necessary ports.
3.1 UFW Rules
Open the WebUI port (8080) and the Ollama API port (11434):
sudo ufw allow 8080/tcp
sudo ufw allow 11434/tcp
sudo ufw reload
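The rules above accept connections from any source address. If you prefer to admit only your own LAN, UFW also supports source-restricted rules, which you would use instead of the broad ones. The sketch below only prints the commands so you can review them first; the subnet 192.168.1.0/24 is an assumption, so substitute your own, and the function name `lan_only_rules` is hypothetical.

```shell
# Print (not run) UFW rules that admit a single subnet to both ports.
lan_only_rules() {
  subnet="$1"
  for port in 8080 11434; do
    echo "sudo ufw allow from ${subnet} to any port ${port} proto tcp"
  done
}

# Example subnet; replace it with the one your router hands out.
lan_only_rules 192.168.1.0/24
```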
3.2 Find Your Server’s Local IP
You will need this address to connect from other devices on your LAN:
hostname -I | awk '{print $1}'
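As a convenience, the same lookup can be turned into ready-to-use URLs for the two ports opened above. A minimal sketch; `print_lan_urls` is an illustrative helper, and it assumes the first address reported by `hostname -I` is the LAN one, which may not hold on hosts with several interfaces.

```shell
# Print the URLs other devices would use, given the server's LAN IP.
print_lan_urls() {
  ip="$1"
  echo "Open WebUI: http://${ip}:8080"
  echo "Ollama API: http://${ip}:11434"
}

# Take the first address reported by the system.
server_ip="$(hostname -I 2>/dev/null | awk '{print $1}')"
print_lan_urls "$server_ip"
```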
Step 4: Download CPU-Optimized Models
For hardware without a GPU, model size and "quantization" are critical. We recommend starting with 4-bit quantized (GGUF) models so that the model fits in available RAM; a model that spills into swap can make the whole system unresponsive.
1B - 3B Parameter Models (Ultra-Lightweight)
Perfect for older CPUs or low RAM (<4GB). Very fast inference.
ollama pull llama3.2:1b
# OR (phi3.5 has 3.8B parameters, slightly above this range)
ollama pull phi3.5
7B - 8B Parameter Models (Balanced Performance)
Best for logic and complex tasks; requires 8GB-12GB of system RAM for smooth performance.
ollama pull llama3.1:8b
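The RAM guidance above can be turned into a quick check before you pull anything. The sketch below reads total memory from /proc/meminfo and suggests a starting tag; the 8 GB cut-off is a simplification of the ranges given here, and the function name `suggest_model` is hypothetical.

```shell
# Suggest a first model tag for a given amount of RAM (whole GB).
# Assumption: 8 GB or more is enough for the 8B model, anything
# less should start with the 1B model.
suggest_model() {
  gb="${1:-0}"
  if [ "$gb" -ge 8 ]; then
    echo "llama3.1:8b"
  else
    echo "llama3.2:1b"
  fi
}

# MemTotal is reported in kB; convert to GB and round.
ram_gb="$(awk '/^MemTotal:/ { printf "%.0f", $2 / 1048576 }' /proc/meminfo)"
echo "Detected about ${ram_gb} GB RAM; suggested first model: $(suggest_model "$ram_gb")"
```

Treat the result as a starting point only: other services on the server also consume memory.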
Step 5: Final Connectivity Test
- Local Test: Open a browser on the server and go to http://localhost:8080.
- LAN Test: Open a browser on another device on your network and go to http://[YOUR_SERVER_IP]:8080.
- API Check: Verify the backend is visible by running:
curl http://[YOUR_SERVER_IP]:11434
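The root URL only confirms that the server answers (it replies with a short "Ollama is running" message). To also confirm that your downloaded models are visible, you can query Ollama's /api/tags endpoint. The sketch below extracts model names with plain text tools rather than a JSON parser, so treat it as a rough check; `model_names` and the `SERVER_IP` variable are illustrative names.

```shell
# Extract model names from the JSON returned by /api/tags.
model_names() {
  grep -oE '"name": *"[^"]*"' | cut -d'"' -f4
}

# Set SERVER_IP to the address found in step 3.2; defaults to localhost.
SERVER_IP="${SERVER_IP:-localhost}"
curl -fsS --max-time 5 "http://${SERVER_IP}:11434/api/tags" | model_names
```

Each model you pulled in Step 4 should appear on its own line.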
Your local AI is now ready. Open the web interface, create your admin account (the first account registered becomes the administrator), and start chatting with the models you downloaded.