Deploying OpenClaw with a Local LLM on Leafcloud: A Sovereign AI Agent Stack
A step-by-step guide to self-hosting OpenClaw on a Leafcloud A100 GPU instance with an open-weight Qwen3 model. Sovereign data, sovereign inference, and waste heat that warms Dutch homes instead of the sky over Virginia.
By
Published on
Self-hosted AI agents are having a moment, and OpenClaw is one of the cleaner takes on the genre. It’s a local-first agent gateway you actually own — your auth profiles, your skills, your model choices, your data. No agent platform sitting between you and your inbox, quietly logging everything for “quality and training purposes.”
The trade-off is that you have to host it somewhere. And there’s a second-order question that often gets skipped: even if OpenClaw itself runs in Europe on infrastructure you trust, every prompt it sends to OpenAI or Anthropic still leaves the continent. The sovereignty story is only half true if the model lives elsewhere. The other half lives on the GPU.
This guide puts the whole stack on a Leafcloud A100: OpenClaw as the agent gateway, vLLM serving an open-weight Qwen3 model on the same machine, and the messaging gateway connecting your clients to your own private LLM. About forty minutes, end to end. The heat coming off the GPU goes toward warming apartments somewhere in the Netherlands rather than being blown into the sky above a Virginia datacenter. No prompts leave the building.
View GPU pricing and options →
Why run the full AI agent stack on Leafcloud
Sovereignty, properly. Agents read your email, browse your files, and hold your API keys. Running OpenClaw on European infrastructure under GDPR keeps the gateway data in the same jurisdiction as the rest of your operations. Running the model locally too means the conversations themselves never leave that jurisdiction either. No transatlantic data transfer agreements, no Schrems III roulette, no hoping OpenAI’s logs are clean.
Cost predictability. A reserved A100 instance costs the same on Tuesday as it does on Sunday at 3am. Token-based pricing on hosted LLM APIs does not — and agents are token-hungry, especially with persistent context and tool-heavy workflows.
Heat reuse, not greenwashing. This is the Leafcloud-specific bit. Our GPU instances live in Leafsites placed inside real buildings — apartment blocks, offices, swimming pools — where the waste heat from your workload directly displaces gas heating. An A100 under load draws around 400W. Multiply that by the rest of the node and you’re heating somewhere on the order of 1 kW worth of showers per GPU. It is, somehow, the most defensible thing you can do with an LLM.
A real alternative to Big Tech hyperscalers. No CLOUD Act exposure, no vendor lock-in, no proprietary services creeping into your architecture. Open standards, OpenStack underneath, and Kubernetes when you outgrow a single VM.
What you’ll end up with
A single Ubuntu 22.04 VM with one A100 attached, running:
- The NVIDIA driver and Container Toolkit so Docker can see the GPU
- vLLM in Docker, serving an open-weight Qwen3 model on
localhost:8000 - The OpenClaw Gateway in Docker, listening on port
18789 - Persistent state on a separate Leafcloud block volume so rebuilds don’t eat your auth profiles or force you to re-download model weights
- A reasonable security posture — no random WebSockets exposed to the internet
You’ll reach the gateway from your laptop over an SSH tunnel and pair OpenClaw clients with it from there. The model only ever listens on localhost. Optional TLS reverse proxy at the end if you want to share the gateway across a team.
1. Provision the GPU VM
In the Leafcloud dashboard, launch a new instance:
| Setting | Value |
|---|---|
| Image | Ubuntu 22.04 LTS |
| Flavor | A100 GPU flavor (1× A100, 80 GB VRAM) |
| Root volume | 100 GB minimum — Node, Docker layers, and the OS itself take up real space |
| Key pair | Your SSH key |
| Network | Default tenant network |
Create a security group with two rules:
22/tcp— SSH, ideally restricted to a known IP range18789/tcp— OpenClaw Gateway. Don’t open this to0.0.0.0/0. Either keep it closed and tunnel in, or front it with TLS and auth (covered at the end).
Allocate a floating IP and SSH in.
Note: if
pnpm install --frozen-lockfilelater fails withKilledor exit code 137 during the OpenClaw build, the VM is out of memory. Move up a flavor and retry. GPU flavors come with generous RAM, so this is usually only a problem on smaller test machines.
2. Install the NVIDIA driver
sudo apt update && sudo apt upgrade -y
sudo apt install -y build-essential
sudo apt install -y nvidia-driver-550 nvidia-utils-550
sudo reboot
Reconnect and confirm the GPU is visible:
nvidia-smi
You should see the A100 listed with driver version and CUDA runtime. If you don’t, check lsmod | grep nvidia and reboot again.
3. Install Docker and the NVIDIA Container Toolkit
curl -fsSL https://get.docker.com | sudo sh
sudo usermod -aG docker $USER
newgrp docker
Then the Container Toolkit, which is what lets containers see the GPU:
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey \
| sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list \
| sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' \
| sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update
sudo apt install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
Smoke test:
docker run --rm --gpus all nvidia/cuda:12.4.0-base-ubuntu22.04 nvidia-smi
If you see the A100 from inside the container, the plumbing works.
4. Attach a persistent block volume
Two things need to live somewhere persistent: OpenClaw’s data directory (/home/node/.openclaw/) and the model weights you’re about to download. Both go on a block volume so they survive container, image, and VM rebuilds.
Attach a 200 GB block volume in the Leafcloud dashboard. It’ll appear as /dev/vdb. Then on the VM:
sudo mkfs.ext4 /dev/vdb
sudo mkdir -p /var/openclaw-data
echo '/dev/vdb /var/openclaw-data ext4 defaults 0 2' | sudo tee -a /etc/fstab
sudo mount -a
sudo mkdir -p /var/openclaw-data/gateway /var/openclaw-data/models
sudo chown -R 1000:1000 /var/openclaw-data/gateway
The 1000:1000 ownership matches the node user inside the OpenClaw container. Models are cached under /var/openclaw-data/models, separate from the gateway state.
For longer-term snapshots — the kind of thing you’d want before a major version bump — Leafcloud’s S3-compatible object storage is the right target. Schedule a restic job against /var/openclaw-data/gateway (skip the models directory; it’s reproducible from HuggingFace) on whatever cadence makes sense.
5. Run vLLM with an open-weight Qwen3 model
The current consensus default for open-weight tool-calling agents is the Qwen3 family — Apache 2.0 licensed, strong function calling, well supported in vLLM. The sensible defaults:
Qwen/Qwen3-8Bto get started — fast, fits with massive headroom, perfect for iteration.Qwen/Qwen3-32Bif you want significantly higher quality. Fits in FP16 on a single 80 GB A100 with KV cache headroom.
Start with the 8B. Swapping models later is one container restart.
Launch vLLM in Docker:
docker run -d --restart unless-stopped \
--name vllm \
--gpus all \
--ipc=host \
-p 127.0.0.1:8000:8000 \
-v /var/openclaw-data/models:/root/.cache/huggingface \
vllm/vllm-openai:latest \
--model Qwen/Qwen3-8B \
--served-model-name qwen3 \
--enable-auto-tool-choice \
--tool-call-parser qwen3_coder \
--max-model-len 32768
A few things in that command worth flagging:
-p 127.0.0.1:8000:8000binds vLLM to localhost only. The model is never reachable from outside the VM.--ipc=hostis required for vLLM’s shared-memory transport.--enable-auto-tool-choice --tool-call-parser qwen3_coderis what makes Qwen3’s function calling work cleanly through OpenClaw’s tool interface.- The HuggingFace cache lives on the persistent volume — model weights survive container rebuilds, no re-downloading.
First boot downloads the weights (~16 GB for the 8B); tail the logs:
docker logs -f vllm
You’re waiting for Uvicorn running on http://0.0.0.0:8000 (it’s still bound to localhost via the port mapping, despite what the log says).
Verify:
curl http://localhost:8000/v1/models
You should see qwen3 in the response.
6. Install OpenClaw
Clone the repository:
git clone https://github.com/openclaw/openclaw.git
cd openclaw
Follow OpenClaw’s Docker VM runtime guide to build the image. The important rule from their docs, worth repeating because it’s the kind of thing that bites you a week later: all external binaries used by skills must be baked into the image at build time, never installed inside a running container. Anything apt install’d into a live container is gone the next time you rebuild.
Their example Dockerfile shows the pattern with gog, goplaces, and wacli. Follow the same template for any other CLIs your skills depend on.
In your docker-compose.yml, point the gateway’s host volume mount at /var/openclaw-data/gateway. Add the OpenAI-compatible endpoint and model name to the gateway’s environment so it talks to your local vLLM:
environment:
OPENAI_BASE_URL: http://host.docker.internal:8000/v1
OPENAI_API_KEY: not-used-but-required
OPENAI_MODEL: qwen3
extra_hosts:
- "host.docker.internal:host-gateway"
(Exact env var names vary by OpenClaw version — check openclaw.json or the official docs for the current spelling.)
Then:
docker compose build
docker compose up -d openclaw-gateway
Confirm the gateway is alive:
docker compose logs -f openclaw-gateway
Expected:
[gateway] listening on ws://0.0.0.0:18789
And that the binaries you baked in resolve:
docker compose exec openclaw-gateway which gog goplaces wacli
Each should return /usr/local/bin/<binary>.
7. Connect to the OpenClaw gateway
For a single-user setup — you, on a laptop, talking to your own gateway — leave port 18789 closed at the security group and reach the gateway over SSH:
ssh -L 18789:localhost:18789 ubuntu@<your-floating-ip>
Point your OpenClaw client at ws://localhost:18789 and you’re connected, with all traffic riding inside SSH and all inference happening on your A100.
For shared or production use, put a Caddy or Nginx reverse proxy in front with TLS (Let’s Encrypt handles certificates automatically) and at minimum basic auth. If you need HA, Leafcloud’s managed Octavia load balancers can terminate TLS and front multiple gateway instances on a private network.
8. Keeping the stack up to date
# OpenClaw
cd ~/openclaw
git pull
docker compose build
docker compose up -d
# vLLM
docker pull vllm/vllm-openai:latest
docker stop vllm && docker rm vllm
# Re-run the vLLM launch command from step 5
The persistent volume means OpenClaw’s auth profiles, agents, configs, and downloaded model weights all survive every rebuild.
To swap to the 32B model, stop the vLLM container and relaunch with --model Qwen/Qwen3-32B. OpenClaw doesn’t need to know — --served-model-name qwen3 stays the same.
If you add a new OpenClaw skill that depends on a new binary, update the Dockerfile, rebuild, restart. Don’t be tempted to apt install inside the running container — it’ll work for a week and then disappear during the next rebuild.
What you’ve actually deployed
A sovereign, GDPR-aligned AI agent gateway and the model it talks to, both running on a single 100% renewable-powered A100, with the waste heat going to displace gas heating in a real building somewhere in the Netherlands. Your auth profiles live on a persistent volume, your prompts never leave the VM, your secrets stay in Europe, and your monthly bill doesn’t fluctuate based on how aggressive your agent has been with its tool calls this week.
If you outgrow a single A100 — heavier throughput, larger models, parallel agents pushing concurrent inference — the next step is either a multi-GPU flavor with a larger Qwen3 variant, or Managed Kubernetes with vLLM behind a load balancer. The same OpenClaw deployment, the same qwen3 served model name, keep working.
Ready to run your AI agents on European GPUs?
Spin up an A100 and run through this guide yourself. Sign up at leaf.cloud to get started, or book a call with our team if you want to talk through sizing, model selection, or wiring this up to a managed Kubernetes cluster. We answer.
related