Deploying OpenClaw with a Local LLM on Leafcloud: A Sovereign AI Agent Stack

Self-hosted AI agents are having a moment, and OpenClaw is one of the cleaner takes on the genre. It’s a local-first agent gateway you actually own — your auth profiles, your skills, your model choices, your data. No agent platform sitting between you and your inbox, quietly logging everything for “quality and training purposes.”

The trade-off is that you have to host it somewhere. And there’s a second-order question that often gets skipped: even if OpenClaw itself runs in Europe on infrastructure you trust, every prompt it sends to OpenAI or Anthropic still leaves the continent. The sovereignty story is only half true if the model lives elsewhere. The other half lives on the GPU.

This guide puts the whole stack on a Leafcloud RTX 6000 Blackwell: OpenClaw as the agent gateway, vLLM serving an open-weight Qwen3 model on the same machine, and the messaging gateway connecting your clients to your own private LLM. About forty minutes, end to end. The heat coming off the GPU goes toward warming apartments somewhere in the Netherlands rather than being blown into the sky above a Virginia datacenter. No prompts leave the building.

Why run the full AI agent stack on Leafcloud

Sovereignty, properly. Agents read your email, browse your files, and hold your API keys. Running OpenClaw on European infrastructure under GDPR keeps the gateway data in the same jurisdiction as the rest of your operations. Running the model locally too means the conversations themselves never leave that jurisdiction either. No transatlantic data transfer agreements, no Schrems III roulette, no hoping OpenAI’s logs are clean.

Cost predictability. A reserved RTX 6000 Blackwell instance costs the same on Tuesday as it does on Sunday at 3am. Token-based pricing on hosted LLM APIs does not — and agents are token-hungry, especially with persistent context and tool-heavy workflows. Our RTX 6000 Blackwell pricing starts at €2.35/hour committed.

View GPU pricing and options →

Heat reuse, not greenwashing. This is the Leafcloud-specific bit. Our GPU instances live in Leafsites placed inside real buildings — apartment blocks, offices, swimming pools — where the waste heat from your workload directly displaces gas heating. An RTX 6000 Blackwell under load draws around 300W TDP, and the rest of the node brings the total to roughly 1 kW of useful heat per GPU — somebody’s hot showers, in other words. It is, somehow, the most defensible thing you can do with an LLM.

A real alternative to Big Tech hyperscalers. No CLOUD Act exposure, no vendor lock-in, no proprietary services creeping into your architecture. Open standards, OpenStack underneath, Kubernetes when you outgrow a single VM — and the newest Blackwell silicon available in Europe without a US parent company in the chain of custody.

What you’ll end up with

A single Ubuntu 22.04 VM with one RTX 6000 Blackwell attached (96 GB GDDR7 ECC, 5th-gen Tensor Cores), running:

The NVIDIA driver and Container Toolkit so Docker can see the GPU
vLLM in Docker, serving an open-weight Qwen3 model on localhost:8000
The OpenClaw Gateway in Docker, listening on port 18789
Persistent state on a separate Leafcloud block volume so rebuilds don’t eat your auth profiles or force you to re-download model weights
A reasonable security posture — no random WebSockets exposed to the internet

You’ll reach the gateway from your laptop over an SSH tunnel and pair OpenClaw clients with it from there. The model only ever listens on localhost. Optional TLS reverse proxy at the end if you want to share the gateway across a team.

1. Provision the GPU VM

In the Leafcloud dashboard, launch a new instance:

Setting	Value
Image	`Ubuntu 22.04 LTS`
Flavor	Blackwell Pro (1× RTX 6000 Blackwell, 96 GB VRAM, 32 vCPU, 256 GB RAM, 2 TB NVMe)
Root volume	100 GB minimum — Node, Docker layers, and the OS itself take up real space
Key pair	Your SSH key
Network	Default tenant network

Create a security group with two rules:

22/tcp — SSH, ideally restricted to a known IP range
18789/tcp — OpenClaw Gateway. Don’t open this to 0.0.0.0/0. Either keep it closed and tunnel in, or front it with TLS and auth (covered at the end).

Allocate a floating IP and SSH in.

Note: the Blackwell Pro flavor ships with 256 GB of system RAM, so the classic pnpm install --frozen-lockfile OOM-during-build problem essentially doesn’t happen here. If you do see exit code 137 on a smaller test machine, that’s the diagnosis.

2. Install the NVIDIA driver

sudo apt update && sudo apt upgrade -y
sudo apt install -y build-essential
sudo apt install -y nvidia-driver-550 nvidia-utils-550
sudo reboot

Reconnect and confirm the GPU is visible:

nvidia-smi

You should see the RTX 6000 Blackwell listed with driver version and CUDA runtime. If you don’t, check lsmod | grep nvidia and reboot again.

3. Install Docker and the NVIDIA Container Toolkit

curl -fsSL https://get.docker.com | sudo sh
sudo usermod -aG docker $USER
newgrp docker

Then the Container Toolkit, which is what lets containers see the GPU:

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey \
  | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list \
  | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' \
  | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

sudo apt update
sudo apt install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

Smoke test:

docker run --rm --gpus all nvidia/cuda:12.4.0-base-ubuntu22.04 nvidia-smi

If you see the RTX 6000 Blackwell from inside the container, the plumbing works.

4. Attach a persistent block volume

Two things need to live somewhere persistent: OpenClaw’s data directory (/home/node/.openclaw/) and the model weights you’re about to download. Both go on a block volume so they survive container, image, and VM rebuilds.

Attach a 200 GB block volume in the Leafcloud dashboard. It’ll appear as /dev/vdb. Then on the VM:

sudo mkfs.ext4 /dev/vdb
sudo mkdir -p /var/openclaw-data
echo '/dev/vdb /var/openclaw-data ext4 defaults 0 2' | sudo tee -a /etc/fstab
sudo mount -a

sudo mkdir -p /var/openclaw-data/gateway /var/openclaw-data/models
sudo chown -R 1000:1000 /var/openclaw-data/gateway

The 1000:1000 ownership matches the node user inside the OpenClaw container. Models are cached under /var/openclaw-data/models, separate from the gateway state.

For longer-term snapshots — the kind of thing you’d want before a major version bump — Leafcloud’s S3-compatible object storage is the right target. Schedule a restic job against /var/openclaw-data/gateway (skip the models directory; it’s reproducible from HuggingFace) on whatever cadence makes sense.

5. Run vLLM with an open-weight Qwen3 model

The current consensus default for open-weight tool-calling agents is the Qwen3 family — Apache 2.0 licensed, strong function calling, well supported in vLLM. The sensible defaults:

Qwen/Qwen3-8B to get started — fast, fits with massive headroom, perfect for iteration.
Qwen/Qwen3-32B if you want significantly higher quality. The 96 GB GDDR7 on the RTX 6000 Blackwell leaves plenty of room for FP16 weights plus a generous KV cache, even at long context lengths.

Start with the 8B. Swapping models later is one container restart.

Launch vLLM in Docker:

docker run -d --restart unless-stopped \
  --name vllm \
  --gpus all \
  --ipc=host \
  -p 127.0.0.1:8000:8000 \
  -v /var/openclaw-data/models:/root/.cache/huggingface \
  vllm/vllm-openai:latest \
  --model Qwen/Qwen3-8B \
  --served-model-name qwen3 \
  --enable-auto-tool-choice \
  --tool-call-parser qwen3_coder \
  --max-model-len 32768

A few things in that command worth flagging:

-p 127.0.0.1:8000:8000 binds vLLM to localhost only. The model is never reachable from outside the VM.
--ipc=host is required for vLLM’s shared-memory transport.
--enable-auto-tool-choice --tool-call-parser qwen3_coder is what makes Qwen3’s function calling work cleanly through OpenClaw’s tool interface.
The HuggingFace cache lives on the persistent volume — model weights survive container rebuilds, no re-downloading.

First boot downloads the weights (~16 GB for the 8B); tail the logs:

docker logs -f vllm

You’re waiting for Uvicorn running on http://0.0.0.0:8000 (it’s still bound to localhost via the port mapping, despite what the log says).

Verify:

curl http://localhost:8000/v1/models

You should see qwen3 in the response.

6. Install OpenClaw

Clone the repository:

git clone https://github.com/openclaw/openclaw.git
cd openclaw

Follow OpenClaw’s Docker VM runtime guide to build the image. The important rule from their docs, worth repeating because it’s the kind of thing that bites you a week later: all external binaries used by skills must be baked into the image at build time, never installed inside a running container. Anything apt install’d into a live container is gone the next time you rebuild.

Their example Dockerfile shows the pattern with gog, goplaces, and wacli. Follow the same template for any other CLIs your skills depend on.

In your docker-compose.yml, point the gateway’s host volume mount at /var/openclaw-data/gateway. Add the OpenAI-compatible endpoint and model name to the gateway’s environment so it talks to your local vLLM:

environment:
  OPENAI_BASE_URL: http://host.docker.internal:8000/v1
  OPENAI_API_KEY: not-used-but-required
  OPENAI_MODEL: qwen3
extra_hosts:
  - "host.docker.internal:host-gateway"

(Exact env var names vary by OpenClaw version — check openclaw.json or the official docs for the current spelling.)

Then:

docker compose build
docker compose up -d openclaw-gateway

Confirm the gateway is alive:

docker compose logs -f openclaw-gateway

Expected:

[gateway] listening on ws://0.0.0.0:18789

And that the binaries you baked in resolve:

docker compose exec openclaw-gateway which gog goplaces wacli

Each should return /usr/local/bin/<binary>.

7. Connect to the OpenClaw gateway

For a single-user setup — you, on a laptop, talking to your own gateway — leave port 18789 closed at the security group and reach the gateway over SSH:

ssh -L 18789:localhost:18789 ubuntu@<your-floating-ip>

Point your OpenClaw client at ws://localhost:18789 and you’re connected, with all traffic riding inside SSH and all inference happening on your RTX 6000 Blackwell.

For shared or production use, put a Caddy or Nginx reverse proxy in front with TLS (Let’s Encrypt handles certificates automatically) and at minimum basic auth. If you need HA, Leafcloud’s managed Octavia load balancers can terminate TLS and front multiple gateway instances on a private network.

8. Keeping the stack up to date

# OpenClaw
cd ~/openclaw
git pull
docker compose build
docker compose up -d

# vLLM
docker pull vllm/vllm-openai:latest
docker stop vllm && docker rm vllm
# Re-run the vLLM launch command from step 5

The persistent volume means OpenClaw’s auth profiles, agents, configs, and downloaded model weights all survive every rebuild.

To swap to the 32B model, stop the vLLM container and relaunch with --model Qwen/Qwen3-32B. OpenClaw doesn’t need to know — --served-model-name qwen3 stays the same.

If you add a new OpenClaw skill that depends on a new binary, update the Dockerfile, rebuild, restart. Don’t be tempted to apt install inside the running container — it’ll work for a week and then disappear during the next rebuild.

What you’ve actually deployed

A sovereign, GDPR-aligned AI agent gateway and the model it talks to, both running on a single 100% renewable-powered RTX 6000 Blackwell, with the waste heat going to displace gas heating in a real building somewhere in the Netherlands. Your auth profiles live on a persistent volume, your prompts never leave the VM, your secrets stay in Europe, and your monthly bill doesn’t fluctuate based on how aggressive your agent has been with its tool calls this week.

If you outgrow a single GPU — heavier throughput, larger models, parallel agents pushing concurrent inference — the next step is either the Blackwell Duo Pro (2× RTX 6000) or Blackwell Quad Pro (4× RTX 6000) for up to 384 GB of total VRAM, or Managed Kubernetes with vLLM behind a load balancer. The same OpenClaw deployment, the same qwen3 served model name, keep working.

Ready to run your AI agents on European GPUs?

Spin up an RTX 6000 Blackwell and run through this guide yourself. Sign up at leaf.cloud to get started, or book a call with our team if you want to talk through sizing, model selection, or wiring this up to a managed Kubernetes cluster. We answer.