Configuration recipes

Copy-paste config.toml snippets for common setups. Every key is documented in Reference → config.toml.

Default localhost

What a fresh install ships with. Nothing is exposed beyond the loopback browser session; passkey auth is on but bypassed for loopback.

listen = ""                          # → :8787 (all interfaces, but allowlist below stays empty so LAN can't reach)
allowed_hosts = []
require_auth = true
loopback_auth_bypass = true
encrypt_at_rest = true
open_browser = true

If you want loopback-only (no LAN exposure at all):

listen = "127.0.0.1:8787"

Phone on the same Wi-Fi

Same network, no tailnet. Allow your phone's subnet and keep auth on so a random device on the LAN can't get in.

listen = ":8787"
allowed_hosts = [
  "192.168.1.0/24",                  # your LAN; replace with the right CIDR
  "fe80::/10",                       # link-local IPv6 (phones often hit you over this)
]
require_auth = true
loopback_auth_bypass = true

Find the host's IP with ipconfig getifaddr en0 (macOS) or ip addr (Linux). From the phone, hit http://<host-ip>:8787 — the Pair prompt will show; enrol a passkey, then bookmark the URL.

Tailscale

The smoothest cross-device path: HTTPS with automatic certs, no port forwarding.

listen = ":8787"                     # still serves on LAN/loopback for the host
tsnet_enabled = true
tsnet_hostname = "pluma"             # → https://pluma.<tailnet>.ts.net/
require_auth = true
loopback_auth_bypass = true

On first boot, Pluma pops a Tailscale sign-in URL. See Multi-device access (Tailscale). The tailnet listener bypasses allowed_hosts by design — tailnet membership is itself the gate.

Behind a reverse proxy (Caddy / nginx / Cloudflared)

Tell Pluma which CIDR the proxy lives in so the host allowlist sees the real client IP from X-Forwarded-For, not the proxy's loopback.

listen = "127.0.0.1:8787"            # only the proxy can reach Pluma directly
trusted_proxies = ["127.0.0.1"]      # Caddy/nginx on the same box
allowed_hosts = [
  "10.0.0.0/24",                     # tailnet or your VPN subnet
  "*.example.com",                   # if Caddy fronts a public hostname
]
require_auth = true

Pluma walks XFF right-to-left; the first untrusted hop is the real client. Setting trusted_proxies is the only way XFF gets honoured — without it Pluma uses RemoteAddr (the proxy itself), and allowed_hosts would have to include the proxy CIDR for anyone to reach the API.
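The walk can be sketched in a few lines of Python (a hypothetical helper, not Pluma's actual code; the trusted-proxy list mirrors the snippet above):

```python
import ipaddress

TRUSTED_PROXIES = [ipaddress.ip_network("127.0.0.1/32")]

def trusted(hop: str) -> bool:
    addr = ipaddress.ip_address(hop)
    return any(addr in net for net in TRUSTED_PROXIES)

def client_ip(xff: str, remote_addr: str) -> str:
    # Candidates, left to right: XFF entries, then the peer that connected to us.
    hops = [h.strip() for h in xff.split(",") if h.strip()] + [remote_addr]
    # Walk right to left; the first hop NOT in trusted_proxies is the real client.
    for hop in reversed(hops):
        if not trusted(hop):
            return hop
    return remote_addr  # every hop trusted: fall back to the direct peer

print(client_ip("203.0.113.7", "127.0.0.1"))    # proxy on loopback → XFF honoured: 203.0.113.7
print(client_ip("10.9.9.9", "203.0.113.7"))     # direct hit → spoofed XFF ignored: 203.0.113.7
```

Note how a client that connects directly and forges its own X-Forwarded-For never wins: the connecting peer isn't a trusted proxy, so the walk stops there.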

Hardened (kiosk / shared workstation)

Lock the host machine down: every browser session has to auth, including loopback.

listen = "127.0.0.1:8787"
require_auth = true
loopback_auth_bypass = false         # the host user pairs their own browser too
encrypt_at_rest = true
log_requests = false                 # silence the access log
open_browser = false                 # service supervisor opens the page

Pair a passkey from each browser profile that should have access. Revoke from Settings → Privacy → Passkeys if a device leaves.

Custom listen port

listen = "0.0.0.0:9090"              # all interfaces, port 9090

Or via CLI / env without editing config:

./pluma -addr 0.0.0.0:9090
PLUMA_ADDR=:9090 ./pluma

External Tavern card library

You already have a SillyTavern install and want Pluma to see those characters without copying.

card_dirs = [
  "/Users/you/SillyTavern/data/default-user/characters",
  "/srv/shared-characters",          # NAS, read-only mount, whatever
]

External cards show with an ext badge and can't be edited through the UI. Copy one into <datadir>/characters/ to make it editable.

Kokoro-FastAPI TTS

The default; no changes needed if Kokoro-FastAPI is on :8880.

tts_base_url = "http://127.0.0.1:8880/v1"
tts_model = "kokoro"
tts_voice = "af_nicole"              # one of Kokoro's 49 voices; empty = server default

OpenAI hosted TTS

tts_base_url = "https://api.openai.com/v1"
tts_model = "tts-1-hd"
tts_voice = "nova"                   # alloy / echo / fable / onyx / nova / shimmer

The active LLM connection's API key gets reused for TTS when the host matches OpenAI's.

Bigger model-download caps (or no cap)

Default is 100 GiB per file and 6 hours per job. The biggest publicly served quants sit in the mid-50 GiB range.

max_model_download_bytes = 214748364800        # 200 GiB
max_model_download_duration_seconds = 43200    # 12 hours

# Or disable entirely (escape hatch for "I really do want a 400 GiB MoE"):
# max_model_download_bytes = -1
# max_model_download_duration_seconds = -1
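The byte counts are plain GiB multiples of 2³⁰; a tiny helper avoids copy-pasting the wrong magnitude:

```python
def gib(n: int) -> int:
    """GiB → the byte count max_model_download_bytes expects."""
    return n * 1024 ** 3

print(gib(200))   # 214748364800, matching the snippet above
```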

Re-running the first-run wizard

Flip the completion flag back and reload:

setup_completed = false

The Settings UI flips it back to true when the wizard finishes; toggling it manually is the only way to see the wizard again.

Service / launchd / systemd

When Pluma runs as a service, you don't want the browser popping open at startup.

open_browser = false

Or set PLUMA_OPEN=0 in the service environment. Either works; the config file is the persistent setting.
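For systemd, a minimal unit might look like this (a sketch with hypothetical paths and no hardening; adjust ExecStart to wherever the binary actually lives):

```ini
# /etc/systemd/system/pluma.service (hypothetical; adjust paths and user)
[Unit]
Description=Pluma
After=network-online.target
Wants=network-online.target

[Service]
ExecStart=/usr/local/bin/pluma
Environment=PLUMA_OPEN=0
Restart=on-failure

[Install]
WantedBy=default.target
```

Then `systemctl enable --now pluma.service`, and check `journalctl -u pluma` for the startup log.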