Model downloads fail

You picked a model in the HuggingFace browser, started a download, and it errors out or hangs.

Symptom: "download failed: 401 Unauthorized"

The model is gated: HuggingFace requires you to accept its terms first, and Pluma has to send an auth token with the download request.

Fix:

  1. Sign in at https://huggingface.co/.
  2. Visit the model's page, click Agree and access.
  3. Generate a token at https://huggingface.co/settings/tokens with read scope.
  4. Export the token in your environment before launching Pluma:

    export HF_TOKEN=hf_yourtokenhere
    ./pluma
    

    Or use a ~/.zshenv / ~/.bashrc entry so it persists.

Pluma forwards HF_TOKEN (or HUGGING_FACE_HUB_TOKEN) on every HF download request.
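A quick sanity check that the token will actually reach Pluma — it must be exported, not just assigned, so child processes inherit it (the token value here is a placeholder):

```shell
# Placeholder token -- substitute your real read-scope token.
export HF_TOKEN=hf_yourtokenhere
# A child process (which is how Pluma sees the environment) should inherit it:
sh -c 'if [ -n "$HF_TOKEN" ]; then echo "HF_TOKEN is visible"; else echo "HF_TOKEN is missing"; fi'
# -> HF_TOKEN is visible
```

If this prints "HF_TOKEN is missing", the variable was set without `export` or in a shell config file your launcher doesn't read.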

Symptom: "download failed: exceeds 100 GiB cap"

You hit the max_model_download_bytes cap. The default is 100 GiB; most publicly-served quants are well under that, but the biggest MoE checkpoints exceed it.

Raise it in config.toml:

max_model_download_bytes = 214748364800   # 200 GiB

Or disable the cap entirely:

max_model_download_bytes = -1

Same story for max_model_download_duration_seconds (default 6 hours). Restart Pluma after editing.
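Both caps together, as a sketch of the relevant config.toml fragment — the field names are the ones documented above; the 12-hour duration is just an example value:

```toml
# config.toml -- download caps
max_model_download_bytes = 214748364800        # 200 GiB (default: 100 GiB)
max_model_download_duration_seconds = 43200    # 12 hours (default: 6 hours)
```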

Symptom: download hangs at <100% forever

A few possible causes:

  1. The HF mirror is rate-limiting you. Cancel + retry; HF cycles requests across mirrors.
  2. Your disk filled up. Check with df -h ~/.config/pluma/models/.
  3. Network dropped mid-download. Pluma's downloader doesn't resume on connection drops. Cancel + retry; subsequent attempt starts fresh.
  4. The cap timeout fired silently. If the job is older than max_model_download_duration_seconds, Pluma marks it errored. Bump the cap if you're on a slow line.
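The disk check in cause 2 can be scripted. This sketch assumes the default models path shown above and flags anything under roughly 10 GiB free:

```shell
MODELS_DIR="$HOME/.config/pluma/models"
mkdir -p "$MODELS_DIR"                     # so df can resolve the path even pre-download
# df -P gives stable POSIX output; column 4 is available space in 1 KiB blocks.
avail_kb=$(df -P "$MODELS_DIR" | awk 'NR==2 {print $4}')
if [ "$avail_kb" -lt 10485760 ]; then      # 10485760 KiB = 10 GiB
  echo "low disk space: ${avail_kb} KiB available"
else
  echo "disk space OK: ${avail_kb} KiB available"
fi
```

Note that a big quant can need 2x its size transiently if your filesystem also holds the partial download and a temp copy, so leave headroom.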

Symptom: "no MLX files in this repo" / "no GGUF files in this repo"

The format toggle (GGUF / MLX) limits results to repos that actually ship that format. Some popular repos ship only one — e.g. mlx-community/* is MLX-only; bartowski/* is GGUF-only.

Flip the toggle in the model browser to switch format. If you really want a GGUF of a model that only ships as MLX (or vice versa), look for community re-quants on HF.

Symptom: model downloaded but Pluma "can't find" it

Pluma drops files into <datadir>/models/<repo>/<file>. Your LLM runtime needs to be pointed at that path:

  Runtime                    Command
  llama-server (llama.cpp)   llama-server -m /path/to/<datadir>/models/<repo>/<file>.gguf
  mlx_lm.server              mlx_lm.server --model /path/to/<datadir>/models/<repo>

mlx_lm wants the repo DIRECTORY (it loads multiple files); llama-server wants a specific .gguf file. After starting the runtime, hit Auto-detect in Pluma's connection settings to pick up the new model.
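To get the exact path to paste into the runtime command, you can search the models directory for weight files. This demo builds a throwaway directory standing in for <datadir>/models — the repo and file names are made up:

```shell
# Throwaway stand-in for <datadir>/models; use your real datadir in practice.
DATADIR=$(mktemp -d)
mkdir -p "$DATADIR/models/bartowski/demo"
touch "$DATADIR/models/bartowski/demo/model-Q4_K_M.gguf"
# Find every GGUF under the models tree (add -name '*.safetensors' for MLX repos):
found=$(find "$DATADIR/models" -type f -name '*.gguf')
echo "$found"
rm -rf "$DATADIR"
```

For llama-server, pass the printed file path to -m; for mlx_lm.server, pass the containing repo directory to --model.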

Symptom: download starts then immediately errors

Often DNS / TLS / proxy issues. Quick checks:

curl -sI https://huggingface.co/

Should return 200 OK. If not, your environment can't reach HF — VPN, captive portal, corporate proxy. Pluma uses the system HTTP client, so any system-level HTTPS_PROXY env applies.
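To see which proxy settings Pluma will inherit, scan the environment — any of these variables, if set on your system, applies to the system HTTP client:

```shell
# List proxy-related environment variables, or say so if there are none.
proxies=$(printenv | grep -i '_proxy' || echo "no proxy variables set")
echo "$proxies"
```

If a corporate proxy shows up here but curl still fails, the proxy itself is likely blocking or intercepting TLS to huggingface.co.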

When all else fails

Manual download:

  1. Go to the model's HF page in a browser.
  2. Click Files and versions.
  3. Download the file(s) you want.
  4. Drop into <datadir>/models/<repo>/.
  5. Point your runtime at it.
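The steps above can also be done from the terminal. This dry-run sketch only prints the commands it would run rather than fetching anything; the repo and file names are placeholders, and `resolve/main` is the standard huggingface.co URL pattern for downloading a file from a repo's main branch:

```shell
REPO="bartowski/SomeModel-GGUF"     # placeholder repo -- substitute yours
FILE="SomeModel-Q4_K_M.gguf"        # placeholder file -- substitute yours
DEST="$HOME/.config/pluma/models/$REPO"
URL="https://huggingface.co/$REPO/resolve/main/$FILE"
# Print instead of run, so nothing downloads by accident:
echo "mkdir -p $DEST"
echo "curl -L -o $DEST/$FILE $URL"
# For gated repos, add: -H "Authorization: Bearer \$HF_TOKEN"
```

Run the printed commands once the paths look right, then point your runtime at the result as in the table above.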

Pluma's HF browser is a convenience; you're never locked into it.