Local CPU split-reasoning chat

Runs the Qwen/Qwen3-0.6B safetensors checkpoint locally on the CPU, with no GGUF conversion and no external inference provider.
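A minimal sketch of this setup with Hugging Face `transformers`, assuming the Qwen3 chat template's `enable_thinking` flag and its `</think>` marker as the split point between the reasoning trace and the final answer; the heavy imports are deferred so only the actual chat call pulls in `torch` and downloads the model:

```python
def split_reasoning(text: str, marker: str = "</think>"):
    """Split generated text into (reasoning, answer) at the marker."""
    head, sep, tail = text.partition(marker)
    if not sep:  # model emitted no reasoning block
        return "", text.strip()
    return head.replace("<think>", "").strip(), tail.strip()

def chat(prompt: str, max_new_tokens: int = 512):
    """Run one CPU chat turn; returns (reasoning, answer). Sketch only."""
    # Deferred imports: keeps this module usable without torch installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B")
    # Default load lands on CPU; float32 keeps it to plain CPU tensors.
    model = AutoModelForCausalLM.from_pretrained(
        "Qwen/Qwen3-0.6B", torch_dtype=torch.float32
    )
    inputs = tok.apply_chat_template(
        [{"role": "user", "content": prompt}],
        add_generation_prompt=True,
        enable_thinking=True,  # ask the template for a reasoning trace
        return_tensors="pt",
    )
    out = model.generate(inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, then split off the reasoning.
    text = tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True)
    return split_reasoning(text)
```

The parameter names and the exact marker handling are assumptions; the point is that the demo separates the model's reasoning from its answer by string-splitting the decoded output rather than relying on a provider-side field.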

The first request downloads the model weights, so expect a slower cold start.

Preset prompt