This runs the Qwen/Qwen3-0.6B safetensors checkpoint locally on CPU, with no GGUF conversion and no external inference provider.
The first request downloads the model weights, so the cold start is slower; subsequent runs load from the local cache.
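A minimal sketch of the setup described above, using Hugging Face `transformers` (an assumption — the note does not name a framework; assumes `transformers` and `torch` are installed, and follows the chat-template pattern from the Qwen model card):

```python
# Local CPU inference for Qwen/Qwen3-0.6B via Hugging Face transformers.
# The safetensors weights are fetched into the local cache on first use,
# which is why the first request is slow.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Qwen/Qwen3-0.6B"


def generate(prompt: str, max_new_tokens: int = 128) -> str:
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # No device_map / accelerate: the model loads on CPU by default.
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    messages = [{"role": "user", "content": prompt}]
    text = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(text, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens; decode only the generated continuation.
    new_tokens = output_ids[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


if __name__ == "__main__":
    print(generate("Say hello in one short sentence."))
```

Keeping everything behind the `__main__` guard means importing the module stays cheap; only the first real call pays the download cost.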