* qwen model is mostly working
* add Q4_K quantization support to GGUF parser, add qwen3:1.7b model
- Add Q4_K (type 12) dequantization in nn/state.py
- Add qwen3:1.7b model using Q4_K_M quantization (smaller than Q8_0)
- Make bos_token_id optional for models like Qwen3 that don't have it
- Fix line length issues and add preset parameter to SimpleTokenizer
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* smaller diff
* test dequant
* half split
* better
* simple tok
* mock token
* polish
* better
* fix
* replace
---------
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
* llm: add created/model fields, non-streaming support, and tests
- Add `created` timestamp and `model` fields to response (required by OpenAI spec)
- Add non-streaming mode support for /v1/chat/completions
- Add `send_data` helper to HTTPRequestHandler for responses with Content-Length
- Refactor viz/serve.py to use send_data
- Add integration tests using real OpenAI client
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* add openai to testing
* toml
* Remove 'openai' from dependencies
Removed 'openai' from the dependencies list.
* bump cache
---------
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>