Commit Graph

2 Commits

Author SHA1 Message Date
George Hotz
321ab943b2 qwen model is working (#13690)
* qwen model is mostly working

* add Q4_K quantization support to GGUF parser, add qwen3:1.7b model

- Add Q4_K (type 12) dequantization in nn/state.py
- Add qwen3:1.7b model using Q4_K_M quantization (smaller than Q8_0)
- Make bos_token_id optional for models like Qwen3 that don't have it
- Fix line length issues and add preset parameter to SimpleTokenizer

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* smaller diff

* test dequant

* half split

* better

* simple tok

* mock token

* polish

* better

* fix

* replace

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-15 18:00:34 -04:00
George Hotz
316da9f7ff llm: add created/model fields, non-streaming support, and tests (#13660)
* llm: add created/model fields, non-streaming support, and tests

- Add `created` timestamp and `model` fields to response (required by OpenAI spec)
- Add non-streaming mode support for /v1/chat/completions
- Add `send_data` helper to HTTPRequestHandler for responses with Content-Length
- Refactor viz/serve.py to use send_data
- Add integration tests using real OpenAI client

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* add openai to testing

* toml

* Remove 'openai' from dependencies

Removed 'openai' from the dependencies list.

* bump cache

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-12 14:50:36 -05:00