qwen model is working (#13690)

* qwen model is mostly working

* add Q4_K quantization support to GGUF parser, add qwen3:1.7b model

- Add Q4_K (type 12) dequantization in nn/state.py
- Add qwen3:1.7b model using Q4_K_M quantization (smaller than Q8_0)
- Make bos_token_id optional for models like Qwen3 that don't have it
- Fix line length issues and add preset parameter to SimpleTokenizer
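The Q4_K path added here follows the llama.cpp "K-quant" super-block layout. As a hedged reference (this is a from-scratch sketch, not the code in nn/state.py): each Q4_K block is 144 bytes covering 256 weights: an fp16 scale `d`, an fp16 `dmin`, 12 bytes of packed 6-bit per-sub-block scales/mins, and 128 bytes of 4-bit quants; each value dequantizes as `d*scale*q - dmin*min`.

```python
import numpy as np

QK_K = 256  # values per Q4_K super-block (llama.cpp convention)

def get_scale_min_k4(j: int, scales: bytes) -> tuple[int, int]:
  # unpack the j-th 6-bit (scale, min) pair from the 12 packed scale bytes
  if j < 4:
    sc = scales[j] & 63
    mn = scales[j + 4] & 63
  else:
    sc = (scales[j + 4] & 0xF) | ((scales[j - 4] >> 6) << 4)
    mn = (scales[j + 4] >> 4) | ((scales[j] >> 6) << 4)
  return sc, mn

def dequantize_q4_k_block(block: bytes) -> np.ndarray:
  # block layout: fp16 d, fp16 dmin, scales[12], qs[128] -> 144 bytes total
  assert len(block) == 144
  d = np.frombuffer(block[0:2], dtype=np.float16)[0].astype(np.float32)
  dmin = np.frombuffer(block[2:4], dtype=np.float16)[0].astype(np.float32)
  scales, qs = block[4:16], np.frombuffer(block[16:144], dtype=np.uint8)
  y = np.empty(QK_K, dtype=np.float32)
  for g in range(4):  # 4 groups of 64 values, each using one 32-byte slice of qs
    q = qs[g * 32:(g + 1) * 32]
    sc1, m1 = get_scale_min_k4(2 * g, scales)      # low nibbles -> first 32 values
    sc2, m2 = get_scale_min_k4(2 * g + 1, scales)  # high nibbles -> next 32 values
    y[g * 64:g * 64 + 32] = d * sc1 * (q & 0xF).astype(np.float32) - dmin * m1
    y[g * 64 + 32:g * 64 + 64] = d * sc2 * (q >> 4).astype(np.float32) - dmin * m2
  return y
```

With `d=1.0`, `dmin=0.0`, all sub-block scales packed as 1, and every quant byte `0x53`, the output alternates runs of 3.0 (low nibbles) and 5.0 (high nibbles).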

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* smaller diff

* test dequant

* half split

* better

* simple tok

* mock token

* polish

* better

* fix

* replace

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Commit 321ab943b2 (parent d43e4c7553) by George Hotz, committed by GitHub on 2025-12-15 18:00:34 -04:00.
2 changed files with 43 additions and 24 deletions


@@ -10,6 +10,7 @@ class TestLLMServer(unittest.TestCase):
     cls.mock_tok.role = Mock(return_value=[100, 101])
     cls.mock_tok.encode = Mock(return_value=[200, 201, 202])
     cls.mock_tok.decode = Mock(return_value="Hello")
+    cls.mock_tok.end_turn = Mock(return_value=[998])
     cls.mock_model = Mock()
     cls.mock_model.generate = Mock(side_effect=lambda ids, **kwargs: iter([300, 301, 999]))
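The fixture above stubs the tokenizer with `unittest.mock.Mock` objects so the server tests see deterministic token ids without loading a real model. A self-contained illustration of that pattern, using the same names and canned values as the diff (no server required):

```python
from unittest.mock import Mock

# Stub tokenizer: each method is a Mock returning fixed token ids,
# regardless of input, so test assertions are fully deterministic.
mock_tok = Mock()
mock_tok.role = Mock(return_value=[100, 101])
mock_tok.encode = Mock(return_value=[200, 201, 202])
mock_tok.decode = Mock(return_value="Hello")
mock_tok.end_turn = Mock(return_value=[998])

print(mock_tok.encode("any prompt"))  # canned ids, input is ignored
print(mock_tok.end_turn())
```

Mocks also record their calls, so a test can verify the code under test actually invoked `end_turn` via `mock_tok.end_turn.assert_called()`.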