qwen model is working (#13690)

* qwen model is mostly working

* add Q4_K quantization support to GGUF parser, add qwen3:1.7b model

- Add Q4_K (type 12) dequantization in nn/state.py
- Add qwen3:1.7b model using Q4_K_M quantization (smaller than Q8_0)
- Make bos_token_id optional for models like Qwen3 that don't have it
- Fix line length issues and add preset parameter to SimpleTokenizer
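The Q4_K path added here follows the llama.cpp "K-quant" super-block layout. As a hedged reference (this is a from-scratch sketch, not the code in nn/state.py): each Q4_K block is 144 bytes covering 256 weights: an fp16 scale `d`, an fp16 `dmin`, 12 bytes of packed 6-bit per-sub-block scales/mins, and 128 bytes of 4-bit quants; each value dequantizes as `d*scale*q - dmin*min`.

```python
import numpy as np

QK_K = 256  # values per Q4_K super-block (llama.cpp convention)

def get_scale_min_k4(j: int, scales: bytes) -> tuple[int, int]:
  # unpack the j-th 6-bit (scale, min) pair from the 12 packed scale bytes
  if j < 4:
    sc = scales[j] & 63
    mn = scales[j + 4] & 63
  else:
    sc = (scales[j + 4] & 0xF) | ((scales[j - 4] >> 6) << 4)
    mn = (scales[j + 4] >> 4) | ((scales[j] >> 6) << 4)
  return sc, mn

def dequantize_q4_k_block(block: bytes) -> np.ndarray:
  # block layout: fp16 d, fp16 dmin, scales[12], qs[128] -> 144 bytes total
  assert len(block) == 144
  d = np.frombuffer(block[0:2], dtype=np.float16)[0].astype(np.float32)
  dmin = np.frombuffer(block[2:4], dtype=np.float16)[0].astype(np.float32)
  scales, qs = block[4:16], np.frombuffer(block[16:144], dtype=np.uint8)
  y = np.empty(QK_K, dtype=np.float32)
  for g in range(4):  # 4 groups of 64 values, each using one 32-byte slice of qs
    q = qs[g * 32:(g + 1) * 32]
    sc1, m1 = get_scale_min_k4(2 * g, scales)      # low nibbles -> first 32 values
    sc2, m2 = get_scale_min_k4(2 * g + 1, scales)  # high nibbles -> next 32 values
    y[g * 64:g * 64 + 32] = d * sc1 * (q & 0xF).astype(np.float32) - dmin * m1
    y[g * 64 + 32:g * 64 + 64] = d * sc2 * (q >> 4).astype(np.float32) - dmin * m2
  return y
```

With `d=1.0`, `dmin=0.0`, all sub-block scales packed as 1, and every quant byte `0x53`, the output alternates runs of 3.0 (low nibbles) and 5.0 (high nibbles).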

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* smaller diff

* test dequant

* half split

* better

* simple tok

* mock token

* polish

* better

* fix

* replace

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Commit 321ab943b2 (parent d43e4c7553) by George Hotz, committed by GitHub on 2025-12-15 18:00:34 -04:00.
2 changed files with 43 additions and 24 deletions


@@ -10,6 +10,7 @@ class TestLLMServer(unittest.TestCase):
     cls.mock_tok.role = Mock(return_value=[100, 101])
     cls.mock_tok.encode = Mock(return_value=[200, 201, 202])
     cls.mock_tok.decode = Mock(return_value="Hello")
+    cls.mock_tok.end_turn = Mock(return_value=[998])
     cls.mock_model = Mock()
     cls.mock_model.generate = Mock(side_effect=lambda ids, **kwargs: iter([300, 301, 999]))
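The fixture above stubs the tokenizer with `unittest.mock.Mock` objects so the server tests see deterministic token ids without loading a real model. A self-contained illustration of that pattern, using the same names and canned values as the diff (no server required):

```python
from unittest.mock import Mock

# Stub tokenizer: each method is a Mock returning fixed token ids,
# regardless of input, so test assertions are fully deterministic.
mock_tok = Mock()
mock_tok.role = Mock(return_value=[100, 101])
mock_tok.encode = Mock(return_value=[200, 201, 202])
mock_tok.decode = Mock(return_value="Hello")
mock_tok.end_turn = Mock(return_value=[998])

print(mock_tok.encode("any prompt"))  # canned ids, input is ignored
print(mock_tok.end_turn())
```

Mocks also record their calls, so a test can verify the code under test actually invoked `end_turn` via `mock_tok.end_turn.assert_called()`.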