use "nvidia/Llama-3.1-Nemotron-70B-Instruct-HF".
```
PYTHONPATH=. JITBEAM=2 python3 examples/llama3.py --download_model --size 70B --quantize int8 --benchmark
```
on M4 Max, 40 sec to load the model and
```
enqueue in 165.15 ms
total 328.54 ms, 3.04 tok/s, 247.46 GB/s, param 221.20 GB/s
enqueue in 5.31 ms
total 168.48 ms, 5.94 tok/s, 482.54 GB/s, param 431.34 GB/s
enqueue in 5.32 ms
total 168.77 ms, 5.93 tok/s, 481.71 GB/s, param 430.60 GB/s
enqueue in 5.69 ms
total 169.51 ms, 5.90 tok/s, 479.61 GB/s, param 428.72 GB/s
enqueue in 5.41 ms
total 168.60 ms, 5.93 tok/s, 482.20 GB/s, param 431.04 GB/s
enqueue in 5.18 ms
total 168.98 ms, 5.92 tok/s, 481.12 GB/s, param 430.08 GB/s
enqueue in 5.43 ms
total 168.82 ms, 5.92 tok/s, 481.59 GB/s, param 430.49 GB/s
enqueue in 5.27 ms
total 168.94 ms, 5.92 tok/s, 481.23 GB/s, param 430.17 GB/s
```
Calling
python3 examples/yolov8.py ./test/models/efficientnet/Chicken.jpg
used to result in this error
ValueError: Calling nonzero on 0d arrays is not allowed.
Using np.atleast_1d makes sure we avoid a zero-dimension array.
Co-authored-by: gonutz <gonutz@fake.mail>
* basic loader, untested
* testing
* remove utils import in test
* q8_0
* q4_1
* end to end testing
* minor cleanup
* fix casting
* moved to state
* move tests
* move dequant to fn
* fix lint elif
* remove gguf from extra
* fix dict union
* q6_k simpler
* naming and spacing
* gpt2-gguf example
* cleanup
* move gguf example
* minor cleanup
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
* metaops srcs
* delete multioutput ctx var
* always has metadata
* shorter path for realized
* this still needs inputs
This reverts commit a59cbb2886.
* add write support
* add test
* update test case to compare write outputs
* assert final write output
* flush when using write
* update write logic
* Revert "update write logic"
This reverts commit 5e0e611b46.
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
* switch symbolic from old to uops, final PR
* two wrong answers
* not needed resolves
* symbolic ops passes
* symbolic ops passes
* progress
* tests pass (almost)
* fix last test
* fix some tests
* global binding and unbinding
* Revert "global binding and unbinding"
This reverts commit 9456725630.
* that test works now
* vars on uop doesn't recurse
* fix fuzzer
* update
* fix type
* fix gpt, it's UOp now
* ssimplify symbolics