Commit Graph

57 Commits

Sieds Lykles
91ccf1c343 Off-by-one error in start_pos (#9792)
Variable's upper bound is inclusive
2025-04-15 15:07:13 -04:00
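For illustration only (this is not the actual diff): because the Variable upper bound is inclusive, a cache of MAX_CONTEXT slots has to bound start_pos by MAX_CONTEXT - 1. A minimal sketch, assuming tinygrad's Variable(name, min, max) API and the MAX_CONTEXT name from examples/gpt2.py; the concrete bounds below are made up for the example:

```python
from tinygrad import Variable

MAX_CONTEXT = 128  # assumed cache length, for the sketch only
start_pos = 5

# buggy bound: allows start_pos == MAX_CONTEXT, one past the last valid
# cache slot, because the max argument is inclusive
# v = Variable("start_pos", 0, MAX_CONTEXT).bind(start_pos)

# fixed bound: the inclusive maximum is the last valid position
v = Variable("start_pos", 0, MAX_CONTEXT - 1).bind(start_pos)
```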
George Hotz
4de084a835 cleanup ci, split docs/autogen, testing_minimal, LLVM Speed [pr] (#8952)
* cleanup ci [pr]

* testing_minimal

* add hypothesis to minimal

* fail tiktoken import okay

* add LLVM speed test

* llvm speed w/o beam
2025-02-07 19:01:59 +08:00
chenyu
73ea913050 really not using numpy in gpt2 example (#7779) 2024-11-18 23:21:16 -05:00
chenyu
e6debda5c4 remove numpy from gpt2 and llama examples (#7778) 2024-11-18 22:48:17 -05:00
leopf
87877d7a91 GGUF cleanup (#7192)
* cleanup

* remove vocab size hard code
2024-10-21 10:44:54 -04:00
leopf
b6d9b276bb GGUF support (#7046)
* basic loader, untested

* testing

* remove utils import in test

* q8_0

* q4_1

* end to end testing

* minor cleanup

* fix casting

* moved to state

* move tests

* move dequant to fn

* fix lint elif

* remove gguf from extra

* fix dict union

* q6_k simpler

* naming and spacing

* gpt2-gguf example

* cleanup

* move gguf example

* minor cleanup

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-10-21 16:15:34 +08:00
George Hotz
f4ec39fe58 switch symbolic from old to uops, final PR (#6872)
* switch symbolic from old to uops, final PR

* two wrong answers

* not needed resolves

* symbolic ops passes

* symbolic ops passes

* progress

* tests pass (almost)

* fix last test

* fix some tests

* global binding and unbinding

* Revert "global binding and unbinding"

This reverts commit 9456725630.

* that test works now

* vars on uop doesn't recurse

* fix fuzzer

* update

* fix type

* fix gpt, it's UOp now

* ssimplify symbolics
2024-10-04 16:42:27 +08:00
chenyu
322c37e621 use helpers.JIT in llama and gpt2 examples (#5350)
* use helpers.JIT in llama and gpt2 examples

replaced getenv("JIT"); this effectively makes gpt2 use the JIT by default

* fix test_gpt2
2024-07-09 15:04:43 -04:00
chenyu
e356807696 tinytqdm.set_description and tinytrange (#5101) 2024-06-22 14:45:06 -04:00
chenyu
31358cbea5 change Tensor.stack to method (#4719) 2024-05-24 17:04:19 -04:00
chenyu
92c0675ccf setitem initial support (#4093)
* wip setitem

it's an eager assign to the output shapetracker view

* cleanups and tests

* more cleanups
2024-04-07 20:35:22 -04:00
chenyu
c71627fee6 move GlobalCounter to helpers (#4002)
breaks the circular import between ops and buffer
2024-03-30 00:30:30 -04:00
George Hotz
641f347232 simple LoadOps.ASSIGN (#3745)
* simple LoadOps.ASSIGN

* skip that test

* don't assign in onnx ops gemm

* track cache usage

* recreate the lazybuffer to avoid the cache

* fix contigs

* skip that test

* lol

* better letters
2024-03-14 20:44:34 -07:00
George Hotz
3527c5a9d2 add Tensor.replace (#3738)
* add Tensor.replace

* fix dtypes in that test

* should be replace

* and mixtral
2024-03-14 13:34:14 -07:00
chenyu
f96fc6e9d4 fix gpt2 with empty prompt take 2 (#3102)
logits would be empty, so we need to replace them with ones before sampling; also, we cannot reshape with -1 when another axis is 0
2024-01-12 14:46:36 -05:00
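A rough sketch of the two failure modes described above, in numpy for illustration only (the actual fix lives in the tinygrad gpt2 example): with an empty prompt there are zero logit rows to sample from, and a -1 in a reshape cannot be resolved when another axis of the target shape is 0.

```python
import numpy as np

logits = np.empty((0, 50257))   # empty prompt -> no logit rows to sample from

# -1 is ambiguous when another axis is 0: any value n satisfies n * 0 == 0
try:
    logits.reshape(0, -1)
except ValueError as e:
    print(e)

# the workaround described above: sample from all-ones logits instead,
# which makes the first token an effectively uniform draw
if logits.shape[0] == 0:
    logits = np.ones((1, 50257))
```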
chenyu
ca46d3541b Revert "fix gpt2 with empty prompt" (#3101) 2024-01-12 14:27:41 -05:00
chenyu
1d7f01bc6d fix gpt2 with empty prompt (#3100)
logits would be empty, so we need to replace them with ones before sampling; also, we cannot reshape with -1 when another axis is 0
2024-01-12 14:18:17 -05:00
chenyu
f0d7ad8aaa fix gpt2 attention with start_pos = 0 (#3061)
* fix gpt2 attention with start_pos size 1

test cases taken from ll_transformer branch

* fix interpreted
2024-01-09 16:14:55 -05:00
chenyu
7c80b78be9 cleanup gpt2 build function (#3018) 2024-01-04 23:14:53 -05:00
chenyu
f88506e630 move gpt2/llama sampling inside the model call (#3013)
* move gpt2/llama sampling inside the model call

* argmax uses one more kernel
2024-01-04 17:01:50 -05:00
chenyu
8524493748 minor gpt2 cleanup (#3012) 2024-01-04 13:53:18 -05:00
George Hotz
a280cfe169 move dtypes to dtype.py (#2964)
* move dtypes to dtype.py

* fix urllib
2024-01-01 14:58:48 -08:00
George Hotz
c81ce9643d move globalcounters to ops (#2960)
* move globalcounters to ops

* missed a few

* sick of that failing
2024-01-01 14:21:02 -08:00
chenyu
61e255d197 use max for gpt2 and llama (#2949)
not using argmax yet because there's a multinomial outside of the function.
2023-12-28 23:26:00 -05:00
George Hotz
1765849937 new lazy, benchmark (#2878)
* lazy rewrite, try 2

* min fix tests

* pass contig test

* put broken pads back

* move that to realize

* no contig child fixes array packing

* so wrong

* now that's correct

* base children

* fix bind issues

* disable to_image_idx

* fix tests

* that failure shouldn't break other tests

* more fixes

* fix torch

* skip failing tests in CI

* 1e-7

* half is broken

* 1e-6 margin of error
2023-12-20 14:33:21 -08:00
chenyu
857c35d256 make gpt2 decode output just once at the end (#2869)
also renamed the function from greedy_until to generate, as it is neither greedy nor "until"
2023-12-20 12:14:55 -05:00
chenyu
c0f76ed4ea transformer kvcache and mask have same dtype as input (#2771)
* transformer kvcache and mask have same dtype as input

* don't use `=0` in cstyle ternary where

* (bool)

* where float16 test
2023-12-14 22:41:51 -05:00
chenyu
371005cb2d use one kvcache tensor in gpt2 instead of two separate caches (#2662)
* use one kvcache tensor in gpt2

* test case

* is None

* better test cases
2023-12-06 20:59:17 -05:00
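A minimal numpy sketch of the idea (illustrative shapes and names, not the actual gpt2.py code): keys and values live in one stacked cache tensor with a leading axis of size 2, so a single slice assignment updates both per step.

```python
import numpy as np

bs, max_context, n_heads, head_dim = 1, 128, 12, 64
cache_kv = np.zeros((2, bs, max_context, n_heads, head_dim), dtype=np.float16)

def update_cache(cache_kv, k, v, start_pos):
    # k, v: (bs, seqlen, n_heads, head_dim); one write covers both keys and values
    seqlen = k.shape[1]
    cache_kv[:, :, start_pos:start_pos + seqlen] = np.stack([k, v])
    keys   = cache_kv[0, :, :start_pos + seqlen]
    values = cache_kv[1, :, :start_pos + seqlen]
    return keys, values

k = np.random.randn(bs, 1, n_heads, head_dim)
v = np.random.randn(bs, 1, n_heads, head_dim)
keys, values = update_cache(cache_kv, k, v, start_pos=0)
```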
chenyu
0978c24b8e fast gpt2 embedding with variable bs=1 (#2596) 2023-12-05 23:01:17 -05:00
chenyu
229ada5fe5 Gpt2 benchmark with HALF and BEAM (#2636)
* benchmark gpt2 with half and beam

* BEAM=4

* optional validation

* green is good

* we care
2023-12-05 22:15:16 -05:00
chenyu
a63f48d3db gpt2 half for kvcache and output logits (#2630)
* gpt2 more half

* half is fine after softmax
2023-12-05 16:54:56 -05:00
George Hotz
8c67eb1c92 GPT bugfixes (#2624)
* simple fixes

* fix exp2

* fixed

* parallel beam for CUDA

* fix image dtypes
2023-12-05 11:42:28 -08:00
chenyu
a739c6646e fp16 in gpt2 attention (#2491)
* fp16 in gpt2 attention

* HALF
2023-11-28 19:27:03 -05:00
chenyu
7f9a4c1285 fp16 and noshow flags for gpt2 (#2470) 2023-11-27 16:23:03 -05:00
George Hotz
9e07824542 move device to device.py (#2466)
* move device to device.py

* pylint test --disable R,C,W,E --enable E0611

* fix tests
2023-11-27 11:34:37 -08:00
George Hotz
7170a9a057 coder.py can write and run code (#2439)
* wip mistral

* coder

* touchups

* cleanups

* mistral cleanups

* clean up cache create

* download the weights, fix tests

* fix llama loading

* global fixup

* clean up all

* move llama model

* cleanups

* Revert "cleanups"

This reverts commit a71c5d59eb.

* fine, leave it
2023-11-25 12:27:54 -08:00
George Hotz
96c12fdeab multibatch gpt2 (#2432)
* support multibatch gpt-2

* multi output

* no default JIT in CI
2023-11-24 18:10:10 -08:00
George Hotz
095e2ced61 add name support to fetch (#2407)
* add name support

* use fetch in gpt2

* remove requests from main lib, networkx also optional

* umm, keep that assert

* updates to fetch

* i love the walrus so much

* stop bundling mnist with tinygrad

* err, https

* download cache names

* add DOWNLOAD_CACHE_VERSION

* need env.

* ugh, wrong path

* replace get_child
2023-11-23 14:16:17 -08:00
George Hotz
3baaf298d6 two stage cumsum in tensor.py (#2331)
* two stage cumsum in tensor.py

* 2 more kernels for llama cumsum

* gpt-2 and llama use fast multinomial
2023-11-16 12:09:53 -08:00
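An illustrative two-stage cumsum in numpy (a sketch of the general technique, not the tensor.py implementation): stage one does a cumsum inside fixed-size blocks, stage two adds the running total of the preceding blocks, so no single pass has to scan the whole axis.

```python
import numpy as np

def two_stage_cumsum(x, block=4):
    n = len(x)
    xb = np.pad(x, (0, (-n) % block)).reshape(-1, block)
    stage1 = np.cumsum(xb, axis=1)                               # cumsum within each block
    offsets = np.concatenate([[0], np.cumsum(stage1[:-1, -1])])  # totals of preceding blocks
    return (stage1 + offsets[:, None]).reshape(-1)[:n]

x = np.arange(10)
assert np.array_equal(two_stage_cumsum(x), np.cumsum(x))
```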
chenyu
453f48ce02 pad None means (0,0) (#2273) 2023-11-11 09:50:26 -08:00
chenyu
a753c8e071 examples of new GPT2 and JIT change (#2261)
* var_vals are global

* working with global ish

* better

* fix export model

* fix tests

* better kv cache

* does it run?

* use where for kvmask

* fix excessive var_vals

* fix import

* how does multigpu use this?

* llama kinda work

* faster and simpler

* cleanup

* fix conversation mode

* test cleanups

* fix one more test

* test cleanup

---------

Co-authored-by: George Hotz <geohot@gmail.com>
2023-11-10 15:07:02 -05:00
George Hotz
2f7aab3d13 move optimize_local_size (#2221)
* move optimize_local_size

* interpret_ast
2023-11-05 21:00:52 -08:00
nimlgen
8d41b3eb3f beam=16 makes gpt2 gpu-time < 5ms on 3090 (#2154) 2023-10-27 10:21:27 -10:00
nimlgen
e21bf776c8 fix debug=1 llama/gpt2 timings (#2143) 2023-10-24 15:45:00 -04:00
chenyu
e2b83f1b42 Variable.bind newer (#2017)
* Variable.bind attempt 2

* ShapeTracker.unbind

* fix llama

* fix types

* test case

* View.vars cleanup

* include mask in symbolic source

* mask can be sint

* st.unbind in bufferops

* assert ast contain free Variable only

* cleanup

* conservative unbinding reduce op arg

* move reduceop unbind

* fix llama JIT arg behavior
2023-10-10 10:03:01 -07:00
chenyu
c99fa58dd2 simplify gpt2 example (#1973)
* simplify gpt2 example

* kernel_jitted_count and jit tests

* Revert "kernel_jitted_count and jit tests"

This reverts commit 31a3c26dd0.

* all_jitted test in test_real_world
2023-10-05 07:09:29 -07:00
George Hotz
48c8d130ae simpler GPT2 (#1941)
* don't realize in gpt2

* simpler gpt2
2023-09-29 04:41:09 -07:00
Gijs Koning
b8ff20ffe4 Gpt2 (#1896)
* small helps

* got something working

* faster?

* faster yes

* cleanup

* cleanup

* cleanup

* Fix non jit

* Fix fp16 and some cleanup

* Fix fp16 and some cleanup

* cleanup

* similar to master

* cleanup
2023-09-22 20:14:47 +08:00
nimlgen
4c31dfafb3 add seed to gpt-2 (#1869) 2023-09-15 17:34:14 -04:00
chenyu
ebcda8a714 Move var_vals from ShapeTracker to LazyBuffer (#1819) 2023-09-08 09:25:10 -07:00