chenyu
322c37e621
use helpers.JIT in llama and gpt2 examples ( #5350 )
...
* use helpers.JIT in llama and gpt2 examples
replaced getenv("JIT"); this effectively makes gpt2 default to JIT
* fix test_gpt2
2024-07-09 15:04:43 -04:00
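A minimal sketch of the switch, assuming helpers exposes JIT as a ContextVar-style flag next to getenv (the flag name comes from the title; this is not the actual diff):

```python
from tinygrad.helpers import getenv, JIT

# before: each example read the environment itself, defaulting JIT off
use_jit = bool(getenv("JIT", 0))

# after: the shared helpers.JIT flag decides, so gpt2 can default to JIT on
use_jit = bool(JIT)
```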
chenyu
e356807696
tinytqdm.set_description and tinytrange ( #5101 )
2024-06-22 14:45:06 -04:00
chenyu
31358cbea5
change Tensor.stack to method ( #4719 )
2024-05-24 17:04:19 -04:00
chenyu
92c0675ccf
setitem initial support ( #4093 )
...
* wip setitem
it's an eager assign to the output ShapeTracker view
* cleanups and tests
* more cleanups
2024-04-07 20:35:22 -04:00
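Roughly what the initial setitem support enables; a minimal sketch (early support was limited, e.g. to contiguous realized targets):

```python
from tinygrad import Tensor

t = Tensor.zeros(4, 4).contiguous().realize()
t[1] = Tensor.ones(4)   # eager assign into the row view of t's buffer
print(t.numpy())
```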
chenyu
c71627fee6
move GlobalCounter to helpers ( #4002 )
...
breaks the circular import between ops and buffer
2024-03-30 00:30:30 -04:00
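After this move the counters are imported from helpers; a quick usage sketch:

```python
from tinygrad import Tensor
from tinygrad.helpers import GlobalCounters

GlobalCounters.reset()
(Tensor.rand(64, 64) @ Tensor.rand(64, 64)).realize()
print(GlobalCounters.kernel_count, GlobalCounters.global_ops, GlobalCounters.global_mem)
```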
George Hotz
641f347232
simple LoadOps.ASSIGN ( #3745 )
...
* simple LoadOps.ASSIGN
* skip that test
* don't assign in onnx ops gemm
* track cache usage
* recreate the lazybuffer to avoid the cache
* fix contigs
* skip that test
* lol
* better letters
2024-03-14 20:44:34 -07:00
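What ASSIGN expresses at the Tensor level, sketched (the scheduler change itself is not shown):

```python
from tinygrad import Tensor

a = Tensor.zeros(4).contiguous().realize()
b = Tensor.ones(4)
a.assign(b).realize()   # writes b into a's existing buffer instead of allocating a new one
print(a.numpy())        # [1. 1. 1. 1.]
```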
George Hotz
3527c5a9d2
add Tensor.replace ( #3738 )
...
* add Tensor.replace
* fix dtypes in that test
* should be replace
* and mixtral
2024-03-14 13:34:14 -07:00
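A sketch of Tensor.replace as used here: unlike assign, it swaps the tensor's underlying data rather than scheduling an in-place write, so shape and dtype must match (hence the dtype fix above).

```python
from tinygrad import Tensor

a = Tensor.zeros(4).realize()
a.replace(Tensor.arange(4, dtype=a.dtype))   # a now points at the new data
print(a.numpy())                             # [0. 1. 2. 3.]
```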
chenyu
f96fc6e9d4
fix gpt2 with empty prompt take 2 ( #3102 )
...
logits would be empty, so they need to be replaced with ones before sampling; also, reshape with -1 is not possible when another axis is 0
2024-01-12 14:46:36 -05:00
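A toy illustration of the guard described above (names and shapes are illustrative, not the gpt2.py code): with an empty prompt the logits have zero elements, so a tensor of ones (a uniform distribution after softmax) is substituted before sampling.

```python
from tinygrad import Tensor

VOCAB = 8  # toy vocab size

def sample(logits: Tensor) -> int:
    if logits.numel() == 0: logits = Tensor.ones(VOCAB)   # empty-prompt fallback
    return int(logits.softmax().multinomial().item())

print(sample(Tensor.rand(VOCAB)))   # normal path
print(sample(Tensor.rand(0)))       # empty-prompt path
```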
chenyu
ca46d3541b
Revert "fix gpt2 with empty prompt" ( #3101 )
2024-01-12 14:27:41 -05:00
chenyu
1d7f01bc6d
fix gpt2 with empty prompt ( #3100 )
...
logits would be empty, so they need to be replaced with ones before sampling; also, reshape with -1 is not possible when another axis is 0
2024-01-12 14:18:17 -05:00
chenyu
f0d7ad8aaa
fix gpt2 attention with start_pos = 0 ( #3061 )
...
* fix gpt2 attention with start_pos size 1
test cases taken from ll_transformer branch
* fix interpreted
2024-01-09 16:14:55 -05:00
chenyu
7c80b78be9
cleanup gpt2 build function ( #3018 )
2024-01-04 23:14:53 -05:00
chenyu
f88506e630
move gpt2/llama sampling inside the model call ( #3013 )
...
* move gpt2/llama sampling inside the model call
* argmax uses one more kernel
2024-01-04 17:01:50 -05:00
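A toy sketch of the structure after this change (names are illustrative): the model call returns a sampled token id rather than raw logits, so argmax/multinomial run inside the (potentially jitted) call.

```python
from tinygrad import Tensor

class ToyLM:
    def __init__(self): self.w = Tensor.rand(16, 16)
    def __call__(self, x: Tensor, temperature: float = 0.0) -> Tensor:
        logits = x @ self.w
        # sampling happens here, inside the model call
        return logits.argmax(-1) if temperature == 0 else (logits / temperature).softmax().multinomial()

print(ToyLM()(Tensor.rand(1, 16)).item())
```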
chenyu
8524493748
minor gpt2 cleanup ( #3012 )
2024-01-04 13:53:18 -05:00
George Hotz
a280cfe169
move dtypes to dtype.py ( #2964 )
...
* move dtypes to dtype.py
* fix urllib
2024-01-01 14:58:48 -08:00
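After this refactor the dtypes live in their own module:

```python
from tinygrad.dtype import dtypes

print(dtypes.float16, dtypes.int32, dtypes.bool)
```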
George Hotz
c81ce9643d
move globalcounters to ops ( #2960 )
...
* move globalcounters to ops
* missed a few
* sick of that failing
2024-01-01 14:21:02 -08:00
chenyu
61e255d197
use max for gpt2 and llama ( #2949 )
...
not using argmax yet because there's a multinomial outside of the function.
2023-12-28 23:26:00 -05:00
George Hotz
1765849937
new lazy, benchmark ( #2878 )
...
* lazy rewrite, try 2
* min fix tests
* pass contig test
* put broken pads back
* move that to realize
* no contig child fixes array packing
* so wrong
* now that's correct
* base children
* fix bind issues
* disable to_image_idx
* fix tests
* that failure shouldn't break other tests
* more fixes
* fix torch
* skip failing tests in CI
* 1e-7
* half is broken
* 1e-6 margin of error
2023-12-20 14:33:21 -08:00
chenyu
857c35d256
make gpt2 decode output just once at the end ( #2869 )
...
also renamed the function from greedy_until to generate, since it is neither greedy nor until-based
2023-12-20 12:14:55 -05:00
chenyu
c0f76ed4ea
transformer kvcache and mask have same dtype as input ( #2771 )
...
* transformer kvcache and mask have same dtype as input
* don't use `=0` in cstyle ternary where
* (bool)
* where float16 test
2023-12-14 22:41:51 -05:00
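The idea, sketched with illustrative shapes: build the mask (and likewise the kv cache) in the activation dtype instead of the float32 default, so fp16 runs stay fp16 end to end.

```python
from tinygrad import Tensor, dtypes

x = Tensor.rand(1, 4, 8, dtype=dtypes.float16)                          # fp16 activations
mask = Tensor.full((1, 1, 4, 4), float("-inf"), dtype=x.dtype).triu(1)  # same dtype as x
print(mask.dtype)
```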
chenyu
371005cb2d
use one kvcache tensor in gpt2 instead of two separate caches ( #2662 )
...
* use one kvcache tensor in gpt2
* test case
* is None
* better test cases
2023-12-06 20:59:17 -05:00
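A sketch of the single-cache layout (shapes are illustrative, not the exact gpt2.py ones): one realized tensor whose leading axis selects keys vs. values, replacing two separately managed caches.

```python
from tinygrad import Tensor

B, MAX_CTX, H, D = 1, 128, 12, 64
cache_kv = Tensor.zeros(2, B, MAX_CTX, H, D).contiguous().realize()
keys, values = cache_kv[0], cache_kv[1]   # two views into one buffer
print(keys.shape, values.shape)
```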
chenyu
0978c24b8e
fast gpt2 embedding with variable bs=1 ( #2596 )
2023-12-05 23:01:17 -05:00
chenyu
229ada5fe5
Gpt2 benchmark with HALF and BEAM ( #2636 )
...
* benchmark gpt2 with half and beam
* BEAM=4
* optional validation
* green is good
* we care
2023-12-05 22:15:16 -05:00
chenyu
a63f48d3db
gpt2 half for kvcache and output logits ( #2630 )
...
* gpt2 more half
* half is fine after softmax
2023-12-05 16:54:56 -05:00
George Hotz
8c67eb1c92
GPT bugfixes ( #2624 )
...
* simple fixes
* fix exp2
* fixed
* parallel beam for CUDA
* fix image dtypes
2023-12-05 11:42:28 -08:00
chenyu
a739c6646e
fp16 in gpt2 attention ( #2491 )
...
* fp16 in gpt2 attention
* HALF
2023-11-28 19:27:03 -05:00
chenyu
7f9a4c1285
fp16 and noshow flags for gpt2 ( #2470 )
2023-11-27 16:23:03 -05:00
George Hotz
9e07824542
move device to device.py ( #2466 )
...
* move device to device.py
* pylint test --disable R,C,W,E --enable E0611
* fix tests
2023-11-27 11:34:37 -08:00
George Hotz
7170a9a057
coder.py can write and run code ( #2439 )
...
* wip mistral
* coder
* touchups
* cleanups
* mistral cleanups
* clean up cache create
* download the weights, fix tests
* fix llama loading
* global fixup
* clean up all
* move llama model
* cleanups
* Revert "cleanups"
This reverts commit a71c5d59eb.
* fine, leave it
2023-11-25 12:27:54 -08:00
George Hotz
96c12fdeab
multibatch gpt2 ( #2432 )
...
* support multibatch gpt-2
* multi output
* no default JIT in CI
2023-11-24 18:10:10 -08:00
George Hotz
095e2ced61
add name support to fetch ( #2407 )
...
* add name support
* use fetch in gpt2
* remove requests from main lib, networkx also optional
* umm, keep that assert
* updates to fetch
* i love the walrus so much
* stop bundling mnist with tinygrad
* err, https
* download cache names
* add DOWNLOAD_CACHE_VERSION
* need env.
* ugh, wrong path
* replace get_child
2023-11-23 14:16:17 -08:00
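Rough usage after the change (URL and filename are illustrative; the signature may have evolved since): fetch stores the download under an explicit cache name instead of a URL hash.

```python
from tinygrad.helpers import fetch

path = fetch("https://huggingface.co/gpt2/resolve/main/vocab.json", "gpt2-vocab.json")
print(path)   # pathlib.Path inside the download cache
```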
George Hotz
3baaf298d6
two stage cumsum in tensor.py ( #2331 )
...
* two stage cumsum in tensor.py
* 2 more kernels for llama cumsum
* gpt-2 and llama use fast multinomial
2023-11-16 12:09:53 -08:00
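The two-stage idea, sketched with numpy for clarity (tinygrad's tensor.py implementation differs in details): prefix-sum within fixed-size blocks, then add the running total of all preceding blocks.

```python
import numpy as np

def two_stage_cumsum(x: np.ndarray, block: int = 256) -> np.ndarray:
    n = len(x)
    pad = (-n) % block
    # stage 1: prefix sums inside each block
    within = np.cumsum(np.pad(x, (0, pad)).reshape(-1, block), axis=1)
    # stage 2: offset each block by the total of the blocks before it
    offsets = np.concatenate([[0.0], np.cumsum(within[:-1, -1])])
    return (within + offsets[:, None]).reshape(-1)[:n]

x = np.random.rand(1000)
assert np.allclose(two_stage_cumsum(x), np.cumsum(x))
```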
chenyu
453f48ce02
pad None means (0,0) ( #2273 )
2023-11-11 09:50:26 -08:00
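Example of the shorthand:

```python
from tinygrad import Tensor

t = Tensor.rand(2, 3)
print(t.pad((None, (1, 1))).shape)   # None means (0, 0): axis 0 untouched -> (2, 5)
```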
chenyu
a753c8e071
examples of new GPT2 and JIT change ( #2261 )
...
* var_vals are global
* working with global ish
* better
* fix export model
* fix tests
* better kv cache
* does it run?
* use where for kvmask
* fix excessive var_vals
* fix import
* how does multigpu use this?
* llama kinda work
* faster and simpler
* cleanup
* fix conversation mode
* test cleanups
* fix one more test
* test cleanup
---------
Co-authored-by: George Hotz <geohot@gmail.com>
2023-11-10 15:07:02 -05:00
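A sketch of the "use where for kvmask" idea from the list above (shapes illustrative): select the filled cache rows with where() instead of slicing by a dynamic length, so the shape the kernel sees stays fixed.

```python
from tinygrad import Tensor

MAX_CTX, DIM, cur_len = 8, 4, 5
cache = Tensor.rand(MAX_CTX, DIM)
valid = Tensor.arange(MAX_CTX).reshape(MAX_CTX, 1) < cur_len
masked = valid.where(cache, 0.0)   # rows past cur_len are zeroed, shape stays (8, 4)
print(masked.shape)
```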
George Hotz
2f7aab3d13
move optimize_local_size ( #2221 )
...
* move optimize_local_size
* interpret_ast
2023-11-05 21:00:52 -08:00
nimlgen
8d41b3eb3f
beam=16 makes gpt2 gpu-time < 5ms on 3090 ( #2154 )
2023-10-27 10:21:27 -10:00
nimlgen
e21bf776c8
fix debug=1 llama/gpt2 timings ( #2143 )
2023-10-24 15:45:00 -04:00
chenyu
e2b83f1b42
Variable.bind newer ( #2017 )
...
* Variable.bind attempt 2
* ShapeTracker.unbind
* fix llama
* fix types
* test case
* View.vars cleanup
* include mask in symbolic source
* mask can be sint
* st.unbind in bufferops
* assert ast contain free Variable only
* cleanup
* conservative unbinding reduce op arg
* move reduceop unbind
* fix llama JIT arg behavior
2023-10-10 10:03:01 -07:00
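A sketch of the bind flow this PR reworked, using the module path of the time (symbolic variables have since moved): a Variable carries a [min, max] range for codegen, bind attaches the concrete value for a given call, and ShapeTracker.unbind strips bound values back out so the AST contains only free Variables.

```python
from tinygrad.shape.symbolic import Variable  # path as of these commits

start_pos = Variable("start_pos", 1, 128)   # symbolic dim with a [min, max] range
bound = start_pos.bind(7)                   # bound to 7 for this particular call
```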
chenyu
c99fa58dd2
simplify gpt2 example ( #1973 )
...
* simplify gpt2 example
* kernel_jitted_count and jit tests
* Revert "kernel_jitted_count and jit tests"
This reverts commit 31a3c26dd0.
* all_jitted test in test_real_world
2023-10-05 07:09:29 -07:00
George Hotz
48c8d130ae
simpler GPT2 ( #1941 )
...
* don't realize in gpt2
* simpler gpt2
2023-09-29 04:41:09 -07:00
Gijs Koning
b8ff20ffe4
Gpt2 ( #1896 )
...
* small helps
* got something working
* faster?
* faster yes
* cleanup
* cleanup
* cleanup
* Fix non jit
* Fix fp16 and some cleanup
* Fix fp16 and some cleanup
* cleanup
* similar to master
* cleanup
2023-09-22 20:14:47 +08:00
nimlgen
4c31dfafb3
add seed to gpt-2 ( #1869 )
2023-09-15 17:34:14 -04:00
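What a seed flag amounts to, roughly (the example at the time may have seeded numpy instead):

```python
from tinygrad import Tensor

Tensor.manual_seed(1337)        # same seed -> same sampled tokens across runs
print(Tensor.rand(3).numpy())
```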
chenyu
ebcda8a714
Move var_vals from ShapeTracker to LazyBuffer ( #1819 )
2023-09-08 09:25:10 -07:00
chenyu
a2745819f6
faster gpt2 jit path and gpt2 in test_real_world ( #1738 )
2023-09-02 08:39:12 -07:00
George Hotz
cd7ceed914
gpt2: print total instead of sync time
2023-08-30 10:59:42 -07:00
George Hotz
a6d842af7a
move device to ops ( #1646 )
...
* move device to ops
* mlops types
* 2 lines
2023-08-23 08:30:17 -07:00
George Hotz
643cbdfd50
make embedding and GPT-2 fast ( #1631 )
...
* make embedding fast
* jit more, variable shape support
* print mem bw
2023-08-22 15:14:38 -07:00
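The fast-embedding idea is, roughly, a one-hot comparison followed by a matmul, which JITs well even with a variable sequence length; a simplified sketch, not the actual nn.Embedding code:

```python
from tinygrad import Tensor

class ToyEmbedding:
    def __init__(self, vocab_size: int, embed_dim: int):
        self.vocab_size = vocab_size
        self.weight = Tensor.glorot_uniform(vocab_size, embed_dim)
    def __call__(self, idx: Tensor) -> Tensor:
        # compare indices against arange to build a one-hot matrix, then matmul
        onehot = (idx.unsqueeze(-1) == Tensor.arange(self.vocab_size)).float()
        return onehot @ self.weight

print(ToyEmbedding(50257, 64)(Tensor([[1, 2, 3]])).shape)   # (1, 3, 64)
```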
George Hotz
718ced296c
move state to nn/state ( #1619 )
2023-08-22 07:36:24 -07:00
George Hotz
4f459841bc
Symbolic JIT for GPT2 ( #1613 )
...
* not fast yet
* simpler
* symbolic jit
* fp16 GOPS and GB
2023-08-21 19:44:57 -07:00
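The TinyJit pattern the GPT-2 example builds on, sketched with a fixed shape; the symbolic part extends this so a Variable-sized dimension (e.g. start_pos) can change between replays without retracing.

```python
from tinygrad import Tensor, TinyJit

@TinyJit
def step(x: Tensor) -> Tensor:
    return (x @ x.T).relu().realize()

for _ in range(5):
    out = step(Tensor.rand(8, 8))   # early calls trace/compile, later calls replay the kernels
print(out.shape)
```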
George Hotz
e3c6c0c6db
add GPT2 example ( #1511 ) ( #1514 )
...
* add gpt2 to examples
* some cleanup
* fixes
* argparse + scaled_dot_product_attention
* add timing
* add to benchmark
Co-authored-by: YassineYousfi <yassine.y10@gmail.com>
2023-08-10 09:09:47 -07:00
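The example's attention leans on the built-in scaled_dot_product_attention mentioned above; a minimal sketch of that call (shapes illustrative):

```python
from tinygrad import Tensor

q, k, v = [Tensor.rand(1, 12, 8, 64) for _ in range(3)]   # (batch, heads, seq, head_dim)
out = q.scaled_dot_product_attention(k, v, is_causal=True)
print(out.shape)   # (1, 12, 8, 64)
```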