George Hotz
3baaf298d6
two stage cumsum in tensor.py (#2331)
* two stage cumsum in tensor.py
* 2 more kernels for llama cumsum
* gpt-2 and llama use fast multinomial
2023-11-16 12:09:53 -08:00
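The "two stage cumsum" here is the usual blocked prefix sum: cumsum inside fixed-size blocks, then add an exclusive cumsum of the block totals; the same prefix sums are what the fast multinomial sampling for gpt-2/llama builds on. A minimal NumPy sketch of the idea (block size and function name are illustrative, not tinygrad's kernels):

```python
import numpy as np

def two_stage_cumsum(x: np.ndarray, block: int = 256) -> np.ndarray:
  # stage 1: pad the last axis to a multiple of `block` and cumsum within each block
  n = x.shape[-1]
  pad = (-n) % block
  xp = np.pad(x, [(0, 0)] * (x.ndim - 1) + [(0, pad)])
  blocks = xp.reshape(*x.shape[:-1], -1, block)
  local = np.cumsum(blocks, axis=-1)
  # stage 2: an exclusive cumsum of the per-block totals gives each block's offset
  totals = blocks.sum(axis=-1)
  offsets = np.cumsum(totals, axis=-1) - totals
  return (local + offsets[..., None]).reshape(*x.shape[:-1], -1)[..., :n]

x = np.random.rand(4, 1000)
assert np.allclose(two_stage_cumsum(x), np.cumsum(x, axis=-1))
```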
chenyu
453f48ce02
pad None means (0,0) (#2273)
2023-11-11 09:50:26 -08:00
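As I read #2273, a None entry in the per-axis padding tuple of Tensor.pad is now shorthand for (0, 0), i.e. no padding on that axis. A small hedged example (shapes are illustrative):

```python
from tinygrad.tensor import Tensor

t = Tensor.ones(2, 3)
padded = t.pad((None, (1, 1)))  # None on dim 0 means (0, 0): pad only the last axis
print(padded.shape)             # (2, 5)
```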
chenyu
a753c8e071
examples of new GPT2 and JIT change (#2261)
* var_vals are global
* working with global ish
* better
* fix export model
* fix tests
* better kv cache
* does it run?
* use where for kvmask
* fix excessive var_vals
* fix import
* how does multigpu use this?
* llama kinda work
* faster and simpler
* cleanup
* fix conversation mode
* test cleanups
* fix one more test
* test cleanup
---------
Co-authored-by: George Hotz <geohot@gmail.com>
2023-11-10 15:07:02 -05:00
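The "use where for kvmask" item in the commit above reads like the usual fixed-shape KV-cache trick: keep the cache at its maximum length and mask the not-yet-written positions with a where instead of slicing, so the JIT sees constant shapes. A rough NumPy sketch of that idea (names and shapes are illustrative, not the example's code):

```python
import numpy as np

def masked_scores(scores: np.ndarray, valid_len: int) -> np.ndarray:
  # scores: (heads, 1, max_seq_len) attention logits against a preallocated KV cache;
  # positions beyond the current sequence length are forced to -inf with a where
  positions = np.arange(scores.shape[-1])
  return np.where(positions < valid_len, scores, -np.inf)

scores = np.random.randn(12, 1, 1024)
weights = np.exp(masked_scores(scores, valid_len=10))
weights /= weights.sum(-1, keepdims=True)  # softmax only over the 10 cached positions
```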
George Hotz
2f7aab3d13
move optimize_local_size (#2221)
* move optimize_local_size
* interpret_ast
2023-11-05 21:00:52 -08:00
nimlgen
8d41b3eb3f
beam=16 makes gpt2 gpu-time < 5ms on 3090 (#2154)
2023-10-27 10:21:27 -10:00
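The beam=16 in the title refers to tinygrad's kernel beam search, controlled by the BEAM environment variable: larger values search more kernel configurations at compile time in exchange for a slower first run. One hedged way to turn it on from Python (setting the variable before tinygrad is imported is the safe ordering):

```python
import os
os.environ["BEAM"] = "16"  # enable beam search over kernel optimizations when tinygrad compiles

from tinygrad.tensor import Tensor
out = (Tensor.rand(1024, 1024) @ Tensor.rand(1024, 1024)).realize()  # first run searches; later runs reuse the cached choice
```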
nimlgen
e21bf776c8
fix debug=1 llama/gpt2 timings (#2143)
2023-10-24 15:45:00 -04:00
chenyu
e2b83f1b42
Variable.bind newer (#2017)
* Variable.bind attempt 2
* ShapeTracker.unbind
* fix llama
* fix types
* test case
* View.vars cleanup
* include mask in symbolic source
* mask can be sint
* st.unbind in bufferops
* assert ast contain free Variable only
* cleanup
* conservative unbinding reduce op arg
* move reduceop unbind
* fix llama JIT arg behavior
2023-10-10 10:03:01 -07:00
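Variable.bind is what lets the JIT reuse one kernel across changing sequence positions: a Variable carries a symbolic range, and bind attaches the concrete value for a particular call. A minimal sketch, assuming the import path of that era:

```python
from tinygrad.shape.symbolic import Variable

# symbolic position with a known range; .bind() attaches this call's concrete value
start_pos = Variable("start_pos", 1, 1024).bind(17)
```

The bound variable is then passed wherever a shape or index is expected, so the generated kernel is specialized to the range rather than to any single value.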
chenyu
c99fa58dd2
simplify gpt2 example (#1973)
* simplify gpt2 example
* kernel_jitted_count and jit tests
* Revert "kernel_jitted_count and jit tests"
This reverts commit 31a3c26dd0.
* all_jitted test in test_real_world
2023-10-05 07:09:29 -07:00
George Hotz
48c8d130ae
simpler GPT2 (#1941)
* don't realize in gpt2
* simpler gpt2
2023-09-29 04:41:09 -07:00
Gijs Koning
b8ff20ffe4
Gpt2 (#1896)
* small helps
* got something working
* faster?
* faster yes
* cleanup
* cleanup
* cleanup
* Fix non jit
* Fix fp16 and some cleanup
* Fix fp16 and some cleanup
* cleanup
* similar to master
* cleanup
2023-09-22 20:14:47 +08:00
nimlgen
4c31dfafb3
add seed to gpt-2 (#1869)
2023-09-15 17:34:14 -04:00
chenyu
ebcda8a714
Move var_vals from ShapeTracker to LazyBuffer (#1819)
2023-09-08 09:25:10 -07:00
chenyu
a2745819f6
faster gpt2 jit path and gpt2 in test_real_world (#1738)
2023-09-02 08:39:12 -07:00
George Hotz
cd7ceed914
gpt2: print total instead of sync time
2023-08-30 10:59:42 -07:00
George Hotz
a6d842af7a
move device to ops (#1646)
* move device to ops
* mlops types
* 2 lines
2023-08-23 08:30:17 -07:00
George Hotz
643cbdfd50
make embedding and GPT-2 fast (#1631)
* make embedding fast
* jit more, variable shape support
* print mem bw
2023-08-22 15:14:38 -07:00
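"make embedding fast" likely refers to expressing the lookup as an arange comparison plus a matmul, which maps onto ops the GPU backends already optimize instead of a gather. A hedged NumPy sketch of that formulation (GPT-2-sized numbers are only for illustration):

```python
import numpy as np

def embedding(weight: np.ndarray, idx: np.ndarray) -> np.ndarray:
  # weight: (vocab_size, dim), idx: (batch, seq_len) token ids
  # build a one-hot matrix by comparing ids against an arange, then matmul into the table
  one_hot = (np.arange(weight.shape[0]) == idx[..., None]).astype(weight.dtype)
  return one_hot @ weight  # (batch, seq_len, dim)

w = np.random.randn(50257, 768).astype(np.float32)
tok = np.array([[15496, 995]])
assert embedding(w, tok).shape == (1, 2, 768)
```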
George Hotz
718ced296c
move state to nn/state (#1619)
2023-08-22 07:36:24 -07:00
George Hotz
4f459841bc
Symbolic JIT for GPT2 (#1613)
* not fast yet
* simpler
* symbolic jit
* fp16 GOPS and GB
2023-08-21 19:44:57 -07:00
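The symbolic JIT captures the kernels of a decorated function on its warm-up calls and replays them afterwards, with bound Variables standing in for the shapes that change between calls. A minimal TinyJit sketch, assuming the tinygrad.jit import path of that era (it has since moved):

```python
from tinygrad.jit import TinyJit
from tinygrad.tensor import Tensor

@TinyJit
def step(x: Tensor) -> Tensor:
  return (x @ x.transpose()).relu().realize()  # JIT-ed functions should return realized tensors

for _ in range(3):                        # first calls run and capture; later calls replay the captured kernels
  out = step(Tensor.rand(4, 4).realize())
```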
George Hotz
e3c6c0c6db
add GPT2 example (#1511) (#1514)
* add gpt2 to examples
* some cleanup
* fixes
* argparse + scaled_dot_product_attention
* add timing
* add to benchmark
Co-authored-by: YassineYousfi <yassine.y10@gmail.com>
2023-08-10 09:09:47 -07:00
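scaled_dot_product_attention computes the standard softmax(QK^T / sqrt(d)) V. A NumPy sketch of that formula (the tinygrad Tensor method mirrors the torch-style call; the code below is just the math):

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
  # q, k, v: (..., seq_len, head_dim)
  scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
  weights = np.exp(scores - scores.max(-1, keepdims=True))   # numerically stable softmax
  weights /= weights.sum(-1, keepdims=True)
  return weights @ v

q, k, v = (np.random.randn(1, 12, 8, 64) for _ in range(3))
assert scaled_dot_product_attention(q, k, v).shape == (1, 12, 8, 64)
```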