chenyu
f34f26bca0
fix gpt2 with benchmark ( #12736 )
...
`CPU=1 python3 examples/gpt2.py --benchmark 128` works now
2025-10-16 09:55:20 -04:00
wozeparrot
2a0caa09c2
push copy to disk ( #12348 )
2025-09-29 21:55:05 -07:00
chenyu
3a480b858f
use more getitem in gpt2 ( #12343 )
2025-09-29 23:08:03 -04:00
George Hotz
baf3b60cfb
fix gpt2 on rangeify ( #12335 )
2025-09-29 19:16:44 +08:00
George Hotz
b899392f30
fix llm app with rangeify ( #12334 )
...
* fix llm app with rangeify
* add gpt2 contiguous also
2025-09-29 18:42:44 +08:00
chenyu
0599e86186
replace hardcoded GPU in llama debug msg ( #12102 )
2025-09-10 13:56:40 -04:00
Sieds Lykles
5b73076e48
assert benchmark times ( #12042 )
...
* assert jitted times in openpilot
* better error
* better error
* add ASSERT_MIN_STEP_TIME to more models
* t is step_times
* update benchmark times
* update times
2025-09-09 23:40:02 +02:00
Sieds Lykles
2f605eadf7
fix oob ( #10666 )
2025-06-07 11:32:03 -04:00
George Hotz
b3b43a82c4
remove Tensor.no_grad, it's meaningless now [pr] ( #10556 )
2025-05-28 22:20:02 -07:00
George Hotz
411392dfb7
move files into uop dir ( #10399 )
...
* move files into uop dir [pr]
* tinygrad.uop is a thing
* fix uop docs, no pr
* fix viz
2025-05-18 11:38:28 -07:00
wozeparrot
1ed04f993b
move benchmark stat tracking to influxdb ( #10185 )
2025-05-15 16:14:56 -07:00
Sieds Lykles
91ccf1c343
Off by one error in start_pos ( #9792 )
...
Variable upper bound is inclusive (see the sketch below)
2025-04-15 15:07:13 -04:00
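The commit above hinges on a detail of tinygrad's symbolic shapes: the upper bound given to Variable is inclusive. A minimal sketch of the pitfall, assuming the Variable(name, vmin, vmax).bind(value) interface used in examples/gpt2.py (names and numbers below are illustrative):

```python
# Minimal sketch: the vmax passed to Variable is INCLUSIVE. If start_pos can only
# ever reach MAX_CONTEXT - 1, that is the bound to declare; declaring MAX_CONTEXT
# over-states the range by one, which is the kind of off-by-one fixed above.
from tinygrad import Variable

MAX_CONTEXT = 128
cur_pos = 17  # illustrative runtime value

start_pos = Variable("start_pos", 1, MAX_CONTEXT - 1).bind(cur_pos)
print(start_pos)
```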
George Hotz
4de084a835
cleanup ci, split docs/autogen, testing_minimal, LLVM Speed [pr] ( #8952 )
...
* cleanup ci [pr]
* testing_minimal
* add hypothesis to minimal
* fail tiktoken import okay
* add LLVM speed test
* llvm speed w/o beam
2025-02-07 19:01:59 +08:00
chenyu
73ea913050
really not using numpy in gpt2 example ( #7779 )
2024-11-18 23:21:16 -05:00
chenyu
e6debda5c4
remove numpy from gpt2 and llama examples ( #7778 )
2024-11-18 22:48:17 -05:00
leopf
87877d7a91
GGUF cleanup ( #7192 )
...
* cleanup
* remove vocab size hard code
2024-10-21 10:44:54 -04:00
leopf
b6d9b276bb
GGUF support ( #7046 )
...
* basic loader, untested
* testing
* remove utils import in test
* q8_0
* q4_1
* end to end testing
* minor cleanup
* fix casting
* moved to state
* move tests
* move dequant to fn
* fix lint elif
* remove gguf from extra
* fix dict union
* q6_k simpler
* naming and spacing
* gpt2-gguf example
* cleanup
* move gguf example
* minor cleanup
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-10-21 16:15:34 +08:00
George Hotz
f4ec39fe58
switch symbolic from old to uops, final PR ( #6872 )
...
* switch symbolic from old to uops, final PR
* two wrong answers
* not needed resolves
* symbolic ops passes
* symbolic ops passes
* progress
* tests pass (almost)
* fix last test
* fix some tests
* global binding and unbinding
* Revert "global binding and unbinding"
This reverts commit 9456725630.
* that test works now
* vars on uop doesn't recurse
* fix fuzzer
* update
* fix type
* fix gpt, it's UOp now
* ssimplify symbolics
2024-10-04 16:42:27 +08:00
chenyu
322c37e621
use helpers.JIT in llama and gpt2 examples ( #5350 )
...
* use helpers.JIT in llama and gpt2 examples
replaced getenv("JIT"), effectively made gpt2 default jit
* fix test_gpt2
2024-07-09 15:04:43 -04:00
chenyu
e356807696
tinytqdm.set_description and tinytrange ( #5101 )
2024-06-22 14:45:06 -04:00
chenyu
31358cbea5
change Tensor.stack to method ( #4719 )
2024-05-24 17:04:19 -04:00
chenyu
92c0675ccf
setitem initial support ( #4093 )
...
* wip setitem
it's an eager assign to the output shapetracker view (see the sketch below)
* cleanups and tests
* more cleanups
2024-04-07 20:35:22 -04:00
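The commit body above describes setitem as an eager assign into the output's view. A hypothetical usage sketch of what that initial support enables (the shapes and values are made up for illustration):

```python
# Hypothetical setitem usage: assigning into a slice of a realized, contiguous
# tensor writes that region of the underlying buffer eagerly.
from tinygrad import Tensor

t = Tensor.zeros(6).contiguous().realize()
t[2:4] = Tensor([1.0, 2.0])   # eager assign into the selected view
print(t.numpy())              # expected: [0. 0. 1. 2. 0. 0.]
```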
chenyu
c71627fee6
move GlobalCounter to helpers ( #4002 )
...
break circular import between ops and buffer
2024-03-30 00:30:30 -04:00
George Hotz
641f347232
simple LoadOps.ASSIGN ( #3745 )
...
* simple LoadOps.ASSIGN
* skip that test
* don't assign in onnx ops gemm
* track cache usage
* recreate the lazybuffer to avoid the cache
* fix contigs
* skip that test
* lol
* better letters
2024-03-14 20:44:34 -07:00
George Hotz
3527c5a9d2
add Tensor.replace ( #3738 )
...
* add Tensor.replace
* fix dtypes in that test
* should be replace
* and mixtral
2024-03-14 13:34:14 -07:00
chenyu
f96fc6e9d4
fix gpt2 with empty prompt take 2 ( #3102 )
...
logits would be empty, so they are replaced with ones before sampling; also, reshape with -1 fails when another axis is 0 (see the sketch below)
2024-01-12 14:46:36 -05:00
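The reasoning in the commit above is easy to reproduce in isolation: with an empty prompt the logits have a zero-sized axis, and reshape with -1 cannot infer the missing dimension because any size satisfies 0 * n == 0. A small numpy illustration of the failure mode and the described workaround (shapes are illustrative; this is not the gpt2.py code):

```python
import numpy as np

logits = np.empty((0, 50257), dtype=np.float32)  # what an empty prompt would yield
try:
    logits.reshape(0, -1)                        # ambiguous: any width "fits" a size-0 array
except ValueError as e:
    print(e)                                     # cannot reshape array of size 0 ...

# the workaround described above: substitute uniform logits before sampling
if logits.size == 0:
    logits = np.ones((1, 50257), dtype=np.float32)
```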
chenyu
ca46d3541b
Revert "fix gpt2 with empty prompt" ( #3101 )
2024-01-12 14:27:41 -05:00
chenyu
1d7f01bc6d
fix gpt2 with empty prompt ( #3100 )
...
logits would be empty, so they are replaced with ones before sampling; also, reshape with -1 fails when another axis is 0
2024-01-12 14:18:17 -05:00
chenyu
f0d7ad8aaa
fix gpt2 attention with start_pos = 0 ( #3061 )
...
* fix gpt2 attention with start_pos size 1
test cases taken from ll_transformer branch
* fix interpreted
2024-01-09 16:14:55 -05:00
chenyu
7c80b78be9
cleanup gpt2 build function ( #3018 )
2024-01-04 23:14:53 -05:00
chenyu
f88506e630
move gpt2/llama sampling inside the model call ( #3013 )
...
* move gpt2/llama sampling inside the model call
* argmax uses one more kernel
2024-01-04 17:01:50 -05:00
chenyu
8524493748
minor gpt2 cleanup ( #3012 )
2024-01-04 13:53:18 -05:00
George Hotz
a280cfe169
move dtypes to dtype.py ( #2964 )
...
* move dtypes to dtype.py
* fix urllib
2024-01-01 14:58:48 -08:00
George Hotz
c81ce9643d
move globalcounters to ops ( #2960 )
...
* move globalcounters to ops
* missed a few
* sick of that failing
2024-01-01 14:21:02 -08:00
chenyu
61e255d197
use max for gpt2 and llama ( #2949 )
...
not using argmax yet because there's a multinomial outside of the function.
2023-12-28 23:26:00 -05:00
George Hotz
1765849937
new lazy, benchmark ( #2878 )
...
* lazy rewrite, try 2
* min fix tests
* pass contig test
* put broken pads back
* move that to realize
* no contig child fixes array packing
* so wrong
* now that's correct
* base children
* fix bind issues
* disable to_image_idx
* fix tests
* that failure shouldn't break other tests
* more fixes
* fix torch
* skip failing tests in CI
* 1e-7
* half is broken
* 1e-6 margin of error
2023-12-20 14:33:21 -08:00
chenyu
857c35d256
make gpt2 decode output just once at the end ( #2869 )
...
also updated the function name from greedy_until to generate, as it's neither greedy nor until
2023-12-20 12:14:55 -05:00
chenyu
c0f76ed4ea
transformer kvcache and mask have same dtype as input ( #2771 )
...
* transformer kvcache and mask have same dtype as input
* don't use `=0` in cstyle ternary where
* (bool)
* where float16 test
2023-12-14 22:41:51 -05:00
chenyu
371005cb2d
use one kvcache tensor in gpt2 instead of two separate caches ( #2662 )
...
* use one kvcache tensor in gpt2 (see the sketch below)
* test case
* is None
* better test cases
2023-12-06 20:59:17 -05:00
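A rough sketch of the single-cache layout the commit above introduces: K and V are stacked along a leading axis of size 2, so one allocation and one assign serve both. The shapes and the helper below are illustrative, not the gpt2.py code verbatim:

```python
# Illustrative combined KV cache: cache_kv[0] holds keys, cache_kv[1] holds values.
from tinygrad import Tensor

bs, max_context, n_heads, head_dim = 1, 128, 12, 64
cache_kv = Tensor.zeros(2, bs, max_context, n_heads, head_dim).contiguous().realize()

def write_kv(cache_kv: Tensor, xk: Tensor, xv: Tensor, start_pos: int, seqlen: int) -> None:
  # write the new keys/values into positions [start_pos, start_pos+seqlen) with a single assign
  cache_kv.shrink((None, None, (start_pos, start_pos+seqlen), None, None)).assign(Tensor.stack(xk, xv)).realize()

xk = Tensor.randn(bs, 1, n_heads, head_dim)
xv = Tensor.randn(bs, 1, n_heads, head_dim)
write_kv(cache_kv, xk, xv, start_pos=0, seqlen=1)
keys, values = cache_kv[0], cache_kv[1]  # both views come from the single buffer
```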
chenyu
0978c24b8e
fast gpt2 embedding with variable bs=1 ( #2596 )
2023-12-05 23:01:17 -05:00
chenyu
229ada5fe5
Gpt2 benchmark with HALF and BEAM ( #2636 )
...
* benchmark gpt2 with half and beam
* BEAM=4
* optional validation
* green is good
* we care
2023-12-05 22:15:16 -05:00
chenyu
a63f48d3db
gpt2 half for kvcache and output logits ( #2630 )
...
* gpt2 more half
* half is fine after softmax
2023-12-05 16:54:56 -05:00
George Hotz
8c67eb1c92
GPT bugfixes ( #2624 )
...
* simple fixes
* fix exp2
* fixed
* parallel beam for CUDA
* fix image dtypes
2023-12-05 11:42:28 -08:00
chenyu
a739c6646e
fp16 in gpt2 attention ( #2491 )
...
* fp16 in gpt2 attention
* HALF
2023-11-28 19:27:03 -05:00
chenyu
7f9a4c1285
fp16 and noshow flags for gpt2 ( #2470 )
2023-11-27 16:23:03 -05:00
George Hotz
9e07824542
move device to device.py ( #2466 )
...
* move device to device.py
* pylint test --disable R,C,W,E --enable E0611
* fix tests
2023-11-27 11:34:37 -08:00
George Hotz
7170a9a057
coder.py can write and run code ( #2439 )
...
* wip mistral
* coder
* touchups
* cleanups
* mistral cleanups
* clean up cache create
* download the weights, fix tests
* fix llama loading
* global fixup
* clean up all
* move llama model
* cleanups
* Revert "cleanups"
This reverts commit a71c5d59eb.
* fine, leave it
2023-11-25 12:27:54 -08:00
George Hotz
96c12fdeab
multibatch gpt2 ( #2432 )
...
* support multibatch gpt-2
* multi output
* no default JIT in CI
2023-11-24 18:10:10 -08:00
George Hotz
095e2ced61
add name support to fetch ( #2407 )
...
* add name support
* use fetch in gpt2
* remove requests from main lib, networkx also optional
* umm, keep that assert
* updates to fetch
* i love the walrus so much
* stop bundling mnist with tinygrad
* err, https
* download cache names
* add DOWNLOAD_CACHE_VERSION
* need env.
* ugh, wrong path
* replace get_child
2023-11-23 14:16:17 -08:00
George Hotz
3baaf298d6
two stage cumsum in tensor.py ( #2331 )
...
* two stage cumsum in tensor.py (see the sketch below)
* 2 more kernels for llama cumsum
* gpt-2 and llama use fast multinomial
2023-11-16 12:09:53 -08:00
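The two-stage cumsum mentioned above splits the scanned axis into fixed-size blocks: stage one scans inside each block, stage two scans the block totals and adds them back as offsets, keeping each pass small. A hypothetical numpy sketch of the scheme (block size and padding handling are illustrative; this is not tensor.py's implementation):

```python
import numpy as np

def two_stage_cumsum(x: np.ndarray, block: int) -> np.ndarray:
    n = x.shape[-1]
    pad = (-n) % block
    xp = np.pad(x, [(0, 0)] * (x.ndim - 1) + [(0, pad)])      # pad to a multiple of block
    blocks = xp.reshape(*x.shape[:-1], -1, block)
    local = blocks.cumsum(-1)                                  # stage 1: scan within each block
    offsets = np.roll(local[..., -1].cumsum(-1), 1, axis=-1)   # stage 2: scan of the block totals
    offsets[..., 0] = 0                                        # shift to an exclusive scan
    out = local + offsets[..., None]
    return out.reshape(*x.shape[:-1], -1)[..., :n]

x = np.arange(10, dtype=np.float32)
assert np.allclose(two_stage_cumsum(x, block=4), x.cumsum())
```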