Commit Graph

57 Commits

Sieds Lykles
91ccf1c343 Off-by-one error in start_pos (#9792)
Variable's upper bound is inclusive
2025-04-15 15:07:13 -04:00
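For illustration only (this is not the actual diff): because the Variable upper bound is inclusive, a cache of MAX_CONTEXT slots has to bound start_pos by MAX_CONTEXT - 1. A minimal sketch, assuming tinygrad's Variable(name, min, max) API and the MAX_CONTEXT name from examples/gpt2.py; the concrete bounds below are made up for the example:

```python
from tinygrad import Variable

MAX_CONTEXT = 128  # assumed cache length, for the sketch only
start_pos = 5

# buggy bound: allows start_pos == MAX_CONTEXT, one past the last valid
# cache slot, because the max argument is inclusive
# v = Variable("start_pos", 0, MAX_CONTEXT).bind(start_pos)

# fixed bound: the inclusive maximum is the last valid position
v = Variable("start_pos", 0, MAX_CONTEXT - 1).bind(start_pos)
```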
George Hotz
4de084a835 cleanup ci, split docs/autogen, testing_minimal, LLVM Speed [pr] (#8952)
* cleanup ci [pr]

* testing_minimal

* add hypothesis to minimal

* fail tiktoken import okay

* add LLVM speed test

* llvm speed w/o beam
2025-02-07 19:01:59 +08:00
chenyu
73ea913050 really not using numpy in gpt2 example (#7779) 2024-11-18 23:21:16 -05:00
chenyu
e6debda5c4 remove numpy from gpt2 and llama examples (#7778) 2024-11-18 22:48:17 -05:00
leopf
87877d7a91 GGUF cleanup (#7192)
* cleanup

* remove vocab size hard code
2024-10-21 10:44:54 -04:00
leopf
b6d9b276bb GGUF support (#7046)
* basic loader, untested

* testing

* remove utils import in test

* q8_0

* q4_1

* end to end testing

* minor cleanup

* fix casting

* moved to state

* move tests

* move dequant to fn

* fix lint elif

* remove gguf from extra

* fix dict union

* q6_k simpler

* naming and spacing

* gpt2-gguf example

* cleanup

* move gguf example

* minor cleanup

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-10-21 16:15:34 +08:00
George Hotz
f4ec39fe58 switch symbolic from old to uops, final PR (#6872)
* switch symbolic from old to uops, final PR

* two wrong answers

* not needed resolves

* symbolic ops passes

* symbolic ops passes

* progress

* tests pass (almost)

* fix last test

* fix some tests

* global binding and unbinding

* Revert "global binding and unbinding"

This reverts commit 9456725630.

* that test works now

* vars on uop doesn't recurse

* fix fuzzer

* update

* fix type

* fix gpt, it's UOp now

* ssimplify symbolics
2024-10-04 16:42:27 +08:00
chenyu
322c37e621 use helpers.JIT in llama and gpt2 examples (#5350)
* use helpers.JIT in llama and gpt2 examples

replaced getenv("JIT"); this effectively makes gpt2 use the JIT by default

* fix test_gpt2
2024-07-09 15:04:43 -04:00
chenyu
e356807696 tinytqdm.set_description and tinytrange (#5101) 2024-06-22 14:45:06 -04:00
chenyu
31358cbea5 change Tensor.stack to method (#4719) 2024-05-24 17:04:19 -04:00
chenyu
92c0675ccf setitem initial support (#4093)
* wip setitem

it's an eager assign to the output shapetracker view

* cleanups and tests

* more cleanups
2024-04-07 20:35:22 -04:00
chenyu
c71627fee6 move GlobalCounter to helpers (#4002)
breaks the circular import between ops and buffer
2024-03-30 00:30:30 -04:00
George Hotz
641f347232 simple LoadOps.ASSIGN (#3745)
* simple LoadOps.ASSIGN

* skip that test

* don't assign in onnx ops gemm

* track cache usage

* recreate the lazybuffer to avoid the cache

* fix contigs

* skip that test

* lol

* better letters
2024-03-14 20:44:34 -07:00
George Hotz
3527c5a9d2 add Tensor.replace (#3738)
* add Tensor.replace

* fix dtypes in that test

* should be replace

* and mixtral
2024-03-14 13:34:14 -07:00
chenyu
f96fc6e9d4 fix gpt2 with empty prompt take 2 (#3102)
logits would be empty, so we need to replace them with ones before sampling; also, we cannot reshape with -1 when another axis is 0
2024-01-12 14:46:36 -05:00
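A rough sketch of the two failure modes described above, in numpy for illustration only (the actual fix lives in the tinygrad gpt2 example): with an empty prompt there are zero logit rows to sample from, and a -1 in a reshape cannot be resolved when another axis of the target shape is 0.

```python
import numpy as np

logits = np.empty((0, 50257))   # empty prompt -> no logit rows to sample from

# -1 is ambiguous when another axis is 0: any value n satisfies n * 0 == 0
try:
    logits.reshape(0, -1)
except ValueError as e:
    print(e)

# the workaround described above: sample from all-ones logits instead,
# which makes the first token an effectively uniform draw
if logits.shape[0] == 0:
    logits = np.ones((1, 50257))
```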
chenyu
ca46d3541b Revert "fix gpt2 with empty prompt" (#3101) 2024-01-12 14:27:41 -05:00
chenyu
1d7f01bc6d fix gpt2 with empty prompt (#3100)
logits would be empty, so we need to replace them with ones before sampling; also, we cannot reshape with -1 when another axis is 0
2024-01-12 14:18:17 -05:00
chenyu
f0d7ad8aaa fix gpt2 attention with start_pos = 0 (#3061)
* fix gpt2 attention with start_pos size 1

test cases taken from ll_transformer branch

* fix interpreted
2024-01-09 16:14:55 -05:00
chenyu
7c80b78be9 cleanup gpt2 build function (#3018) 2024-01-04 23:14:53 -05:00
chenyu
f88506e630 move gpt2/llama sampling inside the model call (#3013)
* move gpt2/llama sampling inside the model call

* argmax uses one more kernel
2024-01-04 17:01:50 -05:00
chenyu
8524493748 minor gpt2 cleanup (#3012) 2024-01-04 13:53:18 -05:00
George Hotz
a280cfe169 move dtypes to dtype.py (#2964)
* move dtypes to dtype.py

* fix urllib
2024-01-01 14:58:48 -08:00
George Hotz
c81ce9643d move globalcounters to ops (#2960)
* move globalcounters to ops

* missed a few

* sick of that failing
2024-01-01 14:21:02 -08:00
chenyu
61e255d197 use max for gpt2 and llama (#2949)
not using argmax yet because there's a multinomial outside of the function.
2023-12-28 23:26:00 -05:00
George Hotz
1765849937 new lazy, benchmark (#2878)
* lazy rewrite, try 2

* min fix tests

* pass contig test

* put broken pads back

* move that to realize

* no contig child fixes array packing

* so wrong

* now that's correct

* base children

* fix bind issues

* disable to_image_idx

* fix tests

* that failure shouldn't break other tests

* more fixes

* fix torch

* skip failing tests in CI

* 1e-7

* half is broken

* 1e-6 margin of error
2023-12-20 14:33:21 -08:00
chenyu
857c35d256 make gpt2 decode output just once at the end (#2869)
also renamed the function from greedy_until to generate, as it is neither greedy nor "until"
2023-12-20 12:14:55 -05:00
chenyu
c0f76ed4ea transformer kvcache and mask have same dtype as input (#2771)
* transformer kvcache and mask have same dtype as input

* don't use `=0` in cstyle ternary where

* (bool)

* where float16 test
2023-12-14 22:41:51 -05:00
chenyu
371005cb2d use one kvcache tensor in gpt2 instead of two separate caches (#2662)
* use one kvcache tensor in gpt2

* test case

* is None

* better test cases
2023-12-06 20:59:17 -05:00
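A minimal numpy sketch of the idea (illustrative shapes and names, not the actual gpt2.py code): keys and values live in one stacked cache tensor with a leading axis of size 2, so a single slice assignment updates both per step.

```python
import numpy as np

bs, max_context, n_heads, head_dim = 1, 128, 12, 64
cache_kv = np.zeros((2, bs, max_context, n_heads, head_dim), dtype=np.float16)

def update_cache(cache_kv, k, v, start_pos):
    # k, v: (bs, seqlen, n_heads, head_dim); one write covers both keys and values
    seqlen = k.shape[1]
    cache_kv[:, :, start_pos:start_pos + seqlen] = np.stack([k, v])
    keys   = cache_kv[0, :, :start_pos + seqlen]
    values = cache_kv[1, :, :start_pos + seqlen]
    return keys, values

k = np.random.randn(bs, 1, n_heads, head_dim)
v = np.random.randn(bs, 1, n_heads, head_dim)
keys, values = update_cache(cache_kv, k, v, start_pos=0)
```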
chenyu
0978c24b8e fast gpt2 embedding with variable bs=1 (#2596) 2023-12-05 23:01:17 -05:00
chenyu
229ada5fe5 Gpt2 benchmark with HALF and BEAM (#2636)
* benchmark gpt2 with half and beam

* BEAM=4

* optional validation

* green is good

* we care
2023-12-05 22:15:16 -05:00
chenyu
a63f48d3db gpt2 half for kvcache and output logits (#2630)
* gpt2 more half

* half is fine after softmax
2023-12-05 16:54:56 -05:00
George Hotz
8c67eb1c92 GPT bugfixes (#2624)
* simple fixes

* fix exp2

* fixed

* parallel beam for CUDA

* fix image dtypes
2023-12-05 11:42:28 -08:00
chenyu
a739c6646e fp16 in gpt2 attention (#2491)
* fp16 in gpt2 attention

* HALF
2023-11-28 19:27:03 -05:00
chenyu
7f9a4c1285 fp16 and noshow flags for gpt2 (#2470) 2023-11-27 16:23:03 -05:00
George Hotz
9e07824542 move device to device.py (#2466)
* move device to device.py

* pylint test --disable R,C,W,E --enable E0611

* fix tests
2023-11-27 11:34:37 -08:00
George Hotz
7170a9a057 coder.py can write and run code (#2439)
* wip mistral

* coder

* touchups

* cleanups

* mistral cleanups

* clean up cache create

* download the weights, fix tests

* fix llama loading

* global fixup

* clean up all

* move llama model

* cleanups

* Revert "cleanups"

This reverts commit a71c5d59eb.

* fine, leave it
2023-11-25 12:27:54 -08:00
George Hotz
96c12fdeab multibatch gpt2 (#2432)
* support multibatch gpt-2

* multi output

* no default JIT in CI
2023-11-24 18:10:10 -08:00
George Hotz
095e2ced61 add name support to fetch (#2407)
* add name support

* use fetch in gpt2

* remove requests from main lib, networkx also optional

* umm, keep that assert

* updates to fetch

* i love the walrus so much

* stop bundling mnist with tinygrad

* err, https

* download cache names

* add DOWNLOAD_CACHE_VERSION

* need env.

* ugh, wrong path

* replace get_child
2023-11-23 14:16:17 -08:00
George Hotz
3baaf298d6 two stage cumsum in tensor.py (#2331)
* two stage cumsum in tensor.py

* 2 more kernels for llama cumsum

* gpt-2 and llama use fast multinomial
2023-11-16 12:09:53 -08:00
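An illustrative two-stage cumsum in numpy (a sketch of the general technique, not the tensor.py implementation): stage one does a cumsum inside fixed-size blocks, stage two adds the running total of the preceding blocks, so no single pass has to scan the whole axis.

```python
import numpy as np

def two_stage_cumsum(x, block=4):
    n = len(x)
    xb = np.pad(x, (0, (-n) % block)).reshape(-1, block)
    stage1 = np.cumsum(xb, axis=1)                               # cumsum within each block
    offsets = np.concatenate([[0], np.cumsum(stage1[:-1, -1])])  # totals of preceding blocks
    return (stage1 + offsets[:, None]).reshape(-1)[:n]

x = np.arange(10)
assert np.array_equal(two_stage_cumsum(x), np.cumsum(x))
```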
chenyu
453f48ce02 pad None means (0,0) (#2273) 2023-11-11 09:50:26 -08:00
chenyu
a753c8e071 examples of new GPT2 and JIT change (#2261)
* var_vals are global

* working with global ish

* better

* fix export model

* fix tests

* better kv cache

* does it run?

* use where for kvmask

* fix excessive var_vals

* fix import

* how does multigpu use this?

* llama kinda work

* faster and simpler

* cleanup

* fix conversation mode

* test cleanups

* fix one more test

* test cleanup

---------

Co-authored-by: George Hotz <geohot@gmail.com>
2023-11-10 15:07:02 -05:00
George Hotz
2f7aab3d13 move optimize_local_size (#2221)
* move optimize_local_size

* interpret_ast
2023-11-05 21:00:52 -08:00
nimlgen
8d41b3eb3f beam=16 makes gpt2 gpu-time < 5ms on 3090 (#2154) 2023-10-27 10:21:27 -10:00
nimlgen
e21bf776c8 fix debug=1 llama/gpt2 timings (#2143) 2023-10-24 15:45:00 -04:00
chenyu
e2b83f1b42 Variable.bind newer (#2017)
* Variable.bind attempt 2

* ShapeTracker.unbind

* fix llama

* fix types

* test case

* View.vars cleanup

* include mask in symbolic source

* mask can be sint

* st.unbind in bufferops

* assert ast contain free Variable only

* cleanup

* conservative unbinding reduce op arg

* move reduceop unbind

* fix llama JIT arg behavior
2023-10-10 10:03:01 -07:00
chenyu
c99fa58dd2 simplify gpt2 example (#1973)
* simplify gpt2 example

* kernel_jitted_count and jit tests

* Revert "kernel_jitted_count and jit tests"

This reverts commit 31a3c26dd0.

* all_jitted test in test_real_world
2023-10-05 07:09:29 -07:00
George Hotz
48c8d130ae simpler GPT2 (#1941)
* don't realize in gpt2

* simpler gpt2
2023-09-29 04:41:09 -07:00
Gijs Koning
b8ff20ffe4 Gpt2 (#1896)
* small helps

* got something working

* faster?

* faster yes

* cleanup

* cleanup

* cleanup

* Fix non jit

* Fix fp16 and some cleanup

* Fix fp16 and some cleanup

* cleanup

* similar to master

* cleanup
2023-09-22 20:14:47 +08:00
nimlgen
4c31dfafb3 add seed to gpt-2 (#1869) 2023-09-15 17:34:14 -04:00
chenyu
ebcda8a714 Move var_vals from ShapeTracker to LazyBuffer (#1819) 2023-09-08 09:25:10 -07:00