chenyu
322c37e621
use helpers.JIT in llama and gpt2 examples ( #5350 )
...
* use helpers.JIT in llama and gpt2 examples
replaced getenv("JIT"); this effectively makes gpt2 default to JIT
* fix test_gpt2
2024-07-09 15:04:43 -04:00
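A minimal sketch of the switch, assuming helpers exposes JIT as a ContextVar-style flag next to getenv (the flag name comes from the title; this is not the actual diff):

```python
from tinygrad.helpers import getenv, JIT

# before: each example read the environment itself, defaulting JIT off
use_jit = bool(getenv("JIT", 0))

# after: the shared helpers.JIT flag decides, so gpt2 can default to JIT on
use_jit = bool(JIT)
```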
chenyu
e356807696
tinytqdm.set_description and tinytrange ( #5101 )
2024-06-22 14:45:06 -04:00
chenyu
31358cbea5
change Tensor.stack to method ( #4719 )
2024-05-24 17:04:19 -04:00
chenyu
92c0675ccf
setitem initial support ( #4093 )
...
* wip setitem
it's an eager assign to the output ShapeTracker view
* cleanups and tests
* more cleanups
2024-04-07 20:35:22 -04:00
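Roughly what the initial setitem support enables; a minimal sketch (early support was limited, e.g. to contiguous realized targets):

```python
from tinygrad import Tensor

t = Tensor.zeros(4, 4).contiguous().realize()
t[1] = Tensor.ones(4)   # eager assign into the row view of t's buffer
print(t.numpy())
```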
chenyu
c71627fee6
move GlobalCounter to helpers ( #4002 )
...
breaks the circular import between ops and buffer
2024-03-30 00:30:30 -04:00
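After this move the counters are imported from helpers; a quick usage sketch:

```python
from tinygrad import Tensor
from tinygrad.helpers import GlobalCounters

GlobalCounters.reset()
(Tensor.rand(64, 64) @ Tensor.rand(64, 64)).realize()
print(GlobalCounters.kernel_count, GlobalCounters.global_ops, GlobalCounters.global_mem)
```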
George Hotz
641f347232
simple LoadOps.ASSIGN ( #3745 )
...
* simple LoadOps.ASSIGN
* skip that test
* don't assign in onnx ops gemm
* track cache usage
* recreate the lazybuffer to avoid the cache
* fix contigs
* skip that test
* lol
* better letters
2024-03-14 20:44:34 -07:00
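What ASSIGN expresses at the Tensor level, sketched (the scheduler change itself is not shown):

```python
from tinygrad import Tensor

a = Tensor.zeros(4).contiguous().realize()
b = Tensor.ones(4)
a.assign(b).realize()   # writes b into a's existing buffer instead of allocating a new one
print(a.numpy())        # [1. 1. 1. 1.]
```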
George Hotz
3527c5a9d2
add Tensor.replace ( #3738 )
...
* add Tensor.replace
* fix dtypes in that test
* should be replace
* and mixtral
2024-03-14 13:34:14 -07:00
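A sketch of Tensor.replace as used here: unlike assign, it swaps the tensor's underlying data rather than scheduling an in-place write, so shape and dtype must match (hence the dtype fix above).

```python
from tinygrad import Tensor

a = Tensor.zeros(4).realize()
a.replace(Tensor.arange(4, dtype=a.dtype))   # a now points at the new data
print(a.numpy())                             # [0. 1. 2. 3.]
```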
chenyu
f96fc6e9d4
fix gpt2 with empty prompt take 2 ( #3102 )
...
logits would be empty, so they need to be replaced with ones before sampling; also, reshape with -1 is not possible when another axis is 0
2024-01-12 14:46:36 -05:00
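A toy illustration of the guard described above (names and shapes are illustrative, not the gpt2.py code): with an empty prompt the logits have zero elements, so a tensor of ones (a uniform distribution after softmax) is substituted before sampling.

```python
from tinygrad import Tensor

VOCAB = 8  # toy vocab size

def sample(logits: Tensor) -> int:
    if logits.numel() == 0: logits = Tensor.ones(VOCAB)   # empty-prompt fallback
    return int(logits.softmax().multinomial().item())

print(sample(Tensor.rand(VOCAB)))   # normal path
print(sample(Tensor.rand(0)))       # empty-prompt path
```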
chenyu
ca46d3541b
Revert "fix gpt2 with empty prompt" ( #3101 )
2024-01-12 14:27:41 -05:00
chenyu
1d7f01bc6d
fix gpt2 with empty prompt ( #3100 )
...
logits would be empty, so they need to be replaced with ones before sampling; also, reshape with -1 is not possible when another axis is 0
2024-01-12 14:18:17 -05:00
chenyu
f0d7ad8aaa
fix gpt2 attention with start_pos = 0 ( #3061 )
...
* fix gpt2 attention with start_pos size 1
test cases taken from ll_transformer branch
* fix interpreted
2024-01-09 16:14:55 -05:00
chenyu
7c80b78be9
cleanup gpt2 build function ( #3018 )
2024-01-04 23:14:53 -05:00
chenyu
f88506e630
move gpt2/llama sampling inside the model call ( #3013 )
...
* move gpt2/llama sampling inside the model call
* argmax uses one more kernel
2024-01-04 17:01:50 -05:00
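A toy sketch of the structure after this change (names are illustrative): the model call returns a sampled token id rather than raw logits, so argmax/multinomial run inside the (potentially jitted) call.

```python
from tinygrad import Tensor

class ToyLM:
    def __init__(self): self.w = Tensor.rand(16, 16)
    def __call__(self, x: Tensor, temperature: float = 0.0) -> Tensor:
        logits = x @ self.w
        # sampling happens here, inside the model call
        return logits.argmax(-1) if temperature == 0 else (logits / temperature).softmax().multinomial()

print(ToyLM()(Tensor.rand(1, 16)).item())
```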
chenyu
8524493748
minor gpt2 cleanup ( #3012 )
2024-01-04 13:53:18 -05:00
George Hotz
a280cfe169
move dtypes to dtype.py ( #2964 )
...
* move dtypes to dtype.py
* fix urllib
2024-01-01 14:58:48 -08:00
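After this refactor the dtypes live in their own module:

```python
from tinygrad.dtype import dtypes

print(dtypes.float16, dtypes.int32, dtypes.bool)
```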
George Hotz
c81ce9643d
move globalcounters to ops ( #2960 )
...
* move globalcounters to ops
* missed a few
* sick of that failing
2024-01-01 14:21:02 -08:00
chenyu
61e255d197
use max for gpt2 and llama ( #2949 )
...
not using argmax yet because there's a multinomial outside of the function.
2023-12-28 23:26:00 -05:00
George Hotz
1765849937
new lazy, benchmark ( #2878 )
...
* lazy rewrite, try 2
* min fix tests
* pass contig test
* put broken pads back
* move that to realize
* no contig child fixes array packing
* so wrong
* now that's correct
* base children
* fix bind issues
* disable to_image_idx
* fix tests
* that failure shouldn't break other tests
* more fixes
* fix torch
* skip failing tests in CI
* 1e-7
* half is broken
* 1e-6 margin of error
2023-12-20 14:33:21 -08:00
chenyu
857c35d256
make gpt2 decode output just once at the end ( #2869 )
...
also renamed the function from greedy_until to generate, since it is neither greedy nor until-based
2023-12-20 12:14:55 -05:00
chenyu
c0f76ed4ea
transformer kvcache and mask have same dtype as input ( #2771 )
...
* transformer kvcache and mask have same dtype as input
* don't use `=0` in cstyle ternary where
* (bool)
* where float16 test
2023-12-14 22:41:51 -05:00
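The idea, sketched with illustrative shapes: build the mask (and likewise the kv cache) in the activation dtype instead of the float32 default, so fp16 runs stay fp16 end to end.

```python
from tinygrad import Tensor, dtypes

x = Tensor.rand(1, 4, 8, dtype=dtypes.float16)                          # fp16 activations
mask = Tensor.full((1, 1, 4, 4), float("-inf"), dtype=x.dtype).triu(1)  # same dtype as x
print(mask.dtype)
```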
chenyu
371005cb2d
use one kvcache tensor in gpt2 instead of two separate caches ( #2662 )
...
* use one kvcache tensor in gpt2
* test case
* is None
* better test cases
2023-12-06 20:59:17 -05:00
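A sketch of the single-cache layout (shapes are illustrative, not the exact gpt2.py ones): one realized tensor whose leading axis selects keys vs. values, replacing two separately managed caches.

```python
from tinygrad import Tensor

B, MAX_CTX, H, D = 1, 128, 12, 64
cache_kv = Tensor.zeros(2, B, MAX_CTX, H, D).contiguous().realize()
keys, values = cache_kv[0], cache_kv[1]   # two views into one buffer
print(keys.shape, values.shape)
```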
chenyu
0978c24b8e
fast gpt2 embedding with variable bs=1 ( #2596 )
2023-12-05 23:01:17 -05:00
chenyu
229ada5fe5
Gpt2 benchmark with HALF and BEAM ( #2636 )
...
* benchmark gpt2 with half and beam
* BEAM=4
* optional validation
* green is good
* we care
2023-12-05 22:15:16 -05:00
chenyu
a63f48d3db
gpt2 half for kvcache and output logits ( #2630 )
...
* gpt2 more half
* half is fine after softmax
2023-12-05 16:54:56 -05:00
George Hotz
8c67eb1c92
GPT bugfixes ( #2624 )
...
* simple fixes
* fix exp2
* fixed
* parallel beam for CUDA
* fix image dtypes
2023-12-05 11:42:28 -08:00
chenyu
a739c6646e
fp16 in gpt2 attention ( #2491 )
...
* fp16 in gpt2 attention
* HALF
2023-11-28 19:27:03 -05:00
chenyu
7f9a4c1285
fp16 and noshow flags for gpt2 ( #2470 )
2023-11-27 16:23:03 -05:00
George Hotz
9e07824542
move device to device.py ( #2466 )
...
* move device to device.py
* pylint test --disable R,C,W,E --enable E0611
* fix tests
2023-11-27 11:34:37 -08:00
George Hotz
7170a9a057
coder.py can write and run code ( #2439 )
...
* wip mistral
* coder
* touchups
* cleanups
* mistral cleanups
* clean up cache create
* download the weights, fix tests
* fix llama loading
* global fixup
* clean up all
* move llama model
* cleanups
* Revert "cleanups"
This reverts commit a71c5d59eb.
* fine, leave it
2023-11-25 12:27:54 -08:00
George Hotz
96c12fdeab
multibatch gpt2 ( #2432 )
...
* support multibatch gpt-2
* multi output
* no default JIT in CI
2023-11-24 18:10:10 -08:00
George Hotz
095e2ced61
add name support to fetch ( #2407 )
...
* add name support
* use fetch in gpt2
* remove requests from main lib, networkx also optional
* umm, keep that assert
* updates to fetch
* i love the walrus so much
* stop bundling mnist with tinygrad
* err, https
* download cache names
* add DOWNLOAD_CACHE_VERSION
* need env.
* ugh, wrong path
* replace get_child
2023-11-23 14:16:17 -08:00
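Rough usage after the change (URL and filename are illustrative; the signature may have evolved since): fetch stores the download under an explicit cache name instead of a URL hash.

```python
from tinygrad.helpers import fetch

path = fetch("https://huggingface.co/gpt2/resolve/main/vocab.json", "gpt2-vocab.json")
print(path)   # pathlib.Path inside the download cache
```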
George Hotz
3baaf298d6
two stage cumsum in tensor.py ( #2331 )
...
* two stage cumsum in tensor.py
* 2 more kernels for llama cumsum
* gpt-2 and llama use fast multinomial
2023-11-16 12:09:53 -08:00
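The two-stage idea, sketched with numpy for clarity (tinygrad's tensor.py implementation differs in details): prefix-sum within fixed-size blocks, then add the running total of all preceding blocks.

```python
import numpy as np

def two_stage_cumsum(x: np.ndarray, block: int = 256) -> np.ndarray:
    n = len(x)
    pad = (-n) % block
    # stage 1: prefix sums inside each block
    within = np.cumsum(np.pad(x, (0, pad)).reshape(-1, block), axis=1)
    # stage 2: offset each block by the total of the blocks before it
    offsets = np.concatenate([[0.0], np.cumsum(within[:-1, -1])])
    return (within + offsets[:, None]).reshape(-1)[:n]

x = np.random.rand(1000)
assert np.allclose(two_stage_cumsum(x), np.cumsum(x))
```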
chenyu
453f48ce02
pad None means (0,0) ( #2273 )
2023-11-11 09:50:26 -08:00
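Example of the shorthand:

```python
from tinygrad import Tensor

t = Tensor.rand(2, 3)
print(t.pad((None, (1, 1))).shape)   # None means (0, 0): axis 0 untouched -> (2, 5)
```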
chenyu
a753c8e071
examples of new GPT2 and JIT change ( #2261 )
...
* var_vals are global
* working with global ish
* better
* fix export model
* fix tests
* better kv cache
* does it run?
* use where for kvmask
* fix excessive var_vals
* fix import
* how does multigpu use this?
* llama kinda work
* faster and simpler
* cleanup
* fix conversation mode
* test cleanups
* fix one more test
* test cleanup
---------
Co-authored-by: George Hotz <geohot@gmail.com>
2023-11-10 15:07:02 -05:00
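A sketch of the "use where for kvmask" idea from the list above (shapes illustrative): select the filled cache rows with where() instead of slicing by a dynamic length, so the shape the kernel sees stays fixed.

```python
from tinygrad import Tensor

MAX_CTX, DIM, cur_len = 8, 4, 5
cache = Tensor.rand(MAX_CTX, DIM)
valid = Tensor.arange(MAX_CTX).reshape(MAX_CTX, 1) < cur_len
masked = valid.where(cache, 0.0)   # rows past cur_len are zeroed, shape stays (8, 4)
print(masked.shape)
```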
George Hotz
2f7aab3d13
move optimize_local_size ( #2221 )
...
* move optimize_local_size
* interpret_ast
2023-11-05 21:00:52 -08:00
nimlgen
8d41b3eb3f
beam=16 makes gpt2 gpu-time < 5ms on 3090 ( #2154 )
2023-10-27 10:21:27 -10:00
nimlgen
e21bf776c8
fix debug=1 llama/gpt2 timings ( #2143 )
2023-10-24 15:45:00 -04:00
chenyu
e2b83f1b42
Variable.bind newer ( #2017 )
...
* Variable.bind attempt 2
* ShapeTracker.unbind
* fix llama
* fix types
* test case
* View.vars cleanup
* include mask in symbolic source
* mask can be sint
* st.unbind in bufferops
* assert ast contain free Variable only
* cleanup
* conservative unbinding reduce op arg
* move reduceop unbind
* fix llama JIT arg behavior
2023-10-10 10:03:01 -07:00
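A sketch of the bind flow this PR reworked, using the module path of the time (symbolic variables have since moved): a Variable carries a [min, max] range for codegen, bind attaches the concrete value for a given call, and ShapeTracker.unbind strips bound values back out so the AST contains only free Variables.

```python
from tinygrad.shape.symbolic import Variable  # path as of these commits

start_pos = Variable("start_pos", 1, 128)   # symbolic dim with a [min, max] range
bound = start_pos.bind(7)                   # bound to 7 for this particular call
```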
chenyu
c99fa58dd2
simplify gpt2 example ( #1973 )
...
* simplify gpt2 example
* kernel_jitted_count and jit tests
* Revert "kernel_jitted_count and jit tests"
This reverts commit 31a3c26dd0.
* all_jitted test in test_real_world
2023-10-05 07:09:29 -07:00
George Hotz
48c8d130ae
simpler GPT2 ( #1941 )
...
* don't realize in gpt2
* simpler gpt2
2023-09-29 04:41:09 -07:00
Gijs Koning
b8ff20ffe4
Gpt2 ( #1896 )
...
* small helps
* got something working
* faster?
* faster yes
* cleanup
* cleanup
* cleanup
* Fix non jit
* Fix fp16 and some cleanup
* Fix fp16 and some cleanup
* cleanup
* similar to master
* cleanup
2023-09-22 20:14:47 +08:00
nimlgen
4c31dfafb3
add seed to gpt-2 ( #1869 )
2023-09-15 17:34:14 -04:00
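What a seed flag amounts to, roughly (the example at the time may have seeded numpy instead):

```python
from tinygrad import Tensor

Tensor.manual_seed(1337)        # same seed -> same sampled tokens across runs
print(Tensor.rand(3).numpy())
```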
chenyu
ebcda8a714
Move var_vals from ShapeTracker to LazyBuffer ( #1819 )
2023-09-08 09:25:10 -07:00
chenyu
a2745819f6
faster gpt2 jit path and gpt2 in test_real_world ( #1738 )
2023-09-02 08:39:12 -07:00
George Hotz
cd7ceed914
gpt2: print total instead of sync time
2023-08-30 10:59:42 -07:00
George Hotz
a6d842af7a
move device to ops ( #1646 )
...
* move device to ops
* mlops types
* 2 lines
2023-08-23 08:30:17 -07:00
George Hotz
643cbdfd50
make embedding and GPT-2 fast ( #1631 )
...
* make embedding fast
* jit more, variable shape support
* print mem bw
2023-08-22 15:14:38 -07:00
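The fast-embedding idea is, roughly, a one-hot comparison followed by a matmul, which JITs well even with a variable sequence length; a simplified sketch, not the actual nn.Embedding code:

```python
from tinygrad import Tensor

class ToyEmbedding:
    def __init__(self, vocab_size: int, embed_dim: int):
        self.vocab_size = vocab_size
        self.weight = Tensor.glorot_uniform(vocab_size, embed_dim)
    def __call__(self, idx: Tensor) -> Tensor:
        # compare indices against arange to build a one-hot matrix, then matmul
        onehot = (idx.unsqueeze(-1) == Tensor.arange(self.vocab_size)).float()
        return onehot @ self.weight

print(ToyEmbedding(50257, 64)(Tensor([[1, 2, 3]])).shape)   # (1, 3, 64)
```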
George Hotz
718ced296c
move state to nn/state ( #1619 )
2023-08-22 07:36:24 -07:00
George Hotz
4f459841bc
Symbolic JIT for GPT2 ( #1613 )
...
* not fast yet
* simpler
* symbolic jit
* fp16 GOPS and GB
2023-08-21 19:44:57 -07:00
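The TinyJit pattern the GPT-2 example builds on, sketched with a fixed shape; the symbolic part extends this so a Variable-sized dimension (e.g. start_pos) can change between replays without retracing.

```python
from tinygrad import Tensor, TinyJit

@TinyJit
def step(x: Tensor) -> Tensor:
    return (x @ x.T).relu().realize()

for _ in range(5):
    out = step(Tensor.rand(8, 8))   # early calls trace/compile, later calls replay the kernels
print(out.shape)
```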
George Hotz
e3c6c0c6db
add GPT2 example ( #1511 ) ( #1514 )
...
* add gpt2 to examples
* some cleanup
* fixes
* argparse + scaled_dot_product_attention
* add timing
* add to benchmark
Co-authored-by: YassineYousfi <yassine.y10@gmail.com>
2023-08-10 09:09:47 -07:00
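The example's attention leans on the built-in scaled_dot_product_attention mentioned above; a minimal sketch of that call (shapes illustrative):

```python
from tinygrad import Tensor

q, k, v = [Tensor.rand(1, 12, 8, 64) for _ in range(3)]   # (batch, heads, seq, head_dim)
out = q.scaled_dot_product_attention(k, v, is_causal=True)
print(out.shape)   # (1, 12, 8, 64)
```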