tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-04-07 03:00:26 -04:00

Author	SHA1	Message	Date
Francis Lata	ce61be16f1	clean up how preprocessed folder is defined (#5813)	2024-07-30 12:35:26 -04:00
chenyu	471b188d79	fix mypy errors in latest mypy (#5794) * fix mypy errors in latest mypy mypy has stricter partial and api arg checks now * PYTHONPATH="."	2024-07-29 14:53:30 -04:00
nimlgen	ea27ec4cd0	nv switch classlist_v2 to classlist (#5763) * nv switch classlist_v2 to classlist * support in mockgpu * fix mockgpu	2024-07-28 20:24:42 +03:00
chenyu	3686b6726a	move GraphException to jit.py (#5744) same place where GraphRunner is defined	2024-07-26 19:01:12 -04:00
George Hotz	489a5b99a5	hotfix: triton_nv_matmul touchups	2024-07-24 23:24:29 +00:00
George Hotz	bf24be4c8c	triton gets 163 TFLOPS on 4090	2024-07-24 18:32:29 +00:00
George Hotz	4d47968580	fix acc folding for NV tensor cores (#5658) * fix acc folding for NV tensor cores * fix correctness of reduce_before_expand	2024-07-23 13:03:02 -07:00
nimlgen	08a9c0ae5e	hcq cache invalidation for beam (#5630) * nv full cache invalidation * the same command on amd * linter * fix amd * nv no hardcoded consts * beam default	2024-07-22 18:13:17 +03:00
George Hotz	6c6d74d922	parallel mcts (#5626) * start work on parallel mcts * compile was linearizing twice * typing + more early stopping * fix compiler error	2024-07-21 14:53:23 -07:00
George Hotz	ef179087a4	mcts exit condition wasn't right, also use it with BEAM>=100 (#5619) * mcts exit condition wasn't right, also use it with BEAM>=100 * mcts touchups * clean up sample	2024-07-21 10:16:47 -07:00
George Hotz	0f67ef4674	mcts graph and dedup support (#5618) * mcts graph and dedup support * usable graph * mcts colors * C=4 seems better * C=3 even better * sample_tree * backprop is external function * late expand to match algo	2024-07-20 23:29:14 -07:00
chenyu	eddc5bcfd7	MCTS tweaks (#5616) MCTS 500 is competitive with BEAM=8 on resnet on M1 Max. - increment trial times even with compiled error and runtime error. - use best time of children as the node value.	2024-07-20 19:45:59 -07:00
George Hotz	1113e47f96	print best in MCTS + light up the winner in hcopt	2024-07-20 09:39:36 -07:00
George Hotz	ac99ecd94e	use statistics.median for timing (#5606)	2024-07-20 08:37:32 -07:00
George Hotz	06e336bccb	mcts search (#5598) * mcts search * mcts cleanups * mcts cleanup * random shuffle children order * mcts in handcode_opt * src and remove_node * debug 3 to print ast * print the type * mcts in extra	2024-07-19 21:38:39 -07:00
Tobias Fischer	72da3fe7e6	added clip vision model (#5595) Co-authored-by: chenyu <chenyu@fastmail.com>	2024-07-19 18:35:51 -04:00
George Hotz	fa7e734b49	MetaOps.KERNEL (#5543)	2024-07-17 19:41:23 -07:00
Francis Lam	2d53abb04a	test/external/fuzz_linearizer: fix for new AST changes (#5519) * test/external/fuzz_linearizer: fix for new AST changes also add beautiful_mnist failures * add CLANG and LLVM to test_failure_35 failed_platforms * fix test_linearizer_failure names	2024-07-17 00:08:07 -04:00
Tobias Fischer	85d4ca7caa	FID Inception Model (#5516) * added model impl * minor cleanups * extracted weights loading into from_pretrained * reorganized model for better weight loading * removed lru cache for state dict loading	2024-07-16 23:12:03 -04:00
chenyu	28972418c4	s/get_linearizer/get_kernel [run_process_replay] (#5467)	2024-07-13 20:32:22 -04:00
George Hotz	03c2dc8bd7	lowerer is kernel [run_process_replay] (#5437)	2024-07-12 18:50:55 -07:00
chenyu	00813a92a0	update Tensor.eye api to match torch (#5433) * update Tensor.eye api to match torch input is n for nrows and optional m for ncols * space * fix onnx	2024-07-12 20:25:12 -04:00
George Hotz	870dc8c350	s/Linearizer/Lowerer [run_process_replay] (#5428)	2024-07-12 15:54:07 -07:00
George Hotz	6707c778d0	scheduleitem is not Tuple [run_process_replay] (#5425) * scheduleitem is not Tuple [run_process_replay] * fix tests * fix op + fuzzers * fix mop test	2024-07-12 15:13:19 -07:00
George Hotz	94599c0637	fixup ast in kernel to be MetaOps.SINK [run_process_replay] (#5424) * fixup ast in kernel to be MetaOps.SINK [run_process_replay] * fix tests * fix more tests	2024-07-12 14:01:03 -07:00
uuuvn	3cb94a0a15	Rename tinygrad/runtime/driver to support (#5413)	2024-07-12 11:06:42 -07:00
wozeparrot	a02b38c0ac	download openimages by running it (#5396)	2024-07-11 16:06:13 -07:00
wozeparrot	fa873df9c1	bring tinychat more inline with tinyos' version (#5358)	2024-07-10 13:13:52 -07:00
George Hotz	c13da83f12	tests from lowerer branch (#5339) * tests from lowerer branch * Update test_image_dtype.py * Update test_image_dtype.py * Update test_image_dtype.py	2024-07-08 21:23:19 -07:00
nimlgen	51d6f372e4	nv get classes based on device (#5325) * nv get classes * support in mockgpu * choose sm based on gpu * fix * fix * fix arch	2024-07-08 18:25:05 +03:00
Tobias Fischer	0c3a35e5c2	Stable Diffusion v2 Inference (#5283) * model implementation * clip fix, more qol options	2024-07-03 22:47:10 -04:00
chenyu	b2c3a28a5e	nn.RMSNorm (#5272) the norm itself has no significant value to add to Tensor method, but we would want Tensor.normalize	2024-07-02 21:39:01 -04:00
Tobias Fischer	8c9c1cf62f	Pulled CLIP and UNet into Seperate Files (#5253) * pulled clip and unet into seperate files * reference cleanup, lru cache fix * better pool indexing	2024-07-01 22:33:01 -04:00
nimlgen	57e89645cd	hcq spec test (#5226) * start hcq spec test * more test * fixes * run on amd as well * test amdgpu exec * fix amd * amd mockgpu support sdma timestamp	2024-07-01 17:36:37 +03:00
George Hotz	14980f79dd	hotfix: unbreak llama	2024-06-30 15:27:54 -07:00
George Hotz	3df47bc21e	OpenELM + repeat_interleave (#5234) * start writing openelm * progress...hit bug * repeat_interleave support * gqa * add rotary embedding * spp * i think it runs correctly * broken * output is good now * cleanups * no io_uring on android	2024-06-30 15:18:39 -07:00
nimlgen	dd7eef7d71	libc defs to autogen (#5217) * libc defs to autogen * amd import libc * linter * better a bit * remove comment, check this * not hardcoded path	2024-06-29 14:37:33 +03:00
qazal	3e56c8422c	remu err handling (#5208) * add error handling * use pre release * minor * works	2024-06-28 13:15:18 +03:00
reddyn12	f1c7944c44	Fix batchnorm shapes for resnet.load_pretrained (#5167) * Fix batchnorm shapes * make it general reshape	2024-06-26 18:44:10 -04:00
nimlgen	69f116a7e1	nv/amd profiler (#4718) * nv/amd profiler * fix * fix * profile copies * profile logger * fixes * more fixes * less lines and fixes * fixes * some linter * back sync, no related change * fix gpu2cpu time def * simpler * linter * linter * docs * add add_event api	2024-06-23 17:10:12 +03:00
chenyu	e356807696	tinytqdm.set_description and tinytrange (#5101)	2024-06-22 14:45:06 -04:00
chenyu	8080298739	s/tinytqdm/tqdm (#5103) except in unit test where tqdm is imported	2024-06-22 14:18:26 -04:00
chenyu	e468601226	update llama attention casting (#5096) * update llama attention casting updated scaled_dot_product_attention middle cast and removed hard-coded half in llama attention. * fix that	2024-06-22 10:57:17 -04:00
chenyu	8bd6cb9511	update llama model RMSNorm casting (#5095) following the original implementation, cast back to input dtype before multiplying weight. slightly faster https://github.com/meta-llama/llama/blob/main/llama/model.py	2024-06-21 23:02:04 -04:00
chenyu	0c857ae2d6	some onnx_ops cleanups (#5094)	2024-06-21 22:01:32 -04:00
nimlgen	fb1bf48cfe	io_uring for copies from disk (#5035) * exp uring * fixes and old version * nv * cleaner * cmp vs aio * fix * no lib * fix nv * linter * disk_speed_test now runs default * fixes * uring -> io_uring * linter happy * get_temp_buf comment added * tiny nits * put wait back * test runs everywhere * remove consts * remove mmap consts * do not require iouring to run test, they are generic	2024-06-21 11:36:51 +03:00
chenyu	f6d6760f71	don't cast tuple to list before creating Tensor (#5071) Tensor constructor supports creating from tuple now	2024-06-20 13:32:56 -04:00
chenyu	e2c5054bdd	update resnet.load_from_pretrained (#5040)	2024-06-18 16:29:22 -04:00
chenyu	a3ed4176c8	use tinytqdm in active tests and examples (#5038) * use tinytqdm in active tests and examples stress test this before 0.9.1 * no set_description	2024-06-18 16:01:19 -04:00
Junjun Dong	c8cd6e725c	Remove BinaryOps.SUB. Replace SUB by ADD and NEG in all tests. Regenerate dataset (#4977) * feat: remove BinaryOps.SUB * remove SUB in test_early_end_local * regenerate dataset. remove SUB in test_linearizer_* * reenable overflow tests * simplify tensor.sub function by returning a+(-b) * remove whitespaces --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2024-06-18 09:06:13 -04:00

735 Commits