* kfd_ops: Fix GPU node discovery on NUMA systems
Ignore CPU NUMA nodes (there may be more than one) and any GPU nodes
that are not accessible because of device cgroups.
Signed-off-by: Felix Kuehling <felix.kuehling@amd.com>
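For illustration only, a rough sketch of that discovery logic. The sysfs paths and property names come from the KFD topology, but the helper name and the exact accessibility check (here: whether the node's `properties` file can be read) are assumptions, not necessarily the check used in this change:
```python
# Sketch: enumerate usable GPU nodes from the KFD sysfs topology.
import os, glob

def visible_gpu_nodes(base="/sys/class/kfd/kfd/topology/nodes"):
  nodes = []
  for node_dir in sorted(glob.glob(os.path.join(base, "*"))):
    try:
      props = {}
      with open(os.path.join(node_dir, "properties")) as f:
        for line in f:
          parts = line.split()
          if len(parts) == 2: props[parts[0]] = int(parts[1])
    except OSError:
      continue  # node not readable, e.g. hidden by a device cgroup
    if props.get("simd_count", 0) == 0:
      continue  # CPU NUMA node: no compute units, not a GPU
    nodes.append((int(os.path.basename(node_dir)), props))
  return nodes
```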
* kfd_ops: Format the GFX arch target name correctly
The target version in sysfs properties is a decimal representation with
two digits per component.
The format for LLVM GFX target names is a bit quirky for historical
reasons. It uses one digit each for the minor version and the stepping.
When it ran out of decimal digits for the stepping on gfx90X, it switched
to hexadecimal there. The major version, however, is still decimal and
went to two digits in GFX10.
Make sure to parse and format it accordingly for all supported GPUs.
Signed-off-by: Felix Kuehling <felix.kuehling@amd.com>
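A small sketch of that parsing (the helper name is hypothetical): the sysfs value packs major/minor/stepping as two decimal digits each, while the LLVM name keeps the major in decimal and renders minor and stepping as single hex digits:
```python
# Sketch: convert a sysfs gfx_target_version (two decimal digits per
# component) into an LLVM-style gfx target name.
def gfx_arch_name(target_version: int) -> str:
  major = target_version // 10000          # decimal, two digits from GFX10 on
  minor = (target_version // 100) % 100    # single digit in the name
  stepping = target_version % 100          # hex once it passed 9 (e.g. gfx90a)
  return f"gfx{major}{minor:x}{stepping:x}"

assert gfx_arch_name(90010) == "gfx90a"
assert gfx_arch_name(100300) == "gfx1030"
```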
---------
Signed-off-by: Felix Kuehling <felix.kuehling@amd.com>
* init
* add failed case
* fix: temp comment out MULACC cast
* is this right?
* add test case
* oops, forgot to get rid of temp test
* WOOOOOO TOOK OUT 2 TRANSPOSES IN GATHER YAY
* cleaner
* comment cleanup
* update docs
* resolve conflict
* oops
* SUPA FAST
* comment out a test
* del some print statements
* use new broadcast stuff
* more clean up
* move try except
* skip fancy indexing for python backend test_ops
* search: add a BEAM_COMPARE env to optionally not compare to hc/tc
Setting BEAM_COMPARE=0 prevents the additional memory allocation
needed for the timing tests, assuming the BEAM result is already in
the diskcache.
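A rough sketch of the gating idea; `getenv` is tinygrad's env helper, but the function and variable names below are illustrative, not the actual search code:
```python
# Illustrative only: skip the hc/tc timing comparison (and its buffer
# allocations) when BEAM_COMPARE=0, trusting the cached BEAM result.
from tinygrad.helpers import getenv

def pick_kernel(beam_choice, baseline_choices, time_fn):
  if not getenv("BEAM_COMPARE", 1):
    return beam_choice  # assume the BEAM result in the diskcache is good enough
  # otherwise time every candidate, which needs extra scratch buffers
  return min([beam_choice] + baseline_choices, key=time_fn)
```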
* change to optionally use Buffer.allocate
* initial version
* heh gimme grrrreen
* version 2
* clean ups
* some test confusion
* fix onnx
* rename to _broadcast_tensors
* improved errors and test
* fixed?
* some test fixup
* version 3 lol
* comments
* cleaner
* add failure test for expand to 0 test
* 1 more assertRaises test
* make err msg better
* also rewrite the expand onnx op? :s
The annoying thing about removing all of FlopCounter is that, for devices that do not support local, the matmul index ALU count is huge.
We can remove the dtype first.
Also sneaks in updating the `ruff` command to `ruff check`.
* tensor cores
* Merge from master
* faster program start in llvm (#3897)
* Fix the result permutation in einsum (#3895)
* Fix permutation of result indices in einsum.
* Delete stray line used for breaking tests
* Fix linter error by renaming twice-used variable
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
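As a generic illustration of what that result permutation does (not the actual implementation, and assuming a tensor type with a `permute` method): after the contraction, the surviving axes come out in whatever order they were computed, so they must be permuted to match the order of the letters in the output subscript.
```python
# Generic sketch: reorder the contracted result's axes to match the
# output subscripts requested by the einsum formula.
def fix_output_order(result, computed_letters, output_letters):
  # e.g. computed_letters="ij", output_letters="ji" -> permute axes (1, 0)
  perm = tuple(computed_letters.index(l) for l in output_letters)
  return result if perm == tuple(range(len(perm))) else result.permute(perm)
```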
* touchup einsum (#3900)
don't need rhs_letters
* hotfix check ckpts before writing achieved model (#3901)
this killed the tinybox green run
* replace dtype.name str with render_dtype (#3903)
fixed a bf16 cast issue, since bf16 does not have a `.name`.
also more robust if there are language-specific type overrides
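A toy sketch of the idea, not tinygrad's renderer API: resolve the language-level type string through a per-backend mapping instead of `dtype.name`, so types like bfloat16 that need a backend-specific spelling still render.
```python
# Toy example: map dtype keys to backend type names instead of relying on
# dtype.name, so backend-specific overrides (e.g. bf16 on CUDA) work.
CUDA_TYPE_MAP = {"float32": "float", "half": "half", "bfloat16": "__nv_bfloat16"}

def render_dtype_name(dtype_key: str, type_map=CUDA_TYPE_MAP) -> str:
  return type_map.get(dtype_key, dtype_key)  # fall back to the generic name
```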
* add --minimal flag to nvrtc (#3899)
* wmma: fix the AMD TC threads to split the first 16 threads (#3904)
previously it was incorrectly aliasing 16 into the size-8 upcast
on the store alias. now it splits it properly into 8 and maps the
remaining 2 onto the correct local stride
* training cifar with BF16 on CUDA (#3905)
* training cifar with BF16 on CUDA
memory usage is between float and half because the numpy calls in dataset preprocessing convert to float.
* simpler bf16 functions
* bf16 cifar works for HSA too, just very slow
* simpler bf16 functions, we love cuda
* include negative float in test_dtype (#3884)
* include negative float in test_dtype
* that is ub
* too annoying
* pack can overflow
* add to benchmark
* change var name to satisfy mypy
* spacing
* Update to new TensorCore format
* Spacing
---------
Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
Co-authored-by: Alejandro F Queiruga <33233447+afqueiruga@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
Co-authored-by: sekstini <127142660+sekstini@users.noreply.github.com>
Co-authored-by: Francis Lam <flam@alum.mit.edu>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
* mulacc
* Move more stuff to pattern matcher
* exclude callables from the == check
* disable function passing in pattern matcher
* Add set of dtypes pattern matching + refactor mulacc pattern