tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-25 14:58:46 -05:00

Author	SHA1	Message	Date
George Hotz	c32ea95d7d	Python uop emulator (#3327 ) * start uop emu * tiny_add passes * more ops * emulate the whole warp * test_gemm passes * metal gemm test pass * works on big gemm * works on big gemm * more tests pass * touch ups * fix mypy * cleanups * exp2 mypy * arch is where it belongs * actually emulate tensor cores * fix test * new style	2024-02-08 19:24:55 +01:00
Mason Mahaffey	3ebf7a3e38	reflect changes to shapetracker in doc printouts (#3349 )	2024-02-08 16:20:30 +01:00
Francis Lam	2266152b28	linearizer: added FUZZ_BEAM to fuzz_linearizer and additional tests (#3340 ) Fixed test_tensor_core_opts to test all the TCs. Added commented out failing tests in test_color_shapes_with_local.	2024-02-08 16:12:58 +01:00
chenyu	b110c4a7b8	explicitly set input low and high in test_ops (#3347 ) easier to set `(low, high)` than figuring out a,b for `(x+a)*b`. this pr kept the same input ranges	2024-02-08 04:11:45 -05:00
chenyu	d8ad9e5660	verify eval acc for hlb_cifar training (#3344 ) set to 93% to reduce flakiness for now	2024-02-07 19:19:59 -05:00
chenyu	0d2dacb549	test intermediate tensors created by function have same device as input (#3338 ) run on TORCH since it's the fastest one on CI. caught a bug in multinomial, and update the behavior of fancy index and gather to move the indices Tensor to same device as self.	2024-02-07 09:24:36 -05:00
chenyu	1732f1ba83	fix import and long lines in view (#3337 )	2024-02-07 06:50:21 -05:00
chenyu	02636ff62d	re-enable test_reduce_0d_default int test case in test_dtype (#3336 )	2024-02-07 05:30:14 -05:00
chenyu	ca66be6a70	add failed Tensor.pow test cases (#3334 ) tried refactoring pow and found some bugs	2024-02-07 04:28:24 -05:00
chenyu	ea74856d99	remove some noqa: E501 in tensor (#3332 ) left ones in conv2d and wino, no E501 elsewhere in tensor. three functions need general readability improvement: getitem and gather, conv2d and wino, and pow	2024-02-07 00:03:05 -05:00
David Hou	6478ee5c75	PoC UnaryOps before expand (#3319 ) * PoC cast before expand * maybe also do unaryops before expand? * undo unaryops change	2024-02-06 19:05:13 -08:00
chenyu	d9ef8e25b3	fix Tensor.var with 0 in reduce dim. (#3324 ) fix when correction is too big. it seems to only work when input size is 0 though. torch can output -inf in var when correction is too big, which does not make sense.	2024-02-05 20:59:13 -05:00
Obada Khalili	ee25f73283	Fix Tensor.mean to compute the mean correctly when 0-length axes are selected (#3318 ) * fix Tensor.mean to compute the mean correctly when 0-length axes are selected * add a regression test * rename sum variable to sum_t to avoid conflict with built-in function * refactor Tensor.mean to have fewer lines	2024-02-05 01:40:37 -05:00
terafo	3752e97c8f	Fix: Always cast ONNX Slice op arguments into ints (#3317 ) * fix: ensure that axes and steps are always ints * Cast everything in tinygrad --------- Co-authored-by: terafo <terafo@protonmail.com>	2024-02-04 18:40:48 -05:00
chenyu	97275101e9	fix safetensor load uint32 and uint64 (#3315 ) the correct keys are U32 and U64.	2024-02-04 10:46:27 -05:00
Yoshinori Sano	edb74897b2	support safe load bf16 (#3310 ) * support safe load bf16 * fix lint error E501 * add test for loading safetensors * key should be BOOL * fix lint	2024-02-04 10:08:39 -05:00
chenyu	ca7973f61c	clean up einsum_mulacc (#3312 ) * clean up einsum_mulacc * push get_strides * stride * get_strides for ndim	2024-02-04 06:21:19 -05:00
chenyu	d459956966	move TestGetContraction to test_helpers (#3313 ) also cleaned long lines in test_shapetracker and enabled the line length check	2024-02-04 06:05:01 -05:00
Obada Khalili	b4ea0e18e3	Fix dot product on buffers with zero strides (#3303 ) * skip mulacc opt if all the src buffers of mul op are const buffers * add noqa directive for long test * unskip MULACC opt * ensure that a_axes at least includes summation axes in order to perform np.einsum correctly * add regression test for mulacc op * compute a_slices using a_axes * refactor helper function to retrieve axes and slices for nonzero strides as well as summation axes * include a regression test that uses and to test the behaviour indirectly	2024-02-04 05:15:06 -05:00
chenyu	30a3288c4a	touchup canonicalize empty mask (#3308 ) empty list -> None. also added env SEED for fuzz_shapetracker_math	2024-02-03 21:05:10 -05:00
Jyotirmaya Mahanta	f5e0d9673c	canonicalize empty masks (#3292 )	2024-02-03 20:27:57 -05:00
chenyu	7e6f69e963	remove some contiguous and contiguous_backward from wino (#3306 ) noop cleanup, the kernels remain the same	2024-02-02 23:15:05 -05:00
chenyu	f8563a7e9f	touchup apply_matrix (#3301 )	2024-02-02 05:13:37 -05:00
chenyu	3a7c1eb383	add winograd hlb_cifar10 back to tinybox benchmark (#3300 ) * add winograd hlb_cifar10 back to tinybox benchmark * LATEWINO * use wino for the full run to save benchmark time	2024-02-02 04:29:56 -05:00
David Hou	aebaab011f	faster wino compile by catting consts across data expand dim (#3293 ) * PoC faster wino compile by catting consts across data expand dim * fix fusions * faster + golf it * noqa 501 * implicit broadcast * Revert "implicit broadcast" This reverts commit 5915a9083d045ec1e6be84dcb492333325d48666. * shorter * shorter * oops * 216 upcasts is probably fine * wino kernel count test * test winograd number of sts * specify device for apply_matrix mat elements	2024-02-02 03:47:45 -05:00
David Hou	cf6f478901	limit group_for_reduce bufs to 32kb (#3299 ) hipcc crashes for buffers that are too large	2024-02-02 03:13:12 -05:00
chenyu	b564660637	type annotation for Compiler.cachekey and minor cleanup (#3298 )	2024-02-01 21:31:21 -05:00
Felix Wu	021eea3a52	fix UnboundLocalError when running Compiler with DISABLE_COMPILER_CACHE (#3296 )	2024-02-01 21:12:33 -05:00
chenyu	a5bf4afc1a	update ruff.toml for v0.2.0 (#3297 ) select -> lint.select. also added rule names for fully specified ones	2024-02-01 20:50:20 -05:00
chenyu	9196b11dfb	test_ops sinh/cosh/asinh/acosh/atanh (#3294 ) some have numerical issues at large input similar to sigmoid	2024-02-01 03:10:11 -05:00
Francis Lam	927f2dd24d	wmma: add HIP FP16 to FP16 tensor core (#3287 ) * wmma: add HIP FP16 to FP16 tensor core * test: fix test_tensor_core to use separate tolerances for half	2024-01-31 23:00:51 -05:00
chenyu	18e854cdbf	shrink MLB on sharded axis (#3255 ) * shrink MLB on sharded axis use onehot structure to store the real partition. goal is unsynced batchnorm2d that can be run on multigpu for training. draft version in https://github.com/chenyuxyz/tinygrad/pull/109 * SYNCBN flag * test unclean shrinks * UnsyncedBatchNorm reuses BatchNorm * more robust pad arg check * better types * more tests! * 6 gpus in benchmark * disable slow GPUS=6 benchmark	2024-01-31 21:48:25 -05:00
chenyu	a3652e6ddc	minor cleanups to test_ops (#3290 ) - removed noop a=0 - fixed integer div test - added test for both python expression and Tensor method call - reordered for consistency and added some spaces	2024-01-31 19:01:25 -05:00
chenyu	77251336d5	fix handcode_resnet50_opt.py (#3289 ) linearizer_opts has moved. also update the logging to print after total_tm update	2024-01-31 19:01:08 -05:00
chenyu	9b8c1a0408	Tensor.batchnorm works more than 2d and reuse in onnx (#3284 )	2024-01-30 19:02:45 -05:00
chenyu	7816c3b692	onnx update for trilu and argmax (#3283 ) * support 0 in shape for tril and triu * select_last_index for ArgMax and ArgMin * pass **kwargs	2024-01-30 18:39:16 -05:00
qazal	5b46b0ff3d	Simple RDNA3 emulator (#2974 ) * mockhip->hipcpu * allocate buffers * launch a kernel read_asm api * run remu in CI * remu 0.0.2, real test ops * simple driver * 0.0.3, all test_ops * run the latest emulator * 9 minutes is way too long, drop backprop in CI * bring back the backward pass * Revert "bring back the backward pass" This reverts commit `3781e1bc56`. * Print slowest tests * emulated device directly in ops_hip * fix ruff, override mypy for specific rules * test in the same code path - hip backend env variables - install packages and verify autogen - run certain tests - remove the other hip tests path - verify Device.DEFAULT * remove the emulated hip in extra --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-01-30 10:39:28 -08:00
George Hotz	247a8a2a6c	add canonicalization to View.create (#3280 ) * Reapply "take merge views from corsix branch" (#3278) This reverts commit `d298916232`. * reintroduce merge views * update second any * isinstance -> not * 25% less same but unequal	2024-01-30 10:26:48 -08:00
George Hotz	d8f6280ffb	hotfix: add CHECK_NEQ to fuzz_shapetracker_math	2024-01-30 10:07:54 -08:00
George Hotz	09f2952dc3	reintroduce merge views in update benchmark (#3279 ) * Reapply "take merge views from corsix branch" (#3278) This reverts commit `d298916232`. * reintroduce merge views	2024-01-30 09:47:20 -08:00
George Hotz	d298916232	Revert "take merge views from corsix branch" (#3278 )	2024-01-30 09:34:28 -08:00
George Hotz	b57a16aa89	take merge views from corsix branch (#3273 ) * take merge views from corsix branch * better DEBUG * max views * remove view.py change * Revert "remove view.py change" This reverts commit `f3025f4f39`. * only allow filter on non symbolic * oops, correct fix * comment to explain	2024-01-30 09:25:16 -08:00
George Hotz	6a4a5dc79d	fix pad 0 size (#3277 ) * fix pad 0 size * put in view, not pad * test was wrong	2024-01-30 08:58:10 -08:00
chenyu	b0a755288f	cifar EVAL_BS set default value to BS (#3274 ) less compile time for eval due to cache. 500 was a slow uneven number for 6 GPU too. eval time 5.9s -> 3.4s	2024-01-29 17:37:12 -05:00
Francis Lam	861d5ac224	wmma: fix the upcasts after WMMA to be hcopt ordering invariant (#3250 ) will correctly handle any permutation of optops after the TC one	2024-01-29 11:51:57 -08:00
chenyu	af4ca85594	MultiLazyBuffer.reshape new_axis without real_strides (#3272 ) similar to contraction, but this one is for finding the mapped single axis	2024-01-28 23:53:52 -05:00
chenyu	34c7621556	HIP=1 NOCLANG=1 for tinybox external_model_benchmark (#3270 ) used HIP instead of GPU and disabled slow CLANG	2024-01-28 22:05:26 -05:00
George Hotz	085dc87bed	winograd should be 4 kernels (#3268 )	2024-01-28 09:21:26 -08:00
George Hotz	f48b6aca77	long running beam pool (#3267 )	2024-01-28 08:06:03 -08:00
George Hotz	9e17378b60	Fix metal tests (#3266 ) * small fixes for tests on mac * remove device from TensorCore	2024-01-27 18:09:42 -08:00

3543 Commits