tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-14 17:38:06 -05:00

Author	SHA1	Message	Date
George Hotz	0d39bb5de1	rename to get_kernelize_map (#10465 )	2025-05-22 11:44:44 -07:00
chenyu	7bfb20757c	fix tensor int floor div (#10327 ) * fix tensor int floor div * test_float_floordiv_scalar	2025-05-21 06:46:54 -04:00
Sieds Lykles	2b4375f36d	Correct divmod folding behind flag (#10433 ) * add flag * add test * remove import	2025-05-21 06:46:13 -04:00
qazal	df4cbb69e9	move fuzz_schedule.py to extra [pr] (#10444 )	2025-05-21 10:07:24 +03:00
chenyu	29624af872	skip commavq in external_model_benchmark (#10439 ) precision issue with different onnxruntime version	2025-05-21 01:45:33 -04:00
George Hotz	03e7a99ca8	add edge cases found by codex [pr] (#10423 ) * add edge cases found by codex [pr] * another test * more edgecases * docs * instructions * fine, add that one * nan cases * roll failures * inv prob * more failing tests * err, that's failing * more tests * more failures * uop verif * failures * webgpu	2025-05-20 14:53:18 -07:00
nimlgen	2895198c36	am: download regs (#10419 ) * am: download regs * x * linter * mypy * after merge * raise * fixed name * fix * xx * remove * missing reg * missing reg * move to online * ops	2025-05-20 18:59:56 +03:00
uuuvn	ec9955c956	Use REAL_DEV for test skips (#10420 ) This should fix remote cpu tests flakiness (segfaults were in `test_data_parallel_resnet_train_step` which is skipped on cpu but wasn't skipped on remote cpu)	2025-05-19 17:32:14 -07:00
Sieds Lykles	db09676250	Dont simplify gate in gate, fix `FUSE_ARANGE=1 python test/test_ops.py TestOps.test_scatter_add` (#10411 ) * substitute out index * Add test * change comment	2025-05-19 13:16:21 -04:00
qazal	cc8dda1d75	move multi_map to grouper rewrite pass (#10409 ) * move multi_map to grouper rewrite pass * delete that	2025-05-19 10:44:06 +03:00
George Hotz	b06291077c	no amdgpu kernel driver (#10408 ) * no amdgpu kernel driver * don't test hip * lower req	2025-05-18 20:52:39 -07:00
George Hotz	411392dfb7	move files into uop dir (#10399 ) * move files into uop dir [pr] * tinygrad.uop is a thing * fix uop docs, no pr * fix viz	2025-05-18 11:38:28 -07:00
uuuvn	27c12be471	amd mockgpu graph support (#10385 ) For testing remote graph stuff (prompted by #10371) in ci	2025-05-18 09:43:16 -07:00
qazal	04b23087d8	grouper tests from fuse_arange_default [pr] (#10394 )	2025-05-18 18:42:43 +03:00
qazal	9e2089dcd4	don't raise Exception in process replay [pr] (#10392 ) * don't raise Exception in process replay [pr] * continue generating diffs unless [pr] is set, exit(1) otherwise * change * works	2025-05-18 11:23:23 +03:00
qazal	0294bfe507	simpler can_pad (#10364 ) * simpler can_pad [pr] * 3 kernels * tests * less kernels	2025-05-18 10:00:07 +03:00
George Hotz	6f77b938d7	Move getbits tests into test_helpers (#10382 )	2025-05-17 17:04:00 -07:00
George Hotz	6ec88d94df	add tests for multi ram usage [pr] (#10376 )	2025-05-17 15:33:40 -07:00
वेदांत	2453d99050	rms matching pytorch implementation (#10319 ) * rms matching pytorch implementation * pre commit fix --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-05-17 08:23:11 -07:00
qazal	e054b53a75	kernel count tests for pad [pr] (#10369 ) * kernel count tests for pads * handcoded rand one kernel * comment * prerealize device rng counter * test_rand_handcoded generates /0 * remove track_rewrites	2025-05-17 17:20:46 +03:00
George Hotz	e13f2a3092	multi is O(1) (#10183 ) * multi is O(1) * allreduce * no new uops needed * junk * something * simple * that's really what i want * closer * inject _device_num * pretty print * cleanups * this * early dnum * ops allreduce is good * ish * device is the tuple and this is fine * simpler * progress * copy_multi * work * more tests * more tests pass * work * no None axis * tests * no none multi * type fixes * pre commit passes * lil * remove this * mlperf dataloader on mac * that test was wrong * unbind * support DEBUG=2 * realize * only unbind bound vars * don't include fixedvars * graph test * one test * fixedvars in hcq * new ring reduce * ring reduce * simpler ring * mselect * mselect doesn't work * Revert "mselect doesn't work" This reverts commit `c78b77bd7d`. * Revert "mselect" This reverts commit `bb2e430ac3`. * simpler * fixups * no optional * fix jit * move things around * cleanup multi * simpler multi * simpler reshape	2025-05-16 23:14:23 -07:00
George Hotz	e1a40e8040	add hcq fixedvars support [pr] (#10356 ) * add hcq fixedvars support [pr] * different test * fixedvars are only for comp_queues * fix hcq varvals	2025-05-16 22:05:53 -07:00
George Hotz	876d2275a1	changes from new multi (#10353 ) * changes from new multi * revert hcq change	2025-05-16 13:07:29 -07:00
wozeparrot	66e00c04dd	fix: skip kernel timing tests on ci cuda (#10348 )	2025-05-16 11:48:06 -07:00
qazal	e9e5b54e43	grouper cleanups and merge with insert_kernels [pr] (#10349 ) * grouper cleanups and merge with insert_kernels [pr] * remove that	2025-05-16 14:39:56 +03:00
b1tg	caded2f413	llvm diagnostic error (#10267 ) * llvm diagnostic info * use decorator * better error reporting * fix mypy * collect all diag msgs * test diag error --------- Co-authored-by: b1tg <b1tg@users.noreply.github.com> Co-authored-by: chenyu <chenyu@fastmail.com>	2025-05-16 02:03:20 -04:00
George Hotz	a4a25720b2	add test_multitensor_jit_input [pr] (#10347 )	2025-05-15 20:47:57 -07:00
wozeparrot	1ed04f993b	move benchmark stat tracking to influxdb (#10185 )	2025-05-15 16:14:56 -07:00
wozeparrot	f59ecf2116	fix: mockgpu cuda timing (#10343 )	2025-05-15 14:14:14 -07:00
qazal	7cfe367c07	failing test for slow embedding kernel with FUSE_ARANGE=1 [pr] (#10330 )	2025-05-15 14:58:11 +03:00
qazal	0a45cd0cbe	grouper: merge views in fuse elementwise (#10325 ) * grouper: merge views in fuse elementwise * with gradient api	2025-05-15 13:17:09 +03:00
qazal	89d8d5b25e	add dims check in FUSE_ARANGE (#10323 )	2025-05-15 11:33:21 +03:00
qazal	8fad0f0124	grouper: check for unsafe PAD in FUSE (#10322 )	2025-05-15 10:53:44 +03:00
chenyu	f008e5f233	test_dtype_alu should cast bf16 input (#10320 ) when testing alu for bfloat16, it should cast inputs to bfloat16 first, otherwise numpy has both errors from input and errors from alu which is more inaccurate	2025-05-15 01:11:39 -04:00
George Hotz	568d6d96e7	small changes from new multi [pr] (#10318 )	2025-05-14 20:50:59 -07:00
chenyu	f6cf25fce4	cleanup test_conv2d_ceildiv_edge_case [pr] (#10317 )	2025-05-14 23:35:28 -04:00
Kirill R.	50d7162acd	Add conv2d ceildiv edge case (#10303 )	2025-05-14 22:50:23 -04:00
wozeparrot	9bbc2bc2a7	hotfix: filter_too_much (#10308 )	2025-05-14 15:31:51 -07:00
George Hotz	42e70193c9	multi: instead of real, just copy (#10289 ) * multi: instead of real, just copy * fix test * remove real	2025-05-14 10:36:55 -07:00
qazal	043efc6ec4	do not require self for track_rewrites [pr] (#10302 )	2025-05-14 18:23:32 +03:00
qazal	d342f7688d	remove some skips in test_schedule + use assertRaisesRegex [pr] (#10296 )	2025-05-14 14:54:07 +03:00
qazal	40f4ce3390	enable AMD CI for TestRandomness.test_multinomial [pr] (#10295 )	2025-05-14 14:32:22 +03:00
qazal	1770e00c41	only CAPTURE_PROCESS_REPLAY=1 + add filterwarnings back [pr] (#10292 )	2025-05-14 11:58:42 +03:00
qazal	1c97338be5	enable process replay assert for schedule [pr] (#10280 ) * enable process replay assert for schedule * start at unique+1	2025-05-14 11:10:47 +03:00
uuuvn	7bc4864bc4	Make `dev` a property of `Allocator` (#10286 ) * Make `dev` a property of `Allocator` (this is a prereq refactor for #10285) At least `BufferXfer.copy` accesses it assuming it's always present, currently most devices just add this property on their own repeating the same code over and over again. This is also a bit footguny, see `RemoteAllocator` that named this property `device` instead of `dev`, i could obviously just change that in one place but doing it globally seems like a better solution (and it reduces code duplication too). `MallocAllocator` is a bit special, but passing `None` works just fine. * typing * ignore type instead of cast	2025-05-13 17:01:01 -07:00
uuuvn	ddff9857b8	Remote properties is a dataclass (#10283 ) Not strictly required for anything but soon there will be like 4 new properties and having it be a huge json just seems like a bad taste. It also seems right to not have a separate endpoint for this, just `GetProperties` request that returns a repr of this similar to how requests are sent in `BatchRequest`. This will also make a switch to anything other than http much simpler if it will be required for any reason, like just a tcp stream of `BatchRequest`s	2025-05-13 11:56:58 -07:00
uuuvn	ba87eca0f1	Remote multi (basic) (#10269 ) * Basic remote multi support Simplest thing to be able to use remote with multiple gpus, very slow because no transfers (copyin copyout for cross-device copies) * tests	2025-05-13 09:52:47 -07:00
George Hotz	5f64bbc63d	improve multi tests + add support for fixedvars [pr] (#10281 ) * improve multi tests + add support for fixedvars [pr] * add support for fixedvars	2025-05-13 09:27:00 -07:00
chenyu	8a906cb124	Tensor.randn_like (#10276 )	2025-05-13 11:53:59 -04:00
chenyu	c4988bc07b	only run test_u32_to_f16 if it supports fp16 (#10277 ) * only run test_u32_to_f16 if it supports fp16 * cleanup	2025-05-13 11:16:14 -04:00

4667 Commits