* cvar dtype:DType|tuple[DType, ...]|None=None
* fmt
* add a test
* list typeguard as a dep for CI
* extra step to install mypy
* fix venv
* ci fixes
* mv typeguard to testing install group
* simpler TYPED=1 test
* add typeguard to lint group
* bump
* thou hast implemented functions
* hacked in domain support
* some clean ups
* hack quantize_onnx_test too
* add helper lol, why onnx tests why
* better dispatcher, but need tests and better naming
* flaky ci
* change some names
* small clean ups
* make it easier to clean up tests once ORT supports 1.18.0
* nits
* fix bug where Softmax_1 was registered in onnx_ops
* need a default value
* resolve_const is better name
* fix OnnxRunner.to
* use proper domain names
* truncate fp8
* fix
* maybe like that?
* fix linters
* ruff
* move from extra and add ml_types to tests
* minor changes
* str to dtypes and nan support
---------
Co-authored-by: pkotzbach <pawkotz@gmail.com>
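The typeguard commits above wire opt-in runtime type checking into CI. A minimal sketch of the idea, assuming a hypothetical `cvar` helper with the dtype annotation from the first commit (tinygrad's real `DType` lives in `tinygrad.dtype`):
```
import os
from typeguard import typechecked  # the testing/lint dep added above

class DType: ...  # stand-in for tinygrad.dtype.DType

@typechecked
def cvar(name: str, dtype: DType | tuple[DType, ...] | None = None) -> str:
  # typeguard checks the annotation at call time and raises on mismatch
  return name

if os.getenv("TYPED", "0") == "1":  # the TYPED=1 gate from the commits above
  cvar("x", dtype=DType())    # ok
  cvar("x", dtype="float32")  # raises typeguard.TypeCheckError
```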
* add system json for mi300x mlperf
```
python3 -m mlperf_logging.system_desc_checker examples/mlperf/training_submission_v5.0/tinycorp/systems/tinybox_8xMI300X.json training 4.1.0
INFO - System description checker passed for tinybox 8xMI300X
```
also removed the rocm from tinybox_red since we are not using it
* update mlperf-logging version
* fast idiv with tests and fuzzer
* Add todo comment
* Add env variable to toggle fast_idiv
* Move env check
* Add fuzz fast_idiv to ci
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
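fast_idiv replaces integer division by a constant with a multiply and a shift. The real rewrite lives in tinygrad's UOp graph behind the env toggle above; this standalone sketch just shows the magic-constant trick and the kind of fuzzing the CI step does:
```
import random

def magic(d: int, bits: int = 32) -> tuple[int, int]:
  # find (m, s) with x // d == (x * m) >> s for all 0 <= x < 2**bits
  for s in range(2 * bits + 1):
    m = ((1 << s) + d - 1) // d           # ceil(2**s / d)
    e = m * d - (1 << s)                  # rounding error introduced by m
    if e * ((1 << bits) - 1) < (1 << s):  # error never reaches the next quotient
      return m, s
  raise ValueError(f"no magic constants for d={d}")

# fuzz against Python's exact floor division
for _ in range(100_000):
  d = random.randrange(1, 1 << 16)
  m, s = magic(d)
  x = random.randrange(0, 1 << 32)
  assert x // d == (x * m) >> s, (x, d, m, s)
```
On real hardware the `x * m` product needs double-width arithmetic (a mulhi), which is where most of the implementation subtlety lives.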
* boom
* fix webgpu
* use exact variable names in test so that AI can read it more easily
* add tag for a specific test name, e.g. testing a specific dtype
* fix ruff
* astype everything
* dtype in array creation
* just arange
* is 67% considered fixed?
* move test up
* small cleanups
* share function
* add qgemm as well
* add qgemm too
* make sure qgemm comes out as int
* take out qgemm for now
* fixed test
* add correct qgemm
* addressing feedback here too, early naive fix for now
* simplify bias and c to be minimalistic enough to test correctness
* refactored qlinearops
* maybe these asserts aren't the best...
* fix test
* updated tests to cover new ops
* try to add to CI
* move test_onnx_ops into testextra/
* more attention tests
* qlinear_add atol=1
* attention still not fullllllly correct
* it is what it is
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
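The QLinear*/QGemm ops above all follow the dequantize → float op → requantize recipe from the ONNX spec. A rough numpy reference for QLinearMatMul, assuming uint8 tensors (rounding can disagree with ORT by one step, hence the atol=1 above):
```
import numpy as np

def qlinear_matmul(a, a_scale, a_zp, b, b_scale, b_zp, y_scale, y_zp):
  af = (a.astype(np.int32) - a_zp) * a_scale  # dequantize inputs
  bf = (b.astype(np.int32) - b_zp) * b_scale
  y = af @ bf                                 # float matmul
  q = np.round(y / y_scale) + y_zp            # requantize to the output scale
  return np.clip(q, 0, 255).astype(np.uint8)  # saturate, assuming uint8
```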
* Switch to dawn, all tests passing locally
* Use dawn-python
* Skip failing test
* Skip midcast and fix timestamp on metal ci
* Autogen webgpu
* Try fetch dawn lib again
* /usr/lib
* Without lib prefix
* Test autogen diff
* Delete webgpu support, move everything to ops_webgpu
* mypy fix
* Simplify, refactor
* Line savings
* No ResultContainer
* Type annotation for result
* Some more simplifications
* Why was this explicit sync used at all?
* Refactor: delete functions that are only used once
* Create shader module inline
* Clear unit tests cache, maybe that solves it
* That wasn't it
* Try deleting cache to pass failing weight compare
* weights_only=False for pytorch 2.6
* Simplify ctype array creation
* Remove nanosecond precision timestamps
* Simplify error handling
* Refactor, add back type annotations
* Deleted custom submit function, refactor
* read_buffer simplify
* Fix use after free, refactor
* Simplify supported_features
* Runtime docs
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
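The `weights_only=False` commit works around PyTorch 2.6 flipping the default of `torch.load(..., weights_only=...)` to True, which rejects checkpoints containing pickled non-tensor objects (the path below is a placeholder):
```
import torch

# PyTorch 2.6 defaults weights_only=True; opt back into full unpickling
# only for checkpoints you trust, since it runs arbitrary pickle code
state = torch.load("ckpt.pt", weights_only=False)
```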
* LLVM JIT
* Autogen LLVM
* Update autogen
* Move things around
* even more non-determinism
* windows
* more autogen weirdness
* more windows stuff
* blind windows development try 2
* more blind windows development
* even more blind windows development
* maybe i should just set up a windows vm...
* why can't everyone just use sysv abi?
* cleanup debugging stuff
* unused import
* icache flushing isn't required on x86
* merge jit_nt and jit_unix
* more
* Temporary hack to not segfault
* better error
* bad conflict resolution
* Attempt to simplify support/llvm.py
* More refactoring
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
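tinygrad's LLVM JIT calls libLLVM through autogenerated ctypes bindings; purely as an illustration of the same idea, here is a minimal MCJIT sketch using llvmlite instead (the IR and function name are invented):
```
import ctypes
import llvmlite.binding as llvm

llvm.initialize()
llvm.initialize_native_target()
llvm.initialize_native_asmprinter()

ir = """
define i64 @add(i64 %a, i64 %b) {
entry:
  %s = add i64 %a, %b
  ret i64 %s
}
"""
mod = llvm.parse_assembly(ir)
mod.verify()
tm = llvm.Target.from_default_triple().create_target_machine()
engine = llvm.create_mcjit_compiler(llvm.parse_assembly(""), tm)  # backing module
engine.add_module(mod)
engine.finalize_object()  # emits machine code into executable memory

add = ctypes.CFUNCTYPE(ctypes.c_int64, ctypes.c_int64, ctypes.c_int64)(
  engine.get_function_address("add"))
print(add(2, 3))  # 5
```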
* connect to gpu
* rlc init?
* gfx comp start init
* early init is hardcoded, some progress with fw
* gart
* progress, next mqd
* ring setup, still does not execute anything
* ugh write correct reg
* pci2: vm
* pci2: start psp
* vm seems to work
* pci2: gfx start
* pci2: fix psp ring resp
* pci2: try ring
* pci2: mes and some fixes
* pci2: some progress
* pci2: progress
* pci2: mm
* pci2: discovery
* pci2: correct apertures
* pci2: b
* pci2: i
* pci2: l
* pci2: o
* pci2: cmu
* pci2: mes_kiq works
* pci2: mes
* pci2: kcq does not work(
* pci2: unhalt gfx
* ops_am
* minor
* check if amdgpu is there, or we will crash
* bring back graph, it just works
* less prints
* do not init mes (not used)
* remove unused files
* ops_am: start move into core
* ops_am: works
* clocks, but still slower
* faster + no mes_kiq
* vm frags + remove mes
* cleanup fw
* gmc tiny cleanup
* move to ops_amd
* comment out what we don't really need
* driverless
* close in speed
* am clean most of ips
* gmc to ips
* cleaner
* new vm walker
* comment old one
* remove unused autogens
* last write ups
* remove psp hardcoded values
* more
* add logs
* ih
* p2p and sdma
* vfio hal and interrupts
* smth
* amd dev iface
* minor after rebase
* bind for sdma
* Revert "bind for sdma"
This reverts commit a90766514d.
* tmp
* debug new mm
* ugh, fixed allreduce hangs
* p1
* works
* no pci.py
* cleaner a bit
* smth
* tiny cleanups
* cleaner a bit
* pciiface
* linter
* linter 2
* linter 3
* linter
* pylint
* reverted unrelated changes
* unrelated
* cmp tool
* ugh wrong fw
* clockgating
* unrelated
* alloc smaller chunks
* this
* opt sigs
* collect stat
* ops
* upd
* proclogs
* proclogs2
* vfio
* ruff
* linter pylint
* oops
* mypy p1
* mem fix
* mypy p2
* mypy p3
* mypy p4
* correct
* minor
* more tests
* linter in tests
* pci_regs header
* minor write up
* setup
* do not require libs
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
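The AM backend above drives the GPU without the amdgpu kernel driver by programming it over PCI directly. The Linux-standard part of that, mapping a BAR through sysfs, looks roughly like this (BDF and offsets are placeholders; needs root and a device not bound to amdgpu):
```
import mmap, os

BDF = "0000:03:00.0"  # placeholder bus:device.function
path = f"/sys/bus/pci/devices/{BDF}/resource0"  # BAR0 register aperture

fd = os.open(path, os.O_RDWR | os.O_SYNC)
bar0 = mmap.mmap(fd, os.fstat(fd).st_size, mmap.MAP_SHARED,
                 mmap.PROT_READ | mmap.PROT_WRITE)

def rreg(off: int) -> int:             # 32-bit MMIO read
  return int.from_bytes(bar0[off:off+4], "little")

def wreg(off: int, val: int) -> None:  # 32-bit MMIO write
  bar0[off:off+4] = val.to_bytes(4, "little")
```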
* start work on new gradient
* more correct
* working tests
* more tests
* work
* add (failing) gradient test
* add view and reduce gradient
* test_add works, many failing test_ops
* add max and reduce max
* add max and reduce max
* 129 failing
* 108 failed
* better view drawing
* 101 failed
* i got 99 failures
* 94 failures
* it's tons of terrible code, but only 50 tests fail
* only 19 failures
* same 19 but shorter
* minimal doesn't matter
* shorter
* lil simpler
* simpler
* simpler
* simpler
* 13 test failures
* nine tests fail
* all ops tests pass
* add contiguous gradient + fix sched tests
* faster by removing toposort calls
* missed one
* add jax to testing
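The gradient rewrite computes derivatives by walking the compute graph in reverse topological order and accumulating chain-rule contributions. A toy sketch of that idea (not tinygrad's UOp-based implementation), cross-checked with `jax.grad` since jax was just added to the testing deps:
```
import jax
import jax.numpy as jnp

# toy reverse-mode autodiff over a tiny expression graph
class V:
  def __init__(self, val, parents=()):
    self.val, self.parents, self.grad = val, parents, 0.0
  def __add__(self, o): return V(self.val + o.val, [(self, 1.0), (o, 1.0)])
  def __mul__(self, o): return V(self.val * o.val, [(self, o.val), (o, self.val)])

def backward(out):
  # build reverse topological order once, then accumulate chain-rule terms
  topo, seen = [], set()
  def visit(v):
    if id(v) in seen: return
    seen.add(id(v))
    for p, _ in v.parents: visit(p)
    topo.append(v)
  visit(out)
  out.grad = 1.0
  for v in reversed(topo):
    for p, local in v.parents: p.grad += v.grad * local

x, y = V(2.0), V(3.0)
z = x * y + x
backward(z)

# cross-check the same expression with jax
gx, gy = jax.grad(lambda x, y: x * y + x, argnums=(0, 1))(2.0, 3.0)
assert jnp.allclose(gx, x.grad) and jnp.allclose(gy, y.grad)
print(x.grad, y.grad)  # 4.0 2.0
```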