tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-09 15:08:02 -05:00

Author	SHA1	Message	Date
George Hotz	7ff175c022	cache a venv to avoid pip usage (#10689 ) * try built in pip caching * try venv * export venv * set VIRTUAL_ENV * revert that * venv key * fix * ci cache hit? * fix windows	2025-06-07 20:13:41 -07:00
qazal	ed37f29184	remove unused lib directory from viz setup [pr] (#10639 )	2025-06-05 13:54:31 +03:00
Fang-Pen Lin	b0913295d2	Add missing js files in python package data for viz (#10624 )	2025-06-04 10:49:43 +03:00
George Hotz	411392dfb7	move files into uop dir (#10399 ) * move files into uop dir [pr] * tinygrad.uop is a thing * fix uop docs, no pr * fix viz	2025-05-18 11:38:28 -07:00
wozeparrot	1ed04f993b	move benchmark stat tracking to influxdb (#10185 )	2025-05-15 16:14:56 -07:00
wozeparrot	9b14e8c3cd	feat: tag 0.10.3 (#10310 )	2025-05-14 15:45:13 -07:00
wozeparrot	2df2ec6640	feat: unpin hypothesis (#10306 )	2025-05-14 14:26:28 -07:00
chenyu	61bfd23881	update mlperf-logging version (#9995 )	2025-04-22 19:32:39 -04:00
pkotzbach	dbbd755cba	FP8s truncate (#9937 ) * truncate fp8 * fix * maybe like that? * fix linters * ruff * move from extra and add ml_types to tests * minor changes * str to dtypes and nan support --------- Co-authored-by: pkotzbach <pawkotz@gmail.com>	2025-04-22 19:12:49 -04:00
chenyu	fe6a482f1d	pin hypothesis version to 6.131.0 (#9920 ) 6.131.1 seems to cause timeout in CI	2025-04-17 16:34:10 -04:00
chenyu	57f4bc3fbb	add numpy to setup linting (#9806 ) this would have caught the mypy error in fp8 pr. keep ignore_missing_imports to true as we also import torch which is fat	2025-04-09 03:47:03 -04:00
George Hotz	78caf55154	Revert "FP8 support on NVIDIA (#8631 )" This reverts commit `2c8e4ea865`.	2025-04-09 12:27:41 +08:00
pkotzbach	2c8e4ea865	FP8 support on NVIDIA (#8631 ) * squashed fp8 commits * tensorcore start * minor changes * pre-commit * pylint * Delete fp8mul.cu * clean * small bugfix * fix test_dtype * fix test_dtype_alu * add EMULATE_CUDA_SM89 * fix ci * fix test_linearizer * fix test_linearizer * fix swizzle * add debug to simple_matmul * fixed swizzle * python emulator * refactor python emulator * setup fix * numpy setup * ml_dtypes only in emulate_cuda_sm89 * fix pylint * fix tests * fix mypy * fix mypy * fix ruff * done python emulator * add acc type * tests * mypy * clean code * add cuda tensor core tests to CI * minor fix * clean test_dtype.py * clean cstyle.py * clean test_ops.py * fix test * fix test * whitespaces * pylint * pylint * amd? * amd? * amd * reduce lines * mockgpu remove * fix * ruff * ruff * fix mypy * ruff * test only for cuda * fixed formatting * small fixes * small fix * least_upper_dtype if fp8s not supported * log and reciprocal are supported for fp8s * ops python fixes * dtypes.fp8s use * e4m3 + e5m2 result dtype test * truncate linter fix --------- Co-authored-by: pkotzbach <pawkotz@gmail.com> Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com> Co-authored-by: chenyu <chenyu@fastmail.com>	2025-04-08 21:54:04 -04:00
chenyu	8fe83385ec	add system json for mi300x mlperf (#9786 ) * add system json for mi300x mlperf ``` python3 -m mlperf_logging.system_desc_checker examples/mlperf/training_submission_v5.0/tinycorp/systems/tinybox_8xMI300X.json training 4.1.0 INFO - System description checker passed for tinybox 8xMI300X ``` also removed the rocm from tinybox_red since we are not using it * update mlperf-logging version	2025-04-08 06:36:44 -04:00
chenyu	4a807ee952	remove duplicated z3-solver in setup.py (#9787 )	2025-04-08 06:12:58 -04:00
Sieds Lykles	07d1aefaf4	fast idiv (#9755 ) * fast idiv with tests and fuzzer * Add todo comment * Add env variable to toggle fast_idiv * Move env check * Add fuzz fast_idiv to ci --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-04-07 08:32:24 -04:00
chenyu	c20f112e9f	example test use z3 to verify valid simplification (#9684 )	2025-04-02 01:05:52 -04:00
geohotstan	a08b07b4da	Bump onnx==1.17.0 (#9618 ) * bump * remove resize tf_crop_and_resize --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-03-30 03:21:51 -04:00
chenyu	ee3d313b34	Revert "update ruff to 0.11.2 (#9531 )" (#9535 ) This reverts commit `d8d65e2747`.	2025-03-21 14:52:25 -04:00
Francis Lata	eb95825eea	RetinaNet dataloader (#9442 ) * retinanet dataloader * remove batch_size from generate_anchors * refactor kits19 dataset tests * add tests for dataloader * fix testing setup and cleanups * remove unused import	2025-03-21 13:36:41 -04:00
chenyu	d8d65e2747	update ruff to 0.11.2 (#9531 ) 0.11.2 fixed the false alert from 0.11.1. also pinned the version in setup for now to prevent broken CI from ruff upgrade	2025-03-21 10:32:59 -04:00
geohotstan	f0b24d230c	add test_onnx_ops.py (#8569 ) * boom * fix webgpu * use exact variable names in test so that AI can read easier * add tag for specific test name like test a specific dtype * fix ruff * astype everything * dtype in array creation * just arange * is 67% considered fixed? * move test up * small cleanups * share function * add qgemm as well * add qgemm too * make sure qgemm comes out as int * take out qgemm for now * fixed test * add correct qgemm * addressing feedback here too, early naive fix for now * simplify bias and c to be minimalistic enough to test correctness * refactored qlinearops * maybe these asserts aren't the best.. * fix test * updated tests to cover new ops * try to add to CI * move test_onnx_ops into testextra/ * more attention tests * qlinear_add atol=1 * attention still not fullllllly correct * it is what it is --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-02-24 16:15:22 -05:00
qazal	1db4341e9f	move viz graph to lib/graph [pr] (#9196 ) * move viz graph to lib/graph [pr] * add package * share with program	2025-02-21 21:04:07 +01:00
Simon R	2318d7ac51	Add missing tinygrad.runtime.autogen.am to packages (#9194 )	2025-02-21 15:38:24 +02:00
George Hotz	d3a21cced2	hotfix: bump version to 0.10.2	2025-02-21 10:43:49 +08:00
chenyu	3e22747799	run unit test on windows ci (#9187 ) * factor out testing_minimal in setup.py [pr] * testing_unit + windows	2025-02-20 14:40:41 -05:00
chenyu	287de4ecc6	use torch in test_gradient (#9186 ) used torch.autograd.grad, but not sure if it can be a template like jax	2025-02-20 12:26:11 -05:00
qazal	574a905291	Fix running VIZ=1 after package installation + test (#9183 ) * test running viz from pip install * add pkg * do 10 connection attempts * include assets in package_data * quiet curl * better print	2025-02-20 15:02:00 +01:00
George Hotz	4de084a835	cleanup ci, split docs/autogen, testing_minimal, LLVM Speed [pr] (#8952 ) * cleanup ci [pr] * testing_minimal * add hypothesis to minimal * fail tiktoken import okay * add LLVM speed test * llvm speed w/o beam	2025-02-07 19:01:59 +08:00
Ahmed Harmouche	133cacadde	Autogen webgpu dawn, removing wgpu-py dependency (f16 support part 1) (#8646 ) * Switch to dawn, all tests passing locally * Use dawn-python * Skip failing test * Skip midcast and fix timestamp on metal ci * Autogen webgpu * Try fetch dawn lib again * /usr/lib * Without lib prefix * Test autogen diff * Delete webgpu support, move everything to ops_webgpu * mypy fix * Simplify, refactor * Line savings * No ResultContainer * Type annotation for result * Some more simplifications * Why was this explicit sync used at all? * Refactor: delete functions that are only used once * Create shader module inline * Clear unit tests cache, maybe that solves it * That wasn't it * Try deleting cache to pass failing weight compare * weights_only=False for pytorch 2.6 * Simplify ctype array creation * Remove nanosecond precision timestamps * Simplify error handling * Refactor, add back type annotations * Deleted custom submit function, refactor * read_buffer simplify * Fix use after free, refactor * Simplify supported_features * Runtime docs --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-02-07 15:16:59 +08:00
George Hotz	5844883e59	bump master version	2025-02-05 09:08:28 +08:00
uuuvn	6dadb60c93	LLVM JIT (+autogen llvm instead of llvmlite) (#8486 ) * LLVM JIT * Autogen LLVM * Update autogen * Move things around * even more non-determinism * windows * more autogen weirdness * more windows stuff * blind windows development try 2 * more blind windows development * even more blind windows development * maybe i should just set up a windows vm... * why can't everyone just use sysv abi? * cleanup debugging stuff * unused import * icache flushing isn't required on x86 * merge jit_nt and jit_unix * more * Temporary hack to not segfault * better error * bad conflict resolution * Attempt to simplify support/llvm.py * More refactoring --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-02-02 19:52:42 +08:00
uuuvn	5ffc50d58c	Clang JIT (#8481 ) Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-01-03 11:12:55 -05:00
nimlgen	c18307e749	AM driver (#6923 ) * connect to gpu * rlc init? * gfx comp start init * early init is hardoded, some progress with fw * gart * progress, next mqd * ring setup, still does not execute anything * ugh write correct reg * pci2: vm * pci2: start psp * vm seems to work * pci2: gfx start * pci2: fix psp ring resp * pci2: try ring * pci2: mes and some fixes * pci2: some progress * pci2: progress * pci2: mm * pci2: discovery * pci2: correct apertures * pci2: b * pci2: i * pci2: l * pci2: o * pci2: cmu * pci2: mes_kiq works * pci2: mes * pci2: kcq does not work( * pci2: unhalt gfx * ops_am * minor * check if amdgpu is there, or we will crash * bring back graph, it just works * less prints * do not init mes (not used) * remove unused files * ops_am: start move into core * ops_am: works * clcks, but still slower * faster + no mes_kiq * vm frags + remove mes * cleanup fw * gmc tiny cleanup * move to ops_amd * comment out what we dont really need * driverless * close in speed * am clean most of ips * gmc to ips * cleaner * new vm walker * comment old one * remove unsued autogens * last write ups * remove psp hardcoded values * more * add logs * ih * p2p and sdma * vfio hal and interrupts * smth * amd dev iface * minor after rebase * bind for sdma * Revert "bind for sdma" This reverts commit `a90766514d`. * tmp * debug new mm * ugh, allreduce hangs fixed * p1 * works * no pci.py * cleaner a bit * smth * tiny cleanups * cleaner a bit * pciiface * linter * linter 2 * linter 3 * linter * pylint * reverted unrelated changes * unrelated * cmp tool * ugh wrong fw * clockgating * unrelated * alloc smaller chunks * this * opt sigs * collect stat * ops * upd * proclogs * proclogs2 * vfio * ruff * linter pylint * oops * mypy p1 * mem fix * mypy p2 * mypy p3 * mypy p4 * correct * minor * more tests * linter in tests * pci_regs header * minor write up * setup * do not require libs --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-12-31 23:06:17 +03:00
George Hotz	803a47494e	Revert "Clang JIT (#8312 )" (#8452 ) This reverts commit `b6266c8e41`.	2024-12-30 17:49:35 -05:00
uuuvn	b6266c8e41	Clang JIT (#8312 ) Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-12-30 17:37:53 -05:00
George Hotz	e2f87ecf36	start work on new gradient (#7838 ) * start work on new gradient * more correct * working tests * more tests * work * add (faliing) gradient test * add view and reduce gradient * test_add works, many failing test_ops * add max and reduce max * add max and reduce max * 129 failing * 108 failed * better view drawing * 101 failed * i got 99 failures * 94 failures * it's tons of terrible code, but only 50 tests fail * only 19 failures * same 19 but shorter * minimal doesn't matter * shorter * lil simpler * simpler * simpler * simpler * 13 test failures * nine tests fail * all ops tests pass * add contiguous gradient + fix sched tests * faster by removing toposort calls * missed one * add jax to testing	2024-12-13 16:45:53 -08:00
Ahmed Harmouche	1b94cc095a	Bump back wgpu to latest (#8179 )	2024-12-12 09:40:52 +01:00
JaSpa99	3c5d5f9414	mypy==1.13.0 (#7990 ) * explicit instantiation and narrowing asserts * explicit cast * bump * one line assert * handle case for no copy_queue_t * Revert "handle case for no copy_queue_t" This reverts commit `38347806ca`. * more readable control flow --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-12-06 12:09:14 +08:00
Ahmed Harmouche	146e1caea3	Downgrade wgpu to prevent sd segfault (#7969 )	2024-12-02 15:48:44 +01:00
Ahmed Harmouche	10618aba98	Bring back WebGPU (#7063 ) * Start from andredaprato:webgpu-clean * Fix infs * inf wgsl function is not needed * Emulated ulong for threefry, more tests passing * Randomness tests passing * Update model export to support new changes in webgpu, efficientnet export works again * Simplify shift emulation in wgsl * Delete test file * Fix bigger than u32 u32 literal * Why was skip copies added here? * Python3.12 for webgpu tests * Fix model export syntax error * Get test ops passing with some skips * Fix lint * Much simpler shift * Run more tests * Timestamp queries are not supported in CI, so skip search tests * All fancy indexing passing * r is ctx * Run more dtype tests by using is_dtype_supported * Cleanup ulong shift rendering * UPat -> Pat, UOps -> Ops * Pat -> UPat * Refactor render_ushift if-else * Pattern to avoid ulong mul * Remove vals_dtype * is_nan trick + rewrite, test_isnan passing * Rewrite a * select(1, nan, gate) -> select(a, nan, gate) * No arg, just op * Support char, uchar, short, ushort * Run test_index_mnis now that we have uint8 * Fix pyling * Save 3 lines by using base Compiler * No more long emulation * Remove fixup_binops * No more external_local_bufx wgsl specific cstyle modif, use base extra_pm * Simpler, faster copyin/out * Skip some new tests that use long * Fix typo * copyout touchup * Save lines by using render_cast * WebGL is not supported in core, delete it from is_dtype_supported * More narrow test skips for some unary tests * TernaryOps, UnaryOps -> Ops * TinyGrad supports WebGPU * StableDiffusion demo: f16tof32 gpu is a lib, update UI * Packed load/store, no more scale_size, no core tinygrad changes * Rename copyin, copyout * Device -> dev * Fix lint * Pattern matcher rule for packed load/store * Refactor * Shorter packed load/store * this should fix lint * Fix mypy * SD compile script working * New SD webgpu UI * New default prompt * New SD weights * Fix title when webgpu not available * Run symbolic tests, simplify is_nan, use round_up * Show step time on UI * Bump minimum wgpu version to v0.19 * Fix latent --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-11-26 12:26:40 +08:00
George Hotz	65f188aafb	bump version to 0.10.0	2024-11-19 08:27:28 +08:00
George Hotz	d40673505f	new cloud is cloudy [pr] (#7631 ) * new cloud is cloudy [pr] * waste lines to add security * safety, with speed and less lines * timing and del * lines * cleanups * restore CloudSession * bump to 3.10 * quotes * renderer security	2024-11-11 20:18:04 +08:00
George Hotz	4af228e9fc	hotfix: pin mypy	2024-10-21 16:22:24 +08:00
leopf	b6d9b276bb	GGUF support (#7046 ) * basic loader, untested * testing * remove utils import in test * q8_0 * q4_1 * end to end testing * minor cleanup * fix casting * moved to state * move tests * move dequant to fn * fix lint elif * remove gguf from extra * fix dict union * q6_k simpler * naming and spacing * gpt2-gguf example * cleanup * move gguf example * minor cleanup --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-10-21 16:15:34 +08:00
chenyu	68e59eb3f5	update mlperf-logging to 4.1.0-rc3 (#6796 )	2024-09-28 21:45:37 -04:00
wozeparrot	abd484a9f7	fix: need numpy for docs and testing (#6766 )	2024-09-26 16:44:59 +08:00
wozeparrot	2b899164c6	no numpy (#6751 )	2024-09-26 16:40:18 +08:00
mesozoic-egg	992cde05d7	Metal with CDLL instead of py-objc (#6545 ) * Add CDLL interface for metal * remove two unused functions * Cover most of the API methods * switch to cdll * directly call objc message in ops_metal * keep only obj interface * Use direct message sending for graph * may have found a solution to the memoryview on ctypes pointer * buf indexing bug fixed * fix c_int * fix c int to bytes * fix gpu time bug * line savings for cdll metal core * wip * c int bug * fix buf casting * dedup for c_void_p * dedup for c_void_p * linter fix * remove unused stuff * my py fix * more mypy error fix * line savings * line savings * rename send_message to msg; add __hash__ and __eq__ for dedup * wip * refactor * refactor * remove named import from ctypes * forgot to change variable name * file reorg, put support.py to ops_metal * refactor * hash error * remove to_ns_array * test oom exception, fix exception change * typevar for msg * add back dedup * test for compile error * move constant to graph * move header constant around * get label for icb buffer * check icb label using "in" * wip fixing mypy reported error * fixed mypy error * code formatting * all_resources dedup match previous * code formatting * code formatting; buffer set to objc_id * revert changes on buf for the manual release, seems like _free is not always called * skip unless on metal, for test_metal * fix premature mem release causing seg fault * test_metal check for device before importing * Buffer should only be released under _free explicitly * mypy fixes * change object ownership * test compile success * lint fixes * remove load_library * wrap sel_register in cache * simplify to_struct * swap lines * fix type error in to_struct * bump line to 9800 * remove pyobjc from setup.py * command buffer should be objc_instance and get released * stringWithUTF8String: returns objc_instance * Use constant for MTLPipelineOptionNone * better explanation for [MTLBuffer contents:] return * Use dyld_find in case the path differs * trailing whitespace * handle exception for methods that take error: * load /System/Library instead of /Library * Init c_void_p with None instead of zero for error objects --------- Co-authored-by: Mesozoic Egg <mesozoic.egg@proton.me> Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-09-25 17:43:01 +08:00
George Hotz	c6e117c899	add a single py.typed (#6083 )	2024-08-14 17:31:46 -07:00

1 2 3 4

152 Commits