tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-20 20:38:03 -05:00

Author	SHA1	Message	Date
chenyu	287de4ecc6	use torch in test_gradient (#9186 ) used torch.autograd.grad, but not sure if it can be a template like jax	2025-02-20 12:26:11 -05:00
qazal	574a905291	Fix running VIZ=1 after package installation + test (#9183 ) * test running viz from pip install * add pkg * do 10 connection attempts * include assets in package_data * quiet curl * better print	2025-02-20 15:02:00 +01:00
George Hotz	4de084a835	cleanup ci, split docs/autogen, testing_minimal, LLVM Speed [pr] (#8952 ) * cleanup ci [pr] * testing_minimal * add hypothesis to minimal * fail tiktoken import okay * add LLVM speed test * llvm speed w/o beam	2025-02-07 19:01:59 +08:00
Ahmed Harmouche	133cacadde	Autogen webgpu dawn, removing wgpu-py dependency (f16 support part 1) (#8646 ) * Switch to dawn, all tests passing locally * Use dawn-python * Skip failing test * Skip midcast and fix timestamp on metal ci * Autogen webgpu * Try fetch dawn lib again * /usr/lib * Without lib prefix * Test autogen diff * Delete webgpu support, move everything to ops_webgpu * mypy fix * Simplify, refactor * Line savings * No ResultContainer * Type annotation for result * Some more simplifications * Why was this explicit sync used at all? * Refactor: delete functions that are only used once * Create shader module inline * Clear unit tests cache, maybe that solves it * That wasn't it * Try deleting cache to pass failing weight compare * weights_only=False for pytorch 2.6 * Simplify ctype array creation * Remove nanosecond precision timestamps * Simplify error handling * Refactor, add back type annotations * Deleted custom submit function, refactor * read_buffer simplify * Fix use after free, refactor * Simplify supported_features * Runtime docs --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-02-07 15:16:59 +08:00
George Hotz	5844883e59	bump master version	2025-02-05 09:08:28 +08:00
uuuvn	6dadb60c93	LLVM JIT (+autogen llvm instead of llvmlite) (#8486 ) * LLVM JIT * Autogen LLVM * Update autogen * Move things around * even more non-determinism * windows * more autogen weirdness * more windows stuff * blind windows development try 2 * more blind windows development * even more blind windows development * maybe i should just set up a windows vm... * why can't everyone just use sysv abi? * cleanup debugging stuff * unused import * icache flushing isn't required on x86 * merge jit_nt and jit_unix * more * Temporary hack to not segfault * better error * bad conflict resolution * Attempt to simplify support/llvm.py * More refactoring --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-02-02 19:52:42 +08:00
uuuvn	5ffc50d58c	Clang JIT (#8481 ) Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-01-03 11:12:55 -05:00
nimlgen	c18307e749	AM driver (#6923 ) * connect to gpu * rlc init? * gfx comp start init * early init is hardoded, some progress with fw * gart * progress, next mqd * ring setup, still does not execute anything * ugh write correct reg * pci2: vm * pci2: start psp * vm seems to work * pci2: gfx start * pci2: fix psp ring resp * pci2: try ring * pci2: mes and some fixes * pci2: some progress * pci2: progress * pci2: mm * pci2: discovery * pci2: correct apertures * pci2: b * pci2: i * pci2: l * pci2: o * pci2: cmu * pci2: mes_kiq works * pci2: mes * pci2: kcq does not work( * pci2: unhalt gfx * ops_am * minor * check if amdgpu is there, or we will crash * bring back graph, it just works * less prints * do not init mes (not used) * remove unused files * ops_am: start move into core * ops_am: works * clcks, but still slower * faster + no mes_kiq * vm frags + remove mes * cleanup fw * gmc tiny cleanup * move to ops_amd * comment out what we dont really need * driverless * close in speed * am clean most of ips * gmc to ips * cleaner * new vm walker * comment old one * remove unsued autogens * last write ups * remove psp hardcoded values * more * add logs * ih * p2p and sdma * vfio hal and interrupts * smth * amd dev iface * minor after rebase * bind for sdma * Revert "bind for sdma" This reverts commit `a90766514d`. 
* tmp * debug new mm * ugh, allreduce hangs fixed * p1 * works * no pci.py * cleaner a bit * smth * tiny cleanups * cleaner a bit * pciiface * linter * linter 2 * linter 3 * linter * pylint * reverted unrelated changes * unrelated * cmp tool * ugh wrong fw * clockgating * unrelated * alloc smaller chunks * this * opt sigs * collect stat * ops * upd * proclogs * proclogs2 * vfio * ruff * linter pylint * oops * mypy p1 * mem fix * mypy p2 * mypy p3 * mypy p4 * correct * minor * more tests * linter in tests * pci_regs header * minor write up * setup * do not require libs --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-12-31 23:06:17 +03:00
George Hotz	803a47494e	Revert "Clang JIT (#8312 )" (#8452 ) This reverts commit `b6266c8e41`.	2024-12-30 17:49:35 -05:00
uuuvn	b6266c8e41	Clang JIT (#8312 ) Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-12-30 17:37:53 -05:00
George Hotz	e2f87ecf36	start work on new gradient (#7838 ) * start work on new gradient * more correct * working tests * more tests * work * add (faliing) gradient test * add view and reduce gradient * test_add works, many failing test_ops * add max and reduce max * add max and reduce max * 129 failing * 108 failed * better view drawing * 101 failed * i got 99 failures * 94 failures * it's tons of terrible code, but only 50 tests fail * only 19 failures * same 19 but shorter * minimal doesn't matter * shorter * lil simpler * simpler * simpler * simpler * 13 test failures * nine tests fail * all ops tests pass * add contiguous gradient + fix sched tests * faster by removing toposort calls * missed one * add jax to testing	2024-12-13 16:45:53 -08:00
Ahmed Harmouche	1b94cc095a	Bump back wgpu to latest (#8179 )	2024-12-12 09:40:52 +01:00
JaSpa99	3c5d5f9414	mypy==1.13.0 (#7990 ) * explicit instantiation and narrowing asserts * explicit cast * bump * one line assert * handle case for no copy_queue_t * Revert "handle case for no copy_queue_t" This reverts commit `38347806ca`. * more readable control flow --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-12-06 12:09:14 +08:00
Ahmed Harmouche	146e1caea3	Downgrade wgpu to prevent sd segfault (#7969 )	2024-12-02 15:48:44 +01:00
Ahmed Harmouche	10618aba98	Bring back WebGPU (#7063 ) * Start from andredaprato:webgpu-clean * Fix infs * inf wgsl function is not needed * Emulated ulong for threefry, more tests passing * Randomness tests passing * Update model export to support new changes in webgpu, efficientnet export works again * Simplify shift emulation in wgsl * Delete test file * Fix bigger than u32 u32 literal * Why was skip copies added here? * Python3.12 for webgpu tests * Fix model export syntax error * Get test ops passing with some skips * Fix lint * Much simpler shift * Run more tests * Timestamp queries are not supported in CI, so skip search tests * All fancy indexing passing * r is ctx * Run more dtype tests by using is_dtype_supported * Cleanup ulong shift rendering * UPat -> Pat, UOps -> Ops * Pat -> UPat * Refactor render_ushift if-else * Pattern to avoid ulong mul * Remove vals_dtype * is_nan trick + rewrite, test_isnan passing * Rewrite a * select(1, nan, gate) -> select(a, nan, gate) * No arg, just op * Support char, uchar, short, ushort * Run test_index_mnis now that we have uint8 * Fix pyling * Save 3 lines by using base Compiler * No more long emulation * Remove fixup_binops * No more external_local_bufx wgsl specific cstyle modif, use base extra_pm * Simpler, faster copyin/out * Skip some new tests that use long * Fix typo * copyout touchup * Save lines by using render_cast * WebGL is not supported in core, delete it from is_dtype_supported * More narrow test skips for some unary tests * TernaryOps, UnaryOps -> Ops * TinyGrad supports WebGPU * StableDiffusion demo: f16tof32 gpu is a lib, update UI * Packed load/store, no more scale_size, no core tinygrad changes * Rename copyin, copyout * Device -> dev * Fix lint * Pattern matcher rule for packed load/store * Refactor * Shorter packed load/store * this should fix lint * Fix mypy * SD compile script working * New SD webgpu UI * New default prompt * New SD weights * Fix title when webgpu not available * Run symbolic 
tests, simplify is_nan, use round_up * Show step time on UI * Bump minimum wgpu version to v0.19 * Fix latent --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-11-26 12:26:40 +08:00
George Hotz	65f188aafb	bump version to 0.10.0	2024-11-19 08:27:28 +08:00
George Hotz	d40673505f	new cloud is cloudy [pr] (#7631 ) * new cloud is cloudy [pr] * waste lines to add security * safety, with speed and less lines * timing and del * lines * cleanups * restore CloudSession * bump to 3.10 * quotes * renderer security	2024-11-11 20:18:04 +08:00
George Hotz	4af228e9fc	hotfix: pin mypy	2024-10-21 16:22:24 +08:00
leopf	b6d9b276bb	GGUF support (#7046 ) * basic loader, untested * testing * remove utils import in test * q8_0 * q4_1 * end to end testing * minor cleanup * fix casting * moved to state * move tests * move dequant to fn * fix lint elif * remove gguf from extra * fix dict union * q6_k simpler * naming and spacing * gpt2-gguf example * cleanup * move gguf example * minor cleanup --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-10-21 16:15:34 +08:00
chenyu	68e59eb3f5	update mlperf-logging to 4.1.0-rc3 (#6796 )	2024-09-28 21:45:37 -04:00
wozeparrot	abd484a9f7	fix: need numpy for docs and testing (#6766 )	2024-09-26 16:44:59 +08:00
wozeparrot	2b899164c6	no numpy (#6751 )	2024-09-26 16:40:18 +08:00
mesozoic-egg	992cde05d7	Metal with CDLL instead of py-objc (#6545 ) * Add CDLL interface for metal * remove two unused functions * Cover most of the API methods * switch to cdll * directly call objc message in ops_metal * keep only obj interface * Use direct message sending for graph * may have found a solution to the memoryview on ctypes pointer * buf indexing bug fixed * fix c_int * fix c int to bytes * fix gpu time bug * line savings for cdll metal core * wip * c int bug * fix buf casting * dedup for c_void_p * dedup for c_void_p * linter fix * remove unused stuff * my py fix * more mypy error fix * line savings * line savings * rename send_message to msg; add __hash__ and __eq__ for dedup * wip * refactor * refactor * remove named import from ctypes * forgot to change variable name * file reorg, put support.py to ops_metal * refactor * hash error * remove to_ns_array * test oom exception, fix exception change * typevar for msg * add back dedup * test for compile error * move constant to graph * move header constant around * get label for icb buffer * check icb label using "in" * wip fixing mypy reported error * fixed mypy error * code formatting * all_resources dedup match previous * code formatting * code formatting; buffer set to objc_id * revert changes on buf for the manual release, seems like _free is not always called * skip unless on metal, for test_metal * fix premature mem release causing seg fault * test_metal check for device before importing * Buffer should only be released under _free explicitly * mypy fixes * change object ownership * test compile success * lint fixes * remove load_library * wrap sel_register in cache * simplify to_struct * swap lines * fix type error in to_struct * bump line to 9800 * remove pyobjc from setup.py * command buffer should be objc_instance and get released * stringWithUTF8String: returns objc_instance * Use constant for MTLPipelineOptionNone * better explanation for [MTLBuffer contents:] return * Use dyld_find in 
case the path differs * trailing whitespace * handle exception for methods that take error: * load /System/Library instead of /Library * Init c_void_p with None instead of zero for error objects --------- Co-authored-by: Mesozoic Egg <mesozoic.egg@proton.me> Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-09-25 17:43:01 +08:00
George Hotz	c6e117c899	add a single py.typed (#6083 )	2024-08-14 17:31:46 -07:00
wozeparrot	518c022c29	feat: tag 0.9.2 (#6067 )	2024-08-13 16:15:36 -07:00
chenyu	2cadf21684	include "mkdocs" in setup docs (#5798 )	2024-07-29 15:54:52 -04:00
uuuvn	3cb94a0a15	Rename tinygrad/runtime/driver to support (#5413 )	2024-07-12 11:06:42 -07:00
wozeparrot	dfbee4f0f5	feat: add blobfile to testing (#5254 )	2024-07-01 19:33:58 -07:00
wozeparrot	7bcb74ab23	feat: tag 0.9.1 (#5220 )	2024-06-28 20:16:14 -07:00
wozeparrot	8209cd3c55	easier llama3 + fetch subdir (#4938 )	2024-06-14 13:47:27 -07:00
SnakeOnex	b1db2d0094	tqdm replacement (#4846 ) * tqdm replacement almost * formatting * formatting * imports * line len * fix * removed set description :( * removed set description :( * fix * fix * green check? * rewrote as class, fixed several bugs * types spacing * removed imports * fix * iterable * typing * mypy disagreement * imports * more e2e tests vs tqdm * removed seed setting * robustness against time.sleep() flakiness * flaky fix * automatic bar closing when count==total * cleanup * clang error with tqdm * tqdm back * use os lib, print to stderr (fixes the clang bug, where the bar was leaking into the generated c program * back to shutil * unit_scale + unit_scale test * custom unit to tests * pretty * clean * removed flaky test * less test iters * empty line * remove disable	2024-06-09 23:46:03 +02:00
wozeparrot	6fcf220b21	feat: tag 0.9.0 (#4762 )	2024-05-28 18:44:45 +00:00
George Hotz	5ba611787d	move image into tensor.py. delete features (#4603 ) * move image into tensor.py * change setup.py * openpilot tests need pythonpath now	2024-05-15 10:50:25 -07:00
Francis Lata	bb849a57d1	[MLPerf] UNet3D dataloader (#4343 ) * add support for train/val datasets for kits19 * split dataset into train and val sets * add tests for kits19 dataloader * add MLPerf dataset tests to CI * update unet3d model_eval script * fix linting * add nibabel * fix how mock dataset gets created * update ref implementation with permalink and no edits * clean up test and update rand_flip implementation * cleanups	2024-04-28 22:34:18 -04:00
chenyu	d31e220cbf	add mlperf-logging to setup.py mlperf (#4289 )	2024-04-24 23:34:34 -04:00
wozeparrot	4c99d49c4d	some docstrings (#4201 ) * feat: create and data access docstrings * fix: linter --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-04-21 16:34:08 +04:00
George Hotz	8f749ae0eb	New docs are in mkdocs (#4178 ) * start mkdocs * simple docs for tensor * more docs * move those back * more docs * copy markdown extensions * docs legacy * docs building workflow * fix showcase links * only that? * install tinygrad * add docs to setup.py * Delete examples/llm.c/data	2024-04-16 10:59:51 +04:00
geohotstan	fe88591890	update onnx to 1.16.0 (#4127 ) * update * pass tests and skip tests	2024-04-10 11:19:13 -04:00
George Hotz	150ea2eb76	create engine folder and move code (#3948 ) * retry * older tf * that	2024-03-26 20:38:03 -07:00
David Hou	0afaf70d57	lars optimizer + tests (#3631 ) * lars optimizer + tests * fix skip list! * use id to compare in skip list * go back to using set * Tensor(bool) * Tensor(bool) is and * don't lint external/mlperf_resnet * whitespace * add external_test_optim to opencl tests * give mlperf task a name * mlperf under onnx * remove track_gnorm * contiguous instead of realize * assert momentum and weight decay positive --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2024-03-06 18:11:01 -05:00
George Hotz	fe97a85014	the compiler is a driver (#3427 )	2024-02-16 10:18:09 +01:00
George Hotz	473935125a	use comgr to compile (#3248 ) * use comgr to compile * fast * bfloat16 * move comgr to it's own file * cleaner style * comgr in new place * comgr free + dtype cleanup	2024-01-26 18:27:49 -08:00
George Hotz	03a6bc59c1	move autogen to runtime/autogen (#3254 )	2024-01-26 12:44:19 -08:00
George Hotz	a3869ffd46	move gpuctypes in tree (#3253 ) * move gpuctypes in tree * fix mypy * regex exclude * autogen sh * mypy exclude * does that fix it * fix mypy * add hip confirm * verify all autogens * build clang2py * opencl headers * gpu on 22.04	2024-01-26 12:25:03 -08:00
George Hotz	8cbcd1b342	Remove webgpu, back to 5k lines (#3040 ) * remove webgpu * max 5000 lines	2024-01-08 09:10:07 -08:00
George Hotz	6617dcf095	move graph to runtime, check line count with sz.py (#2842 ) * move graph to runtime, check line count with sz.py * oops, didn't save * dtype aliases * restore comment, REALCOUNT	2023-12-18 20:30:06 -08:00
George Hotz	6d6eb9302d	ruff checks the max line length is 150 (#2734 ) * ruff checks the max line length is 150 * fix tensor.py * a lot more * done	2023-12-12 17:34:47 -08:00
qazal	29f2653d8d	add graph (#2670 )	2023-12-07 10:53:31 -08:00
qazal	c704a77ca0	green dtypes ALU tests (#2617 ) * dtypes alu test * those types don't exist in torch * floats * more tests * disable those * a couple unary tests * skip float16 tests in CI for GPU * fix LLVM bool add True+True=1+1=2 which truncates to False in native LLVM * remove hardcoded float for LLVM ALU fns * less sensitive atol for fp32, 1e-10 is flaky and sometimes failed even if you revert the merge commit for non-fp32 math, nothing has changed in our kernels for fp32. * return on overflows * fix CUDA exp2 * compute results of op regardless of bounds in a python backend * skip fp16 in GPU and CUDACPU * fuzz a smaller range in the float_midcast_int32 test I sampled this and we overflow ~70% of the time. because numpy behaves differently on different devices for overflows and Metal seems to do the same, I'm opting to eliminate the non-determinism here * remove CUDA exp2 overload it's already there now --------- Co-authored-by: George Hotz <geohot@gmail.com>	2023-12-06 08:15:46 -08:00
George Hotz	232ed2af3f	more test cleanups (#2631 ) * more test cleanups * move test example back	2023-12-05 16:17:57 -08:00

176 Commits