tinygrad

Mirror of https://github.com/tinygrad/tinygrad.git, last synced 2026-01-09 15:08:02 -05:00

Author	SHA1	Message	Date
George Hotz	2158dc4849	full fix for as_strided in torch backend (#9257) * fixes from chargpt for torch backend * shrink support * add stride support * comment cleanup * a few more * work * import the stream hack * llvm multi auto	2025-02-26 22:34:05 +08:00
George Hotz	7780393460	rig up torch's testing framework [pr] (#9254) * rig up torch's testing framework [pr] * support more movement ops * dec on expand * fix tests * work * fix tests * a few more * decomps + opt hook * installed pytest	2025-02-26 18:46:22 +08:00
George Hotz	b603af373e	run some tests from torch [pr] (#9252) * run some tests from torch [pr] * yml * wrap_out * clean up for the new people * a lil more	2025-02-26 15:42:22 +08:00
chenyu	731d14e718	hotfix bump testmetal2 timeout-minutes to 20 (#9235) setup is taking too long	2025-02-24 20:23:56 -05:00
qazal	cbfe95d306	bring cast before view back (#9230) * bring cast before view back * tune it to only trigger on expands --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-02-25 01:50:39 +02:00
geohotstan	f0b24d230c	add test_onnx_ops.py (#8569) * boom * fix webgpu * use exact variable names in test so that AI can read easier * add tag for specific test name like test a specific dtype * fix ruff * astype everything * dtype in array creation * just arange * is 67% considered fixed? * move test up * small cleanups * share function * add qgemm as well * add qgemm too * make sure qgemm comes out as int * take out qgemm for now * fixed test * add correct qgemm * addressing feedback here too, early naive fix for now * simplify bias and c to be minimalistic enough to test correctness * refactored qlinearops * maybe these asserts aren't the best.. * fix test * updated tests to cover new ops * try to add to CI * move test_onnx_ops into testextra/ * more attention tests * qlinear_add atol=1 * attention still not fullllllly correct * it is what it is --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-02-24 16:15:22 -05:00
George Hotz	fd731e740a	hotfix: add note on backend2.py	2025-02-24 11:23:03 +08:00
chenyu	e0adb1fc76	really run test_ops with TINY_BACKEND in ci (#9206) was failing with `line 1: pytest: command not found`	2025-02-22 15:51:24 -05:00
George Hotz	97bc723538	torch backend works for ResNet-18 (#9200) * torch backend progress, a few more functions * resnet works * pillow * tv	2025-02-22 22:16:23 +08:00
George Hotz	f92820d30d	torch backend tests (#9198) * torch backend tests * pythonpath * install ninja	2025-02-22 16:01:49 +08:00
chenyu	2e7c2780a9	CLANG -> CPU (#9189)	2025-02-20 18:03:09 -05:00
chenyu	3e22747799	run unit test on windows ci (#9187) * factor out testing_minimal in setup.py [pr] * testing_unit + windows	2025-02-20 14:40:41 -05:00
qazal	574a905291	Fix running VIZ=1 after package installation + test (#9183) * test running viz from pip install * add pkg * do 10 connection attempts * include assets in package_data * quiet curl * better print	2025-02-20 15:02:00 +01:00
Ahmed Harmouche	0f94b98646	Force WebGPU backend type [pr] (#9164) * Force webgpu backend type * Mypy fix * Rename to WEBGPU_BACKEND * Add it to env_vars docs * Remove link	2025-02-19 17:19:39 +08:00
George Hotz	af9d8d39d2	dsp matchers + bump line count to 11300 (#9130)	2025-02-17 17:31:54 +08:00
Ahmed Harmouche	59fe45f947	Solve get_grouped_dims does not split issue (#9085) * Solve dims too large errors on webgpu * Simplify divisor find * Test square root divisor * Fix lint * Refactor into group_dims and split_dims * Refactor * Fix lint * Add back max check in _group_dims * Prefer grouping over split --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-02-16 19:57:29 -05:00
George Hotz	7e09057afa	fixup clang devectorize (#9099) * fixup clang devectorize * __builtin_convertvector is some casts * dsp fixups	2025-02-15 09:29:47 +08:00
JaSpa99	d2ff55e9c6	OSX GPUOcelot (#8209) * add patches * add osx test in ci * macos specific uvm, gpfifo mask * only do that for now * Revert "add patches" This reverts commit `80d3112a57`. * use fork for now * workflow only one worker * merge osxtests with tests * Revert "merge osxtests with tests" This reverts commit `3461c8f46c`. * macos pagesize 16384 --------- Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com> Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-02-13 12:24:29 +08:00
rmtew	b3eab03055	Three things to get Windows CI working correctly: (#9047) - Ensure that the set backend environment variable is persisted to the next step via $GITHUB_ENV - It doesn't actually persist for Windows unless shell is explicitly set to bash. - Add the assertion to ensure the selected backend is actually used.	2025-02-12 14:41:00 -05:00
Ahmed Harmouche	916d5e7f08	WebGPU f16 support (f16 bounty part 2) (#8653) * WebGPU f16 support * Don't enable f16 yet * dtype tests passing after bitcast fix * Maybe all WebGPU green? * Require shader-f16 in examples * Minor wgsl touchup * 1 line shorter * Simpler * Add transcendetal support * log2 nan location mismatch on Vulkan * Nan skips	2025-02-12 19:46:53 +08:00
George Hotz	45aae8a6bc	hotfix: add External Benchmark Schedule to CI	2025-02-11 22:06:17 +08:00
chenyu	6c39aa4a6b	adjust cuda ci test targets (#9014)	2025-02-10 15:29:59 -05:00
qazal	b17ec42b56	remove const_arg (#9002) * remove const_arg * use -m pytest * remove test_const_arg test, variable arg on CONST does not exist. * use base in test_const_dtype	2025-02-10 12:45:11 +01:00
George Hotz	0568720a68	delete revectorize (#9000) * delete revectorize * test vectorized LLVM/CLANG * idk about that * was that the segfault?	2025-02-10 18:32:35 +08:00
George Hotz	2983285315	use HEX_REG_QEMU_INSN_CNT from qemu as a DSP timer [pr] (#8993) * use HEX_REG_QEMU_INSN_CNT from qemu as a DSP timer [pr] * add quantize test to dsp * fix tests * older onnx * debug, let's see what's happening	2025-02-10 11:07:35 +08:00
George Hotz	208097d488	try reducing testing deps [pr] (#8976) * reduce testing deps * break out test models * add PR to models, add models to metal * okay, not that * mac cleanup * mac typo * other typo	2025-02-09 15:22:32 +08:00
George Hotz	5bdd6a1cc4	increase CI speed with more runners [pr] (#8961) * increase CI speed with more runners [pr] * splits + cleanups [pr] * more runners * need that dep * split that too * can't be minimal * move test readme * bugfix + naming * one more split * bump to 22.04	2025-02-08 09:04:36 +08:00
George Hotz	4de084a835	cleanup ci, split docs/autogen, testing_minimal, LLVM Speed [pr] (#8952) * cleanup ci [pr] * testing_minimal * add hypothesis to minimal * fail tiktoken import okay * add LLVM speed test * llvm speed w/o beam	2025-02-07 19:01:59 +08:00
George Hotz	9ed2d0dfa2	refactor into subactions (#8946) * refactor into subactions * this work? * add shell * move install opencl * valid? * support mac os x * refactor other osx * fix linux/osx * fixes * cleanups * used everywhere * no quotes * quotes on true * bugfixes * this run? * hardcode * that * process replay action * fix checkout * restore to branch * fix caching * fix osx python cache * does replace function exist * Revert "does replace function exist" This reverts commit `622177c5a0`. * Revert "fix osx python cache" This reverts commit `e70d55cd93`. * user on osx to fix untar issue * that	2025-02-07 18:06:44 +08:00
Ahmed Harmouche	133cacadde	Autogen webgpu dawn, removing wgpu-py dependency (f16 support part 1) (#8646) * Switch to dawn, all tests passing locally * Use dawn-python * Skip failing test * Skip midcast and fix timestamp on metal ci * Autogen webgpu * Try fetch dawn lib again * /usr/lib * Without lib prefix * Test autogen diff * Delete webgpu support, move everything to ops_webgpu * mypy fix * Simplify, refactor * Line savings * No ResultContainer * Type annotation for result * Some more simplifications * Why was this explicit sync used at all? * Refactor: delete functions that are only used once * Create shader module inline * Clear unit tests cache, maybe that solves it * That wasn't it * Try deleting cache to pass failing weight compare * weights_only=False for pytorch 2.6 * Simplify ctype array creation * Remove nanosecond precision timestamps * Simplify error handling * Refactor, add back type annotations * Deleted custom submit function, refactor * read_buffer simplify * Fix use after free, refactor * Simplify supported_features * Runtime docs --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-02-07 15:16:59 +08:00
George Hotz	dbda72f91d	hotfix: raise line limit to 11200 for new webgpu backend	2025-02-07 14:29:20 +08:00
George Hotz	b1e1319972	ci speed on the enterprise plan [pr] (#8942)	2025-02-07 11:18:12 +08:00
uuuvn	a51c688f39	Cleanup llvm cleanup (and some clang things too) (#8871) * Cleanup llvm cleanup (and some clang things too) * Tests * Tests 2 * forgot mockgpu * more print some sources	2025-02-05 07:49:05 +08:00
George Hotz	56fa5c1191	dsp simulator (#8869) * dsp simulator * progress * fix * close on test tiny * working * less waste * line savings * Device DSP compiler * mock DSP at the bottom * DSP tests * docker caching * test update * need load * skip that test for CI DSP * last touch * ugh	2025-02-04 09:45:04 +08:00
uuuvn	6dadb60c93	LLVM JIT (+autogen llvm instead of llvmlite) (#8486) * LLVM JIT * Autogen LLVM * Update autogen * Move things around * even more non-determinism * windows * more autogen weirdness * more windows stuff * blind windows development try 2 * more blind windows development * even more blind windows development * maybe i should just set up a windows vm... * why can't everyone just use sysv abi? * cleanup debugging stuff * unused import * icache flushing isn't required on x86 * merge jit_nt and jit_unix * more * Temporary hack to not segfault * better error * bad conflict resolution * Attempt to simplify support/llvm.py * More refactoring --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-02-02 19:52:42 +08:00
chenyu	7f606fbde4	remove DEBUG=5 in windows ci test [pr] (#8803) DEBUG=5 prints a lot of info that's slow, and is not visible if test passed on CI. also skip two tests that took 3 minutes in python backend	2025-01-29 14:18:17 -05:00
FICTURE7	ec120ce6b9	Fix allocator memory alignment (#8800) * Fix allocator memory alignment * Run `test_ops.py` using LLVM and CLANG on Windows	2025-01-29 21:03:17 +03:00
b1tg	da464d039f	fix windows ci cache (#8787) Co-authored-by: b1tg <b1tg@users.noreply.github.com>	2025-01-28 13:22:15 +02:00
b1tg	5d62aa28dc	Support CLANG backend on Windows (#8768) * Support CLANG on Windows * Put both backends in a windows ci * remove coff loader * use memmove --------- Co-authored-by: b1tg <b1tg@users.noreply.github.com> Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-01-28 18:19:34 +09:00
b1tg	efc7971090	add windows test to ci (#8761) Co-authored-by: b1tg <b1tg@users.noreply.github.com>	2025-01-27 14:53:21 +09:00
George Hotz	1b4618e257	gradient cleanup (#8750) * switch backward to use gradient [pr] * set device correctly, dedup * why does that fail? * add noop cast * simple backward * fix beautiful_mnist * touchups * set in compute_gradient * uop_count * uop_count was wrong * collections * no note * skip that test * update sched kernel counts * train mnist is 65 * fix metadata and gc * fixes * materialize_grads * no pathlib stuff * add contiguous_backward, fix bugs * add some realize * fix multi * remove unused backward passes [pr] * lower line count	2025-01-26 09:30:55 +09:00
George Hotz	e82ba1454b	MultiLazyBuffer is UOp [pr] (#8662) * MultiLazyBuffer is UOp [pr] * this is new mlb * this is the idea * progress * multitensor works * more movement ops * this * MultiLazyBuffer is UOp * cleanups * multi axis * fix more tests * work * not that * add multi grad and move shard to ops * mops not views * no double contig * sweet, all mt tests passing * port old logic * remove lbs * fix realized * whitespace * assign tweak * test_assign_kv_cache_multi passes * fix is_realized * fix JIT for multi * just a few more lines i'll pay them back soon i swear please bro just a few more * no split reduceop for multi	2025-01-24 13:28:55 +09:00
George Hotz	46a8c5e1e5	delete forced_realize (#8615) * delete forced_realize * put that back * expectedFailures * cleaner create_subbuffer * more comments --------- Co-authored-by: qazal <qazal.software@gmail.com> Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>	2025-01-20 09:40:36 -08:00
nimlgen	9d3c40601f	am: fast memory manager (#8654) * start * progress * fixes * smth * mini fixes * fix2 * ugh, need this for now * faster * cleanups * tiny linters * make mypy happier * test & free pts * ops * linter * cleanup vm * fix * remove map_from * tiny fixes * add test to ci	2025-01-20 16:58:22 +03:00
ignaciosica	d2234e308a	tf32 tc for nv and ptx (#8635) Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-01-17 17:43:57 -08:00
George Hotz	bfbe81df71	remove cast before view (#8613) * remove cast before view * greener * indexing * that passes too * openpilot too * ack --------- Co-authored-by: qazal <qazal.software@gmail.com>	2025-01-14 15:04:58 -05:00
ignaciosica	d5a646d492	CUDA Turing TC (#8597) * init turing tc * reorder tc * hotfix: remove some spaces * revert var name to x * consistent order of factors * revert order of terms to match old stuff --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-01-14 10:35:14 -08:00
qazal	2f71a00236	remove PYTHONPATH=. from mypy ci [pr] (#8578)	2025-01-12 09:52:03 -08:00
qazal	98c9e23560	remove global PYTHONPATH setting in CI (test.yml) [pr] (#8568) * remove global PYTHONPATH setting in CI [pr] * only run mypy in tinygrad/ * still needed for benchmarks	2025-01-11 12:47:50 -05:00
qazal	60503c8621	use CAPTURE_PROCESS_REPLAY=1 in CI [pr] (#8564)	2025-01-11 06:03:48 -05:00


518 Commits