uuuvn
a51c688f39
Cleanup llvm cleanup (and some clang things too) (#8871)
* Cleanup llvm cleanup (and some clang things too)
* Tests
* Tests 2
* forgot mockgpu
* more print some sources
2025-02-05 07:49:05 +08:00
George Hotz
56fa5c1191
dsp simulator (#8869)
* dsp simulator
* progress
* fix
* close on test tiny
* working
* less waste
* line savings
* Device DSP compiler
* mock DSP at the bottom
* DSP tests
* docker caching
* test update
* need load
* skip that test for CI DSP
* last touch
* ugh
2025-02-04 09:45:04 +08:00
chenyu
836cf42c2e
fix rand_like for multi (#8880)
2025-02-03 19:00:14 -05:00
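For context, a usage sketch of what this fix covers: `rand_like` on a sharded tensor should come back sharded the same way (device names below are placeholders, not from the PR).
```python
from tinygrad import Tensor

# shard across two devices on axis 0; rand_like should now match
# the input's devices and shard axis instead of erroring
t = Tensor.ones(8).shard(("CPU:0", "CPU:1"), axis=0)
r = t.rand_like()
```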
uuuvn
6dadb60c93
LLVM JIT (+autogen llvm instead of llvmlite) (#8486)
* LLVM JIT
* Autogen LLVM
* Update autogen
* Move things around
* even more non-determinism
* windows
* more autogen weirdness
* more windows stuff
* blind windows development try 2
* more blind windows development
* even more blind windows development
* maybe i should just set up a windows vm...
* why can't everyone just use sysv abi?
* cleanup debugging stuff
* unused import
* icache flushing isn't required on x86
* merge jit_nt and jit_unix
* more
* Temporary hack to not segfault
* better error
* bad conflict resolution
* Attempt to simplify support/llvm.py
* More refactoring
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-02 19:52:42 +08:00
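To illustrate what an in-process LLVM JIT flow looks like, here is a minimal sketch using llvmlite (the library this PR replaces with autogenerated bindings): parse IR, MCJIT-compile it in memory, and call the result through ctypes.
```python
import ctypes
import llvmlite.binding as llvm

llvm.initialize()
llvm.initialize_native_target()
llvm.initialize_native_asmprinter()

ir = """
define i32 @add(i32 %a, i32 %b) {
  %s = add i32 %a, %b
  ret i32 %s
}
"""
mod = llvm.parse_assembly(ir)
tm = llvm.Target.from_default_triple().create_target_machine()
engine = llvm.create_mcjit_compiler(mod, tm)
engine.finalize_object()                       # emit machine code in-process
fptr = engine.get_function_address("add")
add = ctypes.CFUNCTYPE(ctypes.c_int32, ctypes.c_int32, ctypes.c_int32)(fptr)
print(add(2, 3))                               # 5
```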
chenyu
7f606fbde4
remove DEBUG=5 in windows ci test [pr] (#8803)
DEBUG=5 prints a lot of output, which is slow, and none of it is visible on CI when the test passes anyway.
also skip two tests that took 3 minutes on the python backend
2025-01-29 14:18:17 -05:00
FICTURE7
ec120ce6b9
Fix allocator memory alignment (#8800)
* Fix allocator memory alignment
* Run `test_ops.py` using LLVM and CLANG on Windows
2025-01-29 21:03:17 +03:00
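A toy sketch of the usual over-allocate-and-round-up trick behind a fix like this (not the PR's actual code):
```python
import ctypes

def aligned_alloc(size: int, align: int = 4096):
  raw = (ctypes.c_uint8 * (size + align))()   # over-allocate; keep a ref alive
  base = ctypes.addressof(raw)
  start = (base + align - 1) & ~(align - 1)   # round up to the boundary
  return raw, start                           # (owner, aligned address)

owner, addr = aligned_alloc(1024, align=64)
assert addr % 64 == 0
```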
b1tg
da464d039f
fix windows ci cache (#8787)
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-01-28 13:22:15 +02:00
b1tg
5d62aa28dc
Support CLANG backend on Windows (#8768)
* Support CLANG on Windows
* Put both backends in a windows ci
* remove coff loader
* use memmove
---------
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-01-28 18:19:34 +09:00
b1tg
efc7971090
add windows test to ci (#8761)
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-01-27 14:53:21 +09:00
George Hotz
1b4618e257
gradient cleanup (#8750)
* switch backward to use gradient [pr]
* set device correctly, dedup
* why does that fail?
* add noop cast
* simple backward
* fix beautiful_mnist
* touchups
* set in compute_gradient
* uop_count
* uop_count was wrong
* collections
* no note
* skip that test
* update sched kernel counts
* train mnist is 65
* fix metadata and gc
* fixes
* materialize_grads
* no pathlib stuff
* add contiguous_backward, fix bugs
* add some realize
* fix multi
* remove unused backward passes [pr]
* lower line count
2025-01-26 09:30:55 +09:00
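A small usage sketch of the `Tensor.gradient` API that backward is rebuilt on here; it returns gradients directly instead of populating `.grad`:
```python
from tinygrad import Tensor

x = Tensor([2.0, 3.0])
loss = (x * x).sum()
(gx,) = loss.gradient(x)   # d(loss)/dx = 2x
print(gx.tolist())         # [4.0, 6.0]
```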
chenyu
0c759e1ff6
add bert to benchmark ci (#8741)
with `DISABLE_DROPOUT=1 BERT_LAYERS=2` for now
2025-01-24 14:45:11 -05:00
George Hotz
e82ba1454b
MultiLazyBuffer is UOp [pr] (#8662)
* MultiLazyBuffer is UOp [pr]
* this is new mlb
* this is the idea
* progress
* multitensor works
* more movement ops
* this
* MultiLazyBuffer is UOp
* cleanups
* multi axis
* fix more tests
* work
* not that
* add multi grad and move shard to ops
* mops not views
* no double contig
* sweet, all mt tests passing
* port old logic
* remove lbs
* fix realized
* whitespace
* assign tweak
* test_assign_kv_cache_multi passes
* fix is_realized
* fix JIT for multi
* just a few more lines i'll pay them back soon i swear please bro just a few more
* no split reduceop for multi
2025-01-24 13:28:55 +09:00
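For reference, multi-device tensors enter through `Tensor.shard`; a minimal usage sketch (device names are placeholders):
```python
from tinygrad import Tensor

# split axis 0 across two devices; with this PR the multi tensor is
# represented as plain UOps rather than a separate MultiLazyBuffer class
t = Tensor.arange(8).shard(("CPU:0", "CPU:1"), axis=0)
print(t.device)   # ('CPU:0', 'CPU:1')
```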
George Hotz
46a8c5e1e5
delete forced_realize (#8615)
* delete forced_realize
* put that back
* expectedFailures
* cleaner create_subbuffer
* more comments
---------
Co-authored-by: qazal <qazal.software@gmail.com>
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2025-01-20 09:40:36 -08:00
nimlgen
9d3c40601f
am: fast memory manager (#8654)
* start
* progress
* fixes
* smth
* mini fixes
* fix2
* ugh, need this for now
* faster
* cleanups
* tiny linters
* make mypy happier
* test & free pts
* ops
* linter
* cleanup vm
* fix
* remove map_from
* tiny fixes
* add test to ci
2025-01-20 16:58:22 +03:00
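As a rough illustration of what a VRAM manager like this does, a toy first-fit free-list allocator (structure hypothetical, not the am driver's code):
```python
class FreeList:
  def __init__(self, size: int):
    self.free = [(0, size)]                  # list of (offset, length) holes
  def alloc(self, size: int) -> int:
    for i, (off, length) in enumerate(self.free):
      if length >= size:                     # first fit
        self.free[i:i+1] = [(off + size, length - size)] if length > size else []
        return off
    raise MemoryError("out of space")
  def release(self, off: int, size: int):
    self.free.append((off, size))            # toy version: no coalescing
```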
ignaciosica
d2234e308a
tf32 tc for nv and ptx (#8635)
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-01-17 17:43:57 -08:00
nimlgen
f671da6755
ci: add AM start time to benchmark (#8637)
* ci: add AM start time to benchmark
* am: unlock it
* add AMD
* revert this
2025-01-16 14:47:36 +03:00
chenyu
4ee3243c93
JITBEAM=2 for LLaMA-3 8B on 4 GPUs [pr] (#8623)
is it fast?
2025-01-14 19:52:38 -05:00
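A usage sketch of the knob: JITBEAM applies BEAM search only to kernels captured under TinyJit; setting it in the environment before tinygrad reads it is the safe way to turn it on from Python.
```python
import os
os.environ["JITBEAM"] = "2"   # BEAM search, jitted kernels only

from tinygrad import Tensor, TinyJit

@TinyJit
def step(x: Tensor) -> Tensor:
  return (x * x).sum().realize()

for _ in range(3):
  step(Tensor.ones(16))       # captured kernels get BEAM'd
```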
George Hotz
bfbe81df71
remove cast before view (#8613)
* remove cast before view
* greener
* indexing
* that passes too
* openpilot too
* ack
---------
Co-authored-by: qazal <qazal.software@gmail.com>
2025-01-14 15:04:58 -05:00
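Part of what makes a rewrite like this safe is that cast and reshape commute elementwise; a small check of that equivalence:
```python
from tinygrad import Tensor, dtypes

t = Tensor.arange(6, dtype=dtypes.int32)
a = t.cast(dtypes.float32).reshape(2, 3)   # cast before view
b = t.reshape(2, 3).cast(dtypes.float32)   # view before cast
assert a.tolist() == b.tolist()            # same values either way
```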
chenyu
393eec3201
raise RuntimeError for uneven shard [pr] (#8593)
no 7B llama on 6 GPUs
skip 70B
2025-01-14 14:51:48 -05:00
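A sketch of the guard this adds (hypothetical helper, not the PR's exact code): the sharded axis must divide evenly by the device count, hence no 7B llama on 6 GPUs.
```python
def check_even_shard(dim_size: int, n_devices: int) -> None:
  if dim_size % n_devices != 0:
    raise RuntimeError(f"axis of size {dim_size} can't be evenly sharded across {n_devices} devices")
```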
ignaciosica
d5a646d492
CUDA Turing TC (#8597)
* init turing tc
* reorder tc
* hotfix: remove some spaces
* revert var name to x
* consistent order of factors
* revert order of terms to match old stuff
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-01-14 10:35:14 -08:00
nimlgen
1ff6862a3d
ci: sleep a bit to let the driver unload the prev pid (#8605)
2025-01-14 15:55:23 +03:00
nimlgen
74b83c4c41
am in ci (#8532)
* try am in ci
* no sudo
* temp
* run more am test
* run half on am
* insert amdgpu
* other machine as well
2025-01-13 19:55:17 +03:00
qazal
2f71a00236
remove PYTHONPATH=. from mypy ci [pr] (#8578)
2025-01-12 09:52:03 -08:00
qazal
98c9e23560
remove global PYTHONPATH setting in CI (test.yml) [pr] (#8568)
* remove global PYTHONPATH setting in CI [pr]
* only run mypy in tinygrad/
* still needed for benchmarks
2025-01-11 12:47:50 -05:00
qazal
60503c8621
use CAPTURE_PROCESS_REPLAY=1 in CI [pr] (#8564)
2025-01-11 06:03:48 -05:00
nimlgen
aa3d612df2
add script to install amd mockgpu on macOS (#8536)
* upload artifact every time
* hm
* sh script
* hm
* hm2
* hm2
* hm2
* no sudo
* def paths
* small comments
* text
* try auth for bigger limits
2025-01-09 01:29:25 +03:00
patrini32
21c7d7c71a
MOCKGPU amd test on OSX (#8505)
* add tests
* Refactor
* cache only amd/comgr/build (saves a lot of space)
* fix
* silence warning and add check for cache hit before installing cmake
* run only pytest
* use actions/cache
* lower timeout-minutes and add Device.DEFAULT step
* add nvidia to Device.DEFAULT check
* typo
* fix
* Check only for amd and run only 2 test
2025-01-08 14:27:56 +03:00
chenyu
85a4397f27
fix create_schedule_with_vars usage in allreduce benchmark [pr] (#8522)
* fix create_schedule_with_vars usage in allreduce benchmark [pr]
because i didn't know how to use it...
* increase time limit because tiny17 is slow
2025-01-07 01:30:01 -05:00
chenyu
0061dc7447
fix benchmark allreduce and add to ci [pr] (#8521)
2025-01-07 00:37:59 -05:00
nimlgen
9bc317d5d2
mockcuda (#8503)
* init mockcuda
* run gpu ocelot
* fix
* sfixes
* disable broken tests
* linter
* these fails as well
* pylint
* mypy
* this fails on real platforms as well
* mypy please
2025-01-05 01:23:57 +03:00
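The mock-driver idea in one sketch (names hypothetical, not this PR's code): hand driver-level code a fake libcuda whose entry points succeed without hardware, so the full code path runs in CI.
```python
import ctypes

class MockCUDA:
  def __getattr__(self, name):
    def stub(*args):          # every cuXxx entry point "succeeds"
      return 0                # CUDA_SUCCESS
    return stub

def load_cuda(mock: bool = False):
  return MockCUDA() if mock else ctypes.CDLL("libcuda.so.1")
```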
uuuvn
5ffc50d58c
Clang JIT (#8481)
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-01-03 11:12:55 -05:00
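For contrast, a sketch of the pre-JIT CLANG flow (assumes clang on PATH): shell out to clang, write a shared object to disk, and dlopen it with ctypes. The JIT path instead maps the compiled code straight into memory, skipping the disk round trip.
```python
import ctypes, pathlib, subprocess, tempfile

src = "float add(float a, float b) { return a + b; }"
with tempfile.TemporaryDirectory() as d:
  so = pathlib.Path(d) / "kernel.so"
  subprocess.run(["clang", "-shared", "-fPIC", "-O2", "-x", "c", "-", "-o", str(so)],
                 input=src.encode(), check=True)   # compile from stdin
  lib = ctypes.CDLL(str(so))                       # dlopen the artifact
  lib.add.restype = ctypes.c_float
  lib.add.argtypes = [ctypes.c_float, ctypes.c_float]
  print(lib.add(1.5, 2.0))                         # 3.5
```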
George Hotz
803a47494e
Revert "Clang JIT ( #8312 )" ( #8452 )
...
This reverts commit b6266c8e41 .
2024-12-30 17:49:35 -05:00
uuuvn
b6266c8e41
Clang JIT (#8312)
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-12-30 17:37:53 -05:00
George Hotz
0addbad36d
Happy New Year! Let's get AM merged
2024-12-30 13:15:10 -05:00
qazal
9defbc7d54
add symbolic_simple to the scheduler [pr] (#8419)
2024-12-26 20:05:08 +08:00
George Hotz
9f62c80f68
hotfix: this is a loan
2024-12-20 14:47:04 -08:00
qazal
d78e75f710
hotfix: use ubuntu-22.04 ci from 8249 (#8251)
2024-12-15 02:23:00 +02:00
George Hotz
8a50868264
touchup function.py [pr] (#8220)
* touchup function.py [pr]
* remove ALLOWED_READ_IMAGE
* eh, keep it, just change it
2024-12-13 13:07:00 -08:00
ignaciosica
0a00187dce
add real AMX tests to benchmark (#8216)
* add real amx to benchmark
* add debug=2 to check tc are triggered
2024-12-13 14:03:41 -05:00
George Hotz
d9a0880d33
delete fuzz uops (not tested) [pr] (#8181)
2024-12-12 01:41:27 -08:00
chenyu
26e049ab40
add ALLOWED_READ_IMAGE=2131 to openpilot (#8166)
added as an exact-count check for now, since it's not clear whether more or fewer than the allowed number is any better
2024-12-11 12:14:17 -08:00
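A sketch of this style of exact-count regression check (helper hypothetical): pin the number so drift in either direction gets flagged.
```python
def check_read_image_count(kernel_sources: list[str], allowed: int = 2131) -> None:
  n = sum(src.count("read_image") for src in kernel_sources)
  assert n == allowed, f"expected exactly {allowed} read_image calls, got {n}"
```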
Ahmed Harmouche
a73e3677d0
Test linearizer on webgpu (#8159)
* Test linearizer on wgpu
* Skip tests due to exceeded dims
2024-12-11 17:03:26 +01:00
chenyu
d462f8ace0
use HALF in cifar wino benchmarks (#8153)
more representative as it hits tensor cores on tinyboxes
2024-12-10 20:21:00 -05:00
Ahmed Harmouche
a8cfdc70ed
Run more webgpu tests (#8142)
2024-12-10 23:20:04 +01:00
Ahmed Harmouche
ed7318a3f5
Fix puppeteer install (#8148)
Clean npm cache before puppeteer install
2024-12-10 23:06:33 +01:00
Ahmed Harmouche
71dd222f66
Fix setitem on wgpu (#8144)
2024-12-10 19:34:25 +01:00
George Hotz
f83d715f41
move checks into compile3, delete compile2 [pr] (#8127)
* move checks into compile3 [pr]
* test_vs_onnx
* test v torch works
* float16 won't compile on compile3
* actually delete compile2
2024-12-09 14:21:42 -08:00
George Hotz
87c360c4b5
hotfix: add --size 8B to llama3
2024-12-09 07:53:20 -08:00
chenyu
e9692de42b
don't FUZZ_ALL_ACTIONS in fuzz_linearizer.py (#8096)
mostly for speed, this is just making sure the script runs
2024-12-06 17:22:17 -05:00
Ahmed Harmouche
ce72fe1411
u32 to f16 in tinygrad (#8074)
* f16 decompression in tinygrad
* Typing and cleanup
2024-12-06 12:00:13 +01:00
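A host-side sketch of the unpacking this PR does on-device with tinygrad ops: each u32 carries two packed f16 values, recoverable with a bitcast-style view (numpy used for clarity; little-endian layout assumed).
```python
import numpy as np

packed = np.array([0x3C003800], dtype=np.uint32)  # 0.5 in the low half, 1.0 in the high half
halves = packed.view(np.float16)                  # reinterpret bits, no conversion
print(halves)                                     # [0.5 1. ]
```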