Commit Graph

687 Commits

Author SHA1 Message Date
JaSpa99
d2ff55e9c6 OSX GPUOcelot (#8209)
* add patches

* add osx test in ci

* macos specific uvm, gpfifo mask

* only do that for now

* Revert "add patches"

This reverts commit 80d3112a57.

* use fork for now

* workflow only one worker

* merge osxtests with tests

* Revert "merge osxtests with tests"

This reverts commit 3461c8f46c.

* macos pagesize 16384 (see the page-size sketch after this entry)

---------

Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-13 12:24:29 +08:00
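For context on the "macos pagesize 16384" change above: Apple Silicon macOS uses 16 KiB pages rather than the 4 KiB common on x86, and anything mapping memory has to align to that value. A minimal Python sketch for querying it at runtime:

```python
import mmap, os

# Page size differs by platform: 16384 on Apple Silicon macOS,
# typically 4096 on x86 Linux; mappings must be aligned to it.
print(mmap.PAGESIZE)              # constant exposed by the mmap module
print(os.sysconf("SC_PAGESIZE"))  # equivalent POSIX query
```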
rmtew
b3eab03055 Three things to get Windows CI working correctly: (#9047)
- Ensure the selected backend environment variable is persisted to the next step via $GITHUB_ENV (see the sketch after this entry).
- Explicitly set the shell to bash; otherwise the variable doesn't actually persist on Windows.
- Add an assertion to ensure the selected backend is actually used.
2025-02-12 14:41:00 -05:00
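A minimal sketch of the $GITHUB_ENV mechanism this commit relies on (the DEVICE variable here is hypothetical, not necessarily what the workflow sets): lines appended to the file GitHub Actions exposes via GITHUB_ENV become environment variables in subsequent steps, and the explicit bash shell keeps this behavior uniform on Windows runners.

```python
import os

# Hedged sketch: persist a backend selection to later workflow steps by
# appending a KEY=VALUE line to the file named by $GITHUB_ENV.
with open(os.environ["GITHUB_ENV"], "a") as env_file:
    env_file.write("DEVICE=CLANG\n")  # readable as $DEVICE in the next step
```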
Ahmed Harmouche
916d5e7f08 WebGPU f16 support (f16 bounty part 2) (#8653)
* WebGPU f16 support

* Don't enable f16 yet

* dtype tests passing after bitcast fix

* Maybe all WebGPU green?

* Require shader-f16 in examples

* Minor wgsl touchup

* 1 line shorter

* Simpler

* Add transcendental support

* log2 NaN location mismatch on Vulkan

* NaN skips
2025-02-12 19:46:53 +08:00
Ignacio Sica
aaed315fee add AMX support to LLVM (#8957)
* init amx support for llvm

* revert elf changes

* fix attributes for AMX asm calls

* add comments

* add llvm amx job to benchmarks

* cleanup

* cleanup

* hotfix: improve comments

* comment for aux buffers

* hotfix:

* move amx_tc to ClangRenderer

* merge master

* refactor

* add docs

* add corsix docs reference

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-12 16:01:18 +08:00
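For context on the AMX support above: per the corsix AMX docs the PR references, each fma32 instruction accumulates the outer product of the 64-byte X and Y registers into the Z tile. A rough numpy model of a single instruction's effect (a sketch, not tinygrad's code):

```python
import numpy as np

# One AMX fma32 instruction, modeled in numpy: z[j][i] += x[i] * y[j].
x = np.random.rand(16).astype(np.float32)  # X register: 64 bytes = 16 f32 lanes
y = np.random.rand(16).astype(np.float32)  # Y register: 64 bytes = 16 f32 lanes
z = np.zeros((16, 16), dtype=np.float32)   # Z: 16x16 f32 accumulator tile
z += np.outer(y, x)                        # outer-product accumulate
```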
George Hotz
45aae8a6bc hotfix: add External Benchmark Schedule to CI 2025-02-11 22:06:17 +08:00
chenyu
6c39aa4a6b adjust cuda ci test targets (#9014) 2025-02-10 15:29:59 -05:00
chenyu
f9898f7554 update gpuocelot commit (#9011) 2025-02-10 12:18:44 -05:00
qazal
b17ec42b56 remove const_arg (#9002)
* remove const_arg

* use -m pytest

* remove the test_const_arg test; a variable arg on CONST does not exist.

* use base in test_const_dtype
2025-02-10 12:45:11 +01:00
George Hotz
0568720a68 delete revectorize (#9000)
* delete revectorize

* test vectorized LLVM/CLANG

* idk about that

* was that the segfault?
2025-02-10 18:32:35 +08:00
George Hotz
2983285315 use HEX_REG_QEMU_INSN_CNT from qemu as a DSP timer [pr] (#8993)
* use HEX_REG_QEMU_INSN_CNT from qemu as a DSP timer [pr]

* add quantize test to dsp

* fix tests

* older onnx

* debug, let's see what's happening
2025-02-10 11:07:35 +08:00
nimlgen
52a69dd5e9 Revert "use am in training benchmarks (#8965)" (#8981)
This reverts commit 107e616857.
2025-02-09 15:43:45 +03:00
George Hotz
208097d488 try reducing testing deps [pr] (#8976)
* reduce testing deps

* break out test models

* add PR to models, add models to metal

* okay, not that

* mac cleanup

* mac typo

* other typo
2025-02-09 15:22:32 +08:00
nimlgen
107e616857 use am in training benchmarks (#8965)
* am in training benchmarks

* fix

* not needed anymore
2025-02-08 20:20:47 +03:00
qazal
e7182bbb2c fix "fatal bad object" log in process replay [pr] (#8966) 2025-02-08 11:57:38 +01:00
George Hotz
5bdd6a1cc4 increase CI speed with more runners [pr] (#8961)
* increase CI speed with more runners [pr]

* splits + cleanups [pr]

* more runners

* need that dep

* split that too

* can't be minimal

* move test readme

* bugfix + naming

* one more split

* bump to 22.04
2025-02-08 09:04:36 +08:00
George Hotz
4de084a835 cleanup ci, split docs/autogen, testing_minimal, LLVM Speed [pr] (#8952)
* cleanup ci [pr]

* testing_minimal

* add hypothesis to minimal

* fail tiktoken import okay

* add LLVM speed test

* llvm speed w/o beam
2025-02-07 19:01:59 +08:00
George Hotz
9ed2d0dfa2 refactor into subactions (#8946)
* refactor into subactions

* this work?

* add shell

* move install opencl

* valid?

* support mac os x

* refactor other osx

* fix linux/osx

* fixes

* cleanups

* used everywhere

* no quotes

* quotes on true

* bugfixes

* this run?

* hardcode

* that

* process replay action

* fix checkout

* restore to branch

* fix caching

* fix osx python cache

* does replace function exist

* Revert "does replace function exist"

This reverts commit 622177c5a0.

* Revert "fix osx python cache"

This reverts commit e70d55cd93.

* user on osx to fix untar issue

* that
2025-02-07 18:06:44 +08:00
Ahmed Harmouche
133cacadde Autogen webgpu dawn, removing wgpu-py dependency (f16 support part 1) (#8646)
* Switch to dawn, all tests passing locally

* Use dawn-python

* Skip failing test

* Skip midcast and fix timestamp on metal ci

* Autogen webgpu

* Try fetch dawn lib again

* /usr/lib

* Without lib prefix

* Test autogen diff

* Delete webgpu support, move everything to ops_webgpu

* mypy fix

* Simplify, refactor

* Line savings

* No ResultContainer

* Type annotation for result

* Some more simplifications

* Why was this explicit sync used at all?

* Refactor: delete functions that are only used once

* Create shader module inline

* Clear unit tests cache, maybe that solves it

* That wasn't it

* Try deleting cache to pass failing weight compare

* weights_only=False for pytorch 2.6

* Simplify ctype array creation

* Remove nanosecond precision timestamps

* Simplify error handling

* Refactor, add back type annotations

* Deleted custom submit function, refactor

* read_buffer simplify

* Fix use after free, refactor

* Simplify supported_features

* Runtime docs

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-07 15:16:59 +08:00
George Hotz
dbda72f91d hotfix: raise line limit to 11200 for new webgpu backend 2025-02-07 14:29:20 +08:00
George Hotz
b1e1319972 ci speed on the enterprise plan [pr] (#8942) 2025-02-07 11:18:12 +08:00
George Hotz
0cbb7d7f1e hotfix: metal has known sync issue 2025-02-06 14:29:41 +08:00
uuuvn
a51c688f39 Cleanup llvm cleanup (and some clang things too) (#8871)
* Cleanup llvm cleanup (and some clang things too)

* Tests

* Tests 2

* forgot mockgpu

* more print some sources
2025-02-05 07:49:05 +08:00
George Hotz
56fa5c1191 dsp simulator (#8869)
* dsp simulator

* progress

* fix

* close on test tiny

* working

* less waste

* line savings

* Device DSP compiler

* mock DSP at the bottom

* DSP tests

* docker caching

* test update

* need load

* skip that test for CI DSP

* last touch

* ugh
2025-02-04 09:45:04 +08:00
chenyu
836cf42c2e fix rand_like for multi (#8880) 2025-02-03 19:00:14 -05:00
uuuvn
6dadb60c93 LLVM JIT (+autogen llvm instead of llvmlite) (#8486)
* LLVM JIT

* Autogen LLVM

* Update autogen

* Move things around

* even more non-determinism

* windows

* more autogen weirdness

* more windows stuff

* blind windows development try 2

* more blind windows development

* even more blind windows development

* maybe i should just set up a windows vm...

* why can't everyone just use sysv abi?

* cleanup debugging stuff

* unused import

* icache flushing isn't required on x86

* merge jit_nt and jit_unix

* more

* Temporary hack to not segfault

* better error

* bad conflict resolution

* Attempt to simplify support/llvm.py

* More refactoring

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-02 19:52:42 +08:00
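Dropping llvmlite means talking to the LLVM-C API directly through ctypes bindings. A minimal sketch of the idea, using real LLVM-C entry points but not tinygrad's actual autogen (find_library may need help locating libLLVM on some systems):

```python
import ctypes, ctypes.util

# Call the LLVM-C API through ctypes directly, with no llvmlite in between.
lib = ctypes.CDLL(ctypes.util.find_library("LLVM"))
lib.LLVMContextCreate.restype = ctypes.c_void_p
lib.LLVMContextDispose.argtypes = [ctypes.c_void_p]

ctx = lib.LLVMContextCreate()  # returns an LLVMContextRef
lib.LLVMContextDispose(ctx)
```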
chenyu
7f606fbde4 remove DEBUG=5 in windows ci test [pr] (#8803)
DEBUG=5 prints a lot of info, which is slow, and the output is not visible on CI if the test passed.
Also skip two tests that took 3 minutes on the python backend.
2025-01-29 14:18:17 -05:00
FICTURE7
ec120ce6b9 Fix allocator memory alignment (#8800)
* Fix allocator memory alignment

* Run `test_ops.py` using LLVM and CLANG on Windows
2025-01-29 21:03:17 +03:00
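A generic illustration of the alignment technique a fix like this usually involves (a sketch, not the PR's exact change): over-allocate, then round the base address up to the required multiple.

```python
import ctypes

def aligned_buffer(size: int, alignment: int = 32):
    # Over-allocate by `alignment` bytes, then round the start address up
    # to the next multiple of `alignment`; keep `raw` alive as the owner.
    raw = (ctypes.c_uint8 * (size + alignment))()
    addr = ctypes.addressof(raw)
    return raw, addr + (-addr) % alignment  # (owner, aligned address)
```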
b1tg
da464d039f fix windows ci cache (#8787)
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-01-28 13:22:15 +02:00
b1tg
5d62aa28dc Support CLANG backend on Windows (#8768)
* Support CLANG on Windows

* Put both backends in a windows ci

* remove coff loader

* use memmove

---------

Co-authored-by: b1tg <b1tg@users.noreply.github.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-01-28 18:19:34 +09:00
b1tg
efc7971090 add windows test to ci (#8761)
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-01-27 14:53:21 +09:00
George Hotz
1b4618e257 gradient cleanup (#8750)
* switch backward to use gradient [pr]

* set device correctly, dedup

* why does that fail?

* add noop cast

* simple backward

* fix beautiful_mnist

* touchups

* set in compute_gradient

* uop_count

* uop_count was wrong

* collections

* no note

* skip that test

* update sched kernel counts

* train mnist is 65

* fix metadata and gc

* fixes

* materialize_grads

* no pathlib stuff

* add contiguous_backward, fix bugs

* add some realize

* fix multi

* remove unused backward passes [pr]

* lower line count
2025-01-26 09:30:55 +09:00
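The switch above replaces per-op backward passes with a functional gradient API. A hedged usage sketch, assuming the Tensor.gradient(*sources) signature tinygrad settled on (it may differ at this exact commit):

```python
from tinygrad import Tensor

x = Tensor([3.0], requires_grad=True)
y = (x * x).sum()      # y = x^2
dx, = y.gradient(x)    # functional gradient w.r.t. the given sources
print(dx.numpy())      # -> [6.]
```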
chenyu
0c759e1ff6 add bert to benchmark ci (#8741)
with `DISABLE_DROPOUT=1 BERT_LAYERS=2` for now
2025-01-24 14:45:11 -05:00
George Hotz
e82ba1454b MultiLazyBuffer is UOp [pr] (#8662)
* MultiLazyBuffer is UOp [pr]

* this is new mlb

* this is the idea

* progress

* multitensor works

* more movement ops

* this

* MultiLazyBuffer is UOp

* cleanups

* multi axis

* fix more tests

* work

* not that

* add multi grad and move shard to ops

* mops not views

* no double contig

* sweet, all mt tests passing

* port old logic

* remove lbs

* fix realized

* whitespace

* assign tweak

* test_assign_kv_cache_multi passes

* fix is_realized

* fix JIT for multi

* just a few more lines i'll pay them back soon i swear please bro just a few more

* no split reduceop for multi
2025-01-24 13:28:55 +09:00
George Hotz
46a8c5e1e5 delete forced_realize (#8615)
* delete forced_realize

* put that back

* expectedFailures

* cleaner create_subbuffer

* more comments

---------

Co-authored-by: qazal <qazal.software@gmail.com>
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2025-01-20 09:40:36 -08:00
nimlgen
9d3c40601f am: fast memory manager (#8654)
* start

* progress

* fixes

* smth

* mini fixes

* fix2

* ugh, need this for now

* faster

* cleanups

* tiny linters

* make mypy happier

* test & free pts

* ops

* linter

* cleanup vm

* fix

* remove map_from

* tiny fixes

* add test to ci
2025-01-20 16:58:22 +03:00
ignaciosica
d2234e308a tf32 tc for nv and ptx (#8635)
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-01-17 17:43:57 -08:00
nimlgen
f671da6755 ci: add AM start time to benchmark (#8637)
* ci: add AM start time to benchmark

* am: unlock it

* add AMD

* revert this
2025-01-16 14:47:36 +03:00
chenyu
4ee3243c93 JITBEAM=2 for LLaMA-3 8B on 4 GPUs [pr] (#8623)
is it fast?
2025-01-14 19:52:38 -05:00
George Hotz
bfbe81df71 remove cast before view (#8613)
* remove cast before view

* greener

* indexing

* that passes too

* openpilot too

* ack

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2025-01-14 15:04:58 -05:00
chenyu
393eec3201 raise RuntimeError for uneven shard [pr] (#8593)
no 7B llama on 6 GPUs

skip 70B
2025-01-14 14:51:48 -05:00
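The check above rejects shard axes that don't divide evenly across the device list, which is why 7B llama can't run on 6 GPUs. A hedged sketch of the failure mode, assuming the Tensor.shard(devices, axis) API, with CLANG devices used purely for illustration:

```python
from tinygrad import Tensor

t = Tensor.empty(7, 4)
devices = tuple(f"CLANG:{i}" for i in range(6))
t.shard(devices, axis=0)  # 7 rows over 6 devices -> expected RuntimeError
```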
ignaciosica
d5a646d492 CUDA Turing TC (#8597)
* init turing tc

* reorder tc

* hotfix: remove some spaces

* revert var name to x

* consistent order of factors

* revert order of terms to match old stuff

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-01-14 10:35:14 -08:00
nimlgen
1ff6862a3d ci: sleep a bit to let the driver unload the prev pid (#8605) 2025-01-14 15:55:23 +03:00
nimlgen
74b83c4c41 am in ci (#8532)
* try am in ci

* no sudo

* temp

* run more am test

* run half on am

* insert amdgpu

* other machine as well
2025-01-13 19:55:17 +03:00
qazal
2f71a00236 remove PYTHONPATH=. from mypy ci [pr] (#8578) 2025-01-12 09:52:03 -08:00
qazal
98c9e23560 remove global PYTHONPATH setting in CI (test.yml) [pr] (#8568)
* remove global PYTHONPATH setting in CI [pr]

* only run mypy in tinygrad/

* still needed for benchmarks
2025-01-11 12:47:50 -05:00
qazal
60503c8621 use CAPTURE_PROCESS_REPLAY=1 in CI [pr] (#8564) 2025-01-11 06:03:48 -05:00
nimlgen
aa3d612df2 add script to install amd mockgpu on macOS (#8536)
* upload artifact every time

* hm

* sh script

* hm

* hm2

* hm2

* hm2

* no sudo

* def paths

* small comments

* text

* try auth for bigger limits
2025-01-09 01:29:25 +03:00
patrini32
21c7d7c71a MOCKGPU amd test on OSX (#8505)
* add tests

* Refactor

* cache only amd/comgr/build (saves a lot of space)

* fix

* silence warning and add check for cache hit before installing cmake

* run only pytest

* use actions/cache

* lower timeout-minutes and add Device.DEFAULT step

* add nvidia to Device.DEFAULT check

* typo

* fix

* Check only for amd and run only 2 test
2025-01-08 14:27:56 +03:00
chenyu
85a4397f27 fix create_schedule_with_vars usage in allreduce benchmark [pr] (#8522)
* fix create_schedule_with_vars usage in allreduce benchmark [pr]

because i didn't know how to use it...

* increase time limit because tiny17 is slow
2025-01-07 01:30:01 -05:00
chenyu
0061dc7447 fix benchmark allreduce and add to ci [pr] (#8521) 2025-01-07 00:37:59 -05:00