tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-08 06:34:03 -05:00

Author	SHA1	Message	Date
George Hotz	fd02ab1e8b	move disassemblers and openpilot (#4592 ) * move disassemblers and openpilot * delete junk * put that in pre-commit * fixup readme	2024-05-14 19:30:02 -07:00
nimlgen	9b02aef45a	remove rhip (#4579 ) * remove rhip * remove hip runner	2024-05-14 17:58:19 +03:00
nimlgen	2131556c2c	amd mockgpu (#4535 ) * start mock amd gpu * virt files * cleaner * init ci * small fixes * linter * better? * ugh * linter * fix * diable some * run shorter * fixes * add hcq test * fix * fix cmd revert	2024-05-14 14:28:04 +03:00
qazal	760776c59d	merge EfficientNet to C with clang job (#4426 ) * merge ImageNet to C with linters * add to clang * delete from linter	2024-05-05 20:33:12 +03:00
Francis Lata	bb849a57d1	[MLPerf] UNet3D dataloader (#4343 ) * add support for train/val datasets for kits19 * split dataset into train and val sets * add tests for kits19 dataloader * add MLPerf dataset tests to CI * update unet3d model_eval script * fix linting * add nibabel * fix how mock dataset gets created * update ref implementation with permalink and no edits * clean up test and update rand_flip implementation * cleanups	2024-04-28 22:34:18 -04:00
George Hotz	dbe3e1d548	or true fixes ci (#4283 ) * or true fixes ci * all with two pipes	2024-04-24 20:48:26 +08:00
George Hotz	9e53d6cffa	hotfix: 8000 lines	2024-04-22 20:58:16 +04:00
nimlgen	e6227bdb15	nv driver (#4044 ) * start * fix err 93 * gpu * ioctl mappings * alloc like cuda * semaphores * wait for semaphores value * start ops_nv * very simple kernels work * init several gpus * qmd dumper * dirty, but most of kernels work * always all test_ops * progress, more tests, stable * test_ops passes, gpt2 works but wth big fifo, wrap of fifo doesn't work, i think it's something coherency releated * need better sync * fix sync * alloc2 * all tests pass! * cleanup 1 * cleanup * multigpu, simple transfer * fix sync * correct init * nv_gpu autogen + sync bug fix * clean extra/nv_gpu_driver * p2p * clean up * remove old gen * small fixes * cleanup * cleanup 2 * small fixes * bigger queue size * cleanups * wait * fixed signals for devs * fix hang + parallel beam * small fixes * detect when local memory is big in kernel * correct assert * small fixes * correct tls size est * one va space * less lines * shorter * save 2 lines * save some lines * remove type ignores --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-04-22 19:50:20 +04:00
chenyu	f1d9d0a151	cleanup external_test_opt (#4234 ) no more OPT=2 or OPT=3, check strict number of kernels, enabled tests that fusion works now	2024-04-20 04:00:08 -04:00
Francis Lata	3644077a42	[MLPerf][UNet3D] Add DICE loss + metrics (#4204 ) * add DICE loss and metrics * update dice to include reference implementation's link * remove unused imports * remove unnecessary test file and update pred + label for metrics and losses test * add tests to CI + add exclusion of mlperf_unet3d --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2024-04-17 20:09:33 -04:00
qazal	ba8602612b	Fuzz all permutations of schedule (#4136 ) * simple toposort * fuzzer * init in_degree * move to tests * same seed * configure paths * internal graph * compare LazyBuffers * simpler * simple graph * assign works * simpler * fix JIT * upstream ci * move ci * fix the path * DEBUG=1 * limit max paths * launch a cmp kernel * Revert "launch a cmp kernel" This reverts commit `791c608992`. * exec ground truth * better perf * copy ground truth once * gpu allclose ast try1 * Revert "gpu allclose ast try1" This reverts commit `1f82103af3`. * prerealized bufs freezing * teeny cleanups * reuse Buffers * Revert "reuse Buffers" This reverts commit `a71de94b03`. --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-04-17 05:03:21 +04:00
George Hotz	8f749ae0eb	New docs are in mkdocs (#4178 ) * start mkdocs * simple docs for tensor * more docs * move those back * more docs * copy markdown extensions * docs legacy * docs building workflow * fix showcase links * only that? * install tinygrad * add docs to setup.py * Delete examples/llm.c/data	2024-04-16 10:59:51 +04:00
George Hotz	e14a9bca0c	hotfix: bump line count to 7500 for NV backend	2024-04-15 23:18:46 +04:00
George Hotz	10dbf90b2c	hotfix: test speed	2024-04-09 13:20:39 -07:00
geohotstan	15f2f39658	conceptually simpler fancy index (#3335 ) * init * add failed case * fix: temp comment out MULACC cast * is this right? * add test case * oops, forgot to get rid of temp test * WOOOOOO TOOK OUT 2 TRANSPOSES IN GATHER YAY * cleaner * comment cleanup * update docs * resolve conflict * oops * SUPA FAST * comment out a test * del some print statements * use new broadcast stuff * more clean up * move try except * skip fancy indexing for python backend test_ops	2024-04-09 11:18:04 -04:00
chenyu	a023a1ed87	update github action to actions/cache@v4 (#4077 ) get rid of warning `Node.js 16 actions are deprecated. Please update the following actions to use Node.js 20: actions/cache@v3.`	2024-04-04 22:24:26 -04:00
chenyu	9e0ebf8979	remove dtype from FlopCounter (#4075 ) the annoying thing to remove all FlopCounter is that for device that does not support local, matmul index alu is huge. we can remove the dtype first. sneak in updating `ruff` command to `ruff check`	2024-04-04 21:23:28 -04:00
George Hotz	2abb474d43	kfd driver wip (#3912 ) * kfd driver wip * cleanups * kfd almost ready to ring doorbell * ding dong? * issues with signals * something * works * ops kfd * add amd_signal_t * works...sometimes * program runs * _gpu_alloc cleanup * cleanups * work * header + enable profiling (#3959) * header + enable profiling * just cleaner * measure * only local time domain * remove old comments * fix with master * elf parsing (#3965) * elf parsing * fix kernels with private * not used * clean up * clean up 2 * add flags * kfd sdma (#3970) * working sdma * remove driver, shorter * all commands we might need * svm * kfd remove hardcoded values (#4007) * remove hardcoded values * match above line * 7k lines + revert hsa * update that from origin * fix sdma reg gen * not the updated SDMA * compiler_opts * don't require kfd_ioctl * get ioctls from python * get ioctls from python * remove build_sdma_command * merge into 64-bit fields * shorter * fix property spelling and off by one --------- Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>	2024-03-30 15:08:12 -07:00
chenyu	3fee689ded	fix ops_python for test_uops (#3982 )	2024-03-28 22:48:55 -04:00
George Hotz	629cbc5587	only abstractions 2 (#3947 )	2024-03-26 20:02:18 -07:00
Francis Lam	5530b0cbed	fuzz_linearizer: reduce debug verbosity and make easier for CI usage (#3942 ) * fuzz_linearizer: reduce debug verbosity and make easier for CI usage * rename FUZZ_BEAM to FUZZ_ALL_ACTIONS (not choosing a subset) * skip simple ASTs (easier to use with LOGOPS output) * don't fuzz a previously seen AST * add options to allow non-zero --expected-failures * clean up naming and use set	2024-03-26 16:25:24 -04:00
chenyu	2c69888654	include negative float in test_dtype (#3884 ) * include negative float in test_dtype * that is ub * too annoying * pack can overflow	2024-03-24 02:39:15 -04:00
chenyu	ee502c8055	fixup to_movement_ops and add back to CI (#3881 )	2024-03-22 18:14:49 -04:00
George Hotz	f4055439dc	don't include hip common (#3851 ) * don't install hip common * only that * Revert "only that" This reverts commit `85f22015d9`. * less * needed * sep comgr * header file * 6.0.2 * update hsa * hsakmt * Revert "hsakmt" This reverts commit `d3a118078e`.	2024-03-22 08:50:50 -07:00
chenyu	47b9cc2dfe	use float32 for rand buffer in test_beam_search and test in metal (#3831 )	2024-03-19 23:22:58 -04:00
chenyu	20681d5c4a	remove old dist multigpu (#3811 )	2024-03-18 18:31:05 -04:00
chenyu	5dd048a378	remove HIP in core tinygrad (#3810 ) * remove HIP in core tinygrad ci test uses device RHIP and HSA compiler (LinearizerOpt), so fine to remove HIP from tc. Also updated README and EMULATE tc test flag * EMULATE_CUDA	2024-03-18 18:19:27 -04:00
wozeparrot	a0ab755317	threefry again (#3785 ) * feat: initial xor * feat: initial threefly * feat: remove custom random * fix: really need to install precommit * feat: lmao forgot that this is rotate not a shift * clean: put that there * feat: numpy xor * feat: quick test for xor * feat: llvm xor * feat: slightly working xor in torch * feat: rand works in jit * clean: save a line * feat: match jax * feat: maybe test against jax * feat: requires_grad * fix: fix test_symbolic_ops * feat: lower alpha * feat: just pad * fix: maybe fix training tests? * fix: fix some llvm stuff * feat: cursed realize on the way out * feat: testing jax * fix: why is the jax install process not simple * fix: maybe passing test * fix: symbolic workarounds * clean: still need that precommit * fix: aaaa * fix: more test fixes * fix: quick fix for wgsl * feat: need to set requires_grad on the final tensor * feat: one more tensor * feat: don't take forever * feat: seeing y ci is brok * feat: can't allocate 64GiB lmao * fix: fix this * feat: hope this doesn't break smth before i go to bed * feat: don't destroy ram * feat: int * feat: remove jax * feat: properish workaround? * feat: skip slow webgpu tests * feat: no longer fails * feat: use dtypes * feat: real number * fix: torch * fix: don't test against reference for torch * feat: to device * feat: fix advanced indexing * feat: correct casting * feat: even rng_counter * feat: match master * feat: this was actually bad * fix: maybe? * feat: store * feat: remove realizes * feat: somehow this is important * feat: somehow this is also important * feat: save a line * fix: don't need that anymore * feat: restore this * fix: linter * feat: remove realizes * fix: realized is in base now * fix: add back cast * fix: bump deadline * fix: bump deadline * fix: bump deadline * fix: bump deadline * fix: bump deadline * fix: :( * fix: :( * fix: not being dumb * feat: try changing less tests * feat: shouldn't have to change that * feat: contiguous bumps it by one * fix: hmm * fix: numpy memory moment * fix: cl_khr_fp16 * fix: torch has different tensor count * fix: missing contiguous * hmm: hmm * fix: some fixes * fix: typing * feat: dont do that * feat: typing fixes * feat: why is this realize required? * feat: ngl kinda odd typing * feat: oh * feat: remove realizes * feat: why is this realize required? * fix: hacky patch for cudacpu * fix: without this realize pytest crashes????? * fix: shorter line * fix: cudacpu fixes * fix: cudacpu fixes * feat: real buffer * feat: don't search when searching lmao * fix: can't use contiguous things * fix: no more 100GB arrays * fix: revert * fix: skip 7 and 10 * feat: working ish beam * feat: minimize changes * feat: seed 0 stable diffusion example changed * fix: different on ci * fix: no beam * feat: make threefry optional * fix: check value * fix: unused import * feat: threefry default * fix: 5d * feat: allow non upcast div * fix: 5d better * fix: 5d better * fix: save all dtype * feat: proper error * feat: lazyop key * fix: check float * feat: try removing this realize now * feat: disable threefry for uops hip tensor cores * feat: don't need that * feat: only check upcast * fix: disable threefry for some metal tests * feat: disable for metal tensor uops as well * feat: disable for most uops * fix: disable threefry for new uops tests * feat: multitensor * fix: typing * feat: threefry default off * feat: skip threefry half rand * feat: restore old * fix: bad git * clean: ruff * feat: bfloat16 fix * fix: :\| * feat: restore old --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2024-03-18 16:47:07 -04:00
George Hotz	311cf2b7d3	Revert "threefry_2x32 (#2601 )" (#3784 ) This reverts commit `db3de54bc4`.	2024-03-17 10:27:20 -07:00
wozeparrot	db3de54bc4	threefry_2x32 (#2601 ) * feat: initial xor * feat: initial threefly * feat: remove custom random * fix: really need to install precommit * feat: lmao forgot that this is rotate not a shift * clean: put that there * feat: numpy xor * feat: quick test for xor * feat: llvm xor * feat: slightly working xor in torch * feat: rand works in jit * clean: save a line * feat: match jax * feat: maybe test against jax * feat: requires_grad * fix: fix test_symbolic_ops * feat: lower alpha * feat: just pad * fix: maybe fix training tests? * fix: fix some llvm stuff * feat: cursed realize on the way out * feat: testing jax * fix: why is the jax install process not simple * fix: maybe passing test * fix: symbolic workarounds * clean: still need that precommit * fix: aaaa * fix: more test fixes * fix: quick fix for wgsl * feat: need to set requires_grad on the final tensor * feat: one more tensor * feat: don't take forever * feat: seeing y ci is brok * feat: can't allocate 64GiB lmao * fix: fix this * feat: hope this doesn't break smth before i go to bed * feat: don't destroy ram * feat: int * feat: remove jax * feat: properish workaround? * feat: skip slow webgpu tests * feat: no longer fails * feat: use dtypes * feat: real number * fix: torch * fix: don't test against reference for torch * feat: to device * feat: fix advanced indexing * feat: correct casting * feat: even rng_counter * feat: match master * feat: this was actually bad * fix: maybe? * feat: store * feat: remove realizes * feat: somehow this is important * feat: somehow this is also important * feat: save a line * fix: don't need that anymore * feat: restore this * fix: linter * feat: remove realizes * fix: realized is in base now * fix: add back cast * fix: bump deadline * fix: bump deadline * fix: bump deadline * fix: bump deadline * fix: bump deadline * fix: :( * fix: :( * fix: not being dumb * feat: try changing less tests * feat: shouldn't have to change that * feat: contiguous bumps it by one * fix: hmm * fix: numpy memory moment * fix: cl_khr_fp16 * fix: torch has different tensor count * fix: missing contiguous * hmm: hmm * fix: some fixes * fix: typing * feat: dont do that * feat: typing fixes * feat: why is this realize required? * feat: ngl kinda odd typing * feat: oh * feat: remove realizes * feat: why is this realize required? * fix: hacky patch for cudacpu * fix: without this realize pytest crashes????? * fix: shorter line * fix: cudacpu fixes * fix: cudacpu fixes * feat: real buffer * feat: don't search when searching lmao * fix: can't use contiguous things * fix: no more 100GB arrays * fix: revert * fix: skip 7 and 10 * feat: working ish beam * feat: minimize changes * feat: seed 0 stable diffusion example changed * fix: different on ci * fix: no beam * feat: make threefry optional * fix: check value * fix: unused import * feat: threefry default * fix: 5d * feat: allow non upcast div * fix: 5d better * fix: 5d better * fix: save all dtype * feat: proper error * feat: lazyop key * fix: check float * feat: try removing this realize now * feat: disable threefry for uops hip tensor cores * feat: don't need that * feat: only check upcast * fix: disable threefry for some metal tests * feat: disable for metal tensor uops as well * feat: disable for most uops * fix: disable threefry for new uops tests * feat: multitensor * fix: typing * feat: threefry default off * feat: skip threefry half rand * feat: restore old * fix: bad git * clean: ruff * feat: bfloat16 fix * fix: :\| --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-03-17 10:19:33 -07:00
George Hotz	53adcb34f5	remove hip backend (#3783 ) * remove hip backend * remove unused * rhip * more RHIP	2024-03-17 10:12:16 -07:00
chenyu	8ea53951c1	bfloat16 Tensor.rand (#3764 ) * Tensor.rand for bfloat16 for numpy based random, generate one for float then cast for bfloat16. close #3653 * remove realize	2024-03-15 15:05:13 -04:00
chenyu	a2d3cf64a5	move is_dtype_supported to test.helpers (#3762 ) * move is_dtype_supported to test.helpers updated all places that check if float16 is supports * fix tests	2024-03-15 14:33:26 -04:00
chenyu	922f8319cb	Run test_real_world in METAL test (#3760 ) * clean up test_real_world * skip that * JIT=2 for metal * all device	2024-03-15 13:56:52 -04:00
qazal	bdd62c7fd8	make the bf16 include dynamic (#3642 ) * dynamic prefix * add common ones above these are common dtypes aesthetics * regression test fuzz it test * run in CI * use .append * faster	2024-03-07 10:31:35 -05:00
David Hou	0afaf70d57	lars optimizer + tests (#3631 ) * lars optimizer + tests * fix skip list! * use id to compare in skip list * go back to using set * Tensor(bool) * Tensor(bool) is and * don't lint external/mlperf_resnet * whitespace * add external_test_optim to opencl tests * give mlperf task a name * mlperf under onnx * remove track_gnorm * contiguous instead of realize * assert momentum and weight decay positive --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2024-03-06 18:11:01 -05:00
George Hotz	81baf3eed3	bring ptx back (#3623 ) * bring ptx back * ptx back * fix define var * fix a few bugs * bugfixes * fixes * fix llvm bug * fix test bug	2024-03-06 13:34:21 -08:00
George Hotz	568353fa84	hotfix: bump line count to 6500	2024-03-06 07:52:18 -08:00
chenyu	c3b8d285aa	cleanup uops (#3605 ) using `is` to compare with enums, remove long lines and slightly more compact	2024-03-04 11:03:14 -05:00
George Hotz	770707b376	hotfix: gpuocelot no rebuild	2024-03-02 15:57:38 -08:00
Francis Lam	162dfb07d9	fuzz_linearizer: fix uops and add to test.yml (#3588 )	2024-03-02 15:03:42 -08:00
Francis Lam	e17f1821a7	wmma: add CUDA tensor core and fix test_speed_v_torch failure (#3544 )	2024-03-01 17:51:02 -08:00
chenyu	b7e555f6c0	run test_linearizer_failures on PYTHON backend (#3565 ) * run test_linearizer_failures on PYTHON backend only test 1, some have hanging issues and gated store is not implemented * --durations=20 * two less slow ones	2024-03-01 17:00:18 -05:00
George Hotz	5a6e151844	no barrier side effect (#3550 ) * no barrier side effect * finish barrier removal	2024-02-29 18:10:04 -08:00
George Hotz	2c19ab6561	define var (#3548 ) * define var * remove vars from there * fix python symbolic ops * fix llvm * pypath	2024-02-29 16:43:27 -08:00
George Hotz	c34d382a1e	bump to macos-14 M1 (#3520 ) * bump to macos-14 M1 * bump cache key * no -n auto * jit=2 * real tensor cores	2024-02-28 10:28:25 -08:00
George Hotz	7698781389	Revert "wmma: add CUDA tensor core (#3464 )" (#3474 ) This reverts commit `e9cef13f0b`.	2024-02-22 11:58:16 +01:00
Francis Lam	e9cef13f0b	wmma: add CUDA tensor core (#3464 )	2024-02-22 11:57:08 +01:00
chenyu	7c0fc40123	enable test IMAGE=2 PYTHON=1 python3 test/test_ops.py TestOps.test_simple_conv2d (#3468 )	2024-02-21 18:30:12 -05:00
chenyu	77d2a4c12a	regenerate kernel dataset after reduce arg to axis change (#3467 ) ``` ./extra/optimization/generate_dataset.sh gzip /tmp/sops mv /tmp/sops.gz extra/datasets/ ```	2024-02-21 18:16:13 -05:00

1 2 3 4 5 ...

312 Commits