tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-04-07 03:00:26 -04:00

Author	SHA1	Message	Date
George Hotz	2abb474d43	kfd driver wip (#3912 ) * kfd driver wip * cleanups * kfd almost ready to ring doorbell * ding dong? * issues with signals * something * works * ops kfd * add amd_signal_t * works...sometimes * program runs * _gpu_alloc cleanup * cleanups * work * header + enable profiling (#3959) * header + enable profiling * just cleaner * measure * only local time domain * remove old comments * fix with master * elf parsing (#3965) * elf parsing * fix kernels with private * not used * clean up * clean up 2 * add flags * kfd sdma (#3970) * working sdma * remove driver, shorter * all commands we might need * svm * kfd remove hardcoded values (#4007) * remove hardcoded values * match above line * 7k lines + revert hsa * update that from origin * fix sdma reg gen * not the updated SDMA * compiler_opts * don't require kfd_ioctl * get ioctls from python * get ioctls from python * remove build_sdma_command * merge into 64-bit fields * shorter * fix property spelling and off by one --------- Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>	2024-03-30 15:08:12 -07:00
chenyu	3fee689ded	fix ops_python for test_uops (#3982)	2024-03-28 22:48:55 -04:00
George Hotz	629cbc5587	only abstractions 2 (#3947)	2024-03-26 20:02:18 -07:00
Francis Lam	5530b0cbed	fuzz_linearizer: reduce debug verbosity and make easier for CI usage (#3942) * fuzz_linearizer: reduce debug verbosity and make easier for CI usage * rename FUZZ_BEAM to FUZZ_ALL_ACTIONS (not choosing a subset) * skip simple ASTs (easier to use with LOGOPS output) * don't fuzz a previously seen AST * add options to allow non-zero --expected-failures * clean up naming and use set	2024-03-26 16:25:24 -04:00
chenyu	2c69888654	include negative float in test_dtype (#3884) * include negative float in test_dtype * that is ub * too annoying * pack can overflow	2024-03-24 02:39:15 -04:00
chenyu	ee502c8055	fixup to_movement_ops and add back to CI (#3881)	2024-03-22 18:14:49 -04:00
George Hotz	f4055439dc	don't include hip common (#3851) * don't install hip common * only that * Revert "only that" This reverts commit `85f22015d9`. * less * needed * sep comgr * header file * 6.0.2 * update hsa * hsakmt * Revert "hsakmt" This reverts commit `d3a118078e`.	2024-03-22 08:50:50 -07:00
chenyu	47b9cc2dfe	use float32 for rand buffer in test_beam_search and test in metal (#3831)	2024-03-19 23:22:58 -04:00
chenyu	20681d5c4a	remove old dist multigpu (#3811)	2024-03-18 18:31:05 -04:00
chenyu	5dd048a378	remove HIP in core tinygrad (#3810) * remove HIP in core tinygrad ci test uses device RHIP and HSA compiler (LinearizerOpt), so fine to remove HIP from tc. Also updated README and EMULATE tc test flag * EMULATE_CUDA	2024-03-18 18:19:27 -04:00
wozeparrot	a0ab755317	threefry again (#3785 ) * feat: initial xor * feat: initial threefly * feat: remove custom random * fix: really need to install precommit * feat: lmao forgot that this is rotate not a shift * clean: put that there * feat: numpy xor * feat: quick test for xor * feat: llvm xor * feat: slightly working xor in torch * feat: rand works in jit * clean: save a line * feat: match jax * feat: maybe test against jax * feat: requires_grad * fix: fix test_symbolic_ops * feat: lower alpha * feat: just pad * fix: maybe fix training tests? * fix: fix some llvm stuff * feat: cursed realize on the way out * feat: testing jax * fix: why is the jax install process not simple * fix: maybe passing test * fix: symbolic workarounds * clean: still need that precommit * fix: aaaa * fix: more test fixes * fix: quick fix for wgsl * feat: need to set requires_grad on the final tensor * feat: one more tensor * feat: don't take forever * feat: seeing y ci is brok * feat: can't allocate 64GiB lmao * fix: fix this * feat: hope this doesn't break smth before i go to bed * feat: don't destroy ram * feat: int * feat: remove jax * feat: properish workaround? * feat: skip slow webgpu tests * feat: no longer fails * feat: use dtypes * feat: real number * fix: torch * fix: don't test against reference for torch * feat: to device * feat: fix advanced indexing * feat: correct casting * feat: even rng_counter * feat: match master * feat: this was actually bad * fix: maybe? 
* feat: store * feat: remove realizes * feat: somehow this is important * feat: somehow this is also important * feat: save a line * fix: don't need that anymore * feat: restore this * fix: linter * feat: remove realizes * fix: realized is in base now * fix: add back cast * fix: bump deadline * fix: bump deadline * fix: bump deadline * fix: bump deadline * fix: bump deadline * fix: :( * fix: :( * fix: not being dumb * feat: try changing less tests * feat: shouldn't have to change that * feat: contiguous bumps it by one * fix: hmm * fix: numpy memory moment * fix: cl_khr_fp16 * fix: torch has different tensor count * fix: missing contiguous * hmm: hmm * fix: some fixes * fix: typing * feat: dont do that * feat: typing fixes * feat: why is this realize required? * feat: ngl kinda odd typing * feat: oh * feat: remove realizes * feat: why is this realize required? * fix: hacky patch for cudacpu * fix: without this realize pytest crashes????? * fix: shorter line * fix: cudacpu fixes * fix: cudacpu fixes * feat: real buffer * feat: don't search when searching lmao * fix: can't use contiguous things * fix: no more 100GB arrays * fix: revert * fix: skip 7 and 10 * feat: working ish beam * feat: minimize changes * feat: seed 0 stable diffusion example changed * fix: different on ci * fix: no beam * feat: make threefry optional * fix: check value * fix: unused import * feat: threefry default * fix: 5d * feat: allow non upcast div * fix: 5d better * fix: 5d better * fix: save all dtype * feat: proper error * feat: lazyop key * fix: check float * feat: try removing this realize now * feat: disable threefry for uops hip tensor cores * feat: don't need that * feat: only check upcast * fix: disable threefry for some metal tests * feat: disable for metal tensor uops as well * feat: disable for most uops * fix: disable threefry for new uops tests * feat: multitensor * fix: typing * feat: threefry default off * feat: skip threefry half rand * feat: restore old * fix: bad git * 
clean: ruff * feat: bfloat16 fix * fix: :| * feat: restore old --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2024-03-18 16:47:07 -04:00
George Hotz	311cf2b7d3	Revert "threefry_2x32 (#2601)" (#3784) This reverts commit `db3de54bc4`.	2024-03-17 10:27:20 -07:00
wozeparrot	db3de54bc4	threefry_2x32 (#2601 ) * feat: initial xor * feat: initial threefly * feat: remove custom random * fix: really need to install precommit * feat: lmao forgot that this is rotate not a shift * clean: put that there * feat: numpy xor * feat: quick test for xor * feat: llvm xor * feat: slightly working xor in torch * feat: rand works in jit * clean: save a line * feat: match jax * feat: maybe test against jax * feat: requires_grad * fix: fix test_symbolic_ops * feat: lower alpha * feat: just pad * fix: maybe fix training tests? * fix: fix some llvm stuff * feat: cursed realize on the way out * feat: testing jax * fix: why is the jax install process not simple * fix: maybe passing test * fix: symbolic workarounds * clean: still need that precommit * fix: aaaa * fix: more test fixes * fix: quick fix for wgsl * feat: need to set requires_grad on the final tensor * feat: one more tensor * feat: don't take forever * feat: seeing y ci is brok * feat: can't allocate 64GiB lmao * fix: fix this * feat: hope this doesn't break smth before i go to bed * feat: don't destroy ram * feat: int * feat: remove jax * feat: properish workaround? * feat: skip slow webgpu tests * feat: no longer fails * feat: use dtypes * feat: real number * fix: torch * fix: don't test against reference for torch * feat: to device * feat: fix advanced indexing * feat: correct casting * feat: even rng_counter * feat: match master * feat: this was actually bad * fix: maybe? 
* feat: store * feat: remove realizes * feat: somehow this is important * feat: somehow this is also important * feat: save a line * fix: don't need that anymore * feat: restore this * fix: linter * feat: remove realizes * fix: realized is in base now * fix: add back cast * fix: bump deadline * fix: bump deadline * fix: bump deadline * fix: bump deadline * fix: bump deadline * fix: :( * fix: :( * fix: not being dumb * feat: try changing less tests * feat: shouldn't have to change that * feat: contiguous bumps it by one * fix: hmm * fix: numpy memory moment * fix: cl_khr_fp16 * fix: torch has different tensor count * fix: missing contiguous * hmm: hmm * fix: some fixes * fix: typing * feat: dont do that * feat: typing fixes * feat: why is this realize required? * feat: ngl kinda odd typing * feat: oh * feat: remove realizes * feat: why is this realize required? * fix: hacky patch for cudacpu * fix: without this realize pytest crashes????? * fix: shorter line * fix: cudacpu fixes * fix: cudacpu fixes * feat: real buffer * feat: don't search when searching lmao * fix: can't use contiguous things * fix: no more 100GB arrays * fix: revert * fix: skip 7 and 10 * feat: working ish beam * feat: minimize changes * feat: seed 0 stable diffusion example changed * fix: different on ci * fix: no beam * feat: make threefry optional * fix: check value * fix: unused import * feat: threefry default * fix: 5d * feat: allow non upcast div * fix: 5d better * fix: 5d better * fix: save all dtype * feat: proper error * feat: lazyop key * fix: check float * feat: try removing this realize now * feat: disable threefry for uops hip tensor cores * feat: don't need that * feat: only check upcast * fix: disable threefry for some metal tests * feat: disable for metal tensor uops as well * feat: disable for most uops * fix: disable threefry for new uops tests * feat: multitensor * fix: typing * feat: threefry default off * feat: skip threefry half rand * feat: restore old * fix: bad git * 
clean: ruff * feat: bfloat16 fix * fix: :| --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-03-17 10:19:33 -07:00
George Hotz	53adcb34f5	remove hip backend (#3783) * remove hip backend * remove unused * rhip * more RHIP	2024-03-17 10:12:16 -07:00
chenyu	8ea53951c1	bfloat16 Tensor.rand (#3764) * Tensor.rand for bfloat16 for numpy based random, generate one for float then cast for bfloat16. close #3653 * remove realize	2024-03-15 15:05:13 -04:00
chenyu	a2d3cf64a5	move is_dtype_supported to test.helpers (#3762) * move is_dtype_supported to test.helpers updated all places that check if float16 is supports * fix tests	2024-03-15 14:33:26 -04:00
chenyu	922f8319cb	Run test_real_world in METAL test (#3760) * clean up test_real_world * skip that * JIT=2 for metal * all device	2024-03-15 13:56:52 -04:00
qazal	bdd62c7fd8	make the bf16 include dynamic (#3642 ) * dynamic prefix * add common ones above these are common dtypes aesthetics * regression test fuzz it test * run in CI * use .append * faster	2024-03-07 10:31:35 -05:00
David Hou	0afaf70d57	lars optimizer + tests (#3631 ) * lars optimizer + tests * fix skip list! * use id to compare in skip list * go back to using set * Tensor(bool) * Tensor(bool) is and * don't lint external/mlperf_resnet * whitespace * add external_test_optim to opencl tests * give mlperf task a name * mlperf under onnx * remove track_gnorm * contiguous instead of realize * assert momentum and weight decay positive --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2024-03-06 18:11:01 -05:00
George Hotz	81baf3eed3	bring ptx back (#3623 ) * bring ptx back * ptx back * fix define var * fix a few bugs * bugfixes * fixes * fix llvm bug * fix test bug	2024-03-06 13:34:21 -08:00
George Hotz	568353fa84	hotfix: bump line count to 6500	2024-03-06 07:52:18 -08:00
chenyu	c3b8d285aa	cleanup uops (#3605 ) using `is` to compare with enums, remove long lines and slightly more compact	2024-03-04 11:03:14 -05:00
George Hotz	770707b376	hotfix: gpuocelot no rebuild	2024-03-02 15:57:38 -08:00
Francis Lam	162dfb07d9	fuzz_linearizer: fix uops and add to test.yml (#3588)	2024-03-02 15:03:42 -08:00
Francis Lam	e17f1821a7	wmma: add CUDA tensor core and fix test_speed_v_torch failure (#3544)	2024-03-01 17:51:02 -08:00
chenyu	b7e555f6c0	run test_linearizer_failures on PYTHON backend (#3565) * run test_linearizer_failures on PYTHON backend only test 1, some have hanging issues and gated store is not implemented * --durations=20 * two less slow ones	2024-03-01 17:00:18 -05:00
George Hotz	5a6e151844	no barrier side effect (#3550) * no barrier side effect * finish barrier removal	2024-02-29 18:10:04 -08:00
George Hotz	2c19ab6561	define var (#3548) * define var * remove vars from there * fix python symbolic ops * fix llvm * pypath	2024-02-29 16:43:27 -08:00
George Hotz	c34d382a1e	bump to macos-14 M1 (#3520) * bump to macos-14 M1 * bump cache key * no -n auto * jit=2 * real tensor cores	2024-02-28 10:28:25 -08:00
George Hotz	7698781389	Revert "wmma: add CUDA tensor core (#3464)" (#3474) This reverts commit `e9cef13f0b`.	2024-02-22 11:58:16 +01:00
Francis Lam	e9cef13f0b	wmma: add CUDA tensor core (#3464)	2024-02-22 11:57:08 +01:00
chenyu	7c0fc40123	enable test IMAGE=2 PYTHON=1 python3 test/test_ops.py TestOps.test_simple_conv2d (#3468)	2024-02-21 18:30:12 -05:00
chenyu	77d2a4c12a	regenerate kernel dataset after reduce arg to axis change (#3467) `./extra/optimization/generate_dataset.sh; gzip /tmp/sops; mv /tmp/sops.gz extra/datasets/`	2024-02-21 18:16:13 -05:00
George Hotz	871ba73e65	_reduce_op is axis based now (#3462) * _reduce_op is axis based now * axis_ * update lin failures * disable that * fix shape	2024-02-21 16:36:31 +01:00
qazal	7864fb69d1	delete MovementOps (#3434) * delete MovementOps * keep extra/to_movement_ops.py	2024-02-19 23:21:44 +01:00
Patrick Tsai	ac9d94a068	Cast correctly in python emulator (dtype tests pass) (#3446 ) * Cast correctly in python emulator * Update test yml and fix lint * make ruff pass * mypy passes --------- Co-authored-by: Patrick Tsai <patosai@users.noreply.github.com>	2024-02-19 13:34:02 +01:00
George Hotz	b1c0d8c99d	remove cpu and torch backends (#3399 ) * remove cpu and torch backends * don't copy to cpu * use clang instead of cpu * multitensor gathers on the first device * clang is cpu + use default * fixup * bugfix	2024-02-15 16:55:39 +01:00
Obada Khalili	75f7e21a80	Make tests in `test/test_ops.py` pass for Python emulator (#3384 ) * fix OverflowError in UnaryOps.EXP2 * avoid accessing outputs for void uops * skip execution for UOps.IF and UOps.ENDIF * initialize bytearray to the correct size in UOps.DEFINE_LOCAL * validate len of input that has .sz > 1 * remove comment in code * reinitialize loop of already iterated * validate first value in input to be a list for inputs with .sz > 1 * add python ops tests to CI * skip long runtime tests for PYTHON backend * respect dtype.sz arg in UOps.CONST, and remove incorrect validation in UOps.STORE * use math.inf instead of float('int') * handle 0 args to UnaryOPs.LOG2 * handle load op with default of .sz > 1 * initialize the loop correctly using UOps.LOOP arg * remove unnecessary TODO comment * remove newline * select a subset of 22 ops tests to skip in CI when PYTHON=1 * handle gated UOps.LOAD referencing values that have .sz > 1 * Revert "select a subset of 22 ops tests to skip in CI when PYTHON=1" This reverts commit `7674fee81d`. * skip tests in python backend CI command * push fix lost in conflict resolve * Revert "skip long runtime tests for PYTHON backend" This reverts commit `5dd2a0376e`. * clear loop state after last iteration	2024-02-15 16:40:25 +01:00
qazal	49cb1fee54	run test_indexing on remu (#3404 ) * emulated ops_hip infra * add int4 * include test_indexing in remu * Revert "Merge branch 'remu-dev-mac'" This reverts commit `6870457e57`, reversing changes made to `3c4c8c9e16`.	2024-02-15 11:52:40 +01:00
qazal	27f4de2ce4	delete half_prekernel (#3388 ) * generic rendering of half and bf16 hotfix * fix uops + regression test * fix the test for metal's half4 * uop.uop fixup * mypy with --strict-equality, fix ops_gpu	2024-02-14 15:40:48 +01:00
qazal	c8fd66a131	Run RDNA3 tensor core tests in CI (#3367) * add test_linearizer * skip test_padto_matmul	2024-02-11 19:54:06 -05:00
Francis Lam	ce21fdfb67	ops_python: add HIP tensor core mock and refactor METAL (#3354) * ops_python: add HIP tensor core mock and refactor METAL * Add tests to CI * add DEBUG=2 to full tests --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-02-09 12:46:06 +01:00
George Hotz	b385234961	oops, change to 3.12 (#3357)	2024-02-09 12:21:06 +01:00
George Hotz	7726eef464	ops_python: add image support (#3356) * ops_python: add image support * uops tests in their own CI * fix ci	2024-02-09 12:02:06 +01:00
George Hotz	c32ea95d7d	Python uop emulator (#3327) * start uop emu * tiny_add passes * more ops * emulate the whole warp * test_gemm passes * metal gemm test pass * works on big gemm * works on big gemm * more tests pass * touch ups * fix mypy * cleanups * exp2 mypy * arch is where it belongs * actually emulate tensor cores * fix test * new style	2024-02-08 19:24:55 +01:00
chenyu	0d2dacb549	test intermediate tensors created by function have same device as input (#3338 ) run on TORCH since it's the fastest one on CI. caught a bug in multinomial, and update the behavior of fancy index and gather to move the indices Tensor to same device as self.	2024-02-07 09:24:36 -05:00
qazal	5b46b0ff3d	Simple RDNA3 emulator (#2974 ) * mockhip->hipcpu * allocate buffers * launch a kernel read_asm api * run remu in CI * remu 0.0.2, real test ops * simple driver * 0.0.3, all test_ops * run the latest emulator * 9 minutes is way too long, drop backprop in CI * bring back the backward pass * Revert "bring back the backward pass" This reverts commit `3781e1bc56`. * Print slowest tests * emulated device directly in ops_hip * fix ruff, override mypy for specific rules * test in the same code path - hip backend env variables - install packages and verify autogen - run certain tests - remove the other hip tests path - verify Device.DEFAULT * remove the emulated hip in extra --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-01-30 10:39:28 -08:00
George Hotz	0aad8d238b	rebuild ocelot (#3259) * rebuild * strip trailing whitespace	2024-01-26 18:46:36 -08:00
George Hotz	03a6bc59c1	move autogen to runtime/autogen (#3254)	2024-01-26 12:44:19 -08:00
George Hotz	a3869ffd46	move gpuctypes in tree (#3253 ) * move gpuctypes in tree * fix mypy * regex exclude * autogen sh * mypy exclude * does that fix it * fix mypy * add hip confirm * verify all autogens * build clang2py * opencl headers * gpu on 22.04	2024-01-26 12:25:03 -08:00


295 Commits