tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-09 06:58:11 -05:00

Author	SHA1	Message	Date
chenyu	5dd048a378	remove HIP in core tinygrad (#3810 ) * remove HIP in core tinygrad ci test uses device RHIP and HSA compiler (LinearizerOpt), so fine to remove HIP from tc. Also updated README and EMULATE tc test flag * EMULATE_CUDA	2024-03-18 18:19:27 -04:00
Francis Lam	a7afd2f6bf	test_linearizer_failures: add failing kernel from GPT2 CUDA (#3808 ) * test_linearizer_failures: add failing kernel from GPT2 CUDA * test_linearizer_failure: remove "HIP" from failed_platforms	2024-03-18 17:16:40 -04:00
George Hotz	d8296d4a3f	simple assign tests (#3807 )	2024-03-18 13:57:01 -07:00
wozeparrot	a0ab755317	threefry again (#3785 ) * feat: initial xor * feat: initial threefly * feat: remove custom random * fix: really need to install precommit * feat: lmao forgot that this is rotate not a shift * clean: put that there * feat: numpy xor * feat: quick test for xor * feat: llvm xor * feat: slightly working xor in torch * feat: rand works in jit * clean: save a line * feat: match jax * feat: maybe test against jax * feat: requires_grad * fix: fix test_symbolic_ops * feat: lower alpha * feat: just pad * fix: maybe fix training tests? * fix: fix some llvm stuff * feat: cursed realize on the way out * feat: testing jax * fix: why is the jax install process not simple * fix: maybe passing test * fix: symbolic workarounds * clean: still need that precommit * fix: aaaa * fix: more test fixes * fix: quick fix for wgsl * feat: need to set requires_grad on the final tensor * feat: one more tensor * feat: don't take forever * feat: seeing y ci is brok * feat: can't allocate 64GiB lmao * fix: fix this * feat: hope this doesn't break smth before i go to bed * feat: don't destroy ram * feat: int * feat: remove jax * feat: properish workaround? * feat: skip slow webgpu tests * feat: no longer fails * feat: use dtypes * feat: real number * fix: torch * fix: don't test against reference for torch * feat: to device * feat: fix advanced indexing * feat: correct casting * feat: even rng_counter * feat: match master * feat: this was actually bad * fix: maybe? 
* feat: store * feat: remove realizes * feat: somehow this is important * feat: somehow this is also important * feat: save a line * fix: don't need that anymore * feat: restore this * fix: linter * feat: remove realizes * fix: realized is in base now * fix: add back cast * fix: bump deadline * fix: bump deadline * fix: bump deadline * fix: bump deadline * fix: bump deadline * fix: :( * fix: :( * fix: not being dumb * feat: try changing less tests * feat: shouldn't have to change that * feat: contiguous bumps it by one * fix: hmm * fix: numpy memory moment * fix: cl_khr_fp16 * fix: torch has different tensor count * fix: missing contiguous * hmm: hmm * fix: some fixes * fix: typing * feat: dont do that * feat: typing fixes * feat: why is this realize required? * feat: ngl kinda odd typing * feat: oh * feat: remove realizes * feat: why is this realize required? * fix: hacky patch for cudacpu * fix: without this realize pytest crashes????? * fix: shorter line * fix: cudacpu fixes * fix: cudacpu fixes * feat: real buffer * feat: don't search when searching lmao * fix: can't use contiguous things * fix: no more 100GB arrays * fix: revert * fix: skip 7 and 10 * feat: working ish beam * feat: minimize changes * feat: seed 0 stable diffusion example changed * fix: different on ci * fix: no beam * feat: make threefry optional * fix: check value * fix: unused import * feat: threefry default * fix: 5d * feat: allow non upcast div * fix: 5d better * fix: 5d better * fix: save all dtype * feat: proper error * feat: lazyop key * fix: check float * feat: try removing this realize now * feat: disable threefry for uops hip tensor cores * feat: don't need that * feat: only check upcast * fix: disable threefry for some metal tests * feat: disable for metal tensor uops as well * feat: disable for most uops * fix: disable threefry for new uops tests * feat: multitensor * fix: typing * feat: threefry default off * feat: skip threefry half rand * feat: restore old * fix: bad git * 
clean: ruff * feat: bfloat16 fix * fix: :| * feat: restore old --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2024-03-18 16:47:07 -04:00
Elias Wahl	7af7467f22	Fix: BEAM search with PTX fails (#3786 ) * Pass Uopgraph instead of list * Add search compile linearizer test * Rename * fix lint * Remove test. Add UOpGraph type	2024-03-18 16:13:32 -04:00
nimlgen	629757eaa1	hotfix: update inputs of correct transfers in hsagraph (#3800 ) * hotfix: update inputs of correct transfers in hsagraph * test it * run in ci?	2024-03-18 15:52:27 -04:00
chenyu	1711274654	7B llama on 4 gpus on benchmark (#3804 )	2024-03-18 14:32:37 -04:00
qazal	d79a1d315b	add outbufs back (#3803 ) * update outcounts * update JIT * refactor search * hsa uses outcount	2024-03-18 10:30:53 -07:00
George Hotz	7fe08fa5a0	hotfix: put that assign hack back	2024-03-18 09:25:05 -07:00
George Hotz	0183a05f0a	test assign (#3798 ) * Reapply "add failing assign test (#3796)" (#3797) This reverts commit `1e1beb888c`. * no realized check	2024-03-18 08:58:04 -07:00
George Hotz	1e1beb888c	Revert "add failing assign test (#3796 )" (#3797 ) This reverts commit `2dea12832c`.	2024-03-18 08:55:36 -07:00
George Hotz	2dea12832c	add failing assign test (#3796 ) * that was a hack * tests to reveal the issue * add assign for realized assign	2024-03-18 08:47:30 -07:00
nimlgen	e78df485c7	update inputs for transfers in hsagraph (#3560 )	2024-03-18 18:01:04 +03:00
George Hotz	086291e8c6	hotfix: add test for JIT reset	2024-03-17 21:35:49 -07:00
chenyu	dccefab23f	remove mixtral weight to clang first (#3792 ) seems fine without it now	2024-03-17 23:33:17 -04:00
George Hotz	bf3e1c4df2	support pickling tensors and others (#3787 ) * test pickle tensors * pickle unrealized tensor * pickle jit, don't save Device in every CompiledASTRunner * real test of pickle, move delete	2024-03-17 18:29:14 -07:00
chenyu	5ac1fa933f	apply the same fix_bf16 in llama and coder (#3789 ) * apply the same fix_bf16 in llama and coder did not realize the same logic was in llama too. really fix #2775 * flag for native SUPPORT_BF16 cast	2024-03-17 21:25:24 -04:00
chenyu	639bd5dbfc	move bf16 cast hack to Tensor.llvm_bf16_cast (#3788 )	2024-03-17 18:51:22 -04:00
George Hotz	311cf2b7d3	Revert "threefry_2x32 (#2601 )" (#3784 ) This reverts commit `db3de54bc4`.	2024-03-17 10:27:20 -07:00
wozeparrot	db3de54bc4	threefry_2x32 (#2601 ) * feat: initial xor * feat: initial threefly * feat: remove custom random * fix: really need to install precommit * feat: lmao forgot that this is rotate not a shift * clean: put that there * feat: numpy xor * feat: quick test for xor * feat: llvm xor * feat: slightly working xor in torch * feat: rand works in jit * clean: save a line * feat: match jax * feat: maybe test against jax * feat: requires_grad * fix: fix test_symbolic_ops * feat: lower alpha * feat: just pad * fix: maybe fix training tests? * fix: fix some llvm stuff * feat: cursed realize on the way out * feat: testing jax * fix: why is the jax install process not simple * fix: maybe passing test * fix: symbolic workarounds * clean: still need that precommit * fix: aaaa * fix: more test fixes * fix: quick fix for wgsl * feat: need to set requires_grad on the final tensor * feat: one more tensor * feat: don't take forever * feat: seeing y ci is brok * feat: can't allocate 64GiB lmao * fix: fix this * feat: hope this doesn't break smth before i go to bed * feat: don't destroy ram * feat: int * feat: remove jax * feat: properish workaround? * feat: skip slow webgpu tests * feat: no longer fails * feat: use dtypes * feat: real number * fix: torch * fix: don't test against reference for torch * feat: to device * feat: fix advanced indexing * feat: correct casting * feat: even rng_counter * feat: match master * feat: this was actually bad * fix: maybe? 
* feat: store * feat: remove realizes * feat: somehow this is important * feat: somehow this is also important * feat: save a line * fix: don't need that anymore * feat: restore this * fix: linter * feat: remove realizes * fix: realized is in base now * fix: add back cast * fix: bump deadline * fix: bump deadline * fix: bump deadline * fix: bump deadline * fix: bump deadline * fix: :( * fix: :( * fix: not being dumb * feat: try changing less tests * feat: shouldn't have to change that * feat: contiguous bumps it by one * fix: hmm * fix: numpy memory moment * fix: cl_khr_fp16 * fix: torch has different tensor count * fix: missing contiguous * hmm: hmm * fix: some fixes * fix: typing * feat: dont do that * feat: typing fixes * feat: why is this realize required? * feat: ngl kinda odd typing * feat: oh * feat: remove realizes * feat: why is this realize required? * fix: hacky patch for cudacpu * fix: without this realize pytest crashes????? * fix: shorter line * fix: cudacpu fixes * fix: cudacpu fixes * feat: real buffer * feat: don't search when searching lmao * fix: can't use contiguous things * fix: no more 100GB arrays * fix: revert * fix: skip 7 and 10 * feat: working ish beam * feat: minimize changes * feat: seed 0 stable diffusion example changed * fix: different on ci * fix: no beam * feat: make threefry optional * fix: check value * fix: unused import * feat: threefry default * fix: 5d * feat: allow non upcast div * fix: 5d better * fix: 5d better * fix: save all dtype * feat: proper error * feat: lazyop key * fix: check float * feat: try removing this realize now * feat: disable threefry for uops hip tensor cores * feat: don't need that * feat: only check upcast * fix: disable threefry for some metal tests * feat: disable for metal tensor uops as well * feat: disable for most uops * fix: disable threefry for new uops tests * feat: multitensor * fix: typing * feat: threefry default off * feat: skip threefry half rand * feat: restore old * fix: bad git * 
clean: ruff * feat: bfloat16 fix * fix: :| --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-03-17 10:19:33 -07:00
George Hotz	53adcb34f5	remove hip backend (#3783 ) * remove hip backend * remove unused * rhip * more RHIP	2024-03-17 10:12:16 -07:00
George Hotz	2a14d1b5e0	Revert "add outbufs info to CompiledASTRunner (#3781 )" (#3782 ) This reverts commit `722dd4276c`.	2024-03-17 09:47:23 -07:00
qazal	722dd4276c	add outbufs info to CompiledASTRunner (#3781 ) * add outbufs * Revert "add outbufs" This reverts commit `5f4c0668f5`. * simplify	2024-03-17 07:52:20 -07:00
chenyu	9255332d9e	use llvm as bridge to fix_bf16 loading (#3774 ) This is how bf16 load is tested in test_bf16_disk_write_read now and it should fix #2775. I tested that it fixed loading coder using PYTHON backend. Will separate this special bf16 load v.s. regular bf16 support	2024-03-16 15:22:19 -04:00
nimlgen	987a055c0d	increase jit batch size progressivly (#3771 ) Co-authored-by: chenyu <chenyu@fastmail.com>	2024-03-16 11:58:11 -04:00
chenyu	77febb44e6	llama 7B on 6 gpus benchmark (#3773 )	2024-03-16 11:38:52 -04:00
David Hou	07324b56d5	[experimenting] use contiguous instead of realize in optim (#3770 ) * run CI * comment * remove t.grad to try * Revert "remove t.grad to try" This reverts commit `05ec2d3b89`.	2024-03-15 23:06:50 -07:00
qazal	e3e89c244b	multioutput uoping infra (#3706 ) * linearize multioutput * add vars to copy	2024-03-15 21:56:59 -07:00
chenyu	e1c5aa9cce	estimated resnet training time for BENCHMARK (#3769 )	2024-03-15 22:36:58 -04:00
George Hotz	0870dd5b3b	hotfix: switch resnet training from HIP -> HSA in CI	2024-03-15 13:35:52 -07:00
qazal	d8070876d2	bfs scheduler, infra for multioutput (#3763 ) * bfs * use _schedule_one	2024-03-15 13:33:58 -07:00
nimlgen	91e181ee02	make alignment readable (#3766 )	2024-03-15 23:18:40 +03:00
chenyu	8ea53951c1	bfloat16 Tensor.rand (#3764 ) * Tensor.rand for bfloat16 for numpy based random, generate one for float then cast for bfloat16. close #3653 * remove realize	2024-03-15 15:05:13 -04:00
chenyu	a2d3cf64a5	move is_dtype_supported to test.helpers (#3762 ) * move is_dtype_supported to test.helpers updated all places that check if float16 is supports * fix tests	2024-03-15 14:33:26 -04:00
George Hotz	8af87e20a0	unrealized, assign is replace (#3761 )	2024-03-15 11:17:59 -07:00
chenyu	922f8319cb	Run test_real_world in METAL test (#3760 ) * clean up test_real_world * skip that * JIT=2 for metal * all device	2024-03-15 13:56:52 -04:00
chenyu	4bd5535d72	update mlperf resnet default hparams (#3758 ) we might be able to have higher lr given smaller BS, but this is good. Trained to 75.9% https://wandb.ai/chenyuxyz/tinygrad-examples_mlperf/runs/xi2f48se/overview	2024-03-15 12:09:26 -04:00
George Hotz	aad9332e21	remove that extra assign line, is it fixed? (#3757 ) * remove that extra assign line, is it fixed? * only lazyop things that realize * _schedule_one	2024-03-15 08:59:41 -07:00
nimlgen	ba79a3c09a	some hsa lines saving + fixes (#3752 ) * fix write to ring + some lines * hsa driver test	2024-03-15 18:12:18 +03:00
George Hotz	ca19eb3e82	where fold try 2 (#3748 ) * where fold try 2 * assign fold * test_where_fold works * add gated store support to ops_python --------- Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>	2024-03-15 07:46:26 -07:00
nimlgen	6b8c66e04f	fix broken loops in llvm (#3751 )	2024-03-15 11:57:51 +03:00
chenyu	d3a6319630	bf16 tests in test_dtype.py (#3749 ) With bf16 creation and bf16 to numpy, we can test bf16 in test_dtype. Only support HIP now as it needs bf16 buffer support. Also the rtoal is slightly larger	2024-03-15 00:17:11 -04:00
Rohan Potdar	33c01c9db0	Fix kwargs in JIT (#3730 ) * Update jit.py * Update jit.py * added failing test * fix type error * Revert to itertools * fix sorted	2024-03-14 23:55:19 -04:00
George Hotz	641f347232	simple LoadOps.ASSIGN (#3745 ) * simple LoadOps.ASSIGN * skip that test * don't assign in onnx ops gemm * track cache usage * recreate the lazybuffer to avoid the cache * fix contigs * skip that test * lol * better letters	2024-03-14 20:44:34 -07:00
chenyu	75d4344cda	UOps.BITCAST (#3747 ) * UOps.BITCAST implicitly fixed no const folding for bitcast * python backend * ptx * consistent llvm	2024-03-14 21:00:35 -04:00
chenyu	9a00a453c7	add test case for uop cast constant fold (#3746 ) and a expected failed bitcast fold test case. Will fix with UOps.BITCAST refactor	2024-03-14 20:00:27 -04:00
chenyu	11c61ae044	Revert "fix const bitcast should not be constant folded (#3743 )" (#3744 ) This reverts commit `38ba277ac8`.	2024-03-14 19:24:05 -04:00
George Hotz	d52d0b0efb	test_assign_kv_cache	2024-03-14 16:17:20 -07:00
chenyu	38ba277ac8	fix const bitcast should not be constant folded (#3743 ) * fix const bitcast should not be constant folded * fixed const bf16 creation * LLVM still broken	2024-03-14 19:13:52 -04:00
chenyu	557c7a5c54	fix yolov8.py (#3742 ) replaced an `assign` with `replace`, and add '.png' for output if input URL does not contain an extention	2024-03-14 17:33:45 -04:00

3855 Commits