Commit Graph

3845 Commits

Author SHA1 Message Date
George Hotz
1e1beb888c Revert "add failing assign test (#3796)" (#3797)
This reverts commit 2dea12832c.
2024-03-18 08:55:36 -07:00
George Hotz
2dea12832c add failing assign test (#3796)
* that was a hack

* tests to reveal the issue

* add assign for realized assign
2024-03-18 08:47:30 -07:00
nimlgen
e78df485c7 update inputs for transfers in hsagraph (#3560) 2024-03-18 18:01:04 +03:00
George Hotz
086291e8c6 hotfix: add test for JIT reset 2024-03-17 21:35:49 -07:00
chenyu
dccefab23f remove mixtral weight to clang first (#3792)
seems fine without it now
2024-03-17 23:33:17 -04:00
George Hotz
bf3e1c4df2 support pickling tensors and others (#3787)
* test pickle tensors

* pickle unrealized tensor

* pickle jit, don't save Device in every CompiledASTRunner

* real test of pickle, move delete
2024-03-17 18:29:14 -07:00
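Context for the pickling change above: a minimal sketch of the round-trip the new tests exercise. Tensor is tinygrad's top-level export; the shapes and the printout here are illustrative only.

```python
import pickle
from tinygrad import Tensor

t = Tensor.rand(2, 3)                       # may still be unrealized
restored = pickle.loads(pickle.dumps(t))    # #3787 makes this round-trip work
assert restored.shape == t.shape
print(restored.numpy())                     # .numpy() realizes the restored tensor
```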
chenyu
5ac1fa933f apply the same fix_bf16 in llama and coder (#3789)
* apply the same fix_bf16 in llama and coder

did not realize the same logic was in llama too.
really fix #2775

* flag for native SUPPORT_BF16 cast
2024-03-17 21:25:24 -04:00
chenyu
639bd5dbfc move bf16 cast hack to Tensor.llvm_bf16_cast (#3788) 2024-03-17 18:51:22 -04:00
George Hotz
311cf2b7d3 Revert "threefry_2x32 (#2601)" (#3784)
This reverts commit db3de54bc4.
2024-03-17 10:27:20 -07:00
wozeparrot
db3de54bc4 threefry_2x32 (#2601)
* feat: initial xor

* feat: initial threefry

* feat: remove custom random

* fix: really need to install precommit

* feat: lmao forgot that this is rotate not a shift

* clean: put that there

* feat: numpy xor

* feat: quick test for xor

* feat: llvm xor

* feat: slightly working xor in torch

* feat: rand works in jit

* clean: save a line

* feat: match jax

* feat: maybe test against jax

* feat: requires_grad

* fix: fix test_symbolic_ops

* feat: lower alpha

* feat: just pad

* fix: maybe fix training tests?

* fix: fix some llvm stuff

* feat: cursed realize on the way out

* feat: testing jax

* fix: why is the jax install process not simple

* fix: maybe passing test

* fix: symbolic workarounds

* clean: still need that precommit

* fix: aaaa

* fix: more test fixes

* fix: quick fix for wgsl

* feat: need to set requires_grad on the final tensor

* feat: one more tensor

* feat: don't take forever

* feat: seeing y ci is brok

* feat: can't allocate 64GiB lmao

* fix: fix this

* feat: hope this doesn't break smth before i go to bed

* feat: don't destroy ram

* feat: int

* feat: remove jax

* feat: properish workaround?

* feat: skip slow webgpu tests

* feat: no longer fails

* feat: use dtypes

* feat: real number

* fix: torch

* fix: don't test against reference for torch

* feat: to device

* feat: fix advanced indexing

* feat: correct casting

* feat: even rng_counter

* feat: match master

* feat: this was actually bad

* fix: maybe?

* feat: store

* feat: remove realizes

* feat: somehow this is important

* feat: somehow this is also important

* feat: save a line

* fix: don't need that anymore

* feat: restore this

* fix: linter

* feat: remove realizes

* fix: realized is in base now

* fix: add back cast

* fix: bump deadline

* fix: bump deadline

* fix: bump deadline

* fix: bump deadline

* fix: bump deadline

* fix: :(

* fix: :(

* fix: not being dumb

* feat: try changing less tests

* feat: shouldn't have to change that

* feat: contiguous bumps it by one

* fix: hmm

* fix: numpy memory moment

* fix: cl_khr_fp16

* fix: torch has different tensor count

* fix: missing contiguous

* hmm: hmm

* fix: some fixes

* fix: typing

* feat: dont do that

* feat: typing fixes

* feat: why is this realize required?

* feat: ngl kinda odd typing

* feat: oh

* feat: remove realizes

* feat: why is this realize required?

* fix: hacky patch for cudacpu

* fix: without this realize pytest crashes?????

* fix: shorter line

* fix: cudacpu fixes

* fix: cudacpu fixes

* feat: real buffer

* feat: don't search when searching lmao

* fix: can't use contiguous things

* fix: no more 100GB arrays

* fix: revert

* fix: skip 7 and 10

* feat: working ish beam

* feat: minimize changes

* feat: seed 0 stable diffusion example changed

* fix: different on ci

* fix: no beam

* feat: make threefry optional

* fix: check value

* fix: unused import

* feat: threefry default

* fix: 5d

* feat: allow non upcast div

* fix: 5d better

* fix: 5d better

* fix: save all dtype

* feat: proper error

* feat: lazyop key

* fix: check float

* feat: try removing this realize now

* feat: disable threefry for uops hip tensor cores

* feat: don't need that

* feat: only check upcast

* fix: disable threefry for some metal tests

* feat: disable for metal tensor uops as well

* feat: disable for most uops

* fix: disable threefry for new uops tests

* feat: multitensor

* fix: typing

* feat: threefry default off

* feat: skip threefry half rand

* feat: restore old

* fix: bad git

* clean: ruff

* feat: bfloat16 fix

* fix: :|

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-03-17 10:19:33 -07:00
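Background for the threefry_2x32 change above (and its revert): Threefry-2x32 is a counter-based PRNG, so each (key, counter) pair maps to an independent 64-bit block and random numbers no longer depend on generation order. Below is a pure-Python sketch of the 20-round block function with the rotation constants from the Random123 paper; tinygrad's version expresses the same math in tensor ops, so treat this as reference pseudocode rather than the shipped kernel.

```python
MASK = 0xFFFFFFFF
ROTATIONS = [13, 15, 26, 6, 17, 29, 16, 24]

def rotl32(x: int, r: int) -> int:
    return ((x << r) | (x >> (32 - r))) & MASK

def threefry2x32(key: tuple[int, int], counter: tuple[int, int]) -> tuple[int, int]:
    ks = [key[0], key[1], key[0] ^ key[1] ^ 0x1BD11BDA]
    x0, x1 = (counter[0] + ks[0]) & MASK, (counter[1] + ks[1]) & MASK
    for i in range(5):                                    # 5 groups of 4 rounds = 20 rounds
        for r in ROTATIONS[4 * (i % 2):4 * (i % 2) + 4]:
            x0 = (x0 + x1) & MASK
            x1 = rotl32(x1, r) ^ x0
        x0 = (x0 + ks[(i + 1) % 3]) & MASK                # key injection every 4 rounds
        x1 = (x1 + ks[(i + 2) % 3] + i + 1) & MASK
    return x0, x1
```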
George Hotz
53adcb34f5 remove hip backend (#3783)
* remove hip backend

* remove unused

* rhip

* more RHIP
2024-03-17 10:12:16 -07:00
George Hotz
2a14d1b5e0 Revert "add outbufs info to CompiledASTRunner (#3781)" (#3782)
This reverts commit 722dd4276c.
2024-03-17 09:47:23 -07:00
qazal
722dd4276c add outbufs info to CompiledASTRunner (#3781)
* add outbufs

* Revert "add outbufs"

This reverts commit 5f4c0668f5.

* simplify
2024-03-17 07:52:20 -07:00
chenyu
9255332d9e use llvm as bridge to fix_bf16 loading (#3774)
This is how bf16 load is tested in test_bf16_disk_write_read now and it should fix #2775.
I tested that it fixed loading coder using PYTHON backend.

Will separate this special bf16 load vs. regular bf16 support
2024-03-16 15:22:19 -04:00
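The bit-level idea behind fix_bf16, shown in plain numpy (this is not the LLVM bridge itself, just why the upcast is cheap): bfloat16 is the top 16 bits of a float32, so widening is a shift.

```python
import numpy as np

def bf16_bits_to_float32(raw_u16: np.ndarray) -> np.ndarray:
    # shift the 16 stored bits into the high half of a uint32, reinterpret as float32
    return (raw_u16.astype(np.uint32) << 16).view(np.float32)

print(bf16_bits_to_float32(np.array([0x3F80, 0x4000], dtype=np.uint16)))  # [1. 2.]
```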
nimlgen
987a055c0d increase jit batch size progressively (#3771)
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-03-16 11:58:11 -04:00
chenyu
77febb44e6 llama 7B on 6 gpus benchmark (#3773) 2024-03-16 11:38:52 -04:00
David Hou
07324b56d5 [experimenting] use contiguous instead of realize in optim (#3770)
* run CI

* comment

* remove t.grad to try

* Revert "remove t.grad to try"

This reverts commit 05ec2d3b89.
2024-03-15 23:06:50 -07:00
qazal
e3e89c244b multioutput uoping infra (#3706)
* linearize multioutput

* add vars to copy
2024-03-15 21:56:59 -07:00
chenyu
e1c5aa9cce estimated resnet training time for BENCHMARK (#3769) 2024-03-15 22:36:58 -04:00
George Hotz
0870dd5b3b hotfix: switch resnet training from HIP -> HSA in CI 2024-03-15 13:35:52 -07:00
qazal
d8070876d2 bfs scheduler, infra for multioutput (#3763)
* bfs

* use _schedule_one
2024-03-15 13:33:58 -07:00
nimlgen
91e181ee02 make alignment readable (#3766) 2024-03-15 23:18:40 +03:00
chenyu
8ea53951c1 bfloat16 Tensor.rand (#3764)
* Tensor.rand for bfloat16

for numpy-based random, generate a float one first then cast to bfloat16.

close #3653

* remove realize
2024-03-15 15:05:13 -04:00
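Minimal usage sketch of what #3764 enables, assuming a backend with bf16 buffer support (HIP at the time of this commit); the cast back to float32 is only there to inspect the values.

```python
from tinygrad import Tensor, dtypes

t = Tensor.rand(4, 4, dtype=dtypes.bfloat16)   # sampled as float, then cast to bfloat16
print(t.cast(dtypes.float32).numpy())
```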
chenyu
a2d3cf64a5 move is_dtype_supported to test.helpers (#3762)
* move is_dtype_supported to test.helpers

updated all places that check if float16 is supported

* fix tests
2024-03-15 14:33:26 -04:00
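Shape of the helper being centralized in test.helpers (the per-backend rules below are placeholders for illustration, not the real list in test/helpers.py):

```python
from tinygrad import Device, dtypes

def is_dtype_supported(dtype, device: str = Device.DEFAULT) -> bool:
    if dtype == dtypes.half:
        return device not in {"PYTHON"}   # placeholder rule for illustration
    if dtype == dtypes.bfloat16:
        return device in {"HSA"}          # placeholder rule for illustration
    return True
```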
George Hotz
8af87e20a0 unrealized, assign is replace (#3761) 2024-03-15 11:17:59 -07:00
chenyu
922f8319cb Run test_real_world in METAL test (#3760)
* clean up test_real_world

* skip that

* JIT=2 for metal

* all device
2024-03-15 13:56:52 -04:00
chenyu
4bd5535d72 update mlperf resnet default hparams (#3758)
we might be able to have higher lr given smaller BS, but this is good.

Trained to 75.9%
https://wandb.ai/chenyuxyz/tinygrad-examples_mlperf/runs/xi2f48se/overview
2024-03-15 12:09:26 -04:00
George Hotz
aad9332e21 remove that extra assign line, is it fixed? (#3757)
* remove that extra assign line, is it fixed?

* only lazyop things that realize

* _schedule_one
2024-03-15 08:59:41 -07:00
nimlgen
ba79a3c09a some hsa lines saving + fixes (#3752)
* fix write to ring + some lines

* hsa driver test
2024-03-15 18:12:18 +03:00
George Hotz
ca19eb3e82 where fold try 2 (#3748)
* where fold try 2

* assign fold

* test_where_fold works

* add gated store support to ops_python

---------

Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
2024-03-15 07:46:26 -07:00
nimlgen
6b8c66e04f fix broken loops in llvm (#3751) 2024-03-15 11:57:51 +03:00
chenyu
d3a6319630 bf16 tests in test_dtype.py (#3749)
With bf16 creation and bf16 to numpy, we can test bf16 in test_dtype.
Only supports HIP now as it needs bf16 buffer support. Also the rtol is slightly larger
2024-03-15 00:17:11 -04:00
Rohan Potdar
33c01c9db0 Fix kwargs in JIT (#3730)
* Update jit.py

* Update jit.py

* added failing test

* fix type error

* Revert to itertools

* fix sorted
2024-03-14 23:55:19 -04:00
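Usage sketch of the case #3730 fixes: a TinyJit'd function called with Tensor keyword arguments (TinyJit and Tensor are tinygrad's top-level exports; the shapes and values are illustrative).

```python
from tinygrad import Tensor, TinyJit

@TinyJit
def step(x: Tensor, *, scale: Tensor) -> Tensor:
    return (x * scale).realize()

for i in range(4):   # the JIT captures on the second call and replays afterwards
    out = step(Tensor.rand(4, 4).realize(), scale=Tensor.full((4, 4), float(i)).realize())
```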
George Hotz
641f347232 simple LoadOps.ASSIGN (#3745)
* simple LoadOps.ASSIGN

* skip that test

* don't assign in onnx ops gemm

* track cache usage

* recreate the lazybuffer to avoid the cache

* fix contigs

* skip that test

* lol

* better letters
2024-03-14 20:44:34 -07:00
chenyu
75d4344cda UOps.BITCAST (#3747)
* UOps.BITCAST

implicitly fixes const folding being wrongly applied to bitcast

* python backend

* ptx

* consistent llvm
2024-03-14 21:00:35 -04:00
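Why BITCAST needs its own UOp and cannot reuse the regular cast's constant folding, in numpy terms: a cast converts the value, a bitcast reinterprets the bits.

```python
import numpy as np

x = np.array([1.0], dtype=np.float32)
print(x.astype(np.int32))  # [1]           -- cast: value-preserving conversion
print(x.view(np.int32))    # [1065353216]  -- bitcast: same bits (0x3F800000) as int32
```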
chenyu
9a00a453c7 add test case for uop cast constant fold (#3746)
and an expected-failure bitcast fold test case. Will fix with the UOps.BITCAST refactor
2024-03-14 20:00:27 -04:00
chenyu
11c61ae044 Revert "fix const bitcast should not be constant folded (#3743)" (#3744)
This reverts commit 38ba277ac8.
2024-03-14 19:24:05 -04:00
George Hotz
d52d0b0efb test_assign_kv_cache 2024-03-14 16:17:20 -07:00
chenyu
38ba277ac8 fix const bitcast should not be constant folded (#3743)
* fix const bitcast should not be constant folded

* fixed const bf16 creation

* LLVM still broken
2024-03-14 19:13:52 -04:00
chenyu
557c7a5c54 fix yolov8.py (#3742)
replaced an `assign` with `replace`, and add '.png' to the output if the input URL does not contain an extension
2024-03-14 17:33:45 -04:00
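Sketch of the output-naming rule described above (illustrative helper, not the exact code in examples/yolov8.py):

```python
from pathlib import Path

def output_name(input_url: str) -> str:
    name = Path(input_url.split("/")[-1])
    return str(name if name.suffix else name.with_suffix(".png"))

print(output_name("https://example.com/street"))      # street.png
print(output_name("https://example.com/street.jpg"))  # street.jpg
```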
George Hotz
5b3d8a886e split tinybox benchmark into two (#3741)
* split tinybox benchmark into two

* symlinks
2024-03-14 14:12:32 -07:00
George Hotz
3527c5a9d2 add Tensor.replace (#3738)
* add Tensor.replace

* fix dtypes in that test

* should be replace

* and mixtral
2024-03-14 13:34:14 -07:00
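Rough contrast between the new Tensor.replace and the existing Tensor.assign (a sketch of the intended semantics; see the PR for the real behavior):

```python
from tinygrad import Tensor

w = Tensor.ones(4).contiguous().realize()
w.assign(w + 1).realize()    # assign: writes the new values into w's existing buffer
w.replace(Tensor.zeros(4))   # replace: re-points this Tensor object at a new lazy buffer
```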
chenyu
0ead0bdb65 script to benchmark beam v hcopt (#3737)
the goal is that a big enough beam should be faster than hcopt/tc

also this failed on tc opt
NUM=2 FILTER_REDUCE=1 TEST_N=20 BEAM=4 DEBUG=2 python test/external/speed_beam_v_hcopt.py
2024-03-14 15:04:03 -04:00
chenyu
90e55a9fd1 fix buf_index not found case in _apply_tc_opt (#3739)
Previously raised ValueError if src.src[0] is not a LOAD. Replaced with returning None in _apply_tc_opt, plus a test to make sure the net output is KernelOptError.
2024-03-14 14:27:05 -04:00
nimlgen
6bf11a2ce3 fix incorrect direct store with gep (#3735)
* fix incorrect direct store with gep

* better comment

* phi as well

* dtype check there

* mypy happy?

* not used

* renames

* phi in phi
2024-03-14 20:58:50 +03:00
P4ssenger
bbad3b1dd9 call self.nbytes (#3736) 2024-03-14 08:10:12 -07:00
qazal
00c56db1a4 Fix JITItem count assert for HSAGraph (#3734)
* exclude HSA graph

* cant import HSAGraph directly
2024-03-14 14:12:35 +03:00
nimlgen
4b01c44579 hotfix: sdma/aql are visible again (#3733) 2024-03-14 10:33:22 +03:00
qazal
43953c0ba9 skip grouped store for unmatched upcasts (#3723)
* skip if upcasts dont match

* outputs match now

* this ast is hardcoded

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-03-14 01:18:31 -04:00
David Hou
199f7c4342 MLPerf Resnet (cleaned up) (#3573)
* this is a lot of stuff

TEST_TRAIN env for less data

don't diskcache get_train_files

debug message

no lr_scaler for fp32

comment, typo

type stuff

don't destructure proc

make batchnorm parameters float

make batchnorm parameters float

resnet18, checkpointing

hack up checkpointing to keep the names in there

oops

wandb_resume

lower lr

eval/ckpt use e+1

lars

report top_1_acc

some wandb stuff

split fw and bw steps to save memory

oops

save model when reach target

formatting

make sgd hparams consistent

just always write the cats tag...

pass X and Y into backward_step to trigger input replace

shuffle eval set to fix batchnorm eval

dataset is sorted by class, so the means and variances are all wrong

small cleanup

hack restore only one copy of each tensor

do bufs from lin after cache check (lru should handle it fine)

record epoch in wandb

more digits for topk in eval

more env vars

small cleanup

cleanup hack tricks

cleanup hack tricks

don't save ckpt for testeval

cleanup

diskcache train file glob

clean up a little

device_str

SCE into tensor

small

small

log_softmax out of resnet.py

oops

hack :(

comments

HeNormal, track gradient norm

oops

log SYNCBN to wandb

real truncnorm

less samples for truncated normal

custom init for Linear

log layer stats

small

Revert "small"

This reverts commit 988f4c1cf3.

Revert "log layer stats"

This reverts commit 9d98224585.

rename BNSYNC to SYNCBN to be consistent with cifar

optional TRACK_NORMS

fix label smoothing :/

lars skip list

only weight decay if not in skip list

comment

default 0 TRACK_NORMS

don't allocate beam scratch buffers if in cache

clean up data pipeline, unsplit train/test, put back a hack

remove print

run test_indexing on remu (#3404)

* emulated ops_hip infra

* add int4

* include test_indexing in remu

* Revert "Merge branch 'remu-dev-mac'"

This reverts commit 6870457e57, reversing
changes made to 3c4c8c9e16.

fix bad seeding

UnsyncBatchNorm2d but with synced trainable weights

label downsample batchnorm in Bottleneck

:/

:/

i mean... it runs... it hits the acc... it's fast...

new unsyncbatchnorm for resnet

small fix

don't do assign buffer reuse for axis change

* remove changes

* remove changes

* move LARS out of tinygrad/

* rand_truncn rename

* whitespace

* stray whitespace

* no more gnorms

* delete some dataloading stuff

* remove comment

* clean up train script

* small comments

* move checkpointing stuff to mlperf helpers

* if WANDB

* small comments

* remove whitespace change

* new unsynced bn

* clean up prints / loop vars

* whitespace

* undo nn changes

* clean up loops

* rearrange getenvs

* cpu_count()

* PolynomialLR whitespace

* move he_normal out

* cap warmup in polylr

* rearrange wandb log

* realize both x and y in data_get

* use double quotes

* combine prints in ckpts resume

* take UBN from cifar

* running_var

* whitespace

* whitespace

* typo

* if instead of ternary for resnet downsample

* clean up dataloader cleanup a little?

* separate rng for shuffle

* clean up imports in model_train

* clean up imports

* don't realize copyin in data_get

* remove TESTEVAL (train dataloader didn't get freed every loop)

* adjust wandb_config entries a little

* clean up wandb config dict

* reduce lines

* whitespace

* shorter lines

* put shm unlink back, but it doesn't seem to do anything

* don't pass seed per task

* monkeypatch batchnorm

* the reseed was wrong

* add epoch number to desc

* don't use unsyncedbatchnorm if syncbn=1

* put back downsample name

* eval every epoch

* Revert "the reseed was wrong"

This reverts commit 3440a07dff3f40e8a8d156ca3f1938558a59249f.

* cast lr in onecycle

* support fp16

* cut off kernel if expand after reduce

* test polynomial lr

* move polynomiallr to examples/mlperf

* working PolynomialDecayWithWarmup + tests.......

add lars_util.py, oops

* keep lars_util.py as intact as possible, simplify our interface

* no more half

* polylr and lars were merged

* undo search change

* override Linear init

* remove half stuff from model_train

* update scheduler init with new args

* don't divide by input mean

* mistake in resnet.py

* restore whitespace in resnet.py

* add test_data_parallel_resnet_train_step

* move initializers out of resnet.py

* unused imports

* log_softmax to model output in test to fix precision flakiness

* log_softmax to model output in test to fix precision flakiness

* oops, don't realize here

* is None

* realize initializations in order for determinism

* BENCHMARK flag for number of steps

* add resnet to bechmark.yml

* return instead of break

* missing return

* cpu_count, rearrange benchmark.yml

* unused variable

* disable tqdm if BENCHMARK

* getenv WARMUP_EPOCHS

* unlink disktensor shm file if exists

* terminate instead of join

* properly shut down queues

* use hip in benchmark for now

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-03-14 00:53:41 -04:00
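The resnet training commit above adds LARS plus a PolynomialDecayWithWarmup schedule; below is a minimal sketch of that schedule's shape (argument names are assumptions, not the exact examples/mlperf API): linear warmup to the base LR, then polynomial decay to the end LR.

```python
def poly_lr_with_warmup(step: int, base_lr: float, end_lr: float,
                        warmup_steps: int, total_steps: int, power: float = 2.0) -> float:
    if step < warmup_steps:                                    # linear warmup
        return base_lr * (step + 1) / warmup_steps
    progress = min((step - warmup_steps) / max(total_steps - warmup_steps, 1), 1.0)
    return (base_lr - end_lr) * (1.0 - progress) ** power + end_lr
```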