Commit Graph

1283 Commits

qazal
f7aeff6061 viz: cli.py cleanups, do not require PYTHONPATH (#15085)
* cleanup the print

* sys.exit

* equal check

* cleanup unpacker

* cli doesn't need PYTHONPATH

* no semicolons

* %s/PYTHONPATH=. //g
2026-03-02 19:24:38 +09:00
wozeparrot
a4f6365929 llama3: fstep takes grads (#15069) 2026-03-01 20:05:07 -08:00
wozeparrot
cfc5cf65ad llama3: vocab padding fix + jit copies on fakedata (#15067) 2026-02-28 08:44:55 -08:00
George Hotz
bb84e389cf functions for llama trainer (#15045)
* functions for llama trainer

* function there

* axis match

* fix multi

* lil cleaner

* there's a bug with HK_FLASH_ATTENTION

* training functions

* for commit
2026-02-28 12:15:18 +08:00
Nick
af94bfc401 fix retinanet shared memory race condition in parallel tests (#15030)
Append PID to shared memory names in batch_load_retinanet to prevent
FileExistsError when pytest-xdist runs multiple test workers that each
call _setup_shared_mem with the same hardcoded name.
2026-02-27 08:36:24 +08:00
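A minimal sketch of the fix described in #15030 (illustrative only; the real _setup_shared_mem in the retinanet dataloader may take different arguments):

```python
import os
from multiprocessing import shared_memory

def _setup_shared_mem(name: str, size: int) -> shared_memory.SharedMemory:
  # Append the PID so parallel pytest-xdist workers don't collide on the
  # same hardcoded shared-memory name and raise FileExistsError.
  unique_name = f"{name}_{os.getpid()}"
  return shared_memory.SharedMemory(name=unique_name, create=True, size=size)
```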
wozeparrot
d941dd5aeb llama3: pad vocab when mp sharding (#14998) 2026-02-25 00:04:06 -08:00
wozeparrot
e1c9985715 llama3: better time keeping (#14999) 2026-02-24 22:42:05 -08:00
wozeparrot
8d9545e09e llama3: correctly shard wqkv (#14978) 2026-02-23 23:57:10 -08:00
wozeparrot
a36a26d4ed llama3: optim does grad acc in correct order (#14965) 2026-02-23 22:25:13 -08:00
wozeparrot
3cda781876 llama optim offload (#14901) 2026-02-21 08:53:45 -08:00
George Hotz
55d3a5def9 preallocate all realized buffers (#14823)
* preallocate all realized buffers

* contiguous

* work

* comment that out

* move to schedule

* better

* correct fix

* just buffer

* disk bufs

* fixes disk tensor stuff

* fix symbolic stuff

* fix multi

* 162 failures

* bugfixes

* don't check that anymore

* fix schedule tests

* mnist should be contiguous

* type and buffer

* fix tests

* shrink axis correction

* mypy fixes

* tests skips

* same 37 failures

* dedup

* no shrink in the graph

* 29 failures

* skips

* fix custom kernel

* fix training

* those optimizations aren't supported currently

* simpler

* more correct

* tests

* 14 failures

* works

* fix that test

* broken

* 11 failures

* only kernel counts left

* fixes

* all tests pass

* remove tensor_map

* op test

* 200 -> 230

* test fixes

* fixes

* revert test_tiny thing

* guard

* revert that

* test tiny passes

* no contigs there

* base realize back

* Revert "no contigs there"

This reverts commit c45bb9fcfd.

* revert that

* chop many assigns

* 12 failures

* fix tests

* tests

* apply after

* pre-commit

* remove old code

* delete that

* fix types

* remove extra contig

* fix dataloader

* torch fix

* disk fix

* update kernel fusion numbers

* runs on amd

* restore kernel count

* add that rule back

* that

* disable that

* wrong

* add the correct rule for that folding

* more tests

* guard c1.arg

* no newlines

* realize those

* split into a different file

* remove detach/contig back

* skip 2

* update that
2026-02-20 20:05:54 +08:00
George Hotz
fc5677c28b resnet dataloader + more test cleanups (#14899)
* resnet dataloader

* tests
2026-02-20 10:05:47 +08:00
chenyu
f84a11bb9f delete uneven shard tests and mentions (#14867) 2026-02-18 14:10:33 -05:00
wozeparrot
6d301ad2c4 feat: llama wqkv (#14841) 2026-02-17 23:01:33 -08:00
wozeparrot
95e97ec341 separate llama optim (#14810) 2026-02-17 13:02:35 -08:00
wozeparrot
45aebe1572 hipkittens fa backward (#14723) 2026-02-16 00:38:44 -08:00
wozeparrot
4b5d3bda1f llama3: data seed (#14681) 2026-02-11 19:04:40 -08:00
wozeparrot
a60220bed9 llama3: move dl to numpy & jit more (#14677)
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2026-02-10 18:16:40 -08:00
wozeparrot
69574542ab fix: use correct fa implementation in eval (#14651) 2026-02-09 18:20:44 -08:00
qazal
50d3f6cea5 EVAL_BS=0 in llama profile (#14643) 2026-02-10 00:49:43 +09:00
nimlgen
e087c58ae0 print tables in llama/profile.sh (#14639) 2026-02-09 12:32:54 +03:00
qazal
b7e3fbe07e llama: add VIZ=-1 to dev_run (#14583)
* llama: add VIZ=-1 to dev_run

* readme

* cleaner

* add profile.sh script

* better grouping of options

* add other row

* readme edits

* work
2026-02-06 22:59:22 +09:00
chenyu
d57d24c7d4 Buffer.as_buffer -> Buffer.as_memoryview [pr] (#14535)
it casts to memoryview; also inlines the as_typed_buffer checks into Tensor._data
2026-02-04 11:31:11 -05:00
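The rename in #14535 matches what the method returns; a minimal sketch of the idea (not tinygrad's actual Buffer implementation):

```python
class Buffer:
  def __init__(self, data: bytearray):
    self._data = data

  def as_memoryview(self) -> memoryview:
    # Expose the raw bytes as a memoryview (previously named as_buffer).
    return memoryview(self._data)

buf = Buffer(bytearray(b"\x01\x02\x03\x04"))
assert buf.as_memoryview()[0] == 1
```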
George Hotz
d59e6e7a37 move more tests to test/null, split some existing ones (#14512)
* move more tests to test/null, split some existing ones

* null work

* null work

* move more

* fixes

* move PIL

* PIL in CLIP

* don't move that
2026-02-03 20:20:20 +08:00
George Hotz
dd2de4f838 rename all DEFINE_GLOBAL to PARAM (#14511) 2026-02-03 15:09:38 +08:00
wozeparrot
bbcd3d67a3 fa: faster (#14453) 2026-02-02 21:34:17 -08:00
qazal
616e9c1483 CDNA assembly gemm in tensor.py with flag (#14310)
* work

* work

* the assembly

* remove the old one

* remove ws bufs, assert splitk

* notes cleanup

* work

* gemm args

* gemm in mixins would be nice

* add gemm gradient

* print counters

* the realize is for DEBUG=2 aesthetics

* dedup

* rewrite to python dsl, no list copies

* leave that

* add B, M, N, K to gemm name

* it's M0 not NULL

* fp16 support

* test cleanup + more gemms

* work from viz

* more work

* gemm batch_size

* xccg path work

* tiny comments on the label naming

* s_waitcnt
2026-01-31 22:34:14 +09:00
George Hotz
c9a3ddb341 benchmark llama walltime script (#14454)
* benchmark llama walltime script

* adj layers
2026-01-31 10:21:54 +08:00
George Hotz
f5346d6a1a fix USE_ATOMICS for non float dtypes and make it the default (#14444)
* embedded multistep test

* complex test

* with jit

* fix dtypes and reenable USE_ATOMICS

* that test didn't catch anything
2026-01-31 09:44:16 +08:00
George Hotz
ee2c78709d mlperf/llama: disable USE_ATOMICS for now 2026-01-31 00:42:08 +08:00
George Hotz
838cd078bc use atomics for embedding backward (#14400)
* embedding is slow

* failing

* float is fine

* null

* it fails

* simplify embedding with broadcasting

* ATOMIC_ADD incoming

* min change

* simpler test

* better test

* fix test

* real test

* simpler

* cleanups

* types and names

* _zero_kernel

* grad multi

* hack

* none

* multi unshard

* more for call

* don't tag in call

* good

* call_multi

* call_multi wow claude is useless

* embedding backward multi test

* test passes

* fix as_param

* shape_to_shape_arg

* add clip

* before cast

* fix spec=2, use atomics
2026-01-30 18:10:59 +08:00
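The core of the atomic embedding backward in #14400 is a scatter-add of the output gradient into the weight gradient at the looked-up indices; on GPU each add becomes an atomic add so duplicate indices don't race. A NumPy sketch of the semantics (not the tinygrad kernel):

```python
import numpy as np

def embedding_backward(indices: np.ndarray, grad_out: np.ndarray, vocab_size: int) -> np.ndarray:
  # indices: (N,) int token ids, grad_out: (N, dim) upstream gradient.
  grad_weight = np.zeros((vocab_size, grad_out.shape[1]), dtype=grad_out.dtype)
  # np.add.at is an unbuffered scatter-add, the CPU analogue of atomicAdd.
  np.add.at(grad_weight, indices, grad_out)
  return grad_weight
```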
George Hotz
793afbd473 simplify nn.Embedding, support AFTER in CUSTOM_KERNEL (#14419) 2026-01-29 17:22:13 +08:00
wozeparrot
4845e42135 llama3 gradacc fixes (#14414) 2026-01-28 19:12:39 -08:00
nimlgen
aec1ae0de1 llama: set manual_seed (#14409) 2026-01-28 14:40:00 -08:00
George Hotz
0c6b3f50aa add marker to llama training (#14401) 2026-01-28 22:44:28 +08:00
Jakob Sachs
2b7c00d3d2 fix sd-example dtype for CLIP embeddings (#14397) 2026-01-28 09:07:19 -05:00
qazal
5bffa17f82 llama train: better NULL=1 EMULATE=AMD_CDNA4 dev experience (#14395)
* beam opens devices

* switch to hip renderer

* amd: true?

* llvm true is for test_autogen
2026-01-28 17:31:22 +09:00
wozeparrot
e496547720 llama3 gradacc (#14291) 2026-01-27 19:48:10 -08:00
chenyu
db010a31be IGNORE_OOB -> CHECK_OOB [pr] (#14374)
flip the meaning
2026-01-27 12:20:59 -05:00
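Renaming the negative flag (IGNORE_OOB) to its positive counterpart (CHECK_OOB) in #14374 flips both the name and the sense of the check; a hedged sketch of the pattern (the default value here is an assumption, not taken from the PR):

```python
import os

# Before: the out-of-bounds check ran unless IGNORE_OOB was set.
# After: the flag names what it enables, so the condition reads directly.
check_oob = bool(int(os.getenv("CHECK_OOB", "1")))  # assumed default
if check_oob:
  # run the out-of-bounds access check here
  pass
```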
wozeparrot
a987a4abc3 feat: llama8b dev_beam.sh (#14358) 2026-01-26 14:51:23 -08:00
nimlgen
e152f1b0f5 llama: use ALL2ALL (#14353) 2026-01-26 22:01:53 +03:00
George Hotz
11ce1e847d llama train: null device support 2026-01-26 08:53:05 +08:00
wozeparrot
963c59ebdb fix: pull fixes from gradacc branch (#14296) 2026-01-22 23:07:54 -08:00
George Hotz
52b989c6c8 don't place consts early + fixes from anthropic challenge (#14286)
* don't place consts early

* add anthropic challenge

* with ref

* do we still have to devectorize bools?

* tests pass

* just WHERE

* fine, revert that

* fine, revert

* only index

* z3 validator doesn't support vectorized

* Revert "z3 validator doesn't support vectorized"

This reverts commit 1b7930ecb3.

* z3 not for vec

* no spec

* VLIWRenderer

* loop unrolling

* better comments

* cleanups

* skip cast

* renderer

* cleanups

* prints

* no hack

* hacks

* bump to 11

* reg warning

* lil clean

* cleaner renderer
2026-01-23 10:48:39 +09:00
wozeparrot
c1d14ea832 llama8b train fixes (#14264) 2026-01-20 20:34:47 -08:00
wozeparrot
ba90e1b52e feat: script to run llama8b training (#14239) 2026-01-20 12:44:06 -08:00
C T
26f8b12e01 Whisper audio helpers (mel filters in tinygrad) (#13478)
* add whisper audio helpers for stft/mel/resample

* cleanup

* add whisper stft test

* make only stft test explicitly depend on librosa

* extract sinc_window_kernel

* dehardcode device

* use same device argument

* simplify

* type annotate

* ruff format audio_helpers.py

* ruff format test_whisper.py

* add WHISPER_NEW_STFT

* rename

* undo ruff format changes

* use new stft and mel for whisper

* remove stft test that depends on librosa

* remove whitespace

* add Tensor.log10 with test/test_ops.py::TestOps::test_log10

* use Tensor.log10

* fix lint

* future: remove unused STFT class

* future: remove resample code since it isn't used (yet)

* match openai with pad_mode="reflect"

* pad_to

* future: cut resample leftovers

* cleanup

* add mel tests

* future: cut stft

* future: cut non-mel prep_audio changes

* reduce diff

* move audio_helpers.py to examples

* reduce whitespace

* fix imports

* reduce whitespace

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2026-01-20 10:50:02 -05:00
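Two of the pieces listed in #13478 are small and standard: the mel scale that the filterbank centers follow, and log10 built from natural log. A hedged sketch of both (common textbook formulas, not the PR's exact code):

```python
import math

def hz_to_mel(f: float) -> float:
  # One common (HTK-style) mel scale used for mel filterbank center frequencies.
  return 2595.0 * math.log10(1.0 + f / 700.0)

def log10(x: float) -> float:
  # log10 expressed via natural log, the usual way a Tensor.log10 is derived.
  return math.log(x) / math.log(10.0)

assert abs(hz_to_mel(700.0) - 2595.0 * math.log10(2.0)) < 1e-9
```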
wozeparrot
a879b54234 tk: fa jit fix (#14170) 2026-01-16 16:38:45 -08:00
b1tg
0fbc551622 train bert with fp8 (#13874)
* fp8 train

* clean

* lint

* test fix from #13439

* skip first/last layer

* rm __init__, restore unroll <=32 check

* tests

* clean test, remove unused

* multi-gpu test, clean quantize_to_fp8

* remove bert contiguous

* run script

* test: better check

* run script search

* add seed in bert data shuffle

* move script to mi350x folder

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2026-01-09 09:21:59 -05:00
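A common pattern behind a quantize_to_fp8 helper like the one mentioned in #13874 is per-tensor scaling into the representable range of the fp8 e4m3 format (max finite value 448) before the cast; a hedged NumPy sketch of that idea (not the PR's implementation):

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite value of the e4m3 format

def quantize_to_fp8(x: np.ndarray) -> tuple[np.ndarray, float]:
  # Per-tensor scale so the largest magnitude maps to the fp8 maximum.
  scale = float(np.max(np.abs(x))) / FP8_E4M3_MAX if np.any(x) else 1.0
  # NumPy has no fp8 dtype; clipping stands in for the actual cast here.
  q = np.clip(x / scale, -FP8_E4M3_MAX, FP8_E4M3_MAX)
  return q, scale
```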
b1tg
241f0402b4 add seed in bert data shuffle (#14054) 2026-01-07 10:02:05 -05:00