* viz bytepack format
Training a 1B llama yields ~20M profiler events.
With JSON serialization, the browser tries to load ~6GB into memory. This OOMs, since each tab is limited to roughly 3-4GB. With the packed format (~27 bytes per event) we only need ~600MB.
**Design decisions:**
- Timestamps are in microseconds relative to the trace start time. They're stored as u32, which caps a trace at 2^32 µs ≈ 71 minutes (~1 hr).
- Strings (kernel names, metadata, etc) are deduped.
- Buffer sizes are in u64 nbytes.
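The exact record layout isn't spelled out above; as a rough idea of what the reader side looks like (in the spirit of the Python reference decoder), here is a minimal sketch. The header framing, field order, and widths below are assumptions for illustration only, not the real format, which averages ~27 bytes per event.

```python
import json, struct

def decode(buf: bytes) -> list[dict]:
  # hypothetical framing: a u32 length, then the JSON-dumped string table
  (table_len,) = struct.unpack_from("<I", buf, 0)
  off = 4
  strings = json.loads(buf[off:off + table_len])
  off += table_len
  events = []
  # hypothetical fixed-size record: u32 start (us since trace start),
  # u32 duration (us), u32 index into the string table, u64 size in bytes
  for st, dur, name_idx, nbytes in struct.iter_unpack("<IIIQ", buf[off:]):
    events.append({"st_us": st, "dur_us": dur, "name": strings[name_idx], "nbytes": nbytes})
  return events
```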
More optimization is possible:
- The string lookup table is a JSON-dumped array; it could be compressed.
- We could store less for memory by computing the layout on the client.
**Results**
| Workload | Events | JSON | bytepack |
|----------------|---------|-------------|-------------|
| DP=8 llama 1B train (`command: [1]`) | 24M | 5.8GB | 640MB |
| examples/beautiful_mnist.py | 16K | 3.7MB | 745KB |
| examples/gpt2.py | 55K | 12.54MB | 1.40MB |
`[1]`: `VIZ=1 FAKEDATA=1 OFFLOAD_OPTIM=1 DP=8 BS=8 GRADIENT_ACC_STEPS=2 BLOCK_REORDER=0 LR=3e-4 TRAIN_ON_VAL=1 DEFAULT_FLOAT=bfloat16 OPTIM_DTYPE=bfloat16 LLAMA3_SIZE=1B WARMUP_STEPS=36 DECAY_STEPS=360 SEQLEN=8192 PYTHONPATH=. AMD=1 AMD_LLVM=0 MODEL=llama3 python3 examples/mlperf/model_train.py`
* python reference decoder
* 27 bytes / event, 1hr hard limit
* move device tests to test/device
* test speedups
* test device
* linalg to unit
* upd
* so pytest just works
* more divide and skip
* speed
* test devectorize
* add pillow
* divmod const folding is its own function
* split nested mod optimization out of div and mod folding
* make `fold_binary_numerator` its own function
* factor out `fold_divmod_congruence`
* check sign of numerator
* add tests
* assert int on vmin and vmax
* add type: ignore
* factor out more rules
* remove div_and_mod_folding
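The commits above factor tinygrad's div/mod simplification into separate named rules. As a standalone illustration (not tinygrad's code) of the kind of range-based constant folding involved, using the vmin/vmax bounds and the numerator-sign check the commit titles mention:

```python
def try_fold_div(vmin: int, vmax: int, c: int) -> int | None:
  """Fold x // c to a constant when every value x can take in [vmin, vmax]
  yields the same quotient. Standalone sketch, not tinygrad's implementation."""
  assert c > 0 and vmin <= vmax
  # only fold a non-negative numerator, so floor division and C-style
  # truncated division agree (this is why the sign of the numerator is checked)
  if vmin < 0: return None
  return vmin // c if vmin // c == vmax // c else None

assert try_fold_div(8, 11, 4) == 2     # every value in 8..11 divides to 2
assert try_fold_div(8, 12, 4) is None  # 12 // 4 == 3, so it can't fold
```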
* cached_property to property
* remove import
* add returns
* restore old order
* check sign of x.vmin and newx.vmin
* check more signs
* add some test that would have caught bugs
* better test if the div simplified
* shorten line
* replace terms_factors_const with pop_const
* move that back
* minor cleanup
* remove comments
* some cleanup
* move view pushing to codegen, try 2
* fix up some linearizer tests
* fix test search
* fix test schedule
* delete that test
* fix test arange
* fix a few tests
* update tests
* push views
* ebs cleanup
* fix local/reg
* test and lint
* fix more tests
* test cleanups
* skipped that one
* add kernelize to keccak for each data block
test_long works now. This prevents internal uops from growing proportionally with data length and eventually becoming too deep.
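A sketch of the pattern the commit describes: the `absorb` helper and the xor mixing step below are placeholders for the real keccak code, and the only point is the per-block `kernelize()` call, which (as the commit implies) schedules the pending work so the lazy graph stops growing with the number of blocks.

```python
from tinygrad import Tensor, dtypes

def absorb(state: Tensor, blocks: list[Tensor]) -> Tensor:
  for blk in blocks:
    state = state ^ blk  # stand-in for the real per-block keccak mixing
    # kernelize after every block so the pending UOp graph stays bounded
    # instead of growing with data length
    state.kernelize()
  return state

if __name__ == "__main__":
  st = Tensor.zeros(25, dtype=dtypes.uint64)
  data = [Tensor.full((25,), i, dtype=dtypes.uint64) for i in range(64)]
  print(absorb(st, data).numpy())
```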
* this?
* hash stuff
* gate test
* mv