tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-22 13:28:06 -05:00

Author	SHA1	Message	Date
chenyu	aabe7756be	fix type in fold_bitcast [pr] (#11853 )	2025-08-26 13:22:30 -04:00
Jordan Chalupka	4785cd959a	[TYPED=1] cvar should allow dtype as a tuple (#11770 ) * cvar dtype:DType\|tuple[DType, ...]\|None=None * fmt * add a test * list typeguard as a dep for CI * extra step to install mypy * fix venv * ci fixes * mv typeguard to testing install group * simpler TYPED=1 test * add typeguard to lint group	2025-08-26 12:49:51 -04:00
qazal	b111076301	viz: fixup click on overlay rect (#11850 )	2025-08-26 19:25:42 +03:00
b1tg	1dd613cb89	test float_to_bf16 round-to-even behavior (#11849 ) Co-authored-by: b1tg <b1tg@users.noreply.github.com>	2025-08-26 12:16:10 -04:00
b1tg	409399c609	fix nan in float_to_bf16 (#11843 ) Co-authored-by: b1tg <b1tg@users.noreply.github.com>	2025-08-26 11:42:25 -04:00
qazal	43d5d66d34	viz: add UOp ports to edges (#11847 ) * viz: add UOp ports to edges * one edge label * g.tag styling * replace with NodeList	2025-08-26 18:31:52 +03:00
chenyu	f28f613f85	improved float_to_bf16 (#11848 ) round instead of truncate	2025-08-26 11:14:06 -04:00
nimlgen	afe14ccbfa	amd: aql default when several xccs (#11832 )	2025-08-26 15:16:36 +03:00
qazal	3674c0754e	viz: small uop click changes (#11846 ) * also highlight self * can always unselect by clicking outside * less layout	2025-08-26 14:56:13 +03:00
qazal	f2a3c27372	viz: g.edges() once (#11845 )	2025-08-26 13:29:59 +03:00
qazal	b0df3e62a8	viz: light up srcs and paths on UOp click (#11844 ) * viz: light up srcs and paths on UOp click * safari doesn't have context-stroke * safari also has a bug * safari acceptance	2025-08-26 09:03:09 +03:00
qazal	6236749867	viz: move rect styles to classes (#11842 ) * viz: move rect styles to classes * add rect	2025-08-26 07:55:34 +03:00
qazal	81ffa07439	viz: pass through nodes without a link (#11841 )	2025-08-26 07:00:43 +03:00
Sieds Lykles	265d287615	add decomp for !x&!y -> !(x\|y) (#11836 )	2025-08-26 05:21:06 +02:00
chenyu	337e979a59	call dtypes.as_const in Tensor(list) (#11840 )	2025-08-25 22:08:26 -04:00
George Hotz	215818379b	new (post) group for reduce (#11837 ) * new (post) group for reduce * fixes * leave if * fix locals * size * no vectorized buf * image fixes * don't track that * fix ptx * name buffer with reduce range * remove unused in lowerer * yay DEFINE_REG refactor	2025-08-25 18:03:00 -07:00
chenyu	ac3449b0c8	truncate_fp16 cleanup (#11838 ) native `@` is default	2025-08-25 19:03:41 -04:00
qazal	e146418f65	hotfix: profiler content-type is application/octet-stream (#11831 )	2025-08-25 15:56:42 +03:00
qazal	a1f6823060	viz: memory layout in client side (#11830 ) * viz: memory layout in client side * update test_viz	2025-08-25 14:49:33 +03:00
George Hotz	a6dbb09058	changes for postrange (#11828 )	2025-08-24 17:37:07 -07:00
George Hotz	27701ef823	add locals support to rangeify (#11826 )	2025-08-24 14:03:12 -07:00
Sieds Lykles	a286a1a6f7	Fast idiv try removing factors of two before cast (#11824 ) * try removing factors of two * dont return if None * add test	2025-08-24 20:04:25 +02:00
George Hotz	a03b930339	hotfix: green v2 in docs	2025-08-24 10:25:14 -07:00
George Hotz	6540bb32a6	move into codegen late [pr] (#11823 )	2025-08-24 10:23:25 -07:00
nimlgen	bba088ef11	amd aql queue (#11708 ) * amd aql queue * xcc * fiz * aql better * llvm * no for aql * wrap * is_sql * am support * complete * fix * mypy * minor	2025-08-24 19:53:00 +03:00
George Hotz	1fa09d9ede	BLOCK_REORDER is context var, heuristic cleanups [pr] (#11819 ) * BLOCK_REORDER is context var, heuristic cleanups [pr] * split get opt and do opt * oops, should be on	2025-08-24 09:41:34 -07:00
qazal	8b18cc2a94	viz memory layout cleanup (#11820 ) * rename to dtype_size * cleanr memory shape creator	2025-08-24 19:37:31 +03:00
Sieds Lykles	dd69114573	Revert "Better div nesting (#11811 )" (#11818 ) This reverts commit `952f729b07`.	2025-08-24 18:11:24 +02:00
nimlgen	e19f901330	amd: rptr/wptr in create_queue (#11817 )	2025-08-24 18:03:45 +03:00
nimlgen	d71444857e	amd: apply relocs for kernel_code_entry_byte_offset for AMD_LLVM (#11816 ) * amd: apply relocs for kernel_code_entry_byte_offset for AMD_LLVM * fix	2025-08-24 17:48:40 +03:00
George Hotz	44bc7dc73d	remove KernelInfo from GROUP_REDUCE (#11814 )	2025-08-23 19:55:41 -07:00
George Hotz	229adfb7c3	Revert "remove KernelInfo from gpudims (#11809 )" (#11813 ) This reverts commit `846753f343`.	2025-08-23 19:37:10 -07:00
Sieds Lykles	952f729b07	Better div nesting (#11811 ) * remove check * use fold_divmod_congruence instead of simplify * adjust tests * shorten line	2025-08-24 04:17:40 +02:00
Sieds Lykles	e652062f92	tweak divmod_folding condition (#11810 )	2025-08-24 02:59:02 +02:00
George Hotz	846753f343	remove KernelInfo from gpudims (#11809 ) * remove KernelInfo from gpudims * that's good in there	2025-08-23 16:32:45 -07:00
Sieds Lykles	07d4ed7e4c	one more symbolic add variation (#11807 )	2025-08-24 01:15:04 +02:00
qazal	759ebea4eb	viz: reflect timeline API boundary in names (#11808 ) * define shapes once * depth isn't an event property * update server naming	2025-08-24 02:12:12 +03:00
George Hotz	132f09fab7	global/locals from AxisType in range (#11806 )	2025-08-23 15:49:17 -07:00
qazal	0d86288bd7	viz: calculate timeline fixed points in client side (#11805 ) * viz: calculate timeline fixed points in client side * 26 bytes / event * math	2025-08-24 01:44:40 +03:00
George Hotz	a75da49951	use AxisType for UPCAST/UNROLL (#11800 ) * use AxisType for UPCAST/UNROLL * fixes * fix the bug * fix hack * bad test * flaky test	2025-08-23 14:44:48 -07:00
qazal	2407fecdae	viz bytepack format (#11792 ) * viz bytepack format Training a 1B llama yields ~20M profiler events. With JSON serialization, the browser tries to load 6GB to memory. This OOMs since each tab is limited to <3-4GB memory usage. Using a packed format, we only need ~600MB. Design decisions: - Timestamps are in microseconds relative to start time. They're stored in u32, which can express up to ~1 hr of trace events. - Strings (kernel names, metadata, etc) are deduped. - Buffer sizes are in u64 nbytes. More optimization possible: - The string lookup is a JSON dumped array, we can compress this. - Can store less for memory by moving the layout to client. Results \| \| Events \| JSON \| bytepack \| \|----------------\|---------\|-------------\|-------------\| \| DP=8 llama 1B train (`command: [1]`) \| 24M \| 5.8GB \| 640MB \| \| examples/beautiful_mnist.py \| 16K \| 3.7MB \| 745KB \| \| examples/gpt2.py \| 55K \| 12.54MB \| 1.40MB \| `[1]`: `VIZ=1 FAKEDATA=1 OFFLOAD_OPTIM=1 DP=8 BS=8 GRADIENT_ACC_STEPS=2 BLOCK_REORDER=0 LR=3e-4 TRAIN_ON_VAL=1 DEFAULT_FLOAT=bfloat16 OPTIM_DTYPE=bfloat16 LLAMA3_SIZE=1B WARMUP_STEPS=36 DECAY_STEPS=360 SEQLEN=8192 PYTHONPATH=. AMD=1 AMD_LLVM=0 MODEL=llama3 python3 examples/mlperf/model_train.py` * python reference decoder * 27 bytes / event, 1hr hard limit	2025-08-23 23:50:21 +03:00
qazal	b12d1d866c	count bytes per kernel in test_viz (#11801 ) Currently at ~100 bytes/kernel with JSON.	2025-08-23 23:35:27 +03:00
Sieds Lykles	6a50ab6b87	adjust idiv min_max (#11802 ) * change div min_max * add tests	2025-08-23 22:25:51 +02:00
chenyu	9d4cccd0f9	test_dtype_alu cleanups (#11799 )	2025-08-23 15:11:17 -04:00
George Hotz	aefabaf774	add AxisType to range (#11798 ) * add AxisType to range * missed them * fix that test * fix that test	2025-08-23 11:15:00 -07:00
qazal	b975830424	add profile loader helper in test_viz (#11797 )	2025-08-23 19:20:29 +03:00
chenyu	7123df3928	Use Tensor.logaddexp to implement Tensor.softplus (#11796 ) instead of piecewise linear, numerical is handled by logaddexp. jax does this and i think it's more elegant than torch's approach	2025-08-23 11:52:29 -04:00
qazal	aaea6b97ad	viz memory: compute nbytes (#11795 ) * viz memory: compute nbytes * local map	2025-08-23 17:34:07 +03:00
qazal	58653b5eae	viz: store memory scale (#11794 )	2025-08-23 16:19:44 +03:00
chenyu	fb8ee02424	Tensor.logaddexp (#11793 )	2025-08-23 09:15:00 -04:00
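
Commits fb8ee02424 and 7123df3928 above add `Tensor.logaddexp` and then use it to implement `Tensor.softplus`. The identity behind this is softplus(x) = log(1 + exp(x)) = logaddexp(x, 0). The following is a minimal scalar sketch of that identity in plain Python, not the tinygrad implementation. The function names and the max-plus-`log1p` formulation are assumptions chosen for illustration.

```python
import math

def logaddexp(a: float, b: float) -> float:
    # log(exp(a) + exp(b)), computed without overflowing for large inputs:
    # factor out the larger argument, then add log1p of the small remainder.
    m = max(a, b)
    return m + math.log1p(math.exp(-abs(a - b)))

def softplus(x: float) -> float:
    # softplus(x) = log(1 + exp(x)) = log(exp(x) + exp(0))
    return logaddexp(x, 0.0)
```

Because the larger argument is factored out first, `softplus(1000.0)` evaluates without overflow, which a direct `math.log(1 + math.exp(x))` would not.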

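Commits f28f613f85, 409399c609 and 1dd613cb89 above change `float_to_bf16` to round instead of truncate, fix a NaN case, and test round-to-even behavior. The sketch below shows one common way to do a float32 to bfloat16 conversion with round-to-nearest-even and an explicit NaN path. It is an assumption about the general technique, not a copy of tinygrad's code, and the helper names are hypothetical.

```python
import math
import struct

def float_to_bf16_bits(x: float) -> int:
    # Reinterpret the value as its 32-bit IEEE 754 single-precision pattern.
    # Note: struct raises OverflowError for finite values outside float32 range.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    if math.isnan(x):
        # Rounding could carry a NaN mantissa into the exponent and turn it
        # into inf, so keep the top 16 bits and force a mantissa bit on.
        return (bits >> 16) | 0x0040
    # Round to nearest, ties to even: add 0x7FFF plus the lowest kept bit.
    bits += 0x7FFF + ((bits >> 16) & 1)
    return (bits >> 16) & 0xFFFF

def bf16_bits_to_float(b: int) -> float:
    # bfloat16 is the upper half of a float32, so pad with 16 zero bits.
    return struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))[0]
```

A value exactly halfway between two bfloat16 neighbors goes to the one whose lowest mantissa bit is zero, whereas truncation would always round toward zero.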

10490 Commits
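
Commit 2407fecdae above describes the viz bytepack format: timestamps stored as u32 microseconds relative to the start time, deduplicated strings kept in a JSON-dumped lookup array, and buffer sizes stored as u64 byte counts. The sketch below illustrates those three design decisions only. The record layout (`<IIQ`, 16 bytes), the length-prefixed string table, and all function names are hypothetical and do not match the real format, which the commit reports at 27 bytes per event.

```python
import json
import struct

# Hypothetical fixed-size record: string index (u32), relative timestamp in
# microseconds (u32), buffer size in bytes (u64).
EVENT = struct.Struct("<IIQ")

def pack_events(events, start_us):
    strings, index = [], {}
    body = bytearray()
    for name, ts_us, nbytes in events:
        # Deduplicate strings: each distinct name is stored once in the table.
        if name not in index:
            index[name] = len(strings)
            strings.append(name)
        rel = ts_us - start_us
        # A u32 of microseconds covers 2**32 us, roughly 71.6 minutes.
        if not 0 <= rel < 2**32:
            raise ValueError("timestamp outside the u32 microsecond range")
        body += EVENT.pack(index[name], rel, nbytes)
    table = json.dumps(strings).encode()
    # Prefix the JSON string table with its length so the decoder can find the records.
    return struct.pack("<I", len(table)) + table + bytes(body)

def unpack_events(blob, start_us):
    (n,) = struct.unpack_from("<I", blob, 0)
    strings = json.loads(blob[4:4 + n])
    return [(strings[i], start_us + rel, nb)
            for i, rel, nb in EVENT.iter_unpack(blob[4 + n:])]
```

With fixed-size records, a repeated kernel name costs four bytes per event instead of a full JSON string, which is the kind of saving the size table in the commit message reports.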