* more work parsing SQTT
* more minimal runner
* sep VIZ/PROFILE
* parse print new
* improve parser
* more filter
* that
* split them
* lil cleanup
* skip flaky test
* AQL in mmapeak
* trace buffer producer and consumers
* work
* generic colored util
* fix batched
* basic clicking works
* generic javascript that works for producer and consumers
* keep focused shape
* idle time
* timings for producer and consumers dedup
* from sd test
* tiny cleanups
* timeline
* work
* up to here
* assert
* list it
* work
* better viz names
* delete unused
* don't use opacity, it's multiplicative
* keep styles
* scrollbar coloring
* pyrender doesn't work here
lower all index dtypes (`beautiful_mnist` kernel `r_64_16_32_36`)
* add dtypes.index
* cast shape, stride and mask to dtypes.index in view.create
* move pm_lower_index_dtype to ops
* DEFINE_VAR is dtype.index by default
* merge var_val_using_str
* remove int from commutative
* fix test_rewrite_map
* change that to dtypes.index
* change some int to index
* shorten those
* remove old cast in renderer
* cleanup
* change that back
* add comment
* delete comment
* just delete those
* view doesn't have to cast anymore
* adjust comment
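The gist of the index-dtype change above: every shape/stride/mask value flows through a single `dtypes.index` dtype instead of raw Python ints. A minimal sketch of the casting idea, where `as_index` is a hypothetical helper name, not the actual tinygrad code:

```python
# hypothetical sketch of the view-side idea, not verbatim tinygrad code:
# sint values (int | UOp) feeding shape/stride/mask unify under dtypes.index
from tinygrad.dtype import dtypes
from tinygrad.uop.ops import UOp

def as_index(x: "int | UOp") -> UOp:
  # ints become index-typed constants, UOps get cast to dtypes.index
  return UOp.const(dtypes.index, x) if isinstance(x, int) else x.cast(dtypes.index)
```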
This enables seeing rewrites in unit tests that call graph_rewrite directly, e.g. `VIZ=1 python3 test/test_uop_graph.py TestUOpGraph.test_in_bounds_access_gated_local`.
`@track_rewrites` remains as an optional helper to organize larger traces.
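A minimal sketch of such a direct call; the `x + 0` matcher is an illustration, and the import path is an assumption (`tinygrad.ops` in older releases, `tinygrad.uop.ops` in newer ones):

```python
# a sketch, not verbatim tinygrad code; exact import path varies by version
from tinygrad.uop.ops import UOp, UPat, PatternMatcher, graph_rewrite, track_rewrites

# trivial matcher for illustration: fold x + 0 -> x
pm = PatternMatcher([(UPat.var("x") + 0, lambda x: x)])

@track_rewrites()  # optional: groups this trace under one named entry in VIZ
def simplify(sink: UOp) -> UOp:
  return graph_rewrite(sink, pm)

# with VIZ=1, every rewrite step inside graph_rewrite is recorded,
# whether or not the decorator is present
print(simplify(UOp.variable("x", 0, 10) + 0))
```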
* viz bytepack format
Training a 1B llama yields ~20M profiler events.
With JSON serialization, the browser tries to load ~6GB into memory, which OOMs since each tab is limited to roughly 3-4GB. With the packed format, we only need ~600MB.
**Design decisions:**
- Timestamps are microseconds relative to the trace start time, stored as u32: 2^32 µs ≈ 71 minutes, so a trace can hold up to ~1 hr of events.
- Strings (kernel names, metadata, etc.) are deduped into a lookup table.
- Buffer sizes are stored as u64 byte counts.
More optimization is possible:
- The string lookup table is a JSON-dumped array; we can compress this.
- The memory events could store less by moving the layout computation to the client.
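At 27 bytes/event (see the commits below), 24M events come to ~650MB, which lines up with the 640MB in the results table. A simplified sketch of what a reference decoder for a fixed-size record format can look like; the field layout here is an assumption for illustration, the real bytepack format packs more fields into its 27 bytes:

```python
# illustrative decoder for a hypothetical fixed-size event record;
# the actual bytepack layout differs, this only shows the technique
import struct

# assumed layout, little-endian: u8 kind, u32 start/end (us since trace
# start), u32 index into the deduped string table, u64 nbytes = 21 bytes
EVENT = struct.Struct("<BIIIQ")

def decode_events(buf: bytes, strings: list[str]):
  for off in range(0, len(buf) - EVENT.size + 1, EVENT.size):
    kind, st, et, name_idx, nbytes = EVENT.unpack_from(buf, off)
    yield {"kind": kind, "st_us": st, "et_us": et,
           "name": strings[name_idx], "nbytes": nbytes}
```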
**Results:**
| Workload | Events | JSON | bytepack |
|----------|--------|------|----------|
| DP=8 llama 1B train `[1]` | 24M | 5.8GB | 640MB |
| examples/beautiful_mnist.py | 16K | 3.7MB | 745KB |
| examples/gpt2.py | 55K | 12.54MB | 1.40MB |
`[1]`: `VIZ=1 FAKEDATA=1 OFFLOAD_OPTIM=1 DP=8 BS=8 GRADIENT_ACC_STEPS=2 BLOCK_REORDER=0 LR=3e-4 TRAIN_ON_VAL=1 DEFAULT_FLOAT=bfloat16 OPTIM_DTYPE=bfloat16 LLAMA3_SIZE=1B WARMUP_STEPS=36 DECAY_STEPS=360 SEQLEN=8192 PYTHONPATH=. AMD=1 AMD_LLVM=0 MODEL=llama3 python3 examples/mlperf/model_train.py`
* python reference decoder
* 27 bytes / event, 1hr hard limit
* add mem_layout
* ui
* cleanup
* work
* debugLine work and expander
* tooltip style
* real expand device
* wheel does one thing
* diff
* shows llama oom
* add y axis
* mypy chill
* work
* unittests for the memory layout