tinygrad/test at 2407fecdae1bc6c59eca147d099173471f2abd52 - tinygrad - AtHeartEngineering

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-04-29 03:00:14 -04:00


qazal 2407fecdae viz bytepack format (#11792)

* viz bytepack format

Training a 1B llama yields ~20M profiler events.

With JSON serialization, the browser tries to load ~6GB into memory. This OOMs because each tab is limited to roughly 3-4GB of memory. With a packed format, we only need ~600MB.

**Design decisions:**

- Timestamps are in microseconds relative to the start time. They're stored as u32, which can express up to ~1 hr of trace events (2^32 µs ≈ 71.6 min).
- Strings (kernel names, metadata, etc.) are deduped.
- Buffer sizes are in u64 nbytes.
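
The three decisions above can be illustrated with a small sketch. The record layout below (`<IIQ`, 16 bytes) and the names `RECORD`, `StringTable`, `pack_events`, and `unpack_events` are hypothetical and chosen only for illustration; this is not the actual viz format, which the commit notes as 27 bytes per event.

```python
import json
import struct

# Hypothetical record layout for illustration only (NOT the real viz layout):
#   u32  timestamp in microseconds, relative to the trace start
#   u32  index into the deduplicated string table
#   u64  buffer size in bytes
RECORD = struct.Struct("<IIQ")
U32_MAX = 0xFFFFFFFF

class StringTable:
  """Assigns each distinct string a stable integer index."""
  def __init__(self):
    self.index = {}
    self.strings = []
  def add(self, s):
    if s not in self.index:
      self.index[s] = len(self.strings)
      self.strings.append(s)
    return self.index[s]

def pack_events(events, start_us):
  """events: iterable of (name, absolute_timestamp_us, nbytes) tuples."""
  table, body = StringTable(), bytearray()
  for name, ts_us, nbytes in events:
    rel = ts_us - start_us
    if not 0 <= rel <= U32_MAX:
      raise ValueError(f"timestamp {rel}us does not fit in u32")
    body += RECORD.pack(rel, table.add(name), nbytes)
  # The string lookup is stored as a JSON-dumped array, as in the commit message.
  return json.dumps(table.strings).encode(), bytes(body)

def unpack_events(strings_json, body, start_us):
  strings = json.loads(strings_json)
  return [(strings[i], start_us + rel, nbytes) for rel, i, nbytes in RECORD.iter_unpack(body)]
```

In this sketch a repeated kernel name costs four bytes per event instead of its full length, which is where most of the savings over JSON would come from.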

More optimization possible:

- The string lookup is a JSON-dumped array; we can compress this.
- We can store less for memory by moving the layout to the client.
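
As a rough sketch of the first idea, the JSON-dumped string array could be compressed with zlib before it is sent. The helper names `compress_table` and `decompress_table` are hypothetical, and the commit does not say which compression scheme, if any, would be used.

```python
import json
import zlib

def compress_table(strings):
  # Serialize the deduplicated strings as a JSON array, then zlib-compress it.
  return zlib.compress(json.dumps(strings).encode())

def decompress_table(blob):
  # Inverse of compress_table: decompress, then parse the JSON array.
  return json.loads(zlib.decompress(blob))
```

Kernel names tend to share long prefixes, so a general-purpose compressor is likely to do well on them. The client would then need a matching decompression step before decoding events.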

**Results**

| Workload | Events | JSON | bytepack |
|----------------|---------|-------------|-------------|
| DP=8 llama 1B train (`command: [1]`) | 24M | 5.8GB | 640MB |
| examples/beautiful_mnist.py | 16K | 3.7MB | 745KB |
| examples/gpt2.py | 55K | 12.54MB | 1.40MB |

`[1]`: `VIZ=1 FAKEDATA=1 OFFLOAD_OPTIM=1 DP=8 BS=8 GRADIENT_ACC_STEPS=2 BLOCK_REORDER=0 LR=3e-4 TRAIN_ON_VAL=1 DEFAULT_FLOAT=bfloat16 OPTIM_DTYPE=bfloat16 LLAMA3_SIZE=1B WARMUP_STEPS=36 DECAY_STEPS=360 SEQLEN=8192 PYTHONPATH=. AMD=1 AMD_LLVM=0 MODEL=llama3 python3 examples/mlperf/model_train.py`

* python reference decoder

* 27 bytes / event, 1hr hard limit
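
A quick back-of-the-envelope check of these two numbers against the results table, assuming the rounded 24M event count and ignoring the string table:

```python
BYTES_PER_EVENT = 27      # from the commit message
EVENTS = 24_000_000       # DP=8 llama 1B run, rounded in the table
JSON_BYTES = 5.8e9        # JSON size reported for the same run

# Estimated size of the packed events, in MB (decimal).
packed_mb = BYTES_PER_EVENT * EVENTS / 1e6
# Average JSON cost per event, for comparison.
json_bytes_per_event = JSON_BYTES / EVENTS
# Longest trace a u32 microsecond timestamp can express, in minutes.
max_trace_min = (2**32 - 1) / 1e6 / 60

print(f"{packed_mb:.0f} MB packed, {json_bytes_per_event:.0f} B/event in JSON, "
      f"{max_trace_min:.1f} min max trace")
```

The estimate of about 648MB is close to the 640MB reported, and the u32 timestamp limit works out to roughly 71.6 minutes, which matches the 1 hr hard limit.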

2025-08-23 23:50:21 +03:00

..

cleanup tests, bump caches (#11746 )

2025-08-19 21:21:07 -07:00

correct row_count in process replay (#11748 )

2025-08-19 22:21:07 -07:00

amd: parse soc enums (#11727 )

2025-08-19 15:06:09 +03:00

move device tests to test/device + test cleanups (#11735 )

2025-08-19 16:02:20 -07:00

fix some assigns on rangeify (#11774 )

2025-08-21 15:15:54 -07:00

break swizzle into three chunks [pr] (#11153 )

2025-07-09 15:30:34 -07:00

viz bytepack format (#11792 )

2025-08-23 23:50:21 +03:00

time viz (#10763 )

2025-06-17 19:39:34 +03:00

__init__.py

All devices are equal! (#196 )

2020-12-15 23:44:08 -08:00

Dockerfile

new cloud is cloudy [pr] (#7631 )

2024-11-11 20:18:04 +08:00

helpers.py

remove unused test helper (#10999 )

2025-06-27 13:48:48 +03:00

test_arange.py

move opt under codegen (#11569 )

2025-08-07 14:19:17 -07:00

test_assign.py

rename lazydata to uop (#10698 )

2025-06-08 08:42:22 -07:00

test_compile_failures.py

cleanup tests, bump caches (#11746 )

2025-08-19 21:21:07 -07:00

test_const_folding.py

fix test_const_tensor_index index (#11660 )

2025-08-13 19:50:16 -04:00

test_define_reg.py

assert shape on lowerer store [pr] (#11395 )

2025-07-27 10:41:57 -07:00

test_dtype_alu.py

test_dtype_alu cleanups (#11799 )

2025-08-23 15:11:17 -04:00

test_dtype.py

move some test_dtype tests to unit (#11479 )

2025-08-02 15:25:00 -04:00

test_edgecases.py

assign should broadcast input tensor (#11629 )

2025-08-11 23:36:35 -04:00

test_fusion_op.py

just schedule in test_recursive_pad [pr] (#8860 )

2025-02-02 15:01:24 +02:00

test_gc.py

s/lb_refcount/uop_refcount [pr] (#10865 )

2025-06-18 21:48:04 +03:00

test_graph.py

hcq: cpu can be graphed (#11474 )

2025-08-02 21:01:19 +03:00

test_image_dtype.py

Revert "image_dot of 2 half inputs returns half (#11007 )" (#11274 )

2025-07-17 17:34:18 -04:00

test_interop.py

hotfix: interop example (#9237 )

2025-02-25 10:32:00 +03:00

test_jit_cases.py

enumerate cases of Tensors in the JIT (#10548 )

2025-05-28 11:51:27 -07:00

test_jit.py

cloud: a bit better err handling (#11616 )

2025-08-11 15:51:22 +03:00

test_kernel_cache.py

CLANG -> CPU (#9189 )

2025-02-20 18:03:09 -05:00

test_linearizer_dumb.py

Revert "REDUCE_AXIS keepdim=False (#11311 )" (#11718 )

2025-08-18 13:28:53 -07:00

test_linearizer_overflows.py

move opt under codegen (#11569 )

2025-08-07 14:19:17 -07:00

test_linearizer.py

fix getitem with inf in tensor (#11781 )

2025-08-21 21:55:32 -04:00

test_memory_planner.py

s/lb_refcount/uop_refcount [pr] (#10865 )

2025-06-18 21:48:04 +03:00

test_method_cache.py

simple LoadOps.ASSIGN (#3745 )

2024-03-14 20:44:34 -07:00

test_multitensor.py

move device tests to test/device + test cleanups (#11735 )

2025-08-19 16:02:20 -07:00

test_nn.py

** rangeify, try 3 (#11683 )

2025-08-20 14:22:44 -07:00

test_ops.py

Use Tensor.logaddexp to implement Tensor.softplus (#11796 )

2025-08-23 11:52:29 -04:00

test_opt_gemm.py

move opt under codegen (#11569 )

2025-08-07 14:19:17 -07:00

test_optim.py

[bounty] Muon optim (#11414 )

2025-08-13 14:27:55 -04:00

test_outerworld_range.py

outerworld range test [pr] (#11059 )

2025-07-02 14:28:44 -07:00

test_pickle.py

rename lazydata to uop (#10698 )

2025-06-08 08:42:22 -07:00

test_profiler.py

viz: add metadata and var_vals tracing (#11753 )

2025-08-20 18:39:51 +03:00

test_quantize_onnx.py

Revert "REDUCE_AXIS keepdim=False (#11311 )" (#11718 )

2025-08-18 13:28:53 -07:00

test_randomness.py

fix device arg to Tensor.randn (#11194 )

2025-07-12 13:51:59 -04:00

test_rangeify.py

test_vmap + permute isn't a sint (#11783 )

2025-08-21 22:39:35 -07:00

test_remote.py

Remote scheduler changes (#11177 )

2025-07-21 09:29:44 -07:00

test_renderer_failures.py

rename lazydata to uop (#10698 )

2025-06-08 08:42:22 -07:00

test_sample.py

enable WEBGPU tests with buffer limit (#11489 )

2025-08-03 13:02:44 -07:00

test_schedule.py

list indexing can normalize in python (#11609 )

2025-08-10 20:02:38 -04:00

test_search.py

move opt under codegen (#11569 )

2025-08-07 14:19:17 -07:00

test_setitem.py

Add Test for Setitem (#10559 )

2025-07-30 22:03:41 -04:00

test_softmax_fusion.py

a*(1/b) -> a/b on LLVM, CPU (#11743 )

2025-08-20 09:35:10 -04:00

test_stunning.py

move bind to big graph (#11539 )

2025-08-06 13:27:51 -07:00

test_subbuffer.py

redundant code (#11014 )

2025-06-29 09:06:10 -07:00

test_symbolic_jit.py

support variable shape none slice in getitem (#10724 )

2025-06-09 11:53:02 -07:00

test_symbolic_ops.py

fix symbolic usage. use shrink, not reshape (#11762 )

2025-08-20 18:35:42 -07:00

test_tensor_data.py

[BUGFIX] Tensor([]).data() (#7884 )

2024-11-24 16:42:57 -05:00

test_tensor_uop.py

rename lazydata to uop (#10698 )

2025-06-08 08:42:22 -07:00

test_tensor_variable.py

[bounty] [pr] index validation with z3 (#9981 )

2025-04-24 08:06:08 -04:00

test_tensor.py

hotfix: test tensor dims start at 1

2025-08-05 15:40:24 -07:00

test_tiny.py

fix symbolic usage. use shrink, not reshape (#11762 )

2025-08-20 18:35:42 -07:00

test_to_numpy.py

Apply ruff linting rules to tests (#2473 )

2023-11-27 21:24:06 -08:00

test_transcendental.py

Fix DSP transcendentals (#9542 )

2025-03-22 11:08:18 +08:00

test_uop_graph.py

add AxisType to range (#11798 )

2025-08-23 11:15:00 -07:00

test_uops_stats.py

no ast for mem estimate (#11744 )

2025-08-19 20:18:45 -07:00

test_uops.py

Add Ops.CMPEQ (#10431 )

2025-08-10 13:13:16 +02:00

test_winograd.py

update some tests for less Kernel (#11543 )

2025-08-06 14:19:59 -07:00

test_zero_copy.py

rename lazydata to uop (#10698 )

2025-06-08 08:42:22 -07:00