Commit Graph

10417 Commits

Author SHA1 Message Date
qazal
e3d024afa0 viz: split into scale, shapes, axes last (#11018)
* viz: split into scale, shapes, axes last

* set zoom on render
2025-06-28 19:10:58 +03:00
qazal
508bc68078 viz: small fixups from memory graph (#11017)
* don't need div.id

* tooltip z-index
2025-06-28 16:34:14 +03:00
qazal
fc3e509822 viz: new canvas on first render (#11016) 2025-06-28 16:04:51 +03:00
chenyu
c14c9a8eff llama3 grad clip (#11003) 2025-06-27 19:14:12 -04:00
nimlgen
e53673a0b2 amd: sdma queue overrun fix (#11012)
* amd: sdma queue overrun fix

* add ()

* fix

* bug

* this is correct
2025-06-28 01:42:03 +03:00
chenyu
f2548afeb5 bert grad clipping start with const 0 (#11008)
saved the init kernels
2025-06-27 18:02:23 -04:00
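On the grad-clipping entry above: a minimal sketch of global-norm clipping where the squared-norm accumulator starts from a constant 0 (which is what lets the init kernels be saved, per the commit note). The helper name, the 1.0 default, and the eps are illustrative assumptions, not the code from #11008.
```
from tinygrad import Tensor

def clip_grads_by_global_norm(grads: list[Tensor], max_norm: float = 1.0, eps: float = 1e-6):
  # accumulate the squared global norm, starting from a constant 0
  total_sq = Tensor(0.0)
  for g in grads: total_sq = total_sq + g.square().sum()
  total_norm = total_sq.sqrt()
  # shrink every gradient by the same factor once the norm exceeds max_norm
  scale = (max_norm / (total_norm + eps)).minimum(1.0)
  return [g * scale for g in grads], total_norm
```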
chenyu
a6485d00c8 very tiny generate_dataset (#11013)
One minute to generate on my Mac.
2025-06-27 17:10:45 -04:00
qazal
382fa6a325 viz: support axis colors in UOp nodes (#11009)
* work

* javascript

* optional defaultColor

* fine
2025-06-27 23:02:55 +03:00
qazal
44257f25e4 bump line count to 14600 (#11010) 2025-06-27 22:48:14 +03:00
George Hotz
be53ef4f0a rename DEFINE_ACC -> DEFINE_REG (#11006)
* rename DEFINE_ACC -> DEFINE_REG

* add CMPEQ to groupops
2025-06-27 11:09:25 -07:00
George Hotz
05c35d0db8 reorder ops and add comments (#11005) 2025-06-27 10:52:14 -07:00
George Hotz
5a1911b7c4 apply the global dims late (#11002)
* apply the global dims late [pr]

* late gpudims

* tests passing

* remove the random local_dims inc

* simpler
2025-06-27 09:54:34 -07:00
qazal
4ef10c57f9 remove unused test helper (#10999) 2025-06-27 13:48:48 +03:00
qazal
a39343e39f viz: move timeline layout to python (#10998)
* viz: move timeline layout to python

* DevEvent has a device and a name
2025-06-27 13:06:00 +03:00
George Hotz
b4eb876d5a kernel.py no longer permutes reduce axis [pr] (#10968)
* kernel.py no longer permutes reduce axis [pr]

* delete tests that handcode uops

* regen of sops is broken...

* put import back

* just remove that

* disable those tests
2025-06-26 17:44:58 -07:00
chenyu
6ab5a5cb6c llama3 mlperf train (#10983)
Work in progress: it can now overfit small examples, and VRAM usage roughly matches.
2025-06-26 20:24:27 -04:00
George Hotz
856759c79c add halide example (#10980)
* add halide example

* upd halide gemm

* partial works

* touchups
2025-06-26 16:14:57 -07:00
qazal
1127302c46 move perfetto to extra (#10994)
* move perfetto to extra

* update TestViz and fix tests

* remove perfetto.html from viz directory

* work

* mypy
2025-06-27 01:53:54 +03:00
qazal
712980e167 fix extract_dataset + add tests to CI (#10995)
* fix extract_dataset + tests

* add CI

* sops.gz itself is same as master

* yml + gzip -c + ge

* don't commit that

* bump limit to 1000

* axis=7

* test_tiny
2025-06-27 01:51:36 +03:00
chenyu
4572e65f0f remove duplicated move_early logic in UOp.r [pr] (#10993) 2025-06-26 18:33:54 -04:00
Ignacio Sica
579194f523 remove some linearize calls from tests 2 [pr] (#10992)
* refactor count_float4 to take uops as input instead of kernel

* remove some calls to linearize in test_linearizer

* remove some more calls

* remove one more call
2025-06-26 18:22:27 -03:00
geohotstan
50936b4a18 ONNX real float16 (#10694)
* squash commits

* temp fix for const tensor

* actually realizing float16 can only happen in raw_data

* .float -> cast(float) to rerun CI

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-06-26 14:05:12 -04:00
qazal
73484b0803 viz: generic shape tooltip/click handlers + renames (#10990)
* viz: generic tooltip

* assign kernel

* labelParts/label

* rect with a fillColor

* line
2025-06-26 19:14:04 +03:00
qazal
7f79c1388f viz: update y offset calculation (#10987)
* viz: update y offset calculation

* don't rescale padding
2025-06-26 12:05:20 +03:00
chenyu
49bba2f0a0 improve test_nll_loss (#10986)
Build the target and weight tensors outside so the test exercises backward too.
2025-06-26 02:46:55 -04:00
chenyu
0612acfc70 improve Tensor.cross_entropy (#10985)
Separate the cases where Y is probabilities vs class indices, and check shapes for indices. Also fixes higher-dim cases (see the sketch below).
2025-06-26 01:39:48 -04:00
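On the Tensor.cross_entropy change above: a hedged sketch of the two call patterns it distinguishes; the shapes and values are illustrative, not taken from the PR's tests.
```
from tinygrad import Tensor

logits = Tensor.randn(4, 10)
# class-index targets: an integer tensor of shape (4,), one class index per sample
loss_idx = logits.cross_entropy(Tensor([1, 3, 0, 7]))
# probability targets: a float tensor with the same shape as the logits
loss_prob = logits.cross_entropy(Tensor.randn(4, 10).softmax())
```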
chenyu
8751d47985 CosineAnnealingLRWithWarmup (#10981) 2025-06-25 17:45:21 -04:00
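For reference, the usual cosine-annealing-with-warmup schedule as a plain-Python sketch; the parameter names and the end_lr default are assumptions, not the actual CosineAnnealingLRWithWarmup API.
```
import math

def lr_at(step: int, base_lr: float, warmup_steps: int, total_steps: int, end_lr: float = 0.0) -> float:
  if step < warmup_steps:
    return base_lr * (step + 1) / warmup_steps  # linear warmup up to base_lr
  progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
  return end_lr + 0.5 * (base_lr - end_lr) * (1 + math.cos(math.pi * progress))  # cosine decay to end_lr
```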
Ignacio Sica
21f1c4cc09 remove some linearize calls from tests [pr] (#10978)
* remove some linearize calls from tests

speed_compare_cuda_ptx
test_uop_spec
test_linearizer
test_uops
test_winograd

* clearer assert message
2025-06-25 12:37:17 -07:00
chenyu
efad567ebd ruff check whole examples/mlperf/ (#10979) 2025-06-25 12:57:48 -04:00
Sieds Lykles
15e60caf09 add Ops.EQ (#10976) 2025-06-25 11:25:10 -04:00
Ignacio Sica
98d2cde293 revert tc_group feature (#10971) 2025-06-24 20:58:13 -07:00
George Hotz
306dbc76f6 early view simplify (#10974)
* shape const if it has a device [pr]

* early view simplify
2025-06-24 20:52:45 -07:00
b1tg
77fff73295 fix viz vscode link on windows (#10972)
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-06-25 06:47:59 +03:00
George Hotz
9d995c2a4d shape const if it has a device [pr] (#10969) 2025-06-24 16:22:54 -07:00
George Hotz
cf60ccac6a support new const lowering (#10967)
* support new const lowering

* delete invalid linearizer failure tests
2025-06-24 15:21:41 -07:00
George Hotz
8a65720528 hotfix: disable test_tensor_core_opts_group test on real metal 2025-06-24 15:21:33 -07:00
nimlgen
1c45b9f7fb start nvpci (#10521)
* start nvpci

* talk to fsp

* boot args

* riscv core booted

* q

* agen

* got gsp init msg

* some fixes

* set registry, stuck after lockdown(

* start ga/ad port

* gsp init on ada

* more classes allocated

* more

* mm

* fixes and progress

* no huge pages for now

* mm seems working, but switch to 512MB pages for simplicity

* working state

* not cleaned

* cleaned

* nvd=1

* start gr ctx

* compute

* clean 1

* cleanup 2

* cleanup 3

* cleaner 4

* cleaner 6

* add iface to nv

* save before reboot

* merged into NV

* moveout mm

* post merge

* cleaner 7

* merge and rebase

* pciiface abstraction + reset

* download fw from web

* print logs

* minor changes + p2p

* cleaner 8

* cleaner 9

* cleaner 10

* delete

* delete this as well

* linter 1

* oops

* priv_client -> priv_root

* fix mypy

* mypy?

* mypy?

* small changes

* shorter

* ops

* remove this

* do not allocate paddr for reserve

* nodiff

* unified script

* ops

* dif ver

* add lock

* setup
2025-06-25 00:37:34 +03:00
uuuvn
c8d0f68763 Weaker renderer validation in remote (#10964)
```
training bert
training on ['REMOTE:0', 'REMOTE:1', 'REMOTE:2', 'REMOTE:3', 'REMOTE:4', 'REMOTE:5']
Traceback (most recent call last):
  File "/home/uuuvn/src/tinygrad/examples/mlperf/model_train.py", line 1300, in <module>
    with Profiling(enabled=getenv("PYPROFILE")): globals()[nm]()
                                                 ^^^^^^^^^^^^^^^
  File "/home/uuuvn/src/tinygrad/examples/mlperf/model_train.py", line 975, in train_bert
    for x in GPUS: Device[x]
                   ~~~~~~^^^
  File "/home/uuuvn/src/tinygrad/tinygrad/device.py", line 22, in __getitem__
    def __getitem__(self, ix:str) -> Compiled: return self.__get_canonicalized_item(self.canonicalize(ix))
                                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/uuuvn/src/tinygrad/tinygrad/device.py", line 28, in __get_canonicalized_item
    ret = [cls for cname, cls in inspect.getmembers(importlib.import_module(f'{base}.runtime.ops_{x}')) \
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/uuuvn/src/tinygrad/tinygrad/runtime/ops_remote.py", line 417, in __init__
    if not renderer[0].startswith("tinygrad.renderer.") or not renderer[1].endswith("Renderer"): raise RuntimeError(f"bad renderer {renderer}")
                                                                                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: bad renderer ('tinygrad.runtime.ops_null', 'NullRenderer', ())
```
2025-06-24 14:15:09 -07:00
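For context: the check in the traceback only accepts renderer classes whose module path starts with tinygrad.renderer., so NullRenderer (which lives in tinygrad.runtime.ops_null) is rejected. A hedged sketch of one way the check could be weakened; this is an assumption about the shape of the fix, not the actual diff.
```
def validate_renderer(renderer: tuple) -> None:
  # weaker (assumed) check: any tinygrad module exporting a *Renderer class is acceptable,
  # instead of requiring the module to live under tinygrad.renderer.*
  if not renderer[0].startswith("tinygrad.") or not renderer[1].endswith("Renderer"):
    raise RuntimeError(f"bad renderer {renderer}")

validate_renderer(('tinygrad.runtime.ops_null', 'NullRenderer', ()))  # now passes
```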
George Hotz
c2f5f0f198 more robust reduce_gradient (#10965) 2025-06-24 14:09:33 -07:00
George Hotz
8743ca40e2 force reduce to be in axis order (#10837)
* force reduce to be in axis order

* disable rule causing loop

* disable that rule

* no ra there

* only move non reduce

* fix tests
2025-06-24 13:00:16 -07:00
chenyu
ffb032e31d test_diagonal touchup (#10962) 2025-06-24 15:51:19 -04:00
Utkarsh Gill
7f9958b632 Fix torch.linalg.diagonal crash due to invalid shrink in to_movement_ops (#10945)
* fix as_strided shrink bug breaking torch.linalg.diagonal on tinygrad backend

* cleanup

* generic fix

* tests

* cmp with diagonal too

* oops

* move tests

* fix test

* remove unnecessary import

* fix assert

* compare against numpy

---------

Co-authored-by: Utkarsh Gill <engelbart@Utkarshs-MacBook-Pro.local>
2025-06-24 15:36:06 -04:00
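The underlying pattern behind this fix: torch builds the diagonal as an as_strided view stepping n+1 elements through the flat buffer, which is what to_movement_ops has to translate. A small illustration on plain CPU torch compared against numpy; the tinygrad backend itself isn't exercised here.
```
import numpy as np
import torch

x = torch.arange(16, dtype=torch.float32).reshape(4, 4)
# main diagonal as an as_strided view: 4 elements, stride n+1 = 5 through the flat buffer
diag_strided = torch.as_strided(x, size=(4,), stride=(5,))
assert torch.equal(diag_strided, torch.linalg.diagonal(x))
np.testing.assert_allclose(diag_strided.numpy(), np.diagonal(x.numpy()))
```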
nimlgen
26ddf8d714 amd: rename dev_iface -> iface to match nv (#10959) 2025-06-24 20:22:19 +03:00
chenyu
bfa87f3490 clean up binary_crossentropy_logits (#10958) 2025-06-24 12:23:40 -04:00
qazal
2ccddfc0ca viz: match canvas fontsize (#10957)
The default canvas font size is 10px: https://developer.mozilla.org/en-US/docs/Web/API/CanvasRenderingContext2D/font.
2025-06-24 19:07:06 +03:00
qazal
de4b9bf53b add opts_to_apply option to AST KernelInfo (#10950)
* proposal: add option to override opts in the get_program API

* update test_linearizer_rewrite

* state in uops

* update process_replay and names

* empty isn't none

* fix process replay
2025-06-24 18:55:39 +03:00
chenyu
18e264a449 Tensor.logsigmoid (#10955) 2025-06-24 11:16:14 -04:00
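The standard numerically stable formulation, log sigmoid(x) = -softplus(-x), shown as a small pure-Python sketch rather than the actual tinygrad implementation:
```
import math

def logsigmoid(x: float) -> float:
  # log(sigmoid(x)) = -log(1 + exp(-x)); branch on the sign of x so exp never overflows
  return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))

assert abs(logsigmoid(0.0) - math.log(0.5)) < 1e-12
```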
Ignacio Sica
f15247d2d2 remove outdated index masking in lowerer [pr] (#10953)
* add assert to check idx is never replaced with const 0

* remove outdated index masking
2025-06-24 07:53:30 -07:00
b1tg
cc32394b32 support copyin/copyout/is_allocated for subbuffers (#10869)
* support copyin/copyout/is_allocated for subbuffers

* simple

* clean up

* rm underlying_buf
* add function is_initialized
* add tests

* better test_subbuffer_copy_in_out

* fix allocator

---------

Co-authored-by: b1tg <b1tg@users.noreply.github.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-06-24 07:49:04 -07:00
chenyu
35504c938e torch.clip(x,y) -> x.clip(y) in test_ops (#10954)
* torch.clip(x,y) -> x.clip(y) in test_ops

* test_binary_crossentropy_logits_pos_weights
2025-06-24 10:22:19 -04:00
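The rewrite pattern from this last entry, for reference; the values are illustrative, not from the test suite.
```
import torch

x = torch.tensor([-2.0, 0.5, 3.0])
assert torch.equal(torch.clip(x, 0.0), x.clip(0.0))  # method form, as now used in test_ops
```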