qazal
e3d024afa0
viz: split into scale, shapes, axes last (#11018)
* viz: split into scale, shapes, axes last
* set zoom on render
2025-06-28 19:10:58 +03:00
qazal
508bc68078
viz: small fixups from memory graph (#11017)
* don't need div.id
* tooltip z-index
2025-06-28 16:34:14 +03:00
qazal
fc3e509822
viz: new canvas on first render (#11016)
2025-06-28 16:04:51 +03:00
chenyu
c14c9a8eff
llama3 grad clip (#11003)
2025-06-27 19:14:12 -04:00
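This entry and the bert grad-clipping entry below both add gradient clipping to the mlperf training loops. As a minimal sketch of the general technique (global-norm clipping) in plain Python — the name `clip_grad_norm` and the list-of-lists gradient layout are illustrative, not tinygrad's API:

```python
import math

def clip_grad_norm(grads, max_norm):
    # Global-norm clipping: if the combined L2 norm of all gradients
    # exceeds max_norm, scale every gradient down by the same factor.
    total_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    scale = max_norm / (total_norm + 1e-6)  # eps guards against division by zero
    if scale < 1.0:
        grads = [[g * scale for g in grad] for grad in grads]
    return grads, total_norm
```

Scaling all gradients by one factor preserves their relative directions, which is why this form is preferred over per-element clamping for training stability.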
nimlgen
e53673a0b2
amd: sdma queue overrun fix (#11012)
* amd: sdma queue overrun fix
* add ()
* fix
* bug
* this is correct
2025-06-28 01:42:03 +03:00
chenyu
f2548afeb5
bert grad clipping start with const 0 (#11008)
saved the init kernels
2025-06-27 18:02:23 -04:00
chenyu
a6485d00c8
very tiny generate_dataset (#11013)
One minute to generate on my Mac.
2025-06-27 17:10:45 -04:00
qazal
382fa6a325
viz: support axis colors in UOp nodes (#11009)
* work
* javascript
* optional defaultColor
* fine
2025-06-27 23:02:55 +03:00
qazal
44257f25e4
bump line count to 14600 (#11010)
2025-06-27 22:48:14 +03:00
George Hotz
be53ef4f0a
rename DEFINE_ACC -> DEFINE_REG (#11006)
* rename DEFINE_ACC -> DEFINE_REG
* add CMPEQ to groupops
2025-06-27 11:09:25 -07:00
George Hotz
05c35d0db8
reorder ops and add comments (#11005)
2025-06-27 10:52:14 -07:00
George Hotz
5a1911b7c4
apply the global dims late (#11002)
* apply the global dims late [pr]
* late gpudims
* tests passing
* remove the random local_dims inc
* simpler
2025-06-27 09:54:34 -07:00
qazal
4ef10c57f9
remove unused test helper (#10999)
2025-06-27 13:48:48 +03:00
qazal
a39343e39f
viz: move timeline layout to python (#10998)
* viz: move timeline layout to python
* DevEvent has a device and a name
2025-06-27 13:06:00 +03:00
George Hotz
b4eb876d5a
kernel.py no longer permutes reduce axis [pr] (#10968)
* kernel.py no longer permutes reduce axis [pr]
* delete tests that handcode uops
* regen of sops is broken...
* put import back
* just remove that
* disable those tests
2025-06-26 17:44:58 -07:00
chenyu
6ab5a5cb6c
llama3 mlperf train (#10983)
Work in progress. Now it can overfit small examples, and VRAM usage roughly matches.
2025-06-26 20:24:27 -04:00
George Hotz
856759c79c
add halide example (#10980)
* add halide example
* upd halide gemm
* partial works
* touchups
2025-06-26 16:14:57 -07:00
qazal
1127302c46
move perfetto to extra (#10994)
* move perfetto to extra
* update TestViz and fix tests
* remove perfetto.html from viz directory
* work
* mypy
2025-06-27 01:53:54 +03:00
qazal
712980e167
fix extract_dataset + add tests to CI (#10995)
* fix extract_dataset + tests
* add CI
* sops.gz itself is same as master
* yml + gzip -c + ge
* don't commit that
* bump limit to 1000
* axis=7
* test_tiny
2025-06-27 01:51:36 +03:00
chenyu
4572e65f0f
remove duplicated move_early logic in UOp.r [pr] (#10993)
2025-06-26 18:33:54 -04:00
Ignacio Sica
579194f523
remove some linearize calls from tests 2 [pr] (#10992)
* refactor count_float4 to take uops as input instead of kernel
* remove some calls to linearize in test_linearizer
* remove some more calls
* remove one more call
2025-06-26 18:22:27 -03:00
geohotstan
50936b4a18
ONNX real float16 (#10694)
* squash commits
* temp fix for const tensor
* actually realizing float16 can only happen in raw_data
* .float -> cast(float) to rerun CI
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-06-26 14:05:12 -04:00
qazal
73484b0803
viz: generic shape tooltip/click handlers + renames (#10990)
* viz: generic tooltip
* assign kernel
* labelParts/label
* rect with a fillColor
* line
2025-06-26 19:14:04 +03:00
qazal
7f79c1388f
viz: update y offset calculation (#10987)
* viz: update y offset calculation
* don't rescale padding
2025-06-26 12:05:20 +03:00
chenyu
49bba2f0a0
improve test_nll_loss (#10986)
Build target and weight tensors outside so it tests backward too.
2025-06-26 02:46:55 -04:00
chenyu
0612acfc70
improve Tensor.cross_entropy (#10985)
Separate the cases where Y is a probability distribution vs class indices, and check shapes for indices. Also fix higher-dimensional cases.
2025-06-26 01:39:48 -04:00
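The fix above hinges on the two forms a cross-entropy target can take. A minimal single-example sketch of that distinction in plain Python — the `cross_entropy` helper here is illustrative and does not mirror tinygrad's actual Tensor method:

```python
import math

def cross_entropy(logits, y):
    # log-softmax, using the max-subtraction trick for numerical stability
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    log_probs = [x - lse for x in logits]
    if isinstance(y, int):
        return -log_probs[y]                            # y is a class index
    return -sum(p * lp for p, lp in zip(y, log_probs))  # y is a probability vector
```

For a one-hot probability vector the two branches agree, which is a handy property to assert in tests.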
chenyu
8751d47985
CosineAnnealingLRWithWarmup (#10981)
2025-06-25 17:45:21 -04:00
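CosineAnnealingLRWithWarmup combines a linear warmup with cosine annealing. A minimal per-step sketch of such a schedule — the function name, signature, and warmup convention here are illustrative, not the scheduler's actual API:

```python
import math

def lr_with_warmup(step, base_lr, warmup_steps, total_steps, end_lr=0.0):
    # Phase 1: linear warmup from ~0 up to base_lr over warmup_steps.
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    # Phase 2: cosine decay from base_lr down to end_lr over the rest.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return end_lr + 0.5 * (base_lr - end_lr) * (1 + math.cos(math.pi * progress))
```

The cosine term starts at base_lr (cos 0 = 1) right where warmup ends, so the two phases join without a discontinuity.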
Ignacio Sica
21f1c4cc09
remove some linearize calls from tests [pr] (#10978)
* remove some linearize calls from tests
speed_compare_cuda_ptx
test_uop_spec
test_linearizer
test_uops
test_winograd
* more clear assert message
2025-06-25 12:37:17 -07:00
chenyu
efad567ebd
ruff check whole examples/mlperf/ (#10979)
2025-06-25 12:57:48 -04:00
Sieds Lykles
15e60caf09
add Ops.EQ (#10976)
2025-06-25 11:25:10 -04:00
Ignacio Sica
98d2cde293
revert tc_group feature (#10971)
2025-06-24 20:58:13 -07:00
George Hotz
306dbc76f6
early view simplify (#10974)
* shape const if it has a device [pr]
* early view simplify
2025-06-24 20:52:45 -07:00
b1tg
77fff73295
fix viz vscode link on windows (#10972)
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-06-25 06:47:59 +03:00
George Hotz
9d995c2a4d
shape const if it has a device [pr] (#10969)
2025-06-24 16:22:54 -07:00
George Hotz
cf60ccac6a
support new const lowering (#10967)
* support new const lowering
* delete invalid linearizer failure tests
2025-06-24 15:21:41 -07:00
George Hotz
8a65720528
hotfix: disable test_tensor_core_opts_group test on real metal
2025-06-24 15:21:33 -07:00
nimlgen
1c45b9f7fb
start nvpci (#10521)
* start nvpci
* talk to fsp
* boot args
* riscv core booted
* q
* agen
* got gsp init msg
* some fixes
* set registry, stuck aft lockdown(
* start ga/ad port
* gsp init on ada
* more classes allocated
* more
* mm
* fixes and progress
* no huge pages for now
* mm seems workin, but switch to 512mb page for simplicity
* working state
* not cleaned
* cleaned
* nvd=1
* start gr ctx
* compute
* clean 1
* cleanup 2
* cleanup 3
* cleaner 4
* cleaner 6
* add iface to nv
* save before reboot
* merged into NV
* moveout mm
* post merge
* cleaner 7
* merge and rebase
* pciiface abstraction + reset
* download fw from web
* print logs
* minor changes + p2p
* cleaner 8
* cleaner 9
* cleaner 10
* delete
* delete this as well
* linter 1
* oops
* priv_client -> priv_root
* fix mypy
* mypy?
* mypy?
* small changes
* shorter
* ops
* remove this
* do not allocate paddr for reserve
* nodiff
* unified script
* ops
* dif ver
* add lock
* setup
2025-06-25 00:37:34 +03:00
uuuvn
c8d0f68763
Weaker renderer validation in remote (#10964)
```
training bert
training on ['REMOTE:0', 'REMOTE:1', 'REMOTE:2', 'REMOTE:3', 'REMOTE:4', 'REMOTE:5']
Traceback (most recent call last):
File "/home/uuuvn/src/tinygrad/examples/mlperf/model_train.py", line 1300, in <module>
with Profiling(enabled=getenv("PYPROFILE")): globals()[nm]()
^^^^^^^^^^^^^^^
File "/home/uuuvn/src/tinygrad/examples/mlperf/model_train.py", line 975, in train_bert
for x in GPUS: Device[x]
~~~~~~^^^
File "/home/uuuvn/src/tinygrad/tinygrad/device.py", line 22, in __getitem__
def __getitem__(self, ix:str) -> Compiled: return self.__get_canonicalized_item(self.canonicalize(ix))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/uuuvn/src/tinygrad/tinygrad/device.py", line 28, in __get_canonicalized_item
ret = [cls for cname, cls in inspect.getmembers(importlib.import_module(f'{base}.runtime.ops_{x}')) \
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/uuuvn/src/tinygrad/tinygrad/runtime/ops_remote.py", line 417, in __init__
if not renderer[0].startswith("tinygrad.renderer.") or not renderer[1].endswith("Renderer"): raise RuntimeError(f"bad renderer {renderer}")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: bad renderer ('tinygrad.runtime.ops_null', 'NullRenderer', ())
```
2025-06-24 14:15:09 -07:00
George Hotz
c2f5f0f198
more robust reduce_gradient (#10965)
2025-06-24 14:09:33 -07:00
George Hotz
8743ca40e2
force reduce to be in axis order (#10837)
* force reduce to be in axis order
* disable rule causing loop
* disable that rule
* no ra there
* only move non reduce
* fix tests
2025-06-24 13:00:16 -07:00
chenyu
ffb032e31d
test_diagonal touchup (#10962)
2025-06-24 15:51:19 -04:00
Utkarsh Gill
7f9958b632
Fix torch.linalg.diagonal crash due to invalid shrink in to_movement_ops (#10945)
* fix as_strided shrink bug breaking torch.linalg.diagonal on tinygrad backend
* cleanup
* generic fix
* tests
* cmp with diagonal too
* oops
* move tests
* fix test
* remove unnecessary import
* fix assert
* compare against numpy
---------
Co-authored-by: Utkarsh Gill <engelbart@Utkarshs-MacBook-Pro.local>
2025-06-24 15:36:06 -04:00
nimlgen
26ddf8d714
amd: rename dev_iface -> iface to match nv (#10959)
2025-06-24 20:22:19 +03:00
chenyu
bfa87f3490
clean up binary_crossentropy_logits (#10958)
2025-06-24 12:23:40 -04:00
qazal
2ccddfc0ca
viz: match canvas fontsize (#10957)
It's 10px: https://developer.mozilla.org/en-US/docs/Web/API/CanvasRenderingContext2D/font
2025-06-24 19:07:06 +03:00
qazal
de4b9bf53b
add opts_to_apply option to AST KernelInfo (#10950)
* proposal: add option to override opts in the get_program API
* update test_linearizer_rewrite
* state in uops
* update process_replay and names
* empty isn't none
* fix process replay
2025-06-24 18:55:39 +03:00
chenyu
18e264a449
Tensor.logsigmoid (#10955)
2025-06-24 11:16:14 -04:00
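Tensor.logsigmoid computes log(sigmoid(x)); the point of a dedicated function is numerical stability, since the naive `log(1/(1+exp(-x)))` underflows to `log(0)` for large negative x. A minimal scalar sketch of the standard stable formulation — illustrative only, not tinygrad's kernel:

```python
import math

def logsigmoid(x):
    # Stable log(sigmoid(x)): for x >= 0 this is -log1p(exp(-x)),
    # for x < 0 it is x - log1p(exp(x)); both collapse to one formula.
    return min(x, 0.0) - math.log1p(math.exp(-abs(x)))
```

At x = -1000 the naive form raises a math domain error, while this one returns -1000 exactly, since the log1p term underflows harmlessly to 0.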
Ignacio Sica
f15247d2d2
remove outdated index masking in lowerer [pr] (#10953)
* add assert to check idx is never replaced with const 0
* remove outdated index masking
2025-06-24 07:53:30 -07:00
b1tg
cc32394b32
support copyin/copyout/is_allocated for subbuffers (#10869)
* support copyin/copyout/is_allocated for subbuffers
* simple
* clean up
* rm underlying_buf
* add function is_initialized
* add tests
* better test_subbuffer_copy_in_out
* fix allocator
---------
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-06-24 07:49:04 -07:00
chenyu
35504c938e
torch.clip(x,y) -> x.clip(y) in test_ops (#10954)
* torch.clip(x,y) -> x.clip(y) in test_ops
* test_binary_crossentropy_logits_pos_weights
2025-06-24 10:22:19 -04:00