Commit Graph

10633 Commits

George Hotz
cf60ccac6a support new const lowering (#10967)
* support new const lowering

* delete invalid linearizer failure tests
2025-06-24 15:21:41 -07:00
George Hotz
8a65720528 hotfix: disable test_tensor_core_opts_group test on real metal 2025-06-24 15:21:33 -07:00
nimlgen
1c45b9f7fb start nvpci (#10521)
* start nvpci

* talk to fsp

* boot args

* riscv core booted

* q

* agen

* got gsp init msg

* some fixes

* set registry, stuck after lockdown :(

* start ga/ad port

* gsp init on ada

* more classes allocated

* more

* mm

* fixes and progress

* no huge pages for now

* mm seems working, but switch to 512MB pages for simplicity

* working state

* not cleaned

* cleaned

* nvd=1

* start gr ctx

* compute

* clean 1

* cleanup 2

* cleanup 3

* cleaner 4

* cleaner 6

* add iface to nv

* save before reboot

* merged into NV

* moveout mm

* post merge

* cleaner 7

* merge and rebase

* pciiface abstraction + reset

* download fw from web

* print logs

* minor changes + p2p

* cleaner 8

* cleaner 9

* cleaner 10

* delete

* delete this as well

* linter 1

* oops

* priv_client -> priv_root

* fix mypy

* mypy?

* mypy?

* small changes

* shorter

* ops

* remove this

* do not allocate paddr for reserve

* nodiff

* unified script

* ops

* dif ver

* add lock

* setup
2025-06-25 00:37:34 +03:00
uuuvn
c8d0f68763 Weaker renderer validation in remote (#10964)
```
training bert
training on ['REMOTE:0', 'REMOTE:1', 'REMOTE:2', 'REMOTE:3', 'REMOTE:4', 'REMOTE:5']
Traceback (most recent call last):
  File "/home/uuuvn/src/tinygrad/examples/mlperf/model_train.py", line 1300, in <module>
    with Profiling(enabled=getenv("PYPROFILE")): globals()[nm]()
                                                 ^^^^^^^^^^^^^^^
  File "/home/uuuvn/src/tinygrad/examples/mlperf/model_train.py", line 975, in train_bert
    for x in GPUS: Device[x]
                   ~~~~~~^^^
  File "/home/uuuvn/src/tinygrad/tinygrad/device.py", line 22, in __getitem__
    def __getitem__(self, ix:str) -> Compiled: return self.__get_canonicalized_item(self.canonicalize(ix))
                                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/uuuvn/src/tinygrad/tinygrad/device.py", line 28, in __get_canonicalized_item
    ret = [cls for cname, cls in inspect.getmembers(importlib.import_module(f'{base}.runtime.ops_{x}')) \
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/uuuvn/src/tinygrad/tinygrad/runtime/ops_remote.py", line 417, in __init__
    if not renderer[0].startswith("tinygrad.renderer.") or not renderer[1].endswith("Renderer"): raise RuntimeError(f"bad renderer {renderer}")
                                                                                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: bad renderer ('tinygrad.runtime.ops_null', 'NullRenderer', ())
```
2025-06-24 14:15:09 -07:00
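For context: the check that fired here rejects any renderer whose module is outside tinygrad.renderer., but a remote host can legitimately report a runtime-provided renderer such as NullRenderer. A minimal sketch of a weaker check (hypothetical helper, not necessarily the merged code):
```
def validate_renderer(renderer: tuple) -> None:
  # accept tinygrad.renderer.* as before, but also allow runtime-provided
  # renderers like ('tinygrad.runtime.ops_null', 'NullRenderer', ())
  mod, cls = renderer[0], renderer[1]
  if not (mod.startswith("tinygrad.renderer") or mod.startswith("tinygrad.runtime.ops_")) or not cls.endswith("Renderer"):
    raise RuntimeError(f"bad renderer {renderer}")
```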
George Hotz
c2f5f0f198 more robust reduce_gradient (#10965) 2025-06-24 14:09:33 -07:00
George Hotz
8743ca40e2 force reduce to be in axis order (#10837)
* force reduce to be in axis order

* disable rule causing loop

* disable that rule

* no ra there

* only move non reduce

* fix tests
2025-06-24 13:00:16 -07:00
chenyu
ffb032e31d test_diagonal touchup (#10962) 2025-06-24 15:51:19 -04:00
Utkarsh Gill
7f9958b632 Fix torch.linalg.diagonal crash due to invalid shrink in to_movement_ops (#10945)
* fix as_strided shrink bug breaking torch.linalg.diagonal on tinygrad backend

* cleanup

* generic fix

* tests

* cmp with diagonal too

* oops

* move tests

* fix test

* remove unnecessary import

* fix assert

* compare against numpy

---------

Co-authored-by: Utkarsh Gill <engelbart@Utkarshs-MacBook-Pro.local>
2025-06-24 15:36:06 -04:00
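For context, the PR's tests compare the diagonal result against numpy. A minimal sketch of that comparison on the default torch CPU backend (the crash itself only reproduced when lowered through tinygrad's to_movement_ops):
```
import numpy as np
import torch

x = torch.arange(12.).reshape(3, 4)
# torch.linalg.diagonal takes the diagonal over the last two dims
assert np.allclose(torch.linalg.diagonal(x).numpy(),
                   np.diagonal(x.numpy(), axis1=-2, axis2=-1))
```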
nimlgen
26ddf8d714 amd: rename dev_iface -> iface to match nv (#10959) 2025-06-24 20:22:19 +03:00
chenyu
bfa87f3490 clean up binary_crossentropy_logits (#10958) 2025-06-24 12:23:40 -04:00
qazal
2ccddfc0ca viz: match canvas fontsize (#10957)
it's 10px https://developer.mozilla.org/en-US/docs/Web/API/CanvasRenderingContext2D/font.
2025-06-24 19:07:06 +03:00
qazal
de4b9bf53b add opts_to_apply option to AST KernelInfo (#10950)
* proposal: add option to override opts in the get_program API

* update test_linearizer_rewrite

* state in uops

* update process_replay and names

* empty isn't none

* fix process replay
2025-06-24 18:55:39 +03:00
chenyu
18e264a449 Tensor.logsigmoid (#10955) 2025-06-24 11:16:14 -04:00
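logsigmoid(x) is log(sigmoid(x)) = -softplus(-x); a numerically stable numpy sketch of the identity (not necessarily how Tensor.logsigmoid is implemented):
```
import numpy as np

def logsigmoid(x: np.ndarray) -> np.ndarray:
  # min(x,0) - log1p(exp(-|x|)) avoids overflow in exp for large |x|
  return np.minimum(x, 0) - np.log1p(np.exp(-np.abs(x)))

x = np.array([-2.0, 0.0, 2.0])
assert np.allclose(logsigmoid(x), np.log(1 / (1 + np.exp(-x))))
```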
Ignacio Sica
f15247d2d2 remove outdated index masking in lowerer [pr] (#10953)
* add assert to check idx is never replaced with const 0

* remove outdated index masking
2025-06-24 07:53:30 -07:00
b1tg
cc32394b32 support copyin/copyout/is_allocated for subbuffers (#10869)
* support copyin/copyout/is_allocated for subbuffers

* simple

* clean up

* rm underlying_buf
* add function is_initialized
* add tests

* better test_subbuffer_copy_in_out

* fix allocator

---------

Co-authored-by: b1tg <b1tg@users.noreply.github.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-06-24 07:49:04 -07:00
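A subbuffer is a view at a byte offset into its base allocation, so copyin/copyout on the view must hit the right range of the base. A stdlib sketch of the bookkeeping (memoryview standing in for the allocator's view, not tinygrad's API):
```
base = bytearray(16)
sub = memoryview(base)[4:12]              # 8-byte "subbuffer" at offset 4

sub[:] = b"\x01" * 8                      # copyin to the subbuffer
assert bytes(base[4:12]) == b"\x01" * 8   # lands at the base offset
assert bytes(sub) == b"\x01" * 8          # copyout reads the same range
```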
chenyu
35504c938e torch.clip(x,y) -> x.clip(y) in test_ops (#10954)
* torch.clip(x,y) -> x.clip(y) in test_ops

* test_binary_crossentropy_logits_pos_weights
2025-06-24 10:22:19 -04:00
Fang-Pen Lin
86d458533f Add pos_weight for binary_crossentropy_logits (#10855)
* Add pos_weight for binary_crossentropy_logits

* Remove debug code

* Code style

* Code style

* Rename
2025-06-24 09:42:37 -04:00
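pos_weight scales the positive term, matching torch's BCEWithLogitsLoss. A numpy sketch of the formula (an illustration, not tinygrad's implementation):
```
import numpy as np

def bce_logits(x, y, pos_weight=1.0):
  # loss = -(pos_weight*y*log(sigmoid(x)) + (1-y)*log(1-sigmoid(x)))
  log_sig = np.minimum(x, 0) - np.log1p(np.exp(-np.abs(x)))  # log(sigmoid(x))
  return -(pos_weight * y * log_sig + (1 - y) * (log_sig - x)).mean()

x, y = np.array([0.5, -1.0]), np.array([1.0, 0.0])
print(bce_logits(x, y, pos_weight=2.0))
```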
Sieds Lykles
61dad3740f fix min_max and add test (#10952) 2025-06-24 09:33:26 -04:00
qazal
ab8c5d04ab viz: convert to function_name in server [pr] (#10951)
* viz: convert to function_name in server [pr]

* it exists
2025-06-24 13:59:37 +03:00
nimlgen
c0d9cf09e0 system: flock (#10949)
* system: flock

* imports

* xx
2025-06-24 11:33:49 +03:00
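flock gives an advisory, kernel-mediated lock on a file, letting separate processes serialize access to shared hardware state. A stdlib sketch of the pattern (hypothetical lock path, not the System helper itself):
```
import fcntl, os

fd = os.open("/tmp/example.lock", os.O_CREAT | os.O_RDWR)
try:
  fcntl.flock(fd, fcntl.LOCK_EX)   # blocks until no one else holds the lock
  ...                              # critical section, e.g. device bring-up
finally:
  fcntl.flock(fd, fcntl.LOCK_UN)
  os.close(fd)
```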
nimlgen
5202970feb system: move memory_barrier to System (#10948)
* system: move memory_barrier to System

* fixed
2025-06-24 11:09:43 +03:00
qazal
f41c28a048 update test_tensor_uop_representation comments [pr] (#10946)
Update these comments to match current tinygrad.
2025-06-24 10:47:09 +03:00
qazal
7a5e4e0bf1 fix unittests process replay [pr] (#10947) 2025-06-24 10:30:23 +03:00
George Hotz
7d560dbd75 hotfix: corealize in the tiny mnist test 2025-06-23 17:41:16 -07:00
Alexey Zaytsev
230ad3a460 [bounty] Don't use numpy inside hlb_cifar10 training loop (#10777)
* Don't use numpy inside hlb_cifar10 training loop

* Lint it

* jit it

* Drop the last half-batch

* Use gather for random_crop and reuse perms

* Wrap train_cifar in FUSE_ARANGE context

* No need to pass FUSE_ARANGE=1 to hlb_cifar10.py

* Add cutmix to jittable augmentations

* Remove .contiguous() from fetch_batches

* Fix indexing boundary

---------

Co-authored-by: Irwin1138 <irwin1139@gmail.com>
2025-06-23 17:24:56 -07:00
George Hotz
383010555f delete linearize and to_program from kernel.py (#10943) 2025-06-23 17:04:05 -07:00
George Hotz
0f89660ce4 Revert "change clang -march flag to -mcpu on arm (#10841)" (#10942)
This reverts commit 897e42fd1b.
2025-06-23 16:48:28 -07:00
Ignacio Sica
956a8391a5 minor cleanup on test_tensor_core_opts tests (#10924)
* minor cleanup on test_tensor_core_opts tests

Tests now notify when skipped
Before, they silently skipped if the backend didn't have half precision and
accumulation
Also cleaned up atol and rtol setup

* refactor test_tensor_core_opts_group

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-06-23 16:30:21 -07:00
ttomsa
897e42fd1b change clang -march flag to -mcpu on arm (#10841)
* change clang -march flag to -mcpu with fp16 disassembly test

* fix

* add capstone to macos dependencies

* just check no cast in test

* rm import

* woops

* lets check

* move check

* llvm init before cpu check

* try this

* bump autogen llvm version

* also update libclang?

* revert

* add comment

* skip llvm test and add comment

* linter
2025-06-23 16:28:48 -07:00
Sieds Lykles
772cd02ad2 Perform index validation on load/store, not on the index (#10849)
* move index validation to load/stores

* add name

* add linearizer_failure

* add validate_store with implicit gates

* linearizer_failure_58 is fixed!

* add test_uop_graph test

* rename cond to gate

* test gated load/stores

* use or_casted()
2025-06-23 16:25:05 -07:00
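The idea: instead of masking an invalid index to a safe constant, the load/store itself carries a gate that decides whether the address is dereferenced at all. A toy Python model of the semantics (not the UOp API):
```
def gated_load(buf: list, idx: int, gate: bool, alt=0.0):
  # when gate is False, the (possibly out-of-bounds) index is never touched
  return buf[idx] if gate else alt

buf = [1.0, 2.0, 3.0]
assert gated_load(buf, 1, gate=True) == 2.0
assert gated_load(buf, 99, gate=False) == 0.0
```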
George Hotz
ae4d2d71b4 bump line count to 14500 2025-06-23 15:32:27 -07:00
Harsh Natuskar
79d7cdd9ba Fix device (#10929)
* fix: pkg

* better

* added test

* less lines
2025-06-23 15:30:19 -07:00
George Hotz
e15754db28 remove (some) kernelize from llama and test schedule speed (#10939)
* remove kernelize from llama

* 405B

* space
2025-06-23 15:07:31 -07:00
chenyu
3699d1d3ba hotfix llama3 temperature is float (#10938) 2025-06-23 15:20:56 -04:00
uuuvn
4e2c9e36c7 Remote multihost (p2p transfer) (#10601) 2025-06-23 11:47:29 -07:00
chenyu
42b1c9625b skip test TestKiTS19Dataset::test_training_set (#10936)
flaky
2025-06-23 14:27:24 -04:00
patrini32
9e9fd44987 refactor test/external/external_llama_eval.py (#10567)
Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2025-06-23 10:43:20 -07:00
chenyu
785b4ea8ac optim flatten().shape[0] is numel (#10935) 2025-06-23 13:11:19 -04:00
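i.e. flattening a parameter tensor to 1-D yields exactly numel() elements, so the two spellings are interchangeable:
```
from tinygrad import Tensor

t = Tensor.ones(3, 4, 5)
assert t.flatten().shape[0] == t.numel() == 60
```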
qazal
ac39f27ae6 viz: non blocking UOp tracing (#10913)
* viz: non blocking UOp tracing

* u.arg

* no if Ops.KENREL

* drop replace

* switch to weakref.WeakKeyDictionary

* back

* remove ram usage skips, viz works here

* cache on reconstruct
2025-06-23 19:59:28 +03:00
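Why weakref.WeakKeyDictionary fits here: trace data is kept per object without pinning the object in memory, so viz bookkeeping can't cause the RAM growth the removed skips were guarding against. A stdlib sketch:
```
import weakref

class Node: pass                 # stand-in for a traced object

cache = weakref.WeakKeyDictionary()
n = Node()
cache[n] = "trace data"
assert cache[n] == "trace data"
del n                            # key dies -> entry dropped (CPython refcounting)
assert len(cache) == 0
```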
Ignacio Sica
b8d09a1dae tc with group/grouptop (#10903) 2025-06-23 09:58:41 -07:00
qazal
9944c2c02d viz: show time taken on hover (#10934) 2025-06-23 19:00:40 +03:00
George Hotz
1e99a7f1c9 hotfix: don't viz the indexing rewrites 2025-06-23 08:20:26 -07:00
chenyu
f9b59924f1 OPTIM_DTYPE to specify dtype for optim params (#10925)
one more flag
2025-06-23 10:32:03 -04:00
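A sketch of how such a flag could be read (hypothetical wiring; the real hook lives in the optimizer):
```
from tinygrad import Tensor, dtypes
from tinygrad.helpers import getenv

# e.g. OPTIM_DTYPE=half would make optimizer state tensors half precision
optim_dtype = getattr(dtypes, getenv("OPTIM_DTYPE", "float32"))
state = Tensor.zeros(4, dtype=optim_dtype)
```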
qazal
7820aeca8e update codegen process replay to use get_program [pr] (#10921)
* update codegen process replay to get_program [pr]

* precommit

* try str replace

* +to_function_name

* fixup tc

* local2.sh

* fix openpilot NOLOCALS

* new local.sh

* correct merge

* beam cache

* back

* revert beam thing

* adding opts_override and name_override makes output of get_program
reproducible

* min diff
2025-06-23 17:31:41 +03:00
nimlgen
eceb7a00d2 nv: rename iface mem functions (#10931) 2025-06-23 16:34:51 +03:00
qazal
4e864bd304 fix: getenv("NOLOCALS")/NOLOCALS context var (#10927)
OptOps shouldn't rely on os.environ.
2025-06-23 11:23:59 +03:00
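tinygrad's ContextVar/Context pair gives a scoped override instead of mutating os.environ; a sketch with a hypothetical flag (NOLOCALS itself is defined the same way):
```
from tinygrad.helpers import Context, ContextVar

MYFLAG = ContextVar("MYFLAG", 0)       # hypothetical flag; seeded from the env once

def run(): return "no locals" if MYFLAG.value else "locals"

with Context(MYFLAG=1):                # scoped override, no os.environ mutation
  assert run() == "no locals"
assert run() == "locals"
```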
alpharush
22f9696522 Fix/hcqfuzz harness bug (#10923)
* update command so extra module is found

* fix empty range in randrange errors

* lint
2025-06-23 11:22:30 +03:00
qazal
f037f85532 s/getenv("TC")/USE_TC context var (#10922) 2025-06-23 00:39:45 +03:00
qazal
9201224e0b viz: remove Kernel check [pr] (#10920)
* viz: remove Kernel check [pr]

* TestVizIntegration

* test/unit allows opening of devices

* kernel -> Kernel
2025-06-22 20:47:54 +03:00
nimlgen
3ccdb2356b system: factor out PCIIfaceBase (#10917)
* system: factor out PCIIfaceBase

* linter

* typing
2025-06-22 20:03:14 +03:00