Commit Graph

10633 Commits

chenyu
0d8a0d7a96 update test_multi_const_folding_tensor to include pow (#11635)
pow folds now
2025-08-12 13:35:37 -04:00
Sieds Lykles
4d6e407eb0 Extend fast_idiv to negative ints (#11632)
* fast idiv for signed ints

* Add rule and test

* fix tests

* redo fuzz_fast_idiv to do negative ints as well

* adjust comments

* remove unused imports
2025-08-12 19:34:49 +02:00
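The fast_idiv change above concerns replacing division by a compile-time constant with a multiply-and-shift, extended here to negative dividends. A minimal standalone sketch of the underlying technique, with a sign fixup for floor semantics — the function names and the exact rule are illustrative assumptions, not the PR's code:

```python
def magic(d: int, bits: int = 32) -> tuple[int, int]:
    # pick shift s with 2**(s - bits) >= d, then m = ceil(2**s / d);
    # this makes (x * m) >> s exact for all 0 <= x < 2**bits
    s = bits + (d - 1).bit_length()
    m = ((1 << s) + d - 1) // d
    return m, s

def fast_idiv(x: int, d: int, bits: int = 32) -> int:
    # floor-divide x by the positive constant d without a divide,
    # valid for |x| <= 2**bits
    m, s = magic(d, bits)
    if x >= 0:
        return (x * m) >> s
    q = ((-x) * m) >> s            # q == (-x) // d
    return -q - (q * d != -x)      # round toward -inf when d doesn't divide x
```

On real hardware the multiply has to be widened to 2×bits, and a rewrite rule would additionally need to prove the dividend's range fits.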
qazal
17adbe86d8 hotfix: do not default to capturing args in track_rewrites (#11634) 2025-08-12 20:01:24 +03:00
geohotstan
ad9dec25b3 combine onnx parser and onnx (#11485)
* start

* more

* fix onnx_runner test

* pass

* patch for disk and add domains from huggingface

* simpler docs

* revert domain changes

* rerun ci

* revert onnx ops test change

* add fix from strenum stuff

* correct way

* revert correct way to leave the fix for another PR

* test segfault

* Revert "test segfault"

This reverts commit 4e1aaf41e7.

* remove some unnecessary documentation

* test segfault again

* Revert "test segfault again"

This reverts commit 56fc5f03e7.

* try gemini suggested patch for sys._getframe

* keep trying with gemini

* revert not working gemini suggestions and try faulthandler

* remove pythonfaulthandler

* trigger CI a few times

* minimize diff

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-08-12 12:56:39 -04:00
Sieds Lykles
4c3982c44e Take sign out of mod (#11631)
* Add rule and test

* fix tests
2025-08-12 18:44:36 +02:00
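"Take sign out of mod" refers to an identity of truncated (C-style) remainder, where the result carries the sign of the dividend: rem(x, n) == sign(x) * rem(|x|, |n|), letting the symbolic engine reason about a non-negative modulo. A sketch of the algebra only, not the PR's rewrite rule — Python's `%` is floor-style, so the trunc-style remainder is defined explicitly:

```python
def trunc_rem(x: int, n: int) -> int:
    # C-style remainder: result has the sign of the dividend x
    r = x % n                       # Python's floor-style mod
    if r and (x < 0) != (n < 0):
        r -= n                      # convert floor-mod to trunc-mod
    return r

def sign(x: int) -> int:
    return (x > 0) - (x < 0)

# the identity: trunc_rem(x, n) == sign(x) * trunc_rem(abs(x), abs(n))
```

Note the identity does not hold for floor-style mod (e.g. `-7 % 3` is `2` in Python but `-1` under truncation), which is why the remainder flavor matters.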
qazal
e28605e324 rename profile point event fields [pr] (#11633) 2025-08-12 19:11:21 +03:00
nimlgen
8a7be0a747 metal: workaround for transfers sync issue (#11622)
* metal: workaround for transfers sync issue

* metal transfer sync is broken

* hm

* rm it?

* keep it
2025-08-12 16:16:34 +03:00
qazal
efe8b5611d move ProfilePointEvent out of device.py [pr] (#11630)
Generic profiling events exist in helpers so they can be imported from
everywhere in tinygrad.
2025-08-12 09:58:32 +03:00
chenyu
0d7075f2de assign should broadcast input tensor (#11629)
fixed test_assign_broadcast
2025-08-11 23:36:35 -04:00
Joshua Kissoon
c44760c89d torch backend: fix arange, add linalg.cross, add tests (#11628) 2025-08-11 23:34:41 -04:00
George Hotz
ca41b5e38b skip_0 in graph rewrite [pr] (#11627)
* skip_0 in graph rewrite [pr]

* no track_rewrites on test

* use dict instead of set
2025-08-11 18:29:04 -07:00
Sardor
ca7a641442 fix bugs at examples/yolov3.py (#11614)
* Update load_weight. Give valid model url

* Fix bug in iou function
2025-08-11 21:14:47 -04:00
chenyu
0c97d6de1b don't round pow output for int pow int (#11625)
also added atol=0 and big pows for the tests
2025-08-11 20:57:47 -04:00
chenyu
d623f6d850 support int Tensor pow to const non-negative int (#11624)
matches torch
2025-08-11 19:50:19 -04:00
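Raising an integer base to a constant non-negative integer exponent can be lowered to a chain of multiplies instead of the float pow path, which keeps the result exact with no rounding. A plain-Python sketch of square-and-multiply — illustrative only, not the lowering tinygrad actually emits:

```python
def int_pow(x: int, e: int) -> int:
    # exponentiation by squaring: O(log e) multiplies, exact for ints
    assert e >= 0, "negative exponents need a reciprocal path"
    result = 1
    while e:
        if e & 1:
            result *= x
        x *= x
        e >>= 1
    return result
```

Since the exponent is a compile-time constant here, the loop can be fully unrolled into a fixed multiply chain at rewrite time.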
chenyu
857a830dcc fix test_arange_float_step (#11623) 2025-08-11 16:58:42 -04:00
chenyu
0806677b51 rewrite sort idx (#11613) 2025-08-11 16:20:56 -04:00
George Hotz
700c11597b switch contextvars.ContextVar to _ContextVar (#11621) 2025-08-11 12:20:09 -07:00
ttomsa
ae0c3cfff6 change clang -march flag to -mcpu on arm (#10970)
Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2025-08-11 13:38:48 -04:00
geohotstan
27bcb9fd1c Support cubic mode for ONNX Resize OP (#11612)
* start

* add reference

* this is so much slower

* this makes sense but differs from the official impl; results are still correct..?

* add a comment

* Just keep it simple for now since I don't fully get it yet

* address comments

* correct

* teeny clean up

* another small comment improvement lol
2025-08-11 11:49:30 -04:00
nimlgen
d2bb1bcb97 cloud: a bit better err handling (#11616)
* cloud: err propagation to client

* fix

* print exc

* linter

* excs

* fix

* hm

* flaky
2025-08-11 15:51:22 +03:00
qazal
6a232ccdac viz: add tiny range drawing helper (#11620)
* viz: add tiny range drawing helper

* less
2025-08-11 15:15:43 +03:00
qazal
e768773e13 viz: use colors helper (#11618) 2025-08-11 13:10:15 +03:00
qazal
7d6c0a8cc7 viz: refactor progress msg (#11617) 2025-08-11 13:01:36 +03:00
chenyu
630edcffd8 remove .float calls in olmoe (#11610)
still matches torch
2025-08-10 20:33:22 -04:00
chenyu
a67e0917c3 list indexing can normalize in python (#11609)
* list indexing can normalize in python

list index does not need to be normalized in tensor

* update those
2025-08-10 20:02:38 -04:00
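Normalizing a negative Python list index on the host, before any tensor ops are built, means the graph never has to carry an `index < 0 ? index + size : index` select. A hypothetical sketch of that kind of host-side normalization — the helper name is made up:

```python
def normalize_index(i: int, size: int) -> int:
    # resolve a (possibly negative) Python int index eagerly,
    # so downstream tensor code only ever sees 0 <= i < size
    if not -size <= i < size:
        raise IndexError(f"index {i} out of range for size {size}")
    return i + size if i < 0 else i
```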
chenyu
1181ec0cd2 few more tensor indexing test cases (#11608) 2025-08-10 18:56:42 -04:00
George Hotz
996c907c0b rewrite not ready + children machinery (#11607)
* rewrite not ready + children machinery

* it doesn't like track rewrites
2025-08-10 15:28:30 -07:00
Sieds Lykles
1875bc69f9 Late rewrite rules for CMPLT (#11591)
* add rules

* more rules

* fix comment spelling

* remove two rules
2025-08-10 22:18:13 +02:00
nimlgen
5403a4aeaf null dev: support offset on buffers (#11606)
* null dev: support offset on buffers

* nolimit
2025-08-10 21:58:37 +03:00
geohotstan
b0dab6a4cd onnx Resize OP clean up (#11603)
* start

* slight clean up
2025-08-10 14:10:39 -04:00
Sieds Lykles
10540414cd Add Ops.CMPEQ (#10431)
* Add op

* add to GroupOp.ALU

* fix spec

* fix ptx

* temporary pickle by name to see process replay

* add Ops.EQ to binary ops

* Actually rename properly

* add test to assert CMPEQ is being used

* Ops.CMPEQ is automatically cast to bool

* add Ops.CMPEQ to llvm

* add Ops.CMPEQ to llvm
2025-08-10 13:13:16 +02:00
chenyu
f7aa1b85fe minor sort cleanups (#11602) 2025-08-10 01:51:23 -04:00
chenyu
dfb702ef33 fix sort for small dim (#11601)
* fix sort for small dim

* fixed test_sort_empty
2025-08-10 01:17:41 -04:00
chenyu
ef17af85c6 remove .float call in llama logit (#11598)
* remove .float call in llama logit

* bfloat item
2025-08-10 00:02:18 -04:00
chenyu
dd3d2eb36c add training llama3 test in ci (#11599) 2025-08-09 22:35:39 -04:00
chenyu
3e64467322 remove freqs_cis contiguous in llama (#11597) 2025-08-09 21:11:12 -04:00
chenyu
7338ffead0 small beautiful_mnist update (#11596)
gather is fast now. there's a conv/bw kernel that only gets fast with BEAM, but the whole thing now runs in < 5 seconds regardless
2025-08-09 19:51:14 -04:00
chenyu
45baec1aab model parallel llama (#11588)
MP=8 GRADIENT_ACC_STEPS=3 BS=1 DEFAULT_FLOAT=bfloat16 OPTIM_DTYPE=bfloat16 LLAMA3_SIZE=70B SEQLEN=512 PYTHONPATH=. MODEL=llama3 python3 examples/mlperf/model_train.py
2025-08-09 16:54:27 -04:00
nimlgen
09bc377da3 search: print runtime failures on debug (#11593) 2025-08-09 23:01:19 +03:00
nimlgen
14f99ff1a1 amd: doorbell_cpu_addr is not used (#11592)
* amd: doorbell_cpu_addr is not used

* hm
2025-08-09 20:03:21 +03:00
Sieds Lykles
01c770c77b Fix z3 float cast in indexing (#11590)
* adjust dtype of z3_renderer and add rule for cast

* dtypes.bool is also cast noop

* add regression test

* make embedding smaller

* even smaller test
2025-08-09 17:59:23 +02:00
Sieds Lykles
10d388499d Refactor optional.py (#11578)
* move fast_idiv to transcendental

* move optional.py

* adjust comment

* change import

* mypy needs this?
2025-08-09 17:35:05 +02:00
nimlgen
20e46a175c do not use disk with usb (#11119)
* not use disk with usb

* better name
2025-08-09 11:58:02 +03:00
qazal
53179953fc viz: factor out memory graph render (#11586) 2025-08-08 20:18:11 +03:00
qazal
8ce72d3fad simpler disassembly table spec (#11583)
* simpler disassembly table spec

* update ui

* move to scalar/vec render
2025-08-08 17:59:26 +03:00
qazal
44a222a9b2 viz: move resource usage summary to server (#11582) 2025-08-08 17:08:28 +03:00
qazal
793ace530e update amd_uop_matmul.py import (#11581)
Using this for testing SQTT
2025-08-08 17:07:35 +03:00
chenyu
b232c60def benchmark openpilot 0.9.9 (#11575)
* benchmark openpilot 0.9.9

not sure what to do with the 0.9.7 ones with IMAGE=2 and validate

* name
2025-08-08 01:26:14 -04:00
qazal
16f0edbe90 pass opts arg in get_program process replay [pr] (#11571)
* fix ptx process replay

* keyword arg

* renderer is also optional [pr]

* test_linearizer fixup

* name function order is args,ret,kwargs

* can use opts_to_apply

* pass through p.applied_opts

* sink_arg

* now it opens devices too
2025-08-08 03:05:09 +03:00
qazal
960cc6533a pass through name function args in track_rewrites (#11572) 2025-08-08 02:28:52 +03:00