chenyu
0d8a0d7a96
update test_multi_const_folding_tensor to include pow ( #11635 )
...
pow folds now
2025-08-12 13:35:37 -04:00
Sieds Lykles
4d6e407eb0
Extend fast_idiv to negative ints ( #11632 )
...
* fast idiv for signed ints
* Add rule and test
* fix tests
* redo fuzz_fast_idiv to do negative ints as well
* adjust comments
* remove unused imports
2025-08-12 19:34:49 +02:00
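For readers unfamiliar with the idea behind this commit, the following is a minimal pure-Python sketch of dividing by a constant with a multiply and a shift, including negative inputs. It is an illustration only, not the code from the PR: the names `magic`, `fast_idiv`, `c_div` and the `max_abs` bound are hypothetical, and the commit's actual rule may choose its constants differently.

```python
def magic(d: int, max_abs: int) -> tuple[int, int]:
    """Find (m, s) so that (x * m) >> s equals floor(x / d) for 0 <= x <= max_abs."""
    assert d > 0
    for s in range(128):
        m = (1 << s) // d + 1          # strictly above 2**s / d
        err = m * d - (1 << s)         # 0 < err <= d
        if max_abs * err < (1 << s):   # error too small to cross an integer boundary
            return m, s
    raise ValueError("no magic constant found")

def fast_idiv(x: int, d: int, max_abs: int = 2**31) -> int:
    """Truncating (C-style) x / d for a positive constant d, without a divide."""
    m, s = magic(d, max_abs)
    # >> floors for negative ints; because m overshoots 2**s / d, the floored
    # product is always one below the truncated quotient when x < 0.
    return ((x * m) >> s) + (1 if x < 0 else 0)

def c_div(x: int, d: int) -> int:
    """Reference truncating division, used only to check fast_idiv."""
    q = abs(x) // abs(d)
    return q if (x < 0) == (d < 0) else -q
```

The sketch assumes `abs(x) <= max_abs`; outside that range the multiply-and-shift result is not guaranteed to match.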
qazal
17adbe86d8
hotfix: do not default to capturing args in track_rewrites ( #11634 )
2025-08-12 20:01:24 +03:00
geohotstan
ad9dec25b3
combine onnx parser and onnx ( #11485 )
...
* start
* more
* fix onnx_runner test
* pass
* patch for disk and add domains from huggingface
* simpler docs
* revert domain changes
* rerun ci
* revert onnx ops test change
* add fix from strenum stuff
* correct way
* revert correct way to leave the fix for another PR
* test segfault
* Revert "test segfault"
This reverts commit 4e1aaf41e7.
* remove some unnecessary documentation
* test segfault again
* Revert "test segfault again"
This reverts commit 56fc5f03e7.
* try gemini suggested patch for sys._getframe
* keep trying with gemini
* revert not working gemini suggestions and try faulthandler
* remove pythonfaulthandler
* trigger CI a few times
* minimize diff
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-08-12 12:56:39 -04:00
Sieds Lykles
4c3982c44e
Take sign out of mod ( #11631 )
...
* Add rule and test
* fix tests
2025-08-12 18:44:36 +02:00
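The commit body does not say which rule was added. As background only, the sketch below checks two sign identities that hold for a truncating (C-style) remainder, which is the kind of fact a "take the sign out" rewrite could rely on. `c_mod` is a hypothetical helper written for this illustration (Python's own `%` floors instead of truncating), and nothing here is taken from the PR.

```python
def c_mod(x: int, c: int) -> int:
    """Truncating (C-style) remainder: the result takes the sign of x."""
    r = abs(x) % abs(c)
    return -r if x < 0 else r

def identities_hold(x: int, c: int) -> bool:
    """Check both sign identities for one (x, c) pair with c != 0."""
    sign_of_divisor_is_irrelevant = c_mod(x, -c) == c_mod(x, c)
    sign_of_dividend_factors_out = c_mod(-x, c) == -c_mod(x, c)
    return sign_of_divisor_is_irrelevant and sign_of_dividend_factors_out
```

Neither identity holds in general for a floored remainder, so the sketch only applies where remainder follows C semantics.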
qazal
e28605e324
rename profile point event fields [pr] ( #11633 )
2025-08-12 19:11:21 +03:00
nimlgen
8a7be0a747
metal: workaround for transfers sync issue ( #11622 )
...
* metal: workaround for transfers sync issue
* metal transfer sync is broken
* hm
* rm it?
* keep it
2025-08-12 16:16:34 +03:00
qazal
efe8b5611d
move ProfilePointEvent out of device.py [pr] ( #11630 )
...
Generic profiling events exist in helpers so they can be imported from
everywhere in tinygrad.
2025-08-12 09:58:32 +03:00
chenyu
0d7075f2de
assign should broadcast input tensor ( #11629 )
...
fixed test_assign_broadcast
2025-08-11 23:36:35 -04:00
Joshua Kissoon
c44760c89d
torch backend: fix arange, add linalg.cross, add tests ( #11628 )
2025-08-11 23:34:41 -04:00
George Hotz
ca41b5e38b
skip_0 in graph rewrite [pr] ( #11627 )
...
* skip_0 in graph rewrite [pr]
* no track_rewrites on test
* use dict instead of set
2025-08-11 18:29:04 -07:00
Sardor
ca7a641442
fix bugs at examples/yolov3.py ( #11614 )
...
* Update load_weight. Give valid model url
* Fix bug in iou function
2025-08-11 21:14:47 -04:00
chenyu
0c97d6de1b
don't round pow output for int pow int ( #11625 )
...
also added atol=0 and big pows for the tests
2025-08-11 20:57:47 -04:00
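To see why rounding a float result is a problem for large integer powers, here is a small pure-Python sketch. It assumes the issue is the usual one (a float64 cannot hold every integer above 2**53) and uses exponentiation by squaring as a stand-in for an all-integer path; `int_pow` is a hypothetical helper, not the function changed in the PR.

```python
def int_pow(base: int, exp: int) -> int:
    """Integer power by repeated squaring; exp must be a non-negative int."""
    if exp < 0:
        raise ValueError("exp must be non-negative")
    result = 1
    while exp:
        if exp & 1:        # this bit of the exponent contributes a factor
            result *= base
        base *= base       # square for the next bit
        exp >>= 1
    return result

exact = int_pow(3, 40)        # stays in integers, no rounding step
via_float = round(3.0 ** 40)  # goes through float64, then rounds
```

Since 3**40 is an odd number larger than 2**53, the float path cannot represent it exactly, so `via_float` differs from `exact`. A test tolerance of zero, as the commit mentions, would expose that kind of mismatch.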
chenyu
d623f6d850
support int Tensor pow to const non-negative int ( #11624 )
...
matches torch
2025-08-11 19:50:19 -04:00
chenyu
857a830dcc
fix test_arange_float_step ( #11623 )
2025-08-11 16:58:42 -04:00
chenyu
0806677b51
rewrite sort idx ( #11613 )
2025-08-11 16:20:56 -04:00
George Hotz
700c11597b
switch contextvars.ContextVar to _ContextVar ( #11621 )
2025-08-11 12:20:09 -07:00
ttomsa
ae0c3cfff6
change clang -march flag to -mcpu on arm ( #10970 )
...
Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2025-08-11 13:38:48 -04:00
geohotstan
27bcb9fd1c
Support cubic mode for ONNX Resize OP ( #11612 )
...
* start
* add reference
* this is so much slower
* this makes sense but differs from official impl, but results are still correct..?
* add a comment
* Just keep it simple for now since I don't fully get it yet
* address comments
* correct
* teeny clean up
* another small comment improvement lol
2025-08-11 11:49:30 -04:00
nimlgen
d2bb1bcb97
cloud: a bit better err handling ( #11616 )
...
* cloud: err propagation to client
* fix
* print exc
* linter
* excs
* fix
* hm
* flaky
2025-08-11 15:51:22 +03:00
qazal
6a232ccdac
viz: add tiny range drawing helper ( #11620 )
...
* viz: add tiny range drawing helper
* less
2025-08-11 15:15:43 +03:00
qazal
e768773e13
viz: use colors helper ( #11618 )
2025-08-11 13:10:15 +03:00
qazal
7d6c0a8cc7
viz: refactor progress msg ( #11617 )
2025-08-11 13:01:36 +03:00
chenyu
630edcffd8
remove .float calls in olmoe ( #11610 )
...
still matches torch
2025-08-10 20:33:22 -04:00
chenyu
a67e0917c3
list indexing can normalize in python ( #11609 )
...
* list indexing can normalize in python
list index does not need to be normalized in tensor
* update those
2025-08-10 20:02:38 -04:00
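A rough sketch of what normalizing a list index in Python can look like: negative entries are wrapped to their non-negative equivalents on plain ints, before any tensor is built from the list. `normalize_indices` and its bounds check are assumptions made for illustration and are not taken from the PR.

```python
def normalize_indices(idx: list[int], size: int) -> list[int]:
    """Map negative indices to their non-negative equivalents for a dim of length size."""
    out = []
    for i in idx:
        if not -size <= i < size:
            raise IndexError(f"index {i} is out of bounds for size {size}")
        out.append(i + size if i < 0 else i)
    return out
```

For example, `normalize_indices([0, -1, 2, -3], 4)` gives `[0, 3, 2, 1]`. Doing this on the Python side means the index values are already in range when the list is turned into a tensor.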
chenyu
1181ec0cd2
few more tensor indexing test cases ( #11608 )
2025-08-10 18:56:42 -04:00
George Hotz
996c907c0b
rewrite not ready + children machinery ( #11607 )
...
* rewrite not ready + children machinery
* it doesn't like track rewrites
2025-08-10 15:28:30 -07:00
Sieds Lykles
1875bc69f9
Late rewrite rules for CMPLT ( #11591 )
...
* add rules
* more rules
* fix comment spelling
* remove two rules
2025-08-10 22:18:13 +02:00
nimlgen
5403a4aeaf
null dev: support offset on buffers ( #11606 )
...
* null dev: support offset on buffers
* nolimit
2025-08-10 21:58:37 +03:00
geohotstan
b0dab6a4cd
onnx Resize OP clean up ( #11603 )
...
* start
* slight clean up
2025-08-10 14:10:39 -04:00
Sieds Lykles
10540414cd
Add Ops.CMPEQ ( #10431 )
...
* Add op
* add to GroupOp.ALU
* fix spec
* fix ptx
* temporary pickle by name to see process replay
* add Ops.EQ to binary ops
* Actually rename properly
* add test to assert CMPEQ is being used
* Ops.CMPEQ is automatic cast to bool
* add Ops.CMPEQ to llvm
2025-08-10 13:13:16 +02:00
chenyu
f7aa1b85fe
minor sort cleanups ( #11602 )
2025-08-10 01:51:23 -04:00
chenyu
dfb702ef33
fix sort for small dim ( #11601 )
...
* fix sort for small dim
* fixed test_sort_empty
2025-08-10 01:17:41 -04:00
chenyu
ef17af85c6
remove .float call in llama logit ( #11598 )
...
* remove .float call in llama logit
* bfloat item
2025-08-10 00:02:18 -04:00
chenyu
dd3d2eb36c
add training llama3 test in ci ( #11599 )
2025-08-09 22:35:39 -04:00
chenyu
3e64467322
remove freqs_cis contiguous in llama ( #11597 )
2025-08-09 21:11:12 -04:00
chenyu
7338ffead0
small beautiful_mnist update ( #11596 )
...
gather is fast now. there's a conv/bw kernel that only gets fast with BEAM, but whole thing runs < 5 seconds now regardless
2025-08-09 19:51:14 -04:00
chenyu
45baec1aab
model parallel llama ( #11588 )
...
MP=8 GRADIENT_ACC_STEPS=3 BS=1 DEFAULT_FLOAT=bfloat16 OPTIM_DTYPE=bfloat16 LLAMA3_SIZE=70B SEQLEN=512 PYTHONPATH=. MODEL=llama3 python3 examples/mlperf/model_train.py
2025-08-09 16:54:27 -04:00
nimlgen
09bc377da3
search: print runtime failures on debug ( #11593 )
2025-08-09 23:01:19 +03:00
nimlgen
14f99ff1a1
amd: doorbell_cpu_addr is not used ( #11592 )
...
* amd: doorbell_cpu_addr is not used
* hm
2025-08-09 20:03:21 +03:00
Sieds Lykles
01c770c77b
Fix z3 float cast in indexing ( #11590 )
...
* adjust dtype of z3_renderer and add rule for cast
* dtypes.bool is also cast noop
* add regression test
* make embedding smaller
* even smaller test
2025-08-09 17:59:23 +02:00
Sieds Lykles
10d388499d
Refactor optional.py ( #11578 )
...
* move fast_idiv to transcendental
* move optional.py
* adjust comment
* change import
* mypy needs this?
2025-08-09 17:35:05 +02:00
nimlgen
20e46a175c
do not use disk with usb ( #11119 )
...
* not use disk with usb
* better name
2025-08-09 11:58:02 +03:00
qazal
53179953fc
viz: factor out memory graph render ( #11586 )
2025-08-08 20:18:11 +03:00
qazal
8ce72d3fad
simpler disassembly table spec ( #11583 )
...
* simpler disassembly table spec
* update ui
* move to scalar/vec render
2025-08-08 17:59:26 +03:00
qazal
44a222a9b2
viz: move resource usage summary to server ( #11582 )
2025-08-08 17:08:28 +03:00
qazal
793ace530e
update amd_uop_matmul.py import ( #11581 )
...
Using this for testing SQTT
2025-08-08 17:07:35 +03:00
chenyu
b232c60def
benchmark openpilot 0.9.9 ( #11575 )
...
* benchmark openpilot 0.9.9
not sure what to do with the 0.9.7 ones with IMAGE=2 and validate
* name
2025-08-08 01:26:14 -04:00
qazal
16f0edbe90
pass opts arg in get_program process replay [pr] ( #11571 )
...
* fix ptx process replay
* keyword arg
* renderer is also optional [pr]
* test_linearizer fixup
* name function order is args,ret,kwargs
* can use opts_to_apply
* pass through p.applied_opts
* sink_arg
* now it opens devices too
2025-08-08 03:05:09 +03:00
qazal
960cc6533a
pass through name function args in track_rewrites ( #11572 )
2025-08-08 02:28:52 +03:00