George Hotz
7c5e115747
test_mismatch_reduce ( #11538 )
2025-08-06 10:02:14 -07:00
George Hotz
4fe11725c6
pass through sink arg, update linearizer test ( #11536 )
...
* pass through sink arg, update linearizer test
* get_program help
* bump line count
* use new api
2025-08-06 09:48:48 -07:00
George Hotz
bfebb5c37b
do store in the replace_buffers ( #11535 )
2025-08-06 08:42:45 -07:00
geohotstan
1163292759
move onnx_parser into onnx ( #11530 )
2025-08-06 10:46:27 -04:00
George Hotz
7b16fadd87
load view late + simpler rewrite ( #11525 )
...
* add the load view later
* simpler replace buffers
* rewrite name
2025-08-06 06:55:11 -07:00
nimlgen
930d8dae0c
hcq: lazy prof signal allocation ( #11531 )
2025-08-06 15:28:11 +03:00
nimlgen
eafc7fda12
upd perfetto ( #11528 )
2025-08-06 14:00:34 +03:00
nimlgen
1afb290027
ci: fix runner in nv ( #11527 )
2025-08-06 10:38:04 +03:00
qazal
61dae0685c
viz: show total mem in tooltip ( #11526 )
2025-08-06 06:51:26 +03:00
George Hotz
cf66df0ea6
put load early to make pointers match ( #11524 )
2025-08-05 20:04:32 -07:00
George Hotz
92175626e3
prereqs: move views to codegen ( #11522 )
2025-08-05 19:27:58 -07:00
chenyu
c9225d22ce
only disable flaky test_jit_multidev_xfer ( #11523 )
2025-08-05 22:17:25 -04:00
George Hotz
f58fd3143d
cleanup fix_kernel ( #11520 )
...
* cleanup fix_kernel
* early load buffer
* early meta ops
* move those to fix_kernel_ops
* fix tests
* remote metal was flaky
* Revert "fix tests"
This reverts commit a27019383d.
* that hack broke things
* fine for ptx
2025-08-05 18:38:43 -07:00
George Hotz
067daee5be
pin torch to 2.7.1 ( #11519 )
2025-08-05 15:58:57 -07:00
George Hotz
b39f43c46a
optimize in rewrite, try 2 ( #11518 )
...
* changes
* fix test uops
* optimize in rewrite, try 2
2025-08-05 15:52:53 -07:00
George Hotz
07b0df0d86
hotfix: test tensor dims start at 1
2025-08-05 15:40:24 -07:00
George Hotz
4dabdf7c6d
Revert "optimize in rewrite ( #11516 )" ( #11517 )
...
This reverts commit 3b777a9e05.
2025-08-05 15:39:07 -07:00
George Hotz
3b777a9e05
optimize in rewrite ( #11516 )
...
* changes
* fix test uops
* dim shouldn't be 0
* huh, why did that one not save
2025-08-05 15:33:26 -07:00
nimlgen
ec676eddfa
nv: move base address higher ( #11514 )
2025-08-05 22:42:53 +03:00
qazal
7703f8b805
viz: skip flops info if estimates is symbolic ( #11513 )
2025-08-05 22:12:52 +03:00
nimlgen
fc4e713d1c
jit graph split tests ( #11507 )
...
* jit graph split tests
* fix
* one more test
* more tests
* fix
* xm
* remote
2025-08-05 21:32:37 +03:00
George Hotz
c57fde51f9
move swizzler to opt ( #11509 )
2025-08-05 11:31:30 -07:00
chenyu
ace8e9a706
fix test_conv2d_winograd ( #11511 )
2025-08-05 12:15:46 -04:00
chenyu
223aaa0492
clean up more conv tests ( #11510 )
2025-08-05 12:15:30 -04:00
Garret Castro
76e62a1c23
extract conv layer test logic ( #11488 )
...
* refactor: extract conv layer test logic
* tuple is unnecessary
* integrate _test_conv logic into all conv tests
* fix linter, forgot dilation
* undo winograd extraction
adds too many if statements for a single case
2025-08-05 11:15:54 -04:00
b1tg
8b8bd6c534
make einsum generate same kernels ( #11508 )
...
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-08-05 11:12:52 -04:00
uuuvn
011ef8fa9d
Fix incorrect jit current batch devs reset ( #11505 )
...
`current_batch_devs = []` (in `flush_batch()`) happens between
`new_batched_devs = ...` and `current_batch_devs = new_batched_devs`, so it
doesn't actually reset anything. As a result things don't jit properly,
which doubles remote bert step time (and should have similar effects on any
non-hcq backend)
2025-08-05 08:16:16 +03:00
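The ordering problem described in #11505 can be illustrated with a toy model. This is a minimal sketch, not the tinygrad JIT code: the `run` function, the `(device, must_flush)` job tuples, and the `buggy` flag are all hypothetical names made up for illustration, and the "fixed" branch is only one plausible way to make the reset take effect.

```python
def run(jobs, buggy):
    """Toy batching loop. `jobs` is a list of (device, must_flush) pairs (hypothetical)."""
    current_batch_devs = []
    history = []

    def flush_batch():
        # Mirrors the reset mentioned in the commit message.
        nonlocal current_batch_devs
        current_batch_devs = []

    for dev, must_flush in jobs:
        # Computed from the *old* batch state, before any flush.
        new_batched_devs = current_batch_devs + [dev]
        if must_flush:
            flush_batch()
            if not buggy:
                # Assumed fix: recompute after the reset so it is not overwritten.
                new_batched_devs = [dev]
        # In the buggy ordering this assignment silently undoes the reset.
        current_batch_devs = new_batched_devs
        history.append(list(current_batch_devs))
    return history


jobs = [("A", False), ("B", False), ("C", True)]
buggy_history = run(jobs, buggy=True)
fixed_history = run(jobs, buggy=False)
print(buggy_history[-1], fixed_history[-1])
```

In the buggy ordering the batch still carries the devices from before the flush, which is the "doesn't actually reset anything" behaviour the message describes.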
chenyu
f02720ca2d
fix fuse gate_contiguous unique ( #11504 )
2025-08-04 23:43:31 -04:00
George Hotz
7f6acfb0d5
give define global and friends a shape ( #11502 )
...
* give define global and friends a shape
* ignore negative size
* ptx fix
2025-08-04 19:09:39 -07:00
chenyu
83385e7abc
update gradient src in ramp.py ( #11499 )
...
that's simplified now
2025-08-04 18:58:03 -04:00
qazal
846a2826ab
viz: remove TracingKey.fmt ( #11482 )
...
* viz: remove TracingKey.fmt
* remove from test too
2025-08-05 00:00:03 +03:00
chenyu
01d44e8f16
tiny reduce_gradient cleanup [pr] ( #11498 )
2025-08-04 16:12:53 -04:00
chenyu
8a11af01ed
remove broken paperswithcode links in doc ( #11497 )
2025-08-04 13:12:33 -04:00
leopf
4f0ee4e982
BPE tokenizer ( #11415 )
...
* BPE works
* refactor tok
* oops
* basic tests
* fix eval
* smaller diff
* fix error
* proper vocab decoding
* use regex for splitting
* escape ucatrange
* full compat
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-08-04 09:52:38 -07:00
b1tg
06af9f9236
fix double exception + add name,loc in error msg ( #11487 )
...
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-08-04 13:41:23 +03:00
nimlgen
4877aa965a
ast seems to probe nv as well ( #11494 )
2025-08-04 11:47:07 +03:00
chenyu
e0106b6b25
1/(x*c) -> (1/c)*(1/x) ( #11491 )
...
example: 2*(2*a).reciprocal() -> a.reciprocal()
# TODO: bounds for reciprocal
# TODO: should z3 work?
2025-08-03 23:35:46 -04:00
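The identity behind #11491 can be checked with exact rational arithmetic. This is a sketch using Python's `fractions` module rather than tinygrad; `recip_before` and `recip_after` are hypothetical helper names. The identity is exact over the rationals, while floating-point results may differ in the last bits.

```python
from fractions import Fraction


def recip_before(x, c):
    # 1/(x*c): the form before the rewrite
    return 1 / (x * c)


def recip_after(x, c):
    # (1/c)*(1/x): the form after the rewrite, with the constant factored out
    return (1 / c) * (1 / x)


a = Fraction(3, 7)
c = Fraction(2)

# The rewrite itself: both forms agree exactly.
same = recip_before(a, c) == recip_after(a, c)

# The example from the commit message: 2*(2*a).reciprocal() -> a.reciprocal()
lhs = 2 * recip_after(a, c)  # 2 * (1/2) * (1/a)
rhs = 1 / a
print(same, lhs, rhs)
```

Once the constant is pulled out of the reciprocal, the outer `2` and the inner `1/2` can fold to `1`, leaving only `1/a`.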
qazal
5870352fe1
viz: factorize llvm-mca call ( #11490 )
2025-08-04 00:31:23 +03:00
chenyu
dbc7807c61
enable WEBGPU tests with buffer limit ( #11489 )
...
TestSample still fails?
2025-08-03 13:02:44 -07:00
nimlgen
8f374ee1f7
nv: print devfmr in gsp logs ( #11484 )
2025-08-03 15:12:53 +03:00
chenyu
823f1a01db
move cast around expand backward to tensor.py ( #11483 )
2025-08-02 23:03:54 -04:00
chenyu
0ce0f51010
generic double cast folding ( #11481 )
...
b.cast(a).cast(b) -> b if a preserves all values in b
2025-08-02 19:26:37 -04:00
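The condition in #11481 ("a preserves all values in b") can be sketched for integer types as a range check. This is an illustrative model, not tinygrad's dtype code: the `INT_RANGES` table, `cast`, `preserves_all_values`, and `fold_double_cast` are hypothetical, and `cast` assumes two's-complement wraparound on overflow.

```python
# Hypothetical (min, max) table for a few integer dtypes.
INT_RANGES = {"int8": (-128, 127), "uint8": (0, 255), "int16": (-32768, 32767)}


def cast(v, dtype):
    # Wraparound cast into dtype's range (assumed semantics).
    lo, hi = INT_RANGES[dtype]
    return (v - lo) % (hi - lo + 1) + lo


def preserves_all_values(a, b):
    # True if every value of dtype b is representable in dtype a.
    alo, ahi = INT_RANGES[a]
    blo, bhi = INT_RANGES[b]
    return alo <= blo and bhi <= ahi


def fold_double_cast(v, a, b):
    # v is assumed to already be a valid value of dtype b.
    if preserves_all_values(a, b):
        return v  # b.cast(a).cast(b) -> b
    return cast(cast(v, a), b)


# int8 -> int16 -> int8 is the identity, so the casts can be folded away.
roundtrip_ok = all(cast(cast(v, "int16"), "int8") == v for v in range(-128, 128))

# int16 -> int8 -> int16 loses information, so it must not be folded.
lossy = cast(cast(300, "int8"), "int16")
print(roundtrip_ok, lossy, fold_double_cast(5, "int16", "int8"))
```

The range check is a sufficient condition in this model; it says nothing about how the actual rule treats floats or other dtypes.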
qazal
72e0d1d0dc
viz: profile the compiler in TINY device ( #11457 )
...
* viz: profile the compiler in TINY device
* cleanup
2025-08-03 02:03:20 +03:00
chenyu
66be747908
few more dtype cast convenience methods ( #11480 )
2025-08-02 15:47:09 -04:00
chenyu
e22e5da9a5
move some test_dtype tests to unit ( #11479 )
2025-08-02 15:25:00 -04:00
nimlgen
da0b955be4
hcq: cpu can be graphed ( #11474 )
...
* hcq: cpu can be graphed
* ops
* new jit decisions
* fix test
* fix remote
* cleaner
* fix
2025-08-02 21:01:19 +03:00
chenyu
f7965f85aa
Revert "feat: faster index building ( #11462 )" ( #11478 )
...
This reverts commit 3a4deb08d2.
2025-08-02 12:50:48 -04:00
kevvz
ef7e01cadf
Fix SVD shape bug + Fix batched SVD bug ( #11477 )
...
* failing test case
* fix
* better test
* space
2025-08-02 09:47:41 -07:00
b1tg
6ecaf8e7b2
refactor: use less index and simplify reduce axes check [pr] ( #11476 )
...
* use output_shape/full_shape
* simple final_reduces check
---------
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-08-02 09:44:51 -07:00
wozeparrot
3a4deb08d2
feat: faster index building ( #11462 )
...
* feat: faster index building
* feat: correct training samples
2025-08-02 11:50:18 -04:00