George Hotz
cf66df0ea6
put load early to make pointers match (#11524)
2025-08-05 20:04:32 -07:00
George Hotz
92175626e3
prereqs: move views to codegen (#11522)
2025-08-05 19:27:58 -07:00
chenyu
c9225d22ce
only disable flaky test_jit_multidev_xfer (#11523)
2025-08-05 22:17:25 -04:00
George Hotz
f58fd3143d
cleanup fix_kernel (#11520)
* cleanup fix_kernel
* early load buffer
* early meta ops
* move those to fix_kernel_ops
* fix tests
* remote metal was flaky
* Revert "fix tests"
This reverts commit a27019383d.
* that hack broke things
* fine for ptx
2025-08-05 18:38:43 -07:00
George Hotz
067daee5be
pin torch to 2.7.1 (#11519)
2025-08-05 15:58:57 -07:00
George Hotz
b39f43c46a
optimize in rewrite, try 2 (#11518)
* changes
* fix test uops
* optimize in rewrite, try 2
2025-08-05 15:52:53 -07:00
George Hotz
07b0df0d86
hotfix: test tensor dims start at 1
2025-08-05 15:40:24 -07:00
George Hotz
4dabdf7c6d
Revert "optimize in rewrite ( #11516 )" ( #11517 )
...
This reverts commit 3b777a9e05.
2025-08-05 15:39:07 -07:00
George Hotz
3b777a9e05
optimize in rewrite (#11516)
* changes
* fix test uops
* dim shouldn't be 0
* huh, why did that one not save
2025-08-05 15:33:26 -07:00
nimlgen
ec676eddfa
nv: move base address higher (#11514)
2025-08-05 22:42:53 +03:00
qazal
7703f8b805
viz: skip flops info if estimates is symbolic (#11513)
2025-08-05 22:12:52 +03:00
nimlgen
fc4e713d1c
jit graph split tests (#11507)
* jit graph split tests
* fix
* one more test
* more tests
* fix
* xm
* remote
2025-08-05 21:32:37 +03:00
George Hotz
c57fde51f9
move swizzler to opt (#11509)
2025-08-05 11:31:30 -07:00
chenyu
ace8e9a706
fix test_conv2d_winograd (#11511)
2025-08-05 12:15:46 -04:00
chenyu
223aaa0492
clean up more conv tests (#11510)
2025-08-05 12:15:30 -04:00
Garret Castro
76e62a1c23
extract conv layer test logic (#11488)
* refactor: extract conv layer test logic
* tuple is unnecessary
* integrate _test_conv logic into all conv tests
* fix linter, forgot dilation
* undo winograd extraction
adds too many if statements for a single case
2025-08-05 11:15:54 -04:00
b1tg
8b8bd6c534
make einsum generate same kernels (#11508)
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-08-05 11:12:52 -04:00
uuuvn
011ef8fa9d
Fix incorrect jit current batch devs reset (#11505)
`current_batch_devs = []` (in `flush_batch()`) happens between
`new_batched_devs = ...` and `current_batch_devs = new_batched_devs`, so
it doesn't actually reset anything. Things then fail to jit properly,
which roughly doubles (2x) remote bert step time (similar effects are
expected on any non-hcq backend).
2025-08-05 08:16:16 +03:00
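A minimal sketch of the ordering bug described above, with hypothetical names standing in for the actual jit batching code: the new device list is computed first, `flush_batch()` then clears the shared state, and the pre-computed value immediately clobbers that reset.

```python
# hypothetical reconstruction of the ordering bug (illustrative names, not the real jit code)
current_batch: list = []
current_batch_devs: list = []

def flush_batch():
  global current_batch, current_batch_devs
  # ... emit the batched graph exec item for current_batch here ...
  current_batch, current_batch_devs = [], []  # intended reset of the shared state

def add_item(item, dev, can_batch: bool):
  global current_batch_devs
  # computed from the *old* batch devs, before any flush happens
  new_batched_devs = current_batch_devs + ([dev] if dev not in current_batch_devs else [])
  if not can_batch:
    flush_batch()                        # clears current_batch_devs ...
  current_batch.append(item)
  current_batch_devs = new_batched_devs  # ... but this overwrites the reset with the stale
                                         # pre-flush list, so flushed devices leak into the
                                         # next batch; recomputing (or assigning) only after
                                         # the flush decision avoids the clobber
```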
chenyu
f02720ca2d
fix fuse gate_contiguous unique (#11504)
2025-08-04 23:43:31 -04:00
George Hotz
7f6acfb0d5
give define global and friends a shape (#11502)
* give define global and friends a shape
* ignore negative size
* ptx fix
2025-08-04 19:09:39 -07:00
chenyu
83385e7abc
update gradient src in ramp.py (#11499)
that's simplified now
2025-08-04 18:58:03 -04:00
qazal
846a2826ab
viz: remove TracingKey.fmt (#11482)
* viz: remove TracingKey.fmt
* remove from test too
2025-08-05 00:00:03 +03:00
chenyu
01d44e8f16
tiny reduce_gradient cleanup [pr] (#11498)
2025-08-04 16:12:53 -04:00
chenyu
8a11af01ed
remove broken paperswithcode links in doc (#11497)
2025-08-04 13:12:33 -04:00
leopf
4f0ee4e982
BPE tokenizer (#11415)
* BPE works
* refactor tok
* oops
* basic tests
* fix eval
* smaller diff
* fix error
* proper vocab decoding
* use regex for splitting
* escape ucatrange
* full compat
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-08-04 09:52:38 -07:00
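For context, BPE itself reduces to repeatedly applying the highest-priority learned merge to adjacent token pairs; a toy sketch of that loop (illustrative only, not the tokenizer code from this PR):

```python
# toy BPE encoder: repeatedly merge the adjacent pair with the best (lowest) merge rank
def bpe_encode(word: str, merges: dict) -> list:
  tokens = list(word)
  while len(tokens) > 1:
    ranked = [(merges.get(pair, float("inf")), i) for i, pair in enumerate(zip(tokens, tokens[1:]))]
    rank, i = min(ranked)
    if rank == float("inf"): break              # no learned merge applies anymore
    tokens[i:i+2] = [tokens[i] + tokens[i+1]]   # merge the winning pair into one token
  return tokens

# example: with merges {("l","o"): 0, ("lo","w"): 1}, "low" encodes to ["low"]
print(bpe_encode("low", {("l", "o"): 0, ("lo", "w"): 1}))
```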
b1tg
06af9f9236
fix double exception + add name, loc in error msg (#11487)
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-08-04 13:41:23 +03:00
nimlgen
4877aa965a
ast seems to probe nv as well (#11494)
2025-08-04 11:47:07 +03:00
chenyu
e0106b6b25
1/(x*c) -> (1/c)*(1/x) (#11491)
example: 2*(2*a).reciprocal() -> a.reciprocal()
# TODO: bounds for reciprocal
# TODO: should z3 work?
2025-08-03 23:35:46 -04:00
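The rewrite moves the constant out of the reciprocal so constant folding can cancel it against an outer multiply; a quick numerical check of the identity using the public `Tensor.reciprocal` API (assuming nothing about the rewrite internals):

```python
from tinygrad import Tensor

a = Tensor([3.0, 5.0])
# 1/(x*c) -> (1/c)*(1/x): with c=2, the outer 2* cancels the folded 1/2,
# so 2*(2*a).reciprocal() should simplify to a single a.reciprocal()
print((2 * (2 * a).reciprocal()).numpy())  # ~[0.3333, 0.2]
print(a.reciprocal().numpy())              # ~[0.3333, 0.2]
```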
qazal
5870352fe1
viz: factorize llvm-mca call (#11490)
2025-08-04 00:31:23 +03:00
chenyu
dbc7807c61
enable WEBGPU tests with buffer limit (#11489)
TestSample still fails?
2025-08-03 13:02:44 -07:00
nimlgen
8f374ee1f7
nv: print devfmr in gsp logs (#11484)
2025-08-03 15:12:53 +03:00
chenyu
823f1a01db
move cast around expand backward to tensor.py (#11483)
2025-08-02 23:03:54 -04:00
chenyu
0ce0f51010
generic double cast folding (#11481)
b.cast(a).cast(b) -> b if a preserves all values in b
2025-08-02 19:26:37 -04:00
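Here `b.cast(a).cast(b)` means a value of dtype `b` cast to `a` and back; the fold is only valid when `a` can represent every value of `b`. A small numpy illustration of the condition (unrelated to the actual rewrite code):

```python
import numpy as np

x = np.float32(3.5)
# safe to fold: float64 preserves every float32 value, so the round trip is exact
assert np.float32(np.float64(x)) == x
# not safe: int32 does not preserve float32 values (3.5 -> 3 -> 3.0)
print(np.float32(np.int32(x)))        # 3.0, not 3.5
# not safe the other way either: float32 cannot represent every int32
i = np.int32(2**24 + 1)
print(np.int32(np.float32(i)), i)     # 16777216 vs 16777217
```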
qazal
72e0d1d0dc
viz: profile the compiler in TINY device (#11457)
* viz: profile the compiler in TINY device
* cleanup
2025-08-03 02:03:20 +03:00
chenyu
66be747908
few more dtype cast convenience methods (#11480)
2025-08-02 15:47:09 -04:00
chenyu
e22e5da9a5
move some test_dtype tests to unit (#11479)
2025-08-02 15:25:00 -04:00
nimlgen
da0b955be4
hcq: cpu can be graphed (#11474)
* hcq: cpu can be graphed
* ops
* new jit decisions
* fix test
* fix remote
* cleaner
* fix
2025-08-02 21:01:19 +03:00
chenyu
f7965f85aa
Revert "feat: faster index building ( #11462 )" ( #11478 )
...
This reverts commit 3a4deb08d2.
2025-08-02 12:50:48 -04:00
kevvz
ef7e01cadf
Fix SVD shape bug + Fix batched SVD bug (#11477)
* failing test case
* fix
* better test
* space
2025-08-02 09:47:41 -07:00
b1tg
6ecaf8e7b2
refactor: use less index and simplify reduce axes check [pr] (#11476)
* use output_shape/full_shape
* simple final_reduces check
---------
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-08-02 09:44:51 -07:00
wozeparrot
3a4deb08d2
feat: faster index building (#11462)
* feat: faster index building
* feat: correct training samples
2025-08-02 11:50:18 -04:00
nimlgen
8cc2d64edb
amd: reuse create_queues for usb iface (#11473)
2025-08-02 14:40:46 +03:00
chenyu
9e8e6b45ab
grad acc train llama (#11467)
* grad acc train llama
* log step time
2025-08-01 15:54:50 -04:00
chenyu
7ad7329257
data parallel train llama (#11466)
2025-08-01 12:13:51 -04:00
nimlgen
9f2182f92f
cpu: start threading (#11324)
* cpu: threading
* syncs
* llvm
* fix
* opt
* fx
* fix
* missed sync
* one line less
* cleaner
* fix
2025-08-01 15:35:07 +03:00
qazal
c7ae1bd474
viz: more consistent border styling (#11464)
2025-08-01 09:31:06 +03:00
George Hotz
8ff03806e8
add llama layers (#11460)
* add llama layers
* add contig bw for speed
2025-07-31 16:28:04 -07:00
qazal
719827b95d
viz: add flops / mem bw to device programs (#11459)
* viz: add flops / mem bw to device programs
* better spacing style
2025-08-01 02:12:30 +03:00
chenyu
3f742a5a7c
comma space lab models benchmark (#11461)
2025-07-31 19:06:18 -04:00
George Hotz
474ee9daa5
hotfix: add contiguous_backward to llama
2025-07-31 15:07:12 -07:00