chenyu
83385e7abc
update gradient src in ramp.py (#11499)
that's simplified now
2025-08-04 18:58:03 -04:00
qazal
846a2826ab
viz: remove TracingKey.fmt (#11482)
* viz: remove TracingKey.fmt
* remove from test too
2025-08-05 00:00:03 +03:00
chenyu
01d44e8f16
tiny reduce_gradient cleanup [pr] (#11498)
2025-08-04 16:12:53 -04:00
chenyu
8a11af01ed
remove broken paperswithcode links in doc (#11497)
2025-08-04 13:12:33 -04:00
leopf
4f0ee4e982
BPE tokenizer (#11415)
* BPE works
* refactor tok
* oops
* basic tests
* fix eval
* smaller diff
* fix error
* proper vocab decoding
* use regex for splitting
* escape ucatrange
* full compat
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-08-04 09:52:38 -07:00
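As background on the technique this PR implements: byte-pair encoding repeatedly merges the highest-priority adjacent token pair. A minimal hedged sketch of that merge loop follows; `bpe_encode` and the toy `merges` table are hypothetical names for illustration, not the actual #11415 code (which, per the bullets above, also pre-splits input with a regex and handles vocab decoding).

```python
# minimal BPE sketch: repeatedly merge the adjacent pair with the best
# (lowest) rank in the merge table until no learned merge applies.
def bpe_encode(word: str, merges: dict) -> list:
  parts = list(word)
  while len(parts) > 1:
    ranked = [(merges.get(p, float("inf")), i) for i, p in enumerate(zip(parts, parts[1:]))]
    rank, i = min(ranked)
    if rank == float("inf"): break                  # no applicable merge left
    parts = parts[:i] + [parts[i] + parts[i+1]] + parts[i+2:]
  return parts

merges = {("l", "o"): 0, ("lo", "w"): 1}            # toy table: rank 0 merges first
print(bpe_encode("lower", merges))                  # ['low', 'e', 'r']
```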
b1tg
06af9f9236
fix double exception + add name, loc in error msg (#11487)
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-08-04 13:41:23 +03:00
nimlgen
4877aa965a
ast seems to probe nv as well (#11494)
2025-08-04 11:47:07 +03:00
chenyu
e0106b6b25
1/(x*c) -> (1/c)*(1/x) (#11491)
example: 2*(2*a).reciprocal() -> a.reciprocal()
# TODO: bounds for reciprocal
# TODO: should z3 work?
2025-08-03 23:35:46 -04:00
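For illustration, the rewrite is just the algebraic identity below; a plain-float check of it and of the commit's example, not the actual uop pattern:

```python
# 1/(x*c) -> (1/c)*(1/x) pulls the constant's reciprocal out, so surrounding
# constant folding can cancel it, as in 2*(2*a).reciprocal() -> a.reciprocal()
x, c = 3.0, 2.0
assert abs(1/(x*c) - (1/c)*(1/x)) < 1e-12
a = 5.0
assert abs(2 * (1/(2*a)) - 1/a) < 1e-12
```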
qazal
5870352fe1
viz: factorize llvm-mca call (#11490)
2025-08-04 00:31:23 +03:00
chenyu
dbc7807c61
enable WEBGPU tests with buffer limit (#11489)
TestSample still fails?
2025-08-03 13:02:44 -07:00
nimlgen
8f374ee1f7
nv: print devfmr in gsp logs (#11484)
2025-08-03 15:12:53 +03:00
chenyu
823f1a01db
move cast around expand backward to tensor.py (#11483)
2025-08-02 23:03:54 -04:00
chenyu
0ce0f51010
generic double cast folding (#11481)
b.cast(a).cast(b) -> b if a preserves all values in b
2025-08-02 19:26:37 -04:00
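A quick numpy illustration of that folding condition; a sketch of the semantics, not the actual pattern-matcher code:

```python
import numpy as np

# x.cast(a).cast(b) folds away when the intermediate dtype `a` can represent
# every value of the original dtype `b`.
x = np.arange(-128, 128, dtype=np.int8)
assert (x.astype(np.float32).astype(np.int8) == x).all()   # float32 holds all int8 values: foldable
y = np.float32(3.7)
assert y.astype(np.int8).astype(np.float32) != y           # int8 drops the fraction: not foldable
```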
qazal
72e0d1d0dc
viz: profile the compiler in TINY device (#11457)
* viz: profile the compiler in TINY device
* cleanup
2025-08-03 02:03:20 +03:00
chenyu
66be747908
few more dtype cast convenience methods (#11480)
2025-08-02 15:47:09 -04:00
chenyu
e22e5da9a5
move some test_dtype tests to unit (#11479)
2025-08-02 15:25:00 -04:00
nimlgen
da0b955be4
hcq: cpu can be graphed (#11474)
* hcq: cpu can be graphed
* ops
* new jit decisions
* fix test
* fix remote
* cleaner
* fix
2025-08-02 21:01:19 +03:00
chenyu
f7965f85aa
Revert "feat: faster index building ( #11462 )" ( #11478 )
...
This reverts commit 3a4deb08d2 .
2025-08-02 12:50:48 -04:00
kevvz
ef7e01cadf
Fix SVD shape bug + Fix batched SVD bug (#11477)
* failing test case
* fix
* better test
* space
2025-08-02 09:47:41 -07:00
b1tg
6ecaf8e7b2
refactor: use less index and simplify reduce axes check [pr] (#11476)
* use output_shape/full_shape
* simple final_reduces check
---------
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-08-02 09:44:51 -07:00
wozeparrot
3a4deb08d2
feat: faster index building (#11462)
* feat: faster index building
* feat: correct training samples
2025-08-02 11:50:18 -04:00
nimlgen
8cc2d64edb
amd: reuse create_queues for usb iface (#11473)
2025-08-02 14:40:46 +03:00
chenyu
9e8e6b45ab
grad acc train llama (#11467)
* grad acc train llama
* log step time
2025-08-01 15:54:50 -04:00
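For reference, the gradient-accumulation pattern this enables: run several micro-batches per optimizer step and average their gradients. A generic hedged sketch, not the actual examples/mlperf training loop:

```python
# accumulate scaled gradients over ACC_STEPS micro-batches, then apply
# one optimizer step on what is effectively the mean gradient.
ACC_STEPS = 4
accum = 0.0                                  # stand-in for a parameter's grad buffer
for step, grad in enumerate([0.2, -0.1, 0.4, 0.3], start=1):
  accum += grad / ACC_STEPS                  # pre-scale so the sum is the mean
  if step % ACC_STEPS == 0:
    print(f"optimizer step with mean grad {accum:.3f}")
    accum = 0.0                              # reset for the next window
```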
chenyu
7ad7329257
data parallel train llama (#11466)
2025-08-01 12:13:51 -04:00
nimlgen
9f2182f92f
cpu: start threading (#11324)
* cpu: threading
* syncs
* llvm
* fix
* opt
* fix
* fix
* missed sync
* one line less
* cleaner
* fix
2025-08-01 15:35:07 +03:00
qazal
c7ae1bd474
viz: more consistent border styling (#11464)
2025-08-01 09:31:06 +03:00
George Hotz
8ff03806e8
add llama layers (#11460)
* add llama layers
* add contig bw for speed
2025-07-31 16:28:04 -07:00
qazal
719827b95d
viz: add flops / mem bw to device programs (#11459)
* viz: add flops / mem bw to device programs
* better spacing style
2025-08-01 02:12:30 +03:00
chenyu
3f742a5a7c
comma space lab models benchmark (#11461)
2025-07-31 19:06:18 -04:00
George Hotz
474ee9daa5
hotfix: add contiguous_backward to llama
2025-07-31 15:07:12 -07:00
qazal
fa66d9772d
viz: show const node when it's root (#11456)
2025-08-01 01:01:58 +03:00
qazal
056dabda5a
viz: refactor to color scheme (#11455)
2025-08-01 00:17:50 +03:00
nimlgen
e5b6149dfb
more typing in drivers (#11454)
* more typing in drivers
* rm
2025-07-31 23:26:33 +03:00
qazal
bad3cf5731
viz: add LLVM machine code analysis (#11421)
* start
* works everywhere
* add viz api
* utilization table
* reg pressure ui
* use llvm-mca
* llvm-mca ui
* work
* cleanup
* cycle through, defaults are enough
* x86 pending
* x86 nops
* get mcpu/mtriple from autogen
* cleanup server diff
* move parser to python
* normalize to pct of max
* segments legend
* imports
* also monospace
* max comes from the total per instruction
* base on the value
2025-08-01 01:59:26 +08:00
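Factoring out the llvm-mca call noted above boils down to one helper invoking the binary on generated assembly. A hedged Python sketch of such a call; `-mcpu`/`-mtriple` are real llvm-mca options (the bullets say these come from autogen), while `run_mca` and its shape are assumptions, not viz's actual code:

```python
import subprocess

# run llvm-mca on an assembly file to get cycle and resource-pressure
# estimates for the target selected by mcpu/mtriple.
def run_mca(asm_path: str, mcpu: str, mtriple: str) -> str:
  out = subprocess.run(["llvm-mca", f"-mcpu={mcpu}", f"-mtriple={mtriple}", asm_path],
                       capture_output=True, text=True, check=True)
  return out.stdout
```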
chenyu
e847677e8a
use AxisType in search instead of colors (#11452)
2025-07-31 13:07:33 -04:00
nimlgen
75c2c42def
suppress exceptions only during finalization (#11451)
* suppress exceptions only during finalization
* fix
* fix typing
* fix more warns
* fix
* better?
* Revert "better?"
This reverts commit a068aa5793.
* mm?
* no as e
2025-07-31 13:57:12 +03:00
wozeparrot
24dd0d52ed
feat: test remove to cpu (#11444)
2025-07-30 20:18:56 -07:00
kevvz
c3cfcb50cb
Add linalg_det and test for torch backend (#11405)
* add linalg_det and test
* space
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-07-30 22:04:44 -04:00
Eitan Turok
cba3655de5
Add Test for Setitem (#10559)
* init
* update
* better
* failing test
* works
* Delete test file
* clean
* lint
* simplify variable name
* rm contiguous, rm int dtype, and add assertEqual
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-07-30 22:03:41 -04:00
wozeparrot
6252f7770e
feat: fake data (#11447)
2025-07-30 17:18:20 -07:00
chenyu
e300451f3a
update llama3 (#11446)
`LR=1e-4 TRAIN_ON_VAL=1 DEFAULT_FLOAT=bfloat16 FUSE_ARANGE=1 JITBEAM=2 OPTIM_DTYPE=bfloat16 LLAMA3_SIZE=1B WARMUP_STEPS=36 DECAY_STEPS=360 SEQLEN=512 PYTHONPATH=. AMD=1 AMD_LLVM=0 MODEL=llama3 python3 examples/mlperf/model_train.py` trained to 7
2025-07-30 19:34:21 -04:00
wozeparrot
5fb975351a
feat: flag for training on val (#11441)
2025-07-30 14:29:45 -07:00
chenyu
4ca430e5bf
fix search dedup (#11439)
it should check against the pre-real_axis axis in actions, not real_axis.
2025-07-30 17:24:16 -04:00
wozeparrot
d3da20eca6
feat: bump mlperf workflow timeout to 6 hours (#11440)
2025-07-30 14:12:12 -07:00
wozeparrot
825b6a2505
feat: llama3 dataloader (#11340)
2025-07-30 13:27:55 -07:00
qazal
af357b5dc8
disable TRACK_MATCH_STATS in BEAM workers [pr] (#11437)
2025-07-30 23:22:08 +03:00
George Hotz
7c2d2eff86
check tensor core dims (#11436)
* check elements_per_thread in tensorcore [pr]
* check tc dims
2025-07-30 13:06:59 -07:00
nimlgen
5fc5bb5237
ci: clear processes (#11434)
* unified hcq_smi for management
* fix
* fix
* no reset for amd
2025-07-30 22:15:18 +03:00
George Hotz
4f26a9ad32
check elements_per_thread in tensorcore [pr] (#11435)
2025-07-30 11:55:48 -07:00
nimlgen
4b4ba5454c
ci: move driver start higher (#11431)
2025-07-30 10:48:38 +03:00