Commit Graph

9652 Commits

Author | SHA1 | Message | Date
chenyu
7ad7329257 data parallel train llama (#11466) 2025-08-01 12:13:51 -04:00
nimlgen
9f2182f92f cpu: start threading (#11324)
* cpu: threading

* syncs

* llvm

* fix

* opt

* fx

* fix

* missed sync

* one line less

* cleaner

* fix
2025-08-01 15:35:07 +03:00
qazal
c7ae1bd474 viz: more consistent border styling (#11464) 2025-08-01 09:31:06 +03:00
George Hotz
8ff03806e8 add llama layers (#11460)
* add llama layers

* add contig bw for speed
2025-07-31 16:28:04 -07:00
qazal
719827b95d viz: add flops / mem bw to device programs (#11459)
* viz: add flops / mem bw to device programs

* better spacing style
2025-08-01 02:12:30 +03:00
chenyu
3f742a5a7c comma space lab models benchmark (#11461) 2025-07-31 19:06:18 -04:00
George Hotz
474ee9daa5 hotfix: add contiguous_backward to llama 2025-07-31 15:07:12 -07:00
qazal
fa66d9772d viz: show const node when it's root (#11456) 2025-08-01 01:01:58 +03:00
qazal
056dabda5a viz: refactor to color scheme (#11455) 2025-08-01 00:17:50 +03:00
nimlgen
e5b6149dfb more typing in drivers (#11454)
* more typing in drivers

* rm
2025-07-31 23:26:33 +03:00
qazal
bad3cf5731 viz: add LLVM machine code analysis (#11421)
* start

* works everywhere

* add viz api

* utilization table

* reg pressure ui

* use llvm-mca

* llvm-mca ui

* work

* cleanup

* cycle through, defaults are enough

* x86 pending

* x86 nops

* get mcpu/mtriple from autogen

* cleanup server diff

* move parser to python

* normalize to pct of max

* segments legend

* imports

* also monospace

* max comes from the total per instruction

* base on the value
2025-08-01 01:59:26 +08:00
chenyu
e847677e8a use AxisType in search instead of colors (#11452) 2025-07-31 13:07:33 -04:00
nimlgen
75c2c42def suppress exceptions only during finalization (#11451)
* suppress exceptions only during finalization

* fix

* fix typing

* fix more warns

* fix

* better?

* Revert "better?"

This reverts commit a068aa5793.

* mm?

* no as e
2025-07-31 13:57:12 +03:00
wozeparrot
24dd0d52ed feat: test remove to cpu (#11444) 2025-07-30 20:18:56 -07:00
kevvz
c3cfcb50cb Add linalg_det and test for torch backend (#11405)
* add linalg_det and test

* space

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-07-30 22:04:44 -04:00
Eitan Turok
cba3655de5 Add Test for Setitem (#10559)
* init

* update

* better

* failing test

* works

* Delete test file

* clean

* lint

* simplify variable name

* rm contiguous, rm int dtype, and add assertEqual

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-07-30 22:03:41 -04:00
wozeparrot
6252f7770e feat: fake data (#11447) 2025-07-30 17:18:20 -07:00
chenyu
e300451f3a update llama3 (#11446)
`LR=1e-4 TRAIN_ON_VAL=1 DEFAULT_FLOAT=bfloat16 FUSE_ARANGE=1 JITBEAM=2 OPTIM_DTYPE=bfloat16 LLAMA3_SIZE=1B WARMUP_STEPS=36 DECAY_STEPS=360 SEQLEN=512 PYTHONPATH=. AMD=1 AMD_LLVM=0 MODEL=llama3 python3 examples/mlperf/model_train.py` trained to 7
2025-07-30 19:34:21 -04:00
wozeparrot
5fb975351a feat: flag for training on val (#11441) 2025-07-30 14:29:45 -07:00
chenyu
4ca430e5bf fix search dedup (#11439)
it should check against pre real_axis axis in actions, not real_axis.
2025-07-30 17:24:16 -04:00
wozeparrot
d3da20eca6 feat: bump mlperf workflow timeout to 6 hours (#11440) 2025-07-30 14:12:12 -07:00
wozeparrot
825b6a2505 feat: llama3 dataloader (#11340) 2025-07-30 13:27:55 -07:00
qazal
af357b5dc8 disable TRACK_MATCH_STATS in BEAM workers [pr] (#11437) 2025-07-30 23:22:08 +03:00
George Hotz
7c2d2eff86 check tensor core dims (#11436)
* check elements_per_thread in tensorcore [pr]

* check tc dims
2025-07-30 13:06:59 -07:00
nimlgen
5fc5bb5237 ci: clear processes (#11434)
* unified hcq_smi for management

* fix

* fix

* no reset for amd
2025-07-30 22:15:18 +03:00
George Hotz
4f26a9ad32 check elements_per_thread in tensorcore [pr] (#11435) 2025-07-30 11:55:48 -07:00
nimlgen
4b4ba5454c ci: move driver start higher (#11431) 2025-07-30 10:48:38 +03:00
George Hotz
1bef2d80c1 unrolls are all in the same scope (#11429)
* unrolls are all in the same scope

* fix that import
2025-07-29 16:55:37 -07:00
chenyu
204da24cfc increase driverbenchmark timeout-minutes to 15 (#11428) 2025-07-29 19:45:05 -04:00
chenyu
d5fc6af4a2 remove unused ShapeTracker.consecutive [pr] (#11426) 2025-07-29 18:36:19 -04:00
George Hotz
49a2583584 real new lowerer (#11419)
* real new lowerer

* fix group for reduce

* skip missing ranges

* fix wmma and unroll/contract

* real fix for wmma

* disable that test

* fix if gate

* simpler

* flash attention fusion works

* no end barriers

* still broken

* flash attention finally works
2025-07-29 15:35:51 -07:00
chenyu
0e5d8d5c3c remove tests that used .to_uop() (#11425)
* remove tests that used .to_uop()

* import
2025-07-29 15:52:16 -04:00
nimlgen
c88e401d0e ci: fix typos in h machine benchmarks (#11423) 2025-07-29 22:11:47 +03:00
chenyu
90a5a312eb simplify ShapeTracker in UOp.const [pr] (#11424) 2025-07-29 15:04:06 -04:00
chenyu
398594029b spec checks arg of VIEW are ShapeTracker (#11422) 2025-07-29 14:05:12 -04:00
George Hotz
1f1f99c287 hotfix: add DEBUG=3 to driver CI 2025-07-29 11:03:47 -07:00
George Hotz
50fae54175 global local dims in gpudims [pr] (#11420) 2025-07-29 10:39:03 -07:00
chenyu
9bc413f104 remove ShapeTracker.to_uop [pr] (#11418) 2025-07-29 13:29:37 -04:00
George Hotz
ba2c4df125 dont render cast ptrs standalone (#11417)
* dont render cast ptrs standalone

* barrier cleanups
2025-07-29 09:24:26 -07:00
nimlgen
d38d285489 ci: add h machines (#11416)
* ci: add h machines

* more

* fix names

* names not collide

* 20

* 10
2025-07-29 19:21:51 +03:00
Tom Clesius
2568bc0d99 ci: add caching for apt packages (#11162)
* add caching for apt packages

* remove 'inputs' from apt cache key, use outputs instead of env

* remove unnecessary mkdir for partial

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-07-29 09:04:56 -07:00
George Hotz
03909f2772 permute locals for HL uop matmul (#11412)
* permute locals for HL uop matmul

* parens fix that

* permutes

* 20 TFLOPS
2025-07-29 08:19:59 -07:00
nimlgen
e0c9747684 amd: fix typo in has_scratch_base_registers for mi350 (#11413) 2025-07-29 10:30:06 +03:00
George Hotz
735ad5f10d kernel4 and 5 in uops (#11411)
* move simplify views to merge views

* add amd kernel 4

* Revert "move simplify views to merge views"

This reverts commit 1e07dff384.

* k4 in python

* kernel4 written in uops

* k5 support

* cleanups
2025-07-28 19:35:48 -07:00
George Hotz
fddc645668 HL=2 top matmul (#11406)
* HL=2 top matmul

* top colored
2025-07-28 12:32:38 -07:00
nimlgen
c7b4ab86e4 fix llvm tc on mi350 (#11404) 2025-07-28 21:37:43 +03:00
chenyu
9f7c72ff8f remove UOp.valid method [pr] (#11402)
only used in add_buffer_ops
2025-07-28 11:29:08 -04:00
chenyu
b22a34331b remove const valid in fixup_ast [pr] (#11401) 2025-07-28 11:07:59 -04:00
qazal
7737cbb2a0 viz: tabulate runtime stats (#11400) 2025-07-28 15:56:39 +03:00
chenyu
ab6a27f627 remove a branch in UOp.r [pr] (#11398) 2025-07-27 18:00:01 -04:00