Commit Graph

8380 Commits

Author SHA1 Message Date
Francis Lata
f8fe15e64e move BoxCoder to mlperf helpers (#9773) 2025-04-07 20:27:06 -04:00
Eitan Turok
bb7922b95f Vectorize Transcendental Regression Tests (#9753)
* init test

* cleanup
2025-04-08 01:27:39 +08:00
chenyu
7c4a739fe4 full script for bert mi300x (#9772) 2025-04-07 11:41:31 -04:00
Sieds Lykles
07d1aefaf4 fast idiv (#9755)
* fast idiv with tests and fuzzer

* Add todo comment

* Add env variable to toggle fast_idiv

* Move env check

* Add fuzz fast_idiv to ci

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-04-07 08:32:24 -04:00
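The commit message above does not describe the technique. As a hedged illustration only, one well-known way to speed up integer division by a constant is to replace it with a multiply and a shift. The sketch below assumes unsigned N-bit operands; the helper names `magic`, `fast_idiv`, and `fuzz` are hypothetical and are not taken from the commit. The `fuzz` function is a small randomized check in the spirit of the fuzzer the commit mentions.

```python
import random

def magic(d: int, nbits: int) -> tuple[int, int]:
    """Return (m, s) with x // d == (x * m) >> s for every 0 <= x < 2**nbits."""
    assert d >= 1
    l = (d - 1).bit_length()   # ceil(log2(d)); 0 when d == 1
    s = nbits + l
    m = -(-(1 << s) // d)      # ceil(2**s / d)
    return m, s

def fast_idiv(x: int, d: int, nbits: int = 32) -> int:
    """Divide x by the constant d using one multiply and one shift."""
    m, s = magic(d, nbits)
    return (x * m) >> s

def fuzz(trials: int = 10_000, nbits: int = 32, seed: int = 0) -> None:
    """Compare fast_idiv against Python's // on random unsigned inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = rng.randrange(1 << nbits)
        d = rng.randrange(1, 1 << nbits)
        assert fast_idiv(x, d, nbits) == x // d, (x, d)
```

In a compiler the pair `(m, s)` would be computed once per constant divisor, so only the multiply and shift remain at run time.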
nimlgen
fa888ee077 minor test cleanups (#9770)
* fix test_graph on max

* pcie5
2025-04-07 15:29:12 +03:00
chenyu
3069ebfad1 use BERT_LAYERS=2 in bert init (#9769)
saves 5 minutes of scheduling in setup so we can fit more search
2025-04-07 07:46:37 -04:00
qazal
891322fd51 split into grouper.py (#9768)
* split into grouper.py

* update tests

* reorder
2025-04-07 18:40:59 +08:00
qazal
219b8c9e8b return becomes_map in scheduler [pr] (#9766)
* add a graph_rewrite pass for creating asts [pr]

* disk

* benchmark

* return becomes_map in scheduler

* reorder schedule.py into grouper and linearizer [pr]

* comments
2025-04-07 17:36:23 +08:00
qazal
6306dea6e2 add a graph_rewrite pass for creating asts [pr] (#9765)
* add a graph_rewrite pass for creating asts [pr]

* disk

* benchmark
2025-04-07 16:32:11 +08:00
qazal
07eea567d4 reorder tensor_map and grouper parts [pr] (#9764) 2025-04-07 15:36:13 +08:00
qazal
8ddb1357c0 fix UPat.location after pickle (#9763)
* fix UPat.location after pickle [pr]

* named upat test
2025-04-07 15:16:42 +08:00
qazal
4cd27aa0e6 hotfix: viz recenter and unlimited zoom (#9760)
* hotfix: viz recenter and unlimited zoom

* add shapes to the ast graph

* not for COPY
2025-04-07 14:38:03 +08:00
chenyu
d0dace4306 update doc for permute to 3d tensor (#9758)
easier to see if it's permuted to or permuted from
2025-04-07 00:38:05 -04:00
chenyu
b190d85ad7 benchmark script bert softmax (#9759) 2025-04-07 00:31:18 -04:00
Ignacio Sica
58785181a8 AMD bf16xf32 TC (#9717)
* dont test bf16 for emulated amd tc

* skip bf16 tc test in ci

* skip bf16 for AMD in test_tensor_cores_codegen

* add simple bf16 gemm test to benchmark
2025-04-07 11:41:04 +08:00
chenyu
43e4565148 weighted linear in external_benchmark_bert_matmuls (#9757)
include the linear to get qkv, and permute so that stride matches with the real run
2025-04-06 23:35:42 -04:00
George Hotz
28e06d2d44 minor cleanups from patternmatcher [pr] (#9756) 2025-04-07 11:28:14 +08:00
qazal
1ce4912770 viz profiler ui (#9664)
* localhost:8000/prof

* selector + table

* add pid

* on null selection reset filters

* table sort

* charset=utf-8

* clear the rest

* sort by duration

* render table

* format

* nothing in copy thread

* keep starts

* sort back

* less javascript

* diff

* works on firefox
2025-04-07 00:30:17 +08:00
chenyu
8a585dc5c1 benchmark script for matmuls in bert (#9752)
2 main matmuls in the bert layers. getting these to be fast makes bert fast
2025-04-06 19:34:25 +08:00
qazal
139999c6d7 map viz files + query params cleanup [pr] (#9754)
* map viz files + query params cleanup [pr]

* .width + fix
2025-04-06 16:20:00 +08:00
Francis Lata
71b8890dd6 use validation dataloader inside retinanet eval (#9747) 2025-04-05 16:46:55 -04:00
nimlgen
5f7c79676f jit: prune independent copies (#9749)
* jit: prune independent copies

* linter

* check kernel cnt
2025-04-05 20:50:28 +03:00
nimlgen
c2573b247c jit: rename optimize_weights -> replan_buffers_memory_layout (#9751) 2025-04-05 20:35:15 +03:00
uuuvn
493fb315b1 fix RDNA2 support (#9700)
linux amdgpu_discovery.c:amdgpu_discovery_set_ip_blocks is a ton of
switch cases with sometimes weird choices like replacing nbio 3.X with
2.3 while nbio 2.5 is somehow nbio 7.0. `import_module` currently just
tries to replace revision and minor with zeroes if there is no exact
match, but that's not enough to cover all that weirdness
2025-04-05 18:42:47 +03:00
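The fallback described in the commit message above can be sketched roughly as follows. This is a hypothetical model, not the actual `import_module` code: the function name `resolve_version`, the tuple representation of versions, and the order in which the fallbacks are tried are all assumptions.

```python
from typing import Optional

Version = tuple[int, int, int]  # (major, minor, revision)

def resolve_version(wanted: Version, available: set[Version]) -> Optional[Version]:
    """Try an exact match, then zero the revision, then zero the minor as well."""
    major, minor, _rev = wanted
    for candidate in (wanted, (major, minor, 0), (major, 0, 0)):
        if candidate in available:
            return candidate
    return None  # no match; a remapping across major versions is never attempted

available = {(2, 3, 0), (7, 0, 0)}
print(resolve_version((2, 3, 1), available))  # falls back to (2, 3, 0)
print(resolve_version((3, 0, 0), available))  # None
```

A zeroing fallback like this never maps one major version onto another, which is the kind of case the message says the simple approach does not cover.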
chenyu
5a04f4d4ba revert bert hparams for green and red (#9744)
did more runs and it's not really better and not worth the change. only useful for BS=1024
2025-04-05 07:38:01 -04:00
chenyu
407ca54382 symbolic fold double where (#9436)
* symbolic fold double where

a.where(b.where(c, d), d) -> (a & b).where(c, d). a pattern in optimizer

* test case
2025-04-05 05:12:17 -04:00
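The rewrite `a.where(b.where(c, d), d) -> (a & b).where(c, d)` from the commit above can be checked on scalars with a short truth-table sketch. The scalar `where` helper below is a stand-in written for this illustration and is not tinygrad's implementation.

```python
from itertools import product

def where(cond: bool, x, y):
    """Scalar stand-in for cond.where(x, y)."""
    return x if cond else y

def nested(a: bool, b: bool, c, d):
    return where(a, where(b, c, d), d)

def folded(a: bool, b: bool, c, d):
    return where(a and b, c, d)

# c and d are arbitrary distinct values so the two branches can be told apart
for a, b in product([False, True], repeat=2):
    assert nested(a, b, "c", "d") == folded(a, b, "c", "d")
```

Both forms return `c` only when `a` and `b` are both true, and `d` otherwise, which is why the nested form can be folded into a single `where`.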
Sieds Lykles
9c2fc695b5 cond.logical_not().where(a,b) -> cond.where(b,a) (#9741)
* Add rule for negation in where, simplifies arange patterns

* 0 becomes 0.0 again

* Only if cond is bool

* ne is never None

* Add a test

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-04-04 19:13:32 -04:00
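The rule in the title above, `cond.logical_not().where(a,b) -> cond.where(b,a)`, can be illustrated on scalars. The `where` helper is a stand-in written for this sketch, not tinygrad code, and only boolean conditions are checked, matching the "Only if cond is bool" item in the commit message.

```python
def where(cond: bool, x, y):
    """Scalar stand-in for cond.where(x, y)."""
    return x if cond else y

def negated(cond: bool, a, b):
    return where(not cond, a, b)

def swapped(cond: bool, a, b):
    return where(cond, b, a)

for cond in (False, True):
    assert negated(cond, "a", "b") == swapped(cond, "a", "b")
```

Dropping the negation and swapping the branches removes one operation from the expression without changing its value.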
Sieds Lykles
e9a3ac02a5 Remove ne from arange pattern (#9743) 2025-04-04 18:31:13 -04:00
nimlgen
86c55414d7 ops_amd: simplify gfx version (#9742)
* ops_amd: simplify gfx version

* fix

* all version compact style

* mypy

* revert this

* rename back to target
2025-04-04 22:18:11 +03:00
qazal
16d6aa15f1 record unittest name in process replay (#9731)
* record unittest name in process replay

* getitem

* filename + (optional) name

* del

* get_test_method

* not solved

* try with linecache

* test: print_loc

* format

* without linecache

* checkout master
2025-04-05 01:39:48 +08:00
qazal
354db961c6 viz refactor to prep for profiler [pr] (#9739) 2025-04-04 17:13:14 +08:00
chenyu
fe998798fb linearizer failure test for OUT OF BOUNDS ACCESS (#9738) 2025-04-04 03:48:43 -04:00
George Hotz
8b5a523743 fix minimum length in pattern matcher (#9736) 2025-04-04 14:57:01 +08:00
chenyu
640ff681c3 rename bert script to 8xMI300X (#9734)
and adds a script for single MI300X
2025-04-03 23:36:24 -04:00
George Hotz
b719aa1fb0 only check once for divisible fold lengths (#9732) 2025-04-04 11:27:34 +08:00
George Hotz
926b0bcc57 cache folded upcast [pr] (#9733) 2025-04-04 11:23:19 +08:00
George Hotz
8206c7281e move const multiply after REDUCE (#9730) 2025-04-04 11:07:46 +08:00
chenyu
6b3480ec70 update mi300x bert hparams (#9716)
* update mi300x bert hparams

borrowed from previous submission that also did BS=1024

* update
2025-04-03 22:30:00 -04:00
George Hotz
cac8bcf8b5 use Ops.REDUCE (#9721)
* decrease bert python time [pr]

* order copies

* Revert "order copies"

This reverts commit 3f62c8693b.

* rewrite count

* Ops.REDUCE

* acc first in the add chain

* Fix tensor core acc

* arange patterns look good

* fix multireduce gate

* reduce rewrite rule

* bump that to 15 minutes

* multiwmma isn't fusing

* gep through wmma is gep pushing

* bump that timeout too, it's all env setup

* add failing test
2025-04-04 10:14:34 +08:00
nimlgen
949459fdd6 jit: fix deallocate on unallocated buffers in free_intermediates (#9699) 2025-04-03 18:32:51 +03:00
qazal
52a8ecb15e record unittest location in process replay [pr] (#9727) 2025-04-03 20:50:09 +08:00
geohotstan
ac713e04db ONNX add output shape validation (#9720)
* add output shape validation and remove support for sequence_type

* nit better err msg

* add sequence_type back

* improve err msg

* Revert "improve err msg"

This reverts commit dc9eaea4bb.

* Revert "add sequence_type back"

This reverts commit 288170b2d9.

* do explicit shape equality

* small nit
2025-04-03 05:44:53 -04:00
chenyu
7dadbf3697 insert float() in bert acc (#9726)
sum of bool by default uses default_float for acc. So without float, it might overflow with a large BS and default_float=HALF.

fixed clsf_accuracy to not be inf in mi300x bert
2025-04-03 05:44:09 -04:00
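Why a half-precision accumulator can turn an accuracy into inf can be shown with a minimal sketch using only the Python standard library. The counts below are made up for illustration, and `to_half` is a hypothetical helper that models rounding to IEEE 754 half precision; none of this is the benchmark code itself.

```python
import math
import struct

def to_half(x: float) -> float:
    """Round x to half precision; values beyond the half range become inf."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except (OverflowError, struct.error):
        return math.copysign(math.inf, x)

correct = 70_000   # hypothetical count of correct predictions (sum of a bool mask)
total = 100_000    # hypothetical number of examples

acc_half = to_half(float(correct)) / total   # count held in half precision
acc_float = float(correct) / total           # count held in a wider float

print(acc_half, acc_float)
```

The largest finite half-precision value is 65504, so a count above that cannot be represented, while the same count in a wider float gives the expected ratio.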
chenyu
79145e3d40 cleanup truncate_bf16 [pr] (#9725)
use torch bfloat16 for groundtruth in test. also a TODO for discrepancy
2025-04-03 05:43:49 -04:00
Ignacio Sica
bc2d86195e increase test tolerance (#9719) 2025-04-03 15:24:09 +08:00
chenyu
1d25844d44 Revert "disable CI red llama 3 4 gpu beam (#9690)" (#9709)
This reverts commit 6a5eacba8b.
2025-04-03 02:34:39 -04:00
George Hotz
49dafe6d43 add gc tests [pr] (#9718)
* add gc tests [pr]

* del

* more gc tests

* add NullGraph
2025-04-03 14:08:32 +08:00
Ignacio Sica
bc91fffc5d fix gated store with index in python backend (#9703)
* add default gate in index

* assert store

* add TestRendererFailures

- move test_gated_store_with_alu to new TestRenderFailures class for
tests that fail on multiple renderers
- add test_renderer_failures.py run on python CI

* add test for gated index in 2d

* test TestRenderFailures
2025-04-03 12:48:28 +08:00
qazal
f2bd65ccfc delete Ops.EMPTY and Tensor._metaop (#9715)
* delete Ops.EMPTY and Tensor._metaop [pr]

* test_creation

* arg=

* abstractions2
2025-04-03 12:29:02 +08:00
George Hotz
5c7b549eab use functools.cache instead of lru_cache(None) [pr] (#9714)
* use functools.cache instead of lru_cache(None) [pr]

* more cache
2025-04-03 11:47:13 +08:00
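For context on the change above: in the Python standard library (3.9+), `functools.cache` is an unbounded cache equivalent to `functools.lru_cache(maxsize=None)`. The following small sketch uses a made-up `fib` function to show the two spellings side by side.

```python
import functools

@functools.cache
def fib_cache(n: int) -> int:
    return n if n < 2 else fib_cache(n - 1) + fib_cache(n - 2)

@functools.lru_cache(maxsize=None)
def fib_lru(n: int) -> int:
    return n if n < 2 else fib_lru(n - 1) + fib_lru(n - 2)

assert fib_cache(30) == fib_lru(30)
# both wrappers report an unbounded cache
assert fib_cache.cache_info().maxsize is None
assert fib_lru.cache_info().maxsize is None
```

The shorter decorator says the same thing with less noise, which fits a cleanup-style change.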