8167 Commits

Author SHA1 Message Date
qazal
cde4fd3be3 do not view_left assign + elementwise sources always have a shape [pr] (#9491) 2025-03-18 17:42:51 +08:00
George Hotz
117b7a16ef VALIDATE_WITH_CPU [pr] (#9488)
* VALIDATE_WITH_CPU [pr]

* fix test
2025-03-18 15:15:04 +08:00
qazal
935cd01f56 simple failing test for graph_rewrite children [pr] (#9489)
* simple failing test for graph_rewrite children [pr]

* lint

* update too
2025-03-18 13:07:21 +08:00
George Hotz
d20494e6d7 move buffer logic to Buffer [pr] (#9487)
* move buffer logic to Buffer [pr]

* pass shape into as_typed_buffer

* pass shape into as_typed_buffer

* work

* cleaner

* fix tests
2025-03-18 11:21:21 +08:00
qazal
3be228182f unbind Tensor variables last [pr] (#9486)
* reorder do_realize [pr]

* move merge_views

* unbind all variables at the end [pr]
2025-03-18 09:52:01 +08:00
qazal
b44f9c409a reorder do_realize [pr] (#9485)
* reorder do_realize [pr]

* move merge_views
2025-03-18 09:30:10 +08:00
nimlgen
a82c9332d3 am: rename soc21 to soc (#9482) 2025-03-18 08:54:26 +08:00
qazal
b100fc0b20 split the rule that uses context in scheduler simplifier [pr] (#9484)
* split the rule that uses context in scheduler simplifier [pr]

* add
2025-03-18 08:12:26 +08:00
Anish Umale
5e58f4b65b Tiny backend test_ops fix part 3 (#9483)
* extract straightforward things from https://github.com/tinygrad/tinygrad/pull/9302

* pass dtype and device for ones_like
2025-03-17 18:01:51 -04:00
TJ
9fcef4d009 add masked_select to tensor.py (#9468)
* add masked_select to tensor.py

* fix tests

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-03-17 16:05:36 -04:00
chenyu
4f8eac59ea failed test case for threefry (#9469)
* failed test case for threefry

not sure if it's always like this, but increment before _threefry_random_bits is incorrect. the counts should start with random numbers generated so far.

use jax to generate 20 + 20 + 10 random numbers, the first 20 + 20 matches and the last 10 are different. just moving increment after _threefry_random_bits matches the number but jit test fails

* workaround

* why is this different?

* revert those

* and that
2025-03-17 14:52:10 -04:00
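The note in 4f8eac59ea says the counts should start at the number of random values generated so far. A minimal sketch of that invariant, assuming a toy counter-based generator: `toy_bits` and `ToyRNG` are hypothetical names, the hash is a stand-in for threefry, and none of this is tinygrad or jax code.

```python
import hashlib

def toy_bits(seed: int, counter: int) -> int:
    """Stand-in for a counter-based generator: output depends only on (seed, counter)."""
    digest = hashlib.sha256(f"{seed}:{counter}".encode()).digest()
    return int.from_bytes(digest[:4], "little")

class ToyRNG:
    """Hypothetical wrapper that tracks how many values have been drawn."""
    def __init__(self, seed: int, increment_first: bool = False):
        self.seed = seed
        self.counter = 0
        self.increment_first = increment_first

    def draw(self, n: int) -> list[int]:
        if self.increment_first:
            # Incrementing before generating skips the first n counter values.
            self.counter += n
            start = self.counter
        else:
            # Counts start at the number of values generated so far.
            start = self.counter
            self.counter += n
        return [toy_bits(self.seed, start + i) for i in range(n)]

# One draw of 50 values is the reference stream.
reference = [toy_bits(0, i) for i in range(50)]

good = ToyRNG(0)
good_stream = good.draw(20) + good.draw(20) + good.draw(10)

bad = ToyRNG(0, increment_first=True)
bad_stream = bad.draw(20) + bad.draw(20) + bad.draw(10)

print(good_stream == reference)  # chunked draws reproduce the single draw
print(bad_stream == reference)   # the increment-first variant does not
```

In this toy, drawing in chunks of 20 + 20 + 10 reproduces a single draw of 50 only when the counter is advanced after the values are generated. It does not attempt to reproduce the jit test failure mentioned in the commit.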
b1tg
6dd8e5ba7c refactor llvm compiler (#9403)
* refactor LLVMCompiler

* new interface

* automatic configuration

---------

Co-authored-by: b1tg <b1tg@users.noreply.github.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-03-18 00:13:49 +08:00
geohotstan
53d6f1e1bb Add bitonic cat sort (#9422)
* poc

* repeated values fail, sigh

* is this being timed out?

* fix up down names

* bitonic v2, does this run?

* bitonic v3, faster

* bitonic v3.1, faster

* bitonic v3.1.1, same speed unlucky

* support dim and indices

* bitonic v3.2, simpler code, TODO repeated indices

* bruv gimme green for once cmon

* cat (stack) implementation, slow but maybe one day when cat is fast meow

* revert to v3.2

* bitonic v4, who let the cats out edition

* clean up variable names

* figured out repeated indices :D

* ruff check --fix

* use sort for topk

* add Tensor.sort everywhere

* fix docs and add some types

* slightly better variable names

* am I doing torch inplace correctly?

* delegate sort to values_stable

* add a contig, faster first sort

* maybe don't test_inplace

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-03-17 12:01:23 -04:00
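For readers unfamiliar with the algorithm named in 53d6f1e1bb, here is a minimal sketch of the textbook bitonic sorting network on a plain Python list. It is an illustration only, not the tensor-based implementation from the pull request; the `bitonic_sort` name and the power-of-two length restriction are assumptions of this sketch.

```python
def bitonic_sort(values, descending=False):
    """Sort a list whose length is a power of two using a bitonic network."""
    n = len(values)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    out = list(values)
    k = 2
    while k <= n:              # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:          # distance between compared elements
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    # Blocks alternate between ascending and descending order.
                    ascending_block = (i & k) == 0
                    if (out[i] > out[partner]) == ascending_block:
                        out[i], out[partner] = out[partner], out[i]
            j //= 2
        k *= 2
    return out[::-1] if descending else out

print(bitonic_sort([3, 1, 4, 1, 5, 9, 2, 6]))
```

The compare-and-swap pattern depends only on the indices and not on the data, which is why this kind of network maps well to data-parallel hardware.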
chenyu
f53be010d7 lower bert learning rate (#9481)
slightly better. first sub 3hr run https://wandb.ai/chenyuxyz/MLPerf-BERT/runs/0or96ink/overview
2025-03-17 10:49:56 -04:00
qazal
e03c0aacf2 more explicit DONT_PUSH_VIEWS [pr] (#9479)
* more explicit DONT_PUSH_VIEWS [pr]

* update tests to not handcode ast

* lint

* test_recursive_swizzle and test_simple_store_reshape
2025-03-17 20:43:21 +08:00
qazal
3b00a778ba fix view_left for unsafe pad ops [pr] (#9478) 2025-03-17 19:02:02 +08:00
qazal
813f713edc merge_views for buffer ops + create valids last (#9472)
* merge_views for buffer ops + create valids last

* view.arg

* pass
2025-03-17 17:15:44 +08:00
qazal
bd1f71c1e2 simple failing test for extra ops in VALID [pr] (#9474)
* simple failing test for extra valids [pr]

* this has DEBUG=4
2025-03-17 17:02:40 +08:00
qazal
e26caf4c3a hotfix: skip test_mean_half_precision_underflow on amd ci (#9476)
The global size is very large (781250 gidx) and the emulated version takes more than 1 minute to execute the kernel.
2025-03-17 16:47:48 +08:00
George Hotz
824c5f41ac dsp work try 3 (#9475)
* dsp work try 3

* padding
2025-03-17 16:42:12 +08:00
George Hotz
242daa4f9a ptrcat (#9473) 2025-03-17 16:06:37 +08:00
George Hotz
52ae9af4dd Fast DSP for MobileNetV2 (try 2) (#9467)
* Fast DSP for MobileNetV2 (try 2)

* enable fast path on uchar

* fix tests
2025-03-17 15:10:36 +08:00
George Hotz
15ee742afa add get_children_map to uop (#9470)
* add get_children_map to uop

* update_children

* fix new children
2025-03-17 14:36:13 +08:00
chenyu
d2cfbd8a4d bert lower learning rate and total steps (#9466)
closer to the other submission with BS=240. converged with 10% less epochs
2025-03-16 17:21:20 -04:00
George Hotz
09e7708b49 minimum change for rdna4 [pr] (#9455) 2025-03-16 13:39:24 +08:00
qazal
be2161652b reorder into swizzler + ast_fixup [pr] (#9456) 2025-03-15 09:00:14 +01:00
George Hotz
cb7a7f69c7 quantization preprocessor from DSP, should be universal (#9437)
* quantization preprocessor from DSP, should be universal

* touchups

* fix tests
2025-03-15 07:49:37 +08:00
chenyu
ca5064a5b6 remove Kernel.float4_axis [pr] (#9448) 2025-03-14 17:54:32 -04:00
chenyu
0e591baf43 redo simple_matmul change (#9450)
numpy does not support bfloat16
2025-03-14 17:53:52 -04:00
chenyu
b0f63d3c04 Revert "simple_matmul.py uses np to generate random (#9438)" (#9449)
This reverts commit 14018050c1.
2025-03-14 17:14:22 -04:00
Ignacio Sica
14018050c1 simple_matmul.py uses np to generate random (#9438)
* np generates randoms

* hotfix: use generator for int dtype

* float32 as default dtype for float generator

* use np.float32 instead of string

* add dtype= to integers generator

* change import _to_np_dtype source
2025-03-14 17:36:50 -03:00
qazal
2a50e6440d filter sink by DONT_PUSH_VIEWS + remove extra base [pr] (#9446) 2025-03-14 21:27:46 +01:00
qazal
3af7a08a06 ast_fixup in one graph_rewrite pass [pr] (#9444) 2025-03-14 20:14:31 +01:00
nimlgen
bd4ae5ac53 am: hotfix: import modules (#9443)
* am: hotfix: import modules

* hmm
2025-03-15 03:10:18 +08:00
nimlgen
77a8430616 am: use smu based on discovery (#9441) 2025-03-15 02:10:45 +08:00
uuuvn
5ff90cb261 am: less magic values (#9440) 2025-03-15 02:10:35 +08:00
Ignacio Sica
459d0cd14f add arch to AMDRenderer and HIPRenderer (#9431) 2025-03-13 13:06:27 -03:00
nimlgen
357e364ab8 am: turn off unord dispatch (#9433) 2025-03-13 23:59:28 +08:00
chenyu
99b0287e4e add GROUP and GROUPTOP to test_arange (#9432)
it does not grow quadratically, but it's not 0 ops now
2025-03-13 11:28:38 -04:00
qazal
90ffa9bd45 swizzle without buffer ops try 2 [pr] (#9427)
* add DONT_PUSH_VIEWS to matchers

* swizzle without buffer ops try 2 [pr]

* swizzle reduceop

* simple failing test

* fix failing test

* s/on/for
2025-03-13 10:00:40 +01:00
qazal
4df2b6347d hotfix: bump tinybox red training CI timeout to 30 minutes (#9426) 2025-03-13 09:31:44 +01:00
George Hotz
931436204c hotfix: 12000 lines, for AMD stuff 2025-03-13 10:48:14 +08:00
George Hotz
bfc68d1953 add gep rules to simplify (#9419)
* add gep rules to simplify

* ws

* flipped direction
2025-03-13 09:46:25 +08:00
geohotstan
0bed9b6cd2 benchmark huggingface onnx models (#8493)
* add ability to ORT=1

* test_vs_ort

* useless f

* actually have benchmark take in modelproto for more flexibility in huggingface stuff

* ok runs

* good

* oops fix benchmark_onnx __main__

* 224 as default

* add ORT=1 option to huggingface_onnx

* use Tensor to get_input

* add ability to do single onnx model testing

* better names

* merge properly...

* copy in onnx_helpers

* better

* decent script

* need to add debug tool first

* new limit usage

* why did narrowing_error come back..

* pretty decent

* revert validate change

* more ops bug fixes

* revert unnecessary changes

* fix InstanceNorm too

* remove op from O4

* minimize diff

* address old feedback

* unsure of this, just revert

* remove that assert

* working attention

* to_python_const Attention

* cant init from np constant so just do this

* final

* fix bug in attention

* attention clean ups

* add hard TODOs and REPOPATH and TRUNCATE envvar

* fix input_ids default value

* final

* fix scatter

* cleaner _prepare_quantize

* use new attention and tempfile for huggingface script

* more stats

* update

* remove outdated code

* big refactor to something usable by CI

* booooooom

* clean up

* update to using yaml as env var input

* add dry run

* try

* valid pad

* use argparser and fix gather bug

* ignore all yaml

* tiny bit more polish

* woah ignoring all yaml was not right

* typo

* decouple huggingface_onnx_run debug run with huggingface_onnx_download

* bug fix for downloading single model

* WOOOO ok much better

* oops argparse 'required' is an invalid argument for positionals

* oops argparse 'required' is an invalid argument for positionals

* add assert

* fix types

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-03-12 20:13:12 -04:00
chenyu
4992958dae update bert beam params (#9423)
BEAM_MIN_PROGRESS=5 for setup speed
2025-03-12 13:00:41 -04:00
qazal
12978f0d05 reorder contiguous/assign ast rules [pr] (#9420)
* apply setitem ShapeTracker when creating store [pr]

* comments + early contiguous remove

* better

* linter
2025-03-12 12:13:27 +01:00
George Hotz
5f6d5b057d expand index isn't grouping by access size (#9418)
* expand index isn't grouping by access size

* split_load_store

* scalar vec

* +correct_load_store

* vectorized and

* correct_load_store always

* simplify before divides
2025-03-12 17:24:10 +08:00
George Hotz
815ad0b7a8 support load/store grouping in DEVECTORIZE=0 (#9409) 2025-03-12 11:34:37 +08:00
Priyank Patel
4714c4f9ad torch backend multigpu - add devices and tests (#9414)
* add multi-device support and tests

* simplify
2025-03-12 11:33:11 +08:00
chenyu
22fc0a2e36 bert sum acc in half (#9412)
also BS=96
2025-03-11 23:03:15 -04:00