Commit Graph

467 Commits

Author SHA1 Message Date
chenyu
e7d5fe4a32 improve idiv _min_max (#8066)
for the cases where we don't know the exact bounds, we might still know the sign. with this, some resolve calls for symbolic shapetracker can be removed
2024-12-05 23:02:16 -05:00
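The idea in e7d5fe4a32 can be illustrated with a small interval-arithmetic sketch. This is not tinygrad's `_min_max` code: `cdiv` and `idiv_bounds` are hypothetical helpers, and the sketch assumes C-style integer division that truncates toward zero. When the divisor's range excludes 0, the exact bounds come from the four corner quotients; otherwise it falls back to a magnitude bound and tightens it with whatever sign information is still available.

```python
def cdiv(a: int, b: int) -> int:
    """Integer division truncating toward zero (C semantics), b != 0."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q

def idiv_bounds(xmin: int, xmax: int, ymin: int, ymax: int) -> tuple[int, int]:
    """Bounds of cdiv(x, y) for x in [xmin, xmax], y in [ymin, ymax], y != 0 at runtime."""
    if ymin > 0 or ymax < 0:
        # divisor never crosses 0: x/y is monotone in each argument and
        # truncation is monotone too, so the extremes sit at the corners
        corners = [cdiv(x, y) for x in (xmin, xmax) for y in (ymin, ymax)]
        return min(corners), max(corners)
    # divisor range touches 0: exact bounds are unknown, only |cdiv(x, y)| <= |x| ...
    bound = max(abs(xmin), abs(xmax))
    lo, hi = -bound, bound
    # ... but the sign of the result may still be known
    if ymin >= 0:  # here ymin == 0, so y > 0 whenever it is nonzero
        if xmin >= 0: lo = 0
        if xmax <= 0: hi = 0
    if ymax <= 0:  # here ymax == 0, so y < 0 whenever it is nonzero
        if xmin >= 0: hi = 0
        if xmax <= 0: lo = 0
    return lo, hi
```

For example, with x in [0, 10] and y in [0, 5], only the magnitude bound is available, but the sign check still narrows the result to [0, 10] instead of [-10, 10].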
Sieds Lykles
49c6dab74b Add pattern for div mod recombine with gcd (#8061)
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-12-05 13:16:58 -05:00
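Div/mod recombination rests on the identity x == (x // c) * c + x % c. When every term carries a common factor g, the same recombination applies after factoring g out: g*(x % c) + g*c*(x // c) == g*x. The brute-force check below illustrates the identity with Python's floor semantics; `recombines` is a hypothetical helper, not the pattern added in #8061.

```python
def recombines(x: int, c: int, g: int) -> bool:
    """Check g*(x % c) + g*c*(x // c) == g*x (Python floor div/mod, c != 0)."""
    return g * (x % c) + g * c * (x // c) == g * x

# exhaustive check over a small range, including negative x and divisors
assert all(recombines(x, c, g)
           for x in range(-50, 51)
           for c in (-7, -3, 2, 4, 9)
           for g in (1, 2, 6))
```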
George Hotz
df18e7cc37 accept filename decorator [pr] (#8049)
* accept filename decorator [pr]

* add test for safe_load

* bring old tar tests back
2024-12-05 11:40:59 +08:00
chenyu
b3220ca7b1 test cases of always True/False lt (#8048)
* test cases of always True/False lt

* one more
2024-12-04 20:38:40 -05:00
Sieds Lykles
70db1bab5c Fold nested div with const (#8010)
* Rebase nested div and with const

* Update the ordering

* return None on vectors

Fixes cpu test

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-12-04 14:59:09 -05:00
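Folding nested division by constants relies on (x / a) / b == x / (a*b), which holds for positive constants a and b under both floor and truncating integer division (it can fail for a negative divisor, e.g. (1 // 2) // -1 == 0 but 1 // -2 == -1). A minimal brute-force sketch; `cdiv` and `nested_div_folds` are hypothetical helpers, not the rewrite rule itself:

```python
def cdiv(a: int, b: int) -> int:
    """Integer division truncating toward zero (C semantics), b != 0."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q

def nested_div_folds(x: int, a: int, b: int) -> bool:
    """For positive a and b, check the fold under floor and truncating division."""
    floor_ok = (x // a) // b == x // (a * b)
    trunc_ok = cdiv(cdiv(x, a), b) == cdiv(x, a * b)
    return floor_ok and trunc_ok

assert all(nested_div_folds(x, a, b)
           for x in range(-100, 101) for a in range(1, 8) for b in range(1, 8))
```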
leopf
f0401e14e8 tar_extract with Tensors (#7853)
* initial

* USTAR, PAX and GNU support + testing

* from_bytes byteorder

* use TarInfo.frombuf

* tensor only usage

* remove contextlib.suppress

* shorter ow,pax

* more tests

* testing length + move tests

* cleanup

* new approach: RawTensorIO

* fix fetch

* enable read test

* cleanup and ignore fix

* fix for python < 3.12

* make it RawIO

* functions

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-12-04 17:03:19 +08:00
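Commit #7853 parses tar headers with `TarInfo.frombuf`. The sketch below shows the same header walk over a plain `bytes` buffer using only the standard `tarfile` module; it is not tinygrad's Tensor-based `tar_extract`. `list_tar_members` is a hypothetical helper, and unlike the commit it does not handle PAX or GNU extension headers.

```python
import io
import tarfile

BLOCK = 512  # tar headers and data are laid out in 512-byte blocks

def list_tar_members(data: bytes):
    """Yield (name, data_offset, size) for each regular file in an uncompressed tar archive."""
    pos = 0
    while pos + BLOCK <= len(data):
        header = data[pos:pos + BLOCK]
        if header == b"\0" * BLOCK:  # an all-zero block marks the end of the archive
            break
        info = tarfile.TarInfo.frombuf(header, "utf-8", "surrogateescape")
        pos += BLOCK
        if info.isreg():
            yield info.name, pos, info.size
        pos += -(-info.size // BLOCK) * BLOCK  # file data is padded to whole blocks

# usage: build a small archive in memory and locate the file's bytes
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tf:
    payload = b"hello tar"
    ti = tarfile.TarInfo("a.txt")
    ti.size = len(payload)
    tf.addfile(ti, io.BytesIO(payload))
archive = buf.getvalue()
members = list(list_tar_members(archive))
```

Because only offsets and sizes are returned, the file contents can be sliced out of the original buffer without copying the whole archive.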
chenyu
0c060fa040 update uop and tests to not use lt/gt/le/ge [pr] (#8023)
just use dunder methods, eventually remove those from ops
2024-12-03 21:02:52 -05:00
chenyu
ef3752625b add test case of realize_size with 0 in shape (#8011) 2024-12-03 09:19:50 -05:00
George Hotz
09eac42fd6 cache indexed uops in st [pr] (#8008)
* cache indexed uops in st [pr]

* remove arg from range
2024-12-03 21:27:07 +08:00
Sieds Lykles
e44183647f Improved div folding (#7996)
* First version of div_mod folding together

* Working version with old div folding behaviour

* Test is fixed

* Fix linting

* Happy mypy

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-12-03 08:11:25 -05:00
chenyu
c7bc75e634 alu(c?t0:f0, c?t1:f1) -> c?alu(t0,t1):alu(f0,f1) (#7900)
* alu(c?t0:f0, c?t1:f1) -> c?alu(t0,t1):alu(f0,f1)

only do this if at least one branch is const, so the total alu count won't increase

* tests and interesting TODO cases
2024-12-02 17:19:27 -05:00
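The rewrite in #7900 distributes an ALU op over two `where` nodes that share the same condition. Below is a toy version over tuple-encoded expressions; the `("where", c, t, f)` encoding, `alu`, and `push_alu_into_where` are hypothetical and not tinygrad's UOp API. Here "a const branch" is taken to mean that both operands on one side are constants, so that side folds and the ALU count does not grow.

```python
import operator

OPS = {"add": operator.add, "mul": operator.mul}

def is_const(e) -> bool:
    return isinstance(e, (int, float))

def alu(op, a, b):
    # fold immediately when both operands are constants
    return OPS[op](a, b) if is_const(a) and is_const(b) else (op, a, b)

def push_alu_into_where(e):
    """alu(c?t0:f0, c?t1:f1) -> c?alu(t0,t1):alu(f0,f1) when one side folds to a const."""
    if not (isinstance(e, tuple) and e[0] in OPS):
        return e
    op, x, y = e
    if not (isinstance(x, tuple) and isinstance(y, tuple)
            and x[0] == y[0] == "where" and x[1] == y[1]):
        return e
    _, c, t0, f0 = x
    _, _, t1, f1 = y
    if not ((is_const(t0) and is_const(t1)) or (is_const(f0) and is_const(f1))):
        return e  # neither side folds: the rewrite would add an alu
    return ("where", c, alu(op, t0, t1), alu(op, f0, f1))
```

For example, `("add", ("where", "c", 1, "a"), ("where", "c", 2, "b"))` (two `where` plus one `add`) becomes `("where", "c", 3, ("add", "a", "b"))` (one `where` plus one `add`).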
Sieds Lykles
d267a2d9eb Div mod recombine test for issue (#7957)
* Add test for failing div_mod recombine

* Add test case when there is gcd in div/mod
2024-11-29 08:47:50 -05:00
Sieds Lykles
864758423e Don't take const in gcd and change the "nothing_changed" condition (#7926)
* Don't take const in gcd and change the "nothing_changed" condition

The biggest difference is probably that I forgot to check whether gcd
changed when nothing else changed.
The TODO was fixed by not using the const in the gcd and then taking it
out.

* Fix more tests
2024-11-27 18:07:36 -05:00
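Leaving the constant out of the gcd keeps the common factor of the variable terms intact. If g divides every non-constant term, then under floor division (g*k + c) // g == k + c // g and (g*k + c) % g == c % g, so the constant can be handled separately. A brute-force illustration in Python (not the actual folding code); `variable_gcd` is a hypothetical helper:

```python
import math
from functools import reduce

def variable_gcd(coeffs: list[int]) -> int:
    """gcd of the variable coefficients only; the constant term is excluded."""
    return reduce(math.gcd, coeffs)

# expression 4*x + 8*y + 3: including the constant would drag the gcd down to 1
coeffs, const = [4, 8], 3
g = variable_gcd(coeffs)
assert g == 4 and math.gcd(g, const) == 1

for x in range(-10, 11):
    for y in range(-10, 11):
        k = (4 * x + 8 * y) // g  # exact: g divides every variable term
        value = 4 * x + 8 * y + const
        assert value // g == k + const // g
        assert value % g == const % g
```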
chenyu
988d64900b add TODO case to test_mod_congruence (#7925)
same alu count but better bounds
2024-11-27 15:23:21 -05:00
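A mod-congruence rewrite replaces a coefficient a in (a*x) % m with a congruent coefficient of smaller magnitude, such as a - m. Under Python's floor semantics the value is unchanged while the range of the inner product shrinks, which gives better bounds for the same ALU count. This is a hypothetical illustration; `smallest_congruent` is not a tinygrad function, and the specific TODO case in #7925 is not reproduced here.

```python
def smallest_congruent(a: int, m: int) -> int:
    """Coefficient congruent to a modulo m (m > 0) with the smallest magnitude."""
    r = a % m
    return r - m if r > m - r else r

a, m = 7, 8
a2 = smallest_congruent(a, m)  # 7 is congruent to -1 mod 8
for x in range(0, 100):
    assert (a * x) % m == (a2 * x) % m  # same value under floor mod

# for x in [0, 99], a*x spans [0, 693] while a2*x spans only [-99, 0]
```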
Sieds Lykles
d318867776 Factoring gcd out of mod (#7916)
* Factoring gcd out of mod

Curious if this will be faster/better

* Update bounds on test
2024-11-26 21:17:22 -05:00
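Factoring a gcd out of mod uses (g*a) % (g*b) == g * (a % b) for g > 0, which holds under Python's floor mod because floor((g*a)/(g*b)) == floor(a/b). A brute-force sketch; `mod_gcd` is a hypothetical helper (and `math.gcd` with more than two arguments needs Python 3.9+):

```python
import math

def mod_gcd(coeffs: list[int], m: int) -> int:
    """Common factor shared by every dividend coefficient and the modulus."""
    return math.gcd(m, *coeffs)

# (6*x + 4*y) % 10 has common factor 2, so it equals 2 * ((3*x + 2*y) % 5)
g = mod_gcd([6, 4], 10)
for x in range(-20, 21):
    for y in range(-20, 21):
        assert (6 * x + 4 * y) % 10 == g * ((3 * x + 2 * y) % 5)
```

The factored form has a smaller modulus, so the remainder's bounds are [0, 4] scaled by 2 rather than [0, 9].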
chenyu
ff3f2a9c1a Revert "move attention upcast (#7830)" (#7903)
This reverts commit c07daf40e7.
2024-11-25 18:59:51 -05:00
chenyu
a49ca0c2ff clean up fully_flatten [pr] (#7885)
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-11-25 06:53:18 -05:00
Sieds Lykles
a49a7c4784 Improved mod folding (#7887)
* Remove unnecessary if statement

In all paths where something_changed was set to True, remainder is
appended, so the list can't be empty

* Working version of improved mod folding

* Fix offset calculation

Passing fuzz_symbolic.py to 130_000 so far
Added an extra test

* Cleaner offset calculation
2024-11-24 22:21:34 -05:00
George Hotz
8c3d3181dd bottom up rewrite fixes substitute [pr] (#7862)
* single pass rewrite fixes substitute [pr]

* caching for single_pass_rewrite

* allow multiple rewrites

* a simple test

* bottom_up_rewrite is fully flexible
2024-11-23 20:53:37 +08:00
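A bottom-up rewrite visits children before their parent, applies a rule at each node, and rewrites any replacement again; a cache avoids revisiting shared subtrees. The sketch below uses tuple-encoded expressions and a hypothetical `bottom_up_rewrite`; it is not tinygrad's implementation. The example rule substitutes a variable and folds constant additions.

```python
from typing import Callable, Optional

Expr = object  # an int, a variable name (str), or a tuple ("op", *children)

def bottom_up_rewrite(node: Expr, rule: Callable[[Expr], Optional[Expr]],
                      cache: Optional[dict] = None) -> Expr:
    """Rewrite children first, then apply `rule` here; rewrite any replacement again."""
    if cache is None:
        cache = {}
    if node in cache:
        return cache[node]
    new = node
    if isinstance(node, tuple):
        new = (node[0],) + tuple(bottom_up_rewrite(ch, rule, cache) for ch in node[1:])
    replacement = rule(new)
    if replacement is not None:
        new = bottom_up_rewrite(replacement, rule, cache)  # the result may match rules too
    cache[node] = new
    return new

def rule(n: Expr) -> Optional[Expr]:
    if n == "x":  # substitute x -> 3
        return 3
    if isinstance(n, tuple) and n[0] == "add" and all(isinstance(c, int) for c in n[1:]):
        return sum(n[1:])  # constant-fold additions
    return None

result = bottom_up_rewrite(("add", ("add", "x", 1), "y"), rule)
```

Here `result` is `("add", 4, "y")`: `x` is substituted first, which lets the inner addition fold before its parent is visited.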
George Hotz
144e9f00df viz is local, new test, and new quantize [pr] (#7859)
* viz is local, new test, and new quantize [pr]

* fix mime types

* remove font

* after index
2024-11-23 14:27:10 +08:00
chenyu
c07daf40e7 move attention upcast (#7830)
still upcast before softmax, but faster because intermediate buffer can be stored in half (as long as qk is within half range).
2024-11-22 17:10:51 -05:00
George Hotz
c5d458ce02 BufferSpec and ProgramSpec [pr] (#7814)
* BufferSpec and ProgramSpec [pr]

* delete preallocate, it's unused

* Revert "delete preallocate, it's unused"

This reverts commit dcfcfaccde.
2024-11-21 12:18:05 +08:00
Francis Lata
a1c1b9547f Context manager support for tqdm (#7770)
* add context manager support

* add test case for context manager usage
2024-11-18 14:12:03 -05:00
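Context-manager support means a progress bar can be used in a `with` block and is closed on exit, even when the body raises. A minimal hypothetical class (not tinygrad's `tqdm`) shows the two methods involved:

```python
import io
import sys

class MiniBar:
    """Hypothetical minimal progress bar that also works as a context manager."""
    def __init__(self, total: int, file=None):
        self.total, self.n = total, 0
        self.file = file if file is not None else sys.stderr
    def update(self, k: int = 1) -> None:
        self.n += k
        self.file.write(f"\r{self.n}/{self.total}")
    def close(self) -> None:
        self.file.write("\n")
    def __enter__(self) -> "MiniBar":
        return self
    def __exit__(self, exc_type, exc, tb) -> bool:
        self.close()  # runs even if the with-body raised
        return False  # do not suppress exceptions

out = io.StringIO()
with MiniBar(total=2, file=out) as bar:
    for _ in range(2):
        bar.update()
```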
chenyu
e3105675fb cond.where(True, False) is cond (#7733) 2024-11-16 09:44:17 -05:00
ignaciosica
597a239e28 Remove UnaryOps, BinaryOps, TernaryOps, MetaOps [pr] (#7725)
* remove unaryops

* remove ternaryops

* remove metaops

* hotfix

* remove binaryops

* hotfix: test_pattern_matcher

---------

Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2024-11-16 20:56:56 +08:00
chenyu
aeb1301bab enable a few tests that work now (#7721)
should mark the ones that are expected to work with expectedFailure, and delete the ones that are not expected to work
2024-11-15 14:30:52 -05:00
qazal
e84d089ef1 delete ReduceOps, only use REDUCE_AXIS (#7667) 2024-11-13 19:04:27 +08:00
qazal
9d6b03d691 early assert swizzle in kernel [pr] (#7610)
* early assert swizzle in kernel [pr]

* better

* note changes

* TestIndexing 2
2024-11-09 21:54:43 +08:00
chenyu
74b4d1c1e1 rewrite idx again in real_strides after uop_given_valid (#7600)
uop_given_valid does not guarantee the output to be flat. fixed the last real_strides test.
2024-11-08 14:30:32 -05:00
chenyu
c6189e38c1 simplify_valid in real_strides (#7599)
improved one more real_strides case. after finishing the last one, will think about always applying these in to_indexed_uops
2024-11-08 10:45:22 -05:00
chenyu
a1dfd288bb different valid order (#7589)
in simplify_valid, we start with valids that are in the others' parents, so the others are more likely to be simplified
2024-11-07 20:27:56 -05:00
chenyu
4378b100ad make UOp.range arg a tuple [pr] (#7583)
* make UOp.range arg a tuple [pr]

so render works on the output of ShapeTracker.to_indexed_uops

* fix
2024-11-07 11:58:09 -05:00
chenyu
bb7b5362be uop_given_valid in real_strides (#7231)
simplified idx allows deriving more strides
2024-11-07 09:41:16 -05:00
George Hotz
205befa788 move is_dtype_supported to device [pr] (#7575) 2024-11-07 20:38:03 +08:00
Carl Basho
630a7f37cf update tests (#7554)
Co-authored-by: John Doe <null@mail.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-11-05 11:35:15 -05:00
George Hotz
99bd4372a5 Ops.ALU is no more, the arg is just an op (#7525)
* op arg alu [pr]

* more

* more passing

* fix more tests

* more tests passing

* fix single failing test

* so much cleaner

* noop to not have process replay trigger

* fix ptx
2024-11-05 00:22:22 +08:00
George Hotz
9c3ee64a3e hotfix: QoL assert if op is a str 2024-11-04 17:11:38 +08:00
George Hotz
0c19b6298b rename ops to have unique names (#7522) 2024-11-04 17:09:45 +08:00
George Hotz
bac251d2c1 idx_load_store in lowerer [pr] (#7477)
* idx_load_store in lowerer [pr]

* fix tests (#7513)

Co-authored-by: John Doe <null@mail.com>

* work

---------

Co-authored-by: Carl Basho <76494676+oldpondplop@users.noreply.github.com>
Co-authored-by: John Doe <null@mail.com>
2024-11-04 10:18:40 +08:00
chenyu
7758f7211b Revert "s/UPat/Pat (#7506)" [pr] (#7517)
* Revert "s/UPat/Pat (#7506)"

This reverts commit 400011a8c1.

* fix
2024-11-03 16:33:02 -05:00
chenyu
84592225d8 tweak tqdm (#7510)
reduce parentheses and fuzz more tests now that there's no sleep
2024-11-03 12:07:11 -05:00
chenyu
c25a69b97e fix tqdm tests (#7509)
time.sleep masked two issues:
(1) iters_per_sec might have unitscale in it, and calling `float` on it fails
(2) the default rate is too low to ensure the output matches; it might skip updating
2024-11-03 10:53:22 -05:00
chenyu
400011a8c1 s/UPat/Pat (#7506) 2024-11-03 08:26:19 -05:00
George Hotz
c8bf09b7d4 s/UOps/Ops (#7500)
* s/UOps/Ops [pr]

* fix
2024-11-03 11:26:10 +08:00
George Hotz
a7ba3d2d91 move reduce to lowerer [pr] (#7462)
* move reduce to lowerer [pr]

* simpler
2024-11-01 16:39:20 +08:00
chenyu
a21434504b update payne_hanek_reduction [pr] (#7455) 2024-10-31 18:41:22 -04:00
chenyu
4065c3dec8 remove special 0 case in frexp (#7450)
we can safely assume the input is non-zero; also removed an unneeded bitcast
2024-10-31 13:02:33 -04:00
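A bit-level frexp reads the biased exponent field and rewrites it so the mantissa lands in [0.5, 1). The pure-Python illustration below (a hypothetical `frexp_bits`, not the UOp code in transcendental.py) assumes a nonzero, finite, normal float64, which is the assumption that makes a special case for 0 unnecessary.

```python
import math
import struct

def frexp_bits(x: float) -> tuple[float, int]:
    """Split x into (m, e) with x == m * 2**e and 0.5 <= |m| < 1.
    Assumes x is a nonzero, finite, normal float64 (no special case for 0)."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    exp_field = (bits >> 52) & 0x7FF
    # the stored exponent is biased by 1023; a mantissa in [0.5, 1)
    # corresponds to a biased exponent of 1022
    e = exp_field - 1022
    mant_bits = (bits & ~(0x7FF << 52)) | (1022 << 52)  # keep sign and fraction bits
    m = struct.unpack("<d", struct.pack("<Q", mant_bits))[0]
    return m, e
```

For nonzero normal inputs this agrees with `math.frexp`; for 0.0 it would return (0.5, -1022), so the missing special case only matters if zero can actually reach it.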
chenyu
0739895b4d tiny clean up pow2if and payne_hanek_reduction (#7423) 2024-10-30 22:22:48 -04:00
chenyu
118dd7721f clean up transcendental.rintk [pr] (#7422)
added unit tests and updated the comment. it's rounding away from 0 for negatives
2024-10-30 20:37:28 -04:00
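Rounding with halfway cases going away from zero can be written as adding ±0.5 toward the sign of the input and truncating toward zero. The helper below is a hypothetical scalar mirror of that approach, not the UOp-based rintk in transcendental.py.

```python
def rintk_scalar(d: float) -> int:
    # add 0.5 toward the sign of d, then int() truncates toward zero,
    # so halfway cases round away from zero (e.g. -2.5 -> -3)
    return int(d + (-0.5 if d < 0.0 else 0.5))
```

This differs from Python's built-in `round`, which rounds half to even: `round(2.5) == 2` while `rintk_scalar(2.5) == 3`.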
chenyu
16e60d25b9 move polyN to helper [pr] (#7405)
also move `eval_uop` to `test.helpers`
2024-10-30 10:09:57 -04:00
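polyN evaluates a polynomial from a list of coefficients. A scalar sketch using Horner's method, assuming the coefficients are ordered from the highest degree down (that ordering and the name `poly_n` are assumptions, not taken from the helper itself):

```python
from functools import reduce

def poly_n(x: float, coeffs: list[float]) -> float:
    """Evaluate coeffs[0]*x**(n-1) + ... + coeffs[-1] with Horner's method."""
    # each step multiplies the running value by x and adds the next coefficient
    return reduce(lambda acc, c: acc * x + c, coeffs, 0.0)
```

Horner's method needs one multiply and one add per coefficient, which keeps the number of ALU ops linear in the polynomial degree.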