chenyu
cb5702f170
tiny cleanup to transcendental xexp2 ( #7326 )
...
also added test for exp and log of nan and inf
2024-10-27 21:54:20 -04:00
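The nan/inf edge cases such a test exercises follow IEEE-754 semantics. A minimal sketch of the expected behavior using Python's `math` module (illustrative only, not the tinygrad test itself):

```python
import math

# IEEE-754 special-value behavior that exp/log implementations should honor
inf, nan = float("inf"), float("nan")

assert math.exp(inf) == inf       # exp(+inf) -> +inf
assert math.exp(-inf) == 0.0      # exp(-inf) -> 0
assert math.isnan(math.exp(nan))  # exp(nan)  -> nan
assert math.log(inf) == inf       # log(+inf) -> +inf
assert math.isnan(math.log(nan))  # log(nan)  -> nan
```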
chenyu
4c855ae692
unit test transcendental helpers ( #7325 )
...
added a test to run UOps with const inputs. seems to have an issue with both payne_hanek_reduction and cody_waite_reduction
2024-10-27 19:55:00 -04:00
qazal
8d9459f281
always run process replay with contextvars ( #7323 )
...
* always run process replay with contextvars [pr]
* not the last two
* extra
* no pr
2024-10-27 20:44:42 +02:00
qazal
adcdaa17bb
map BUFFER to Metadata [pr] ( #7324 )
2024-10-27 20:31:04 +02:00
qazal
d634261c51
late buffer uops [pr] ( #7322 )
2024-10-27 19:34:01 +02:00
chenyu
cdbe08b94b
use UOp.render in colored_shape ( #7321 )
...
similar to the function name, print the rendered str instead of the raw UOp
2024-10-27 11:42:31 -04:00
chenyu
4a03e00aa1
fix llama3 download_model assert ( #7320 )
...
false positive when neither download_model nor model is provided
2024-10-27 11:20:24 -04:00
talati
d4d201d87b
fixing branch condition on UOps.IF in the ptx renderer ( #7315 )
...
* fixing branch condition on UOps.IF in the ptx renderer
* ptx works
---------
Co-authored-by: Nick Talati <nick.talati@quantworks.com>
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
Co-authored-by: qazal <qazal.software@gmail.com>
2024-10-27 14:27:38 +02:00
qazal
a410b46c1d
unskip test_gated_store_with_if [pr] ( #7319 )
2024-10-27 14:03:12 +02:00
Maximilian Wolf
3c992250d5
Failing test: different behavior on different devices ( #7193 )
...
* add minimal failing test
* more tiny makes linter happy
* tinyfy
* no walrus in assert
* a tiny bit simpler
* minimal
* better place, better name, expected failure
* skip devices with correct behavior
2024-10-27 09:53:58 +08:00
eliotgolding
e920f1d663
Llama 3.2 1B load from GGUF ( #7295 )
...
* gguf 1b-instruct
* not needed
2024-10-27 09:29:02 +08:00
chenyu
d66fe7a66f
fix simplify_valid ( #7313 )
...
the simplex should compare with valid bound, not its vmin
2024-10-26 14:21:12 -04:00
chenyu
0a4d01f6d4
disable simplify_valid ( #7312 )
...
this fixed test_failure_55. will re-enable it later after fixing the bug
2024-10-26 12:42:48 -04:00
nimlgen
293714610a
capture beam log runtime errors ( #7311 )
2024-10-26 13:59:45 +03:00
nimlgen
3c62315aa8
add resnet pf ( #7310 )
...
* add resnet pf
* all platforms
2024-10-26 13:20:32 +03:00
nimlgen
68cd2c0669
nv correct local memory based on device ( #7307 )
...
* nv correct local memory based on device
* linter
* oops
* oops2
2024-10-25 22:23:42 +03:00
chenyu
2ddfb9678a
update exponent_bias in transcendental ( #7304 )
...
from https://en.wikipedia.org/wiki/Exponent_bias : the biases are 15, 127, and 1023 (fp16, fp32, fp64)
2024-10-25 10:45:49 -04:00
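The three values in the message are the standard IEEE-754 exponent biases, which follow from the exponent field width. A quick check of the generic formula (not tinygrad's code):

```python
# IEEE-754 exponent bias for a format with e exponent bits: 2**(e-1) - 1
def exponent_bias(exponent_bits: int) -> int:
    return 2 ** (exponent_bits - 1) - 1

assert exponent_bias(5) == 15     # float16 has 5 exponent bits
assert exponent_bias(8) == 127    # float32 has 8 exponent bits
assert exponent_bias(11) == 1023  # float64 has 11 exponent bits
```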
chenyu
e7cd21c5e3
remove custom render in test_simplify_valid_idx ( #7303 )
...
use UOp render to compare
2024-10-25 10:20:26 -04:00
chenyu
4688c01e3e
transcendental cleanups ( #7301 )
...
simplified polyN and some redundant line cleanups
2024-10-25 09:30:25 -04:00
George Hotz
aadf688aeb
order flipper as *normal* rewrite rule ( #7300 )
...
* instant isn't actually used [pr]
* order flipper as *normal* rewrite rule
* fix inf loop
* need simplify now
2024-10-25 21:28:30 +08:00
George Hotz
3c31497f55
instant isn't actually used [pr] ( #7299 )
...
* instant isn't actually used [pr]
* tolerance bump
2024-10-25 21:01:29 +08:00
George Hotz
199a991237
line reduction [pr] ( #7296 )
2024-10-25 17:05:09 +07:00
George Hotz
4fed358511
hotfix: timeouts to 20 minutes. better no stats update than a red x
2024-10-25 16:31:52 +08:00
George Hotz
dc3148c677
hotfix: minor speed increase + stable diffusion relax
2024-10-25 16:27:21 +08:00
George Hotz
4812801aa6
try for canonical order ( #7286 )
...
* try for canonical order
* cmp better
* disable bad tests
* flip const order
* fix test
* fix tests
* different fix for NOOP
* metaclass here
* fix tests
* narrower scope
2024-10-25 16:04:54 +08:00
George Hotz
d3500af71b
move consts last in uop toposort ( #7290 )
...
* move consts last in uop toposort
* consts first in toposort
2024-10-25 14:58:48 +08:00
qazal
e3c9c94896
Revert "move anything that isn't bfs [pr] ( #7273 )" ( #7289 )
...
This reverts commit b805711f86.
2024-10-25 14:38:30 +08:00
qazal
0b47eca085
schedule.py reorders [pr] ( #7285 )
...
* schedule.py reorders [pr]
* diff
* more renames
2024-10-25 14:30:23 +08:00
George Hotz
004af512e6
try all matches in the function ( #7288 )
2024-10-25 14:17:04 +08:00
George Hotz
bcf0537653
canonicalize the order prereqs ( #7283 )
...
* canonicalize the order
* don't change that yet
* that order isn't safe with uops
2024-10-25 11:37:51 +08:00
qazal
603d637105
split to fuse.py and schedule.py [pr] ( #7284 )
2024-10-25 06:17:24 +03:00
qazal
698457c5ce
big graph ScheduleContext [pr] ( #7282 )
2024-10-25 05:58:23 +03:00
qazal
b805711f86
move anything that isn't bfs [pr] ( #7273 )
2024-10-25 05:34:21 +03:00
George Hotz
6dc7d3c949
instant uop rules [pr] ( #7263 )
...
* instant uop rules [pr]
* real instant
* only instant folder
* better diff
* instant means instant
* Revert "instant means instant"
This reverts commit e58d9161bf.
2024-10-25 10:32:45 +08:00
chenyu
90f720d703
limit idiv by neg bound to only if s0 is non-negative [pr] ( #7277 )
...
also updated the tests when div by negative const
2024-10-24 15:46:50 -04:00
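The caveat with a negative divisor is that integer division becomes decreasing in the numerator, so the min/max bounds of the quotient swap. A sketch using Python's floor division (whether this matches tinygrad's idiv semantics exactly is an assumption):

```python
# With a negative divisor c, floor division is decreasing in x, so for a
# non-negative x in [lo, hi] the quotient lies in [hi // c, lo // c].
lo, hi, c = 0, 10, -3
quotients = [x // c for x in range(lo, hi + 1)]
assert min(quotients) == hi // c  # -4, from the largest numerator
assert max(quotients) == lo // c  # 0, from the smallest numerator
```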
chenyu
d4c94d0d32
disable llama 1 4gpu and 6gpu benchmark ( #7276 )
...
having llama3 4gpu and 6gpu should be good enough
2024-10-24 14:19:22 -04:00
chenyu
e6929f2402
RUN_PROCESS_REPLAY=0 on llama 70B and resnet training ( #7272 )
...
* RUN_PROCESS_REPLAY=0 on llama 70B and resnet training
also added a 15-minute total timeout; this cannot grow indefinitely
* add a few more
* a few more just for NV
2024-10-24 12:09:54 -04:00
chenyu
b777cfdcba
update test_max_simplify_and_cancel ( #7270 )
...
it's fixed and no longer dumb
2024-10-24 10:29:05 -04:00
qazal
d3953c6c55
split the preload path [pr] ( #7271 )
2024-10-24 17:28:03 +03:00
chenyu
acad11ea8e
minor cleanup to View add [pr] ( #7247 )
2024-10-24 09:18:47 -04:00
nimlgen
98f8d0ccf9
nv limit max local memory with envvar ( #7265 )
2024-10-24 16:01:50 +03:00
qazal
f20b651ee0
prescheduling refactor from big graph [pr] ( #7268 )
...
* prescheduling refactor from big graph [pr]
* finally replay
2024-10-24 14:55:07 +03:00
George Hotz
2e6ec43c49
hotfix: become the latest for process replay
2024-10-24 19:02:59 +08:00
George Hotz
9a3d498d9c
with commutative hack, uops can change. fix that ( #7266 )
...
* with commutative hack, uops can change. fix that
* simpler
2024-10-24 18:50:23 +08:00
qazal
d482d927a8
hotfix: nobody uses [run_process_replay] [pr] ( #7264 )
2024-10-24 13:37:29 +03:00
qazal
fa5dc7857a
assign toposort with big graph, bfs [pr] ( #7242 )
...
* assign toposort with big graph, bfs [pr]
* cycle
* merge 2
* filter bufs
* delete inputs
2024-10-24 13:09:01 +03:00
George Hotz
4d081eb560
double mod is single mod ( #7262 )
...
* double mod is single mod
* unused name
2024-10-24 18:02:51 +08:00
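The rule folds a repeated modulus: `(x % c) % c == x % c`, since the first mod already lands the value in range. A brute-force check of the identity (illustrative, not the tinygrad rewrite rule itself):

```python
# (x % c) % c == x % c: the second mod is a no-op because x % c is
# already in [0, c) for a positive modulus c (Python semantics).
for x in range(-20, 20):
    for c in (1, 3, 7):
        assert (x % c) % c == x % c
```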
George Hotz
e4631a47f4
symbolic arange support ( #7252 )
...
* symbolic arange support WIP [pr]
* smin/smax from old try
* pad2d symbolic works
* real test
* sym arange
* symbolic arange test passes
* double mod is single mod
* lol that's not right
* more tests
* Update ops.py
2024-10-24 17:55:53 +08:00
George Hotz
23b26d40d5
small max rules + windows VIZ ( #7261 )
...
* a rule from smax
* that rule too
* fix VIZ on windows
2024-10-24 17:43:39 +08:00
George Hotz
415186da3c
Revert "some rules to simplify max ( #7258 )" ( #7260 )
...
This reverts commit b56fab54ea.
2024-10-24 17:15:52 +08:00