George Hotz
7f46bfa587
hotfix: docs touchup
2024-06-28 14:36:20 -07:00
nimlgen
c941a58581
amd refactor queue creation ( #5216 )
...
* amd refactor queue creation
* fixes
* use data64_le
* fix linter
2024-06-28 23:24:49 +03:00
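An aside on the `use data64_le` bullet: judging by the name, the helper splits a 64-bit value into (low, high) 32-bit words in the little-endian order AMD queue packets expect. A minimal sketch under that assumption, not the verified tinygrad source:

    # Assumed data64_le-style helper: split a 64-bit value into
    # (low, high) 32-bit words for little-endian packet fields.
    def data64_le(data: int) -> tuple[int, int]:
      return (data & 0xFFFFFFFF, (data >> 32) & 0xFFFFFFFF)

    lo, hi = data64_le(0x1122334455667788)
    assert (lo, hi) == (0x55667788, 0x11223344)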
chenyu
7ba4938510
simplify View.permute arg check [run_process_replay] ( #5218 )
...
It checks that `axis` is a valid permutation, which is equivalent to `sorted(axis) == list(range(len(self.shape)))` (see the sketch below).
2024-06-28 16:18:46 -04:00
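A standalone sketch of that check, illustrative rather than the actual View.permute code:

    # A valid permutation of n axes is exactly a reordering of 0..n-1,
    # which the sorted-equals-range comparison captures in one line.
    def is_valid_permutation(axis: tuple[int, ...], ndim: int) -> bool:
      return sorted(axis) == list(range(ndim))

    assert is_valid_permutation((2, 0, 1), 3)
    assert not is_valid_permutation((0, 0, 1), 3)  # duplicate axis
    assert not is_valid_permutation((0, 1), 3)     # missing axis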
George Hotz
80ac21200b
hotfix: linearizer test fixup
2024-06-28 10:52:25 -07:00
George Hotz
c9714dfcf4
rename graph to children [run_process_replay] ( #5215 )
2024-06-28 09:53:52 -07:00
kormann
6c456b6d66
remove uopgraph dedup + slight speedup ( #5199 )
...
* rm dedup
* rm dedup
* tests
* reduce diff
* oops
* reduce diff
* rm UOp.tuple
2024-06-28 09:26:32 -07:00
nimlgen
9b08a9397c
amd inline bf16 funcs ( #5212 )
2024-06-28 18:45:00 +03:00
chenyu
7090eac8cb
validate sdxl output and put it in benchmark ( #5211 )
...
* validate sdxl output and put it in benchmark
* don't print fetch progress_bar in CI
2024-06-28 11:40:52 -04:00
chenyu
63fa4e2a0e
fix seed = 0 in sdxl ( #5209 )
...
also removed a few unneeded realize and contiguous calls
2024-06-28 08:48:59 -04:00
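The usual culprit in a `seed = 0` bug is a truthiness fallback: `seed or default` silently discards 0 because it is falsy. A hedged sketch of the pattern, not the actual sdxl diff:

    import time

    # Buggy: `or` treats seed=0 as "no seed given" and substitutes the fallback.
    def get_seed_buggy(seed=None): return seed or int(time.time())

    # Fixed: only fall back when the caller passed nothing at all.
    def get_seed_fixed(seed=None): return seed if seed is not None else int(time.time())

    assert get_seed_fixed(0) == 0  # seed=0 is now respected
    # get_seed_buggy(0) would return the current time instead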
Tobias Fischer
4688f97d48
Add SDXL Inference to Examples ( #5206 )
...
* added sdxl inference code
* fixed trailing whitespace
* use original impl code, removed unneeded numpy calls
2024-06-28 07:42:28 -04:00
qazal
3e56c8422c
remu err handling ( #5208 )
...
* add error handling
* use pre-release
* minor
* works
2024-06-28 13:15:18 +03:00
nimlgen
7f7fa26e03
allow hugepage failure in memadvise ( #5207 )
2024-06-28 11:41:10 +03:00
chenyu
73395b998b
better error msg for TinyJit inside TinyJit ( #5202 )
...
It's possible to support TinyJit inside TinyJit, but there are edge cases, like two TinyJit functions sharing another TinyJit function, so just give a more precise error for now (a guard of this shape is sketched below).
2024-06-27 18:09:19 -04:00
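A minimal sketch of such a guard, with hypothetical names rather than the tinygrad implementation: a module-level flag marks that capture is in progress, so re-entry fails with a precise message instead of producing a confusing one later.

    _capturing = False  # set while a jitted function is being captured

    class TinyJitSketch:
      def __init__(self, fn): self.fn = fn
      def __call__(self, *args, **kwargs):
        global _capturing
        if _capturing: raise RuntimeError("TinyJit inside TinyJit is not supported")
        _capturing = True
        try: return self.fn(*args, **kwargs)
        finally: _capturing = False

    @TinyJitSketch
    def outer(): return inner()
    @TinyJitSketch
    def inner(): return 1

    try: outer()
    except RuntimeError as e: print(e)  # precise error, not a miscompile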
nimlgen
ac748cccdb
nv apply relocs ( #5165 )
...
* nv do reloc
* a bit cleaner
2024-06-27 23:54:16 +03:00
Roelof van Dijk
540ebdf47c
missing init files ( #5196 )
2024-06-27 15:30:02 -04:00
chenyu
d8dc43ad06
remove JIT_BATCH_SIZE=4 from gpt2 NV benchmark ( #5198 )
...
this no longer helps
2024-06-27 15:20:34 -04:00
George Hotz
345bcc2099
move graph_dedup out of class [run_process_replay] ( #5197 )
2024-06-27 12:04:00 -07:00
George Hotz
d094a6828f
single pass rewrite ( #5159 )
...
* single pass rewrite
* claude cleanups
* claude cleanups
* skip those tests
* restrict that to ints
* comment
* asserts I don't expect to fail do fail
* simplest...rewrite...ever
* simplest...rewrite...ever
* add that rule back
* tests pass?
* only collapse reduce loops
* second SHL/SHR arg must be 4 bytes
* fix verify
* no SHL/SHR in ptx
* put that back
* skip them in PTX...bad tests
2024-06-27 11:36:05 -07:00
Roelof van Dijk
1ff9bbaa61
ruff: close file handle ( #5180 )
...
* close file handle
* some more open file handles
* must stay open
* remove this close, stays open
2024-06-27 11:29:47 -07:00
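The rule (ruff's SIM115 family) flags `open()` calls whose handles are never closed; the fix is a with-block, except where a handle legitimately has to outlive the call, as the last two bullets note. For example:

    import os, tempfile

    path = os.path.join(tempfile.mkdtemp(), "config.txt")
    with open(path, "w") as f: f.write("hello")

    # Flagged: the handle is only closed whenever the GC gets to it.
    data = open(path).read()

    # Fixed: the with-block closes the file deterministically, even on error.
    with open(path) as f:
      data = f.read()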
chenyu
83da8b3558
use NV instead of CUDA in benchmark ( #5192 )
...
also re-enabled mixtral on green
2024-06-27 13:52:58 -04:00
chenyu
0c6c7c5f7b
CACHELEVEL=0 -> IGNORE_BEAM_CACHE=1 in benchmark ( #5191 )
...
Ignoring the beam cache while keeping the compile cache should be fine, and it saves some benchmark time.
Also updated `beam_search` to check the flag value before accessing the diskcache (sketched below).
2024-06-27 13:15:18 -04:00
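A sketch of that gating with stand-in cache accessors; the names are illustrative, not the actual beam_search code:

    import os

    _cache: dict = {}  # stand-in for the on-disk cache
    def diskcache_get(table, key): return _cache.get((table, key))
    def diskcache_put(table, key, val): _cache[(table, key)] = val

    def beam_search_cached(key, search_fn):
      # IGNORE_BEAM_CACHE=1 skips only the beam results; compiled kernels
      # still hit the compile cache, so just the search itself is redone.
      ignore = int(os.getenv("IGNORE_BEAM_CACHE", "0"))
      if not ignore:
        hit = diskcache_get("beam", key)
        if hit is not None: return hit
      result = search_fn()
      if not ignore: diskcache_put("beam", key, result)
      return result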
chenyu
c12de4f47d
benchmark use JITBEAM for llama and gpt2 ( #5189 )
2024-06-27 12:56:02 -04:00
chenyu
ad91962dcf
CACHECOLLECTING -> CAPTURING and don't capture clear_l2 ( #5190 )
...
fixed first-time BEAM slowness
2024-06-27 12:32:28 -04:00
Roelof van Dijk
01e8838b65
ruff: suppressible-exception ( #5182 )
...
* fix: use contextlib to suppress errors
* enable rule SIM105
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-06-27 08:23:44 -07:00
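SIM105 rewrites a try/except/pass that ignores one exception into contextlib.suppress, which states the intent in a single line:

    import contextlib, os

    # Before: the intent (ignore a missing file) is buried in boilerplate.
    try:
      os.remove("stale.lock")
    except FileNotFoundError:
      pass

    # After: same behavior, one line of intent.
    with contextlib.suppress(FileNotFoundError):
      os.remove("stale.lock")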
Roelof van Dijk
9704c7d4d4
ruff rule if-exp-instead-of-or-operator (FURB110) ( #5178 )
...
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-06-27 08:22:19 -07:00
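FURB110 flags a conditional expression that falls back on its own test value; the `or` operator says the same thing more directly:

    def pick_branch(name: str, default: str = "main") -> str:
      # flagged form: `name if name else default`
      return name or default

    assert pick_branch("dev") == "dev"
    assert pick_branch("") == "main"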
chenyu
5b8fda3c65
fix: JIT=0 means no JIT ( #5188 )
2024-06-27 10:31:37 -04:00
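The pitfall behind a fix like this is usually a gate that conflates "explicitly disabled" with a default. A sketch of the intended behavior, not the tinygrad diff:

    import os

    def jit_enabled() -> bool:
      # JIT defaults to on, but an explicit JIT=0 must really disable it.
      return int(os.getenv("JIT", "1")) != 0

    os.environ["JIT"] = "0"
    assert not jit_enabled()  # JIT=0 means no JIT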
qazal
3af17849bf
safely parse quoted titles [run_process_replay] ( #5183 )
2024-06-27 16:39:48 +03:00
Roelof van Dijk
975b811ad9
names shadowing builtins ( #5179 )
...
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-06-27 08:15:01 -04:00
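Renaming names that shadow builtins (ruff's flake8-builtins checks) keeps the originals reachable in the same scope; for instance:

    # `input` would shadow the builtin for the whole function body; a
    # trailing underscore is the conventional fix.
    def scale(input_: list[float], factor: float) -> list[float]:  # was: input
      return [x * factor for x in input_]

    assert scale([1.0, 2.0], 3.0) == [3.0, 6.0]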
Roelof van Dijk
26e254c42b
ruff: else-raise and else-return ( #5175 )
...
* ruff: enable else-raise and else-return
* ruff: add error names
* fix order
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-06-27 07:54:59 -04:00
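Both rules (RET505/RET506 in ruff) drop an `else` that follows a `return` or `raise`, since that branch can never fall through:

    def validate(x: int) -> int:
      if x < 0:
        raise ValueError("negative")
      return x  # was nested under `else:`

    assert validate(3) == 3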
Roelof van Dijk
f88f71d73a
ruff: unnecessary-comprehension ( #5174 )
...
* enable ruff C416 unnecessary-comprehension
* already a list
2024-06-27 07:45:29 -04:00
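C416 flags a comprehension that only copies its input; the constructor is clearer, and as the second bullet notes, sometimes the value is already a list and needs no wrapping at all:

    pairs = [(1, "a"), (2, "b")]
    copied = list(pairs)   # instead of [p for p in pairs]
    keyed = dict(pairs)    # instead of {k: v for k, v in pairs}
    assert copied == pairs and keyed == {1: "a", 2: "b"}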
reddyn12
f1c7944c44
Fix batchnorm shapes for resnet.load_pretrained ( #5167 )
...
* Fix batchnorm shapes
* make it a general reshape
2024-06-26 18:44:10 -04:00
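A hedged sketch of the "general reshape" idea: when a loaded tensor holds the right number of elements but the wrong shape (batchnorm stats saved as (C,) where the model expects, say, (1, C, 1, 1)), reshape to the target instead of special-casing batchnorm. Illustrative, not the resnet.load_pretrained code:

    import numpy as np

    def load_param(target_shape: tuple[int, ...], loaded: np.ndarray) -> np.ndarray:
      # Reshape whenever the element counts match; error out otherwise.
      if loaded.shape != target_shape:
        assert loaded.size == np.prod(target_shape), "incompatible parameter"
        loaded = loaded.reshape(target_shape)
      return loaded

    w = load_param((1, 64, 1, 1), np.ones(64))
    assert w.shape == (1, 64, 1, 1)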
George Hotz
396ce6cfc9
clean up graph dedup function [run_process_replay] ( #5169 )
2024-06-26 15:07:34 -07:00
kormann
3a04e518ec
print_tree UPat +fix ( #5132 )
...
* fix and extend print_tree
* typing
* typing
* fix upat
* fix none
* ws
* rm prefix
* mv luop dag
* typo
* test print_tree
2024-06-26 15:02:19 -07:00
chenyu
0ba093dea0
hotfix: only validate stable diffusion when using threefry ( #5166 )
2024-06-26 16:50:38 -04:00
chenyu
e4a5870b36
validate stable_diffusion output ( #5163 )
...
changed default steps but forgot to update the validation
2024-06-26 16:42:21 -04:00
nimlgen
21b225ac45
llama3 download works ( #5160 )
2024-06-26 22:45:13 +03:00
wozeparrot
c91b3c4079
shard llama3 on 0 sometimes ( #5157 )
2024-06-26 11:50:57 -07:00
Roelof van Dijk
294bd1a9ff
refactor: name check [run_process_replay] ( #5158 )
2024-06-26 11:39:41 -07:00
Roelof van Dijk
2c80583e14
perf: cache const UOp creation [run_process_replay] ( #5156 )
2024-06-26 11:13:14 -07:00
George Hotz
eda2824cd8
freeze uop [run_process_replay] ( #5155 )
2024-06-26 10:18:15 -07:00
Elias Wahl
e267f3161d
Add MLLogger ( #5125 )
...
* add MLPerf logger
* eval steps
* start with step 1
* compliance for 3.1.0 and 4.0.0
* more compliance
* assert, comment and contiguous
2024-06-26 12:23:56 -04:00
nimlgen
16405b973a
fix hcq sync ( #5062 )
...
* fix hcq sync
* rewrite
* linter + comment
* fix profiler
* no default dict
* correct sync of unjitted transfer
* fix test
2024-06-26 17:50:37 +03:00
David Hou
3604642847
Llama shard axis 0 sometimes ( #5123 )
...
* make buffer view optional with a flag [run_process_replay]
* do not view when sharding to save memory [run_process_replay]
* llama shard axis=0 sometimes
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-06-26 10:35:25 -04:00
nimlgen
fd27f19e92
graph tests ( #5153 )
...
* graph tests
* add test
* cleanup
2024-06-26 16:31:20 +03:00
George Hotz
7b709c3ccd
switch tensorcoreoptions to tuple [run_process_replay] ( #5143 )
...
* switch tensorcoreoptions to tuple [run_process_replay]
* localbuffer can stay namedtuple for now
* freeze LocalBuffer
* remove NamedTuple
---------
Co-authored-by: qazal <qazal.software@gmail.com>
2024-06-26 14:12:53 +03:00
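The pattern the bullets describe: replace list fields and NamedTuple with a frozen dataclass holding tuples, so the options are immutable and hashable, which pickling and dedup can rely on. Field names below are illustrative, not the real TensorCoreOptions:

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class TensorCoreOptionsSketch:
      axes: tuple[int, ...]          # was a list in the NamedTuple version
      axes_exist: tuple[bool, ...]

    opts = TensorCoreOptionsSketch(axes=(0, 1), axes_exist=(True, True))
    assert opts == TensorCoreOptionsSketch((0, 1), (True, True))
    hash(opts)  # frozen + tuple fields => hashable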
qazal
6ca7b13ed1
limit pickled objects [run_process_replay] ( #5154 )
...
* limit pickled objects
* delete uop from the list
* debug metal
* need self.opts for TC
* dont need device
* [run_process_replay]
* minor
2024-06-26 13:51:32 +03:00
George Hotz
ee4f080a14
rewrite div const [run_process_replay] [no_assert] ( #5151 )
...
* rewrite div const [run_process_replay] [no_assert]
* Update uops.py
2024-06-25 20:23:14 -07:00
David Hou
666a9c1448
don't view origin buffer when sharding ( #5122 )
...
* make buffer view optional with a flag
* do not view when sharding to save memory
2024-06-25 20:19:09 -07:00
George Hotz
89e106686a
simpler unmatch [run_process_replay] ( #5149 )
2024-06-25 19:57:40 -07:00
George Hotz
c98ca23cb9
test pickle variable ( #5150 )
...
* test pickle variable
* fix process replay
2024-06-25 19:49:21 -07:00