hikettei
ad1ca7da64
[Feature] Added BinaryOps.AND/BinaryOps.OR ( #5223 )
...
* [Feature] Added BinaryOps.AND/BinaryOps.OR
* Add: __rand__, __ror__
2024-06-29 17:20:25 -07:00
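The `__rand__` and `__ror__` methods mentioned in the commit above are Python's reflected operators: they run when the left operand does not know how to combine itself with the right one. A minimal sketch of that mechanism, using a hypothetical `Flag` class that is not tinygrad's Tensor or its BinaryOps implementation:

```python
class Flag:
    """Hypothetical boolean wrapper, used only to illustrate reflected operators."""

    def __init__(self, value):
        self.value = bool(value)

    @staticmethod
    def _unwrap(x):
        # Accept either another Flag or a plain Python value.
        return x.value if isinstance(x, Flag) else bool(x)

    def __and__(self, other):
        return Flag(self.value and Flag._unwrap(other))

    def __or__(self, other):
        return Flag(self.value or Flag._unwrap(other))

    # Reflected versions: Python falls back to these for `True & Flag(...)`,
    # because bool.__and__ returns NotImplemented for a Flag operand.
    def __rand__(self, other):
        return self.__and__(other)  # AND is commutative

    def __ror__(self, other):
        return self.__or__(other)  # OR is commutative


left_is_plain_bool = True & Flag(False)   # dispatches to Flag.__rand__
also_reflected = False | Flag(True)       # dispatches to Flag.__ror__
```

Without the reflected methods, both expressions at the bottom would raise `TypeError`.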
chenyu
50b05dd3f4
tqdm minor cleanup ( #5229 )
...
combined some if branches
2024-06-29 18:58:24 -04:00
chenyu
b2ea610df8
fix tqdm unit_scale and support hours in time ( #5227 )
...
* fix tqdm unit_scale and support hours in time
previously it only supported MM:SS.
more chars to unitscales, strip trailing "." and " " in formatting, and more tests
* simpler
2024-06-29 14:48:51 -04:00
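Extending an MM:SS display to cover hours, as the commit above describes, can be sketched roughly as follows. The function name `format_elapsed` and the exact output layout are assumptions for illustration and are not taken from tinygrad's tqdm code:

```python
def format_elapsed(seconds: float) -> str:
    """Format a duration as MM:SS, or H:MM:SS once it reaches an hour.

    Hypothetical helper; the layout is an assumption, not tinygrad's format.
    """
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes:02d}:{secs:02d}"
```

For example, `format_elapsed(75)` gives `"01:15"` and `format_elapsed(3725)` gives `"1:02:05"`.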
qazal
f374fb77af
assert bool dtype for valid [run_process_replay] ( #5214 )
...
* valid is always bool
* prevent NumNode to begin with
* part 2
* test: disable pattern matchers, asserts should pass
* test: store without cast
* test: if (0)
* cleanup time
* only pattern match bool literal
* better for upstream debug
2024-06-29 21:20:32 +03:00
qazal
3f4eeb8b54
late UOps.IF generation [run_process_replay] [no_assert] ( #5027 )
...
* find all places
* test gates
* test
* gate based on depths
* add ctx
* that cache was so wrong
* delete useless things
* dont double write if
* self.if_cond
* move UOps.IF to gated store
* test_padto_where_multioutput
* test_padto_group
* minor cleanup
* hmm this actually works?
* need a good barrier
* merge 2
* delete ctx
* p1
* maybe p2
* p3
* minor fixup
* fixup 2
* smart thing from the Lowerer branch
* refactoring
* refactoring 2
* maybe before graph_rewrite
* slightly more acceptable Linearizer diff
* more correct
* [run_process_replay] [no_assert]
2024-06-29 12:22:14 -04:00
chenyu
42d1f92fc1
simpler tqdm ( #5221 )
...
can do more, but many cases are not tested
2024-06-29 07:41:46 -04:00
nimlgen
dd7eef7d71
libc defs to autogen ( #5217 )
...
* libc defs to autogen
* amd import libc
* linter
* better a bit
* remove comment, check this
* not hardcoded path
2024-06-29 14:37:33 +03:00
nimlgen
6b08cb5e38
ptx runs on nv in benchmarks ( #5224 )
2024-06-29 11:06:44 +03:00
nimlgen
b4c49ae3fa
remove cudacpu in favour of mockgpu ( #5225 )
...
* remove cudacpu in favour of mockgpu
* remove unused import
* not used as well
2024-06-29 11:05:16 +03:00
nimlgen
ee02dcb98e
nv supports PTX=1 ( #5222 )
...
* nv supports PTX=1
* not needed
* split nv compiler into nvrtc autogen
* remove to_c_array
* test
* Revert "test"
This reverts commit f0b56f308b.
2024-06-29 10:46:29 +03:00
wozeparrot
7bcb74ab23
feat: tag 0.9.1 ( #5220 )
v0.9.1
2024-06-28 20:16:14 -07:00
George Hotz
7f46bfa587
hotfix: docs touchup
2024-06-28 14:36:20 -07:00
nimlgen
c941a58581
amd refactor queue creation ( #5216 )
...
* amd refactor queue creation
* fixes
* use data64_le
* fix linter
2024-06-28 23:24:49 +03:00
chenyu
7ba4938510
simplify View.permute arg check [run_process_replay] ( #5218 )
...
it checks if `axis` is a valid permutation, which is the same as `sorted(axis) == list(range(len(self.shape)))`
2024-06-28 16:18:46 -04:00
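The permutation check quoted in the commit above can be tried in isolation. This is a standalone sketch: `is_valid_permutation` and its `ndim` parameter are hypothetical names standing in for the check inside `View.permute`, where `ndim` would be `len(self.shape)`:

```python
def is_valid_permutation(axis, ndim):
    """Return True if `axis` contains each of 0..ndim-1 exactly once.

    Hypothetical standalone version of the check quoted in the commit message.
    """
    return sorted(axis) == list(range(ndim))
```

The single comparison rejects duplicates, out-of-range indices, and wrong lengths at once, since any of those makes the sorted list differ from `[0, 1, ..., ndim-1]`.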
George Hotz
80ac21200b
hotfix: linearizer test fixup
2024-06-28 10:52:25 -07:00
George Hotz
c9714dfcf4
rename graph to children [run_process_replay] ( #5215 )
2024-06-28 09:53:52 -07:00
kormann
6c456b6d66
remove uopgraph dedup + slight speedup ( #5199 )
...
* rm dedup
* rm dedup
* tests
* reduce diff
* oups
* reduce diff
* rm UOp.tuple
2024-06-28 09:26:32 -07:00
nimlgen
9b08a9397c
amd inline bf16 funcs ( #5212 )
2024-06-28 18:45:00 +03:00
chenyu
7090eac8cb
validate sdxl output and put it in benchmark ( #5211 )
...
* validate sdxl output and put it in benchmark
* don't print fetch progress_bar in CI
2024-06-28 11:40:52 -04:00
chenyu
63fa4e2a0e
fix seed = 0 in sdxl ( #5209 )
...
removed a few unneeded realize and contiguous too
2024-06-28 08:48:59 -04:00
Tobias Fischer
4688f97d48
Add SDXL Inference to Examples ( #5206 )
...
* added sdxl inference code
* fixed trailing whitespace
* use original impl code, removed unneeded numpy calls
2024-06-28 07:42:28 -04:00
qazal
3e56c8422c
remu err handling ( #5208 )
...
* add error handling
* use pre release
* minor
* works
2024-06-28 13:15:18 +03:00
nimlgen
7f7fa26e03
allow hugepage failure in memadvise ( #5207 )
2024-06-28 11:41:10 +03:00
chenyu
73395b998b
better error msg for TinyJit inside TinyJit ( #5202 )
...
it's possible to support TinyJit inside TinyJit, but there are edge cases, like two TinyJit functions sharing another TinyJit function, so just give a more precise error for now
2024-06-27 18:09:19 -04:00
nimlgen
ac748cccdb
nv apply relocs ( #5165 )
...
* nv do reloc
* a bit cleaner
2024-06-27 23:54:16 +03:00
Roelof van Dijk
540ebdf47c
missing init files ( #5196 )
2024-06-27 15:30:02 -04:00
chenyu
d8dc43ad06
remove JIT_BATCH_SIZE=4 from gpt2 NV benchmark ( #5198 )
...
this no longer helps
2024-06-27 15:20:34 -04:00
George Hotz
345bcc2099
move graph_dedup out of class [run_process_replay] ( #5197 )
2024-06-27 12:04:00 -07:00
George Hotz
d094a6828f
single pass rewrite ( #5159 )
...
* single pass rewrite
* claude cleanups
* claude cleanups
* skip those tests
* restrict that to ints
* comment
* asserts i don't expect to fail do fail
* simplest...rewrite...ever
* simplest...rewrite...ever
* add that rule back
* tests pass?
* only collapse reduce loops
* second SHL/SHR arg must be 4 bytes
* fix verify
* no SHL/SHR in ptx
* put that back
* skip them in PTX...bad tests
2024-06-27 11:36:05 -07:00
Roelof van Dijk
1ff9bbaa61
ruff: close file handle ( #5180 )
...
* close file handle
* some more open file handles
* must stay open
* remove this close, stays open
2024-06-27 11:29:47 -07:00
chenyu
83da8b3558
use NV instead of CUDA in benchmark ( #5192 )
...
also reenabled mixtral on green
2024-06-27 13:52:58 -04:00
chenyu
0c6c7c5f7b
CACHELEVEL=0 -> IGNORE_BEAM_CACHE=1 in benchmark ( #5191 )
...
ignoring beam cache but using compile cache should be fine, saved some benchmark time.
also updated `beam_search` to check flag value before accessing diskcache
2024-06-27 13:15:18 -04:00
chenyu
c12de4f47d
benchmark use JITBEAM for llama and gpt2 ( #5189 )
2024-06-27 12:56:02 -04:00
chenyu
ad91962dcf
CACHECOLLECTING -> CAPTURING and don't capture clear_l2 ( #5190 )
...
fixed first time BEAM slowness
2024-06-27 12:32:28 -04:00
Roelof van Dijk
01e8838b65
ruff: suppressible-exception ( #5182 )
...
* fix: use contextlib to suppress errors
* enable rule SIM105
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-06-27 08:23:44 -07:00
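The SIM105 rule enabled in the commit above replaces a `try`/`except`/`pass` block with `contextlib.suppress`. A minimal sketch of the pattern, where `remove_if_present` is a hypothetical helper and not code from the repository:

```python
import contextlib
import os


def remove_if_present(path):
    """Delete `path`, silently doing nothing if it does not exist.

    Equivalent to:
        try: os.remove(path)
        except FileNotFoundError: pass
    """
    with contextlib.suppress(FileNotFoundError):
        os.remove(path)
```

Only the listed exception types are suppressed; anything else, such as `PermissionError`, still propagates.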
Roelof van Dijk
9704c7d4d4
ruff rule if-exp-instead-of-or-operator (FURB110) ( #5178 )
...
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-06-27 08:22:19 -07:00
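The FURB110 rule named in the commit above flags conditional expressions of the form `x if x else y`, which can be written as `x or y`. A small sketch with hypothetical function names:

```python
def pick_name_verbose(name, default):
    # Form flagged by the rule: the condition repeats the first operand.
    return name if name else default


def pick_name(name, default):
    # Equivalent form: `or` returns `name` when it is truthy, else `default`.
    return name or default
```

Both forms treat every falsy value (empty string, `0`, `None`) as "missing", so the rewrite does not change behavior.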
chenyu
5b8fda3c65
fix: JIT=0 means no JIT ( #5188 )
2024-06-27 10:31:37 -04:00
qazal
3af17849bf
safely parse quoted titles [run_process_replay] ( #5183 )
2024-06-27 16:39:48 +03:00
Roelof van Dijk
975b811ad9
names shadowing builtins ( #5179 )
...
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-06-27 08:15:01 -04:00
Roelof van Dijk
26e254c42b
ruff: else-raise and else-return ( #5175 )
...
* ruff: enable else-raise and else-return
* ruff: add error names
* fix order
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-06-27 07:54:59 -04:00
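The else-return and else-raise rules from the commit above flag an `else` branch that follows a `return` or `raise`, since the branch is reachable only when the `if` did not exit. A sketch with a hypothetical `parse_level` function:

```python
def parse_level_before(text):
    # Flagged form: the `else` is redundant because the `if` branch returns.
    if text.isdigit():
        return int(text)
    else:
        raise ValueError(f"not a non-negative integer: {text!r}")


def parse_level(text):
    # Preferred form: the fallthrough path needs no `else`.
    if text.isdigit():
        return int(text)
    raise ValueError(f"not a non-negative integer: {text!r}")
```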
Roelof van Dijk
f88f71d73a
ruff: unnecessary-comprehension ( #5174 )
...
* enable ruff C416 unnecessary-comprehension
* already a list
2024-06-27 07:45:29 -04:00
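The C416 rule enabled in the commit above flags comprehensions that only copy their input element by element. A minimal sketch with made-up data:

```python
pairs = {"a": 1, "b": 2}

# Flagged form: the comprehension does nothing except rebuild the sequence.
verbose = [item for item in pairs.items()]

# Preferred form: call the constructor directly.
direct = list(pairs.items())
```

When the input is already a list and no copy is needed, the wrapper can be dropped entirely, which is what the "already a list" bullet in the commit refers to.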
reddyn12
f1c7944c44
Fix batchnorm shapes for resnet.load_pretrained ( #5167 )
...
* Fix batchnorm shapes
* make it general reshape
2024-06-26 18:44:10 -04:00
George Hotz
396ce6cfc9
clean up graph dedup function [run_process_replay] ( #5169 )
2024-06-26 15:07:34 -07:00
kormann
3a04e518ec
print_tree UPat +fix ( #5132 )
...
* fix and extend print_tree
* typing
* typing
* fix upat
* fix none
* ws
* rm prefix
* mv luop dag
* typo
* test print_tree
2024-06-26 15:02:19 -07:00
chenyu
0ba093dea0
hotfix: only validate stable diffusion when using threefry ( #5166 )
2024-06-26 16:50:38 -04:00
chenyu
e4a5870b36
validate stable_diffusion output ( #5163 )
...
changed default steps, forgot to update validation
2024-06-26 16:42:21 -04:00
nimlgen
21b225ac45
llama3 download works ( #5160 )
2024-06-26 22:45:13 +03:00
wozeparrot
c91b3c4079
shard llama3 on 0 sometimes ( #5157 )
2024-06-26 11:50:57 -07:00
Roelof van Dijk
294bd1a9ff
refactor: name check [run_process_replay] ( #5158 )
2024-06-26 11:39:41 -07:00
Roelof van Dijk
2c80583e14
perf: cache const UOp creation [run_process_replay] ( #5156 )
2024-06-26 11:13:14 -07:00