gip
04ef0fd328
fix: message when applegpu tools missing ( #5236 )
2024-07-03 09:07:09 -07:00
nimlgen
21d41f06a2
nv follows HCQCompatAllocRes protocol ( #5275 )
...
* nv follows HCQCompatAllocRes protocol
* fix amd
2024-07-03 11:34:10 +03:00
Vyacheslav Pachkov
d3e4e21759
add return type for HCQCompatAllocator _alloc ( #5267 )
...
Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
2024-07-03 10:25:44 +03:00
chenyu
b2c3a28a5e
nn.RMSNorm ( #5272 )
...
the norm itself has no significant value to add to Tensor method, but we would want Tensor.normalize
2024-07-02 21:39:01 -04:00
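For readers unfamiliar with the operation named in #5272: RMS normalization, as commonly defined, divides each element by the root-mean-square of its vector and optionally applies a learned per-element weight. The following is a minimal pure-Python sketch of that formula, not tinygrad's `nn.RMSNorm` code; the function name `rms_norm` and the `eps` default are assumptions made for illustration.

```python
import math

def rms_norm(x, weight=None, eps=1e-6):
    """Illustrative only: y_i = x_i / sqrt(mean(x^2) + eps) * weight_i."""
    # mean of squares over the normalized axis (here: the whole list)
    mean_sq = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    # default to an all-ones weight, i.e. plain normalization
    if weight is None:
        weight = [1.0] * len(x)
    return [v * scale * w for v, w in zip(x, weight)]
```

With `eps=0.0` and no weight, the output has a mean square of 1, which is the defining property of the normalization.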
chenyu
ce52b10f6f
add a flag DISABLE_LOOP_COLLAPSE ( #5270 )
...
workaround if the user encounters an UNMUL error
2024-07-02 20:01:11 -04:00
George Hotz
e53b164e1a
small changes from lowerer ( #5266 )
2024-07-02 15:03:54 -07:00
nimlgen
7be776f9af
add _alloc_signal/_free_signal to hcq ( #5264 )
...
* add _alloc_signal/_free_signal api
* oops, revert this
* linter
2024-07-02 23:35:39 +03:00
qazal
59bc837ad1
refactor gated load rendering [run_process_replay] ( #5259 )
...
* refactor gated load rendering [run_process_replay]
* hotfix: extra line
* remove llvm diff
2024-07-02 15:13:10 +03:00
nimlgen
e050603b4b
nv close fds after mapping ( #5246 )
2024-07-02 13:57:46 +03:00
qazal
d3cfb6c2e3
refactor UOps.LOAD barrier [run_process_replay] ( #5258 )
2024-07-02 13:48:47 +03:00
qazal
a1044e6063
iterate over scoped uops once [run_process_replay] ( #5255 )
2024-07-02 09:21:09 +03:00
nimlgen
57e89645cd
hcq spec test ( #5226 )
...
* start hcq spec test
* more test
* fixes
* run on amd as well
* test amdgpu exec
* fix amd
* amd mockgpu support sdma timestamp
2024-07-01 17:36:37 +03:00
Carson Powers
d7839fdc5f
Add x!=0 -> (bool)x pattern [run_process_replay] [no_assert] ( #5237 )
...
* x!=0 -> (bool)x pattern
* bool != bool pattern
* redundant upat
2024-06-30 17:48:45 -07:00
George Hotz
3df47bc21e
OpenELM + repeat_interleave ( #5234 )
...
* start writing openelm
* progress...hit bug
* repeat_interleave support
* gqa
* add rotary embedding
* spp
* i think it runs correctly
* broken
* output is good now
* cleanups
* no io_uring on android
2024-06-30 15:18:39 -07:00
nimlgen
7b7b751513
simple hip backend for debugging ( #5201 )
...
* hip backend
* fix mypy
* shorter
* fixes
* tiny changes
2024-06-30 23:00:11 +03:00
chenyu
649641a2f2
fix tqdm with generator without __len__ ( #5238 )
...
it should be treated as total = 0 (just show iteration count).
also removed duplicated ": " in fetch and fixed unit scale with total = 0
2024-06-30 12:20:59 -04:00
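The behaviour described in #5238 (a generator has no `__len__`, so the total falls back to 0 and only the iteration count is shown) can be sketched as below. This is a hedged illustration, not the code from the commit; the helper name `resolve_total` is hypothetical.

```python
def resolve_total(iterable, total=None):
    """Hypothetical helper: pick the total a progress bar should display."""
    # an explicit total always wins
    if total is not None:
        return total
    try:
        return len(iterable)
    except TypeError:
        # generators and other unsized iterables: total = 0,
        # meaning "show the iteration count only"
        return 0
```

For example, `resolve_total([1, 2, 3])` gives 3, while `resolve_total(x for x in range(3))` gives 0.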
chenyu
fd53b6d901
tqdm supports fractional blocks ( #5233 )
...
enabled the progress bar match in the test; it matches perfectly now
2024-06-29 22:30:18 -04:00
chenyu
ae10ae4722
simplify tqdm scale math ( #5231 )
...
expand the log of log stuff
2024-06-29 21:17:40 -04:00
hikettei
ad1ca7da64
[Feature] Added BinaryOps.AND/BinaryOps.OR ( #5223 )
...
* [Feature] Added BinaryOps.AND/BinaryOps.OR
* Add: __rand__, __ror__
2024-06-29 17:20:25 -07:00
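The `__rand__` / `__ror__` methods mentioned in #5223 are Python's reflected operators: they are called when the left operand does not know how to handle `&` or `|` with the right operand. A minimal sketch of the mechanism, using a hypothetical `Flag` class rather than tinygrad's `Tensor`:

```python
class Flag:
    """Hypothetical wrapper used only to illustrate reflected & and |."""
    def __init__(self, value):
        self.value = bool(value)

    def _unwrap(self, other):
        return other.value if isinstance(other, Flag) else bool(other)

    def __and__(self, other):
        return Flag(self.value and self._unwrap(other))

    def __or__(self, other):
        return Flag(self.value or self._unwrap(other))

    # AND and OR are commutative, so the reflected versions can reuse
    # the forward ones; Python calls them for `True & Flag(...)`.
    __rand__ = __and__
    __ror__ = __or__
```

Without the reflected methods, an expression such as `True & Flag(True)` would raise `TypeError`, because `bool.__and__` returns `NotImplemented` for a `Flag` operand.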
chenyu
50b05dd3f4
tqdm minor cleanup ( #5229 )
...
combined some if branches
2024-06-29 18:58:24 -04:00
chenyu
b2ea610df8
fix tqdm unit_scale and support hours in time ( #5227 )
...
* fix tqdm unit_scale and support hours in time
previously it only supported MM:SS.
more chars to unitscales, strip trailing "." and " " in formatting, and more tests
* simpler
2024-06-29 14:48:51 -04:00
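The time-format change in #5227 (hours in addition to MM:SS) can be illustrated with the sketch below. The helper name `format_elapsed` and the exact output layout are assumptions; the formatting in tinygrad's tqdm may differ.

```python
def format_elapsed(seconds):
    """Hypothetical helper: MM:SS below one hour, H:MM:SS otherwise."""
    seconds = int(seconds)
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours:d}:{minutes:02d}:{secs:02d}"
    return f"{minutes:02d}:{secs:02d}"
```

A formatter limited to MM:SS has no hour field, so long runs need the extra branch shown here.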
qazal
f374fb77af
assert bool dtype for valid [run_process_replay] ( #5214 )
...
* valid is always bool
* prevent NumNode to begin with
* part 2
* test: disable pattern matchers, asserts should pass
* test: store without cast
* test: if (0)
* cleanup time
* only pattern match bool literal
* better for upstream debug
2024-06-29 21:20:32 +03:00
qazal
3f4eeb8b54
late UOps.IF generation [run_process_replay] [no_assert] ( #5027 )
...
* find all places
* test gates
* test
* gate based on depths
* add ctx
* that cache was so wrong
* delete useless things
* dont double write if
* self.if_cond
* move UOps.IF to gated store
* test_padto_where_multioutput
* test_padto_group
* minor cleanup
* hmm this actually works?
* need a good barrier
* merge 2
* delete ctx
* p1
* maybe p2
* p3
* minor fixup
* fixup 2
* smart thing from the Lowerer branch
* refactoring
* refactoring 2
* maybe before graph_rewrite
* slightly more acceptable Linearizer diff
* more correct
* [run_process_replay] [no_assert]
2024-06-29 12:22:14 -04:00
chenyu
42d1f92fc1
simpler tqdm ( #5221 )
...
can do more, but many cases are not tested
2024-06-29 07:41:46 -04:00
nimlgen
dd7eef7d71
libc defs to autogen ( #5217 )
...
* libc defs to autogen
* amd import libc
* linter
* better a bit
* remove comment, check this
* not hardcoded path
2024-06-29 14:37:33 +03:00
nimlgen
b4c49ae3fa
remove cudacpu in favour of mockgpu ( #5225 )
...
* remove cudacpu in favour of mockgpu
* remove unused import
* not used as well
2024-06-29 11:05:16 +03:00
nimlgen
ee02dcb98e
nv supports PTX=1 ( #5222 )
...
* nv supports PTX=1
* not needed
* split nv compiler into nvrtc autogen
* remove to_c_array
* test
* Revert "test"
This reverts commit f0b56f308b.
2024-06-29 10:46:29 +03:00
nimlgen
c941a58581
amd refactor queue creation ( #5216 )
...
* amd refactor queue creation
* fixes
* use data64_le
* fix linter
2024-06-28 23:24:49 +03:00
chenyu
7ba4938510
simplify View.permute arg check [run_process_replay] ( #5218 )
...
it checks if `axis` is a valid permutation, which is the same as `sorted(axis) == list(range(len(self.shape)))`
2024-06-28 16:18:46 -04:00
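The equivalence stated in #5218 can be checked in isolation. The sketch below wraps the quoted expression in a standalone function; the name `is_valid_permutation` and the `ndim` parameter (standing in for `len(self.shape)`) are illustrative and not part of `View`.

```python
def is_valid_permutation(axis, ndim):
    """True if `axis` contains each of 0..ndim-1 exactly once."""
    # sorting a valid permutation always yields 0, 1, ..., ndim-1;
    # duplicates, missing entries, or out-of-range values break equality
    return sorted(axis) == list(range(ndim))
```

For a 3-dimensional shape, `(2, 0, 1)` passes, while `(0, 0, 1)` (duplicate) and `(0, 1)` (too short) fail.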
George Hotz
c9714dfcf4
rename graph to children [run_process_replay] ( #5215 )
2024-06-28 09:53:52 -07:00
kormann
6c456b6d66
remove uopgraph dedup + slight speedup ( #5199 )
...
* rm dedup
* rm dedup
* tests
* reduce diff
* oups
* reduce diff
* rm UOp.tuple
2024-06-28 09:26:32 -07:00
nimlgen
9b08a9397c
amd inline bf16 funcs ( #5212 )
2024-06-28 18:45:00 +03:00
chenyu
7090eac8cb
validate sdxl output and put it in benchmark ( #5211 )
...
* validate sdxl output and put it in benchmark
* don't print fetch progress_bar in CI
2024-06-28 11:40:52 -04:00
nimlgen
7f7fa26e03
allow hugepage failure in memadvise ( #5207 )
2024-06-28 11:41:10 +03:00
chenyu
73395b998b
better error msg for TinyJit inside TinyJit ( #5202 )
...
it's possible to support TinyJit inside TinyJit, but there are edge cases, such as two TinyJit functions sharing another TinyJit function, so just give a more precise error for now
2024-06-27 18:09:19 -04:00
nimlgen
ac748cccdb
nv apply relocs ( #5165 )
...
* nv do reloc
* a bit cleaner
2024-06-27 23:54:16 +03:00
Roelof van Dijk
540ebdf47c
missing init files ( #5196 )
2024-06-27 15:30:02 -04:00
George Hotz
345bcc2099
move graph_dedup out of class [run_process_replay] ( #5197 )
2024-06-27 12:04:00 -07:00
George Hotz
d094a6828f
single pass rewrite ( #5159 )
...
* single pass rewrite
* claude cleanups
* claude cleanups
* skip those tests
* restrict that to ints
* comment
* asserts i don't expect to fail do fail
* simplest...rewrite...ever
* simplest...rewrite...ever
* add that rule back
* tests pass?
* only collapse reduce loops
* second SHL/SHR arg must be 4 bytes
* fix verify
* no SHL/SHR in ptx
* put that back
* skip them in PTX...bad tests
2024-06-27 11:36:05 -07:00
Roelof van Dijk
1ff9bbaa61
ruff: close file handle ( #5180 )
...
* close file handle
* some more open file handles
* must stay open
* remove this close, stays open
2024-06-27 11:29:47 -07:00
chenyu
0c6c7c5f7b
CACHELEVEL=0 -> IGNORE_BEAM_CACHE=1 in benchmark ( #5191 )
...
ignoring beam cache but using compile cache should be fine, saved some benchmark time.
also updated `beam_search` to check flag value before accessing diskcache
2024-06-27 13:15:18 -04:00
chenyu
ad91962dcf
CACHECOLLECTING -> CAPTURING and don't capture clear_l2 ( #5190 )
...
fixed first time BEAM slowness
2024-06-27 12:32:28 -04:00
Roelof van Dijk
01e8838b65
ruff: suppressible-exception ( #5182 )
...
* fix: use contextlib to suppress errors
* enable rule SIM105
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-06-27 08:23:44 -07:00
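The rewrite that ruff rule SIM105 asks for in #5182 replaces a `try` / `except` / `pass` block with `contextlib.suppress`. A minimal sketch of the pattern; the function `remove_if_exists` is a made-up example and not taken from the tinygrad diff:

```python
import contextlib
import os

def remove_if_exists(path):
    """Delete `path`, ignoring the case where it is already gone."""
    # equivalent to:
    #   try: os.remove(path)
    #   except FileNotFoundError: pass
    with contextlib.suppress(FileNotFoundError):
        os.remove(path)
```

Only the listed exception types are swallowed; any other error still propagates.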
Roelof van Dijk
9704c7d4d4
ruff rule if-exp-instead-of-or-operator (FURB110) ( #5178 )
...
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-06-27 08:22:19 -07:00
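Rule FURB110 from #5178 flags a conditional expression of the form `x if x else y`, which can be written as `x or y`. A small sketch with hypothetical function names:

```python
def pick_name_verbose(name, default):
    # the form FURB110 flags: the condition repeats the first operand
    return name if name else default

def pick_name(name, default):
    # the suggested form; `or` returns the first truthy operand
    return name or default
```

The two forms return the same value for any input; the `or` form also evaluates `name` only once, which matters when it is an expression with side effects.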
chenyu
5b8fda3c65
fix: JIT=0 means no JIT ( #5188 )
2024-06-27 10:31:37 -04:00
Roelof van Dijk
26e254c42b
ruff: else-raise and else-return ( #5175 )
...
* ruff: enable else-raise and else-return
* ruff: add error names
* fix order
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-06-27 07:54:59 -04:00
Roelof van Dijk
f88f71d73a
ruff: unnecessary-comprehension ( #5174 )
...
* enable ruff C416 unnecessary-comprehension
* already a list
2024-06-27 07:45:29 -04:00
George Hotz
396ce6cfc9
clean up graph dedup function [run_process_replay] ( #5169 )
2024-06-26 15:07:34 -07:00
kormann
3a04e518ec
print_tree UPat +fix ( #5132 )
...
* fix and extend print_tree
* typing
* typing
* fix upat
* fix none
* ws
* rm prefix
* mv luop dag
* typo
* test print_tree
2024-06-26 15:02:19 -07:00
Roelof van Dijk
294bd1a9ff
refactor: name check [run_process_replay] ( #5158 )
2024-06-26 11:39:41 -07:00