George Hotz
06f476b371
late transcendental ( #7498 )
2024-11-03 10:53:58 +08:00
chenyu
91a3b27fa9
disable test_setitem_inplace_operator again ( #7495 )
...
it was flaky, not broken broken
2024-11-02 19:01:23 -04:00
chenyu
ba0c246cfd
update test_setitem_overlapping_inplace1 ( #7494 )
...
failed on LLVM and remu, not real AMD
2024-11-02 18:40:53 -04:00
chenyu
f887de0fd6
update test_setitem ( #7493 )
...
some tests passed now
2024-11-02 17:53:04 -04:00
chenyu
49ae2df036
ConstLike type [pr] ( #7492 )
...
* ConstLike type [pr]
`ConstLike = ConstType|Variable|Tuple[ConstType, ...]`. fixed the wrong `Tuple[ConstType]`
* old pylint
2024-11-02 17:13:17 -04:00
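A note on the `Tuple[ConstType]` fix above: in Python's typing, `Tuple[X]` annotates a tuple of exactly one element, while `Tuple[X, ...]` annotates a homogeneous tuple of any length. The sketch below shows the difference at runtime. `ConstType` here is a stand-in defined only for illustration (the project's real `ConstType` and `Variable` are not reproduced), and `is_variadic` is a hypothetical helper.

```python
from typing import Tuple, Union, get_args

# Stand-in for illustration only; not the project's actual definition.
ConstType = Union[float, int, bool]

OneTuple = Tuple[ConstType]          # a tuple with exactly one element
AnyLenTuple = Tuple[ConstType, ...]  # a tuple with zero or more elements

def is_variadic(tp) -> bool:
    """Return True if tp is a variable-length homogeneous tuple annotation."""
    args = get_args(tp)
    # Variable-length tuples carry Ellipsis as their second type argument.
    return len(args) == 2 and args[1] is Ellipsis

print(is_variadic(OneTuple), is_variadic(AnyLenTuple))
```

A static type checker would reject `(1, 2, 3)` for `OneTuple` but accept it for `AnyLenTuple`.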
chenyu
dc9ffb41a8
cleanup multi [pr] ( #7491 )
2024-11-02 16:38:34 -04:00
chenyu
f8376b3766
all_reduce cosmetic change [pr] ( #7490 )
2024-11-02 15:40:55 -04:00
chenyu
baaec39ffc
update get_transcendental_patterns [pr] ( #7489 )
...
i think this is better than `(p[0], cast(Callable, p[1]))`
2024-11-02 14:25:31 -04:00
chenyu
55bd136746
clean up reshape_and_permute ( #7488 )
...
probably will rewrite it later as reshape and permute function on Kernel, but for now it's shorter with better types
2024-11-02 13:44:14 -04:00
chenyu
74c7b9d84a
clean up Kernel.name ( #7486 )
...
* clean up Kernel.name
* narrow that str
2024-11-02 12:48:37 -04:00
geohotstan
b1866cbfd9
failure test case for pool ops ( #7483 )
...
* add failure test case
* minimum case
2024-11-02 12:13:38 -04:00
geohotstan
585f3a0f24
Add isinf and isnan ops to Tensor ( #7484 )
...
* move isinf and isnan to new branch
* sneak a roll documentation fix in
* add to docs
* update test coverage for detect_positive and detect_negative
* add types to isinf args
2024-11-02 12:12:52 -04:00
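For readers unfamiliar with the `detect_positive` and `detect_negative` arguments mentioned above, here is a minimal scalar sketch of the semantics such flags usually have. It is written in plain Python on floats, not against the Tensor API, and the function signatures are assumptions made for illustration.

```python
import math

def isnan(x: float) -> bool:
    # NaN is the only float value that compares unequal to itself.
    return x != x

def isinf(x: float, detect_positive: bool = True, detect_negative: bool = True) -> bool:
    # Each flag enables detection of one sign of infinity.
    return (detect_positive and x == math.inf) or (detect_negative and x == -math.inf)

print(isinf(-math.inf, detect_negative=False), isnan(float("nan")))
```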
George Hotz
72a9ac27e9
support image dtype in cloud [pr] ( #7482 )
...
* support image dtype in cloud [pr]
* remove outdated osx hack
* unused imports
2024-11-02 23:54:27 +08:00
qazal
24d7fde63d
early skip const [pr] ( #7480 )
2024-11-02 13:18:45 +08:00
qazal
c56364fad0
realize before copy rule [pr] ( #7476 )
...
* realize before COPY and BUFFER_VIEW rule [pr]
* only upat.view
* move the assert to lazy
2024-11-02 13:07:27 +08:00
qazal
3819f5cf4d
realize meta ops from graph_rewrite [pr] ( #7474 )
2024-11-02 01:48:57 +02:00
qazal
e149777b52
start realizing from big graph [pr] ( #7473 )
2024-11-02 00:27:38 +02:00
qazal
2a1aa55882
add realizes to context [pr] ( #7470 )
...
* add realizes set
* add from fuse
2024-11-02 00:00:30 +02:00
qazal
e3ea7cc4b4
prep refactor to UPatLoadStore [pr] ( #7472 )
...
* prep refactor to UPatLoadStore [pr]
* [pr]
2024-11-01 23:50:20 +02:00
qazal
6febd20fcf
set forced_realize for outputs [pr] ( #7469 )
2024-11-01 20:03:12 +02:00
Tobias Fischer
7c9a1d69f9
sdxl gen fix ( #7459 )
2024-11-01 13:57:01 -04:00
ignaciosica
9c832483f2
update shifts spec ( #7468 )
...
* update shifts spec
* hotfix: old style
2024-11-01 12:40:41 -04:00
ignaciosica
18bd98c203
Add shl and shr to llvmir ( #7449 )
...
* add shl and shr to llvmir
* hotfix: enforce type alignment for shr and shl in all backends
* hotfix: change shl and shr spec
* hotfix: typo
* hotfix: refactor shl and shr rules and add casting to ptx shl
* hotfix: bug
* hotfix: ptx shl and shr require buint32
* hotfix: cleanups
2024-11-01 23:49:34 +08:00
chenyu
18e159c9ac
comment about multi real and more tests [pr] ( #7467 )
2024-11-01 11:49:11 -04:00
chenyu
1f343aa40e
replace x.alu(BinaryOps.ADD, y) with add in multi [pr] ( #7466 )
2024-11-01 10:50:57 -04:00
geohotstan
6513690223
Add Tensor.hardsigmoid ( #7433 )
...
* move hardsigmoid to new branch
* add to test
* add NOTE to mention differing values for alpha and beta that match torch
* shift from relu6
* correct shift implementation
* or we just use relu? no more 666
2024-11-01 08:36:52 -04:00
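The bullets above mention moving from a relu6-based shift to plain relu. One way to express a hard sigmoid with relu alone is the identity clip(y, 0, 1) = relu(y) - relu(y - 1). The scalar sketch below uses that identity; it is not taken from the commit itself. The defaults `alpha=1/6` and `beta=0.5` are assumed here because they correspond to torch's `relu6(x + 3) / 6` definition.

```python
def relu(x: float) -> float:
    return max(0.0, x)

def hardsigmoid(x: float, alpha: float = 1 / 6, beta: float = 0.5) -> float:
    # Piecewise-linear sigmoid approximation: clip(alpha * x + beta, 0, 1),
    # written as a difference of two relus.
    y = alpha * x + beta
    return relu(y) - relu(y - 1.0)

for v in (-10.0, 0.0, 10.0):
    print(v, hardsigmoid(v))
```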
George Hotz
fe78ed8cb7
improve match speed [pr] ( #7465 )
...
* improve match speed [pr]
* no sym in expand
* remove useless rule, sym back
* don't track that
2024-11-01 17:33:53 +08:00
George Hotz
a7ba3d2d91
move reduce to lowerer [pr] ( #7462 )
...
* move reduce to lowerer [pr]
* simpler
2024-11-01 16:39:20 +08:00
George Hotz
2cfca230b5
reduce collapse as a rule ( #7464 )
...
* reduce collapse as a rule
* better [pr]
* cleaner
2024-11-01 16:25:44 +08:00
George Hotz
4f6cf1f8cc
expand DEFINE_ACC [pr] ( #7461 )
2024-11-01 15:20:43 +08:00
qazal
d9f38f9518
group stores by UOp [pr] ( #7460 )
2024-11-01 15:09:16 +08:00
qazal
c1bd2d3f71
viz increment -1 kernel on enter [pr] ( #7448 )
...
* viz increment -1 kernel on enter [pr]
* two paths
* share
2024-11-01 14:14:54 +08:00
Tobias Fischer
1a9e145388
Tensor Clone Function ( #7154 )
...
* implemented clone function
* cleanup linting, single func
* added tests, cleaned up grad cloning
* fixed whitespace
2024-11-01 12:24:43 +08:00
chenyu
acd0fa1a7a
s/hasattr(self, '_buf')/self.is_allocated() [pr] ( #7458 )
...
use is_allocated helper in Buffer
2024-10-31 20:55:20 -04:00
chenyu
036409266d
clean up _prepare_jit_inputs [pr] ( #7457 )
...
removed an unnecessary cast and reordered a bit
2024-10-31 20:41:02 -04:00
chenyu
a21434504b
update payne_hanek_reduction [pr] ( #7455 )
2024-10-31 18:41:22 -04:00
chenyu
4f27862242
no more UPat._any [pr] ( #7454 )
2024-10-31 16:48:37 -04:00
chenyu
5777fca904
clean up cody_waite_reduction magic numbers ( #7452 )
2024-10-31 14:45:04 -04:00
chenyu
5648b9788e
more xlog2 cleanups ( #7451 )
...
following the notations in the paper closer
2024-10-31 13:52:31 -04:00
chenyu
4065c3dec8
remove special 0 case in frexp ( #7450 )
...
we can safely assume input is non-zero, also removed unneeded bitcast
2024-10-31 13:02:33 -04:00
chenyu
53db3478fe
cast to float32 for float16 xlog2 ( #7447 )
...
formula has 2X error with denormal floats
2024-10-31 10:36:29 -04:00
chenyu
5085b2fde7
cleanup xlog2 and remove unneeded functions ( #7446 )
...
denormal_map still looks wrong but a lot cleaner
2024-10-31 09:45:16 -04:00
chenyu
02636bc05e
simpler switch over in xsin ( #7426 )
2024-10-31 08:56:01 -04:00
qazal
c5a50465d1
big graph first [pr] ( #7443 )
...
* big graph first [pr]
* move things
2024-10-31 20:10:11 +08:00
qazal
38b1790575
move image dtype fixup [pr] ( #7444 )
...
* move image dtype fixup [pr]
* more work
* late dtype
* use base
2024-10-31 19:51:46 +08:00
George Hotz
f579693ec9
hotfix: casted nan/inf
2024-10-31 19:50:17 +08:00
George Hotz
a43b7a4b7c
less rewrite stages in matcher ( #7445 )
...
* less rewrite stages in matcher
* better name
2024-10-31 19:45:21 +08:00
George Hotz
5dd1ffd5d0
don't const rewrite in cstyle ( #7442 )
...
* don't const rewrite in cstyle
* Update cstyle.py
* simple_symbolic
* fix bfloat16 const on AMD
2024-10-31 19:16:49 +08:00
qazal
bdde795239
early filter sink buffers [pr] ( #7440 )
2024-10-31 18:50:36 +08:00
qazal
9905de3362
late append realizes [pr] ( #7439 )
...
* dont unbind in ops
* late append realizes [pr]
* Revert "dont unbind in ops"
This reverts commit e8d9da936d.
* delete ctx.realizes
* empty
2024-10-31 18:04:42 +08:00