chenyu
0e57152dbb
clean up test_uop_symbolic [pr] ( #8165 )
...
removed old `Node` references
2024-12-11 14:13:19 -05:00
chenyu
5eadae204b
test multi device rand with manual_seed ( #8164 )
2024-12-11 13:11:31 -05:00
Maxim Zakharov
e53a5bf0c3
Stable Diffusion UI - convenient send via Enter ( #8160 )
2024-12-11 19:05:24 +01:00
qazal
047a6dabc3
prereq for scheduler contiguous_child [pr] ( #8163 )
...
* the whole context is fine here [pr]
* fix that
2024-12-12 02:02:22 +08:00
ignaciosica
3a8e8ac6c2
remove dead code ( #8161 )
2024-12-11 12:07:19 -05:00
George Hotz
8f4299fcc8
hotfix: suppress shutdown errors in CLProgram
2024-12-11 08:08:32 -08:00
Ahmed Harmouche
a73e3677d0
Test linearizer on webgpu ( #8159 )
...
* Test linearizer on wgpu
* Skip tests due to exceeded dims
2024-12-11 17:03:26 +01:00
qazal
b894657aa7
assert the same things without mutating or accessing internal ops state [pr] ( #8157 )
...
* don't mutate internal state in test_lazybuffer
* fix test_schedule internals
* save time
* third si
* fine sometimes buffer_view isn't there
2024-12-11 22:01:27 +08:00
qazal
63de8f2208
late scheduler context builder [pr] ( #8155 )
2024-12-11 19:59:39 +08:00
chenyu
d462f8ace0
use HALF in cifar wino benchmarks ( #8153 )
...
more representative as it hits tensor cores on tinyboxes
2024-12-10 20:21:00 -05:00
George Hotz
c8e7707a7e
hotfix: disable flaky move tensor test
2024-12-10 17:11:21 -08:00
chenyu
155f7df599
lower test_gemm_4096 expectation on green ( #8152 )
...
getting 119 sometimes, so lowered to 115
2024-12-10 18:05:12 -05:00
chenyu
c4be1529cf
update test for Tensor.softplus ( #8150 )
...
test beta and extreme inputs.
to pass big inputs, it needs to support `threshold`, which needs a fix on backward that we punt on until the new gradient api (see the reference sketch after this entry)
2024-12-10 17:48:02 -05:00
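A minimal reference sketch of the softplus semantics the test compares against (mirroring torch.nn.functional.softplus; the `threshold` fallback is the part not supported yet):

```python
import math

# softplus(x, beta) = log1p(exp(beta*x)) / beta, reverting to the identity
# when beta*x exceeds `threshold` so exp() never overflows on big inputs.
def softplus_ref(x: float, beta: float = 1.0, threshold: float = 20.0) -> float:
  if beta * x > threshold:
    return x                                   # linear branch for extreme inputs
  return math.log1p(math.exp(beta * x)) / beta

print(softplus_ref(0.0))     # ~0.6931
print(softplus_ref(100.0))   # 100.0, via the threshold branch
```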
Ahmed Harmouche
a8cfdc70ed
Run more webgpu tests ( #8142 )
2024-12-10 23:20:04 +01:00
Ahmed Harmouche
ed7318a3f5
Fix puppeteer install ( #8148 )
...
Clean npm cache before puppeteer install
2024-12-10 23:06:33 +01:00
George Hotz
a1b3724ff8
prepickle process replay [pr] ( #8147 )
2024-12-10 11:46:36 -08:00
George Hotz
aa3b094334
changes from delete lazy [pr] ( #8146 )
...
* changes from delete lazy [pr]
* test tweak
2024-12-10 11:06:17 -08:00
chenyu
286fec115e
fix Tensor.minimum for int ( #8145 )
...
use invert instead of just neg; consolidate min, argmin, and minimum (see the sketch after this entry)
also update maximum to not apply the midpoint for int
2024-12-10 13:34:41 -05:00
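A small sketch of the invert trick, assuming the old path derived minimum from negation: `~x == -x - 1` is an order-reversing map that stays in range even at INT32_MIN, where `-x` would overflow.

```python
import numpy as np

a, b = np.int32(-2**31), np.int32(7)   # INT32_MIN and an ordinary int
assert ~max(~a, ~b) == min(a, b)       # invert trick: exact for all int32 values
# the neg trick -max(-a, -b) would need -INT32_MIN, which int32 can't represent
```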
Ahmed Harmouche
71dd222f66
Fix setitem on wgpu ( #8144 )
2024-12-10 19:34:25 +01:00
qazal
b69fea6ae5
process replay without global list [pr] ( #8143 )
2024-12-11 02:20:09 +08:00
qazal
08405279f9
pre merge_views+ops_folding refactor [pr] ( #8140 )
...
* simple start
* valid early
* more dumb things removed
* don't ever use base
* cleaner
2024-12-11 00:55:00 +08:00
qazal
56c84cee29
derive COPY nbytes late in realize [pr] ( #8137 )
...
* derive COPY arg later in realize [pr]
* can assume no implicit casts or movement ops here
2024-12-10 22:04:07 +08:00
qazal
2d26b011ac
allow VIEW on BUFFER [pr] ( #8136 )
...
* allow VIEW of BUFFER [pr]
* base it later
* better diff
* base shouldn't exist anywhere after merge_views
2024-12-10 21:29:38 +08:00
qazal
3a2658efbd
small changes to refine the delete_lazy diff ( #8134 )
...
* _view -> view
* const_arg things
2024-12-10 18:46:10 +08:00
qazal
6d33da09c9
split scalar getitem tests into correctness and optimization [pr] ( #8133 )
2024-12-10 18:18:46 +08:00
qazal
7436ebef2f
spend lines on const_arg for tensor and scheduler [pr] ( #8132 )
...
* spend lines on const_arg for tensor and scheduler [pr]
* simple test_const_arg
* base on lazy
2024-12-10 18:07:35 +08:00
chenyu
917deb88a4
make //0 return 0 in python_alu ( #8131 )
...
on master it raises because it cannot truncate inf to int, which crashes valid expressions like `(t > 0).where(1//t, t)` (see the sketch after this entry).
2024-12-09 19:32:06 -05:00
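A minimal sketch of why the expression from the commit message needs a defined `//0`: `where` evaluates both branches before masking, so the division runs at positions where `t == 0` too (assuming the default int dtype).

```python
from tinygrad import Tensor

t = Tensor([0, 2, 4])              # int tensor with a zero
out = (t > 0).where(1 // t, t)     # 1//0 is computed, then masked out
print(out.numpy())                 # [0, 0, 0]; no crash with //0 -> 0
```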
George Hotz
f83d715f41
move checks into compile3, delete compile2 [pr] ( #8127 )
...
* move checks into compile3 [pr]
* test_vs_onnx
* test v torch works
* float16 won't compile on compile3
* actually delete compile2
2024-12-09 14:21:42 -08:00
chenyu
358287959b
fix pow of int to negative const int ( #8129 )
...
it should return an int (see the sketch after this entry)
2024-12-09 17:20:18 -05:00
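A hedged plain-Python sketch of the intended semantics, assuming the result is the float pow truncated back to int (the `0 ** negative` case below is an arbitrary choice for illustration):

```python
def int_pow_ref(base: int, exp: int) -> int:
  # int ** negative int should stay int: compute in float, truncate toward zero
  if exp >= 0: return base ** exp
  return int(1 / base ** -exp) if base != 0 else 0

assert int_pow_ref(2, -1) == 0     # 0.5 truncates to 0
assert int_pow_ref(1, -5) == 1
assert int_pow_ref(-1, -3) == -1
```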
chenyu
12f7d284e0
failed test case for int pow ( #8128 )
...
also updated test_ops so that non-float dtypes compare with `assert_equal`. removed `test_multinomial`, which is covered more thoroughly in test_randomness
2024-12-09 16:15:09 -05:00
qazal
80de06c8b9
scheduler ops_folding from delete_lazy ( #8124 )
...
* scheduler diff from delete_lazy
* test_std_mean
* late fold copy of CONST
* clang const is fine
2024-12-10 00:36:01 +08:00
George Hotz
87c360c4b5
hotfix: add --size 8B to llama3
2024-12-09 07:53:20 -08:00
George Hotz
a773c5a571
hotfix: default llama3 is 1B with download_model
2024-12-09 07:23:35 -08:00
Ahmed Harmouche
c6277fce09
Remove f16 decompression lib from SD compile.py ( #8121 )
...
* Remove f16-to-f32-gpu lib, use tinygrad exported decompression
* No need to create new instance
2024-12-09 14:09:00 +01:00
qazal
22d99f1421
test_pickle_realized_tensor actually tests pickle [pr] ( #8119 )
...
* test_pickle_realized_tensor actually tests pickle [pr]
* clang
2024-12-09 17:26:19 +08:00
chenyu
ccf54c2375
fix argmax/min on int32 min ( #8118 )
2024-12-09 02:29:23 -05:00
chenyu
c814de2dd4
fix bitwise_not for signed int ( #8117 )
...
-1 is correct because 2**32-1 is not within int32 range, so in some cases clang casts the whole expression to uint32 (see the sketch after this entry)
2024-12-09 02:02:51 -05:00
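A small numpy sketch of the distinction, assuming the fix is to XOR with -1 rather than with 2**32-1:

```python
import numpy as np

x = np.array([0, 1, -5], dtype=np.int32)
assert ((x ^ np.int32(-1)) == ~x).all()   # -1 is all-ones and stays int32
# x ^ (2**32 - 1): the constant doesn't fit in int32, so generated C code can
# be promoted to uint32 and the result comes back with the wrong sign/dtype
```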
ttomsa
e22d7b6fb0
fix var vmax inside special ( #8116 )
2024-12-09 01:16:08 -05:00
qazal
0033012096
init noop changes from delete_lazy [pr] ( #8115 )
2024-12-09 01:42:05 +08:00
qazal
5dd61035f7
revert VALID early folding for now ( #8114 )
...
This reverts commit 4074f52317.
2024-12-09 00:34:24 +08:00
qazal
69e48da961
set NOOPT in test_avg_pool3d_failure ( #8112 )
...
* set NOOPT=0 in test_avg_pool3d_failure
* noopt should still pass
2024-12-08 10:48:29 -05:00
nimlgen
3a7d64b96c
hcq remove update from args state ( #8104 )
...
* hcq remove update from args state (squashed fixups: fix amd / ugh / qcom? / qcom ops / ops / qcom fix / qcom texture info / fx / qcom fix / qcom / qcom, sry / minor / works)
* remove old code
* unrelated+sint
* qcom
* typing
* rm comments
2024-12-08 15:22:05 +03:00
nimlgen
d6e66095fd
hcq buffer is a class ( #8106 )
...
* hcq buffer is a class
* qcom
* no from_mv in qcom
* remove qcombuffer
* useless cast
* mypy
* qcom fix
* _md -> meta
2024-12-08 13:29:43 +03:00
chenyu
b9c977f1c8
clean up bounds in Tensor.shard ( #8107 )
2024-12-07 17:19:43 -05:00
geohotstan
f8294b3bda
add avg pool 3d failure test ( #8105 )
...
* add test
* try simplify test case
* add TODO comment
2024-12-07 16:34:38 -05:00
qazal
6be388be86
failing test for const folding breaking indexing [pr] ( #8103 )
2024-12-07 19:55:02 +08:00
nimlgen
8b1fa9cb7d
nv hcq queue touchups ( #8102 )
2024-12-07 14:09:38 +03:00
qazal
4074f52317
VALID early folding ( #8100 )
...
* fold valid
* :)
* fix test_verify_ast
* keep symbolic working
2024-12-07 18:37:47 +08:00
qazal
07b6d5cf63
assign early folding ( #8093 )
...
* assign early folding [pr]
* move to to_si
* -
* fix generate_dataset
* diff too big
* no recreation, no diff
* gzip
* new sops from tiny10
* final try
2024-12-07 17:02:55 +08:00
George Hotz
00ac0db9d4
np tensors have the memory from numpy in compile3 [pr] ( #8098 )
2024-12-07 14:01:51 +08:00