Ahmed Harmouche
ed7318a3f5
Fix puppeteer install (#8148)
Clean npm cache before puppeteer install
2024-12-10 23:06:33 +01:00
George Hotz
a1b3724ff8
prepickle process replay [pr] (#8147)
2024-12-10 11:46:36 -08:00
George Hotz
aa3b094334
changes from delete lazy [pr] (#8146)
* changes from delete lazy [pr]
* test tweak
2024-12-10 11:06:17 -08:00
chenyu
286fec115e
fix Tensor.minimum for int (#8145)
Use invert instead of just neg (see the sketch after this entry); consolidate min, argmin, and minimum. Also update maximum to not apply the midpoint for int.
2024-12-10 13:34:41 -05:00
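A minimal sketch in plain Python of the order-reversing identity this fix appears to rely on (not the actual tinygrad code):

```python
# minimum via maximum through invert: ~x == -x - 1 reverses order on
# two's-complement ints and, unlike negation, has no overflow case
# (-INT_MIN does not fit in int32, but ~INT_MIN == INT_MAX does).
def int_minimum(a: int, b: int) -> int:
    return ~max(~a, ~b)

assert int_minimum(3, -7) == -7
assert int_minimum(-2**31, 0) == -2**31  # the case plain negation would mishandle
```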
Ahmed Harmouche
71dd222f66
Fix setitem on wgpu (#8144)
2024-12-10 19:34:25 +01:00
qazal
b69fea6ae5
process replay without global list [pr] (#8143)
2024-12-11 02:20:09 +08:00
qazal
08405279f9
pre merge_views+ops_folding refactor [pr] (#8140)
* simple start
* valid early
* more dumb things removed
* don't ever use base
* cleaner
2024-12-11 00:55:00 +08:00
qazal
56c84cee29
derive COPY nbytes late in realize [pr] (#8137)
* derive COPY arg later in realize [pr]
* can assume no implicit casts or movement ops here
2024-12-10 22:04:07 +08:00
qazal
2d26b011ac
allow VIEW on BUFFER [pr] (#8136)
* allow VIEW of BUFFER [pr]
* base it later
* better diff
* base shouldn't exist anywhere after merge_views
2024-12-10 21:29:38 +08:00
qazal
3a2658efbd
small changes to refine the delete_lazy diff (#8134)
* _view -> view
* const_arg things
2024-12-10 18:46:10 +08:00
qazal
6d33da09c9
split scalar getitem tests into correctness and optimization [pr] (#8133)
2024-12-10 18:18:46 +08:00
qazal
7436ebef2f
spend lines on const_arg for tensor and scheduler [pr] (#8132)
* spend lines on const_arg for tensor and scheduler [pr]
* simple test_const_arg
* base on lazy
2024-12-10 18:07:35 +08:00
chenyu
917deb88a4
make //0 return 0 in python_alu (#8131)
On master it raises because it cannot truncate inf to int, which crashes a valid expression like `(t > 0).where(1//t, t)` (see the sketch after this entry).
2024-12-09 19:32:06 -05:00
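A sketch of the failure mode (a minimal example, not from the repo; values chosen to hit all lanes):

```python
from tinygrad import Tensor

t = Tensor([-1, 0, 2])
# where() evaluates both branches on every lane, so 1//t is computed even
# where t == 0; with //0 defined as 0 in python_alu this no longer raises
out = (t > 0).where(1 // t, t)
print(out.numpy())  # expected [-1, 0, 0]: lanes with t <= 0 take t's own value
```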
George Hotz
f83d715f41
move checks into compile3, delete compile2 [pr] (#8127)
* move checks into compile3 [pr]
* test_vs_onnx
* test v torch works
* float16 won't compile on compile3
* actually delete compile2
2024-12-09 14:21:42 -08:00
chenyu
358287959b
fix pow of int to negative const int (#8129)
It should return an int (see the sketch after this entry).
2024-12-09 17:20:18 -05:00
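A hedged sketch of the expected behavior (assuming tinygrad's usual `Tensor`/`dtypes` API; the exact value semantics are my reading of the fix):

```python
from tinygrad import Tensor, dtypes

out = Tensor([2], dtype=dtypes.int32) ** -2
assert out.dtype == dtypes.int32  # stays int, no silent promotion to float
print(out.numpy())                # presumably [0]: 2**-2 truncates under int semantics
```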
chenyu
12f7d284e0
failed test case for int pow (#8128)
Also updated test_ops so that non-float dtypes compare with `assert_equal` (sketch after this entry). Removed `test_multinomial`, which is tested better in test_randomness.
2024-12-09 16:15:09 -05:00
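The comparison rationale, sketched with plain numpy (not the actual test_ops helper): float results need a tolerance, while int and bool results should match exactly.

```python
import numpy as np

# floats: rounding differs across backends, so compare with a tolerance
np.testing.assert_allclose([0.1 + 0.2], [0.3], atol=1e-6)
# ints and bools: results are exact, so any difference is a real bug
np.testing.assert_equal(np.array([1, 2], np.int32), np.array([1, 2], np.int32))
```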
qazal
80de06c8b9
scheduler ops_folding from delete_lazy (#8124)
* scheduler diff from delete_lazy
* test_std_mean
* late fold copy of CONST
* clang const is fine
2024-12-10 00:36:01 +08:00
George Hotz
87c360c4b5
hotfix: add --size 8B to llama3
2024-12-09 07:53:20 -08:00
George Hotz
a773c5a571
hotfix: default llama3 is 1B with download_model
2024-12-09 07:23:35 -08:00
Ahmed Harmouche
c6277fce09
Remove f16 decompression lib from SD compile.py (#8121)
* Remove f16-to-f32-gpu lib, use tinygrad exported decompression
* No need to create new instance
2024-12-09 14:09:00 +01:00
qazal
22d99f1421
test_pickle_realized_tensor actually tests pickle [pr] (#8119)
* test_pickle_realized_tensor actually tests pickle [pr]
* clang
2024-12-09 17:26:19 +08:00
chenyu
ccf54c2375
fix argmax/min on int32 min (#8118)
2024-12-09 02:29:23 -05:00
chenyu
c814de2dd4
fix bitwise_not for signed int (#8117)
Using -1 is correct because 2**32-1 is not within int32 range, so in some cases clang casts the whole expression to uint32 (see the sketch after this entry).
2024-12-09 02:02:51 -05:00
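A sketch of the reasoning with numpy scalars (my reconstruction, not the emitted kernel code):

```python
import numpy as np

x = np.int32(5)
assert ~x == np.int32(-6)  # bitwise_not(x) == -x - 1 for signed ints
# XOR with the literal 2**32 - 1 is wrong for int32: that constant does not
# fit in int32, so a C backend may promote the whole expression to uint32.
# XOR with -1 produces the same bit pattern while staying in int32.
assert (x ^ np.int32(-1)) == ~x
```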
ttomsa
e22d7b6fb0
fix var vmax inside special (#8116)
2024-12-09 01:16:08 -05:00
qazal
0033012096
init noop changes from delete_lazy [pr] (#8115)
2024-12-09 01:42:05 +08:00
qazal
5dd61035f7
revert VALID early folding for now (#8114)
This reverts commit 4074f52317.
2024-12-09 00:34:24 +08:00
qazal
69e48da961
set NOOPT in test_avg_pool3d_failure (#8112)
* set NOOPT=0 in test_avg_pool3d_failure
* noopt should still pass
2024-12-08 10:48:29 -05:00
nimlgen
3a7d64b96c
hcq remove update from args state (#8104)
* hcq remove update from args state
fix amd
ugh
qcom?
qcom ops
ops
qcom fix
qcom texture info
fx
qcom fix
qcom
qcom, sry
minor
works
* remove old code
* unrelated+sint
* qcom
* typing
* rm comments
2024-12-08 15:22:05 +03:00
nimlgen
d6e66095fd
hcq buffer is a class (#8106)
* hcq buffer is a class
* qcom
* no from_mv in qcom
* remove qcombuffer
* useless cast
* mypy
* qcom fix
* _md -> meta
2024-12-08 13:29:43 +03:00
chenyu
b9c977f1c8
clean up bounds in Tensor.shard (#8107)
2024-12-07 17:19:43 -05:00
geohotstan
f8294b3bda
add avg pool 3d failure test (#8105)
* add test
* try simplify test case
* add TODO comment
2024-12-07 16:34:38 -05:00
qazal
6be388be86
failing test for const folding breaking indexing [pr] (#8103)
2024-12-07 19:55:02 +08:00
nimlgen
8b1fa9cb7d
nv hcq queue touchups (#8102)
2024-12-07 14:09:38 +03:00
qazal
4074f52317
VALID early folding (#8100)
* fold valid
* :)
* fix test_verify_ast
* keep symbolic working
2024-12-07 18:37:47 +08:00
qazal
07b6d5cf63
assign early folding (#8093)
* assign early folding [pr]
* move to to_si
* -
* fix generate_dataset
* diff too big
* no recreation, no diff
* gzip
* new sops from tiny10
* final try
2024-12-07 17:02:55 +08:00
George Hotz
00ac0db9d4
np tensors have the memory from numpy in compile3 [pr] (#8098)
2024-12-07 14:01:51 +08:00
George Hotz
22feb3a2f1
move copy into the JIT for openpilot compile3 (#7937)
* move copy into the JIT, test fails
* ahh, prune was the issue
2024-12-07 13:26:26 +08:00
leopf
0ed731b5ea
torch_load with Tensors (#8037)
* torch_load with Tensors
* remove passthrough_reset + use accept_filename
* Revert "remove passthrough_reset"
* version note
* cleanup
2024-12-07 09:55:41 +08:00
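A hedged sketch of what the change enables, as I read the PR (the filename is hypothetical; exact accepted types may differ):

```python
from pathlib import Path
from tinygrad import Tensor
from tinygrad.nn.state import torch_load

state = torch_load("model.pt")               # classic path-based load
raw = Tensor(Path("model.pt").read_bytes())  # checkpoint bytes already in memory
state = torch_load(raw)                      # per this PR, a Tensor should now work too
```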
chenyu
2d321646b8
default tensors to int32 in test_ops (#8097)
torch defaults to int64, but we care more about int32 anyway (see the sketch after this entry). Removed tests that were skipped because int64 is not supported.
2024-12-06 20:33:36 -05:00
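The dtype mismatch in question, sketched (assuming tinygrad's default int is int32):

```python
import torch
from tinygrad import Tensor, dtypes

assert torch.tensor([1, 2]).dtype == torch.int64  # torch defaults integer tensors to int64
assert Tensor([1, 2]).dtype == dtypes.int32       # tinygrad's default int (assumption)
# pinning both sides to int32 keeps comparisons apples-to-apples and avoids
# skipping tests on backends without int64 support
```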
chenyu
e9692de42b
don't FUZZ_ALL_ACTIONS in fuzz_linearizer.py (#8096)
Mostly for speed; this is just making sure the script runs.
2024-12-06 17:22:17 -05:00
chenyu
564b3a3e1b
onnx Bitwise ops (#8095)
free stuff!
2024-12-06 16:58:09 -05:00
qazal
a97b8fa3c5
maskless const can lower without valid, p1 [pr] (#8094)
2024-12-06 23:21:19 +02:00
mesozoic-egg
aaf2379f97
remove ordered parents, seems like dead code [pr] (#8092)
* remove ordered parents, seems like dead code
* no need to dedup
2024-12-06 16:19:37 -05:00
nimlgen
e180a31c5e
tiny metal cleanup (#8089)
* tiny metal cleanup
* cast
* sry
2024-12-06 21:44:32 +03:00
chenyu
d000c08f04
fix return type of Tensor.pow (#8091)
Int to the power of int should return int, etc.; it hints that we would like to have Ops.POW (see the sketch after this entry).
2024-12-06 13:38:29 -05:00
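A sketch of the promotion rule as described (dtype outcomes are my reading of the fix, hence the hedged comments):

```python
from tinygrad import Tensor, dtypes

assert (Tensor([2]) ** Tensor([3])).dtype == dtypes.int32  # int ** int should stay int
assert (Tensor([2]) ** 0.5).dtype == dtypes.float32        # a float exponent presumably promotes
```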
qazal
1ea4dc9565
big graph init conceptual cleanup [pr] (#8090)
* keep Ops.BUFFER naming consistent [pr]
* big graph init conceptual cleanup [pr]
* make everything pass through
* pylint doesn't complain now
2024-12-06 20:07:00 +02:00
geohotstan
5184410fc3
combine get inputs and type_parse function in onnx [fixed] (#8081)
* 1 is simpler than 2
* variable name
* change error wording
* shapes for sequence type must be homogeneous
* bug fix for model benchmark
* fix comments too
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-12-06 12:34:47 -05:00
nimlgen
d1282da7e8
hcq bump alloc (#8078)
* hcq bump alloc
* hm
* nv
* typo
2024-12-06 19:19:04 +03:00
qazal
df84dc6444
unrelated test fixups from delete_lazy [pr] (#8088)
* unrelated test fixups from delete_lazy [pr]
* fine if it's scheduled later
2024-12-06 17:31:02 +02:00
geohotstan
0b7c44677d
Fix uint8 cast underflow (#6305)
* hacky fix for cast
* only float to uint8
* limit to float -> uint8
* touchup alu cast test
* improve tests and support more float to unsigned casts
* del one repeated test
* del 1 more repeated test
* try removing expected failure test
* hmmm try 1 more
* skip tests for flakiness
* uint64 super flaky
* clean up
* grammar
* just match numpy
* why is CI numpy different from local numpy
* increase verbosity
* try
* try2
* try3
* try4
* yeah idk
* new direction
* try again
* just don't support uint32 and uint64
* done?
* oops
* comment
* documentation
* it is what it is
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-12-06 10:25:03 -05:00
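A sketch of the resolution per the "just match numpy" bullet above (values illustrative; out-of-range float-to-unsigned conversion is platform-dependent, which is presumably why uint32/uint64 support was dropped):

```python
import numpy as np
from tinygrad import Tensor, dtypes

x = np.array([-1.0, 300.0], dtype=np.float32)
expected = x.astype(np.uint8)  # numpy wraps out-of-range values, e.g. [255, 44] on x86
got = Tensor(x).cast(dtypes.uint8).numpy()
np.testing.assert_equal(got, expected)
# float -> uint32/uint64 stays unsupported per this PR: behavior was too flaky
# across backends to pin down
```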