Commit Graph

10417 Commits

Author SHA1 Message Date
Ahmed Harmouche
ed7318a3f5 Fix puppeteer install (#8148)
Clean npm cache before puppeteer install
2024-12-10 23:06:33 +01:00
George Hotz
a1b3724ff8 prepickle process replay [pr] (#8147) 2024-12-10 11:46:36 -08:00
George Hotz
aa3b094334 changes from delete lazy [pr] (#8146)
* changes from delete lazy [pr]

* test tweak
2024-12-10 11:06:17 -08:00
chenyu
286fec115e fix Tensor.minimum for int (#8145)
use invert instead of just neg. consolidate min, argmin, and minimum

also update maximum to not apply the midpoint for int
2024-12-10 13:34:41 -05:00
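A minimal sketch of the identity this commit leans on (an assumption about the approach, not the actual tinygrad rewrite): bitwise invert reverses integer order without the overflow that negation hits at the int32 minimum, so minimum can be derived from maximum.

```python
# Hedged sketch: ~x == -x - 1 is order-reversing for ints and has no special
# case at -2**31, unlike negation, so minimum(a, b) == ~maximum(~a, ~b).
a, b = -2**31, 7                   # negating -2**31 would overflow in int32
assert ~max(~a, ~b) == min(a, b)   # invert-based minimum matches the real one
```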
Ahmed Harmouche
71dd222f66 Fix setitem on wgpu (#8144) 2024-12-10 19:34:25 +01:00
qazal
b69fea6ae5 process replay without global list [pr] (#8143) 2024-12-11 02:20:09 +08:00
qazal
08405279f9 pre merge_views+ops_folding refactor [pr] (#8140)
* simple start

* valid early

* more dumb things removed

* don't ever use base

* cleaner
2024-12-11 00:55:00 +08:00
qazal
56c84cee29 derive COPY nbytes late in realize [pr] (#8137)
* derive COPY arg later in realize [pr]

* can assume no implicit casts or movement ops here
2024-12-10 22:04:07 +08:00
qazal
2d26b011ac allow VIEW on BUFFER [pr] (#8136)
* allow VIEW of BUFFER [pr]

* base it later

* better diff

* base shouldn't exist anywhere after merge_views
2024-12-10 21:29:38 +08:00
qazal
3a2658efbd small changes to refine the delete_lazy diff (#8134)
* _view -> view

* const_arg things
2024-12-10 18:46:10 +08:00
qazal
6d33da09c9 split scalar getitem tests into correctness and optimization [pr] (#8133) 2024-12-10 18:18:46 +08:00
qazal
7436ebef2f spend lines on const_arg for tensor and scheduler [pr] (#8132)
* spend lines on const_arg for tensor and scheduler [pr]

* simple test_const_arg

* base on lazy
2024-12-10 18:07:35 +08:00
chenyu
917deb88a4 make //0 return 0 in python_alu (#8131)
on master it raises because it cannot truncate inf to int, which crashes a valid expression like `(t > 0).where(1//t, t)`.
2024-12-09 19:32:06 -05:00
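A minimal sketch of the guarded division described above (an assumption, not the actual python_alu entry, which truncates toward zero):

```python
# Hedged sketch: a zero divisor yields 0 instead of raising, so the unselected
# branch of `(t > 0).where(1//t, t)` can still be evaluated on every lane.
def safe_idiv(x: int, y: int) -> int:
  return 0 if y == 0 else int(x / y)   # truncate toward zero otherwise

assert safe_idiv(1, 0) == 0
assert safe_idiv(-7, 2) == -3
```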
George Hotz
f83d715f41 move checks into compile3, delete compile2 [pr] (#8127)
* move checks into compile3 [pr]

* test_vs_onnx

* test v torch works

* float16 won't compile on compile3

* actually delete compile2
2024-12-09 14:21:42 -08:00
chenyu
358287959b fix pow of int to negative const int (#8129)
it should return an int
2024-12-09 17:20:18 -05:00
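A hedged usage sketch of the expectation in this commit (the dtype checks assume tinygrad's usual integer defaults; nothing here is quoted from the diff):

```python
from tinygrad import Tensor

t = Tensor([1, 2, 4])      # integer data defaults to an int dtype
print((t ** 2).dtype)      # expected: still an int dtype
print((t ** -1).dtype)     # a negative const int exponent should also stay int
```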
chenyu
12f7d284e0 failed test case for int pow (#8128)
also updated test_ops so that non-float types compare with `assert_equal`. removed `test_multinomial`, which is tested better in test_randomness
2024-12-09 16:15:09 -05:00
qazal
80de06c8b9 scheduler ops_folding from delete_lazy (#8124)
* scheduler diff from delete_lazy

* test_std_mean

* late fold copy of CONST

* clang const is fine
2024-12-10 00:36:01 +08:00
George Hotz
87c360c4b5 hotfix: add --size 8B to llama3 2024-12-09 07:53:20 -08:00
George Hotz
a773c5a571 hotfix: default llama3 is 1B with download_model 2024-12-09 07:23:35 -08:00
Ahmed Harmouche
c6277fce09 Remove f16 decompression lib from SD compile.py (#8121)
* Remove f16-to-f32-gpu lib, use tinygrad exported decompression

* No need to create new instance
2024-12-09 14:09:00 +01:00
qazal
22d99f1421 test_pickle_realized_tensor actually tests pickle [pr] (#8119)
* test_pickle_realized_tensor actually tests pickle [pr]

* clang
2024-12-09 17:26:19 +08:00
chenyu
ccf54c2375 fix argmax/min on int32 min (#8118) 2024-12-09 02:29:23 -05:00
chenyu
c814de2dd4 fix bitwise_not for signed int (#8117)
-1 is correct because 2**32-1 is not within int32 range, so in some cases clang casts the whole thing to uint32
2024-12-09 02:02:51 -05:00
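A pure-Python illustration of the masking point in this commit (mirroring the described C behaviour as an assumption, not quoting tinygrad code):

```python
# x ^ -1 is the signed bitwise NOT; 2**32 - 1 encodes the same mask, but once
# it no longer fits in int32 the whole expression can be evaluated as uint32.
x = 5
print(x ^ -1)            # -6, the desired signed result
print(x ^ (2**32 - 1))   # 4294967290, the unsigned reading of the same bits
```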
ttomsa
e22d7b6fb0 fix var vmax inside special (#8116) 2024-12-09 01:16:08 -05:00
qazal
0033012096 init noop changes from delete_lazy [pr] (#8115) 2024-12-09 01:42:05 +08:00
qazal
5dd61035f7 revert VALID early folding for now (#8114)
This reverts commit 4074f52317.
2024-12-09 00:34:24 +08:00
qazal
69e48da961 set NOOPT in test_avg_pool3d_failure (#8112)
* set NOOPT=0 in test_avg_pool3d_failure

* noopt should still pass
2024-12-08 10:48:29 -05:00
nimlgen
3a7d64b96c hcq remove update from args state (#8104)
* hcq remove update from args state

fix amd

ugh

qcom?

qcom ops

ops

qcom fix

qcom texture info

fx

qcom fix

qcom

qcom, sry

minor

works

* remove old code

* unrelated+sint

* qcom

* typing

* rm comments
2024-12-08 15:22:05 +03:00
nimlgen
d6e66095fd hcq buffer is a class (#8106)
* hcq buffer is a class

* qcom

* no from_mv in qcom

* remove qcombuffer

* useless cast

* mypy

* qcom fix

* _md -> meta
2024-12-08 13:29:43 +03:00
chenyu
b9c977f1c8 clean up bounds in Tensor.shard (#8107) 2024-12-07 17:19:43 -05:00
geohotstan
f8294b3bda add avg pool 3d failure test (#8105)
* add test

* try simplify test case

* add TODO comment
2024-12-07 16:34:38 -05:00
qazal
6be388be86 failing test for const folding breaking indexing [pr] (#8103) 2024-12-07 19:55:02 +08:00
nimlgen
8b1fa9cb7d nv hcq queue touchups (#8102) 2024-12-07 14:09:38 +03:00
qazal
4074f52317 VALID early folding (#8100)
* fold valid

* :)

* fix test_verify_ast

* keep symbolic working
2024-12-07 18:37:47 +08:00
qazal
07b6d5cf63 assign early folding (#8093)
* assign early folding [pr]

* move to to_si

* -

* fix generate_dataset

* diff too big

* no recreation, no diff

* gzip

* new sops from tiny10

* final try
2024-12-07 17:02:55 +08:00
George Hotz
00ac0db9d4 np tensors have the memory from numpy in compile3 [pr] (#8098) 2024-12-07 14:01:51 +08:00
George Hotz
22feb3a2f1 move copy into the JIT for openpilot compile3 (#7937)
* move copy into the JIT, test fails

* ahh, prune was the issue
2024-12-07 13:26:26 +08:00
leopf
0ed731b5ea torch_load with Tensors (#8037)
* torch_load with Tensors

* remove passthrough_reset + use accept_filename

* Revert "remove passthrough_reset"

* version note

* cleanup
2024-12-07 09:55:41 +08:00
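A hedged usage sketch of the entry point this PR touches (the checkpoint path is a placeholder):

```python
from tinygrad.nn.state import torch_load

# Hedged sketch: torch_load reads a PyTorch checkpoint directly into tinygrad
# Tensors; "model.pth" is a placeholder path.
state_dict = torch_load("model.pth")
print({k: v.shape for k, v in state_dict.items()})
```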
chenyu
2d321646b8 default tensors to int32 in test_ops (#8097)
torch defaults to int64, but we care more about int32 anyway. removed tests that were skipped because int64 is not supported
2024-12-06 20:33:36 -05:00
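An illustration of the default-dtype mismatch behind this change (assuming the usual defaults of both libraries):

```python
import torch
from tinygrad import Tensor

print(torch.tensor([1, 2, 3]).dtype)   # torch.int64
print(Tensor([1, 2, 3]).dtype)         # tinygrad's default int dtype (int32)
```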
chenyu
e9692de42b don't FUZZ_ALL_ACTIONS in fuzz_linearizer.py (#8096)
mostly for speed, this is just making sure the script runs
2024-12-06 17:22:17 -05:00
chenyu
564b3a3e1b onnx Bitwise ops (#8095)
free stuff!
2024-12-06 16:58:09 -05:00
qazal
a97b8fa3c5 maskless const can lower without valid, p1 [pr] (#8094) 2024-12-06 23:21:19 +02:00
mesozoic-egg
aaf2379f97 remove ordered parents, seems like dead code [pr] (#8092)
* remove ordered parents, seems like dead code

* no need to dedup
2024-12-06 16:19:37 -05:00
nimlgen
e180a31c5e tiny metal cleanup (#8089)
* tiny metal cleanup

* cast

* sry
2024-12-06 21:44:32 +03:00
chenyu
d000c08f04 fix return type of Tensor.pow (#8091)
int to the power of int should return int, etc.; it hints that we would like to have Ops.POW
2024-12-06 13:38:29 -05:00
qazal
1ea4dc9565 big graph init conceptual cleanup [pr] (#8090)
* keep Ops.BUFFER naming consistent [pr]

* big graph init conceptual cleanup [pr]

* make everything pass through

* pylint doesn't complain now
2024-12-06 20:07:00 +02:00
geohotstan
5184410fc3 combine get inputs and type_parse function in onnx [fixed] (#8081)
* 1 is simpler than 2

* variable name

* change error wording

* shapes for sequence type must be homogeneous

* bug fix for model benchmark

* fix comments too

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-12-06 12:34:47 -05:00
nimlgen
d1282da7e8 hcq bump alloc (#8078)
* hcq bump alloc

* hm

* nv

* typo
2024-12-06 19:19:04 +03:00
qazal
df84dc6444 unrelated test fixups from delete_lazy [pr] (#8088)
* unrelated test fixups from delete_lazy [pr]

* fine if it's scheduled later
2024-12-06 17:31:02 +02:00
geohotstan
0b7c44677d Fix uint8 cast underflow (#6305)
* hacky fix for cast

* only float to uint8

* limit to float -> uint8

* touchup alu cast test

* improve tests and support more float to unsigned casts

* del one repeated test

* del 1 more repeated test

* try removing expected failure test

* hmmm try 1 more

* skip tests for flakiness

* uint64 super flaky

* clean up

* grammar

* just match numpy

* why is CI numpy different from local numpy

* increase verbosity

* try

* try2

* try3

* try4

* yeah idk

* new direction

* try again

* just don't support uint32 and uint64

* done?

* oops

* comment

* documentation

* it is what it is

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-12-06 10:25:03 -05:00
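What "just match numpy" means in practice is roughly the comparison below (a hedged illustration; out-of-range float-to-unsigned casts are platform dependent, which is also why the PR leaves uint32 and uint64 unsupported):

```python
import numpy as np
from tinygrad import Tensor, dtypes

# Hedged sketch: after the fix, casting negative floats to uint8 should agree
# with numpy's astype on the same data; no values are hard-coded because
# out-of-range casts vary by platform.
vals = np.array([-1.0, 0.5, 255.9], dtype=np.float32)
print(vals.astype(np.uint8))
print(Tensor(vals).cast(dtypes.uint8).numpy())
```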