Commit Graph

7181 Commits

Author SHA1 Message Date
chenyu
0e57152dbb clean up test_uop_symbolic [pr] (#8165)
removed old `Node` references
2024-12-11 14:13:19 -05:00
chenyu
5eadae204b test multi device rand with manual_seed (#8164) 2024-12-11 13:11:31 -05:00
Maxim Zakharov
e53a5bf0c3 StableDiffusion UI - convenient send via Enter (#8160) 2024-12-11 19:05:24 +01:00
qazal
047a6dabc3 prereq for scheduler contiguous_child [pr] (#8163)
* the whole context is fine here [pr]

* fix that
2024-12-12 02:02:22 +08:00
ignaciosica
3a8e8ac6c2 remove dead code (#8161) 2024-12-11 12:07:19 -05:00
George Hotz
8f4299fcc8 hotfix: suppress shutdown errors in CLProgram 2024-12-11 08:08:32 -08:00
Ahmed Harmouche
a73e3677d0 Test linearizer on webgpu (#8159)
* Test linearizer on wgpu

* Skip tests due to exceeded dims
2024-12-11 17:03:26 +01:00
qazal
b894657aa7 assert the same things without mutating or accessing internal ops state [pr] (#8157)
* don't mutate internal state in test_lazybuffer

* fix test_schedule internals

* save time

* third si

* fine sometimes buffer_view isn't there
2024-12-11 22:01:27 +08:00
qazal
63de8f2208 late scheduler context builder [pr] (#8155) 2024-12-11 19:59:39 +08:00
chenyu
d462f8ace0 use HALF in cifar wino benchmarks (#8153)
more representative as it hits tensor cores on tinyboxes
2024-12-10 20:21:00 -05:00
George Hotz
c8e7707a7e hotfix: disable flaky move tensor test 2024-12-10 17:11:21 -08:00
chenyu
155f7df599 lower test_gemm_4096 expectation on green (#8152)
getting 119 sometimes, so lowered to 115
2024-12-10 18:05:12 -05:00
chenyu
c4be1529cf update test for Tensor.softplus (#8150)
test beta and extreme inputs.
to pass big inputs, it needs to support `threshold`, which needs a fix on backward that we punt until the new gradient api
2024-12-10 17:48:02 -05:00
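The `threshold` behavior mentioned above follows the usual softplus convention (as in PyTorch's `nn.Softplus`): once `beta*x` exceeds the threshold, fall back to the identity so `exp()` never overflows on big inputs. A minimal numpy reference sketch, assuming those semantics (not tinygrad's implementation):

```python
import numpy as np

def softplus_ref(x: np.ndarray, beta: float = 1.0, threshold: float = 20.0) -> np.ndarray:
    # softplus(x) = (1/beta) * log(1 + exp(beta*x))
    # for beta*x > threshold we return x directly, so large inputs never hit
    # exp() overflow -- this is why the "big input" test needs threshold support
    bx = beta * x
    return np.where(bx > threshold, x, np.log1p(np.exp(np.minimum(bx, threshold))) / beta)

print(softplus_ref(np.array([-2.0, 0.0, 50.0]), beta=2.0))  # ~[0.009, 0.347, 50.0]
```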
Ahmed Harmouche
a8cfdc70ed Run more webgpu tests (#8142) 2024-12-10 23:20:04 +01:00
Ahmed Harmouche
ed7318a3f5 Fix puppeteer install (#8148)
Clean npm cache before puppeteer install
2024-12-10 23:06:33 +01:00
George Hotz
a1b3724ff8 prepickle process replay [pr] (#8147) 2024-12-10 11:46:36 -08:00
George Hotz
aa3b094334 changes from delete lazy [pr] (#8146)
* changes from delete lazy [pr]

* test tweak
2024-12-10 11:06:17 -08:00
chenyu
286fec115e fix Tensor.minimum for int (#8145)
use invert instead of just neg. consolidate min, argmin, and minimum

also update maximum to not apply the mid point for int
2024-12-10 13:34:41 -05:00
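The "invert instead of just neg" trick above expresses integer `min` in terms of `max`: negation breaks on the most negative value in fixed-width arithmetic, while bitwise invert does not. A plain-Python illustration of the identity (the actual change lives in tinygrad's `Tensor.minimum`; this sketch only shows why invert is safe for ints):

```python
def min_via_neg(a: int, b: int) -> int:
    # float-style identity: min(a, b) == -max(-a, -b)
    # in fixed-width int arithmetic, -INT_MIN overflows (Python ints hide this)
    return -max(-a, -b)

def min_via_invert(a: int, b: int) -> int:
    # integer-safe identity: min(a, b) == ~max(~a, ~b), since ~x == -x - 1
    # invert never leaves the representable range, so INT32_MIN works too
    return ~max(~a, ~b)

INT32_MIN = -(2**31)
assert min_via_invert(INT32_MIN, 5) == INT32_MIN
assert min_via_invert(7, -3) == -3
```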
Ahmed Harmouche
71dd222f66 Fix setitem on wgpu (#8144) 2024-12-10 19:34:25 +01:00
qazal
b69fea6ae5 process replay without global list [pr] (#8143) 2024-12-11 02:20:09 +08:00
qazal
08405279f9 pre merge_views+ops_folding refactor [pr] (#8140)
* simple start

* valid early

* more dumb things removed

* don't ever use base

* cleaner
2024-12-11 00:55:00 +08:00
qazal
56c84cee29 derive COPY nbytes late in realize [pr] (#8137)
* derive COPY arg later in realize [pr]

* can assume no implicit casts or movement ops here
2024-12-10 22:04:07 +08:00
qazal
2d26b011ac allow VIEW on BUFFER [pr] (#8136)
* allow VIEW of BUFFER [pr]

* base it later

* better diff

* base shouldn't exist anywhere after merge_views
2024-12-10 21:29:38 +08:00
qazal
3a2658efbd small changes to refine the delete_lazy diff (#8134)
* _view -> view

* const_arg things
2024-12-10 18:46:10 +08:00
qazal
6d33da09c9 split scalar getitem tests into correctness and optimization [pr] (#8133) 2024-12-10 18:18:46 +08:00
qazal
7436ebef2f spend lines on const_arg for tensor and scheduler [pr] (#8132)
* spend lines on const_arg for tensor and scheduler [pr]

* simple test_const_arg

* base on lazy
2024-12-10 18:07:35 +08:00
chenyu
917deb88a4 make //0 return 0 in python_alu (#8131)
on master it raises because it cannot truncate inf to int, which crashes valid expressions like `(t > 0).where(1//t, t)`.
2024-12-09 19:32:06 -05:00
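For context, the expression the commit cites can be reproduced roughly like this with the tinygrad `Tensor` API (values made up; the printed result is what the fix is expected to allow):

```python
from tinygrad import Tensor

# t contains a zero; the masked-out lane still evaluates 1//0 in python_alu,
# which previously raised while truncating inf to int. With //0 defined as 0,
# the expression realizes fine.
t = Tensor([3, 0, -2])
out = (t > 0).where(1 // t, t)
print(out.tolist())  # expected something like [0, 0, -2]
```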
George Hotz
f83d715f41 move checks into compile3, delete compile2 [pr] (#8127)
* move checks into compile3 [pr]

* test_vs_onnx

* test v torch works

* float16 won't compile on compile3

* actually delete compile2
2024-12-09 14:21:42 -08:00
chenyu
358287959b fix pow of int to negative const int (#8129)
it should return an int
2024-12-09 17:20:18 -05:00
chenyu
12f7d284e0 failed test case for int pow (#8128)
also updated test_ops so that non-float results are compared with `assert_equal`. removed `test_multinomial`, which is tested better in test_randomness
2024-12-09 16:15:09 -05:00
qazal
80de06c8b9 scheduler ops_folding from delete_lazy (#8124)
* scheduler diff from delete_lazy

* test_std_mean

* late fold copy of CONST

* clang const is fine
2024-12-10 00:36:01 +08:00
George Hotz
87c360c4b5 hotfix: add --size 8B to llama3 2024-12-09 07:53:20 -08:00
George Hotz
a773c5a571 hotfix: default llama3 is 1B with download_model 2024-12-09 07:23:35 -08:00
Ahmed Harmouche
c6277fce09 Remove f16 decompression lib from SD compile.py (#8121)
* Remove f16-to-f32-gpu lib, use tinygrad exported decompression

* No need to create new instance
2024-12-09 14:09:00 +01:00
qazal
22d99f1421 test_pickle_realized_tensor actually tests pickle [pr] (#8119)
* test_pickle_realized_tensor actually tests pickle [pr]

* clang
2024-12-09 17:26:19 +08:00
chenyu
ccf54c2375 fix argmax/min on int32 min (#8118) 2024-12-09 02:29:23 -05:00
chenyu
c814de2dd4 fix bitwise_not for signed int (#8117)
-1 is correct because 2**32-1 is not within int32 range, so in some cases clang casts the whole thing to uint32
2024-12-09 02:02:51 -05:00
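A small plain-Python sketch of the constant choice described above, assuming the not is lowered to an xor with an all-ones constant: the literal `-1` fits in int32, while the equivalent mask `2**32-1` does not, which is what can push clang into unsigned arithmetic (the rendered-C behavior is the commit's claim; the helper below just shows the identity):

```python
def bitwise_not_i32(x: int) -> int:
    # ~x == x ^ -1 for any two's-complement width; the -1 literal fits in int32,
    # whereas a 0xFFFFFFFF mask would not and could be promoted to unsigned in C
    return x ^ -1

assert bitwise_not_i32(0) == -1
assert bitwise_not_i32(-1) == 0
assert bitwise_not_i32(5) == ~5
```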
ttomsa
e22d7b6fb0 fix var vmax inside special (#8116) 2024-12-09 01:16:08 -05:00
qazal
0033012096 init noop changes from delete_lazy [pr] (#8115) 2024-12-09 01:42:05 +08:00
qazal
5dd61035f7 revert VALID early folding for now (#8114)
This reverts commit 4074f52317.
2024-12-09 00:34:24 +08:00
qazal
69e48da961 set NOOPT in test_avg_pool3d_failure (#8112)
* set NOOPT=0 in test_avg_pool3d_failure

* noopt should still pass
2024-12-08 10:48:29 -05:00
nimlgen
3a7d64b96c hcq remove update from args state (#8104)
* hcq remove update from args state

fix amd

ugh

qcom?

qcom ops

ops

qcom fix

qcom texture info

fx

qcom fix

qcom

qcom, sry

minor

works

* remove old code

* unrelated+sint

* qcom

* typing

* rm comments
2024-12-08 15:22:05 +03:00
nimlgen
d6e66095fd hcq buffer is a class (#8106)
* hcq buffer is a class

* qcom

* no from_mv in qcom

* remove qcombuffer

* useless cast

* mypy

* qcom fix

* _md -> meta
2024-12-08 13:29:43 +03:00
chenyu
b9c977f1c8 clean up bounds in Tensor.shard (#8107) 2024-12-07 17:19:43 -05:00
geohotstan
f8294b3bda add avg pool 3d failure test (#8105)
* add test

* try simplify test case

* add TODO comment
2024-12-07 16:34:38 -05:00
qazal
6be388be86 failing test for const folding breaking indexing [pr] (#8103) 2024-12-07 19:55:02 +08:00
nimlgen
8b1fa9cb7d nv hcq queue touchups (#8102) 2024-12-07 14:09:38 +03:00
qazal
4074f52317 VALID early folding (#8100)
* fold valid

* :)

* fix test_verify_ast

* keep symbolic working
2024-12-07 18:37:47 +08:00
qazal
07b6d5cf63 assign early folding (#8093)
* assign early folding [pr]

* move to to_si

* -

* fix generate_dataset

* diff too big

* no recreation, no diff

* gzip

* new sops from tiny10

* final try
2024-12-07 17:02:55 +08:00
George Hotz
00ac0db9d4 np tensors have the memory from numpy in compile3 [pr] (#8098) 2024-12-07 14:01:51 +08:00