andresgit
00523d5656
New fix accessing elements created by padding (#2529)
* pad slice test cases, many failing
* fix failing test cases
check the mask if we are outside the base buffer
also create a multi-view if, in that case, we reshape to an empty shape
* real_offset calculation more readable
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2023-12-01 19:08:10 -05:00
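As context for the fix above, a minimal sketch of the behavior it targets (assuming the Tensor.pad/getitem API of this era of tinygrad): a slice that lands entirely in a region created by padding should read the pad value, not leak data from outside the base buffer.

```python
from tinygrad.tensor import Tensor

t = Tensor([1.0, 2.0, 3.0]).pad(((2, 2),))  # two zero elements on each side -> length 7
print(t.numpy())      # [0. 0. 1. 2. 3. 0. 0.]
print(t[:2].numpy())  # slice entirely inside the padded region: should give [0. 0.]
```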
chenyu
7d26452305
call ruff with --preview (#2522)
some checks are ignored without --preview
2023-11-30 13:59:00 -05:00
chenyu
5db0cdfbd3
support list of ints (or other Tensorable) in tensor indices (#2520)
* support list of ints (or other Tensorable) in tensor indices
* enable some index test cases
2023-11-30 12:46:33 -05:00
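A short usage sketch of what #2520 enables, assuming the getitem semantics follow numpy-style integer-array indexing:

```python
from tinygrad.tensor import Tensor

t = Tensor([[0, 1], [2, 3], [4, 5]])
print(t[[0, 2]].numpy())             # select rows 0 and 2 with a plain list of ints
print(t[Tensor([2, 1, 0])].numpy())  # the same selection with a Tensor as the index
```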
Liam
cf0c9096a9
Removing METAL Skips as CI works (#2488)
* Test metal CI
* remove metal and CI restrictions
* enable dtype tests for metal ci
2023-11-28 19:46:59 -08:00
Christopher Mauri Milan
7f01dd04f0
Apply ruff linting rules to tests (#2473)
* everything except F821
* enable F821 with noqa
* dumb fix
* fix remaining imports and (former) lambdas
* replace _ with noqa to avoid gc
2023-11-27 21:24:06 -08:00
George Hotz
9e07824542
move device to device.py (#2466)
* move device to device.py
* pylint test --disable R,C,W,E --enable E0611
* fix tests
2023-11-27 11:34:37 -08:00
George Hotz
8ff2e13550
From teeny (#2426)
* changes from teenygrad work
* support not supporting ImageDType/PtrDType
* fixups from teeny
2023-11-24 12:50:56 -08:00
George Hotz
8f89e21fca
torch and numpy don't share ops anymore (#2412)
* torch and numpy don't share ops anymore
* that should be filtered out elsewhere
* still const
* graph + enet example cleanup
* hmm, we do still need it because of symbolic
2023-11-23 16:58:10 -08:00
chenyu
d2c0035c73
add back as_strided, move rebuilt mops to extra (#2344)
* add back as_strided, move rebuilt mops to extra
* negative stride for ops_cpu
* Revert "negative stride for ops_cpu"
This reverts commit a13b6815ac.
* skip that
* style
2023-11-17 14:34:30 -05:00
George Hotz
1d5501594e
force rebuild of ocelot (#2334)
* force rebuild of ocelot
* SzymonOzog gpuocelot
* delete that
* downgrade that
* non parallel
* force rebuild
* use llvm
* nauto
* less mem maybe
* print test
* helper_test_exception skip CUDACPU
* helper_test_exception
* shippable
2023-11-16 20:44:14 -08:00
George Hotz
3baaf298d6
two stage cumsum in tensor.py (#2331)
* two stage cumsum in tensor.py
* 2 more kernels for llama cumsum
* gpt-2 and llama use fast multinomial
2023-11-16 12:09:53 -08:00
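The idea behind a two-stage cumsum, sketched here in numpy as an illustration of the general technique (not the tinygrad kernels themselves): scan within fixed-size blocks first, then add the running total of all preceding blocks.

```python
import numpy as np

def two_stage_cumsum(x: np.ndarray, block: int = 4) -> np.ndarray:
    # stage 1: cumsum within fixed-size blocks
    n = len(x)
    pad = (-n) % block
    xb = np.pad(x, (0, pad)).reshape(-1, block)
    within = xb.cumsum(axis=1)
    # stage 2: cumsum of the block totals, shifted by one block and broadcast back
    totals = np.concatenate([[0], within[:, -1].cumsum()[:-1]])
    return (within + totals[:, None]).reshape(-1)[:n]

x = np.arange(1, 11, dtype=np.float32)
assert np.allclose(two_stage_cumsum(x), x.cumsum())
```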
chenyu
27f4c26312
fix getitem slice when end < start (#2329)
2023-11-16 11:20:27 -05:00
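The expected semantics for #2329, assuming numpy-style slicing: an end before the start yields an empty tensor instead of an error.

```python
from tinygrad.tensor import Tensor
import numpy as np

t = Tensor([1, 2, 3, 4])
print(t[3:1].shape)                 # (0,), matching numpy
print(np.array([1, 2, 3, 4])[3:1])  # [] -- the behavior being matched
```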
chenyu
f1f863c953
allow 0-dim array to broadcast into zero shape tensor (#2315)
* allow 0-dim array to broadcast into zero shape tensor
* not in
2023-11-15 13:12:21 -05:00
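A sketch of the case #2315 allows, assuming Tensor can wrap a zero-sized numpy array and that Tensor(1.0) produces a 0-dim tensor: the scalar broadcasts against a shape containing zero.

```python
from tinygrad.tensor import Tensor
import numpy as np

empty = Tensor(np.zeros((0, 3), dtype=np.float32))
out = empty + Tensor(1.0)  # 0-dim operand broadcasts into the zero-sized shape
print(out.shape)           # (0, 3)
```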
chenyu
123a0b86b2
support zero in shape (#2303)
* zero in shape start
* no assert for that
* if output size is 0, return without exec
* tweak
* strides
* reduce over non-zero
* shrink and expand
* fix import
* test_elementwise where
* cannot reshape from size 0 to size 1
* compiled backend reduce over 0
* zeros for numpy
* reduce over 0 and keepdim resulted in 1
* reduce empty set default values
* compare with same input
* pad test case
* cat test case
* torch does not support that?
2023-11-15 11:57:48 -05:00
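A few of the zero-in-shape cases the bullets above cover, as a hedged sketch (the exact reduce-over-empty defaults may vary by op):

```python
from tinygrad.tensor import Tensor

z = Tensor.zeros(0, 4)
print((z + 1).shape)          # (0, 4): elementwise over an empty tensor stays empty
print(z.sum(axis=0).numpy())  # reduce over the zero axis -> identity values [0. 0. 0. 0.]
print(z.reshape(4, 0).shape)  # reshaping between size-0 shapes is allowed; 0 -> 1 is not
```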
chenyu
175cdbe815
fix pad None with value (#2308)
2023-11-14 23:57:05 -05:00
George Hotz
78623ba204
two simple tests
2023-11-10 16:16:06 -08:00
George Hotz
85d26ddc36
uops loop removal (#2262)
* remove the loop
* cleanups
* tests failing still
* global_loop_ctx wasn't needed
* replace_op is cleaner
* minor opt
* cast opt was wrong
* uop_num
* uop num was dumb
* tuplize_uops
* torch tests
* fix test_uops
2023-11-10 15:24:47 -08:00
George Hotz
38b7f5a7fd
less phi, proper phi (#2241)
* less phi, proper phi
* disable flaky whisper test
2023-11-08 16:13:43 -08:00
chenyu
719a97b337
fix IMAGE=2 failed with NOOPT=1 (#2209)
* IMAGE=2 failed with NOOPT=1
* fix it
2023-11-05 13:16:37 -08:00
chenyu
f582ec56d5
Replace (getenv("CI", "") != "") with helpers.CI (#2213)
2023-11-03 15:20:44 -07:00
George Hotz
b245f1307e
add exp2 (#2192)
2023-10-31 17:48:42 -07:00
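Usage of the new op, assuming it mirrors the usual definition exp2(x) = 2**x:

```python
from tinygrad.tensor import Tensor

x = Tensor([0.0, 1.0, 3.0])
print(x.exp2().numpy())  # [1. 2. 8.]
```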
George Hotz
87b714b8cb
split test_conv2d
2023-10-18 14:00:50 -07:00
George Hotz
15da96f393
print test durations and add speed (#2107)
* print test durations
* decrease sizes to increase speed
* faster
* GPU/CLANG onnx in separate runner
* test split, move ONNX CPU CI
* simpler tests
* simpler uops test
* faster
* less cuda apt
* running ninja install
* apt install
* split fancy indexing
2023-10-18 13:46:42 -07:00
George Hotz
c5edb3c374
train value net, improve API, add BCE (#2047)
* api cleanups, BCE losses
* valuenet
* fixup examples
* learning okay
* add valuenet runner
* net improvements
* net improvements
* 40% win rate
2023-10-12 07:56:38 -07:00
geohotstan
8d6cecb25c
Torch eq fix (#1562)
* init
* Revert "init"
This reverts commit 682bf2073a.
* kids dont do drugs
* one way to fix
* resolve merge conflict
* no more or
* clean up
2023-10-11 12:57:11 -07:00
George Hotz
adab724caa
schedule2, keep the tests working with small changes (#1932)
* lazy cleanups
* ast functions take in LazyOps
* op instead of self.op
* _base for mops
* fix contiguous
* start schedule
* test_schedule
* fix openpilot
* more tests
* bugfix and test skip
* work
* make sure things get freed
* fix zerosized tensors
* fix failing test
* fix ceil and friends
* fix openpilot
* disable training
* disable test collectives
2023-09-28 09:14:43 -07:00
geohotstan
e36148b1ce
Make __getitem__ TINYer (#1661)
2023-09-02 23:01:01 -04:00
George Hotz
cd844ec4b2
remove Token class (#1723)
* no fusion
* no float4 grouping
* mulacc fusion is fine. remove uop_alu
* fully remove get_grouped_maybe_float4
* removed that test
* that's not float4 anymore
* disable failing arm64
* metal ops pass tokenless
* fix wmma
* update test_uops with new style
* fix gep
* fix float4 store
* fix float4 store more
* cuda tests pass
* disable broadcast pow
* fix ptx
* reenable arm64
* bring cse back
* don't cache the acc
* fix ptx bug
2023-09-01 12:53:07 -07:00
George Hotz
458eb89463
minor changes from prerender (#1734)
2023-09-01 10:04:47 -07:00
George Hotz
e3a062ad17
real matvec test
2023-08-31 17:27:25 -07:00
George Hotz
a6d842af7a
move device to ops (#1646)
* move device to ops
* mlops types
* 2 lines
2023-08-23 08:30:17 -07:00
nimlgen
a65ae1198b
do replace div->mul for non-floats (#1644)
2023-08-23 07:34:31 -07:00
George Hotz
db8344ab83
add noalias to llvm (#1622)
2023-08-22 09:26:01 -07:00
George Hotz
c64c47a6ae
test arange simple
2023-08-21 20:16:17 -07:00
Umut Zengin
35bf21276f
Argmax/Argmin Feature (#1576)
* implemented argmax and argmin
* lint
* lint
* match torch behaviour
* format
* removed flip
2023-08-20 18:46:46 -07:00
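A usage sketch of the feature, assuming the torch-matching semantics the commit targets (a flattened index when no axis is given):

```python
from tinygrad.tensor import Tensor

t = Tensor([[3, 1, 2], [0, 5, 4]])
print(t.argmax().numpy())        # 4: index into the flattened tensor, as in torch
print(t.argmax(axis=1).numpy())  # [0 1]
print(t.argmin(axis=0).numpy())  # [1 0 0]
```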
geohotstan
a293c18d34
Gather bugfix (#1561)
2023-08-16 19:53:14 -04:00
geohotstan
8763037f0e
Fancy indexing is fancy wow and gather thing (#1399)
2023-08-16 18:35:49 -04:00
nimlgen
b6937acb7e
fix casting behavior for interpreted buffers (#1525)
2023-08-13 19:21:37 -07:00
George Hotz
38fe84d92b
cleanup mlops (#1521)
* cleanup mlops
* that line belongs there
2023-08-10 19:53:28 -07:00
geohotstan
07b79f210f
llvmir support for bool <-> float casting (#1492)
2023-08-09 13:12:52 -04:00
Jacky Lee
ef5f648e2f
Tensor.scaled_dot_product_attention to match torch, used in LLaMA, and tested (#1502)
* Implement scaled_dot_product_attention and test
* Support attn_mask
* Support is_causal too
* Use in llama
* Don't forget to reshape
* Set requires_grad=False for causal
* Remove staticmethod
* Remove extra spaces
2023-08-08 23:27:13 -07:00
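A hedged usage sketch, assuming the torch-style signature (key, value, attn_mask=None, is_causal=False) the bullets above describe:

```python
from tinygrad.tensor import Tensor

# (batch, heads, seq_len, head_dim), as in torch's F.scaled_dot_product_attention
q, k, v = [Tensor.randn(1, 4, 8, 16) for _ in range(3)]
out = q.scaled_dot_product_attention(k, v, is_causal=True)
print(out.shape)  # (1, 4, 8, 16)
```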
George Hotz
d24f936501
just cmplt (#1493)
* just cmplt
* fix maximum
* don't save, there's no backward
* ugh, no slot either
* eq is a scam
2023-08-08 13:58:10 -07:00
nimlgen
932dad1a2b
fix cast bool->float in llvmir (#1480)
Closes #1479
2023-08-07 21:30:51 -07:00
Diogo
d7d1011f1e
Add WEBGPU tests to CI (#1463)
* webgpu tests
* assert device is webgpu
* missed env set
* exclude failing ci tests
* ignore test file
* changed acc for adam test
2023-08-06 10:32:01 -07:00
Francesco Castelli
579f4615a0
Add assert for wrong matmul/dot shapes (#1438)
2023-08-04 18:16:56 -04:00
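What #1438 guards against, as a sketch (assuming the check is a plain assert, per the title):

```python
from tinygrad.tensor import Tensor

try:
    Tensor.randn(2, 3) @ Tensor.randn(4, 5)  # inner dims 3 and 4 don't match
except AssertionError as e:
    print("caught shape mismatch:", e)
```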
Umut Zengin
52db7d7435
inf, -inf support for pad (#1436)
2023-08-04 15:05:25 -04:00
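What #1436 enables, assuming the pad(arg, value=...) signature added by the const-pad commits below:

```python
from tinygrad.tensor import Tensor

t = Tensor([1.0, 2.0])
print(t.pad(((1, 1),), value=float("-inf")).numpy())  # [-inf 1. 2. -inf]
```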
Umut Zengin
8889821547
Const pad support to pad2d and slice (#1392)
* slice to pad2d migrate
* Gain line
* Mypy happy
* Mypy happy
* Revert
* whitespace
2023-08-02 08:58:52 -07:00
Diogo
ba5e3818a0
Limit dims based on max size (#1390)
* working
* whitespace
* changed defaults to None
* linter
* last linter error
2023-07-31 19:18:19 -07:00
Umut Zengin
0de5f20970
Re-open constant pad support to Tensor.pad (#1388)
* Added const padding support to .pad
* Linter
2023-07-31 17:08:57 -07:00
wozeparrot
32d1afa4b5
feat: correct case when base is 0 (#1360)
2023-07-27 13:53:38 -04:00