Commit Graph

3755 Commits

Author SHA1 Message Date
George Hotz
44a67bf783 constant folding (#3675)
* constant fold

* bool math

* fix ptx
2024-03-10 14:47:24 -07:00
George Hotz
25aede6fd9 truncate for exec_alu (#3674) 2024-03-10 14:19:04 -07:00
Francis Lata
957ae9b594 Fix Tensor's __repr__ for printing out grad (#3673)
* update check for Tensor's __repr__ with grad

* add test for repr with grad bugfix
2024-03-10 17:04:29 -04:00
George Hotz
0f16729023 RDNA3: restore launch bounds (#3672)
* bring launch bounds back

* works

* that second flag didn't do anything

* fix linter
2024-03-10 10:27:52 -07:00
chenyu
d7452c2a20 clean up llvmir builder (#3671)
```
_block -> block
builder._block.module -> builder.module
var_dtype -> dtype
```
2024-03-09 21:19:36 -05:00
George Hotz
1143c62519 tensor.py touchups (#3667)
* tensor.py touchups

* put back
2024-03-09 16:12:20 -08:00
George Hotz
69ca7f7bf9 changes for teenygrad (#3665)
* changes for teenygrad

* upd

* simpler test
2024-03-09 15:30:34 -08:00
Quentin Wach
89b8b5d549 Fix missing import. (#3666) 2024-03-09 14:55:23 -08:00
Maximilian Wolf
8ae85b2cf5 add inference_mode context manager with decorator support (#3621)
* add inference_mode context manager with decorator support

* change val to mode for train and inference_mode

* fix wrong rename
2024-03-09 08:38:26 -08:00
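
A hedged sketch of the two usage forms the commit title names; the exact spelling `Tensor.inference_mode` and the import path are assumptions based on the commit message, not verified against the merged API:

```python
from tinygrad import Tensor

x = Tensor.rand(4, 4, requires_grad=True)

# context-manager form: ops inside should not build a grad graph
with Tensor.inference_mode():
  y = (x * 2).sum()

# decorator form: the whole function runs in inference mode (assumed API)
@Tensor.inference_mode()
def evaluate(t: Tensor) -> Tensor:
  return (t * 2).sum()
```
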
Obada Khalili
b5cbf1792a Fix Tensor.cumsum when axis of length 0 is selected (#3473)
* fix Tensor.cumsum when axis of length 0 is selected

* add cumsum regression test

* define padding left size in a separate line
2024-03-09 08:26:41 -08:00
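
A minimal sketch of the case #3473 fixes: cumsum along a zero-length axis should produce an empty result instead of erroring. Illustrative only, not the regression test from the PR:

```python
from tinygrad import Tensor

t = Tensor.zeros(3, 0)
out = t.cumsum(1)           # axis 1 has length 0
assert out.shape == (3, 0)  # empty along that axis, same shape as the input
```
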
chenyu
915f98791c use custom KernelOptError in kernel opt (#3661)
be more specific about invalid kernel opts, and use that in test_linearizer_failures.

make BEAM kernel search work even with assertions disabled.

`BEAM=2 python3 -O examples/llama.py  --temperature=0 --count=10 --prompt="Hello." --timing`
2024-03-08 15:36:16 -05:00
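
A hedged sketch of what the custom exception enables: a search loop can catch `KernelOptError` specifically instead of relying on a bare assertion, which also keeps BEAM working under `python3 -O`. Import path and names are assumptions based on the commit message:

```python
from tinygrad.codegen.kernel import Kernel, KernelOptError, Opt

def try_opt(k: Kernel, opt: Opt) -> bool:
  try:
    k.apply_opt(opt)   # raises KernelOptError on an invalid opt, even with -O
    return True
  except KernelOptError:
    return False       # skip this opt and keep searching
```
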
George Hotz
ac02e7347d ptx timing vs cuda timing (#3659) 2024-03-08 10:17:49 -08:00
uuuvn
daa4034e80 No more metal flakiness (#3643) 2024-03-08 08:54:44 -08:00
chenyu
e25879d50e don't get new var_val for the same ast in fuzz_linearizer (#3657)
fixed result comparison for kernels with variables
2024-03-08 09:49:24 -05:00
chenyu
1130c73844 add FUZZ_NTH to fuzz_linearizer (#3656)
* add FUZZ_NTH to fuzz_linearizer

also update tests in test_linearizer_failures to not just run on METAL

* update failures for HIP/HSA

* test_failure_21 LLVM PADTO
2024-03-08 09:16:49 -05:00
David Hou
9f66dcf718 PolynomialDecayWithWarmup + tests (#3649)
* working PolynomialDecayWithWarmup + tests.......

add lars_util.py, oops

* keep lars_util.py as intact as possible, simplify our interface

* whitespace

* clean up

* clean up

* asserts

* test polylr for full resnet training run

* add comment

* rename

* fix do_optim

* don't cast lr

* info

* calculate from train_files

* skip it
2024-03-07 18:53:36 -05:00
chenyu
57df8e8d82 update fuzz_linearizer (#3648)
included non-reduce kernels and kernels with variables; prints a green msg when everything passes.
creating rawbufs can fail due to a memory error, so that is now included in the failure cases.
2024-03-07 18:41:22 -05:00
chenyu
b282a45e39 fix direct store float4 with same vin (#3652)
In a kernel that stores an expanded value, the vins of a float4 can come from the same source, and in that case the source should only be removed once.
2024-03-07 18:11:50 -05:00
chenyu
a66ffec6d3 update kernel dataset to exclude the disktensor ones (#3651)
disk tensor loads contain big offsets and are not meant to be run on the GPU.

repro steps
```
time ./extra/optimization/generate_dataset.sh
gzip /tmp/sops
mv /tmp/sops.gz extra/datasets/
```
2024-03-07 17:35:19 -05:00
chenyu
fcf4a5ccf2 fix example that calls Tensor.__bool__ (#3650)
also removed `.cpu()` calls in mask_rcnn so `python3 examples/mlperf/model_spec.py` runs
2024-03-07 16:59:26 -05:00
George Hotz
6e50582e62 working to improve ptx (#3647)
* working to improve ptx

* fix compile fail
2024-03-07 12:39:31 -08:00
Zaffer
1853ec9a02 add tests for bfloat16 on HIP (#3638)
* Fix bug in login functionality

* Remove HSA backend test and add bfloat16 dtype tests that run in CI

* Skip tests on HIPCPU

* skip tests causing segfault on LLVM backend

* Exclude bfloat16 tests causing segfaults in LLVM backend

* move bf16 cast tests to only test on HIP
2024-03-07 10:45:36 -08:00
chenyu
0cef284aac fix typing FlopCounter.flops can be sint (#3646) 2024-03-07 12:49:17 -05:00
chenyu
906cc3a69b cleanup tests Device[Device.DEFAULT] is always Compiled (#3645) 2024-03-07 11:15:42 -05:00
qazal
bdd62c7fd8 make the bf16 include dynamic (#3642)
* dynamic prefix

* add common ones above

these are common dtypes

aesthetics

* regression test

fuzz it

test

* run in CI

* use .append

* faster
2024-03-07 10:31:35 -05:00
chenyu
4552248c84 fix Tensor.to preserves grad.data (#3636) 2024-03-06 21:44:49 -05:00
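
A sketch of the behavior being fixed; the target device string is a placeholder and the exact semantics are inferred from the commit title alone:

```python
from tinygrad import Tensor

x = Tensor([1.0, 2.0], requires_grad=True)
x.sum().backward()
y = x.to("CLANG")          # hypothetical target device
assert y.grad is not None  # the move should carry grad.data along, not drop it
```
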
chenyu
d33311ebe0 remove parens of ALU if it has associative property (#3635)
SUB needs to be excluded since it's possible to have (const - (const - const)), as in test/test_ops.py::TestOps::test_cos,
in which case the parens of the children cannot be removed
2024-03-06 21:12:11 -05:00
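
A hedged sketch of the rendering rule from #3635: when a child node shares its parent's op and that op is associative, the child's parentheses can be dropped; SUB must stay out of that set because a-(b-c) != (a-b)-c. Names here are illustrative, not the merged code:

```python
ASSOCIATIVE = {"ADD", "MUL", "MAX"}  # SUB deliberately excluded

def render_child(parent_op: str, child_op: str, child_src: str) -> str:
  # (a+b)+c can render as a+b+c, but a-(b-c) must keep its parens
  return child_src if parent_op == child_op and parent_op in ASSOCIATIVE else f"({child_src})"
```
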
chenyu
fe6b6e38c1 remove parentheses of GEP if it's from SSA (#3634)
fixed some "bracket nesting level exceeded maximum of 256" errors
2024-03-06 20:22:46 -05:00
David Hou
0afaf70d57 lars optimizer + tests (#3631)
* lars optimizer + tests

* fix skip list!

* use id to compare in skip list

* go back to using set

* Tensor(bool) * Tensor(bool) is and

* don't lint external/mlperf_resnet

* whitespace

* add external_test_optim to opencl tests

* give mlperf task a name

* mlperf under onnx

* remove track_gnorm

* contiguous instead of realize

* assert momentum and weight decay positive

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-03-06 18:11:01 -05:00
chenyu
b2e92d44fa skip METAL sin test in test_dtype_alu (#3633)
revert this part of #3629. this is flaky
2024-03-06 17:29:19 -05:00
chenyu
8f10bfa2ff ban __bool__ on Tensor (#3632)
* ban __bool__ on Tensor

avoid misuse

* test case

* fix tests

* fix more tests
2024-03-06 17:12:35 -05:00
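
The misuse being avoided, as a short sketch: truth-testing a lazy Tensor has no single right meaning (any? all? nonzero?), so `__bool__` now raises and callers must reduce to a host scalar explicitly. The replacement idiom below is illustrative:

```python
from tinygrad import Tensor

t = Tensor([1, 2, 3])
# if t: ...             # banned: raises instead of guessing any()/all() semantics
if t.sum().item() > 0:  # explicit reduction to a host-side scalar instead
  print("non-zero sum")
```
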
George Hotz
81baf3eed3 bring ptx back (#3623)
* bring ptx back

* ptx back

* fix define var

* fix a few bugs

* bugfixes

* fixes

* fix llvm bug

* fix test bug
2024-03-06 13:34:21 -08:00
chenyu
c270d54c32 update test_dtype_alu for METAL (#3629) 2024-03-06 14:55:19 -05:00
qazal
abc5f3a6a0 hip bf16 hotfix (#3630)
* hip bf16

* remu dev mac

* Revert "remu dev mac"

This reverts commit 465069a0dc3c7f2045f3348b312a1dcbf1587acd.

* skip disk tests in CI

* bring float8 back
2024-03-06 11:42:30 -08:00
chenyu
bc2a13a5f7 test case to show clang and python doing math in double (#3628) 2024-03-06 13:49:03 -05:00
George Hotz
568353fa84 hotfix: bump line count to 6500 2024-03-06 07:52:18 -08:00
Elias Wahl
a1507c7fd4 Fix Tensor.dropout() with multigpu (#3619)
* Tensor.rand with multilazybuffer

* remove recursive + test

* whitespace

* another whitespace. Sorry

* remove else

* Canonicalize multidevice tuple + Remove src
2024-03-05 18:26:21 -05:00
Jungwan Woo
e5ee6bb2bd fix outdated url in showcase doc (#3624) 2024-03-05 14:44:40 -08:00
George Hotz
8500265561 this mem fault still happening (#3620)
* this mem fault still happening

* smaller

* that print doesn't work

* overflows test

* hip doesn't uses_ptr_arithmetic

* only with locals

* test overflow new name

* it's not ptr arith

* simpler

* simple repro

* old compiler

* simpler

* put that back
2024-03-05 10:39:32 -08:00
chenyu
3c3f846c45 tinybox benchmark with HSA (#3603)
* tinybox benchmark with HSA

* torch cuda init can fail

* no TORCHCUDA

* print torch version

* LD_PRELOAD="/opt/rocm/lib/libhsa-runtime64.so"
2024-03-05 11:03:52 -05:00
George Hotz
f500be1313 out of bounds access caused by launch bounds (#3615)
* lin overflow

* remove launch bounds

* remove launch bounds infra

* oops, fix bufs type
2024-03-05 06:34:00 -08:00
qazal
eb83e2d3a0 decouple buffer mutability from cstyle (#3617)
* buffer mutability as an arg

* update test_uops
2024-03-05 06:20:59 -08:00
chenyu
3275260c98 Revert "test: add failing bfloat16 test case for metal backend (#3481)" (#3618)
This reverts commit 1e12a2ae80.
2024-03-05 09:08:42 -05:00
Skosh
1e12a2ae80 test: add failing bfloat16 test case for metal backend (#3481)
* test: add failing bfloat16 test case for metal backend

* test: move bfloat 16 test to dtypes test
2024-03-05 08:44:54 -05:00
chenyu
957e9800f1 llama + beam to mac benchmark, full cifar to nvidia benchmark (#3612)
would merge if it's also ~1 minute. btw why is gpt2 beam not slower in the first beam run?
2024-03-04 21:35:57 -05:00
chenyu
282bbd5acb check the input length into argfix (#3610)
* check the input length into argfix

it's possible to overlook setting a keyword for kwargs, and argfix would silently truncate the input

* add test
2024-03-04 19:50:17 -05:00
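
A hedged sketch of the argfix pattern and the new length check from #3610; the assert message is illustrative rather than the exact one merged:

```python
def argfix(*x):
  if x and isinstance(x[0], (tuple, list)):
    # without this check, extra positional args after a tuple were silently dropped
    assert len(x) == 1, f"bad arg {x}: cannot mix a tuple/list with more positional args"
    return tuple(x[0])
  return x

assert argfix(2, 3) == (2, 3)
assert argfix((2, 3)) == (2, 3)
```
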
Elias Wahl
7db6dd725d multilazybuffer fix (#3609) 2024-03-04 17:36:23 -05:00
chenyu
c3b8d285aa cleanup uops (#3605)
use `is` to compare with enums, remove long lines, and make things slightly more compact
2024-03-04 11:03:14 -05:00
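
Why `is` for enum comparison, in brief: Enum members are singletons, so identity is both the idiomatic check and immune to surprising `__eq__` overloads. A self-contained illustration (the UOps members shown are placeholders):

```python
from enum import Enum, auto

class UOps(Enum):
  ALU = auto(); LOAD = auto(); STORE = auto()

op = UOps.ALU
assert op is UOps.ALU  # preferred: identity on a singleton member
assert op == UOps.ALU  # also true, but `is` states the intent directly
```
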
qazal
94679322a3 simpler float4 direct store and locals support (#3592)
* swap vins instead

* delete the upcast

* leave it to remove_childless try 1

* Revert "leave it to remove_childless try 1"

This reverts commit bf25e935f8.

* try 2, simpler

* Revert "try 2, simpler"

This reverts commit d2472af711.

* add note
2024-03-04 06:28:28 -08:00
nimlgen
3db826e195 hsa in lin opts (#3602) 2024-03-04 06:17:32 -08:00