Commit Graph

364 Commits

Author SHA1 Message Date
chenyu
7c8fe0fe47 skip interpolate tests for PYTHON=1 (#5664) 2024-07-23 18:47:15 -04:00
George Hotz
e3f00ac77d Fix cuda tc emu test (#5663)
* fix acc folding for NV tensor cores

* fix correctness of reduce_before_expand

* fix test emulated CUDA tensor cores

* test_gemm_fp16 on some devices
2024-07-23 15:04:25 -07:00
qazal
fdfc0015a7 [run_process_replay] for opencl/openpilot (#5009)
* lil reset script

* find the prg

* use lower_schedule_item

* add process replay back

* cleanups
2024-07-18 19:42:33 +03:00
George Hotz
d3b098299d add failing regression test for image (#5540)
* add failing regression test for image

* tg type

* simpler test

* don't realize image to image casts caused issue

* simple pad
2024-07-17 17:27:18 -07:00
Alessandro Benetti
13e200b437 add strict mkdocs check (#5497) 2024-07-15 14:21:37 -07:00
qazal
40ec9410f9 simpler process replay (#5452)
* remove check_process_replay

* that can go to the top

* add assert back

* [run_process_replay]

* checkout code [run_process_replay]

* temp [run_process_replay]

* revert temp [run_process_replay]

* ahh this is why [run_process_replay]

* revert temp [run_process_replay]
2024-07-13 19:55:06 +03:00
George Hotz
955e1179fb move compile tests and merge (#5451)
* move compile tests and merge

* revert enet move, bump download cache

* oh, try setting clang
2024-07-13 08:04:46 -07:00
chenyu
9a187e6102 fix handcode_opt script (#5435)
* fix handcode_opt script

* run in ci

* real run in ci

* HALF=0
2024-07-12 20:52:28 -04:00
George Hotz
b055ece550 hotfix: bump to cache gpuocelot 2024-07-12 13:54:14 -07:00
chenyu
b17e4adb3a add -c advice.detachedHead=false to process replay git checkout (#5419)
remove the noisy `Note: switching to 'origin/master'.

You are in 'detached HEAD' state. You can look around, make experimental
changes...` in log
2024-07-12 15:13:26 -04:00
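A minimal sketch of the checkout described in #5419, assuming a Python helper drives git. `checkout_quietly` is a hypothetical name, not the actual process replay script. `-c key=value` is a top-level git option, so it has to come before the `checkout` subcommand. Setting `advice.detachedHead=false` only silences the advice text. The checkout still leaves HEAD detached.

```python
import subprocess

def checkout_quietly(ref: str, dry_run: bool = True) -> list[str]:
    # hypothetical helper: builds (and optionally runs) a checkout that
    # suppresses git's "You are in 'detached HEAD' state" advice message
    cmd = ["git", "-c", "advice.detachedHead=false", "checkout", ref]
    if not dry_run:
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return cmd

print(" ".join(checkout_quietly("origin/master")))
# git -c advice.detachedHead=false checkout origin/master
```

`dry_run` defaults to True here only so the sketch has no side effects when pasted into a shell session.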
Roelof van Dijk
6ec7dbc287 ci: parallelize uops tests (#5405) 2024-07-12 11:22:41 +03:00
qazal
b91a0ccdc3 make [run_process_replay] [no_assert] the default (#5390) 2024-07-11 22:36:59 +03:00
qazal
004366b193 context aware process replay [run_process_replay] (#5378)
* test tc as ctx var

* remove from opts

* process replay

* pop variable

* B -> Variable

* fix re-assign

* pop temp vars

* move TRANSCENDENTAL=2
2024-07-11 13:07:28 +03:00
chenyu
2396ab9b33 more transcend cleanup [run_process_replay] (#5369)
fix test name, less # noqa: E501 and removed the cast
2024-07-10 23:05:03 -04:00
chenyu
64986f949c more transcend math tests in ci (#5368)
* more transcend math tests in ci

test large inputs to trig functions that hit different reduction algorithms, and test TRANSCENDENTAL=2 for all backends

* no CUDACPU

* try that
2024-07-10 21:19:09 -04:00
Ian Paul
d5a68ae6b3 Simple abstractions3.py fix (#5343)
* abstractions3.py fix

* Add abstractions3.py to CI tests
2024-07-09 13:48:42 +03:00
chenyu
631bc974a0 raise line count limit to 8500 (#5331) 2024-07-08 14:00:28 -04:00
SnakeOnex
8c03816ae9 fix README example (#5284)
* fixed README example

* README test

* changed py -> python markdown code flags in README
2024-07-04 11:15:07 -04:00
nimlgen
57e89645cd hcq spec test (#5226)
* start hcq spec test

* more test

* fixes

* run on amd as well

* test amdgpu exec

* fix amd

* amd mockgpu support sdma timestamp
2024-07-01 17:36:37 +03:00
nimlgen
dd7eef7d71 libc defs to autogen (#5217)
* libc defs to autogen

* amd import libc

* linter

* better a bit

* remove comment, check this

* not hardcoded path
2024-06-29 14:37:33 +03:00
nimlgen
b4c49ae3fa remove cudacpu in favour of mockgpu (#5225)
* remove cudacpu in favour of mockgpu

* remove unused import

* not used as well
2024-06-29 11:05:16 +03:00
qazal
3af17849bf safely parse quoted titles [run_process_replay] (#5183) 2024-06-27 16:39:48 +03:00
qazal
6ca7b13ed1 limit pickled objects [run_process_replay] (#5154)
* limit pickled objects

* delete uop from the list

* debug metal

* need self.opts for TC

* dont need device

* [run_process_replay]

* minor
2024-06-26 13:51:32 +03:00
qazal
8aa786232d docs for running process replay locally (#5083) 2024-06-21 09:55:08 -04:00
nimlgen
fb1bf48cfe io_uring for copies from disk (#5035)
* exp uring

* fixes and old version

* nv

* cleaner

* cmp vs aio

* fix

* no lib

* fix nv

* linter

* disk_speed_test now runs default

* fixes

* uring -> io_uring

* linter happy

* get_temp_buf comment added

* tiny nits

* put wait back

* test runs everywhere

* remove consts

* remove mmap consts

* do not require iouring to run test, they are generic
2024-06-21 11:36:51 +03:00
qazal
97f1347dd9 fix check_process_replay for special characters (#5072)
* 'test' [run_process_replay] [no_assert]

* test with ( ) { } '' " "

* remove the log [run_process_replay] '' () { } '{

* helpful echos [run_process_replay] [no_assert] () ''

* test [run_process_replay] [no_assert]

* test2 [run_process_replay] [no_assert]

* test3 [run_process_replay] [no_assert]

* it's also correct this way [run_process_replay] [no_assert]

* remove extras [run_process_replay]
2024-06-20 20:23:29 +03:00
qazal
a6a5dba637 Revert "UPat for has_valid in load/store (#5052)" (#5056)
* manually insert in the Linearizer

* fix process replay
2024-06-19 20:53:36 +03:00
qazal
ee01e464e3 use process replay as a diff creator (#4903)
* add no_assert option [run_process_replay] [no_assert]

* test [run_process_replay] [no_assert]

* [run_process_replay]

* back to normal [run_process_replay]

* remove the log
2024-06-19 18:17:31 +03:00
chenyu
dc942bf1f6 jit sampling function in test_randomness.test_multinomial (#5034)
* jit sampling function in test_randomness.test_multinomial

`THREEFRY=1 python3 -m pytest test/test_randomness.py::TestRandomness::test_multinomial --durations 1` 7 sec -> 1.2 sec

* skip that
2024-06-18 14:21:05 -04:00
chenyu
acaf9a490d RECIP(-0.0) should be -inf (#5024)
* RECIP(-0.0) should be -inf

added test_dtype_alu for PYTHON backend

* catch that

* fix those two
2024-06-17 22:26:58 -04:00
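Why #5024 needs special handling: in plain Python, `1.0 / -0.0` raises `ZeroDivisionError` rather than following IEEE 754, which gives a signed infinity. A backend emulated in Python therefore has to recover the sign of zero itself. The sketch below shows one way to do that with `math.copysign`. `recip` is a hypothetical helper, not tinygrad's actual PYTHON backend code.

```python
import math

def recip(x: float) -> float:
    """IEEE-754 style reciprocal: 1/(+0.0) -> inf, 1/(-0.0) -> -inf."""
    if x == 0.0:
        # -0.0 == 0.0 is True, so copysign is needed to keep the sign bit
        return math.copysign(math.inf, x)
    return 1.0 / x

print(recip(-0.0))  # -inf
print(recip(0.0))   # inf
print(recip(4.0))   # 0.25
```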
uuuvn
92f49efd06 Trigger process replay from pull request title [run_process_replay] (#4980)
* Trigger process replay from pull request title

* idk how this thing works btw

* test if it will work

* try 2

* Revert "idk how this thing works btw"

This reverts commit 580da51b07.

* Revert "try 2"

This reverts commit 7ff1e86d5d.

* test if it works

* meh

* Reapply "idk how this thing works btw"

This reverts commit dd33ad7c14.

* revert
2024-06-15 16:21:00 +03:00
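Several of these commits (#4980, #5072, #5183) deal with reading `[run_process_replay]` and `[no_assert]` from a pull request title, including titles full of quotes, braces and parentheses. The sketch below shows one way to detect the tags without passing the title through a shell, so no escaping is needed. It is a hypothetical illustration (`replay_flags` is an invented name), not the workflow tinygrad actually uses.

```python
import re

def replay_flags(title: str) -> dict[str, bool]:
    # collect every lowercase [tag] in the title; the title is only ever
    # treated as data, so ' " ( ) { } inside it cannot break anything
    tags = set(re.findall(r"\[([a-z_]+)\]", title))
    return {"run": "run_process_replay" in tags, "no_assert": "no_assert" in tags}

print(replay_flags("remove the log [run_process_replay] '' () { } '{"))
# {'run': True, 'no_assert': False}
```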
wozeparrot
62dc36d371 autogen _try_dlopen (#4949) 2024-06-14 12:12:18 -07:00
chenyu
f902af4f0b increase metal ci test timeout to 20 minutes (#4920)
make it less annoying for now
2024-06-11 18:45:51 -04:00
qazal
7f3d9e6d94 revert hsa autogen removal (#4914)
* Revert "only install comgr in AMD CI (#4909)"

This reverts commit 7f03420d05.

* rocm-llvm only removal
2024-06-11 12:55:45 -04:00
qazal
7f03420d05 only install comgr in AMD CI (#4909)
* test

* delete hsa autogen
2024-06-11 06:19:33 -04:00
qazal
8b5bcf309a process replay in all of CI (#4884) 2024-06-10 14:49:29 -04:00
nimlgen
654a8b9ef7 retire hsa (#4885)
* retire hsa

* EMULATE_AMD
2024-06-09 11:33:03 +03:00
qazal
66dfd5e7bf faster codegen process replay (#4858)
* faster codegen process replay

* use self.copy

* regenerate

* delete copy

* test a real error [run_process_replay]

* revert the error change
2024-06-07 16:20:57 +03:00
qazal
0db9674dea skip process replay on master (#4808) 2024-06-03 12:29:28 +03:00
qazal
f64fa51a64 process replay for test/* (#4799)
* add input to unit tests [run_process_replay]

* add setup [run_process_replay]

* run tests [run_process_replay]

* add cuda and amd [run_process_replay]

* run everything but BEAM=2 [run_process_replay]

* skip export_model [run_process_replay]

* fix amd CI

* add concurrency back
2024-06-03 12:01:58 +03:00
nimlgen
bd2e7c8b31 amd registers from file (#4778)
* amd registers from file

* remove comments

* linter

* no off
2024-05-31 18:48:57 +03:00
Szymon Ożóg
a4de81e9a6 Update ocelot version (#4715) 2024-05-24 14:32:53 -04:00
Yury Zhuravlev
af56f0e68a fix HSA/KFD load for system-wide installation (#4218)
Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2024-05-22 20:33:21 -07:00
nimlgen
12339f6564 disable cuda test in ci (#4630)
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-05-22 23:23:32 -04:00
qazal
498cf3e7e0 fuzzer path search for DEFINE_ACC (#4656)
* insert acc

* add test_ops

* find toposorts

* todo - not yet ready

* remove the import

* atol and childless children
2024-05-23 00:50:01 +03:00
qazal
458a3961eb catch compile errors in uops tests (#4672)
* use helper and compile

* llama beam=2

* ast length

* skip float4, fix hsa

* use empty tensors
2024-05-21 12:20:35 +03:00
chenyu
8a0d1ca7bb CI test timeout 20 min -> 10 min (#4645)
if it takes more than 10 usually setup fails anyway. also updated matmul_kfd -> matmul_amd in benchmark
2024-05-18 13:58:28 -04:00
George Hotz
b74cc1d01a uops cleanup (#4634)
* def add cleanup

* minor speedup

* add back ptx speed

* a little faster

* merge that

* only linearize once for ptx

* two graph rewrites for ptx, bug?
2024-05-17 20:02:38 -07:00
nimlgen
eb9689336e nv mockgpu (#4600)
* mockgpu nv

* works

* comment that out

* fix merge

* setup gpuocelot

* install packages

* not run all of them

* passes

* fix ci

* almost

* should pass

* linter

* linter 2

* try this?

* ugn, not supported

* ci

* remove ticket from description

* better descs
2024-05-15 23:46:08 +03:00
George Hotz
5ba611787d move image into tensor.py. delete features (#4603)
* move image into tensor.py

* change setup.py

* openpilot tests need pythonpath now
2024-05-15 10:50:25 -07:00