Commit Graph

1021 Commits

Each entry below lists the author, short SHA, commit message (#PR), and date.
nimlgen
6b08cb5e38 ptx runs on nv in benchmarks (#5224) 2024-06-29 11:06:44 +03:00
nimlgen
b4c49ae3fa remove cudacpu in favour of mockgpu (#5225)
* remove cudacpu in favour of mockgpu

* remove unused import

* also not used
2024-06-29 11:05:16 +03:00
chenyu
7090eac8cb validate sdxl output and put it in benchmark (#5211)
* validate sdxl output and put it in benchmark

* don't print fetch progress_bar in CI
2024-06-28 11:40:52 -04:00
chenyu
d8dc43ad06 remove JIT_BATCH_SIZE=4 from gpt2 NV benchmark (#5198)
this no longer helps
2024-06-27 15:20:34 -04:00
chenyu
83da8b3558 use NV instead of CUDA in benchmark (#5192)
also reenabled mixtral on green
2024-06-27 13:52:58 -04:00
chenyu
0c6c7c5f7b CACHELEVEL=0 -> IGNORE_BEAM_CACHE=1 in benchmark (#5191)
ignoring the beam cache while still using the compile cache should be fine, and saves some benchmark time.

also updated `beam_search` to check flag value before accessing diskcache
2024-06-27 13:15:18 -04:00
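
A minimal sketch of the flag-gating pattern this commit describes, assuming tinygrad's `diskcache_get`/`diskcache_put` helpers; the actual `beam_search` does more than this:

```python
import os
from tinygrad.helpers import diskcache_get, diskcache_put  # assumed helpers from tinygrad/helpers.py

IGNORE_BEAM_CACHE = int(os.getenv("IGNORE_BEAM_CACHE", "0"))

def beam_search_cached(key, compute):
  # check the flag before touching the disk cache, on both read and write
  if not IGNORE_BEAM_CACHE and (cached := diskcache_get("beam_search", key)) is not None:
    return cached
  result = compute()
  if not IGNORE_BEAM_CACHE: diskcache_put("beam_search", key, result)
  return result
```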
chenyu
c12de4f47d benchmark use JITBEAM for llama and gpt2 (#5189) 2024-06-27 12:56:02 -04:00
qazal
3af17849bf safely parse quoted titles [run_process_replay] (#5183) 2024-06-27 16:39:48 +03:00
qazal
6ca7b13ed1 limit pickled objects [run_process_replay] (#5154)
* limit pickled objects

* delete uop from the list

* debug metal

* need self.opts for TC

* don't need device

* [run_process_replay]

* minor
2024-06-26 13:51:32 +03:00
qazal
8aa786232d docs for running process replay locally (#5083) 2024-06-21 09:55:08 -04:00
nimlgen
fb1bf48cfe io_uring for copies from disk (#5035)
* exp uring

* fixes and old version

* nv

* cleaner

* cmp vs aio

* fix

* no lib

* fix nv

* linter

* disk_speed_test now runs by default

* fixes

* uring -> io_uring

* linter happy

* get_temp_buf comment added

* tiny nits

* put wait back

* test runs everywhere

* remove consts

* remove mmap consts

* do not require io_uring to run the tests; they are generic
2024-06-21 11:36:51 +03:00
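
Background on the technique: io_uring queues many reads in a submission ring and reaps completions without one syscall per read. Python's standard library has no io_uring binding, so the sketch below shows only the synchronous vectored positional read (`os.preadv`) that such a path replaces; it is a stand-in with a hypothetical file name, not tinygrad's implementation:

```python
import os

CHUNK = 1 << 20
fd = os.open("weights.bin", os.O_RDONLY)  # hypothetical file
try:
  bufs = [bytearray(CHUNK) for _ in range(4)]
  # one vectored positional read into preallocated buffers; io_uring
  # generalizes this by queueing many such reads and completing them async
  n = os.preadv(fd, bufs, 0)
finally:
  os.close(fd)
```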
qazal
97f1347dd9 fix check_process_replay for special characters (#5072)
* 'test' [run_process_replay] [no_assert]

* test with ( ) { } '' " "

* remove the log [run_process_replay] '' () { } '{

* helpful echos [run_process_replay] [no_assert] () ''

* test [run_process_replay] [no_assert]

* test2 [run_process_replay] [no_assert]

* test3 [run_process_replay] [no_assert]

* it's also correct this way [run_process_replay] [no_assert]

* remove extras [run_process_replay]
2024-06-20 20:23:29 +03:00
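
The class of bug being exercised above: interpolating a PR title straight into a shell command breaks once the title contains quotes or braces. One safe pattern in Python is `shlex.quote` (illustrative only, not the actual workflow fix):

```python
import shlex, subprocess

title = '''test with ( ) { } '' " " [run_process_replay]'''
# unquoted interpolation lets the shell interpret the metacharacters;
# shlex.quote turns the whole title into a single safe token
subprocess.run(f"echo {shlex.quote(title)}", shell=True, check=True)
```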
qazal
a6a5dba637 Revert "UPat for has_valid in load/store (#5052)" (#5056)
* manually insert in the Linearizer

* fix process replay
2024-06-19 20:53:36 +03:00
qazal
ee01e464e3 use process replay as a diff creator (#4903)
* add no_assert option [run_process_replay] [no_assert]

* test [run_process_replay] [no_assert]

* [run_process_replay]

* back to normal [run_process_replay]

* remove the log
2024-06-19 18:17:31 +03:00
chenyu
dc942bf1f6 jit sampling function in test_randomness.test_multinomial (#5034)
* jit sampling function in test_randomness.test_multinomial

`THREEFRY=1 python3 -m pytest test/test_randomness.py::TestRandomness::test_multinomial --durations 1` 7 sec -> 1.2 sec

* skip that
2024-06-18 14:21:05 -04:00
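
A minimal sketch of what jitting a sampling function can look like with tinygrad's `TinyJit`; an assumed reconstruction, not necessarily the code in the test. A counter-based RNG (THREEFRY) generates the random bits on-device, which is what makes the sampling jittable at all:

```python
from tinygrad import Tensor, TinyJit

@TinyJit
def sample(probs: Tensor) -> Tensor:
  # realize() materializes the result so the JIT captures the kernels
  return probs.multinomial().realize()

probs = Tensor([0.1, 0.2, 0.7]).realize()
for _ in range(5):  # first calls warm up, later calls replay cached kernels
  print(sample(probs).item())
```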
chenyu
e9c6a36894 remove CACHELEVEL=0 in llama3 benchmark (#5025) 2024-06-17 22:43:16 -04:00
chenyu
acaf9a490d RECIP(-0.0) should be -inf (#5024)
* RECIP(-0.0) should be -inf

added test_dtype_alu for PYTHON backend

* catch that

* fix those two
2024-06-17 22:26:58 -04:00
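
This is plain IEEE 754: negative zero keeps its sign bit, so its reciprocal is negative infinity. A quick check in NumPy (Python's own `1.0 / -0.0` raises `ZeroDivisionError` instead of following IEEE semantics):

```python
import numpy as np

with np.errstate(divide="ignore"):  # silence the divide-by-zero warning
  assert np.reciprocal(np.float32(-0.0)) == -np.inf
  assert np.reciprocal(np.float32(0.0)) == np.inf
```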
George Hotz
bee8fc29ee add GPT2 half/half+beam to AMD (#5000)
* add GPT2 half/half+beam to AMD

* winograd in training. half and half/beam file upload
2024-06-16 14:07:14 -07:00
chenyu
44dfa37c70 use threefry in stable diffusion benchmark (#4988)
also updated default steps to 10; it's easier to tell the image is following the prompt.
2024-06-15 20:25:29 -04:00
wozeparrot
ce1ed374c9 more tinychat fixes (#4971) 2024-06-15 16:29:39 -07:00
qazal
ff8e9eefc3 hotfix: don't use ASSERT_COMPILE for benchmarks process replay (#4981)
* use replay_codegen [run_process_replay]

* disable for now [run_process_replay]
2024-06-15 16:57:47 +03:00
uuuvn
92f49efd06 Trigger process replay from pull request title [run_process_replay] (#4980)
* Trigger process replay from pull request title

* idk how this thing works btw

* test if it will work

* try 2

* Revert "idk how this thing works btw"

This reverts commit 580da51b07.

* Revert "try 2"

This reverts commit 7ff1e86d5d.

* test if it works

* meh

* Reapply "idk how this thing works btw"

This reverts commit dd33ad7c14.

* revert
2024-06-15 16:21:00 +03:00
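
For context, a job inside GitHub Actions can read the PR title from the webhook payload; a hedged sketch (the real workflow may use `github.event` expressions in YAML instead):

```python
import json, os

# GITHUB_EVENT_PATH points at the webhook payload on Actions runners
with open(os.environ["GITHUB_EVENT_PATH"]) as f:
  event = json.load(f)
title = event.get("pull_request", {}).get("title", "")
if "[run_process_replay]" in title:
  print("process replay requested")
```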
wozeparrot
62dc36d371 autogen _try_dlopen (#4949) 2024-06-14 12:12:18 -07:00
chenyu
f902af4f0b increase metal ci test timeout to 20 minutes (#4920)
make it less annoying for now
2024-06-11 18:45:51 -04:00
qazal
7f3d9e6d94 revert hsa autogen removal (#4914)
* Revert "only install comgr in AMD CI (#4909)"

This reverts commit 7f03420d05.

* rocm-llvm only removal
2024-06-11 12:55:45 -04:00
qazal
7f03420d05 only install comgr in AMD CI (#4909)
* test

* delete hsa autogen
2024-06-11 06:19:33 -04:00
qazal
8b5bcf309a process replay in all of CI (#4884) 2024-06-10 14:49:29 -04:00
George Hotz
f42183ba28 hotfix: relax cifar to 93.2 2024-06-09 13:09:21 +02:00
nimlgen
654a8b9ef7 retire hsa (#4885)
* retire hsa

* EMULATE_AMD
2024-06-09 11:33:03 +03:00
nimlgen
6327b50e51 amd in benchmarks (#4861)
* amd in benchmarks

* remove all hsa
2024-06-08 23:24:46 +03:00
qazal
66dfd5e7bf faster codegen process replay (#4858)
* faster codegen process replay

* use self.copy

* regenerate

* delete copy

* test a real error [run_process_replay]

* revert the error change
2024-06-07 16:20:57 +03:00
qazal
0db9674dea skip process replay on master (#4808) 2024-06-03 12:29:28 +03:00
qazal
f64fa51a64 process replay for test/* (#4799)
* add input to unit tests [run_process_replay]

* add setup [run_process_replay]

* run tests [run_process_replay]

* add cuda and amd [run_process_replay]

* run everything but BEAM=2 [run_process_replay]

* skip export_model [run_process_replay]

* fix amd CI

* add concurrency back
2024-06-03 12:01:58 +03:00
qazal
240d6b5bc0 process replay benchmarks (#4668) 2024-06-01 14:36:21 +03:00
nimlgen
bd2e7c8b31 amd registers from file (#4778)
* amd registers from file

* remove comments

* linter

* no off
2024-05-31 18:48:57 +03:00
Szymon Ożóg
a4de81e9a6 Update ocelot version (#4715) 2024-05-24 14:32:53 -04:00
chenyu
38bc38cdff fix llama example quantize (#4699)
* fix llama example quantize

import quantize layers from new example llama3

add to mac benchmark

* fix that

* save the files
2024-05-23 15:35:26 -04:00
chenyu
72560e30fe add CACHELEVEL=0 to tinybox green GEMM BEAM (#4693)
* add CACHELEVEL=0 to tinybox green GEMM BEAM

* BEAM=4 is more stable
2024-05-22 23:59:50 -04:00
Yury Zhuravlev
af56f0e68a fix HSA/KFD load for system-wide installation (#4218)
Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2024-05-22 20:33:21 -07:00
nimlgen
12339f6564 disable cuda test in ci (#4630)
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-05-22 23:23:32 -04:00
qazal
498cf3e7e0 fuzzer path search for DEFINE_ACC (#4656)
* insert acc

* add test_ops

* find toposorts

* todo - not yet ready

* remove the import

* atol and childless children
2024-05-23 00:50:01 +03:00
qazal
458a3961eb catch compile errors in uops tests (#4672)
* use helper and compile

* llama beam=2

* ast length

* skip float4, fix hsa

* use empty tensors
2024-05-21 12:20:35 +03:00
wozeparrot
00432496d7 feat: tinyboxgreen (#4366)
* feat: tinyboxgreen

* feat: tinyboxgreenv2

* fix symlink weights

* fix: remove llama 2 70b for now

* feat: naming

* fix: remove extra cifar steps

* feat: disable mixtral on nvidia
2024-05-20 22:39:34 -04:00
chenyu
8a0d1ca7bb CI test timeout 20 min -> 10 min (#4645)
if it takes more than 10 minutes, setup usually failed anyway. also updated matmul_kfd -> matmul_amd in benchmark
2024-05-18 13:58:28 -04:00
George Hotz
b74cc1d01a uops cleanup (#4634)
* def add cleanup

* minor speedup

* add back ptx speed

* a little faster

* merge that

* only linearize once for ptx

* two graph rewrites for ptx, bug?
2024-05-17 20:02:38 -07:00
George Hotz
07b350a8f4 new uops is an actual graph (#4560)
* new uops is an actual graph

* it's way slower

* simpler

* fix define acc

* render_loop unique

* ops test pass

* add pattern matcher back, there's bugs

* rewrite

* use priority queue

* recursive children

* fix tests

* fix tests with SINK

* fix abstractions

* fix assembly

* simpler

* link define_acc

* fix DEFINE_ACC placement

* type verify

* full cmp

* fix cmp

* ACCESS_ACC

* insert DEFINE_ACC

* fix PHI

* recursive rewrite

* fix many tests

* sum collapse

* more patterns

* correct change

* fold arange

* fix that lin test

* space

* big folding rule works

* close

* has more maxes, meh

* cached node replace

* set changed

* simplest folding yet

* works

* works

* DIV

* all tests pass

* del

* fuzz linearizer fails

* sum_collapse

* test depth 2 cf

* fix lin test 14

* fix clang depth

* disable that

* failure 14 is fixed

* fix ptx

* failure 27 is fixed

* fix llama

* run_cnt

* Revert "Optimize PTX gated loads index calculation (#4304)"

This reverts commit d97d5a7689.

* fix uops loop

* fix ptx bugs

* add barrier

* print

* mem_type in ptx direct

* bypass tests that fail in CI but pass locally

* ptx remove ptr_ar

* more ptx passing

* fix ptx tests

* assert compile support

* remove model inference benchmark from red
2024-05-17 18:00:18 -07:00
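
To make "uops as an actual graph plus pattern rewrites" concrete, here is a toy sketch; the `UOp` class and the rule are illustrative stand-ins, not tinygrad's real classes. Nodes are immutable, children are rewritten first, and rules run on the rebuilt node:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class UOp:  # toy stand-in
  op: str
  src: tuple["UOp", ...] = ()
  arg: object = None

def rewrite(u: UOp) -> UOp:
  u = UOp(u.op, tuple(rewrite(s) for s in u.src), u.arg)  # children first
  if u.op == "ADD":  # example folding rule: x + 0 -> x
    a, b = u.src
    if b.op == "CONST" and b.arg == 0: return a
    if a.op == "CONST" and a.arg == 0: return b
  return u

x, zero = UOp("DEFINE_VAR", arg="x"), UOp("CONST", arg=0)
assert rewrite(UOp("ADD", (x, zero))) == x
```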
chenyu
ca1df20fa9 benchmark name fix - resnet eval is on eval data (#4628) 2024-05-17 12:56:12 -04:00
chenyu
e5d4e6a8aa BEAM=2 in green CI for 100 TFLOPS (#4624) 2024-05-16 23:28:28 -04:00
nimlgen
eb9689336e nv mockgpu (#4600)
* mockgpu nv

* works

* comment that out

* fix merge

* setup gpuocelot

* install packages

* not run all of them

* passes

* fix ci

* almost

* should pass

* linter

* linter 2

* try this?

* ugh, not supported

* ci

* remove ticket from description

* better descs
2024-05-15 23:46:08 +03:00
George Hotz
5ba611787d move image into tensor.py. delete features (#4603)
* move image into tensor.py

* change setup.py

* openpilot tests need pythonpath now
2024-05-15 10:50:25 -07:00