351 Commits

Author SHA1 Message Date
nimlgen
019f4680e5 check dims before execution on nv (#4756)
* check dims before execution on nv

* fix linter
2024-05-28 16:57:28 +03:00
qazal
c170ddceaf fix commavq benchmark (#4712)
* fix _slice and assert explicit device

* with _slice
2024-05-24 19:40:57 +03:00
qazal
498cf3e7e0 fuzzer path search for DEFINE_ACC (#4656)
* insert acc

* add test_ops

* find toposorts

* todo - not yet ready

* remove the import

* atol and childless children
2024-05-23 00:50:01 +03:00
Francis Lam
721f9f6acf test/external/verify_kernel: fix LOGKERNS variable name in comments (#4685)
should've been changed with the LOGKERN to LOGKERNS change
2024-05-22 17:08:40 -04:00
nimlgen
c9f7f2da70 nv hcq bind api (#4629)
* hcq bind api for nv

* linter

* linter

* add test

* small comment
2024-05-19 23:17:10 +03:00
qazal
d308f4fa9a correctly insert UOps.END* in fuzz result (#4653)
2024-05-19 21:10:28 +03:00
qazal
954718e6bf reorder DEFINE_GLOBAL in fuzz_uops (#4651)
* globals base

* test: opt out of DEFINE_GLOBAL

* do it like ExecItem
2024-05-19 20:51:31 +03:00
qazal
b0cb02f719 uops fuzzing infra (#4641)
* base with bfs

* find paths

* get last

* try blocks

* Revert "try blocks"

This reverts commit 25f8e3fe85.

* this should be simpler

* full exec

* support debug

* fix lint

* add todo

* copy in_degree
2024-05-18 20:19:57 +03:00
qazal
a5204fe89d refactor UOps.CONST (#4639)
* delete more

* nit: dont need assign

* can this be simpler

* use scalars

* always cast

* clang needs cast

* format
2024-05-18 10:07:36 +03:00
nimlgen
10cf8e459b hcq update queue in place (#4626)
* do not self wait in hcq

* faster enqueue

* comments

* tests

* linter

* fix typo
2024-05-17 22:18:20 +03:00
qazal
f3f2b96583 pick schedule tests from external_test_opt (#4615)
* conv tests

* misc

* that shouldnt const fold
2024-05-16 15:43:41 +03:00
nimlgen
65f7e3b3ab nv setup constbuf4 (#4511)
* nv correct constbuf 4

* compare results to cuda

* test fixed

* failed kernel

* repro

* revert this change
2024-05-16 10:42:35 +03:00
George Hotz
5ba611787d move image into tensor.py. delete features (#4603)
* move image into tensor.py

* change setup.py

* openpilot tests need pythonpath now
2024-05-15 10:50:25 -07:00
qazal
cd4d7e18c7 _recurse_lb small cleanup (#4601)
* minor cleanups

* comments

* extend env in replay
2024-05-15 19:10:42 +03:00
George Hotz
ff64bcab69 move graph/search to engine (#4596)
2024-05-14 23:12:59 -07:00
George Hotz
fd02ab1e8b move disassemblers and openpilot (#4592)
* move disassemblers and openpilot

* delete junk

* put that in pre-commit

* fixup readme
2024-05-14 19:30:02 -07:00
Szymon Ożóg
5eb81ff764 Fix speed compare script (#4581)
* Fix speed compare script

* Update speed_compare_cuda_ptx.py

* Update speed_compare_cuda_ptx.py

* Remove unused function
2024-05-14 17:47:03 +03:00
nimlgen
2131556c2c amd mockgpu (#4535)
* start mock amd gpu

* virt files

* cleaner

* init ci

* small fixes

* linter

* better?

* ugh

* linter

* fix

* diable some

* run shorter

* fixes

* add hcq test

* fix

* fix cmd revert
2024-05-14 14:28:04 +03:00
George Hotz
7a26bdac65 move scheduleitem to schedule.py (#4541)
* move scheduleitem to schedule.py

* don't need that type checking anymore
2024-05-11 21:13:04 -07:00
George Hotz
508e8a6666 add cpu objdump to LLVM/CLANG (#4537)
2024-05-11 14:28:44 -07:00
George Hotz
328b083e66 lil profiling script
2024-05-11 11:02:44 -07:00
George Hotz
2f970a4fc2 all realize 2 (#4527)
* all realize 2

* tests fixup

* fix more tests

* fix openpilot

* fix tests

* unneeded
2024-05-10 22:43:09 -07:00
George Hotz
347a3acb37 add renderer class (#4524)
* add renderer class

* tests pass

* fix pylint

* fix tensor cores
2024-05-10 21:40:02 -07:00
chenyu
b00b6b16f0 fix TRAIN_BEAM and Tensor.training for mlperf bert (#4525)
also hard coded bert model config instead of looking up a file
2024-05-11 00:18:36 -04:00
George Hotz
827058f030 update tests get_runner (#4522)
2024-05-10 20:09:22 -07:00
George Hotz
d438d5698d bring buffer back to device (#4517)
2024-05-10 11:22:31 -07:00
George Hotz
4eef1ee9bf move renderer into options (#4514)
* move renderer into options

* fix tests

* renders are functions
2024-05-10 10:01:51 -07:00
George Hotz
1e843d495e cleaning up search with Program (#4500)
* cleaning up search

* fix tests

* test fix

* minor compiler cleanup
2024-05-09 19:01:53 -07:00
nimlgen
a2e2ba380c nv tune shmem size (#4495)
* nv tune shmem size

* compare them

* linter

* linter2
2024-05-10 00:35:01 +03:00
nimlgen
e14d5b6fd7 nv fix oob qmd ptr (#4478)
* nv fix oob qmd ptr

* test kernargs no oob
2024-05-08 23:11:04 +03:00
Francis Lam
7da1b41f38 fuzz_linearizer: add FUZZ_REQUIRE_TC option to require TC in opts (#4468)
useful for checking late opts after TC such as GROUP, etc.
2024-05-07 17:14:21 -04:00
nimlgen
a1d350a810 nv timeline semaphores (#4464)
* nv timeline semaphores

* nv hcq fixes
2024-05-07 17:31:19 +03:00
nimlgen
e3bb85fd0e amd timeline semaphores (#4416)
* amd timeline semaphores

* v2

* fixes

* reset signals

* fix

* rollover test

* small fixes

* linter

* copyin
2024-05-07 11:17:32 +03:00
George Hotz
17faae091b optimizer shouldn't be run without training (#4460)
* optimizer shouldn't be run without training

* set training in relevant tests

* fix multitensor

* that too
2024-05-06 15:34:12 -07:00
nimlgen
d0b8862dea fix out of resource kernels on nv (#4450)
* fix out of resource kernels on nv

* better comment

* noqa

* noqa 2

* linter
2024-05-06 19:24:20 +03:00
nimlgen
113c2f00b9 amd doorbell size is 64bits (#4448)
* amd doorbell size ids 64bits

* add test

* test to pass 32bit boundary is more correct

* no need to round there
2024-05-06 16:59:59 +03:00
qazal
3401734e54 infra for scheduler process replay (#4405)
* use getenv

* capture ast

* fix graph

* replay schedules

* exec
2024-05-03 20:29:13 +03:00
George Hotz
f635c4d273 fix define global (#4383)
* fix define global

* remove name from DEFINE_GLOBAL

* fix fuzzing

* fix ptx

* fix python
2024-05-01 22:32:56 -04:00
qazal
ea06f657df fusion tests from test_opt (#4357)
* opt tests

* more sgd

* batchnorm

* models stay in external
2024-05-01 16:44:12 +03:00
Elias Wahl
babe87a8ae BERT: Checkpoint loading tests (#4359)
* Move checkpoint init to helpers. Add test

* linters

* Move the steps outside of the main train loop

* Move data_get

* data_get belongs to helpers
2024-04-30 14:43:41 -04:00
Francis Lam
18c61ce077 test/fuzz_linearizer: add --atol/rtol and change half distribution (#4352)
2024-04-29 15:53:59 -04:00
Elias Wahl
27613dd881 MLPerf BERT: Main training loop (#4288)
* BERT language modeling head + trunc normal initializers

* add train loop + helpers

* shuffle in dataloaders + slight changes in main loop

* beam change

* Minor changes

* random.shuffle

* HParam update

* Use deque for dataloader

* wandb bert project name

* half fixes

* BENCHMARK + remove epoch

* cast + print()

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-04-29 14:35:27 -04:00
qazal
774a9b0bca override assign_target in fuzz_schedule (#4342)
* store assign_targets

* cleanup

* override target
2024-04-29 11:04:04 +03:00
Francis Lata
bb849a57d1 [MLPerf] UNet3D dataloader (#4343)
* add support for train/val datasets for kits19

* split dataset into train and val sets

* add tests for kits19 dataloader

* add MLPerf dataset tests to CI

* update unet3d model_eval script

* fix linting

* add nibabel

* fix how mock dataset gets created

* update ref implementation with permalink and no edits

* clean up test and update rand_flip implementation

* cleanups
2024-04-28 22:34:18 -04:00
qazal
3372bea322 reduce children fusion tests (#4321)
* base tests

* real-world tests
2024-04-28 11:14:02 -04:00
chenyu
24a6342950 add mem/s to external_benchmark_resnet (#4309)
2024-04-26 20:07:17 -04:00
David Hou
6f792b727b More improvements for resnet layer bench (#4272)
* fix first layer size, new schedule stuff

* estimates

* get different conv layers

* \r for estimated times

* E501

* space after comma
2024-04-25 12:40:49 -04:00
George Hotz
acb32e1766 hotfix: PM4 supports timing
2024-04-24 08:38:59 +00:00
George Hotz
ad28fdecb1 si.inputs+outputs -> bufs (#4279)
2024-04-24 15:12:34 +08:00
Elias Wahl
69341144ba Wikipedia preprocessing script (#4229)
* Preprocessing script

* short seq prob

* comments + env vars

* Add preprocessing reference. Add test

* lint fix + add eval test support

* whitespaces

* point to commit

* comment

* rename

* better comments
2024-04-23 10:28:01 -04:00