nimlgen
019f4680e5
check dims before execution on nv ( #4756 )
...
* check dims before execution on nv
* fix linter
2024-05-28 16:57:28 +03:00
qazal
c170ddceaf
fix commavq benchmark ( #4712 )
...
* fix _slice and assert explicit device
* with _slice
2024-05-24 19:40:57 +03:00
qazal
498cf3e7e0
fuzzer path search for DEFINE_ACC ( #4656 )
...
* insert acc
* add test_ops
* find toposorts
* todo - not yet ready
* remove the import
* atol and childless children
2024-05-23 00:50:01 +03:00
Francis Lam
721f9f6acf
test/external/verify_kernel: fix LOGKERNS variable name in comments ( #4685 )
...
should've been changed with the LOGKERN to LOGKERNS change
2024-05-22 17:08:40 -04:00
nimlgen
c9f7f2da70
nv hcq bind api ( #4629 )
...
* hcq bind api for nv
* linter
* linter
* add test
* small comment
2024-05-19 23:17:10 +03:00
qazal
d308f4fa9a
correctly insert UOps.END* in fuzz result ( #4653 )
2024-05-19 21:10:28 +03:00
qazal
954718e6bf
reorder DEFINE_GLOBAL in fuzz_uops ( #4651 )
...
* globals base
* test: opt out of DEFINE_GLOBAL
* do it like ExecItem
2024-05-19 20:51:31 +03:00
qazal
b0cb02f719
uops fuzzing infra ( #4641 )
...
* base with bfs
* find paths
* get last
* try blocks
* Revert "try blocks"
This reverts commit 25f8e3fe85.
* this should be simpler
* full exec
* support debug
* fix lint
* add todo
* copy in_degree
2024-05-18 20:19:57 +03:00
qazal
a5204fe89d
refactor UOps.CONST ( #4639 )
...
* delete more
* nit: dont need assign
* can this be simpler
* use scalars
* always cast
* clang needs cast
* format
2024-05-18 10:07:36 +03:00
nimlgen
10cf8e459b
hcq update queue in place ( #4626 )
...
* do not self wait in hcq
* faster enqueue
* comments
* tests
* linter
* fix typo
2024-05-17 22:18:20 +03:00
qazal
f3f2b96583
pick schedule tests from external_test_opt ( #4615 )
...
* conv tests
* misc
* that shouldnt const fold
2024-05-16 15:43:41 +03:00
nimlgen
65f7e3b3ab
nv setup constbuf4 ( #4511 )
...
* nv correct constbuf 4
* compare results to cuda
* test fixed
* failed kernel
* repro
* revert this change
2024-05-16 10:42:35 +03:00
George Hotz
5ba611787d
move image into tensor.py. delete features ( #4603 )
...
* move image into tensor.py
* change setup.py
* openpilot tests need pythonpath now
2024-05-15 10:50:25 -07:00
qazal
cd4d7e18c7
_recurse_lb small cleanup ( #4601 )
...
* minor cleanups
* comments
* extend env in replay
2024-05-15 19:10:42 +03:00
George Hotz
ff64bcab69
move graph/search to engine ( #4596 )
2024-05-14 23:12:59 -07:00
George Hotz
fd02ab1e8b
move disassemblers and openpilot ( #4592 )
...
* move disassemblers and openpilot
* delete junk
* put that in pre-commit
* fixup readme
2024-05-14 19:30:02 -07:00
Szymon Ożóg
5eb81ff764
Fix speed compare script ( #4581 )
...
* Fix speed compare script
* Update speed_compare_cuda_ptx.py
* Update speed_compare_cuda_ptx.py
* Remove unused function
2024-05-14 17:47:03 +03:00
nimlgen
2131556c2c
amd mockgpu ( #4535 )
...
* start mock amd gpu
* virt files
* cleaner
* init ci
* small fixes
* linter
* better?
* ugh
* linter
* fix
* disable some
* run shorter
* fixes
* add hcq test
* fix
* fix cmd revert
2024-05-14 14:28:04 +03:00
George Hotz
7a26bdac65
move scheduleitem to schedule.py ( #4541 )
...
* move scheduleitem to schedule.py
* don't need that type checking anymore
2024-05-11 21:13:04 -07:00
George Hotz
508e8a6666
add cpu objdump to LLVM/CLANG ( #4537 )
2024-05-11 14:28:44 -07:00
George Hotz
328b083e66
lil profiling script
2024-05-11 11:02:44 -07:00
George Hotz
2f970a4fc2
all realize 2 ( #4527 )
...
* all realize 2
* tests fixup
* fix more tests
* fix openpilot
* fix tests
* unneeded
2024-05-10 22:43:09 -07:00
George Hotz
347a3acb37
add renderer class ( #4524 )
...
* add renderer class
* tests pass
* fix pylint
* fix tensor cores
2024-05-10 21:40:02 -07:00
chenyu
b00b6b16f0
fix TRAIN_BEAM and Tensor.training for mlperf bert ( #4525 )
...
also hard coded bert model config instead of looking up a file
2024-05-11 00:18:36 -04:00
George Hotz
827058f030
update tests get_runner ( #4522 )
2024-05-10 20:09:22 -07:00
George Hotz
d438d5698d
bring buffer back to device ( #4517 )
2024-05-10 11:22:31 -07:00
George Hotz
4eef1ee9bf
move renderer into options ( #4514 )
...
* move renderer into options
* fix tests
* renders are functions
2024-05-10 10:01:51 -07:00
George Hotz
1e843d495e
cleaning up search with Program ( #4500 )
...
* cleaning up search
* fix tests
* test fix
* minor compiler cleanup
2024-05-09 19:01:53 -07:00
nimlgen
a2e2ba380c
nv tune shmem size ( #4495 )
...
* nv tune shmem size
* compare them
* linter
* linter2
2024-05-10 00:35:01 +03:00
nimlgen
e14d5b6fd7
nv fix oob qmd ptr ( #4478 )
...
* nv fix oob qmd ptr
* test kernargs no oob
2024-05-08 23:11:04 +03:00
Francis Lam
7da1b41f38
fuzz_linearizer: add FUZZ_REQUIRE_TC option to require TC in opts ( #4468 )
...
useful for checking late opts after TC such as GROUP, etc.
2024-05-07 17:14:21 -04:00
nimlgen
a1d350a810
nv timeline semaphores ( #4464 )
...
* nv timeline semaphores
* nv hcq fixes
2024-05-07 17:31:19 +03:00
nimlgen
e3bb85fd0e
amd timeline semaphores ( #4416 )
...
* amd timeline semaphores
* v2
* fixes
* reset signals
* fix
* rollover test
* small fixes
* linter
* copyin
2024-05-07 11:17:32 +03:00
George Hotz
17faae091b
optimizer shouldn't be run without training ( #4460 )
...
* optimizer shouldn't be run without training
* set training in relevant tests
* fix multitensor
* that too
2024-05-06 15:34:12 -07:00
nimlgen
d0b8862dea
fix out of resource kernels on nv ( #4450 )
...
* fix out of resource kernels on nv
* better comment
* noqa
* noqa 2
* linter
2024-05-06 19:24:20 +03:00
nimlgen
113c2f00b9
amd doorbell size is 64bits ( #4448 )
...
* amd doorbell size is 64bits
* add test
* test to pass 32bit boundary is more correct
* no need to round there
2024-05-06 16:59:59 +03:00
qazal
3401734e54
infra for scheduler process replay ( #4405 )
...
* use getenv
* capture ast
* fix graph
* replay schedules
* exec
2024-05-03 20:29:13 +03:00
George Hotz
f635c4d273
fix define global ( #4383 )
...
* fix define global
* remove name from DEFINE_GLOBAL
* fix fuzzing
* fix ptx
* fix python
2024-05-01 22:32:56 -04:00
qazal
ea06f657df
fusion tests from test_opt ( #4357 )
...
* opt tests
* more sgd
* batchnorm
* models stay in external
2024-05-01 16:44:12 +03:00
Elias Wahl
babe87a8ae
BERT: Checkpoint loading tests ( #4359 )
...
* Move checkpoint init to helpers. Add test
* linters
* Move the steps outside of the main train loop
* Move data_get
* data_get belongs to helpers
2024-04-30 14:43:41 -04:00
Francis Lam
18c61ce077
test/fuzz_linearizer: add --atol/rtol and change half distribution ( #4352 )
2024-04-29 15:53:59 -04:00
Elias Wahl
27613dd881
MLPerf BERT: Main training loop ( #4288 )
...
* BERT language modeling head + trunc normal initializers
* add train loop + helpers
* shuffle in dataloaders + slight changes in main loop
* beam change
* Minor changes
* random.shuffle
* HParam update
* Use deque for dataloader
* wandb bert project name
* half fixes
* BENCHMARK + remove epoch
* cast + print()
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-04-29 14:35:27 -04:00
qazal
774a9b0bca
override assign_target in fuzz_schedule ( #4342 )
...
* store assign_targets
* cleanup
* override target
2024-04-29 11:04:04 +03:00
Francis Lata
bb849a57d1
[MLPerf] UNet3D dataloader ( #4343 )
...
* add support for train/val datasets for kits19
* split dataset into train and val sets
* add tests for kits19 dataloader
* add MLPerf dataset tests to CI
* update unet3d model_eval script
* fix linting
* add nibabel
* fix how mock dataset gets created
* update ref implementation with permalink and no edits
* clean up test and update rand_flip implementation
* cleanups
2024-04-28 22:34:18 -04:00
qazal
3372bea322
reduce children fusion tests ( #4321 )
...
* base tests
* real-world tests
2024-04-28 11:14:02 -04:00
chenyu
24a6342950
add mem/s to external_benchmark_resnet ( #4309 )
2024-04-26 20:07:17 -04:00
David Hou
6f792b727b
More improvements for resnet layer bench ( #4272 )
...
* fix first layer size, new schedule stuff
* estimates
* get different conv layers
* \r for estimated times
* E501
* space after comma
2024-04-25 12:40:49 -04:00
George Hotz
acb32e1766
hotfix: PM4 supports timing
2024-04-24 08:38:59 +00:00
George Hotz
ad28fdecb1
si.inputs+outputs -> bufs ( #4279 )
2024-04-24 15:12:34 +08:00
Elias Wahl
69341144ba
Wikipedia preprocessing script ( #4229 )
...
* Preprocessing script
* short seq prob
* comments + env vars
* Add preprocessing reference. Add test
* lint fix + add eval test support
* whitespaces
* point to commit
* comment
* rename
* better comments
2024-04-23 10:28:01 -04:00