nimlgen
d0b8862dea
fix out of resource kernels on nv ( #4450 )
...
* fix out of resource kernels on nv
* better comment
* noqa
* noqa 2
* linter
2024-05-06 19:24:20 +03:00
nimlgen
113c2f00b9
amd doorbell size is 64bits ( #4448 )
...
* amd doorbell size is 64bits
* add test
* test to pass 32bit boundary is more correct
* no need to round there
2024-05-06 16:59:59 +03:00
qazal
3401734e54
infra for scheduler process replay ( #4405 )
...
* use getenv
* capture ast
* fix graph
* replay schedules
* exec
2024-05-03 20:29:13 +03:00
George Hotz
f635c4d273
fix define global ( #4383 )
...
* fix define global
* remove name from DEFINE_GLOBAL
* fix fuzzing
* fix ptx
* fix python
2024-05-01 22:32:56 -04:00
qazal
ea06f657df
fusion tests from test_opt ( #4357 )
...
* opt tests
* more sgd
* batchnorm
* models stay in external
2024-05-01 16:44:12 +03:00
Elias Wahl
babe87a8ae
BERT: Checkpoint loading tests ( #4359 )
...
* Move checkpoint init to helpers. Add test
* linters
* Move the steps outside of the main train loop
* Move data_get
* data_get belongs to helpers
2024-04-30 14:43:41 -04:00
Francis Lam
18c61ce077
test/fuzz_linearizer: add --atol/rtol and change half distribution ( #4352 )
2024-04-29 15:53:59 -04:00
Elias Wahl
27613dd881
MLPerf BERT: Main training loop ( #4288 )
...
* BERT language modeling head + trunc normal initializers
* add train loop + helpers
* shuffle in dataloaders + slight changes in main loop
* beam change
* Minor changes
* random.shuffle
* HParam update
* Use deque for dataloader
* wandb bert project name
* half fixes
* BENCHMARK + remove epoch
* cast + print()
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-04-29 14:35:27 -04:00
qazal
774a9b0bca
override assign_target in fuzz_schedule ( #4342 )
...
* store assign_targets
* cleanup
* override target
2024-04-29 11:04:04 +03:00
Francis Lata
bb849a57d1
[MLPerf] UNet3D dataloader ( #4343 )
...
* add support for train/val datasets for kits19
* split dataset into train and val sets
* add tests for kits19 dataloader
* add MLPerf dataset tests to CI
* update unet3d model_eval script
* fix linting
* add nibabel
* fix how mock dataset gets created
* update ref implementation with permalink and no edits
* clean up test and update rand_flip implementation
* cleanups
2024-04-28 22:34:18 -04:00
qazal
3372bea322
reduce children fusion tests ( #4321 )
...
* base tests
* real-world tests
2024-04-28 11:14:02 -04:00
chenyu
24a6342950
add mem/s to external_benchmark_resnet ( #4309 )
2024-04-26 20:07:17 -04:00
David Hou
6f792b727b
More improvements for resnet layer bench ( #4272 )
...
* fix first layer size, new schedule stuff
* estimates
* get different conv layers
* \r for estimated times
* E501
* space after comma
2024-04-25 12:40:49 -04:00
George Hotz
acb32e1766
hotfix: PM4 supports timing
2024-04-24 08:38:59 +00:00
George Hotz
ad28fdecb1
si.inputs+outputs -> bufs ( #4279 )
2024-04-24 15:12:34 +08:00
Elias Wahl
69341144ba
Wikipedia preprocessing script ( #4229 )
...
* Preprocessing script
* short seq prob
* comments + env vars
* Add preprocessing reference. Add test
* lint fix + add eval test support
* whitespaces
* point to commit
* comment
* rename
* better comments
2024-04-23 10:28:01 -04:00
George Hotz
967638f0d5
update docs, remove corealize ( #4264 )
...
* update docs, remove corealize
* handle 0 line count
* tensor schedule
2024-04-23 12:05:29 +04:00
George Hotz
9a95781d51
renamed ( #4260 )
2024-04-23 09:00:28 +04:00
George Hotz
2ae4f45272
WIP PM4 Support ( #4110 )
...
* pm4 kernel launch works
* disable USE_THREAD_DIMENSIONS
* add kernel code
* work on real pm4
* pm4 signal
* same
* gate pm4
* hcq tests pass
* ops passes
* pm4 is closer
* pm4 debug (#4165)
* start debug tests passing
* prg
* smth
* hdp flush
* cleaner 1
* do not need this
* logs not need
* small things
* linter
* remove AQL
* test hcq
* fix tests
* it's subtracting, it shouldn't be -1
* pm4 changes (#4251)
* not need this anymore
* sdma signal with non atomic
---------
Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
2024-04-23 08:31:27 +04:00
chenyu
f1d9d0a151
cleanup external_test_opt ( #4234 )
...
no more OPT=2 or OPT=3, check strict number of kernels, enabled tests that fusion works now
2024-04-20 04:00:08 -04:00
David Hou
dc4b1af09c
more realistic edge behavior for resnet benchmark ( #4231 )
...
* more realistic edge behavior for resnet benchmark
* schedule_step
* realize all parameters ahead of time
* don't save setup and misc schedules
2024-04-19 20:07:46 -04:00
George Hotz
b9570d6100
clean up update stats ( #4226 )
...
* WIP: clean up update stats
* line savings now
* fix graphs
* fix tests
* tighter prints
* remove extra jit=false
* debug=2 means wait
* that won't update stats
* still wait
2024-04-19 15:41:30 +04:00
qazal
1c87e5dbf6
fuzz schedule context vars ( #4223 )
...
* fuzz schedule context vars
* fuzz unique toposorts
* merge ground truth with the rest
* Revert "merge ground truth with the rest"
This reverts commit 1f3463bb57.
* readability>
* can override
2024-04-19 13:16:25 +03:00
Francis Lata
3644077a42
[MLPerf][UNet3D] Add DICE loss + metrics ( #4204 )
...
* add DICE loss and metrics
* update dice to include reference implementation's link
* remove unused imports
* remove unnecessary test file and update pred + label for metrics and losses test
* add tests to CI + add exclusion of mlperf_unet3d
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-04-17 20:09:33 -04:00
Francis Lam
c91b7b1739
test: add fuzz_matmul and better debugging for simple_matmul ( #4199 )
...
also show unoptimized shape in verify_kernel
2024-04-16 23:40:31 -04:00
qazal
ba8602612b
Fuzz all permutations of schedule ( #4136 )
...
* simple toposort
* fuzzer
* init in_degree
* move to tests
* same seed
* configure paths
* internal graph
* compare LazyBuffers
* simpler
* simple graph
* assign works
* simpler
* fix JIT
* upstream ci
* move ci
* fix the path
* DEBUG=1
* limit max paths
* launch a cmp kernel
* Revert "launch a cmp kernel"
This reverts commit 791c608992.
* exec ground truth
* better perf
* copy ground truth once
* gpu allclose ast try1
* Revert "gpu allclose ast try1"
This reverts commit 1f82103af3.
* prerealized bufs freezing
* teeny cleanups
* reuse Buffers
* Revert "reuse Buffers"
This reverts commit a71de94b03.
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-04-17 05:03:21 +04:00
Francis Lam
e9c1616b27
logging: change LOGKERN to LOGKERNS to match LOGOPS ( #4193 )
...
also add printing of ast and applied_opts during verify_kernel
to more easily debug errors if they come up
2024-04-16 16:08:32 -04:00
David Hou
7fb220a567
touchup resnet_layer_bench ( #4191 )
2024-04-16 14:43:00 -04:00
David Hou
1dbf3b2b19
Benchmarks for individual resnet layers ( #4182 )
...
* resnet individual layer benchmarks!
* small
* 1 and 2
* mem_used
* no ci
* better conv print
* defaults
* prints
* adjust
* adjust
* adjust
* benchmark only one layer example
* tensor.training, zero_grad, sum instead of mean, last mem, last kernel count
* default jitcnt=1
* scale flops/kernels with jitcnt
* add note about jitcnt memory
* touchup
2024-04-16 13:53:18 -04:00
George Hotz
50e780a588
multitensor shouldn't recompile ( #4164 )
...
* multitensor shouldn't recompile
* type annotations
* fix tests
* outcount in reduce
2024-04-13 00:03:48 -07:00
chenyu
a7c6864260
remove CAST_BEFORE_VIEW ( #4152 )
...
* remove CAST_BEFORE_VIEW
testing perf, also this might have issue with assign?
* remove all
2024-04-13 01:05:08 -04:00
George Hotz
ebc94c9d6c
rewrite the jit in the context of new schedule ( #4162 )
...
* rewrite the jit in the context of new schedule
* mypy better
* fix placeholder
* tests
* all functionality should work
* fix tests
* no CacheCollector
2024-04-12 21:54:36 -07:00
George Hotz
bbda20c0db
CompiledASTRunner -> CompiledRunner ( #4148 )
2024-04-11 08:49:52 -07:00
George Hotz
b7e281cf10
JitItem -> ExecItem ( #4146 )
...
* JitItem -> ExecItem
* execitem in realize
* cleaner
* JITRunner -> Runner
2024-04-11 08:24:57 -07:00
terafo
5e6d2155e4
Add driving monitoring model to benchmarks ( #4134 )
...
* add driving monitoring model to benchmarks
* handle crash
2024-04-10 14:27:03 -04:00
geohotstan
fe88591890
update onnx to 1.16.0 ( #4127 )
...
* update
* pass tests and skip tests
2024-04-10 11:19:13 -04:00
George Hotz
ae849d12d7
numpy device + pickle it ( #4120 )
2024-04-09 13:19:30 -07:00
George Hotz
164329a8ea
address kfd feedback ( #4087 )
...
* address kfd feedback
* signals cleanup
* signals cleanup
* handle 2 doorbell pages correctly
* signal reset cleanup
* signals cleanup
* more GTT
* cleanups
* minor cleanups
2024-04-05 15:24:41 -07:00
George Hotz
a337922c44
more work on kfd ( #4079 )
...
* more work on kfd
* fix multitensor test on kfd
* stuff
2024-04-05 08:36:36 -07:00
George Hotz
3de855ea50
don't use SVM memory in KFD ( #4072 )
...
* don't use SVM memory in KFD
* copy from fd
* cleanups
* transfer
* hacks
* ops_hsa
* tighter API
2024-04-04 17:33:21 -07:00
George Hotz
7181ffd630
HWCopyQueue in KFD ( #4042 )
...
* HWCopyQueue in KFD
* hw compute queue
* test
* move test
* more tests
* fix wait
* fix multimap
* mes crash
* tests pass but slow
* stuff is working
* one more test
2024-04-03 20:14:24 -07:00
chenyu
e3c0ac9fbf
remove old envvar "OPT" ( #4060 )
2024-04-03 14:55:21 -04:00
chenyu
406cb5fd90
const fold ReduceOps ( #4059 )
2024-04-03 14:39:28 -04:00
chenyu
c71627fee6
move GlobalCounter to helpers ( #4002 )
...
break circular import between ops and buffer
2024-03-30 00:30:30 -04:00
George Hotz
f916aadaea
external that test
2024-03-29 19:35:50 -07:00
chenyu
d9ff636cf5
use is to compare with enum ( #3993 )
...
* use is to compare with enum
currently it's mixed between `==` and `is`, moved all to `is`
* more
2024-03-29 13:02:56 -04:00
chenyu
b47f6cebb2
LinearizerOptions -> CompilerOptions ( #3978 )
2024-03-28 17:50:23 -04:00
George Hotz
42b9d999ea
Buffer isn't always allocated ( #3974 )
...
* buffer alloc
* allocate
* missing allocates
* last one
2024-03-28 13:33:47 -07:00
geohotstan
bd3a7d068c
correct device for validation test in model benchmark CI ( #3960 )
...
* fix tests
* add clang back for only metal
* change the name to reflect CLANG being run
* add back cuda
2024-03-27 13:40:06 -04:00
George Hotz
68ca4d4276
split to schedule.py ( #3949 )
...
* split to schedule.py
* split
2024-03-26 21:02:46 -07:00