George Hotz
acb32e1766
hotfix: PM4 supports timing
2024-04-24 08:38:59 +00:00
George Hotz
ad28fdecb1
si.inputs+outputs -> bufs ( #4279 )
2024-04-24 15:12:34 +08:00
Elias Wahl
69341144ba
Wikipedia preprocessing script ( #4229 )
* Preprocessing script
* short seq prob
* comments + env vars
* Add preprocessing reference. Add test
* lint fix + add eval test support
* whitespaces
* point to commit
* comment
* rename
* better comments
2024-04-23 10:28:01 -04:00
George Hotz
967638f0d5
update docs, remove corealize ( #4264 )
* update docs, remove corealize
* handle 0 line count
* tensor schedule
2024-04-23 12:05:29 +04:00
George Hotz
9a95781d51
renamed ( #4260 )
2024-04-23 09:00:28 +04:00
George Hotz
2ae4f45272
WIP PM4 Support ( #4110 )
* pm4 kernel launch works
* disable USE_THREAD_DIMENSIONS
* add kernel code
* work on real pm4
* pm4 signal
* same
* gate pm4
* hcq tests pass
* ops passes
* pm4 is closer
* pm4 debug (#4165)
* start debug tests passing
* prg
* smth
* hdp flush
* cleaner 1
* do not need this
* logs not need
* small things
* linter
* remove AQL
* test hcq
* fix tests
* it's subtracting, it shouldn't be -1
* pm4 changes (#4251)
* not need this anymore
* sdma signal with non atomic
---------
Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
2024-04-23 08:31:27 +04:00
chenyu
f1d9d0a151
cleanup external_test_opt ( #4234 )
no more OPT=2 or OPT=3; check the strict number of kernels; enabled tests where fusion now works
2024-04-20 04:00:08 -04:00
David Hou
dc4b1af09c
more realistic edge behavior for resnet benchmark ( #4231 )
* more realistic edge behavior for resnet benchmark
* schedule_step
* realize all parameters ahead of time
* don't save setup and misc schedules
2024-04-19 20:07:46 -04:00
George Hotz
b9570d6100
clean up update stats ( #4226 )
* WIP: clean up update stats
* line savings now
* fix graphs
* fix tests
* tighter prints
* remove extra jit=false
* debug=2 means wait
* that won't update stats
* still wait
2024-04-19 15:41:30 +04:00
qazal
1c87e5dbf6
fuzz schedule context vars ( #4223 )
* fuzz schedule context vars
* fuzz unique toposorts
* merge ground truth with the rest
* Revert "merge ground truth with the rest"
This reverts commit 1f3463bb57.
* readability
* can override
2024-04-19 13:16:25 +03:00
Francis Lata
3644077a42
[MLPerf][UNet3D] Add DICE loss + metrics ( #4204 )
* add DICE loss and metrics
* update dice to include reference implementation's link
* remove unused imports
* remove unnecessary test file and update pred + label for metrics and losses test
* add tests to CI + add exclusion of mlperf_unet3d
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-04-17 20:09:33 -04:00
Francis Lam
c91b7b1739
test: add fuzz_matmul and better debugging for simple_matmul ( #4199 )
also show unoptimized shape in verify_kernel
2024-04-16 23:40:31 -04:00
qazal
ba8602612b
Fuzz all permutations of schedule ( #4136 )
* simple toposort
* fuzzer
* init in_degree
* move to tests
* same seed
* configure paths
* internal graph
* compare LazyBuffers
* simpler
* simple graph
* assign works
* simpler
* fix JIT
* upstream ci
* move ci
* fix the path
* DEBUG=1
* limit max paths
* launch a cmp kernel
* Revert "launch a cmp kernel"
This reverts commit 791c608992.
* exec ground truth
* better perf
* copy ground truth once
* gpu allclose ast try1
* Revert "gpu allclose ast try1"
This reverts commit 1f82103af3.
* prerealized bufs freezing
* teeny cleanups
* reuse Buffers
* Revert "reuse Buffers"
This reverts commit a71de94b03.
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-04-17 05:03:21 +04:00
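The all-permutations idea in the bullets above ("simple toposort", "limit max paths") can be sketched as a backtracking enumeration of a DAG's topological orders with a path cap; this is an illustrative toy, not the PR's implementation:

```python
def all_toposorts(graph, max_paths=100):
    """Enumerate topological orders of a DAG given as {node: [successors]}."""
    in_degree = {n: 0 for n in graph}
    for succs in graph.values():
        for m in succs:
            in_degree[m] += 1
    paths, order = [], []
    def visit():
        if len(paths) >= max_paths:  # cap the search, like "limit max paths"
            return
        if len(order) == len(graph):
            paths.append(tuple(order))
            return
        for n in [n for n in graph if in_degree[n] == 0 and n not in order]:
            order.append(n)
            for m in graph[n]: in_degree[m] -= 1
            visit()
            for m in graph[n]: in_degree[m] += 1  # backtrack
            order.pop()
    visit()
    return paths

# diamond DAG: a fans out to b and c, which both feed d
g = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
print(all_toposorts(g))  # [('a', 'b', 'c', 'd'), ('a', 'c', 'b', 'd')]
```

Running every valid order against a ground-truth schedule is what lets the fuzzer catch ordering-dependent bugs.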
Francis Lam
e9c1616b27
logging: change LOGKERN to LOGKERNS to match LOGOPS ( #4193 )
also add printing of ast and applied_opts during verify_kernel
to more easily debug errors if they come up
2024-04-16 16:08:32 -04:00
David Hou
7fb220a567
touchup resnet_layer_bench ( #4191 )
2024-04-16 14:43:00 -04:00
David Hou
1dbf3b2b19
Benchmarks for individual resnet layers ( #4182 )
* resnet individual layer benchmarks!
* small
* 1 and 2
* mem_used
* no ci
* better conv print
* defaults
* prints
* adjust
* adjust
* adjust
* benchmark only one layer example
* tensor.training, zero_grad, sum instead of mean, last mem, last kernel count
* default jitcnt=1
* scale flops/kernels with jitcnt
* add note about jitcnt memory
* touchup
2024-04-16 13:53:18 -04:00
George Hotz
50e780a588
multitensor shouldn't recompile ( #4164 )
* multitensor shouldn't recompile
* type annotations
* fix tests
* outcount in reduce
2024-04-13 00:03:48 -07:00
chenyu
a7c6864260
remove CAST_BEFORE_VIEW ( #4152 )
* remove CAST_BEFORE_VIEW
testing perf; also, this might have an issue with assign?
* remove all
2024-04-13 01:05:08 -04:00
George Hotz
ebc94c9d6c
rewrite the jit in the context of new schedule ( #4162 )
* rewrite the jit in the context of new schedule
* mypy better
* fix placeholder
* tests
* all functionality should work
* fix tests
* no CacheCollector
2024-04-12 21:54:36 -07:00
George Hotz
bbda20c0db
CompiledASTRunner -> CompiledRunner ( #4148 )
2024-04-11 08:49:52 -07:00
George Hotz
b7e281cf10
JitItem -> ExecItem ( #4146 )
* JitItem -> ExecItem
* execitem in realize
* cleaner
* JITRunner -> Runner
2024-04-11 08:24:57 -07:00
terafo
5e6d2155e4
Add driving monitoring model to benchmarks ( #4134 )
* add driving monitoring model to benchmarks
* handle crash
2024-04-10 14:27:03 -04:00
geohotstan
fe88591890
update onnx to 1.16.0 ( #4127 )
* update
* pass tests and skip tests
2024-04-10 11:19:13 -04:00
George Hotz
ae849d12d7
numpy device + pickle it ( #4120 )
2024-04-09 13:19:30 -07:00
George Hotz
164329a8ea
address kfd feedback ( #4087 )
* address kfd feedback
* signals cleanup
* signals cleanup
* handle 2 doorbell pages correctly
* signal reset cleanup
* signals cleanup
* more GTT
* cleanups
* minor cleanups
2024-04-05 15:24:41 -07:00
George Hotz
a337922c44
more work on kfd ( #4079 )
* more work on kfd
* fix multitensor test on kfd
* stuff
2024-04-05 08:36:36 -07:00
George Hotz
3de855ea50
don't use SVM memory in KFD ( #4072 )
* don't use SVM memory in KFD
* copy from fd
* cleanups
* transfer
* hacks
* ops_hsa
* tighter API
2024-04-04 17:33:21 -07:00
George Hotz
7181ffd630
HWCopyQueue in KFD ( #4042 )
* HWCopyQueue in KFD
* hw compute queue
* test
* move test
* more tests
* fix wait
* fix multimap
* mes crash
* tests pass but slow
* stuff is working
* one more test
2024-04-03 20:14:24 -07:00
chenyu
e3c0ac9fbf
remove old envvar "OPT" ( #4060 )
2024-04-03 14:55:21 -04:00
chenyu
406cb5fd90
const fold ReduceOps ( #4059 )
2024-04-03 14:39:28 -04:00
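The folding idea can be illustrated with a toy helper (op names and signature here are hypothetical, not tinygrad's API): a reduce over a buffer known to hold a single constant can be evaluated at schedule time instead of launching a kernel.

```python
import math

def fold_reduce(op: str, const: float, shape: tuple) -> float:
    # illustrative constant folding for reduces over a constant-filled buffer
    if op == "SUM":
        return const * math.prod(shape)  # summing n copies of c gives c * n
    if op == "MAX":
        return const                     # max of identical values is the value
    raise NotImplementedError(op)

print(fold_reduce("SUM", 2.0, (4, 8)))  # 64.0
print(fold_reduce("MAX", 2.0, (4, 8)))  # 2.0
```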
chenyu
c71627fee6
move GlobalCounter to helpers ( #4002 )
break circular import between ops and buffer
2024-03-30 00:30:30 -04:00
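The cycle-breaking pattern here can be sketched in one file (module layout collapsed into comments; the attribute names are illustrative): when ops imports buffer and buffer imports ops, move the shared state into a leaf module and point both at it.

```python
# --- helpers.py: a leaf module that imports nothing from ops/buffer ---
class GlobalCounters:
    global_ops: int = 0
    global_mem: int = 0

# --- ops.py and buffer.py each do `from helpers import GlobalCounters` ---
# --- instead of importing each other, which breaks the import cycle  ---
GlobalCounters.global_ops += 10     # e.g. a kernel reporting its op count
GlobalCounters.global_mem += 4096   # e.g. a buffer reporting an allocation
print(GlobalCounters.global_ops, GlobalCounters.global_mem)  # 10 4096
```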
George Hotz
f916aadaea
external that test
2024-03-29 19:35:50 -07:00
chenyu
d9ff636cf5
use is to compare with enum ( #3993 )
* use is to compare with enum
currently it's a mix of `==` and `is`; moved all to `is`
* more
2024-03-29 13:02:56 -04:00
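The `is` preference works because enum members are singletons; a minimal sketch with a hypothetical `Color` enum:

```python
from enum import Enum, auto

class Color(Enum):  # hypothetical enum, just for illustration
    RED = auto()
    GREEN = auto()

c = Color.RED
# Enum members are singletons, so identity comparison is exact and cannot
# be fooled by another type's custom __eq__.
assert c is Color.RED
assert c == Color.RED        # also true, but dispatches through __eq__
assert c is not Color.GREEN
print("ok")
```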
chenyu
b47f6cebb2
LinearizerOptions -> CompilerOptions ( #3978 )
2024-03-28 17:50:23 -04:00
George Hotz
42b9d999ea
Buffer isn't always allocated ( #3974 )
* buffer alloc
* allocate
* missing allocates
* last one
2024-03-28 13:33:47 -07:00
geohotstan
bd3a7d068c
correct device for validation test in model benchmark CI ( #3960 )
* fix tests
* add clang back for only metal
* change the name to reflect CLANG being ran
* add back cuda
2024-03-27 13:40:06 -04:00
George Hotz
68ca4d4276
split to schedule.py ( #3949 )
* split to schedule.py
* split
2024-03-26 21:02:46 -07:00
George Hotz
150ea2eb76
create engine folder and move code ( #3948 )
* retry
* older tf
* that
2024-03-26 20:38:03 -07:00
Francis Lam
5530b0cbed
fuzz_linearizer: reduce debug verbosity and make easier for CI usage ( #3942 )
* fuzz_linearizer: reduce debug verbosity and make easier for CI usage
* rename FUZZ_BEAM to FUZZ_ALL_ACTIONS (not choosing a subset)
* skip simple ASTs (easier to use with LOGOPS output)
* don't fuzz a previously seen AST
* add options to allow non-zero --expected-failures
* clean up naming and use set
2024-03-26 16:25:24 -04:00
nimlgen
e2d6f76723
_alloc and _free with options ( #3934 )
* _alloc has options
* linter
* fix hsa
2024-03-26 09:11:41 -07:00
wozeparrot
9a9cac58f9
add lars to nn ( #3750 )
* feat: add lars
* feat: don't remove this comment
* clean: smaller diff
* clean: shorter line
* feat: remove mlperf lars, switch resnet
* fix: fully remove mlperf lars
* clean: comment
* feat: contiguous
* feat: no weight decay on skip params
* feat: optimizergroup
* feat: classic momentum
* fix: pylint
* clean: move comment
* fix: correct algo
* feat: lrschedulergroup
* feat: skip list tests
* feat: :| forgot that params are a thing
* feat: remove skip_list params from main params
* feat: set moment
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-03-24 11:43:12 -04:00
chenyu
a2b2597fc2
replace dtype.name str with render_dtype ( #3903 )
fixed a bf16 cast issue, since bf16 does not have `.name`.
also more robust if there are language-specific type overrides
2024-03-23 19:25:48 -04:00
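A minimal sketch of the render-through-a-table pattern (the table and names here are hypothetical, not tinygrad's actual `render_dtype`): looking dtypes up in a per-backend map avoids depending on a `.name` attribute that some dtypes lack, and gives a natural place for language-specific overrides.

```python
# hypothetical per-backend override table
TYPE_MAP = {"float32": "float", "int32": "int", "bfloat16": "__bf16"}

def render_dtype(dtype: str) -> str:
    # fall back to the generic spelling when the backend has no override
    return TYPE_MAP.get(dtype, dtype)

print(render_dtype("bfloat16"))  # __bf16
print(render_dtype("uint64"))    # uint64 (no override)
```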
Francis Lam
8db7a6bbcc
debug: add optional detailed BEAM_LOG logging ( #3883 )
* debug: add optional detailed BEAM_LOG logging
show uop count, compile and run times for each candidate in search.
also add --timing to verify_kernel.py to make it easier to explore
hand-crafted applied opts
* fix linter
2024-03-22 19:23:31 -04:00
Francis Lam
5587594a00
fuzz_linearizer: add --ast and --file params to read kernels ( #3877 )
also fix up ast_str_to_str to support the new tuple of LazyOps
2024-03-22 14:27:40 -04:00
uuuvn
6729f20aab
Ring allreduce try 2 ( #3852 )
...
* Ring allreduce v3
* Configurable size, number of gpus and jit in benchmark
* ScheduleBarrier v0
* GB/s that make sense
* ScheduleBarrier v0.1
* Fallback on 2 GPUs
* ScheduleBarrier v0.2
* ScheduleBarrier v0.3
* ScheduleBarrier v0.3.1
* ScheduleBarrier v0.3.2
* Replace ScheduleBarrier with automatic optimization
* unused import
* fix comment
* typing
* better fallback
* python 3.8
* RING=2 and use ContextVar
* DEBUG >= 2 and change name
* linter
* type
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
2024-03-21 19:17:51 -04:00
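Ring allreduce itself can be sketched as a single-process simulation (reduce-scatter around the ring, then allgather); this toy stands in for the PR's real multi-GPU transfers:

```python
def ring_allreduce(data):
    """Toy ring allreduce: data[r] is rank r's buffer, pre-split into
    len(data) chunks (one chunk per rank). Mutates and returns data."""
    n = len(data)
    # reduce-scatter: after n-1 steps, rank r holds the full sum of chunk (r+1) % n
    # (the sequential loop is safe: a step never overwrites a chunk before it is sent)
    for step in range(n - 1):
        for r in range(n):
            c = (r - step) % n
            data[(r + 1) % n][c] += data[r][c]
    # allgather: circulate each fully-reduced chunk around the ring
    for step in range(n - 1):
        for r in range(n):
            c = (r + 1 - step) % n
            data[(r + 1) % n][c] = data[r][c]
    return data

bufs = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
print(ring_allreduce(bufs))  # every rank ends with [111, 222, 333]
```

Each rank only ever talks to its ring neighbor, so per-rank traffic stays constant as GPUs are added, which is why the benchmark reports "GB/s that make sense" only above two GPUs (hence the 2-GPU fallback).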
Francis Lam
3c0478bfab
fuzz_linearizer: add additional DEBUG info for comparison errors ( #3866 )
2024-03-21 18:58:10 -04:00
chenyu
e50b7abe4f
diversed buf inputs based on dtype in fuzz_linearizer ( #3863 )
2024-03-21 16:23:11 -04:00
chenyu
30fa03243e
reuse fuzz_linearizer.compare_linearizer in test_linearizer_failures ( #3861 )
2024-03-21 14:12:27 -04:00
chenyu
6bf0b82267
alloc new output in fuzz_linearizer between baseline and real one ( #3859 )
if the kernel is an assign (`a += 1`), rawbufs[0] is updated twice, giving a false compare_error
2024-03-21 11:36:05 -04:00
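The false positive described here can be reproduced with a toy in-place kernel:

```python
# Toy model of the bug: an "assign" kernel mutates its output buffer in
# place, so running baseline and candidate against the SAME buffer applies
# the increment twice and the outputs can never match.
def assign_kernel(buf):          # stands in for a compiled `a += 1` kernel
    for i in range(len(buf)):
        buf[i] += 1

shared = [0, 0]
assign_kernel(shared)            # baseline run
baseline = list(shared)
assign_kernel(shared)            # second run reuses the same rawbufs[0]
print(shared == baseline)        # False -> spurious compare_error

fresh = [0, 0]                   # the fix: a newly allocated output buffer
assign_kernel(fresh)
print(fresh == baseline)         # True
```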
nimlgen
85691c8e20
fix hsa sync issue ( #3847 )
* fix hsa sync issue
* linter
2024-03-21 04:00:30 +03:00