qazal
2094b3b327
graph ScheduleItems (#4224)
* graph schedules
* add logging
* inplace
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-04-19 16:17:11 +04:00
George Hotz
cd88afc98b
datasets isn't a feature + filter docstrings (#4228)
* datasets isn't a feature
* filter docstrings in sz
2024-04-19 16:16:10 +04:00
George Hotz
b9570d6100
clean up update stats (#4226)
* WIP: clean up update stats
* line savings now
* fix graphs
* fix tests
* tighter prints
* remove extra jit=false
* debug=2 means wait
* that won't update stats
* still wait
2024-04-19 15:41:30 +04:00
qazal
1c87e5dbf6
fuzz schedule context vars (#4223)
* fuzz schedule context vars
* fuzz unique toposorts
* merge ground truth with the rest
* Revert "merge ground truth with the rest"
This reverts commit 1f3463bb57.
* readability>
* can override
2024-04-19 13:16:25 +03:00
George Hotz
d99b512084
llm.c timing (#4219)
* add timing info
* fix malloc
* 8s with beam
2024-04-19 12:43:21 +04:00
qazal
43841a32b7
Merge pull request #4222 from Qazalin/fuzz-multi0
Tunable multi output fusion
2024-04-19 08:07:45 +03:00
qazal
b2fe3884fc
Merge branch 'master' into fuzz-multi0
2024-04-19 07:56:26 +03:00
qazal
abb10c83cd
tunable multi output fusion
2024-04-19 07:44:31 +03:00
chenyu
a1133beb80
KFD GEMM (#4221)
added to benchmark CI and fixed duplicated filenames between cuda and ptx
2024-04-19 00:43:18 -04:00
chenyu
3f3af0fb85
test_linearizer_failures 29 passes now (#4215)
TC + PADTO fixed
2024-04-18 19:49:23 -04:00
Elias Wahl
2ecd61e3e2
monkey patching (#4214)
2024-04-18 19:20:52 -04:00
Francis Lam
126826afc8
linearizer: refactor to define accs with potentially TC-modified idxs (#4211)
2024-04-18 15:31:06 -04:00
George Hotz
39b60a25f0
more llm c work (#4207)
* more llm c work
* print nicely
* fake load pretrained
* select warmups
* output c code
2024-04-18 22:20:44 +04:00
chenyu
f7416916df
update resnet hparams based on BS=1632 RCP (#4210)
https://github.com/mlcommons/logging/blob/master/mlperf_logging/rcp_checker/training_4.0.0/rcps_resnet.json
2024-04-18 12:01:46 -04:00
George Hotz
fa57c3e7ce
continue llm.c (#4190)
* continue llm.c
* export more
* progress on llm.c
* simpler optim, names work
2024-04-18 10:57:54 +04:00
geohotstan
269a58d5fa
tolist to return multidimensional list (#4192)
* lol does this work
* some more changes
* a tiny note
* rename a variable
* add test for data const and add TODO comment
* make type correct
2024-04-18 07:43:10 +04:00
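For context on the tolist change above, a minimal usage sketch (behavior assumed from the PR title; Tensor is tinygrad's public API):

```python
# Tensor.tolist now returns a nested Python list matching the tensor's
# shape (numpy-style), rather than a flat list.
from tinygrad import Tensor

t = Tensor([[1, 2], [3, 4]])
assert t.tolist() == [[1, 2], [3, 4]]
```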
Francis Lata
3644077a42
[MLPerf][UNet3D] Add DICE loss + metrics (#4204)
* add DICE loss and metrics
* update dice to include reference implementation's link
* remove unused imports
* remove unnecessary test file and update pred + label for metrics and losses test
* add tests to CI + add exclusion of mlperf_unet3d
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-04-17 20:09:33 -04:00
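On the DICE loss added above: the Dice score measures overlap as 2|P∩L| / (|P|+|L|). A minimal soft-Dice sketch follows; the implementation actually merged in #4204 (smoothing, reductions, per-channel handling) may differ:

```python
# Generic soft Dice loss; illustrative only, not the merged code.
from tinygrad import Tensor

def dice_loss(pred: Tensor, label: Tensor, smooth: float = 1e-6) -> Tensor:
  p, l = pred.flatten(), label.flatten()
  intersection = (p * l).sum()
  return 1 - (2 * intersection + smooth) / (p.sum() + l.sum() + smooth)
```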
chenyu
cd801a15f3
scipy.signal.gaussian -> scipy.signal.windows.gaussian (#4205)
fixed unet3d model_eval, will add to CI after merging new dice loss
2024-04-17 19:15:37 -04:00
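The fix above is a one-line import migration: SciPy moved its window functions into scipy.signal.windows and later removed the old aliases.

```python
# old (removed in newer SciPy):
# from scipy.signal import gaussian
# new:
from scipy.signal.windows import gaussian

window = gaussian(M=7, std=2.0)  # same signature and output as before
```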
Elias Wahl
6eef8ee22a
Wikipedia download script for MLPerf BERT training (#4202)
* wikipedia download script
* add link
* checksum ValueError
* ops
2024-04-17 16:34:57 -04:00
qazal
f75020a903
minimal diff for multioutput reduce pairs (#4030)
* simple fusion
* compiler cache patch
* Revert "compiler cache patch"
This reverts commit fa18049597.
* Revert "Revert "compiler cache patch""
This reverts commit 57f8d41f98.
* delete that
* early sort
* teeny renames
* spec
* .empty is great
* delete sort
* Update test_schedule.py
* this is one kernel now
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-04-17 10:55:44 -04:00
George Hotz
8564e28a1b
new memory scheduler with explicit refcounts (#4198)
* new memory scheduler with explicit refcounts
* move central memory planner
* typo + use central memory planner in openpilot
* cleanups
* include lb_refcount in pickle
* replace PlaceHolder with memory planner
* cleaner
2024-04-17 08:46:47 +04:00
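A rough sketch of the refcounting idea behind the memory scheduler above (an illustrative toy planner, not the PR's code; ignores buffer sizes):

```python
from collections import defaultdict

def plan_memory(schedule):
  # schedule: list of (outputs, inputs) per kernel; buffers are hashable ids.
  refcount = defaultdict(int)
  for _, inputs in schedule:
    for buf in inputs: refcount[buf] += 1
  free_pool, assignment = [], {}
  for outputs, inputs in schedule:
    # assign outputs first so a kernel never reuses one of its own inputs
    for buf in outputs:
      assignment[buf] = free_pool.pop() if free_pool else f"slot_{len(assignment)}"
    # drop one reference per use; recycle the slot on the last use
    for buf in inputs:
      refcount[buf] -= 1
      if refcount[buf] == 0 and buf in assignment: free_pool.append(assignment[buf])
  return assignment
```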
Francis Lam
c91b7b1739
test: add fuzz_matmul and better debugging for simple_matmul (#4199)
also show unoptimized shape in verify_kernel
2024-04-16 23:40:31 -04:00
qazal
ba8602612b
Fuzz all permutations of schedule (#4136)
* simple toposort
* fuzzer
* init in_degree
* move to tests
* same seed
* configure paths
* internal graph
* compare LazyBuffers
* simpler
* simple graph
* assign works
* simpler
* fix JIT
* upstream ci
* move ci
* fix the path
* DEBUG=1
* limit max paths
* launch a cmp kernel
* Revert "launch a cmp kernel"
This reverts commit 791c608992.
* exec ground truth
* better perf
* copy ground truth once
* gpu allclose ast try1
* Revert "gpu allclose ast try1"
This reverts commit 1f82103af3.
* prerealized bufs freezing
* teeny cleanups
* reuse Buffers
* Revert "reuse Buffers"
This reverts commit a71de94b03.
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-04-17 05:03:21 +04:00
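The core trick in the schedule fuzzer above is enumerating every topological ordering of the kernel DAG and comparing each against a ground truth. A minimal sketch of such an enumerator (backtracking over Kahn's algorithm; not the fuzzer's actual code):

```python
from collections import defaultdict

def all_toposorts(nodes, edges):
  # edges: (u, v) pairs meaning u must run before v
  graph, in_degree = defaultdict(list), {n: 0 for n in nodes}
  for u, v in edges:
    graph[u].append(v); in_degree[v] += 1
  def backtrack(order):
    if len(order) == len(nodes):
      yield list(order); return
    for n in [x for x in nodes if in_degree[x] == 0 and x not in order]:
      for v in graph[n]: in_degree[v] -= 1
      order.append(n)
      yield from backtrack(order)
      order.pop()
      for v in graph[n]: in_degree[v] += 1
  yield from backtrack([])

assert list(all_toposorts("abc", [("a", "b"), ("a", "c")])) == [["a", "b", "c"], ["a", "c", "b"]]
```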
nimlgen
4ed6b42a8a
fix kernargs check in kfd (#4194)
2024-04-17 00:44:50 +03:00
David Hou
97d846dd67
in forced_realize, unchase last op if it is upcast (#4185)
* in forced_realize, unchase last op if it is upcast
* start on test
* flesh out test
* more test
* comment
* comment out parallel reduce test
* reorder
* unused
2024-04-16 17:15:17 -04:00
Francis Lam
e9c1616b27
logging: change LOGKERN to LOGKERNS to match LOGOPS (#4193)
also add printing of ast and applied_opts during verify_kernel
to more easily debug errors if they come up
2024-04-16 16:08:32 -04:00
David Hou
7fb220a567
touchup resnet_layer_bench (#4191)
2024-04-16 14:43:00 -04:00
David Hou
1dbf3b2b19
Benchmarks for individual resnet layers (#4182)
* resnet individual layer benchmarks!
* small
* 1 and 2
* mem_used
* no ci
* better conv print
* defaults
* prints
* adjust
* adjust
* adjust
* benchmark only one layer example
* tensor.training, zero_grad, sum instead of mean, last mem, last kernel count
* default jitcnt=1
* scale flops/kernels with jitcnt
* add note about jitcnt memory
* touchup
2024-04-16 13:53:18 -04:00
George Hotz
d49d4324a3
update docs (#4189)
2024-04-16 16:07:02 +04:00
George Hotz
55ae73e951
Replicate llm.c in tinygrad (#4179)
* write llm.c and add a few new methods to tensor
* training works
* add jit
* tests for new functions
* test tolist
* simple fix for onnx test failures (#4186)
* write llm.c and add a few new methods to tensor
* training works
* add jit
* tests for new functions
* bump line count to 7500
* simplest fix
* safenumpy tolist for now
---------
Co-authored-by: George Hotz <geohot@gmail.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
---------
Co-authored-by: geohotstan <135171913+geohotstan@users.noreply.github.com>
2024-04-16 15:40:48 +04:00
George Hotz
b6e7243bfa
hotfix: skip slow pre-commit test
2024-04-16 11:48:43 +04:00
George Hotz
cda0010020
hotfix: docs-legacy
2024-04-16 11:06:56 +04:00
George Hotz
8f749ae0eb
New docs are in mkdocs (#4178)
* start mkdocs
* simple docs for tensor
* more docs
* move those back
* more docs
* copy markdown extensions
* docs legacy
* docs building workflow
* fix showcase links
* only that?
* install tinygrad
* add docs to setup.py
* Delete examples/llm.c/data
2024-04-16 10:59:51 +04:00
chenyu
aa093efa43
fix handcode_resnet50_opt flops count (#4184)
2024-04-15 22:13:45 -04:00
chenyu
d5b67c1ca3
log resnet TRAIN_BEAM / EVAL_BEAM (#4181)
also run eval in benchmark mode if either one is positive
2024-04-15 19:29:08 -04:00
Francis Lam
9d2273235c
search: BEAM_UOPS_MAX to prune candidates with too many uops (#4088)
* search: add better default settings for fast search
not the highest possible performance, but adequate for most usage
* search: revert BEAM_MIN_PROGRESS and BEAM_UPCAST_MAX default changes
also sneak in a link to .gitignore for the unet3d dataset
* revert BEAM_MAX_TASKS_PER_CHILD change and fix uops max condition
2024-04-15 18:56:22 -04:00
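A hedged sketch of the pruning knob above (assumed shape only; the real integration lives in tinygrad's BEAM search code):

```python
from tinygrad.helpers import getenv

BEAM_UOPS_MAX = getenv("BEAM_UOPS_MAX", 0)

def prune_candidates(candidates):
  # candidates: (kernel, uop_count) pairs; 0 is assumed to disable the filter
  if BEAM_UOPS_MAX <= 0: return candidates
  return [c for c in candidates if c[1] <= BEAM_UOPS_MAX]
```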
qazal
286ea697f3
keep order in realizes (#4180)
2024-04-16 01:25:50 +04:00
George Hotz
e14a9bca0c
hotfix: bump line count to 7500 for NV backend
2024-04-15 23:18:46 +04:00
chenyu
6a2168e698
TRAIN_BEAM and EVAL_BEAM for resnet (#4177)
working on measuring compile time
2024-04-15 14:57:21 -04:00
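Per the title above, beam width is split into separate train/eval knobs. A sketch of how such tunables are typically read in tinygrad (the fallback to plain BEAM is an assumption, not confirmed from the diff):

```python
from tinygrad.helpers import getenv

# assumption: each knob falls back to the global BEAM setting when unset
TRAIN_BEAM = getenv("TRAIN_BEAM", getenv("BEAM", 0))
EVAL_BEAM = getenv("EVAL_BEAM", getenv("BEAM", 0))
```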
Timmy
4592fc8fe7
Multireduce Kernels - prereq refactor (#4173)
* refactor rendering a reduceop into its own function (will help for kernels with multiple reduceops)
* linters
* addressing concerns
2024-04-14 20:16:54 -04:00
David Hou
593c90d7d6
Resnet fp16 training with fp32 master weight copy (#4144)
* add casts to layers
* FLOAT flag
* detach
* no_grad for eval
* whitespace
* explicit fp32 initialization
* oops
* whitespace
* put back config['DEFAULT_FLOAT']
* bad
* live dangerously (don't hide bugs)
* don't bundle changes
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-04-14 11:25:08 -04:00
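The fp32-master-weight approach referenced above is the standard mixed-precision training pattern; a generic sketch (not the PR's code):

```python
from tinygrad import Tensor, dtypes

master = Tensor.randn(64, 64, dtype=dtypes.float32)  # fp32 master copy
w = master.cast(dtypes.float16)                      # fp16 weight used in compute
# ... forward/backward in fp16 produce `grad` ...
grad = Tensor.randn(64, 64, dtype=dtypes.float16)
master = master - 0.01 * grad.cast(dtypes.float32)   # optimizer step in fp32
w = master.cast(dtypes.float16)                      # refresh fp16 working copy
```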
chenyu
e20d6f9221
correct resnet estimate time (#4169)
7.99 hours was rendered as 7h0m.
2024-04-14 02:21:46 -04:00
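The bug above (7.99 hours shown as 7h0m) is the classic one of deriving minutes from the wrong quantity. A correct formatting sketch (not necessarily the repo's exact code):

```python
def fmt_hours(hours: float) -> str:
  # minutes come from the fractional part, not the truncated hour count
  h, m = int(hours), round((hours - int(hours)) * 60)
  if m == 60: h, m = h + 1, 0  # avoid e.g. "7h60m" at the boundary
  return f"{h}h{m}m"

assert fmt_hours(7.99) == "7h59m"
```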
George Hotz
ea18d28253
some overview docs
2024-04-13 17:01:09 -07:00
George Hotz
50e780a588
multitensor shouldn't recompile (#4164)
* multitensor shouldn't recompile
* type annotations
* fix tests
* outcount in reduce
2024-04-13 00:03:48 -07:00
George Hotz
599eb266b1
optionally use a copy kernel instead of SDMA (#4116)
* optionally use a copy kernel
* lazyops in copied kernels
* add sync
* no sdma at all
* work
* copy_ast
2024-04-12 23:10:41 -07:00
George Hotz
ba7314c26b
cleanup lbs (#4163)
2024-04-12 22:32:16 -07:00
chenyu
a7c6864260
remove CAST_BEFORE_VIEW (#4152)
* remove CAST_BEFORE_VIEW
testing perf, also this might have issue with assign?
* remove all
2024-04-13 01:05:08 -04:00
George Hotz
ebc94c9d6c
rewrite the jit in the context of new schedule (#4162)
* rewrite the jit in the context of new schedule
* mypy better
* fix placeholder
* tests
* all functionality should work
* fix tests
* no CacheCollector
2024-04-12 21:54:36 -07:00
George Hotz
b67f759780
abstractions3 is currently wishful thinking (#4124)
* abstractions3 is currently wishful thinking
* a3
* work
* minor
* progress on a3
* more
* update abstractions3
* cleaner
2024-04-12 16:46:01 -07:00
MaximilianEmel
27a98aaecc
Rewritten SVG Logos (#4150)
* rewrote the svg logos to use polygons and render better
* changed self-closing tags' style to better conform to the original
2024-04-12 14:09:57 -07:00