chenyu
30fc1ad415
remove TODO: remove explicit dtypes after broadcast fix in stable_diffusion ( #4241 )
...
this is done
2024-04-21 00:31:24 -04:00
chenyu
a1940ced77
remove the assign hack in whisper ( #4240 )
...
no longer needed, the commented test case was removed too
2024-04-20 23:56:44 -04:00
chenyu
3f126c7664
fix examples vits / conversation.py ( #4239 )
...
it was passing a const numpy array into Tensor.arange
2024-04-20 23:29:12 -04:00
chenyu
31c9d9a228
fix test_linearizer tc opt tests for bf16 ( #4237 )
...
bf16 tc has larger rtol
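As a rough illustration of why bf16 tensor cores need a looser rtol (a sketch, not the test's actual code): bfloat16 keeps only 8 mantissa bits, so values round far more coarsely than fp32, and a comparison tolerance of 1e-3 is already below its rounding error near 1.0. The `to_bf16` helper below is a hypothetical truncation, assuming the usual bfloat16 layout (top 16 bits of a float32).

```python
import numpy as np

def to_bf16(x: np.ndarray) -> np.ndarray:
    # bfloat16 is the top 16 bits of a float32; zero the low 16 bits to truncate
    return (x.astype(np.float32).view(np.uint32) & 0xFFFF0000).view(np.float32)

a = np.array([1.003], dtype=np.float32)
b = to_bf16(a)                      # truncates to 1.0: spacing near 1.0 is 2**-7
print(np.allclose(a, b, rtol=1e-3))  # False: bf16 rounding error exceeds 1e-3
print(np.allclose(a, b, rtol=1e-2))  # True: a larger rtol absorbs it
```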
2024-04-20 11:51:50 -04:00
chenyu
f1d9d0a151
cleanup external_test_opt ( #4234 )
...
no more OPT=2 or OPT=3; check strict number of kernels; enabled tests now that fusion works
2024-04-20 04:00:08 -04:00
David Hou
dc4b1af09c
more realistic edge behavior for resnet benchmark ( #4231 )
...
* more realistic edge behavior for resnet benchmark
* schedule_step
* realize all parameters ahead of time
* don't save setup and misc schedules
2024-04-19 20:07:46 -04:00
David Hou
f6eea03749
SAVE_SCHEDULE as contextvar ( #4230 )
2024-04-19 18:51:57 -04:00
qazal
2094b3b327
graph ScheduleItems ( #4224 )
...
* graph schedules
* add logging
* inplace
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com >
2024-04-19 16:17:11 +04:00
George Hotz
cd88afc98b
datasets isn't a feature + filter docstrings ( #4228 )
...
* datasets isn't a feature
* filter docstrings in sz
2024-04-19 16:16:10 +04:00
George Hotz
b9570d6100
clean up update stats ( #4226 )
...
* WIP: clean up update stats
* line savings now
* fix graphs
* fix tests
* tighter prints
* remove extra jit=false
* debug=2 means wait
* that won't update stats
* still wait
2024-04-19 15:41:30 +04:00
qazal
1c87e5dbf6
fuzz schedule context vars ( #4223 )
...
* fuzz schedule context vars
* fuzz unique toposorts
* merge ground truth with the rest
* Revert "merge ground truth with the rest"
This reverts commit 1f3463bb57 .
* readability>
* can override
2024-04-19 13:16:25 +03:00
George Hotz
d99b512084
llm.c timing ( #4219 )
...
* add timing info
* fix malloc
* 8s with beam
2024-04-19 12:43:21 +04:00
qazal
43841a32b7
Merge pull request #4222 from Qazalin/fuzz-multi0
...
Tunable multi output fusion
2024-04-19 08:07:45 +03:00
qazal
b2fe3884fc
Merge branch 'master' into fuzz-multi0
2024-04-19 07:56:26 +03:00
qazal
abb10c83cd
tunable multi output fusion
2024-04-19 07:44:31 +03:00
chenyu
a1133beb80
KFD GEMM ( #4221 )
...
added to benchmark CI and fixed duplicated filenames between cuda and ptx
2024-04-19 00:43:18 -04:00
chenyu
3f3af0fb85
test_linearizer_failures 29 passes now ( #4215 )
...
TC + PADTO fixed
2024-04-18 19:49:23 -04:00
Elias Wahl
2ecd61e3e2
monkey patching ( #4214 )
2024-04-18 19:20:52 -04:00
Francis Lam
126826afc8
linearizer: refactor to define accs with potentially TC-modified idxs ( #4211 )
2024-04-18 15:31:06 -04:00
George Hotz
39b60a25f0
more llm c work ( #4207 )
...
* more llm c work
* print nicely
* fake load pretrained
* select warmups
* output c code
2024-04-18 22:20:44 +04:00
chenyu
f7416916df
update resnet hparams based on BS=1632 RCP ( #4210 )
...
https://github.com/mlcommons/logging/blob/master/mlperf_logging/rcp_checker/training_4.0.0/rcps_resnet.json
2024-04-18 12:01:46 -04:00
George Hotz
fa57c3e7ce
continue llm.c ( #4190 )
...
* continue llm.c
* export more
* progress on llm.c
* simpler optim, names work
2024-04-18 10:57:54 +04:00
geohotstan
269a58d5fa
tolist to return multidimensional list ( #4192 )
...
* lol does this work
* some more changes
* a tiny note
* rename a variable
* add test for data const and add TODO comment
* make type correct
2024-04-18 07:43:10 +04:00
Francis Lata
3644077a42
[MLPerf][UNet3D] Add DICE loss + metrics ( #4204 )
...
* add DICE loss and metrics
* update dice to include reference implementation's link
* remove unused imports
* remove unnecessary test file and update pred + label for metrics and losses test
* add tests to CI + add exclusion of mlperf_unet3d
---------
Co-authored-by: chenyu <chenyu@fastmail.com >
2024-04-17 20:09:33 -04:00
chenyu
cd801a15f3
scipy.signal.gaussian -> scipy.signal.windows.gaussian ( #4205 )
...
fixed unet3d model_eval, will add to CI after merging new dice loss
2024-04-17 19:15:37 -04:00
Elias Wahl
6eef8ee22a
Wikipedia download script for MLPerf BERT training ( #4202 )
...
* wikipedia download script
* add link
* checksum valueError
* ops
2024-04-17 16:34:57 -04:00
qazal
f75020a903
minimal diff for multioutput reduce pairs ( #4030 )
...
* simple fusion
* compiler cache patch
* Revert "compiler cache patch"
This reverts commit fa18049597 .
* Revert "Revert "compiler cache patch""
This reverts commit 57f8d41f98 .
* delete that
* early sort
* teeny renames
* spec
* .empty is great
* delete sort
* Update test_schedule.py
* this is one kernel now
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com >
2024-04-17 10:55:44 -04:00
George Hotz
8564e28a1b
new memory scheduler with explicit refcounts ( #4198 )
...
* new memory scheduler with explicit refcounts
* move central memory planner
* typo + use central memory planner in openpilot
* cleanups
* include lb_refcount in pickle
* replace PlaceHolder with memory planner
* cleaner
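The idea behind a refcount-driven memory planner can be sketched like this (a minimal illustration under assumed names, not tinygrad's actual implementation): count how many later steps read each buffer, then walk the schedule, releasing a buffer's slot on its last use so a later output of the same size can reuse it.

```python
from collections import defaultdict

def plan_memory(schedule):
    # schedule: ordered list of (out_name, size, input_names)
    refcount, size_of = defaultdict(int), {}
    for out, size, ins in schedule:
        size_of[out] = size
        for i in ins: refcount[i] += 1
    free = defaultdict(list)              # size -> freed slot ids available for reuse
    assign, nslots = {}, 0
    for out, size, ins in schedule:
        if free[size]:                    # reuse a freed slot of the same size
            assign[out] = free[size].pop()
        else:
            assign[out], nslots = nslots, nslots + 1
        for i in ins:                     # drop a reference; free the slot on last use
            refcount[i] -= 1
            if refcount[i] == 0 and i in assign:
                free[size_of[i]].append(assign[i])
    return assign, nslots
```

For a chain a -> b -> c of same-sized buffers, this plans only two slots: c reuses a's slot once b has consumed a.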
2024-04-17 08:46:47 +04:00
Francis Lam
c91b7b1739
test: add fuzz_matmul and better debugging for simple_matmul ( #4199 )
...
also show unoptimized shape in verify_kernel
2024-04-16 23:40:31 -04:00
qazal
ba8602612b
Fuzz all permutations of schedule ( #4136 )
...
* simple toposort
* fuzzer
* init in_degree
* move to tests
* same seed
* configure paths
* internal graph
* compare LazyBuffers
* simpler
* simple graph
* assign works
* simpler
* fix JIT
* upstream ci
* move ci
* fix the path
* DEBUG=1
* limit max paths
* launch a cmp kernel
* Revert "launch a cmp kernel"
This reverts commit 791c608992 .
* exec ground truth
* better perf
* copy ground truth once
* gpu allclose ast try1
* Revert "gpu allclose ast try1"
This reverts commit 1f82103af3 .
* prerealized bufs freezing
* teeny cleanups
* reuse Buffers
* Revert "reuse Buffers"
This reverts commit a71de94b03 .
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com >
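Fuzzing schedule permutations rests on generating random topological orders of the kernel graph. A minimal sketch of that core step (hypothetical names, not the fuzzer's actual code) is Kahn's algorithm with a random choice among the ready nodes:

```python
import random
from collections import defaultdict

def random_toposort(nodes, edges, rng):
    # edges: (u, v) pairs meaning u must be scheduled before v
    in_degree = {n: 0 for n in nodes}
    succ = defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
        in_degree[v] += 1
    ready = [n for n in nodes if in_degree[n] == 0]
    order = []
    while ready:
        # the random pick among ready nodes is what varies the permutation per run
        n = ready.pop(rng.randrange(len(ready)))
        order.append(n)
        for v in succ[n]:
            in_degree[v] -= 1
            if in_degree[v] == 0:
                ready.append(v)
    return order
```

Every order produced respects the dependency edges, so any divergence in results between two orders points at a scheduling bug rather than an invalid permutation.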
2024-04-17 05:03:21 +04:00
nimlgen
4ed6b42a8a
fix kernargs check in kfd ( #4194 )
2024-04-17 00:44:50 +03:00
David Hou
97d846dd67
in forced_realize, unchase last op if it is upcast ( #4185 )
...
* in forced_realize, unchase last op if it is upcast
* start on test
* flesh out test
* more test
* comment
* comment out parallel reduce test
* reorder
* unused
2024-04-16 17:15:17 -04:00
Francis Lam
e9c1616b27
logging: change LOGKERN to LOGKERNS to match LOGOPS ( #4193 )
...
also add printing of ast and applied_opts during verify_kernel
to more easily debug errors if they come up
2024-04-16 16:08:32 -04:00
David Hou
7fb220a567
touchup resnet_layer_bench ( #4191 )
2024-04-16 14:43:00 -04:00
David Hou
1dbf3b2b19
Benchmarks for individual resnet layers ( #4182 )
...
* resnet individual layer benchmarks!
* small
* 1 and 2
* mem_used
* no ci
* better conv print
* defaults
* prints
* adjust
* adjust
* adjust
* benchmark only one layer example
* tensor.training, zero_grad, sum instead of mean, last mem, last kernel count
* default jitcnt=1
* scale flops/kernels with jitcnt
* add note about jitcnt memory
* touchup
2024-04-16 13:53:18 -04:00
George Hotz
d49d4324a3
update docs ( #4189 )
2024-04-16 16:07:02 +04:00
George Hotz
55ae73e951
Replicate llm.c in tinygrad ( #4179 )
...
* write llm.c and add a few new methods to tensor
* training works
* add jit
* tests for new functions
* test tolist
* simple fix for onnx test failures (#4186 )
* write llm.c and add a few new methods to tensor
* training works
* add jit
* tests for new functions
* bump line count to 7500
* simplest fix
* safenumpy tolist for now
---------
Co-authored-by: George Hotz <geohot@gmail.com >
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com >
---------
Co-authored-by: geohotstan <135171913+geohotstan@users.noreply.github.com >
2024-04-16 15:40:48 +04:00
George Hotz
b6e7243bfa
hotfix: skip slow pre-commit test
2024-04-16 11:48:43 +04:00
George Hotz
cda0010020
hotfix: docs-legacy
2024-04-16 11:06:56 +04:00
George Hotz
8f749ae0eb
New docs are in mkdocs ( #4178 )
...
* start mkdocs
* simple docs for tensor
* more docs
* move those back
* more docs
* copy markdown extensions
* docs legacy
* docs building workflow
* fix showcase links
* only that?
* install tinygrad
* add docs to setup.py
* Delete examples/llm.c/data
2024-04-16 10:59:51 +04:00
chenyu
aa093efa43
fix handcode_resnet50_opt flops count ( #4184 )
2024-04-15 22:13:45 -04:00
chenyu
d5b67c1ca3
log resnet TRAIN_BEAM / EVAL_BEAM ( #4181 )
...
also run eval in benchmark mode if either one is positive
2024-04-15 19:29:08 -04:00
Francis Lam
9d2273235c
search: BEAM_UOPS_MAX to prune candidates with too many uops ( #4088 )
...
* search: add better default settings for fast search
not the highest possible performance, but adequate for most usage
* search: revert BEAM_MIN_PROGRESS and BEAM_UPCAST_MAX default changes
also sneak in a link to .gitignore for the unet3d dataset
* revert BEAM_MAX_TASKS_PER_CHILD change and fix uops max condition
2024-04-15 18:56:22 -04:00
qazal
286ea697f3
keep order in realizes ( #4180 )
2024-04-16 01:25:50 +04:00
George Hotz
e14a9bca0c
hotfix: bump line count to 7500 for NV backend
2024-04-15 23:18:46 +04:00
chenyu
6a2168e698
TRAIN_BEAM and EVAL_BEAM for resnet ( #4177 )
...
working on measuring compile time
2024-04-15 14:57:21 -04:00
Timmy
4592fc8fe7
Multireduce Kernels - prereq refactor ( #4173 )
...
* refactor rendering a reduceop into its own function (will help for kernels with multiple reduceops)
* linters
* addressing concerns
2024-04-14 20:16:54 -04:00
David Hou
593c90d7d6
Resnet fp16 training with fp32 master weight copy ( #4144 )
...
* add casts to layers
* FLOAT flag
* detach
* no_grad for eval
* whitespace
* explicit fp32 initialization
* oops
* whitespace
* put back config['DEFAULT_FLOAT']
* bad
* live dangerously (don't hide bugs)
* don't bundle changes
---------
Co-authored-by: chenyu <chenyu@fastmail.com >
2024-04-14 11:25:08 -04:00
chenyu
e20d6f9221
correct resnet estimate time ( #4169 )
...
7.99 hours was rendered as 7h0m.
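The shape of that bug can be sketched as follows (hypothetical function names, not the actual tinygrad code): truncating the hours and the fractional part separately drops the minutes, while converting to whole minutes first keeps them.

```python
def naive_eta(hours: float) -> str:
    # buggy: int(7.99 % 1) is 0, so the 0.99 hours of minutes vanish
    return f"{int(hours)}h{int(hours % 1) * 60}m"

def fixed_eta(hours: float) -> str:
    # convert to whole minutes first, then split into hours and minutes
    h, m = divmod(round(hours * 60), 60)
    return f"{h}h{m}m"

print(naive_eta(7.99))  # 7h0m
print(fixed_eta(7.99))  # 7h59m
```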
2024-04-14 02:21:46 -04:00
George Hotz
ea18d28253
some overview docs
2024-04-13 17:01:09 -07:00