Commit Graph

4154 Commits

chenyu
f7416916df update resnet hparams based on BS=1632 RCP (#4210)
https://github.com/mlcommons/logging/blob/master/mlperf_logging/rcp_checker/training_4.0.0/rcps_resnet.json
2024-04-18 12:01:46 -04:00
George Hotz
fa57c3e7ce continue llm.c (#4190)
* continue llm.c

* export more

* progress on llm.c

* simpler optim, names work
2024-04-18 10:57:54 +04:00
geohotstan
269a58d5fa tolist to return multidimensional list (#4192)
* lol does this work

* some more changes

* a tiny note

* rename a variable

* add test for data const and add TODO comment

* make type correct
2024-04-18 07:43:10 +04:00
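
A quick illustration of what the tolist change above enables, as a sketch assuming tinygrad's public API (`Tensor` importable from the package root):

```python
# tolist now mirrors the tensor's shape as nested Python lists
# instead of returning a flat list.
from tinygrad import Tensor

t = Tensor([[1, 2], [3, 4]])
assert t.tolist() == [[1, 2], [3, 4]]
```
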
Francis Lata
3644077a42 [MLPerf][UNet3D] Add DICE loss + metrics (#4204)
* add DICE loss and metrics

* update dice to include reference implementation's link

* remove unused imports

* remove unnecessary test file and update pred + label for metrics and losses test

* add tests to CI + add exclusion of mlperf_unet3d

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-04-17 20:09:33 -04:00
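
For readers unfamiliar with the loss added above, a generic Dice loss sketch in plain numpy (illustrative only, not the MLPerf reference implementation):

```python
import numpy as np

# Dice coefficient: 2*|X ∩ Y| / (|X| + |Y|); the loss is 1 - dice.
# eps guards against division by zero on empty masks.
def dice_loss(pred: np.ndarray, label: np.ndarray, eps: float = 1e-6) -> float:
  intersection = (pred * label).sum()
  return 1.0 - (2.0 * intersection + eps) / (pred.sum() + label.sum() + eps)

print(dice_loss(np.array([1., 1., 0.]), np.array([1., 0., 0.])))  # ~0.3333
```
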
chenyu
cd801a15f3 scipy.signal.gaussian -> scipy.signal.windows.gaussian (#4205)
fixed unet3d model_eval, will add to CI after merging new dice loss
2024-04-17 19:15:37 -04:00
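
The rename above is a one-line migration: SciPy's window functions live in `scipy.signal.windows`, and newer SciPy releases dropped the deprecated top-level aliases. A minimal sketch (not the unet3d code itself):

```python
import numpy as np
from scipy.signal.windows import gaussian  # was: from scipy.signal import gaussian

kernel = gaussian(M=9, std=2.0)  # 9-tap Gaussian window with sigma=2
kernel /= kernel.sum()           # normalize when used as a smoothing kernel
print(np.round(kernel, 4))
```
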
Elias Wahl
6eef8ee22a Wikipedia download script for MLPerf BERT training (#4202)
* wikipedia download script

* add link

* checksum ValueError

* oops
2024-04-17 16:34:57 -04:00
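
The "checksum ValueError" bullet above presumably refers to failing loudly on a bad download; a generic sketch of that pattern (the `verify_checksum` helper is hypothetical, not the script's actual code):

```python
import hashlib

def verify_checksum(path: str, expected_md5: str) -> None:
  # hash the downloaded file and refuse to proceed on a mismatch
  with open(path, "rb") as f:
    digest = hashlib.md5(f.read()).hexdigest()
  if digest != expected_md5:
    raise ValueError(f"checksum mismatch for {path}: {digest} != {expected_md5}")
```
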
qazal
f75020a903 minimal diff for multioutput reduce pairs (#4030)
* simple fusion

* compiler cache patch

* Revert "compiler cache patch"

This reverts commit fa18049597.

* Revert "Revert "compiler cache patch""

This reverts commit 57f8d41f98.

* delete that

* early sort

* teeny renames

* spec

* .empty is great

* delete sort

* Update test_schedule.py

* this is one kernel now

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-04-17 10:55:44 -04:00
George Hotz
8564e28a1b new memory scheduler with explicit refcounts (#4198)
* new memory scheduler with explicit refcounts

* move central memory planner

* typo + use central memory planner in openpilot

* cleanups

* include lb_refcount in pickle

* replace PlaceHolder with memory planner

* cleaner
2024-04-17 08:46:47 +04:00
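
A generic sketch of refcount-based memory planning as described in the commit above (illustrative only, not tinygrad's actual scheduler): each buffer's refcount is its number of remaining uses, and its slot is recycled the moment the count hits zero.

```python
from collections import defaultdict

def plan_memory(schedule):
  """schedule: list of (out_name, [in_names]), all inputs produced earlier.
  Returns a {buffer_name: slot} assignment that reuses freed slots."""
  refcount = defaultdict(int)
  for _, ins in schedule:
    for i in ins: refcount[i] += 1
  slot_of, free_slots, next_slot = {}, [], 0
  for out, ins in schedule:
    # allocate the output, preferring a slot freed by a dead buffer
    if free_slots: slot_of[out] = free_slots.pop()
    else: slot_of[out], next_slot = next_slot, next_slot + 1
    for i in ins:
      refcount[i] -= 1
      if refcount[i] == 0: free_slots.append(slot_of[i])  # last use: recycle
  return slot_of

# a -> b -> c: "a" is dead once "b" is built, so "c" reuses slot 0
print(plan_memory([("a", []), ("b", ["a"]), ("c", ["b"])]))  # {'a': 0, 'b': 1, 'c': 0}
```
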
Francis Lam
c91b7b1739 test: add fuzz_matmul and better debugging for simple_matmul (#4199)
also show unoptimized shape in verify_kernel
2024-04-16 23:40:31 -04:00
qazal
ba8602612b Fuzz all permutations of schedule (#4136)
* simple toposort

* fuzzer

* init in_degree

* move to tests

* same seed

* configure paths

* internal graph

* compare LazyBuffers

* simpler

* simple graph

* assign works

* simpler

* fix JIT

* upstream ci

* move ci

* fix the path

* DEBUG=1

* limit max paths

* launch a cmp kernel

* Revert "launch a cmp kernel"

This reverts commit 791c608992.

* exec ground truth

* better perf

* copy ground truth once

* gpu allclose ast try1

* Revert "gpu allclose ast try1"

This reverts commit 1f82103af3.

* prerealized bufs freezing

* teeny cleanups

* reuse Buffers

* Revert "reuse Buffers"

This reverts commit a71de94b03.

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-04-17 05:03:21 +04:00
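
The fuzzer above hinges on enumerating valid schedule orderings. A generic sketch of all-topological-sorts with an in_degree table and a path cap, mirroring the "init in_degree" and "limit max paths" bullets but not the actual test code:

```python
from typing import Dict, List

def all_toposorts(graph: Dict[str, List[str]], limit: int = 100) -> List[List[str]]:
  # graph maps node -> children; in_degree counts incoming edges
  in_degree = {u: 0 for u in graph}
  for u in graph:
    for v in graph[u]: in_degree[v] += 1
  orders: List[List[str]] = []
  path: List[str] = []
  def visit() -> None:
    if len(orders) >= limit: return           # cap the number of permutations
    if len(path) == len(graph):
      orders.append(path.copy()); return
    for u in graph:
      # ready to schedule: no pending dependencies and not already placed
      if in_degree[u] == 0 and u not in path:
        for v in graph[u]: in_degree[v] -= 1  # tentatively schedule u
        path.append(u)
        visit()
        path.pop()
        for v in graph[u]: in_degree[v] += 1  # backtrack
  visit()
  return orders

# diamond DAG: two valid orders, each of which the fuzzer would execute
print(all_toposorts({"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}))
```
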
nimlgen
4ed6b42a8a fix kernargs check in kfd (#4194) 2024-04-17 00:44:50 +03:00
David Hou
97d846dd67 in forced_realize, unchase last op if it is upcast (#4185)
* in forced_realize, unchase last op if it is upcast

* start on test

* flesh out test

* more test

* comment

* comment out parallel reduce test

* reorder

* unused
2024-04-16 17:15:17 -04:00
Francis Lam
e9c1616b27 logging: change LOGKERN to LOGKERNS to match LOGOPS (#4193)
also add printing of ast and applied_opts during verify_kernel
to more easily debug errors if they come up
2024-04-16 16:08:32 -04:00
David Hou
7fb220a567 touchup resnet_layer_bench (#4191) 2024-04-16 14:43:00 -04:00
David Hou
1dbf3b2b19 Benchmarks for individual resnet layers (#4182)
* resnet individual layer benchmarks!

* small

* 1 and 2

* mem_used

* no ci

* better conv print

* defaults

* prints

* adjust

* adjust

* adjust

* benchmark only one layer example

* tensor.training, zero_grad, sum instead of mean, last mem, last kernel count

* default jitcnt=1

* scale flops/kernels with jitcnt

* add note about jitcnt memory

* touchup
2024-04-16 13:53:18 -04:00
George Hotz
d49d4324a3 update docs (#4189) 2024-04-16 16:07:02 +04:00
George Hotz
55ae73e951 Replicate llm.c in tinygrad (#4179)
* write llm.c and add a few new methods to tensor

* training works

* add jit

* tests for new functions

* test tolist

* simple fix for onnx test failures (#4186)

* write llm.c and add a few new methods to tensor

* training works

* add jit

* tests for new functions

* bump line count to 7500

* simplest fix

* safenumpy tolist for now

---------

Co-authored-by: George Hotz <geohot@gmail.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>

---------

Co-authored-by: geohotstan <135171913+geohotstan@users.noreply.github.com>
2024-04-16 15:40:48 +04:00
George Hotz
b6e7243bfa hotfix: skip slow pre-commit test 2024-04-16 11:48:43 +04:00
George Hotz
cda0010020 hotfix: docs-legacy 2024-04-16 11:06:56 +04:00
George Hotz
8f749ae0eb New docs are in mkdocs (#4178)
* start mkdocs

* simple docs for tensor

* more docs

* move those back

* more docs

* copy markdown extensions

* docs legacy

* docs building workflow

* fix showcase links

* only that?

* install tinygrad

* add docs to setup.py

* Delete examples/llm.c/data
2024-04-16 10:59:51 +04:00
chenyu
aa093efa43 fix handcode_resnet50_opt flops count (#4184) 2024-04-15 22:13:45 -04:00
chenyu
d5b67c1ca3 log resnet TRAIN_BEAM / EVAL_BEAM (#4181)
also run eval in benchmark mode if either one is positive
2024-04-15 19:29:08 -04:00
Francis Lam
9d2273235c search: BEAM_UOPS_MAX to prune candidates with too many uops (#4088)
* search: add better default settings for fast search

not the highest possible performance, but adequate for most usage

* search: revert BEAM_MIN_PROGRESS and BEAM_UPCAST_MAX default changes

also sneak in a line to .gitignore for the unet3d dataset

* revert BEAM_MAX_TASKS_PER_CHILD change and fix uops max condition
2024-04-15 18:56:22 -04:00
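
A sketch of the pruning idea above (hypothetical names; the real filter lives in tinygrad's search code): drop beam candidates whose lowered kernel exceeds a uop-count budget before paying to compile and time them.

```python
import os

BEAM_UOPS_MAX = int(os.getenv("BEAM_UOPS_MAX", "0"))  # 0 disables the limit

def prune_candidates(candidates):
  # candidates: list of (kernel, uop_count) pairs
  if BEAM_UOPS_MAX <= 0: return candidates
  return [(k, n) for k, n in candidates if n <= BEAM_UOPS_MAX]
```
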
qazal
286ea697f3 keep order in realizes (#4180) 2024-04-16 01:25:50 +04:00
George Hotz
e14a9bca0c hotfix: bump line count to 7500 for NV backend 2024-04-15 23:18:46 +04:00
chenyu
6a2168e698 TRAIN_BEAM and EVAL_BEAM for resnet (#4177)
working on measuring compile time
2024-04-15 14:57:21 -04:00
Timmy
4592fc8fe7 Multireduce Kernels - prereq refactor (#4173)
* refactor rendering a reduceop into its own function (will help for kernels with multiple reduceops)

* linters

* addressing concerns
2024-04-14 20:16:54 -04:00
David Hou
593c90d7d6 Resnet fp16 training with fp32 master weight copy (#4144)
* add casts to layers

* FLOAT flag

* detach

* no_grad for eval

* whitespace

* explicit fp32 initialization

* oops

* whitespace

* put back config['DEFAULT_FLOAT']

* bad

* live dangerously (don't hide bugs)

* don't bundle changes

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-04-14 11:25:08 -04:00
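
The reason for the fp32 master copy above, in one plain-numpy sketch (not the tinygrad code): fp16 has so little precision near 1.0 that small optimizer steps round away entirely, so updates are applied to a float32 copy that is recast for compute each step.

```python
import numpy as np

lr = np.float32(5e-5)

# fp16-only update stalls: the step is smaller than fp16's spacing at 1.0
w16 = np.float16(1.0)
w16 = np.float16(w16 - np.float16(lr))
print(w16 == np.float16(1.0))        # True: the update was lost to rounding

# master-weight pattern: update in fp32, cast to fp16 for the kernels
master = np.float32(1.0)
master = master - lr                 # survives in fp32
w_for_kernels = master.astype(np.float16)
print(master < 1.0)                  # True: progress is preserved
```
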
chenyu
e20d6f9221 correct resnet estimate time (#4169)
7.99 hours was rendered as 7h0m.
2024-04-14 02:21:46 -04:00
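
The fix above is about minute rounding; a minimal sketch of rendering fractional hours correctly (not necessarily the patch's exact code):

```python
def fmt_hours(hours: float) -> str:
  # convert once to whole minutes, then split; avoids the truncation
  # that turned 7.99 hours into "7h0m"
  total_minutes = round(hours * 60)
  return f"{total_minutes // 60}h{total_minutes % 60}m"

assert fmt_hours(7.99) == "7h59m"
```
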
George Hotz
ea18d28253 some overview docs 2024-04-13 17:01:09 -07:00
George Hotz
50e780a588 multitensor shouldn't recompile (#4164)
* multitensor shouldn't recompile

* type annotations

* fix tests

* outcount in reduce
2024-04-13 00:03:48 -07:00
George Hotz
599eb266b1 optionally use a copy kernel instead of SDMA (#4116)
* optionally use a copy kernel

* lazyops in copied kernels

* add sync

* no sdma at all

* work

* copy_ast
2024-04-12 23:10:41 -07:00
George Hotz
ba7314c26b cleanup lbs (#4163) 2024-04-12 22:32:16 -07:00
chenyu
a7c6864260 remove CAST_BEFORE_VIEW (#4152)
* remove CAST_BEFORE_VIEW

testing perf; also, this might have an issue with assign?

* remove all
2024-04-13 01:05:08 -04:00
George Hotz
ebc94c9d6c rewrite the jit in the context of new schedule (#4162)
* rewrite the jit in the context of new schedule

* mypy better

* fix placeholder

* tests

* all functionality should work

* fix tests

* no CacheCollector
2024-04-12 21:54:36 -07:00
George Hotz
b67f759780 abstractions3 is currently wishful thinking (#4124)
* abstractions3 is currently wishful thinking

* a3

* work

* minor

* progress on a3

* more

* update abstractions3

* cleaner
2024-04-12 16:46:01 -07:00
MaximilianEmel
27a98aaecc Rewritten SVG Logos (#4150)
* rewrote the svg logos to use polygons and render better

* changed self-closing tags' style to better conform to the original
2024-04-12 14:09:57 -07:00
chenyu
63eb0a68af fix return dtype of gather (#4159) 2024-04-12 16:25:12 -04:00
chenyu
d9c5a2b1bb fix return dtype of getitem Tensor indexing (#4158)
the use of sum can auto-upcast the result; fixed by using the data dtype as the acc_dtype
2024-04-12 15:55:02 -04:00
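
Both dtype fixes above come down to sum's accumulator. A sketch assuming tinygrad's public API (`dtypes` and `Tensor.sum(acc_dtype=...)` are in the tree at this point):

```python
from tinygrad import Tensor, dtypes

x = Tensor([1, 2, 3], dtype=dtypes.int8)
print(x.sum().dtype)                        # upcast: sums accumulate in a wider int
print(x.sum(acc_dtype=dtypes.int8).dtype)   # pinned to the data dtype, as in the fix
```
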
chenyu
f6c8032e5d assert if expr_idxs return might be outside of int32 (#4157) 2024-04-12 14:18:35 -04:00
nimlgen
24a27a01a9 hotfix: CUDA_P2P works (#4155) 2024-04-12 18:20:12 +03:00
nimlgen
5a57b48134 cuda p2p enable when available (#4153) 2024-04-12 16:21:54 +03:00
chenyu
380f27d629 move sum acc_dtype into lazy so it applies to backward (#4149)
* move sum acc_dtype into lazy so it applies to backward

* unit test
2024-04-11 14:43:56 -04:00
George Hotz
bbda20c0db CompiledASTRunner -> CompiledRunner (#4148) 2024-04-11 08:49:52 -07:00
George Hotz
0f16709c00 hotfix: remove test speed vs torch 2024-04-11 08:37:57 -07:00
qazal
c0796374e4 refactor membufs (#4147) 2024-04-11 08:30:44 -07:00
George Hotz
b7e281cf10 JitItem -> ExecItem (#4146)
* JitItem -> ExecItem

* execitem in realize

* cleaner

* JITRunner -> Runner
2024-04-11 08:24:57 -07:00
George Hotz
e79a11b99c hotfix: revert llama change 2024-04-10 20:13:15 -07:00
George Hotz
2e6c39b0b2 Do less realizes (#4141)
* less realize

* corealize jit inputs

* prints

* print before we run
2024-04-10 19:50:50 -07:00
chenyu
06bcae13b4 PADTO SUM if parents of sum are all zero-preserving (#4140)
* PADTO SUM if parents of sum are all zero-preserving

* test case unsafe ops after sum is fine

* reuse UNSAFE_PAD_OPS

* update db version
2024-04-10 22:16:12 -04:00
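
Why "zero-preserving" is the right condition for the PADTO change above, in plain numpy (illustrative, not tinygrad code): padding a reduce axis with zeros leaves a sum unchanged only if every op applied before the sum maps 0 to 0.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
pad = np.concatenate([x, np.zeros(1)])  # PADTO: extend the reduce axis with zeros

assert (x * 2).sum() == (pad * 2).sum()                    # 0*2 == 0: safe
assert np.maximum(x, 0).sum() == np.maximum(pad, 0).sum()  # relu(0) == 0: safe
assert np.exp(x).sum() != np.exp(pad).sum()                # exp(0) == 1: unsafe
```
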