Commit Graph

4136 Commits

Author SHA1 Message Date
George Hotz
cda0010020 hotfix: docs-legacy 2024-04-16 11:06:56 +04:00
George Hotz
8f749ae0eb New docs are in mkdocs (#4178)
* start mkdocs

* simple docs for tensor

* more docs

* move those back

* more docs

* copy markdown extensions

* docs legacy

* docs building workflow

* fix showcase links

* only that?

* install tinygrad

* add docs to setup.py

* Delete examples/llm.c/data
2024-04-16 10:59:51 +04:00
chenyu
aa093efa43 fix handcode_resnet50_opt flops count (#4184) 2024-04-15 22:13:45 -04:00
chenyu
d5b67c1ca3 log resnet TRAIN_BEAM / EVAL_BEAM (#4181)
also run eval in benchmark mode if either one is positive
2024-04-15 19:29:08 -04:00
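The flag behavior described above (run eval in benchmark mode if either beam flag is positive) can be sketched with a hypothetical env-var helper; `getenv_int` here is an illustrative stand-in, not tinygrad's actual `getenv`:

```python
import os

def getenv_int(name: str, default: int = 0) -> int:
  # hypothetical stand-in for an integer environment-flag helper
  return int(os.environ.get(name, default))

TRAIN_BEAM = getenv_int("TRAIN_BEAM")
EVAL_BEAM = getenv_int("EVAL_BEAM")
# per the commit message: eval runs in benchmark mode if either flag is positive
BENCHMARK_EVAL = TRAIN_BEAM > 0 or EVAL_BEAM > 0
```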
Francis Lam
9d2273235c search: BEAM_UOPS_MAX to prune candidates with too many uops (#4088)
* search: add better default settings for fast search

not the highest possible performance, but adequate for most usage

* search: revert BEAM_MIN_PROGRESS and BEAM_UPCAST_MAX default changes

also sneak in a link to .gitignore for the unet3d dataset

* revert BEAM_MAX_TASKS_PER_CHILD change and fix uops max condition
2024-04-15 18:56:22 -04:00
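The pruning idea in the commit above — drop beam-search candidates whose uop count exceeds a cap before timing them — can be sketched as follows; `Candidate`, `uop_count`, and `prune` are illustrative names, not tinygrad's search API:

```python
from dataclasses import dataclass

@dataclass
class Candidate:
  # hypothetical kernel candidate produced during beam search
  name: str
  uop_count: int

BEAM_UOPS_MAX = 3000  # cap on uops per candidate (value is illustrative)

def prune(cands, limit=BEAM_UOPS_MAX):
  # keep only candidates small enough to be worth compiling and timing
  return [c for c in cands if c.uop_count <= limit]

survivors = prune([Candidate("a", 120), Candidate("b", 5000), Candidate("c", 2999)])
```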
qazal
286ea697f3 keep order in realizes (#4180) 2024-04-16 01:25:50 +04:00
George Hotz
e14a9bca0c hotfix: bump line count to 7500 for NV backend 2024-04-15 23:18:46 +04:00
chenyu
6a2168e698 TRAIN_BEAM and EVAL_BEAM for resnet (#4177)
working on measuring compile time
2024-04-15 14:57:21 -04:00
Timmy
4592fc8fe7 Multireduce Kernels - prereq refactor (#4173)
* refactor rendering a reduceop into its own function (will help for kernels with multiple reduceops)

* linters

* addressing concerns
2024-04-14 20:16:54 -04:00
David Hou
593c90d7d6 Resnet fp16 training with fp32 master weight copy (#4144)
* add casts to layers

* FLOAT flag

* detach

* no_grad for eval

* whitespace

* explicit fp32 initialization

* oops

* whitespace

* put back config['DEFAULT_FLOAT']

* bad

* live dangerously (don't hide bugs)

* don't bundle changes

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-04-14 11:25:08 -04:00
chenyu
e20d6f9221 correct resnet estimate time (#4169)
7.99 hours was rendered as 7h0m.
2024-04-14 02:21:46 -04:00
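The rendering bug above (7.99 hours shown as 7h0m) is the classic mistake of truncating hours and minutes independently; a safe formatter converts to whole minutes first and splits with `divmod`. This sketch is illustrative, not the repo's actual code:

```python
def fmt_eta(hours: float) -> str:
  # convert to whole minutes first, then split, so 7.99h -> 7h59m, not 7h0m
  total_min = round(hours * 60)
  h, m = divmod(total_min, 60)
  return f"{h}h{m}m"

print(fmt_eta(7.99))  # 7h59m
```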
George Hotz
ea18d28253 some overview docs 2024-04-13 17:01:09 -07:00
George Hotz
50e780a588 multitensor shouldn't recompile (#4164)
* multitensor shouldn't recompile

* type annotations

* fix tests

* outcount in reduce
2024-04-13 00:03:48 -07:00
George Hotz
599eb266b1 optionally use a copy kernel instead of SDMA (#4116)
* optionally use a copy kernel

* lazyops in copied kernels

* add sync

* no sdma at all

* work

* copy_ast
2024-04-12 23:10:41 -07:00
George Hotz
ba7314c26b cleanup lbs (#4163) 2024-04-12 22:32:16 -07:00
chenyu
a7c6864260 remove CAST_BEFORE_VIEW (#4152)
* remove CAST_BEFORE_VIEW

testing perf; also, this might have an issue with assign?

* remove all
2024-04-13 01:05:08 -04:00
George Hotz
ebc94c9d6c rewrite the jit in the context of new schedule (#4162)
* rewrite the jit in the context of new schedule

* mypy better

* fix placeholder

* tests

* all functionality should work

* fix tests

* no CacheCollector
2024-04-12 21:54:36 -07:00
George Hotz
b67f759780 abstractions3 is currently wishful thinking (#4124)
* abstractions3 is currently wishful thinking

* a3

* work

* minor

* progress on a3

* more

* update abstractions3

* cleaner
2024-04-12 16:46:01 -07:00
MaximilianEmel
27a98aaecc Rewritten SVG Logos (#4150)
* rewrote the svg logos to use polygons and render better

* changed self-closing tags' style to better conform to the original
2024-04-12 14:09:57 -07:00
chenyu
63eb0a68af fix return dtype of gather (#4159) 2024-04-12 16:25:12 -04:00
chenyu
d9c5a2b1bb fix return dtype of getitem Tensor indexing (#4158)
the use of sum can auto-upcast the result; fixed by using the data dtype as the acc_dtype
2024-04-12 15:55:02 -04:00
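The upcast problem described above can be illustrated with NumPy as a stand-in (this is not tinygrad's indexing code): a sum-based gather silently promotes small integer dtypes unless the accumulator dtype is pinned to the data's dtype.

```python
import numpy as np

data = np.array([10, 20, 30], dtype=np.uint8)
onehot = np.array([[1, 0, 0], [0, 0, 1]], dtype=np.uint8)

# default sum upcasts small integer dtypes to the platform integer
upcast = (onehot * data).sum(axis=1)
# pinning the accumulator dtype to the data dtype keeps the result uint8
kept = (onehot * data).sum(axis=1, dtype=np.uint8)
```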
chenyu
f6c8032e5d assert if expr_idxs return might be outside of int32 (#4157) 2024-04-12 14:18:35 -04:00
nimlgen
24a27a01a9 hotfix: CUDA_P2P works (#4155) 2024-04-12 18:20:12 +03:00
nimlgen
5a57b48134 cuda p2p enable when available (#4153) 2024-04-12 16:21:54 +03:00
chenyu
380f27d629 move sum acc_dtype into lazy so it applies to backward (#4149)
* move sum acc_dtype into lazy so it applies to backward

* unit test
2024-04-11 14:43:56 -04:00
George Hotz
bbda20c0db CompiledASTRunner -> CompiledRunner (#4148) 2024-04-11 08:49:52 -07:00
George Hotz
0f16709c00 hotfix: remove test speed vs torch 2024-04-11 08:37:57 -07:00
qazal
c0796374e4 refactor membufs (#4147) 2024-04-11 08:30:44 -07:00
George Hotz
b7e281cf10 JitItem -> ExecItem (#4146)
* JitItem -> ExecItem

* execitem in realize

* cleaner

* JITRunner -> Runner
2024-04-11 08:24:57 -07:00
George Hotz
e79a11b99c hotfix: revert llama change 2024-04-10 20:13:15 -07:00
George Hotz
2e6c39b0b2 Do less realizes (#4141)
* less realize

* corealize jit inputs

* prints

* print before we run
2024-04-10 19:50:50 -07:00
chenyu
06bcae13b4 PADTO SUM if parents of sum are all zero-preserving (#4140)
* PADTO SUM if parents of sum are all zero-preserving

* test case unsafe ops after sum is fine

* reuse UNSAFE_PAD_OPS

* update db version
2024-04-10 22:16:12 -04:00
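The zero-preserving condition above can be shown with a NumPy stand-in (not tinygrad's scheduler code): padding with zeros before a sum is safe when every op between the input and the sum maps 0 to 0, and unsafe otherwise.

```python
import numpy as np

x = np.array([1.0, -2.0, 3.0])
padded = np.pad(x, (0, 5))  # pad with zeros, e.g. up to a friendlier size

# relu is zero-preserving (relu(0) == 0), so padding before the sum is safe
assert np.maximum(padded, 0).sum() == np.maximum(x, 0).sum()

# exp is not (exp(0) == 1), so the same padding changes the result
assert np.exp(padded).sum() != np.exp(x).sum()
```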
George Hotz
081dd1573f hotfix: keep CUDA D2D copy behind the CUDA_P2P flag 2024-04-10 21:36:48 +00:00
George Hotz
af5984df43 cudagraph memcpy through host (#4137) 2024-04-10 13:17:17 -07:00
terafo
5e6d2155e4 Add driving monitoring model to benchmarks (#4134)
* add driving monitoring model to benchmarks

* handle crash
2024-04-10 14:27:03 -04:00
chenyu
bf3583f9b2 use Buffer.ensure_allocated in search _ensure_buffer_alloc (#4132) 2024-04-10 13:11:50 -04:00
George Hotz
a35375df85 run_schedule is so simple now (#4130) 2024-04-10 09:49:30 -07:00
George Hotz
86bd2eb500 hotfix: update copy_from_fd for new DiskBuffer 2024-04-10 15:41:06 +00:00
George Hotz
ee457a4b20 no more underlying diskbuffer, that's just the device (#4129) 2024-04-10 08:32:25 -07:00
geohotstan
fe88591890 update onnx to 1.16.0 (#4127)
* update

* pass tests and skip tests
2024-04-10 11:19:13 -04:00
chenyu
6bbbeb93ac skip a few clang test that took > 30 seconds in CI (#4126)
* skip slow CLANG test test_train_cifar

* skip those too

* and that

* only CI

* one more
2024-04-10 02:00:34 -04:00
George Hotz
08ddeb5685 create schedule has global vars (#4125)
* abstractions3 is currently wishful thinking

* create_schedule_with_vars
2024-04-09 21:42:16 -07:00
George Hotz
216eb235e5 hotfix: cast mnist to float 2024-04-09 19:30:03 -07:00
George Hotz
fea774f669 spend 5 lines to bring mnist into the repo (#4122) 2024-04-09 19:24:57 -07:00
qazal
42edae8935 pickle schedules (#4114)
* pickle schedules

* Update test_pickle.py

* Update test_pickle.py

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-04-09 13:47:25 -07:00
Felix Kuehling
38ae4194a6 Fixes for ops_kfd (#4105)
* kfd_ops: Fix GPU node discovery on NUMA systems

Ignore potentially multiple CPU NUMA nodes and any GPU nodes that are
not accessible because of device cgroups.

Signed-off-by: Felix Kuehling <felix.kuehling@amd.com>

* kfd_ops: Format the GFX arch target name correctly

The target version in sysfs properties is a decimal representation with
two digits per component.

The format for LLVM GFX target names is a bit quirky for historical
reasons. It uses one digit for the minor version and stepping. When it
ran out of decimal digits for the stepping on gfx90X it started using
hexadecimal there. But the major version is still decimal and went
double digit in GFX10.

Make sure to parse and format it accordingly for all supported GPUs.

Signed-off-by: Felix Kuehling <felix.kuehling@amd.com>

---------

Signed-off-by: Felix Kuehling <felix.kuehling@amd.com>
2024-04-09 13:21:21 -07:00
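The parsing described in the commit message above — decimal sysfs encoding with two digits per component, formatted into an LLVM GFX name with hex minor/stepping digits — can be sketched like this (an illustrative reconstruction from the message, not the patch itself):

```python
def gfx_target_name(gfx_version: int) -> str:
  # sysfs encodes the target version as decimal, two digits per component
  major = gfx_version // 10000
  minor = (gfx_version // 100) % 100
  step = gfx_version % 100
  # LLVM names keep the major version decimal (double digit from GFX10 on)
  # but use a single hex digit each for the minor version and stepping
  return f"gfx{major}{minor:x}{step:x}"

print(gfx_target_name(90012))   # gfx90c
print(gfx_target_name(100300))  # gfx1030
```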
George Hotz
10dbf90b2c hotfix: test speed 2024-04-09 13:20:39 -07:00
George Hotz
ae849d12d7 numpy device + pickle it (#4120) 2024-04-09 13:19:30 -07:00
chenyu
1ef9c50fd7 Update ssa input order and annotate types in cstyle and assembly (#4117)
variable prefix is never optional (removed the default "t") and UOp can be optional (added the default None).
2024-04-09 13:10:29 -04:00
geohotstan
15f2f39658 conceptually simpler fancy index (#3335)
* init

* add failed case

* fix: temp comment out MULACC cast

* is this right?

* add test case

* oops, forgot to get rid of temp test

* WOOOOOO TOOK OUT 2 TRANSPOSES IN GATHER YAY

* cleaner

* comment cleanup

* update docs

* resolve conflict

* oops

* SUPA FAST

* comment out a test

* del some print statements

* use new broadcast stuff

* more clean up

* move try except

* skip fancy indexing for python backend test_ops
2024-04-09 11:18:04 -04:00