Commit Graph

6200 Commits

Author SHA1 Message Date
nimlgen
8bbf6fb88c use mv_address in ops_gpu (#6856) 2024-10-02 22:31:51 +03:00
chenyu
c3c93f332a symbolic bool raise ValueError when not sure [pr] (#6853) 2024-10-02 09:10:58 -04:00
chenyu
08850da026 minor rand_like change [run_process_replay] (#6848) 2024-10-02 07:27:51 -04:00
George Hotz
7214450c23 little symbolic changes [pr] (#6849)
* little symbolic changes [pr]

* symbolic needs resolve too

* no resolve

* less change
2024-10-02 17:12:30 +08:00
qazal
fc78716d31 Buffer arg from big graph [pr] (#6847)
* Buffer arg from big graph [pr]

* x.dtype
2024-10-02 15:28:47 +08:00
qazal
29363fb85e add dtype.ptr() [pr] (#6839) 2024-10-02 15:03:05 +08:00
George Hotz
be12409b51 changes for symbolic (#6844)
* changes for symbolic

* only for ints

* check int first
2024-10-02 12:57:16 +08:00
qazal
1735f8ef1c viz rewrite part 1 [pr] (#6842)
* core viz spec

* leaaan

* refine docs

* .

* add rewrite_count back in ui
2024-10-02 11:56:25 +08:00
mesozoic-egg
d2e02b47e1 Construct c_ulong in blitCommandEncoder copy method (#6793)
* Construct c_ulong in blitCommandEncoder copy method

* line too long

---------

Co-authored-by: Mesozoic Egg <mesozoic.egg@proton.me>
2024-10-02 11:09:37 +08:00
George Hotz
567e10efcb lil symbolic changes [pr] (#6841) 2024-10-02 10:56:22 +08:00
George Hotz
100ce7a684 hotfix: min/max on CMPNE was wrong 2024-10-02 10:15:03 +08:00
chenyu
5f77217772 bert default CKPT to 0 (#6840)
not required
2024-10-01 21:55:56 -04:00
George Hotz
1ac83aaa4b lil sym changes (#6837)
* lil sym changes [pr]

* fix inf crap

* Update ops.py

* remove that, it's wrong
2024-10-02 09:54:17 +08:00
Tobias Fischer
33f7599158 Compute FID Score (#6802)
* compute fid score code

* cleaner s1 and m1 loading
2024-10-01 19:47:58 -04:00
ignaciosica
6a73ad89a2 get global, local and shared max from cudarenderer (#6836) 2024-10-01 16:32:57 +03:00
George Hotz
84726e8855 good changes from symbolic removal [run_process_replay] (#6835)
* good changes from symbolic removal [run_process_replay]

* fix __ne__
2024-10-01 18:49:09 +08:00
qazal
c5b252cdb3 add pr alias [pr] (#6834) 2024-10-01 18:48:44 +08:00
George Hotz
e907b25792 move some pm rules to uopgraph.py [run_process_replay] (#6831)
* move some pm rules to uopgraph.py [run_process_replay]

* move more

* move lt and clean

* end maybe

* put back
2024-10-01 18:28:41 +08:00
qazal
0cb82f308c viz don't include graph_rewrites that return a non-UOp (#6832)
* viz don't include graph_rewrites that return a non-UOp

* delete bad things
2024-10-01 18:13:53 +08:00
George Hotz
2a540d87e7 don't use is_int [run_process_replay] (#6833) 2024-10-01 18:13:34 +08:00
vladov
501cfde7e6 Fix GPT2 with OpenCL backend. (#6821)
* Fix GPT2 with OpenCL backend.

* Add test for unaligned copies into OpenCL buffers.
2024-10-01 16:57:22 +08:00
qazal
a16a8c5958 color process replay stats [run_process_replay] (#6830) 2024-10-01 15:29:11 +08:00
George Hotz
547733e57c stunning_mnist [run_process_replay] (#6828)
* stunning_mnist [run_process_replay]

* add loss to stunning mnist
2024-10-01 15:00:48 +08:00
qazal
391497a311 schedule independent of Device [run_process_replay] (#6829) 2024-10-01 14:46:26 +08:00
George Hotz
8a93c48901 pickle main pattern matcher [run_process_replay] (#6827)
* pickle main pattern matcher [run_process_replay]

* del line
2024-10-01 13:58:42 +08:00
George Hotz
d726eb6f48 uop resolve [run_process_replay] (#6826)
* uop bool and int and stuff [run_process_replay]

* add ne support

* can't even be None anymore

* BinaryOps.AND support

* less compare
2024-10-01 13:11:42 +08:00
qazal
a42b177533 express CONST view as SWIZZLE, uop VALID only once [run_process_replay] (#6823)
* construct VALID once and SWIZZLE

* make Variable work

* image dtype

* test: merge views happens already
2024-10-01 11:44:26 +08:00
George Hotz
50dd6bd951 move cmp tuple out [run_process_replay] (#6825)
* move cmp tuple out [run_process_replay]

* was unneeded
2024-10-01 10:38:28 +08:00
qazal
a1dee0e532 early uop UOps.BUFFER (only once) [run_process_replay] (#6820)
* buf_uops lookup [run_process_replay]

* next diff will be this

* fix ImageDType
2024-10-01 08:46:05 +08:00
nimlgen
e213bea426 nv shorter (#6819) 2024-09-30 19:39:32 +03:00
George Hotz
0f28e93224 add pickle support for pattern matchers [run_process_replay] (#6816)
* add pickle support for pattern matchers [run_process_replay]

* cleaner and all

* no closures

* fix tests

* revert that

* final

* cleaner

* python 3.8 fix

* add round trip back

* this

* waste lines on this. that's the final line count

* max print better

* more targeted fix

* regrettably add 3.8 support
2024-09-30 21:54:46 +08:00
chenyu
f59517754e add RESET_STEP in bert to control reset (#6818)
same as resnet
2024-09-30 09:39:04 -04:00
qazal
0c24fec9f4 test current behavior of const schedule [run_process_replay] (#6817) 2024-09-30 21:02:01 +08:00
qazal
4a4aa69b84 add a better dedup test for DEFINE_VAR with CONST arg (#6813) 2024-09-30 15:43:55 +08:00
qazal
e7fcbe1a4d refactor test_linearizer correctness asserts (#6812) 2024-09-30 15:31:02 +08:00
George Hotz
9dd9f71011 no global kernel stuff [run_process_replay] (#6808)
* use traceback instead of global metadata crap [run_process_replay]

* save the kernel

* correct, imports clean, no device

* UNPARENTED

* speed

* proudly unparented

* Update ops.py

* update tests for unparented

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2024-09-30 13:52:33 +08:00
George Hotz
00b3171902 mod can be and (#6810) 2024-09-30 12:33:15 +08:00
qazal
c9d763d331 refactor to axis_arg [run_process_replay] (#6806)
* refactor to axis_arg [run_process_replay]

* remove more arg[1]s
2024-09-30 09:37:31 +08:00
qazal
7099af4450 VIZ show rendering errors (#6807)
* VIZ show rendering errors

* show the entire traceback
2024-09-30 09:35:36 +08:00
George Hotz
2ed94e447f gpt2: corealize opt and loss 2024-09-30 09:11:20 +08:00
qazal
2ec73d6f05 push swizzle through dim change (#6801)
* push swizzle through dim change

* can this be generic

* generic version

* cleanups
2024-09-30 09:04:59 +08:00
George Hotz
a76c6c740c hand pad gpt2 (#6805) 2024-09-30 09:03:07 +08:00
geohotstan
282abb4234 add get_available_backends (#6771)
* lol

* 1 less line lmfao

* something like this?

* comment

* pylint

* just iterator

* backends -> devices

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-09-30 08:58:04 +08:00
qazal
3c15e64273 VIZ prep for the new kernel render (#6800)
* refactor to list

* remove prints in test_viz

* more cleanup
2024-09-29 20:06:31 +08:00
qazal
01c9653614 add UOps.BUFFER, delete Buffer in UOps.DEFINE_GLOBAL (#6798)
* delete DEFINE_GLOBAL buffer arg

* add UOps.BUFFER
2024-09-29 18:56:07 +08:00
qazal
5e1221845f refactor schedule edges to tuple[LazyBuffer, ...] [run_process_replay] (#6797) 2024-09-29 11:34:39 +08:00
chenyu
68e59eb3f5 update mlperf-logging to 4.1.0-rc3 (#6796) 2024-09-28 21:45:37 -04:00
qazal
dab05ff070 match dataclass.replace in UOp.replace [run_process_replay] (#6792)
* UOp replace matching dataclass replace

* p2

* replace creates a copy
2024-09-28 16:28:49 +08:00
chenyu
494b20e886 bert BS back to 54 (#6791)
60 does not run end to end
2024-09-27 22:16:05 -04:00
chenyu
572d77d1d9 bert script delete eval data after eval (#6790)
fits BS=60 which is 2% faster than 54. also fixed wandb logging params
2024-09-27 20:54:00 -04:00