Commit Graph

6172 Commits

Author SHA1 Message Date
qazal
a1dee0e532 early uop UOps.BUFFER (only once) [run_process_replay] (#6820)
* buf_uops lookup [run_process_replay]

* next diff will be this

* fix ImageDType
2024-10-01 08:46:05 +08:00
nimlgen
e213bea426 nv shorter (#6819) 2024-09-30 19:39:32 +03:00
George Hotz
0f28e93224 add pickle support for pattern matchers [run_process_replay] (#6816)
* add pickle support for pattern matchers [run_process_replay]

* cleaner and all

* no closures

* fix tests

* revert that

* final

* cleaner

* python 3.8 fix

* add round trip back

* this

* waste lines on this. that's the final line count

* max print better

* more targeted fix

* regrettably add 3.8 support
2024-09-30 21:54:46 +08:00
chenyu
f59517754e add RESET_STEP in bert to control reset (#6818)
same as resnet
2024-09-30 09:39:04 -04:00
qazal
0c24fec9f4 test current behavior of const schedule [run_process_replay] (#6817) 2024-09-30 21:02:01 +08:00
qazal
4a4aa69b84 add a better dedup test for DEFINE_VAR with CONST arg (#6813) 2024-09-30 15:43:55 +08:00
qazal
e7fcbe1a4d refactor test_linearizer correctness asserts (#6812) 2024-09-30 15:31:02 +08:00
George Hotz
9dd9f71011 no global kernel stuff [run_process_replay] (#6808)
* use traceback instead of global metadata crap [run_process_replay]

* save the kernel

* correct, imports clean, no device

* UNPARENTED

* speed

* proudly unparented

* Update ops.py

* update tests for unparented

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2024-09-30 13:52:33 +08:00
George Hotz
00b3171902 mod can be and (#6810) 2024-09-30 12:33:15 +08:00
qazal
c9d763d331 refactor to axis_arg [run_process_replay] (#6806)
* refactor to axis_arg [run_process_replay]

* remove more arg[1]s
2024-09-30 09:37:31 +08:00
qazal
7099af4450 VIZ show rendering errors (#6807)
* VIZ show rendering errors

* show the entire traceback
2024-09-30 09:35:36 +08:00
George Hotz
2ed94e447f gpt2: corealize opt and loss 2024-09-30 09:11:20 +08:00
qazal
2ec73d6f05 push swizzle through dim change (#6801)
* push swizzle through dim change

* can this be generic

* generic version

* cleanups
2024-09-30 09:04:59 +08:00
George Hotz
a76c6c740c hand pad gpt2 (#6805) 2024-09-30 09:03:07 +08:00
geohotstan
282abb4234 add get_available_backends (#6771)
* lol

* 1 less line lmfao

* something like this?

* comment

* pylint

* just iterator

* backends -> devices

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-09-30 08:58:04 +08:00
qazal
3c15e64273 VIZ prep for the new kernel render (#6800)
* refactor to list

* remove prints in test_viz

* more cleanup
2024-09-29 20:06:31 +08:00
qazal
01c9653614 add UOps.BUFFER, delete Buffer in UOps.DEFINE_GLOBAL (#6798)
* delete DEFINE_GLOBAL buffer arg

* add UOps.BUFFER
2024-09-29 18:56:07 +08:00
qazal
5e1221845f refactor schedule edges to tuple[LazyBuffer, ...] [run_process_replay] (#6797) 2024-09-29 11:34:39 +08:00
chenyu
68e59eb3f5 update mlperf-logging to 4.1.0-rc3 (#6796) 2024-09-28 21:45:37 -04:00
qazal
dab05ff070 match dataclass.replace in UOp.replace [run_process_replay] (#6792)
* UOp replace matching dataclass replace

* p2

* replace creates a copy
2024-09-28 16:28:49 +08:00
chenyu
494b20e886 bert BS back to 54 (#6791)
60 does not run end to end
2024-09-27 22:16:05 -04:00
chenyu
572d77d1d9 bert script delete eval data after eval (#6790)
fits BS=60 which is 2% faster than 54. also fixed wandb logging params
2024-09-27 20:54:00 -04:00
chenyu
f9c8e144ff chmod +x mlperf bert script for red (#6789)
also disabled raising power cap in setup. wozeparrot mentioned that's unstable and might cause bert training issue on red
2024-09-27 11:27:32 -04:00
Francis Lata
d3a387be63 [MLPerf] Prepare openimages dataset script (#6747)
* prepare openimages for MLPerf

* cleanup

* fix issue when clearing jit_cache on retinanet eval

* revert pandas specific changes
2024-09-27 11:13:56 -04:00
chenyu
bc82f8c5be use where in dropout (#6758)
should save memory since we only store the mask as bool instead of the upcasted mask used in mul
2024-09-27 11:11:43 -04:00
qazal
76b3c1e818 add all realized Buffers to schedule graph edges [run_process_replay] (#6786)
* add realized Buffers to bufs

* simpler checks
2024-09-27 19:25:51 +08:00
qazal
568c97f7a2 add UOp.define_global [run_process_replay] (#6787)
* add UOp.define_global [run_process_replay]

* no src
2024-09-27 19:24:03 +08:00
nimlgen
b95f47784a qcom sleep when sync (#6785)
* qcom sleep when sync

* linter

* short
2024-09-27 19:14:10 +08:00
qazal
fb3fe6f39b better VIZ (#6781)
* ui changes

* make kernels global

* dont save buffers when running VIZ=1

* remove flex in layout

* use os.execv

* del server thread

* server close

* cleanup

* logs cleanup

* rm getenv

* cleanups

* remove global
2024-09-27 18:38:31 +08:00
chenyu
2fc26890c9 default BS=9 in handcode_opt bert (#6783)
using 54 for 6 gpus now, and 2 is not a good default
2024-09-27 04:38:16 -04:00
George Hotz
9a3f6f392d llm.c tok/s 2024-09-27 00:46:18 -07:00
George Hotz
b0e70ab04f llm.c updates 2024-09-27 15:25:59 +08:00
George Hotz
eaa1e0eeeb rename constant_folder to sym [run_process_replay] (#6780) 2024-09-27 14:54:54 +08:00
qazal
900b21ef0c viz delete const after fold (#6778)
* viz delete const after fold

* add base to tests
2024-09-27 11:58:01 +08:00
qazal
94e43dc49a add Buffer.to_uop [run_process_replay] (#6777) 2024-09-27 11:41:23 +08:00
qazal
98a81b36e1 viz table view (#6743)
* fix matcher with ctx

* current_kernel fix

* add table

* make the right things clickable

* some more init work

* add kernel resizer

* Revert "add kernel resizer"

This reverts commit 035eef3703.

* allow scroll
2024-09-27 10:26:46 +08:00
chenyu
bea7ed5986 add RUNMLPERF=1 to bert dev_run.sh (#6775)
already set in run_and_time.sh, need RUNMLPERF=1 for it to load real data
2024-09-26 11:00:49 -04:00
George Hotz
c178dc1071 faster uops ci [run_process_replay] (#6774) 2024-09-26 20:15:01 +08:00
George Hotz
249af24f18 metal bfloat as cast (#6773) 2024-09-26 19:31:40 +08:00
George Hotz
ed2f28388f render cast is rewrite rules [run_process_replay] (#6772)
* render cast is rewrite rules [run_process_replay]

* move load/store to rewrite rules

* render_alu smaller

* render_gep
2024-09-26 19:03:31 +08:00
nimlgen
3c56aeee70 add Tensor.from_blob (#6765)
* draft tensor from pointer init

* some docs and types

* comment

* cleaner

* test

* malloc

* qcom cl interop

* jit example

* cleaner

* dealloc

* wording

* docs
2024-09-26 18:33:19 +08:00
George Hotz
14ad47b515 rewrite to use uops if (#6764)
* rewrite to use uops if

* does this pass

* careful penalty

* fix tests

* remove unused stuff

* that's a cstyle rewrite

* Update test_linearizer_dumb.py
2024-09-26 18:09:09 +08:00
George Hotz
7e7184bb13 cleaner ptx match rules [run_process_replay] (#6770)
* cleaner ptx match rules [run_process_replay]

* clean up load/store rules

* now that's clean

* oops, typo

* cast back to bool
2024-09-26 17:44:10 +08:00
chenyu
12de203a43 add IGNORE_JIT_FIRST_BEAM to bert scripts (#6769)
* update bert BEAM params

copied from resnet to start with

* just IGNORE_JIT_FIRST_BEAM
2024-09-26 05:38:24 -04:00
wozeparrot
15cd42cfb9 feat: support TRACEMETA=2 in handcode_opt (#6767) 2024-09-26 16:58:29 +08:00
chenyu
5a5fbfa1eb smaller bert script change (#6768)
only WANDB and RUNMLPERF order. BENCHMARK and BEAM will be done differently
2024-09-26 04:54:28 -04:00
wozeparrot
abd484a9f7 fix: need numpy for docs and testing (#6766) 2024-09-26 16:44:59 +08:00
wozeparrot
2b899164c6 no numpy (#6751) 2024-09-26 16:40:18 +08:00
George Hotz
7fca0bc912 use pattern matcher for image [run_process_replay] (#6762)
* use pattern matcher for image [run_process_replay]

* try again

* this
2024-09-26 15:49:09 +08:00
qazal
197f8fd986 early uop globals with Buffer (#6753) 2024-09-26 15:34:21 +08:00