qazal
a1dee0e532
early uop UOps.BUFFER (only once) [run_process_replay] (#6820)
* buf_uops lookup [run_process_replay]
* next diff will be this
* fix ImageDType
2024-10-01 08:46:05 +08:00
nimlgen
e213bea426
nv shorter (#6819)
2024-09-30 19:39:32 +03:00
George Hotz
0f28e93224
add pickle support for pattern matchers [run_process_replay] (#6816)
* add pickle support for pattern matchers [run_process_replay]
* cleaner and all
* no closures
* fix tests
* revert that
* final
* cleaner
* python 3.8 fix
* add round trip back
* this
* waste lines on this. that's the final line count
* max print better
* more targeted fix
* regrettably add 3.8 support
2024-09-30 21:54:46 +08:00
chenyu
f59517754e
add RESET_STEP in bert to control reset (#6818)
same as resnet
2024-09-30 09:39:04 -04:00
qazal
0c24fec9f4
test current behavior of const schedule [run_process_replay] (#6817)
2024-09-30 21:02:01 +08:00
qazal
4a4aa69b84
add a better dedup test for DEFINE_VAR with CONST arg (#6813)
2024-09-30 15:43:55 +08:00
qazal
e7fcbe1a4d
refactor test_linearizer correctness asserts (#6812)
2024-09-30 15:31:02 +08:00
George Hotz
9dd9f71011
no global kernel stuff [run_process_replay] (#6808)
* use traceback instead of global metadata crap [run_process_replay]
* save the kernel
* correct, imports clean, no device
* UNPARENTED
* speed
* proudly unparented
* Update ops.py
* update tests for unparented
---------
Co-authored-by: qazal <qazal.software@gmail.com>
2024-09-30 13:52:33 +08:00
George Hotz
00b3171902
mod can be and (#6810)
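A minimal sketch of the identity the title alludes to, not the code from the PR: for a non-negative integer `x` and a power-of-two modulus `m`, `x % m` equals `x & (m - 1)`. The helper names `is_power_of_two` and `mod_as_and` are hypothetical and do not come from tinygrad.

```python
def is_power_of_two(m: int) -> bool:
    # A positive power of two has exactly one bit set.
    return m > 0 and (m & (m - 1)) == 0


def mod_as_and(x: int, m: int) -> int:
    """Compute x % m, using a bitwise AND when that is valid (hypothetical helper)."""
    if x >= 0 and is_power_of_two(m):
        # Keep only the low log2(m) bits of x.
        return x & (m - 1)
    # Fall back to the generic modulo for every other case.
    return x % m
```

The fallback branch keeps the sketch correct for moduli that are not powers of two, where the mask trick does not apply.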
2024-09-30 12:33:15 +08:00
qazal
c9d763d331
refactor to axis_arg [run_process_replay] (#6806)
* refactor to axis_arg [run_process_replay]
* remove more arg[1]s
2024-09-30 09:37:31 +08:00
qazal
7099af4450
VIZ show rendering errors (#6807)
* VIZ show rendering errors
* show the entire traceback
2024-09-30 09:35:36 +08:00
George Hotz
2ed94e447f
gpt2: corealize opt and loss
2024-09-30 09:11:20 +08:00
qazal
2ec73d6f05
push swizzle through dim change (#6801)
* push swizzle through dim change
* can this be generic
* generic version
* cleanups
2024-09-30 09:04:59 +08:00
George Hotz
a76c6c740c
hand pad gpt2 (#6805)
2024-09-30 09:03:07 +08:00
geohotstan
282abb4234
add get_available_backends (#6771)
* lol
* 1 less line lmfao
* something like this?
* comment
* pylint
* just iterator
* backends -> devices
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-09-30 08:58:04 +08:00
qazal
3c15e64273
VIZ prep for the new kernel render (#6800)
* refactor to list
* remove prints in test_viz
* more cleanup
2024-09-29 20:06:31 +08:00
qazal
01c9653614
add UOps.BUFFER, delete Buffer in UOps.DEFINE_GLOBAL (#6798)
* delete DEFINE_GLOBAL buffer arg
* add UOps.BUFFER
2024-09-29 18:56:07 +08:00
qazal
5e1221845f
refactor schedule edges to tuple[LazyBuffer, ...] [run_process_replay] (#6797)
2024-09-29 11:34:39 +08:00
chenyu
68e59eb3f5
update mlperf-logging to 4.1.0-rc3 (#6796)
2024-09-28 21:45:37 -04:00
qazal
dab05ff070
match dataclass.replace in UOp.replace [run_process_replay] (#6792)
* UOp replace matching dataclass replace
* p2
* replace creates a copy
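For context, a minimal sketch of the standard-library behavior this message refers to: `dataclasses.replace` returns a new object with the given fields overridden and leaves the original untouched. The `Node` class is a hypothetical stand-in, not tinygrad's `UOp`.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Node:
    # Hypothetical stand-in for a UOp-like record; not tinygrad's actual class.
    op: str
    arg: int = 0
    src: tuple = ()


a = Node("CONST", arg=1)
b = replace(a, arg=2)  # new object; only `arg` differs from `a`
c = replace(a)         # no overrides: still a fresh copy, equal to `a`
```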
2024-09-28 16:28:49 +08:00
chenyu
494b20e886
bert BS back to 54 (#6791)
60 does not run end to end
2024-09-27 22:16:05 -04:00
chenyu
572d77d1d9
bert script delete eval data after eval (#6790)
fits BS=60 which is 2% faster than 54. also fixed wandb logging params
2024-09-27 20:54:00 -04:00
chenyu
f9c8e144ff
chmod +x mlperf bert script for red (#6789)
also disabled raising power cap in setup. wozeparrot mentioned that's unstable and might cause bert training issue on red
2024-09-27 11:27:32 -04:00
Francis Lata
d3a387be63
[MLPerf] Prepare openimages dataset script (#6747)
* prepare openimages for MLPerf
* cleanup
* fix issue when clearing jit_cache on retinanet eval
* revert pandas specific changes
2024-09-27 11:13:56 -04:00
chenyu
bc82f8c5be
use where in dropout (#6758)
should save memory since we only store the mask as bool instead of the upcasted mask used in mul
2024-09-27 11:11:43 -04:00
qazal
76b3c1e818
add all realized Buffers to schedule graph edges [run_process_replay] (#6786)
* add realized Buffers to bufs
* simpler checks
2024-09-27 19:25:51 +08:00
qazal
568c97f7a2
add UOp.define_global [run_process_replay] (#6787)
* add UOp.define_global [run_process_replay]
* no src
2024-09-27 19:24:03 +08:00
nimlgen
b95f47784a
qcom sleep when sync (#6785)
* qcom sleep when sync
* linter
* short
2024-09-27 19:14:10 +08:00
qazal
fb3fe6f39b
better VIZ (#6781)
* ui changes
* make kernels global
* dont save buffers when running VIZ=1
* remove flex in layout
* use os.execv
* del server thread
* server close
* cleanup
* logs cleanup
* rm getenv
* cleanups
* remove global
2024-09-27 18:38:31 +08:00
chenyu
2fc26890c9
default BS=9 in handcode_opt bert (#6783)
using 54 for 6 gpus now, and 2 is not a good default
2024-09-27 04:38:16 -04:00
George Hotz
9a3f6f392d
llm.c tok/s
2024-09-27 00:46:18 -07:00
George Hotz
b0e70ab04f
llm.c updates
2024-09-27 15:25:59 +08:00
George Hotz
eaa1e0eeeb
rename constant_folder to sym [run_process_replay] (#6780)
2024-09-27 14:54:54 +08:00
qazal
900b21ef0c
viz delete const after fold (#6778)
* viz delete const after fold
* add base to tests
2024-09-27 11:58:01 +08:00
qazal
94e43dc49a
add Buffer.to_uop [run_process_replay] (#6777)
2024-09-27 11:41:23 +08:00
qazal
98a81b36e1
viz table view (#6743)
* fix matcher with ctx
* current_kernel fix
* add table
* make the right things clickable
* some more init work
* add kernel resizer
* Revert "add kernel resizer"
This reverts commit 035eef3703.
* allow scroll
2024-09-27 10:26:46 +08:00
chenyu
bea7ed5986
add RUNMLPERF=1 to bert dev_run.sh (#6775)
already set in run_and_time.sh, need RUNMLPERF=1 for it to load real data
2024-09-26 11:00:49 -04:00
George Hotz
c178dc1071
faster uops ci [run_process_replay] (#6774)
2024-09-26 20:15:01 +08:00
George Hotz
249af24f18
metal bfloat as cast (#6773)
2024-09-26 19:31:40 +08:00
George Hotz
ed2f28388f
render cast is rewrite rules [run_process_replay] (#6772)
* render cast is rewrite rules [run_process_replay]
* move load/store to rewrite rules
* render_alu smaller
* render_gep
2024-09-26 19:03:31 +08:00
nimlgen
3c56aeee70
add Tensor.from_blob (#6765)
* draft tensor from pointer init
* some docs and types
* comment
* cleaner
* test
* malloc
* qcom cl interop
* jit example
* cleaner
* dealloc
* wording
* docs
2024-09-26 18:33:19 +08:00
George Hotz
14ad47b515
rewrite to use uops if (#6764)
* rewrite to use uops if
* does this pass
* careful penalty
* fix tests
* remove unused stuff
* that's a cstyle rewrite
* Update test_linearizer_dumb.py
2024-09-26 18:09:09 +08:00
George Hotz
7e7184bb13
cleaner ptx match rules [run_process_replay] (#6770)
* cleaner ptx match rules [run_process_replay]
* clean up load/store rules
* now that's clean
* oops, typo
* cast back to bool
2024-09-26 17:44:10 +08:00
chenyu
12de203a43
add IGNORE_JIT_FIRST_BEAM to bert scripts (#6769)
* update bert BEAM params
copied from resnet to start with
* just IGNORE_JIT_FIRST_BEAM
2024-09-26 05:38:24 -04:00
wozeparrot
15cd42cfb9
feat: support TRACEMETA=2 in handcode_opt (#6767)
2024-09-26 16:58:29 +08:00
chenyu
5a5fbfa1eb
smaller bert script change (#6768)
only WANDB and RUNMLPERF order. BENCHMARK and BEAM will be done differently
2024-09-26 04:54:28 -04:00
wozeparrot
abd484a9f7
fix: need numpy for docs and testing (#6766)
2024-09-26 16:44:59 +08:00
wozeparrot
2b899164c6
no numpy (#6751)
2024-09-26 16:40:18 +08:00
George Hotz
7fca0bc912
use pattern matcher for image [run_process_replay] (#6762)
* use pattern matcher for image [run_process_replay]
* try again
* this
2024-09-26 15:49:09 +08:00
qazal
197f8fd986
early uop globals with Buffer (#6753)
2024-09-26 15:34:21 +08:00