Francis Lata
9fbc3f1fc7
Merge branch 'master' into retinanet_mlperf
2024-09-27 08:16:24 -07:00
Francis Lata
d3a387be63
[MLPerf] Prepare openimages dataset script ( #6747 )
...
* prepare openimages for MLPerf
* cleanup
* fix issue when clearing jit_cache on retinanet eval
* revert pandas specific changes
2024-09-27 11:13:56 -04:00
chenyu
bc82f8c5be
use where in dropout ( #6758 )
...
should save memory, since we only store the mask as bool instead of the upcasted mask used in mul
2024-09-27 11:11:43 -04:00
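The memory argument in the dropout commit above can be illustrated with a small standalone sketch. This is not tinygrad's implementation: `dropout_where`, `dropout_mul`, and the use of `bytearray` / `array("f")` as stand-ins for bool and float32 buffers are assumptions made purely for illustration. Both forms produce the same output, but the "where" form only needs a one-byte-per-element mask, while the "mul" form needs a mask upcast to the data type.

```python
import random
from array import array

def dropout_where(x, p, rng):
    # Hypothetical helper: keep-mask stored as one byte per element (bool-like).
    mask = bytearray(rng.random() >= p for _ in x)
    scale = 1.0 / (1.0 - p)
    # "where" form: pick the scaled value where the mask is set, else 0.
    out = array("f", (v * scale if m else 0.0 for v, m in zip(x, mask)))
    return out, mask

def dropout_mul(x, p, rng):
    # Hypothetical helper: mask upcast to float32 so it can be multiplied in.
    mask = array("f", (1.0 if rng.random() >= p else 0.0 for _ in x))
    scale = 1.0 / (1.0 - p)
    out = array("f", (v * m * scale for v, m in zip(x, mask)))
    return out, mask

x = array("f", [1.0, -2.0, 3.0, 4.0, 5.0, -6.0])
out_w, mask_w = dropout_where(x, 0.5, random.Random(0))
out_m, mask_m = dropout_mul(x, 0.5, random.Random(0))

# Same result either way; only the stored mask differs in size.
print(list(out_w) == list(out_m))
print(len(mask_w), "bytes vs", mask_m.itemsize * len(mask_m), "bytes")
```

In a framework that keeps the mask around for the backward pass, the saving would scale with tensor size; how much is actually saved in tinygrad depends on details not shown in this log.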
Francis Lata
23563c84d6
Merge branch 'master' into retinanet_mlperf
2024-09-27 07:12:52 -07:00
qazal
76b3c1e818
add all realized Buffers to schedule graph edges [run_process_replay] ( #6786 )
...
* add realized Buffers to bufs
* simpler checks
2024-09-27 19:25:51 +08:00
qazal
568c97f7a2
add UOp.define_global [run_process_replay] ( #6787 )
...
* add UOp.define_global [run_process_replay]
* no src
2024-09-27 19:24:03 +08:00
nimlgen
b95f47784a
qcom sleep when sync ( #6785 )
...
* qcom sleep when sync
* linter
* short
2024-09-27 19:14:10 +08:00
qazal
fb3fe6f39b
better VIZ ( #6781 )
...
* ui changes
* make kernels global
* dont save buffers when running VIZ=1
* remove flex in layout
* use os.execv
* del server thread
* server close
* cleanup
* logs cleanup
* rm getenv
* cleanups
* remove global
2024-09-27 18:38:31 +08:00
chenyu
2fc26890c9
default BS=9 in handcode_opt bert ( #6783 )
...
using 54 for 6 gpus now, and 2 is not a good default
2024-09-27 04:38:16 -04:00
George Hotz
9a3f6f392d
llm.c tok/s
2024-09-27 00:46:18 -07:00
George Hotz
b0e70ab04f
llm.c updates
2024-09-27 15:25:59 +08:00
George Hotz
eaa1e0eeeb
rename constant_folder to sym [run_process_replay] ( #6780 )
2024-09-27 14:54:54 +08:00
qazal
900b21ef0c
viz delete const after fold ( #6778 )
...
* viz delete const after fold
* add base to tests
2024-09-27 11:58:01 +08:00
qazal
94e43dc49a
add Buffer.to_uop [run_process_replay] ( #6777 )
2024-09-27 11:41:23 +08:00
qazal
98a81b36e1
viz table view ( #6743 )
...
* fix matcher with ctx
* current_kernel fix
* add table
* make the right things clickable
* some more init work
* add kernel resizer
* Revert "add kernel resizer"
This reverts commit 035eef3703.
* allow scroll
2024-09-27 10:26:46 +08:00
Francis Lata
211b04ba2c
Merge branch 'master' into retinanet_mlperf
2024-09-26 15:03:00 -07:00
chenyu
bea7ed5986
add RUNMLPERF=1 to bert dev_run.sh ( #6775 )
...
already set in run_and_time.sh, need RUNMLPERF=1 for it to load real data
2024-09-26 11:00:49 -04:00
George Hotz
c178dc1071
faster uops ci [run_process_replay] ( #6774 )
2024-09-26 20:15:01 +08:00
George Hotz
249af24f18
metal bfloat as cast ( #6773 )
2024-09-26 19:31:40 +08:00
George Hotz
ed2f28388f
render cast is rewrite rules [run_process_replay] ( #6772 )
...
* render cast is rewrite rules [run_process_replay]
* move load/store to rewrite rules
* render_alu smaller
* render_gep
2024-09-26 19:03:31 +08:00
nimlgen
3c56aeee70
add Tensor.from_blob ( #6765 )
...
* draft tensor from pointer init
* some docs and types
* comment
* cleaner
* test
* malloc
* qcom cl interop
* jit example
* cleaner
* dealoc
* wording
* docs
2024-09-26 18:33:19 +08:00
George Hotz
14ad47b515
rewrite to use uops if ( #6764 )
...
* rewrite to use uops if
* does this pass
* careful penalty
* fix tests
* remove unused stuff
* that's a cstyle rewrite
* Update test_linearizer_dumb.py
2024-09-26 18:09:09 +08:00
George Hotz
7e7184bb13
cleaner ptx match rules [run_process_replay] ( #6770 )
...
* cleaner ptx match rules [run_process_replay]
* clean up load/store rules
* now that's clean
* oops, typo
* cast back to bool
2024-09-26 17:44:10 +08:00
chenyu
12de203a43
add IGNORE_JIT_FIRST_BEAM to bert scripts ( #6769 )
...
* update bert BEAM params
copied from resnet to start with
* just IGNORE_JIT_FIRST_BEAM
2024-09-26 05:38:24 -04:00
Francis Lata
ea05de325c
Merge branch 'master' into retinanet_mlperf
2024-09-26 02:20:28 -07:00
wozeparrot
15cd42cfb9
feat: support TRACEMETA=2 in handcode_opt ( #6767 )
2024-09-26 16:58:29 +08:00
chenyu
5a5fbfa1eb
smaller bert script change ( #6768 )
...
only WANDB and RUNMLPERF order. BENCHMARK and BEAM will be done differently
2024-09-26 04:54:28 -04:00
wozeparrot
abd484a9f7
fix: need numpy for docs and testing ( #6766 )
2024-09-26 16:44:59 +08:00
wozeparrot
2b899164c6
no numpy ( #6751 )
2024-09-26 16:40:18 +08:00
George Hotz
7fca0bc912
use pattern matcher for image [run_process_replay] ( #6762 )
...
* use pattern matcher for image [run_process_replay]
* try again
* this
2024-09-26 15:49:09 +08:00
qazal
197f8fd986
early uop globals with Buffer ( #6753 )
2024-09-26 15:34:21 +08:00
George Hotz
e999281502
match_to_scalar ( #6761 )
2024-09-26 14:50:47 +08:00
George Hotz
0c7d34ceb7
did vload do anything? [run_process_replay] ( #6760 )
2024-09-26 14:46:16 +08:00
qazal
ee4feedb77
delete test_variable_const [run_process_replay] ( #6757 )
...
* delete test_variable_const [run_process_replay]
* don't allow variable UPat
2024-09-26 12:27:11 +08:00
chenyu
0424c4967d
fix handcode_opt.py for bert ( #6756 )
2024-09-26 00:20:24 -04:00
chenyu
396c96357b
update mlperf bert scripts ( #6755 )
...
removed DISABLE_DROPOUT=1.
updated BS to 54 that works on tinyboxes with dropouts.
used bert's sparse_categorical_crossentropy that takes Tensor ignore_index in accuracy method
2024-09-25 23:55:05 -04:00
George Hotz
717b394391
remove defaultdict from PatternMatcher [run_process_replay] ( #6754 )
...
* remove defaultdict from PatternMatcher [run_process_replay]
* nicer way to write that
* same line count
* tpm too
2024-09-26 11:25:01 +08:00
George Hotz
7e73c7b3cc
hotfix: bump stable diffusion val distance
2024-09-26 11:15:29 +08:00
George Hotz
ff880f5be4
hotfix: force_transcendental to fix process replay
2024-09-26 11:13:16 +08:00
George Hotz
a6a70aa4bd
add optional NEG and SUB ( #6750 )
...
* add optional NEG and SUB
* describe that compute + optional mulacc
* ptx cleanup
* lil cleanups
2024-09-26 10:50:53 +08:00
George Hotz
197dbbda0f
add UnaryOps.NEG + BinaryOps.SUB so process replay can work
2024-09-26 10:36:33 +08:00
George Hotz
b199b699ed
use shl everywhere ( #6744 )
...
* use shl everywhere
* fix parens
* late patterns
* works as an extra pass
* ptx
2024-09-26 09:59:36 +08:00
qazal
88160e59b2
gate engine.graph imports [run_process_replay] ( #6748 )
2024-09-26 09:13:49 +08:00
qazal
12e4a4900a
hotfix: missing return in METAL dm benchmark ( #6749 )
2024-09-26 09:12:38 +08:00
Francis Lata
6ccb790371
Merge branch 'master' into retinanet_mlperf
2024-09-25 17:50:55 -07:00
Francis Lata
979070c327
Merge branch 'master' into retinanet_mlperf
2024-09-25 17:26:37 -07:00
qazal
8a15ccb414
start gc/mem usage tests for buffer schedule [run_process_replay] ( #6737 )
...
* gc tests for buffer schedule [run_process_replay]
* assert global counters, maybe del
* check init
* rm global counters
2024-09-26 08:26:31 +08:00
qazal
b629a7998d
early assert buffer count limit [run_process_replay] ( #6746 )
...
* better error message for buffer count limit [run_process_replay]
* 3.9 needs that
* assert ScheduleItem
* new _test_buf_cnt
2024-09-26 08:24:26 +08:00
Francis Lata
b7a8de1a4e
Merge branch 'master' into retinanet_mlperf
2024-09-25 10:57:32 -07:00
wozeparrot
4ebc9589a6
feat: make buffer ( #6745 )
2024-09-25 18:31:03 +08:00