6127 Commits

Author SHA1 Message Date
chenyu
5a5fbfa1eb smaller bert script change (#6768)
only WANDB and RUNMLPERF order. BENCHMARK and BEAM will be done differently
2024-09-26 04:54:28 -04:00
wozeparrot
abd484a9f7 fix: need numpy for docs and testing (#6766) 2024-09-26 16:44:59 +08:00
wozeparrot
2b899164c6 no numpy (#6751) 2024-09-26 16:40:18 +08:00
George Hotz
7fca0bc912 use pattern matcher for image [run_process_replay] (#6762)
* use pattern matcher for image [run_process_replay]

* try again

* this
2024-09-26 15:49:09 +08:00
qazal
197f8fd986 early uop globals with Buffer (#6753) 2024-09-26 15:34:21 +08:00
George Hotz
e999281502 match_to_scalar (#6761) 2024-09-26 14:50:47 +08:00
George Hotz
0c7d34ceb7 did vload do anything? [run_process_replay] (#6760) 2024-09-26 14:46:16 +08:00
qazal
ee4feedb77 delete test_variable_const [run_process_replay] (#6757)
* delete test_variable_const [run_process_replay]

* don't allow variable UPat
2024-09-26 12:27:11 +08:00
chenyu
0424c4967d fix handcode_opt.py for bert (#6756) 2024-09-26 00:20:24 -04:00
chenyu
396c96357b update mlperf bert scripts (#6755)
removed DISABLE_DROPOUT=1.
updated BS to 54 that works on tinyboxes with dropouts.
used bert's sparse_categorical_crossentropy that takes Tensor ignore_index in accuracy method
2024-09-25 23:55:05 -04:00
George Hotz
717b394391 remove defaultdict from PatternMatcher [run_process_replay] (#6754)
* remove defaultdict from PatternMatcher [run_process_replay]

* nicer way to write that

* same line count

* tpm too
2024-09-26 11:25:01 +08:00
George Hotz
7e73c7b3cc hotfix: bump stable diffusion val distance 2024-09-26 11:15:29 +08:00
George Hotz
ff880f5be4 hotfix: force_transcendental to fix process replay 2024-09-26 11:13:16 +08:00
George Hotz
a6a70aa4bd add optional NEG and SUB (#6750)
* add optional NEG and SUB

* describe that compute + optional mulacc

* ptx cleanup

* lil cleanups
2024-09-26 10:50:53 +08:00
George Hotz
197dbbda0f add UnaryOps.NEG + BinaryOps.SUB so process replay can work 2024-09-26 10:36:33 +08:00
George Hotz
b199b699ed use shl everywhere (#6744)
* use shl everywhere

* fix parens

* late patterns

* works as an extra pass

* ptx
2024-09-26 09:59:36 +08:00
qazal
88160e59b2 gate engine.graph imports [run_process_replay] (#6748) 2024-09-26 09:13:49 +08:00
qazal
12e4a4900a hotfix: missing return in METAL dm benchmark (#6749) 2024-09-26 09:12:38 +08:00
qazal
8a15ccb414 start gc/mem usage tests for buffer schedule [run_process_replay] (#6737)
* gc tests for buffer schedule [run_process_replay]

* assert global counters, maybe del

* check init

* rm global counters
2024-09-26 08:26:31 +08:00
qazal
b629a7998d early assert buffer count limit [run_process_replay] (#6746)
* better error message for buffer count limit [run_process_replay]

* 3.9 needs that

* assert ScheduleItem

* new _test_buf_cnt
2024-09-26 08:24:26 +08:00
wozeparrot
4ebc9589a6 feat: make buffer (#6745) 2024-09-25 18:31:03 +08:00
wozeparrot
c100f3d406 default threefry (#6116) 2024-09-25 17:45:13 +08:00
mesozoic-egg
992cde05d7 Metal with CDLL instead of py-objc (#6545)
* Add CDLL interface for metal

* remove two unused functions

* Cover most of the API methods

* switch to cdll

* directly call objc message in ops_metal

* keep only obj interface

* Use direct message sending for graph

* may have found a solution to the memoryview on ctypes pointer

* buf indexing bug fixed

* fix c_int

* fix c int to bytes

* fix gpu time bug

* line savings for cdll metal core

* wip

* c int bug

* fix buf casting

* dedup for c_void_p

* dedup for c_void_p

* linter fix

* remove unused stuff

* my py fix

* more mypy error fix

* line savings

* line savings

* rename send_message to msg; add __hash__ and __eq__ for dedup

* wip

* refactor

* refactor

* remove named import from ctypes

* forgot to change variable name

* file reorg, put support.py to ops_metal

* refactor

* hash error

* remove to_ns_array

* test oom exception, fix exception change

* typevar for msg

* add back dedup

* test for compile error

* move constant to graph

* move header constant around

* get label for icb buffer

* check icb label using "in"

* wip fixing mypy reported error

* fixed mypy error

* code formatting

* all_resources dedup match previous

* code formatting

* code formatting; buffer set to objc_id

* revert changes on buf for the manual release, seems like _free is not always called

* skip unless on metal, for test_metal

* fix premature mem release causing seg fault

* test_metal check for device before importing

* Buffer should only be released under _free explicitly

* mypy fixes

* change object ownership

* test compile success

* lint fixes

* remove load_library

* wrap sel_register in cache

* simplify to_struct

* swap lines

* fix type error in to_struct

* bump line to 9800

* remove pyobjc from setup.py

* command buffer should be objc_instance and get released

* stringWithUTF8String: returns objc_instance

* Use constant for MTLPipelineOptionNone

* better explanation for [MTLBuffer contents:] return

* Use dyld_find in case the path differs

* trailing whitespace

* handle exception for methods that take error:

* load /System/Library instead of /Library

* Init c_void_p with None instead of zero for error objects

---------

Co-authored-by: Mesozoic Egg <mesozoic.egg@proton.me>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-09-25 17:43:01 +08:00
George Hotz
cd534dee11 cstyle changes that don't pass process replay (#6734)
* cstyle changes that don't pass process replay

* add constant folder back there

* cleanups

* const

* fix some tests

* bfloat16 too

* complete set of types

* that cast shouldn't be needed

* that was a questionable test
2024-09-25 17:33:34 +08:00
George Hotz
232edcfd4f cast bool for type verify [run_process_replay] (#6742) 2024-09-25 17:12:16 +08:00
George Hotz
cb22ef379a truncate consts early (#6741)
* truncate consts early

* ptx still fails

* Update dtype.py
2024-09-25 16:49:51 +08:00
nimlgen
e31552e2e0 qcom reinit queue on exec (#6728)
* qcom setup on exec as gpu=1

* linter

* gpulike

* offsets
2024-09-25 16:08:50 +08:00
George Hotz
882339f729 remove parens from neg (#6738) 2024-09-25 15:38:20 +08:00
qazal
5ad2f95d01 process replay diff stats (#6736)
* process replay diff stats

* fix tuples
2024-09-25 15:19:56 +08:00
nimlgen
56979aa3ed qcom ioctl log levels (#6735) 2024-09-25 14:59:27 +08:00
chenyu
66af8bb54c use UOp.replace and UOp.define_var in validhack (#6730)
easier to see the diff in replacement
[run_process_replay]
2024-09-25 02:51:34 -04:00
chenyu
ff25bfb1b0 conv backward tests in test_simplify_valid_idx (#6727)
the backward idx is pretty ugly now
2024-09-25 02:51:07 -04:00
qazal
6c69fec1ef viz more info for rewrite location (#6729) 2024-09-25 14:49:40 +08:00
George Hotz
39f78619ff cstyle replay [run_process_replay] (#6731)
* real minimum cstyle change

* make it match

* bring back DEFINE_GLOBAL store marking writable

* bump line count to 9800

* closer

* precompute don't render

* cast/bitcast too

* smem_align

* vectorize

* more pr match

* remove that test

* less PR diff

* cstyle changes that [run_process_replay]
2024-09-25 14:26:05 +08:00
nimlgen
e1caa24a92 qcom fix binded queue might be overwritten (#6712) 2024-09-25 12:45:23 +08:00
George Hotz
dd575da7ee real minimum cstyle change (#6709)
* real minimum cstyle change

* make it match

* bring back DEFINE_GLOBAL store marking writable

* bump line count to 9800

* closer

* precompute don't render

* cast/bitcast too

* smem_align

* vectorize

* more pr match

* remove that test

* less PR diff
2024-09-25 12:40:46 +08:00
chenyu
e6a1b5aa8f more test_simplify_valid_idx cleanup (#6726)
moved UOps.VECTORIZE of idx into the helper
2024-09-24 23:47:42 -04:00
chenyu
14524eeddc test_image_valid.py -> test_simplify_valid_idx.py (#6724)
restructure the tests, will use the same file for non-image tests
2024-09-24 23:32:27 -04:00
qazal
e0d8685c99 test_masked_upcast_wino check device buf_max (#6723) 2024-09-25 11:26:53 +08:00
George Hotz
f45d178a55 hotfix: support JIT_BATCH_SIZE=0, make that the default 2024-09-25 10:36:04 +08:00
George Hotz
52e7f1c108 add new model CI 2024-09-25 10:23:06 +08:00
ttomsa
76bd4c7d5f advanced setitem (#6262)
* advanced setitem draft

* add setitem tests

* fix for tests

* small change

* handle repeated indices with test

* fix v broadcasting to mask

* clean up a bit

* open more tests

* clean up, fixes issue with scalar tensor index

* fix

* fix index_put_ and linter

* add type annotation

* done

* remove non contiguous hack

* woops linter

* name fix

* add back type notation

* more type notation

* final

* linter

* check lazydata not shared

* no numpy

* no numpy

* rename

* index benchmark

* linter

* no cloning time

* rm benchmark

* new function

* rm contiguous and cast early

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-09-24 22:14:59 -04:00
qazal
3bf25aae78 start work on global buffer count limit [run_process_replay] (#6722)
* add a bufs_max option

* simple spec
2024-09-25 09:51:56 +08:00
George Hotz
b0ffe2452b bump line count to 9800 2024-09-25 09:15:30 +08:00
chenyu
5c240c34aa split validhack into simplify idx and drop valids (#6719)
* split validhack into simplify idx and drop valids

will be using the simplify idx for non-image buffer
[run_process_replay]

* shorter
2024-09-24 09:40:27 -04:00
qazal
cefc3e9382 make all schedules immutable [run_process_replay] (#6718)
* compute inputs and outputs in LBScheduleItem [run_process_replay]

* simpler metadata, delete __hash__

* no dynamic field

* test_diff_schedule
2024-09-24 21:08:16 +08:00
qazal
29330014ab give FUZZ_SCHEDULE views a base (#6717)
* memoryview to bytes

* give FUZZ_SCHEDULE views a base
2024-09-24 19:20:37 +08:00
nimlgen
f0019ad29c bump ci test timeout for test_speed_exec_time (#6715)
* bump ci test timeout for test_speed_exec_time

* more
2024-09-24 18:44:09 +08:00
qazal
1c03fb69c9 viz dedup assert groupby ctx [run_process_replay] (#6714) 2024-09-24 18:17:21 +08:00
chenyu
8d75326cb5 do not fold var with min==max (#6713)
not really used, want it to keep as a var for valid simplification
[run_process_replay]
2024-09-24 06:16:34 -04:00