chenyu
0ed755bcf5
resnet use EVAL_BS=192 ( #4482 )
...
* resnet use EVAL_BS=192
also lower green run BEAM_MIN_PROGRESS from 10 to 5
* BEAM_MIN_PROGRESS 5 is too close to setup limit
2024-05-08 22:29:27 -04:00
chenyu
1f6bf9d2f7
real diskcache_clear in model_train resnet ( #4445 )
...
clear cache if INITMLPERF is set, or running run_and_time. dev_beam and dev_run do not clear cache
2024-05-08 19:06:09 -04:00
chenyu
1b4645bea6
hotfix resnet move init_start to start of the script ( #4481 )
2024-05-08 19:03:52 -04:00
wozeparrot
a347ae94d6
feat: remove wandb ( #4480 )
2024-05-08 15:31:16 -07:00
qazal
00c309dfe2
trigger tc in remu ( #4479 )
2024-05-08 23:23:46 +03:00
nimlgen
e14d5b6fd7
nv fix oob qmd ptr ( #4478 )
...
* nv fix oob qmd ptr
* test kernargs no oob
2024-05-08 23:11:04 +03:00
chenyu
db7e15c46f
hotfix resnet only log epoch start with RUNMLPERF ( #4477 )
2024-05-08 15:14:41 -04:00
chenyu
062c6dd65d
mlperf logging, truncate dir in logs and log seed ( #4475 )
2024-05-08 12:54:02 -04:00
chenyu
b62a65b617
redo faster sparse_categorical_crossentropy ( #4461 )
...
update LR and DECAY in resnet default that help convergence too
2024-05-08 11:21:43 -04:00
Elias Wahl
e87460c7e2
bump version ( #4474 )
2024-05-08 07:48:42 -07:00
Szymon Ożóg
4eb6aef73c
Speed up graph rewrite ( #4473 )
...
* Speed up graph rewrite
* Bring back old name
2024-05-08 07:15:15 -07:00
Nicklas Boman
cc33947fa5
Update links in new docs ( #4363 )
...
tensor and nn links to tensor.md and nn.md
2024-05-08 06:13:00 -07:00
chenyu
36a1f38049
lazy folding: mul -1 is neg, and neg neg is noop ( #4472 )
2024-05-08 01:52:22 -04:00
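Note on the folding rules named in the title above: a minimal sketch of the two rewrites (multiplying by -1 becomes a negation, and a double negation folds away) on a toy expression tree. The `Var`, `Const`, `Neg`, and `Mul` classes and the `neg`/`mul` helpers are hypothetical illustrations, not the project's lazy-buffer code.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical toy expression nodes, for illustration only.
@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Const:
    value: float

@dataclass(frozen=True)
class Neg:
    x: "Expr"

@dataclass(frozen=True)
class Mul:
    a: "Expr"
    b: "Expr"

Expr = Union[Var, Const, Neg, Mul]

def neg(x: Expr) -> Expr:
    # neg(neg(x)) is a no-op, so unwrap instead of stacking a second Neg.
    return x.x if isinstance(x, Neg) else Neg(x)

def mul(a: Expr, b: Expr) -> Expr:
    # x * -1 (or -1 * x) is rewritten to neg(x).
    if isinstance(b, Const) and b.value == -1:
        return neg(a)
    if isinstance(a, Const) and a.value == -1:
        return neg(b)
    return Mul(a, b)

x = Var("x")
once = mul(x, Const(-1))      # folds to Neg(x)
twice = mul(once, Const(-1))  # folds back to x
```

Building nodes through the helper constructors means the folded form is produced at construction time, so no separate simplification pass is needed in this sketch.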
chenyu
c508eb7425
revert the removal of CAST_BEFORE_VIEW ( #4471 )
...
this brings most of the memory gain for resnet back.
2024-05-08 00:14:29 -04:00
George Hotz
5dbab7fae6
bring thneed back ( #4467 )
...
* bring thneed back
* simple thneed
* bug fixes in new thneed
* needs_load
* context
* move that there
* fix thneed size
* fix CI
* one memory planner
* assert on buffer size
2024-05-07 20:55:03 -07:00
chenyu
7eb035e7c5
stronger test case for half mean overflow ( #4470 )
2024-05-07 22:40:09 -04:00
chenyu
ca7300c783
fix half mean and its backward ( #4469 )
...
* fix half mean and its backward
cast to sum_acc_type, sum, div, then cast back
* mean dtype tests
2024-05-07 21:46:41 -04:00
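Note on the order of operations described above (cast to the accumulation type, sum, divide, then cast back): a minimal NumPy sketch of why it matters for half precision. The largest finite float16 value is 65504, so a sum stored in float16 can overflow even when the mean itself is representable. The function name `mean_with_acc` and the choice of float32 as the accumulation type are assumptions for illustration; this is not the project's implementation.

```python
import numpy as np

def mean_with_acc(x: np.ndarray, acc_dtype=np.float32) -> np.floating:
    """Cast up, sum, divide, then cast back to the input dtype."""
    out_dtype = x.dtype
    acc = x.astype(acc_dtype)   # cast to the accumulation type
    total = acc.sum()           # sum in the wider type
    return (total / acc.size).astype(out_dtype)  # divide, then cast back

# 10,000 values of 100.0: the true sum is 1e6, far above the float16 max.
x = np.full(10_000, 100.0, dtype=np.float16)

with np.errstate(over="ignore"):
    naive_total = x.sum()       # result is stored as float16 and overflows

safe = mean_with_acc(x)
```

Dividing before casting back is the important part: casting the float32 sum to float16 first would overflow in the same way as the naive sum.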
Francis Lam
7da1b41f38
fuzz_linearizer: add FUZZ_REQUIRE_TC option to require TC in opts ( #4468 )
...
useful for checking late opts after TC such as GROUP, etc.
2024-05-07 17:14:21 -04:00
chenyu
46a793111b
test for LazyBuffer._view when mask out and degrade into const ( #4465 )
...
changed the condition from all 0 in masked dims to any 0 in masked. it's no-op because shapetracker rewrites whole mask to 0 if any dim has 0 as part of canonicalization
2024-05-07 12:56:23 -04:00
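Note on why the condition change above is a no-op: a minimal sketch, assuming a mask is a tuple of `(start, end)` ranges per dimension and that canonicalization zeroes the whole mask as soon as one dimension is empty. The helper names are hypothetical and this is not the ShapeTracker code.

```python
from typing import Tuple

Mask = Tuple[Tuple[int, int], ...]

def canonicalize_mask(mask: Mask) -> Mask:
    # If any dimension selects nothing, no element is valid at all,
    # so every dimension is rewritten to the empty range (0, 0).
    if any(end - start <= 0 for start, end in mask):
        return tuple((0, 0) for _ in mask)
    return tuple(mask)

def all_dims_empty(mask: Mask) -> bool:
    return all(end - start == 0 for start, end in mask)

def any_dim_empty(mask: Mask) -> bool:
    return any(end - start == 0 for start, end in mask)

examples = [
    ((0, 4), (2, 2)),  # second dimension is empty
    ((0, 4), (1, 3)),  # nothing is empty
    ((0, 0), (0, 0)),  # already fully masked out
]
canonical = [canonicalize_mask(m) for m in examples]

# On canonical masks the "all" and "any" checks always agree.
agree = [all_dims_empty(m) == any_dim_empty(m) for m in canonical]
```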
nimlgen
a1d350a810
nv timeline semaphores ( #4464 )
...
* nv timeline semaphores
* nv hcq fixes
2024-05-07 17:31:19 +03:00
nimlgen
e3bb85fd0e
amd timeline semaphores ( #4416 )
...
* amd timeline semaphores
* v2
* fixes
* reset signals
* fix
* rollover test
* small fixes
* linter
* copyin
2024-05-07 11:17:32 +03:00
George Hotz
17faae091b
optimizer shouldn't be run without training ( #4460 )
...
* optimizer shouldn't be run without training
* set training in relevant tests
* fix multitensor
* that too
2024-05-06 15:34:12 -07:00
qazal
35dfbc6354
rand_for_dtype helper ( #4459 )
2024-05-07 00:03:42 +03:00
nimlgen
a3140c9767
nv boost subdevice ( #4456 )
2024-05-06 23:05:20 +03:00
Francis Lam
47750e65fd
kernel: un-reverse the order of the local indices ( #4454 )
...
no change to performance or behavior. new LOCALS are added to the
left side of the LOCALS block (to the left of the first_reduce).
2024-05-06 15:21:27 -04:00
chenyu
5e036cd0b3
test unary and more reduces in test_flopcounter ( #4455 )
...
cannot really catch a spec change error without testing the new spec explicitly, but we don't intend to change the lazy spec lightly
another possible way to catch reduce flopcounter shape would be type checking InterpretedFlopCounter and throw error if `in` results in `Never`
2024-05-06 15:15:16 -04:00
nimlgen
d0b8862dea
fix out of resource kernels on nv ( #4450 )
...
* fix out of resource kernels on nv
* better comment
* noqa
* noqa 2
* linter
2024-05-06 19:24:20 +03:00
George Hotz
f4e49a7c1a
resnet 50 opt: correct loop + LARS ( #4449 )
...
* correct loop + LARS
* ops
2024-05-06 08:01:26 -07:00
chenyu
292ce64ad7
move acc_dt out of lazy ( #4382 )
...
move the logic to tensor.py for forward, and function.py for two places in backward (expand and max)
2024-05-06 07:41:25 -07:00
nimlgen
113c2f00b9
amd doorbell size is 64bits ( #4448 )
...
* amd doorbell size is 64bits
* add test
* test to pass 32bit boundary is more correct
* no need to round there
2024-05-06 16:59:59 +03:00
George Hotz
fc995d4446
add backward to handcode_resnet50_opt
2024-05-06 06:42:26 -07:00
qazal
6dbe5585b0
batchnorm + conv backward in test_schedule ( #4420 )
...
* test both optims
* batchnorm_backward
2024-05-06 16:40:17 +03:00
Timmy
3f3c973022
Multiple Reduce Kernels - kernel properly orders reduceops ( #4418 )
...
* enable kernel with multiple reduceops
* copy self.reduceops
* assert only one reduceop per kernel
* kernel.py dfs order
* linters
---------
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2024-05-06 13:54:44 +03:00
wozeparrot
603d3a351b
feat: allow keeping multiple cookies ( #4440 )
2024-05-05 19:26:48 -07:00
chenyu
afe020710d
disable PADTO on upcasted axis ( #4444 )
...
fixed test_failure_31. PADTO upcasted is at best a no-op, and might fail at edge cases.
2024-05-05 21:52:03 -04:00
Francis Lam
709410071c
mlperf/resnet: updated BEAM params to increase performance ( #4443 )
2024-05-05 21:49:46 -04:00
Francis Lam
c8595a9655
update sops.gz, fix tests and add new linearizer test ( #4437 )
...
* update sops.gz, fix tests and add new linearizer test
* remove METAL CI skip for test_failure_22
* re-add skip to METAL CI to test_failure_22
2024-05-05 17:31:25 -04:00
wozeparrot
9ad3d0520a
hotfix: npy is also ok ( #4439 )
2024-05-05 13:48:54 -07:00
chenyu
d0eb1540d5
helpers.diskcache_clear ( #4436 )
...
drop all tables in diskcache. added a unit test but disabled it by default because it will drop all cache...
2024-05-05 14:19:01 -04:00
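Note on "drop all tables": a minimal sketch of what clearing a table-based cache can look like, assuming the cache is stored in SQLite. The function `drop_all_tables` and the table names are hypothetical; this is not the `helpers.diskcache_clear` implementation. It runs against an in-memory database so that nothing on disk is touched.

```python
import sqlite3
from typing import List

def drop_all_tables(conn: sqlite3.Connection) -> List[str]:
    """Drop every user table in the database and return the dropped names."""
    cur = conn.cursor()
    # sqlite_master lists schema objects; internal sqlite_* tables are skipped
    # because SQLite does not allow dropping them.
    rows = cur.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
    ).fetchall()
    names = [row[0] for row in rows]
    for name in names:
        cur.execute(f'DROP TABLE IF EXISTS "{name}"')
    conn.commit()
    return names

# Usage on a throwaway in-memory database with two hypothetical cache tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cache_a (key TEXT PRIMARY KEY, val BLOB)")
conn.execute("CREATE TABLE cache_b (key TEXT PRIMARY KEY, val BLOB)")
dropped = drop_all_tables(conn)
remaining = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
```

Because the operation is destructive, the in-memory connection here mirrors the caution in the commit body, which keeps its unit test disabled by default.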
George Hotz
595a6e3069
test_fold_conv_relu_backward test
2024-05-05 11:13:43 -07:00
George Hotz
cc16f644d0
hotfix: remove FAKE buffer from graph
2024-05-05 10:52:41 -07:00
qazal
760776c59d
merge EfficientNet to C with clang job ( #4426 )
...
* merge ImageNet to C with linters
* add to clang
* delete from linter
2024-05-05 20:33:12 +03:00
chenyu
3b30756cbb
update mlperf submission system ( #4435 )
...
more required fields.
2024-05-05 13:19:07 -04:00
George Hotz
f95658bc3e
hotfix: pickle jit works if you delete the function
2024-05-05 10:14:03 -07:00
George Hotz
12be536c06
Clang graph ( #4424 )
...
* clang graph runner
* render_dtype
* name it ClangGraph
* JIT=2
* JIT=2 goes there
* JIT as context var
2024-05-05 09:54:12 -07:00
David Hou
544431c388
refactor: pass reduceop into global_load ( #4417 )
...
* pass reduceop directly to global_load
* typing
* make mypy happy :/
* cede a line to mypy :(
* fold in acc_const
* add todo
2024-05-05 19:43:48 +03:00
geohotstan
874dfc556c
update setitem tests to test for currently supported cases ( #4334 )
...
* tests, tests, tests
* one more test
* tests tests tests tests
* t e s t
* a few more
2024-05-05 11:59:13 -04:00
chenyu
fc9e58e482
Revert "refactor sparse_categorical_crossentropy ( #4406 )" ( #4429 )
...
This reverts commit c7368515d2.
2024-05-05 02:30:37 -04:00
David Hou
c0a048c044
batchnorm d(var)/d(mean) = 0 ( #4430 )
...
* d(var)/d(mean) = 0
* drop the number in test_schedule!
2024-05-05 00:25:45 -04:00
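Note on the identity in the title above: with var(m) = (1/N) * sum((x_i - m)^2), the derivative with respect to m is -(2/N) * sum(x_i - m), which vanishes when m is the mean of the x_i. The sketch below checks this numerically in plain Python; the function names are hypothetical and it is one reading of the commit title, not the code the commit changed.

```python
from typing import List

def var_about(xs: List[float], m: float) -> float:
    # Mean squared deviation of xs about an arbitrary point m.
    return sum((x - m) ** 2 for x in xs) / len(xs)

def dvar_dm(xs: List[float], m: float) -> float:
    # Analytic derivative of var_about with respect to m.
    return -2.0 * sum(x - m for x in xs) / len(xs)

xs = [1.0, 2.0, 4.0, 7.0]
mean = sum(xs) / len(xs)

grad_at_mean = dvar_dm(xs, mean)   # zero exactly at the mean
grad_elsewhere = dvar_dm(xs, 0.0)  # nonzero away from the mean

# Central finite difference as an independent check of the analytic result.
h = 1e-3
fd_at_mean = (var_about(xs, mean + h) - var_about(xs, mean - h)) / (2 * h)
```

Since this term is zero, a backward pass can skip the gradient path from the variance through the mean without changing the result.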
George Hotz
e2eab9c2b3
hotfix: disk is okay in child process
2024-05-04 18:18:31 +00:00