nimlgen
e3bb85fd0e
amd timeline semaphores ( #4416 )
...
* amd timeline semaphores
* v2
* fixes
* reset signals
* fix
* rollover test
* small fixes
* linter
* copyin
2024-05-07 11:17:32 +03:00
George Hotz
17faae091b
optimizer shouldn't be run without training ( #4460 )
...
* optimizer shouldn't be run without training
* set training in relevant tests
* fix multitensor
* that too
2024-05-06 15:34:12 -07:00
qazal
35dfbc6354
rand_for_dtype helper ( #4459 )
2024-05-07 00:03:42 +03:00
nimlgen
a3140c9767
nv boost subdevice ( #4456 )
2024-05-06 23:05:20 +03:00
Francis Lam
47750e65fd
kernel: un-reverse the order of the local indices ( #4454 )
...
no change to performance or behavior. new LOCALS are added to the
left side of the LOCALS block (to the left of the first_reduce).
2024-05-06 15:21:27 -04:00
chenyu
5e036cd0b3
test unary and more reduces in test_flopcounter ( #4455 )
...
cannot really catch a spec change error without testing the new spec explicitly, but we don't intend to change the lazy spec lightly
another possible way to catch a reduce flopcounter shape error would be type checking InterpretedFlopCounter and throwing an error if `in` results in `Never`
2024-05-06 15:15:16 -04:00
nimlgen
d0b8862dea
fix out of resource kernels on nv ( #4450 )
...
* fix out of resource kernels on nv
* better comment
* noqa
* noqa 2
* linter
2024-05-06 19:24:20 +03:00
George Hotz
f4e49a7c1a
resnet 50 opt: correct loop + LARS ( #4449 )
...
* correct loop + LARS
* ops
2024-05-06 08:01:26 -07:00
chenyu
292ce64ad7
move acc_dt out of lazy ( #4382 )
...
move the logic to tensor.py for forward, and function.py for two places in backward (expand and max)
2024-05-06 07:41:25 -07:00
nimlgen
113c2f00b9
amd doorbell size is 64bits ( #4448 )
...
* amd doorbell size is 64bits
* add test
* test to pass 32bit boundary is more correct
* no need to round there
2024-05-06 16:59:59 +03:00
George Hotz
fc995d4446
add backward to handcode_resnet50_opt
2024-05-06 06:42:26 -07:00
qazal
6dbe5585b0
batchnorm + conv backward in test_schedule ( #4420 )
...
* test both optims
* batchnorm_backward
2024-05-06 16:40:17 +03:00
Timmy
3f3c973022
Multiple Reduce Kernels - kernel properly orders reduceops ( #4418 )
...
* enable kernel with multiple reduceops
* copy self.reduceops
* assert only one reduceop per kernel
* kernel.py dfs order
* linters
---------
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2024-05-06 13:54:44 +03:00
wozeparrot
603d3a351b
feat: allow keeping multiple cookies ( #4440 )
2024-05-05 19:26:48 -07:00
chenyu
afe020710d
disable PADTO on upcasted axis ( #4444 )
...
fixed test_failure_31. PADTO upcasted is at best a no-op, and might fail at edge cases.
2024-05-05 21:52:03 -04:00
Francis Lam
709410071c
mlperf/resnet: updated BEAM params to increase performance ( #4443 )
2024-05-05 21:49:46 -04:00
Francis Lam
c8595a9655
update sops.gz, fix tests and add new linearizer test ( #4437 )
...
* update sops.gz, fix tests and add new linearizer test
* remove METAL CI skip for test_failure_22
* re-add skip to METAL CI to test_failure_22
2024-05-05 17:31:25 -04:00
wozeparrot
9ad3d0520a
hotfix: npy is also ok ( #4439 )
2024-05-05 13:48:54 -07:00
chenyu
d0eb1540d5
helpers.diskcache_clear ( #4436 )
...
drop all tables in diskcache. added a unit test but disabled it by default because it will drop all cache...
2024-05-05 14:19:01 -04:00
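A minimal sketch of what a drop-all-tables helper like `helpers.diskcache_clear` could look like (the connection handling and table-listing query here are assumptions for illustration, not tinygrad's actual implementation):

```python
import sqlite3

def diskcache_clear(db_path: str) -> None:
  # enumerate every user table in the sqlite cache database and drop it
  conn = sqlite3.connect(db_path)
  cur = conn.cursor()
  tables = cur.execute("SELECT name FROM sqlite_master WHERE type='table'").fetchall()
  for (name,) in tables:
    cur.execute(f"DROP TABLE IF EXISTS {name}")
  conn.commit()
  conn.close()
```

This is also why the unit test mentioned above is disabled by default: running it against a real cache file destroys every cached entry.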
George Hotz
595a6e3069
test_fold_conv_relu_backward test
2024-05-05 11:13:43 -07:00
George Hotz
cc16f644d0
hotfix: remove FAKE buffer from graph
2024-05-05 10:52:41 -07:00
qazal
760776c59d
merge EfficientNet to C with clang job ( #4426 )
...
* merge ImageNet to C with linters
* add to clang
* delete from linter
2024-05-05 20:33:12 +03:00
chenyu
3b30756cbb
update mlperf submission system ( #4435 )
...
more required fields.
2024-05-05 13:19:07 -04:00
George Hotz
f95658bc3e
hotfix: pickle jit works if you delete the function
2024-05-05 10:14:03 -07:00
George Hotz
12be536c06
Clang graph ( #4424 )
...
* clang graph runner
* render_dtype
* name it ClangGraph
* JIT=2
* JIT=2 goes there
* JIT as context var
2024-05-05 09:54:12 -07:00
David Hou
544431c388
refactor: pass reduceop into global_load ( #4417 )
...
* pass reduceop directly to global_load
* typing
* make mypy happy :/
* cede a line to mypy :(
* fold in acc_const
* add todo
2024-05-05 19:43:48 +03:00
geohotstan
874dfc556c
update setitem tests to test for currently supported cases ( #4334 )
...
* tests, tests, tests
* one more test
* tests tests tests tests
* t e s t
* a few more
2024-05-05 11:59:13 -04:00
chenyu
fc9e58e482
Revert "refactor sparse_categorical_crossentropy ( #4406 )" ( #4429 )
...
This reverts commit c7368515d2.
2024-05-05 02:30:37 -04:00
David Hou
c0a048c044
batchnorm d(var)/d(mean) = 0 ( #4430 )
...
* d(var)/d(mean) = 0
* drop the number in test_schedule!
2024-05-05 00:25:45 -04:00
George Hotz
e2eab9c2b3
hotfix: disk is okay in child process
2024-05-04 18:18:31 +00:00
George Hotz
cf33afa778
don't open devices from children ( #4425 )
...
* don't open devices from children
* correct way to do this
* fix Device.DEFAULT and add back JITBEAM
2024-05-04 10:35:40 -07:00
qazal
fa17dcaf07
Fix llm.c/export.py ( #4423 )
...
* fix headers
* add CI
* add stdio
* merge clang tests
* revert llm.c
* revert ci
* Revert "revert llm.c"
This reverts commit 5fd17e3c8b.
2024-05-04 19:37:10 +03:00
George Hotz
cb7289f9c9
remove clang program header ( #4422 )
...
* remove clang program header
* proper max
* bools are numbers
* fix compile enet
2024-05-04 08:38:01 -07:00
qazal
267bbb57f9
Revert "Add insert_before to Linearizer Functions ( #4320 )" ( #4421 )
...
This reverts commit 664b563c91.
2024-05-04 17:50:21 +03:00
qazal
5f3bae378f
search children in fusion ( #4322 )
...
* scheduler diff
* tests diff
* new changes
* realizes
* chores
* assign
* kind of r3
* forced_realize wont do it
* with forced_realize
* start with children
* test search
* r3 with parents
* diff cleanup
* add children
* crossing assign
* late fuse descendants
* update kernel counts
* assign diff doesnt belong here
2024-05-04 17:22:15 +03:00
qazal
249cadd106
fusing crossing diamond assign ( #4403 )
...
* refactor scheduler parents search
* assign target
* unit test
* can't chase this
2024-05-04 15:19:48 +03:00
George Hotz
9fc4465557
subbuffer support ( #4397 )
...
* subbuffer support
* diskbuffer offset
* cuda subbuffer works
* use subbuffer
* more subbuffer tests
* consecutive
* cast
* consec
* offset
* view is a better name
* offset is in nbytes
* fix view + memory planner
* delete unused DiskRunner
* reverse order
* no subbuffers on unrealized consts
* only enabled for disk
* don't reverse memory
* view supported devices
* pickle buffer view
* ring jit
* support extra view inputs in jit
* fix JIT=2 issue
* test copy jit
* p2p isn't an option anymore
* fix dep tracking issue
* fix mypy
* fix pickle
* from_nv is contents now
2024-05-03 18:05:57 -07:00
chenyu
c7368515d2
refactor sparse_categorical_crossentropy ( #4406 )
...
factor out the -1 * and / loss_mask.sum() for both smoothing and non-smoothing terms
2024-05-03 14:28:36 -04:00
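The factoring in the entry above can be illustrated in plain Python (the names `ce`, `uniform`, and `loss_mask` are hypothetical stand-ins, not tinygrad's actual variables): instead of applying the `-1 *` and the `/ loss_mask.sum()` to the smoothing and non-smoothing terms separately, combine the terms first and scale once.

```python
import numpy as np

def loss_unfactored(ce, uniform, loss_mask, smoothing=0.1):
  # -1 * and / loss_mask.sum() applied to each term separately
  return (-1 * (1 - smoothing) * ce / loss_mask.sum()) + \
         (-1 * smoothing * uniform / loss_mask.sum())

def loss_factored(ce, uniform, loss_mask, smoothing=0.1):
  # combine the terms, then apply -1 * and / loss_mask.sum() once
  return -1 * ((1 - smoothing) * ce + smoothing * uniform) / loss_mask.sum()
```

Both forms are algebraically identical; the factored one just emits the scale and negation a single time.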
qazal
3401734e54
infra for scheduler process replay ( #4405 )
...
* use getenv
* capture ast
* fix graph
* replay schedules
* exec
2024-05-03 20:29:13 +03:00
chenyu
473ecb978a
remove SPLIT_REDUCEOP=1 from resnet scripts ( #4404 )
...
SPLIT_REDUCEOP=1 is default
2024-05-03 12:36:23 -04:00
David Hou
b767d59684
resnet trainer: keep old cookie around until next step has been queued ( #4401 )
...
* keep old cookie around until next step has been queued (-10ms 6gpu)
* also for eval
* drop cookie before data_get?
* Revert "drop cookie before data_get?"
This reverts commit b01e6aa2b2.
* Revert "Revert "drop cookie before data_get?""
This reverts commit 23464e73d4.
2024-05-03 12:15:21 -04:00
qazal
cf3ccb809f
refactor scheduler parents search ( #4402 )
2024-05-03 17:16:34 +03:00
George-the-1st
0627e26140
Added missing unittest execution code ( #4400 )
...
same code as on every other test file, just missing from this one for some reason.
2024-05-02 22:34:30 -04:00
chenyu
d4062cb6fc
NV tensor_cores in kernel.py ( #4399 )
2024-05-02 22:33:08 -04:00
qazal
0deaaf2bc8
partial fusion spec ( #4398 )
2024-05-03 04:14:23 +03:00
chenyu
2c3b7f8e70
pad resnet training data with training data mean ( #4369 )
...
update model_train resnet to pad training
2024-05-02 20:26:15 -04:00
Francis Lam
3cf8291f2f
mlperf/resnet: update beam params to increase time and quality ( #4396 )
...
* mlperf/resnet: update beam params to increase time and quality
* revert upcast 8 in search space and add rocm setup function
* refactor to independent setup.sh script
2024-05-02 20:14:46 -04:00
nimlgen
ca6c8ae739
factor out resource access logic in multigraph base class ( #4385 )
...
* factor out resource access logic in multigraph base class
* hsa fixes
* clean
* linter
* linter 2
* not need this
2024-05-03 00:38:22 +03:00
chenyu
ab01a9433d
resnet eval 4n+3 if epoch < 33 ( #4391 )
...
the rule only requires evaluating as thoroughly as 4n+k, and we can stop the clock as soon as eval hits target. this can save 24 evals or 12 minutes
2024-05-02 16:52:07 -04:00
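One reading of the schedule change above, as a sketch (the helper name and the every-epoch behavior from epoch 33 on are assumptions based on the commit title, not the actual training-loop code): before epoch 33, only evaluate at epochs of the form 4n+3; afterwards, evaluate every epoch so the clock can stop as soon as the target is hit.

```python
def should_eval(epoch: int) -> bool:
  # hypothetical sketch: sparse 4n+3 evals early, every-epoch evals late
  if epoch < 33:
    return epoch % 4 == 3
  return True
```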
Francis Lam
7c8401fc65
search: skip timing the unoptimized kernel ( #4395 )
...
* search: skip timing the unoptimized kernel
also ensure we return the unoptimized kernel if no opts are valid
and refactor debugging to a single BEAM_DEBUG variable
* stop early on fast kernels that can't improve enough
2024-05-02 16:48:49 -04:00
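The two search behaviors in the last entry can be sketched like this (all names here are illustrative, not tinygrad's actual search API): the unoptimized kernel starts as the current best with its known baseline time rather than being re-timed, so if no candidate opt is valid or faster, it is what gets returned.

```python
def pick_best(unopt, baseline_time, candidates, time_fn):
  # the unoptimized kernel is never re-timed; it seeds the search as current best
  best, best_time = unopt, baseline_time
  for cand in candidates:
    t = time_fn(cand)  # returns None if the candidate opt is invalid
    if t is not None and t < best_time:
      best, best_time = cand, t
  # falls back to the unoptimized kernel when nothing beat the baseline
  return best, best_time
```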