nimlgen
113c2f00b9
amd doorbell size is 64bits ( #4448 )
...
* amd doorbell size is 64bits
* add test
* test to pass 32bit boundary is more correct
* no need to round there
2024-05-06 16:59:59 +03:00
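Why a test that crosses the 32-bit boundary matters can be sketched with a toy model. This is not the driver code from the commit: `write_doorbell` and `read_doorbell` are hypothetical helpers, and the doorbell is modelled as an 8-byte little-endian slot in a `bytearray`.

```python
import struct

def write_doorbell(buf: bytearray, value: int, width_bytes: int) -> None:
    """Hypothetical helper: store `value` using a 4- or 8-byte little-endian write."""
    fmt = "<Q" if width_bytes == 8 else "<I"
    mask = (1 << (8 * width_bytes)) - 1
    # A narrower write silently drops the high bits of the value.
    struct.pack_into(fmt, buf, 0, value & mask)

def read_doorbell(buf: bytearray) -> int:
    """Hypothetical helper: read the slot back as a full 64-bit value."""
    return struct.unpack_from("<Q", buf, 0)[0]

# A write pointer that has just passed the 32-bit boundary.
ptr = (1 << 32) + 5

db32 = bytearray(8)
write_doorbell(db32, ptr, 4)   # 32-bit write: high word is lost

db64 = bytearray(8)
write_doorbell(db64, ptr, 8)   # 64-bit write: value preserved

truncated = read_doorbell(db32)
preserved = read_doorbell(db64)
```

In this model a 32-bit write only goes wrong once the value exceeds 2**32 - 1, which is why a test that stays below the boundary would not catch the problem.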
qazal
6dbe5585b0
batchnorm + conv backward in test_schedule ( #4420 )
...
* test both optims
* batchnorm_backward
2024-05-06 16:40:17 +03:00
chenyu
afe020710d
disable PADTO on upcasted axis ( #4444 )
...
fixed test_failure_31. PADTO upcasted is at best a no-op, and might fail at edge cases.
2024-05-05 21:52:03 -04:00
Francis Lam
c8595a9655
update sops.gz, fix tests and add new linearizer test ( #4437 )
...
* update sops.gz, fix tests and add new linearizer test
* remove METAL CI skip for test_failure_22
* re-add skip to METAL CI to test_failure_22
2024-05-05 17:31:25 -04:00
chenyu
d0eb1540d5
helpers.diskcache_clear ( #4436 )
...
drop all tables in diskcache. added a unit test but disabled it by default because it will drop all cache...
2024-05-05 14:19:01 -04:00
George Hotz
595a6e3069
test_fold_conv_relu_backward test
2024-05-05 11:13:43 -07:00
George Hotz
f95658bc3e
hotfix: pickle jit works if you delete the function
2024-05-05 10:14:03 -07:00
geohotstan
874dfc556c
update setitem tests to test for currently supported cases ( #4334 )
...
* tests, tests, tests
* one more test
* tests tests tests tests
* t e s t
* a few more
2024-05-05 11:59:13 -04:00
David Hou
c0a048c044
batchnorm d(var)/d(mean) = 0 ( #4430 )
...
* d(var)/d(mean) = 0
* drop the number in test_schedule!
2024-05-05 00:25:45 -04:00
George Hotz
cb7289f9c9
remove clang program header ( #4422 )
...
* remove clang program header
* proper max
* bools are numbers
* fix compile enet
2024-05-04 08:38:01 -07:00
qazal
267bbb57f9
Revert "Add insert_before to Linearizer Functions ( #4320 )" ( #4421 )
...
This reverts commit 664b563c91.
2024-05-04 17:50:21 +03:00
qazal
5f3bae378f
search children in fusion ( #4322 )
...
* scheduler diff
* tests diff
* new changes
* realizes
* chores
* assign
* kind of r3
* forced_realize wont do it
* with forced_realize
* start with children
* test search
* r3 with parents
* diff cleanup
* add children
* crossing assign
* late fuse descendants
* update kernel counts
* assign diff doesnt belong here
2024-05-04 17:22:15 +03:00
qazal
249cadd106
fusing crossing diamond assign ( #4403 )
...
* refactor scheduler parents search
* assign target
* unit test
* can't chase this
2024-05-04 15:19:48 +03:00
George Hotz
9fc4465557
subbuffer support ( #4397 )
...
* subbuffer support
* diskbuffer offset
* cuda subbuffer works
* use subbuffer
* more subbuffer tests
* consecutive
* cast
* consec
* offset
* view is a better name
* offset is in nbytes
* fix view + memory planner
* delete unused DiskRunner
* reverse order
* no subbuffers on unrealized consts
* only enabled for disk
* don't reverse memory
* view supported devices
* pickle buffer view
* ring jit
* support extra view inputs in jit
* fix JIT=2 issue
* test copy jit
* p2p isn't an option anymore
* fix dep tracking issue
* fix mypy
* fix pickle
* from_nv is contents now
2024-05-03 18:05:57 -07:00
qazal
3401734e54
infra for scheduler process replay ( #4405 )
...
* use getenv
* capture ast
* fix graph
* replay schedules
* exec
2024-05-03 20:29:13 +03:00
George-the-1st
0627e26140
Added missing unittest execution code ( #4400 )
...
same code as on every other test file, just missing from this one for some reason.
2024-05-02 22:34:30 -04:00
qazal
0deaaf2bc8
partial fusion spec ( #4398 )
2024-05-03 04:14:23 +03:00
Francis Lam
5c5b40880f
search: fix edge cases on screening potential ops ( #4394 )
...
* search: fix edge cases on screening potential ops
won't change correctness, but will save a little python time by
properly deduplicating potential actions
* check for de-duplication instead of exact valid actions
* refactor long line
2024-05-02 14:53:05 -04:00
George Hotz
2786dff26d
new disk tensor tests ( #4393 )
2024-05-02 08:54:44 -07:00
George Hotz
c8a2047377
testing for all reduce ( #4387 )
2024-05-02 06:34:10 -07:00
qazal
0b47818e0f
simpler reduceop children chasing ( #4350 )
...
* simplest case
* midreduce case
* all tests
* pending things
* unify tests
2024-05-02 15:15:30 +03:00
George Hotz
f635c4d273
fix define global ( #4383 )
...
* fix define global
* remove name from DEFINE_GLOBAL
* fix fuzzing
* fix ptx
* fix python
2024-05-01 22:32:56 -04:00
chenyu
826cccd54d
fix mean underflow for half tensor ( #4377 )
...
* fix mean underflow for half tensor
divide only the reduce factor. added unit test and non-nan assertion in resnet training. also added a failed test case for symbolic shape var
* skip for python backend
2024-05-01 13:38:57 -04:00
chenyu
077ea6926c
remove downcast_half in sum ( #4376 )
...
breaks boolean mean and other stuff
2024-05-01 11:46:44 -04:00
George Hotz
bd49d2854a
hotfix: skip fetch tests always
2024-05-01 08:43:26 -07:00
qazal
ea06f657df
fusion tests from test_opt ( #4357 )
...
* opt tests
* more sgd
* batchnorm
* models stay in external
2024-05-01 16:44:12 +03:00
George Hotz
27ee49bf30
tensor variable ( #4362 )
...
* tensor variable support
* consttype without variable?
* __setitem__
* symbolic mean works
* arange test
* more tests
* a few more tests
2024-04-30 14:08:57 -07:00
Francis Lam
0d33c54d99
kernel: change PADTO check to allow up to 4x padding ( #4354 )
...
* kernel: change PADTO check to allow up to 4x padding
also optionally remove PADTO from the search action space with
BEAM_PADTO=0.
* fix test_linearizer test_tensor_cores_padded tests
* update resnet runs to use SPLIT_REDUCEOP=1
* fix up search TC axis and amt checking
* fix up the dimensions of the TC tests
2024-04-30 15:29:34 -04:00
Elias Wahl
babe87a8ae
BERT: Checkpoint loading tests ( #4359 )
...
* Move checkpoint init to helpers. Add test
* linters
* Move the steps outside of the main train loop
* Move data_get
* data_get belongs to helpers
2024-04-30 14:43:41 -04:00
Francis Lam
c12bcabb07
search: fix actions space checks to ignore TC axis and amt ( #4360 )
...
* search: fix actions space checks to ignore TC axis and amt
* add test for number of actions in get_linearizer_actions
2024-04-30 14:02:22 -04:00
George Hotz
d325be2540
update docs ( #4356 )
...
* update docs
* nn.md
* mnist cleanups
* rhip test is very slow
2024-04-30 16:51:42 +09:00
Francis Lam
a9a1fa6bbf
wmma: add reduce axis choice to TC action space ( #4328 )
...
* wmma: add reduce axis choice to TC action space
* add test for TC multi-reduce axis choice
2024-04-29 19:15:39 -04:00
chenyu
93abcd3113
fix function.py sum backward without downcast_half ( #4353 )
...
without downcast_half, sum output dtype can be different from input dtype. cast back to input dtype in function.py
2024-04-29 17:53:02 -04:00
Francis Lam
18c61ce077
test/fuzz_linearizer: add --atol/rtol and change half distribution ( #4352 )
2024-04-29 15:53:59 -04:00
Elias Wahl
27613dd881
MLPerf BERT: Main training loop ( #4288 )
...
* BERT language modeling head + trunc normal initializers
* add train loop + helpers
* shuffle in dataloaders + slight changes in main loop
* beam change
* Minor changes
* random.shuffle
* HParam update
* Use deque for dataloader
* wandb bert project name
* half fixes
* BENCHMARK + remove epoch
* cast + print()
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-04-29 14:35:27 -04:00
qazal
cc1797673e
all fusion opportunities ( #4348 )
2024-04-29 19:32:23 +03:00
chenyu
f363f39e83
fix dtype of const folded sum ( #4349 )
...
const folded sum should return the same dtype as regular sum, which can be different from input dtype
2024-04-29 11:40:45 -04:00
qazal
774a9b0bca
override assign_target in fuzz_schedule ( #4342 )
...
* store assign_targets
* cleanup
* override target
2024-04-29 11:04:04 +03:00
Francis Lata
bb849a57d1
[MLPerf] UNet3D dataloader ( #4343 )
...
* add support for train/val datasets for kits19
* split dataset into train and val sets
* add tests for kits19 dataloader
* add MLPerf dataset tests to CI
* update unet3d model_eval script
* fix linting
* add nibabel
* fix how mock dataset gets created
* update ref implementation with permalink and no edits
* clean up test and update rand_flip implementation
* cleanups
2024-04-28 22:34:18 -04:00
chenyu
c1d8d425eb
fix mean of half tensor if sum is greater than half.max ( #4327 )
...
sum of half does acc in float32 already, add an arg to not downcast to half and use that in mean
2024-04-28 18:04:54 -04:00
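The overflow this commit describes can be reproduced with a minimal sketch using only the Python standard library. It is not the tinygrad implementation: `to_half` is a hypothetical helper that rounds through the `struct` module's half-precision format, and a Python float stands in for the float32 accumulator.

```python
import struct

def to_half(x: float) -> float:
    """Hypothetical helper: round x to IEEE half precision; overflow becomes inf."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses to pack finite values beyond the half range.
        return float("inf") if x > 0 else float("-inf")

# 1000 half values of 100.0: the sum (100000) exceeds half.max (65504).
values = [to_half(100.0)] * 1000

# The accumulation itself happens in a wider type.
acc = sum(values)

# Downcasting the sum to half before dividing overflows to inf.
mean_downcast_first = to_half(to_half(acc) / len(values))

# Dividing in the wide type and downcasting only the result stays finite.
mean_divide_first = to_half(acc / len(values))
```

The mean itself fits comfortably in half; only the intermediate sum does not, so keeping the sum in the wider dtype until after the division avoids the inf.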
qazal
23445db2b9
no skipped tests in RHIP ( #4337 )
...
* delete skip
* delete split skip
* remu dev
* compiler fails here
* Revert "remu dev"
This reverts commit 28b933d4eb.
2024-04-28 12:23:05 -04:00
Obada Khalili
e4befa41d7
Fix in _reshape_mask ( #4332 )
...
* handle reshape with remainder in _reshape_mask
* remove trailing whitespace
* use helper_test_op to generate tensors from shapes
* test in shapetracker too
* remove whitespace
* revert property name in other class tests
2024-04-28 11:57:39 -04:00
Timmy
664b563c91
Add insert_before to Linearizer Functions ( #4320 )
...
* adding insert_before to linearizer functions
* uop insert_before test case
* formatting
* more formatting
* more formatting
* syntax
* removing self.cast
* addressing err
* removing noqa s
2024-04-28 11:38:36 -04:00
qazal
3372bea322
reduce children fusion tests ( #4321 )
...
* base tests
* real-world tests
2024-04-28 11:14:02 -04:00
chenyu
24a6342950
add mem/s to external_benchmark_resnet ( #4309 )
2024-04-26 20:07:17 -04:00
Szymon Ożóg
de832d26c6
disable bfloat16 from ptx tests ( #4305 )
2024-04-26 01:20:10 -04:00
Szymon Ożóg
f1ebcffb87
Ptx beam fix ( #4296 )
...
* Fix beam search for PTX
* fix ptr arm test
2024-04-25 15:39:39 -04:00
qazal
9a47ed0705
test crossing diamond assigns ( #4298 )
2024-04-25 21:52:05 +03:00
chenyu
5ae252ae83
use at least float32 for optim.lr ( #4297 )
...
* use at least float32 for optim.lr
when doing mixed precision training (float32 weight, default_float=half), still use float32 to store lr.
it would have been upcasted later in actual weight update, but would have lost precision.
this improved resnet convergence significantly
* undo type annotation
2024-04-25 14:42:28 -04:00
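The precision loss mentioned in this commit can be illustrated with a small standard-library sketch. It assumes nothing about tinygrad's optimizer code: `round_to` is a hypothetical helper, and the learning rate 0.001 is an arbitrary example value.

```python
import struct

def round_to(fmt: str, x: float) -> float:
    """Hypothetical helper: round x through a struct format ('e' = half, 'f' = float32)."""
    return struct.unpack("<" + fmt, struct.pack("<" + fmt, x))[0]

lr = 0.001  # example learning rate, chosen for illustration only

lr_half = round_to("e", lr)      # lr stored as half
lr_float32 = round_to("f", lr)   # lr stored as float32

# Relative error introduced by each storage type.
err_half = abs(lr_half - lr) / lr
err_float32 = abs(lr_float32 - lr) / lr
```

Upcasting `lr_half` to float32 later does not recover the lost bits, which matches the commit's note that the value "would have lost precision" if stored as half first.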
David Hou
6f792b727b
More improvements for resnet layer bench ( #4272 )
...
* fix first layer size, new schedule stuff
* estimates
* get different conv layers
* \r for estimated times
* E501
* space after comma
2024-04-25 12:40:49 -04:00