chenyu
d4062cb6fc
NV tensor_cores in kernel.py ( #4399 )
2024-05-02 22:33:08 -04:00
qazal
0deaaf2bc8
partial fusion spec ( #4398 )
2024-05-03 04:14:23 +03:00
chenyu
2c3b7f8e70
pad resnet training data with training data mean ( #4369 )
...
update model_train resnet to pad training
2024-05-02 20:26:15 -04:00
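A minimal sketch of padding a short batch with a mean image, in NumPy. The helper name `pad_batch_with_mean`, the NHWC layout and the per-channel `mean` vector are assumptions for illustration, not the code in model_train. If the pipeline later subtracts that same mean, the padded samples normalize to zero.

```python
import numpy as np

def pad_batch_with_mean(batch: np.ndarray, batch_size: int, mean: np.ndarray) -> np.ndarray:
    """Hypothetical helper: pad a short final batch up to batch_size.

    batch: (n, H, W, C) with n <= batch_size; mean: (C,) per-channel training-data mean (assumed layout).
    """
    n = batch.shape[0]
    assert n <= batch_size, "batch is already larger than batch_size"
    # broadcast the per-channel mean to full images for the missing (batch_size - n) slots
    pad = np.broadcast_to(mean.astype(batch.dtype), (batch_size - n,) + batch.shape[1:])
    return np.concatenate([batch, pad], axis=0)

batch = np.zeros((3, 2, 2, 3), dtype=np.float32)
mean = np.array([1.0, 2.0, 3.0])
padded = pad_batch_with_mean(batch, 5, mean)
```

Padding with a fixed value rather than uninitialized memory keeps every step's input shape constant without feeding garbage into training.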
Francis Lam
3cf8291f2f
mlperf/resnet: update beam params to increase time and quality ( #4396 )
...
* mlperf/resnet: update beam params to increase time and quality
* revert upcast 8 in search space and add rocm setup function
* refactor to independent setup.sh script
2024-05-02 20:14:46 -04:00
nimlgen
ca6c8ae739
factor out resource access logic in multigraph base class ( #4385 )
...
* factor out resource access logic in multigraph base class
* hsa fixes
* clean
* linter
* linter 2
* not need this
2024-05-03 00:38:22 +03:00
chenyu
ab01a9433d
resnet eval 4n+3 if epoch < 33 ( #4391 )
...
the rules only require eval at epochs of the form 4n+k, and we can stop the clock as soon as eval hits the target. this can save 24 evals, or 12 minutes
2024-05-02 16:52:07 -04:00
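A minimal sketch of the eval schedule described above. The function name and its defaults are hypothetical, and epochs are assumed to be 1-indexed. Under that assumption, epochs 1 through 32 evaluate only at 3, 7, ..., 31, which skips 24 evals and is consistent with the count in the commit message (12 minutes over 24 evals is about 30 s per eval).

```python
def should_eval(epoch: int, fast_until: int = 33, period: int = 4, offset: int = 3) -> bool:
    """Hypothetical schedule; epochs are assumed to be 1-indexed.

    Before `fast_until`, evaluate only on epochs of the form period*n + offset.
    From `fast_until` on, evaluate every epoch so the clock can stop as soon as eval hits the target.
    """
    if epoch < fast_until:
        return epoch % period == offset
    return True

# number of evals skipped before epoch 33 compared with evaluating every epoch
skipped = sum(1 for e in range(1, 33) if not should_eval(e))
```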
Francis Lam
7c8401fc65
search: skip timing the unoptimized kernel ( #4395 )
...
* search: skip timing the unoptimized kernel
also ensure we return the unoptimized kernel if no opts are valid
and refactor debugging to a single BEAM_DEBUG variable
* stop early on fast kernels that can't improve enough
2024-05-02 16:48:49 -04:00
Francis Lam
5c5b40880f
search: fix edge cases on screening potential ops ( #4394 )
...
* search: fix edge cases on screening potential ops
won't change correctness, but will save a little python time by
properly deduplicating potential actions
* check for de-duplication instead of exact valid actions
* refactor long line
2024-05-02 14:53:05 -04:00
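A minimal sketch of deduplicating candidate actions before timing them. Here each candidate is paired with a hashable key summarizing the kernel it would produce; the helper name and the key format are hypothetical and are not the structures used in search.py.

```python
from typing import Hashable, Iterable, List, Tuple

def dedupe_actions(candidates: Iterable[Tuple[int, Hashable]]) -> List[Tuple[int, Hashable]]:
    """Hypothetical helper: keep the first action for each distinct resulting kernel.

    candidates: (action_id, result_key) pairs, where result_key is an assumed hashable
    summary of the kernel after applying the action (for example, its applied opts).
    """
    seen = set()
    kept = []
    for action_id, key in candidates:
        if key in seen:
            continue  # an earlier action already leads to this kernel; skip re-timing it
        seen.add(key)
        kept.append((action_id, key))
    return kept

actions = [(0, ("UPCAST", 0, 4)), (1, ("UPCAST", 0, 4)), (2, ("LOCAL", 1, 2))]
unique = dedupe_actions(actions)
```

Deduplication only removes redundant work, so the set of distinct kernels considered (and therefore correctness) is unchanged.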
George Hotz
89030b238a
add consecutive property to shapetracker
2024-05-02 10:41:28 -07:00
George Hotz
2786dff26d
new disk tensor tests ( #4393 )
2024-05-02 08:54:44 -07:00
chenyu
7492e5d3e7
resnet correct log name for red ( #4390 )
2024-05-02 10:58:55 -04:00
chenyu
bf31837e6d
resnet correct steps_in_val_epoch in logging ( #4389 )
...
also added a random seed from the system in scripts
2024-05-02 10:51:36 -04:00
George Hotz
c8a2047377
testing for all reduce ( #4387 )
2024-05-02 06:34:10 -07:00
ym555
3113785604
Llama 3 Models ( #4339 )
...
* Full Impl
* fix test
* Fix inference loop
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-05-02 06:06:07 -07:00
qazal
0b47818e0f
simpler reduceop children chasing ( #4350 )
...
* simplest case
* midreduce case
* all tests
* pending things
* unify tests
2024-05-02 15:15:30 +03:00
chenyu
22376e53b7
resnet mlperf logging ( #4361 )
...
* resnet mlperf logging
* cropping too much?
2024-05-02 00:00:04 -04:00
George Hotz
f635c4d273
fix define global ( #4383 )
...
* fix define global
* remove name from DEFINE_GLOBAL
* fix fuzzing
* fix ptx
* fix python
2024-05-01 22:32:56 -04:00
chenyu
ad116dc5c6
fill in mlperf system description ( #4381 )
...
it did not ask for many details. software versions will be added later with the tinygrad commit.
```
python3 -m mlperf_logging.system_desc_checker examples/mlperf/training_submission_v4.0/tinycorp/systems/tinybox_red.json training 4.0.0
INFO - System description checker passed for tinybox red
```
```
python3 -m mlperf_logging.system_desc_checker examples/mlperf/training_submission_v4.0/tinycorp/systems/tinybox_green.json training 4.0.0
INFO - System description checker passed for tinybox green
```
2024-05-01 16:47:45 -04:00
chenyu
9358b62073
rename resnet script to dev_beam.sh and dev_run.sh ( #4379 )
...
the final run_and_time needs to be one script for both, so rename the old scripts
2024-05-01 14:41:35 -04:00
chenyu
6628e13a5f
pad resnet eval data in model_train ( #4374 )
...
added an assertion that fails if the eval sample count differs from the total eval file count.
2024-05-01 14:33:42 -04:00
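A minimal sketch of padding eval data into full batches and checking the sample count afterwards. The helper names, the `None` padding marker and the file names are hypothetical; the point is that padded entries keep every batch the same shape but must not be counted.

```python
import math
from typing import List, Optional, Sequence

def pad_to_batches(files: Sequence[str], batch_size: int) -> List[List[Optional[str]]]:
    """Hypothetical helper: split files into full batches, padding the last one with None."""
    total = math.ceil(len(files) / batch_size) * batch_size
    padded: List[Optional[str]] = list(files) + [None] * (total - len(files))
    return [padded[i:i + batch_size] for i in range(0, total, batch_size)]

def count_real_samples(batches: List[List[Optional[str]]]) -> int:
    # padded entries are None and must not count toward eval accuracy
    return sum(1 for batch in batches for f in batch if f is not None)

files = [f"val_{i}.JPEG" for i in range(10)]
batches = pad_to_batches(files, batch_size=4)
assert count_real_samples(batches) == len(files), "eval sample count differs from eval file count"
```

With 10 files and a batch size of 4 this yields 3 batches, the last holding 2 padding entries.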
George Hotz
105fbd7925
add 3080 support to NV
2024-05-01 11:17:01 -07:00
chenyu
826cccd54d
fix mean underflow for half tensor ( #4377 )
...
* fix mean underflow for half tensor
divide only by the reduce factor. added a unit test and a non-nan assertion in resnet training. also added a failing test case for symbolic shape var
* skip for python backend
2024-05-01 13:38:57 -04:00
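A general NumPy illustration, not tinygrad's code, of one way a half-precision mean can underflow: if the scale 1/n is materialized in float16, it rounds to zero once n is large enough, while dividing the accumulated sum by n keeps the value. The reduce size of 2**26 is an arbitrary assumption.

```python
import numpy as np

n = 2**26                                  # hypothetical reduce size
acc = np.float32(n)                        # sum of n ones, accumulated in float32

recip = np.float16(1.0 / n)                # 2**-26 is below float16's smallest subnormal (2**-24), so it rounds to 0
mean_via_recip = np.float16(acc * recip)   # multiplying by the underflowed reciprocal gives 0
mean_via_div = np.float16(acc / n)         # dividing by the reduce factor gives 1.0

print(recip, mean_via_recip, mean_via_div)
```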
chenyu
dce7ac0160
NOCLANG=1 for tinybox green ci. ( #4378 )
...
CLANG was disabled for tinybox red for speed
2024-05-01 13:31:01 -04:00
George Hotz
272bea5100
GraphRunner ( #4375 )
...
* GraphRunner
* new metal graph
* update hsa for graph runner
* put var_vals back
* move that clear after the capture
2024-05-01 10:27:13 -07:00
chenyu
077ea6926c
remove downcast_half in sum ( #4376 )
...
breaks boolean mean and other stuff
2024-05-01 11:46:44 -04:00
George Hotz
bd49d2854a
hotfix: skip fetch tests always
2024-05-01 08:43:26 -07:00
George Hotz
b683d0f496
hotfix: 100% accuracy is wrong
2024-05-01 08:07:18 -07:00
George Hotz
8bcf533a84
gitignore open-images-v6TEST
2024-05-01 13:55:38 +00:00
qazal
ea06f657df
fusion tests from test_opt ( #4357 )
...
* opt tests
* more sgd
* batchnorm
* models stay in external
2024-05-01 16:44:12 +03:00
George Hotz
995d264666
hotfix: add CNAME to put docs at docs.tinygrad.org
2024-04-30 23:17:35 -07:00
chenyu
683b7c605a
pad first batch of imagenet dataloader and update eval ( #4368 )
...
* pad first batch of imagenet dataloader and update eval
* pad zero instead of empty for training
2024-05-01 00:21:52 -04:00
wozeparrot
4a26718ca9
feat: tinyboxgreen ( #4365 )
2024-04-30 19:05:37 -04:00
Francis Lam
16838eae08
mlperf/resnet: update tinybox_red parameters to new best values ( #4364 )
...
about 27 minutes to set up and 345ms/110TF steps
2024-04-30 18:08:12 -04:00
George Hotz
27ee49bf30
tensor variable ( #4362 )
...
* tensor variable support
* consttype without variable?
* __setitem__
* symbolic mean works
* arange test
* more tests
* a few more tests
2024-04-30 14:08:57 -07:00
nimlgen
d2f89615b2
remove aql remnants in amd ( #4346 )
2024-04-30 23:36:02 +03:00
Francis Lam
0d33c54d99
kernel: change PADTO check to allow up to 4x padding ( #4354 )
...
* kernel: change PADTO check to allow up to 4x padding
also optionally remove PADTO from the search action space with
BEAM_PADTO=0.
* fix test_linearizer test_tensor_cores_padded tests
* update resnet runs to use SPLIT_REDUCEOP=1
* fix up search TC axis and amt checking
* fix up the dimensions of the TC tests
2024-04-30 15:29:34 -04:00
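A minimal sketch of a "pad at most 4x" rule: padding an axis up to a multiple of `amt` is allowed only if the padded size is no more than four times the original. The function names and the exact inequality are illustrative assumptions, not the condition in kernel.py.

```python
def round_up(x: int, multiple: int) -> int:
    return (x + multiple - 1) // multiple * multiple

def padto_allowed(dim: int, amt: int, max_blowup: int = 4) -> bool:
    """Hypothetical check: allow padding `dim` to a multiple of `amt`
    only if the padded size is at most `max_blowup` times the original size."""
    return round_up(dim, amt) <= max_blowup * dim

# 3 -> 8 is under 4x, 2 -> 8 is exactly 4x, 1 -> 8 would be 8x
results = [padto_allowed(3, 8), padto_allowed(2, 8), padto_allowed(1, 8)]
```

Bounding the blowup keeps PADTO from wasting most of the compute on padding when an axis is tiny relative to `amt`.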
Elias Wahl
babe87a8ae
BERT: Checkpoint loading tests ( #4359 )
...
* Move checkpoint init to helpers. Add test
* linters
* Move the steps outside of the main train loop
* Move data_get
* data_get belongs to helpers
2024-04-30 14:43:41 -04:00
Francis Lam
c12bcabb07
search: fix actions space checks to ignore TC axis and amt ( #4360 )
...
* search: fix actions space checks to ignore TC axis and amt
* add test for number of actions in get_linearizer_actions
2024-04-30 14:02:22 -04:00
chenyu
fdc8fabae5
disable flaky mac gpt2 beam benchmark and add back cifar mac with JIT=2 ( #4358 )
...
* debug flaky mac gpt2 beam run
* disable for now
2024-04-30 10:41:37 -04:00
George Hotz
d325be2540
update docs ( #4356 )
...
* update docs
* nn.md
* mnist cleanups
* rhip test is very slow
2024-04-30 16:51:42 +09:00
Sohaib
a2d81514fd
just get dtype from kwargs ( #4355 )
2024-04-30 16:26:14 +09:00
Francis Lam
a9a1fa6bbf
wmma: add reduce axis choice to TC action space ( #4328 )
...
* wmma: add reduce axis choice to TC action space
* add test for TC multi-reduce axis choice
2024-04-29 19:15:39 -04:00
chenyu
93abcd3113
fix function.py sum backward without downcast_half ( #4353 )
...
without downcast_half, sum output dtype can be different from input dtype. cast back to input dtype in function.py
2024-04-29 17:53:02 -04:00
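A minimal NumPy sketch of the idea: when the forward sum accumulates in a wider dtype, the backward pass should broadcast the gradient back to the input shape and cast it to the input dtype. Accumulating float16 in float32 here is an assumption for illustration, not a statement about tinygrad's dtype rules.

```python
import numpy as np

def sum_forward(x: np.ndarray, axis: int) -> np.ndarray:
    # accumulate in a wider dtype, so the output dtype can differ from the input dtype
    return x.sum(axis=axis, dtype=np.float32)

def sum_backward(grad_out: np.ndarray, input_shape, input_dtype, axis: int) -> np.ndarray:
    # d(sum)/dx is 1 everywhere: re-insert the reduced axis, broadcast, then cast back to the input dtype
    g = np.expand_dims(grad_out, axis)
    return np.broadcast_to(g, input_shape).astype(input_dtype)

x = np.ones((2, 3), dtype=np.float16)
y = sum_forward(x, axis=1)
gx = sum_backward(np.ones_like(y), x.shape, x.dtype, axis=1)
```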
Francis Lam
18c61ce077
test/fuzz_linearizer: add --atol/rtol and change half distribution ( #4352 )
2024-04-29 15:53:59 -04:00
Elias Wahl
71ff68b445
dropout after eval step ( #4351 )
2024-04-29 15:47:21 -04:00
Elias Wahl
27613dd881
MLPerf BERT: Main training loop ( #4288 )
...
* BERT language modeling head + trunc normal initializers
* add train loop + helpers
* shuffle in dataloaders + slight changes in main loop
* beam change
* Minor changes
* random.shuffle
* HParam update
* Use deque for dataloader
* wandb bert project name
* half fixes
* BENCHMARK + remove epoch
* cast + print()
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-04-29 14:35:27 -04:00
Sohaib
61c97d5305
refactor ops_gpu ctypes ( #4331 )
...
* refactor ops_gpu ctypes
- remove redundant byref as ctypes automatically handles passing `type` as
`POINTER(type)`
- use walrus operator instead of init_c_var when possible
* clSetKernelArg argtype is POINTER(None)
2024-04-30 01:33:34 +08:00
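A small self-contained demonstration of the ctypes behavior the first bullet relies on: when an argument is declared as `POINTER(c_int)`, passing a `c_int` instance works without `byref`, because ctypes applies the conversion automatically. A Python callback stands in for a C function here; `set_to_42` is a hypothetical name.

```python
import ctypes

# prototype of a function taking an int* and returning void
PROTO = ctypes.CFUNCTYPE(None, ctypes.POINTER(ctypes.c_int))

def _set_to_42(p):
    p[0] = 42  # write through the pointer

set_to_42 = PROTO(_set_to_42)

x = ctypes.c_int(0)
set_to_42(x)  # no byref(x): ctypes converts the c_int to POINTER(c_int) because of the declared argtype
```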
qazal
cc1797673e
all fusion opportunities ( #4348 )
2024-04-29 19:32:23 +03:00
chenyu
f363f39e83
fix dtype of const folded sum ( #4349 )
...
const-folded sum should return the same dtype as a regular sum, which can differ from the input dtype
2024-04-29 11:40:45 -04:00
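A minimal sketch of the invariant: the regular and const-folded paths share one function for the output dtype, so folding `sum` over a constant-filled tensor cannot change the result dtype. The accumulation rule in `sum_dtype` (bool and small ints go to int32) is a hypothetical choice for illustration, not tinygrad's rule.

```python
import numpy as np

def sum_dtype(dt) -> np.dtype:
    # hypothetical accumulation-dtype rule shared by both code paths
    dt = np.dtype(dt)
    if dt == np.bool_ or (np.issubdtype(dt, np.integer) and dt.itemsize < 4):
        return np.dtype(np.int32)
    return dt

def regular_sum(x: np.ndarray):
    return x.sum(dtype=sum_dtype(x.dtype))

def const_folded_sum(value, shape, dt):
    # summing a tensor filled with `value` is value * numel, returned in the SAME dtype as regular_sum
    numel = int(np.prod(shape))
    return np.array(value * numel, dtype=sum_dtype(dt))[()]

x = np.full((4,), 100, dtype=np.int8)
folded = const_folded_sum(100, (4,), np.int8)
```

Sharing `sum_dtype` also means the folded path avoids overflowing the narrow input dtype (400 does not fit in int8).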
geohotstan
bf412aeb80
use tolist instead of numpy for extracting parameters in onnx ( #4333 )
...
* still some numpy left
* all pass
* oops indent
* fix up safe_python
* to_python_const
2024-04-29 10:48:20 -04:00