George Hotz
272bea5100
GraphRunner ( #4375 )
...
* GraphRunner
* new metal graph
* update hsa for graph runner
* put var_vals back
* move that clear after the capture
2024-05-01 10:27:13 -07:00
chenyu
077ea6926c
remove downcast_half in sum ( #4376 )
...
breaks boolean mean and other stuff
2024-05-01 11:46:44 -04:00
George Hotz
bd49d2854a
hotfix: skip fetch tests always
2024-05-01 08:43:26 -07:00
George Hotz
b683d0f496
hotfix: 100% accuracy is wrong
2024-05-01 08:07:18 -07:00
George Hotz
8bcf533a84
gitignore open-images-v6TEST
2024-05-01 13:55:38 +00:00
qazal
ea06f657df
fusion tests from test_opt ( #4357 )
...
* opt tests
* more sgd
* batchnorm
* models stay in external
2024-05-01 16:44:12 +03:00
George Hotz
995d264666
hotfix: add CNAME to put docs at docs.tinygrad.org
2024-04-30 23:17:35 -07:00
chenyu
683b7c605a
pad first batch of imagenet dataloader and update eval ( #4368 )
...
* pad first batch of imagenet dataloader and update eval
* pad zero instead of empty for training
2024-05-01 00:21:52 -04:00
wozeparrot
4a26718ca9
feat: tinyboxgreen ( #4365 )
2024-04-30 19:05:37 -04:00
Francis Lam
16838eae08
mlperf/resnet: update tinybox_red parameters to new best values ( #4364 )
...
about 27 minutes to setup and 345ms/110TF steps
2024-04-30 18:08:12 -04:00
George Hotz
27ee49bf30
tensor variable ( #4362 )
...
* tensor variable support
* consttype without variable?
* __setitem__
* symbolic mean works
* arange test
* more tests
* a few more tests
2024-04-30 14:08:57 -07:00
nimlgen
d2f89615b2
remove aql remnants in amd ( #4346 )
2024-04-30 23:36:02 +03:00
Francis Lam
0d33c54d99
kernel: change PADTO check to allow up to 4x padding ( #4354 )
...
* kernel: change PADTO check to allow up to 4x padding
also optionally remove PADTO from the search action space with
BEAM_PADTO=0.
* fix test_linearizer test_tensor_cores_padded tests
* update resnet runs to use SPLIT_REDUCEOP=1
* fix up search TC axis and amt checking
* fix up the dimensions of the TC tests
2024-04-30 15:29:34 -04:00
Elias Wahl
babe87a8ae
BERT: Checkpoint loading tests ( #4359 )
...
* Move checkpoint init to helpers. Add test
* linters
* Move the steps outside of the main train loop
* Move data_get
* data_get belongs to helpers
2024-04-30 14:43:41 -04:00
Francis Lam
c12bcabb07
search: fix actions space checks to ignore TC axis and amt ( #4360 )
...
* search: fix actions space checks to ignore TC axis and amt
* add test for number of actions in get_linearizer_actions
2024-04-30 14:02:22 -04:00
chenyu
fdc8fabae5
disable flaky mac gpt2 beam benchmark and add back cifar mac with JIT=2 ( #4358 )
...
* debug flaky mac gpt2 beam run
* disable for now
2024-04-30 10:41:37 -04:00
George Hotz
d325be2540
update docs ( #4356 )
...
* update docs
* nn.md
* mnist cleanups
* rhip test is very slow
2024-04-30 16:51:42 +09:00
Sohaib
a2d81514fd
just get dtype from kwargs ( #4355 )
2024-04-30 16:26:14 +09:00
Francis Lam
a9a1fa6bbf
wmma: add reduce axis choice to TC action space ( #4328 )
...
* wmma: add reduce axis choice to TC action space
* add test for TC multi-reduce axis choice
2024-04-29 19:15:39 -04:00
chenyu
93abcd3113
fix function.py sum backward without downcast_half ( #4353 )
...
without downcast_half, sum output dtype can be different from input dtype. cast back to input dtype in function.py
2024-04-29 17:53:02 -04:00
Francis Lam
18c61ce077
test/fuzz_linearizer: add --atol/rtol and change half distribution ( #4352 )
2024-04-29 15:53:59 -04:00
Elias Wahl
71ff68b445
dropout after eval step ( #4351 )
2024-04-29 15:47:21 -04:00
Elias Wahl
27613dd881
MLPerf BERT: Main training loop ( #4288 )
...
* BERT language modeling head + trunc normal initializers
* add train loop + helpers
* shuffle in dataloaders + slight changes in main loop
* beam change
* Minor changes
* random.shuffle
* HParam update
* Use deque for dataloader
* wandb bert project name
* half fixes
* BENCHMARK + remove epoch
* cast + print()
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-04-29 14:35:27 -04:00
Sohaib
61c97d5305
refactor ops_gpu ctypes ( #4331 )
...
* refactor ops_gpu ctypes
- remove redundant byref as ctypes automatically handles passing `type` as
`POINTER(type)`
- use walrus operator instead of init_c_var when possible
* clSetKernelArg argtype is POINTER(None)
2024-04-30 01:33:34 +08:00
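The two ctypes points in the commit above can be illustrated with a small sketch. It does not touch OpenCL: as an assumption made purely for illustration, CPython's own `PyLong_AsLongAndOverflow` stands in for a C function that takes an out-pointer, so the snippet runs on CPython 3.8+ without extra libraries. It is not the code from ops_gpu.

```python
import ctypes

# Stand-in C function with an out-parameter (assumption: CPython, where
# ctypes.pythonapi exposes the interpreter's C API).
# C signature: long PyLong_AsLongAndOverflow(PyObject *obj, int *overflow)
as_long = ctypes.pythonapi.PyLong_AsLongAndOverflow
as_long.argtypes = [ctypes.py_object, ctypes.POINTER(ctypes.c_int)]
as_long.restype = ctypes.c_long

# Explicit style: create the out-variable first, then pass it with byref.
overflow_a = ctypes.c_int()
value_a = as_long(5, ctypes.byref(overflow_a))

# Shorter style: pass the c_int instance directly. Because argtypes declares
# POINTER(c_int), ctypes applies the byref conversion itself, and the walrus
# operator keeps a name bound to the out-variable for later inspection.
value_b = as_long(2**100, overflow_b := ctypes.c_int())

print(value_a, overflow_a.value)
print(value_b, overflow_b.value)
```

The automatic conversion only applies when `argtypes` is set for the function; without it, the explicit `byref` call is still needed.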
qazal
cc1797673e
all fusion opportunities ( #4348 )
2024-04-29 19:32:23 +03:00
chenyu
f363f39e83
fix dtype of const folded sum ( #4349 )
...
const folded sum should return the same dtype as regular sum, which can be different from the input dtype
2024-04-29 11:40:45 -04:00
geohotstan
bf412aeb80
use tolist instead of numpy for extracting parameters in onnx ( #4333 )
...
* still some numpy left
* all pass
* oops indent
* fix up safe_python
* to_python_const
2024-04-29 10:48:20 -04:00
qazal
774a9b0bca
override assign_target in fuzz_schedule ( #4342 )
...
* store assign_targets
* cleanup
* override target
2024-04-29 11:04:04 +03:00
Francis Lata
bb849a57d1
[MLPerf] UNet3D dataloader ( #4343 )
...
* add support for train/val datasets for kits19
* split dataset into train and val sets
* add tests for kits19 dataloader
* add MLPerf dataset tests to CI
* update unet3d model_eval script
* fix linting
* add nibabel
* fix how mock dataset gets created
* update ref implementation with permalink and no edits
* clean up test and update rand_flip implementation
* cleanups
2024-04-28 22:34:18 -04:00
chenyu
82d0ed3cf3
cap default dataset wikipedia max_workers to 32 ( #4345 )
...
64 on tinybox OOM
2024-04-28 21:55:21 -04:00
chenyu
c1d8d425eb
fix mean of half tensor if sum is greater than half.max ( #4327 )
...
sum of half does acc in float32 already, add an arg to not downcast to half and use that in mean
2024-04-28 18:04:54 -04:00
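A minimal sketch of the overflow this commit describes, using only the standard library. The helper names (`to_half`, `mean_with_downcast`, `mean_without_downcast`) are hypothetical and are not tinygrad functions; Python floats stand in for the float32 accumulator, and `struct`'s `e` format provides IEEE half rounding.

```python
import math
import struct

def to_half(x: float) -> float:
    """Round x to the nearest IEEE half value, overflowing to +/-inf."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses values beyond the half range (max 65504); model that as inf
        return math.copysign(math.inf, x)

def mean_with_downcast(values):
    """Downcast the sum to half before dividing (the failing order)."""
    total = to_half(sum(values))
    return to_half(total / len(values))

def mean_without_downcast(values):
    """Keep the wide accumulator until after the division."""
    return to_half(sum(values) / len(values))

values = [to_half(100.0)] * 1000      # the sum is 100000, above the half maximum
print(mean_with_downcast(values))     # the intermediate sum already overflowed
print(mean_without_downcast(values))  # the mean itself fits comfortably in half
```

Only the final result is rounded to half in the second version, which is the ordering the commit message describes for mean.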
qazal
e027879475
hotfix: remove double assignment ( #4340 )
2024-04-28 13:41:31 -04:00
qazal
23445db2b9
no skipped tests in RHIP ( #4337 )
...
* delete skip
* delete split skip
* remu dev
* compiler fails here
* Revert "remu dev"
This reverts commit 28b933d4eb.
2024-04-28 12:23:05 -04:00
Obada Khalili
e4befa41d7
Fix in _reshape_mask ( #4332 )
...
* handle reshape with remainder in _reshape_mask
* remove trailing whitespace
* use helper_test_op to generate tensors from shapes
* test in shapetracker too
* remove whitespace
* revert property name in other class tests
2024-04-28 11:57:39 -04:00
Timmy
664b563c91
Add insert_before to Linearizer Functions ( #4320 )
...
* adding insert_before to linearizer functions
* uop insert_before test case
* formatting
* more formatting
* more formatting
* syntax
* removing self.cast
* addressing err
* removing noqa s
2024-04-28 11:38:36 -04:00
qazal
3372bea322
reduce children fusion tests ( #4321 )
...
* base tests
* real-world tests
2024-04-28 11:14:02 -04:00
Arnav Mehta
f3de17912f
added the download if not present missing function ( #4318 )
2024-04-28 16:31:08 +08:00
geohotstan
bc36940c28
fix ( #4319 )
2024-04-28 16:29:04 +08:00
nimlgen
8d1649d8c2
raise error when too many resources requested in nv ( #4324 )
2024-04-27 23:48:51 +03:00
qazal
c6c12ba94a
save schedule graph pre validation ( #4317 )
2024-04-27 12:06:15 +03:00
Victor Ziliang Peng
40264c7d1e
Update index.md ( #4315 )
2024-04-27 15:12:44 +08:00
chenyu
24a6342950
add mem/s to external_benchmark_resnet ( #4309 )
2024-04-26 20:07:17 -04:00
Francis Lam
1f2642c73b
kernel: fix calculation of smem size to ignore UNROLL ( #4308 )
...
* kernel: fix calculation of smem size to ignore UNROLL
* simplify prod array
2024-04-26 14:34:56 -04:00
Szymon Ożóg
de832d26c6
disable bfloat16 from ptx tests ( #4305 )
2024-04-26 01:20:10 -04:00
chenyu
ec65aea32f
resnet stop the script once hit target ( #4303 )
...
* resnet stop the script once hit target
* comment
2024-04-25 23:54:56 -04:00
chenyu
1891ebb655
make ring allreduce chunks a multiple of 2^n if possible ( #4302 )
...
in resnet, instead of chunking as [43691, 43691, 43691, 43691, 43690, 43690], chunk as [43712, 43712, 43680, 43680, 43680, 43680] and those can have 32 local.
more than 2X faster for the applicable kernels and 1% faster overall for resnet
2024-04-25 23:45:28 -04:00
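The chunk sizes quoted in this commit can be reproduced with a short sketch. The function name `chunk_sizes` and the fixed alignment of 32 are assumptions chosen to match the numbers in the message; the actual change may pick the power of two differently.

```python
def chunk_sizes(total: int, n_chunks: int, align: int = 32) -> list[int]:
    """Split `total` elements into `n_chunks` near-equal chunks.

    When `total` is divisible by `align`, every chunk is a multiple of
    `align`; otherwise fall back to plain near-equal chunking.
    """
    if total % align != 0:
        align = 1
    units = total // align                 # number of align-sized blocks
    base, rem = divmod(units, n_chunks)    # the first `rem` chunks get one extra block
    return [(base + 1) * align] * rem + [base * align] * (n_chunks - rem)

total = 4 * 43691 + 2 * 43690              # 262144 elements, as in the message
print(chunk_sizes(total, 6, align=1))      # plain chunking
print(chunk_sizes(total, 6, align=32))     # every chunk divisible by 32
```

With every chunk divisible by 32, a kernel over a chunk can use a local size of 32, which is the property the commit message credits for the speedup.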
George Hotz
1e37c4a7a1
minor llm.c improvements
2024-04-26 11:15:31 +08:00
chenyu
3ec4b745d6
JIT=2 for mac cifar benchmark ( #4300 )
...
also double BS for resnet training benchmark to match submission target
2024-04-25 18:33:40 -04:00
David Hou
c2dbe2a78b
new split reduce heuristic try 2 ( #4294 )
...
* new split reduce heuristic
* update comment
* rename
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-04-25 18:14:15 -04:00
Szymon Ożóg
f1ebcffb87
Ptx beam fix ( #4296 )
...
* Fix beam search for PTX
* fix ptr arm test
2024-04-25 15:39:39 -04:00