Francis Lata
95cdbbf237
add jit to the training loop
2025-01-22 12:31:29 -08:00
Francis Lata
efe64ebeaf
enable lr scheduler and fix benchmark timing
2025-01-22 09:56:38 -08:00
Francis Lata
66ff6cb37a
create the necessary samples per test case
2025-01-21 14:14:58 -08:00
Francis Lata
9b95d6d62c
setup openimages samples differently
2025-01-21 14:05:19 -08:00
Francis Lata
d1bc4aef94
do not realize when sharding model weights
2025-01-21 13:45:35 -08:00
Francis Lata
7f331d8836
fix dataloader script
2025-01-21 13:43:59 -08:00
Francis Lata
1bf5ee286b
Revert "debug dataset test failure"
...
This reverts commit 1b2f9d7f50.
2025-01-21 13:30:12 -08:00
Francis Lata
1b2f9d7f50
debug dataset test failure
2025-01-21 13:23:50 -08:00
Francis Lata
7815d3ddff
Merge branch 'master' into retinanet_mlperf
2025-01-21 13:06:04 -08:00
nimlgen
c5e46c5eee
am: recover from any boot interrupt ( #8703 )
...
* am: recover from any load interrupt
* add fuzzer
* nu
2025-01-21 22:22:23 +03:00
chenyu
1e283c33d3
remove realize in bert model init [pr] ( #8707 )
2025-01-21 14:11:03 -05:00
George Hotz
018edd934b
don't use view in copy [pr] ( #8704 )
...
* don't use view in copy [pr]
* oh, remove double contig
* fix reps
2025-01-21 09:57:47 -08:00
qazal
d6bf1feaab
remove the "no copy" line from copy_to_device ( #8702 )
...
* delete the no copy one
* add tests
2025-01-21 17:09:33 +02:00
nimlgen
3628f89929
fix deallocate for subbuffers ( #8701 )
...
* fix deallocate for subbuffers
* forgot this
* rm name
* hmm
2025-01-21 16:34:19 +03:00
nimlgen
6733a3a96b
am: fix typo ( #8700 )
2025-01-21 14:35:15 +03:00
qazal
f0d424ecdf
Tensor UOps can become a buffer or const after scheduling ( #8698 )
...
* spec
* work
* update test_viewed_consts_do_not_realize
* remove
2025-01-21 12:33:19 +02:00
qazal
e2008c98c3
allow symbolic shape in tensor const parents [pr] ( #8699 )
2025-01-21 12:01:25 +02:00
nimlgen
2b239db5d2
temp() with usernames ( #8697 )
2025-01-21 12:26:43 +03:00
Francis Lata
bf36006ff0
set seed
2025-01-20 22:54:54 -08:00
Francis Lata
5d9a604963
add support for BENCHMARK
2025-01-20 22:47:23 -08:00
Francis Lata
be2e97260d
fix dtype for anchor inside dataloader and fix horizontal flip transformation
2025-01-20 22:45:25 -08:00
qazal
66ac0087e8
more high level contiguous tests + scheduler deletions [pr] ( #8695 )
...
* delete those
* move the upat too
* rename ops_folding to just sym
* keep that
2025-01-21 01:52:58 +02:00
qazal
08eb1f1f56
simplify tensors before scheduling [pr] ( #8580 )
...
* delete forced_realize
* put that back
* work
* remove forced_realize
* expectedFailures
* contiguous(buffer)
* multi
* expectedFailures
* cleaner create_subbuffer
* more comments
* remove that
* note
* realizes
* work
* one upat and image is back
* remove
* cleaner
* fix test_complex_backward for now
---------
Co-authored-by: George Hotz <geohot@gmail.com>
2025-01-20 23:42:42 +02:00
Francis Lata
cd511384e2
move anchors as part of dataloader
2025-01-20 13:13:16 -08:00
qazal
02ad450e22
add failing assert for gradient realization [pr] ( #8692 )
2025-01-20 22:50:09 +02:00
qazal
b14c9848cc
small changes to make the tensor_map_simple diff cleaner [pr] ( #8691 )
2025-01-20 22:25:59 +02:00
Sieds Lykles
1a15c0e89d
Move define_acc down an unrolled add chain ( #8404 )
...
* Move define_acc down an unrolled add chain
* Prevent possible infinite recursion
* Add test
* Fix typo in test
* Move mulacc_unrolled to devoctorize + load_store_indexing pass
* Add test for mulacc_unrolled by itself
* undo formatter
* import from ops, not rewriter
* Add a const version
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-01-20 14:56:27 -05:00
geohotstan
dd82b4c913
make onnx runner a class ( #8647 )
...
* this
* clean up
* more clean ups and improve debug msg
* more correct training toggler
* remove manual training toggling
* change some variable names
* actually just add the training toggle for LIMIT envvar too
* more refinement
* __call__ and OnnxRunner
* fix half pylint, other half is importing from onnx while this file is onnx.py, figure out later
* ahhhh found another mistake
* remove limit from __call__
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-01-20 10:11:05 -08:00
George Hotz
46a8c5e1e5
delete forced_realize ( #8615 )
...
* delete forced_realize
* put that back
* expectedFailures
* cleaner create_subbuffer
* more comments
---------
Co-authored-by: qazal <qazal.software@gmail.com>
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2025-01-20 09:40:36 -08:00
chenyu
679b1ad058
move softmax upcast to after subtracting max ( #8684 )
...
* move softmax upcast to after subtracting max
max can always be done in the same dtype without any numerical loss, so this is better when explicitly upcasting in softmax
* skipUnless half
2025-01-20 12:16:32 -05:00
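The reasoning in this commit message can be sketched in plain NumPy (an illustration of the idea, not tinygrad's actual implementation): taking the max only selects an existing element, so it is exact in the input dtype, and the upcast can be deferred until after the subtraction.

```python
import numpy as np

def softmax_upcast_after_max(x: np.ndarray) -> np.ndarray:
    # max just picks an element, so it is exact in the input dtype
    m = x.max(axis=-1, keepdims=True)
    # upcast only after subtracting the max; the shifted values are
    # small (<= 0), so exp() stays in a numerically safe range
    shifted = (x - m).astype(np.float32)
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)

# even in half precision, large logits are handled safely
x = np.array([1000.0, 1001.0, 1002.0], dtype=np.float16)
probs = softmax_upcast_after_max(x)  # sums to 1, largest logit wins
```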
Francis Lata
575c748d94
fix wandb resuming feature
2025-01-20 07:22:16 -08:00
Francis Lata
a90a6e624d
add wandb
2025-01-20 07:07:51 -08:00
nimlgen
08ca871d77
am: remove pm block ( #8688 )
...
* am: remove pm block
* hm
* oops
2025-01-20 18:05:22 +03:00
Francis Lata
9402872d90
Merge branch 'master' into retinanet_mlperf
2025-01-20 06:51:12 -08:00
nimlgen
9d3c40601f
am: fast memory manager ( #8654 )
...
* start
* progress
* fixes
* smth
* mini fixes
* fix2
* ugh, need this for now
* faster
* cleanups
* tiny linters
* make mypy happier
* test & free pts
* ops
* linter
* cleanup vm
* fix
* remove map_from
* tiny fixes
* add test to ci
2025-01-20 16:58:22 +03:00
qazal
9e55495b4d
fold double contiguous [pr] ( #8687 )
2025-01-20 14:38:33 +02:00
qazal
ed63ff2372
Remove contiguous on buffer ( #8676 )
...
* remove contiguous on buffer
* spec
* make things that can't be images not images
2025-01-20 13:48:33 +02:00
qazal
3499a2c72d
start moving image things to rewrite rules ( #8678 )
...
* start moving image things to rewrite rules [pr]
* that too
* as expected
* fix
* Revert "fix"
This reverts commit fd03c9464b.
2025-01-20 13:34:29 +02:00
qazal
b1847d561f
smaller do_realize and some cleanups [pr] ( #8685 )
...
* do_realize cleanups [pr]
* cleanup assign
* unwrap ShapeTracker as we expect it to exist
2025-01-20 12:47:01 +02:00
qazal
689bf68cfc
remove GroupOp.Meta [pr] ( #8686 )
2025-01-20 12:24:19 +02:00
George Hotz
4198bce150
_apply_map_to_tensors [pr] ( #8683 )
2025-01-19 17:56:04 -08:00
George Hotz
98d01a059d
rename uopgraph to rewriter [pr] ( #8682 )
2025-01-19 17:03:12 -08:00
Ignacio Sica
f532c78889
minor space hotfix ( #8679 )
2025-01-19 17:00:24 -08:00
Francis Lata
bef389dec7
realize boxcoder's encoding
2025-01-19 15:59:28 -08:00
chenyu
2d0842386d
fix parse_valid for float uop ( #8681 )
...
x < c -> X <= c-1 only works for int
2025-01-19 18:15:49 -05:00
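The note in this commit can be checked directly: the rewrite `x < c  ->  x <= c-1` is sound for integers but wrong for floats, since values can fall strictly between `c-1` and `c`. A standalone check (not tinygrad code):

```python
# For integers, x < c and x <= c - 1 are equivalent
for x in range(-5, 5):
    for c in range(-5, 5):
        assert (x < c) == (x <= c - 1)

# For floats the rewrite breaks: 0.5 < 1.0 holds, but 0.5 <= 0.0 does not
x, c = 0.5, 1.0
assert (x < c) and not (x <= c - 1)
```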
George Hotz
168c16646a
change create_schedule_with_vars api to big_sink [pr] ( #8677 )
2025-01-19 13:30:26 -08:00
chenyu
beba490ba8
update mask in scaled_dot_product_attention ( #8674 )
...
built is_causal mask with ones_like and start with boolean, and reversed the mask -inf order
2025-01-19 15:19:23 -05:00
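A minimal sketch of the construction the message describes, using NumPy as a stand-in for tinygrad tensors: start from a boolean all-ones mask, keep the lower triangle for causality, and fill the masked-out positions with -inf before softmax.

```python
import numpy as np

def causal_mask(length: int) -> np.ndarray:
    # boolean mask: True where a query may attend (itself and earlier keys)
    return np.tril(np.ones((length, length), dtype=bool))

def apply_mask(scores: np.ndarray, mask: np.ndarray) -> np.ndarray:
    # where the mask is False, replace the score with -inf so softmax zeroes it
    return np.where(mask, scores, -np.inf)

scores = np.zeros((3, 3))
masked = apply_mask(scores, causal_mask(3))  # upper triangle becomes -inf
```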
chenyu
5842ee56c6
raise if attn_mask is set when is_causal=True in sdpa [pr] ( #8675 )
...
matches torch, also fixed incorrect usage in tests
2025-01-19 12:55:04 -05:00
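The contract this commit enforces can be sketched as a small guard (hypothetical helper name `sdpa_check`; the real check lives inside scaled_dot_product_attention): an explicit mask and `is_causal` are mutually exclusive, since `is_causal` implies its own mask.

```python
def sdpa_check(attn_mask, is_causal: bool) -> None:
    # mirroring torch's behavior: passing both an explicit attn_mask
    # and is_causal=True is ambiguous, so raise instead of guessing
    if is_causal and attn_mask is not None:
        raise RuntimeError("attn_mask must be None when is_causal=True")
```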
qazal
2faf8774fe
replace DEVICE of CONST after copy folding ( #8673 )
2025-01-19 11:33:39 -05:00
qazal
d957a4f108
add tests for div buffer collapsing in the scheduler [pr] ( #8671 )
...
* add tests for mul/div buffer collapsing in the scheduler [pr]
* lint
* merge with test_linearizer's version of this
* 4*3
2025-01-18 14:15:29 -05:00