Commit Graph

7717 Commits

Author SHA1 Message Date
Francis Lata
95cdbbf237 add jit to the training loop 2025-01-22 12:31:29 -08:00
Francis Lata
efe64ebeaf enable lr scheduler and fix benchmark timing 2025-01-22 09:56:38 -08:00
Francis Lata
66ff6cb37a create the necessary samples per test case 2025-01-21 14:14:58 -08:00
Francis Lata
9b95d6d62c setup openimages samples differently 2025-01-21 14:05:19 -08:00
Francis Lata
d1bc4aef94 do not realize when sharding model weights 2025-01-21 13:45:35 -08:00
Francis Lata
7f331d8836 fix dataloader script 2025-01-21 13:43:59 -08:00
Francis Lata
1bf5ee286b Revert "debug dataset test failure"
This reverts commit 1b2f9d7f50.
2025-01-21 13:30:12 -08:00
Francis Lata
1b2f9d7f50 debug dataset test failure 2025-01-21 13:23:50 -08:00
Francis Lata
7815d3ddff Merge branch 'master' into retinanet_mlperf 2025-01-21 13:06:04 -08:00
nimlgen
c5e46c5eee am: recover from any boot interrupt (#8703)
* am: recover from any load interrupt

* add fuzzer

* nu
2025-01-21 22:22:23 +03:00
chenyu
1e283c33d3 remove realize in bert model init [pr] (#8707) 2025-01-21 14:11:03 -05:00
George Hotz
018edd934b don't use view in copy [pr] (#8704)
* don't use view in copy [pr]

* oh, remove double contig

* fix reps
2025-01-21 09:57:47 -08:00
qazal
d6bf1feaab remove the "no copy" line from copy_to_device (#8702)
* delete the no copy one

* add tests
2025-01-21 17:09:33 +02:00
nimlgen
3628f89929 fix deallocate for subbuffers (#8701)
* fix deallocate for subbuffers

* forgot this

* rm name

* hmm
2025-01-21 16:34:19 +03:00
nimlgen
6733a3a96b am: fix typo (#8700) 2025-01-21 14:35:15 +03:00
qazal
f0d424ecdf Tensor UOps can become a buffer or const after scheduling (#8698)
* spec

* work

* update test_viewed_consts_do_not_realize

* remove
2025-01-21 12:33:19 +02:00
qazal
e2008c98c3 allow symbolic shape in tensor const parents [pr] (#8699) 2025-01-21 12:01:25 +02:00
nimlgen
2b239db5d2 temp() with usernames (#8697) 2025-01-21 12:26:43 +03:00
Francis Lata
bf36006ff0 set seed 2025-01-20 22:54:54 -08:00
Francis Lata
5d9a604963 add support for BENCHMARK 2025-01-20 22:47:23 -08:00
Francis Lata
be2e97260d fix dtype for anchor inside dataloader and fix horizontal flip transformation 2025-01-20 22:45:25 -08:00
qazal
66ac0087e8 more high level contiguous tests + scheduler deletions [pr] (#8695)
* delete those

* move the upat too

* rename ops_folding to just sym

* keep that
2025-01-21 01:52:58 +02:00
qazal
08eb1f1f56 simplify tensors before scheduling [pr] (#8580)
* delete forced_realize

* put that back

* work

* remove forced_realize

* expectedFailures

* contiguous(buffer)

* multi

* expectedFailures

* cleaner create_subbuffer

* more comments

* remove that

* note

* realizes

* work

* one upat and image is back

* remove

* cleaner

* fix test_complex_backward for now

---------

Co-authored-by: George Hotz <geohot@gmail.com>
2025-01-20 23:42:42 +02:00
Francis Lata
cd511384e2 move anchors as part of dataloader 2025-01-20 13:13:16 -08:00
qazal
02ad450e22 add failing assert for gradient realization [pr] (#8692) 2025-01-20 22:50:09 +02:00
qazal
b14c9848cc small changes to make the tensor_map_simple diff cleaner [pr] (#8691) 2025-01-20 22:25:59 +02:00
Sieds Lykles
1a15c0e89d Move define_acc down an unrolled add chain (#8404)
* Move define_acc down an unrolled add chain

* Prevent possible infinite recursion

* Add test

* Fix typo in test

* Move mulacc_unrolled to devoctorize + load_store_indexing pass

* Add test for mulacc_unrolled by itself

* undo formatter

* import from ops, not rewriter

* Add a const version

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-01-20 14:56:27 -05:00
geohotstan
dd82b4c913 make onnx runner a class (#8647)
* this

* clean up

* more clean ups and improve debug msg

* more correct training toggler

* remove manual training toggling

* change some variable names

* actually just add the training toggle for LIMIT envvar too

* more refinement

* __call__ and OnnxRunner

* fix half pylint, other half is importing from onnx while this file is onnx.py, figure out later

* ahhhh found another mistake

* remove limit from __call__

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-01-20 10:11:05 -08:00
George Hotz
46a8c5e1e5 delete forced_realize (#8615)
* delete forced_realize

* put that back

* expectedFailures

* cleaner create_subbuffer

* more comments

---------

Co-authored-by: qazal <qazal.software@gmail.com>
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2025-01-20 09:40:36 -08:00
chenyu
679b1ad058 move softmax upcast to after subtracting max (#8684)
* move softmax upcast to after subtracting max

max can always be done in the same dtype without any numerical loss, so this is better when explicitly upcasting in softmax

* skipUnless half
2025-01-20 12:16:32 -05:00
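The rationale in the commit body can be sketched in plain NumPy (an illustration of the general idea, not tinygrad's actual code; the function name and dtype choices here are hypothetical):

```python
import numpy as np

# Sketch of "upcast after subtracting max": the row max is one of the row's
# own values, so computing it and subtracting it in the input dtype (e.g.
# float16) loses nothing; only the exp/sum that actually need range and
# precision run in the wider dtype.
def softmax_upcast_after_max(x: np.ndarray, upcast=np.float32) -> np.ndarray:
    m = x.max(axis=-1, keepdims=True)    # stays in x's dtype, exact
    shifted = (x - m).astype(upcast)     # upcast only after the subtraction
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)
```

Subtracting the max first also keeps `exp` from overflowing in the narrow dtype, which is the usual motivation for the max-subtraction in softmax.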
Francis Lata
575c748d94 fix wandb resuming feature 2025-01-20 07:22:16 -08:00
Francis Lata
a90a6e624d add wandb 2025-01-20 07:07:51 -08:00
nimlgen
08ca871d77 am: remove pm block (#8688)
* am: remove pm block

* hm

* oops
2025-01-20 18:05:22 +03:00
Francis Lata
9402872d90 Merge branch 'master' into retinanet_mlperf 2025-01-20 06:51:12 -08:00
nimlgen
9d3c40601f am: fast memory manager (#8654)
* start

* progress

* fixes

* smth

* mini fixes

* fix2

* ugh, need this for now

* faster

* cleanups

* tiny linters

* make mypy happier

* test & free pts

* ops

* linter

* cleanup vm

* fix

* remove map_from

* tiny fixes

* add test to ci
2025-01-20 16:58:22 +03:00
qazal
9e55495b4d fold double contiguous [pr] (#8687) 2025-01-20 14:38:33 +02:00
qazal
ed63ff2372 Remove contiguous on buffer (#8676)
* remove contiguous on buffer

* spec

* make things that can't be images not images
2025-01-20 13:48:33 +02:00
qazal
3499a2c72d start moving image things to rewrite rules (#8678)
* start moving image things to rewrite rules [pr]

* that too

* as expected

* fix

* Revert "fix"

This reverts commit fd03c9464b.
2025-01-20 13:34:29 +02:00
qazal
b1847d561f smaller do_realize and some cleanups [pr] (#8685)
* do_realize cleanups [pr]

* cleanup assign

* unwrap ShapeTracker as we expect it to exist
2025-01-20 12:47:01 +02:00
qazal
689bf68cfc remove GroupOp.Meta [pr] (#8686) 2025-01-20 12:24:19 +02:00
George Hotz
4198bce150 _apply_map_to_tensors [pr] (#8683) 2025-01-19 17:56:04 -08:00
George Hotz
98d01a059d rename uopgraph to rewriter [pr] (#8682) 2025-01-19 17:03:12 -08:00
Ignacio Sica
f532c78889 minor space hotfix (#8679) 2025-01-19 17:00:24 -08:00
Francis Lata
bef389dec7 realize boxcoder's encoding 2025-01-19 15:59:28 -08:00
chenyu
2d0842386d fix parse_valid for float uop (#8681)
x < c -> x <= c-1 only works for int
2025-01-19 18:15:49 -05:00
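The one-line rationale above can be checked directly (a standalone illustration; `lt` and `le_minus_one` are hypothetical names, not uops from the PR):

```python
# Why the rewrite  x < c  ->  x <= c - 1  is only sound for integer values:
# over the integers nothing lies strictly between c - 1 and c, but over the
# floats a value like c - 0.5 satisfies x < c while failing x <= c - 1.
def lt(x, c): return x < c
def le_minus_one(x, c): return x <= c - 1

# integers: the two predicates agree for every x
assert all(lt(x, 5) == le_minus_one(x, 5) for x in range(-100, 100))
# floats: 4.5 < 5 holds, but 4.5 <= 4 does not
assert lt(4.5, 5) and not le_minus_one(4.5, 5)
```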
George Hotz
168c16646a change create_schedule_with_vars api to big_sink [pr] (#8677) 2025-01-19 13:30:26 -08:00
chenyu
beba490ba8 update mask in scaled_dot_product_attention (#8674)
built the is_causal mask with ones_like, starting from boolean, and reversed the mask/-inf order
2025-01-19 15:19:23 -05:00
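The mask construction the commit body describes can be sketched in NumPy (an illustration of the pattern, not the tinygrad implementation; `causal_bias` is a hypothetical helper):

```python
import numpy as np

# Build a causal attention bias as the commit message describes: start from a
# boolean all-ones mask (ones_like-style), keep the lower triangle so each
# query position attends only to itself and earlier positions, then put -inf
# on the disallowed entries so softmax drives their weights to zero.
def causal_bias(seqlen: int, dtype=np.float32) -> np.ndarray:
    allowed = np.tril(np.ones((seqlen, seqlen), dtype=bool))  # boolean causal mask
    return np.where(allowed, 0.0, -np.inf).astype(dtype)
```

Starting from a boolean mask and converting to the additive 0/-inf form at the end keeps the "who may attend to whom" logic separate from the numeric bias applied before softmax.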
chenyu
5842ee56c6 raise if attn_mask is set when is_causal=True in sdpa [pr] (#8675)
matches torch, also fixed incorrect usage in tests
2025-01-19 12:55:04 -05:00
qazal
2faf8774fe replace DEVICE of CONST after copy folding (#8673) 2025-01-19 11:33:39 -05:00
qazal
d957a4f108 add tests for div buffer collapsing in the scheduler [pr] (#8671)
* add tests for mul/div buffer collapsing in the scheduler [pr]

* lint

* merge with test_linearizer's version of this

* 4*3
2025-01-18 14:15:29 -05:00