Commit Graph

4618 Commits

Author SHA1 Message Date
George Hotz
b53fe7c2fc remove unused ctx [pr] (#8751)
* remove unused ctx [pr]

* fix test
2025-01-26 17:59:15 +09:00
George Hotz
b4bf6a7dea switch backward to use gradient [pr] (#8235)
* switch backward to use gradient [pr]

* set device correctly, dedup

* why does that fail?

* add noop cast

* simple backward

* fix beautiful_mnist

* touchups

* set in compute_gradient

* uop_count

* uop_count was wrong

* collections

* no note

* skip that test

* update sched kernel counts

* train mnist is 65

* fix metadata and gc

* fixes

* materialize_grads

* no pathlib stuff

* add contiguous_backward, fix bugs

* add some realize

* fix multi
2025-01-26 09:12:16 +09:00
George Hotz
0ffd572e1e fix multi with no real srcs (#8749) 2025-01-26 08:41:00 +09:00
qazal
0e42befc6e viz cleanups 2 [pr] (#8748)
* viz cleanups 2 [pr]

* test_viz updates
2025-01-25 19:41:57 +02:00
qazal
a037201168 test_viz cleanups + move to /unit directory (#8746)
* test_viz cleanups + move to /unit directory

* lint
2025-01-25 14:33:31 +02:00
chenyu
e2b380b743 make UOp.multi real a tuple instead of list [pr] (#8744)
tuple is immutable. also updated the test_rand_like_from_alu test
2025-01-24 20:47:27 -05:00
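The rationale in the commit above ("tuple is immutable") can be illustrated with a minimal sketch; the `real` field name mirrors the commit, but this is plain Python, not tinygrad's actual UOp code:

```python
# Why a tuple is safer than a list for a field like `real`:
# tuples cannot be mutated in place, so shared references stay stable,
# and tuples are hashable, so they can appear in cache keys.
real_as_list = [True, False, True]
real_as_tuple = tuple(real_as_list)

real_as_list[0] = False  # lists allow silent in-place mutation

err = None
try:
    real_as_tuple[0] = False  # tuples reject mutation outright
except TypeError as e:
    err = e

# hashable, so usable as a dict key (lists are not)
cache = {real_as_tuple: "schedule"}
```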
chenyu
e0e176efbc failed test case for multi rand_like [pr] (#8740)
new multi broke multi device dropout
2025-01-24 13:56:51 -05:00
nimlgen
dc10187fc0 am: add am_smi (#8739)
* am: start monitor

* cleanups

* fixes

* hmm

* progress

* cleanup
2025-01-24 20:16:19 +03:00
George Hotz
e82ba1454b MultiLazyBuffer is UOp [pr] (#8662)
* MultiLazyBuffer is UOp [pr]

* this is new mlb

* this is the idea

* progress

* multitensor works

* more movement ops

* this

* MultiLazyBuffer is UOp

* cleanups

* multi axis

* fix more tests

* work

* not that

* add multi grad and move shard to ops

* mops not views

* no double contig

* sweet, all mt tests passing

* port old logic

* remove lbs

* fix realized

* whitespace

* assign tweak

* test_assign_kv_cache_multi passes

* fix is_realized

* fix JIT for multi

* just a few more lines i'll pay them back soon i swear please bro just a few more

* no split reduceop for multi
2025-01-24 13:28:55 +09:00
qazal
8e5bd0cd7a fix buffer init and skip test_swizzle_failure_permute [pr] (#8732)
* fix buffer init and skip test_swizzle_failure_permute [pr]

* replace preload with just load

* add
2025-01-23 17:21:38 +02:00
nimlgen
e4512baea4 am: cleanup mm (#8730)
* am: cleanup mm

* cle

* ops

* entries
2025-01-23 15:49:37 +03:00
qazal
07ec99001a keep VIEW in big_sink + copy of buffer view spec [pr] (#8727)
* keep views in sink [pr]

* tests

* things from the gpt2 bug
2025-01-23 11:29:30 +02:00
qazal
6cb74bb630 fix using clone with shrink [pr] (#8724)
* fix using clone with shrink [pr]

* remove extra arg, add test_clone_with_shrink_realized
2025-01-23 08:28:07 +02:00
qazal
907dfa0e82 image buffer realization spec [pr] (#8420)
* image buffer realization spec [pr]

* redo the spec

* work
2025-01-22 20:25:22 +02:00
nimlgen
93fb50ce77 allreduce: add flags (#8713) 2025-01-22 17:44:31 +03:00
qazal
2dae467b75 scheduler + process_replay import cleanup (#8711) 2025-01-22 12:44:07 +02:00
qazal
e3d1464ba4 move assign preload out of schedule item [pr] (#8710)
* move assign preload out of schedule item [pr]

* fix that
2025-01-22 12:43:57 +02:00
nimlgen
c5e46c5eee am: recover from any boot interrupt (#8703)
* am: recover from any load interrupt

* add fuzzer

* nu
2025-01-21 22:22:23 +03:00
George Hotz
018edd934b don't use view in copy [pr] (#8704)
* don't use view in copy [pr]

* oh, remove double contig

* fix reps
2025-01-21 09:57:47 -08:00
qazal
d6bf1feaab remove the "no copy" line from copy_to_device (#8702)
* delete the no copy one

* add tests
2025-01-21 17:09:33 +02:00
nimlgen
3628f89929 fix deallocate for subbuffers (#8701)
* fix deallocate for subbuffers

* forgot this

* rm name

* hmm
2025-01-21 16:34:19 +03:00
qazal
f0d424ecdf Tensor UOps can become a buffer or const after scheduling (#8698)
* spec

* work

* update test_viewed_consts_do_not_realize

* remove
2025-01-21 12:33:19 +02:00
qazal
e2008c98c3 allow symbolic shape in tensor const parents [pr] (#8699) 2025-01-21 12:01:25 +02:00
qazal
66ac0087e8 more high level contiguous tests + scheduler deletions [pr] (#8695)
* delete those

* move the upat too

* rename ops_folding to just sym

* keep that
2025-01-21 01:52:58 +02:00
qazal
08eb1f1f56 simplify tensors before scheduling [pr] (#8580)
* delete forced_realize

* put that back

* work

* remove forced_realize

* expectedFailures

* contiguous(buffer)

* multi

* expectedFailures

* cleaner create_subbuffer

* more comments

* remove that

* note

* realizes

* work

* one upat and image is back

* remove

* cleaner

* fix test_complex_backward for now

---------

Co-authored-by: George Hotz <geohot@gmail.com>
2025-01-20 23:42:42 +02:00
qazal
02ad450e22 add failing assert for gradient realization [pr] (#8692) 2025-01-20 22:50:09 +02:00
Sieds Lykles
1a15c0e89d Move define_acc down an unrolled add chain (#8404)
* Move define_acc down an unrolled add chain

* Prevent possible infinite recursion

* Add test

* Fix typo in test

* Move mulacc_unrolled to devoctorize + load_store_indexing pass

* Add test for mulacc_unrolled by itself

* undo formatter

* import from ops, not rewriter

* Add a const version

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-01-20 14:56:27 -05:00
geohotstan
dd82b4c913 make onnx runner a class (#8647)
* this

* clean up

* more clean ups and improve debug msg

* more correct training toggler

* remove manual training toggling

* change some variable names

* actually just add the training toggle for LIMIT envvar too

* more refinement

* __call__ and OnnxRunner

* fix half pylint, other half is importing from onnx while this file is onnx.py, figure out later

* ahhhh found another mistake

* remove limit from __call__

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-01-20 10:11:05 -08:00
George Hotz
46a8c5e1e5 delete forced_realize (#8615)
* delete forced_realize

* put that back

* expectedFailures

* cleaner create_subbuffer

* more comments

---------

Co-authored-by: qazal <qazal.software@gmail.com>
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2025-01-20 09:40:36 -08:00
chenyu
679b1ad058 move softmax upcast to after subtracting max (#8684)
* move softmax upcast to after subtracting max

max can always be done in the same dtype without any numerical loss, so this is better when explicitly upcasting in softmax

* skipUnless half
2025-01-20 12:16:32 -05:00
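The reasoning in the commit above (max is exact in the input dtype, so the upcast can wait until after the max is subtracted) can be sketched with NumPy; this is an illustrative reimplementation, not tinygrad's softmax:

```python
import numpy as np

def softmax_upcast_after_max(x: np.ndarray) -> np.ndarray:
    # max is just a comparison, so it is exact in the input dtype;
    # only the exp/sum accumulation benefits from the wider float32
    m = x.max(axis=-1, keepdims=True)      # same dtype as x, no loss
    shifted = (x - m).astype(np.float32)   # upcast AFTER subtracting max
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)

# without the max subtraction, exp() of these logits would overflow
logits = np.array([1000.0, 1001.0, 1002.0], dtype=np.float16)
probs = softmax_upcast_after_max(logits)
```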
nimlgen
08ca871d77 am: remove pm block (#8688)
* am: remove pm block

* hm

* oops
2025-01-20 18:05:22 +03:00
nimlgen
9d3c40601f am: fast memory manager (#8654)
* start

* progress

* fixes

* smth

* mini fixes

* fix2

* ugh, need this for now

* faster

* cleanups

* tiny linters

* make mypy happier

* test & free pts

* ops

* linter

* cleanup vm

* fix

* remove map_from

* tiny fixes

* add test to ci
2025-01-20 16:58:22 +03:00
qazal
9e55495b4d fold double contiguous [pr] (#8687) 2025-01-20 14:38:33 +02:00
qazal
ed63ff2372 Remove contiguous on buffer (#8676)
* remove contiguous on buffer

* spec

* make things that can't be images not images
2025-01-20 13:48:33 +02:00
qazal
3499a2c72d start moving image things to rewrite rules (#8678)
* start moving image things to rewrite rules [pr]

* that too

* as expected

* fix

* Revert "fix"

This reverts commit fd03c9464b.
2025-01-20 13:34:29 +02:00
George Hotz
98d01a059d rename uopgraph to rewriter [pr] (#8682) 2025-01-19 17:03:12 -08:00
chenyu
2d0842386d fix parse_valid for float uop (#8681)
x < c -> x <= c-1 only works for int
2025-01-19 18:15:49 -05:00
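The bound-tightening rule named in the commit above can be checked directly: on integers, `x < c` and `x <= c-1` are the same predicate, but on floats they are not, because values can lie strictly between `c-1` and `c`. A minimal demonstration:

```python
def tighten_lt_int(c: int) -> int:
    # x < c  ==>  x <= c - 1, valid only on integer domains
    return c - 1

# integer case: the two predicates agree on every int
ints_agree = all((x < 7) == (x <= tighten_lt_int(7)) for x in range(-10, 10))

# float case: the same rewrite changes the predicate's meaning,
# e.g. x = 1.25 satisfies x < 1.5 but not x <= 0.5
x = 1.25
float_lt = x < 1.5        # True
float_le = x <= 1.5 - 1   # False: 1.25 <= 0.5
```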
George Hotz
168c16646a change create_schedule_with_vars api to big_sink [pr] (#8677) 2025-01-19 13:30:26 -08:00
chenyu
beba490ba8 update mask in scaled_dot_product_attention (#8674)
build the is_causal mask with ones_like, starting as boolean, and reverse the mask/-inf order
2025-01-19 15:19:23 -05:00
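The mask construction described in the commit above (start from a boolean ones-like, keep the lower triangle, fill the rest with -inf before softmax) can be sketched in NumPy; this is an illustration of the idea, not tinygrad's code:

```python
import numpy as np

def causal_mask_scores(scores: np.ndarray) -> np.ndarray:
    # build the mask as boolean from the start: ones_like -> lower triangle
    ones = np.ones_like(scores, dtype=bool)
    keep = np.tril(ones)  # True on and below the diagonal
    # masked positions (future tokens) become -inf before softmax
    return np.where(keep, scores, -np.inf)

scores = np.zeros((3, 3), dtype=np.float32)
masked = causal_mask_scores(scores)
```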
chenyu
5842ee56c6 raise if attn_mask is set when is_causal=True in sdpa [pr] (#8675)
matches torch, also fixed incorrect usage in tests
2025-01-19 12:55:04 -05:00
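The check added in the commit above can be sketched as a guard at the top of an sdpa-like function; the function name and exact error type here are illustrative, not the actual tinygrad or torch signatures:

```python
def sdpa_check(attn_mask, is_causal: bool) -> str:
    # an explicit mask and is_causal are mutually exclusive:
    # is_causal means the causal mask is generated internally
    if is_causal and attn_mask is not None:
        raise RuntimeError("cannot set attn_mask when is_causal=True")
    return "ok"

# passing both should raise
raised = False
try:
    sdpa_check(attn_mask=object(), is_causal=True)
except RuntimeError:
    raised = True
```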
qazal
2faf8774fe replace DEVICE of CONST after copy folding (#8673) 2025-01-19 11:33:39 -05:00
qazal
d957a4f108 add tests for div buffer collapsing in the scheduler [pr] (#8671)
* add tests for mul/div buffer collapsing in the scheduler [pr]

* lint

* merge with test_linearizer's version of this

* 4*3
2025-01-18 14:15:29 -05:00
ignaciosica
d2234e308a tf32 tc for nv and ptx (#8635)
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-01-17 17:43:57 -08:00
nimlgen
5afb0a4a81 metal: fix transfer profiling (#8659) 2025-01-17 23:47:01 +03:00
George Hotz
8609b880bd hotfix: test_backward_sum 2025-01-17 10:25:02 -08:00
chenyu
f8cc971c3b raise RuntimeError for uneven shards in Tensor.shard [pr] (#8656) 2025-01-17 12:48:39 -05:00
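The behavior named in the commit above (raise rather than silently accept a shard count that does not divide the axis) can be sketched as a simple divisibility check; this is a standalone illustration, not Tensor.shard itself:

```python
def shard_sizes(total: int, n_devices: int) -> list:
    # even sharding only: raise instead of padding or truncating a shard
    if total % n_devices != 0:
        raise RuntimeError(
            f"size {total} does not divide evenly across {n_devices} devices")
    per_device = total // n_devices
    return [per_device] * n_devices

even = shard_sizes(8, 4)

uneven_raises = False
try:
    shard_sizes(10, 4)
except RuntimeError:
    uneven_raises = True
```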
mesozoic-egg
3506a7585f upcast overflowed idx to int64 [pr] (#8268)
* use full_shape to determine if index can potentially overflow

* update comment

* use shapetracker to check max index value

* wip

* lint

* handle mask

* upcast to int64 by st is noop on WGSL

* fix comments

* Handle negative overflow, intermediaries overflow, int64 support

handle negative overflow

handle symbolic

wip

handle intermediate values

wip

check if typemap support int64

lint

comment

* add invalid_dtype

lint

* Fix bug on checking mask overflow

wip

wip

* Add more tests, need to resolve partial upcast

test Valid_view_dup

test valid op overflow

refine test cases

clean up

cleanup

wip

refine tests

lint

* Upcast is handled by lower_load_store

upcast as graph_rewrite to backtrack

update test

wip

cleanup

wip

cleanup

do upcast in lower_load_store

lint

* cleanup

* do upcast within lower_load_store and mutate ctx

* do upcast in get_idx and view

revert

lint

* cleanup

* Upcast in vec, const

upcast to const

test case 3

upcast on vector

lint

* simplify idx with symbolic in case of fake overflow

test case4

test case 4

update test

* test case4 is only for metal

* try: upcast inside graph_rewrite instead of shapetracker

wip

* checking overflow can just be done directly on all views, with idxs

* cleanup

* REMOVE hard coded uop test for idx upcast

* refactor

cleanup

refactor

* do actual casting when necessary, instead of rewriting all idx

hard code uop test

new upcast

* check dtype for int64 in webgpu

* cleanup

cleanup

* cleanup

* update tests

cleanup

comment

cleanup

cleanup

* comment

* comment

* update comment

update comment

* refactor

* typo

* keep the scope to only upcasting

* white space

* Revert "white space"

This reverts commit 314d7eb184.

* Revert "keep the scope to only upcasting"

This reverts commit 1ef701dd85.

* sym folding is not necessary

lint1

* fold symbolic

lint

* use symbolic simple when folding shapetracker idx

* full sym folding is required after all...

* Ops.CAST should retain the src min max

* put rewrite to lowerer

wip

* start testing on higher level

wip

test higher level in test_tensor

* find Ops.STORE in list instead of recursively

* check dtype support when upcasting

* remove invalid_dtype

* lint

* fix int64 support checks in upcast

lint

* skipif skipunless

* revert fold to find test case

* Revert "revert fold to find test case"

This reverts commit 225bb6e801.

* test sym folding

* handle ptx

* wip

* wip

* delete hard coded uop test

* lint fixes

* wip

* fix checking for None

* lint

* handle ptx

* comment

* dtype for overflow()

* update skipIf skipUnless

* assert in wgsl renderer for int64

wip

* do folded_upcast in to_indexed_op, real_size uses views_to_indexed_ops

* assert in lowerer for dtype support

lint

* Revert "assert in lowerer for dtype support"

This reverts commit 8e9b1b79bf.

* assert dtype in kernel.py

* Revert "assert dtype in kernel.py"

This reverts commit e29b9a9893.

* wip

* assert in render

* remove old assert

* check dtype from rendere, assert in upcast

wip

* smaller arange for sym fold case

* linearize directly

* use expand directly

* lint

* lint

* rename

* no need to check dtype in device.py

* trigger pr

* remove dtype assert in upcast, make wgpu fail in render

* use DType for type hint instead of dtypes

* assert on KeyError in tests for webgpu backend int64

* use a tuple for src

* test real kernel run

wip

* lint error

* restore

* fix real_size

* update test example

* resolve merge stuff

---------

Co-authored-by: Mesozoic Egg <mesozoic.egg@proton.mail>
2025-01-17 11:52:31 -05:00
qazal
23f0ff0ed8 add bitcast to multi [pr] (#8652) 2025-01-17 03:17:19 -05:00
qazal
2b7db9b45d delete unused cast/bitcast lines from ops.py [pr] (#8651)
* move cast and bitcast out

* more deletion of bitcast arg

* fix test_bitcast_fuses

* update tests

* work
2025-01-17 03:04:18 -05:00
eliotgolding
0289fbb1c2 limit real_size to the size of first View of ShapeTracker (#8628)
* fix real_size

* add fuzzer; typing

* spacing

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-01-16 16:27:39 -05:00