Commit Graph

10417 Commits

Author SHA1 Message Date
ttomsa
170ece6605 fix advanced setitem overlap with 0 (#7793)
* fix advanced setitem overlap with 0

* fix comment
2024-11-19 16:03:55 -05:00
Gaétan Lepage
159c0bf25e test_kernel_cache_in_action: fix test (#7792) 2024-11-19 13:34:56 -05:00
George Hotz
913a27ee27 from_buffer on metal was never called [pr] (#7791) 2024-11-20 00:35:17 +08:00
Eitan Turok
56017c52a0 Raise error when model architecture does not match state dict (#7772)
* init

* style

* style

* style

* fix test
2024-11-20 00:11:54 +08:00
George Hotz
d71fe7faa5 rename allocator methods to not conflict [pr] (#7788)
* rename allocator methods to not conflict [pr]

* forgot those

* transfer + offset
2024-11-20 00:10:29 +08:00
chenyu
d5f76462c8 fix CI beautiful_mnist dir (#7790)
fixed `fatal: not a git repository (or any of the parent directories): .git` because $HOME is not $GITHUB_WORKSPACE
2024-11-19 09:59:02 -05:00
geohotstan
aeaf574a05 add failure test for setitem bug (#7786)
* add failure test

* rename

* improve tests

* improve tests, no numpy needed
2024-11-19 08:54:21 -05:00
qazal
1e31b5ba6b hotfix: ctx doesn't impact process replay [pr] (#7785) 2024-11-19 20:17:01 +08:00
qazal
8360bbd88d faster assign view check [pr] (#7781) 2024-11-19 19:42:51 +08:00
George Hotz
3daa376107 remove numpy from assign [pr] (#7784)
* remove numpy from assign [pr]

* cast not required
2024-11-19 19:34:53 +08:00
George Hotz
fbb4099b3c add test for compile3 [pr] (#7783)
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2024-11-19 19:26:51 +08:00
qazal
4f6071d919 capture the schedule context in process replay [pr] (#7782) 2024-11-19 19:12:00 +08:00
qazal
f493d480e3 metadata appending to graph_rewrite (#7780) 2024-11-19 18:05:42 +08:00
chenyu
73ea913050 really not using numpy in gpt2 example (#7779) 2024-11-18 23:21:16 -05:00
chenyu
e6debda5c4 remove numpy from gpt2 and llama examples (#7778) 2024-11-18 22:48:17 -05:00
George Hotz
005636304b have VIZ=1 use HTTP/1.1 for keep-alive [pr] (#7776) 2024-11-19 09:38:12 +08:00
George Hotz
65f188aafb bump version to 0.10.0 v0.10.0 2024-11-19 08:27:28 +08:00
chenyu
26200574dc load_state_dict test cases when model and data shard differently (#7774)
current behavior is inconsistent: when the model is sharded and the state_dict is not, load shards the state_dict and the model shard axis does not change.
but if the model and state_dict are sharded differently, the model shard axis becomes the state_dict axis after load.

it should either always use the model shard axis or always use the state_dict shard axis
2024-11-18 16:08:24 -05:00
Francis Lata
a1c1b9547f Context manager support for tqdm (#7770)
* add context manager support

* add test case for context manager usage
2024-11-18 14:12:03 -05:00
geohotstan
8100109c9d Add replicate mode to Tensor.pad (#7608)
* base implementation

* add tests

* actually remove the assertionerror test

* actually only have reflect for this pr

* change the 4 if-else one liner

* maybe use a lambda

* fix

* maybe a lil cleaner

* fix tests

* complete

* small change

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-11-18 10:55:38 -05:00
qazal
62db6398a5 delete buffer tracking from ScheduleContext [pr] (#7766) 2024-11-18 22:47:32 +08:00
Shuni
ed76d3ceac Fix AMD queue CWSR memory size (#7765)
* Fix AMD queue CWSR memory size

* fix linter error

* add debug_memory_size field

* align CWSR save area allocation to page size
2024-11-18 17:22:03 +03:00
ignaciosica
f02462c5cb swizzle tc [pr] (#7633)
* swizzle tc draft

* further cleanup

* hotfix: remove typing from fix_st and cleanup

* hotfix: revert cache property (moved into separate pr)

* hotfix

* hotfix: rename

* take patterns from schedule

* hotfix: rename vars

* hotfix

* no more view of store

* hotfix: linter

* as view is only used for tc fix up and tc is only enabled for LOAD, remove valid and preload from pm rule

- also remove inner simplify in fix_st

* add typing to fix_st

---------

Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2024-11-18 21:08:21 +08:00
qazal
6ea4a173e7 make is_realized a property [pr] (#7763)
* make is_realized a property [pr]

* fix assign

* multi
2024-11-18 19:15:37 +08:00
chenyu
5de0ea40f3 reorder Tensor.__init__ to match type (#7758)
and reordered check lazy devices part
2024-11-17 21:32:48 -05:00
chenyu
66d7d5af50 fix Tensor(MultiLazyBuffer) with different dtype should fail (#7757)
similar to Tensor(LazyBuffer) as we don't cast implicitly
2024-11-17 21:05:45 -05:00
chenyu
b1d734a02c remove the -1 then -(-1) in Tensor.argmax (#7753) 2024-11-17 16:54:09 -05:00
chenyu
e3081355fe minor Tensor.einsum cleanup (#7752)
removed some dead conditions and added types. still reads more complicated than needed
2024-11-17 16:11:30 -05:00
chenyu
8b08a72657 cosmetic change to Tensor._pool (#7751)
aligned the shrink lines
2024-11-17 15:38:11 -05:00
chenyu
df817297b6 fix passing acc_dtype="" to Tensor.prod should fail (#7750)
similar to sum
2024-11-17 11:38:13 -05:00
chenyu
55707fd00d fix passing sum_acc_dtype="" to Tensor.sum should fail (#7748) 2024-11-17 10:58:41 -05:00
chenyu
f18296e23c simpler Tensor._reduce (#7747) 2024-11-17 09:20:00 -05:00
qazal
0cc8de2f15 reverse map buf_uops [pr] (#7743) 2024-11-17 21:29:56 +08:00
chenyu
0292ae7508 Tensor.meshgrid cleanup (#7741) 2024-11-17 08:26:53 -05:00
qazal
40642cb9ea to_uop split paths part 2 [pr] (#7746) 2024-11-17 21:07:28 +08:00
qazal
99024b922b to_uop one path for all ops part 1 (#7745)
* flat meta ops

* one path for everything

* add tests

* view is always base

* just run
2024-11-17 20:12:44 +08:00
qazal
eeb222f98b add UOp.new_buffer [pr] (#7742) 2024-11-17 16:44:52 +08:00
chenyu
a15a900415 fix Tensor.meshgrid for 1D input and check indexing (#7740) 2024-11-16 23:39:30 -05:00
geohotstan
72a41095bc add Tensor.meshgrid (#7714)
* initial implementation and test

* some other places that can use meshgrid

* revert the onnx_ops change

* add to docs

* revert interpolate too

* update

* improve edge case test

* might as well test grad

* add to test can improve docs

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-11-16 23:06:47 -05:00
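The Tensor.meshgrid commit above adds numpy-style meshgrid semantics (including an `indexing` argument, per the follow-up fix for 1D inputs). A minimal sketch of those semantics using numpy as a stand-in — this is an illustration of the behavior, not tinygrad's implementation:

```python
import numpy as np

# numpy stand-in for Tensor.meshgrid semantics
x = np.array([1, 2, 3])
y = np.array([4, 5])
X, Y = np.meshgrid(x, y, indexing="ij")  # "ij" gives shape (len(x), len(y))
assert X.shape == (3, 2) and Y.shape == (3, 2)
# X varies along the first axis, Y along the second
assert X[2, 1] == 3 and Y[2, 1] == 5
```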
mesozoic-egg
1a5e896bd4 [pr] Have PTX share code with LLVM (#7635)
* integrate into ops_cuda

* remove debugging stuff

* lint fix

* mypy fixes

* swap ptx.py

* edit

* simplify wmma

* wip

* space

* refactor

* sync the ops removal changes

* refactor

* rename variables

---------

Co-authored-by: judy <mesozoic.egg@proton.mail>
2024-11-17 10:53:56 +08:00
chenyu
f2f7384b67 _resolve_dim cleanup (#7736)
no duplicated self.ndim+outer
2024-11-16 11:05:39 -05:00
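The _resolve_dim cleanup above concerns normalizing possibly-negative dims. A hypothetical sketch of what such a helper does (names and the `extra` parameter are assumptions for illustration, not tinygrad's actual code):

```python
def resolve_dim(dim: int, ndim: int, extra: int = 0) -> int:
    # map a possibly-negative dim into [0, ndim+extra)
    total = ndim + extra
    if not -max(1, total) <= dim < max(1, total):
        raise IndexError(f"dim {dim} out of range for {total} dimensions")
    return dim + total if dim < 0 else dim

assert resolve_dim(-1, 3) == 2  # last dim of a 3D tensor
assert resolve_dim(0, 3) == 0
```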
chenyu
e777211a00 Tensor.repeat cleanup (#7735)
flatten instead of double for loop comprehension
2024-11-16 10:43:45 -05:00
chenyu
f1efd84c92 fix repeat_interleave with negative dim (#7734) 2024-11-16 10:15:29 -05:00
chenyu
e3105675fb cond.where(True, False) is cond (#7733) 2024-11-16 09:44:17 -05:00
qazal
40ae0e9115 smaller big graph (#7695)
* start

* work

* rewrite to PRELOAD

* st is always from base

* fix aesthetics

* work

* more work

* refactor to is_forced_realize

* uh

* green?

* metaop can be image

* dont count realized

* this is the new src

* test_tiny_add passes

* work
2024-11-16 22:04:57 +08:00
qazal
f3f95ab9d9 flatten fusion upats [pr] (#7732) 2024-11-16 21:26:19 +08:00
qazal
ec8c5598f6 refactor to generic UPat for sourcing unrealized bufs [pr] (#7731)
* base check

* use is_scheduled

* fixup lazy

* update metadata

* match is too slow
2024-11-16 21:01:22 +08:00
ignaciosica
597a239e28 Remove UnaryOps, BinaryOps, TernaryOps, MetaOps [pr] (#7725)
* remove unaryops

* remove ternaryops

* remove metaops

* hotfix

* remove binaryops

* hotfix: test_pattern_matcher

---------

Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2024-11-16 20:56:56 +08:00
chenyu
22da31b223 clean up Tensor.dot (#7728)
more docs (similar to numpy) and removed many confusing `-min(n2, 2)` expressions
2024-11-15 18:21:15 -05:00
chenyu
4338c450ac fix max_pool2d for int tensor with padding (#7726)
padding with inf messed up the output dtype
2024-11-15 16:22:11 -05:00
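The max_pool2d fix above is about the pad value: padding an int tensor with -inf forces a float dtype, while padding with the dtype's own minimum keeps it intact. A numpy sketch of the dtype issue (an illustration of the problem, not tinygrad's fix):

```python
import numpy as np

x = np.array([[1, 2], [3, 4]], dtype=np.int32)
# padding with -inf requires casting to float, changing the output dtype
bad = np.pad(x.astype(np.float64), 1, constant_values=-np.inf)
assert bad.dtype == np.float64
# padding with the dtype's minimum preserves the integer dtype,
# and is still neutral for a max reduction
good = np.pad(x, 1, constant_values=np.iinfo(x.dtype).min)
assert good.dtype == np.int32
assert good.max() == 4
```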