ttomsa
170ece6605
fix advanced setitem overlap with 0 ( #7793 )
* fix advanced setitem overlap with 0
* fix comment
2024-11-19 16:03:55 -05:00
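Overlapping indices are the classic hazard in advanced setitem; a minimal NumPy analogue (an illustration of the behavior class, not tinygrad's implementation) of what an assignment with repeated indices must do:

```python
import numpy as np

# When advanced-indexing assignment writes the same position more than once,
# the last write wins; a buggy overlap path can instead leave stale zeros.
a = np.zeros(4, dtype=np.int32)
a[np.array([0, 0, 2])] = np.array([5, 7, 9])  # index 0 is written twice
print(a.tolist())  # [7, 0, 9, 0] -- index 0 keeps the last value, 7
```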
Gaétan Lepage
159c0bf25e
test_kernel_cache_in_action: fix test ( #7792 )
2024-11-19 13:34:56 -05:00
George Hotz
913a27ee27
from_buffer on metal was never called [pr] ( #7791 )
2024-11-20 00:35:17 +08:00
Eitan Turok
56017c52a0
Raise error when model architecture does not match state dict ( #7772 )
* init
* style
* style
* style
* fix test
2024-11-20 00:11:54 +08:00
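The check this PR adds can be sketched in plain Python; the function name and dict-of-shapes interface below are illustrative only, not tinygrad's actual `load_state_dict` API:

```python
def check_state_dict(model_shapes: dict, state_dict_shapes: dict) -> None:
  # raise early instead of silently loading mismatched weights
  missing = model_shapes.keys() - state_dict_shapes.keys()
  if missing: raise ValueError(f"state dict is missing keys: {sorted(missing)}")
  for k, shape in model_shapes.items():
    if state_dict_shapes[k] != shape:
      raise ValueError(f"shape mismatch for {k}: model {shape} vs state dict {state_dict_shapes[k]}")

check_state_dict({"w": (2, 3)}, {"w": (2, 3)})  # matching shapes: no error
```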
George Hotz
d71fe7faa5
rename allocator methods to not conflict [pr] ( #7788 )
* rename allocator methods to not conflict [pr]
* forgot those
* transfer + offset
2024-11-20 00:10:29 +08:00
chenyu
d5f76462c8
fix CI beautiful_mnist dir ( #7790 )
fixed `fatal: not a git repository (or any of the parent directories): .git` because $HOME is not $GITHUB_WORKSPACE
2024-11-19 09:59:02 -05:00
geohotstan
aeaf574a05
add failure test for setitem bug ( #7786 )
* add failure test
* rename
* improve tests
* improve tests and no need numpy
2024-11-19 08:54:21 -05:00
qazal
1e31b5ba6b
hotfix: ctx doesn't impact process replay [pr] ( #7785 )
2024-11-19 20:17:01 +08:00
qazal
8360bbd88d
faster assign view check [pr] ( #7781 )
2024-11-19 19:42:51 +08:00
George Hotz
3daa376107
remove numpy from assign [pr] ( #7784 )
* remove numpy from assign [pr]
* cast not required
2024-11-19 19:34:53 +08:00
George Hotz
fbb4099b3c
add test for compile3 [pr] ( #7783 )
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2024-11-19 19:26:51 +08:00
qazal
4f6071d919
capture the schedule context in process replay [pr] ( #7782 )
2024-11-19 19:12:00 +08:00
qazal
f493d480e3
metadata appending to graph_rewrite ( #7780 )
2024-11-19 18:05:42 +08:00
chenyu
73ea913050
really not using numpy in gpt2 example ( #7779 )
2024-11-18 23:21:16 -05:00
chenyu
e6debda5c4
remove numpy from gpt2 and llama examples ( #7778 )
2024-11-18 22:48:17 -05:00
George Hotz
005636304b
have VIZ=1 use HTTP/1.1 for keep-alive [pr] ( #7776 )
2024-11-19 09:38:12 +08:00
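In Python's stdlib HTTP server, keep-alive only kicks in once the handler speaks HTTP/1.1 (HTTP/1.0 closes the connection after every response). A minimal sketch of the idea; VIZ's actual server code may differ:

```python
from http.server import BaseHTTPRequestHandler

class KeepAliveHandler(BaseHTTPRequestHandler):
  # switching the handler to HTTP/1.1 makes connections persistent by default
  protocol_version = "HTTP/1.1"

  def do_GET(self):
    body = b"ok"
    self.send_response(200)
    self.send_header("Content-Length", str(len(body)))  # required for keep-alive
    self.end_headers()
    self.wfile.write(body)

print(KeepAliveHandler.protocol_version)  # HTTP/1.1
```

With HTTP/1.1, `Content-Length` (or chunked encoding) must be sent so the client knows where each response ends on the reused connection.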
George Hotz
65f188aafb
bump version to 0.10.0
v0.10.0
2024-11-19 08:27:28 +08:00
chenyu
26200574dc
load_state_dict test cases when model and data shard differently ( #7774 )
current behavior is weird... when the model is sharded and the state_dict is not, load shards the state_dict and the model's shard axis does not change.
but if the model and state_dict are sharded differently, the model's shard axis becomes the state_dict axis after load.
it should either always use the model's shard axis or always use the state_dict's shard axis
2024-11-18 16:08:24 -05:00
Francis Lata
a1c1b9547f
Context manager support for tqdm ( #7770 )
* add context manager support
* add test case for context manager usage
2024-11-18 14:12:03 -05:00
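Context-manager support for a progress bar usually just means closing it on exit; a generic sketch of the pattern (tinygrad's tqdm internals may differ):

```python
class ProgressBar:
  def __init__(self, total): self.total, self.n, self.closed = total, 0, False
  def update(self, k=1): self.n += k
  def close(self): self.closed = True
  def __enter__(self): return self
  def __exit__(self, exc_type, exc, tb):
    self.close()   # guarantee cleanup even if the loop raised
    return False   # never swallow exceptions

with ProgressBar(total=3) as bar:
  for _ in range(3): bar.update()
```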
geohotstan
8100109c9d
Add replicate mode to Tensor.pad ( #7608 )
* base implementation
* add tests
* actually remove the assertionerror test
* actually only have reflect for this pr
* change the 4 if-else one liner
* maybe use a lambda
* fix
* maybe a lil cleaner
* fix tests
* complete
* small change
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-11-18 10:55:38 -05:00
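Replicate padding repeats the edge value, while reflect mirrors around the edge without repeating it. NumPy's `np.pad` shows the distinction; per the PR title, `Tensor.pad` gains the replicate behavior (NumPy here is only a reference for the semantics):

```python
import numpy as np

x = np.array([1, 2, 3])
print(np.pad(x, 2, mode="edge").tolist())     # replicate: [1, 1, 1, 2, 3, 3, 3]
print(np.pad(x, 2, mode="reflect").tolist())  # reflect:   [3, 2, 1, 2, 3, 2, 1]
```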
qazal
62db6398a5
delete buffer tracking from ScheduleContext [pr] ( #7766 )
2024-11-18 22:47:32 +08:00
Shuni
ed76d3ceac
Fix AMD queue CWSR memory size ( #7765 )
* Fix AMD queue CWSR memory size
* fix linter error
* add debug_memory_size field
* align CWSR save area allocation to page size
2024-11-18 17:22:03 +03:00
ignaciosica
f02462c5cb
swizzle tc [pr] ( #7633 )
* swizzle tc draft
* further cleanup
* hotfix: remove typing from fix_st and cleanup
* hotfix: revert cache property (moved into separate pr)
* hotfix
* hotfix: rename
* take patterns from schedule
* hotfix: rename vars
* hotfix
* no more view of store
* hotfix: linter
* as view is only used for tc fix up and tc is only enabled for LOAD, remove valid and preload from pm rule
- also remove inner simplify in fix_st
* add typing to fix_st
---------
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2024-11-18 21:08:21 +08:00
qazal
6ea4a173e7
make is_realized a property [pr] ( #7763 )
* make is_realized a property [pr]
* fix assign
* multi
2024-11-18 19:15:37 +08:00
chenyu
5de0ea40f3
reorder Tensor.__init__ to match type ( #7758 )
and reordered check lazy devices part
2024-11-17 21:32:48 -05:00
chenyu
66d7d5af50
fix Tensor(MultiLazyBuffer) with different dtype should fail ( #7757 )
similar to Tensor(LazyBuffer) as we don't cast implicitly
2024-11-17 21:05:45 -05:00
chenyu
b1d734a02c
remove the -1 then -(-1) in Tensor.argmax ( #7753 )
2024-11-17 16:54:09 -05:00
chenyu
e3081355fe
minor Tensor.einsum cleanup ( #7752 )
removed some dead conditions and added types. still reads more complicated than needed
2024-11-17 16:11:30 -05:00
chenyu
8b08a72657
cosmetic change to Tensor._pool ( #7751 )
aligned the shrink lines
2024-11-17 15:38:11 -05:00
chenyu
df817297b6
fix passing acc_dtype="" to Tensor.prod should fail ( #7750 )
similar to sum
2024-11-17 11:38:13 -05:00
chenyu
55707fd00d
fix passing sum_acc_dtype="" to Tensor.sum should fail ( #7748 )
2024-11-17 10:58:41 -05:00
chenyu
f18296e23c
simpler Tensor._reduce ( #7747 )
2024-11-17 09:20:00 -05:00
qazal
0cc8de2f15
reverse map buf_uops [pr] ( #7743 )
2024-11-17 21:29:56 +08:00
chenyu
0292ae7508
Tensor.meshgrid cleanup ( #7741 )
2024-11-17 08:26:53 -05:00
qazal
40642cb9ea
to_uop split paths part 2 [pr] ( #7746 )
2024-11-17 21:07:28 +08:00
qazal
99024b922b
to_uop one path for all ops part 1 ( #7745 )
* flat meta ops
* one path for everything
* add tests
* view is always base
* just run
2024-11-17 20:12:44 +08:00
qazal
eeb222f98b
add UOp.new_buffer [pr] ( #7742 )
2024-11-17 16:44:52 +08:00
chenyu
a15a900415
fix Tensor.meshgrid for 1D input and check indexing ( #7740 )
2024-11-16 23:39:30 -05:00
geohotstan
72a41095bc
add Tensor.meshgrid ( #7714 )
* initial implementation and test
* some other places that can use meshgrid
* revert the onnx_ops change
* add to docs
* revert interpolate too
* update
* improve edge case test
* might as well test grad
* add to test can improve docs
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-11-16 23:06:47 -05:00
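The new Tensor.meshgrid follows the familiar NumPy semantics, where the `indexing` argument controls whether outputs are laid out in matrix ("ij") or cartesian ("xy") order. NumPy is shown here purely as the reference behavior:

```python
import numpy as np

x, y = np.array([1, 2]), np.array([10, 20, 30])
X, Y = np.meshgrid(x, y, indexing="ij")    # matrix indexing: shape (len(x), len(y))
print(X.shape, Y.shape)                    # (2, 3) (2, 3)
X2, Y2 = np.meshgrid(x, y, indexing="xy")  # cartesian indexing swaps the first two dims
print(X2.shape)                            # (3, 2)
```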
mesozoic-egg
1a5e896bd4
[pr] Have PTX share code with LLVM ( #7635 )
* integrate into ops_cuda
* remove debugging stuff
* lint fix
* mypy fixes
* swap ptx.py
* edit
* simplify wmma
* wip
* space
* refactor
* sync the ops removal changes
* refactor
* rename variables
---------
Co-authored-by: judy <mesozoic.egg@proton.mail>
2024-11-17 10:53:56 +08:00
chenyu
f2f7384b67
_resolve_dim cleanup ( #7736 )
no duplicated self.ndim+outer
2024-11-16 11:05:39 -05:00
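A dim-resolution helper of this kind normalizes a possibly-negative axis once and bounds-checks it; a sketch of the pattern (names and exact bounds are illustrative, not tinygrad's code):

```python
def resolve_dim(dim: int, ndim: int) -> int:
  # map a possibly-negative dim into [0, ndim), or fail loudly
  if not -max(1, ndim) <= dim < max(1, ndim):
    raise IndexError(f"dim {dim} out of range for ndim {ndim}")
  return dim + ndim if dim < 0 else dim

print(resolve_dim(-1, 3))  # 2
```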
chenyu
e777211a00
Tensor.repeat cleanup ( #7735 )
flatten instead of double for loop comprehension
2024-11-16 10:43:45 -05:00
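"flatten instead of double for loop comprehension" refers to collapsing a nested comprehension into a single pass; a generic illustration of the two equivalent forms:

```python
pairs = [[(i, j) for j in range(2)] for i in range(2)]  # nested: list of lists
flat = [(i, j) for i in range(2) for j in range(2)]     # flattened in one comprehension
print(flat)  # [(0, 0), (0, 1), (1, 0), (1, 1)]
assert flat == [p for row in pairs for p in row]        # same elements, one level
```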
chenyu
f1efd84c92
fix repeat_interleave with negative dim ( #7734 )
2024-11-16 10:15:29 -05:00
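Negative dims just index axes from the end; NumPy's `repeat` shows the intended behavior the fix restores (NumPy as the reference semantics, not tinygrad's code):

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])
# axis=-1 must behave exactly like axis=ndim-1
print(np.repeat(x, 2, axis=-1).tolist())  # [[1, 1, 2, 2], [3, 3, 4, 4]]
assert (np.repeat(x, 2, axis=-1) == np.repeat(x, 2, axis=1)).all()
```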
chenyu
e3105675fb
cond.where(True, False) is cond ( #7733 )
2024-11-16 09:44:17 -05:00
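The rewrite rule says that selecting True/False by a boolean condition is just the condition itself; in NumPy terms:

```python
import numpy as np

cond = np.array([True, False, True])
# where(cond, True, False) is an identity on a boolean cond
out = np.where(cond, True, False)
print(out.tolist())  # [True, False, True]
```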
qazal
40ae0e9115
smaller big graph ( #7695 )
* start
* work
* rewrite to PRELOAD
* st is always from base
* fix aesthetics
* work
* more work
* refactor to is_forced_realize
* uh
* green?
* metaop can be image
* dont count realized
* this is the new src
* test_tiny_add passes
* work
2024-11-16 22:04:57 +08:00
qazal
f3f95ab9d9
flatten fusion upats [pr] ( #7732 )
2024-11-16 21:26:19 +08:00
qazal
ec8c5598f6
refactor to generic UPat for sourcing unrealized bufs [pr] ( #7731 )
* base check
* use is_scheduled
* fixup lazy
* update metadata
* match is too slow
2024-11-16 21:01:22 +08:00
ignaciosica
597a239e28
Remove UnaryOps, BinaryOps, TernaryOps, MetaOps [pr] ( #7725 )
* remove unaryops
* remove ternaryops
* remove metaops
* hotfix
* remove binaryops
* hotfix: test_pattern_matcher
---------
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2024-11-16 20:56:56 +08:00
chenyu
22da31b223
clean up Tensor.dot ( #7728 )
more docs (similar to numpy) and removed many confusing `-min(n2, 2)`
2024-11-15 18:21:15 -05:00
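Since the commit documents Tensor.dot as numpy-like, the reference batched-matmul shape rule is the one NumPy's `@` implements: matching batch dims are carried through and the trailing two dims are contracted:

```python
import numpy as np

a, b = np.ones((2, 3, 4)), np.ones((2, 4, 5))
print((a @ b).shape)  # batch matmul: (2, 3, 5)
```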
chenyu
4338c450ac
fix max_pool2d for int tensor with padding ( #7726 )
padding with inf messed up the output dtype
2024-11-15 16:22:11 -05:00
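The bug class: max-pooling pads with a -inf sentinel, and mixing an integer dtype with a float scalar promotes the result to float. Standard promotion rules make the issue visible (NumPy shown only as an illustration of the promotion):

```python
import numpy as np

# an integer dtype combined with the float scalar -inf promotes to float64,
# which is how an int pooling result can silently come back as float
print(np.result_type(np.int32, -np.inf))  # float64
```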