chenyu
66d7d5af50
fix Tensor(MultiLazyBuffer) with different dtype should fail (#7757)
...
similar to Tensor(LazyBuffer) as we don't cast implicitly
2024-11-17 21:05:45 -05:00
ignaciosica
597a239e28
Remove UnaryOps, BinaryOps, TernaryOps, MetaOps [pr] (#7725)
...
* remove unaryops
* remove ternaryops
* remove metaops
* hotfix
* remove binaryops
* hotfix: test_pattern_matcher
---------
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2024-11-16 20:56:56 +08:00
chenyu
d1dfd598a2
assert specifying device to rand_like a multi tensor (#7678)
...
* assert specifying device to rand_like a multi tensor
raise RuntimeError instead of dropping it silently
* fix that
2024-11-13 10:24:40 -05:00
chenyu
51432bfbff
add rand_like test case with device specified (#7663)
...
in single device or copied multi case, device is applied. but for sharded case the device is silently ignored now. maybe similar to rand we just don't allow tuple device in rand_like
2024-11-13 09:32:55 -05:00
qazal
e84d089ef1
delete ReduceOps, only use REDUCE_AXIS (#7667)
2024-11-13 19:04:27 +08:00
uuuvn
c846dd70b2
Increase test tolerance for probabilistic test (#7580)
2024-11-07 09:35:11 -05:00
George Hotz
205befa788
move is_dtype_supported to device [pr] (#7575)
2024-11-07 20:38:03 +08:00
George Hotz
99bd4372a5
Ops.ALU is no more, the arg is just an op (#7525)
...
* op arg alu [pr]
* more
* more passing
* fix more tests
* more tests passing
* fix single failing test
* so much cleaner
* noop to not have process replay trigger
* fix ptx
2024-11-05 00:22:22 +08:00
George Hotz
0c19b6298b
rename ops to have unique names (#7522)
2024-11-04 17:09:45 +08:00
George Hotz
c8bf09b7d4
s/UOps/Ops (#7500)
...
* s/UOps/Ops [pr]
* fix
2024-11-03 11:26:10 +08:00
chenyu
18e159c9ac
comment about multi real and more tests [pr] (#7467)
2024-11-01 11:49:11 -04:00
Tobias Fischer
1a9e145388
Tensor Clone Function (#7154)
...
* implemented clone function
* cleanup linting, single func
* added tests, cleaned up grad cloning
* fixed whitespace
2024-11-01 12:24:43 +08:00
George Hotz
4812801aa6
try for canonical order (#7286)
...
* try for canonical order
* cmp better
* disable bad tests
* flip const order
* fix test
* fix tests
* different fix for NOOP
* metaclass here
* fix tests
* narrower scope
2024-10-25 16:04:54 +08:00
George Hotz
d726eb6f48
uop resolve [run_process_replay] (#6826)
...
* uop bool and int and stuff [run_process_replay]
* add ne support
* can't even be None anymore
* BinaryOps.AND support
* less compare
2024-10-01 13:11:42 +08:00
wozeparrot
c100f3d406
default threefry (#6116)
2024-09-25 17:45:13 +08:00
George Hotz
cb22ef379a
truncate consts early (#6741)
...
* truncate consts early
* ptx still fails
* Update dtype.py
2024-09-25 16:49:51 +08:00
wozeparrot
2be0b26a1f
rand only supports single device (#6682)
2024-09-24 16:07:44 +08:00
qazal
982086f54c
UOps.VALID try 2 (#6623)
...
* make UOps.VALID compile
* fixable tests
* bufs dedup
* cleanup the CONST spec
* regenerate dataset with graph_rewrite
```py
def rewrite_const(const:UOp, st_src:UOp) -> UOp:
  st: ShapeTracker = st_src.arg
  return UOp(UOps.VALID, dtypes.bool, (st.to_uop(),)).where(UOp.const(const.dtype, const.arg), UOp.const(const.dtype, 0))
pm = PatternMatcher([(UPat(UOps.CONST, name="const", src=(UPat(UOps.SHAPETRACKER, name="st_src"),)), rewrite_const)])
```
* rm arg
* remove arg
* revert arg removal
This reverts commit 2c35c75c95.
* red test_pickle_define_var
2024-09-21 14:19:25 +08:00
George Hotz
dbd4536167
Revert "add UOps.VALID (#6387)" (#6441)
...
This reverts commit 8186e4e7d6.
2024-09-09 21:33:00 +08:00
George Hotz
8186e4e7d6
add UOps.VALID (#6387)
...
* uops valid
* broke full_shape
* fixup that st (hardcoded asts still red)
* fixup DEFINE_VAR
debug
more debug
* start moving stuff to ast_const
* move test_linearizer
* move test_linearizer_failures to ast_const
* fixup test_schedule
* small diff change
* regenerate dataset
* fixup test_multitensor
* regen dataset try 2
---------
Co-authored-by: qazal <qazal.software@gmail.com>
2024-09-09 16:58:43 +08:00
chenyu
943ab97d24
fix Tensor.prod for multitensor (#6264)
2024-08-24 08:52:24 -04:00
qazal
28c75bf2a6
merge uops with ops (#6111)
...
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-08-16 18:17:57 -04:00
qazal
c23d44c779
AST is UOp (#6030)
...
* most of the work from the uops2 branch
* schedule
* realize
* kernel
* lowerer
* search
* green
* merge uops with ops
* Revert "merge uops with ops"
This reverts commit 1408a59f12.
* fix benchmark
* remove extra dedup
2024-08-16 22:09:00 +03:00
Tobias Fischer
6e3eb50fd1
added fix and reg tests (#6060)
2024-08-12 21:00:48 -04:00
David Hou
eb91423cb4
MLB support reshape for uneven shards (#5804)
...
* cleaner uneven reshape
* update test
2024-08-01 02:36:03 -07:00
David Hou
492a696d14
allow specify splits in shard, handle multiple different splits in MLB.e (#5599)
...
* allow specify splits in shard, handle multiple different splits in MLB.e
* line width
* linter
* don't use Device in docstring
* specify size of shards instead of boundaries
* adjust docstring for specify size of shards instead of boundaries
* don't allow splits on symbolic axis?
* just allow sint in splits_to_bounds
* add message for assert
* bounds instead of splits to save lines
* fix types
* reduce diff
* fix
* tuple
* golf :(
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-07-30 19:33:04 -07:00
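The bullets in #5599 above mention specifying shard sizes rather than boundaries and converting between the two. A minimal sketch of that conversion, assuming boundaries are half-open `(start, end)` pairs along the sharded axis; the helper name `sizes_to_bounds` is hypothetical and is not taken from the tinygrad source:

```python
from itertools import accumulate

def sizes_to_bounds(sizes):
  # Hypothetical helper: turn per-device shard sizes into (start, end) pairs.
  ends = list(accumulate(sizes))   # running total gives each shard's end offset
  starts = [0] + ends[:-1]         # each shard starts where the previous one ended
  return tuple(zip(starts, ends))

print(sizes_to_bounds((2, 3, 1)))  # ((0, 2), (2, 5), (5, 6))
```

Sizes are arguably the easier thing to specify by hand, since they only have to add up to the axis length, while boundaries also have to be contiguous and ordered.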
George Hotz
e638b0084f
smaller multitensor resnet test (#5450)
...
* minor improvments to matcher speed [run_process_replay]
* oh, put that back
* make fake images smaller for resnet test
2024-07-13 07:31:28 -07:00
George Hotz
6707c778d0
scheduleitem is not Tuple [run_process_replay] (#5425)
...
* scheduleitem is not Tuple [run_process_replay]
* fix tests
* fix op + fuzzers
* fix mop test
2024-07-12 15:13:19 -07:00
George Hotz
f6ef283e6a
s/loadops/metaops [run_process_replay] (#5421)
2024-07-12 13:26:50 -07:00
George Hotz
3e40211e45
add UOP_IS_SYMBOLIC [run_process_replay] [no_assert] (#5386)
...
* cleanup a few things in uops [run_process_replay] [no_assert]
* add optional UOP_IS_SYMBOLIC
2024-07-11 10:48:45 -07:00
nimlgen
1678199b15
add update_copy to hcq spec (#5348)
...
* add update_copy to hcq spec
* fix amd
2024-07-09 20:44:44 +03:00
qazal
c1e166c08a
fix dtype mismatch for bool ops in multi (#5299)
2024-07-06 11:36:40 +03:00
chenyu
b2c3a28a5e
nn.RMSNorm (#5272)
...
the norm itself has no significant value to add to Tensor method, but we would want Tensor.normalize
2024-07-02 21:39:01 -04:00
Roelof van Dijk
f88f71d73a
ruff: unnecessary-comprehension (#5174)
...
* enable ruff C416 unnecessary-comprehension
* already a list
2024-06-27 07:45:29 -04:00
David Hou
666a9c1448
don't view origin buffer when sharding (#5122)
...
* make buffer view optional with a flag
* do not view when sharding to save memory
2024-06-25 20:19:09 -07:00
chenyu
7948b05738
fix uneven shard with shrink and pad args on sharded axis (#5131)
...
it's incorrect to assume all first (len(device)-1) shards would have the same size. e.g. size 2 shard 4 -> (1, 1, 0, 0)
2024-06-24 16:55:50 -04:00
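To illustrate the uneven-shard note in #5131 above: a minimal sketch, assuming each device receives a ceil-divided chunk until the axis is exhausted, which reproduces the `(1, 1, 0, 0)` example from the commit message. The helper name `shard_sizes` is hypothetical and is not taken from the tinygrad source:

```python
def shard_sizes(total: int, n_devices: int) -> tuple[int, ...]:
  # Hypothetical helper: per-device shard sizes under ceil-division chunking.
  chunk = -(-total // n_devices)  # ceil(total / n_devices)
  # Each device takes a full chunk while elements remain, then whatever is left, then 0.
  return tuple(max(0, min(chunk, total - i * chunk)) for i in range(n_devices))

print(shard_sizes(2, 4))  # (1, 1, 0, 0), the case from the commit message
print(shard_sizes(5, 4))  # (2, 2, 1, 0)
```

With 5 elements on 4 devices the first three shards are `(2, 2, 1)`, so code that assumes the first `len(device)-1` shards are all the same size would compute the wrong offsets.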
chenyu
4a7d403777
cleanup test_multitensor (#5118)
...
renamed d_zero, d0, d1, d2, ... to d0, d1, d2, d3 and reused some multi device tuples
2024-06-23 20:54:22 -04:00
chenyu
c0ba5e0dfb
multi copy_to_device return the copy on same device if possible (#5117)
...
previously it always returns from the first device
2024-06-23 20:25:56 -04:00
chenyu
b886d250fb
improve test_dropout_on_shard (#4912)
...
tested some basic property, also minor formatting for a few Tensor.training setups
2024-06-11 11:36:02 -04:00
George Hotz
35e53c0809
add sharded arange test (#4908)
2024-06-11 10:58:33 +02:00
chenyu
e33efd6a3d
test cases for multitensor adds const (#4892)
...
Tested const remained const in ast. Removed the TODO in _to_const_val too
2024-06-08 22:57:48 -04:00
nimlgen
e78a9bf3f2
support view in nv/amd (#4812)
...
* support view in nv/amd
* fix amd
* fix
* run test on nv/amd
2024-06-03 22:11:52 +03:00
qazal
637f482588
configure derandomizing CI tests (#4793)
2024-05-31 17:06:58 +03:00
George Hotz
07b350a8f4
new uops is an actual graph (#4560)
...
* new uops is an actual graph
* it's way slower
* simpler
* fix define acc
* render_loop unique
* ops test pass
* add pattern matcher back, there's bugs
* rewrite
* use priority queue
* recursive children
* fix tests
* fix tests with SINK
* fix abstractions
* fix assembly
* simpler
* link define_acc
* fix DEFINE_ACC placement
* type verify
* full cmp
* fix cmp
* ACCESS_ACC
* insert DEFINE_ACC
* fix PHI
* recursive rewrite
* fix many tests
* sum collapse
* more patterns
* correct change
* fold arange
* fix that lin test
* space
* big folding rule works
* close
* has more maxes, meh
* cached node replace
* set changed
* simplest folding yet
* works
* works
* DIV
* all tests pass
* del
* fuzz linearizer fails
* sum_collapse
* test depth 2 cf
* fix lin test 14
* fix clang depth
* disable that
* failure 14 is fixed
* fix ptx
* failure 27 is fixed
* fix llama
* run_cnt
* Revert "Optimize PTX gated loads index calculation (#4304)"
This reverts commit d97d5a7689.
* fix uops loop
* fix ptx bugs
* add barrier
* print
* mem_type in ptx direct
* bypass tests that fail in CI but pass locally
* ptx remove ptr_ar
* more ptx passing
* fix ptx tests
* assert compile support
* remove model inference benchmark from red
2024-05-17 18:00:18 -07:00
nimlgen
eb9689336e
nv mockgpu (#4600)
...
* mockgpu nv
* works
* comment that out
* fix merge
* setup gpuocelot
* install packages
* not run all of them
* passes
* fix ci
* almost
* should pass
* linter
* linter 2
* try this?
* ugn, not supported
* ci
* remove ticket from description
* better descs
2024-05-15 23:46:08 +03:00
George Hotz
5ba611787d
move image into tensor.py. delete features (#4603)
...
* move image into tensor.py
* change setup.py
* openpilot tests need pythonpath now
2024-05-15 10:50:25 -07:00
George Hotz
2f970a4fc2
all realize 2 (#4527)
...
* all realize 2
* tests fixup
* fix more tests
* fix openpilot
* fix tests
* unneeded
2024-05-10 22:43:09 -07:00
George Hotz
89e119bc58
move Allocator to buffer.py (#4502)
...
* move Allocator to buffer.py
* move those to realize
* memory file
* cleanup
2024-05-09 19:45:56 -07:00
George Hotz
c9e84ed0da
refactor to Program class (#4476)
...
* refactor to Program class
* switch to Program
* fix tests
* smaller diff
* self.p
* more tests
* fix metal test
* tests
* fix openpilot
* move that to linearizer
* p.launchdims
2024-05-09 17:29:07 -07:00
George Hotz
17faae091b
optimizer shouldn't be run without training (#4460)
...
* optimizer shouldn't be run without training
* set training in relevant tests
* fix multitensor
* that too
2024-05-06 15:34:12 -07:00