terafo
5e6d2155e4
Add driving monitoring model to benchmarks (#4134)
* add driving monitoring model to benchmarks
* handle crash
2024-04-10 14:27:03 -04:00
geohotstan
fe88591890
update onnx to 1.16.0 (#4127)
* update
* pass tests and skip tests
2024-04-10 11:19:13 -04:00
chenyu
6bbbeb93ac
skip a few clang tests that took > 30 seconds in CI (#4126)
* skip slow CLANG test test_train_cifar
* skip those too
* and that
* only CI
* one more
2024-04-10 02:00:34 -04:00
qazal
42edae8935
pickle schedules (#4114)
* pickle schedules
* Update test_pickle.py
* Update test_pickle.py
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-04-09 13:47:25 -07:00
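A hedged sketch of what picklable schedules enable: serialize the schedule of a lazy computation and reload it later, e.g. for caching or cross-process compilation. `Tensor.schedule()` here is an assumption; the schedule-creation API has moved between tinygrad versions.

```python
import pickle
from tinygrad import Tensor

t = Tensor.ones(4) + Tensor.ones(4)
sched = t.schedule()                 # assumption: returns a list of ScheduleItems
restored = pickle.loads(pickle.dumps(sched))
assert len(restored) == len(sched)   # the schedule survives a pickle roundtrip
```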
George Hotz
ae849d12d7
numpy device + pickle it (#4120)
2024-04-09 13:19:30 -07:00
David González Martínez
980124a605
add lerp operation to tensor (#4102)
* feat: add lerp operation to tensor
* fix
* style: fit in one line
* tests: test backward for lerp
2024-04-08 17:03:27 -07:00
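A minimal usage sketch of the new lerp, assuming the torch-style signature `start.lerp(end, weight)` computing `start + weight * (end - start)`:

```python
from tinygrad import Tensor

start, end = Tensor([0.0, 10.0]), Tensor([1.0, 20.0])
out = start.lerp(end, 0.5)   # elementwise start + 0.5 * (end - start)
print(out.numpy())           # [ 0.5 15. ]
```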
chenyu
dbd39ab78a
setitem support setting python const (#4111)
2024-04-08 11:37:50 -04:00
chenyu
92c0675ccf
setitem initial support (#4093)
* wip setitem
it's an eager assign to output shapetracker view
* cleanups and tests
* more cleanups
2024-04-07 20:35:22 -04:00
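A sketch of the usage these two setitem PRs enable (an eager assign into a view of the output); early setitem had restrictions, so a contiguous realized target is assumed here:

```python
from tinygrad import Tensor

t = Tensor.zeros(4).contiguous()
t[1:3] = Tensor([5.0, 6.0])   # initial support: assign a tensor into a view (#4093)
t[0] = -1.0                   # follow-up: python consts work too (#4111)
print(t.numpy())              # [-1.  5.  6.  0.]
```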
geohotstan
183708b3fd
broadcast expand to match torch (#4085)
* initial version
* heh gimme grrrreen
* version 2
* clean ups
* some test confusion
* fix onnx
* rename to _broadcast_tensors
* improved errors and test
* fixed?
* some test fixup
* version 3 lol
* comments
* cleaner
* add failure test for expand to 0 test
* 1 more assertRaises test
* make err msg better
* also rewrite the expand onnx op? :s
2024-04-07 16:23:13 -04:00
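For reference, torch-style broadcasting treats size-1 dims as expandable and rejects expanding any other size; a small sketch:

```python
from tinygrad import Tensor

a, b = Tensor.ones(3, 1), Tensor.ones(4)
print((a + b).shape)          # (3, 4): the size-1 dim broadcasts against 4
print(a.expand(3, 4).shape)   # (3, 4): explicit expand of a size-1 dim
# matching torch, expanding a dim whose size isn't 1 raises,
# e.g. Tensor.ones(3).expand(2)
```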
uuuvn
2b81d9b334
Fix broken test (#4104)
2024-04-07 12:02:12 -04:00
uuuvn
bb7567b365
Fix metal (#4101)
2024-04-07 05:21:19 -07:00
chenyu
bdbcac67f1
assign jit test case with other tensor as input (#4098)
hmm it works
2024-04-06 14:41:14 -04:00
George Hotz
164329a8ea
address kfd feedback (#4087)
* address kfd feedback
* signals cleanup
* signals cleanup
* handle 2 doorbell pages correctly
* signal reset cleanup
* signals cleanup
* more GTT
* cleanups
* minor cleanups
2024-04-05 15:24:41 -07:00
Akshit Talwar
750ecf8fef
replace slice by pad/shrink in _pool (#4082)
2024-04-05 11:47:22 -04:00
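pad and shrink are the movement-op primitives a slice decomposes into; a quick illustration, assuming the per-dim `(begin, end)` / `(pad_before, pad_after)` argument convention:

```python
from tinygrad import Tensor

t = Tensor.arange(6).reshape(2, 3)
s = t.shrink(((0, 2), (1, 3)))   # like t[:, 1:3]: keep rows [0,2), cols [1,3)
p = t.pad(((0, 0), (1, 1)))      # zero-pad one column on each side
print(s.shape, p.shape)          # (2, 2) (2, 5)
```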
George Hotz
a337922c44
more work on kfd (#4079)
* more work on kfd
* fix multitensor test on kfd
* stuff
2024-04-05 08:36:36 -07:00
chenyu
e7ff5102cf
failed test in test_pattern_matcher (#4080)
something about the PTX rewrite is incorrect: it produces duplicated rewritten uops
2024-04-05 02:53:50 -04:00
George Hotz
3de855ea50
don't use SVM memory in KFD (#4072)
* don't use SVM memory in KFD
* copy from fd
* cleanups
* transfer
* hacks
* ops_hsa
* tighter API
2024-04-04 17:33:21 -07:00
chenyu
c1cffed1df
add LazyOp.dtype (#4073)
an inferred cached_property.
removed all cases that use get_lazyop_info just to get the dtype of an op.
prereq to remove InterpretedFlopCounter
2024-04-04 17:38:19 -04:00
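An illustrative sketch (not tinygrad's actual code) of the inferred cached_property pattern this commit describes: the dtype is derived from the op's sources on first access, then memoized:

```python
from functools import cached_property

class LazyOp:
  def __init__(self, op, src=(), arg=None):
    self.op, self.src, self.arg = op, src, arg

  @cached_property
  def dtype(self):
    # illustrative inference: take the dtype of the first source;
    # computed once, then cached on the instance
    return self.src[0].dtype if self.src else self.arg
```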
Szymon Ożóg
82b7b9655f
test for dtype set (#4069)
2024-04-04 11:24:33 -04:00
geohotstan
1a1dd1c1a7
add and enable tests for indexing const folding (#4068)
* enable test in test_indexing
* added tests
* rename stuff
* del a test case cuz it's loadops.copy
2024-04-04 10:46:28 -04:00
Szymon Ożóg
ba118abfec
improved caching for pointer arithmetic in ptx (#3922)
* improved caching for pointer arithmetic
* Add test for pointer arithmetic caching
* Refactor test
2024-04-04 07:33:48 -07:00
George Hotz
7181ffd630
HWCopyQueue in KFD (#4042)
* HWCopyQueue in KFD
* hw compute queue
* test
* move test
* more tests
* fix wait
* fix multimap
* mes crash
* tests pass but slow
* stuff is working
* one more test
2024-04-03 20:14:24 -07:00
chenyu
e3c0ac9fbf
remove old envvar "OPT" (#4060)
2024-04-03 14:55:21 -04:00
chenyu
406cb5fd90
const fold ReduceOps (#4059)
2024-04-03 14:39:28 -04:00
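The idea, sketched: a reduce over an unrealized const has a value known at graph-build time, so it can resolve without running a reduce kernel.

```python
from tinygrad import Tensor

# sum of four unrealized ones is statically 4.0; with ReduceOps
# const folding this needs no reduce kernel
print(Tensor.ones(4).sum().item())  # 4.0
```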
chenyu
fe03725b21
const fold cast unrealized_unpadded_const (#4047)
* const fold unrealized_unpadded_const
changed the underlying arg directly
* CAST_BEFORE_VIEW folds some
* fix const index in getitem
2024-04-03 12:31:24 -04:00
Szymon Ożóg
e5a9bff899
Add pattern matcher tests, move uop transforms from assembly to pattern matcher (#4056)
2024-04-03 09:06:43 -07:00
chenyu
f61ed869f5
Use exec_alu for lazy const folding (#4039)
2024-04-02 20:52:05 -04:00
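An illustrative stand-in for exec_alu-style folding (a hypothetical table, not tinygrad's actual one): dispatch an ALU op to a Python function, so any op whose operands are all known consts can be evaluated immediately instead of scheduled:

```python
import operator

# hypothetical dispatch table standing in for exec_alu
ALU = {"ADD": operator.add, "MUL": operator.mul, "MAX": max}

def exec_alu(op: str, operands: tuple):
  return ALU[op](*operands)

# lazy const folding: both operands known, so evaluate now
assert exec_alu("ADD", (2, 3)) == 5
assert exec_alu("MUL", (4, 5)) == 20
```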
chenyu
85edc493b0
uops const fold rules to prevent tautological compare warnings (#4041)
* uops const fold rules to prevent tautological compare warnings
`bool < false` is false, `true < bool` is false, `a == a` is true, `a != a` is false
* not true for nan
* and nan does not work with llvm
* full truth table test
* revert a==a
* comments and indents
2024-04-02 16:45:58 -04:00
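A plain-Python sketch of the two `<` rules above (illustrative, not the actual uop pattern matcher); note the `a == a` rule was reverted because it does not hold for NaN:

```python
def fold_bool_lt(lhs: bool, rhs: bool):
  """Fold a bool-dtype compare; return False when tautologically false, else None."""
  if rhs is False: return False   # bool < False: nothing is less than False
  if lhs is True:  return False   # True < bool: nothing is greater than True
  return None                     # no fold applies

assert fold_bool_lt(True, False) is False
assert fold_bool_lt(True, True) is False
assert fold_bool_lt(False, True) is None   # False < True is genuinely True: no fold here
```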
Patrick Tsai
0147174ad6
Embedding in one kernel (#4036)
* Embedding is in one kernel
* embedding is one kernel
* rm extra line
* newline
* bert test counts state vars?
* add a test?
* move items around
---------
Co-authored-by: Patrick Tsai <patosai@users.noreply.github.com>
2024-04-02 11:38:21 -04:00
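Usage is unchanged; per the title, the gather now lowers to a single kernel. A minimal sketch with tinygrad's nn.Embedding:

```python
from tinygrad import Tensor
from tinygrad.nn import Embedding

emb = Embedding(vocab_size=10, embed_size=4)
out = emb(Tensor([[1, 2, 3]]))   # (1, 3) token ids -> (1, 3, 4) embeddings
print(out.shape)
```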
Dan Hoffman
5311b45053
re-enable has_local check for linearizer test (#4034)
Co-authored-by: Dan Hoffman <daniel.hoffman@intel.com>
2024-04-02 00:02:03 -04:00
George Hotz
7425a0c646
CommandQueue is the future (#3950)
* start of command queue
* cq work
* runs
* cleanup
* outs set
* read is gone
* future buffer work
* command queue is better
* command queue works
* loadops
* delete unneeded
* command queue works
* upd
* fix tests
* use CommandQueue in compile
* delay sync
2024-04-01 17:35:48 -07:00
chenyu
82440d3416
don't call contiguous for unpadded const into multi tensor (#4032)
* don't call contiguous for unpadded const into multi tensor
fixed multi const folding for sharded const.
still wip, need to be careful that this does not break multi device cache somewhere
* ehh need a memory test for that
* simple sharded memory test
2024-04-01 19:22:14 -04:00
chenyu
77a68fc52f
test examples for multi tensor const folding (#4031)
works with literal const operand now because it's copied to each shard and handled by lazy.
does not work for sharded const
2024-04-01 16:53:43 -04:00
chenyu
379d52548d
const fold left const operand for ADD and MUL (#4029)
* const fold left const operand for ADD and MUL
* neg has a dtype issue
2024-04-01 15:09:04 -04:00
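ADD and MUL are commutative, so a const on the left can fold exactly like one on the right; a quick sketch of the cases this covers:

```python
from tinygrad import Tensor

x = Tensor.ones(3)
y = 0.0 + x   # left-const ADD: folds to x itself
z = 1.0 * x   # left-const MUL: folds to x itself
print(y.shape, z.shape)
```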
chenyu
0e02d074bd
fix Tensor.pow folding for exponent 0 and 1 (#4025)
2024-03-31 19:57:23 -04:00
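The two fixed identities, sketched: `x ** 0` folds to ones and `x ** 1` folds to x itself:

```python
from tinygrad import Tensor

t = Tensor([2.0, 3.0])
print((t ** 1).numpy())   # [2. 3.]: folds to t, no pow kernel needed
print((t ** 0).numpy())   # [1. 1.]: folds to ones
```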
mmmkkaaayy
a4ae9352bd
delete irrelevant JIT regression test (#4024)
2024-03-31 19:35:35 -04:00
chenyu
d3f27761b0
move const folding of ADD/SUB/MUL from tensor to lazy (#4020)
* move const folding of ADD/SUB/MUL from tensor to lazy
will do div and pow separately.
* fix onnx adding with None
2024-03-31 16:35:36 -04:00
chenyu
7f859593b8
fix _to_const_val and const folding around it (#4017)
* fix _to_const_val and const folding around it
is_unrealized_contiguous_const is too strict and almost never hit if the const is expanded.
it suffices to check that there's no pad
* that test is folded
* test_const_folding
2024-03-31 13:09:23 -04:00
chenyu
c71627fee6
move GlobalCounter to helpers (#4002)
break circular import between ops and buffer
2024-03-30 00:30:30 -04:00
George Hotz
9eef44521b
ScheduleItem uses Buffer (#3995)
* schedule Buffer
* update
* update tests
* master
* works
* remove LoadOps.WAIT
* fix compile2
* bad test
* rename and note
2024-03-29 20:50:27 -07:00
George Hotz
8f1e34a2a0
early src delete (#3996)
* early src delete
* fix bad test
* fix test_linearizer
2024-03-29 19:46:07 -07:00
George Hotz
f916aadaea
external that test
2024-03-29 19:35:50 -07:00
George Hotz
c42ed8e99c
don't reschedule
2024-03-29 19:17:37 -07:00
chenyu
b43e470f80
always use f32 for rand source of randn (#3998)
* always use f32 for source of randn
fixed bfloat16 randn to not have inf.
don't really care about float64. threefry is float32 based too
* HSA is broken
2024-03-29 17:04:34 -04:00
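The fix in a sketch: generate the uniform source in float32 and only cast to the requested dtype at the end, so low-precision rounding in the source can't feed the degenerate inputs that made randn blow up to inf in bfloat16. Tensor.isinf is assumed available:

```python
from tinygrad import Tensor, dtypes

t = Tensor.randn(1024, dtype=dtypes.bfloat16)
assert not t.isinf().any().item()   # finite even in bfloat16
```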
chenyu
6b6461122e
test case Tensor.randn should be finite (#3994)
* test case Tensor.randn should be finite
there's a hack to fix float16, need a generic solution that works with bf16 and threefry
* skip not supported
* bfloat16 local is wrong
* skip RHIP
2024-03-29 14:51:02 -04:00
chenyu
d9ff636cf5
use is to compare with enum (#3993)
* use is to compare with enum
currently it's mixed between `==` and `is`, moved all to `is`
* more
2024-03-29 13:02:56 -04:00
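The rationale: enum members are singletons, so identity comparison is correct and avoids `__eq__` dispatch. A tiny example with a stand-in enum:

```python
from enum import Enum, auto

class BinaryOps(Enum):   # stand-in for tinygrad's op enums
  ADD = auto(); MUL = auto()

op = BinaryOps.ADD
assert op is BinaryOps.ADD       # members are singletons: `is` is safe
assert op is not BinaryOps.MUL
```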
chenyu
7bc560ec49
remove outdated bf16 comments in test_dtype (#3987)
2024-03-29 00:56:18 -04:00
uuuvn
8a40d7d423
Shape changing bitcast and assert bitcast in disk (#3973)
* Shape changing bitcast
* only support it on disk
* basic test
* more tests
* RuntimeError instead of assert
* create unique temp files
* move tests that use disk to test_disk_tensor
* linter
* remove assert on error messages
* that's RuntimeError now
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-03-28 21:49:10 -07:00
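A bitcast reinterprets bytes without converting values. A same-itemsize sketch with Tensor.bitcast (shape unchanged); per this PR, shape-changing bitcasts between dtypes of different itemsize are only supported on disk tensors:

```python
from tinygrad import Tensor, dtypes

t = Tensor([1.0], dtype=dtypes.float32)
u = t.bitcast(dtypes.uint32)   # same itemsize, so the shape stays (1,)
print(hex(u.item()))           # 0x3f800000: the IEEE-754 bits of 1.0
```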
chenyu
793ab0512e
use ctypes to truncate float64 and float32 in uops (#3986)
this fixed the softmax.argmax bug for ops_python as the float is truncated to float32
2024-03-28 23:56:50 -04:00
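The technique: round-trip a Python float (a C double) through ctypes.c_float to emulate float32 storage, which is what brought ops_python in line for softmax.argmax:

```python
import ctypes

def truncate_float32(x: float) -> float:
  # store x in a C float and read it back, dropping precision past float32
  return ctypes.c_float(x).value

print(truncate_float32(1 / 3))  # 0.3333333432674408, not 0.3333333333333333
```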
chenyu
c4c243f79d
update test_uops _equal to use assert_allclose (#3981)
it handles nan
2024-03-28 22:14:45 -04:00
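Why assert_allclose: unlike element-wise `==`, numpy's assert_allclose treats NaNs in matching positions as equal (equal_nan defaults to True):

```python
import numpy as np

a = np.array([1.0, np.nan])
b = np.array([1.0, np.nan])
np.testing.assert_allclose(a, b)   # passes: matching NaNs compare equal
assert not (a == b).all()          # plain == fails, since nan != nan
```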