Commit Graph

4433 Commits

terafo
5e6d2155e4 Add driving monitoring model to benchmarks (#4134)
* add driving monitoring model to benchmarks

* handle crash
2024-04-10 14:27:03 -04:00
geohotstan
fe88591890 update onnx to 1.16.0 (#4127)
* update

* pass tests and skip tests
2024-04-10 11:19:13 -04:00
chenyu
6bbbeb93ac skip a few CLANG tests that took > 30 seconds in CI (#4126)
* skip slow CLANG test test_train_cifar

* skip those too

* and that

* only CI

* one more
2024-04-10 02:00:34 -04:00
qazal
42edae8935 pickle schedules (#4114)
* pickle schedules

* Update test_pickle.py

* Update test_pickle.py

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-04-09 13:47:25 -07:00
George Hotz
ae849d12d7 numpy device + pickle it (#4120) 2024-04-09 13:19:30 -07:00
David González Martínez
980124a605 add lerp operation to tensor (#4102)
* feat: add lerp operation to tensor

* fix

* style: fit in one line

* tests: test backward for lerp
2024-04-08 17:03:27 -07:00
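
For reference, lerp computes start + (end - start) * weight; a minimal standalone sketch of the semantics (plain Python, not the tinygrad implementation):

```python
# Hypothetical sketch of a tensor-style lerp: start + (end - start) * weight.
# Mirrors torch.lerp semantics; weight == 0 returns start, weight == 1 returns end.
def lerp(start, end, weight):
    return start + (end - start) * weight

# usage with plain floats; a Tensor version would broadcast the same way
assert lerp(2.0, 10.0, 0.25) == 4.0
```
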
chenyu
dbd39ab78a setitem support setting python const (#4111) 2024-04-08 11:37:50 -04:00
chenyu
92c0675ccf setitem initial support (#4093)
* wip setitem

it's an eager assign to the output ShapeTracker view

* cleanups and tests

* more cleanups
2024-04-07 20:35:22 -04:00
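
A rough usage sketch of what eager setitem enables (values illustrative; the exact constraints at this commit may differ, and the `contiguous()` call here is an assumption):

```python
from tinygrad import Tensor

t = Tensor.zeros(4).contiguous()
t[2] = 1.0            # eager assign into the output ShapeTracker view
print(t.numpy())      # expected: [0. 0. 1. 0.]
```
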
geohotstan
183708b3fd broadcast expand to match torch (#4085)
* initial version

* heh gimme grrrreen

* version 2

* clean ups

* some test confusion

* fix onnx

* rename to _broadcast_tensors

* improved errors and test

* fixed?

* some test fixup

* version 3 lol

* comments

* cleaner

* add failure test for expand to 0 test

* 1 more assertRaises test

* make err msg better

* also rewrite the expand onnx op? :s
2024-04-07 16:23:13 -04:00
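
For background: torch-style broadcasting right-aligns shapes and requires each aligned pair of dimensions to be equal or 1 (expanding a real dimension to 0 is rejected, which the new failure tests cover). A small illustrative helper, not the repo's _broadcast_tensors:

```python
# Illustrative torch-style broadcast shape computation (not tinygrad's code).
def broadcast_shape(*shapes: tuple[int, ...]) -> tuple[int, ...]:
    ndim = max(len(s) for s in shapes)
    # right-align by padding with leading 1s
    padded = [(1,) * (ndim - len(s)) + s for s in shapes]
    out = []
    for dims in zip(*padded):
        sizes = {d for d in dims if d != 1}
        if len(sizes) > 1:
            raise ValueError(f"cannot broadcast {shapes}")
        out.append(sizes.pop() if sizes else 1)
    return tuple(out)

assert broadcast_shape((3, 1), (1, 4)) == (3, 4)
assert broadcast_shape((5,), (2, 1, 5)) == (2, 1, 5)
```
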
uuuvn
2b81d9b334 Fix broken test (#4104) 2024-04-07 12:02:12 -04:00
uuuvn
bb7567b365 Fix metal (#4101) 2024-04-07 05:21:19 -07:00
chenyu
bdbcac67f1 assign jit test case with other tensor as input (#4098)
hmm it works
2024-04-06 14:41:14 -04:00
George Hotz
164329a8ea address kfd feedback (#4087)
* address kfd feedback

* signals cleanup

* signals cleanup

* handle 2 doorbell pages correctly

* signal reset cleanup

* signals cleanup

* more GTT

* cleanups

* minor cleanups
2024-04-05 15:24:41 -07:00
Akshit Talwar
750ecf8fef replace slice by pad/shrink in _pool (#4082) 2024-04-05 11:47:22 -04:00
George Hotz
a337922c44 more work on kfd (#4079)
* more work on kfd

* fix multitensor test on kfd

* stuff
2024-04-05 08:36:36 -07:00
chenyu
e7ff5102cf failed test in test_pattern_matcher (#4080)
something about the PTX rewrite is incorrect: it produces duplicated rewritten uops
2024-04-05 02:53:50 -04:00
George Hotz
3de855ea50 don't use SVM memory in KFD (#4072)
* don't use SVM memory in KFD

* copy from fd

* cleanups

* transfer

* hacks

* ops_hsa

* tighter API
2024-04-04 17:33:21 -07:00
chenyu
c1cffed1df add LazyOp.dtype (#4073)
an inferred cached_property.
removed all cases that used get_lazyop_info just to get the dtype of an op.
prereq to removing InterpretedFlopCounter
2024-04-04 17:38:19 -04:00
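
A hedged sketch of the cached_property pattern the message describes, using a simplified stand-in node rather than the real LazyOp:

```python
from functools import cached_property

class Node:
    """Simplified stand-in for a LazyOp-like AST node (illustrative only)."""
    def __init__(self, op, srcs=(), arg=None):
        self.op, self.srcs, self.arg = op, srcs, arg

    @cached_property
    def dtype(self):
        # leaves carry their dtype in arg; other ops infer it from sources.
        # computed once on first access, then cached on the instance.
        return self.arg if not self.srcs else self.srcs[0].dtype

x = Node("CONST", arg="float32")
y = Node("NEG", srcs=(x,))
assert y.dtype == "float32"
```
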
Szymon Ożóg
82b7b9655f test for dtype set (#4069) 2024-04-04 11:24:33 -04:00
geohotstan
1a1dd1c1a7 add and enable tests for indexing const folding (#4068)
* enable test in test_indexing

* added tests

* rename stuff

* delete a test case since it's loadops.copy
2024-04-04 10:46:28 -04:00
Szymon Ożóg
ba118abfec improved caching for pointer arithmetics in ptx (#3922)
* improved caching for pointer arithmetics

* Add test for pointer arithmetics caching

* Refactor test
2024-04-04 07:33:48 -07:00
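
A speculative sketch of the caching idea (the actual PTX renderer differs): identical base+offset address computations get emitted once, and later uses reuse the register.

```python
# Illustrative CSE-style cache for pointer arithmetic, keyed on (base, offset).
lines: list[str] = []
addr_cache: dict[tuple[str, int], str] = {}

def get_addr(base: str, offset: int) -> str:
    key = (base, offset)
    if key not in addr_cache:
        reg = f"%addr{len(addr_cache)}"
        lines.append(f"add.u64 {reg}, {base}, {offset};")
        addr_cache[key] = reg
    return addr_cache[key]

r0 = get_addr("%ptr0", 16)
r1 = get_addr("%ptr0", 16)   # cache hit: no second add.u64 emitted
assert r0 == r1 and len(lines) == 1
```
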
George Hotz
7181ffd630 HWCopyQueue in KFD (#4042)
* HWCopyQueue in KFD

* hw compute queue

* test

* move test

* more tests

* fix wait

* fix multimap

* mes crash

* tests pass but slow

* stuff is working

* one more test
2024-04-03 20:14:24 -07:00
chenyu
e3c0ac9fbf remove old envvar "OPT" (#4060) 2024-04-03 14:55:21 -04:00
chenyu
406cb5fd90 const fold ReduceOps (#4059) 2024-04-03 14:39:28 -04:00
chenyu
fe03725b21 const fold cast unrealized_unpadded_const (#4047)
* const fold unrealized_unpadded_const

changed the underlying arg directly

* CAST_BEFORE_VIEW folds some

* fix const index in getitem
2024-04-03 12:31:24 -04:00
Szymon Ożóg
e5a9bff899 Add pattern matcher tests, move uop transforms from assembly to pattern matcher (#4056)
2024-04-03 09:06:43 -07:00
chenyu
f61ed869f5 Use exec_alu for lazy const folding (#4039) 2024-04-02 20:52:05 -04:00
chenyu
85edc493b0 uops const fold rules to prevent tautological compare warnings (#4041)
* uops const fold rules to prevent tautological compare warnings

`bool < false` is false, `true < bool` is false, `a == a` is true, `a != a` is false

* not true for nan

* and nan does not work with llvm

* full truth table test

* revert a==a

* comments and indents
2024-04-02 16:45:58 -04:00
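
The rules in the message, sketched as plain Python (illustrative, not the actual pattern matcher; the `a == a` fold was reverted because it does not hold for NaN):

```python
# Constant-fold rules for boolean comparisons: for any bool b,
# b < False is False (nothing is less than False), and
# True < b is False (nothing is greater than True).
def fold_lt(lhs, rhs):
    if rhs is False: return False
    if lhs is True:  return False
    return None  # not foldable without knowing the values

VAR = "some_bool"                     # stands in for a non-constant uop
assert fold_lt(VAR, False) is False
assert fold_lt(True, VAR) is False
assert fold_lt(VAR, True) is None     # still depends on the value

# a != a folds to False only for non-floats: float('nan') != float('nan')
# is True, which is why the a == a rule had to be reverted.
```
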
Patrick Tsai
0147174ad6 Embedding in one kernel (#4036)
* Embedding is in one kernel

* embedding is one kernel

* rm extra line

* newline

* bert test counts state vars?

* add a test?

* move items around

---------

Co-authored-by: Patrick Tsai <patosai@users.noreply.github.com>
2024-04-02 11:38:21 -04:00
Dan Hoffman
5311b45053 re-enable has_local check for linearizer test (#4034)
Co-authored-by: Dan Hoffman <daniel.hoffman@intel.com>
2024-04-02 00:02:03 -04:00
George Hotz
7425a0c646 CommandQueue is the future (#3950)
* start of command queue

* cq work

* runs

* cleanup

* outs set

* read is gone

* future buffer work

* command queue is better

* command queue works

* loadops

* delete unneeded

* command queue works

* upd

* fix tests

* use CommandQueue in compile

* delay sync
2024-04-01 17:35:48 -07:00
chenyu
82440d3416 don't call contiguous for unpadded const into multi tensor (#4032)
* don't call contiguous for unpadded const into multi tensor

fixed multi const folding for sharded const.
still WIP; need to be careful that this does not break the multi-device cache somewhere

* ehh need a memory test for that

* simple sharded memory test
2024-04-01 19:22:14 -04:00
chenyu
77a68fc52f test examples for multi tensor const folding (#4031)
works with a literal const operand now because it's copied to each shard and handled by lazy.
does not work for a sharded const
2024-04-01 16:53:43 -04:00
chenyu
379d52548d const fold left const operand for ADD and MUL (#4029)
* const fold left const operand for ADD and MUL

* neg have dtype issue
2024-04-01 15:09:04 -04:00
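
Since ADD and MUL commute, a left-hand constant can be normalized to the right so the existing folds apply; an illustrative sketch (names hypothetical):

```python
# Normalize a left-hand constant to the right so the existing
# x + 0 -> x and x * 1 -> x folds apply (valid since ADD/MUL commute).
def fold_binop(op, lhs, rhs):
    if isinstance(lhs, (int, float)) and op in ("ADD", "MUL"):
        lhs, rhs = rhs, lhs  # swap the const to the right
    if op == "ADD" and rhs == 0: return lhs
    if op == "MUL" and rhs == 1: return lhs
    return (op, lhs, rhs)

assert fold_binop("ADD", 0, "x") == "x"
assert fold_binop("MUL", 1, "x") == "x"
```
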
chenyu
0e02d074bd fix Tensor.pow folding for exponent 0 and 1 (#4025) 2024-03-31 19:57:23 -04:00
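
The identities involved, as a hedged sketch: x ** 1 is x unchanged, and x ** 0 folds to 1 (with 0 ** 0 == 1 by the usual tensor-library convention):

```python
def fold_pow(x, exponent):
    # x ** 1 -> x; x ** 0 -> 1 everywhere (0 ** 0 == 1 by convention)
    if exponent == 1: return x
    if exponent == 0: return 1.0
    return None  # no fold; fall through to the general pow lowering

assert fold_pow("x", 1) == "x" and fold_pow("x", 0) == 1.0
```
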
mmmkkaaayy
a4ae9352bd delete irrelevant JIT regression test (#4024) 2024-03-31 19:35:35 -04:00
chenyu
d3f27761b0 move const folding of ADD/SUB/MUL from tensor to lazy (#4020)
* move const folding of ADD/SUB/MUL from tensor to lazy

will do div and pow separately.

* fix onnx adding with None
2024-03-31 16:35:36 -04:00
chenyu
7f859593b8 fix _to_const_val and const folding around it (#4017)
* fix _to_const_val and const folding around it

is_unrealized_contiguous_const is too strict and is almost never hit if the const is expanded.
it suffices to check that there's no pad

* that test is folded

* test_const_folding
2024-03-31 13:09:23 -04:00
chenyu
c71627fee6 move GlobalCounter to helpers (#4002)
break circular import between ops and buffer
2024-03-30 00:30:30 -04:00
George Hotz
9eef44521b ScheduleItem uses Buffer (#3995)
* schedule Buffer

* update

* update tests

* master

* works

* remove LoadOps.WAIT

* fix compile2

* bad test

* rename and note
2024-03-29 20:50:27 -07:00
George Hotz
8f1e34a2a0 early src delete (#3996)
* early src delete

* fix bad test

* fix test_linearizer
2024-03-29 19:46:07 -07:00
George Hotz
f916aadaea external that test 2024-03-29 19:35:50 -07:00
George Hotz
c42ed8e99c don't reschedule 2024-03-29 19:17:37 -07:00
chenyu
b43e470f80 always use f32 for rand source of randn (#3998)
* always use f32 for source of randn

fixed bfloat16 randn to not produce inf.
don't really care about float64; threefry is float32-based too

* HSA is broken
2024-03-29 17:04:34 -04:00
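
Tensor.randn derives Gaussians from uniform randoms via a Box-Muller-style transform; keeping the uniform source in float32 avoids low-precision intermediates rounding toward 0 and blowing up in the log. A standalone sketch of the idea (pure Python, not tinygrad's kernel):

```python
import math, random

# Draw Gaussian samples via Box-Muller from full-precision uniforms, then
# cast down afterwards. Doing the transform in a low-precision dtype like
# bfloat16 can round u1 to 0 and make log(u1) blow up to -inf.
def randn_boxmuller():
    u1 = max(random.random(), 1e-7)  # avoid log(0)
    u2 = random.random()
    return math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)

samples = [randn_boxmuller() for _ in range(1000)]
assert all(math.isfinite(s) for s in samples)
```
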
chenyu
6b6461122e test case Tensor.randn should be finite (#3994)
* test case Tensor.randn should be finite

there's a hack to fix float16; we need a generic solution that works with bf16 and threefry

* skip not supported

* bfloat16 local is wrong

* skip RHIP
2024-03-29 14:51:02 -04:00
chenyu
d9ff636cf5 use is to compare with enum (#3993)
* use is to compare with enum

currently it's a mix of `==` and `is`; moved all to `is`

* more
2024-03-29 13:02:56 -04:00
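
This works because Python enum members are singletons, so identity comparison is reliable and stricter than `==`:

```python
from enum import Enum, auto

class Ops(Enum):
    ADD = auto()
    MUL = auto()

op = Ops.ADD
assert op is Ops.ADD        # enum members are singletons: identity is safe
assert op is not Ops.MUL
assert (op == 1) is False   # == silently returns False across types;
                            # `is` makes such mismatches easier to catch
```
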
chenyu
7bc560ec49 remove outdated bf16 comments in test_dtype (#3987) 2024-03-29 00:56:18 -04:00
uuuvn
8a40d7d423 Shape changing bitcast and assert bitcast in disk (#3973)
* Shape changing bitcast

* only support it on disk

* basic test

* more tests

* RuntimeError instead of assert

* create unique temp files

* move tests that use disk to test_disk_tensor

* linter

* remove assert on error messages

* that's RuntimeError now

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-03-28 21:49:10 -07:00
chenyu
793ab0512e use ctypes to truncate float64 and float32 in uops (#3986)
this fixed the softmax.argmax bug for ops_python, as the float is now truncated to float32
2024-03-28 23:56:50 -04:00
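
For reference, the ctypes round-trip that truncates a Python float (a C double) to float32 precision:

```python
import ctypes

def truncate_f32(x: float) -> float:
    # round-tripping through a C float drops the extra float64 mantissa bits
    return ctypes.c_float(x).value

assert truncate_f32(1/3) != 1/3            # float32 can't represent 1/3 exactly
assert abs(truncate_f32(1/3) - 1/3) < 1e-7
```
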
chenyu
c4c243f79d update test_uops _equal to use assert_allclose (#3981)
it handles NaN
2024-03-28 22:14:45 -04:00