Commit Graph

4427 Commits

Author SHA1 Message Date
chenyu
2b0ee74bb6 lshift and rshift (#4591) 2024-05-14 19:16:31 -04:00
nimlgen
45e7400e3c start amd cleanup (#4583) 2024-05-14 22:59:59 +03:00
chenyu
a65c8de735 move .half() llama freq_cis to the end of sin and cos (#4587)
otherwise arange has inf if either dim or context length exceeds half.max
2024-05-14 15:00:18 -04:00
qazal
9aa5e02229 update llmc export (#4584)
* update example

* move train to optim

* rename

* b2
2024-05-14 21:18:38 +03:00
qazal
355e1c135c pad fusion tests (#4570)
* what breaks

* Revert "what breaks"

This reverts commit e79f679283.

* simplest case

* one unsafe op

* expand+pad, shrink+pad

* safe case

* refactor
2024-05-14 20:34:46 +03:00
chenyu
7afca52796 replace pow in LAMB by tracking b1**t and b2**t per step (#4582)
* replace pow in LAMB by tracking b1**t and b2**t per step

* remove t, add [self.b1_t, self.b2_t] to return

* adam has one less kernel
2024-05-14 13:08:22 -04:00
nimlgen
9b02aef45a remove rhip (#4579)
* remove rhip

* remove hip runner
2024-05-14 17:58:19 +03:00
Szymon Ożóg
5eb81ff764 Fix speed compare script (#4581)
* Fix speed compare script

* Update speed_compare_cuda_ptx.py

* Update speed_compare_cuda_ptx.py

* Remove unused function
2024-05-14 17:47:03 +03:00
nimlgen
2131556c2c amd mockgpu (#4535)
* start mock amd gpu

* virt files

* cleaner

* init ci

* small fixes

* linter

* better?

* ugh

* linter

* fix

* disable some

* run shorter

* fixes

* add hcq test

* fix

* fix cmd revert
2024-05-14 14:28:04 +03:00
geohotstan
089eeec271 setitem in-place operator tests (#4577)
* tests and error

* rename to in-place

* add a note

* more comments

* more comments

* disable folded advanced setitem tests for now
2024-05-14 01:28:02 -04:00
chenyu
0fa57b8ce9 raise error if setitem tensors have requires_grad (#4575)
* raise error if setitem tensors have requires_grad

working on supporting this, first properly raises error

* NotImplementedError
2024-05-13 18:56:47 -04:00
Filip Brzek
f7d08bd454 feat: add acc_dtype to einsum (#4571) 2024-05-13 14:02:07 -04:00
Szymon Ożóg
d97d5a7689 Optimize PTX gated loads index calculation (#4304)
* WIP but working

* Cleanup

* Remove float4 pred and alt

* Cleanup

* this is somehow slowing it down

* Simplify

* add define var to ignore when optimizing gates

* Update assembly.py

* Test for optimizing gated loads

* Cleanup

* Fix NEG needed before if

* Remove unused parameters

* Update assembly.py

* Fix for cachable gone

---------

Co-authored-by: oz <oz@oz-MS-7B86.NAT.gliwice.vectranet.pl>
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-05-13 10:14:01 -07:00
qazal
c67b70ca67 small scheduler refactor (#4569)
* outputs

* consistent

* more style

* doesnt need tuple
2024-05-13 10:47:39 +03:00
qazal
77aa8659f5 use assign_targets in LazyOp creation (#4568)
* start

* correct error

* this is possible

* document it
2024-05-13 10:24:35 +03:00
qazal
b0fa97e176 assert error detail in test_assign (#4567)
* use regex assert

* that shouldnt raise
2024-05-13 09:56:05 +03:00
chenyu
25ec40ca93 cleanup dtype of tensor creation from list (#4566) 2024-05-13 02:47:41 -04:00
qazal
4e1135a0bc assign buffer read/write tests (#4565)
* simple tests

* more tests
2024-05-13 09:43:36 +03:00
George Hotz
b660f60125 all uops are now cachable (#4564)
* all uops are now cachable

* cachable is gone
2024-05-12 22:34:35 -07:00
George Hotz
02327b8adf simple stuff from new_uops branch (#4563) 2024-05-12 22:18:05 -07:00
ziereis
f53a23d21e Test for optim assertion (#4558)
* add test for assertion

* whitespace

* restore state

---------

Co-authored-by: Thomas Ziereis <thomas.ziereis@web.de>
2024-05-12 14:21:28 -07:00
wozeparrot
d7670f8141 quantized llama multilazybuffer fix (#4557) 2024-05-12 14:19:21 -07:00
ziereis
bcee4743ce fix error message (#4556)
* fix error message

* typo

* add suggestion to fix error

---------

Co-authored-by: Thomas Ziereis <thomas.ziereis@web.de>
2024-05-12 12:35:51 -07:00
chenyu
01a0c1a948 slightly faster nf4 llama (#4542) 2024-05-12 14:24:42 -04:00
qazal
4c232dc0ae refactor LoadOps scheduling (#4553)
* refactor

* op -> lop
2024-05-12 12:59:24 +03:00
qazal
3da152f0fe scheduler docs 2 (#4551)
* docs

* delete cleanups
2024-05-12 12:15:39 +03:00
wozeparrot
e07c7668b3 nf4 llama (#4540) 2024-05-11 22:22:34 -07:00
George Hotz
7a26bdac65 move scheduleitem to schedule.py (#4541)
* move scheduleitem to schedule.py

* don't need that type checking anymore
2024-05-11 21:13:04 -07:00
George Hotz
508e8a6666 add cpu objdump to LLVM/CLANG (#4537) 2024-05-11 14:28:44 -07:00
chenyu
bed70b130c mlperf bert getenv-able EVAL_STEP_FREQ (#4534) 2024-05-11 14:36:56 -04:00
George Hotz
328b083e66 lil profiling script 2024-05-11 11:02:44 -07:00
chenyu
da10cf0be1 extra/threefry.py for mem usage (#4533)
for now it needs 8N mem to generate size N rand
2024-05-11 13:46:44 -04:00
chenyu
8a0fb3d765 delete old extra/autopad.py (#4532) 2024-05-11 13:06:10 -04:00
chenyu
04a4980a51 touchup bert script (#4531)
small adjustments, remove duplicated training setting and stop the script once target is hit
2024-05-11 13:02:02 -04:00
qazal
4871476a1e move copy kernel to out of schedule ordering (#4530)
* delete from sorting

* move the logic
2024-05-11 14:44:44 +03:00
qazal
2fb564c125 multi reduce linearizer tests start (#4529)
* test_end_local

* test_early_end_local

* todos

* mean+std

* skip no locals
2024-05-11 14:06:40 +03:00
qazal
3cba22920f test_linearizer_correctness (#4458)
* test helper

* uops asserts

* cleanup args

* nits
2024-05-11 13:02:08 +03:00
qazal
b3d9fd48d0 infra for testing linearizer correctness (#4528)
* refactor outbufs

* delete helper
2024-05-11 12:10:33 +03:00
George Hotz
2f970a4fc2 all realize 2 (#4527)
* all realize 2

* tests fixup

* fix more tests

* fix openpilot

* fix tests

* unneeded
2024-05-10 22:43:09 -07:00
wozeparrot
d2c347fc74 faster gather for bert (#4526) 2024-05-10 22:28:48 -07:00
George Hotz
922e6e056a hotfix: fix docs 2024-05-10 21:51:35 -07:00
George Hotz
347a3acb37 add renderer class (#4524)
* add renderer class

* tests pass

* fix pylint

* fix tensor cores
2024-05-10 21:40:02 -07:00
chenyu
b00b6b16f0 fix TRAIN_BEAM and Tensor.training for mlperf bert (#4525)
also hard coded bert model config instead of looking up a file
2024-05-11 00:18:36 -04:00
chenyu
7fab8c9e17 add symbolic mean test cases in test_symbolic_ops and test_symbolic_jit (#4523)
* add symbolic mean test cases in test_symbolic_ops and test_symbolic_jit

2d symbolic mean in jit does not quite work, order of the variable inputs are not deterministic?

* skip
2024-05-10 23:19:55 -04:00
George Hotz
827058f030 update tests get_runner (#4522) 2024-05-10 20:09:22 -07:00
George Hotz
a0448ff595 use copy kernel in schedule (#4520)
* use copy kernel in schedule

* imports
2024-05-10 15:30:33 -07:00
chenyu
b15e2309bd verbose error message in getitem (#4519)
* verbose error message in getitem

still hard to understand, at least it prints what it's trying to expand

* sure

* :
2024-05-10 17:25:41 -04:00
George Hotz
d438d5698d bring buffer back to device (#4517) 2024-05-10 11:22:31 -07:00
qazal
a2b707a3eb scheduler comments 1 (#4515) 2024-05-10 20:44:28 +03:00
George Hotz
4eef1ee9bf move renderer into options (#4514)
* move renderer into options

* fix tests

* renders are functions
2024-05-10 10:01:51 -07:00