Commit Graph

10417 Commits

Author SHA1 Message Date
chenyu
0fa57b8ce9 raise error if setitem tensors have requires_grad (#4575)
* raise error if setitem tensors have requires_grad

working on supporting this; for now it properly raises an error

* NotImplementedError
2024-05-13 18:56:47 -04:00
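The guard described in the commit above can be illustrated with a minimal stand-in class (hypothetical, not tinygrad's actual `Tensor`): in-place assignment raises `NotImplementedError` whenever the target or the assigned value is tracking gradients, rather than silently producing wrong gradients.

```python
class Tensor:
    """Minimal stand-in to illustrate the guard; not tinygrad's real class."""
    def __init__(self, data, requires_grad=False):
        self.data = list(data)
        self.requires_grad = requires_grad

    def __setitem__(self, idx, val):
        # setitem on gradient-tracked tensors isn't supported yet, so raise
        # instead of corrupting the autograd graph
        if self.requires_grad or (isinstance(val, Tensor) and val.requires_grad):
            raise NotImplementedError("setitem with requires_grad is not supported")
        self.data[idx] = val.data if isinstance(val, Tensor) else val

# plain tensors can be mutated in place
t = Tensor([1, 2, 3])
t[0] = 5

# gradient-tracked tensors refuse setitem
g = Tensor([1.0], requires_grad=True)
try:
    g[0] = 2.0
except NotImplementedError:
    pass
```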
Filip Brzek
f7d08bd454 feat: add acc_dtype to einsum (#4571) 2024-05-13 14:02:07 -04:00
Szymon Ożóg
d97d5a7689 Optimize PTX gated loads index calculation (#4304)
* WIP but working

* Cleanup

* Remove float4 pred and alt

* Cleanup

* this is somehow slowing it down

* Simplify

* add define var to ignore when optimizing gates

* Update assembly.py

* Test for optimizing gated loads

* Cleanup

* Fix NEG needed before if

* Remove unused parameters

* Update assembly.py

* Fix for cachable gone

---------

Co-authored-by: oz <oz@oz-MS-7B86.NAT.gliwice.vectranet.pl>
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-05-13 10:14:01 -07:00
qazal
c67b70ca67 small scheduler refactor (#4569)
* outputs

* consistent

* more style

* doesn't need tuple
2024-05-13 10:47:39 +03:00
qazal
77aa8659f5 use assign_targets in LazyOp creation (#4568)
* start

* correct error

* this is possible

* document it
2024-05-13 10:24:35 +03:00
qazal
b0fa97e176 assert error detail in test_assign (#4567)
* use regex assert

* that shouldn't raise
2024-05-13 09:56:05 +03:00
chenyu
25ec40ca93 cleanup dtype of tensor creation from list (#4566) 2024-05-13 02:47:41 -04:00
qazal
4e1135a0bc assign buffer read/write tests (#4565)
* simple tests

* more tests
2024-05-13 09:43:36 +03:00
George Hotz
b660f60125 all uops are now cachable (#4564)
* all uops are now cachable

* cachable is gone
2024-05-12 22:34:35 -07:00
George Hotz
02327b8adf simple stuff from new_uops branch (#4563) 2024-05-12 22:18:05 -07:00
ziereis
f53a23d21e Test for optim assertion (#4558)
* add test for assertion

* whitespace

* restore state

---------

Co-authored-by: Thomas Ziereis <thomas.ziereis@web.de>
2024-05-12 14:21:28 -07:00
wozeparrot
d7670f8141 quantized llama multilazybuffer fix (#4557) 2024-05-12 14:19:21 -07:00
ziereis
bcee4743ce fix error message (#4556)
* fix error message

* typo

* add suggestion to fix error

---------

Co-authored-by: Thomas Ziereis <thomas.ziereis@web.de>
2024-05-12 12:35:51 -07:00
chenyu
01a0c1a948 slightly faster nf4 llama (#4542) 2024-05-12 14:24:42 -04:00
qazal
4c232dc0ae refactor LoadOps scheduling (#4553)
* refactor

* op -> lop
2024-05-12 12:59:24 +03:00
qazal
3da152f0fe scheduler docs 2 (#4551)
* docs

* delete cleanups
2024-05-12 12:15:39 +03:00
wozeparrot
e07c7668b3 nf4 llama (#4540) 2024-05-11 22:22:34 -07:00
George Hotz
7a26bdac65 move scheduleitem to schedule.py (#4541)
* move scheduleitem to schedule.py

* don't need that type checking anymore
2024-05-11 21:13:04 -07:00
George Hotz
508e8a6666 add cpu objdump to LLVM/CLANG (#4537) 2024-05-11 14:28:44 -07:00
chenyu
bed70b130c mlperf bert getenv-able EVAL_STEP_FREQ (#4534) 2024-05-11 14:36:56 -04:00
George Hotz
328b083e66 lil profiling script 2024-05-11 11:02:44 -07:00
chenyu
da10cf0be1 extra/threefry.py for mem usage (#4533)
for now it needs 8N memory to generate N random numbers
2024-05-11 13:46:44 -04:00
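For context on the commit above: threefry is a counter-based RNG, so each output block is a pure function of a key and a counter, and any element of the stream can be generated independently of the rest. The following is a minimal pure-Python sketch of the threefry2x32 block function and a stream helper (an illustrative stand-in, not tinygrad's `extra/threefry.py`):

```python
MASK = 0xFFFFFFFF
# rotation constants for threefry2x32 (first four alternate with last four)
ROTATIONS = [13, 15, 26, 6, 17, 29, 16, 24]

def rotl32(x, r):
    """Rotate a 32-bit word left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK

def threefry2x32(key, counter, rounds=20):
    """One threefry2x32 block: maps (key, counter) -> two 32-bit words."""
    ks = [key[0], key[1], key[0] ^ key[1] ^ 0x1BD11BDA]
    x = [(counter[0] + ks[0]) & MASK, (counter[1] + ks[1]) & MASK]
    for i in range(rounds):
        x[0] = (x[0] + x[1]) & MASK
        x[1] = rotl32(x[1], ROTATIONS[i % 8]) ^ x[0]
        if i % 4 == 3:                      # inject key after every 4 rounds
            j = i // 4 + 1
            x[0] = (x[0] + ks[j % 3]) & MASK
            x[1] = (x[1] + ks[(j + 1) % 3] + j) & MASK
    return x[0], x[1]

def rand_u32(key, n):
    """Generate n 32-bit words; block i depends only on (key, i), so there is
    no sequential state and elements can be produced in any order."""
    out = []
    for i in range(0, n, 2):
        out.extend(threefry2x32(key, (i, 0)))
    return out[:n]
```

Because the counter (not mutable state) drives generation, intermediate buffers proportional to the output are the main memory cost, which is where a constant-factor overhead like the 8N noted in the commit can come from.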
chenyu
8a0fb3d765 delete old extra/autopad.py (#4532) 2024-05-11 13:06:10 -04:00
chenyu
04a4980a51 touchup bert script (#4531)
small adjustments: remove the duplicated training setting and stop the script once the target is hit
2024-05-11 13:02:02 -04:00
qazal
4871476a1e move copy kernel to out of schedule ordering (#4530)
* delete from sorting

* move the logic
2024-05-11 14:44:44 +03:00
qazal
2fb564c125 multi reduce linearizer tests start (#4529)
* test_end_local

* test_early_end_local

* todos

* mean+std

* skip no locals
2024-05-11 14:06:40 +03:00
qazal
3cba22920f test_linearizer_correctness (#4458)
* test helper

* uops asserts

* cleanup args

* nits
2024-05-11 13:02:08 +03:00
qazal
b3d9fd48d0 infra for testing linearizer correctness (#4528)
* refactor outbufs

* delete helper
2024-05-11 12:10:33 +03:00
George Hotz
2f970a4fc2 all realize 2 (#4527)
* all realize 2

* tests fixup

* fix more tests

* fix openpilot

* fix tests

* unneeded
2024-05-10 22:43:09 -07:00
wozeparrot
d2c347fc74 faster gather for bert (#4526) 2024-05-10 22:28:48 -07:00
George Hotz
922e6e056a hotfix: fix docs 2024-05-10 21:51:35 -07:00
George Hotz
347a3acb37 add renderer class (#4524)
* add renderer class

* tests pass

* fix pylint

* fix tensor cores
2024-05-10 21:40:02 -07:00
chenyu
b00b6b16f0 fix TRAIN_BEAM and Tensor.training for mlperf bert (#4525)
also hard-coded the bert model config instead of looking it up from a file
2024-05-11 00:18:36 -04:00
chenyu
7fab8c9e17 add symbolic mean test cases in test_symbolic_ops and test_symbolic_jit (#4523)
* add symbolic mean test cases in test_symbolic_ops and test_symbolic_jit

2d symbolic mean in jit does not quite work; the order of the variable inputs is not deterministic?

* skip
2024-05-10 23:19:55 -04:00
George Hotz
827058f030 update tests get_runner (#4522) 2024-05-10 20:09:22 -07:00
George Hotz
a0448ff595 use copy kernel in schedule (#4520)
* use copy kernel in schedule

* imports
2024-05-10 15:30:33 -07:00
chenyu
b15e2309bd verbose error message in getitem (#4519)
* verbose error message in getitem

still hard to understand, but at least it prints what it's trying to expand

* sure

* :
2024-05-10 17:25:41 -04:00
George Hotz
d438d5698d bring buffer back to device (#4517) 2024-05-10 11:22:31 -07:00
qazal
a2b707a3eb scheduler comments 1 (#4515) 2024-05-10 20:44:28 +03:00
George Hotz
4eef1ee9bf move renderer into options (#4514)
* move renderer into options

* fix tests

* renders are functions
2024-05-10 10:01:51 -07:00
George Hotz
7c630a9a53 hotfix: fix llama spacing + fix hcq 2024-05-10 15:10:13 +00:00
George Hotz
58e7256ce9 restore hcq graph (#4513)
* Reapply "hcq graph (#4380)" (#4512)

This reverts commit 06c1e7498e.

* bring back hcq graph
2024-05-10 07:45:05 -07:00
George Hotz
06c1e7498e Revert "hcq graph (#4380)" (#4512)
This reverts commit 84a2e2b8c1.
2024-05-10 07:18:09 -07:00
nimlgen
84a2e2b8c1 hcq graph (#4380)
* start hcq graph

* hack-fix sync on amd

* nv

* fix nv

* multigraph

* fixes

* temp fix for graph

* this is not needed

* fix

* cleaner

* linter

* fix none

* faster cuda copy

* faster amd copy

* temp nv fixes

* alloc on gpu

* exp: faster amd

* Revert "exp: faster amd"

This reverts commit 2e4cfd1f7d8a33634c50fb5655cff1b40269d28c.

* revert, unrelated

* not in this pr

* linter
2024-05-10 07:15:12 -07:00
qazal
2b7ab60584 dfs fusion (#4491)
* use continue

* simplify

* flip

* track r

* derive forced_realize

* scheduler needs comments
2024-05-10 17:00:48 +03:00
qazal
bd8bb82555 move fusion out of child iteration (#4509) 2024-05-10 12:03:32 +03:00
qazal
ff216a383a refactor fused children (#4508)
* realized_children -> group

* use a set
2024-05-10 11:49:23 +03:00
chenyu
b399d98e41 fix resnet eval (#4507) 2024-05-10 00:49:00 -04:00
wozeparrot
a602dc67d3 feat: more mlperf fixes (#4505) 2024-05-09 20:50:20 -07:00
chenyu
0e8aa0e288 use fake data in beam searching resnet (#4504) 2024-05-09 23:43:50 -04:00