4410 Commits

Author SHA1 Message Date
qazal
4e1135a0bc assign buffer read/write tests (#4565)
* simple tests

* more tests
2024-05-13 09:43:36 +03:00
George Hotz
b660f60125 all uops are now cachable (#4564)
* all uops are now cachable

* cachable is gone
2024-05-12 22:34:35 -07:00
George Hotz
02327b8adf simple stuff from new_uops branch (#4563) 2024-05-12 22:18:05 -07:00
ziereis
f53a23d21e Test for optim assertion (#4558)
* add test for assertion

* whitespace

* restore state

---------

Co-authored-by: Thomas Ziereis <thomas.ziereis@web.de>
2024-05-12 14:21:28 -07:00
wozeparrot
d7670f8141 quantized llama multilazybuffer fix (#4557) 2024-05-12 14:19:21 -07:00
ziereis
bcee4743ce fix error message (#4556)
* fix error messgae

* typo

* add suggestion to fix error

---------

Co-authored-by: Thomas Ziereis <thomas.ziereis@web.de>
2024-05-12 12:35:51 -07:00
chenyu
01a0c1a948 slightly faster nf4 llama (#4542) 2024-05-12 14:24:42 -04:00
qazal
4c232dc0ae refactor LoadOps scheduling (#4553)
* refactor

* op -> lop
2024-05-12 12:59:24 +03:00
qazal
3da152f0fe scheduler docs 2 (#4551)
* docs

* delete cleanups
2024-05-12 12:15:39 +03:00
wozeparrot
e07c7668b3 nf4 llama (#4540) 2024-05-11 22:22:34 -07:00
George Hotz
7a26bdac65 move scheduleitem to schedule.py (#4541)
* move scheduleitem to schedule.py

* don't need that type checking anymore
2024-05-11 21:13:04 -07:00
George Hotz
508e8a6666 add cpu objdump to LLVM/CLANG (#4537) 2024-05-11 14:28:44 -07:00
chenyu
bed70b130c mlperf bert getenv-able EVAL_STEP_FREQ (#4534) 2024-05-11 14:36:56 -04:00
George Hotz
328b083e66 lil profiling script 2024-05-11 11:02:44 -07:00
chenyu
da10cf0be1 extra/threefry.py for mem usage (#4533)
for now it needs 8N mem to generate size N rand
2024-05-11 13:46:44 -04:00
chenyu
8a0fb3d765 delete old extra/autopad.py (#4532) 2024-05-11 13:06:10 -04:00
chenyu
04a4980a51 touchup bert script (#4531)
small adjustments, remove duplicated training setting and stop the script once target is hit
2024-05-11 13:02:02 -04:00
qazal
4871476a1e move copy kernel to out of schedule ordering (#4530)
* delete from sorting

* move the logic
2024-05-11 14:44:44 +03:00
qazal
2fb564c125 multi reduce linearizer tests start (#4529)
* test_end_local

* test_early_end_local

* todos

* mean+std

* skip no locals
2024-05-11 14:06:40 +03:00
qazal
3cba22920f test_linearizer_correctness (#4458)
* test helper

* uops asserts

* cleanup args

* nits
2024-05-11 13:02:08 +03:00
qazal
b3d9fd48d0 infra for testing linearizer correctness (#4528)
* refactor outbufs

* delete helper
2024-05-11 12:10:33 +03:00
George Hotz
2f970a4fc2 all realize 2 (#4527)
* all realize 2

* tests fixup

* fix more tests

* fix openpilot

* fix tests

* unneeded
2024-05-10 22:43:09 -07:00
wozeparrot
d2c347fc74 faster gather for bert (#4526) 2024-05-10 22:28:48 -07:00
George Hotz
922e6e056a hotfix: fix docs 2024-05-10 21:51:35 -07:00
George Hotz
347a3acb37 add renderer class (#4524)
* add renderer class

* tests pass

* fix pylint

* fix tensor cores
2024-05-10 21:40:02 -07:00
chenyu
b00b6b16f0 fix TRAIN_BEAM and Tensor.training for mlperf bert (#4525)
also hard coded bert model config instead of looking up a file
2024-05-11 00:18:36 -04:00
chenyu
7fab8c9e17 add symbolic mean test cases in test_symbolic_ops and test_symbolic_jit (#4523)
* add symbolic mean test cases in test_symbolic_ops and test_symbolic_jit

2d symbolic mean in jit does not quite work, order of the variable inputs are not deterministic?

* skip
2024-05-10 23:19:55 -04:00
George Hotz
827058f030 update tests get_runner (#4522) 2024-05-10 20:09:22 -07:00
George Hotz
a0448ff595 use copy kernel in schedule (#4520)
* use copy kernel in schedule

* imports
2024-05-10 15:30:33 -07:00
chenyu
b15e2309bd verbose error message in getitem (#4519)
* verbose error message in getitem

still hard to undetstand, at least it prints what it's trying to expand

* sure

* :
2024-05-10 17:25:41 -04:00
George Hotz
d438d5698d bring buffer back to device (#4517) 2024-05-10 11:22:31 -07:00
qazal
a2b707a3eb scheduler comments 1 (#4515) 2024-05-10 20:44:28 +03:00
George Hotz
4eef1ee9bf move renderer into options (#4514)
* move renderer into options

* fix tests

* renders are functions
2024-05-10 10:01:51 -07:00
George Hotz
7c630a9a53 hotfix: fix llama spacing + fix hcq 2024-05-10 15:10:13 +00:00
George Hotz
58e7256ce9 restore hcq graph (#4513)
* Reapply "hcq graph (#4380)" (#4512)

This reverts commit 06c1e7498e.

* bring back hcq graph
2024-05-10 07:45:05 -07:00
George Hotz
06c1e7498e Revert "hcq graph (#4380)" (#4512)
This reverts commit 84a2e2b8c1.
2024-05-10 07:18:09 -07:00
nimlgen
84a2e2b8c1 hcq graph (#4380)
* start hcq graph

* hack-fix sync on amd

* nv

* fix nv

* multigrah

* fixes

* temp fix for graph

* this is not needed

* fix

* cleaner

* linetr

* fix none

* faster cuda copy

* faster amd copy

* temp nv fixes

* alloc on gpu

* exp: faster amd

* Revert "exp: faster amd"

This reverts commit 2e4cfd1f7d8a33634c50fb5655cff1b40269d28c.

* revert, unrelated

* not in this pr

* linter
2024-05-10 07:15:12 -07:00
qazal
2b7ab60584 dfs fusion (#4491)
* use continue

* simplify

* flip

* track r

* derive forced_realize

* scheduler needs comments
2024-05-10 17:00:48 +03:00
qazal
bd8bb82555 move fusion out of child iteration (#4509) 2024-05-10 12:03:32 +03:00
qazal
ff216a383a refactor fused children (#4508)
* realized_children -> group

* use a set
2024-05-10 11:49:23 +03:00
chenyu
b399d98e41 fix resnet eval (#4507) 2024-05-10 00:49:00 -04:00
wozeparrot
a602dc67d3 feat: more mlperf fixes (#4505) 2024-05-09 20:50:20 -07:00
chenyu
0e8aa0e288 use fake data in beam searching resnet (#4504) 2024-05-09 23:43:50 -04:00
George Hotz
5bfc33948a hotfix: only run optimize_local_size once 2024-05-09 20:01:53 -07:00
wozeparrot
29daea4e60 fix: core count and os (#4503) 2024-05-09 19:55:07 -07:00
George Hotz
89e119bc58 move Allocator to buffer.py (#4502)
* move Allocator to buffer.py

* move those to realize

* memory file

* cleanup
2024-05-09 19:45:56 -07:00
George Hotz
1e843d495e cleaning up search with Program (#4500)
* cleaning up search

* fix tests

* test fix

* minor compiler cleanup
2024-05-09 19:01:53 -07:00
chenyu
d3dc332c2e Tensor.logsumexp (#4442)
the subtract max part should share with safe softmax

cleaner
2024-05-09 20:49:06 -04:00
chenyu
78b298aa2a move 0d tensor reduce axis check to _reduce (#4499) 2024-05-09 20:29:55 -04:00
George Hotz
c9e84ed0da refactor to Program class (#4476)
* refactor to Program class

* switch to Program

* fix tests

* smaller diff

* self.p

* more tests

* fix metal test

* tests

* fix openpilot

* move that to linearizer

* p.launchdims
2024-05-09 17:29:07 -07:00