qazal
3da152f0fe
scheduler docs 2 ( #4551 )
...
* docs
* delete cleanups
2024-05-12 12:15:39 +03:00
wozeparrot
e07c7668b3
nf4 llama ( #4540 )
2024-05-11 22:22:34 -07:00
George Hotz
7a26bdac65
move scheduleitem to schedule.py ( #4541 )
...
* move scheduleitem to schedule.py
* don't need that type checking anymore
2024-05-11 21:13:04 -07:00
George Hotz
508e8a6666
add cpu objdump to LLVM/CLANG ( #4537 )
2024-05-11 14:28:44 -07:00
chenyu
bed70b130c
mlperf bert getenv-able EVAL_STEP_FREQ ( #4534 )
2024-05-11 14:36:56 -04:00
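(For readers unfamiliar with the pattern: "getenv-able" means the value becomes an environment-variable knob via tinygrad's getenv helper. A minimal sketch; the default value below is an illustrative assumption, not the commit's actual default:

    # sketch of the getenv-able knob pattern; default here is illustrative
    from tinygrad.helpers import getenv
    EVAL_STEP_FREQ = getenv("EVAL_STEP_FREQ", 0)  # run eval every N training steps
)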
George Hotz
328b083e66
lil profiling script
2024-05-11 11:02:44 -07:00
chenyu
da10cf0be1
extra/threefry.py for mem usage ( #4533 )
...
for now it needs 8N memory to generate size-N rand
2024-05-11 13:46:44 -04:00
chenyu
8a0fb3d765
delete old extra/autopad.py ( #4532 )
2024-05-11 13:06:10 -04:00
chenyu
04a4980a51
touchup bert script ( #4531 )
...
small adjustments: remove the duplicated training setting and stop the script once the target is hit
2024-05-11 13:02:02 -04:00
qazal
4871476a1e
move copy kernel out of schedule ordering ( #4530 )
...
* delete from sorting
* move the logic
2024-05-11 14:44:44 +03:00
qazal
2fb564c125
multi reduce linearizer tests start ( #4529 )
...
* test_end_local
* test_early_end_local
* todos
* mean+std
* skip no locals
2024-05-11 14:06:40 +03:00
qazal
3cba22920f
test_linearizer_correctness ( #4458 )
...
* test helper
* uops asserts
* cleanup args
* nits
2024-05-11 13:02:08 +03:00
qazal
b3d9fd48d0
infra for testing linearizer correctness ( #4528 )
...
* refactor outbufs
* delete helper
2024-05-11 12:10:33 +03:00
George Hotz
2f970a4fc2
all realize 2 ( #4527 )
...
* all realize 2
* tests fixup
* fix more tests
* fix openpilot
* fix tests
* unneeded
2024-05-10 22:43:09 -07:00
wozeparrot
d2c347fc74
faster gather for bert ( #4526 )
2024-05-10 22:28:48 -07:00
George Hotz
922e6e056a
hotfix: fix docs
2024-05-10 21:51:35 -07:00
George Hotz
347a3acb37
add renderer class ( #4524 )
...
* add renderer class
* tests pass
* fix pylint
* fix tensor cores
2024-05-10 21:40:02 -07:00
chenyu
b00b6b16f0
fix TRAIN_BEAM and Tensor.training for mlperf bert ( #4525 )
...
also hard-coded the bert model config instead of looking it up from a file
2024-05-11 00:18:36 -04:00
chenyu
7fab8c9e17
add symbolic mean test cases in test_symbolic_ops and test_symbolic_jit ( #4523 )
...
* add symbolic mean test cases in test_symbolic_ops and test_symbolic_jit
2d symbolic mean in jit does not quite work; the order of the variable inputs is not deterministic?
* skip
2024-05-10 23:19:55 -04:00
George Hotz
827058f030
update tests get_runner ( #4522 )
2024-05-10 20:09:22 -07:00
George Hotz
a0448ff595
use copy kernel in schedule ( #4520 )
...
* use copy kernel in schedule
* imports
2024-05-10 15:30:33 -07:00
chenyu
b15e2309bd
verbose error message in getitem ( #4519 )
...
* verbose error message in getitem
still hard to understand, but at least it prints what it's trying to expand
* sure
* :
2024-05-10 17:25:41 -04:00
George Hotz
d438d5698d
bring buffer back to device ( #4517 )
2024-05-10 11:22:31 -07:00
qazal
a2b707a3eb
scheduler comments 1 ( #4515 )
2024-05-10 20:44:28 +03:00
George Hotz
4eef1ee9bf
move renderer into options ( #4514 )
...
* move renderer into options
* fix tests
* renders are functions
2024-05-10 10:01:51 -07:00
George Hotz
7c630a9a53
hotfix: fix llama spacing + fix hcq
2024-05-10 15:10:13 +00:00
George Hotz
58e7256ce9
restore hcq graph ( #4513 )
...
* Reapply "hcq graph (#4380 )" (#4512 )
This reverts commit 06c1e7498e .
* bring back hcq graph
2024-05-10 07:45:05 -07:00
George Hotz
06c1e7498e
Revert "hcq graph ( #4380 )" ( #4512 )
...
This reverts commit 84a2e2b8c1 .
2024-05-10 07:18:09 -07:00
nimlgen
84a2e2b8c1
hcq graph ( #4380 )
...
* start hcq graph
* hack-fix sync on amd
* nv
* fix nv
* multigraph
* fixes
* temp fix for graph
* this is not needed
* fix
* cleaner
* linter
* fix none
* faster cuda copy
* faster amd copy
* temp nv fixes
* alloc on gpu
* exp: faster amd
* Revert "exp: faster amd"
This reverts commit 2e4cfd1f7d8a33634c50fb5655cff1b40269d28c.
* revert, unrelated
* not in this pr
* linter
2024-05-10 07:15:12 -07:00
qazal
2b7ab60584
dfs fusion ( #4491 )
...
* use continue
* simplify
* flip
* track r
* derive forced_realize
* scheduler needs comments
2024-05-10 17:00:48 +03:00
qazal
bd8bb82555
move fusion out of child iteration ( #4509 )
2024-05-10 12:03:32 +03:00
qazal
ff216a383a
refactor fused children ( #4508 )
...
* realized_children -> group
* use a set
2024-05-10 11:49:23 +03:00
chenyu
b399d98e41
fix resnet eval ( #4507 )
2024-05-10 00:49:00 -04:00
wozeparrot
a602dc67d3
feat: more mlperf fixes ( #4505 )
2024-05-09 20:50:20 -07:00
chenyu
0e8aa0e288
use fake data in beam searching resnet ( #4504 )
2024-05-09 23:43:50 -04:00
George Hotz
5bfc33948a
hotfix: only run optimize_local_size once
2024-05-09 20:01:53 -07:00
wozeparrot
29daea4e60
fix: core count and os ( #4503 )
2024-05-09 19:55:07 -07:00
George Hotz
89e119bc58
move Allocator to buffer.py ( #4502 )
...
* move Allocator to buffer.py
* move those to realize
* memory file
* cleanup
2024-05-09 19:45:56 -07:00
George Hotz
1e843d495e
cleaning up search with Program ( #4500 )
...
* cleaning up search
* fix tests
* test fix
* minor compiler cleanup
2024-05-09 19:01:53 -07:00
chenyu
d3dc332c2e
Tensor.logsumexp ( #4442 )
...
the subtract-max part should be shared with safe softmax
cleaner
2024-05-09 20:49:06 -04:00
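(The "subtract-max" the message refers to is the standard numerically stable log-sum-exp trick, which safe softmax also uses. A minimal NumPy sketch of the math, not tinygrad's implementation:

    import numpy as np
    def logsumexp(x, axis=-1):
        m = x.max(axis=axis, keepdims=True)  # subtract the max so exp() cannot overflow
        return (m + np.log(np.exp(x - m).sum(axis=axis, keepdims=True))).squeeze(axis)
    def safe_softmax(x, axis=-1):
        e = np.exp(x - x.max(axis=axis, keepdims=True))  # same subtract-max step
        return e / e.sum(axis=axis, keepdims=True)
)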
chenyu
78b298aa2a
move 0d tensor reduce axis check to _reduce ( #4499 )
2024-05-09 20:29:55 -04:00
George Hotz
c9e84ed0da
refactor to Program class ( #4476 )
...
* refactor to Program class
* switch to Program
* fix tests
* smaller diff
* self.p
* more tests
* fix metal test
* tests
* fix openpilot
* move that to linearizer
* p.launchdims
2024-05-09 17:29:07 -07:00
chenyu
5de4a46f10
re-enable gpt2 half/beam mac benchmark ( #4496 )
...
* re-enable gpt2 half/beam mac benchmark
from the fuzzer it seems to be flaky due to a numerical issue, not a kernel bug. we used to have half in the split reduce.
ran this on an M1 Max for 20 loops and it's fine
* that should be jitted
2024-05-09 19:15:32 -04:00
nimlgen
a2e2ba380c
nv tune shmem size ( #4495 )
...
* nv tune shmem size
* compare them
* linter
* linter2
2024-05-10 00:35:01 +03:00
chenyu
ef93e41a15
resnet mlperf systems add tinygrad commit and python / runtime versions ( #4494 )
2024-05-09 16:04:15 -04:00
chenyu
b5afdfbc5b
first draft resnet mlperf readme ( #4493 )
...
* start readme
* something
2024-05-09 15:51:44 -04:00
chenyu
047c7f3e5b
polish resnet mlperf logging ( #4490 )
...
don't include the time to save the final checkpoint in the run time, plus some cosmetic order changes
2024-05-09 13:04:24 -04:00
chenyu
d78e159aa3
resnet logging move RUN_START to start of the script ( #4488 )
2024-05-09 12:32:32 -04:00
chenyu
1bcb58479d
resnet setup power cap red box gpu to 350W ( #4484 )
...
1%-2% faster
2024-05-08 23:32:41 -04:00
chenyu
0ed755bcf5
resnet use EVAL_BS=192 ( #4482 )
...
* resnet use EVAL_BS=192
also lower the green run BEAM_MIN_PROGRESS from 10 to 5
* BEAM_MIN_PROGRESS 5 is too close to the setup limit
2024-05-08 22:29:27 -04:00