Commit Graph

4497 Commits

qazal
498cf3e7e0 fuzzer path search for DEFINE_ACC (#4656)
* insert acc

* add test_ops

* find toposorts

* todo - not yet ready

* remove the import

* atol and childless children
2024-05-23 00:50:01 +03:00
qazal
f11a81f707 isolated test for BEAM=2 llama wrong uops toposort (#4687)
* add ast

* skip test in CI
2024-05-23 00:47:37 +03:00
wozeparrot
6020595eb0 more tensor.py docs (#4686)
wow much docs
2024-05-22 21:28:26 +00:00
Francis Lam
721f9f6acf test/external/verify_kernel: fix LOGKERNS variable name in comments (#4685)
should've been changed with the LOGKERN to LOGKERNS change
2024-05-22 17:08:40 -04:00
chenyu
f8f97562e0 remove File Specific Variables from env_vars.md (#4684) 2024-05-22 17:00:14 -04:00
chenyu
225dcab3be prepend _ to broadcast_shape and deepwalk (#4683)
* prepend `_` to broadcast_shape and deepwalk

internal only

* that too
2024-05-22 16:39:05 -04:00
qazal
c5f5755328 correctness test for multireduce nested locals (#4682)
* nested locals test

* move st
2024-05-22 19:35:35 +03:00
chenyu
bc9be39dec set timeout in search _try_compile_linearized_w_idx (#4677) 2024-05-22 12:30:31 -04:00
qazal
d12d412e8b revert uops dtype in pattern matcher (#4681)
This reverts commit 5f84cbb5df.
2024-05-22 14:45:51 +03:00
Elias Wahl
acc0039cfc Resume fix + scheduler for non weight decay params (#4679)
* move ckpt dir

* fix resume. Add scheduler group
2024-05-21 19:38:13 -04:00
chenyu
0f21aa0416 example kernel that triggers Memory access fault for resnet on red (#4678) 2024-05-21 18:59:36 -04:00
qazal
5f84cbb5df keep UOps.CAST in PHI-GEP fold for mismatched dtypes (#4674)
* these should be val.dtype

* cast float4 and float2 to root

* document tests

* 2 args

* fix assert

* match dtype

* no extra lines

* better fix
2024-05-21 14:59:49 -04:00
qazal
458a3961eb catch compile errors in uops tests (#4672)
* use helper and compile

* llama beam=2

* ast length

* skip float4, fix hsa

* use empty tensors
2024-05-21 12:20:35 +03:00
wozeparrot
00432496d7 feat: tinyboxgreen (#4366)
* feat: tinyboxgreen

* feat: tinyboxgreenv2

* fix symlink weights

* fix: remove llama 2 70b for now

* feat: naming

* fix: remove extra cifar steps

* feat: disable mixtral on nvidia
2024-05-20 22:39:34 -04:00
Timmy
de733d73cf Multireduce Linearizer Tests (#4665)
* updated tests

* make sure the upcasting tests actually causes the problem

* diff cleanup

* use UOpGraph utils

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2024-05-21 02:43:25 +03:00
chenyu
5e3fbbb33e llama3 example add manual seed and log seed (#4667) 2024-05-20 19:09:57 -04:00
chenyu
8c99cc17f5 remove link to old adding_new_accelerators.md (#4666)
fix #4657
2024-05-20 19:05:23 -04:00
chenyu
c4089d169f update BEAM_LOCAL_MAX to 1024 (#4664)
we used 1024 for the mlperf submission and the resulting step time is 20% faster. the default should not be worse
2024-05-20 18:06:32 -04:00
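A hedged illustration of what this knob bounds: BEAM_LOCAL_MAX caps how many local-size candidates the BEAM search will compile and time per kernel. Reading it with tinygrad.helpers.getenv mirrors how other search knobs work, but the helper below is a sketch, not the real search code; only the 1024 default comes from this commit.

```python
# Hedged sketch only; the real BEAM search code is not reproduced here.
from tinygrad.helpers import getenv

BEAM_LOCAL_MAX = getenv("BEAM_LOCAL_MAX", 1024)  # this commit raises the default to 1024

def cap_local_candidates(local_sizes: list[tuple[int, ...]]) -> list[tuple[int, ...]]:
  # bound the number of local sizes that get compiled and timed during the search
  return local_sizes[:BEAM_LOCAL_MAX]
```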
chenyu
704cb1d8a0 fix conversation.py quantize (#4663)
it used to be True for int8, now it's a string for int8 or nf4
2024-05-20 17:36:37 -04:00
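To make the change above concrete: the quantize argument went from a boolean to a string naming the scheme. The loader name and signature below are hypothetical, just to show the shape of the call-site change.

```python
# Hypothetical signature for illustration; only the bool -> "int8"/"nf4" change
# comes from the commit above.
def load_model(path: str, quantize: str | None = None):
  assert quantize in (None, "int8", "nf4"), f"unsupported quantize mode: {quantize}"
  ...

# before: load_model(path, quantize=True)   # bool meant int8
# after:  load_model(path, quantize="int8") # or quantize="nf4"
```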
chenyu
ae861325ce update llama sample for mac 32 input buffer limit (#4662)
set default sampling params for the function call to 0, and top_k in llama3 to 25.
2024-05-20 17:23:39 -04:00
Elias Wahl
993091adfa loss scaler + nan fixes (#4661) 2024-05-20 17:08:35 -04:00
qazal
b33c827aed UOps.RANGE toposort spec (#4660)
* use iterator

* nested loops and outer loads

* uop after phi
2024-05-20 23:38:20 +03:00
qazal
0d9e623d83 consolidate uops tests (#4659)
* merge uoptimize

* move tests

* fix skip message
2024-05-20 21:42:31 +03:00
Szymon Ożóg
1e7b7b2c3c Fix flop counting for mulacc (#4640)
* Fix flop counting for mulacc

* add test_simple_mulacc

* Update test_uops_stats.py

* Update test_uops_stats.py

* revert test_mulacc

* Test for MULACC vs MUL+ADD
2024-05-20 12:06:00 -04:00
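The rule this fix and its MULACC-vs-MUL+ADD test pin down: a fused multiply-accumulate should be charged the same FLOPs as the multiply and add it replaces. A toy tally of that invariant, not tinygrad's actual uops stats code:

```python
# Toy flop tally illustrating the invariant; not tinygrad's stats implementation.
# MULACC computes a*b + c in a single uop, so it must count as 2 FLOPs.
FLOPS_PER_OP = {"MUL": 1, "ADD": 1, "MULACC": 2}

def count_flops(ops: list[str]) -> int:
  return sum(FLOPS_PER_OP.get(op, 0) for op in ops)

assert count_flops(["MULACC"]) == count_flops(["MUL", "ADD"])  # 2 == 2
```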
wozeparrot
b144d4b460 new llama3 example (#4576) 2024-05-19 22:42:23 -07:00
nimlgen
c9f7f2da70 nv hcq bind api (#4629)
* hcq bind api for nv

* linter

* linter

* add test

* small comment
2024-05-19 23:17:10 +03:00
qazal
d308f4fa9a correctly insert UOps.END* in fuzz result (#4653) 2024-05-19 21:10:28 +03:00
chenyu
456aa0b656 update test_search kernel count (#4652)
integration test that beaming 1 kernel increments kernel count by 1, and moved existing test_kernel_count to TestTimeLinearizer
2024-05-19 13:54:52 -04:00
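Roughly the shape of the integration check described above, assuming GlobalCounters.kernel_count is the counter being guarded: the timing runs done by the search must not inflate it.

```python
# Hedged sketch of the check, not the test's actual code. Run with BEAM=1 or BEAM=2
# to exercise the search path.
from tinygrad import Tensor
from tinygrad.helpers import GlobalCounters

a, b = Tensor.rand(16, 16).realize(), Tensor.rand(16, 16).realize()
before = GlobalCounters.kernel_count
(a @ b).realize()  # one matmul kernel
assert GlobalCounters.kernel_count == before + 1
```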
qazal
954718e6bf reorder DEFINE_GLOBAL in fuzz_uops (#4651)
* globals base

* test: opt out of DEFINE_GLOBAL

* do it like ExecItem
2024-05-19 20:51:31 +03:00
Léo
967e35f8b8 fix(beam): GlobalCounters kernel count increasing when clearing l2 (#4598)
* fix(beam): GlobalCounters kernel count increasing when clearing l2

* fix: removed the NOSTATS var by adding do_update_stats to Tensor.realize()

* test(search): regression test for _time_program, should not increment kernel_count

* fix(test_search): unused var and now properly checking when l2 is cleared

* fix(test_search): added assert message

* fix(test_search): now testing public beam api for kcount

* ruff fixes

---------

Co-authored-by: Léo Paillé <leo.paille@enseirb-matmeca.fr>
2024-05-19 10:03:47 -07:00
George Hotz
4753283221 LOOP -> RANGE (#4650) 2024-05-19 06:40:20 -07:00
chenyu
286b4dbdf2 compile raise CompileError and skip only RuntimeError in multiprocess… (#4646)
* compile raise CompileError and skip only RuntimeError in multiprocess beam

renderer error with multiprocess should not be skipped by beam

* use `==` for dtype to dtype comparison

* that needs to be is

* typo
2024-05-19 00:25:25 -04:00
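The policy described above, as a sketch: compile failures raise CompileError and propagate, while only RuntimeError is swallowed during multiprocess beam. The helper below is illustrative and assumes CompileError is importable from tinygrad.device as the commit title suggests.

```python
# Hedged sketch of the split; names other than CompileError are illustrative.
from tinygrad.device import CompileError

def time_candidate(compile_fn, run_fn, src: str):
  try:
    prg = compile_fn(src)   # may raise CompileError on a bad rendered kernel
    return run_fn(prg)      # returns a timing; may raise RuntimeError
  except RuntimeError:
    return None             # drop this candidate, keep beaming
  # CompileError is intentionally not caught: renderer bugs should surface, not be skipped
```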
chenyu
8a0d1ca7bb CI test timeout 20 min -> 10 min (#4645)
if it takes more than 10 minutes, setup has usually failed anyway. also updated matmul_kfd -> matmul_amd in benchmark
2024-05-18 13:58:28 -04:00
qazal
b0cb02f719 uops fuzzing infra (#4641)
* base with bfs

* find paths

* get last

* try blocks

* Revert "try blocks"

This reverts commit 25f8e3fe85.

* this should be simpler

* full exec

* support debug

* fix lint

* add todo

* copy in_degree
2024-05-18 20:19:57 +03:00
qazal
bf8f855838 assert kernel counts in unsupported fusions (#4643)
* replace with comments

* not relevant

* update comment

* custom exception maybe

* fix LoadOps.VIEW
2024-05-18 20:14:37 +03:00
qazal
a5204fe89d refactor UOps.CONST (#4639)
* delete more

* nit: dont need assign

* can this be simpler

* use scalars

* always cast

* clang needs cast

* format
2024-05-18 10:07:36 +03:00
qazal
d0a2d40df3 root cause fix for UOps.CONST bad args (#4638)
* delete that

* real fix
2024-05-18 09:15:25 +03:00
George Hotz
9b464e34ea increase speed of uops (#4637)
* increase speed of uops

* not equal

* minor speedup
2024-05-17 21:04:39 -07:00
George Hotz
b74cc1d01a uops cleanup (#4634)
* def add cleanup

* minor speedup

* add back ptx speed

* a little faster

* merge that

* only linearize once for ptx

* two graph rewrites for ptx, bug?
2024-05-17 20:02:38 -07:00
George Hotz
07b350a8f4 new uops is an actual graph (#4560)
* new uops is an actual graph

* it's way slower

* simpler

* fix define acc

* render_loop unique

* ops test pass

* add pattern matcher back, there's bugs

* rewrite

* use priority queue

* recursive children

* fix tests

* fix tests with SINK

* fix abstractions

* fix assembly

* simpler

* link define_acc

* fix DEFINE_ACC placement

* type verify

* full cmp

* fix cmp

* ACCESS_ACC

* insert DEFINE_ACC

* fix PHI

* recursive rewrite

* fix many tests

* sum collapse

* more patterns

* correct change

* fold arange

* fix that lin test

* space

* big folding rule works

* close

* has more maxes, meh

* cached node replace

* set changed

* simplest folding yet

* works

* works

* DIV

* all tests pass

* del

* fuzz linearizer fails

* sum_collapse

* test depth 2 cf

* fix lin test 14

* fix clang depth

* disable that

* failure 14 is fixed

* fix ptx

* failure 27 is fixed

* fix llama

* run_cnt

* Revert "Optimize PTX gated loads index calculation (#4304)"

This reverts commit d97d5a7689.

* fix uops loop

* fix ptx bugs

* add barrier

* print

* mem_type in ptx direct

* bypass tests that fail in CI but pass locally

* ptx remove ptr_ar

* more ptx passing

* fix ptx tests

* assert compile support

* remove  model inference benchmark from red
2024-05-17 18:00:18 -07:00
nimlgen
daf57af3eb move tc to renderers (#4631)
* move tc to renderers

* missed import

* fix typo

* fix

* fix imports

* remove from tests

* fix 4607

* nv emulate timestamp

* time is int

* correct time
2024-05-18 00:36:29 +03:00
chenyu
d70988dddf add blob and raw=true for image in docs showcase (#4632)
this should render the image correctly
2024-05-17 16:57:15 -04:00
nimlgen
10cf8e459b hcq update queue in place (#4626)
* do not self wait in hcq

* faster enqueue

* comments

* tests

* linter

* fix typo
2024-05-17 22:18:20 +03:00
chenyu
ca1df20fa9 benchmark name fix - resnet eval is on eval data (#4628) 2024-05-17 12:56:12 -04:00
chenyu
c86adabe15 time with real global buffers in search (#4621)
* filter fake buffers in search

* test that

* update test
2024-05-17 12:36:23 -04:00
chenyu
e5d4e6a8aa BEAM=2 in green CI for 100 TFLOPS (#4624) 2024-05-16 23:28:28 -04:00
chenyu
b3dd885ffb cleanup double import from tinygrad.device in tensor.py (#4620) 2024-05-16 14:21:22 -04:00
uuuvn
639ea5b0f2 Metal linearizer failure 22 is flaky not just on CI (#4617)
* METAL doesn't fail anymore, not just on CI

* oops
2024-05-16 11:31:23 -04:00
qazal
f3f2b96583 pick schedule tests from external_test_opt (#4615)
* conv tests

* misc

* that shouldnt const fold
2024-05-16 15:43:41 +03:00
qazal
13200c6894 check simple_pads in all views (#4614) 2024-05-16 14:34:39 +03:00