qazal
0e69b22629
multireduce OptOps tests (start) ( #4733 )
...
* start
* full tests
* add skips
* unrelated
* notes
2024-05-27 12:21:33 +03:00
qazal
c7b1d802f1
delete duplicate tests in test_linearizer ( #4723 )
...
* delete duplicate test
test_simplify_uop isn't needed
max works
* ci
* remove skip
* add skip back
2024-05-26 08:11:42 +03:00
chenyu
31358cbea5
change Tensor.stack to method ( #4719 )
2024-05-24 17:04:19 -04:00
Szymon Ożóg
212025b53c
Int mulacc for ptx ( #4680 )
...
* IntMulacc
* don't mov const
* Don't do int mulacc on ocelot
* Workaround for ocelot
* Remove ocelot workaround
* Fix tests that merged into mulacc
* fix uop count after merging to mulacc
2024-05-24 15:20:48 -04:00
chenyu
4398cc3654
update test_linearizer.py ( #4707 )
...
tests passed locally on tinybox green. Also unified test skipping with local/shared/float4/tc
2024-05-23 22:41:22 -04:00
Francis Lam
49225522aa
wmma: chain unrolled WMMAs and phi only at the end ( #4703 )
...
* wmma: chain unrolled WMMAs and phi only at the end
* fix linter and tests
* reduce lines
2024-05-23 17:50:18 -04:00
qazal
532c9e08e3
proposal: PHI nodes in TC shouldn't have children inside the loop ( #4694 )
...
* expectations from UOpGraph
* one with children
* minimal repro
* replace
2024-05-23 15:11:26 -04:00
Szymon Ożóg
9a9963ba7b
Remove uops deepcopy from PTX ( #4671 )
...
* Remove uops deepcopy from PTX
* Update test
* Fix test
* fix for non-ptx
* Clean
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-05-22 23:14:17 -04:00
qazal
f11a81f707
isolated test for BEAM=2 llama wrong uops toposort ( #4687 )
...
* add ast
* skip test in CI
2024-05-23 00:47:37 +03:00
qazal
c5f5755328
correctness test for multireduce nested locals ( #4682 )
...
* nested locals test
* move st
2024-05-22 19:35:35 +03:00
qazal
d12d412e8b
revert uops dtype in pattern matcher ( #4681 )
...
This reverts commit 5f84cbb5df.
2024-05-22 14:45:51 +03:00
qazal
5f84cbb5df
keep UOps.CAST in PHI-GEP fold for unmatching dtypes ( #4674 )
...
* these should be val.dtype
* cast float4 and float2 to root
* document tests
* 2 args
* fix assert
* match dtype
* no extra lines
* better fix
2024-05-21 14:59:49 -04:00
qazal
458a3961eb
catch compile errors in uops tests ( #4672 )
...
* use helper and compile
* llama beam=2
* ast length
* skip float4, fix hsa
* use empty tensors
2024-05-21 12:20:35 +03:00
Timmy
de733d73cf
Multireduce Linearizer Tests ( #4665 )
...
* updated tests
* make sure the upcasting tests actually cause the problem
* diff cleanup
* use UOpGraph utils
---------
Co-authored-by: qazal <qazal.software@gmail.com>
2024-05-21 02:43:25 +03:00
qazal
b33c827aed
UOps.RANGE toposort spec ( #4660 )
...
* use iterator
* nested loops and outer loads
* uop after phi
2024-05-20 23:38:20 +03:00
qazal
0d9e623d83
consolidate uops tests ( #4659 )
...
* merge uoptimize
* move tests
* fix skip message
2024-05-20 21:42:31 +03:00
George Hotz
4753283221
LOOP -> RANGE ( #4650 )
2024-05-19 06:40:20 -07:00
George Hotz
07b350a8f4
new uops is an actual graph ( #4560 )
...
* new uops is an actual graph
* it's way slower
* simpler
* fix define acc
* render_loop unique
* ops test pass
* add pattern matcher back, there's bugs
* rewrite
* use priority queue
* recursive children
* fix tests
* fix tests with SINK
* fix abstractions
* fix assembly
* simpler
* link define_acc
* fix DEFINE_ACC placement
* type verify
* full cmp
* fix cmp
* ACCESS_ACC
* insert DEFINE_ACC
* fix PHI
* recursive rewrite
* fix many tests
* sum collapse
* more patterns
* correct change
* fold arange
* fix that lin test
* space
* big folding rule works
* close
* has more maxes, meh
* cached node replace
* set changed
* simplest folding yet
* works
* works
* DIV
* all tests pass
* del
* fuzz linearizer fails
* sum_collapse
* test depth 2 cf
* fix lin test 14
* fix clang depth
* disable that
* failure 14 is fixed
* fix ptx
* failure 27 is fixed
* fix llama
* run_cnt
* Revert "Optimize PTX gated loads index calculation (#4304)"
This reverts commit d97d5a7689.
* fix uops loop
* fix ptx bugs
* add barrier
* print
* mem_type in ptx direct
* bypass tests that fail in CI but pass locally
* ptx remove ptr_ar
* more ptx passing
* fix ptx tests
* assert compile support
* remove model inference benchmark from red
2024-05-17 18:00:18 -07:00
nimlgen
daf57af3eb
move tc to renderers ( #4631 )
...
* move tc to renderers
* missed import
* fix typo
* fix
* fix imports
* remove from tests
* fix 4607
* nv emulate timestamp
* time is int
* correct time
2024-05-18 00:36:29 +03:00
nimlgen
eb9689336e
nv mockgpu ( #4600 )
...
* mockgpu nv
* works
* comment that out
* fix merge
* setup gpuocelot
* install packages
* not run all of them
* passes
* fix ci
* almost
* should pass
* linter
* linter 2
* try this?
* ugh, not supported
* ci
* remove ticket from description
* better descs
2024-05-15 23:46:08 +03:00
Ahmed Harmouche
662bca8134
Split UnaryOps.CAST into CAST and BITCAST ( #4487 )
...
* Separate cast and bitcast
* Fix lint
* No more arg[0]
* Revert "No more arg[0]"
This reverts commit dee6911335513f092fe2cbb9684e8a9d26aad964.
* CAST/BITCAST arg is the dtype only, no more tuple
* No image bitcast, regenerate dataset
* Small fixes
2024-05-15 11:43:31 -04:00
George Hotz
ff64bcab69
move graph/search to engine ( #4596 )
2024-05-14 23:12:59 -07:00
nimlgen
9b02aef45a
remove rhip ( #4579 )
...
* remove rhip
* remove hip runner
2024-05-14 17:58:19 +03:00
nimlgen
2131556c2c
amd mockgpu ( #4535 )
...
* start mock amd gpu
* virt files
* cleaner
* init ci
* small fixes
* linter
* better?
* ugh
* linter
* fix
* disable some
* run shorter
* fixes
* add hcq test
* fix
* fix cmd revert
2024-05-14 14:28:04 +03:00
Filip Brzek
f7d08bd454
feat: add acc_dtype to einsum ( #4571 )
2024-05-13 14:02:07 -04:00
George Hotz
b660f60125
all uops are now cachable ( #4564 )
...
* all uops are now cachable
* cachable is gone
2024-05-12 22:34:35 -07:00
qazal
2fb564c125
multi reduce linearizer tests start ( #4529 )
...
* test_end_local
* test_early_end_local
* todos
* mean+std
* skip no locals
2024-05-11 14:06:40 +03:00
qazal
3cba22920f
test_linearizer_correctness ( #4458 )
...
* test helper
* uops asserts
* cleanup args
* nits
2024-05-11 13:02:08 +03:00
qazal
b3d9fd48d0
infra for testing linearizer correctness ( #4528 )
...
* refactor outbufs
* delete helper
2024-05-11 12:10:33 +03:00
George Hotz
2f970a4fc2
all realize 2 ( #4527 )
...
* all realize 2
* tests fixup
* fix more tests
* fix openpilot
* fix tests
* unneeded
2024-05-10 22:43:09 -07:00
George Hotz
347a3acb37
add renderer class ( #4524 )
...
* add renderer class
* tests pass
* fix pylint
* fix tensor cores
2024-05-10 21:40:02 -07:00
George Hotz
1e843d495e
cleaning up search with Program ( #4500 )
...
* cleaning up search
* fix tests
* test fix
* minor compiler cleanup
2024-05-09 19:01:53 -07:00
Francis Lam
47750e65fd
kernel: un-reverse the order of the local indices ( #4454 )
...
no change to performance or behavior. new LOCALS are added to the
left side of the LOCALS block (to the left of the first_reduce).
2024-05-06 15:21:27 -04:00
chenyu
afe020710d
disable PADTO on upcasted axis ( #4444 )
...
fixed test_failure_31. PADTO upcasted is at best a no-op, and might fail at edge cases.
2024-05-05 21:52:03 -04:00
Francis Lam
5c5b40880f
search: fix edge cases on screening potential ops ( #4394 )
...
* search: fix edge cases on screening potential ops
won't change correctness, but will save a little python time by
properly deduplicating potential actions
* check for de-duplication instead of exact valid actions
* refactor long line
2024-05-02 14:53:05 -04:00
Francis Lam
0d33c54d99
kernel: change PADTO check to allow up to 4x padding ( #4354 )
...
* kernel: change PADTO check to allow up to 4x padding
also optionally remove PADTO from the search action space with
BEAM_PADTO=0.
* fix test_linearizer test_tensor_cores_padded tests
* update resnet runs to use SPLIT_REDUCEOP=1
* fix up search TC axis and amt checking
* fix up the dimensions of the TC tests
2024-04-30 15:29:34 -04:00
Francis Lam
c12bcabb07
search: fix actions space checks to ignore TC axis and amt ( #4360 )
...
* search: fix actions space checks to ignore TC axis and amt
* add test for number of actions in get_linearizer_actions
2024-04-30 14:02:22 -04:00
George Hotz
d325be2540
update docs ( #4356 )
...
* update docs
* nn.md
* mnist cleanups
* rhip test is very slow
2024-04-30 16:51:42 +09:00
Francis Lam
a9a1fa6bbf
wmma: add reduce axis choice to TC action space ( #4328 )
...
* wmma: add reduce axis choice to TC action space
* add test for TC multi-reduce axis choice
2024-04-29 19:15:39 -04:00
George Hotz
38f97aa0fe
rename rawbufs to bufs in ExecItem ( #4274 )
2024-04-24 11:27:27 +08:00
Francis Lam
3f6c7ca8bf
test: fix test_tensor_core_padded on CUDA and add to benchmarks ( #4258 )
...
* test: fix test_tensor_core_padded on CUDA and add to benchmarks
* fix linter
* run both tests in one call
2024-04-22 23:22:11 -04:00
Francis Lam
bbb0ad4800
wmma: widen TC usage in search by using PADTO on TC axes when possible ( #4216 )
...
* wmma: widen TC usage in search by using PADTO on TC axes when possible
* test: start tests for the new padding TC behavior
* search: upgrade padded TC search to TC_OPT >= 2
* test: add behavior and correctness test for padded TC
added optional argument to apply_tensor_core to set TC_OPT level
* linearizer: add tests for the PADTO behavior and docs
2024-04-22 16:50:31 -04:00
chenyu
31c9d9a228
fix test_linearizer tc opt tests for bf16 ( #4237 )
...
bf16 tc has larger rtol
2024-04-20 11:51:50 -04:00
George Hotz
ebc94c9d6c
rewrite the jit in the context of new schedule ( #4162 )
...
* rewrite the jit in the context of new schedule
* mypy better
* fix placeholder
* tests
* all functionality should work
* fix tests
* no CacheCollector
2024-04-12 21:54:36 -07:00
chenyu
06bcae13b4
PADTO SUM if parents of sum are all zero-preserving ( #4140 )
...
* PADTO SUM if parents of sum are all zero-preserving
* test case unsafe ops after sum is fine
* reuse UNSAFE_PAD_OPS
* update db version
2024-04-10 22:16:12 -04:00
chenyu
406cb5fd90
const fold ReduceOps ( #4059 )
2024-04-03 14:39:28 -04:00
chenyu
f61ed869f5
Use exec_alu for lazy const folding ( #4039 )
2024-04-02 20:52:05 -04:00
George Hotz
9eef44521b
ScheduleItem uses Buffer ( #3995 )
...
* schedule Buffer
* update
* update tests
* master
* works
* remove LoadOps.WAIT
* fix compile2
* bad test
* rename and note
2024-03-29 20:50:27 -07:00
George Hotz
8f1e34a2a0
early src delete ( #3996 )
...
* early src delete
* fix bad test
* fix test_linearizer
2024-03-29 19:46:07 -07:00
chenyu
d9ff636cf5
use is to compare with enum ( #3993 )
...
* use is to compare with enum
currently it's mixed between `==` and `is`, moved all to `is`
* more
2024-03-29 13:02:56 -04:00