Commit Graph

4461 Commits

Author SHA1 Message Date
qazal
d0a2d40df3 root cause fix for UOps.CONST bad args (#4638)
* delete that

* real fix
2024-05-18 09:15:25 +03:00
George Hotz
9b464e34ea increase speed of uops (#4637)
* increase speed of uops

* not equal

* minor speedup
2024-05-17 21:04:39 -07:00
George Hotz
b74cc1d01a uops cleanup (#4634)
* def add cleanup

* minor speedup

* add back ptx speed

* a little faster

* merge that

* only linearize once for ptx

* two graph rewrites for ptx, bug?
2024-05-17 20:02:38 -07:00
George Hotz
07b350a8f4 new uops is an actual graph (#4560)
* new uops is an actual graph

* it's way slower

* simpler

* fix define acc

* render_loop unique

* ops test pass

* add pattern matcher back, there's bugs

* rewrite

* use priority queue

* recursive children

* fix tests

* fix tests with SINK

* fix abstractions

* fix assembly

* simpler

* link define_acc

* fix DEFINE_ACC placement

* type verify

* full cmp

* fix cmp

* ACCESS_ACC

* insert DEFINE_ACC

* fix PHI

* recursive rewrite

* fix many tests

* sum collapse

* more patterns

* correct change

* fold arange

* fix that lin test

* space

* big folding rule works

* close

* has more maxes, meh

* cached node replace

* set changed

* simplest folding yet

* works

* works

* DIV

* all tests pass

* del

* fuzz linearizer fails

* sum_collapse

* test depth 2 cf

* fix lin test 14

* fix clang depth

* disable that

* failure 14 is fixed

* fix ptx

* failure 27 is fixed

* fix llama

* run_cnt

* Revert "Optimize PTX gated loads index calculation (#4304)"

This reverts commit d97d5a7689.

* fix uops loop

* fix ptx bugs

* add barrier

* print

* mem_type in ptx direct

* bypass tests that fail in CI but pass locally

* ptx remove ptr_ar

* more ptx passing

* fix ptx tests

* assert compile support

* remove  model inference benchmark from red
2024-05-17 18:00:18 -07:00
nimlgen
daf57af3eb move tc to renderers (#4631)
* move tc to renderers

* missed import

* fix typo

* fix

* fix imports

* remove from tests

* fix 4607

* nv emulate timestamp

* time is int

* correct time
2024-05-18 00:36:29 +03:00
chenyu
d70988dddf add blob and raw=true for image in docs showcase (#4632)
this should  render the image correctly
2024-05-17 16:57:15 -04:00
nimlgen
10cf8e459b hcq update queue in place (#4626)
* do not self wait in hcq

* faster enqueue

* comments

* tests

* linter

* fix typo
2024-05-17 22:18:20 +03:00
chenyu
ca1df20fa9 benchmark name fix - resnet eval is on eval data (#4628) 2024-05-17 12:56:12 -04:00
chenyu
c86adabe15 time with real global buffers in search (#4621)
* filter fake buffers in search

* test that

* update test
2024-05-17 12:36:23 -04:00
chenyu
e5d4e6a8aa BEAM=2 in green CI for 100 TFLOPS (#4624) 2024-05-16 23:28:28 -04:00
chenyu
b3dd885ffb cleanup double import from tinygrad.device in tensor.py (#4620) 2024-05-16 14:21:22 -04:00
uuuvn
639ea5b0f2 Metal linearizer failure 22 is flaky not just on CI (#4617)
* METAL doesn't fail anymore, not just on CI

* oops
2024-05-16 11:31:23 -04:00
qazal
f3f2b96583 pick schedule tests from external_test_opt (#4615)
* conv tests

* misc

* that shouldnt const fold
2024-05-16 15:43:41 +03:00
qazal
13200c6894 check simple_pads in all views (#4614) 2024-05-16 14:34:39 +03:00
qazal
0b464df605 base change scheduling spec (#4613)
* spec and kernel cnt

* dont use half

* skip half
2024-05-16 13:30:49 +03:00
nimlgen
65f7e3b3ab nv setup constbuf4 (#4511)
* nv correct constbuf 4

* compare results to cuda

* test fixed

* failed kernel

* repro

* revert this change
2024-05-16 10:42:35 +03:00
chenyu
04f2327ca3 fix abs of diff of uint (#4411) 2024-05-15 18:39:11 -04:00
chenyu
2119e0456d redo simpler abs and sign (#4611)
moved Sign logic to function.py, and backward always returns 0 to match torch.
rewrite abs as `self * self.sign()`, so it's backward also matches torch.
2024-05-15 18:19:46 -04:00
nimlgen
eb9689336e nv mockgpu (#4600)
* mockgpu nv

* works

* comment that out

* fix merge

* setup gpuocelot

* install packages

* not run all of them

* passes

* fix ci

* almost

* should pass

* linter

* linter 2

* try this?

* ugn, not supported

* ci

* remove ticket from description

* better descs
2024-05-15 23:46:08 +03:00
chenyu
3c11ca452e skip CLANG test casts between double and half for now (#4609)
start breaking after github CI image update
2024-05-15 16:17:06 -04:00
chenyu
8694eeb16d Revert "simpler abs and sign (#4606)" (#4608)
This reverts commit a5e157f663.
2024-05-15 15:46:33 -04:00
chenyu
a5e157f663 simpler abs and sign (#4606) 2024-05-15 14:33:09 -04:00
George Hotz
5ba611787d move image into tensor.py. delete features (#4603)
* move image into tensor.py

* change setup.py

* openpilot tests need pythonpath now
2024-05-15 10:50:25 -07:00
qazal
36d2ac603e allbufs are base (#4605) 2024-05-15 20:46:37 +03:00
chenyu
067ff719c2 fix comment for Tensor.swish (#4604)
bad string replacement when we changed `function.` to `F.`
2024-05-15 13:32:47 -04:00
qazal
cd4d7e18c7 _recurse_lb small cleanup (#4601)
* minor cleanups

* comments

* extend env in replay
2024-05-15 19:10:42 +03:00
Ahmed Harmouche
662bca8134 Split UnaryOps.CAST into CAST and BITCAST (#4487)
* Separate cast and bitcast

* Fix lint

* No more arg[0]

* Revert "No more arg[0]"

This reverts commit dee6911335513f092fe2cbb9684e8a9d26aad964.

* CAST/BITCAST arg is the dtype only, no more tuple

* No image bitcast, regenerate dataset

* Small fixes
2024-05-15 11:43:31 -04:00
George Hotz
53d082a2aa move memory into schedule (#4597) 2024-05-15 07:54:20 -07:00
qazal
a4a23c40a0 test masked assign views (#4599)
* possible masked

* not contiguous mask
2024-05-15 15:06:48 +03:00
George Hotz
ff64bcab69 move graph/search to engine (#4596) 2024-05-14 23:12:59 -07:00
George Hotz
afa9753d39 ruff cleanup (#4594)
* check editor config

* no editorconfig, it doesn't work

* ruff cleanups
2024-05-14 21:16:14 -07:00
wozeparrot
7f009cf9fa split arange threefry (#4590) 2024-05-14 21:10:22 -07:00
George Hotz
9425973bc7 docs cleanup and move (#4593)
* cleanup and move

* docs-legacy is gone

* don't update setup.py
2024-05-14 20:44:59 -07:00
George Hotz
fd02ab1e8b move disassemblers and openpilot (#4592)
* move disassemblers and openpilot

* delete junk

* put that in pre-commit

* fixup readme
2024-05-14 19:30:02 -07:00
chenyu
2b0ee74bb6 lshift and rshift (#4591) 2024-05-14 19:16:31 -04:00
nimlgen
45e7400e3c start amd cleanup (#4583) 2024-05-14 22:59:59 +03:00
chenyu
a65c8de735 move .half() llama freq_cis to the end of sin and cos (#4587)
otherwise arange has inf if either dim or context length exceeds half.max
2024-05-14 15:00:18 -04:00
qazal
9aa5e02229 update llmc export (#4584)
* update example

* move train to optim

* rename

* b2
2024-05-14 21:18:38 +03:00
qazal
355e1c135c pad fusion tests (#4570)
* what breaks

* Revert "what breaks"

This reverts commit e79f679283.

* simplest case

* one unsafe op

* expand+pad, shrink+pad

* safe case

* refactor
2024-05-14 20:34:46 +03:00
chenyu
7afca52796 replace pow in LAMB by tracking b1**t and b2**t per step (#4582)
* replace pow in LAMB by tracking b1**t and b2**t per step

* remove t, add [self.b1_t, self.b2_t] to return

* adam has one less kernel
2024-05-14 13:08:22 -04:00
nimlgen
9b02aef45a remove rhip (#4579)
* remove rhip

* remove hip runner
2024-05-14 17:58:19 +03:00
Szymon Ożóg
5eb81ff764 Fix speed compare script (#4581)
* Fix speed compare script

* Update speed_compare_cuda_ptx.py

* Update speed_compare_cuda_ptx.py

* Remove unused function
2024-05-14 17:47:03 +03:00
nimlgen
2131556c2c amd mockgpu (#4535)
* start mock amd gpu

* virt files

* cleaner

* init ci

* small fixes

* linter

* better?

* ugh

* linter

* fix

* diable some

* run shorter

* fixes

* add hcq test

* fix

* fix cmd revert
2024-05-14 14:28:04 +03:00
geohotstan
089eeec271 setitem in-place operator tests (#4577)
* tests and error

* rename to in-place

* add a note

* more comments

* more comments

* disable folded advanced setitem tests for now
2024-05-14 01:28:02 -04:00
chenyu
0fa57b8ce9 raise error if setitem tensors have requires_grad (#4575)
* raise error if setitem tensors have requires_grad

working on supporting this, first properly raises error

* NotImplementedError
2024-05-13 18:56:47 -04:00
Filip Brzek
f7d08bd454 feat: add acc_dtype to einsum (#4571) 2024-05-13 14:02:07 -04:00
Szymon Ożóg
d97d5a7689 Optimize PTX gated loads index calculation (#4304)
* WIP but working

* Cleanup

* Remove float4 pred and alt

* Cleanup

* this is somehow slowin it down

* Simplify

* add define var to ignore when optimizing gates

* Update assembly.py

* Test for optimizing gated loads

* Cleanup

* Fix NEG needed before if

* Remove unused parameters

* Update assembly.py

* Fix for cachable gone

---------

Co-authored-by: oz <oz@oz-MS-7B86.NAT.gliwice.vectranet.pl>
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-05-13 10:14:01 -07:00
qazal
c67b70ca67 small scheduler refactor (#4569)
* outputs

* consistent

* more style

* doesnt need tuple
2024-05-13 10:47:39 +03:00
qazal
77aa8659f5 use assign_targets in LazyOp creation (#4568)
* start

* correct error

* this is possible

* document it
2024-05-13 10:24:35 +03:00
qazal
b0fa97e176 assert error detail in test_assign (#4567)
* use regex assert

* that shouldnt raise
2024-05-13 09:56:05 +03:00