qazal
0b464df605
base change scheduling spec ( #4613 )
...
* spec and kernel cnt
* dont use half
* skip half
2024-05-16 13:30:49 +03:00
nimlgen
65f7e3b3ab
nv setup constbuf4 ( #4511 )
...
* nv correct constbuf 4
* compare results to cuda
* test fixed
* failed kernel
* repro
* revert this change
2024-05-16 10:42:35 +03:00
chenyu
04f2327ca3
fix abs of diff of uint ( #4411 )
2024-05-15 18:39:11 -04:00
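The fix above targets a classic pitfall: for unsigned integers, a naive `(a - b).abs()` wraps around instead of going negative, so the "abs" never recovers the true distance. A minimal illustration of the failure mode and the usual workaround (numpy used here purely for illustration, not tinygrad's actual code):

```python
import numpy as np

a = np.array([3], dtype=np.uint8)
b = np.array([5], dtype=np.uint8)

# Naive abs-of-difference wraps: 3 - 5 underflows to 254 in uint8,
# and abs(254) is still 254, not 2.
naive = np.abs(a - b)

# Safe pattern: subtract the smaller from the larger, so the
# difference never dips below zero in the first place.
safe = np.maximum(a, b) - np.minimum(a, b)

print(int(naive[0]))  # 254
print(int(safe[0]))   # 2
```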
chenyu
2119e0456d
redo simpler abs and sign ( #4611 )
...
moved Sign logic to function.py, and backward always returns 0 to match torch.
rewrite abs as `self * self.sign()`, so its backward also matches torch.
2024-05-15 18:19:46 -04:00
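The rewrite above works because of the product rule: with sign's backward defined as 0, d/dx [x * sign(x)] = sign(x) + x * 0 = sign(x), which is exactly torch's gradient for abs (including 0 at x = 0). A plain-Python sketch of that reasoning (not tinygrad's actual implementation):

```python
def sign(x):
    # -1, 0, or 1, via bool arithmetic
    return (x > 0) - (x < 0)

def sign_backward(x, grad_out):
    # sign's gradient is defined as 0 everywhere, matching torch
    return 0.0 * grad_out

def abs_via_sign(x):
    return x * sign(x)

def abs_backward(x, grad_out):
    # product rule: d(x * sign(x)) = sign(x)*dx + x*d(sign(x));
    # the second term vanishes because d(sign(x)) = 0
    return sign(x) * grad_out

print(abs_via_sign(-3.0))       # 3.0
print(abs_backward(-3.0, 1.0))  # -1, matching torch: d|x|/dx = -1 for x < 0
print(abs_backward(0.0, 1.0))   # 0, torch also defines d|x|/dx = 0 at x = 0
```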
nimlgen
eb9689336e
nv mockgpu ( #4600 )
...
* mockgpu nv
* works
* comment that out
* fix merge
* setup gpuocelot
* install packages
* not run all of them
* passes
* fix ci
* almost
* should pass
* linter
* linter 2
* try this?
* ugn, not supported
* ci
* remove ticket from description
* better descs
2024-05-15 23:46:08 +03:00
chenyu
3c11ca452e
skip CLANG test casts between double and half for now ( #4609 )
...
started breaking after the GitHub CI image update
2024-05-15 16:17:06 -04:00
chenyu
8694eeb16d
Revert "simpler abs and sign ( #4606 )" ( #4608 )
...
This reverts commit a5e157f663.
2024-05-15 15:46:33 -04:00
chenyu
a5e157f663
simpler abs and sign ( #4606 )
2024-05-15 14:33:09 -04:00
George Hotz
5ba611787d
move image into tensor.py. delete features ( #4603 )
...
* move image into tensor.py
* change setup.py
* openpilot tests need pythonpath now
2024-05-15 10:50:25 -07:00
qazal
36d2ac603e
allbufs are base ( #4605 )
2024-05-15 20:46:37 +03:00
chenyu
067ff719c2
fix comment for Tensor.swish ( #4604 )
...
bad string replacement when we changed `function.` to `F.`
2024-05-15 13:32:47 -04:00
qazal
cd4d7e18c7
_recurse_lb small cleanup ( #4601 )
...
* minor cleanups
* comments
* extend env in replay
2024-05-15 19:10:42 +03:00
Ahmed Harmouche
662bca8134
Split UnaryOps.CAST into CAST and BITCAST ( #4487 )
...
* Separate cast and bitcast
* Fix lint
* No more arg[0]
* Revert "No more arg[0]"
This reverts commit dee6911335513f092fe2cbb9684e8a9d26aad964.
* CAST/BITCAST arg is the dtype only, no more tuple
* No image bitcast, regenerate dataset
* Small fixes
2024-05-15 11:43:31 -04:00
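The distinction the split above encodes: a cast converts the value into the new dtype, while a bitcast reinterprets the same raw bits as a different type. A numpy illustration (assuming IEEE-754 float32; again not tinygrad code):

```python
import numpy as np

x = np.array([1.0], dtype=np.float32)

# CAST: convert the value; the float 1.0 becomes the integer 1
casted = x.astype(np.int32)

# BITCAST: reinterpret the raw bits; the IEEE-754 encoding of 1.0
# is 0x3F800000, which reads back as the integer 1065353216
bitcasted = x.view(np.int32)

print(int(casted[0]))     # 1
print(int(bitcasted[0]))  # 1065353216
```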
George Hotz
53d082a2aa
move memory into schedule ( #4597 )
2024-05-15 07:54:20 -07:00
qazal
a4a23c40a0
test masked assign views ( #4599 )
...
* possible masked
* not contiguous mask
2024-05-15 15:06:48 +03:00
George Hotz
ff64bcab69
move graph/search to engine ( #4596 )
2024-05-14 23:12:59 -07:00
George Hotz
afa9753d39
ruff cleanup ( #4594 )
...
* check editor config
* no editorconfig, it doesn't work
* ruff cleanups
2024-05-14 21:16:14 -07:00
wozeparrot
7f009cf9fa
split arange threefry ( #4590 )
2024-05-14 21:10:22 -07:00
George Hotz
9425973bc7
docs cleanup and move ( #4593 )
...
* cleanup and move
* docs-legacy is gone
* don't update setup.py
2024-05-14 20:44:59 -07:00
George Hotz
fd02ab1e8b
move disassemblers and openpilot ( #4592 )
...
* move disassemblers and openpilot
* delete junk
* put that in pre-commit
* fixup readme
2024-05-14 19:30:02 -07:00
chenyu
2b0ee74bb6
lshift and rshift ( #4591 )
2024-05-14 19:16:31 -04:00
nimlgen
45e7400e3c
start amd cleanup ( #4583 )
2024-05-14 22:59:59 +03:00
chenyu
a65c8de735
move .half() llama freq_cis to the end of sin and cos ( #4587 )
...
otherwise arange has inf if either dim or context length exceeds half.max
2024-05-14 15:00:18 -04:00
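The overflow the reordering above avoids: float16's largest finite value is 65504, so materializing a long arange directly in half overflows to inf, while sin/cos outputs are bounded in [-1, 1] and are always safe to halve afterwards. A numpy sketch of both sides:

```python
import numpy as np

# float16 tops out at 65504; larger magnitudes overflow to inf
assert np.finfo(np.float16).max == 65504.0

# an arange cast to half overflows once values exceed that limit...
bad = np.arange(0, 100000, 1000, dtype=np.float32).astype(np.float16)
assert np.isinf(bad).any()

# ...but sin/cos outputs are bounded, so casting to half *after*
# the trig (the change above) never produces inf
good = np.sin(np.arange(0, 100000, 1000, dtype=np.float32)).astype(np.float16)
assert not np.isinf(good).any()
```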
qazal
9aa5e02229
update llmc export ( #4584 )
...
* update example
* move train to optim
* rename
* b2
2024-05-14 21:18:38 +03:00
qazal
355e1c135c
pad fusion tests ( #4570 )
...
* what breaks
* Revert "what breaks"
This reverts commit e79f679283.
* simplest case
* one unsafe op
* expand+pad, shrink+pad
* safe case
* refactor
2024-05-14 20:34:46 +03:00
chenyu
7afca52796
replace pow in LAMB by tracking b1**t and b2**t per step ( #4582 )
...
* replace pow in LAMB by tracking b1**t and b2**t per step
* remove t, add [self.b1_t, self.b2_t] to return
* adam has one less kernel
2024-05-14 13:08:22 -04:00
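The trick above: instead of recomputing b1**t and b2**t with a `pow` every optimizer step, keep running products that are updated with one multiply per step. A minimal sketch of the equivalence (illustrative only, not the LAMB implementation itself):

```python
b1, b2 = 0.9, 0.999

# running products, each updated with a single multiply per step
b1_t, b2_t = 1.0, 1.0

for t in range(1, 11):
    b1_t *= b1  # stays equal to b1 ** t
    b2_t *= b2  # stays equal to b2 ** t
    # bias correction then uses (1 - b1_t) and (1 - b2_t) directly,
    # with no pow kernel needed in the step
    assert abs(b1_t - b1 ** t) < 1e-12
    assert abs(b2_t - b2 ** t) < 1e-12
```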
nimlgen
9b02aef45a
remove rhip ( #4579 )
...
* remove rhip
* remove hip runner
2024-05-14 17:58:19 +03:00
Szymon Ożóg
5eb81ff764
Fix speed compare script ( #4581 )
...
* Fix speed compare script
* Update speed_compare_cuda_ptx.py
* Update speed_compare_cuda_ptx.py
* Remove unused function
2024-05-14 17:47:03 +03:00
nimlgen
2131556c2c
amd mockgpu ( #4535 )
...
* start mock amd gpu
* virt files
* cleaner
* init ci
* small fixes
* linter
* better?
* ugh
* linter
* fix
* diable some
* run shorter
* fixes
* add hcq test
* fix
* fix cmd revert
2024-05-14 14:28:04 +03:00
geohotstan
089eeec271
setitem in-place operator tests ( #4577 )
...
* tests and error
* rename to in-place
* add a note
* more comments
* more comments
* disable folded advanced setitem tests for now
2024-05-14 01:28:02 -04:00
chenyu
0fa57b8ce9
raise error if setitem tensors have requires_grad ( #4575 )
...
* raise error if setitem tensors have requires_grad
working on supporting this, first properly raises error
* NotImplementedError
2024-05-13 18:56:47 -04:00
Filip Brzek
f7d08bd454
feat: add acc_dtype to einsum ( #4571 )
2024-05-13 14:02:07 -04:00
Szymon Ożóg
d97d5a7689
Optimize PTX gated loads index calculation ( #4304 )
...
* WIP but working
* Cleanup
* Remove float4 pred and alt
* Cleanup
* this is somehow slowin it down
* Simplify
* add define var to ignore when optimizing gates
* Update assembly.py
* Test for optimizing gated loads
* Cleanup
* Fix NEG needed before if
* Remove unused parameters
* Update assembly.py
* Fix for cachable gone
---------
Co-authored-by: oz <oz@oz-MS-7B86.NAT.gliwice.vectranet.pl>
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-05-13 10:14:01 -07:00
qazal
c67b70ca67
small scheduler refactor ( #4569 )
...
* outputs
* consistent
* more style
* doesnt need tuple
2024-05-13 10:47:39 +03:00
qazal
77aa8659f5
use assign_targets in LazyOp creation ( #4568 )
...
* start
* correct error
* this is possible
* document it
2024-05-13 10:24:35 +03:00
qazal
b0fa97e176
assert error detail in test_assign ( #4567 )
...
* use regex assert
* that shouldnt raise
2024-05-13 09:56:05 +03:00
chenyu
25ec40ca93
cleanup dtype of tensor creation from list ( #4566 )
2024-05-13 02:47:41 -04:00
qazal
4e1135a0bc
assign buffer read/write tests ( #4565 )
...
* simple tests
* more tests
2024-05-13 09:43:36 +03:00
George Hotz
b660f60125
all uops are now cachable ( #4564 )
...
* all uops are now cachable
* cachable is gone
2024-05-12 22:34:35 -07:00
George Hotz
02327b8adf
simple stuff from new_uops branch ( #4563 )
2024-05-12 22:18:05 -07:00
ziereis
f53a23d21e
Test for optim assertion ( #4558 )
...
* add test for assertion
* whitespace
* restore state
---------
Co-authored-by: Thomas Ziereis <thomas.ziereis@web.de>
2024-05-12 14:21:28 -07:00
wozeparrot
d7670f8141
quantized llama multilazybuffer fix ( #4557 )
2024-05-12 14:19:21 -07:00
ziereis
bcee4743ce
fix error message ( #4556 )
...
* fix error messgae
* typo
* add suggestion to fix error
---------
Co-authored-by: Thomas Ziereis <thomas.ziereis@web.de>
2024-05-12 12:35:51 -07:00
chenyu
01a0c1a948
slightly faster nf4 llama ( #4542 )
2024-05-12 14:24:42 -04:00
qazal
4c232dc0ae
refactor LoadOps scheduling ( #4553 )
...
* refactor
* op -> lop
2024-05-12 12:59:24 +03:00
qazal
3da152f0fe
scheduler docs 2 ( #4551 )
...
* docs
* delete cleanups
2024-05-12 12:15:39 +03:00
wozeparrot
e07c7668b3
nf4 llama ( #4540 )
2024-05-11 22:22:34 -07:00
George Hotz
7a26bdac65
move scheduleitem to schedule.py ( #4541 )
...
* move scheduleitem to schedule.py
* don't need that type checking anymore
2024-05-11 21:13:04 -07:00
George Hotz
508e8a6666
add cpu objdump to LLVM/CLANG ( #4537 )
2024-05-11 14:28:44 -07:00
chenyu
bed70b130c
mlperf bert getenv-able EVAL_STEP_FREQ ( #4534 )
2024-05-11 14:36:56 -04:00