Commit Graph

10417 Commits

Elias Wahl
bb248a0dd1 Optional half matmul (#4835)
* half linear

* move weight cast back

* oops

* matmul dtype var

* todo comment
2024-06-04 17:53:41 -04:00
Elias Wahl
04e237328b Refactor to class style (#4804) 2024-06-04 14:08:31 -07:00
nimlgen
1b8bed4a26 nv check cmdq overrun (#4824)
* nv check cmdq overrun

* fix assert
2024-06-04 23:22:58 +03:00
David Hou
cddce0e168 don't cast before view on shape changing bitcast (#4833)
* don't cast before view on shape changing bitcast

* make sure cast before view triggers
2024-06-04 16:04:52 -04:00
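The distinction this commit preserves can be sketched in plain Python, using `struct` as a stand-in for buffer reinterpretation (an illustration only, not tinygrad code): a shape-changing bitcast reinterprets the same raw bytes at a different width, so a value cast inserted before the view would change the bytes being reinterpreted.

```python
import struct

# A shape-changing bitcast reinterprets raw bytes at another width:
# here one float32 becomes two uint16 values. The reinterpretation
# must happen on the original buffer; casting the value first would
# produce different bytes and therefore a different result.
buf = struct.pack("<f", 1.0)      # the 4 bytes of float32 1.0
print(struct.unpack("<2H", buf))  # (0, 16256)
```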
Alec Chen
0c3a996e64 Nest ifs for dtype and uop in pattern matcher (#4834) 2024-06-04 15:51:28 -04:00
Alec Chen
4909a0d16f Fix arg set in pattern matcher (#4830) 2024-06-04 15:10:09 -04:00
Alec Chen
c96026ac65 Add arg set regression test for pattern matcher (#4827)
* Add arg set regression test for pattern matcher

* real regression

---------

Co-authored-by: qazalin <qazal.software@gmail.com>
2024-06-04 13:35:09 -04:00
chenyu
a70e8a80d7 test_ops test cmp with special floats (#4826)
prepare to fix nan; it did not work with ge and le before either
2024-06-04 12:10:21 -04:00
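The special-float behavior this test targets is plain IEEE-754 semantics, checkable in a few lines of Python (standard float semantics, not tinygrad code):

```python
# IEEE-754 comparison semantics: every ordered comparison involving
# NaN evaluates to False, including >= and <=, which is why ge and le
# did not work with nan before either.
nan = float("nan")

print(nan >= 0.0)  # False
print(nan <= 0.0)  # False
print(nan == nan)  # False
print(nan != nan)  # True -- the only comparison with NaN that is True
```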
Szymon Ożóg
b6895dabaa Remove ssa label (#4823)
* remove ssa label

* linting
2024-06-04 16:51:05 +02:00
George Hotz
052c928d06 hotfix: touchups from presentation 2024-06-04 16:31:03 +02:00
chenyu
1e02b4cae1 default skip all exception in beam (#4822)
added a flag `BEAM_STRICT_MODE` to catch compile errors or other exceptions on demand
2024-06-03 18:21:36 -04:00
chenyu
3afc914617 CMPEQ -> CMPNE and make it safe to pad (#4818)
* CMPNE

* new dataset
2024-06-03 18:02:15 -04:00
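The pad-safety reasoning behind this switch can be sketched in plain Python (an illustration of the argument, not tinygrad's implementation): padded regions are filled with zeros, so a comparison is safe to pad when comparing the pad value with itself yields False, leaving the padded region inert in the resulting mask.

```python
# Padding fills out-of-bounds elements with 0. A comparison op is
# "safe to pad" when applying it to the pad value yields False, so
# padding cannot flip elements of the resulting mask to True.
assert (0 != 0) is False  # CMPNE: pad-safe
assert (0 < 0) is False   # CMPLT: also pad-safe
assert (0 == 0) is True   # CMPEQ: not pad-safe, hence the switch
```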
qazal
79c7d402ee improve augmented assign error message (#4813) 2024-06-03 16:57:22 -04:00
Szymon Ożóg
bb7b031c5c Bitshift (#4728)
* WIP

* Cleanup

* Cleanup

* Fix variable, refactor to use set

* right shift should be signed/unsigned

* Test for bitshifts

* Allow a neg
2024-06-03 21:16:01 +02:00
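The signed/unsigned right-shift distinction mentioned above can be demonstrated in plain Python, where `>>` on ints is always an arithmetic (sign-extending) shift; `lshr32` below is a hypothetical helper emulating a 32-bit logical shift, not a tinygrad function:

```python
# Python's >> on ints is an arithmetic (sign-extending) shift.
# A logical (unsigned) right shift on a 32-bit value can be emulated
# by masking into the unsigned range before shifting.
def lshr32(x: int, n: int) -> int:
    """Logical right shift of a 32-bit value (hypothetical helper)."""
    return (x & 0xFFFFFFFF) >> n

print(-8 >> 1)        # -4          (arithmetic: sign bit copied in)
print(lshr32(-8, 1))  # 2147483644  (logical: zeros shifted in)
```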
nimlgen
e78a9bf3f2 support view in nv/amd (#4812)
* support view in nv/amd

* fix amd

* fix

* run test on nv/amd
2024-06-03 22:11:52 +03:00
chenyu
45083ccb43 canonicalize 0 in shape in View.create (#4815)
set strides to 0, offset to 0, mask to None, and contiguous to True for size-0 views.
2024-06-03 13:37:37 -04:00
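A minimal sketch of that canonicalization rule in plain Python, using a dict instead of tinygrad's actual View class (`create_view` and `strides_for_shape` are hypothetical names for illustration): a view whose shape contains a 0 holds no elements, so its strides, offset, and mask carry no information and can be normalized.

```python
def strides_for_shape(shape):
    # row-major (C-contiguous) strides for a given shape
    out, acc = [], 1
    for s in reversed(shape):
        out.append(acc)
        acc *= s
    return tuple(reversed(out))

def create_view(shape, strides, offset=0, mask=None):
    # size-0 canonicalization as the commit describes: strides and
    # offset become 0, mask becomes None, contiguous becomes True
    if 0 in shape:
        return {"shape": shape, "strides": (0,) * len(shape),
                "offset": 0, "mask": None, "contiguous": True}
    contiguous = offset == 0 and mask is None and strides == strides_for_shape(shape)
    return {"shape": shape, "strides": strides, "offset": offset,
            "mask": mask, "contiguous": contiguous}
```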
Szymon Ożóg
d064bf6d8c b2 is useless (#4814) 2024-06-03 18:29:53 +02:00
nimlgen
65f0071c4b amd compute queue bind api (#4732)
* amd hcq bind api

* revert copy queue

* revert
2024-06-03 18:36:56 +03:00
chenyu
3cc6ae0d85 layernorm backward is independent of its mean (#4806) 2024-06-03 09:49:59 -04:00
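One way to see the independence this commit exploits, in plain Python rather than tinygrad's implementation: layernorm output is invariant to a constant shift of the input (which changes only the mean), so nothing downstream of it, gradients included, can depend on the input's mean.

```python
import math

def layernorm(xs, eps=1e-5):
    # normalize to zero mean / unit variance (reference implementation)
    m = sum(xs) / len(xs)
    var = sum((x - m) ** 2 for x in xs) / len(xs)
    return [(x - m) / math.sqrt(var + eps) for x in xs]

a = [1.0, 2.0, 4.0, 7.0]
b = [x + 100.0 for x in a]  # shift the mean, keep deviations identical
for u, v in zip(layernorm(a), layernorm(b)):
    assert math.isclose(u, v)  # outputs match, mean shift is invisible
```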
George Hotz
2dae657415 improve readability (#4809) 2024-06-03 14:57:57 +02:00
George Hotz
eecfdd2f6e hotfix: fix dataset reading for new llm.c 2024-06-03 14:10:05 +02:00
qazal
6e0c16dfb0 cleanup render_reduceop (#4807)
* update acc key

* refactor return type

* remove return type

* run all reduces

* set acc key [run_process_replay]

* local_idxs are copied in render_reduceop [run_process_replay]
2024-06-03 14:39:02 +03:00
George Hotz
dd84f7d35e touchup: show process name in multiprocess assert 2024-06-03 13:09:40 +02:00
qazal
0db9674dea skip process replay on master (#4808) 2024-06-03 12:29:28 +03:00
qazal
f64fa51a64 process replay for test/* (#4799)
* add input to unit tests [run_process_replay]

* add setup [run_process_replay]

* run tests [run_process_replay]

* add cuda and amd [run_process_replay]

* run everything but BEAM=2 [run_process_replay]

* skip export_model [run_process_replay]

* fix amd CI

* add concurrency back
2024-06-03 12:01:58 +03:00
nimlgen
e8b5f2040d nv faster signal on dma queue (#4789) 2024-06-02 21:47:24 +03:00
Francis Lata
707099487a Multiprocessing UNet3D dataloader (#4801)
* testing dataloader

* matching dataloader implementation for unet3d

* remove comments

* clean up dataloader

* add cookie and cleanup

* use shm_path when creating SharedMemory

* add support for testing resnet and unet3d dataloaders

* update dataset test to return preprocessed data directory in prep for dataloader testing

* pass preprocessed dataset directory properly

* update loader function for dataloader

* add shuffling on indices

* update shm name

* more cleanup for unet3d dataloader

* remove changes to tests

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-06-02 11:30:47 -04:00
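The shared-memory handoff named in the bullets above ("use shm_path when creating SharedMemory") follows the standard producer/consumer pattern of `multiprocessing.shared_memory`; a minimal single-process sketch, not the UNet3D dataloader itself:

```python
from multiprocessing import shared_memory

# Minimal sketch of a dataloader's shared-memory handoff: a producer
# writes preprocessed bytes into a named segment, and a consumer
# attaches to the same segment by name instead of copying the data.
shm = shared_memory.SharedMemory(create=True, size=16)
shm.buf[:4] = b"\x01\x02\x03\x04"

reader = shared_memory.SharedMemory(name=shm.name)  # consumer side attaches by name
received = bytes(reader.buf[:4])

reader.close()
shm.close()
shm.unlink()  # producer owns the segment's lifetime
```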
Timmy
ca32921f84 Multireduce PADTO Test (#4785)
* padto test

* expanded multireduce padto tests

* cuda doesn't run on CI

* moving padto_where_multireduce test to SUM so that we can check the reduce axis

* cleaning up tests some more

* add wanna_outputs

* refactor test_padto_sum_multireduce

* fix max and refactor where

* fix axis

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2024-06-02 13:46:53 +03:00
qazal
231ed2c656 compute aliased buffer idxs pre reduce (#4788) 2024-06-01 16:46:52 -04:00
nimlgen
1b18ebb133 minor cleanups (#4802) 2024-06-01 20:11:43 +03:00
chenyu
1ffa5ec492 unit test ShapeTracker.consecutive (#4800) 2024-06-01 10:10:51 -04:00
nimlgen
7384ee08a0 amd cleanup sdma (#4796)
* amd cleanup sdma

* faster enqueue for sdma

* typo

* remove commented lines

* fix overrun check

* flushhdp better command
2024-06-01 17:06:44 +03:00
qazal
240d6b5bc0 process replay benchmarks (#4668) 2024-06-01 14:36:21 +03:00
Alec Chen
b377db7f0d Refactor UOps pattern matcher to UPat instead of dicts (#4791) 2024-06-01 10:55:51 +02:00
qazal
de8c8abbd8 define indexes pre reduce (#4795) 2024-05-31 18:53:27 -04:00
nimlgen
bd2e7c8b31 amd registers from file (#4778)
* amd registers from file

* remove comments

* linter

* no off
2024-05-31 18:48:57 +03:00
chenyu
8942230b1f minor cleanups of test_tensor and extend some cases (#4794) 2024-05-31 10:43:22 -04:00
qazal
637f482588 configure derandomizing CI tests (#4793) 2024-05-31 17:06:58 +03:00
wozeparrot
ed0a740fe4 greater chat api endpoint compat (#4792) 2024-05-30 22:47:31 -07:00
chenyu
7cc883ecee CMPLT is safe to pad (#4790)
0 < 0 evaluates to False
2024-05-30 22:50:48 -04:00
chenyu
236390aafb fix lazy r const folding with variable shape (#4783)
currently not supporting const folding of symbolic shapes. I think it's possible with a refactor to Tensor.from_node.
also added some failing required tests for symbolic arange.
2024-05-30 15:19:28 -04:00
chenyu
c4d1283049 simplify _cumsum with _first_zero=True (#4782)
handle the case with 0 in the output shape of _cumsum; _cumsum now returns the correct shape with _first_zero=True
2024-05-30 13:19:33 -04:00
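A plain-Python sketch of the inclusive vs. first-zero prefix sum, under the assumption (not confirmed by the source) that tinygrad's internal `_first_zero=True` means a leading zero is included in the scan output:

```python
from itertools import accumulate

def cumsum(xs, first_zero=False):
    # inclusive prefix sum; with first_zero=True a leading 0 is
    # included, giving len(xs) + 1 outputs (assumed semantics of the
    # internal _first_zero flag, illustrated here in plain Python)
    return list(accumulate(xs, initial=0)) if first_zero else list(accumulate(xs))

print(cumsum([1, 2, 3]))                   # [1, 3, 6]
print(cumsum([1, 2, 3], first_zero=True))  # [0, 1, 3, 6]
print(cumsum([], first_zero=True))         # [0] -- 0-in-shape input still well-defined
```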
chenyu
4921de1945 fix cumsum of 0-d tensor (#4781)
* fix cumsum of 0-d tensor

* _resolve_dim for all
2024-05-30 12:41:09 -04:00
chenyu
4cf0eadf8f failed test case for ellipsis in einsum (#4779)
from #4156
2024-05-30 11:14:42 -04:00
Alec Chen
e89bc42cc7 Add UOps pattern matcher regression tests (#4725)
* add pattern matcher regression tests

* Remove test for dtype str after rebasing

* Make test uops match type spec

* leave const const, add const alu vin test

* correct uops

* actually correct uops
2024-05-30 17:12:20 +03:00
qazal
c2945be0a3 add fused tensor core opts tests (#4775)
* add fused tc opts tests

* n=64
2024-05-30 13:50:00 +03:00
chenyu
f1bf916b8a apply NOOPT in test_arange complexity (#4774)
with hcopt, arange(2560) uses fewer ops than arange(256)
2024-05-29 23:12:35 -04:00
chenyu
cde7a7cda7 isolate the 134ms kernel in train_gpt2.py (#4773)
133ms on tinybox red with BEAM=2
2024-05-29 17:26:24 -04:00
nimlgen
57204c4014 amd cleanup pm4 queue (#4772) 2024-05-29 22:59:06 +03:00
lopusz
b2c408912c Add docs link to README (#4768) 2024-05-29 17:47:47 +00:00