chenyu
3cc6ae0d85
layernorm backward is independent of its mean ( #4806 )
2024-06-03 09:49:59 -04:00
George Hotz
2dae657415
improve readability ( #4809 )
2024-06-03 14:57:57 +02:00
George Hotz
eecfdd2f6e
hotfix: fix dataset reading for new llm.c
2024-06-03 14:10:05 +02:00
qazal
6e0c16dfb0
cleanup render_reduceop ( #4807 )
* update acc key
* refactor return type
* remove return type
* run all reduces
* set acc key [run_process_replay]
* local_idxs are copied in render_reduceop [run_process_replay]
2024-06-03 14:39:02 +03:00
George Hotz
dd84f7d35e
touchup: show process name in multiprocess assert
2024-06-03 13:09:40 +02:00
qazal
0db9674dea
skip process replay on master ( #4808 )
2024-06-03 12:29:28 +03:00
qazal
f64fa51a64
process replay for test/* ( #4799 )
* add input to unit tests [run_process_replay]
* add setup [run_process_replay]
* run tests [run_process_replay]
* add cuda and amd [run_process_replay]
* run everything but BEAM=2 [run_process_replay]
* skip export_model [run_process_replay]
* fix amd CI
* add concurrency back
2024-06-03 12:01:58 +03:00
nimlgen
e8b5f2040d
nv faster signal on dma queue ( #4789 )
2024-06-02 21:47:24 +03:00
Francis Lata
707099487a
Multiprocessing UNet3D dataloader ( #4801 )
* testing dataloader
* matching dataloader implementation for unet3d
* remove comments
* clean up dataloader
* add cookie and cleanup
* use shm_path when creating SharedMemory
* add support for testing resnet and unet3d dataloaders
* update dataset test to return preprocessed data directory in prep for dataloader testing
* pass preprocessed dataset directory properly
* update loader function for dataloader
* add shuffling on indices
* update shm name
* more cleanup for unet3d dataloader
* remove changes to tests
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-06-02 11:30:47 -04:00
Timmy
ca32921f84
Multireduce PADTO Test ( #4785 )
* padto test
* expanded multireduce padto tests
* cuda doesn't run on CI
* moving padto_where_multireduce test to SUM so that we can check the reduce axis
* cleaning up tests some more
* add wanna_outputs
* refactor test_padto_sum_multireduce
* fix max and refactor where
* fix axis
---------
Co-authored-by: qazal <qazal.software@gmail.com>
2024-06-02 13:46:53 +03:00
qazal
231ed2c656
compute aliased buffer idxs pre reduce ( #4788 )
2024-06-01 16:46:52 -04:00
nimlgen
1b18ebb133
minor cleanups ( #4802 )
2024-06-01 20:11:43 +03:00
chenyu
1ffa5ec492
unit test ShapeTracker.consecutive ( #4800 )
2024-06-01 10:10:51 -04:00
nimlgen
7384ee08a0
amd cleanup sdma ( #4796 )
* amd cleanup sdma
* faster enqueue for sdma
* typo
* remove commented lines
* fix overrun check
* better hdp flush command
2024-06-01 17:06:44 +03:00
qazal
240d6b5bc0
process replay benchmarks ( #4668 )
2024-06-01 14:36:21 +03:00
Alec Chen
b377db7f0d
Refactor UOps pattern matcher to UPat instead of dicts ( #4791 )
2024-06-01 10:55:51 +02:00
qazal
de8c8abbd8
define indexes pre reduce ( #4795 )
2024-05-31 18:53:27 -04:00
nimlgen
bd2e7c8b31
amd registers from file ( #4778 )
* amd registers from file
* remove comments
* linter
* no off
2024-05-31 18:48:57 +03:00
chenyu
8942230b1f
minor cleanups of test_tensor and extend some cases ( #4794 )
2024-05-31 10:43:22 -04:00
qazal
637f482588
configure derandomizing CI tests ( #4793 )
2024-05-31 17:06:58 +03:00
wozeparrot
ed0a740fe4
greater chat api endpoint compat ( #4792 )
2024-05-30 22:47:31 -07:00
chenyu
7cc883ecee
CMPLT is safe to pad ( #4790 )
0 < 0 evaluates to False
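A minimal sketch of why the pad is safe (the tensor and values here are illustrative, not from the PR): padding fills with 0, and since `0 < 0` is False, the padded region of a CMPLT output is all False and can't corrupt downstream ops.

```python
from tinygrad import Tensor

# padding fills with 0; 0 < 0 is False, so the padded tail of the
# comparison result stays False
a = Tensor([1.0, -1.0]).pad(((0, 2),))  # -> [1., -1., 0., 0.]
print((a < 0).numpy())                  # -> [False  True False False]
```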
2024-05-30 22:50:48 -04:00
chenyu
236390aafb
fix lazy r const folding with variable shape ( #4783 )
const folding with a symbolic shape is currently not supported; I think it's possible with a refactor of Tensor.from_node.
also added some failing required tests for symbolic arange.
2024-05-30 15:19:28 -04:00
chenyu
c4d1283049
simplify _cumsum with _first_zero=True ( #4782 )
handles the case where the output shape of _cumsum contains a 0, and _cumsum now returns the correct shape with _first_zero=True
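A hedged illustration of the shape behavior via the public cumsum (which routes through _cumsum); the shape here is illustrative:

```python
from tinygrad import Tensor

# a tensor with a 0 in its shape should keep that shape through cumsum
t = Tensor.ones(3, 0)
print(t.cumsum(axis=1).shape)  # expected: (3, 0)
```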
2024-05-30 13:19:33 -04:00
chenyu
4921de1945
fix cumsum of 0-d tensor ( #4781 )
* fix cumsum of 0-d tensor
* _resolve_dim for all
2024-05-30 12:41:09 -04:00
chenyu
4cf0eadf8f
failed test case for ellipsis in einsum ( #4779 )
from #4156
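For context, a sketch of the numpy-style ellipsis semantics the failing test expects (shapes here are illustrative):

```python
from tinygrad import Tensor

# "..." stands in for any leading batch dims, numpy-style
a, b = Tensor.rand(2, 3, 4), Tensor.rand(2, 4, 5)
print(Tensor.einsum("...ij,...jk->...ik", a, b).shape)  # expected: (2, 3, 5)
```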
2024-05-30 11:14:42 -04:00
Alec Chen
e89bc42cc7
Add UOps pattern matcher regression tests ( #4725 )
* add pattern matcher regression tests
* Remove test for dtype str after rebasing
* Make test uops match type spec
* leave const const, add const alu vin test
* correct uops
* actually correct uops
2024-05-30 17:12:20 +03:00
qazal
c2945be0a3
add fused tensor core opts tests ( #4775 )
* add fused tc opts tests
* n=64
2024-05-30 13:50:00 +03:00
chenyu
f1bf916b8a
apply NOOPT in test_arange complexity ( #4774 )
with hcopt, arange(2560) uses fewer ops than arange(256)
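A sketch of what applying NOOPT looks like (Context is tinygrad's scoped way to set such flags; the arange call here is illustrative, not the test itself):

```python
from tinygrad import Tensor
from tinygrad.helpers import Context

# NOOPT=1 turns off hand-coded optimizations so op counts scale predictably
with Context(NOOPT=1):
    Tensor.arange(2560).realize()
```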
2024-05-29 23:12:35 -04:00
chenyu
cde7a7cda7
isolate the 134ms kernel in train_gpt2.py ( #4773 )
133ms on tinybox red with BEAM=2
2024-05-29 17:26:24 -04:00
nimlgen
57204c4014
amd cleanup pm4 queue ( #4772 )
2024-05-29 22:59:06 +03:00
lopusz
b2c408912c
Add docs link to README ( #4768 )
2024-05-29 17:47:47 +00:00
chenyu
f2414c666f
fix train_gpt2.py ( #4771 )
added `with Tensor.train():`
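For reference, a minimal runnable sketch of the pattern (the tiny linear model and random data are illustrative placeholders, not train_gpt2.py's setup):

```python
from tinygrad import Tensor
from tinygrad.nn import Linear
from tinygrad.nn.optim import SGD
from tinygrad.nn.state import get_parameters

# Tensor.train() enables training mode; the optimizer's step asserts on it,
# which is why train_gpt2.py needed the context manager
model = Linear(4, 2)
opt = SGD(get_parameters(model), lr=0.01)
X, Y = Tensor.rand(8, 4), Tensor([0, 1, 0, 1, 0, 1, 0, 1])

with Tensor.train():
    loss = model(X).sparse_categorical_crossentropy(Y)
    opt.zero_grad()
    loss.backward()
    opt.step()
```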
2024-05-29 12:01:34 -04:00
chenyu
59c6472b9f
check contiguous in View.create after canonicalizing mask and offset ( #4770 )
mask, offset, and strides can change during canonicalization, so contiguous can become True at the end
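An illustrative toy of the ordering (not tinygrad's actual View code; all names here are made up): a mask covering the whole shape canonicalizes away, after which the view can turn out contiguous after all.

```python
# toy sketch, not tinygrad's View: canonicalize first, judge contiguity last
def strides_for_shape(shape):
    strides, acc = [], 1
    for s in reversed(shape):
        strides.append(acc); acc *= s
    return tuple(reversed(strides))

def canonicalize_mask(shape, mask):
    # a mask spanning the full shape carries no information -> drop it
    return None if mask == tuple((0, s) for s in shape) else mask

shape, strides, offset = (2, 3), (3, 1), 0
mask = canonicalize_mask(shape, ((0, 2), (0, 3)))  # full mask -> None
print(offset == 0 and mask is None and strides == strides_for_shape(shape))  # True
```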
2024-05-29 11:31:13 -04:00
qazal
6e5fa5fd92
map local aliases to reduceop ( #4766 )
* map
* ugh
* save one line
* concerning, does this pass
* Revert "concerning, does this pass"
This reverts commit 64d4664f17.
* use local_alias
2024-05-28 21:11:25 -04:00
chenyu
7624ad3ddd
add --timing and --profile to llama3 example ( #4767 )
2024-05-28 16:24:44 -04:00
qazal
c235223c07
refactor tc_opt creation ( #4765 )
* move reduceop loop
* this is more mergeable code; add assert
* integrate s2
2024-05-28 23:10:27 +03:00
qazal
a88aea626d
map tensor core bufs to reduceop ( #4763 )
* tc_opts.bufs to its only map
* lint
* iterate reduceop bufs
2024-05-28 22:07:39 +03:00
wozeparrot
6fcf220b21
feat: tag 0.9.0 ( #4762 )
v0.9.0
2024-05-28 18:44:45 +00:00
chenyu
e22cdb40f3
docs: fix mkdoc warnings and link to tensor.md ( #4760 )
2024-05-28 14:24:11 -04:00
nimlgen
872827b6ae
fix usage of args struct in hcq ( #4758 )
* do not allocate empty buffer in hcq
* do not take args struct from program
2024-05-28 21:10:39 +03:00
wozeparrot
b2b49cef6f
split tensor docs ( #4754 )
2024-05-28 11:03:52 -07:00
nimlgen
fe26d3fefe
nv sync before free for binded commands ( #4759 )
* nv sync before free for binded commands
* shorter comment
2024-05-28 20:49:29 +03:00
chenyu
e614b7c696
docs: showcase remove mnist_gan and add conversation.py ( #4757 )
fixed both examples, and I think it's better to show conversation.py
2024-05-28 11:09:26 -04:00
nimlgen
019f4680e5
check dims before execution on nv ( #4756 )
* check dims before execution on nv
* fix linter
2024-05-28 16:57:28 +03:00
qazal
0e824741c4
pre multi reduce codegen/* cleanup ( #4755 )
* refactor self.reduceop
* free lines
* fix test
2024-05-28 08:15:48 -04:00
chenyu
fd249422f5
minor cleanup example stable_diffusion ( #4753 )
2024-05-28 00:05:37 -04:00
chenyu
53b9081aab
check arg types of Tensor.randint ( #4751 )
raise TypeError if low, high, or dtype are not ints
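A hedged illustration of the new check (the exact error message is not from the PR):

```python
from tinygrad import Tensor

t = Tensor.randint(4, low=0, high=10)     # fine: int bounds
try:
    Tensor.randint(4, low=0.5, high=10)   # non-int low now raises TypeError
except TypeError as e:
    print(e)
```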
2024-05-27 20:24:10 -04:00
chenyu
16756af13c
docs: polish tensor.py ( #4750 )
* docs: polish tensor.py
* don't change that
2024-05-27 20:00:56 -04:00
Elias Wahl
c4b0acf095
Global norm + small changes ( #4749 )
* norm
* no empty
* default loss scaler in float
2024-05-27 18:35:27 -04:00