chenyu
28972418c4
s/get_linearizer/get_kernel [run_process_replay] ( #5467 )
2024-07-13 20:32:22 -04:00
Francis Lata
0345577032
UNet3D dataloader shared memory fix ( #5465 )
...
* create separate SharedMemory between inputs and labels
* update path check for shared mem
* clean up unit test for dataset
2024-07-13 20:26:00 -04:00
chenyu
4df63da190
clean up rest of the loadop [run_process_replay] ( #5440 )
...
to metaop and filter_sink
2024-07-12 23:38:51 -04:00
George Hotz
03c2dc8bd7
lowerer is kernel [run_process_replay] ( #5437 )
2024-07-12 18:50:55 -07:00
chenyu
9a187e6102
fix handcode_opt script ( #5435 )
...
* fix handcode_opt script
* run in ci
* real run in ci
* HALF=0
2024-07-12 20:52:28 -04:00
George Hotz
870dc8c350
s/Linearizer/Lowerer [run_process_replay] ( #5428 )
2024-07-12 15:54:07 -07:00
George Hotz
6707c778d0
scheduleitem is not Tuple [run_process_replay] ( #5425 )
...
* scheduleitem is not Tuple [run_process_replay]
* fix tests
* fix op + fuzzers
* fix mop test
2024-07-12 15:13:19 -07:00
George Hotz
f6ef283e6a
s/loadops/metaops [run_process_replay] ( #5421 )
2024-07-12 13:26:50 -07:00
wozeparrot
d1cbd6bb95
unify handcode_resnet_opt and handcode_bert_opt ( #5418 )
2024-07-12 12:05:01 -07:00
wozeparrot
b7cc75a9df
usage summary in handcode opt ( #5414 )
2024-07-12 11:21:18 -07:00
George Hotz
8390feb7b9
optim.OptimizerGroup in hlb_cifar ( #5401 )
2024-07-11 20:14:36 -07:00
wozeparrot
c24d495ef9
metadata in handcode_opt ( #5400 )
2024-07-11 17:45:34 -07:00
George Hotz
5232e405ce
hotfix: add BS to beautiful_mnist
2024-07-11 10:55:05 -07:00
wozeparrot
c9b3ae6bbf
fix llama.py chat mode assert ( #5366 )
2024-07-10 18:06:14 -07:00
wozeparrot
fa873df9c1
bring tinychat more in line with tinyos' version ( #5358 )
2024-07-10 13:13:52 -07:00
chenyu
322c37e621
use helpers.JIT in llama and gpt2 examples ( #5350 )
...
* use helpers.JIT in llama and gpt2 examples
replaced getenv("JIT"), effectively made gpt2 default jit
* fix test_gpt2
2024-07-09 15:04:43 -04:00
Elias Wahl
73bddc44f6
Fix fake dataloader ( #5326 )
2024-07-08 09:07:44 -04:00
chenyu
43c3f73fbc
handcode_bert_opt.py ( #5295 )
...
similar to handcode_resnet50_opt.py, one file to check bert kernels without dataset.
2024-07-05 11:01:20 -04:00
Tobias Fischer
0c3a35e5c2
Stable Diffusion v2 Inference ( #5283 )
...
* model implementation
* clip fix, more qol options
2024-07-03 22:47:10 -04:00
reddyn12
d3e244d8b7
prev speed improvements ( #5252 )
...
Co-authored-by: reddyn <nikidsniper@gmail.com>
2024-07-03 09:06:01 -07:00
chenyu
191463a919
add timing to SDXL ( #5273 )
2024-07-02 23:29:54 -04:00
chenyu
b2c3a28a5e
nn.RMSNorm ( #5272 )
...
the norm itself does not add enough value to be a Tensor method, but we would want Tensor.normalize
2024-07-02 21:39:01 -04:00
Tobias Fischer
8c9c1cf62f
Pulled CLIP and UNet into Separate Files ( #5253 )
...
* pulled clip and unet into separate files
* reference cleanup, lru cache fix
* better pool indexing
2024-07-01 22:33:01 -04:00
chenyu
b9122ecdaf
revert stable diffusion validation with threefry ( #5248 )
...
* Revert "use threefry in stable diffusion benchmark (#4988)"
This reverts commit 44dfa37c70.
* sdxl and validation fix
* relax threshold
2024-07-01 14:43:47 -04:00
George Hotz
3df47bc21e
OpenELM + repeat_interleave ( #5234 )
...
* start writing openelm
* progress...hit bug
* repeat_interleave support
* gqa
* add rotary embedding
* spp
* i think it runs correctly
* broken
* output is good now
* cleanups
* no io_uring on android
2024-06-30 15:18:39 -07:00
chenyu
88763eb9ff
fix stable_diffusion with fp16 ( #5239 )
2024-06-30 12:59:31 -04:00
chenyu
7090eac8cb
validate sdxl output and put it in benchmark ( #5211 )
...
* validate sdxl output and put it in benchmark
* don't print fetch progress_bar in CI
2024-06-28 11:40:52 -04:00
chenyu
63fa4e2a0e
fix seed = 0 in sdxl ( #5209 )
...
removed a few unneeded realize and contiguous too
2024-06-28 08:48:59 -04:00
Tobias Fischer
4688f97d48
Add SDXL Inference to Examples ( #5206 )
...
* added sdxl inference code
* fixed trailing whitespace
* use original impl code, removed unneeded numpy calls
2024-06-28 07:42:28 -04:00
chenyu
0ba093dea0
hotfix: only validate stable diffusion when using threefry ( #5166 )
2024-06-26 16:50:38 -04:00
chenyu
e4a5870b36
validate stable_diffusion output ( #5163 )
...
changed default steps, forgot to update validation
2024-06-26 16:42:21 -04:00
nimlgen
21b225ac45
llama3 download works ( #5160 )
2024-06-26 22:45:13 +03:00
wozeparrot
c91b3c4079
shard llama3 on 0 sometimes ( #5157 )
2024-06-26 11:50:57 -07:00
Elias Wahl
e267f3161d
Add MLLogger ( #5125 )
...
* add MLPerf logger
* eval steps
* start with step 1
* compliance for 3.1.0 and 4.0.0
* more compliance
* assert, comment and contiguous
2024-06-26 12:23:56 -04:00
David Hou
3604642847
Llama shard axis 0 sometimes ( #5123 )
...
* make buffer view optional with a flag [run_process_replay]
* do not view when sharding to save memory [run_process_replay]
* llama shard axis=0 sometimes
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-06-26 10:35:25 -04:00
chenyu
dade7677cf
validate llama3 output only with model "LLaMA-3/8B-SF-DPO" ( #5138 )
2024-06-24 20:58:25 -04:00
chenyu
055e616302
cleanup mnist data load in beautiful_mnist ( #5106 )
2024-06-22 18:31:51 -04:00
chenyu
e356807696
tinytqdm.set_description and tinytrange ( #5101 )
2024-06-22 14:45:06 -04:00
chenyu
8080298739
s/tinytqdm/tqdm ( #5103 )
...
except in unit test where tqdm is imported
2024-06-22 14:18:26 -04:00
chenyu
e468601226
update llama attention casting ( #5096 )
...
* update llama attention casting
updated scaled_dot_product_attention middle cast and removed hard-coded half in llama attention.
* fix that
2024-06-22 10:57:17 -04:00
wozeparrot
acb715c64c
fix: llama3 special tokens ( #5045 )
2024-06-18 17:08:44 -07:00
chenyu
a3ed4176c8
use tinytqdm in active tests and examples ( #5038 )
...
* use tinytqdm in active tests and examples
stress test this before 0.9.1
* no set_description
2024-06-18 16:01:19 -04:00
Elias Wahl
f31ef11537
Better default hparams for large BS ( #5030 )
...
* better default hparams for large BS
* bf16 too
* use tuple
2024-06-18 11:13:06 -04:00
Elias Wahl
7bfa9101c0
Float in scaled dot product attention ( #4985 )
...
* Monkeypatch scaled-dot-product-attention
* Use dot instead of matmul
* new api
* imports
* least_upper_dtype
2024-06-18 08:16:41 -04:00
chenyu
c52352bd9a
fix yolov8 example ( #5003 )
...
it was creating a Tensor from a list of numpy arrays, which is no longer supported now that creating a Tensor from a list does not go through numpy.
2024-06-16 20:47:29 -04:00
chenyu
44dfa37c70
use threefry in stable diffusion benchmark ( #4988 )
...
also updated default steps to 10. easier to tell the image is following the prompt.
2024-06-15 20:25:29 -04:00
wozeparrot
ce1ed374c9
more tinychat fixes ( #4971 )
2024-06-15 16:29:39 -07:00
wozeparrot
8209cd3c55
easier llama3 + fetch subdir ( #4938 )
2024-06-14 13:47:27 -07:00
chenyu
67e8df4969
remove numpy from dtype ( #4969 )
...
replaced all dtype.np with _to_np_dtype defined in tensor.py.
after this, the only numpy usages are (1) Tensor(np.ndarray), (2) construct .numpy() output, (3) numpy random buffer
2024-06-14 15:38:45 -04:00
wozeparrot
2a974ff257
fix: no readablestream await of, too new ( #4965 )
2024-06-14 11:22:19 -07:00