Commit Graph

142 Commits

Author SHA1 Message Date
Francis Lata
6a62ece474 minor cleanups 2025-02-28 15:43:11 +00:00
Francis Lata
074e9f742b more typing fixes 2025-02-28 15:42:11 +00:00
Francis Lata
2c3417dfce Merge branch 'master' into retinanet_mlperf 2025-02-23 21:23:28 +00:00
Francis Lata
60c13c2932 update loss calculation for regression head and some cleanups 2025-02-23 21:22:33 +00:00
chenyu
1692087db5 _one_hot_along_dim input needs to be int (#9179)
* _one_hot_along_dim input needs to be int

indexing and onehot compare with arange, and non-int dtype is likely a bug
2025-02-20 09:00:43 -05:00
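A minimal sketch of the arange-comparison one-hot this guards, assuming tinygrad's public Tensor API (names and shapes here are illustrative, not the actual helper):

```python
# one-hot is built by comparing integer indices against arange, so a
# non-int index dtype is suspect: float indices compare unreliably
from tinygrad import Tensor, dtypes

idx = Tensor([2, 0, 1], dtype=dtypes.int32)  # class indices, must be int
onehot = (Tensor.arange(3) == idx.unsqueeze(-1)).where(1.0, 0.0)
print(onehot.numpy())  # [[0. 0. 1.] [1. 0. 0.] [0. 1. 0.]]
```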
Francis Lata
fc36f09b1e no need to return loaded keys for resnet 2025-02-19 20:35:03 +00:00
Francis Lata
41378e74a6 model init, hyperparam, and data preprocessing updates 2025-02-19 18:47:06 +00:00
chenyu
975c318dbc bert use int32 for input ids (#9173)
original data was int32 for these. float might have caused precision issues
2025-02-19 08:17:27 -05:00
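A quick illustration of the precision hazard (plain numpy, not the bert code): float32 has integer spacing greater than 1 above 2**24, so large ids stop round-tripping exactly.

```python
import numpy as np
print(np.float32(2**24 + 1) == 2**24 + 1)  # False: 16777217 rounds to 16777216
print(np.int32(2**24 + 1) == 2**24 + 1)    # True: int32 holds it exactly
```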
Francis Lata
f61b10450e Merge branch 'master' into retinanet_mlperf 2025-02-12 15:47:05 +00:00
divinity76
bec4f59ce8 workaround f16 cast ambiguity (#8935)
For unknown reasons, without this change, trying to execute "Llama 3.2 1B" fails with the error below. I do not know the performance impact of this change. I still can't get exo fully running, but this change lets me get further (before hitting a separate issue with VRAM allocation, a story for another day).

error: 
```
Failed to fetch completions: Error processing prompt (see logs with DEBUG>=2): Nvrtc Error 6, NVRTC_ERROR_COMPILATION <null>(18): error: more than one user-defined conversion from "nv_bfloat16" to "half" applies:
            function "__half::__half(float)" (declared at line 214 of /usr/include/cuda_fp16.hpp)
            function "__half::__half(short)" (declared at line 227 of /usr/include/cuda_fp16.hpp)
            function "__half::__half(unsigned short)" (declared at line 228 of /usr/include/cuda_fp16.hpp)
            function "__half::__half(int)" (declared at line 229 of /usr/include/cuda_fp16.hpp)
            function "__half::__half(unsigned int)" (declared at line 230 of /usr/include/cuda_fp16.hpp)
            function "__half::__half(long long)" (declared at line 231 of /usr/include/cuda_fp16.hpp)
            function "__half::__half(unsigned long long)" (declared at line 232 of /usr/include/cuda_fp16.hpp)
    *((half4*)((data0+(alu0+(gidx1<<14)+(lidx0<<11)+alu1)))) = make_half4(((half)(val0)),((half)(val1)),((half)(val2)),((half)(val3)));
                                                                                 ^

(the same "more than one user-defined conversion" error repeats three more times, once for each remaining make_half4 argument: val1, val2, val3)
4 errors detected in the compilation of "<null>".
```
2025-02-11 09:38:56 +08:00
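The ambiguity is that CUDA's `__half` has no single constructor accepting `nv_bfloat16`, so the compiler sees several equally valid conversions. A minimal sketch of the workaround's idea at the Tensor level (illustrative, not the actual patch, and assuming a device with bf16 support):

```python
# routing bf16 -> f32 -> f16 gives the generated CUDA one unambiguous
# conversion per value instead of the ambiguous __half(nv_bfloat16)
from tinygrad import Tensor, dtypes

x = Tensor([1.5, 2.5], dtype=dtypes.bfloat16)
y = x.cast(dtypes.float32).cast(dtypes.float16)  # instead of x.cast(dtypes.float16)
```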
Francis Lata
041481f910 Merge branch 'master' into retinanet_mlperf 2025-02-07 15:28:29 +00:00
chenyu
a092b6395d Tuple -> tuple, List -> list [pr] (#8936) 2025-02-06 14:21:19 -05:00
George Hotz
80089536e5 Revert "move llvm_bf16_cast to renderer for CLANG and LLVM [pr] (#8720)" (#8786)
This reverts commit af0452f116.
2025-01-28 18:59:02 +09:00
mesozoic-egg
af0452f116 move llvm_bf16_cast to renderer for CLANG and LLVM [pr] (#8720)
* handle bf16 via bitcasting for CLANG and LLVM

* On LLVM, skip float16 cast

* float32 on llvm lite, float32 elsewhere

* code format

* trigger pr

* move to rewriter

---------

Co-authored-by: Mesozoic Egg <mesozoic.egg@proton.mail>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-01-28 18:16:43 +09:00
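The bitcast trick relies on bf16 being exactly the top 16 bits of a float32; a self-contained illustration (plain Python, not the renderer code):

```python
# bf16 -> f32 by widening the 16 raw bits into the high half of a uint32
import struct

def bf16_bits_to_float(bits16: int) -> float:
    return struct.unpack("<f", struct.pack("<I", (bits16 & 0xFFFF) << 16))[0]

print(bf16_bits_to_float(0x3FC0))  # 1.5 (0x3FC00000 reinterpreted as float32)
```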
Francis Lata
2177053076 Merge branch 'master' into retinanet_mlperf 2025-01-27 08:07:19 -08:00
chenyu
49b914ee69 simpler bert acc [pr] (#8714)
logit.log_softmax().argmax(-1) is equivalent to logit.argmax(-1)
2025-01-22 10:32:19 -05:00
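The equivalence holds because log_softmax only subtracts a per-row constant (the logsumexp), which cannot change which index is largest:

```python
# log_softmax(x) = x - logsumexp(x): a per-row shift, so argmax is unchanged
from tinygrad import Tensor

logits = Tensor([[2.0, 5.0, 1.0]])
assert logits.log_softmax(-1).argmax(-1).item() == logits.argmax(-1).item()
```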
Francis Lata
be2e97260d fix dtype for anchor inside dataloader and fix horizontal flip transformation 2025-01-20 22:45:25 -08:00
Francis Lata
bef389dec7 realize boxcoder's encoding 2025-01-19 15:59:28 -08:00
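For context, a sketch of what a Faster R-CNN style BoxCoder encode computes (names are illustrative and the per-coordinate weights are omitted; this is not the file being edited):

```python
# ground-truth boxes as (dx, dy, dw, dh) offsets relative to anchors, all xyxy
from tinygrad import Tensor

def encode(anchors: Tensor, gt: Tensor) -> Tensor:
    aw, ah = anchors[:, 2] - anchors[:, 0], anchors[:, 3] - anchors[:, 1]
    ax, ay = anchors[:, 0] + 0.5 * aw, anchors[:, 1] + 0.5 * ah
    gw, gh = gt[:, 2] - gt[:, 0], gt[:, 3] - gt[:, 1]
    gx, gy = gt[:, 0] + 0.5 * gw, gt[:, 1] + 0.5 * gh
    return Tensor.stack((gx - ax) / aw, (gy - ay) / ah, (gw / aw).log(), (gh / ah).log(), dim=1)
```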
Francis Lata
40d6752854 adjust regression loss to mask after L1 loss is calculated 2024-12-27 17:41:12 +00:00
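The ordering matters: masking after the elementwise L1 lets padded anchors contribute zero to the reduction. A hedged sketch (function and normalization are illustrative, not the repo's code):

```python
# elementwise L1 first, then mask, then normalize by the number of valid anchors
from tinygrad import Tensor

def masked_l1(pred: Tensor, target: Tensor, mask: Tensor) -> Tensor:
    l1 = (pred - target).abs()                        # per-element L1 loss
    return (l1 * mask).sum() / mask.sum().maximum(1)  # guard against division by zero
```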
Francis Lata
8441130822 Merge branch 'master' into retinanet_mlperf 2024-12-27 15:38:57 +00:00
Francis Lata
5755ac1f72 Fix FC layer ResNet load_from_pretrained error (#8387)
* validate that FC exists before loading pretrained weights

* add test case for ResNet pretrained model without FC layer

* remove extra newline

* rename test case

* reraise exception if not handled by check
2024-12-26 18:11:27 -05:00
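A hedged sketch of the guard's shape (the key-walking helper here is hypothetical, not the actual resnet.py code): skip the final FC weights when the model was built without a classification head, and let any other mismatch raise as before:

```python
# skip fc.* keys when the model has no FC layer; other missing keys still raise
def load_pretrained(model, state_dict):
    for key, value in state_dict.items():
        if key.startswith("fc.") and getattr(model, "fc", None) is None:
            continue  # no classification head: pretrained FC weights are skippable
        obj = model
        for part in key.split("."):  # walk e.g. "layer1.0.conv1.weight"
            obj = obj[int(part)] if part.isdigit() else getattr(obj, part)
        obj.assign(value)
```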
Francis Lata
c1a18e13ef make training work 2024-12-23 21:48:55 +00:00
Francis Lata
46e0bcf412 minor cleanup 2024-12-23 20:22:21 +00:00
Francis Lata
03eda98a31 Merge branch 'master' into retinanet_mlperf 2024-12-23 08:30:46 -08:00
Francis Lata
239d2a7214 explicitly check value for not None (#8382) 2024-12-23 11:12:39 -05:00
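The distinction matters because truthiness conflates `None` with legitimate falsy values:

```python
# "if value:" silently drops 0, 0.0, "" and empty containers; "is not None" doesn't
value = 0
if value: print("truthiness check")             # not printed, 0 is falsy
if value is not None: print("explicit check")   # printed, 0 is a real value
```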
Francis Lata
8defd337d8 fixes after helper refactor cleanup 2024-12-23 05:30:03 +00:00
Francis Lata
0e4fd5c573 revert helper changes 2024-12-23 05:16:13 +00:00
Francis Lata
fb689f7097 move BoxCoder to MLPerf helpers 2024-12-23 05:12:51 +00:00
Francis Lata
972127496e cleanup losses 2024-12-23 05:06:37 +00:00
Francis Lata
d57f7cc209 fix regression loss 2024-12-21 09:34:10 +00:00
Francis Lata
630267914f implement regression loss 2024-12-20 23:38:25 +00:00
Francis Lata
971d10361f revert anchors to use np 2024-12-20 16:56:50 +00:00
Francis Lata
759e1d6cbc make anchors use Tensors 2024-12-19 22:20:03 +00:00
Francis Lata
9b9e5871ed remove sigmoid when computing loss 2024-12-18 20:07:49 +00:00
Francis Lata
17eabdd1b2 cleanups + fix dataloader tests 2024-12-13 20:38:23 +00:00
Francis Lata
43e1f33d33 make ClassificationHead loss work 2024-12-13 20:20:27 +00:00
Francis Lata
827b2114e2 update focal loss to support masking 2024-12-10 23:00:32 +00:00
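For reference, a hedged sketch of a masked sigmoid focal loss in the RetinaNet style (Lin et al.; alpha and gamma are the paper's defaults, and this is not the repo's implementation):

```python
from tinygrad import Tensor

def focal_loss(logits: Tensor, targets: Tensor, mask: Tensor, alpha=0.25, gamma=2.0) -> Tensor:
    p = logits.sigmoid()
    ce = -(targets * p.log() + (1 - targets) * (1 - p).log())  # elementwise BCE
    p_t = targets * p + (1 - targets) * (1 - p)                # prob of the true class
    a_t = targets * alpha + (1 - targets) * (1 - alpha)
    loss = a_t * (1 - p_t) ** gamma * ce                       # down-weight easy examples
    return (loss * mask).sum() / mask.sum().maximum(1)         # masked mean over valid anchors
```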
Francis Lata
e5bc0c0485 start some work on classification loss 2024-12-10 16:33:16 +00:00
Francis Lata
bb6f6075cd Merge branch 'master' into retinanet_mlperf 2024-12-04 21:48:36 -05:00
Francis Lata
c3187087f7 QwQ-32B-Preview support (#7962)
* load weights with some debugging

* start running a prompt

* cleanup

* optionally permute layers and cleanup

* add validation for simple prompt

* small cleanup

* minor cleanup with formatting download links

* add a longer prompt

* add timing option

* some typings

* remove unused arg

* reset GlobalCounters

* minor cleanups
2024-12-04 21:46:37 -05:00
Francis Lata
b21a25e6fd Merge branch 'master' into retinanet_mlperf 2024-12-01 07:37:02 -05:00
chenyu
336a9b6bf3 remove dtype from llama precompute_freqs_cis (#7930)
do the cast based on the input's dtype in the first forward call instead
2024-11-27 22:28:40 -05:00
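A hedged sketch of the pattern described (the class is illustrative, not llama.py): build freqs_cis in the default dtype and cast lazily to match the activations.

```python
from tinygrad import Tensor

class FreqsCis:
    def __init__(self, freqs_cis: Tensor): self.freqs_cis = freqs_cis
    def get(self, x: Tensor) -> Tensor:
        if self.freqs_cis.dtype != x.dtype:                # first forward call
            self.freqs_cis = self.freqs_cis.cast(x.dtype)  # cast once, based on input
        return self.freqs_cis
```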
chenyu
3b26e51fce Tensor.cummax (#7854)
generalized the existing cumsum to take Ops.MAX in addition to Ops.ADD
2024-11-22 15:55:02 -05:00
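Usage is the running-maximum analogue of cumsum (assuming the axis argument mirrors cumsum's):

```python
# cummax reuses the cumsum scan with Ops.MAX in place of Ops.ADD
from tinygrad import Tensor

print(Tensor([1, 3, 2, 5, 4]).cummax(0).numpy())  # [1 3 3 5 5]
```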
Francis Lata
99efa2cfde Merge branch 'master' into retinanet_mlperf 2024-11-18 04:42:57 -08:00
geohotstan
f8056a74d6 combine pad2d with pad (#7677)
* I have pad2d, I have pad, uuh~, pad2dpad~

* fix some small things

* strategically placed cast hack

* fix more

* fix more more

* tests

* periods
2024-11-14 17:56:02 +08:00
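A sketch of the mapping being unified (padding values are illustrative): pad2d's flat [left, right, top, bottom] corresponds to pad's per-dim (before, after) pairs on the last two axes.

```python
from tinygrad import Tensor

x = Tensor.ones(1, 1, 2, 2)
l, r, t, b = 1, 1, 0, 0
y = x.pad(((0, 0), (0, 0), (t, b), (l, r)))  # same effect as pad2d([l, r, t, b])
print(y.shape)  # (1, 1, 2, 4)
```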
Francis Lata
0aad640465 Merge branch 'master' into retinanet_mlperf 2024-11-12 02:45:23 -08:00
Ahmed Harmouche
9c63c3d8ab These casts should only happen if these are supported (#7644) 2024-11-12 07:56:50 +08:00
Francis Lata
bb6f27d2f3 Merge branch 'master' into retinanet_mlperf 2024-11-04 19:19:22 -08:00
chenyu
fb694a63eb Tensor.erf (#7419)
the same one used in onnx and in bert.
2024-10-30 18:12:28 -04:00
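The bert connection is exact GELU, which is defined through erf; a small usage sketch:

```python
# exact GELU: gelu(x) = 0.5 * x * (1 + erf(x / sqrt(2)))
from tinygrad import Tensor

x = Tensor([0.0, 1.0, -1.0])
gelu = 0.5 * x * (1 + (x / 2**0.5).erf())
print(gelu.numpy())  # ~[0.0, 0.841, -0.159]
```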
eliotgolding
e920f1d663 Llama 3.2 1B load from GGUF (#7295)
* gguf 1b-instruct

* not needed
2024-10-27 09:29:02 +08:00