chenyu
22fc0a2e36
bert sum acc in half ( #9412 )
...
also BS=96
2025-03-11 23:03:15 -04:00
George Hotz
2780e2027e
devectorize prereqs [pr] ( #9404 )
2025-03-11 12:33:29 +08:00
chenyu
3ae66e59a3
least_upper_float is at least default_float ( #9303 )
...
* least_upper_float is at least default_float
en route for div rounding mode. dtype of true int division would change from int32 to default_float, which matches torch too.
* fix bert acc
2025-02-28 10:41:56 -05:00
George Hotz
fc32ff80d6
torch and numpy dtype interop [pr] ( #9224 )
...
* torch and numpy dtype interop [pr]
* less lines
* order
2025-02-24 18:26:49 +08:00
b1tg
1f1362fd27
add truncate_bf16 ( #9078 )
...
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-02-15 07:59:09 +08:00
George Hotz
3e082d4a9d
add float4 support to LLVM ( #8920 )
...
* add float4 support to LLVM
* is_bool
2025-02-06 12:15:50 +08:00
George Hotz
c1c5227acb
preserve size in dtype ptr [pr] ( #8898 )
2025-02-05 14:38:57 +08:00
chenyu
3f46425f1e
typos found by gemini [pr] ( #8400 )
...
not very effective... maybe due to tokenizer
2024-12-24 22:32:25 -05:00
chenyu
b7397c1322
more typing cleanups [pr] ( #8376 )
...
List, Tuple, DefaultDict
2024-12-22 05:21:03 -05:00
George Hotz
9c77e9f9b7
replace Tuple with tuple [pr] ( #8344 )
...
* replace Tuple with tuple [pr]
* replace List with list [pr]
* replace Dict with dict [pr]
* replace Set with set [pr]
2024-12-19 21:27:56 -08:00
George Hotz
6608ba316d
add size of the buffer to the ptr dtype ( #8322 )
2024-12-18 12:46:35 -08:00
JaSpa99
3c5d5f9414
mypy==1.13.0 ( #7990 )
...
* explicit instantiation and narrowing asserts
* explicit cast
* bump
* one line assert
* handle case for no copy_queue_t
* Revert "handle case for no copy_queue_t"
This reverts commit 38347806ca.
* more readable control flow
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-12-06 12:09:14 +08:00
chenyu
87594a8153
simpler dtypes.max for int [pr] ( #8058 )
2024-12-05 10:31:41 -05:00
JaSpa99
38f34ca0cb
prepare mypy==1.13.0: legacy cast ( #7866 )
...
* use helper to narrow literal type
* narrow with asserts instead of cast
* remove parentheses
* tensor.item() calls tensor.data()
* no copy
* proper indexing
2024-11-27 10:33:35 -05:00
chenyu
40d7535eeb
clean up DTYPES_DICT [pr] ( #7845 )
2024-11-22 10:01:34 -05:00
chenyu
d800a79112
use "signed char" for int8 ( #7796 )
...
* use "signed char" for int8
"char" might be unsigned depending on the platform.
fixed `python -m pytest test/test_ops.py::TestOpsUint8::test_interpolate_bilinear` on arm64 linux
* opencl does not have "signed char"
2024-11-19 19:29:54 -05:00
chenyu
397a2e6eb6
no special case for int32 in truncate [pr] ( #7657 )
...
this masked an issue that idx is not data, and should never need truncate
2024-11-12 14:52:14 -05:00
George Hotz
d8691a4f03
lil touchups ( #7597 )
2024-11-08 22:31:43 +08:00
George Hotz
d87adccb6c
fast scalar ( #7545 )
...
* fast scalar set on dtype
* prevent loop
* lru_cache those
2024-11-05 14:08:08 +08:00
George Hotz
d419364b66
faster dtype compare [pr] ( #7542 )
...
* faster dtype compare [pr]
* simpler reduce and bring name back
* preserve pr
* lines
* now pr will pass
* use fields in vec
* remove that assert
2024-11-05 13:09:48 +08:00
George Hotz
d39f21da8f
scalar image is image [pr] ( #7398 )
...
* scalar image is image [pr]
* base property
2024-10-30 18:51:47 +08:00
George Hotz
76a41a1083
don't compare with pointer dtype ( #7394 )
...
* don't compare with pointer dtype
* more cleanup
* images are pointers
* handle IMAGE better
* cleaner test_image
* this work
* pr match
* cleanup
2024-10-30 17:48:27 +08:00
George Hotz
27995a2a04
vcount + cleanups ( #7393 )
...
* Revert "Revert "Restore vcount [pr] (#7390)" (#7392)"
This reverts commit 4ca53db604.
* ugh bugfix [pr]
* uops_to_dtypes function
* fixups
* varnames
* fix mypy
* just 4,8
* tests
2024-10-30 12:50:15 +08:00
George Hotz
4ca53db604
Revert "Restore vcount [pr] ( #7390 )" ( #7392 )
...
This reverts commit 1058f9c9ff.
2024-10-30 11:40:25 +08:00
George Hotz
1058f9c9ff
Restore vcount [pr] ( #7390 )
...
* Revert "Revert "add vcount to PtrDtype (#7388)""
This reverts commit 399a5219dd.
* Revert "Revert "add tests to vcount stuff [pr] (#7389)""
This reverts commit cc8d6dbdf3.
* no ptr
2024-10-30 11:27:55 +08:00
George Hotz
399a5219dd
Revert "add vcount to PtrDtype ( #7388 )"
...
This reverts commit b086584d64.
2024-10-30 10:56:52 +08:00
George Hotz
cc8d6dbdf3
Revert "add tests to vcount stuff [pr] ( #7389 )"
...
This reverts commit 1b7084899b.
2024-10-30 10:56:49 +08:00
George Hotz
1b7084899b
add tests to vcount stuff [pr] ( #7389 )
2024-10-30 10:54:54 +08:00
George Hotz
b086584d64
add vcount to PtrDtype ( #7388 )
2024-10-30 10:43:54 +08:00
chenyu
6bf38c35e5
clean up transcendental frexp [pr] ( #7384 )
...
also added some unit tests for frexp
2024-10-29 18:51:37 -04:00
George Hotz
4cb236a495
index in cstyle ( #7328 )
...
* index only in cstyle
* fix prefix dtypes
* fix tests
* global indexing
* Revert "global indexing"
This reverts commit 4d507e8abb.
* fix image
* fix image
* ptx tests
* fix CUDA dtype rendering
2024-10-29 13:06:26 +08:00
chenyu
f511ad9103
No pyint again ( #7156 )
...
* Revert "bring back pyint (#7150)"
This reverts commit 37e83ca6fc.
* remove truncate in const folding
* truncate_output=False
2024-10-19 13:48:59 -04:00
chenyu
37e83ca6fc
bring back pyint ( #7150 )
...
fixed test_failure_52 and resnet. need to understand this better
2024-10-18 14:54:37 -04:00
George Hotz
b0a13896d7
PtrDType is dataclass [pr] ( #7125 )
...
* PtrDType is dataclass [pr]
* new dataset
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-10-18 09:40:33 -04:00
George Hotz
ded1b38b84
minor dtype cleanup [pr] ( #7124 )
...
* minor dtype cleanup [pr]
* use ptr() function
2024-10-17 17:41:23 +08:00
George Hotz
cd61e81f55
beautiful mnist works on windows ( #7100 )
...
* beautiful mnist works on windows [pr]
* add comment for that (no pr)
2024-10-16 23:00:05 +08:00
chenyu
bd8ecf7fd6
remove NumNode ( #7035 )
2024-10-13 16:42:19 -04:00
chenyu
23faeacb23
remove outdated comments ( #7018 )
2024-10-12 10:51:07 -04:00
George Hotz
85a45164fb
remove pyint [pr] ( #7016 )
...
* remove pyint
* bump time on tp [pr]
* dont truncate in const fold
* remove dead code
* Revert "dont truncate in const fold"
This reverts commit 29c81db0f7.
* remove define_var
2024-10-12 22:36:24 +08:00
qazal
29363fb85e
add dtype.ptr() [pr] ( #6839 )
2024-10-02 15:03:05 +08:00
George Hotz
7fca0bc912
use pattern matcher for image [run_process_replay] ( #6762 )
...
* use pattern matcher for image [run_process_replay]
* try again
* this
2024-09-26 15:49:09 +08:00
George Hotz
b199b699ed
use shl everywhere ( #6744 )
...
* use shl everywhere
* fix parens
* late patterns
* works as an extra pass
* ptx
2024-09-26 09:59:36 +08:00
George Hotz
cb22ef379a
truncate consts early ( #6741 )
...
* truncate consts early
* ptx still fails
* Update dtype.py
2024-09-25 16:49:51 +08:00
George Hotz
e945fa9c5c
put local on the PtrDtype [run_process_replay] ( #6656 )
...
* put local on the PtrDtype [run_process_replay]
* those are local too
2024-09-23 10:29:17 +08:00
qazal
607113fcdf
fix vectorized dtype repr [run_process_replay] ( #6535 )
2024-09-16 13:42:55 +08:00
Tim Becker
7c078191ce
Misc rewrite perf improvements ( #6500 )
...
* Make UOp a normal class and use __slots__
* Use __slots__ in UPat
* Cache dtypes.{min,max}
* Use faster iterables in ops.py
* extend is a lot faster than nested listcomp
Co-authored-by: Roelof van Dijk <3604013+roelofvandijk@users.noreply.github.com>
---------
Co-authored-by: Roelof van Dijk <3604013+roelofvandijk@users.noreply.github.com>
2024-09-13 11:31:50 +08:00
George Hotz
327eb12600
folding for vectorized consts [run_process_replay] ( #6498 )
...
* folding for vectorized consts [run_process_replay]
* remove that if statement
* inf loop
2024-09-12 17:29:37 +08:00
George Hotz
119b0ea4af
add UOps.VCONST [run_process_replay] ( #6487 )
...
* add UOps.VCONST [run_process_replay]
* VCONST folding
* simpler devectorize
* alu
* revert that type
2024-09-12 14:03:39 +08:00
George Hotz
1b4d1823b7
add pyint to DTYPES_DICT [run_process_replay] ( #6477 )
...
* add pyint to DTYPES_DICT [run_process_replay]
* also fix uop alu bug
* exclude pyint there too
* ne ne
* force explicit dtype
2024-09-11 17:31:59 +08:00
qazal
78148e16d8
init changes from the dtypes_void branch [run_process_replay] ( #6475 )
2024-09-11 16:34:50 +08:00