Commit Graph

79 Commits

Author SHA1 Message Date
chenyu
22fc0a2e36 bert sum acc in half (#9412)
also BS=96
2025-03-11 23:03:15 -04:00
George Hotz
2780e2027e devectorize prereqs [pr] (#9404) 2025-03-11 12:33:29 +08:00
chenyu
3ae66e59a3 least_upper_float is at least default_float (#9303)
* least_upper_float is at least default_float

en route for div rounding mode. dtype of true int division would change from int32 to default_float, which matches torch too.

* fix bert acc
2025-02-28 10:41:56 -05:00
George Hotz
fc32ff80d6 torch and numpy dtype interop [pr] (#9224)
* torch and numpy dtype interop [pr]

* less lines

* order
2025-02-24 18:26:49 +08:00
b1tg
1f1362fd27 add truncate_bf16 (#9078)
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-02-15 07:59:09 +08:00
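Note on the commit above: the following is a minimal sketch of what truncating a value to bfloat16 precision can look like. It is not taken from the commit itself. The name `truncate_bf16_sketch`, the round-toward-zero behaviour, and the choice to map out-of-range values to infinity are all assumptions made for illustration.

```python
import math
import struct

# Largest finite bfloat16 value, built from its float32 bit pattern (0x7F7F0000).
MAX_BF16 = struct.unpack("<f", struct.pack("<I", 0x7F7F0000))[0]

def truncate_bf16_sketch(x: float) -> float:
    """Hypothetical helper: drop a float to bfloat16 precision by bit masking."""
    if math.isnan(x) or math.isinf(x):
        return x
    # Assumption: anything beyond the largest finite bfloat16 becomes +/- infinity.
    if abs(x) > MAX_BF16:
        return math.copysign(math.inf, x)
    # Reinterpret the value as float32 bits, then keep only the top 16 bits
    # (sign, 8 exponent bits, 7 mantissa bits), which is the bfloat16 layout.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]

pi_bf16 = truncate_bf16_sketch(math.pi)
```

Masking the low 16 bits rounds toward zero; a round-to-nearest variant would need an extra step before the mask.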
George Hotz
3e082d4a9d add float4 support to LLVM (#8920)
* add float4 support to LLVM

* is_bool
2025-02-06 12:15:50 +08:00
George Hotz
c1c5227acb preserve size in dtype ptr [pr] (#8898) 2025-02-05 14:38:57 +08:00
chenyu
3f46425f1e typos found by gemini [pr] (#8400)
not very effective... maybe due to tokenizer
2024-12-24 22:32:25 -05:00
chenyu
b7397c1322 more typing cleanups [pr] (#8376)
List, Tuple, DefaultDict
2024-12-22 05:21:03 -05:00
George Hotz
9c77e9f9b7 replace Tuple with tuple [pr] (#8344)
* replace Tuple with tuple [pr]

* replace List with list [pr]

* replace Dict with dict [pr]

* replace Set with set [pr]
2024-12-19 21:27:56 -08:00
George Hotz
6608ba316d add size of the buffer to the ptr dtype (#8322) 2024-12-18 12:46:35 -08:00
JaSpa99
3c5d5f9414 mypy==1.13.0 (#7990)
* explicit instantiation and narrowing asserts

* explicit cast

* bump

* one line assert

* handle case for no copy_queue_t

* Revert "handle case for no copy_queue_t"

This reverts commit 38347806ca.

* more readable control flow

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-12-06 12:09:14 +08:00
chenyu
87594a8153 simpler dtypes.max for int [pr] (#8058) 2024-12-05 10:31:41 -05:00
JaSpa99
38f34ca0cb prepare mypy==1.13.0: legacy cast (#7866)
* use helper to narrow literal type

* narrow with asserts instead of cast

* remove parentheses

* tensor.item() calls tensor.data()

* no copy

* proper indexing
2024-11-27 10:33:35 -05:00
chenyu
40d7535eeb clean up DTYPES_DICT [pr] (#7845) 2024-11-22 10:01:34 -05:00
chenyu
d800a79112 use "signed char" for int8 (#7796)
* use "signed char" for int8

"char" might be unsigned depending on the platform.
fixed `python -m pytest test/test_ops.py::TestOpsUint8::test_interpolate_bilinear` on arm64 linux

* opencl does not have "signed char"
2024-11-19 19:29:54 -05:00
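Note on the commit above: the signedness issue it describes can be illustrated without any C compiler. The sketch below is not the renderer change itself; it only shows how the same byte reads differently as a signed or an unsigned 8-bit value, which is why an int8 buffer declared with a plain `char` can produce different results across platforms.

```python
import struct

raw = bytes([0xFF])  # one byte with all bits set

# "b" is a signed 8-bit integer, "B" an unsigned one, in the struct module's format codes.
as_signed = struct.unpack("b", raw)[0]
as_unsigned = struct.unpack("B", raw)[0]

# The same bit pattern gives two different numbers, so arithmetic on it diverges too.
halved_signed = as_signed // 2
halved_unsigned = as_unsigned // 2
```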
chenyu
397a2e6eb6 no special case for int32 in truncate [pr] (#7657)
this masked an issue that idx is not data, and should never need truncate
2024-11-12 14:52:14 -05:00
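Note on the commit above: truncation here means wrapping a Python integer into a fixed-width type. A minimal sketch using `ctypes` follows; the helper name `truncate_int` is hypothetical and the code is not taken from the commit. It shows why truncation only makes sense for data values: an index that passes through it would wrap silently instead of failing.

```python
import ctypes

def truncate_int(x: int, ctype) -> int:
    """Hypothetical helper: wrap x into the range of a fixed-width ctypes integer."""
    # ctypes integer constructors do no overflow checking, so the value wraps.
    return ctype(x).value

wrapped_int8 = truncate_int(200, ctypes.c_int8)
wrapped_int32 = truncate_int(2**31, ctypes.c_int32)
wrapped_uint8 = truncate_int(-1, ctypes.c_uint8)
```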
George Hotz
d8691a4f03 lil touchups (#7597) 2024-11-08 22:31:43 +08:00
George Hotz
d87adccb6c fast scalar (#7545)
* fast scalar set on dtype

* prevent loop

* lru_cache those
2024-11-05 14:08:08 +08:00
George Hotz
d419364b66 faster dtype compare [pr] (#7542)
* faster dtype compare [pr]

* simpler reduce and bring name back

* preserve pr

* lines

* now pr will pass

* use fields in vec

* remove that assert
2024-11-05 13:09:48 +08:00
George Hotz
d39f21da8f scalar image is image [pr] (#7398)
* scalar image is image [pr]

* base property
2024-10-30 18:51:47 +08:00
George Hotz
76a41a1083 don't compare with pointer dtype (#7394)
* don't compare with pointer dtype

* more cleanup

* images are pointers

* handle IMAGE better

* cleaner test_image

* this work

* pr match

* cleanup
2024-10-30 17:48:27 +08:00
George Hotz
27995a2a04 vcount + cleanups (#7393)
* Revert "Revert "Restore vcount [pr] (#7390)" (#7392)"

This reverts commit 4ca53db604.

* ugh bugfix [pr]

* uops_to_dtypes function

* fixups

* varnames

* fix mypy

* just 4,8

* tests
2024-10-30 12:50:15 +08:00
George Hotz
4ca53db604 Revert "Restore vcount [pr] (#7390)" (#7392)
This reverts commit 1058f9c9ff.
2024-10-30 11:40:25 +08:00
George Hotz
1058f9c9ff Restore vcount [pr] (#7390)
* Revert "Revert "add vcount to PtrDtype (#7388)""

This reverts commit 399a5219dd.

* Revert "Revert "add tests to vcount stuff [pr] (#7389)""

This reverts commit cc8d6dbdf3.

* no ptr
2024-10-30 11:27:55 +08:00
George Hotz
399a5219dd Revert "add vcount to PtrDtype (#7388)"
This reverts commit b086584d64.
2024-10-30 10:56:52 +08:00
George Hotz
cc8d6dbdf3 Revert "add tests to vcount stuff [pr] (#7389)"
This reverts commit 1b7084899b.
2024-10-30 10:56:49 +08:00
George Hotz
1b7084899b add tests to vcount stuff [pr] (#7389) 2024-10-30 10:54:54 +08:00
George Hotz
b086584d64 add vcount to PtrDtype (#7388) 2024-10-30 10:43:54 +08:00
chenyu
6bf38c35e5 clean up transcendental frexp [pr] (#7384)
also added some unit tests for frexp
2024-10-29 18:51:37 -04:00
George Hotz
4cb236a495 index in cstyle (#7328)
* index only in cstyle

* fix prefix dtypes

* fix tests

* global indexing

* Revert "global indexing"

This reverts commit 4d507e8abb.

* fix image

* fix image

* ptx tests

* fix CUDA dtype rendering
2024-10-29 13:06:26 +08:00
chenyu
f511ad9103 No pyint again (#7156)
* Revert "bring back pyint (#7150)"

This reverts commit 37e83ca6fc.

* remove truncate in const folding

* truncate_output=False
2024-10-19 13:48:59 -04:00
chenyu
37e83ca6fc bring back pyint (#7150)
fixed test_failure_52 and resnet. need to understand this better
2024-10-18 14:54:37 -04:00
George Hotz
b0a13896d7 PtrDType is dataclass [pr] (#7125)
* PtrDType is dataclass [pr]

* new dataset

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-10-18 09:40:33 -04:00
George Hotz
ded1b38b84 minor dtype cleanup [pr] (#7124)
* minor dtype cleanup [pr]

* use ptr() function
2024-10-17 17:41:23 +08:00
George Hotz
cd61e81f55 beautiful mnist works on windows (#7100)
* beautiful mnist works on windows [pr]

* add comment for that (no pr)
2024-10-16 23:00:05 +08:00
chenyu
bd8ecf7fd6 remove NumNode (#7035) 2024-10-13 16:42:19 -04:00
chenyu
23faeacb23 remove outdated comments (#7018) 2024-10-12 10:51:07 -04:00
George Hotz
85a45164fb remove pyint [pr] (#7016)
* remove pyint

* bump time on tp [pr]

* dont truncate in const fold

* remove dead code

* Revert "dont truncate in const fold"

This reverts commit 29c81db0f7.

* remove define_var
2024-10-12 22:36:24 +08:00
qazal
29363fb85e add dtype.ptr() [pr] (#6839) 2024-10-02 15:03:05 +08:00
George Hotz
7fca0bc912 use pattern matcher for image [run_process_replay] (#6762)
* use pattern matcher for image [run_process_replay]

* try again

* this
2024-09-26 15:49:09 +08:00
George Hotz
b199b699ed use shl everywhere (#6744)
* use shl everywhere

* fix parens

* late patterns

* works as an extra pass

* ptx
2024-09-26 09:59:36 +08:00
George Hotz
cb22ef379a truncate consts early (#6741)
* truncate consts early

* ptx still fails

* Update dtype.py
2024-09-25 16:49:51 +08:00
George Hotz
e945fa9c5c put local on the PtrDtype [run_process_replay] (#6656)
* put local on the PtrDtype [run_process_replay]

* those are local too
2024-09-23 10:29:17 +08:00
qazal
607113fcdf fix vectorized dtype repr [run_process_replay] (#6535) 2024-09-16 13:42:55 +08:00
Tim Becker
7c078191ce Misc rewrite perf improvements (#6500)
* Make UOp a normal class and use __slots__

* Use __slots__ in UPat

* Cache dtypes.{min,max}

* Use faster iterables in ops.py

* extend is a lot faster than nested listcomp

Co-authored-by: Roelof van Dijk <3604013+roelofvandijk@users.noreply.github.com>

---------

Co-authored-by: Roelof van Dijk <3604013+roelofvandijk@users.noreply.github.com>
2024-09-13 11:31:50 +08:00
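Note on the commit above: three of the techniques it lists (`__slots__`, caching min/max, and `extend` in place of a nested list comprehension) can be sketched in isolation. Every name below (`Node`, `int_range`, `flatten_nested`, `flatten_extend`) is hypothetical and chosen for illustration; none of this is the code changed by the commit, and no timing claims are made here.

```python
import functools

class Node:
    # __slots__ removes the per-instance __dict__, which saves memory
    # and makes attribute access cheaper for small, numerous objects.
    __slots__ = ("op", "arg")

    def __init__(self, op, arg):
        self.op, self.arg = op, arg

@functools.lru_cache(maxsize=None)
def int_range(bits: int, signed: bool):
    """Return (min, max) for an integer type; repeated calls hit the cache."""
    if signed:
        return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return 0, (1 << bits) - 1

def flatten_nested(groups):
    # Nested list comprehension: one append per element.
    return [x for g in groups for x in g]

def flatten_extend(groups):
    # extend() adds each inner list in one call.
    out = []
    for g in groups:
        out.extend(g)
    return out

node = Node("ADD", 1)
first = int_range(8, True)
second = int_range(8, True)  # served from the cache
```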
George Hotz
327eb12600 folding for vectorized consts [run_process_replay] (#6498)
* folding for vectorized consts [run_process_replay]

* remove that if statement

* inf loop
2024-09-12 17:29:37 +08:00
George Hotz
119b0ea4af add UOps.VCONST [run_process_replay] (#6487)
* add UOps.VCONST [run_process_replay]

* VCONST folding

* simpler devectorize

* alu

* revert that type
2024-09-12 14:03:39 +08:00
George Hotz
1b4d1823b7 add pyint to DTYPES_DICT [run_process_replay] (#6477)
* add pyint to DTYPES_DICT [run_process_replay]

* also fix uop alu bug

* exclude pyint there too

* ne ne

* force explicit dtype
2024-09-11 17:31:59 +08:00
qazal
78148e16d8 init changes from the dtypes_void branch [run_process_replay] (#6475) 2024-09-11 16:34:50 +08:00