Commit Graph

60 Commits

Author SHA1 Message Date
nimlgen
110cff3f2e fix device arg to Tensor.randn (#11194)
* fix device arg to Tensor.randn

* simpler test

* self.assertEqual
2025-07-12 13:51:59 -04:00
George Hotz
4e2c3560b4 smaller tests are faster tests [pr] (#10704)
* remove del spam from CI

* more

* preconstruct default buffer spec

* ignore those errors

* check exception

* more exception check

* skip stuff

* smaller tests mean faster tests

* a few more
2025-06-08 10:54:19 -07:00
George Hotz
8c76250d31 speed up a few tests (#10692) 2025-06-07 20:39:25 -07:00
wozeparrot
0d86f8d375 fix failed threefry (#10646) 2025-06-05 17:17:42 -07:00
qazal
40f4ce3390 enable AMD CI for TestRandomness.test_multinomial [pr] (#10295) 2025-05-14 14:32:22 +03:00
chenyu
8a906cb124 Tensor.randn_like (#10276) 2025-05-13 11:53:59 -04:00
chenyu
c8f47c1d07 not_support_multi_device helper (#9831)
unify the test helper to skip ci device that does not support multi
2025-04-10 05:25:29 -04:00
George Hotz
78caf55154 Revert "FP8 support on NVIDIA (#8631)"
This reverts commit 2c8e4ea865.
2025-04-09 12:27:41 +08:00
pkotzbach
2c8e4ea865 FP8 support on NVIDIA (#8631)
* squashed fp8 commits

* tensorcore start

* minor changes

* pre-commit

* pylint

* Delete fp8mul.cu

* clean

* small bugfix

* fix test_dtype

* fix test_dtype_alu

* add EMULATE_CUDA_SM89

* fix ci

* fix test_linearizer

* fix test_linearizer

* fix swizzle

* add debug to simple_matmul

* fixed swizzle

* python emulator

* refactor python emulator

* setup fix

* numpy setup

* ml_dtypes only in emulate_cuda_sm89

* fix pylint

* fix tests

* fix mypy

* fix mypy

* fix ruff

* done python emulator

* add acc type

* tests

* mypy

* clean code

* add cuda tensor core tests to CI

* minor fix

* clean test_dtype.py

* clean cstyle.py

* clean test_ops.py

* fix test

* fix test

* whitespaces

* pylint

* pylint

* amd?

* amd?

* amd

* reduce lines

* mockgpu remove

* fix

* ruff

* ruff

* fix mypy

* ruff

* test only for cuda

* fixed formatting

* small fixes

* small fix

* least_upper_dtype if fp8s not supported

* log and reciprocal are supported for fp8s

* ops python fixes

* dtypes.fp8s use

* e4m3 + e5m2 result dtype test

* truncate linter fix

---------

Co-authored-by: pkotzbach <pawkotz@gmail.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-04-08 21:54:04 -04:00
chenyu
6427272bf6 minor update to rand [pr] (#9566) 2025-03-24 18:49:50 -04:00
chenyu
c33679c47b increase size in test_multinomial_counterexample (#9540)
should be less flaky
2025-03-21 17:46:52 -04:00
George Hotz
117b7a16ef VALIDATE_WITH_CPU [pr] (#9488)
* VALIDATE_WITH_CPU [pr]

* fix test
2025-03-18 15:15:04 +08:00
chenyu
4f8eac59ea failed test case for threefry (#9469)
* failed test case for threefry

not sure if it's always like this, but increment before _threefry_random_bits is incorrect. the counts should start with random numbers generated so far.

use jax to generate 20 + 20 + 10 random numbers, the first 20 + 20 matches and the last 10 are different. just moving increment after _threefry_random_bits matches the number but jit test failes

* workaround

* why is this different?

* revert those

* and that
2025-03-17 14:52:10 -04:00
chenyu
2e7c2780a9 CLANG -> CPU (#9189) 2025-02-20 18:03:09 -05:00
Ahmed Harmouche
916d5e7f08 WebGPU f16 support (f16 bounty part 2) (#8653)
* WebGPU f16 support

* Don't enable f16 yet

* dtype tests passing after bitcast fix

* Maybe all WebGPU green?

* Require shader-f16 in examples

* Minor wgsl touchup

* 1 line shorter

* Simpler

* Add transcendetal support

* log2 nan location mismatch on Vulkan

* Nan skips
2025-02-12 19:46:53 +08:00
nimlgen
9bc317d5d2 mockcuda (#8503)
* init mockcuda

* run gpu ocelot

* fix

* sfixes

* disable broken tests

* linter

* these fails as well

* pylint

* myypy

* this fails on real platforms as well

* mypy please
2025-01-05 01:23:57 +03:00
George Hotz
4f1f823021 add tiny test for randomness + remove ulong buffers (#7648)
* add tiny test for randomness

* Tensor._device_seeds is a Tuple

* no tuple, just a 2 element tensor

* no more longs

* fix tests, and maybe ocelot works now

* NV still doesn't work. cleanup rules

* test + two more rules
2024-11-12 12:45:52 +08:00
chenyu
c06a5a9c72 Tensor.linspace raises for dtype.bool (#7649)
also fixed an assert when passing str dtype to randint
2024-11-11 23:05:14 -05:00
George Hotz
205befa788 move is_dtype_supported to device [pr] (#7575) 2024-11-07 20:38:03 +08:00
wozeparrot
9eb6eef441 seed in tensor (#6869) 2024-10-06 14:46:58 -04:00
wozeparrot
c100f3d406 default threefry (#6116) 2024-09-25 17:45:13 +08:00
wozeparrot
2be0b26a1f rand only supports single device (#6682) 2024-09-24 16:07:44 +08:00
wozeparrot
46e360fdc0 check bfloat16 range with threefry (#6660) 2024-09-23 10:48:44 +08:00
chenyu
2de174677a threefry touchup [run_process_replay] (#6169)
also why is test_gc testing _rng_counter is allocated??
2024-08-18 23:01:24 -04:00
wozeparrot
0c5189de25 threefry half (#6154) 2024-08-18 15:23:12 -07:00
chenyu
dc942bf1f6 jit sampling functionn in test_randomness.test_multinomial (#5034)
* jit sampling functionn in test_randomness.test_multinomial

`THREEFRY=1 python3 -m pytest test/test_randomness.py::TestRandomness::test_multinomial --durations 1` 7 sec -> 1.2 sec

* skip that
2024-06-18 14:21:05 -04:00
qazal
637f482588 configure derandomizing CI tests (#4793) 2024-05-31 17:06:58 +03:00
chenyu
53b9081aab check arg types of Tensor.randint (#4751)
raise TypeError if low, high, dtype are not ints
2024-05-27 20:24:10 -04:00
nimlgen
9b02aef45a remove rhip (#4579)
* remove rhip

* remove hip runner
2024-05-14 17:58:19 +03:00
nimlgen
2131556c2c amd mockgpu (#4535)
* start mock amd gpu

* virt files

* cleaner

* init ci

* small fixes

* linter

* better?

* ugh

* linter

* fix

* diable some

* run shorter

* fixes

* add hcq test

* fix

* fix cmd revert
2024-05-14 14:28:04 +03:00
qazal
23445db2b9 no skipped tests in RHIP (#4337)
* delete skip

* delete split skip

* remu dev

* compiler fails here

* Revert "remu dev"

This reverts commit 28b933d4eb.
2024-04-28 12:23:05 -04:00
chenyu
b43e470f80 always use f32 for rand source of randn (#3998)
* always use f32 for source of randn

fixed bfloat16 randn to not have inf.
don't really care about float64. threefry is float32 based too

* HSA is broken
2024-03-29 17:04:34 -04:00
chenyu
6b6461122e test case Tensor.randn should be finite (#3994)
* test case Tensor.randn should be finite

there's a hack to fix float16, need a generic solution that works with bf16 and threefry

* skip not supported

* bfloat16 local is wrong

* skip RHIP
2024-03-29 14:51:02 -04:00
wozeparrot
a0ab755317 threefry again (#3785)
* feat: initial xor

* feat: initial threefly

* feat: remove custom random

* fix: really need to install precommit

* feat: lmao forgot that this is rotate not a shift

* clean: put that there

* feat: numpy xor

* feat: quick test for xor

* feat: llvm xor

* feat: slightly working xor in torch

* feat: rand works in jit

* clean: save a line

* feat: match jax

* feat: maybe test against jax

* feat: requires_grad

* fix: fix test_symbolic_ops

* feat: lower alpha

* feat: just pad

* fix: maybe fix training tests?

* fix: fix some llvm stuff

* feat: cursed realize on the way out

* feat: testing jax

* fix: why is the jax install process not simple

* fix: maybe passing test

* fix: symbolic workarounds

* clean: still need that precommit

* fix: aaaa

* fix: more test fixes

* fix: quick fix for wgsl

* feat: need to set requires_grad on the final tensor

* feat: one more tensor

* feat: don't take forever

* feat: seeing y ci is brok

* feat: can't allocate 64GiB lmao

* fix: fix this

* feat: hope this doesn't break smth before i go to bed

* feat: don't destroy ram

* feat: int

* feat: remove jax

* feat: properish workaround?

* feat: skip slow webgpu tests

* feat: no longer fails

* feat: use dtypes

* feat: real number

* fix: torch

* fix: don't test against reference for torch

* feat: to device

* feat: fix advanced indexing

* feat: correct casting

* feat: even rng_counter

* feat: match master

* feat: this was actually bad

* fix: maybe?

* feat: store

* feat: remove realizes

* feat: somehow this is important

* feat: somehow this is also important

* feat: save a line

* fix: don't need that anymore

* feat: restore this

* fix: linter

* feat: remove realizes

* fix: realized is in base now

* fix: add back cast

* fix: bump deadline

* fix: bump deadline

* fix: bump deadline

* fix: bump deadline

* fix: bump deadline

* fix: :(

* fix: :(

* fix: not being dumb

* feat: try changing less tests

* feat: shouldn't have to change that

* feat: contiguous bumps it by one

* fix: hmm

* fix: numpy memory moment

* fix: cl_khr_fp16

* fix: torch has different tensor count

* fix: missing contiguous

* hmm: hmm

* fix: some fixes

* fix: typing

* feat: dont do that

* feat: typing fixes

* feat: why is this realize required?

* feat: ngl kinda odd typing

* feat: oh

* feat: remove realizes

* feat: why is this realize required?

* fix: hacky patch for cudacpu

* fix: without this realize pytest crashes?????

* fix: shorter line

* fix: cudacpu fixes

* fix: cudacpu fixes

* feat: real buffer

* feat: don't search when searching lmao

* fix: can't use contiguous things

* fix: no more 100GB arrays

* fix: revert

* fix: skip 7 and 10

* feat: working ish beam

* feat: minimize changes

* feat: seed 0 stable diffusion example changed

* fix: different on ci

* fix: no beam

* feat: make threefry optional

* fix: check value

* fix: unused import

* feat: threefry default

* fix: 5d

* feat: allow non upcast div

* fix: 5d better

* fix: 5d better

* fix: save all dtype

* feat: proper error

* feat: lazyop key

* fix: check float

* feat: try removing this realize now

* feat: disable threefry for uops hip tensor cores

* feat: don't need that

* feat: only check upcast

* fix: disable threefry for some metal tests

* feat: disable for metal tensor uops as well

* feat: disable for most uops

* fix: disable threefry for new uops tests

* feat: multitensor

* fix: typing

* feat: threefry default off

* feat: skip threefry half rand

* feat: restore old

* fix: bad git

* clean: ruff

* feat: bfloat16 fix

* fix: :|

* feat: restore old

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-03-18 16:47:07 -04:00
George Hotz
311cf2b7d3 Revert "threefry_2x32 (#2601)" (#3784)
This reverts commit db3de54bc4.
2024-03-17 10:27:20 -07:00
wozeparrot
db3de54bc4 threefry_2x32 (#2601)
* feat: initial xor

* feat: initial threefly

* feat: remove custom random

* fix: really need to install precommit

* feat: lmao forgot that this is rotate not a shift

* clean: put that there

* feat: numpy xor

* feat: quick test for xor

* feat: llvm xor

* feat: slightly working xor in torch

* feat: rand works in jit

* clean: save a line

* feat: match jax

* feat: maybe test against jax

* feat: requires_grad

* fix: fix test_symbolic_ops

* feat: lower alpha

* feat: just pad

* fix: maybe fix training tests?

* fix: fix some llvm stuff

* feat: cursed realize on the way out

* feat: testing jax

* fix: why is the jax install process not simple

* fix: maybe passing test

* fix: symbolic workarounds

* clean: still need that precommit

* fix: aaaa

* fix: more test fixes

* fix: quick fix for wgsl

* feat: need to set requires_grad on the final tensor

* feat: one more tensor

* feat: don't take forever

* feat: seeing y ci is brok

* feat: can't allocate 64GiB lmao

* fix: fix this

* feat: hope this doesn't break smth before i go to bed

* feat: don't destroy ram

* feat: int

* feat: remove jax

* feat: properish workaround?

* feat: skip slow webgpu tests

* feat: no longer fails

* feat: use dtypes

* feat: real number

* fix: torch

* fix: don't test against reference for torch

* feat: to device

* feat: fix advanced indexing

* feat: correct casting

* feat: even rng_counter

* feat: match master

* feat: this was actually bad

* fix: maybe?

* feat: store

* feat: remove realizes

* feat: somehow this is important

* feat: somehow this is also important

* feat: save a line

* fix: don't need that anymore

* feat: restore this

* fix: linter

* feat: remove realizes

* fix: realized is in base now

* fix: add back cast

* fix: bump deadline

* fix: bump deadline

* fix: bump deadline

* fix: bump deadline

* fix: bump deadline

* fix: :(

* fix: :(

* fix: not being dumb

* feat: try changing less tests

* feat: shouldn't have to change that

* feat: contiguous bumps it by one

* fix: hmm

* fix: numpy memory moment

* fix: cl_khr_fp16

* fix: torch has different tensor count

* fix: missing contiguous

* hmm: hmm

* fix: some fixes

* fix: typing

* feat: dont do that

* feat: typing fixes

* feat: why is this realize required?

* feat: ngl kinda odd typing

* feat: oh

* feat: remove realizes

* feat: why is this realize required?

* fix: hacky patch for cudacpu

* fix: without this realize pytest crashes?????

* fix: shorter line

* fix: cudacpu fixes

* fix: cudacpu fixes

* feat: real buffer

* feat: don't search when searching lmao

* fix: can't use contiguous things

* fix: no more 100GB arrays

* fix: revert

* fix: skip 7 and 10

* feat: working ish beam

* feat: minimize changes

* feat: seed 0 stable diffusion example changed

* fix: different on ci

* fix: no beam

* feat: make threefry optional

* fix: check value

* fix: unused import

* feat: threefry default

* fix: 5d

* feat: allow non upcast div

* fix: 5d better

* fix: 5d better

* fix: save all dtype

* feat: proper error

* feat: lazyop key

* fix: check float

* feat: try removing this realize now

* feat: disable threefry for uops hip tensor cores

* feat: don't need that

* feat: only check upcast

* fix: disable threefry for some metal tests

* feat: disable for metal tensor uops as well

* feat: disable for most uops

* fix: disable threefry for new uops tests

* feat: multitensor

* fix: typing

* feat: threefry default off

* feat: skip threefry half rand

* feat: restore old

* fix: bad git

* clean: ruff

* feat: bfloat16 fix

* fix: :|

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-03-17 10:19:33 -07:00
chenyu
8ea53951c1 bfloat16 Tensor.rand (#3764)
* Tensor.rand for bfloat16

for numpy based random, generate one for float then cast for bfloat16.

close #3653

* remove realize
2024-03-15 15:05:13 -04:00
Skosh
e8c350fdac fix: make Tensor.rand produce correct values for float16 (#3654)
* fix: make Tensor.rand produce correct values for float16

Due to precision loss when casting to float16, the data distribution created by custom_random isnt correctly in the interval ]0, 1[, but instead in the interval ]0, 1], which causes the Tensor.randn to incorrectly generate values of infinity.

The solution uses a scaling value to make sure the values stay under 1, when using half precision.

Closes #3611

* update implementation to truncate to closest f16 value to 1

* chore: fix whitespace

* test larger distribution

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-03-10 18:48:00 -04:00
andresgit
28ba1c5406 fix Tensor.randint ignoring kwargs (#3350)
* fix Tensor.randint ignoring kwargs

* randint kwargs fix
2024-02-09 17:12:16 +01:00
chenyu
ae112c9dbe fix some long lines in tests (#3006)
* fix some long lines in tests

* better
2024-01-03 23:53:33 -05:00
George Hotz
a280cfe169 move dtypes to dtype.py (#2964)
* move dtypes to dtype.py

* fix urllib
2024-01-01 14:58:48 -08:00
Will
016aebcd84 Fixed Tensor.randint() not accepting tuple shapes (#2923)
* ww/Fixed Tensor.randint() to accept shape tuples ()

* ww/Wrote a test to cover this typo

* ww/Updated Tensor random objects to optionally take (,) or *() to be more consistent

* ww/no lint no worries

* ww/Made peace with linter

* ww/Added new line can't reduce line size without reducing readablitity

* ww/reverted to using .mul
2023-12-24 20:32:26 -05:00
chenyu
9c32474a1f Revert "Revert "Tensor.randint is Tensor.uniform with dtypes.int32 (#2801)" (#2802)" (#2814)
This reverts commit fa84998244.
2023-12-17 12:14:17 -05:00
chenyu
fa84998244 Revert "Tensor.randint is Tensor.uniform with dtypes.int32 (#2801)" (#2802)
This reverts commit 86c2f267d4.
2023-12-16 15:53:28 -05:00
chenyu
86c2f267d4 Tensor.randint is Tensor.uniform with dtypes.int32 (#2801) 2023-12-16 15:14:50 -05:00
George Hotz
6d6eb9302d ruff checks the max line length is 150 (#2734)
* ruff checks the max line length is 150

* fix tensor.py

* a lot more

* done
2023-12-12 17:34:47 -08:00
chenyu
1ac958a058 update pytest marks and CI test filters (#2587)
* remove pytest marks

* test more stuff

* fine revert some

* add that mark back

* skip that

* hmm LLVM does not work on ubuntu

* too slow on CUDA CI

* dup test
2023-12-03 15:20:44 -05:00
chenyu
03968622a2 Pretty multinomial (#2365)
* pretty multinomial

p, cdf_normalized -> weight, cdf
symmetric unsqueeze / squeeze
check num_sample > 0

TODO: how do we want to handle 0/0 in general?

* no 0-dim input

* single sum
2023-11-19 15:10:10 -05:00
Marcello Fuschi
b8d460d203 Add Tensor.multinomial (#2295)
* add Tensor.multinomial only with replacement

* add support for 2D input in Tensor.multinomial

* fix multinomial output shape

* allow passing replacement=False to Tensor.multinomial when num_samples=1

* improve tests for Tensor.multinomial

* fix edge case in Tensor.multinomial

* Tensor.multinomial no more staticmethod
2023-11-15 11:38:39 -08:00
chenyu
9215bccb41 Tensor.uniform set default to standard uniform (#2158)
* Tensor.uniform set default to standard uniform

* clean up test to reuse function
2023-10-27 16:15:30 -04:00