* subbuffer support
* diskbuffer offset
* cuda subbuffer works
* use subbuffer
* more subbuffer tests
* consecutive
* cast
* consec
* offset
* view is a better name
* offset is in nbytes
* fix view + memory planner
* delete unused DiskRunner
* reverse order
* no subbuffers on unrealized consts
* only enabled for disk
* don't reverse memory
* view supported devices
* pickle buffer view
* ring jit
* support extra view inputs in jit
* fix JIT=2 issue
* test copy jit
* p2p isn't an option anymore
* fix dep tracking issue
* fix mypy
* fix pickle
* from_nv is contents now
* handle reshape with remainder in _reshape_mask
* remove trailing whitespce
* use helper_test_op to generate tensors from shapes
* test in shapetracket too
* remove whitespace
* revert property name in other class tests
* Shape changing bitcast
* only support it on disk
* basic test
* more tests
* RuntimeError instead of assert
* create unique temp files
* move tests that use disk to test_disk_tensor
* linter
* remove assert on error messages
* that's RuntimeError now
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
* Factoring bug
* Another one in case
* It works now so change tests back
* large arange cumsum optimization
* More cleanup
* symbolic no factor div test
* name change
* Rename test
---------
Co-authored-by: Patrick Tsai <patosai@users.noreply.github.com>
* feat: initial xor
* feat: initial threefly
* feat: remove custom random
* fix: really need to install precommit
* feat: lmao forgot that this is rotate not a shift
* clean: put that there
* feat: numpy xor
* feat: quick test for xor
* feat: llvm xor
* feat: slightly working xor in torch
* feat: rand works in jit
* clean: save a line
* feat: match jax
* feat: maybe test against jax
* feat: requires_grad
* fix: fix test_symbolic_ops
* feat: lower alpha
* feat: just pad
* fix: maybe fix training tests?
* fix: fix some llvm stuff
* feat: cursed realize on the way out
* feat: testing jax
* fix: why is the jax install process not simple
* fix: maybe passing test
* fix: symbolic workarounds
* clean: still need that precommit
* fix: aaaa
* fix: more test fixes
* fix: quick fix for wgsl
* feat: need to set requires_grad on the final tensor
* feat: one more tensor
* feat: don't take forever
* feat: seeing y ci is brok
* feat: can't allocate 64GiB lmao
* fix: fix this
* feat: hope this doesn't break smth before i go to bed
* feat: don't destroy ram
* feat: int
* feat: remove jax
* feat: properish workaround?
* feat: skip slow webgpu tests
* feat: no longer fails
* feat: use dtypes
* feat: real number
* fix: torch
* fix: don't test against reference for torch
* feat: to device
* feat: fix advanced indexing
* feat: correct casting
* feat: even rng_counter
* feat: match master
* feat: this was actually bad
* fix: maybe?
* feat: store
* feat: remove realizes
* feat: somehow this is important
* feat: somehow this is also important
* feat: save a line
* fix: don't need that anymore
* feat: restore this
* fix: linter
* feat: remove realizes
* fix: realized is in base now
* fix: add back cast
* fix: bump deadline
* fix: bump deadline
* fix: bump deadline
* fix: bump deadline
* fix: bump deadline
* fix: :(
* fix: :(
* fix: not being dumb
* feat: try changing less tests
* feat: shouldn't have to change that
* feat: contiguous bumps it by one
* fix: hmm
* fix: numpy memory moment
* fix: cl_khr_fp16
* fix: torch has different tensor count
* fix: missing contiguous
* hmm: hmm
* fix: some fixes
* fix: typing
* feat: dont do that
* feat: typing fixes
* feat: why is this realize required?
* feat: ngl kinda odd typing
* feat: oh
* feat: remove realizes
* feat: why is this realize required?
* fix: hacky patch for cudacpu
* fix: without this realize pytest crashes?????
* fix: shorter line
* fix: cudacpu fixes
* fix: cudacpu fixes
* feat: real buffer
* feat: don't search when searching lmao
* fix: can't use contiguous things
* fix: no more 100GB arrays
* fix: revert
* fix: skip 7 and 10
* feat: working ish beam
* feat: minimize changes
* feat: seed 0 stable diffusion example changed
* fix: different on ci
* fix: no beam
* feat: make threefry optional
* fix: check value
* fix: unused import
* feat: threefry default
* fix: 5d
* feat: allow non upcast div
* fix: 5d better
* fix: 5d better
* fix: save all dtype
* feat: proper error
* feat: lazyop key
* fix: check float
* feat: try removing this realize now
* feat: disable threefry for uops hip tensor cores
* feat: don't need that
* feat: only check upcast
* fix: disable threefry for some metal tests
* feat: disable for metal tensor uops as well
* feat: disable for most uops
* fix: disable threefry for new uops tests
* feat: multitensor
* fix: typing
* feat: threefry default off
* feat: skip threefry half rand
* feat: restore old
* fix: bad git
* clean: ruff
* feat: bfloat16 fix
* fix: :|
* feat: restore old
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
* feat: initial xor
* feat: initial threefly
* feat: remove custom random
* fix: really need to install precommit
* feat: lmao forgot that this is rotate not a shift
* clean: put that there
* feat: numpy xor
* feat: quick test for xor
* feat: llvm xor
* feat: slightly working xor in torch
* feat: rand works in jit
* clean: save a line
* feat: match jax
* feat: maybe test against jax
* feat: requires_grad
* fix: fix test_symbolic_ops
* feat: lower alpha
* feat: just pad
* fix: maybe fix training tests?
* fix: fix some llvm stuff
* feat: cursed realize on the way out
* feat: testing jax
* fix: why is the jax install process not simple
* fix: maybe passing test
* fix: symbolic workarounds
* clean: still need that precommit
* fix: aaaa
* fix: more test fixes
* fix: quick fix for wgsl
* feat: need to set requires_grad on the final tensor
* feat: one more tensor
* feat: don't take forever
* feat: seeing y ci is brok
* feat: can't allocate 64GiB lmao
* fix: fix this
* feat: hope this doesn't break smth before i go to bed
* feat: don't destroy ram
* feat: int
* feat: remove jax
* feat: properish workaround?
* feat: skip slow webgpu tests
* feat: no longer fails
* feat: use dtypes
* feat: real number
* fix: torch
* fix: don't test against reference for torch
* feat: to device
* feat: fix advanced indexing
* feat: correct casting
* feat: even rng_counter
* feat: match master
* feat: this was actually bad
* fix: maybe?
* feat: store
* feat: remove realizes
* feat: somehow this is important
* feat: somehow this is also important
* feat: save a line
* fix: don't need that anymore
* feat: restore this
* fix: linter
* feat: remove realizes
* fix: realized is in base now
* fix: add back cast
* fix: bump deadline
* fix: bump deadline
* fix: bump deadline
* fix: bump deadline
* fix: bump deadline
* fix: :(
* fix: :(
* fix: not being dumb
* feat: try changing less tests
* feat: shouldn't have to change that
* feat: contiguous bumps it by one
* fix: hmm
* fix: numpy memory moment
* fix: cl_khr_fp16
* fix: torch has different tensor count
* fix: missing contiguous
* hmm: hmm
* fix: some fixes
* fix: typing
* feat: dont do that
* feat: typing fixes
* feat: why is this realize required?
* feat: ngl kinda odd typing
* feat: oh
* feat: remove realizes
* feat: why is this realize required?
* fix: hacky patch for cudacpu
* fix: without this realize pytest crashes?????
* fix: shorter line
* fix: cudacpu fixes
* fix: cudacpu fixes
* feat: real buffer
* feat: don't search when searching lmao
* fix: can't use contiguous things
* fix: no more 100GB arrays
* fix: revert
* fix: skip 7 and 10
* feat: working ish beam
* feat: minimize changes
* feat: seed 0 stable diffusion example changed
* fix: different on ci
* fix: no beam
* feat: make threefry optional
* fix: check value
* fix: unused import
* feat: threefry default
* fix: 5d
* feat: allow non upcast div
* fix: 5d better
* fix: 5d better
* fix: save all dtype
* feat: proper error
* feat: lazyop key
* fix: check float
* feat: try removing this realize now
* feat: disable threefry for uops hip tensor cores
* feat: don't need that
* feat: only check upcast
* fix: disable threefry for some metal tests
* feat: disable for metal tensor uops as well
* feat: disable for most uops
* fix: disable threefry for new uops tests
* feat: multitensor
* fix: typing
* feat: threefry default off
* feat: skip threefry half rand
* feat: restore old
* fix: bad git
* clean: ruff
* feat: bfloat16 fix
* fix: :|
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
* explicitly create_lt_node when used in shapetracker
leave regular __lt__ and cmps for symbolic shape cmp
* hmm it fixed that?
* LtNode.substitute uses create_lt_node
Some devices create cache table names with non-alphanumerical characters, e.g. "compile_hip_gfx1010:xnack-_12".
This commit escapes the table name in single quotes s.t. sqlite works (see https://github.com/tinygrad/tinygrad/issues/3538).
* fix LtNode simplification when lhs and rhs contain same variables
`(Variable("a", 1, 5) < Variable("a", 1, 5))` should eval to `NumNode(0)`
* fix with less perf impact
it's recommended to use __getnewargs__ to update the args of classes that use __new__ when unpickling.
It's preferred because it does not change the __new__ behavior.
cast implicitly resets shapetracker and makes it contiguous (for disk tensor), which fails for Interpreted backend if inputs contain non-contiguous st.