George Hotz
32141ec867
make apt CI faster ( #10702 )
2025-06-08 09:43:39 -07:00
chenyu
4f535641f7
add one huggingface_onnx test to mac benchmark ci ( #10700 )
...
this crashed for me on the ONNX parser PR but seems fine for the author. see if CI on Mac is fine
2025-06-08 12:26:12 -04:00
George Hotz
32e9949052
rename lazydata to uop ( #10698 )
2025-06-08 08:42:22 -07:00
uuuvn
8e3f337075
Skip flaky test in ci ( #10696 )
...
`test_data_parallel_resnet_train_step` is already skipped on LLVM/CPU:
```python
@unittest.skipIf(CI and REAL_DEV in ("CUDA", "NV", "LLVM", "CPU"), "slow, and flaky on LLVM/CPU")
@unittest.skipIf(REAL_DEV == "WEBGPU" and not OSX, "WEBGPU Vulkan can only run kernels with up to 10 buffers")
def test_data_parallel_resnet_train_step(self):
```
It looks like `test_data_parallel_resnet` (no `_train_step`) is flaky in a similar way:
https://github.com/tinygrad/tinygrad/actions/runs/15472667248/job/43560773882?pr=10642#step:9:64
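If it needs the same treatment, a hedged sketch of the analogous guard (assuming the same `CI` and `REAL_DEV` helpers from the test module quoted above) would be:
```python
@unittest.skipIf(CI and REAL_DEV in ("CUDA", "NV", "LLVM", "CPU"), "slow, and flaky on LLVM/CPU")
def test_data_parallel_resnet(self):
  ...
```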
2025-06-08 08:24:09 -07:00
George Hotz
3ece2e4bb5
hotfix: remove accel from extra
2025-06-08 08:20:34 -07:00
qazal
1ad8062591
more generic naming in VIZ [pr] ( #10695 )
...
* note
* rename kernel to ctx
* rename uop things to currentStep + expandSteps
* already destructured
* some things that were called ctx are steps
* still a kernel
2025-06-08 15:37:39 +03:00
qazal
c70486908e
viz: clicking a KERNEL node can open codegen rewrite ( #10683 )
...
* work
* now it doesn't have 20% slowdown
* label like this
* closer
* ansiStrip
* remove
* better
* id is faster
* fix that
2025-06-08 13:11:03 +03:00
George Hotz
48eb7d76b1
use ALLOW_DEVICE_USAGE context variable instead of MainProcess check ( #10693 )
...
* use DISALLOW_DEVICE_OPEN context variable instead of MainProcess check
* device usage can be disallowed
2025-06-08 00:07:40 -07:00
geohotstan
dedff0e96c
fix run huggingface onnx debug ( #10679 )
2025-06-08 00:59:20 -04:00
George Hotz
8c76250d31
speed up a few tests ( #10692 )
2025-06-07 20:39:25 -07:00
chenyu
e80870e27c
BasicBlock2 -> BasicBlock [pr] ( #10691 )
2025-06-07 23:33:51 -04:00
George Hotz
7ff175c022
cache a venv to avoid pip usage ( #10689 )
...
* try built in pip caching
* try venv
* export venv
* set VIRTUAL_ENV
* revert that
* venv key
* fix
* ci cache hit?
* fix windows
2025-06-07 20:13:41 -07:00
ihar
40c1479267
added unit tests for 'argfix' ( #10678 )
2025-06-07 22:17:10 -04:00
ihar
74b849b5e1
remove unnecessary 'argfix' because 'view' is an alias for 'reshape'. all functionality must be inside 'reshape' ( #10677 )
...
* remove unnecessary 'argfix' because 'view' is an alias for 'reshape'. all functionality must be inside 'reshape'
* added the same set of unit tests for 'view' as for 'reshape' since 'view' is just an alias for 'reshape'
* improved tests for 'view' op
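A simplified sketch of the reasoning above (illustrative names, not the actual tinygrad source): since `view` only forwards to `reshape`, a single `argfix` in `reshape` already normalizes both calling styles, so `view` needs no argfix of its own.
```python
def argfix(*x):
  # accept both f(2, 3) and f((2, 3)) / f([2, 3])
  return tuple(x[0]) if x and isinstance(x[0], (tuple, list)) else x

class T:
  def __init__(self, shape): self.shape = shape
  def reshape(self, *shape): return T(argfix(*shape))  # all shape normalization lives here
  def view(self, *shape): return self.reshape(*shape)  # pure alias, no extra argfix needed
```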
2025-06-07 22:15:31 -04:00
chenyu
e88fe41d37
update vits vctk model to use download from huggingface ( #10688 )
...
Google Drive points to a warning page that does not work
2025-06-07 20:47:28 -04:00
Sieds Lykles
c29a56dd51
Fix whisper OOB ( #10685 )
...
* fix whisper and test
* remove import
2025-06-07 20:23:50 -04:00
George Hotz
53ed64e133
ci speed work 1 ( #10676 )
...
* skip a few slow tests
* use a venv for python packages
* create venv
* no user, it's in venv
* ignore venv
* venv
* new cache key
* try that
* this
* version the python cache
2025-06-07 16:33:11 -07:00
George Hotz
db01c5a08a
ramp.py file from stream ( #10686 )
2025-06-07 14:58:21 -07:00
Sieds Lykles
2f605eadf7
fix oob ( #10666 )
2025-06-07 11:32:03 -04:00
qazal
cb61774ab6
move shared viz fields out of serve.py [pr] ( #10684 )
...
* move shared viz fields out [pr]
* update javascript
* update test_viz
2025-06-07 17:18:18 +03:00
qazal
b515d796fb
inline viz get_name [pr] ( #10682 )
...
* inline viz get_name [pr]
* changing name_fxn makes this simpler
* waitUntil dom
2025-06-07 11:16:16 +03:00
qazal
86a19e19e8
cleanup bits of viz [pr] ( #10681 )
2025-06-07 09:18:12 +03:00
wozeparrot
e3805171e2
feat: variable bs bitcast ( #10674 )
2025-06-06 17:21:53 -07:00
George Hotz
54db1f8ee8
prevent huge waste of multi ram ( #10669 )
...
* prevent huge waste of multi ram
* fix ram usage
* only define var
* add resolve
* fix tests
* fix cifar training
* remove that logic
* fix test without long
2025-06-06 17:17:21 -07:00
George Hotz
b68b7dbc2a
test winograd is close to normal conv [pr] ( #10557 )
...
Co-authored-by: chenyu <chenyu@fastmail.com >
2025-06-06 19:11:49 -04:00
nimlgen
85cea23557
nv: original bw qmd ( #10672 )
...
* nv: original bw qmd
* forgot
2025-06-07 01:43:22 +03:00
George Hotz
5ef7c5923f
docs: remove unused METAL_XCODE env var ( #10421 )
2025-06-06 18:39:54 -04:00
Sidharth N. Babu
ef14dfb277
compile fixes ( #10442 )
2025-06-06 18:38:37 -04:00
leopf
eb7305e6a4
Tensor.keccak("sha3_256") ( #7186 )
...
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com >
Co-authored-by: George Hotz <geohot@gmail.com >
Co-authored-by: wozeparrot <wozeparrot@gmail.com >
2025-06-06 15:24:05 -07:00
nimlgen
346b8542da
nv: fix inval from gpu_get_id_info_v2 ( #10670 )
2025-06-07 00:54:32 +03:00
chenyu
bdede4924e
fix odd number in get_test_global_size ( #10671 )
...
factor might not be an integer if the input global_size has an odd number in it
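A hypothetical illustration of the failure mode (not the real helper): halving a dimension with plain division leaves a fractional size when that dimension is odd, so launch sizes have to be floored and the factor derived from the floored values.
```python
global_size = [7, 4, 1]
bad  = [g / 2 for g in global_size]           # [3.5, 2.0, 0.5] -> 3.5 is not a valid launch size
good = [max(1, g // 2) for g in global_size]  # [3, 2, 1] stays integer; compute the factor from these
```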
2025-06-06 17:31:35 -04:00
George Hotz
bf4ffc054c
mstack replaces scheduler complexity ( #10654 )
...
* mstack replaces scheduler complexity
* leave that one
* contiguous
* work
* upd
* minimal failing test
* simpler
* attention is broken
* fix transformer
* failing tests
* real fix for llama
* kv cache test
* jit multi assign test
* better tests
* comment
* fix jit issue
* traverse after buf_uop
2025-06-06 11:31:41 -07:00
George Hotz
7f0f97aa76
new test_multitensor tests ( #10667 )
...
* new test_multitensor tests
* cleanup scheduler
2025-06-06 10:26:28 -07:00
qazal
5170f387b3
remove UOp.metaop [pr] ( #10664 )
...
* little simpler UOp.const_like [pr]
* remove UOp.metaop
* bind
* remove
* min diff
* that comment is fine
2025-06-06 16:21:48 +03:00
chenyu
4a6d84c4c3
hotfix llama start_pos vmax is max_context-1 ( #10659 )
...
* hotfix llama start_pos vmax is max_context-1
fixed `IGNORE_OOB=0 python3 examples/llama3.py --size 1B --benchmark --temperature 0`
* hotfix: multitensor transformer test tests kv cache
---------
Co-authored-by: George Hotz <geohot@gmail.com >
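For the vmax point above, a hedged sketch assuming the usual symbolic `start_pos` pattern from the llama example: `start_pos` indexes into a kv cache of length `max_context`, so its largest legal value is `max_context - 1`.
```python
from tinygrad import Variable

max_context = 8192
start_pos = 5
# vmax is max_context - 1, not max_context, since start_pos is an index into the cache
v = Variable("start_pos", 0, max_context - 1).bind(start_pos)
```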
2025-06-06 00:41:25 -04:00
George Hotz
5eb6e1e65a
Revert "hotfix: multitensor transformer test tests kv cache"
...
This reverts commit ad9f88419a.
2025-06-05 21:15:34 -07:00
George Hotz
ad9f88419a
hotfix: multitensor transformer test tests kv cache
2025-06-05 21:08:57 -07:00
George Hotz
8325c4f192
tests for multi assign ( #10658 )
...
* tests for multi assign
* transformer tests
* add that assert
2025-06-05 20:56:40 -07:00
wozeparrot
0d86f8d375
fix failed threefry ( #10646 )
2025-06-05 17:17:42 -07:00
chenyu
e67642d430
update doc example for multinomial ( #10657 )
...
also added many `s` for consistency
2025-06-05 20:16:52 -04:00
Eitan Turok
61352b8aa2
Add some more docs ( #10634 )
...
* more docs
* Add multinomial to ops
* better doc
2025-06-05 19:40:37 -04:00
qazal
884b6cf288
remove gbarrier on const ( #10656 )
2025-06-06 02:36:52 +03:00
chenyu
ff1aad7b69
fix const float pow to int tensor ( #10655 )
...
was incorrectly cast to int
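A hedged repro sketch, assuming `float ** Tensor` broadcasts a python float base over an integer exponent tensor; after the fix the result should stay floating point rather than being truncated to int.
```python
from tinygrad import Tensor

out = 0.5 ** Tensor([0, 1, 2])   # int tensor exponent, float constant base
print(out.numpy())               # expected [1.0, 0.5, 0.25]
```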
2025-06-05 19:15:12 -04:00
George Hotz
6619f17e26
force store to be contiguous ( #10652 )
2025-06-05 15:42:54 -07:00
wozeparrot
37e1ef1be3
feat: cleanup old AM processes ( #10653 )
2025-06-05 15:41:00 -07:00
George Hotz
baba274a76
minimal mstack pr to fix allreduce ( #10649 )
...
* minimal mstack pr to fix allreduce
* fix webgpu
2025-06-05 15:14:53 -07:00
George Hotz
4c315f8e17
MSTACK little non-functional changes ( #10648 )
2025-06-05 13:20:22 -07:00
b1tg
79d04d1baf
AMD_LLVM: support mfma for mi300x ( #10625 )
...
* amd llvm: support mfma for mi300x
* don't pass self
* refactor wmma render
* arch as lambda arg
---------
Co-authored-by: b1tg <b1tg@users.noreply.github.com >
2025-06-05 15:55:44 -04:00
chenyu
46811d0d3c
minor external_model_benchmark cleanup ( #10644 )
2025-06-05 14:13:28 -04:00
qazal
26afbc954f
delete redundant tests from test_schedule [pr] ( #10643 )
2025-06-05 20:08:39 +03:00