chenyu
ffb032e31d
test_diagonal touchup ( #10962 )
2025-06-24 15:51:19 -04:00
Utkarsh Gill
7f9958b632
Fix torch.linalg.diagonal crash due to invalid shrink in to_movement_ops ( #10945 )
...
* fix as_strided shrink bug breaking torch.linalg.diagonal on tinygrad backend
* cleanup
* generic fix
* tests
* cmp with diagonal too
* oops
* move tests
* fix test
* remove unnecessary import
* fix assert
* compare against numpy
---------
Co-authored-by: Utkarsh Gill <engelbart@Utkarshs-MacBook-Pro.local>
2025-06-24 15:36:06 -04:00
nimlgen
26ddf8d714
amd: rename dev_iface -> iface to match nv ( #10959 )
2025-06-24 20:22:19 +03:00
chenyu
bfa87f3490
clean up binary_crossentropy_logits ( #10958 )
2025-06-24 12:23:40 -04:00
qazal
2ccddfc0ca
viz: match canvas fontsize ( #10957 )
...
it's 10px https://developer.mozilla.org/en-US/docs/Web/API/CanvasRenderingContext2D/font?utm_source=chatgpt.com .
2025-06-24 19:07:06 +03:00
qazal
de4b9bf53b
add opts_to_apply option to AST KernelInfo ( #10950 )
...
* proposal: add option to override opts in the get_program API
* update test_linearizer_rewrite
* state in uops
* update process_replay and names
* empty isn't none
* fix process replay
2025-06-24 18:55:39 +03:00
chenyu
18e264a449
Tensor.logsigmoid ( #10955 )
2025-06-24 11:16:14 -04:00
Ignacio Sica
f15247d2d2
remove outdated index masking in lowerer [pr] ( #10953 )
...
* add assert to check idx is never replaced with const 0
* remove outdated index masking
2025-06-24 07:53:30 -07:00
b1tg
cc32394b32
support copyin/copyout/is_allocated for subbuffers ( #10869 )
...
* support copyin/copyout/is_allocated for subbuffers
* simple
* clean up
* rm underlying_buf
* add function is_initialized
* add tests
* better test_subbuffer_copy_in_out
* fix allocator
---------
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-06-24 07:49:04 -07:00
chenyu
35504c938e
torch.clip(x,y) -> x.clip(y) in test_ops ( #10954 )
...
* torch.clip(x,y) -> x.clip(y) in test_ops
* test_binary_crossentropy_logits_pos_weights
2025-06-24 10:22:19 -04:00
Fang-Pen Lin
86d458533f
Add pos_weight for binary_crossentropy_logits ( #10855 )
...
* Add pos_weight for binary_crossentropy_logits
* Remove debug code
* Code style
* Code style
* Rename
2025-06-24 09:42:37 -04:00
Sieds Lykles
61dad3740f
fix min_max and add test ( #10952 )
2025-06-24 09:33:26 -04:00
qazal
ab8c5d04ab
viz: convert to function_name in server [pr] ( #10951 )
...
* viz: convert to function_name in server [pr]
* it exists
2025-06-24 13:59:37 +03:00
nimlgen
c0d9cf09e0
system: flock ( #10949 )
...
* system: flock
* imports
* xx
2025-06-24 11:33:49 +03:00
nimlgen
5202970feb
system: move memory_barrier to System ( #10948 )
...
* system: move memory_barrier to System
* fixed
2025-06-24 11:09:43 +03:00
qazal
f41c28a048
update test_tensor_uop_representation comments [pr] ( #10946 )
...
These comments can be updated to match new tinygrad.
2025-06-24 10:47:09 +03:00
qazal
7a5e4e0bf1
fix unittests process replay [pr] ( #10947 )
2025-06-24 10:30:23 +03:00
George Hotz
7d560dbd75
hotfix: corealize in the tiny mnist test
2025-06-23 17:41:16 -07:00
Alexey Zaytsev
230ad3a460
[bounty] Don't use numpy inside hlb_cifar10 training loop ( #10777 )
...
* Don't use numpy inside hlb_cifar10 training loop
* Lint it
* jit it
* Drop the last half-batch
* Use gather for random_crop and reuse perms
* Wrap train_cifar in FUSE_ARANGE context
* No need to pass FUSE_ARANGE=1 to hlb_cifar10.py
* Add cutmix to jittable augmentations
* Remove .contiguous() from fetch_batches
* Fix indexing boundary
---------
Co-authored-by: Irwin1138 <irwin1139@gmail.com>
2025-06-23 17:24:56 -07:00
George Hotz
383010555f
delete linearize and to_program from kernel.py ( #10943 )
2025-06-23 17:04:05 -07:00
George Hotz
0f89660ce4
Revert "change clang -march flag to -mcpu on arm ( #10841 )" ( #10942 )
...
This reverts commit 897e42fd1b.
2025-06-23 16:48:28 -07:00
Ignacio Sica
956a8391a5
minor cleanup on test_tensor_core_opts tests ( #10924 )
...
* minor cleanup on test_tensor_core_opts tests
Tests now notify when skipped
Before, they silently skipped if backend didn't have half precision and
accumulation
Also cleaned up atol and rtol setup
* refactor test_tensor_core_opts_group
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-06-23 16:30:21 -07:00
ttomsa
897e42fd1b
change clang -march flag to -mcpu on arm ( #10841 )
...
* change clang -march flag to -mcpu with fp16 disassembly test
* fix
* add capstone to macos dependencies
* just check no cast in test
* rm import
* woops
* lets check
* move check
* llvm init before cpu check
* try this
* bump autogen llvm version
* also update libclang?
* revert
* add comment
* skip llvm test and add comment
* linter
2025-06-23 16:28:48 -07:00
Sieds Lykles
772cd02ad2
Perform index validation on load/store, not on the index ( #10849 )
...
* move index validation to load/stores
* add name
* add linearizer_failure
* add validate_store with implicit gates
* linearizer_failure_58 is fixed!
* add test_uop_graph test
* rename cond to gate
* test gated load/stores
* use or_casted()
2025-06-23 16:25:05 -07:00
George Hotz
ae4d2d71b4
bump line count to 14500
2025-06-23 15:32:27 -07:00
Harsh Natuskar
79d7cdd9ba
Fix device ( #10929 )
...
* fix: pkg
* better
* added test
* less lines
2025-06-23 15:30:19 -07:00
George Hotz
e15754db28
remove (some) kernelize from llama and test schedule speed ( #10939 )
...
* remove kernelize from llama
* 405B
* space
2025-06-23 15:07:31 -07:00
chenyu
3699d1d3ba
hotfix llama3 temperature is float ( #10938 )
2025-06-23 15:20:56 -04:00
uuuvn
4e2c9e36c7
Remote multihost (p2p transfer) ( #10601 )
2025-06-23 11:47:29 -07:00
chenyu
42b1c9625b
skip test TestKiTS19Dataset::test_training_set ( #10936 )
...
flaky
2025-06-23 14:27:24 -04:00
patrini32
9e9fd44987
refactor test/external/external_llama_eval.py ( #10567 )
...
Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2025-06-23 10:43:20 -07:00
chenyu
785b4ea8ac
optim flatten().shape[0] is numel ( #10935 )
2025-06-23 13:11:19 -04:00
qazal
ac39f27ae6
viz: non blocking UOp tracing ( #10913 )
...
* viz: non blocking UOp tracing
* u.arg
* no if Ops.KERNEL
* drop replace
* switch to weakref.WeakKeyDictionary
* back
* remove ram usage skips, viz works here
* cache on reconstruct
2025-06-23 19:59:28 +03:00
Ignacio Sica
b8d09a1dae
tc with group/grouptop ( #10903 )
2025-06-23 09:58:41 -07:00
qazal
9944c2c02d
viz: show time taken on hover ( #10934 )
2025-06-23 19:00:40 +03:00
George Hotz
1e99a7f1c9
hotfix: don't viz the indexing rewrites
2025-06-23 08:20:26 -07:00
chenyu
f9b59924f1
OPTIM_DTYPE to specify dtype for optim params ( #10925 )
...
one more flag
2025-06-23 10:32:03 -04:00
qazal
7820aeca8e
update codegen process replay to use get_program [pr] ( #10921 )
...
* update codegen process replay to get_program [pr]
* precommit
* try str replace
* +to_function_name
* fixup tc
* local2.sh
* fix openpilot NOLOCALS
* new local.sh
* correct merge
* beam cache
* back
* revert beam thing
* adding opts_override and name_override makes output of get_program reproducible
* min diff
2025-06-23 17:31:41 +03:00
nimlgen
eceb7a00d2
nv: rename iface mem functions ( #10931 )
2025-06-23 16:34:51 +03:00
qazal
4e864bd304
fix: getenv("NOLOCALS")/NOLOCALS context var ( #10927 )
...
OptOps shouldn't rely on os.environ.
2025-06-23 11:23:59 +03:00
alpharush
22f9696522
Fix/hcqfuzz harness bug ( #10923 )
...
* update command so extra module is found
* fix empty range in randrange errors
* lint
2025-06-23 11:22:30 +03:00
qazal
f037f85532
s/getenv("TC")/USE_TC context var ( #10922 )
2025-06-23 00:39:45 +03:00
qazal
9201224e0b
viz: remove Kernel check [pr] ( #10920 )
...
* viz: remove Kernel check [pr]
* TestVizIntegration
* test/unit allows opening of devices
* kernel -> Kernel
2025-06-22 20:47:54 +03:00
nimlgen
3ccdb2356b
system: factor out PCIIfaceBase ( #10917 )
...
* system: factor out PCIIfaceBase
* linter
* typing
2025-06-22 20:03:14 +03:00
George Hotz
b09c47366f
opt transforms the ast into an optimized ast ( #10900 )
...
* opt transforms the ast into an optimized ast
* fix get_kernel order and to_function_name
* function_name property
* update docs
* copy from kernel.py
* improve docs
* ci didn't trigger?
2025-06-22 09:41:26 -07:00
qazal
ffddf165f8
viz: color by kernel names in profiler ( #10919 )
...
* viz: color by kernel names in profiler
* ellipsis stays in bounds
2025-06-22 18:07:52 +03:00
nimlgen
36536ef6f0
nv: minor changes from nvpci ( #10918 )
2025-06-22 18:04:39 +03:00
geohotstan
4ab7d792cc
ONNX improve dtype fallback ( #10800 )
...
* fix
* add early verbose demo test
* is this how to write tests :s
* is definition drift even a thing? gemini says it is
* clean up
* better
* even better
* try add to CI
* doesn't work quite yet
* much more work to be done
* whoops
* partition the test heh
* skipif
* some nits for better names
* add webgpu test for onnxrunner
* fix reference links
* flush for now
2025-06-21 19:29:45 -04:00
chenyu
0480139def
log_perplexity metrics ( #10912 )
2025-06-21 10:44:47 -04:00
nimlgen
0e7bd9fd03
factor out generic MemoryManager ( #10910 )
...
* allocator -> memory
* just move it out
* mm is abstracted
* need entry abstraction
* fix
* mypy
2025-06-21 16:18:33 +03:00