George Hotz
a2f502b89e
fix rangeify=1 ops on GPU ( #12130 )
2025-09-12 11:17:37 +08:00
George Hotz
0766616962
isolate the const hacks in the old kernelize ( #12126 )
...
* isolate the const hacks in the old kernelize
* if rangeify, don't waste time
2025-09-12 08:35:35 +08:00
Sieds Lykles
1f3950a484
Invalid idx ( #12067 )
...
* merge index_dtype_3
* new lowering with Invalid idx
* remove that dtype from range
* finish merge
* annotate better
* indentation
* dont need that anymore
* always process replay for openpilot
* more uop_given_valid for idx
* valid past index_child
* fix bug preventing load getting an alt value
* add track_match_stats back in in shapetracker and remove cache
* get_valid_idx -> get_valid and get_idx
* fix heuristics with new idx
* split line
* fix typo
* fix signature
* dont skip idx if stride is 0
the idx may still be invalid
* lower const with new valid
* delete to_indexed_uops
* update shapetracker test
* delete axis_is_masked
* add cache back
* move around comment
* fix get_valid bug
* move invalid fold to symbolic so its earlier
* cleanup
* update applying padto to new idx
* add unit tests
* cleanup
* fold line
* improve spec
* dont try to render Invalid as a float
* more consistent invalid index
* update some tests
* Fold index with true cond
* skip test
* vconst min max if Invalid in arg
* fix signature of UOp.const
* add test for min/max of Invalid CONST/VCONST
* add InvalidType to as_const signature
* is Invalid to isinstance
* Add InvalidType to ConstLike
* index gate is a where gate
* make that a metaclass
* fix heuristics for new idx
* mypy happy
2025-09-12 01:42:02 +02:00
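The Invalid idx change folds the separate valid/idx pair into a single index that can itself be `Invalid`, gated through a where ("index gate is a where gate"), so a load with an Invalid index yields its alt value. The toy model below illustrates that idea in plain Python. `InvalidType` here is a stand-in singleton, and `gated_idx` and `load` are hypothetical helpers; none of them are tinygrad's definitions.

```python
class InvalidType:
  # stand-in singleton sentinel meaning "this index must not be dereferenced"
  _inst = None
  def __new__(cls):
    if cls._inst is None: cls._inst = super().__new__(cls)
    return cls._inst
  def __repr__(self): return "Invalid"

Invalid = InvalidType()

def where(cond, a, b): return a if cond else b

def gated_idx(i: int, n: int, pad: int):
  # hypothetical padded read: the index is valid only inside [pad, pad+n)
  return where(pad <= i < pad + n, i - pad, Invalid)

def load(buf: list, idx, alt: float = 0.0) -> float:
  # an Invalid index never touches memory; the load yields the alt value instead
  return alt if idx is Invalid else buf[idx]

buf = [1.0, 2.0, 3.0]
print([load(buf, gated_idx(i, len(buf), 1)) for i in range(5)])  # [0.0, 1.0, 2.0, 3.0, 0.0]
```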
chenyu
544eb2c402
clean up test_scatter_reduce ( #12125 )
2025-09-11 16:36:58 -04:00
chenyu
9ad6a56d17
smaller test_simple_reduce ( #12124 )
2025-09-11 15:45:38 -04:00
chenyu
e5ef9ec5b1
remove IGNORE_OOB=0 in ci tests ( #12117 )
2025-09-11 15:05:04 -04:00
chenyu
3a83b56da5
fix test_dequantization_mxfp4 ( #12123 )
...
* fix test_dequantization_mxfp4
* assert_allclose
* rtol
2025-09-11 14:22:06 -04:00
chenyu
520e2e0727
actually run unit tests in ci MacOS (unit) ( #12122 )
...
* actually run unit tests in ci MacOS (unit)
* that's always wrong
2025-09-11 13:32:30 -04:00
nimlgen
acb700fc26
ci: fix ptx env ( #12120 )
2025-09-11 12:42:15 -04:00
chenyu
20cd7177de
delete test_bert_fuse_arange ( #12121 )
...
* delete test_bert_fuse_arange
it's the default now and we are not interested in FUSE_ARANGE=0 version
* remove -v
2025-09-11 12:35:51 -04:00
chenyu
b07f962058
split metal model tests ( #12119 )
...
* split metal model tests
* llama too
2025-09-11 12:20:12 -04:00
chenyu
66593f135f
remove duplicated test_real_world ( #12118 )
...
included in the test/models right below
2025-09-11 11:57:14 -04:00
qazal
e76211fcbc
viz: specify all rect styles in parent ( #12115 )
...
* viz: specify all rect styles in parent
Visually a no-op, but it's easier to reason about when the rect's coloring comes from `g` parent that holds UOp data.
* this stays
2025-09-11 13:48:59 +03:00
nimlgen
400ad93892
ci: gate boost paths for macos only ( #12114 )
2025-09-11 12:48:34 +03:00
George Hotz
3ef0e5e01e
rangeify: use Ops.REALIZE and not Ops.CONTIGUOUS if it's added by system ( #12111 )
...
* rangeify: use Ops.REALIZE and not Ops.CONTIGUOUS if it's added by system
* fix contig + BufferizeOpts
* no outerworld
2025-09-11 11:56:59 +08:00
b1tg
52ebed991e
change dtype promo lattice when fp8s is supported ( #12088 )
...
* change dtype promo lattice when fp8s is supported
* no device check
* int64 + uint64 => fp8
2025-09-10 22:09:11 -04:00
George Hotz
d4eba5800d
rangeify cost function infrastructure ( #12091 )
...
* one call to hc opt
* does that pass?
* add cost function to rangeify
* test
* more test
* gate thread
* bufferize has shape
* ish
* match old behavior
* no ci there
2025-09-11 07:19:53 +08:00
qazal
78610b681e
viz: light up children ( #12107 )
...
* viz: light up children
* keep tag coloring
2025-09-11 01:28:01 +03:00
Sieds Lykles
3989f5b559
Revert "Simplify valid in symbolic ( #12104 )" ( #12108 )
...
This reverts commit 73d479a016.
2025-09-10 23:36:40 +02:00
Sieds Lykles
73d479a016
Simplify valid in symbolic ( #12104 )
...
* cleanup cast_folding
* from sym to symbolic
* no more sym in dtype lowering
* move around simplify_valid
* update test
2025-09-10 23:26:19 +02:00
chenyu
e306650d39
remove GPUDevice ( #12106 )
2025-09-10 16:35:00 -04:00
George Hotz
d8a7a1c9c7
BUFFERIZE shape should be each range, not the product ( #12105 )
...
* BUFFERIZE shape should be each range, not the product
* fix tests
* resolve
2025-09-11 04:02:24 +08:00
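Both shapes hold the same number of elements. The difference is whether the per-range structure survives, which is what lets each range keep its own stride. The toy illustration below assumes a row-major layout and uses hypothetical helper names. It is not tinygrad's BUFFERIZE code.

```python
from math import prod

def per_range_shape(range_sizes: list[int]) -> tuple[int, ...]:
  # one dimension per range, in range order
  return tuple(range_sizes)

def product_shape(range_sizes: list[int]) -> tuple[int, ...]:
  # the collapsed alternative: a single dimension holding the product
  return (prod(range_sizes),)

def strides(shape: tuple[int, ...]) -> tuple[int, ...]:
  # row-major strides: the last dimension varies fastest
  out, acc = [], 1
  for s in reversed(shape):
    out.append(acc)
    acc *= s
  return tuple(reversed(out))

sizes = [4, 3, 2]
print(per_range_shape(sizes), product_shape(sizes))  # (4, 3, 2) (24,)
print(strides(per_range_shape(sizes)))               # (6, 2, 1)
```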
Sieds Lykles
3730172c10
cleanup cast_folding ( #12101 )
...
* cleanup cast_folding
* from sym to symbolic
* no more sym in dtype lowering
2025-09-10 21:30:20 +02:00
chenyu
0e266f376c
ops_gpu -> ops_cl ( #12103 )
2025-09-10 15:15:48 -04:00
chenyu
0599e86186
replace hardcoded GPU in llama debug msg ( #12102 )
2025-09-10 13:56:40 -04:00
qazal
5a84d86db7
viz: fix buffer tooltip offset ( #12100 )
...
* fixup offsets
* add buffer num to tooltip
2025-09-10 20:12:20 +03:00
nimlgen
fb96394ff5
auto-select available compilers ( #12094 )
...
* device: auto select compilers
* fix
* metal+opencl
* nv/cuda
* test without ptx
* ptx
* fix tests
* fix
* fix test
* rename
* test + cleaner
* xx
* ops
* better test
* win?
* um?
* types
* debug
* win??
* sep rung
* wtf?
* debug
* skip win
* revert this
* types
2025-09-10 19:52:01 +03:00
chenyu
bb67829e99
raise KernelOptError in TC _apply_tc_opt ( #12099 )
...
currently getting
```
  File "/home/chenyu/tinygrad/tinygrad/codegen/opt/search.py", line 149, in beam_search
    acted_lins: list[Scheduler] = flatten([get_kernel_actions(lin, include_0=False).values() for lin,_ in beam])
                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/chenyu/tinygrad/tinygrad/codegen/opt/search.py", line 107, in get_kernel_actions
    lin2.apply_opt(a)
  File "/home/chenyu/tinygrad/tinygrad/codegen/opt/postrange.py", line 169, in apply_opt
    ret = self._apply_tc_opt(use_tensor_cores, cast(int, opt.axis), tc_select, tc_opt)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/chenyu/tinygrad/tinygrad/codegen/opt/postrange.py", line 235, in _apply_tc_opt
    idx = self.rngs.index(a)
          ^^^^^^^^^^^^^^^^^^
ValueError: UOp(Ops.RANGE, dtypes.index, arg=(1002, <AxisType.REDUCE: 6>), src=(
  UOp(Ops.CONST, dtypes.index, arg=15, src=()),)) is not in list
```
2025-09-10 12:32:19 -04:00
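The traceback shows a bare `ValueError` from `self.rngs.index(a)` escaping beam search. Raising `KernelOptError` instead presumably lets the search loop treat the inapplicable tensor-core opt as a skipped action rather than a crash. The sketch below shows that pattern. `KernelOptError` is a stand-in class, and `find_axis` and `try_actions` are hypothetical helpers, not tinygrad's code.

```python
class KernelOptError(Exception):
  """Stand-in for tinygrad's KernelOptError (hypothetical definition for this sketch)."""

def find_axis(rngs: list, a) -> int:
  # list.index raises ValueError when `a` is missing; re-raise it as the
  # opt-specific error so callers that only catch KernelOptError can skip it
  try:
    return rngs.index(a)
  except ValueError as e:
    raise KernelOptError(f"{a!r} is not one of the kernel ranges") from e

def try_actions(rngs: list, candidates: list) -> list[int]:
  # toy search loop: keep the actions that apply, skip the ones that raise
  applied = []
  for a in candidates:
    try:
      applied.append(find_axis(rngs, a))
    except KernelOptError:
      continue
  return applied

print(try_actions(["r0", "r1", "r2"], ["r1", "r9", "r2"]))  # [1, 2]
```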
George Hotz
84b249ef0e
move simplify reduce out of devectorizer ( #12098 )
2025-09-10 21:24:57 +08:00
qazal
5d66a2d885
viz: refactor range clipping ( #12097 )
2025-09-10 16:23:46 +03:00
George Hotz
9789337722
early reduce simplify ( #12046 )
...
* early reduce simplify
* min changes
* need that
* that goes in simplify
* no more arange reduce opt
2025-09-10 21:02:46 +08:00
nimlgen
21e6926a6a
HostLLVMCompiler -> CPULLVMCompiler ( #12096 )
2025-09-10 14:04:16 +03:00
nimlgen
551560b87c
do not use getenv('PTX') in tests ( #12095 )
...
* test without ptx
* fix tests
* fix test
* linters
2025-09-10 14:04:07 +03:00
Sieds Lykles
0e420e68b4
delete axis_is_masked ( #12092 )
2025-09-10 05:26:19 +02:00
George Hotz
ef53a6fc19
one call to hc opt ( #12074 )
...
* one call to hc opt
* does that pass?
* Clean up postrange.py by removing comments
2025-09-10 11:18:18 +08:00
Sieds Lykles
499f50483b
x | !x -> True ( #12090 )
2025-09-10 03:26:01 +02:00
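The rule `x | !x -> True` holds for any boolean `x`, in either operand order. tinygrad presumably expresses it as a pattern rewrite over UOps. The sketch below applies the same rule to a toy expression tree, and its `Var`/`Not`/`Or` classes and `simplify` function are hypothetical, not tinygrad's.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Var:
  name: str

@dataclass(frozen=True)
class Not:
  x: object

@dataclass(frozen=True)
class Or:
  a: object
  b: object

def simplify(e):
  # simplify children first, then try the rule on this node
  if isinstance(e, Not): return Not(simplify(e.x))
  if isinstance(e, Or):
    a, b = simplify(e.a), simplify(e.b)
    # x | !x -> True, in either operand order
    if b == Not(a) or a == Not(b): return True
    return Or(a, b)
  return e

x = Var("x")
print(simplify(Or(x, Not(x))))    # True
print(simplify(Or(Not(x), x)))    # True
print(simplify(Or(x, Var("y"))))  # Or(a=Var(name='x'), b=Var(name='y'))
```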
Sieds Lykles
5b73076e48
assert benchmark times ( #12042 )
...
* assert jitted times in openpilot
* better error
* better error
* add ASSERT_MIN_STEP_TIME to more models
* t is step_times
* update benchmark times
* update times
2025-09-09 23:40:02 +02:00
b1tg
58d13a6e3e
remove redundant check ( #12087 )
2025-09-09 15:15:39 -04:00
qazal
71fcb23d4a
viz: cleanup renderDag ( #12086 )
2025-09-09 19:19:45 +03:00
b1tg
82e955fe79
fix inf bug in float_to_fp8 ( #12085 )
2025-09-09 12:02:56 -04:00
b1tg
14faf7a5c0
AutoCastType tests for fp8s/bf16 ( #12084 )
2025-09-09 11:33:01 -04:00
qazal
5e76eff26d
viz: pre fetch workers ( #12083 )
...
* viz: pre fetch workers
* move check
2025-09-09 15:56:39 +03:00
qazal
5fde033794
viz: prune worker payload ( #12082 )
2025-09-09 14:45:13 +03:00
nimlgen
1c6c42715f
unify cpu and llvm ( #11982 )
...
* try unify cpu and llvm
* fixes
* fix
* ops
* no llvm
* fix
* rm
* lvmm is ot
* oops
* override
* no llvm
* ignore
* skip llvm
* ooops
2025-09-09 13:54:44 +03:00
qazal
50cc7175cb
viz: use complete progress helper ( #12081 )
...
* viz: use complete progress helper
* min diff
* rename show to start
2025-09-09 11:00:52 +03:00
Sieds Lykles
239091d111
numba>=0.55 for uv resolution ( #12079 )
...
* force numba version
* update comment
2025-09-09 01:43:32 +02:00
chenyu
2bd1fff79c
ci GPU misc cleanups ( #12078 )
2025-09-08 16:47:29 -04:00
chenyu
1781d5bced
remove PYTHONPATH in test.yml ( #12077 )
...
set globally already
2025-09-08 15:41:47 -04:00
nimlgen
9182948951
remove llvm_bf16_cast ( #12075 )
2025-09-08 20:51:15 +03:00
chenyu
11213398b9
reorder amdremote in test yml ( #12073 )
2025-09-08 13:43:04 -04:00