Francis Lata
76a03e950a
make kits19 dataset samples have small sizes ( #8591 )
2025-01-14 08:27:45 -08:00
ignaciosica
4057b98f7f
rename i and j into k and row/col ( #8607 )
2025-01-14 08:27:05 -08:00
nimlgen
1ff6862a3d
ci: sleep a bit to let the driver unload the prev pid ( #8605 )
2025-01-14 15:55:23 +03:00
qazal
97ec564b03
noop changes from the block_assign branch [pr] ( #8606 )
2025-01-14 07:47:17 -05:00
qazal
5aab2806f0
rename to test_tensor_uop + use upats for asserting [pr] ( #8604 )
...
* rename to test_tensor_uop + use upats for asserting [pr]
* fix pr
2025-01-14 05:09:56 -05:00
qazal
863abc7140
scheduling graph_rewrite prereqs for BLOCK in ASSIGN ( #8598 )
...
* remove the BUF_LIMIT assert
* skip the base one
* work
* work
* good error
* ok comment
* shorter check
2025-01-14 03:01:59 -05:00
chenyu
05e54f00d3
remove bounds from MultiLazyBuffer.from_sharded [pr] ( #8603 )
...
without a custom bound, the bound is uniquely determined by shape and axis
2025-01-13 23:40:05 -05:00
chenyu
d443e91d82
remove custom splits in Tensor.shard [pr] ( #8602 )
...
towards even split only
2025-01-13 21:29:13 -05:00
chenyu
227d96d7a3
remove unused src from metaop [pr] ( #8601 )
2025-01-13 20:28:14 -05:00
chenyu
c4e33048c6
test Tensor.clone has a different lazydata [pr] ( #8600 )
2025-01-13 20:13:44 -05:00
qazal
ae2229d727
assert kernel buffer limit at compile time [pr] ( #8595 )
...
* remove the BUF_LIMIT assert
* skip the base one
2025-01-13 16:32:07 -05:00
nimlgen
c2504357af
am: lock to access dev ( #8594 )
...
* amm lock to access dev
* wording
* just works
* disable
2025-01-13 23:53:13 +03:00
geohotstan
4abe631b56
fix onnx mobilenetv2-7-quantized.onnx ( #8574 )
...
* is 67% considered fixed?
* move test up
* share function
* add qgemm too
* make sure qgemm comes out as int
* actually that note is not right
* remove qgemm (I did it wrong) and add it later lol.
2025-01-13 09:25:06 -08:00
George Hotz
d19c1c7f03
bump 75 -> 73 for test failure
2025-01-13 09:18:38 -08:00
Francis Lata
c25d5d3101
improve isin checks ( #8589 )
2025-01-13 12:12:31 -05:00
nimlgen
74b83c4c41
am in ci ( #8532 )
...
* try am in ci
* no sudo
* temp
* run more am test
* run half on am
* insert amdgpu
* other machine as well
2025-01-13 19:55:17 +03:00
nimlgen
d224d0ed7f
nv: fix fault info ( #8587 )
...
* nv: fix fault info
* and emu for amd
* skip if not mock
2025-01-13 14:38:43 +03:00
qazal
586e730d32
use UOp.st for kernel reduce axes ( #8499 )
...
* use UOp.st for kernel reduce axes [pr]
* do not return dict
2025-01-13 06:24:11 -05:00
qazal
7562cc0399
better test for reduce swizzle + don't use double dtype [pr] ( #8586 )
...
* better test_permute_rewrite
* use float32
2025-01-13 05:02:21 -05:00
George Hotz
df59b072db
rename to top_down_rewrite [pr] ( #8583 )
2025-01-12 18:36:38 -08:00
chenyu
994944920b
simpler batch_load_train_bert [pr] ( #8582 )
...
don't think that buffer is really beneficial. 5% faster data_time and 1ms faster per step.
https://wandb.ai/chenyuxyz/MLPerf-BERT/runs/69c9lx8y/overview
2025-01-12 20:25:05 -05:00
George Hotz
05e5de6a91
ugh, remove that binary blob
2025-01-12 17:02:28 -08:00
George Hotz
4ac4c1415a
free intermediate buffers in the jit [pr] ( #8581 )
...
* free intermediate buffers in the jit [pr]
* intermediates_freed
* deallocate if not allocated
* self._first_run is simpler
2025-01-12 15:41:41 -08:00
George Hotz
d817dc10db
start on test rewrite map [pr] ( #8432 )
...
* start on test rewrite map [pr]
* chatgpt writes dumb tests
* comment out failing
* fix that test
* fix gc issue
* oh, frame 2
* remove uop mutability
* map is only the map
* simpler + more tests
* test tiny passes
* tests that need to pass
* parent test passes
* child test passes
* remove uop mutability [pr]
* test fixups
* most tests pass
* more tests pass
* lil test fixups
* them too
* fix test
* unneeded
* err, that
* fix test_hcq
* fix test failures
* fix that test
* tensor universe
* does this pass test
* Revert "does this pass test"
This reverts commit ed516b3169.
* Revert "tensor universe"
This reverts commit c21301852a.
* test_mutate_add passes
* this can pass
* Revert "Merge remote-tracking branch 'origin/no_uop_mutability' into test_rewrite_map"
This reverts commit 657822dcdc, reversing
changes made to 2a126c145b.
* Revert "test_mutate_add passes"
This reverts commit ab4fc4c78e.
* correct enough
* remove test_rewrite_map_schedule.py
* viz
* uops are immutable
---------
Co-authored-by: qazal <qazal.software@gmail.com>
2025-01-12 13:13:51 -05:00
qazal
2f71a00236
remove PYTHONPATH=. from mypy ci [pr] ( #8578 )
2025-01-12 09:52:03 -08:00
qazal
cde18fddce
fix DEBUG=2 output for copy runners [pr] ( #8579 )
...
* fix DEBUG=2 output for copy runners [pr]
* itemsize is constant
2025-01-12 12:03:01 -05:00
eliotgolding
867004fbeb
use unravel in views_to_indexed_uops [pr] ( #8560 )
...
* use unravel in shape
* make process replay work
* earlier View.minify()
* fix
* fix tests
* mypy
* get rid of early minify
* fix
* linter
* clean and add test
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-01-12 10:25:55 -05:00
nimlgen
38b5ac4d4a
mypy for mockgpu/cuda & dsp/run ( #8575 )
2025-01-12 18:25:39 +03:00
chenyu
def90b22f6
EVAL_BS=36 for bert [pr] ( #8576 )
...
3X faster eval compared to BS=6.
green https://wandb.ai/chenyuxyz/MLPerf-BERT/runs/ka5p5sm9/overview
red https://wandb.ai/chenyuxyz/MLPerf-BERT/runs/a7maxsxd/overview
2025-01-12 09:43:56 -05:00
qazal
ae241e96db
fix half4 on qcom and gpu ( #8573 )
...
* add test_setitem_half
* this fixes comma benchmark
2025-01-12 06:23:05 -05:00
qazal
cff1ee9038
add SINK folding from the tensor_map branch [pr] ( #8562 )
...
* delete is_constant from the scheduler
* add sink folding
* always give BUFFER uops Buffers [pr]
* spec for view, var (bind) and const
* add test_buffer_only_after_realize
* work
* 3 lines
* more work
2025-01-12 03:39:34 -05:00
qazal
87cbff3ac0
always give BUFFER uops Buffers [pr] ( #8572 )
...
* always give BUFFER uops Buffers [pr]
* add test_buffer_only_after_realize
2025-01-11 23:17:09 +02:00
qazal
98c9e23560
remove global PYTHONPATH setting in CI (test.yml) [pr] ( #8568 )
...
* remove global PYTHONPATH setting in CI [pr]
* only run mypy in tinygrad/
* still needed for benchmarks
2025-01-11 12:47:50 -05:00
geohotstan
815c505e1d
fixes from adapting tvm tests ( #8570 )
2025-01-11 11:38:36 -05:00
qazal
79738d768c
do not require PYTHONPATH=. for process replay [pr] ( #8567 )
2025-01-11 09:45:34 -05:00
qazal
a70d1bf439
move print_diff to process replay [pr] ( #8566 )
...
* move print_diff to process replay [pr]
* ruff rightfully complains
2025-01-11 09:28:45 -05:00
nimlgen
2f0856c1e2
qcom: use hwinterface ( #8565 )
...
* qcom: use hwinterface
* ops
* not needed anymore
2025-01-11 17:11:23 +03:00
qazal
60503c8621
use CAPTURE_PROCESS_REPLAY=1 in CI [pr] ( #8564 )
2025-01-11 06:03:48 -05:00
nimlgen
61665a63c9
am logs to debug2 ( #8563 )
2025-01-11 13:33:18 +03:00
George Hotz
c7acd40574
more aggressive onnx const creation [pr] ( #8561 )
2025-01-10 17:38:32 -08:00
ignaciosica
8891495996
minor arg spec check on wmma ( #8525 )
2025-01-10 15:42:56 -08:00
chenyu
d09897c2aa
allow double copy [pr] ( #8559 )
...
fixed ring allreduce pattern and recovered most of the bert step time regression (10% faster), will double check all benchmark
2025-01-10 18:21:01 -05:00
George Hotz
70fa65cd95
viz fixups + scheduler option [pr] ( #8557 )
2025-01-10 15:09:31 -08:00
nimlgen
f457cb64d6
am: do not reload fw each run ( #8466 )
...
* am do not reload fw each run
* works
* comment this
* clean + comment
* warn message
* linter
* move out pci en master
* useless
* more correct
* oops
* oops
2025-01-10 23:33:38 +03:00
nimlgen
337328e409
am: fini gpu after use ( #8556 )
...
* am: fini gpu after use
* mypy
2025-01-10 21:02:34 +03:00
chenyu
6a7f971fa0
hotfix max(DEBUG, 2) -> max(DEBUG.value, 2) [pr] ( #8553 )
2025-01-10 12:57:44 -05:00
George Hotz
cd4edc5206
hotfix: pylint ignores runtime for speed
2025-01-10 09:07:18 -08:00
nimlgen
92b59c9b7a
test_hcq limits for mockgpu not (only) ci ( #8555 )
...
* test_hcq limits for mockgpu not (only) ci
* rm CI
2025-01-10 17:37:28 +03:00
George Hotz
9833fe83d8
more work on onnx imagenet [pr] ( #8552 )
...
* more work on onnx imagenet [pr]
* working quantization
* static quant
* benchmark onnx 0 dim
2025-01-09 20:28:18 -08:00
George Hotz
e172b759f0
more working ( #8550 )
2025-01-09 18:40:08 -08:00