qazal
dddd4e5f9f
hotfix: remove duplicate TestTensorMutates [pr] ( #8619 )
...
* hotfix: remove duplicate TestTensorMutates [pr]
* imports
2025-01-14 16:03:17 -05:00
nimlgen
c5782e85d2
tlsf: optimize alloc ( #8608 )
2025-01-14 23:48:07 +03:00
George Hotz
bfbe81df71
remove cast before view ( #8613 )
...
* remove cast before view
* greener
* indexing
* that passes too
* openpilot too
* ack
---------
Co-authored-by: qazal <qazal.software@gmail.com>
2025-01-14 15:04:58 -05:00
chenyu
393eec3201
raise RuntimeError for uneven shard [pr] ( #8593 )
...
no 7B llama on 6 GPUs
skip 70B
2025-01-14 14:51:48 -05:00
ignaciosica
d5a646d492
CUDA Turing TC ( #8597 )
...
* init turing tc
* reorder tc
* hotfix: remove some spaces
* revert var name to x
* consistent order of factors
* revert order of terms to match old stuff
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-01-14 10:35:14 -08:00
chenyu
cbfd51f5a5
make MultiLazyBuffer.bounds a property [pr] ( #8614 )
...
determined by lbs shapes and axis
2025-01-14 13:25:54 -05:00
chenyu
52e7003414
Revert "make kits19 dataset samples have small sizes ( #8591 )" ( #8610 )
...
This reverts commit 76a03e950a.
2025-01-14 12:24:27 -05:00
Francis Lata
76a03e950a
make kits19 dataset samples have small sizes ( #8591 )
2025-01-14 08:27:45 -08:00
ignaciosica
4057b98f7f
rename i and j into k and row/col ( #8607 )
2025-01-14 08:27:05 -08:00
nimlgen
1ff6862a3d
ci: sleep a bit to let the driver unload the prev pid ( #8605 )
2025-01-14 15:55:23 +03:00
qazal
97ec564b03
noop changes from the block_assign branch [pr] ( #8606 )
2025-01-14 07:47:17 -05:00
qazal
5aab2806f0
rename to test_tensor_uop + use upats for asserting [pr] ( #8604 )
...
* rename to test_tensor_uop + use upats for asserting [pr]
* fix pr
2025-01-14 05:09:56 -05:00
qazal
863abc7140
scheduling graph_rewrite prereqs for BLOCK in ASSIGN ( #8598 )
...
* remove the BUF_LIMIT assert
* skip the base one
* work
* work
* good error
* ok comment
* shorter check
2025-01-14 03:01:59 -05:00
chenyu
05e54f00d3
remove bounds from MultiLazyBuffer.from_sharded [pr] ( #8603 )
...
without a custom bound, the bound is uniquely determined by shape and axis
2025-01-13 23:40:05 -05:00
chenyu
d443e91d82
remove custom splits in Tensor.shard [pr] ( #8602 )
...
towards even split only
2025-01-13 21:29:13 -05:00
chenyu
227d96d7a3
remove unused src from metaop [pr] ( #8601 )
2025-01-13 20:28:14 -05:00
chenyu
c4e33048c6
test Tensor.clone has a different lazydata [pr] ( #8600 )
2025-01-13 20:13:44 -05:00
qazal
ae2229d727
assert kernel buffer limit at compile time [pr] ( #8595 )
...
* remove the BUF_LIMIT assert
* skip the base one
2025-01-13 16:32:07 -05:00
nimlgen
c2504357af
am: lock to access dev ( #8594 )
...
* amm lock to access dev
* wording
* just works
* disable
2025-01-13 23:53:13 +03:00
geohotstan
4abe631b56
fix onnx mobilenetv2-7-quantized.onnx ( #8574 )
...
* is 67% considered fixed?
* move test up
* share function
* add qgemm too
* make sure qgemm comes out as int
* actually that note is not right
* remove qgemm (I did it wrong) and add it later lol.
2025-01-13 09:25:06 -08:00
George Hotz
d19c1c7f03
bump 75 -> 73 for test failure
2025-01-13 09:18:38 -08:00
Francis Lata
c25d5d3101
improve isin checks ( #8589 )
2025-01-13 12:12:31 -05:00
nimlgen
74b83c4c41
am in ci ( #8532 )
...
* try am in ci
* no sudo
* temp
* run more am test
* run half on am
* insert amdgpu
* other machine as well
2025-01-13 19:55:17 +03:00
nimlgen
d224d0ed7f
nv: fix fault info ( #8587 )
...
* nv: fix fault info
* and emu for amd
* skip if not mock
2025-01-13 14:38:43 +03:00
qazal
586e730d32
use UOp.st for kernel reduce axes ( #8499 )
...
* use UOp.st for kernel reduce axes [pr]
* do not return dict
2025-01-13 06:24:11 -05:00
qazal
7562cc0399
better test for reduce swizzle + don't use double dtype [pr] ( #8586 )
...
* better test_permute_rewrite
* use float32
2025-01-13 05:02:21 -05:00
George Hotz
df59b072db
rename to top_down_rewrite [pr] ( #8583 )
2025-01-12 18:36:38 -08:00
chenyu
994944920b
simpler batch_load_train_bert [pr] ( #8582 )
...
don't think that buffer is really beneficial. 5% faster data_time and 1ms faster per step.
https://wandb.ai/chenyuxyz/MLPerf-BERT/runs/69c9lx8y/overview
2025-01-12 20:25:05 -05:00
George Hotz
05e5de6a91
ugh, remove that binary blob
2025-01-12 17:02:28 -08:00
George Hotz
4ac4c1415a
free intermediate buffers in the jit [pr] ( #8581 )
...
* free intermediate buffers in the jit [pr]
* intermediates_freed
* deallocate if not allocated
* self._first_run is simpler
2025-01-12 15:41:41 -08:00
George Hotz
d817dc10db
start on test rewrite map [pr] ( #8432 )
...
* start on test rewrite map [pr]
* chatgpt writes dumb tests
* comment out failing
* fix that test
* fix gc issue
* oh, frame 2
* remove uop mutability
* map is only the map
* simpler + more tests
* test tiny passes
* tests that need to pass
* parent test passes
* child test passes
* remove uop mutability [pr]
* test fixups
* most tests pass
* more tests pass
* lil test fixups
* them too
* fix test
* unneeded
* err, that
* fix test_hcq
* fix test failures
* fix that test
* tensor universe
* does this pass test
* Revert "does this pass test"
This reverts commit ed516b3169.
* Revert "tensor universe"
This reverts commit c21301852a.
* test_mutate_add passes
* this can pass
* Revert "Merge remote-tracking branch 'origin/no_uop_mutability' into test_rewrite_map"
This reverts commit 657822dcdc, reversing
changes made to 2a126c145b.
* Revert "test_mutate_add passes"
This reverts commit ab4fc4c78e.
* correct enough
* remove test_rewrite_map_schedule.py
* viz
* uops are immutable
---------
Co-authored-by: qazal <qazal.software@gmail.com>
2025-01-12 13:13:51 -05:00
qazal
2f71a00236
remove PYTHONPATH=. from mypy ci [pr] ( #8578 )
2025-01-12 09:52:03 -08:00
qazal
cde18fddce
fix DEBUG=2 output for copy runners [pr] ( #8579 )
...
* fix DEBUG=2 output for copy runners [pr]
* itemsize is constant
2025-01-12 12:03:01 -05:00
eliotgolding
867004fbeb
use unravel in views_to_indexed_uops [pr] ( #8560 )
...
* use unravel in shape
* make process replay work
* earlier View.minify()
* fix
* fix tests
* mypy
* get rid of early minify
* fix
* linter
* clean and add test
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-01-12 10:25:55 -05:00
nimlgen
38b5ac4d4a
mypy for mockgpu/cuda & dsp/run ( #8575 )
2025-01-12 18:25:39 +03:00
chenyu
def90b22f6
EVAL_BS=36 for bert [pr] ( #8576 )
...
3X faster eval compared to BS=6.
green https://wandb.ai/chenyuxyz/MLPerf-BERT/runs/ka5p5sm9/overview
red https://wandb.ai/chenyuxyz/MLPerf-BERT/runs/a7maxsxd/overview
2025-01-12 09:43:56 -05:00
qazal
ae241e96db
fix half4 on qcom and gpu ( #8573 )
...
* add test_setitem_half
* this fixes comma benchmark
2025-01-12 06:23:05 -05:00
qazal
cff1ee9038
add SINK folding from the tensor_map branch [pr] ( #8562 )
...
* delete is_constant from the scheduler
* add sink folding
* always give BUFFER uops Buffers [pr]
* spec for view, var (bind) and const
* add test_buffer_only_after_realize
* work
* 3 lines
* more work
2025-01-12 03:39:34 -05:00
qazal
87cbff3ac0
always give BUFFER uops Buffers [pr] ( #8572 )
...
* always give BUFFER uops Buffers [pr]
* add test_buffer_only_after_realize
2025-01-11 23:17:09 +02:00
qazal
98c9e23560
remove global PYTHONPATH setting in CI (test.yml) [pr] ( #8568 )
...
* remove global PYTHONPATH setting in CI [pr]
* only run mypy in tinygrad/
* still needed for benchmarks
2025-01-11 12:47:50 -05:00
geohotstan
815c505e1d
fixes from adapting tvm tests ( #8570 )
2025-01-11 11:38:36 -05:00
qazal
79738d768c
do not require PYTHONPATH=. for process replay [pr] ( #8567 )
2025-01-11 09:45:34 -05:00
qazal
a70d1bf439
move print_diff to process replay [pr] ( #8566 )
...
* move print_diff to process replay [pr]
* ruff rightfully complains
2025-01-11 09:28:45 -05:00
nimlgen
2f0856c1e2
qcom: use hwinterface ( #8565 )
...
* qcom: use hwinterface
* ops
* not needed anymore
2025-01-11 17:11:23 +03:00
qazal
60503c8621
use CAPTURE_PROCESS_REPLAY=1 in CI [pr] ( #8564 )
2025-01-11 06:03:48 -05:00
nimlgen
61665a63c9
am logs to debug2 ( #8563 )
2025-01-11 13:33:18 +03:00
George Hotz
c7acd40574
more aggressive onnx const creation [pr] ( #8561 )
2025-01-10 17:38:32 -08:00
ignaciosica
8891495996
minor arg spec check on wmma ( #8525 )
2025-01-10 15:42:56 -08:00
chenyu
d09897c2aa
allow double copy [pr] ( #8559 )
...
fixed ring allreduce pattern and recovered most of the bert step time regression (10% faster), will double check all benchmark
2025-01-10 18:21:01 -05:00
George Hotz
70fa65cd95
viz fixups + scheduler option [pr] ( #8557 )
2025-01-10 15:09:31 -08:00