Commit Graph

10633 Commits

Author SHA1 Message Date
chenyu
c1dfe5c00d compact get_late_rewrite_patterns [pr] (#9116) 2025-02-15 20:33:09 -05:00
qazal
2e97022e5e remove extra block in viz [pr] (#9115) 2025-02-16 02:38:09 +02:00
chenyu
fd95543ff1 use scatter_reduce in scatter [pr] (#9114) 2025-02-15 18:21:01 -05:00
chenyu
c954419bc8 minor tweak to transcendental pow (#9112)
also added more pow with const test cases
2025-02-15 18:03:25 -05:00
chenyu
8dfa0024f0 raise in scatter if self and src have different dtype [pr] (#9109)
raise RuntimeError that matches torch instead of an implicit cast
2025-02-15 11:21:34 -05:00
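A minimal sketch of the behavior this commit describes, assuming the public Tensor.scatter signature; the exact error wording is an assumption:

```python
from tinygrad import Tensor, dtypes

t = Tensor.zeros(4, dtype=dtypes.float32)
idx = Tensor([0, 1])
src = Tensor([1, 2], dtype=dtypes.int32)  # dtype differs from t

try:
    t.scatter(0, idx, src)  # previously src was cast implicitly; now this raises
except RuntimeError as e:
    print(e)  # torch-style dtype-mismatch message (exact wording is an assumption)
```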
chenyu
d129ccda4c add RAWAST back to DEBUG=3 [pr] (#9107) 2025-02-15 09:12:51 -05:00
qazal
2e19976d03 assert views in tensor uops [pr] (#9106) 2025-02-15 13:27:55 +02:00
George Hotz
81f5a7af7d improve DEBUG=3 [pr] (#9105) 2025-02-15 18:44:56 +08:00
qazal
41d143d27c new order to prepare for becomes_map = tensor_map [pr] (#9104) 2025-02-15 10:37:36 +01:00
George Hotz
4672d9af73 actual tests for the dsp backend [pr] (#9102)
* actual tests for the dsp backend [pr]

* fix name
2025-02-15 15:17:56 +08:00
George Hotz
7e09057afa fixup clang devectorize (#9099)
* fixup clang devectorize

* __builtin_convertvector is some casts

* dsp fixups
2025-02-15 09:29:47 +08:00
Marcello Fuschi
8824f7e9df Make logcumsumexp numerically stable (#9050)
* Make logcumsumexp numerically stable

* Refactor

* Refactor for special case ndim=0

* Refactor

* Use the correct device for mask

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-02-14 19:25:17 -05:00
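The standard way to make logcumsumexp numerically stable is to factor a running maximum out of the exponentials; a reference sketch of that identity (tinygrad's vectorized formulation may differ):

```python
import numpy as np

def logcumsumexp_ref(x: np.ndarray) -> np.ndarray:
    # logcumsumexp(x)_i = m_i + log(sum_{j<=i} exp(x_j - m_i)), with m_i the
    # running max. Every exp() argument is <= 0, so nothing overflows, and at
    # least one term is exactly 1, so the log never sees 0.
    m = np.maximum.accumulate(x)
    out = np.empty_like(x, dtype=np.float64)
    s, prev_m = 0.0, -np.inf
    for i, (xi, mi) in enumerate(zip(x, m)):
        s = s * np.exp(prev_m - mi) + np.exp(xi - mi)  # rescale when the max grows
        out[i] = mi + np.log(s)
        prev_m = mi
    return out

print(logcumsumexp_ref(np.array([1000.0, 1000.0])))  # [1000., 1000.69314718]
```

The naive log(cumsum(exp(x))) overflows for inputs like these; the rescaled form stays finite.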
chenyu
81597ddd96 increase lr for bert (#9098)
had one run that converged better https://wandb.ai/chenyuxyz/MLPerf-BERT/runs/u66tv2hh/overview
2025-02-14 19:10:35 -05:00
b1tg
3ad39b247b refactor LLVMRenderer (#9090)
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-02-15 08:00:31 +08:00
b1tg
1f1362fd27 add truncate_bf16 (#9078)
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-02-15 07:59:09 +08:00
Ahmed Harmouche
2dc8f1867c Synchronize webgpu (#9093) 2025-02-15 00:52:10 +03:00
chenyu
b58e7b1898 zero out the weight in bert init run (#9076)
`DEFAULT_FLOAT=HALF BENCHMARK=10 BS=66 EVAL_BS=6 GPUS=6 MODEL=bert python3 examples/mlperf/model_train.py` no longer OOMs. I think the buffer of randomly initialized weights caused the OOM.
2025-02-14 08:40:41 -05:00
qazal
82ad0d2e65 keep CONST/BUFFER uops in tensor_map [pr] (#9083) 2025-02-14 14:50:08 +02:00
qazal
65297066c2 move buffer refcount increment to the toposort [pr] (#9081) 2025-02-14 12:54:22 +01:00
chenyu
73af42aeab fix pow backward when base is 0 (#9075) 2025-02-13 21:06:01 -05:00
qazal
2d04a75a40 start tracking bottom_up_rewrite in viz [pr] (#9071)
* start tracking bottom_up_rewrite in viz [pr]

* use the tracking matcher in test_viz
2025-02-14 00:28:10 +01:00
chenyu
5ef48bbe0a swap order in rsqrt (#9069)
fixed backward for 0
2025-02-13 16:51:21 -05:00
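This fix and the pow fix above share a root cause: IEEE arithmetic defines 0 * inf as NaN, so a gradient formula that multiplies a zero factor by an infinite local derivative goes NaN instead of 0. A minimal illustration:

```python
import math

# d/dx rsqrt(x) = -0.5 * x**-1.5, which is -inf at x = 0; if the chain rule
# multiplies that by a factor that is exactly 0, the result is NaN:
print(0.0 * math.inf)  # nan
# Reordering the computation so the zero is applied first keeps it finite.
```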
Ahmed Harmouche
e83905696e Show install instructions when dawn library is missing (#9059)
* Show install instructions when dawn library is missing

* Handle missing dawn in ops_webgpu

* Simplify

* Solve f-string backlash error
2025-02-14 00:30:20 +03:00
chenyu
9e91898941 bert eval at the end of training (#9070)
always eval at the last epoch
2025-02-13 16:29:44 -05:00
chenyu
e02e3b94c3 remove SQRT hack in llvm (#9067)
replaced with xpow 0.5 in transcendental. fixed sqrt(0) backward
2025-02-13 15:42:34 -05:00
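The transcendental route computes pow through exp2/log2, so sqrt becomes xpow(x, 0.5) with one backward path to fix instead of a backend-specific hack. A toy version of the identity (real xpow also handles zero and negative bases, which this ignores):

```python
import math

def xpow_toy(x: float, y: float) -> float:
    # x**y == 2**(y * log2(x)) for x > 0; sqrt(x) is xpow(x, 0.5)
    return 2.0 ** (y * math.log2(x))

print(xpow_toy(9.0, 0.5))  # 3.0
```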
chenyu
947c97e6ff add test_sqrt to test_speed_v_torch (#9066)
working on getting rid of llvm sqrt hack
2025-02-13 15:25:54 -05:00
chenyu
49abc09f77 remove the reshapes in test_arange_2_reduce [pr] (#9063) 2025-02-13 12:33:25 -05:00
chenyu
2573d0621a Tensor.scatter_reduce touchup [pr] (#9060) 2025-02-13 10:01:14 -05:00
Josh Moore
1f9d2442b9 Add Tensor.scatter_reduce (#8947)
* pytorch scatter -> scatter_reduce

* WIP scatter_reduce implementation

* _pre_scatter return type hint

* split out src, mask to satisfy linter

* Add src cast back in

* dict of lambdas instead of ifs

* sum and prod reduction ops with include_self

* add reduce arg error message

* add amax and amin reduction ops

* Fix include_self for higher dims

* Simplify

* Simplify amax and amin too

* Pull include_self logic out into _inv_mask function

* reduce arg cannot be None for scatter_reduce

* Fix self-mask issue

* Add mean reduce op

* Add tests

* any() not needed here

* remove comment

* End support for Tensor src with reduce arg in tinygrad scatter

* Process index, dim inside actual functions

* Add scatter_reduce to onnx

* Add excluded onnx ScatterElements reduction tests back in

* Save 2 lines on the mask helpers

* Update docs

* Add include_self=False tests

* cleanup

* Remove unneeded helper function

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-02-13 09:08:54 -05:00
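A usage sketch of the new method, which follows torch.Tensor.scatter_reduce semantics per the commit (reduce is required and include_self defaults to True; the printed values follow from those semantics):

```python
from tinygrad import Tensor

t = Tensor([1.0, 2.0, 3.0, 4.0])
index = Tensor([0, 0, 1, 1])
src = Tensor([10.0, 20.0, 30.0, 40.0])

# src values are reduced into t at the positions given by index;
# reduce is one of "sum", "prod", "mean", "amax", "amin"
out = t.scatter_reduce(0, index, src, reduce="sum")
print(out.tolist())  # [31.0, 72.0, 3.0, 4.0] (include_self=True adds t's own values)
```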
qazal
2b9ce1235a simple failing case for reorder expand + keep views in tensor_map [pr] (#9057) 2025-02-13 11:22:55 +01:00
George Hotz
765a936b81 getenv(CC) for clang (#9054) 2025-02-13 15:30:01 +08:00
George Hotz
33a1151f2f Revert "match torch rmsnorm implementation (#6799)" (#9052)
This reverts commit a66b8250e0.
2025-02-13 14:42:45 +08:00
Ryan Dorrington
a66b8250e0 match torch rmsnorm implementation (#6799)
* update rmsnorm to match torch implementation

* run all tests

* formatting

* formatting

* oneline

* default to 1e-6

* restore old test

* formatting

* don't save elementwise_affine

* your message

* ignore webgpu

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-13 13:02:51 +08:00
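For reference, the computation at issue: RMSNorm scales by the reciprocal root-mean-square over the last axis, with no mean subtraction (unlike LayerNorm). A minimal sketch, not the reverted implementation itself (eps default 1e-6 per the commit notes):

```python
from tinygrad import Tensor

def rms_norm(x: Tensor, weight: Tensor, eps: float = 1e-6) -> Tensor:
    # normalize by RMS over the last axis, then apply the learned scale
    return x * (x.square().mean(axis=-1, keepdim=True) + eps).rsqrt() * weight
```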
gg
19ae829bd1 test float uop in sym_infer (#7456)
* float uop in sym_infer

* break line :(

* rerun mypy

* update GlobalCounters types

* revert type change and cast assignments to mem and ops

* cast inferred value to UOp in reshape

* cast hcq, update view reshape to handle inferred float

* rm extra space

* update error

* no type updates
2025-02-13 12:55:28 +08:00
Sieds Lykles
095504d094 mulacc_unrolled should happen even with no DEVECTORIZE (#9029)
* mulacc_unrolled should happen even with no DEVECTORIZE

* Update rewriter.py

* Update rewriter.py

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-13 12:54:15 +08:00
George Hotz
74742c018f hotfix: setup_mock_nv_osx 2025-02-13 12:26:15 +08:00
JaSpa99
d2ff55e9c6 OSX GPUOcelot (#8209)
* add patches

* add osx test in ci

* macos specific uvm, gpfifo mask

* only do that for now

* Revert "add patches"

This reverts commit 80d3112a57.

* use fork for now

* workflow only one worker

* merge osxtests with tests

* Revert "merge osxtests with tests"

This reverts commit 3461c8f46c.

* macos pagesize 16384

---------

Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-13 12:24:29 +08:00
chenyu
f4f56d7c15 move time_linearizer to extra.optimization.helpers [pr] (#9048)
no longer used in tinygrad
2025-02-12 15:49:58 -05:00
chenyu
c15486cf39 remove contiguous in test_subbuffer_used [pr] (#9046)
test works without contiguous
2025-02-12 14:41:16 -05:00
rmtew
b3eab03055 Three things to get Windows CI working correctly: (#9047)
- Ensure that the set backend environment variable is persisted to the next step via $GITHUB_ENV
- It doesn't actually persist for Windows unless shell is explicitly set to bash.
- Add the assertion to ensure the selected backend is actually used.
2025-02-12 14:41:00 -05:00
chenyu
f53b819648 UOps. -> Ops. [pr] (#9044)
updated the comments and doc except extra
2025-02-12 12:53:23 -05:00
qazal
6811688d29 disallow VIEW(BUFFER) in tensor [pr] (#9041) 2025-02-12 17:27:35 +01:00
chenyu
7b5ac2c15e free_intermediates in bert (#9040)
also re-enable dropout and update EVAL_BS
2025-02-12 10:00:39 -05:00
Ahmed Harmouche
916d5e7f08 WebGPU f16 support (f16 bounty part 2) (#8653)
* WebGPU f16 support

* Don't enable f16 yet

* dtype tests passing after bitcast fix

* Maybe all WebGPU green?

* Require shader-f16 in examples

* Minor wgsl touchup

* 1 line shorter

* Simpler

* Add transcendental support

* log2 nan location mismatch on Vulkan

* Nan skips
2025-02-12 19:46:53 +08:00
Ignacio Sica
aaed315fee add AMX support to LLVM (#8957)
* init amx support for llvm

* revert elf changes

* fix attributes for AMX asm calls

* add comments

* add llvm amx job to benchmarks

* cleanup

* cleanup

* hotfix: improve comments

* comment for aux buffers

* hotfix:

* move amx_tc to ClangRenderer

* merge master

* refactor

* add docs

* add corsix docs reference

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-12 16:01:18 +08:00
Josh Moore
0c97c10814 TestOps: silence pytorch std()/var() degrees of freedom warnings (#9034) 2025-02-12 14:49:18 +08:00
Ignacio Sica
d581afd873 skipdata capstone (#9026) 2025-02-12 08:11:14 +08:00
chenyu
2845f8797a failed test cases for rsqrt at 0 and similar ones (#9035)
* failed test cases for rsqrt at 0 and similar ones

related to 0*inf

* this failed
2025-02-11 17:50:16 -05:00
nimlgen
101652a55c hcq: thread fence (#8991)
* amd: thread fence

* nv
2025-02-11 18:09:37 +03:00
George Hotz
45aae8a6bc hotfix: add External Benchmark Schedule to CI 2025-02-11 22:06:17 +08:00