Commit Graph

3776 Commits

Each entry: author, SHA1, message, date
wozeparrot
a18963d9e7 feat: use tinygrad useragent (#10488) 2025-05-23 15:44:40 -07:00
qazal
7a762f01ab s/shape_spec/ast_spec [pr] (#10485) 2025-05-23 15:43:54 +03:00
qazal
127a7c8aee assert AST views only exist in the edges (#10484)
* assert AST views only exist in the edges

* valid without device
2025-05-23 15:27:09 +03:00
qazal
e491168685 add metadata note + whitespace fixup [pr] (#10483)
* add metadata note + whitespace fixup [pr]

* TestSchedule.test_kernelize_diamond
2025-05-23 14:37:45 +03:00
Sieds Lykles
ce6ebfb8ee verify rewrites in test_uop_symbolic (#10430)
* verify rewrites in test_uop_symbolic

* use global context
2025-05-23 06:57:29 -04:00
George Hotz
1e4d63e06e uops can have multiple metadata (#10479)
* uops can have multiple metadata

* fixups
2025-05-22 21:35:02 -07:00
George Hotz
9fc01c1e03 support for uop tags (#10477)
* support for uop tags [pr]

* test uop tags
2025-05-22 19:53:48 -07:00
chenyu
8cc2dff4d8 only float Tensors have gradient [pr] (#10475) 2025-05-22 21:02:11 -04:00
George Hotz
147f7747f2 remove the map from create_schedule_with_vars [pr] (#10472) 2025-05-22 15:58:25 -07:00
George Hotz
0d39bb5de1 rename to get_kernelize_map (#10465) 2025-05-22 11:44:44 -07:00
chenyu
7bfb20757c fix tensor int floor div (#10327)
* fix tensor int floor div

* test_float_floordiv_scalar
2025-05-21 06:46:54 -04:00
Sieds Lykles
2b4375f36d Correct divmod folding behind flag (#10433)
* add flag

* add test

* remove import
2025-05-21 06:46:13 -04:00
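
Aside: both fixes above turn on floor-division semantics. Python floor division rounds toward negative infinity, C-style division truncates toward zero, and any divmod folding rule has to preserve the identity a == (a // b) * b + a % b. A minimal plain-Python illustration (not the tinygrad rewrite itself):

    # floor division vs truncation for negative operands
    print(-7 // 2)      # -4: floors toward -inf (what tensor int floordiv should do)
    print(int(-7 / 2))  # -3: truncates toward zero (C-style)
    # the identity any divmod folding must preserve
    for a in (-7, 7):
      for b in (2, -2):
        assert a == (a // b) * b + a % b
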
qazal
df4cbb69e9 move fuzz_schedule.py to extra [pr] (#10444) 2025-05-21 10:07:24 +03:00
chenyu
29624af872 skip commavq in external_model_benchmark (#10439)
precision issue with a different onnxruntime version
2025-05-21 01:45:33 -04:00
George Hotz
03e7a99ca8 add edge cases found by codex [pr] (#10423)
* add edge cases found by codex [pr]

* another test

* more edgecases

* docs

* instructions

* fine, add that one

* nan cases

* roll failures

* inv prob

* more failing tests

* err, that's failing

* more tests

* more failures

* uop verif

* failures

* webgpu
2025-05-20 14:53:18 -07:00
nimlgen
2895198c36 am: download regs (#10419)
* am: download regs

* x

* linter

* mypy

* after merge

* raise

* fixed name

* fix

* xx

* remove

* missing reg

* missing reg

* move to online

* ops
2025-05-20 18:59:56 +03:00
uuuvn
ec9955c956 Use REAL_DEV for test skips (#10420)
This should fix remote CPU test flakiness (the segfaults were in
`test_data_parallel_resnet_train_step`, which is skipped on CPU but wasn't
skipped on remote CPU)
2025-05-19 17:32:14 -07:00
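
The idea in a minimal sketch (the REMOTEDEV env var and the helper's exact shape are assumptions, not the actual test code):

    import os, unittest
    from tinygrad import Device

    # resolve the backend REMOTE forwards to; otherwise it's just the default device
    # (assumption: the real backend is advertised via an env var; the name is made up)
    REAL_DEV = os.getenv("REMOTEDEV", "CPU") if Device.DEFAULT == "REMOTE" else Device.DEFAULT

    class TestDataParallel(unittest.TestCase):
      @unittest.skipIf(REAL_DEV == "CPU", "segfaults on cpu, remote or not")
      def test_data_parallel_resnet_train_step(self): ...
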
Sieds Lykles
db09676250 Don't simplify gate in gate, fix FUSE_ARANGE=1 python test/test_ops.py TestOps.test_scatter_add (#10411)
* substitute out index

* Add test

* change comment
2025-05-19 13:16:21 -04:00
qazal
cc8dda1d75 move multi_map to grouper rewrite pass (#10409)
* move multi_map to grouper rewrite pass

* delete that
2025-05-19 10:44:06 +03:00
George Hotz
b06291077c no amdgpu kernel driver (#10408)
* no amdgpu kernel driver

* don't test hip

* lower req
2025-05-18 20:52:39 -07:00
George Hotz
411392dfb7 move files into uop dir (#10399)
* move files into uop dir [pr]

* tinygrad.uop is a thing

* fix uop docs, no pr

* fix viz
2025-05-18 11:38:28 -07:00
uuuvn
27c12be471 amd mockgpu graph support (#10385)
For testing remote graph stuff in CI (prompted by #10371)
2025-05-18 09:43:16 -07:00
qazal
04b23087d8 grouper tests from fuse_arange_default [pr] (#10394) 2025-05-18 18:42:43 +03:00
qazal
9e2089dcd4 don't raise Exception in process replay [pr] (#10392)
* don't raise Exception in process replay [pr]

* continue generating diffs unless [pr] is set, exit(1) otherwise

* change

* works
2025-05-18 11:23:23 +03:00
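
One plausible reading of that control flow, as a hypothetical sketch (kernel_differs and make_diff are stand-in helpers, and the [pr] flag is assumed to surface as an env var):

    import os, sys

    def kernel_differs(k): return k.get("changed", False)  # stand-in comparison
    def make_diff(k): return f"--- {k['name']} differs"    # stand-in diff renderer

    def replay(kernels):
      failed = False
      for k in kernels:
        if kernel_differs(k):
          print(make_diff(k))  # report and keep going instead of raising
          failed = True
          if os.getenv("PR"):  # fail fast only when the commit is tagged [pr]
            sys.exit(1)
      return failed
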
qazal
0294bfe507 simpler can_pad (#10364)
* simpler can_pad [pr]

* 3 kernels

* tests

* less kernels
2025-05-18 10:00:07 +03:00
George Hotz
6f77b938d7 Move getbits tests into test_helpers (#10382) 2025-05-17 17:04:00 -07:00
George Hotz
6ec88d94df add tests for multi ram usage [pr] (#10376) 2025-05-17 15:33:40 -07:00
वेदांत
2453d99050 rms matching pytorch implementation (#10319)
* rms matching pytorch implementation

* pre commit fix

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-05-17 08:23:11 -07:00
qazal
e054b53a75 kernel count tests for pad [pr] (#10369)
* kernel count tests for pads

* handcoded rand one kernel

* comment

* prerealize device rng counter

* test_rand_handcoded generates /0

* remove track_rewrites
2025-05-17 17:20:46 +03:00
George Hotz
e13f2a3092 multi is O(1) (#10183)
* multi is O(1)

* allreduce

* no new uops needed

* junk

* something

* simple

* that's really what i want

* closer

* inject _device_num

* pretty print

* cleanups

* this

* early dnum

* ops allreduce is good

* ish

* device is the tuple and this is fine

* simpler

* progress

* copy_multi

* work

* more tests

* more tests pass

* work

* no None axis

* tests

* no none multi

* type fixes

* pre commit passes

* lil

* remove this

* mlperf dataloader on mac

* that test was wrong

* unbind

* support DEBUG=2

* realize

* only unbind bound vars

* don't include fixedvars

* graph test

* one test

* fixedvars in hcq

* new ring reduce

* ring reduce

* simpler ring

* mselect

* mselect doesn't work

* Revert "mselect doesn't work"

This reverts commit c78b77bd7d.

* Revert "mselect"

This reverts commit bb2e430ac3.

* simpler

* fixups

* no optional

* fix jit

* move things around

* cleanup multi

* simpler multi

* simpler reshape
2025-05-16 23:14:23 -07:00
George Hotz
e1a40e8040 add hcq fixedvars support [pr] (#10356)
* add hcq fixedvars support [pr]

* different test

* fixedvars are only for comp_queues

* fix hcq varvals
2025-05-16 22:05:53 -07:00
George Hotz
876d2275a1 changes from new multi (#10353)
* changes from new multi

* revert hcq change
2025-05-16 13:07:29 -07:00
wozeparrot
66e00c04dd fix: skip kernel timing tests on ci cuda (#10348) 2025-05-16 11:48:06 -07:00
qazal
e9e5b54e43 grouper cleanups and merge with insert_kernels [pr] (#10349)
* grouper cleanups and merge with insert_kernels [pr]

* remove that
2025-05-16 14:39:56 +03:00
b1tg
caded2f413 llvm diagnostic error (#10267)
* llvm diagnostic info

* use decorator

* better error reporting

* fix mypy

* collect all diag msgs

* test diag error

---------

Co-authored-by: b1tg <b1tg@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-05-16 02:03:20 -04:00
George Hotz
a4a25720b2 add test_multitensor_jit_input [pr] (#10347) 2025-05-15 20:47:57 -07:00
wozeparrot
1ed04f993b move benchmark stat tracking to influxdb (#10185) 2025-05-15 16:14:56 -07:00
wozeparrot
f59ecf2116 fix: mockgpu cuda timing (#10343) 2025-05-15 14:14:14 -07:00
qazal
7cfe367c07 failing test for slow embedding kernel with FUSE_ARANGE=1 [pr] (#10330) 2025-05-15 14:58:11 +03:00
qazal
0a45cd0cbe grouper: merge views in fuse elementwise (#10325)
* grouper: merge views in fuse elementwise

* with gradient api
2025-05-15 13:17:09 +03:00
qazal
89d8d5b25e add dims check in FUSE_ARANGE (#10323) 2025-05-15 11:33:21 +03:00
qazal
8fad0f0124 grouper: check for unsafe PAD in FUSE (#10322) 2025-05-15 10:53:44 +03:00
chenyu
f008e5f233 test_dtype_alu should cast bf16 input (#10320)
when testing ALU ops for bfloat16, inputs should be cast to bfloat16 first; otherwise the numpy reference carries both the input-conversion error and the ALU error, which is less accurate
2025-05-15 01:11:39 -04:00
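
The fix generalizes: round the inputs to bfloat16 before computing the reference, so the only error measured is the op's. A sketch with numpy and ml_dtypes (the library choice here is an assumption, not necessarily what the test uses):

    import numpy as np
    import ml_dtypes  # one common way to get bfloat16 scalars in numpy

    bf16 = ml_dtypes.bfloat16
    a, b = np.float32(1.7641), np.float32(0.3097)
    ref_loose = a * b  # full-precision inputs: two error sources
    ref_tight = np.float32(bf16(a)) * np.float32(bf16(b))  # inputs rounded first, like the device sees
    print(ref_loose, ref_tight)  # typically differ by the input-rounding error
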
George Hotz
568d6d96e7 small changes from new multi [pr] (#10318) 2025-05-14 20:50:59 -07:00
chenyu
f6cf25fce4 cleanup test_conv2d_ceildiv_edge_case [pr] (#10317) 2025-05-14 23:35:28 -04:00
Kirill R.
50d7162acd Add conv2d ceildiv edge case (#10303) 2025-05-14 22:50:23 -04:00
wozeparrot
9bbc2bc2a7 hotfix: filter_too_much (#10308) 2025-05-14 15:31:51 -07:00
George Hotz
42e70193c9 multi: instead of real, just copy (#10289)
* multi: instead of real, just copy

* fix test

* remove real
2025-05-14 10:36:55 -07:00
qazal
043efc6ec4 do not require self for track_rewrites [pr] (#10302) 2025-05-14 18:23:32 +03:00
qazal
d342f7688d remove some skips in test_schedule + use assertRaisesRegex [pr] (#10296) 2025-05-14 14:54:07 +03:00