Commit Graph

4692 Commits

Author SHA1 Message Date
George Hotz
fa00ef66fd Update README.md 2024-06-13 00:29:19 +02:00
chenyu
eb0f5b5660 failed test case for getitem with leading Nones (#4936)
* failed test case for getitem with leading Nones

torch matches numpy here, so tinygrad is incorrect.
another repro:
```
t = np.arange(12).reshape((3, 4))
print(t[None, None, np.array([1, 2])])

t = torch.arange(12).reshape((3, 4))
print(t[None, None, torch.tensor([1, 2])].numpy())

t = Tensor.arange(12).reshape(3, 4)
print(t[None, None, Tensor([1, 2])].numpy())
```

* # noqa
2024-06-12 16:19:42 -04:00
Elias Wahl
d2e3c391e8 Residual in MLM loss + Change default steps (#4935)
* Residual in mlm loss

* Reduce default steps to 160K * 24

* oops

* comment
2024-06-12 16:09:18 -04:00
chenyu
a21ea165bc skip linearizer test_failure_22 on llvm (#4937)
getting flaky recently
2024-06-12 16:03:38 -04:00
chenyu
27903c5ed5 minor minor Tensor.__getitem__ cleanup (#4934)
more consistent variable names and update comments before next minor cleanup that touches logic
[run_process_replay]
2024-06-12 15:08:18 -04:00
chenyu
5e6336edda minor Tensor.gather cleanup (#4933)
`permarg[i]` is just `i`, and break the big return into two lines.
[run_process_replay]
2024-06-12 13:57:28 -04:00
Timmy
720c700a8a Multireduce-Kernels: Linearizer Changes and Tests (#4259)
* basic tests

* cleanup

* pylint

* ruff

* use define acc as a proxy for rendered reductions

* use define acc as a proxy for rendered reductions

* recursive reduceop rendering via ast_parse

* linters + cleanup

* fixing late buf loading

* plus linters

* removing extra line

* linters

* does this break ci?

* added tests and if add end change

* typo in add_ends

* linters

* removing comments

* allow endifs to be inserted before the end of the graph

* find add ENDIF before next BARRIER

* removing tests with manual ENDIF + linters

* specifically the next barrier after the store of the local result

* Revert "specifically the next barrier after the store of the local result"

This reverts commit b288a5c3ce.

* keeping up to date

* linters + merge changes

* cleaning up old bad decisions

* linters and opts

* merged linearizer tests

* fixing merge issues

* removing the big ugly uop test (functionality tested end-to-end by test_linearizer additions)

* small diff fixes

* updating linearizer to work without uops.add( ... cachable)

* linters

* comment in multireduce tests

* skipping tests without locals

* full tests

* linters

* load_cache[key] fix for multiple accs

* linters

* assert only one reduceop

* fix loop_scope test to actually cause an issue

* self.load_cache[key] key for DEFINE_ACC changed to use a string to make sure each acc is unique

* updated tests

* fixing merge

* removing debug prints

* complete merge fix

* linters

* diff cleanup

* adding tests in

* give each reduce its own local buffer

* gpu=1 changes

* store and load locals with upcasting

* modifying test?

* make multireduce_netsted_local_upcast test match single reduce shapes

* removing todo

* cleaning up the diff

* unroll test

* unroll and upcast tests

* fix gpu

* seq and self.load_cache[key] cleaning

* linters

* padto works

* merge fixes

* fixes

* add skips for amd

* linters + seq

* cleaning & more tests

* softmax tests

* linters

* [run_process_replay]

* add new tests back

This reverts commit 19dec22e01.

* more hardcoded -1s

* fix ptx

* Fix name for loop in ptx

* cleaning up the diff

* cleaning up the uops diff

* nv ci is too slow

---------

Co-authored-by: qazal <qazal.software@gmail.com>
Co-authored-by: Szymon Ożóg <58388001+SzymonOzog@users.noreply.github.com>
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2024-06-12 13:29:43 -04:00
Nicklas Boman
6e86472cd6 fix typing for test to run in py38 (#4930) 2024-06-12 13:22:30 -04:00
chenyu
1326f29e24 fix Tensor.gather shape checking criteria (#4932)
it's fine if `self.shape[d] >= index.shape[d]` for all `d != dim`, not for all `d`
2024-06-12 13:10:14 -04:00
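The corrected rule can be sketched as a standalone shape check (a hypothetical helper for illustration, not tinygrad's actual code):

```python
def gather_shape_ok(self_shape, index_shape, dim):
    # valid iff ranks match and self.shape[d] >= index.shape[d] for every d != dim;
    # along dim itself, index may be any size
    if len(self_shape) != len(index_shape):
        return False
    return all(s >= i for d, (s, i) in enumerate(zip(self_shape, index_shape)) if d != dim)
```

For example, `gather_shape_ok((3, 4), (3, 6), dim=1)` holds even though 6 > 4, since the size constraint only applies off the gather dimension.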
qazal
898430c004 more typing in linearizer uoping utils (#4929)
* type check everything

* idxs will be uops
2024-06-12 11:00:02 -04:00
George Hotz
828c98d5c4 add slides from code europe to docs 2024-06-12 14:35:08 +02:00
George Hotz
9a3c1e4a17 fix mul div failure (#4928) 2024-06-12 13:58:46 +02:00
George Hotz
11a03cbbf5 don't use uops.add while constructing (#4913)
* don't use uops.add while constructing

* rebase

* bugfixes

* have to use BFS

* prove it's late

* simpler uop symbolic test (why we did this)

* use dict, not set
2024-06-12 13:31:34 +02:00
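The "have to use BFS" and "use dict, not set" bullets can be illustrated with a toy graph traversal; the `Node` class and `bfs_dedup` helper below are hypothetical stand-ins, not tinygrad's `UOp` machinery:

```python
from collections import deque

class Node:
    def __init__(self, name, src=()):
        self.name, self.src = name, tuple(src)

def bfs_dedup(sink):
    # breadth-first walk from the sink; a dict (insertion-ordered) gives a
    # stable, deterministic visit order where a set would not
    seen = {}
    q = deque([sink])
    while q:
        n = q.popleft()
        if id(n) in seen:
            continue
        seen[id(n)] = n
        q.extend(n.src)
    return [n.name for n in seen.values()]
```

A diamond graph (two nodes sharing one source) is visited once per node, in breadth-first order.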
qazal
d894acbb50 remove hardcoded -1s referencing late reduce (#4926) 2024-06-12 04:50:15 -04:00
qazal
b833a112ba allocate shared memory per block (#4924)
* define temp

* use idx

* cleaner [run_process_replay]
2024-06-12 03:43:10 -04:00
George Hotz
ca4ccddcd6 docsfix: nn.Tensor -> Tensor 2024-06-12 09:18:32 +02:00
wozeparrot
3d13c23bfa llama3 --download_model (#4922) 2024-06-11 22:59:59 -07:00
chenyu
f902af4f0b increase metal ci test timeout to 20 minutes (#4920)
make it less annoying for now
2024-06-11 18:45:51 -04:00
chenyu
fdbb4305cb skip unsupported dtype in fuzz_linearizer (#4917)
resolves issues in #4887. The dataset was generated on Ubuntu, but Metal does not support double
2024-06-11 18:18:21 -04:00
qazal
7f3d9e6d94 revert hsa autogen removal (#4914)
* Revert "only install comgr in AMD CI (#4909)"

This reverts commit 7f03420d05.

* rocm-llvm only removal
2024-06-11 12:55:45 -04:00
nimlgen
58cf6eaba9 add missing dir level for amd mockgpu (#4911) 2024-06-11 18:35:04 +02:00
chenyu
b886d250fb improve test_dropout_on_shard (#4912)
tested some basic properties; also minor formatting for a few Tensor.training setups
2024-06-11 11:36:02 -04:00
qazal
7f03420d05 only install comgr in AMD CI (#4909)
* test

* delete hsa autogen
2024-06-11 06:19:33 -04:00
George Hotz
35e53c0809 add sharded arange test (#4908) 2024-06-11 10:58:33 +02:00
chenyu
798ea61377 widen test_ops [low, high] and more strict atol (#4906)
default [low, high] changed from [-1.5, 1.5] to [-2, 2] (except tan).
dropped several explicit atol values that were unnecessarily larger than the default 1e-6.
tested on mac, tinybox red / green
2024-06-10 20:47:09 -04:00
chenyu
97b05f567e revert the .detach() in layernorm (#4904)
* revert the .detach() in layernorm

it's only correct in LayerNorm, where the input is the data, and not correct in GroupNorm and InstanceNorm, which reuse layernorm.
Added backward tests for weights, bias and input for these norms.

* bigger atol for llvm

* relax backward more
2024-06-10 18:02:05 -04:00
qazal
8b5bcf309a process replay in all of CI (#4884) 2024-06-10 14:49:29 -04:00
George Hotz
9715a7193a replace set with dedup (#4901) 2024-06-10 18:20:38 +02:00
chenyu
c8cd637236 test case for Tensor.var reducing over size = 1 axis (#4902)
backward failed when correction >= reducing n
2024-06-10 12:11:39 -04:00
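The failure mode is visible in the corrected-variance formula itself; a minimal pure-Python sketch (for illustration only, not tinygrad's implementation):

```python
def var(xs, correction=1):
    # Bessel-style correction divides by n - correction, which becomes zero
    # or negative once correction >= n (e.g. reducing over a size-1 axis)
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - correction)
```

With a length-1 input and the default `correction=1`, the denominator is 0, matching the case where backward broke.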
chenyu
c0fb7eee09 cleanup lazy const fold for binary (#4900)
removed pylint: disable=possibly-used-before-assignment
[run_process_replay]
2024-06-10 10:46:58 -04:00
nimlgen
5bf1f7d4d3 nv better error messages for ioctls (#4899) 2024-06-10 16:01:50 +03:00
George Hotz
b9f26eedc9 hotfix: import datasets in nn init 2024-06-10 11:33:50 +02:00
chenyu
b56ae5606c cosmetic changes to uop _match (#4897)
minor cleanup before fixing two level match
[run_process_replay]
2024-06-09 18:29:42 -04:00
SnakeOnex
b1db2d0094 tqdm replacement (#4846)
* tqdm replacement almost

* formatting

* formatting

* imports

* line len

* fix

* removed set description :(

* removed set description :(

* fix

* fix

* green check?

* rewrote as class, fixed several bugs

* types spacing

* removed imports

* fix

* iterable

* typing

* mypy disagreement

* imports

* more e2e tests vs tqdm

* removed seed setting

* robustness against time.sleep() flakiness

* flaky fix

* automatic bar closing when count==total

* cleanup

* clang error with tqdm

* tqdm back

* use os lib, print to stderr (fixes the clang bug, where the bar was leaking into the generated c program)

* back to shutil

* unit_scale + unit_scale test

* custom unit to tests

* pretty

* clean

* removed flaky test

* less test iters

* empty line

* remove disable
2024-06-09 23:46:03 +02:00
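The approach the bullets describe (render to stderr so the bar never leaks into program output, auto-close when count == total) can be sketched minimally; the `Bar` class below is a hypothetical illustration, not the PR's actual implementation:

```python
import shutil
import sys

class Bar:
    def __init__(self, iterable, total=None):
        self.iterable = iterable
        self.total = total if total is not None else len(iterable)
    def __iter__(self):
        for i, x in enumerate(self.iterable, 1):
            # stderr keeps the bar out of stdout, so piped output stays clean
            width = max(shutil.get_terminal_size().columns - 12, 1)
            filled = int(width * i / self.total)
            sys.stderr.write(f"\r{i}/{self.total} |{'#' * filled}{' ' * (width - filled)}|")
            if i == self.total:
                sys.stderr.write("\n")  # automatic bar closing when count == total
            yield x
```

Wrapping an iterable (`for x in Bar(items): ...`) draws the bar as a side effect while yielding the items unchanged.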
qazal
05d7ab774f set tensor core opt options in Renderer (#4896) 2024-06-09 14:12:41 -04:00
George Hotz
f42183ba28 hotfix: relax cifar to 93.2 2024-06-09 13:09:21 +02:00
qazal
1dde829e34 UOps.IF* to graph spec (#4894) 2024-06-09 07:00:12 -04:00
George Hotz
b9afb0d577 test uop as symbolic (#4870)
* start work

* more tests passing

* more tests passing

* more

* 34 failures

* expect the failures

* remove broken rule

* render is fine in just the test

* simplify and put in test
2024-06-09 12:15:11 +02:00
nimlgen
654a8b9ef7 retire hsa (#4885)
* retire hsa

* EMULATE_AMD
2024-06-09 11:33:03 +03:00
chenyu
e33efd6a3d test cases for multitensor adds const (#4892)
Tested that const remains const in the ast. Also removed the TODO in _to_const_val.
2024-06-08 22:57:48 -04:00
chenyu
a3ec4234df expand broadcast functions a bit (#4891)
taking some good stuff from #4886. I think `from_, to` is more readable than `sh, s` too
[run_process_replay]
2024-06-08 20:16:54 -04:00
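The `from_, to` naming makes the direction of broadcasting explicit; a generic shape-broadcast helper, sketched here with hypothetical names rather than tinygrad's internals:

```python
def broadcast_shape(from_, to):
    # right-align the shorter shape with leading 1s, then every pair of
    # dims must be equal or contain a 1; the result takes the larger dim
    ndim = max(len(from_), len(to))
    a = (1,) * (ndim - len(from_)) + tuple(from_)
    b = (1,) * (ndim - len(to)) + tuple(to)
    assert all(x == y or x == 1 or y == 1 for x, y in zip(a, b)), \
        f"cannot broadcast {from_} to {to}"
    return tuple(max(x, y) for x, y in zip(a, b))
```

For example, `broadcast_shape((3, 1), (1, 4))` gives `(3, 4)`, the standard numpy-style result.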
wozeparrot
2849d0a2a1 fix copying to clipboard on a non secure context (#4890) 2024-06-08 16:51:47 -07:00
nimlgen
6327b50e51 amd in benchmarks (#4861)
* amd in benchmarks

* remove all hsa
2024-06-08 23:24:46 +03:00
nimlgen
d24e57c615 amd support kernel with bf16 (#4863)
* amd support kernels with dispatch_ptr

* fixes

* line savings

* one line

* try

* Revert "try"

This reverts commit 5f340dfdd4.

* not used will be back when hsa is gone

* gone will be back

* add this as well
2024-06-08 22:52:32 +03:00
wozeparrot
6c24eda522 feat: tinychat (#4869) 2024-06-08 12:05:45 -07:00
Brennan Kinney
9445946cae docs: Update referenced yaml in yolov8.py (#4871)
YAML files have since been relocated.
2024-06-08 15:05:00 -04:00
Roelof van Dijk
794fecf8e3 perf: faster element deletion during matching (#4882)
* perf: faster deletion

* fix: leave the tuple init
2024-06-08 15:16:35 +02:00
Roelof van Dijk
0eebb8e998 fix: _free should not return (#4880) 2024-06-08 14:45:06 +02:00
Roelof van Dijk
1785a70e77 fix: else-return on runtime (#4881)
* fix: add init file

* fix: no else-return

* fix: remove file again
2024-06-08 14:44:24 +02:00
qazal
1e3325f369 raise assert [run_process_replay] (#4879) 2024-06-08 08:31:44 -04:00