Commit Graph

10417 Commits

Author SHA1 Message Date
chenyu
97b05f567e revert the .detach() in layernorm (#4904)
* revert the .detach() in layernorm

it's only correct in LayerNorm where input is the data, and not correct in GroupNorm and InstanceNorm that reused layernorm.
Added backward tests for weights, bias and input for these norms.

* bigger atol for llvm

* relax backward more
2024-06-10 18:02:05 -04:00
qazal
8b5bcf309a process replay in all of CI (#4884) 2024-06-10 14:49:29 -04:00
George Hotz
9715a7193a replace set with dedup (#4901) 2024-06-10 18:20:38 +02:00
chenyu
c8cd637236 test case for Tensor.var reducing over size = 1 axis (#4902)
backward failed when correction >= reducing n
2024-06-10 12:11:39 -04:00
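
The commit body above mentions the case where `correction` is at least the number of reduced elements. As a minimal plain-Python sketch of why that case is degenerate (this is not tinygrad code; the `variance` function is hypothetical and follows the common definition that divides by `max(0, n - correction)`):

```python
import math

def variance(xs, correction=1):
    """Hypothetical reference variance; not the tinygrad implementation."""
    n = len(xs)
    mean = sum(xs) / n
    sq_dev = sum((x - mean) ** 2 for x in xs)
    # The divisor shrinks to zero once correction >= n, e.g. a size-1 axis
    # with the default correction of 1.
    denom = max(0, n - correction)
    return sq_dev / denom if denom > 0 else math.nan

ok = variance([1.0, 2.0, 3.0])   # three elements, divisor is 2
degenerate = variance([5.0])     # one element, divisor is 0
```

With a single element the divisor is zero, so the forward value is undefined, and a gradient computed through that division is a natural place for a backward pass to go wrong.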
chenyu
c0fb7eee09 cleanup lazy const fold for binary (#4900)
removed pylint: disable=possibly-used-before-assignment
[run_process_replay]
2024-06-10 10:46:58 -04:00
nimlgen
5bf1f7d4d3 nv better error messages for ioctls (#4899) 2024-06-10 16:01:50 +03:00
George Hotz
b9f26eedc9 hotfix: import datasets in nn init 2024-06-10 11:33:50 +02:00
chenyu
b56ae5606c cosmetic changes to uop _match (#4897)
minor cleanup before fixing two level match
[run_process_replay]
2024-06-09 18:29:42 -04:00
SnakeOnex
b1db2d0094 tqdm replacement (#4846)
* tqdm replacement almost

* formatting

* formatting

* imports

* line len

* fix

* removed set description :(

* removed set description :(

* fix

* fix

* green check?

* rewrote as class, fixed several bugs

* types spacing

* removed imports

* fix

* iterable

* typing

* mypy disagreement

* imports

* more e2e tests vs tqdm

* removed seed setting

* robustness against time.sleep() flakiness

* flaky fix

* automatic bar closing when count==total

* cleanup

* clang error with tqdm

* tqdm back

* use os lib, print to stderr (fixes the clang bug, where the bar was leaking into the generated c program)

* back to shutil

* unit_scale + unit_scale test

* custom unit to tests

* pretty

* clean

* removed flaky test

* less test iters

* empty line

* remove disable
2024-06-09 23:46:03 +02:00
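
Several bullets in the commit above describe behaviour that a short sketch can illustrate: writing the bar to stderr so it cannot leak into stdout, sizing it with `shutil`, and closing automatically when the count reaches the total. The `MiniBar` class below is a hypothetical illustration under those assumptions, not the code from the commit:

```python
import io
import shutil
import sys

class MiniBar:
    """Hypothetical minimal progress bar; not the implementation from the commit."""

    def __init__(self, total, desc="", file=None):
        self.total, self.desc = total, desc
        # Default to stderr so anything captured from stdout stays clean.
        self.file = sys.stderr if file is None else file
        self.count, self.closed = 0, False

    def update(self, n=1):
        if self.closed:
            return
        self.count += n
        self._render()
        if self.count >= self.total:
            self.close()  # close automatically once count reaches total

    def _render(self):
        cols = shutil.get_terminal_size().columns  # uses a fallback size without a terminal
        prefix = f"{self.desc}: " if self.desc else ""
        suffix = f" {self.count}/{self.total}"
        width = max(1, cols - len(prefix) - len(suffix) - 2)
        filled = min(width, width * self.count // max(1, self.total))
        self.file.write(f"\r{prefix}[{'#' * filled}{'-' * (width - filled)}]{suffix}")
        self.file.flush()

    def close(self):
        if not self.closed:
            self.file.write("\n")
            self.closed = True

# Demo: pass a buffer instead of stderr so the output can be inspected.
buf = io.StringIO()
bar = MiniBar(total=4, desc="demo", file=buf)
for _ in range(4):
    bar.update()
```

In normal use the `file` argument would be omitted, and the bar would go to stderr.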
qazal
05d7ab774f set tensor core opt options in Renderer (#4896) 2024-06-09 14:12:41 -04:00
George Hotz
f42183ba28 hotfix: relax cifar to 93.2 2024-06-09 13:09:21 +02:00
qazal
1dde829e34 UOps.IF* to graph spec (#4894) 2024-06-09 07:00:12 -04:00
George Hotz
b9afb0d577 test uop as symbolic (#4870)
* start work

* more tests passing

* more tests passing

* more

* 34 failures

* expect the failures

* remove broken rule

* render is fine in just the test

* simplify and put in test
2024-06-09 12:15:11 +02:00
nimlgen
654a8b9ef7 retire hsa (#4885)
* retire hsa

* EMULATE_AMD
2024-06-09 11:33:03 +03:00
chenyu
e33efd6a3d test cases for multitensor adds const (#4892)
Tested const remained const in ast. Removed the TODO in _to_const_val too
2024-06-08 22:57:48 -04:00
chenyu
a3ec4234df expand broadcast functions a bit (#4891)
taking some good stuff from #4886. I think `from_, to` is more readable than `sh, s` too
[run_process_replay]
2024-06-08 20:16:54 -04:00
wozeparrot
2849d0a2a1 fix copying to clipboard on a non secure context (#4890) 2024-06-08 16:51:47 -07:00
nimlgen
6327b50e51 amd in benchmarks (#4861)
* amd in benchmarks

* remove all hsa
2024-06-08 23:24:46 +03:00
nimlgen
d24e57c615 amd support kernel with bf16 (#4863)
* amd support kernels with dispatch_ptr

* fixes

* line savings

* one line

* try

* Revert "try"

This reverts commit 5f340dfdd4.

* not used will be back when hsa is gone

* gone will be back

* add this as well
2024-06-08 22:52:32 +03:00
wozeparrot
6c24eda522 feat: tinychat (#4869) 2024-06-08 12:05:45 -07:00
Brennan Kinney
9445946cae docs: Update referenced yaml in yolov8.py (#4871)
YAML files have since been relocated.
2024-06-08 15:05:00 -04:00
Roelof van Dijk
794fecf8e3 perf: faster element deletion during matching (#4882)
* perf: faster deletion

* fix: leave the tuple init
2024-06-08 15:16:35 +02:00
Roelof van Dijk
0eebb8e998 fix: _free should not return (#4880) 2024-06-08 14:45:06 +02:00
Roelof van Dijk
1785a70e77 fix: else-return on runtime (#4881)
* fix: add init file

* fix: no else-return

* fix: remove file again
2024-06-08 14:44:24 +02:00
qazal
1e3325f369 raise assert [run_process_replay] (#4879) 2024-06-08 08:31:44 -04:00
qazal
d19f39d4dd unbind Variable pre LazyOp (#4873)
* early unbind

* assert ConstType is correct
2024-06-08 08:16:38 -04:00
George Hotz
9c30889ce9 [run_process_replay] faster and simpler match function (#4876) 2024-06-08 14:08:30 +02:00
Roelof van Dijk
aadab3e3da fix: pylint will not lint folders without __init__.py (#4875)
* fix: add __init__.py

* fix: no-else-return

* fix: redefined-builtin

* fix: unused-variable

* fix: possibly-used-before-assignment
2024-06-08 14:00:24 +02:00
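
Taking the commit title above at face value, one way to spot affected folders is to list directories that contain `.py` files but no `__init__.py`. The helper below is a hypothetical sketch for illustration; the name `dirs_missing_init` and the temporary directory layout are assumptions, not part of the repository:

```python
import tempfile
from pathlib import Path

def dirs_missing_init(root):
    """Return subdirectories of root that hold .py files but no __init__.py."""
    missing = []
    for d in sorted(p for p in Path(root).rglob("*") if p.is_dir()):
        has_py = any(f.is_file() and f.suffix == ".py" for f in d.iterdir())
        if has_py and not (d / "__init__.py").exists():
            missing.append(d)
    return missing

# Demo on a throwaway tree: "pkg" is a proper package, "loose" is not.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "pkg").mkdir()
    (root / "pkg" / "__init__.py").write_text("")
    (root / "pkg" / "a.py").write_text("x = 1\n")
    (root / "loose").mkdir()
    (root / "loose" / "b.py").write_text("y = 2\n")
    found = [d.name for d in dirs_missing_init(root)]
```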
Szymon Ożóg
1680a4bcb8 Remove unused and internal variables (#4862) 2024-06-07 23:05:38 +02:00
Roelof van Dijk
15e5a4fb26 fix: variable defined in assert breaks -O (#4866) 2024-06-07 21:36:24 +03:00
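
The diff for the commit above is not shown here. As a general sketch of the failure class its title names: Python drops `assert` statements under `-O`, so a name bound only inside an assert is never assigned. The example uses `compile(..., optimize=1)` to emulate `-O`, and `count_items` is a hypothetical function:

```python
SOURCE = '''
def count_items(items):
    # n is bound only inside the assert, so it vanishes with the assert
    assert (n := len(items)) > 0, "need at least one item"
    return n
'''

def run(optimize):
    """Compile SOURCE at the given optimization level and call count_items."""
    namespace = {}
    exec(compile(SOURCE, "<sketch>", "exec", optimize=optimize), namespace)
    try:
        return namespace["count_items"]([1, 2, 3])
    except NameError as e:  # UnboundLocalError is a subclass of NameError
        return type(e).__name__

normal = run(0)     # asserts kept: n is bound and returned
optimized = run(1)  # asserts stripped, as with python -O: n is never bound
```

Binding the value on its own line before the assert avoids the problem.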
chenyu
3a20cff7c2 expand ShapeTracker.invert a bit (#4864)
removed a type cast and it can early return now

[run_process_replay]
2024-06-07 14:26:02 -04:00
nimlgen
688b14c933 do not sleep immediately in amd's wait_signal (#4859)
* that was slow python in hlb

* wait actively for 5s

* just this

* revert this back

* fix
2024-06-07 16:33:46 +03:00
qazal
66dfd5e7bf faster codegen process replay (#4858)
* faster codegen process replay

* use self.copy

* regenerate

* delete copy

* test a real error [run_process_replay]

* revert the error change
2024-06-07 16:20:57 +03:00
chenyu
dd5378378b cleanup kernel simplify_merge_adjacent (#4852)
cleanup kernel simplify_merge_adjacent
2024-06-06 12:04:54 -04:00
nimlgen
47bfd7c2b7 fix sync of offset buffers in graphs (#4850)
* correctly sync offset buffers

* test

* style

* run less

* just use base
2024-06-06 16:09:45 +03:00
qazal
eeb5a7af39 refactor linearize to render_block, P1 (#4839)
* refactor to render_block

* move rendering the reduce to its own thing

* add todo and cleanups [run_process_replay]

* inplace update of idxs [run_process_replay]
2024-06-06 15:31:43 +03:00
George Hotz
b932ce0f1d [run_process_replay] style: clean up UPat 2024-06-06 08:54:24 +02:00
chenyu
b42f49b506 minor cleanup of view _merge_dims (#4849) 2024-06-05 23:20:26 -04:00
nimlgen
1649c21ead nv fix round of allocation sizes (#4828)
* fix round of allocation sizes

* comment on prefetch

* use huge pages
2024-06-06 00:21:56 +03:00
nimlgen
09bfb8c10a nv sync program copies to other execution (#4845) 2024-06-05 23:34:33 +03:00
chenyu
99e7a1d5e9 support symbolic reshape with non-contiguous (#4844)
* support symbolic reshape with non-contiguous

pre-requisite for symbolic arange (make symbolic ones that can be folded).

* test cases

* typo

* shorter
2024-06-05 16:01:19 -04:00
chenyu
a352b6d9ce symbolic Tensor.var (#4843)
taken from #4446 and add more tests
2024-06-05 12:55:54 -04:00
Nik
085c0bbf6b add mlperf train subset of openimages (#4841) 2024-06-05 10:10:11 -04:00
Timmy
887643cf34 Multireduce atomic local load/store test (#4786)
* atomic load/store test

* tests for nested & unrolled

* check barriers

* linters

* cleaning up diff

* fix assert in _temp_create_multireduce_ast changes

* cleaning up the check for redundant barriers

* minor cleanups for the assert

* always seed randn, helps with debuggability

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2024-06-05 14:41:19 +03:00
George Hotz
3954f102aa style: make __init__ first in Tensor class 2024-06-05 12:51:41 +02:00
Szymon Ożóg
273945df67 Regression tests for bitshift (#4829)
* Regression tests for bitshift

* Add test for bitshift not triggered

* Enable tests
2024-06-05 11:42:34 +02:00
Alec Chen
5ac30c29d8 Construct UOps patterns using UPat (#4821)
* Allow UPat pattern definitions

* Convert pattern matcher tests to UPat constructions

* Convert constant_folder patterns to upat constructions

* Convert assembly patterns to upat constructions

* [run_process_replay] Drop UPat.from_dict
2024-06-05 10:29:37 +02:00
Szymon Ożóg
e47277d18a Disable for PTX as well (#4838)
Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
2024-06-05 10:37:59 +03:00
Francis Lam
890e7c12bb test/external/verify_kernel: add support for single pickled kernel (#4836) 2024-06-04 18:59:21 -04:00
Elias Wahl
e576aca044 Disable dropout (#4837) 2024-06-04 18:57:26 -04:00