Commit Graph

1164 Commits

Author SHA1 Message Date
Michael Melesse
1581a1da26 fix libdevice 2023-04-26 12:09:56 -05:00
Michael Melesse
2784b804d9 Merge remote-tracking branch 'upstream/main' into ifu_4_26_2023 2023-04-26 12:04:21 -05:00
Keren Zhou
8f7ec23401 [FRONTEND] Refine arithmetic checks and corresponding tests for extern_elementwise (#1577)
The current main would fail on `math.scalbn` because we implicitly cast
the first argument from `int32` to `float32`, while the function only
accepts `int32` as the first argument and `float32` as the second
argument.

So we update the type matching logic as follows:

1. Check if there's a type tuple that matches the types of the input
arguments
2. If yes, we don't allow arithmetic check.
3. If not, we will do arithmetic check to implicitly cast types among
arguments.
4. If we still don't find a corresponding function that accepts the
casted types, throw an error.
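The four steps above can be sketched in plain Python (names invented for illustration; the real logic lives in Triton's frontend semantic layer, not in this form):

```python
# Hypothetical sketch of the dispatch rules described above.
def promote(arg_types):
    # Toy arithmetic-promotion rule: any float32 argument promotes
    # all arguments to float32.
    if "float32" in arg_types:
        return tuple("float32" for _ in arg_types)
    return tuple(arg_types)

def dispatch(arg_types, overloads):
    """overloads: list of (accepted_type_tuple, symbol) pairs."""
    arg_types = tuple(arg_types)
    # Steps 1/2: exact signature match -- no implicit casts allowed.
    for sig, symbol in overloads:
        if sig == arg_types:
            return symbol, arg_types
    # Step 3: no exact match, so implicitly cast via arithmetic promotion.
    casted = promote(arg_types)
    for sig, symbol in overloads:
        if sig == casted:
            return symbol, casted
    # Step 4: still no match -> error.
    raise TypeError(f"no overload accepts {arg_types}")
```

With this rule, a `scalbn`-like overload taking `(int32, float32)` is matched exactly and no cast is applied, while a `pow`-like overload taking `(float32, float32)` is still reachable from mixed inputs through promotion.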

---------

Co-authored-by: Philippe Tillet <phil@openai.com>
2023-04-25 14:25:45 -07:00
Philippe Tillet
d9020179ee [FRONTEND] libdevice path no longer part of the runtime driver (#1580) 2023-04-25 13:44:08 -07:00
Zahi Moudallal
4963d67cd3 [FRONTEND] Use ttgir module num-warps instead of default value (#1576)
Use ttgir num-warps attribute instead of default value.
2023-04-25 08:22:49 -07:00
Natalia Gimelshein
d5969b81fe [FRONTEND] Test pow with mixed dtypes (#1575)
Also reverts #1541 that breaks this test.
2023-04-24 21:38:40 -04:00
Philippe Tillet
ec242430d1 [THIRD_PARTY] bumped ptxas version to 12.1.105 (#1574) 2023-04-24 16:49:31 -07:00
Himanshu Pathak
6d226431b1 [FRONTEND] do not run AccelerateMatmul on pre-Volta GPUs (#1505)
Related to #1271. I am currently working on adding support for
pre-Volta GPUs in Triton.

---------

Co-authored-by: Himanshu Pathak <himanshu@mtatva.com>
Co-authored-by: Philippe Tillet <phil@openai.com>
2023-04-24 15:53:02 -07:00
Philippe Tillet
a359b62ef3 [RUNTIME] Lazy driver initialization (#1571) 2023-04-24 15:16:09 -07:00
Ian O'Connell
cd096afa58 [FRONTEND] don't hold a file lock (#1569)
We have had complaints/issues where a zombie Python process randomly
holds this lock. We don't need it, since renames are atomic on POSIX,
so refactor this to make temp files unique and then use replace
(https://docs.python.org/3/library/os.html#os.replace).
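A minimal sketch of the lock-free pattern described here: write to a uniquely named temp file, then atomically rename it over the destination with `os.replace` (atomic on POSIX and Windows). The helper name is invented for illustration:

```python
import os
import tempfile

def atomic_write(path, data: bytes):
    """Write data to path without holding a lock, using atomic rename."""
    dirname = os.path.dirname(path) or "."
    # mkstemp yields a unique name, so concurrent writers never collide.
    fd, tmp = tempfile.mkstemp(dir=dirname, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        # Atomic: readers see either the old or the new file, never a
        # partial one; with concurrent writers the last replace wins.
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)
        raise
```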
2023-04-24 12:50:24 -07:00
Michaël Benesty
7d2a4d95c2 [DOCS] fixed num warps / stages in matmul (#1561) 2023-04-21 12:57:26 -07:00
peterbell10
c71bf73f24 [BUILD] Use a persistent directory for cmake (#1548)
Fixes #1545

`build_temp` is a temporary directory which `distutils` used to keep in
the `./build` directory, but when `pyproject.toml` is present `pip` now
puts it in `/tmp` and removes it at the end of the build.

Instead, this creates a new permanent directory like
`python/build/cmake.linux_x86_64-cpython-3.8` (the old name but with
cmake instead of temp).
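A rough sketch of how such a persistent, platform-specific directory name can be derived (the exact string produced by setup.py may differ from this):

```python
import sys
import sysconfig

def cmake_build_dir():
    # e.g. "linux-x86_64" on a typical Linux build host
    plat = sysconfig.get_platform()
    py = f"cpython-{sys.version_info.major}.{sys.version_info.minor}"
    # Persistent location under the source tree, so incremental cmake
    # builds survive across pip invocations.
    return f"python/build/cmake.{plat}-{py}"
```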

While I was looking at the verbose pip output, I also noticed a bunch of
warnings like
```
Python recognizes 'triton/runtime.backends' as an importable package,
but it is not listed in the `packages` configuration of setuptools.

'triton/runtime.backends' has been automatically added to the distribution only
because it may contain data files, but this behavior is likely to change
in future versions of setuptools (and therefore is considered deprecated).
```

So I've also added these to the packages list.

---------

Co-authored-by: Keren Zhou <kerenzhou@openai.com>
2023-04-20 16:38:44 -07:00
cctry
3e213dccb1 [FRONTEND] Make lru_cache compatible for Python 3.7 or older (#1552)
Change the usage of LRU cache decorator from @functools.lru_cache to
@functools.lru_cache().
The former raises `TypeError('Expected maxsize to be an integer or
None')` on Python 3.7 or older.
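The compatible form simply calls the decorator factory with empty arguments; the bare `@functools.lru_cache` form only became legal in Python 3.8:

```python
import functools

# Works on Python 3.2+: lru_cache() with parentheses produces the
# decorator. On <=3.7, the bare form passes the function itself as
# maxsize and raises TypeError.
@functools.lru_cache()
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)
```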
2023-04-20 16:14:32 -07:00
Keren Zhou
fef8150b65 [FRONTEND] Remove debug print in code_gen (#1550) 2023-04-19 17:13:01 -07:00
Alexander Efimov
8b5b45fbf3 replace outdated allclose function, fix comments in test 2023-04-19 10:58:02 +00:00
Da Yan
b42e3d06d4 [FRONTEND] fix type checking in extern_elementwise (#1541)
Some math ops accept inputs of different types (e.g., tl.math.jn).
We don't want to cast the scalar types of input operands of those math
ops.
2023-04-18 16:59:21 -07:00
Daniil Fukalov
a90a2d864f [BUILD] Add ability to build with clang+lld. (#1544)
This reduces build time when LLVM is built with assertions enabled, and
dramatically speeds up Triton's build against a "debug" LLVM.

Co-authored-by: Philippe Tillet <phil@openai.com>
2023-04-18 21:20:12 +00:00
Alexander Efimov
fe612b1fc7 fix rebase issues 2023-04-18 18:16:59 +02:00
Alexander Efimov
9ca9f7a604 Update python/test/unit/language/test_core_amd.py 2023-04-18 18:13:58 +02:00
Aleksandr Efimov
d7dbe8f3a9 add test 2023-04-18 18:13:58 +02:00
Aleksandr Efimov
53f09da370 [Matmul] [Optimization] Disable pipeline pass for amd gpu
This PR temporarily disables the pipeline optimization pass for AMD GPUs to fix the matmul operation.
2023-04-18 18:13:56 +02:00
Natalia Gimelshein
7d1a95b046 [TESTS] Added test for avg_pool_bwd kernel (#1540)
This kernel was briefly broken on main; this test prevents future regressions.

---------

Co-authored-by: Keren Zhou <kerenzhou@openai.com>
2023-04-17 21:20:34 -07:00
peterbell10
a3c3e5a3a1 [TESTS][OPTIMIZER] enable tests for argmin/max and fix some bugs (#1537)
`argmin`/`argmax` are currently only tested in 1D; enabling the tests
for 2D reveals a few bugs.
2023-04-17 18:47:31 -07:00
Sharad Vikram
cf26e05a8f [FRONTEND] remove debug print (#1538) 2023-04-17 15:17:19 -07:00
rsanthanam-amd
a791028601 Merge pull request #193 from ROCmSoftwarePlatform/bc-patch
Move cuda2gcn.bc files to third_party
2023-04-17 14:59:51 -05:00
Michael Melesse
d211cd7750 skip bad test 2023-04-17 13:12:34 -05:00
Michael Melesse
705d47d0dd fix lit test issues
This is a combination of 6 commits.

install lit

fix lit test

fix lit test

fix aot lit issues

fix final lit tests

add lit tests
2023-04-17 11:46:37 -05:00
Philippe Tillet
608ec061c1 [TESTING] Added more tests for annotations and autotuner (#1533)
Essentially identical to #538, but it fails formatting tests and I don't
want to ping the author on a weekend.
2023-04-15 19:44:08 -07:00
Philippe Tillet
df6c2babbd [FRONTEND] Now using strings for annotations (#1529)
Works with `__future__` annotations and also avoids having to import
torch just for the sake of type annotations.
2023-04-15 15:32:22 -07:00
Philippe Tillet
f367647b38 [FRONTEND] Added tl.extra.cuda.smid (#1532) 2023-04-15 14:42:59 -07:00
Chenggang Zhao
c9311ef361 [TUTORIALS] Fix rendering issues in the block pointer tutorial (#1530)
Found some rendering issues here:
https://triton-lang.org/main/getting-started/tutorials/08-experimental-block-pointer.html,
sorry for not checking carefully in the last PR.
2023-04-15 14:27:14 -07:00
Philippe Tillet
e5c7d2a83c [FRONTEND] cleaned up language; added frontend function for globaltimer special register (#1525) 2023-04-14 15:29:27 -07:00
peterbell10
0d76c4ca95 [FRONTEND] Rename tl.reduction -> tl.reduce and improve testing (#1521)
`tl.reduction` is currently tested indirectly through the existing
reduction operators, but it's good to have a direct test for the
function itself.

---------

Co-authored-by: Philippe Tillet <phil@openai.com>
2023-04-14 14:35:31 -07:00
Bert Maher
bfd1f65ac7 [FRONTEND] cache path to ptxas (#1526)
When running python 3.8, I've found that process creation gets slower
over time (e.g. after creating a CUDA context, it can take 50-300ms per
subprocess.run), and we do one of these calls to `ptxas --version` for
every kernel, so a model with thousands of kernels can end up spending
substantial time just calling ptxas redundantly.
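The caching idea can be sketched as follows (generic helper name invented here; in Triton's case the tool would be `ptxas` and the cached call its version probe):

```python
import functools
import shutil

# Resolve a tool's path once per process instead of probing on every
# kernel compilation; lru_cache memoizes per argument.
@functools.lru_cache()
def tool_path(name):
    path = shutil.which(name)
    if path is None:
        raise RuntimeError(f"{name} not found on PATH")
    return path
```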

Co-authored-by: Philippe Tillet <phil@openai.com>
2023-04-14 17:01:42 +00:00
Chenggang Zhao
c624778e73 [TUTORIALS] Add tutorial for block pointers (#1519)
This PR contains:
- Several fixes for the matrix multiplication (M and N dimensions may
have out-of-bound access)
- A type check for block-based store
- The tutorial for block pointers
- Fix some formats
2023-04-14 00:40:41 -07:00
Keren Zhou
fdf1c1f2a1 [DOCS] Fix documentation workflow (#1520)
Co-authored-by: Phil Tillet <phil@openai.com>
2023-04-13 13:49:36 -07:00
Michael Melesse
3603483fc0 clean up previous platform functions 2023-04-13 13:20:08 -05:00
peterbell10
6550c528b7 [FRONTEND] don't call tl.view in arg{min,max} (#1518)
A small oversight in #1305: since `view` can rearrange elements, it
should be avoided here. Instead, I use indexing with `None` to create
new dimensions.
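For illustration, NumPy is used below as a stand-in for the analogous Triton indexing semantics: indexing with `None` inserts a new axis without moving any elements, whereas a view/reshape may remap how elements land in positions.

```python
import numpy as np

x = np.arange(6)
col = x[:, None]  # shape (6, 1): same element order, extra axis
row = x[None, :]  # shape (1, 6)
```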

Co-authored-by: Philippe Tillet <phil@openai.com>
2023-04-13 07:32:23 +00:00
Philippe Tillet
c0d86d3b04 [RUNTIME] refactor driver (#1515)
Improved separation between different backends
2023-04-12 23:50:44 -07:00
peterbell10
e152183570 [FRONTEND][BACKEND] ReduceOp to support arbitrary reduce operations (#1305)
Fixes #1285

This changes `tt.reduce` to replace `redOp` by a region containing
arbitrary code. For example, `tl.sum` is now lowered as:
```mlir
%res = "tt.reduce"(%arg0) ({
^bb0(%arg1: f32, %arg2: f32):
  %add = arith.addf %arg1, %arg2 : f32
  tt.reduce.return %add : f32
}) {axis = 1 : i32} : (tensor<128x128xf32>) -> tensor<128xf32>
```
Support for index reductions at the MLIR level is also dropped in favor
of simultaneous reductions over multiple tensors, which generalizes the
code without loss of performance. So, for example, `argmin` gets lowered
as:
```mlir
  %7 = tt.make_range {end = 256 : i32, start = 0 : i32} : tensor<256xi32>
  %8 = tt.view %7 : (tensor<256xi32>) -> tensor<1x256xi32>
  %9:2 = "tt.reduce"(%6, %8) ({
  ^bb0(%arg4: f32, %arg5: i32, %arg6: f32, %arg7: i32):
    %14 = arith.cmpf olt, %arg4, %arg6 : f32
    %15 = arith.cmpf ogt, %arg4, %arg6 : f32
    %16 = arith.cmpi slt, %arg5, %arg7 : i32
    %17 = arith.select %16, %arg5, %arg7 : i32
    %18 = arith.select %15, %arg7, %17 : i32
    %19 = arith.select %14, %arg5, %18 : i32
    %20 = arith.cmpf olt, %arg4, %arg6 : f32
    %21 = arith.select %20, %arg4, %arg6 : f32
    tt.reduce.return %21, %19 : f32, i32
  }) {axis = 1 : i32} : (tensor<1x256xf32>, tensor<1x256xi32>) -> (tensor<1xf32>, tensor<1xi32>)
```
2023-04-13 01:37:39 +00:00
Philippe Tillet
5b9119117b [CI] No longer install triton in editable mode to run tests (#1476) 2023-04-12 17:55:44 -07:00
root
a0a1c92622 Move cuda2gcn files to third party 2023-04-12 15:30:03 +00:00
root
4115fc71fd Revert "Append .bc files to package_data"
This reverts commit 455c29591c.
2023-04-12 14:54:55 +00:00
Jack Taylor
455c29591c Append .bc files to package_data
Additional context: https://github.com/ROCmSoftwarePlatform/frameworks-internal/issues/3367#issuecomment-1505072217

When triton is installed via `python setup.py install`, the required cuda2gcn.bc file is not copied over to the package location. This results in UT failures in pytorch: `Failed to load /opt/conda/envs/py_3.8/lib/python3.8/site-packages/triton/language/cuda2gcn.bc`, `Translate to LLVM IR failed`, `LLVM ERROR: Failed to translate TritonGPU to LLVM IR.`

To alleviate this issue, I propose adding the .bc file to the package_data of setup.py to ensure the file is copied over.

Reproducing torch UT:
`pytest test/inductor/test_torchinductor_dynamic_shapes.py -k "test_any_dynamic_shapes_cuda" --verbose`
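The proposed setup.py change amounts to a `package_data` entry along these lines (package list abridged and illustrative; the real setup.py carries many more arguments):

```python
from setuptools import setup

setup(
    name="triton",
    packages=["triton", "triton.language"],
    # Ship the .bc bitcode files (e.g. cuda2gcn.bc) with the
    # installed package so they are found at runtime.
    package_data={"triton.language": ["*.bc"]},
)
```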
2023-04-12 12:03:34 +01:00
Phil Tillet
9530d93504 [TESTING] change do_bench defaults 2023-04-11 22:03:52 -07:00
Phil Tillet
d7d62ddae9 Revert "[BUILD] Fixed typo in setup.py"
This reverts commit 2931bb8195.
2023-04-11 20:12:22 -07:00
Phil Tillet
2931bb8195 [BUILD] Fixed typo in setup.py 2023-04-11 20:09:09 -07:00
Philippe Tillet
02e3c18f04 [TESTING] clean up testing.do_bench (#1513) 2023-04-11 20:05:58 -07:00
zahimoud
fd34b20fba [BACKEND] Fixed bug in reduce; add tests 2023-04-11 18:09:18 -07:00
Phil Tillet
3e22e18295 [TESTING] do_bench now returns min time by default.
This is likely to be more stable in general for benchmarks with an L2
hit rate comparable to what is encountered in practice.
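A toy illustration of the min-time idea (not Triton's actual `do_bench`, which times GPU kernels with CUDA events): the minimum over repeated runs filters out one-off scheduler and cache noise better than the mean does.

```python
import time

def bench_min(fn, reps=10):
    """Return the fastest wall-clock time of reps calls to fn, in seconds."""
    times = []
    for _ in range(reps):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    return min(times)
```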
2023-04-11 17:18:01 -07:00