Commit Graph

2323 Commits

Author SHA1 Message Date
Szymon Ożóg  4123920bcc remove deprecated variables 2023-08-19 13:56:37 +02:00
Szymon Ożóg  fecc58cc2b proper function name 2023-08-19 13:54:21 +02:00
Szymon Ożóg  b624a374b9 ConstOp loading 2023-08-18 18:30:57 +02:00
Szymon Ożóg  ef757aa5c3 Add TernaryOps 2023-08-18 18:30:18 +02:00
Szymon Ożóg  5430fcb4e9 cuda envs 2023-08-18 17:49:48 +02:00
Szymon Ożóg  89bda3c550 upscale local index to power of 2 and add masking 2023-08-18 17:46:46 +02:00
Szymon Ożóg  2cfc7121b1 remove emulated from triton tests 2023-08-18 17:40:17 +02:00
Szymon Ożóg  4af5a60caf merge double package install 2023-08-18 17:38:22 +02:00
Szymon Ożóg  4362ebb547 install cuda packages for testing 2023-08-18 17:05:45 +02:00
Szymon Ożóg  48c7c9161e Merge test.yml 2023-08-18 16:51:16 +02:00
Szymon Ożóg  3ad856f1bb Add pycuda to triton dependencies 2023-08-18 16:41:03 +02:00
Szymon Ożóg  07d4731e9f Track local size 2023-08-18 16:17:49 +02:00
Szymon Ożóg  89da2be2e5 Merge remote-tracking branch 'upstream/master' into triton 2023-08-18 16:12:45 +02:00
Szymon Ożóg  4a4bab61d2 variable load fix 2023-08-18 16:11:05 +02:00
Szymon Ożóg  a11aa78d6b Disable float4 support 2023-08-18 16:10:53 +02:00
wozeparrot  c65ad43a93 cleanup ops_gpu (#1566) 2023-08-17 23:43:08 -04:00
nimlgen  bd111411bf init allocator for compiled backends (#1467)
* init allocator for compiled backends

* Update ops_webgpu.py

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-08-17 10:33:32 -07:00
Szymon Ożóg  2534e8f5a3 Disable float4 support 2023-08-17 16:08:51 +02:00
geohotstan  a293c18d34 Gather bugfix (#1561) 2023-08-16 19:53:14 -04:00
Ethan Sorrell  cb62911f6b PTX Reintegration and Passing Tests (#1512)
* move assembly, assembly_ptx

* successful but broken rendering of ptx asm

* clear ins before render asm

* slightly less broken :')

* we needed thread syncs

* fix float16 loading, rounding modifiers and other casting stuff, passing casts_from_half

* Fix runtime_args for gpuocelot

* our casts were flipped on both ends

* more casting

* add ternary where op

* dealing with storing/loading bool

* add test for casting to bool from negative

* Fix args.valid on ConstOp

* add to CI, TODO: fix runtime_args for test_uops

* fix placement of runtime_args to work with lazy.Device

* undo ci changes so I can push

* fix lints

* start cleanup and fix things we broke fixing lints

* add checks for PTX specifc asm instructions

* revert added test -- doesn't pass on llvm

* skip tests for underflow,overflow

* another fix for how we're setting runtime args

* Less broken cleanup

* add to CI

* add more env variables for ci test

* fix ci to install pycuda for ptx

* ci: copy cuda test command

* cleanup

* assert to make sure we're actually running ptx in ci

* remove test assert

* move is_ptx arg

* move assembly, assembly_ptx back to extras

* fix imports

* initial merge fixes

* clear registers, fix UOps.LOAD with invalid value

* draft merge fixes

* remove prints

* quick lint and merge fixes

* cleanup

* remove PTXProgram wrapper

* final cleanup

* temp change for ci rerun

* ci rerun

* rollback ISA version
2023-08-16 16:20:20 -07:00
geohotstan  8763037f0e Fancy indexing is fancy wow and gather thing (#1399) 2023-08-16 18:35:49 -04:00
chenyu  11dd9b1741 symbolic codegen and exec (#1552)
* symbolic codegen and exec

* fix and add test

* no sketchy

* merge_dicts type

* dtypes._arg_int32
2023-08-16 14:43:41 -07:00
George Hotz  1e1d48b4e6 single model (#1560) 2023-08-16 13:22:19 -07:00
JaSpa99  491e85597a Run onnx commavq model (#1537)
* try to run commavq

* fix 0 dim, start implementing new ops

- Implement EmbedLayerNormalization
- Implement Attention

* SkipLayerNormalization and FastGelu

* use original torch model, cast inputs

* fix some ops:

- properly do Cast
- Attention: bi- and unidirectional
- FastGelu: add bias before gelu

* cleanup onnx_ops.py

* add validation option to benchmark

* cleanup imports

* add checks incase onnx2torch implements ops in future

* run onnx instead of original torch

* just skip gpu on m1

* reactivate the other models

* check for strange params & squash whitespace

* cleanup

* fix causal mask Attention

* Range doesn't need int cast

* embedding vocab_counter same dtype as input

* no need to cast

* always validate, fix PosixPath ort

---------

Co-authored-by: George Hotz <george@comma.ai>
2023-08-16 12:24:40 -07:00
wozeparrot  55d95d1658 llama 70b (#1558)
* feat: llama 70b

* feat: llama 70b but simpler
2023-08-16 11:36:12 -07:00
nimlgen  c93e63b8b5 make TestNonFloatUOps.test_mul_bool pass on all platforms (#1557) 2023-08-16 11:34:09 -07:00
wozeparrot  074c467020 hotfix for broken ci (#1559) 2023-08-16 13:52:03 -04:00
Szymon Ożóg  ba1ad1dfa8 remove double print 2023-08-16 19:48:09 +02:00
madt2709  962972ee68 Fix uops int32 for llvm (#1554)
* fix-uops-int32-llvm

* fix tests

* Ignore mypy error
2023-08-15 23:22:32 -07:00
Sam Barani  2cde667d40 Change Any to List[Optional[RawBuffer]] in JIT (#1553)
* Change Any to List[Optional[RawBuffer]] in JIT

* remove ignore[no-redef]

* remove ignore

* pick different names
2023-08-15 23:21:33 -07:00
Szymon Ożóg  7045109592 Deprecated includes removed 2023-08-16 07:17:08 +02:00
Szymon Ożóg  7616b27a5b Added triton tests 2023-08-16 06:48:42 +02:00
Szymon Ożóg  463b6686b0 remove deprecated include 2023-08-16 06:44:56 +02:00
Szymon Ożóg  034273726c pass _buf to program 2023-08-16 06:44:42 +02:00
Szymon Ożóg  207cd697bf Added new parameter 2023-08-16 06:44:29 +02:00
nimlgen  fa81e282c2 fix missing dtypes in is_int,is_float,is_unsigned (#1550) 2023-08-15 21:22:29 -04:00
Diogo  d17ecccd78 Torch/LLVM/arm F64 support (#1551) 2023-08-15 21:21:08 -04:00
YiMing Han  913263c155 add double: c_type.double for CLANG (#1549) 2023-08-15 13:19:33 -07:00
Szymon Ożóg  64d02d5246 Merge branch 'tinygrad:master' into triton 2023-08-15 21:03:25 +02:00
Szymon Ożóg  360b450262 triton_compile 2023-08-15 21:01:37 +02:00
Szymon Ożóg  50003a830f Use RawCUDABuffer 2023-08-15 21:01:15 +02:00
Szymon Ożóg  41ae7cb508 Triton Buffer 2023-08-15 19:47:48 +02:00
Szymon Ożóg  13e45691b4 Add TritonProgram 2023-08-15 19:47:12 +02:00
Szymon Ożóg  83516c6ec8 Remove deprecated buffer 2023-08-15 19:46:52 +02:00
Szymon Ożóg  b68851c298 Remove deprecated AST Kernel 2023-08-15 19:46:33 +02:00
George Hotz  0b5930d406 more uops testing, who isn't passing right now... (#1522)
* more uops

* llvm refactor

* update test uops

* rest of the nodes

* ors and ands
2023-08-15 09:07:26 -07:00
SzymonOzog  89c4c47f0b Move ops_triton to runtime and remove errors from deprecated code 2023-08-15 11:53:12 +02:00
George Hotz  f8109b830c promote assembly to the main codebase (#1544)
* promote assembly to the main codebase

* not namedtuple
2023-08-14 22:47:45 -07:00
wozeparrot  666ac61070 support for p2p buffer transfers (#1523)
* feat: RawBufferTransfer

* feat: gate behind P2P

* feat: gate properly

* feat: raise error when not implemented
2023-08-14 22:39:57 -07:00
Steven Anderson  93a36c3659 Arm (#1421)
* testing new memops

* better debugging

* testing padded conv

* branching with load

* refactoring a bit

* first try

* fixing bugs

* fixing some

* eq

* eq2

* do not use x's

* working

* fixing imm

* getting things working

* refactor

* pow not working

* working except one

* refactor: one store mem

* refactor: global load

* refactor: imm

* refactor: cleaning

* fixing big offsets

* refactor with ci

* try ci

* typo

* another typo

* ubuntu default

* forgot git

* do i need git?

* missing packages

* adding python-dev

* with cache?

* buildx action

* buildx name issue?

* maybe now?

* python3

* newline warning

* maybe now

* i actually need this

* ci should work now

* improved caching

* fixing cache

* maybe now it will cache

* this

* testing cache

* trying again

* load

* missing platform

* caching gha

* testing cache

* full testing

* typo

* now?

* why

* adding checkout back

* bad formatting

* fixing convention issues

* supporting python

* adding CI flag

* testing all

* better comments

* adding debugging

* takes 12x longer

* does it output progress now?

* ignore models for speed

* fixing merge

* excluding conv_transpose2d

* only 2 test cuz is to slow

* another approach

* let's see

* faster duh

* my bad

* T_T

* typo

* sup

* with output?

* comment test

* comment test

* comment test

* :?

* no comment

* with cache

* back to normal

* testing that ci works

* back to passing

* trying again

* does it create another entry

* does it create another entry?

* build local

* hey

* Revert "excluding conv_transpose2d"

This reverts commit cc7348de03.

* does it cache if done before?

* does it cache?

* done

* adding test ops

* bad formatting

* no need for this

* working static mem

* sum 1d

* add ndim

* better reg import

* fix stack

* back to np

* working except for softmax

* 5 failing

* no pogress

* remove keystone

* remove keystone

* testops passing

* cleanups

* more cleanup

* typo

* ci

* ci2

* cond import

* ci3

* ci4

* ci4

* ci5

* ci5

* ci6

* aligment

* test all

* correct test

* err read_unmapped

* passing test

* ignore for speed

* ignore for speed

* ci7

* cleanup

* remove docker

* fixing merge

* fixing bugs

* add skipload for const ops

* comments

* First merge to master: Renderer

* fix emulation

* passing all tests arm64

* cleaning

* fix handcoded binary

* cleaning

* fix errs

* fix runtime arg binary

* clean git diff

* fix and clean

* fixing metal test

* cleaning

* fix metal test

* ci ~8 min

* fix pylint and clang

* cache the files in ops_clang

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-08-14 19:29:30 -07:00