Commit Graph

24 Commits

George Hotz
c690eeaca9 flip mulacc to save a line (#997) 2023-06-17 16:47:55 -07:00
George Hotz
ba56ee6020 RDNA assembly backend ($1000 bounty) (#787)
* Revert "Revert "ops rdna""

This reverts commit 0400315078.

* Revert "Revert "writing 2""

This reverts commit 325a3bf2cf.

* no dump

* 2x 2

* simple asm

* local size

* sub

* lil work

* support args != 3

* assembler work

* generate that

* ptx assembler

* begin index renderer

* max

* ptx loops

* gemms work

* valid works

* asm working a bit more

* close

* passing all ops tests

* ptx is a codegen only, not a backend

* ptx

* float16 support

* rdna goes here

* install types

* make amd disassemble

* ansilen for pretty print

* fix ptx log2/exp2

* assemblyinstruction

* new asm

* working gemm

* fix cmp

* more passing

* mod

* ptx works again

* rdna3 add works

* log exp

* sin is sin 2pi

* fix types

* progress

* loops work

* rdna xyz

* better addressing

* cleanups

* handle exception in early process

* div support

* rdna float4

* locals work

* fix neg index

* cast

* smaller diff

* yaml

* import only if selected

* fromimport

* types

* this all needs rewriting

* a few more
2023-06-16 09:33:18 -07:00
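The "sin is sin 2pi" note in the commit body above refers to a hardware sin unit that takes its argument in revolutions rather than radians (AMD's V_SIN_F32 expects a 2π-normalized input). A minimal sketch of that lowering, with `hw_sin` standing in for the hardware op and `lower_sin` an illustrative name:

```python
import math

def hw_sin(u):
    # stand-in for a hardware sin unit whose input is in "revolutions":
    # hw_sin(u) == sin(2*pi*u). This models the semantics, not the ISA.
    return math.sin(2 * math.pi * u)

def lower_sin(x):
    # to emit sin(x) in radians for such a unit, scale into revolutions
    # and keep only the fractional turn as a cheap range reduction
    u = x / (2 * math.pi)
    u = u - math.floor(u)
    return hw_sin(u)
```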
George Hotz
039f0d372f delete ltypes (#984)
* delete ltypes

* only upcast float types

* test dtype on mac passes

* ugh, these upcasts
2023-06-15 16:24:45 -07:00
Diogo
0629791cbd F64 support (#976)
* initial commit

* added osx check for opencl

* added llvm f64 conversions

* typo in llvmir

* more tests and modified unsupported error

* fixed linting error

* added pragma fp64

* simplified exclusion for OSX

* fixed device check and also added it to cast func

* added ifdef check for fp16 in ops_gpu

* Revert "added ifdef check for fp16 in ops_gpu"

This reverts commit 92de754d48.

* f64 prekernel signature match f16

* moved condition to buffer init
2023-06-13 21:31:31 -07:00
George Hotz
ba4eadb04c PTX assembly support (#977)
* ptx assembly

* all ops tests pass

* fix tests
2023-06-13 12:31:42 -07:00
cloud11665
5f13e7c3cf cuda: fix fp16, uint8, int64, half4 codegen (#968)
* cuda: add uchar, int64 typedefs

* cuda: fix float16 codegen

* fuck it, half4 stub. llama time!

* inline fp16 half4, revert changes to CStyleLanguage

* add inline just in case

* remove half4 operators

* use dict
2023-06-12 11:15:44 -07:00
George Hotz
df40a9c238 EXP+LOG -> EXP2+LOG2 (#954)
* EXP+LOG -> EXP2+LOG2

* update docs
2023-06-08 10:57:31 -07:00
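The EXP+LOG -> EXP2+LOG2 change works because the natural-base ops reduce to the base-2 ones by a constant scale: exp(x) = 2^(x·log2 e) and ln(x) = log2(x)·ln 2, so only the base-2 primitives need backend support. A quick sketch of the identities:

```python
import math

LOG2_E = 1.0 / math.log(2.0)   # log2(e)

def exp(x):
    # exp(x) == 2 ** (x * log2(e))
    return 2.0 ** (x * LOG2_E)

def log(x):
    # ln(x) == log2(x) * ln(2)
    return math.log2(x) * math.log(2.0)
```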
cloud11665
43ea1614b0 fix inf/nan codegen (#935)
* fix inf/nan codegen

* remove nasty oneliner, fix -inf

* inf/nan const mul/div tests
2023-06-05 11:24:09 -07:00
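A common source of the inf/nan codegen bug fixed above: Python's repr of these floats is "inf"/"nan", which are not valid C literals, so a C-style renderer must special-case them. A hypothetical renderer (the helper name and the division-based spellings are illustrative; INFINITY/NAN from <math.h> are an alternative):

```python
import math

def render_const(x):
    # render a Python float as a C float constant; inf/nan need
    # special spellings since str(x) would emit invalid C
    if math.isnan(x):
        return "(0.0f/0.0f)"
    if math.isinf(x):
        return "(-1.0f/0.0f)" if x < 0 else "(1.0f/0.0f)"
    return f"{x}f"
```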
George Hotz
dd41f3ee40 compact the global dimensions using the shapetracker (#897) 2023-06-01 13:09:54 -07:00
Diogo
0dab8edc97 support Int64 type in cstyle gen (#860)
* added metal int64 and some simple tests

* removed bool return type def

* typo in test

* also missing in clang and gpu runtimes

* switched order for opencl

* increased atol and removed new line in kernel prefix
2023-05-30 16:04:46 -07:00
crthilakraj
7925fa58d9 Fix cuda (#836)
* disabled float4 ALU ops for CUDA, small fix to add half_prekernel before kernel_prefix

* added supports_float4_alu option, and disabled for ops_cuda
2023-05-29 07:59:36 -07:00
wozeparrot
2fd2fb6380 int8/uint8 support (#837)
* feat: int8 support

* feat: uint8 support

* feat: int8 tests

* fix: fix uint8 on clang

* feat: test casting between int8/uint8/float16/float32

* clean: way cleaner dtype tests

* feat: preprocess_imagenet using the correct dtype

* feat: add test for overflow between uint8 and int8
2023-05-28 23:15:06 -07:00
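The uint8/int8 overflow test mentioned above comes down to two's-complement reinterpretation: unsigned values of 128 and above wrap to negatives. A pure-Python sketch of the conversion (helper names are illustrative):

```python
def uint8_to_int8(u):
    # reinterpret an unsigned 8-bit value as signed two's complement
    assert 0 <= u <= 255
    return u - 256 if u >= 128 else u

def int8_to_uint8(s):
    # reinterpret a signed 8-bit value as unsigned
    assert -128 <= s <= 127
    return s & 0xFF
```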
George Hotz
3ddcb5c36f Half4 load (#804)
* support half4 load

* cast to float4

* dead assert
2023-05-25 20:21:15 -07:00
Diogo
c19ef0fcce Add sin/cos/tan (#794)
* added sin/cos/tan

* fix lint

* added onnx ops support
2023-05-25 09:04:56 -07:00
George Hotz
90fff82c8a Rdna (#776)
* assembler maybe

* custom asm

* rdna3 on quiet

* trigger crashes

* fixed notes

* non-fatal rdna2 crash

* Crash4

* improve rdna sniffer

* comments

* improve sniffer

* asm

* 131 TFLOPS RDNA3

* opt simple matmul

* todos
2023-05-16 05:33:57 -07:00
George Hotz
8b7ecd63bb Remove Zeroview (#748)
* no zeroview start

* closer

* stride mask

* st tests pass, delete ZeroView

* byebye zv

* close to working

* not contiguous with mask

* subtract, don't add

* mask on view

* ugh, that shouldn't have been in there

* shape merge

* bugfixes

* fuzzer + 4 fuzzer failures

* fuzzer for symbolic

* more fuzzing and nothing

* that fuzzer doesn't hit either

* fixes padding...ugh

* no more offsets

* working

* rewrite load and store

* all checks

* fix idxs

* progress

* bugfix

* float4_axis

* works

* cleanups

* complex valids_okay
2023-04-17 08:21:46 -07:00
Jan Henrik Høiland
4e17d27d09 Fix cuda errors when running llama example (#749) 2023-04-16 13:52:10 -07:00
George Hotz
20894991ed good changes from the M1 Tensor Core project (#730)
* good changes

* working except llvm

* llvm types

* nice acc

* archprobe

* lang.float4

* use self.acc for late acc

* fix store bug
2023-03-29 05:11:02 +04:00
George Hotz
2e18469fd4 clean up display name 2023-03-22 18:32:05 -07:00
George Hotz
120d7072bd indexing merge almost works 2023-03-20 16:17:07 -07:00
George Hotz
06abbbfe7c remove the stupid register class (#721)
* remove the stupid register class

* touchups

* colorful display name
2023-03-20 15:45:12 -07:00
George Hotz
25287a974e types (#720)
* types

* cleanups

* don't use None, use LocalBuffer

* eh
2023-03-20 12:31:02 -07:00
George Hotz
9b314c6342 factor uops transformers into functions 2023-03-20 08:19:48 -07:00
George Hotz
5495c7d64e linearizer! (#714)
* linearizer outputs something

* working ish

* cstyle codegen

* clang mostly works

* fix load valid

* fix numberless loop

* fancy gen

* working

* fix enet compiler

* cleanups

* float4 upcasting

* less lines

* supports_float4

* constant folding

* mulacc

* internet tests flaky in CI

* 90% image support

* fix image generic

* bugs exposed with shapetracker and single view

* new llvm

* use vload, remove OLD

* that's really poorly done

* ending up being more lines
2023-03-19 23:43:49 -07:00
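The "mulacc" step in the linearizer commit above suggests a peephole pass over the linearized op list. A toy version that fuses a MUL feeding an ADD into a single MULACC (the op names and tuple encoding are illustrative, not tinygrad's actual uops):

```python
def fuse_mulacc(ops):
    # ops are tuples like ("MUL", dst, a, b) / ("ADD", dst, x, y);
    # fuse MUL immediately consumed by an ADD into ("MULACC", dst, a, b, other)
    out, i = [], 0
    while i < len(ops):
        if (i + 1 < len(ops)
                and ops[i][0] == "MUL" and ops[i + 1][0] == "ADD"
                and ops[i][1] in ops[i + 1][2:]):
            mul_dst, a, b = ops[i][1], ops[i][2], ops[i][3]
            # the ADD source that is not the MUL result (assumes exactly one)
            other = [s for s in ops[i + 1][2:] if s != mul_dst][0]
            out.append(("MULACC", ops[i + 1][1], a, b, other))
            i += 2
        else:
            out.append(ops[i])
            i += 1
    return out
```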