Commit Graph

4571 Commits

Author SHA1 Message Date
chenyu
f1bf916b8a apply NOOPT in test_arange complexity (#4774)
with hcopt, arange(2560) uses less ops than arange(256)
2024-05-29 23:12:35 -04:00
chenyu
cde7a7cda7 isolate the 134ms kernel in train_gpt2.py (#4773)
133ms on tinybox red with BEAM=2
2024-05-29 17:26:24 -04:00
nimlgen
57204c4014 amd cleanup pm4 queue (#4772) 2024-05-29 22:59:06 +03:00
lopusz
b2c408912c Add docs link to README (#4768) 2024-05-29 17:47:47 +00:00
chenyu
f2414c666f fix train_gpt2.py (#4771)
added `with Tensor.train():`
2024-05-29 12:01:34 -04:00
chenyu
59c6472b9f check contiguous in View.create after canonicalizing mask and offset (#4770)
mask / offset / strides can change during canonicalization, and contiguous can be True at the end
2024-05-29 11:31:13 -04:00
qazal
6e5fa5fd92 map local aliases to reduceop (#4766)
* map

* ugh

* save one line

* concerning, does this pass

* Revert "concerning, does this pass"

This reverts commit 64d4664f17.

* use local_alias
2024-05-28 21:11:25 -04:00
chenyu
7624ad3ddd add --timing and --profile to llama3 example (#4767) 2024-05-28 16:24:44 -04:00
qazal
c235223c07 refactor tc_opt creation (#4765)
* move reduceop loop

* this is more mergeable code

add assert

* integrate s2
2024-05-28 23:10:27 +03:00
qazal
a88aea626d map tensor core bufs to reduceop (#4763)
* tc_opts.bufs to its only map

* lint

* iterate reduceop bufs
2024-05-28 22:07:39 +03:00
wozeparrot
6fcf220b21 feat: tag 0.9.0 (#4762) v0.9.0 2024-05-28 18:44:45 +00:00
chenyu
e22cdb40f3 docs: fix mkdoc warnings and link to tensor.md (#4760) 2024-05-28 14:24:11 -04:00
nimlgen
872827b6ae fix usage of args struct in hcq (#4758)
* do not allocate empty buffer in hcq

* do not take args struct from program
2024-05-28 21:10:39 +03:00
wozeparrot
b2b49cef6f split tensor docs (#4754) 2024-05-28 11:03:52 -07:00
nimlgen
fe26d3fefe nv sync before free for binded commands (#4759)
* nv sync before free for binded commands

* shorter comment
2024-05-28 20:49:29 +03:00
chenyu
e614b7c696 docs: showcase remove mnist_gan and add conversation.py (#4757)
fixed both examples, and i think it's better to show conversation
2024-05-28 11:09:26 -04:00
nimlgen
019f4680e5 check dims before execution on nv (#4756)
* check dims before execution on nv

* fix linter
2024-05-28 16:57:28 +03:00
qazal
0e824741c4 pre multi reduce codegen/* cleanup (#4755)
* refactor self.reduceop

* free lines

* fix test
2024-05-28 08:15:48 -04:00
chenyu
fd249422f5 minor cleanup example stable_diffusion (#4753) 2024-05-28 00:05:37 -04:00
chenyu
53b9081aab check arg types of Tensor.randint (#4751)
raise TypeError if low, high, dtype are not ints
2024-05-27 20:24:10 -04:00
chenyu
16756af13c docs: polish tensor.py (#4750)
* docs: polish tensor.py

* don't change that
2024-05-27 20:00:56 -04:00
Elias Wahl
c4b0acf095 Global norm + small changes (#4749)
* norm

* no empty

* default loss scaler in float
2024-05-27 18:35:27 -04:00
chenyu
c7beb36b73 docs: update page headers (#4748)
replaced the "Index" from Home page with "tinygrad documentation"
2024-05-27 18:04:14 -04:00
wozeparrot
0680b9cb60 expand sidebar by default (#4747) 2024-05-27 15:03:31 -07:00
chenyu
db0e19fbd7 docs: minor update to quickstart (#4746)
update import, point to docs more, remove mention that assign to index not supported
2024-05-27 17:46:36 -04:00
wozeparrot
4cb38a15a5 tweak docs style (#4745) 2024-05-27 14:32:09 -07:00
chenyu
5b323d77db docs: change nav structure and add repo_url (#4743)
tensor / function / dtype / nn under API now
2024-05-27 15:45:54 -04:00
nimlgen
50e95b8212 nv qmd sync (#4740)
* qmd sync

* better hcq

* mockgpu support chain qmd

* fix mockgpu & linter
2024-05-27 18:51:30 +03:00
qazal
c69929ee25 integrate ALU uoping (#4741) 2024-05-27 23:05:24 +08:00
qazal
0e69b22629 multireduce OptOps tests (start) (#4733)
* start

* full tests

* add skips

* unrelated

* notes
2024-05-27 12:21:33 +03:00
chenyu
0b58203cbe docs: fix nn rendering (#4736)
* docs: fix nn rendering

need to import nn at the start of the page. maybe there's a better way to move it on top?

* print that
2024-05-26 18:55:20 -04:00
wozeparrot
5c6af97436 nn docs (#4735) 2024-05-26 13:08:22 -07:00
qazal
c7b1d802f1 delete duplicate tests in test_linearizer (#4723)
* delete duplicate test

test_simplify_uop isn't needed

max works

* ci

* remove skip

* add skip back
2024-05-26 08:11:42 +03:00
nimlgen
c87b066b66 optimize nv sync (#4729)
* optimize nv sync

* sdma signal without wfi

* nv mockgpu support

* sep change
2024-05-25 23:10:41 +03:00
chenyu
8415b14978 pow cleanup part 3 (#4731)
fast pow for int or (int+0.5) const exponent. and more comments
2024-05-25 15:48:52 -04:00
Szymon Ożóg
de5c69c4c9 Unify test_dtype naming conventions (#4730) 2024-05-25 10:12:40 -04:00
chenyu
7e90026eb0 pow cleanup part 2 (#4727)
more cleanups and fix 0 ** 0
2024-05-25 07:17:40 -04:00
chenyu
85e57223bd pow cleanup part 1 (#4726)
use _broadcasted to convert 3 cases into 1. const simplification should be handled by const folding.
2024-05-25 03:24:10 -04:00
Szymon Ożóg
f7201b6852 Remove deprecated code (#4724) 2024-05-25 03:02:12 -04:00
wozeparrot
5f503226de finish tensor docs (#4722) 2024-05-24 15:57:43 -07:00
chenyu
edf27470c1 docs: fix stack and add dtype.DType (#4721) 2024-05-24 18:23:01 -04:00
chenyu
a16d2572a0 docs: clean up mentions of mlops (#4720) 2024-05-24 17:49:32 -04:00
chenyu
31358cbea5 change Tensor.stack to method (#4719) 2024-05-24 17:04:19 -04:00
chenyu
ba116ff630 docs: fix mnist type and fixed seed and loss (#4717) 2024-05-24 16:18:30 -04:00
Szymon Ożóg
212025b53c Int mulacc for ptx (#4680)
* IntMulacc

* don't mov const

* Don't do int mulacc on ocelot

* Workaround for ocelot

* Remove ocelot workaround

* Fix tests that merged into mulacc

* fix uop count after merging to mulacc
2024-05-24 15:20:48 -04:00
chenyu
0ac761716a docs: logo and favicon (#4716) 2024-05-24 14:33:12 -04:00
Szymon Ożóg
a4de81e9a6 Update ocelot version (#4715) 2024-05-24 14:32:53 -04:00
chenyu
a894209bf7 docs: add ConstType to dtypes, limit function to its member (#4714) 2024-05-24 14:22:34 -04:00
chenyu
a41701ce71 docs: elementwise ops (broadcasted) and update examples (#4713)
* docs: elementwise ops (broadcasted) and update examples

* fix where

* space
2024-05-24 13:19:21 -04:00
qazal
c170ddceaf fix commavq benchmark (#4712)
* fix _slice and assert explicit device

* with _slice
2024-05-24 19:40:57 +03:00