Francis Lata
ce61be16f1
clean up how preprocessed folder is defined ( #5813 )
2024-07-30 12:35:26 -04:00
chenyu
471b188d79
fix mypy errors in latest mypy ( #5794 )
* fix mypy errors in latest mypy
mypy has stricter partial and api arg checks now
* PYTHONPATH="."
2024-07-29 14:53:30 -04:00
nimlgen
ea27ec4cd0
nv switch classlist_v2 to classlist ( #5763 )
* nv switch classlist_v2 to classlist
* support in mockgpu
* fix mockgpu
2024-07-28 20:24:42 +03:00
chenyu
3686b6726a
move GraphException to jit.py ( #5744 )
same place where GraphRunner is defined
2024-07-26 19:01:12 -04:00
George Hotz
489a5b99a5
hotfix: triton_nv_matmul touchups
2024-07-24 23:24:29 +00:00
George Hotz
bf24be4c8c
triton gets 163 TFLOPS on 4090
2024-07-24 18:32:29 +00:00
George Hotz
4d47968580
fix acc folding for NV tensor cores ( #5658 )
* fix acc folding for NV tensor cores
* fix correctness of reduce_before_expand
2024-07-23 13:03:02 -07:00
nimlgen
08a9c0ae5e
hcq cache invalidation for beam ( #5630 )
* nv full cache invalidation
* the same command on amd
* linter
* fix amd
* nv no hardcoded consts
* beam default
2024-07-22 18:13:17 +03:00
George Hotz
6c6d74d922
parallel mcts ( #5626 )
* start work on parallel mcts
* compile was linearizing twice
* typing + more early stopping
* fix compiler error
2024-07-21 14:53:23 -07:00
George Hotz
ef179087a4
mcts exit condition wasn't right, also use it with BEAM>=100 ( #5619 )
* mcts exit condition wasn't right, also use it with BEAM>=100
* mcts touchups
* clean up sample
2024-07-21 10:16:47 -07:00
George Hotz
0f67ef4674
mcts graph and dedup support ( #5618 )
* mcts graph and dedup support
* usable graph
* mcts colors
* C=4 seems better
* C=3 even better
* sample_tree
* backprop is external function
* late expand to match algo
2024-07-20 23:29:14 -07:00
chenyu
eddc5bcfd7
MCTS tweaks ( #5616 )
MCTS 500 is competitive with BEAM=8 on resnet on M1 Max.
- increment trial times even with compiled error and runtime error.
- use best time of children as the node value.
2024-07-20 19:45:59 -07:00
George Hotz
1113e47f96
print best in MCTS + light up the winner in hcopt
2024-07-20 09:39:36 -07:00
George Hotz
ac99ecd94e
use statistics.median for timing ( #5606 )
2024-07-20 08:37:32 -07:00
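A minimal sketch of the idea behind the commit above, not tinygrad's actual code: `median_time` is a hypothetical helper that runs a callable several times and reports the median wall-clock time. The median is less sensitive than the mean to a single slow outlier run.

```python
import statistics
import time

def median_time(fn, runs=5):
    # Hypothetical helper (an assumption, not a tinygrad function):
    # time `fn` `runs` times and return the median elapsed seconds.
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)

# One outlier (50.0) dominates the mean but does not move the median.
samples = [1.0, 1.1, 50.0]
median_value = statistics.median(samples)
mean_value = statistics.mean(samples)
```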
George Hotz
06e336bccb
mcts search ( #5598 )
* mcts search
* mcts cleanups
* mcts cleanup
* random shuffle children order
* mcts in handcode_opt
* src and remove_node
* debug 3 to print ast
* print the type
* mcts in extra
2024-07-19 21:38:39 -07:00
Tobias Fischer
72da3fe7e6
added clip vision model ( #5595 )
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-07-19 18:35:51 -04:00
George Hotz
fa7e734b49
MetaOps.KERNEL ( #5543 )
2024-07-17 19:41:23 -07:00
Francis Lam
2d53abb04a
test/external/fuzz_linearizer: fix for new AST changes ( #5519 )
* test/external/fuzz_linearizer: fix for new AST changes
also add beautiful_mnist failures
* add CLANG and LLVM to test_failure_35 failed_platforms
* fix test_linearizer_failure names
2024-07-17 00:08:07 -04:00
Tobias Fischer
85d4ca7caa
FID Inception Model ( #5516 )
* added model impl
* minor cleanups
* extracted weights loading into from_pretrained
* reorganized model for better weight loading
* removed lru cache for state dict loading
2024-07-16 23:12:03 -04:00
chenyu
28972418c4
s/get_linearizer/get_kernel [run_process_replay] ( #5467 )
2024-07-13 20:32:22 -04:00
George Hotz
03c2dc8bd7
lowerer is kernel [run_process_replay] ( #5437 )
2024-07-12 18:50:55 -07:00
chenyu
00813a92a0
update Tensor.eye api to match torch ( #5433 )
* update Tensor.eye api to match torch
input is n for nrows and optional m for ncols
* space
* fix onnx
2024-07-12 20:25:12 -04:00
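The argument convention described in the commit above (`n` for the number of rows, optional `m` for the number of columns) can be sketched in plain Python. This `eye` is a hypothetical stand-in built on nested lists, not the tinygrad `Tensor.eye` implementation; it only illustrates the torch-style signature.

```python
def eye(n, m=None):
    # Hypothetical illustration: n rows, m columns (m defaults to n),
    # with ones on the main diagonal and zeros elsewhere.
    m = n if m is None else m
    return [[1.0 if r == c else 0.0 for c in range(m)] for r in range(n)]

square = eye(2)     # 2x2 identity
wide = eye(2, 3)    # 2 rows, 3 columns
```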
George Hotz
870dc8c350
s/Linearizer/Lowerer [run_process_replay] ( #5428 )
2024-07-12 15:54:07 -07:00
George Hotz
6707c778d0
scheduleitem is not Tuple [run_process_replay] ( #5425 )
* scheduleitem is not Tuple [run_process_replay]
* fix tests
* fix op + fuzzers
* fix mop test
2024-07-12 15:13:19 -07:00
George Hotz
94599c0637
fixup ast in kernel to be MetaOps.SINK [run_process_replay] ( #5424 )
* fixup ast in kernel to be MetaOps.SINK [run_process_replay]
* fix tests
* fix more tests
2024-07-12 14:01:03 -07:00
uuuvn
3cb94a0a15
Rename tinygrad/runtime/driver to support ( #5413 )
2024-07-12 11:06:42 -07:00
wozeparrot
a02b38c0ac
download openimages by running it ( #5396 )
2024-07-11 16:06:13 -07:00
wozeparrot
fa873df9c1
bring tinychat more inline with tinyos' version ( #5358 )
2024-07-10 13:13:52 -07:00
George Hotz
c13da83f12
tests from lowerer branch ( #5339 )
* tests from lowerer branch
* Update test_image_dtype.py
* Update test_image_dtype.py
* Update test_image_dtype.py
2024-07-08 21:23:19 -07:00
nimlgen
51d6f372e4
nv get classes based on device ( #5325 )
* nv get classes
* support in mockgpu
* choose sm based on gpu
* fix
* fix
* fix arch
2024-07-08 18:25:05 +03:00
Tobias Fischer
0c3a35e5c2
Stable Diffusion v2 Inference ( #5283 )
* model implementation
* clip fix, more qol options
2024-07-03 22:47:10 -04:00
chenyu
b2c3a28a5e
nn.RMSNorm ( #5272 )
the norm itself has no significant value to add to Tensor method, but we would want Tensor.normalize
2024-07-02 21:39:01 -04:00
Tobias Fischer
8c9c1cf62f
Pulled CLIP and UNet into Seperate Files ( #5253 )
* pulled clip and unet into seperate files
* reference cleanup, lru cache fix
* better pool indexing
2024-07-01 22:33:01 -04:00
nimlgen
57e89645cd
hcq spec test ( #5226 )
* start hcq spec test
* more test
* fixes
* run on amd as well
* test amdgpu exec
* fix amd
* amd mockgpu support sdma timestamp
2024-07-01 17:36:37 +03:00
George Hotz
14980f79dd
hotfix: unbreak llama
2024-06-30 15:27:54 -07:00
George Hotz
3df47bc21e
OpenELM + repeat_interleave ( #5234 )
* start writing openelm
* progress...hit bug
* repeat_interleave support
* gqa
* add rotary embedding
* spp
* i think it runs correctly
* broken
* output is good now
* cleanups
* no io_uring on android
2024-06-30 15:18:39 -07:00
nimlgen
dd7eef7d71
libc defs to autogen ( #5217 )
* libc defs to autogen
* amd import libc
* linter
* better a bit
* remove comment, check this
* not hardcoded path
2024-06-29 14:37:33 +03:00
qazal
3e56c8422c
remu err handling ( #5208 )
* add error handling
* use pre release
* minor
* works
2024-06-28 13:15:18 +03:00
reddyn12
f1c7944c44
Fix batchnorm shapes for resnet.load_pretrained ( #5167 )
* Fix batchnorm shapes
* make it general reshape
2024-06-26 18:44:10 -04:00
nimlgen
69f116a7e1
nv/amd profiler ( #4718 )
* nv/amd profiler
* fix
* fix
* profile copies
* profile logger
* fixes
* more fixes
* less lines and fixes
* fixes
* some linter
* back sync, no related change
* fix gpu2cpu time def
* simpler
* linter
* linter
* docs
* add add_event api
2024-06-23 17:10:12 +03:00
chenyu
e356807696
tinytqdm.set_description and tinytrange ( #5101 )
2024-06-22 14:45:06 -04:00
chenyu
8080298739
s/tinytqdm/tqdm ( #5103 )
except in unit test where tqdm is imported
2024-06-22 14:18:26 -04:00
chenyu
e468601226
update llama attention casting ( #5096 )
* update llama attention casting
updated scaled_dot_product_attention middle cast and removed hard-coded half in llama attention.
* fix that
2024-06-22 10:57:17 -04:00
chenyu
8bd6cb9511
update llama model RMSNorm casting ( #5095 )
following the original implementation, cast back to input dtype before multiplying weight. slightly faster
https://github.com/meta-llama/llama/blob/main/llama/model.py
2024-06-21 23:02:04 -04:00
chenyu
0c857ae2d6
some onnx_ops cleanups ( #5094 )
2024-06-21 22:01:32 -04:00
nimlgen
fb1bf48cfe
io_uring for copies from disk ( #5035 )
* exp uring
* fixes and old version
* nv
* cleaner
* cmp vs aio
* fix
* no lib
* fix nv
* linter
* disk_speed_test now runs default
* fixes
* uring -> io_uring
* linter happy
* get_temp_buf comment added
* tiny nits
* put wait back
* test runs everywhere
* remove consts
* remove mmap consts
* do not require iouring to run test, they are generic
2024-06-21 11:36:51 +03:00
chenyu
f6d6760f71
don't cast tuple to list before creating Tensor ( #5071 )
Tensor constructor supports creating from tuple now
2024-06-20 13:32:56 -04:00
chenyu
e2c5054bdd
update resnet.load_from_pretrained ( #5040 )
2024-06-18 16:29:22 -04:00
chenyu
a3ed4176c8
use tinytqdm in active tests and examples ( #5038 )
* use tinytqdm in active tests and examples
stress test this before 0.9.1
* no set_description
2024-06-18 16:01:19 -04:00
Junjun Dong
c8cd6e725c
Remove BinaryOps.SUB. Replace SUB by ADD and NEG in all tests. Regenerate dataset ( #4977 )
* feat: remove BinaryOps.SUB
* remove SUB in test_early_end_local
* regenerate dataset. remove SUB in test_linearizer_*
* reenable overflow tests
* simplify tensor.sub function by returning a+(-b)
* remove whitespaces
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-06-18 09:06:13 -04:00