Commit Graph

1242 Commits

Author SHA1 Message Date
geohotstan
f8056a74d6 combine pad2d with pad (#7677)
* I have pad2d, I have pad, uuh~, pad2dpad~

* fix some small things

* strategically placed cast hack

* fix more

* fix more more

* tests

* periods
2024-11-14 17:56:02 +08:00
qazal
e84d089ef1 delete ReduceOps, only use REDUCE_AXIS (#7667) 2024-11-13 19:04:27 +08:00
qazal
e07d2d0966 skip TestBeamSearch.test_large_ast (#7652) 2024-11-12 20:52:22 +08:00
chenyu
035e39f900 remove copied is_dtype_supported from onnx [pr] (#7646) 2024-11-11 19:20:32 -05:00
Ahmed Harmouche
9c63c3d8ab These casts should only happen if these are supported (#7644) 2024-11-12 07:56:50 +08:00
nimlgen
4d81b7952a qcom match texture/sampler descriptors to OpenCL (#7622)
* qcom ioctl compare more regs

* bug fix
2024-11-11 21:56:51 +03:00
uuuvn
94a484542b Hook memoryview via class instead of a function (#7627) 2024-11-11 09:07:06 +08:00
chenyu
e7b18cf5c0 fix load_worlds filter_novariable (#7564)
filter based on "DEFINE_VAR" instead of "Variable". also added a unit test to make sure dataset includes image and variable kernels
2024-11-05 16:06:39 -05:00
chenyu
207bca6cea set PAGE_SIZE=1 and generate new dataset (#7559)
13080 rows in total. both generating and loading this are pretty broken now. filters are wrong for example
2024-11-05 11:25:01 -05:00
chenyu
7581a57aac show the actual dataset size in error message (#7557) 2024-11-05 09:16:30 -05:00
chenyu
0db5f52b2a check datasets/sops.gz size to be > 5000 (#7555)
it has > 12000 rows now, but it depends on the backend that generates these so setting a lower but meaningful threshold
2024-11-05 09:03:19 -05:00
chenyu
e641bbc859 safe softmax trick in MCTS ucb_explored_children (#7515)
* safe softmax trick in MCTS ucb_explored_children

fixed
```
  File "numpy/random/mtrand.pyx", line 971, in numpy.random.mtrand.RandomState.choice
ValueError: probabilities contain NaN
```
when all ucb_explored_children are big negative numbers result in all NaN probabilities

* better type
2024-11-03 15:59:31 -05:00
George Hotz
c8bf09b7d4 s/UOps/Ops (#7500)
* s/UOps/Ops [pr]

* fix
2024-11-03 11:26:10 +08:00
chenyu
fb694a63eb Tensor.erf (#7419)
the same one used in onnx and the one in bert.
2024-10-30 18:12:28 -04:00
eliotgolding
e920f1d663 Llama 3.2 1B load from GGUF (#7295)
* gguf 1b-instruct

* not needed
2024-10-27 09:29:02 +08:00
nimlgen
68cd2c0669 nv correct local memory based on device (#7307)
* nv correct local memory based on device

* linter

* oops

* oops2
2024-10-25 22:23:42 +03:00
nimlgen
ea11382087 nv fix shared_memory_size (#7239) 2024-10-23 21:59:47 +03:00
qazal
aeeb917b6e mask out writable bufs in runtime access_resources (#7234) 2024-10-23 16:13:50 +03:00
George Hotz
b0a13896d7 PtrDType is dataclass [pr] (#7125)
* PtrDType is dataclass [pr]

* new dataset

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-10-18 09:40:33 -04:00
nimlgen
45db7d9045 fuzz qcom vs opencl (#7130)
* fuzz qcom vs opencl

* fix nv

* bettre?

* typo

* open both devs
2024-10-17 18:49:08 +03:00
George Hotz
3169cb386d remove graph [pr] (#7085) 2024-10-16 11:40:07 +08:00
nimlgen
b025495e5c fuzz nv vs cuda (#7066)
* fuzz nv vs cuda

* fixes

* smth

* um

* cmp the same

* dnrt

* correct gpfifo scan

* fix
2024-10-15 22:22:40 +03:00
qazal
8ff6514ba3 delete extra/ops.py [pr] (#7072) 2024-10-15 22:14:21 +03:00
nimlgen
586ff4c910 nv record uvm mappings (#7059)
* nv record uvm mappings

* linteeer

* smth

* ooops
2024-10-15 00:12:49 +03:00
nimlgen
8094340221 nv print info about faults (#7057)
* nv print info about faults

* unrelated changes

* nv_gpu.GT200_DEBUGGER in mockgpu

* regen with ocrrect version

* spacing
2024-10-14 21:49:38 +03:00
chenyu
bd8ecf7fd6 remove NumNode (#7035) 2024-10-13 16:42:19 -04:00
chenyu
c4c806a210 generate new kernel dataset (#7034)
* generate new kernel dataset

pre req to remove NumNode
```
extra/optimization/generate_dataset.sh
gzip -k /tmp/sops
mv /tmp/sops.gz extra/datasets/
```

* fix var range in fuzz_linearizer
2024-10-13 16:19:41 -04:00
qazal
13846930cd hotfix: extract_dataset.py (#7029) 2024-10-13 11:18:23 +03:00
George Hotz
a71bb09ec3 remove symbolic file [pr] (#7012) 2024-10-12 18:44:44 +08:00
George Hotz
5ae2de9845 UOp.variable (#7010)
* UOp.variable [pr]

* fix tests

* clean

* improve name rendering

* last bug
2024-10-12 18:20:44 +08:00
qazal
20d3c2d113 unify UOps.SHAPETRACKER and UOps.SWIZZLE with UOps.VIEW (#6955)
* add UOps.VIEW

* update hardcoded asts

* update sops.gz
2024-10-09 02:00:17 +08:00
Tobias Fischer
f9e32f2bb2 clip device fix (#6924) 2024-10-07 00:47:32 +08:00
chenyu
01a2d7316d dtype=float in bert log_softmax for loss and accuracy (#6916) 2024-10-06 11:15:56 -04:00
George Hotz
4df5c7a4ef move lazy to engine [pr] (#6886)
* move lazy to engine [pr]

* engine.lazy
2024-10-04 23:19:26 +08:00
George Hotz
8ca506ee37 remove the magic methods for moving between devices [pr] (#6881)
* remove the magic methods for moving between devices [pr]

* remove unneeded clang
2024-10-04 20:27:52 +08:00
chenyu
7c8849010a fix var_vals in MCTS (#6882)
tested with JITBEAM=100 llama
2024-10-04 08:19:35 -04:00
George Hotz
a0cb16ac61 node cleanup + local metal test speed [pr] (#6880)
* node cleanup [pr]

* fix tests, including the double one on metal

* no time tqdm tests
2024-10-04 18:14:23 +08:00
George Hotz
cdff1d75b6 things that are only used in one place don't belong in helpers [pr] (#6878)
* things that are only used in one place don't belong in helpers [pr]

* pretty print moved
2024-10-04 17:27:38 +08:00
George Hotz
f4ec39fe58 switch symbolic from old to uops, final PR (#6872)
* switch symbolic from old to uops, final PR

* two wrong answers

* not needed resolves

* symbolic ops passes

* symbolic ops passes

* progress

* tests pass (almost)

* fix last test

* fix some tests

* global binding and unbinding

* Revert "global binding and unbinding"

This reverts commit 9456725630.

* that test works now

* vars on uop doesn't recurse

* fix fuzzer

* update

* fix type

* fix gpt, it's UOp now

* ssimplify symbolics
2024-10-04 16:42:27 +08:00
chenyu
c3c93f332a symbolic bool raise ValueError when not sure [pr] (#6853) 2024-10-02 09:10:58 -04:00
Tobias Fischer
33f7599158 Compute FID Score (#6802)
* compute fid score code

* cleaner s1 and m1 loading
2024-10-01 19:47:58 -04:00
Francis Lata
d3a387be63 [MLPerf] Prepare openimages dataset script (#6747)
* prepare openimages for MLPerf

* cleanup

* fix issue when clearing jit_cache on retinanet eval

* revert pandas specific changes
2024-09-27 11:13:56 -04:00
nimlgen
3c56aeee70 add Tensor.from_blob (#6765)
* draft tensor from pointer init

* some docs and types

* comment

* cleaner

* test

* malloc

* qcom cl interop

* jit example

* cleaner

* dealoc

* wording

* docs
2024-09-26 18:33:19 +08:00
chenyu
396c96357b update mlperf bert scripts (#6755)
removed DISABLE_DROPOUT=1.
updated BS to 54 that works on tinyboxes with dropouts.
used bert's sparse_categorical_crossentropy that takes Tensor ignore_index in accuracy method
2024-09-25 23:55:05 -04:00
wozeparrot
4ebc9589a6 feat: make buffer (#6745) 2024-09-25 18:31:03 +08:00
nimlgen
56979aa3ed qcom ioctl log levels (#6735) 2024-09-25 14:59:27 +08:00
wozeparrot
2be0b26a1f rand only supports single device (#6682) 2024-09-24 16:07:44 +08:00
nimlgen
ca66b11e07 qcom fix disasm (#6703) 2024-09-24 15:23:43 +08:00
samm393
19c11792fd Flux.1 (#6334)
* initial commit

* whitespace

* get rid of torch import

* indentation

* less hardcoding

* add flux.1-dev

* jit

* no double

* t5 tidy up

* validation image

* reuse sdxl autoencoder

* typing changes

* empty lines

* remove unneeded comments

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-09-24 10:08:04 +08:00
chenyu
31b9c74c77 tiny import cleanup and fix typo (#6692) 2024-09-23 21:48:23 -04:00