geohotstan
f8056a74d6
combine pad2d with pad ( #7677 )
* I have pad2d, I have pad, uuh~, pad2dpad~
* fix some small things
* strategically placed cast hack
* fix more
* fix more more
* tests
* periods
2024-11-14 17:56:02 +08:00
qazal
e84d089ef1
delete ReduceOps, only use REDUCE_AXIS ( #7667 )
2024-11-13 19:04:27 +08:00
qazal
e07d2d0966
skip TestBeamSearch.test_large_ast ( #7652 )
2024-11-12 20:52:22 +08:00
chenyu
035e39f900
remove copied is_dtype_supported from onnx [pr] ( #7646 )
2024-11-11 19:20:32 -05:00
Ahmed Harmouche
9c63c3d8ab
These casts should only happen if these are supported ( #7644 )
2024-11-12 07:56:50 +08:00
nimlgen
4d81b7952a
qcom match texture/sampler descriptors to OpenCL ( #7622 )
* qcom ioctl compare more regs
* bug fix
2024-11-11 21:56:51 +03:00
uuuvn
94a484542b
Hook memoryview via class instead of a function ( #7627 )
2024-11-11 09:07:06 +08:00
chenyu
e7b18cf5c0
fix load_worlds filter_novariable ( #7564 )
Filter based on "DEFINE_VAR" instead of "Variable". Also added a unit test to make sure the dataset includes image and variable kernels.
2024-11-05 16:06:39 -05:00
chenyu
207bca6cea
set PAGE_SIZE=1 and generate new dataset ( #7559 )
13080 rows in total. Both generating and loading this are pretty broken now; the filters, for example, are wrong.
2024-11-05 11:25:01 -05:00
chenyu
7581a57aac
show the actual dataset size in error message ( #7557 )
2024-11-05 09:16:30 -05:00
chenyu
0db5f52b2a
check datasets/sops.gz size to be > 5000 ( #7555 )
It has > 12000 rows now, but the count depends on the backend that generates them, so this sets a lower but meaningful threshold.
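A minimal sketch of such a size check, assuming the gzipped file stores one serialized kernel AST per line. `check_dataset_size` is a hypothetical helper, not the test added in this commit; it reports the actual row count on failure, in the spirit of #7557.

```python
import gzip

def check_dataset_size(path: str, min_rows: int = 5000) -> int:
    # assumption: one serialized kernel AST per line in the gzipped file
    with gzip.open(path, "rt", encoding="utf-8") as f:
        rows = [line for line in f.read().strip().split("\n") if line]
    # include the actual count so a failing run shows how short the dataset is
    assert len(rows) > min_rows, f"{path} has {len(rows)} rows, expected > {min_rows}"
    return len(rows)
```

The threshold sits well below the current row count, so a backend that produces somewhat fewer kernels still passes while an empty or truncated file fails.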
2024-11-05 09:03:19 -05:00
chenyu
e641bbc859
safe softmax trick in MCTS ucb_explored_children ( #7515 )
* safe softmax trick in MCTS ucb_explored_children
Fixes
```
File "numpy/random/mtrand.pyx", line 971, in numpy.random.mtrand.RandomState.choice
ValueError: probabilities contain NaN
```
which was raised when all ucb_explored_children were large negative numbers, resulting in all-NaN probabilities.
* better type
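The fix is the standard max-subtraction form of softmax. A minimal numpy sketch, assuming the children's UCB scores arrive as a 1-D float array; `safe_softmax` and the example scores are hypothetical and not taken from the MCTS code.

```python
import numpy as np

def safe_softmax(x: np.ndarray) -> np.ndarray:
    # softmax is shift-invariant, so subtracting the max changes nothing
    # mathematically; it makes the largest exponent exp(0) = 1, so the
    # denominator can never underflow to 0 and produce 0/0 = NaN
    z = np.exp(x - np.max(x))
    return z / z.sum()

# hypothetical UCB scores that are all large negative numbers
scores = np.array([-1000.0, -1001.0, -1002.0])

with np.errstate(invalid="ignore"):
    # naive softmax: every exp(-1000...) underflows to 0.0, giving 0/0 = NaN
    naive = np.exp(scores) / np.exp(scores).sum()

probs = safe_softmax(scores)
# with finite probabilities, sampling a child no longer raises ValueError
child = np.random.default_rng(0).choice(len(scores), p=probs)
```

Because at least one shifted exponent equals 1, the denominator is always at least 1, so the probabilities stay finite regardless of how negative the scores are.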
2024-11-03 15:59:31 -05:00
George Hotz
c8bf09b7d4
s/UOps/Ops ( #7500 )
* s/UOps/Ops [pr]
* fix
2024-11-03 11:26:10 +08:00
chenyu
fb694a63eb
Tensor.erf ( #7419 )
The same implementation as the one used in onnx and the one in bert.
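The commit does not say which approximation this is. Assuming it is the common Abramowitz and Stegun formula 7.1.26 (an assumption, not confirmed by the commit), a scalar Python sketch looks like this; `erf_approx` is a hypothetical name.

```python
import math

# Abramowitz & Stegun 7.1.26 constants (assumed to match the approximation meant here)
P = 0.3275911
A = (0.254829592, -0.284496736, 1.421413741, -1.453152027, 1.061405429)

def erf_approx(x: float) -> float:
    # the formula is stated for x >= 0; use erf(-x) = -erf(x) for negatives
    sign = 1.0 if x >= 0 else -1.0
    x = abs(x)
    t = 1.0 / (1.0 + P * x)
    # Horner evaluation of a1*t + a2*t^2 + a3*t^3 + a4*t^4 + a5*t^5
    poly = 0.0
    for a in reversed(A):
        poly = (poly + a) * t
    return sign * (1.0 - poly * math.exp(-x * x))
```

The absolute error of 7.1.26 is at most about 1.5e-7, which is adequate for float32 activations such as GELU in bert.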
2024-10-30 18:12:28 -04:00
eliotgolding
e920f1d663
Llama 3.2 1B load from GGUF ( #7295 )
* gguf 1b-instruct
* not needed
2024-10-27 09:29:02 +08:00
nimlgen
68cd2c0669
nv correct local memory based on device ( #7307 )
* nv correct local memory based on device
* linter
* oops
* oops2
2024-10-25 22:23:42 +03:00
nimlgen
ea11382087
nv fix shared_memory_size ( #7239 )
2024-10-23 21:59:47 +03:00
qazal
aeeb917b6e
mask out writable bufs in runtime access_resources ( #7234 )
2024-10-23 16:13:50 +03:00
George Hotz
b0a13896d7
PtrDType is dataclass [pr] ( #7125 )
* PtrDType is dataclass [pr]
* new dataset
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-10-18 09:40:33 -04:00
nimlgen
45db7d9045
fuzz qcom vs opencl ( #7130 )
* fuzz qcom vs opencl
* fix nv
* better?
* typo
* open both devs
2024-10-17 18:49:08 +03:00
George Hotz
3169cb386d
remove graph [pr] ( #7085 )
2024-10-16 11:40:07 +08:00
nimlgen
b025495e5c
fuzz nv vs cuda ( #7066 )
* fuzz nv vs cuda
* fixes
* smth
* um
* cmp the same
* dnrt
* correct gpfifo scan
* fix
2024-10-15 22:22:40 +03:00
qazal
8ff6514ba3
delete extra/ops.py [pr] ( #7072 )
2024-10-15 22:14:21 +03:00
nimlgen
586ff4c910
nv record uvm mappings ( #7059 )
* nv record uvm mappings
* linteeer
* smth
* ooops
2024-10-15 00:12:49 +03:00
nimlgen
8094340221
nv print info about faults ( #7057 )
* nv print info about faults
* unrelated changes
* nv_gpu.GT200_DEBUGGER in mockgpu
* regen with correct version
* spacing
2024-10-14 21:49:38 +03:00
chenyu
bd8ecf7fd6
remove NumNode ( #7035 )
2024-10-13 16:42:19 -04:00
chenyu
c4c806a210
generate new kernel dataset ( #7034 )
* generate new kernel dataset
Prerequisite for removing NumNode:
```
extra/optimization/generate_dataset.sh
gzip -k /tmp/sops
mv /tmp/sops.gz extra/datasets/
```
* fix var range in fuzz_linearizer
2024-10-13 16:19:41 -04:00
qazal
13846930cd
hotfix: extract_dataset.py ( #7029 )
2024-10-13 11:18:23 +03:00
George Hotz
a71bb09ec3
remove symbolic file [pr] ( #7012 )
2024-10-12 18:44:44 +08:00
George Hotz
5ae2de9845
UOp.variable ( #7010 )
* UOp.variable [pr]
* fix tests
* clean
* improve name rendering
* last bug
2024-10-12 18:20:44 +08:00
qazal
20d3c2d113
unify UOps.SHAPETRACKER and UOps.SWIZZLE with UOps.VIEW ( #6955 )
* add UOps.VIEW
* update hardcoded asts
* update sops.gz
2024-10-09 02:00:17 +08:00
Tobias Fischer
f9e32f2bb2
clip device fix ( #6924 )
2024-10-07 00:47:32 +08:00
chenyu
01a2d7316d
dtype=float in bert log_softmax for loss and accuracy ( #6916 )
2024-10-06 11:15:56 -04:00
George Hotz
4df5c7a4ef
move lazy to engine [pr] ( #6886 )
* move lazy to engine [pr]
* engine.lazy
2024-10-04 23:19:26 +08:00
George Hotz
8ca506ee37
remove the magic methods for moving between devices [pr] ( #6881 )
* remove the magic methods for moving between devices [pr]
* remove unneeded clang
2024-10-04 20:27:52 +08:00
chenyu
7c8849010a
fix var_vals in MCTS ( #6882 )
Tested with JITBEAM=100 on llama.
2024-10-04 08:19:35 -04:00
George Hotz
a0cb16ac61
node cleanup + local metal test speed [pr] ( #6880 )
* node cleanup [pr]
* fix tests, including the double one on metal
* no time tqdm tests
2024-10-04 18:14:23 +08:00
George Hotz
cdff1d75b6
things that are only used in one place don't belong in helpers [pr] ( #6878 )
* things that are only used in one place don't belong in helpers [pr]
* pretty print moved
2024-10-04 17:27:38 +08:00
George Hotz
f4ec39fe58
switch symbolic from old to uops, final PR ( #6872 )
* switch symbolic from old to uops, final PR
* two wrong answers
* not needed resolves
* symbolic ops passes
* symbolic ops passes
* progress
* tests pass (almost)
* fix last test
* fix some tests
* global binding and unbinding
* Revert "global binding and unbinding"
This reverts commit 9456725630.
* that test works now
* vars on uop doesn't recurse
* fix fuzzer
* update
* fix type
* fix gpt, it's UOp now
* ssimplify symbolics
2024-10-04 16:42:27 +08:00
chenyu
c3c93f332a
symbolic bool raise ValueError when not sure [pr] ( #6853 )
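Here "not sure" means the truth value of a comparison depends on a variable's runtime value. A minimal sketch of the idea, assuming a variable carries only integer bounds; `Var` and `SymBool` are hypothetical classes, not tinygrad's code.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Var:
    # hypothetical symbolic integer known only to lie in [vmin, vmax]
    name: str
    vmin: int
    vmax: int

    def __lt__(self, other: int) -> "SymBool":
        # decide the comparison from the bounds when possible
        if self.vmax < other: return SymBool(True)
        if self.vmin >= other: return SymBool(False)
        return SymBool(None)  # depends on the runtime value

@dataclass(frozen=True)
class SymBool:
    value: Optional[bool]

    def __bool__(self) -> bool:
        # refuse to guess when the bounds cannot decide the comparison
        if self.value is None:
            raise ValueError("symbolic comparison cannot be resolved statically")
        return self.value
```

Raising instead of guessing forces callers to handle the undecidable case explicitly rather than silently taking one branch of an `if`.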
2024-10-02 09:10:58 -04:00
Tobias Fischer
33f7599158
Compute FID Score ( #6802 )
* compute fid score code
* cleaner s1 and m1 loading
2024-10-01 19:47:58 -04:00
Francis Lata
d3a387be63
[MLPerf] Prepare openimages dataset script ( #6747 )
* prepare openimages for MLPerf
* cleanup
* fix issue when clearing jit_cache on retinanet eval
* revert pandas specific changes
2024-09-27 11:13:56 -04:00
nimlgen
3c56aeee70
add Tensor.from_blob ( #6765 )
* draft tensor from pointer init
* some docs and types
* comment
* cleaner
* test
* malloc
* qcom cl interop
* jit example
* cleaner
* dealloc
* wording
* docs
2024-09-26 18:33:19 +08:00
chenyu
396c96357b
update mlperf bert scripts ( #6755 )
Removed DISABLE_DROPOUT=1.
Updated BS to 54, which works on tinyboxes with dropout enabled.
Used bert's sparse_categorical_crossentropy, which takes a Tensor ignore_index, in the accuracy method.
2024-09-25 23:55:05 -04:00
wozeparrot
4ebc9589a6
feat: make buffer ( #6745 )
2024-09-25 18:31:03 +08:00
nimlgen
56979aa3ed
qcom ioctl log levels ( #6735 )
2024-09-25 14:59:27 +08:00
wozeparrot
2be0b26a1f
rand only supports single device ( #6682 )
2024-09-24 16:07:44 +08:00
nimlgen
ca66b11e07
qcom fix disasm ( #6703 )
2024-09-24 15:23:43 +08:00
samm393
19c11792fd
Flux.1 ( #6334 )
* initial commit
* whitespace
* get rid of torch import
* indentation
* less hardcoding
* add flux.1-dev
* jit
* no double
* t5 tidy up
* validation image
* reuse sdxl autoencoder
* typing changes
* empty lines
* remove unneeded comments
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-09-24 10:08:04 +08:00
chenyu
31b9c74c77
tiny import cleanup and fix typo ( #6692 )
2024-09-23 21:48:23 -04:00