876 Commits

Author SHA1 Message Date
Francis Lata
99efa2cfde Merge branch 'master' into retinanet_mlperf 2024-11-18 04:42:57 -08:00
geohotstan
72a41095bc add Tensor.meshgrid (#7714)
* initial implementation and test

* some other places that can use meshgrid

* revert the onnx_ops change

* add to docs

* revert interpolate too

* update

* improve edge case test

* might as well test grad

* add to test can improve docs

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-11-16 23:06:47 -05:00
ignaciosica
597a239e28 Remove UnaryOps, BinaryOps, TernaryOps, MetaOps [pr] (#7725)
* remove unaryops

* remove ternaryops

* remove metaops

* hotfix

* remove binaryops

* hotfix: test_pattern_matcher

---------

Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2024-11-16 20:56:56 +08:00
chenyu
d736ae7153 example script to show BasicTransformerBlock speed regression (#7724) 2024-11-15 15:48:25 -05:00
geohotstan
f8056a74d6 combine pad2d with pad (#7677)
* I have pad2d, I have pad, uuh~, pad2dpad~

* fix some small things

* strategically placed cast hack

* fix more

* fix more more

* tests

* periods
2024-11-14 17:56:02 +08:00
Francis Lata
a0c0a77f54 Merge branch 'master' into retinanet_mlperf 2024-11-13 21:30:12 -08:00
qazal
e84d089ef1 delete ReduceOps, only use REDUCE_AXIS (#7667) 2024-11-13 19:04:27 +08:00
qazal
e07d2d0966 skip TestBeamSearch.test_large_ast (#7652) 2024-11-12 20:52:22 +08:00
Francis Lata
0aad640465 Merge branch 'master' into retinanet_mlperf 2024-11-12 02:45:23 -08:00
chenyu
035e39f900 remove copied is_dtype_supported from onnx [pr] (#7646) 2024-11-11 19:20:32 -05:00
Ahmed Harmouche
9c63c3d8ab These casts should only happen if these are supported (#7644) 2024-11-12 07:56:50 +08:00
nimlgen
4d81b7952a qcom match texture/sampler descriptors to OpenCL (#7622)
* qcom ioctl compare more regs

* bug fix
2024-11-11 21:56:51 +03:00
uuuvn
94a484542b Hook memoryview via class instead of a function (#7627) 2024-11-11 09:07:06 +08:00
Francis Lata
185e055dc8 add LambdaLR support 2024-11-09 19:45:14 -08:00
Francis Lata
bf2dc3ae33 Merge branch 'master' into retinanet_mlperf 2024-11-09 17:00:30 -08:00
chenyu
e7b18cf5c0 fix load_worlds filter_novariable (#7564)
filter based on "DEFINE_VAR" instead of "Variable". also added a unit test to make sure dataset includes image and variable kernels
2024-11-05 16:06:39 -05:00
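A minimal sketch of the filter described in #7564, assuming each dataset row is a kernel AST serialized as one string. The function name mirrors the option named in the commit title, but the body and the sample rows are hypothetical and not copied from the repository:

```python
def filter_novariable(rows):
    """Keep only rows whose serialized AST has no symbolic variable.

    Assumption: a kernel that uses a variable contains the substring
    "DEFINE_VAR" somewhere in its string form.
    """
    return [row for row in rows if "DEFINE_VAR" not in row]


# Hypothetical rows, shortened for illustration only.
sample_rows = [
    "UOp(Ops.SINK, ... Ops.LOAD ...)",
    "UOp(Ops.SINK, ... Ops.DEFINE_VAR ...)",
]
kept = filter_novariable(sample_rows)
```

Matching on the op name rather than on a class name keeps the filter tied to what the serialized form actually contains.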
chenyu
207bca6cea set PAGE_SIZE=1 and generate new dataset (#7559)
13080 rows in total. both generating and loading this are pretty broken now. filters are wrong for example
2024-11-05 11:25:01 -05:00
chenyu
7581a57aac show the actual dataset size in error message (#7557) 2024-11-05 09:16:30 -05:00
chenyu
0db5f52b2a check datasets/sops.gz size to be > 5000 (#7555)
it has > 12000 rows now, but it depends on the backend that generates these so setting a lower but meaningful threshold
2024-11-05 09:03:19 -05:00
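The size check in #7555 can be sketched as below. This assumes the gzipped dataset stores one row per line; the helper names are hypothetical, and the error message includes the actual row count in the spirit of #7557:

```python
import gzip


def count_rows(path: str) -> int:
    """Count non-empty lines in a gzip-compressed text file."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return sum(1 for line in f if line.strip())


def check_dataset_size(path: str, min_rows: int = 5000) -> int:
    """Raise if the dataset does not have more than min_rows rows."""
    n = count_rows(path)
    if n <= min_rows:
        raise ValueError(f"{path} has {n} rows, expected more than {min_rows}")
    return n
```

The threshold sits well below the reported row count because, as the commit notes, the exact count depends on the backend that generated the dataset.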
Francis Lata
bb6f27d2f3 Merge branch 'master' into retinanet_mlperf 2024-11-04 19:19:22 -08:00
chenyu
e641bbc859 safe softmax trick in MCTS ucb_explored_children (#7515)
* safe softmax trick in MCTS ucb_explored_children

fixed
```
  File "numpy/random/mtrand.pyx", line 971, in numpy.random.mtrand.RandomState.choice
ValueError: probabilities contain NaN
```
when all ucb_explored_children are big negative numbers result in all NaN probabilities

* better type
2024-11-03 15:59:31 -05:00
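The safe softmax trick named in #7515 can be illustrated with a short standalone sketch. It is not the code from the commit; the function name and the sample scores are hypothetical. Subtracting the maximum before exponentiating guarantees that at least one term is exp(0) = 1, so the normalizer can never underflow to zero:

```python
import math


def safe_softmax(scores):
    """Softmax that stays finite even when every score is very negative."""
    m = max(scores)
    # After the shift the largest score becomes 0, so exp() of it is 1.
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)  # always >= 1.0
    return [e / total for e in exps]


# Without the shift, exp(-1000.0) underflows to 0.0 for every entry and
# the normalizing sum is 0, which is what produces NaN in array libraries.
probs = safe_softmax([-1000.0, -1001.0, -1002.0])
```

The shift does not change the result mathematically, because the common factor exp(-m) cancels between numerator and denominator.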
George Hotz
c8bf09b7d4 s/UOps/Ops (#7500)
* s/UOps/Ops [pr]

* fix
2024-11-03 11:26:10 +08:00
chenyu
fb694a63eb Tensor.erf (#7419)
the same one used in onnx and the one in bert.
2024-10-30 18:12:28 -04:00
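For reference, one widely used closed-form approximation of erf is Abramowitz and Stegun formula 7.1.26, sketched below in plain Python. Whether #7419 uses exactly this formula is an assumption, since the commit message only says it matches the version used in onnx and bert; the function name here is hypothetical:

```python
import math

# Constants from Abramowitz and Stegun, formula 7.1.26.
P = 0.3275911
A = (0.254829592, -0.284496736, 1.421413741, -1.453152027, 1.061405429)


def erf_approx(x: float) -> float:
    """Approximate erf(x); the formula is defined for x >= 0 and extended by odd symmetry."""
    sign = -1.0 if x < 0 else 1.0
    ax = abs(x)
    t = 1.0 / (1.0 + P * ax)
    # Horner evaluation of a1*t + a2*t^2 + a3*t^3 + a4*t^4 + a5*t^5.
    poly = 0.0
    for a in reversed(A):
        poly = (poly + a) * t
    return sign * (1.0 - poly * math.exp(-ax * ax))
```

The approximation uses only elementwise arithmetic and a single exponential, which makes it straightforward to express with tensor operations.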
eliotgolding
e920f1d663 Llama 3.2 1B load from GGUF (#7295)
* gguf 1b-instruct

* not needed
2024-10-27 09:29:02 +08:00
Francis Lata
8a5cbb14e4 Merge branch 'master' into retinanet_mlperf 2024-10-25 22:56:30 -07:00
Francis Lata
65c561a618 update image to be float32 2024-10-25 21:18:34 -07:00
Francis Lata
4b21a8fb8d got dataloader with normalize working 2024-10-25 20:25:07 -07:00
nimlgen
68cd2c0669 nv correct local memory based on device (#7307)
* nv correct local memory based on device

* linter

* oops

* oops2
2024-10-25 22:23:42 +03:00
nimlgen
ea11382087 nv fix shared_memory_size (#7239) 2024-10-23 21:59:47 +03:00
qazal
aeeb917b6e mask out writable bufs in runtime access_resources (#7234) 2024-10-23 16:13:50 +03:00
Francis Lata
967438ca71 Merge branch 'master' into retinanet_mlperf 2024-10-22 02:48:51 -07:00
George Hotz
b0a13896d7 PtrDType is dataclass [pr] (#7125)
* PtrDType is dataclass [pr]

* new dataset

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-10-18 09:40:33 -04:00
nimlgen
45db7d9045 fuzz qcom vs opencl (#7130)
* fuzz qcom vs opencl

* fix nv

* bettre?

* typo

* open both devs
2024-10-17 18:49:08 +03:00
Francis Lata
498141c579 Merge branch 'master' into retinanet_mlperf 2024-10-16 10:14:39 -04:00
George Hotz
3169cb386d remove graph [pr] (#7085) 2024-10-16 11:40:07 +08:00
nimlgen
b025495e5c fuzz nv vs cuda (#7066)
* fuzz nv vs cuda

* fixes

* smth

* um

* cmp the same

* dnrt

* correct gpfifo scan

* fix
2024-10-15 22:22:40 +03:00
qazal
8ff6514ba3 delete extra/ops.py [pr] (#7072) 2024-10-15 22:14:21 +03:00
nimlgen
586ff4c910 nv record uvm mappings (#7059)
* nv record uvm mappings

* linteeer

* smth

* ooops
2024-10-15 00:12:49 +03:00
nimlgen
8094340221 nv print info about faults (#7057)
* nv print info about faults

* unrelated changes

* nv_gpu.GT200_DEBUGGER in mockgpu

* regen with ocrrect version

* spacing
2024-10-14 21:49:38 +03:00
Francis Lata
2cb4f1d45f Merge branch 'master' into retinanet_mlperf 2024-10-14 12:15:12 -04:00
chenyu
bd8ecf7fd6 remove NumNode (#7035) 2024-10-13 16:42:19 -04:00
chenyu
c4c806a210 generate new kernel dataset (#7034)
* generate new kernel dataset

pre req to remove NumNode
```
extra/optimization/generate_dataset.sh
gzip -k /tmp/sops
mv /tmp/sops.gz extra/datasets/
```

* fix var range in fuzz_linearizer
2024-10-13 16:19:41 -04:00
qazal
13846930cd hotfix: extract_dataset.py (#7029) 2024-10-13 11:18:23 +03:00
Francis Lata
d5813a3c42 Merge branch 'master' into retinanet_mlperf 2024-10-12 22:04:58 -04:00
George Hotz
a71bb09ec3 remove symbolic file [pr] (#7012) 2024-10-12 18:44:44 +08:00
George Hotz
5ae2de9845 UOp.variable (#7010)
* UOp.variable [pr]

* fix tests

* clean

* improve name rendering

* last bug
2024-10-12 18:20:44 +08:00
Francis Lata
1295a3020f Merge branch 'master' into retinanet_mlperf 2024-10-11 23:08:17 -04:00
Francis Lata
b802f74cee add dataloader 2024-10-11 23:04:21 -04:00
qazal
20d3c2d113 unify UOps.SHAPETRACKER and UOps.SWIZZLE with UOps.VIEW (#6955)
* add UOps.VIEW

* update hardcoded asts

* update sops.gz
2024-10-09 02:00:17 +08:00
Tobias Fischer
f9e32f2bb2 clip device fix (#6924) 2024-10-07 00:47:32 +08:00