Francis Lata
99efa2cfde
Merge branch 'master' into retinanet_mlperf
2024-11-18 04:42:57 -08:00
ignaciosica
597a239e28
Remove UnaryOps, BinaryOps, TernaryOps, MetaOps [pr] ( #7725 )
...
* remove unaryops
* remove ternaryops
* remove metaops
* hotfix
* remove binaryops
* hotfix: test_pattern_matcher
---------
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2024-11-16 20:56:56 +08:00
Francis Lata
a0c0a77f54
Merge branch 'master' into retinanet_mlperf
2024-11-13 21:30:12 -08:00
qazal
e84d089ef1
delete ReduceOps, only use REDUCE_AXIS ( #7667 )
2024-11-13 19:04:27 +08:00
chenyu
1884f021e3
add conv3x3 to speed_v_theoretical ( #7658 )
...
* add conv3x3 to speed_v_theoretical
* show test duration
2024-11-12 16:41:56 -05:00
chenyu
962dafb467
use randn in speed_v_theoretical instead of rand ( #7656 )
...
* use randn in speed_v_theoretical instead of rand
this made green gemv 20% faster... but why?
* update threshold
2024-11-12 15:00:32 -05:00
chenyu
6159790ab8
add gemv to speed_v_theoretical ( #7654 )
...
* add gemv to speed_v_theoretical
getting ~300GB/s if we just count the memory of inputs and output
* better green numbers
* flip
2024-11-12 11:19:35 -05:00
Francis Lata
0aad640465
Merge branch 'master' into retinanet_mlperf
2024-11-12 02:45:23 -08:00
chenyu
99f29e50b2
update speed_v_theoretical numbers ( #7647 )
...
better AMD numbers after setting the compute profile
2024-11-11 20:05:13 -05:00
chenyu
773d5b60bf
beam benchmark tests ( #7638 )
...
* beam benchmark tests
* lower AMD number somehow
* less flaky
2024-11-11 18:11:18 -05:00
nimlgen
4d81b7952a
qcom match texture/sampler descriptors to OpenCL ( #7622 )
...
* qcom ioctl compare more regs
* bug fix
2024-11-11 21:56:51 +03:00
Francis Lata
bf2dc3ae33
Merge branch 'master' into retinanet_mlperf
2024-11-09 17:00:30 -08:00
chenyu
8ca422e21a
script to compare kernel opt with BEAM ( #7604 )
...
interesting that on M1 Max hcopt beats BEAM=2 about 20% of the time
2024-11-08 17:40:28 -05:00
Harald Schäfer
e7cbc29f48
openpilot benchmark: add cast from numpy to benchmark ( #7593 )
...
* openpilot benchmark: add cast from numpy to benchmark
* whitespace
* comment
2024-11-08 19:31:00 +08:00
George Hotz
205befa788
move is_dtype_supported to device [pr] ( #7575 )
2024-11-07 20:38:03 +08:00
Carl Basho
630a7f37cf
update tests ( #7554 )
...
Co-authored-by: John Doe <null@mail.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-11-05 11:35:15 -05:00
chenyu
207bca6cea
set PAGE_SIZE=1 and generate new dataset ( #7559 )
...
13080 rows in total. both generating and loading this are pretty broken now. filters are wrong, for example
2024-11-05 11:25:01 -05:00
Francis Lata
bb6f27d2f3
Merge branch 'master' into retinanet_mlperf
2024-11-04 19:19:22 -08:00
George Hotz
99bd4372a5
Ops.ALU is no more, the arg is just an op ( #7525 )
...
* op arg alu [pr]
* more
* more passing
* fix more tests
* more tests passing
* fix single failing test
* so much cleaner
* noop to not have process replay trigger
* fix ptx
2024-11-05 00:22:22 +08:00
George Hotz
0c19b6298b
rename ops to have unique names ( #7522 )
2024-11-04 17:09:45 +08:00
George Hotz
c8bf09b7d4
s/UOps/Ops ( #7500 )
...
* s/UOps/Ops [pr]
* fix
2024-11-03 11:26:10 +08:00
qazal
e955aa1bee
hotfix: process replay ( #7418 )
2024-10-30 22:45:40 +02:00
George Hotz
4e2895f8d2
safe changes from new dtype branch [pr] ( #7397 )
...
* safe changes from new dtype branch [pr]
* only image test on GPU
2024-10-30 17:18:48 +08:00
qazal
51c0c8d27e
cachable small graph rewrite ( #7371 )
2024-10-29 22:28:13 +08:00
qazal
e46edc22aa
use unittest helpers in TestTensorMetadata [pr] ( #7329 )
...
* use unittest helpers in TestTensorMetadata [pr]
* fix that
* 5 args
2024-10-28 18:38:30 +08:00
qazal
8d9459f281
always run process replay with contextvars ( #7323 )
...
* always run process replay with contextvars [pr]
* not the last two
* extra
* no pr
2024-10-27 20:44:42 +02:00
Francis Lata
e5d37f26f6
Merge branch 'master' into retinanet_mlperf
2024-10-26 15:36:23 -07:00
nimlgen
293714610a
capture beam log runtime errors ( #7311 )
2024-10-26 13:59:45 +03:00
Francis Lata
8a5cbb14e4
Merge branch 'master' into retinanet_mlperf
2024-10-25 22:56:30 -07:00
Francis Lata
6e3efd4ed6
add validation set test
2024-10-25 22:55:49 -07:00
Francis Lata
2586555bd3
clean up reference dataset implementation + ruff changes
2024-10-25 22:13:48 -07:00
Francis Lata
1344871a15
add back normalization and negate it in test
2024-10-25 21:50:42 -07:00
Francis Lata
4b21a8fb8d
got dataloader with normalize working
2024-10-25 20:25:07 -07:00
qazal
d482d927a8
hotfix: nobody uses [run_process_replay] [pr] ( #7264 )
2024-10-24 13:37:29 +03:00
chenyu
f890d1cbbd
remove PUSH_PERMUTES from external_test_opt ( #7232 )
...
remove old comments and update kernel count for test_convnext
2024-10-23 00:11:34 -04:00
qazal
dae908299e
full_ast_rewrite api with ScheduleItemContext ( #7223 )
2024-10-22 23:17:05 +03:00
Francis Lata
967438ca71
Merge branch 'master' into retinanet_mlperf
2024-10-22 02:48:51 -07:00
Francis Lata
ec146da5cf
trim dataloader related code needed from ref
2024-10-22 02:48:11 -07:00
Francis Lata
d9d65b9537
cleanup dataloader test and revert shm path
2024-10-19 17:32:58 -07:00
chenyu
ea016b55d1
don't throw in fuzz_linearizer ( #7148 )
...
already broken on master and needs a fix. don't throw so other PRs aren't blocked
2024-10-18 09:28:30 -04:00
nimlgen
45db7d9045
fuzz qcom vs opencl ( #7130 )
...
* fuzz qcom vs opencl
* fix nv
* better?
* typo
* open both devs
2024-10-17 18:49:08 +03:00
George Hotz
ded1b38b84
minor dtype cleanup [pr] ( #7124 )
...
* minor dtype cleanup [pr]
* use ptr() function
2024-10-17 17:41:23 +08:00
Francis Lata
4bebe61a9c
add dataloader + test
2024-10-16 15:38:47 -04:00
Francis Lata
3d857d758e
Merge branch 'master' into retinanet_mlperf
2024-10-16 15:36:37 -04:00
nimlgen
39ab67e9ef
beam capture and replay in fuzz ( #7099 )
...
* beam capture and replay in fuzz
* clean a bit
2024-10-16 20:26:58 +03:00
Francis Lata
498141c579
Merge branch 'master' into retinanet_mlperf
2024-10-16 10:14:39 -04:00
qazal
40f33c110b
big graph var_vals as rewrite context ( #7007 )
...
* var_vals as rewrite context
* no default arg
* add st var_vals
* delete some stuff
* add the rewrite rule again
* extra
* this whole part is preschedule
* test with a second context
* redo
* i always forget tensor variable
2024-10-16 07:31:44 +03:00
qazal
390171d686
delete SAVE_SCHEDULE=1 [pr] ( #7087 )
2024-10-16 07:13:20 +03:00
George Hotz
3169cb386d
remove graph [pr] ( #7085 )
2024-10-16 11:40:07 +08:00
nimlgen
b025495e5c
fuzz nv vs cuda ( #7066 )
...
* fuzz nv vs cuda
* fixes
* smth
* um
* cmp the same
* dnrt
* correct gpfifo scan
* fix
2024-10-15 22:22:40 +03:00