geohotstan
089eeec271
setitem in-place operator tests ( #4577 )
...
* tests and error
* rename to in-place
* add a note
* more comments
* more comments
* disable folded advanced setitem tests for now
2024-05-14 01:28:02 -04:00
chenyu
0fa57b8ce9
raise error if setitem tensors have requires_grad ( #4575 )
...
* raise error if setitem tensors have requires_grad
working on supporting this, first properly raises error
* NotImplementedError
2024-05-13 18:56:47 -04:00
Filip Brzek
f7d08bd454
feat: add acc_dtype to einsum ( #4571 )
2024-05-13 14:02:07 -04:00
Szymon Ożóg
d97d5a7689
Optimize PTX gated loads index calculation ( #4304 )
...
* WIP but working
* Cleanup
* Remove float4 pred and alt
* Cleanup
* this is somehow slowing it down
* Simplify
* add define var to ignore when optimizing gates
* Update assembly.py
* Test for optimizing gated loads
* Cleanup
* Fix NEG needed before if
* Remove unused parameters
* Update assembly.py
* Fix for cachable gone
---------
Co-authored-by: oz <oz@oz-MS-7B86.NAT.gliwice.vectranet.pl>
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-05-13 10:14:01 -07:00
qazal
77aa8659f5
use assign_targets in LazyOp creation ( #4568 )
...
* start
* correct error
* this is possible
* document it
2024-05-13 10:24:35 +03:00
qazal
b0fa97e176
assert error detail in test_assign ( #4567 )
...
* use regex assert
* that shouldnt raise
2024-05-13 09:56:05 +03:00
qazal
4e1135a0bc
assign buffer read/write tests ( #4565 )
...
* simple tests
* more tests
2024-05-13 09:43:36 +03:00
George Hotz
b660f60125
all uops are now cachable ( #4564 )
...
* all uops are now cachable
* cachable is gone
2024-05-12 22:34:35 -07:00
George Hotz
02327b8adf
simple stuff from new_uops branch ( #4563 )
2024-05-12 22:18:05 -07:00
ziereis
f53a23d21e
Test for optim assertion ( #4558 )
...
* add test for assertion
* whitespace
* restore state
---------
Co-authored-by: Thomas Ziereis <thomas.ziereis@web.de>
2024-05-12 14:21:28 -07:00
wozeparrot
d7670f8141
quantized llama multilazybuffer fix ( #4557 )
2024-05-12 14:19:21 -07:00
George Hotz
7a26bdac65
move scheduleitem to schedule.py ( #4541 )
...
* move scheduleitem to schedule.py
* don't need that type checking anymore
2024-05-11 21:13:04 -07:00
George Hotz
508e8a6666
add cpu objdump to LLVM/CLANG ( #4537 )
2024-05-11 14:28:44 -07:00
George Hotz
328b083e66
lil profiling script
2024-05-11 11:02:44 -07:00
qazal
2fb564c125
multi reduce linearizer tests start ( #4529 )
...
* test_end_local
* test_early_end_local
* todos
* mean+std
* skip no locals
2024-05-11 14:06:40 +03:00
qazal
3cba22920f
test_linearizer_correctness ( #4458 )
...
* test helper
* uops asserts
* cleanup args
* nits
2024-05-11 13:02:08 +03:00
qazal
b3d9fd48d0
infra for testing linearizer correctness ( #4528 )
...
* refactor outbufs
* delete helper
2024-05-11 12:10:33 +03:00
George Hotz
2f970a4fc2
all realize 2 ( #4527 )
...
* all realize 2
* tests fixup
* fix more tests
* fix openpilot
* fix tests
* unneeded
2024-05-10 22:43:09 -07:00
George Hotz
347a3acb37
add renderer class ( #4524 )
...
* add renderer class
* tests pass
* fix pylint
* fix tensor cores
2024-05-10 21:40:02 -07:00
chenyu
b00b6b16f0
fix TRAIN_BEAM and Tensor.training for mlperf bert ( #4525 )
...
also hard coded bert model config instead of looking up a file
2024-05-11 00:18:36 -04:00
chenyu
7fab8c9e17
add symbolic mean test cases in test_symbolic_ops and test_symbolic_jit ( #4523 )
...
* add symbolic mean test cases in test_symbolic_ops and test_symbolic_jit
2d symbolic mean in jit does not quite work, order of the variable inputs is not deterministic?
* skip
2024-05-10 23:19:55 -04:00
George Hotz
827058f030
update tests get_runner ( #4522 )
2024-05-10 20:09:22 -07:00
George Hotz
d438d5698d
bring buffer back to device ( #4517 )
2024-05-10 11:22:31 -07:00
George Hotz
4eef1ee9bf
move renderer into options ( #4514 )
...
* move renderer into options
* fix tests
* renders are functions
2024-05-10 10:01:51 -07:00
George Hotz
89e119bc58
move Allocator to buffer.py ( #4502 )
...
* move Allocator to buffer.py
* move those to realize
* memory file
* cleanup
2024-05-09 19:45:56 -07:00
George Hotz
1e843d495e
cleaning up search with Program ( #4500 )
...
* cleaning up search
* fix tests
* test fix
* minor compiler cleanup
2024-05-09 19:01:53 -07:00
chenyu
d3dc332c2e
Tensor.logsumexp ( #4442 )
...
the subtract max part should share with safe softmax
cleaner
2024-05-09 20:49:06 -04:00
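The "subtract max" step that d3dc332c2e says should be shared with safe softmax can be sketched in plain Python. This is a minimal illustration using only the standard library, not tinygrad's `Tensor.logsumexp`; the helper names `shifted_exp`, `logsumexp` and `softmax` are hypothetical.

```python
import math

def shifted_exp(xs):
    # Subtract the max before exponentiating so exp() never sees a large positive input.
    m = max(xs)
    return m, [math.exp(x - m) for x in xs]

def logsumexp(xs):
    # log(sum(exp(x))) == m + log(sum(exp(x - m)))
    m, e = shifted_exp(xs)
    return m + math.log(sum(e))

def softmax(xs):
    # Reuses the same shifted exponentials; the shift cancels in the ratio.
    _, e = shifted_exp(xs)
    s = sum(e)
    return [v / s for v in e]
```

With inputs such as `[1000.0, 1000.0]`, a direct `math.exp(1000.0)` raises `OverflowError`, while both helpers above stay finite.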
George Hotz
c9e84ed0da
refactor to Program class ( #4476 )
...
* refactor to Program class
* switch to Program
* fix tests
* smaller diff
* self.p
* more tests
* fix metal test
* tests
* fix openpilot
* move that to linearizer
* p.launchdims
2024-05-09 17:29:07 -07:00
nimlgen
a2e2ba380c
nv tune shmem size ( #4495 )
...
* nv tune shmem size
* compare them
* linter
* linter2
2024-05-10 00:35:01 +03:00
nimlgen
e14d5b6fd7
nv fix oob qmd ptr ( #4478 )
...
* nv fix oob qmd ptr
* test kernargs no oob
2024-05-08 23:11:04 +03:00
chenyu
36a1f38049
lazy folding: mul -1 is neg, and neg neg is noop ( #4472 )
2024-05-08 01:52:22 -04:00
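The two folding rules named in 36a1f38049 (mul by -1 is neg, and neg of neg is a no-op) can be illustrated on a toy expression tree. This is a sketch under assumptions: the tuple representation and the `fold` function are hypothetical and are not tinygrad's lazy graph.

```python
def fold(expr):
    # Leaves (variable names or numbers) are returned unchanged.
    if not isinstance(expr, tuple):
        return expr
    op, *args = expr
    # Fold children first so the rules can chain.
    args = [fold(a) for a in args]
    # Rule 1: x * -1 (or -1 * x) becomes neg(x).
    if op == "mul" and len(args) == 2:
        a, b = args
        if b == -1:
            op, args = "neg", [a]
        elif a == -1:
            op, args = "neg", [b]
    # Rule 2: neg(neg(x)) is just x.
    if op == "neg" and isinstance(args[0], tuple) and args[0][0] == "neg":
        return args[0][1]
    return (op, *args)
```

For example, `fold(("mul", ("mul", "x", -1), -1))` first rewrites the inner multiply to `("neg", "x")`, then the outer one to a second neg, which cancels.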
chenyu
c508eb7425
revert the removal of CAST_BEFORE_VIEW ( #4471 )
...
this brings most of the memory gain for resnet back.
2024-05-08 00:14:29 -04:00
chenyu
7eb035e7c5
stronger test case for half mean overflow ( #4470 )
2024-05-07 22:40:09 -04:00
chenyu
ca7300c783
fix half mean and its backward ( #4469 )
...
* fix half mean and its backward
cast to sum_acc_type, sum, div, then cast back
* mean dtype tests
2024-05-07 21:46:41 -04:00
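The order of operations described in ca7300c783 (cast to the accumulation type, sum, divide, then cast back) can be sketched with the standard library's half-precision packing. This is an illustration only: `to_half`, `naive_half_mean` and `mean_with_acc` are hypothetical names, and a Python float stands in for the wider accumulation dtype.

```python
import math
import struct

def to_half(x):
    # Round to IEEE 754 half precision; values too large for half become inf.
    try:
        return struct.unpack("e", struct.pack("e", x))[0]
    except OverflowError:
        return math.copysign(float("inf"), x)

def naive_half_mean(xs):
    # Accumulates in half precision, so the running sum can overflow.
    acc = 0.0
    for x in xs:
        acc = to_half(acc + to_half(x))
    return to_half(acc / len(xs))

def mean_with_acc(xs):
    # Cast up, sum, divide, and only then cast back to half.
    total = sum(float(to_half(x)) for x in xs)
    return to_half(total / len(xs))
```

With four values of 60000.0, the naive running sum exceeds the half-precision range after the second element, while the wider accumulator returns a finite mean.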
Francis Lam
7da1b41f38
fuzz_linearizer: add FUZZ_REQUIRE_TC option to require TC in opts ( #4468 )
...
useful for checking late opts after TC such as GROUP, etc.
2024-05-07 17:14:21 -04:00
chenyu
46a793111b
test for LazyBuffer._view when mask out and degrade into const ( #4465 )
...
changed the condition from all 0 in masked dims to any 0 in masked. it's no-op because shapetracker rewrites whole mask to 0 if any dim has 0 as part of canonicalization
2024-05-07 12:56:23 -04:00
nimlgen
a1d350a810
nv timeline semaphores ( #4464 )
...
* nv timeline semaphores
* nv hcq fixes
2024-05-07 17:31:19 +03:00
nimlgen
e3bb85fd0e
amd timeline semaphores ( #4416 )
...
* amd timeline semaphores
* v2
* fixes
* reset signals
* fix
* rollover test
* small fixes
* linter
* copyin
2024-05-07 11:17:32 +03:00
George Hotz
17faae091b
optimizer shouldn't be run without training ( #4460 )
...
* optimizer shouldn't be run without training
* set training in relevant tests
* fix multitensor
* that too
2024-05-06 15:34:12 -07:00
qazal
35dfbc6354
rand_for_dtype helper ( #4459 )
2024-05-07 00:03:42 +03:00
Francis Lam
47750e65fd
kernel: un-reverse the order of the local indices ( #4454 )
...
no change to performance or behavior. new LOCALS are added to the
left side of the LOCALS block (to the left of the first_reduce).
2024-05-06 15:21:27 -04:00
chenyu
5e036cd0b3
test unary and more reduces in test_flopcounter ( #4455 )
...
cannot really catch a spec change error without testing the new spec explicitly, but we don't intend to change the lazy spec lightly
another possible way to catch reduce flopcounter shape would be type checking InterpretedFlopCounter and throw error if `in` results in `Never`
2024-05-06 15:15:16 -04:00
nimlgen
d0b8862dea
fix out of resource kernels on nv ( #4450 )
...
* fix out of resource kernels on nv
* better comment
* noqa
* noqa 2
* linter
2024-05-06 19:24:20 +03:00
nimlgen
113c2f00b9
amd doorbell size is 64bits ( #4448 )
...
* amd doorbell size is 64bits
* add test
* test to pass 32bit boundary is more correct
* no need to round there
2024-05-06 16:59:59 +03:00
qazal
6dbe5585b0
batchnorm + conv backward in test_schedule ( #4420 )
...
* test both optims
* batchnorm_backward
2024-05-06 16:40:17 +03:00
chenyu
afe020710d
disable PADTO on upcasted axis ( #4444 )
...
fixed test_failure_31. PADTO upcasted is at best a no-op, and might fail at edge cases.
2024-05-05 21:52:03 -04:00
Francis Lam
c8595a9655
update sops.gz, fix tests and add new linearizer test ( #4437 )
...
* update sops.gz, fix tests and add new linearizer test
* remove METAL CI skip for test_failure_22
* re-add skip to METAL CI to test_failure_22
2024-05-05 17:31:25 -04:00
chenyu
d0eb1540d5
helpers.diskcache_clear ( #4436 )
...
drop all tables in diskcache. added a unit test but disabled it by default because it will drop all cache...
2024-05-05 14:19:01 -04:00
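"Drop all tables" in d0eb1540d5 can be pictured with a small SQLite sketch. This assumes a SQLite-backed cache and is not the code of `helpers.diskcache_clear`; the function name `clear_all_tables` is hypothetical.

```python
import sqlite3

def clear_all_tables(conn):
    # Collect user tables first, then drop them; SQLite's internal tables are skipped.
    cur = conn.cursor()
    rows = cur.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type='table' AND name NOT LIKE 'sqlite_%'"
    ).fetchall()
    names = [r[0] for r in rows]
    for name in names:
        cur.execute(f'DROP TABLE IF EXISTS "{name}"')
    conn.commit()
    return names
```

Because this removes every cached entry, it makes sense that the accompanying unit test is disabled by default, as the commit message notes.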
George Hotz
595a6e3069
test_fold_conv_relu_backward test
2024-05-05 11:13:43 -07:00
George Hotz
f95658bc3e
hotfix: pickle jit works if you delete the function
2024-05-05 10:14:03 -07:00