Commit Graph

4108 Commits

Author SHA1 Message Date
George Hotz
b7e281cf10 JitItem -> ExecItem (#4146)
* JitItem -> ExecItem

* execitem in realize

* cleaner

* JITRunner -> Runner
2024-04-11 08:24:57 -07:00
George Hotz
e79a11b99c hotfix: revert llama change 2024-04-10 20:13:15 -07:00
George Hotz
2e6c39b0b2 Do less realizes (#4141)
* less realize

* corealize jit inputs

* prints

* print before we run
2024-04-10 19:50:50 -07:00
chenyu
06bcae13b4 PADTO SUM if parents of sum are all zero-preserving (#4140)
* PADTO SUM if parents of sum are all zero-preserving

* test case unsafe ops after sum is fine

* reuse UNSAFE_PAD_OPS

* update db version
2024-04-10 22:16:12 -04:00
George Hotz
081dd1573f hotfix: keep CUDA D2D copy behind the CUDA_P2P flag 2024-04-10 21:36:48 +00:00
George Hotz
af5984df43 cudagraph memcpy through host (#4137) 2024-04-10 13:17:17 -07:00
terafo
5e6d2155e4 Add driving monitoring model to benchmarks (#4134)
* add driving monitoring model to benchmarks

* handle crash
2024-04-10 14:27:03 -04:00
chenyu
bf3583f9b2 use Buffer.ensure_allocated in search _ensure_buffer_alloc (#4132) 2024-04-10 13:11:50 -04:00
George Hotz
a35375df85 run_schedule is so simple now (#4130) 2024-04-10 09:49:30 -07:00
George Hotz
86bd2eb500 hotfix: update copy_from_fd for new DiskBuffer 2024-04-10 15:41:06 +00:00
George Hotz
ee457a4b20 no more underlying diskbuffer, that's just the device (#4129) 2024-04-10 08:32:25 -07:00
geohotstan
fe88591890 update onnx to 1.16.0 (#4127)
* update

* pass tests and skip tests
2024-04-10 11:19:13 -04:00
chenyu
6bbbeb93ac skip a few clang test that took > 30 seconds in CI (#4126)
* skip slow CLANG test test_train_cifar

* skip those too

* and that

* only CI

* one more
2024-04-10 02:00:34 -04:00
George Hotz
08ddeb5685 create schedule has global vars (#4125)
* abstractions3 is currently wishful thinking

* create_schedule_with_vars
2024-04-09 21:42:16 -07:00
George Hotz
216eb235e5 hotfix: cast mnist to float 2024-04-09 19:30:03 -07:00
George Hotz
fea774f669 spend 5 lines to bring mnist into the repo (#4122) 2024-04-09 19:24:57 -07:00
qazal
42edae8935 pickle schedules (#4114)
* pickle schedules

* Update test_pickle.py

* Update test_pickle.py

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-04-09 13:47:25 -07:00
Felix Kuehling
38ae4194a6 Fixes for ops_kfd (#4105)
* kfd_ops: Fix GPU node discovery on NUMA systems

Ignore potentially multiple CPU NUMA nodes and any GPU nodes that are
not accessible because of device cgroups.

Signed-off-by: Felix Kuehling <felix.kuehling@amd.com>

* kfd_ops: Format the GFX arch target name correctly

The target version in sysfs properties is a decimal representation with
two digits per component.

The format for LLVM GFX target names is a bit quirky for historical
reasons. It uses one digit for the minor version and stepping. When it
ran out of decimal digits for the stepping on gfx90X it started using
hexadecimal there. But the major version is still decimal and went
double digit in GFX10.

Make sure to parse and format it accordingly for all supported GPUs.

Signed-off-by: Felix Kuehling <felix.kuehling@amd.com>

---------

Signed-off-by: Felix Kuehling <felix.kuehling@amd.com>
2024-04-09 13:21:21 -07:00
George Hotz
10dbf90b2c hotfix: test speed 2024-04-09 13:20:39 -07:00
George Hotz
ae849d12d7 numpy device + pickle it (#4120) 2024-04-09 13:19:30 -07:00
chenyu
1ef9c50fd7 Update ssa input order and annotate types in cstyle and assembly (#4117)
variable prefix is never optional (removed the default "t") and UOp can be optional (added the default None).
2024-04-09 13:10:29 -04:00
geohotstan
15f2f39658 conceptually simpler fancy index (#3335)
* init

* add failed case

* fix: temp comment out MULACC cast

* is this right?

* add test case

* oops, forgot to get rid of temp test

* WOOOOOO TOOK OUT 2 TRANSPOSES IN GATHER YAY

* cleaner

* comment cleanup

* update docs

* resolve conflict

* oops

* SUPA FAST

* comment out a test

* del some print statements

* use new broadcast stuff

* more clean up

* move try except

* skip fancy indexing for python backend test_ops
2024-04-09 11:18:04 -04:00
David González Martínez
980124a605 add lerp operation to tensor (#4102)
* feat: add lerp operation to tensor

* fix

* style: fit in one line:

* tests: test backward for lerp
2024-04-08 17:03:27 -07:00
Francis Lam
46850a0269 search: add a BEAM_COMPARE env to optionally not compare to hc/tc (#4107)
* search: add a BEAM_COMPARE env to optionally not compare to hc/tc

setting BEAM_COMPARE=0 will prevent additional memory allocation
needed to do the timing tests assuming the BEAM result is in
the diskcache.

* change to optionally use Buffer.allocate
2024-04-08 18:54:01 -04:00
qazal
c390828f61 refactor outbufs (#4112) 2024-04-08 14:54:10 -07:00
andresgit
7fd12aba85 graph remove input buffer references (#4100)
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-04-08 16:49:16 -04:00
chenyu
078d841479 add SPLIT_REDUCEOP to disable reduce split (#4115)
verify with `SPLIT_REDUCEOP=0 BIG=2 MPS=1 python3 -m pytest -rA test/test_speed_v_torch.py -k sum`. 10X slower on mac
2024-04-08 16:31:08 -04:00
qazal
eea42d864f account for all outputs (#4113) 2024-04-08 10:04:19 -07:00
chenyu
dbd39ab78a setitem support setting python const (#4111) 2024-04-08 11:37:50 -04:00
chenyu
f8dc82a8a7 use single tensor for llama kv chache (#4108)
similar to optimization in gpt2
2024-04-08 00:38:32 -04:00
chenyu
92c0675ccf setitem initial support (#4093)
* wip setitem

it's an eager assign to output shapetracker view

* cleanups and tests

* more cleanups
2024-04-07 20:35:22 -04:00
geohotstan
183708b3fd broadcast expand to match torch (#4085)
* initial version

* heh gimme grrrreen

* version 2

* clean ups

* some test confusion

* fix onnx

* rename to _broadcast_tensors

* improved errors and test

* fixed?

* some test fixup

* version 3 lol

* comments

* cleaner

* add failure test for expand to 0 test

* 1 more assertRaises test

* make err msg better

* also rewrite the expand onnx op? :s
2024-04-07 16:23:13 -04:00
uuuvn
2b81d9b334 Fix broken test (#4104) 2024-04-07 12:02:12 -04:00
chenyu
9a95d87366 metal CI run llama with 4 shards (#4103)
this can catch multi tensor issue on mac.
2024-04-07 11:04:08 -04:00
George Hotz
444d2a7487 hotfix: fix SDMA read_pointer_address in KFD 2024-04-07 13:13:15 +00:00
uuuvn
bb7567b365 Fix metal (#4101) 2024-04-07 05:21:19 -07:00
chenyu
bdbcac67f1 assign jit test case with other tensor as input (#4098)
hmm it works
2024-04-06 14:41:14 -04:00
George Hotz
e4a1858471 revert command queue (#4097) 2024-04-06 08:58:18 -07:00
George Hotz
97c402d69e use imagenet spawn (#4096) 2024-04-06 08:34:10 -07:00
George Hotz
fffd9b05f5 mock mnist data for imagenet trainer (#4095)
* mock mnist data for imagenet

* move print and test

* needed to reshape
2024-04-06 08:08:40 -07:00
George Hotz
8739d33fe9 kfd: disable copy_from_fd while debugging (#4091)
* kfd: disable copy_from_fd while debugging

* increase timeout to a minute
2024-04-05 18:02:58 -07:00
George Hotz
93824e59eb support MOCKDATA=1 for resnet (#4090)
* mockdata for resnet

* fix eval, revert hsa
2024-04-05 17:19:18 -07:00
George Hotz
164329a8ea address kfd feedback (#4087)
* address kfd feedback

* signals cleanup

* signals cleanup

* handle 2 doorbell pages correctly

* signal reset cleanup

* signals cleanup

* more GTT

* cleanups

* minor cleanups
2024-04-05 15:24:41 -07:00
geohotstan
dafa42e864 clean up (#4081)
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-04-05 11:57:44 -04:00
Akshit Talwar
750ecf8fef replace slice by pad/shrink in _pool (#4082) 2024-04-05 11:47:22 -04:00
George Hotz
a337922c44 more work on kfd (#4079)
* more work on kfd

* fix multitensor test on kfd

* stuff
2024-04-05 08:36:36 -07:00
chenyu
e7ff5102cf failed test in test_pattern_matcher (#4080)
something about the PTX rewrite is incorrect that it has duplicated rewritten uops
2024-04-05 02:53:50 -04:00
chenyu
a023a1ed87 update github action to actions/cache@v4 (#4077)
get rid of warning `Node.js 16 actions are deprecated. Please update the following actions to use Node.js 20: actions/cache@v3.`
2024-04-04 22:24:26 -04:00
George Hotz
28ec6c67be hotfix: hlb_cifar KFD works 2024-04-05 02:19:14 +00:00
chenyu
1de9778949 import Buffer and BufferOption from tinygrad.buffer (#4076) 2024-04-04 22:12:23 -04:00