Files
tinygrad/ane
George Hotz 1dcaecacc4 Support for Apple Neural Engine (#130)
* ane query is success

* cite and build instructions

* low level access, need to disable AMFI

* coreml_ane works

* coreml fun

* more work

* compiled example

* progress

* compiler works

* model flow

* TODOs in the readme

* put some real weights in

* we are learning objc

* much progress i think

* signed model still doesn't work

* working example

* there are float16

* clean up: part 1

* h11ane header, more cleanup

* cleanup DeviceController creation

* remove the stupid sleep

* notes

* start a hwx parser

* no tabs

* compare stuff

* hmm, why don't inputs work

* cache doesn't seem to fix it

* hmm, the issue was the compiler

* fix the compiler, guess i didn't put in weights

* logging for compiler

* uselessness in plist

* remove hwx before compile, weights are converted to float16

* better compare

* better compare

* last line in comparE

* opcodes from compiler

* notes
2020-12-03 10:32:26 -08:00
..
2020-12-03 10:32:26 -08:00

kernel driver: AppleH11ANEInterface
  requires entitlement: com.apple.ane.iokit-user-access
	compiler is run in ANE_ProgramCreate_gated

2 helper processes:
  /usr/libexec/aned
  ANECompilerService

Espresso:
	Contains ANECompilerEngine

AppleNeuralEngine: Objective-C interface called by Espresso
  ANEServices: communication with the device
  ANECompiler: compile plist into hwx file
		com.apple.ANECompilerService.allow in AppleNeuralEngine?
		Called from ANECompilerService.xpc in AppleNeuralEngine.framework

== Model Flow ==

    Keras/ONNX model
          |
          |   1_build
          | (coremltools, open source)
          v
      CoreML model
          |
          |   TODO: automate this
          |   Grabbed plist from lldbing ANECompilerService during 1_build
          | (Espresso)
          v
       net.plist
          |
          |   2_compile
          | (AppleNeuralEngine, ANECompiler)
          v
       model.hwx
          |
          |   3_run
          | (AppleNeuralEngine, ANEServices, AppleH11ANEInterface)
          v
 <run on neural engine>

TODO: Write a nice plist grabber
DONE: Write a call to the compiler with plist+weights
DONE: Write an hwx runner

== Zin Constants ==

kZinIrOpCodeConv = 0?
kZinIrOpCodePool = 1
kZinIrOpCodeElementWiseOp = 6
kZinIrOpCodeConcat = 8
kZinIrOpCodeFlattenComposite
kZinIrOpCodeNEConvOp
kZinIrOpCodeTranspose

# guessing from the hwx
kZinTensorFormatUInt8 = 0
kZinTensorFormatInt8 = 1
kZinTensorFormatFloat16 = 2
kZinTensorFormatInvalid

== plist exploration ==

Float16 -> UInt8 for output works, Float32 doesn't
Same for input
All weights must be float

Index 0: model.espresso.weights @ 192 is weights
Index 1: net.additional.weights @ 0 is bias

Float16 -> Float32 for bias works

It's possible the compiler is Float32 -> Float16 converting, and the engine only supports Float16 + UInt8

== call to the compiler (in dmesg!) ==

[54476.282258]: H11ANEIn: ANE_ProgramCreate_gated:, ZinComputeProgramMake, get Mcache size: 0x0
[54476.282259]: H11ANEIn: ANE_ProgramCreate_gated:,Program Identifier:ANEC v1
zin_ane_compiler v4.2.1
	-t h13
	--fdram-allocator=ffreuse
	--fdram-tensor-priority=sizethenliverange
	--fl2-allocator=ffreuse
	--fl3-allocator=ffreuse
	--fl2-cache-mode=resident
	--fsignature=ident
	--memcache-strategy=
[54476.282262]: 	--memcache-size=4194304
	--fspatial-split=disabled
	--fkernel-rewind=enabled
	--Wl-undefined=fvmlib
	-i /Library/Caches/com.apple.aned/tmp/Python/DB7E897E7F4D5D27501A998428B6D3863AFD96CEA82DAF2207A75394E6BAC44C/37C083FF396EB5948979EE20FD0457483E4ACE840AD23391A129BB83CFBC9C63/net.plist
	-o /Library/Caches/com.apple.aned/20A2411/Python/C9981871BC59572E74AFA3014B183EA37567EE9A2A08328446CE4A2B754E109D/37C083FF396EB5948979EE20FD0457483E4ACE840AD23391A129BB83CFBC9C63/model.hwx.tmp

== ANECCompile (in ANECompiler framework) ==
  ANECCompile(__CFDictionary *param_1, __CFDictionary *param_2, unsigned long param_3)

param_1:
{   
    InputNetworks =     (
                {
            NetworkPlistName = "net.plist";
            NetworkPlistPath = "/Library/Caches/com.apple.aned/tmp/run/A2ACB9D5AA31B301563A4F62885BA379E62B0E1240E95C6902A93900FE0A9B54/37C083FF396EB5948979EE20FD0457483E4ACE840AD23391A129BB83CFBC9C63/";
        }
    );
    OutputFileName = "model.hwx.tmp";
    OutputFilePath = "/Library/Caches/com.apple.aned/20A2411/run/E68910CD1994681121EEDAFAE1BC524AA8E84CF80C42AFC0C7DE2C082C58BDFD/37C083FF396EB5948979EE20FD0457483E4ACE840AD23391A129BB83CFBC9C63/";
}

param_2:
{
    TargetArchitecture = h13;
}

== Backtrace of device open ==

  * frame #0: 0x00000001a68fac54 ANEServices`H11ANEDeviceOpen
    frame #1: 0x00000001a78405b8 AppleNeuralEngine`__29-[_ANEDeviceController start]_block_invoke + 436
    frame #2: 0x0000000193c84420 libdispatch.dylib`_dispatch_client_callout + 20
    frame #3: 0x0000000193c92a98 libdispatch.dylib`_dispatch_lane_barrier_sync_invoke_and_complete + 60
    frame #4: 0x00000001a78403e8 AppleNeuralEngine`-[_ANEDeviceController start] + 136
    ...
    frame #23: 0x00000001a64a4f38 Espresso`Espresso::ANERuntimeEngine::compiler::build_segment(std::__1::shared_ptr<Espresso::abstract_batch> const&, int, Espresso::net_compiler_segment_based::segment_t const&) + 2080
    ...
    frame #31: 0x000000019ab6099c CoreML`-[MLNeuralNetworkEngine rebuildPlan:] + 1640

== Backtrace of run? ==

  * frame #0: 0x00000001a68f9108 ANEServices`H11ANEProgramProcessRequestDirect
    frame #1: 0x00000001a7839694 AppleNeuralEngine`-[_ANEProgramForEvaluation processRequest:qos:qIndex:modelStringID:options:error:] + 1904
    frame #2: 0x00000001a7843ba4 AppleNeuralEngine`-[_ANEClient doEvaluateDirectWithModel:options:request:qos:error:] + 1236
    frame #3: 0x00000001a7842034 AppleNeuralEngine`-[_ANEClient evaluateWithModel:options:request:qos:error:] + 104
    frame #4: 0x00000001a64a2988 Espresso`Espresso::ANERuntimeEngine::compiler::__forward_segment(std::__1::shared_ptr<Espresso::abstract_batch> const&, int, Espresso::net_compiler_segment_based::segment_t const&) + 2008
    frame #5: 0x00000001a6414548 Espresso`Espresso::net_compiler_segment_based::__forward(std::__1::shared_ptr<Espresso::abstract_batch> const&) + 992
    frame #6: 0x00000001a67e2e3c Espresso`EspressoLight::espresso_plan::dispatch_task_on_compute_batch(std::__1::shared_ptr<Espresso::abstract_batch> const&, std::__1::shared_ptr<EspressoLight::plan_task_t> const&) + 612
    frame #7: 0x00000001a67ebab0 Espresso`EspressoLight::espresso_plan::execute_sync() + 356
    frame #8: 0x00000001a67f26fc Espresso`espresso_plan_execute_sync + 120
    frame #9: 0x000000019ab674b8 CoreML`-[MLNeuralNetworkEngine executePlan:error:] + 136
    frame #10: 0x000000019ab6799c CoreML`-[MLNeuralNetworkEngine evaluateInputs:bufferIndex:options:error:] + 368