mirror of
https://github.com/tinygrad/tinygrad.git
synced 2026-04-29 03:00:14 -04:00
* ane query is success * cite and build instructions * low level access, need to disable AMFI * coreml_ane works * coreml fun * more work * compiled example * progress * compiler works * model flow * TODOs in the readme * put some real weights in * we are learning objc * much progress i think * signed model still doesn't work * working example * there are float16 * clean up: part 1 * h11ane header, more cleanup * cleanup DeviceController creation * remove the stupid sleep * notes * start a hwx parser * no tabs * compare stuff * hmm, why don't inputs work * cache doesn't seem to fix it * hmm, the issue was the compiler * fix the compiler, guess i didn't put in weights * logging for compiler * uselessness in plist * remove hwx before compile, weights are converted to float16 * better compare * better compare * last line in comparE * opcodes from compiler * notes
kernel driver: AppleH11ANEInterface
requires entitlement: com.apple.ane.iokit-user-access
compiler is run in ANE_ProgramCreate_gated
2 helper processes:
/usr/libexec/aned
ANECompilerService
Espresso:
Contains ANECompilerEngine
AppleNeuralEngine: Objective-C interface called by Espresso
ANEServices: communication with the device
ANECompiler: compile plist into hwx file
com.apple.ANECompilerService.allow in AppleNeuralEngine?
Called from ANECompilerService.xpc in AppleNeuralEngine.framework
== Model Flow ==
Keras/ONNX model
|
| 1_build
| (coremltools, open source)
v
CoreML model
|
| TODO: automate this
| Grabbed plist from lldbing ANECompilerService during 1_build
| (Espresso)
v
net.plist
|
| 2_compile
| (AppleNeuralEngine, ANECompiler)
v
model.hwx
|
| 3_run
| (AppleNeuralEngine, ANEServices, AppleH11ANEInterface)
v
<run on neural engine>
TODO: Write a nice plist grabber
DONE: Write a call to the compiler with plist+weights
DONE: Write an hwx runner
== Zin Constants ==
kZinIrOpCodeConv = 0?
kZinIrOpCodePool = 1
kZinIrOpCodeElementWiseOp = 6
kZinIrOpCodeConcat = 8
kZinIrOpCodeFlattenComposite
kZinIrOpCodeNEConvOp
kZinIrOpCodeTranspose
# guessing from the hwx
kZinTensorFormatUInt8 = 0
kZinTensorFormatInt8 = 1
kZinTensorFormatFloat16 = 2
kZinTensorFormatInvalid
== plist exploration ==
Float16 -> UInt8 for output works, Float32 doesn't
Same for input
All weights must be float
Index 0: model.espresso.weights @ 192 is weights
Index 1: net.additional.weights @ 0 is bias
Float16 -> Float32 for bias works
It's possible the compiler is Float32 -> Float16 converting, and the engine only supports Float16 + UInt8
== call to the compiler (in dmesg!) ==
[54476.282258]: H11ANEIn: ANE_ProgramCreate_gated:, ZinComputeProgramMake, get Mcache size: 0x0
[54476.282259]: H11ANEIn: ANE_ProgramCreate_gated:,Program Identifier:ANEC v1
zin_ane_compiler v4.2.1
-t h13
--fdram-allocator=ffreuse
--fdram-tensor-priority=sizethenliverange
--fl2-allocator=ffreuse
--fl3-allocator=ffreuse
--fl2-cache-mode=resident
--fsignature=ident
--memcache-strategy=
[54476.282262]: --memcache-size=4194304
--fspatial-split=disabled
--fkernel-rewind=enabled
--Wl-undefined=fvmlib
-i /Library/Caches/com.apple.aned/tmp/Python/DB7E897E7F4D5D27501A998428B6D3863AFD96CEA82DAF2207A75394E6BAC44C/37C083FF396EB5948979EE20FD0457483E4ACE840AD23391A129BB83CFBC9C63/net.plist
-o /Library/Caches/com.apple.aned/20A2411/Python/C9981871BC59572E74AFA3014B183EA37567EE9A2A08328446CE4A2B754E109D/37C083FF396EB5948979EE20FD0457483E4ACE840AD23391A129BB83CFBC9C63/model.hwx.tmp
== ANECCompile (in ANECompiler framework) ==
ANECCompile(__CFDictionary *param_1, __CFDictionary *param_2, unsigned long param_3)
param_1:
{
InputNetworks = (
{
NetworkPlistName = "net.plist";
NetworkPlistPath = "/Library/Caches/com.apple.aned/tmp/run/A2ACB9D5AA31B301563A4F62885BA379E62B0E1240E95C6902A93900FE0A9B54/37C083FF396EB5948979EE20FD0457483E4ACE840AD23391A129BB83CFBC9C63/";
}
);
OutputFileName = "model.hwx.tmp";
OutputFilePath = "/Library/Caches/com.apple.aned/20A2411/run/E68910CD1994681121EEDAFAE1BC524AA8E84CF80C42AFC0C7DE2C082C58BDFD/37C083FF396EB5948979EE20FD0457483E4ACE840AD23391A129BB83CFBC9C63/";
}
param_2:
{
TargetArchitecture = h13;
}
== Backtrace of device open ==
* frame #0: 0x00000001a68fac54 ANEServices`H11ANEDeviceOpen
frame #1: 0x00000001a78405b8 AppleNeuralEngine`__29-[_ANEDeviceController start]_block_invoke + 436
frame #2: 0x0000000193c84420 libdispatch.dylib`_dispatch_client_callout + 20
frame #3: 0x0000000193c92a98 libdispatch.dylib`_dispatch_lane_barrier_sync_invoke_and_complete + 60
frame #4: 0x00000001a78403e8 AppleNeuralEngine`-[_ANEDeviceController start] + 136
...
frame #23: 0x00000001a64a4f38 Espresso`Espresso::ANERuntimeEngine::compiler::build_segment(std::__1::shared_ptr<Espresso::abstract_batch> const&, int, Espresso::net_compiler_segment_based::segment_t const&) + 2080
...
frame #31: 0x000000019ab6099c CoreML`-[MLNeuralNetworkEngine rebuildPlan:] + 1640
== Backtrace of run? ==
* frame #0: 0x00000001a68f9108 ANEServices`H11ANEProgramProcessRequestDirect
frame #1: 0x00000001a7839694 AppleNeuralEngine`-[_ANEProgramForEvaluation processRequest:qos:qIndex:modelStringID:options:error:] + 1904
frame #2: 0x00000001a7843ba4 AppleNeuralEngine`-[_ANEClient doEvaluateDirectWithModel:options:request:qos:error:] + 1236
frame #3: 0x00000001a7842034 AppleNeuralEngine`-[_ANEClient evaluateWithModel:options:request:qos:error:] + 104
frame #4: 0x00000001a64a2988 Espresso`Espresso::ANERuntimeEngine::compiler::__forward_segment(std::__1::shared_ptr<Espresso::abstract_batch> const&, int, Espresso::net_compiler_segment_based::segment_t const&) + 2008
frame #5: 0x00000001a6414548 Espresso`Espresso::net_compiler_segment_based::__forward(std::__1::shared_ptr<Espresso::abstract_batch> const&) + 992
frame #6: 0x00000001a67e2e3c Espresso`EspressoLight::espresso_plan::dispatch_task_on_compute_batch(std::__1::shared_ptr<Espresso::abstract_batch> const&, std::__1::shared_ptr<EspressoLight::plan_task_t> const&) + 612
frame #7: 0x00000001a67ebab0 Espresso`EspressoLight::espresso_plan::execute_sync() + 356
frame #8: 0x00000001a67f26fc Espresso`espresso_plan_execute_sync + 120
frame #9: 0x000000019ab674b8 CoreML`-[MLNeuralNetworkEngine executePlan:error:] + 136
frame #10: 0x000000019ab6799c CoreML`-[MLNeuralNetworkEngine evaluateInputs:bufferIndex:options:error:] + 368