The previous implementation called `wait_cloned()` on parent blocks during
computation, which triggered recursive `compute_trie_data` calls. For long
chains (e.g., 90 blocks in P2P sync tests), the recursion grew as deep as
the chain, which could lead to timeouts or stack exhaustion.
The new implementation:
1. Traverses the parent chain iteratively, collecting each ancestor's unsorted data directly
2. Uses a Ready ancestor's `trie_input` as the base when one is found (O(1) shortcut)
3. Sorts pending ancestors' data itself (no recursive `wait_cloned()`)
4. Builds the cumulative overlay from the base + sorted ancestor data + the current block's data
This eliminates deep recursion while maintaining correctness.
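
The four steps above can be sketched roughly as follows. This is an illustrative model only, not the actual code: `Block`, `TrieState`, and `build_overlay` are hypothetical names, key/value pairs are plain `u64`s, and a `BTreeMap` stands in for the sorted trie input. The real types and sorting logic are assumed to be more involved.

```rust
use std::collections::BTreeMap;
use std::sync::Arc;

/// Hypothetical stand-in for a block's trie data state.
enum TrieState {
    /// Cumulative, sorted overlay covering this block and all of its ancestors.
    Ready(BTreeMap<u64, u64>),
    /// Only this block's own unsorted updates are available so far.
    Pending(Vec<(u64, u64)>),
}

/// Hypothetical in-memory block with a link to its parent.
struct Block {
    parent: Option<Arc<Block>>,
    state: TrieState,
}

/// Builds the cumulative overlay for a block whose own updates are `current`,
/// walking the parent chain with a loop instead of recursing into ancestors.
fn build_overlay(parent: Option<&Arc<Block>>, current: &[(u64, u64)]) -> BTreeMap<u64, u64> {
    let mut pending: Vec<&[(u64, u64)]> = Vec::new();
    let mut base = BTreeMap::new();

    // Steps 1 and 2: iterate towards the root, stopping at the first Ready ancestor.
    let mut cursor = parent;
    while let Some(block) = cursor {
        match &block.state {
            TrieState::Ready(cumulative) => {
                base = cumulative.clone();
                break;
            }
            TrieState::Pending(unsorted) => pending.push(unsorted.as_slice()),
        }
        cursor = block.parent.as_ref();
    }

    // Step 3: apply pending ancestors oldest-first; inserting into the
    // BTreeMap keeps the result sorted and lets newer blocks override older ones.
    let mut overlay = base;
    for updates in pending.into_iter().rev() {
        overlay.extend(updates.iter().copied());
    }

    // Step 4: the current block's data is applied last.
    overlay.extend(current.iter().copied());
    overlay
}
```

In this sketch the stack depth stays constant regardless of chain length, since the only per-ancestor state is one slice reference in `pending`.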