Draft: Partial vectorization for Hexagon DSP HVX

What does this implement/fix?

Additional optimization for vectorization using HVX on the Hexagon DSP. HVX uses 128-byte vector registers, so vectors and matrices whose sizes do not fill a full 32-element register (32 single-precision floats) incur significant overhead. With this change we want to enable shorter packets to use the HVX registers as well.

This change focuses on dynamic vectors and the operations on them. We allocate some extra memory for dynamic vectors so that loading a full HVX register never reads invalid memory.
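To illustrate the padding idea, here is a minimal, portable sketch in plain C++. It is not the code in this merge request and uses no HVX intrinsics: `padded_size`, `make_padded`, `add_full_register`, and `add_padded` are hypothetical names, and a scalar loop over 32 floats stands in for a full-register operation. It assumes 4-byte floats, so one 128-byte register holds 32 of them.

```cpp
#include <cstddef>
#include <vector>

// Assumption: one HVX register is 128 bytes, i.e. 32 floats of 4 bytes each.
constexpr std::size_t kFloatsPerRegister = 128 / sizeof(float);

// Round an element count up to the next multiple of the register width.
constexpr std::size_t padded_size(std::size_t n) {
  return (n + kFloatsPerRegister - 1) / kFloatsPerRegister * kFloatsPerRegister;
}

// Hypothetical helper: storage for n logical elements plus zero-filled
// padding, so the last full-register access stays inside the allocation.
std::vector<float> make_padded(std::size_t n, float value) {
  std::vector<float> v(padded_size(n), 0.0f);
  for (std::size_t i = 0; i < n; ++i) v[i] = value;
  return v;
}

// Scalar stand-in for one full-register add: always touches 32 floats.
void add_full_register(const float* a, const float* b, float* out) {
  for (std::size_t i = 0; i < kFloatsPerRegister; ++i) out[i] = a[i] + b[i];
}

// Adds n logical elements using only full-register steps. The tail step
// reads and writes into the padding, which is valid because every buffer
// is expected to hold padded_size(n) floats.
void add_padded(const float* a, const float* b, float* out, std::size_t n) {
  for (std::size_t i = 0; i < n; i += kFloatsPerRegister) {
    add_full_register(a + i, b + i, out + i);
  }
}
```

With 40 logical elements, each buffer holds 64 floats and the loop performs two full-register steps with no scalar remainder loop. The trade-off is up to 31 extra floats of storage per vector.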

This is our first contribution. Any suggestions for improvements and for aligning with the Eigen code base are very welcome. Thank you!

Additional information

This is in addition to the work from Cheng Wang and comes independently from Qualcomm.