Draft: Partial vectorization for Hexagon DSP HVX
What does this implement/fix?
Additional optimization for vectorization using HVX on the Hexagon DSP. HVX uses 128-byte vector registers, so vectors and matrices whose sizes are not a multiple of 32 elements (one full register of floats) incur noticeable overhead. This change enables shorter (partial) packets to be used with the HVX registers as well.
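To make the packet arithmetic concrete, here is a minimal scalar sketch of how a float vector divides into full 32-lane packets plus one partial tail. It is an illustration only: `kPacketSize`, `PacketSplit`, `split_into_packets` and `store_partial` are hypothetical names, not Eigen or HVX API, and the partial store is emulated with a plain loop rather than HVX intrinsics.

```cpp
#include <cassert>
#include <cstddef>

static_assert(sizeof(float) == 4, "sketch assumes 32-bit floats");

// Assumption: one 128-byte HVX register holds 32 floats.
constexpr std::size_t kPacketSize = 128 / sizeof(float);

// Hypothetical helper type: how a vector of n floats divides into packets.
struct PacketSplit {
  std::size_t full_packets;  // number of complete 32-lane packets
  std::size_t tail;          // leftover elements for one partial packet
};

inline PacketSplit split_into_packets(std::size_t n) {
  return {n / kPacketSize, n % kPacketSize};
}

// Scalar stand-in for a partial store: writes only the first n lanes
// and leaves everything after to[n - 1] untouched.
inline void store_partial(float* to, const float* lanes, std::size_t n) {
  assert(n <= kPacketSize && "partial store cannot exceed one packet");
  for (std::size_t i = 0; i < n; ++i) {
    to[i] = lanes[i];
  }
}
```

With this split, a vector of 70 floats needs two full packets and a 6-element partial packet instead of a 6-iteration scalar loop.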
This change focuses on dynamic vectors and operations on the dynamic vectors. We add some extra memory to the dynamic vectors to avoid reading invalid memory when reading a full HVX register.
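The padding idea can be sketched as follows. This is an assumption-laden illustration, not the code in this MR: `kRegisterBytes`, `padded_aligned_malloc` and `padded_aligned_free` are hypothetical names, and the layout (offset stored in the byte before the aligned pointer) is just one common way to build an aligned allocator on top of `malloc`. One register width of slack covers the alignment shift, and a second one guarantees that a full 128-byte read starting anywhere inside the payload stays within the allocation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Assumption: one HVX register is 128 bytes wide.
constexpr std::size_t kRegisterBytes = 128;

// Hypothetical allocator: returns a 128-byte-aligned block of `size` bytes
// followed by at least kRegisterBytes of readable padding.
inline void* padded_aligned_malloc(std::size_t size) {
  // One register of slack for alignment, one for the readable tail.
  void* original = std::malloc(size + 2 * kRegisterBytes);
  if (original == nullptr) return nullptr;

  const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(original);
  // offset is in [1, 128], so it always fits in a single byte.
  const std::size_t offset = kRegisterBytes - (addr & (kRegisterBytes - 1));
  unsigned char* aligned = static_cast<unsigned char*>(original) + offset;
  // Remember the shift so the original pointer can be recovered on free.
  aligned[-1] = static_cast<unsigned char>(offset);
  return aligned;
}

inline void padded_aligned_free(void* ptr) {
  if (ptr == nullptr) return;
  unsigned char* aligned = static_cast<unsigned char*>(ptr);
  std::free(aligned - aligned[-1]);
}
```

The trade-off is up to 256 extra bytes per allocation in this sketch, which matters most for very small dynamic vectors.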
This is a first contribution for us. Any suggestions for improvements and aligning with the Eigen code base are very welcome. Thank you!
Additional information
This is in addition to the work from Cheng Wang and comes independently from Qualcomm.
Activity
   HVX_store_partial<unpacket_traits<Packet8f>::size, 0>(to, from.Get());
 }
+template <>
+EIGEN_STRONG_INLINE void pstoreu_partial<float>(float* to, const Packet32f& from, const Index n, const Index offset) {
+  const Index packet_size = unpacket_traits<Packet32f>::size;
+  eigen_assert(n <= packet_size && "number of elements plus offset will write past end of packet");

   check_that_malloc_is_allowed();
   EIGEN_USING_STD(malloc)
-  void* original = malloc(size + alignment);
+  void* original = malloc(size + 2*alignment);

       dstIsAligned ? 0 : internal::first_aligned<requestedAlignment>(kernel.dstDataPtr(), size);
   const Index alignedEnd = alignedStart + ((size - alignedStart) / packetSize) * packetSize;
+#ifdef EIGEN_VECTORIZE_PARTIAL

Hi @cantonios, I work with Gerhard at Qualcomm and am looking into the docker setup for libeigen testing on hexagon. I am coordinating with another team inside Qualcomm on this and I think it would be useful if we can get on a short 30min call to discuss the logistics of the installation of tools needed to perform hexagon testing. Please share some time slots next week if you are ok with a call. You can reach me at abisain@qti.qualcomm.com
@cantonios, the Docker image we had in mind is in !1634 (closed). Please take a look. It requires one-time creation of a Qualcomm Developer account and some click-through agreements in order to install the Hexagon SDK into a base testing image. This Dockerfile is re-distributable, but the resulting Docker image would not be. To reduce the overhead of generating the image each time, you could re-use the image privately in your CI system as long as the image is not publicly distributed.
Edited by Bardia Behabadi
This has some common elements in it.
requested review from @rmlarsen1
requested review from @cantonios
requested review from @chuckyschluz
mentioned in merge request !1695
I closed this MR because it is superseded by !1695.