
Draft: Partial vectorization for Hexagon DSP HVX

What does this implement/fix?

This merge request adds a further vectorization optimization for HVX on the Hexagon DSP. HVX uses 128-byte vector registers, each holding 32 single-precision floats, so vectors and matrices whose sizes are not a multiple of 32 elements incur significant overhead. This change allows shorter (partial) packets to be used with the HVX registers as well.

This change focuses on dynamic vectors and the operations on them. We allocate some extra memory for dynamic vectors so that reading a full HVX register never touches invalid memory.
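The padding idea can be illustrated with a minimal, portable sketch. It assumes a register width of 32 floats (128 bytes) and uses a plain `memcpy` as a stand-in for a full-width HVX load. The names `kPacketSize`, `padded_size`, `load_full_packet` and `sum_with_full_loads` are hypothetical and are not part of Eigen or of this merge request.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Assumption: one HVX register holds 32 floats (128 bytes / 4 bytes per float).
constexpr std::size_t kPacketSize = 32;

// Round n up to the next multiple of the packet size, so that a full-width
// load starting at any packet boundary below n stays inside the allocation.
constexpr std::size_t padded_size(std::size_t n) {
  return (n + kPacketSize - 1) / kPacketSize * kPacketSize;
}

// Stand-in for a full-register load: always reads kPacketSize floats.
inline void load_full_packet(const float* from, float* packet) {
  std::memcpy(packet, from, kPacketSize * sizeof(float));
}

// Sum the first n elements using only full-width loads. The buffer must hold
// at least padded_size(n) floats; lanes at or past n are loaded but ignored.
inline float sum_with_full_loads(const std::vector<float>& buf, std::size_t n) {
  assert(buf.size() >= padded_size(n));
  float packet[kPacketSize];
  float total = 0.0f;
  for (std::size_t i = 0; i < n; i += kPacketSize) {
    load_full_packet(buf.data() + i, packet);
    const std::size_t valid = (n - i < kPacketSize) ? n - i : kPacketSize;
    for (std::size_t lane = 0; lane < valid; ++lane) total += packet[lane];
  }
  return total;
}
```

For a 40-element vector, `padded_size(40)` is 64, so the second full-width load covers elements 32 to 63 without leaving the buffer even though only 8 of those lanes are valid.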

This is our first contribution. Suggestions for improving the change and aligning it with the Eigen code base are very welcome. Thank you!

Additional information

This is in addition to the work from Cheng Wang and comes independently from Qualcomm.


Merge request pipeline #1148883982 failed


Closed by Gerhard Reitmayr (Sep 26, 2024 3:10pm UTC)


Activity

      HVX_store_partial<unpacket_traits<Packet8f>::size, 0>(to, from.Get());
      }

    + template <>
    + EIGEN_STRONG_INLINE void pstoreu_partial<float>(float* to, const Packet32f& from, const Index n, const Index offset) {
    +   const Index packet_size = unpacket_traits<Packet32f>::size;
    +   eigen_assert(n <= packet_size && "number of elements plus offset will write past end of packet");
  •   check_that_malloc_is_allowed();
      EIGEN_USING_STD(malloc)
    - void* original = malloc(size + alignment);
    + void* original = malloc(size + 2*alignment);
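To see what the extra `alignment` bytes buy, here is a hedged sketch of a hand-rolled aligned allocator built around the changed `malloc` call. Only the `malloc(size + 2*alignment)` expression comes from the diff; the function names `padded_aligned_malloc` and `padded_aligned_free`, and the choice to store the original pointer just before the aligned block, are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper: returns a pointer aligned to `alignment` bytes with at
// least `size + alignment` usable bytes behind it.
inline void* padded_aligned_malloc(std::size_t size, std::size_t alignment) {
  // The alignment must be a power of two and large enough to hold a pointer.
  assert(alignment >= sizeof(void*) && (alignment & (alignment - 1)) == 0);
  void* original = std::malloc(size + 2 * alignment);
  if (original == nullptr) return nullptr;
  const std::uintptr_t raw = reinterpret_cast<std::uintptr_t>(original);
  const std::uintptr_t mask = static_cast<std::uintptr_t>(alignment) - 1;
  // Round down to an aligned address, then step one alignment unit forward.
  // The result lies between 1 and `alignment` bytes past `original`.
  void* aligned = reinterpret_cast<void*>((raw & ~mask) + alignment);
  // Remember the original pointer in the gap so it can be freed later.
  *(reinterpret_cast<void**>(aligned) - 1) = original;
  return aligned;
}

inline void padded_aligned_free(void* aligned) {
  if (aligned != nullptr) std::free(*(reinterpret_cast<void**>(aligned) - 1));
}
```

Because the aligned pointer is at most `alignment` bytes past the original one, at least `size + alignment` bytes remain after it. With `alignment` equal to the register width, a full-register read that starts anywhere inside the first `size` bytes therefore stays within the allocation.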
  •   dstIsAligned ? 0 : internal::first_aligned<requestedAlignment>(kernel.dstDataPtr(), size);
      const Index alignedEnd = alignedStart + ((size - alignedStart) / packetSize) * packetSize;

    + #ifdef EIGEN_VECTORIZE_PARTIAL
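The `alignedEnd` arithmetic in this fragment splits an assignment into a scalar head, a run of full packets, and a leftover tail. The sketch below reproduces that arithmetic in portable code and handles the tail with a single partial copy. What the merge request actually does under `EIGEN_VECTORIZE_PARTIAL` is not visible in the fragment, so the tail handling here is an assumption, and `LoopRanges`, `split_range` and `copy_with_partial_tail` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

struct LoopRanges {
  std::size_t alignedStart;  // first index handled by full packets
  std::size_t alignedEnd;    // one past the last index handled by full packets
  std::size_t tail;          // elements left over after alignedEnd
};

// Same formula as in the diff: as many whole packets as fit after alignedStart.
inline LoopRanges split_range(std::size_t size, std::size_t alignedStart,
                              std::size_t packetSize) {
  assert(packetSize > 0 && alignedStart <= size);
  const std::size_t alignedEnd =
      alignedStart + ((size - alignedStart) / packetSize) * packetSize;
  return LoopRanges{alignedStart, alignedEnd, size - alignedEnd};
}

// Copy src to dst: scalar head, full packets, then one partial packet.
inline void copy_with_partial_tail(const float* src, float* dst, std::size_t size,
                                   std::size_t alignedStart, std::size_t packetSize) {
  const LoopRanges r = split_range(size, alignedStart, packetSize);
  for (std::size_t i = 0; i < r.alignedStart; ++i) dst[i] = src[i];
  for (std::size_t i = r.alignedStart; i < r.alignedEnd; i += packetSize)
    std::memcpy(dst + i, src + i, packetSize * sizeof(float));  // full packet
  if (r.tail > 0)  // assumed partial-packet path: one store of `tail` lanes
    std::memcpy(dst + r.alignedEnd, src + r.alignedEnd, r.tail * sizeof(float));
}
```

With 70 elements, an aligned start of 3 and a packet size of 32, the full-packet range is 3 to 67 and the tail holds 3 elements. Without a partial path those 3 elements would fall back to a scalar loop.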
  • Is there a way to test this? Maybe a docker image that provides the SDK, plus emulation?

    • Thank you for reviewing this. We are working on providing a Docker image for testing, but that will probably take a few weeks because of open questions about SDK distribution.

      So we will first update this merge request to see whether we can avoid the extra buffer, and to address your other comments.

      thanks!

    • Thanks for the update.

    • Do you have any requirements for the Docker image, such as a preferred Linux base image? We could go with a recent Ubuntu image such as 22.04.

    • No preference - just something we can set up for the CI.

    • Hi @cantonios, I work with Gerhard at Qualcomm and am looking into the Docker setup for libeigen testing on Hexagon. I am coordinating with another team inside Qualcomm on this, and I think it would be useful to have a short 30-minute call to discuss the logistics of installing the tools needed for Hexagon testing. Please share some time slots next week if you are OK with a call. You can reach me at abisain@qti.qualcomm.com

    • @cantonios, the Docker image we had in mind is in !1634 (closed). Please take a look. It requires one-time creation of a Qualcomm Developer account and some click-through agreements in order to install the Hexagon SDK into a base testing image. This Dockerfile is re-distributable, but the resulting docker image would not be. To reduce the overhead of generating the image each time you could re-use the image privately in your CI system as long as the image is not publicly distributed.

      Edited by Bardia Behabadi
  • This has some common elements in it.

    !1348 (closed)

  • requested review from @rmlarsen1

  • requested review from @cantonios

  • requested review from @chuckyschluz

  • Bardia Behabadi mentioned in merge request !1695


  • I closed this MR because it is superseded by !1695.
