Draft: Extend pexp_complex for double
Reference issue
This MR follows the SIMD implementation campaign #2635. This MR involves the extension of pexp_complex
for double precision.
What does this implement/fix?
This MR is a draft (for the moment) implementing the extension of pexp_complex
for double precision.
Additional information
References values in NEON are wrong in some edge cases. I will think of a workaround, but we might start discussing about the presence of psincos_aux
, which is an auxiliary function I used to avoid duplicating the code of pexp_complex
for float
and double
.
Merge request reports
Activity
added 2 commits
425 426 return plog_complex<Packet2cf>(a); 426 427 } 427 428 429 template <> 430 EIGEN_STRONG_INLINE Packet1cd pexp<Packet1cd>(const Packet1cd& a) { 431 return pexp_complex<Packet1cd>(a); Unfortunately, it is pretty slow. I also checked the performances of
pexp_complex
fordouble
with other architectures, and it makes vectorization worth only for AVX512. I find it quite weird because the performances of vectorizedsincos
andexp
(no complex numbers) withdouble
are already faster withAVX2
(similar performances with SSE 4.1).
added 3 commits
-
baf0816d...ad452e57 - 2 commits from branch
libeigen:master
- bee8b22d - Extend pexp_complex for double
-
baf0816d...ad452e57 - 2 commits from branch
For other maintainers @ChipKerchner @cantonios @rmlarsen1, since it provides speedup only for AVX512, do you suggest to work on speed it up before merging?
We can selectively enable it only for AVX512 if it's really bad. If it's somewhat comparable (even if slightly slower), then I agree with @chuckyschluz.
I would define 25% slower as really bad. Just to pick a random number...
Edited by Rasmus Munk Larsen
1016 1016 return cos(a); 1017 1017 } 1018 1018 1019 /** \internal \returns the sine cosine of \a a (coeff-wise) */ 1020 template <typename Packet> 1021 EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS Packet psincos_aux(const Packet& a) { 1022 EIGEN_USING_STD(sin); 1023 EIGEN_USING_STD(cos); 1024 return pselect(peven_mask(a), sin(a), cos(a)); This reads a bit funny. I think it only works for complex because peven_mask treats
a
as a Packet of reals? Maybe we should have a wrapper aroundpeven_mask
for complex values and name itpimag_mask
? Or am I missing something?Edited by Rasmus Munk LarsenThis function should never be called. If you enable vectorization,
psincos_double
andpsincos_float
will take care of it. If you don't enable vectorization, your code will never call this function. However, this function cannot be removed. It follows the same pattern ofpsin
,pcos
, etc.
added 269 commits
-
ad452e57...53309609 - 269 commits from branch
libeigen:master
-
ad452e57...53309609 - 269 commits from branch