Draft: Use masked load/stores in vectorized assignment loops
Reference issue
Fixes #1777
What does this implement/fix?
Currently, for vectorized dense assignment loops such as `A = B.cwiseAbs2()`, we use packet ops for the eligible portion of the array and then switch to scalar code for the tail end. In some situations we also use scalar code for the first few elements, when the start of the array is not aligned, as in `A.segment(begin,len) = B.segment(begin,len).cwiseAbs2()`. This has two consequences: 1) for smaller arrays the scalar code can comprise an appreciable portion, or the entirety, of the computational cost, and 2) the scalar code uses different implementations of math functions, which raises legitimate concerns about numerical consistency and reproducibility.
This MR uses masked loads/stores to replace the scalar portions of the vectorized assignment evaluators. In these situations we always use a contiguous portion of the packet, so I opted for the `pload_partial` and `pstore_partial` API instead of the overloaded `pload`/`pstore`, which takes a generic mask and is a bit awkward. In the AVX implementation of `pload_partial` and `pstore_partial`, the compiler is able to optimize the usual case where `offset == 0`, so I think the overhead of using masked loads/stores is minimized. Here is a code snippet of the AVX2 mask function:
```cpp
template <typename Scalar>
EIGEN_STRONG_INLINE __m256i avx2_256_partial_mask(const Index n, const Index offset) {
  // if offset == 0 (the most common case), the compiler will eliminate much of this function
  static constexpr int Size = sizeof(Scalar);
  // byte offsets 0,4,...,28 divided by sizeof(Scalar) yield the lane indices
  const __m256i cst_lin =
      _mm256_setr_epi32(0 / Size, 4 / Size, 8 / Size, 12 / Size, 16 / Size, 20 / Size, 24 / Size, 28 / Size);
  __m256i off = _mm256_set1_epi32(static_cast<int>(offset));
  __m256i off_n = _mm256_set1_epi32(static_cast<int>(offset + n));
  __m256i off_gt_lin = _mm256_cmpgt_epi32(off, cst_lin);        // offset > i
  __m256i off_n_gt_lin = _mm256_cmpgt_epi32(off_n, cst_lin);    // offset + n > i
  __m256i mask = _mm256_andnot_si256(off_gt_lin, off_n_gt_lin); // offset + n > i && !(offset > i)
  return mask;
}
```
The general case, where `offset` is not known at compile time, generates this assembly:
```asm
vmovd xmm0, esi
vpbroadcastd ymm0, xmm0
add edi, esi
vmovd xmm1, edi
vpbroadcastd ymm1, xmm1
vmovdqa ymm2, ymmword ptr [rip + .LCPI0_0] # ymm2 = [1,2,3,4,5,6,7,8]
vpcmpgtd ymm0, ymm2, ymm0
vpcmpgtd ymm1, ymm1, ymmword ptr [rip + .LCPI0_1]
vpand ymm0, ymm1, ymm0
ret
```
This is a bit odd: the compiler sees fit to materialize two constants, (0,1,2,3,4,5,6,7) and (1,2,3,4,5,6,7,8), presumably because it rewrites `!(offset > i)` as `i + 1 > offset` so that both comparisons map directly onto `vpcmpgtd`. However, when `offset = 0` is known at compile time, only one comparison is used:
```asm
vmovd xmm0, edi
vpbroadcastd ymm0, xmm0
vpcmpgtd ymm0, ymm0, ymmword ptr [rip + .LCPI1_0]
ret
```
Performance-wise, the biggest impact may be that awkward but common small fixed sizes like `Vector3d` will be fully vectorized. If masked loads/stores are not available on the target, this could lead to a performance degradation, as the generic `pload_partial`/`pstore_partial` implementation looks slow. We'll have to measure that case and maybe use some SFINAE to resolve it.