Search code examples
Find the INDEX of element having max. absolute value using AVX512 instructions...


c max instructions avx512

Best way to store 256 bit AVX vectors into unsigned long integers...


c vector avx avx2 avx512

Interleaved merging of 2 AVX-512 vector elements - C intrinsic...


c hpc intrinsics avx avx512

Fastest way to calculate a digit-sum for a large number (as a decimal string)...


c assembly sse intrinsics avx512

Fastest method to calculate sum of all packed 32-bit integers using AVX512 or AVX2...


c intrinsics avx avx2 avx512

How to achieve the effect of vpmovmskb on ZMM registers?...


assembly x86 bitmask avx512

SIMD optimize small matrix multiply (16 x 16) x (16 x 1)...


matrix-multiplication simd avx avx512

How would you write feature agnostic code for both AVX2 and AVX512?...


c++ c-preprocessor intrinsics avx2 avx512

How to instruct MS Visual C++ compiler to use an uninitialized __m512i register...


c++ visual-c++ intrinsics micro-optimization avx512

How can I gather single bytes with AVX512 intrinsics, given a vector of int offsets?...


c sse simd intrinsics avx512

How to do manual code vectorization with better performance than automatic vectorization for edge de...


c++ optimization avx512

Disabling all AVX512 extensions...


gcc avx instruction-set avx512

Intel AVX-512: how to set the EVEX.z bit...


assembly x86 machine-code avx512

How to load an AVX-512 zmm register from an ioremap() address?...


gcc x86-64 inline-assembly avx avx512

Load vector into AVX2 register with non matching size...


c++ avx avx2 avx512

SSE: does mask store affect the bytes that were masked out...


sse simd avx2 avx512

BMI for generating masks with AVX512...


x86 simd avx512 bmi

Speedup by AVX2 and AVX512...


c avx avx2 avx512

Dividing packed 16-bit integer with mask using AVX512 or SVML intrinsics...


c intrinsics avx avx512

Converting packed 64-bit integers to packed 8-bit integers with signed saturation using AVX512...


c intrinsics avx avx512

Does vzeroall zero registers ymm16 to ymm31?...


assembly x86 intel avx avx512

c++ AVX512 intrinsic equivalent of _mm256_broadcast_ss()?...


c++ intel intrinsics avx2 avx512

Gather AVX2&512 intrinsic for 16-bit integers?...


optimization avx2 avx512

Is there an x86 intrinsic that generates the AVX512 broadcast operation from a 32 bit floating point...


c intrinsics avx512

Does GCC have builtins for AVX512 operations?...


gcc avx512 gcc9

Efficient way to create masking kreg values...


optimization x86 simd avx512

build tensorflow for intel xeon gold 6148...


tensorflow bazel avx2 avx512 intel-tensorflow

Count leading zero bits for each element in AVX2 vector, emulate _mm256_lzcnt_epi32...


bit-manipulation simd avx avx2 avx512

Does Skylake need vzeroupper for turbo clocks to recover after a 512-bit instruction that only reads...


assembly x86 intel micro-optimization avx512

avx512F kmovw mov word by word...


assembly x86 avx512
