AVX three operands for sqrt?...


assembly x86 simd instructions avx

Understanding the SIMD shuffle control mask...


c gcc simd avx

Intel C Compiler uses unaligned SIMD moves with aligned memory...


intel sse memory-alignment intrinsics avx

SIMD intrinsics slower than a single scalar implementation for toy example accessing Eigen matrices...


c++ performance eigen simd avx

Exception 13 with AVX instruction...


x86 simd sse avx vxworks

How to use bits in a byte to set dwords in ymm register without AVX2? (Inverse of vmovmskps)...


assembly x86-64 sse avx

Why is masking needed before using a pshufb shuffle as a lookup table for nibbles?...


c++ simd sse avx avx2

How to check inf for AVX intrinsic __m256...


c++ c sse intrinsics avx

How to get data out of AVX registers?...


c++ visual-c++ intrinsics avx fma

Why do bit manipulation intrinsics like _bextr_u64 often perform worse than simple shift and mask op...


gcc bit-manipulation x86-64 intrinsics avx

Vectorizing with unaligned buffers: using VMASKMOVPS: generating a mask from a misalignment count? O...


gcc assembly x86 sse avx

Why do SSE instructions preserve the upper 128-bit of the YMM registers?...


performance x86 simd sse avx

How many clock cycles does cost AVX/SSE exponentiation on modern x86_64 CPU?...


c++ x86 x86-64 sse avx

Why won't simple code get auto-vectorized with SSE and AVX in modern compilers?...


c optimization sse avx auto-vectorization

How does SIMD (avx) processing work? for example, if I want 10 32 bit floats how do i fit in a 256 b...


c simd avx

why is my simd vector plus and set slower than using std::transform and std::plus<T> - am i do...


c++ vector vectorization simd avx

How to use Fused Multiply-Add (FMA) instructions with SSE/AVX...


c sse cpu-architecture avx fma

Does SSE/AVX provide a means of determining if a result was rounded up?...


x86 rounding sse simd avx

Best way to mask a single bit in AVX2?...


c x86 simd avx avx2

How to efficiently perform double/int64 conversions with SSE/AVX?...


c++ floating-point sse simd avx

What is the inverse of "_mm256_cvtepi16_epi32"...


x86 g++ intrinsics avx avx2

AVX2: Get every second int32...


c simd avx avx2 int32

How to optimize cell-width measuring with SIMD (find the first column to have a non-zero byte in an ...


c x86-64 simd sse avx

I need more performance for int8 vector multiplication (Intel AVX-512)...


performance simd avx avx2 avx512

Efficient way for using int8 AVX512-VNNI instruction, especially about loading the data to zmm regis...


performance intel matrix-multiplication avx avx512

AVX 32-bit integer to double precision float best practice...


avx avx2

Have I written these sha256 #define's the correct way?...


c algorithm sha256 avx sha2

What is the difference between shuffle and permute...


x86 intel simd naming avx

Load and duplicate 4 single precision float numbers into a packed __m256 variable with fewest instru...


c++ avx

Differences between AVX and AVX2...


x86 matrix-multiplication simd avx avx2
