avx Examples and Free Source Code

AVX three operands for sqrt?...

assembly x86 simd instructions avx

Understanding the SIMD shuffle control mask...

c gcc simd avx

Intel C Compiler uses unaligned SIMD moves with aligned memory...

intel sse memory-alignment intrinsics avx

SIMD intrinsics slower than a single scalar implementation for toy example accessing Eigen matrices...

c++ performance eigen simd avx

Exception 13 with AVX instruction...

x86 simd sse avx vxworks

How to use bits in a byte to set dwords in ymm register without AVX2? (Inverse of vmovmskps)...

assembly x86-64 sse avx

Why is masking needed before using a pshufb shuffle as a lookup table for nibbles?...

c++ simd sse avx avx2

How to check inf for AVX intrinsic __m256...

c++ c sse intrinsics avx

How to get data out of AVX registers?...

c++ visual-c++ intrinsics avx fma

Why do bit manipulation intrinsics like _bextr_u64 often perform worse than simple shift and mask op...

gcc bit-manipulation x86-64 intrinsics avx

Vectorizing with unaligned buffers: using VMASKMOVPS: generating a mask from a misalignment count? O...

gcc assembly x86 sse avx

Why do SSE instructions preserve the upper 128-bit of the YMM registers?...

performance x86 simd sse avx

How many clock cycles does AVX/SSE exponentiation cost on a modern x86_64 CPU?...

c++ x86 x86-64 sse avx

Why won't simple code get auto-vectorized with SSE and AVX in modern compilers?...

c optimization sse avx auto-vectorization

How does SIMD (avx) processing work? for example, if I want 10 32 bit floats how do i fit in a 256 b...

c simd avx

why is my simd vector plus and set slower than using std::transform and std::plus<T> - am i do...

c++ vector vectorization simd avx

How to use Fused Multiply-Add (FMA) instructions with SSE/AVX...

c sse cpu-architecture avx fma

Does SSE/AVX provide a means of determining if a result was rounded up?...

x86 rounding sse simd avx

Best way to mask a single bit in AVX2?...

c x86 simd avx avx2

How to efficiently perform double/int64 conversions with SSE/AVX?...

c++ floating-point sse simd avx

What is the inverse of "_mm256_cvtepi16_epi32"...

x86 g++ intrinsics avx avx2

AVX2: Get every second int32...

c simd avx avx2 int32

How to optimize cell-width measuring with SIMD (find the first column to have a non-zero byte in an ...

c x86-64 simd sse avx

I need more performance for int8 vector multiplication (Intel AVX-512)...

performance simd avx avx2 avx512

Efficient way for using int8 AVX512-VNNI instruction, especially about loading the data to zmm regis...

performance intel matrix-multiplication avx avx512

AVX 32-bit integer to double precision float best practice...

avx avx2

Have I written these sha256 #define's the correct way?...

c algorithm sha256 avx sha2

What is the difference between shuffle and permute...

x86 intel simd naming avx

Load and duplicate 4 single precision float numbers into a packed __m256 variable with fewest instru...

c++ avx

Differences between AVX and AVX2...

x86 matrix-multiplication simd avx avx2