SSE Examples and Free Source Code

Can long integer routines benefit from SSE?...

performance integer sse bignum arbitrary-precision

Mode for _mm_cmpistrm SSE4.2 intrinsic...

c sse intrinsics sse4

Intel C Compiler uses unaligned SIMD moves with aligned memory...

intel sse memory-alignment intrinsics avx

Can I enable vectorization only for one part of the code?...

c++ gcc sse pragma auto-vectorization

What is the fastest way to test if a double number is integer (in modern intel X86 processors)...

c optimization assembly x86 sse
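
A minimal sketch, assuming x86-64 where SSE2 is baseline. Instead of SSE4.1 `roundsd`, it truncates with CVTTSD2SI and checks that the value survives the round trip. `is_integer` is a hypothetical name. Doubles with a magnitude of at least 2^52 have no fractional bits, so only the smaller range needs the conversion.

```c
#include <emmintrin.h>  /* SSE2: _mm_set_sd, _mm_cvttsd_si64 (x86-64 only) */
#include <math.h>       /* INFINITY / NAN macros only; no libm call is made */
#include <stdbool.h>

/* Returns true if x holds an exact integer value (NaN and +/-inf excluded). */
static bool is_integer(double x)
{
    double ax = x < 0 ? -x : x;  /* |x|; NaN stays NaN */

    if (ax < 0x1p52) {
        /* |x| < 2^52 fits in int64: truncate toward zero with CVTTSD2SI
           and check that converting back reproduces x exactly. */
        long long t = _mm_cvttsd_si64(_mm_set_sd(x));
        return (double)t == x;
    }
    /* Every finite double with |x| >= 2^52 is an integer.
       NaN fails both comparisons and falls through to false. */
    return ax >= 0x1p52 && ax != INFINITY;
}
```

Negative zero counts as an integer here because `0.0 == -0.0`.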

Fast transposition of an image and Sobel Filter optimization in C (SIMD)...

c optimization sse simd
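
A sketch of the transposition half, assuming a row-major image of `float` pixels whose width and height are multiples of 4. It transposes one 4x4 tile with the `_MM_TRANSPOSE4_PS` macro from `<xmmintrin.h>`, then walks the image tile by tile. The function names and the stride convention are hypothetical. An 8-bit image would need byte and word unpacks instead.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* SSE: _mm_loadu_ps, _mm_storeu_ps, _MM_TRANSPOSE4_PS */

/* Transpose one 4x4 float tile. Strides are in floats. All four rows are
   loaded before any store, so src == dst with equal strides is also safe. */
static void transpose4x4(const float *src, size_t src_stride,
                         float *dst, size_t dst_stride)
{
    __m128 r0 = _mm_loadu_ps(src + 0 * src_stride);
    __m128 r1 = _mm_loadu_ps(src + 1 * src_stride);
    __m128 r2 = _mm_loadu_ps(src + 2 * src_stride);
    __m128 r3 = _mm_loadu_ps(src + 3 * src_stride);

    _MM_TRANSPOSE4_PS(r0, r1, r2, r3);  /* rewrites r0..r3 in place */

    _mm_storeu_ps(dst + 0 * dst_stride, r0);
    _mm_storeu_ps(dst + 1 * dst_stride, r1);
    _mm_storeu_ps(dst + 2 * dst_stride, r2);
    _mm_storeu_ps(dst + 3 * dst_stride, r3);
}

/* Transpose a w x h image (w, h multiples of 4) into an h x w image. */
static void transpose_image(const float *src, float *dst, size_t w, size_t h)
{
    for (size_t y = 0; y < h; y += 4)
        for (size_t x = 0; x < w; x += 4)
            transpose4x4(src + y * w + x, w, dst + x * h + y, h);
}
```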

What are the 128-bit to 512-bit registers used for?...

assembly x86-64 sse simd cpu-registers

Fast method to copy memory with translation - ARGB to BGR...

c x86 rgb sse micro-optimization
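
A sketch, assuming "ARGB" means a little-endian `0xAARRGGBB` pixel (bytes B, G, R, A in memory) and "BGR" means packed bytes B, G, R. One SSSE3 PSHUFB drops the alpha byte of four pixels at a time. The result is stored as exactly 12 bytes so the loop never writes past `dst`. The function names are hypothetical. The `target` attribute (GCC/Clang) enables SSSE3 for that function only, so the CPU must support SSSE3 at run time.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <tmmintrin.h>  /* SSSE3: _mm_shuffle_epi8 */

/* Scalar reference, also used for the leftover 0..3 pixels. */
static void argb_to_bgr_scalar(const uint8_t *src, uint8_t *dst, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        dst[3 * i + 0] = src[4 * i + 0];  /* B */
        dst[3 * i + 1] = src[4 * i + 1];  /* G */
        dst[3 * i + 2] = src[4 * i + 2];  /* R */
    }
}

__attribute__((target("ssse3")))
static void argb_to_bgr_ssse3(const uint8_t *src, uint8_t *dst, size_t n)
{
    /* Gather bytes 0..2 of each pixel into the low 12 bytes; -1 writes zero. */
    const __m128i shuf = _mm_setr_epi8(0, 1, 2, 4, 5, 6, 8, 9, 10, 12, 13, 14,
                                       -1, -1, -1, -1);
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128i px = _mm_loadu_si128((const __m128i *)(src + 4 * i));
        __m128i bgr = _mm_shuffle_epi8(px, shuf);
        /* Store 8 + 4 bytes instead of a 16-byte store that would overrun. */
        _mm_storel_epi64((__m128i *)(dst + 3 * i), bgr);
        uint32_t tail = (uint32_t)_mm_cvtsi128_si32(_mm_srli_si128(bgr, 8));
        memcpy(dst + 3 * i + 8, &tail, 4);
    }
    argb_to_bgr_scalar(src + 4 * i, dst + 3 * i, n - i);
}
```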

Exception 13 with AVX instruction...

x86 simd sse avx vxworks

Counting the number of leading zeros in a 128-bit integer...

c++ gcc bit-manipulation sse
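
SSE has no instruction that counts leading zeros across a whole 128-bit register, so a common approach is scalar: test the high 64-bit half, then the low half. This sketch assumes GCC/Clang (`__builtin_clzll`, which is undefined for 0, hence the explicit checks) and x86-64 for the `__m128i` wrapper. Both function names are hypothetical.

```c
#include <emmintrin.h>  /* SSE2: _mm_cvtsi128_si64, _mm_unpackhi_epi64 */
#include <stdint.h>

/* Leading zeros of the 128-bit value hi:lo; returns 128 for zero. */
static int clz128(uint64_t hi, uint64_t lo)
{
    if (hi != 0)
        return __builtin_clzll(hi);
    if (lo != 0)
        return 64 + __builtin_clzll(lo);
    return 128;
}

/* Same, for a value held in an XMM register (lane 1 = high 64 bits). */
static int clz128_m128i(__m128i v)
{
    uint64_t lo = (uint64_t)_mm_cvtsi128_si64(v);
    uint64_t hi = (uint64_t)_mm_cvtsi128_si64(_mm_unpackhi_epi64(v, v));
    return clz128(hi, lo);
}
```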

Fast counting the number of set bits in __m128i register...

c sse simd sse2 hammingweight
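
One simple baseline, sketched under the assumption of x86-64 and GCC/Clang: move both 64-bit halves to general-purpose registers and add two scalar popcounts (a single POPCNT each when built with `-mpopcnt`). `popcount_m128i` is a hypothetical name. A PSHUFB nibble-table method is the usual all-SIMD alternative.

```c
#include <emmintrin.h>  /* SSE2 */

/* Number of set bits in all 128 bits of v. */
static int popcount_m128i(__m128i v)
{
    unsigned long long lo = (unsigned long long)_mm_cvtsi128_si64(v);
    unsigned long long hi =
        (unsigned long long)_mm_cvtsi128_si64(_mm_unpackhi_epi64(v, v));
    return __builtin_popcountll(lo) + __builtin_popcountll(hi);
}
```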

Fastest Implementation of the Natural Exponential Function Using SSE...

c optimization vectorization sse simd

Extended (80-bit) double floating point in x87, not SSE2 - we don't miss it?...

x86 floating-point sse sse2 x87

Is there a C++ function that returns exactly the value of the built-in CPU operation RSQRTSS for inv...

c++ x86 floating-point sse sqrt
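
The direct way to get RSQRTSS's own result is the `_mm_rsqrt_ss` intrinsic. The instruction's specification only bounds the relative error (at most 1.5 × 2^-12), so the exact bits can differ between CPU vendors, and portable code cannot reproduce them bit for bit. This sketch shows the raw approximation, one Newton-Raphson refinement step, and a SQRTSS-based reference for comparison. All three function names are hypothetical.

```c
#include <xmmintrin.h>  /* SSE: _mm_rsqrt_ss, _mm_sqrt_ss, _mm_cvtss_f32 */

/* Raw hardware approximation of 1/sqrt(x). */
static float rsqrt_approx(float x)
{
    return _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(x)));
}

/* One Newton-Raphson step, y' = y * (1.5 - 0.5 * x * y * y),
   roughly doubles the number of correct bits. */
static float rsqrt_refined(float x)
{
    float y = rsqrt_approx(x);
    return y * (1.5f - 0.5f * x * y * y);
}

/* Reference: correctly rounded sqrt followed by a division. */
static float rsqrt_reference(float x)
{
    return 1.0f / _mm_cvtss_f32(_mm_sqrt_ss(_mm_set_ss(x)));
}
```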

How to use bits in a byte to set dwords in ymm register without AVX2? (Inverse of vmovmskps)...

assembly x86-64 sse avx
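
A 128-bit SSE2 sketch of the same idea for 4 bits → 4 dwords: broadcast the mask, AND each lane with its own bit, then compare against that bit so set bits become all-ones lanes. Widening this to a YMM register without AVX2 needs a different compare, because AVX1 has no 256-bit integer compares; that part is not shown. `mask4_to_dwords` is a hypothetical name.

```c
#include <emmintrin.h>  /* SSE2 */

/* Bit i of mask set -> dword lane i = 0xFFFFFFFF, else 0 (inverse of MOVMSKPS). */
static __m128i mask4_to_dwords(unsigned mask)
{
    const __m128i bits = _mm_setr_epi32(1, 2, 4, 8);  /* lane i tests bit i */
    __m128i m = _mm_set1_epi32((int)mask);
    return _mm_cmpeq_epi32(_mm_and_si128(m, bits), bits);
}
```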

Are arrays of simd vectors naturally inefficient?...

c++ assembly x86 simd sse

What is the fastest way to evaluate a cubic given 4 packed double coefficients in a YMM register?...

c optimization sse intrinsics avx2

What does the "P" prefix stand for in the x86 instruction PCLMULQDQ?...

assembly x86 x86-64 sse instruction-set

Why is masking needed before using a pshufb shuffle as a lookup table for nibbles?...

c++ simd sse avx avx2
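
PSHUFB looks up `table[index & 0x0F]`, but if bit 7 of an index byte is set it writes 0 instead. This per-byte popcount sketch shows both masks. The low nibble must be masked because the original byte's bit 7 may be set. The high nibble must be masked after the shift because `_mm_srli_epi16` shifts 16-bit lanes, dragging bits of the neighbouring byte into the top of each low byte. `popcount_bytes` is a hypothetical name, and the `target` attribute (GCC/Clang) assumes an SSSE3-capable CPU.

```c
#include <tmmintrin.h>  /* SSSE3: _mm_shuffle_epi8 */

/* Returns the popcount of each byte of v in the corresponding result byte. */
__attribute__((target("ssse3")))
static __m128i popcount_bytes(__m128i v)
{
    const __m128i lut = _mm_setr_epi8(0, 1, 1, 2, 1, 2, 2, 3,
                                      1, 2, 2, 3, 2, 3, 3, 4);
    const __m128i low_mask = _mm_set1_epi8(0x0F);
    /* Clear bits 4..7 so bit 7 can never trigger PSHUFB's zeroing. */
    __m128i lo = _mm_and_si128(v, low_mask);
    /* Shift is per 16-bit lane, so mask off bits pulled in from the next byte. */
    __m128i hi = _mm_and_si128(_mm_srli_epi16(v, 4), low_mask);
    return _mm_add_epi8(_mm_shuffle_epi8(lut, lo), _mm_shuffle_epi8(lut, hi));
}
```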

How to check inf for AVX intrinsic __m256...

c++ c sse intrinsics avx
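
A sketch of the usual technique, shown with 128-bit SSE so it runs without AVX: clear the sign bit so -inf folds onto +inf, then compare for equality with +inf. The ordered compare returns false for NaN lanes. For `__m256`, the analogous calls would be `_mm256_andnot_ps`, `_mm256_cmp_ps(..., _CMP_EQ_OQ)` and `_mm256_movemask_ps`. `isinf_mask_ps` is a hypothetical name.

```c
#include <math.h>       /* INFINITY macro */
#include <xmmintrin.h>  /* SSE */

/* Bit i of the result is set when lane i of v is +inf or -inf. */
static int isinf_mask_ps(__m128 v)
{
    const __m128 sign = _mm_set1_ps(-0.0f);       /* only the sign bit set */
    const __m128 inf  = _mm_set1_ps(INFINITY);
    __m128 absv = _mm_andnot_ps(sign, v);          /* |v| */
    return _mm_movemask_ps(_mm_cmpeq_ps(absv, inf));
}
```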

Scope of MXCSR control register? Does it affect other threads?...

multithreading x86 floating-point sse cpu-registers
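
MXCSR is part of each logical thread's register context, so changing it affects only the calling thread. This sketch enables flush-to-zero (FTZ) around a region and restores the full register afterwards, which also restores the rounding mode, exception masks and sticky flags. The helper names are hypothetical. Whether a newly created thread starts from its creator's MXCSR or a default value depends on the OS and runtime, so it is not assumed here.

```c
#include <xmmintrin.h>  /* _mm_getcsr, _mm_setcsr, _MM_SET_FLUSH_ZERO_MODE */

/* Turn on flush-to-zero for this thread; returns the old MXCSR value. */
static unsigned ftz_begin(void)
{
    unsigned saved = _mm_getcsr();
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
    return saved;
}

/* Restore this thread's MXCSR exactly as it was before ftz_begin(). */
static void ftz_end(unsigned saved)
{
    _mm_setcsr(saved);
}
```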

Header files for x86 SIMD intrinsics...

x86 header-files sse simd intrinsics
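
For reference, the SSE-family intrinsics come from one header per instruction-set level. Each header pulls in the older levels, and `<immintrin.h>` covers AVX and later (GCC and Clang also offer `<x86intrin.h>`). With current GCC and Clang, the headers can be included without `-m` flags, but an intrinsic can only be used where its instruction set is enabled. The small SSE2 function is only there to make the sketch runnable, and its name is hypothetical.

```c
#include <xmmintrin.h>  /* SSE    */
#include <emmintrin.h>  /* SSE2   */
#include <pmmintrin.h>  /* SSE3   */
#include <tmmintrin.h>  /* SSSE3  */
#include <smmintrin.h>  /* SSE4.1 */
#include <nmmintrin.h>  /* SSE4.2 */
#include <wmmintrin.h>  /* AES-NI, PCLMULQDQ */
#include <immintrin.h>  /* AVX, AVX2, FMA, AVX-512, ... */

/* Horizontal sum of four ints; uses only SSE2, baseline on x86-64. */
static int sum4_epi32(const int *p)
{
    __m128i v = _mm_loadu_si128((const __m128i *)p);
    v = _mm_add_epi32(v, _mm_shuffle_epi32(v, _MM_SHUFFLE(1, 0, 3, 2)));
    v = _mm_add_epi32(v, _mm_shuffle_epi32(v, _MM_SHUFFLE(2, 3, 0, 1)));
    return _mm_cvtsi128_si32(v);
}
```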

Set an XMM register to a repeating byte pattern (broadcast a constant byte)...

assembly sse micro-optimization sse2
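
`_mm_set1_epi8` lets the compiler pick the instruction sequence. For an all-0xFF pattern, `pcmpeqd xmm, xmm` needs no constant load. This sketch spells out one plain-SSE2 way to broadcast a runtime byte: MOVD, PUNPCKLBW, PSHUFLW, PSHUFD. `broadcast_byte_sse2` is a hypothetical name.

```c
#include <emmintrin.h>  /* SSE2 */

/* Return a vector whose 16 bytes all equal b. */
static __m128i broadcast_byte_sse2(unsigned char b)
{
    __m128i v = _mm_cvtsi32_si128(b);  /* MOVD:      byte 0 = b         */
    v = _mm_unpacklo_epi8(v, v);       /* PUNPCKLBW: bytes 0..1 = b     */
    v = _mm_shufflelo_epi16(v, 0);     /* PSHUFLW:   bytes 0..7 = b     */
    return _mm_shuffle_epi32(v, 0);    /* PSHUFD:    all 16 bytes = b   */
}
```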

Why is the generated assembly reordered when using intrinsics?...

c gcc x86 sse intrinsics

Auto-vectorizing: Convincing the compiler that alias check is not necessary...

c++ opencv gcc vectorization sse
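
A sketch of the standard tool for this, C99 `restrict` (GCC and Clang accept `__restrict__` in C++). It promises the compiler that the pointers do not overlap, so an auto-vectorized loop may omit the runtime overlap check. Passing overlapping buffers then becomes undefined behaviour. `add_arrays` is a hypothetical name. Whether the check actually disappears depends on the compiler and flags.

```c
#include <stddef.h>

/* dst[i] = a[i] + b[i]; the caller guarantees the three ranges do not overlap. */
static void add_arrays(float *restrict dst, const float *restrict a,
                       const float *restrict b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}
```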

Is there a difference between SVML vs. normal intrinsic square root functions?...

c++ intel sse intrinsics sse2

Vectorizing with unaligned buffers: using VMASKMOVPS: generating a mask from a misalignment count? O...

gcc assembly x86 sse avx

In GNU C inline asm, what are the size-override modifiers for xmm/ymm/zmm for a single operand?...

c gcc sse inline-assembly avx512

Why does GCC or Clang not optimise reciprocal to 1 instruction when using fast-math...

c++ sse compiler-optimization simd fast-math

Why do SSE instructions preserve the upper 128-bit of the YMM registers?...

performance x86 simd sse avx