Search code examples
Can I enable vectorization only for one part of the code?...


c++ gcc sse pragma auto-vectorization

What is the fastest way to test if a double number is integer (in modern intel X86 processors)...


c optimization assembly x86 sse

Fast transposition of an image and Sobel Filter optimization in C (SIMD)...


c optimization sse simd

What are the 128-bit to 512-bit registers used for?...


assembly x86-64 sse simd cpu-registers

Fast method to copy memory with translation - ARGB to BGR...


c x86 rgb sse micro-optimization

Exception 13 with AVX instruction...


x86 simd sse avx vxworks

Counting the number of leading zeros in a 128-bit integer...


c++ gcc bit-manipulation sse

Fast counting the number of set bits in __m128i register...


c sse simd sse2 hammingweight

Fastest Implementation of the Natural Exponential Function Using SSE...


c optimization vectorization sse simd

Extended (80-bit) double floating point in x87, not SSE2 - we don't miss it?...


x86 floating-point sse sse2 x87

Is there a C++ function that returns exactly the value of the built-in CPU operation RSQRTSS for inv...


c++ x86 floating-point sse sqrt

How to use bits in a byte to set dwords in ymm register without AVX2? (Inverse of vmovmskps)...


assembly x86-64 sse avx

Are arrays of simd vectors naturally inefficient?...


c++ assembly x86 simd sse

What does the "P" prefix stand for in the x86 instruction PCLMULQDQ?...


assembly x86 x86-64 sse instruction-set

Why is masking needed before using a pshufb shuffle as a lookup table for nibbles?...


c++ simd sse avx avx2

How to check inf for AVX intrinsic __m256...


c++ c sse intrinsics avx

Scope of MXCSR control register? Does it affect other threads?...


multithreading x86 floating-point sse cpu-registers

Header files for x86 SIMD intrinsics...


x86 header-files sse simd intrinsics

Set an XMM register to a repeating byte pattern (broadcast a constant byte)...


assembly sse micro-optimization sse2

Why is the generated assembly reordered when using intrinsics?...


c gcc x86 sse intrinsics

Auto-vectorizing: Convincing the compiler that alias check is not necessary...


c++ opencv gcc vectorization sse

Is there a difference between SVML vs. normal intrinsic square root functions?...


c++ intel sse intrinsics sse2

Vectorizing with unaligned buffers: using VMASKMOVPS: generating a mask from a misalignment count? O...


gcc assembly x86 sse avx

In GNU C inline asm, what are the size-override modifiers for xmm/ymm/zmm for a single operand?...


c gcc sse inline-assembly avx512

Why does GCC or Clang not optimise reciprocal to 1 instruction when using fast-math...


c++ sse compiler-optimization simd fast-math

Why do SSE instructions preserve the upper 128-bit of the YMM registers?...


performance x86 simd sse avx

How many clock cycles does cost AVX/SSE exponentiation on modern x86_64 CPU?...


c++ x86 x86-64 sse avx

How to best emulate the logical meaning of _mm_slli_si128 (128-bit bit-shift), not _mm_bslli_si128...


c sse simd intrinsics sse2

Logarithm with SSE, or switch to FPU?...


sse simd logarithm natural-logarithm

parallel prefix (cumulative) sum with SSE...


c sum openmp sse
