Search code examples
How to compute sine values somewhere, and then move them into XMM0 in assembly?...


assembly x86 sse x87 fpu

Why won't simple code get auto-vectorized with SSE and AVX in modern compilers?...


c optimization sse avx auto-vectorization

How to use Fused Multiply-Add (FMA) instructions with SSE/AVX...


c sse cpu-architecture avx fma

SSE4.1 slower than SSE3 on 4x4 matrix multiplication?...


c++ matrix simd sse matmul

Does SSE/AVX provide a means of determining if a result was rounded up?...


x86 rounding sse simd avx

Write access violation on read instruction (MOVQ load on old Athlon XP)...


visual-c++ x86 sse amd-processor sse2

What series of intrinsics will complete this paeth prediction code?...


c++ sse intrinsics

Calculating constants for CRC32 using PCLMULQDQ...


sse crc32 modular-arithmetic galois-field

Classification of x86 instructions according to floating point rounding mode sensitivity?...


assembly floating-point x86-64 sse rounding-error

Why do x86 FP compares set CF like unsigned integers, instead of using signed conditions?...


assembly x86 sse sse2 x87

Intel x86_64 assembly compare signed double precision floats...


assembly x86-64 intel precision sse

How to efficiently perform double/int64 conversions with SSE/AVX?...


c++ floating-point sse simd avx

Is there a way to utilize all XMM registers?...


c++ c sse cpu-registers

Output errors when using libmvec intrinsics for trigo functions manually (like cosf)...


c++ gcc glibc sse intrinsics

How to optimize cell-width measuring with SIMD (find the first column to have a non-zero byte in an ...


c x86-64 simd sse avx

Is worth using SSE or should I just rely on the compiler?...


c++ optimization intel simd sse

Accelerate CRC32b using intel processors...


x86 intel sse crc32

Why does .NET use SIMD and not x87 for math operations not intrinsic to SIMD?...


.net assembly simd sse x87

Why is SSE4.2 cmpstr slower than regular code?...


c performance assembly x86 sse

SIMD: Accumulate Adjacent Pairs...


c++ sse simd intrinsics avx

How do I use SSE(1,2,3,4) optimizations?...


c++ c optimization sse

Data not aligned correctly in Visual Studio if run in debugger...


c++ visual-studio alignment sse

What are the best instruction sequences to generate vector constants on the fly?...


assembly x86 sse simd avx

Do the higher level SSE flags imply the lower ones in GCC / clang?...


gcc sse

Shifting SSE/AVX registers 32 bits left and right while shifting in zeros...


x86 sse simd avx avx2

What is the point of MOVAPS in x86 if it does the same as MOVUPS in modern computers?...


assembly x86 sse

Structure of SSE vectorization calls for summing vector of floats...


c gcc vectorization simd sse

gdb: SSE register output format...


debugging assembly gdb sse cpu-registers

AVX2 what is the most efficient way to pack left based on a mask?...


c++ vectorization sse simd avx2

Why do modern compilers prefer SSE over FPU for single floating-point operations...


c assembly floating-point sse x87
