Making Software 10x Faster with Low-Level CPU Optimizations
September 14, 2015
Speaker: Sasha Goldshtein
Modern processors are extremely complex. Writing fast code means not only avoiding slow APIs but also taking advantage of every last bit of performance the processor has to offer. In this session we'll review some key performance wins you can get from modern processors by properly using instruction-level parallelism, vectorizing loops, avoiding store-to-load forwarding stalls, making better use of the CPU cache, and employing other low-level optimizations that a regular profiler won't tell you about. To improve performance methodically without having to guess where the bottleneck lies, we'll use Intel VTune Amplifier, a low-level performance profiler that has incredible insight into CPU optimizations.
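As a rough illustration of the instruction-level parallelism point, here is a minimal sketch in C. It is not taken from the session; the function names sum_single and sum_four are hypothetical, and the choice of four accumulators is an assumption rather than a tuned value. The first function forms one long dependency chain, since each addition waits for the previous one. The second splits the work across independent accumulators, which a superscalar, out-of-order core may be able to overlap.

```c
#include <stddef.h>

/* Baseline: each addition depends on the result of the previous one,
   so the loop forms a single serial dependency chain. */
double sum_single(const double *a, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; ++i)
        s += a[i];
    return s;
}

/* Sketch: four independent accumulators (the count is an assumption).
   The four additions in the loop body do not depend on each other,
   so the processor is free to execute them in parallel. */
double sum_four(const double *a, size_t n) {
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    /* Handle the remaining 0 to 3 elements. */
    for (; i < n; ++i)
        s0 += a[i];
    return (s0 + s1) + (s2 + s3);
}
```

Because floating-point addition is not associative, the two functions can differ in the last bits of the result for general inputs. Whether the second version is actually faster depends on the compiler, its flags, and the target CPU, so it is worth confirming with a profiler rather than assuming.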