In software development we always think about performance (sometimes quality comes second because of a tight roadmap)...
So it's important to quickly find information about what we are doing, so that we can do it cleverly and still have time to test the code afterwards.
That is why I am sharing a good piece of SSE documentation here:
The website is well documented and will give you good pointers for writing your own code...
As an example, I have added some code that I wrote to compute the L2 distance with SSE optimization: here
Comparative timings (900000 distances on histograms of 128 floats):
- SSE: 0.05 ms
- Naive method: 0.12 ms
- Naive optimized: 0.09 ms
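
To give an idea of what the SSE approach looks like, here is a minimal sketch of an L2 distance between two float arrays, next to a plain scalar loop for comparison. It is not the code that was timed above: the function names `l2_naive` and `l2_sse` are made up for this illustration, and it assumes an x86 or x86-64 compiler that provides `<xmmintrin.h>`. Unaligned loads are used so that the arrays do not need any special alignment.

```cpp
#include <xmmintrin.h>  // SSE intrinsics (x86 / x86-64 only)
#include <cmath>
#include <cstddef>

// Hypothetical reference version: plain scalar loop.
float l2_naive(const float* a, const float* b, std::size_t n) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        const float d = a[i] - b[i];
        sum += d * d;
    }
    return std::sqrt(sum);
}

// Hypothetical SSE version: handles four floats per iteration.
float l2_sse(const float* a, const float* b, std::size_t n) {
    __m128 acc = _mm_setzero_ps();  // four partial sums, all zero
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        const __m128 va = _mm_loadu_ps(a + i);    // unaligned load of 4 floats
        const __m128 vb = _mm_loadu_ps(b + i);
        const __m128 d = _mm_sub_ps(va, vb);      // element-wise difference
        acc = _mm_add_ps(acc, _mm_mul_ps(d, d));  // accumulate squared differences
    }

    // Horizontal sum of the four partial sums.
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    float sum = lanes[0] + lanes[1] + lanes[2] + lanes[3];

    // Scalar tail for lengths that are not a multiple of 4.
    for (; i < n; ++i) {
        const float d = a[i] - b[i];
        sum += d * d;
    }
    return std::sqrt(sum);
}
```

For a 128-float histogram the main loop runs 32 times and the scalar tail is never entered. The two functions can differ in the last few bits of the result, because the SSE version adds the squared differences in a different order.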