Computer Vision News - November 2016

Multiplying x by N takes an integer x in the range [0, 2^32) and maps it to a multiple of N in [0, N * 2^32). By then dividing by 2^32, we map all multiples of N in [0, 2^32) to 0, all multiples of N in [2^32, 2 * 2^32) to 1, and so forth. As mentioned, computing x * N is very fast. Adding the shift operation to this, on the latest 64-bit processor (Skylake) this would involve 2 cycles with a latency of 4 cycles. In a benchmark created by Daniel Lemire, performance results showed that the above operation is four times faster than the modulo operation: about 2 CPU cycles instead of 8.

Let's check that it is indeed a fair mapping. All we need to do is count the number of multiples of N in intervals of length 2^32. This count must be either ceil(2^32/N) or floor(2^32/N).

The maximum number of multiples of N will occur if the first value in the interval is a multiple of N. How many multiples will there be? Exactly ceil(2^32/N). Indeed, if you draw sub-intervals of length N, then every complete sub-interval begins with a multiple of N, and if there is any remainder, then there will be one extra multiple of N.

The minimum number of multiples of N will occur if the first multiple of N appears at position N-1 in the interval. In that case, we get floor(2^32/N) multiples. To see why, again draw sub-intervals of length N: every complete sub-interval ends with a multiple of N.

“This trick was recently included in TensorFlow, boosting its performance by 10-20%”
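The multiply-and-shift reduction and the fairness argument above can be sketched in C as follows. This is a minimal illustration, not the benchmarked code: the function names reduce32, reduce16 and is_fair16 are hypothetical, and the 16-bit variant is an assumption introduced only so that the fairness check can loop over every possible input.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Map a 32-bit value x into [0, N) with one multiply and one shift,
   instead of x % N. The product needs 64 bits, hence the casts. */
static inline uint32_t reduce32(uint32_t x, uint32_t N) {
    return (uint32_t)(((uint64_t)x * (uint64_t)N) >> 32);
}

/* Same idea scaled down to 16-bit inputs (illustrative assumption),
   small enough to test exhaustively. */
static inline uint16_t reduce16(uint16_t x, uint16_t N) {
    return (uint16_t)(((uint32_t)x * (uint32_t)N) >> 16);
}

/* Returns 1 if every output value in [0, N) is produced either
   floor(2^16/N) or ceil(2^16/N) times over all 2^16 inputs, else 0. */
static int is_fair16(uint16_t N) {
    assert(N > 0);
    uint32_t *counts = calloc(N, sizeof *counts);
    if (counts == NULL) {
        return 0;
    }
    for (uint32_t x = 0; x < 65536u; x++) {
        counts[reduce16((uint16_t)x, N)]++;
    }
    uint32_t lo = 65536u / N;              /* floor(2^16 / N) */
    uint32_t hi = lo + (65536u % N != 0);  /* ceil(2^16 / N)  */
    int fair = 1;
    for (uint32_t i = 0; i < N; i++) {
        if (counts[i] != lo && counts[i] != hi) {
            fair = 0;
        }
    }
    free(counts);
    return fair;
}
```

Calling reduce32(x, N) with a 32-bit hash or random value x gives an index in [0, N), and is_fair16(N) performs the counting from the argument above on the smaller 16-bit range.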