common: Improve linux dcache hash algorithm
In ceph_str_hash_linux(), the hash value is declared as unsigned long,
which is 8 bytes on 64-bit platforms when compiled with gcc. But
the return value is truncated to 4 bytes, so there is no need to
carry an 8-byte intermediate value through the algorithm. The compiler
does not take advantage of this and generates redundant code.
After dropping "long" from the declaration, this routine runs up to
24% faster in the tests below.
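Why narrowing the accumulator is safe can be sketched as follows. This is
an illustration only, not the patched source: hash_wide(), hash_narrow()
and check_same() are hypothetical names, the mixing step is assumed to
follow the Linux dcache style of hash, and fixed-width types stand in for
"unsigned long" and "unsigned" so the widths are explicit. Addition and
multiplication modulo 2^64, truncated to 32 bits, give the same result as
the same operations modulo 2^32, so both variants return identical values.

```c
#include <assert.h>
#include <stdint.h>

/* Wide variant: 64-bit accumulator, truncated to 32 bits on return.
 * Stands in for the "unsigned long" version on a 64-bit platform. */
uint32_t hash_wide(const char *str, unsigned length)
{
    uint64_t hash = 0;

    while (length--) {
        unsigned char c = (unsigned char)*str++;
        /* Assumed mixing step, modeled on the Linux dcache hash. */
        hash = (hash + (c << 4) + (c >> 4)) * 11;
    }
    return (uint32_t)hash;
}

/* Narrow variant: 32-bit accumulator throughout. */
uint32_t hash_narrow(const char *str, unsigned length)
{
    uint32_t hash = 0;

    while (length--) {
        unsigned char c = (unsigned char)*str++;
        hash = (hash + (c << 4) + (c >> 4)) * 11;
    }
    return hash;
}

/* Hypothetical helper: aborts if the two variants ever disagree. */
void check_same(const char *str, unsigned length)
{
    assert(hash_wide(str, length) == hash_narrow(str, length));
}
```

Calling check_same() on any input should pass, because the upper 32 bits
of the wide accumulator never influence the lower 32 bits.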
The following results were measured on x86_64 and aarch64 platforms,
with code built by gcc 5.3.1 at optimization level -O2. The same
behavior is observed with gcc 4.8.4 and with -O3 optimization.
ARM Cortex-A57
+---------------+--------------+---------------+-------------+
| String Length | Time w/ long | Time w/o long | Improvement |
+---------------+--------------+---------------+-------------+
| 32            | 0.088 us     | 0.067 us      | 24%         |
+---------------+--------------+---------------+-------------+
| 4096          | 10.26 us     | 8.20 us       | 20%         |
+---------------+--------------+---------------+-------------+
| 65536         | 164 us       | 131 us        | 20%         |
+---------------+--------------+---------------+-------------+
| 1048576       | 2624 us      | 2099 us       | 20%         |
+---------------+--------------+---------------+-------------+
Intel i7-4790
+---------------+--------------+---------------+-------------+
| String Length | Time w/ long | Time w/o long | Improvement |
+---------------+--------------+---------------+-------------+
| 32            | 0.033 us     | 0.028 us      | 16.3%       |
+---------------+--------------+---------------+-------------+
| 4096          | 3.87 us      | 3.64 us       | 6.2%        |
+---------------+--------------+---------------+-------------+
| 65536         | 61.3 us      | 57.8 us       | 5.7%        |
+---------------+--------------+---------------+-------------+
| 1048576       | 973 us       | 917 us        | 5.8%        |
+---------------+--------------+---------------+-------------+
Signed-off-by: Yibo Cai <yibo.cai@linaro.org>
(cherry picked from commit 0cdee4bf0a55f63321957729c50caf17e7cd3b4a)