Re: The Case for Rust (in any system)
Date: Sat, 14 Sep 2024 02:24:14 UTC
> Try and explain this for example:
>
> Sorting int array with clang++18 and subscripts...
> User time = 4.74 seconds (.07900 minutes) (.00131 hours).
> RSS = 4204 KB
>
> Sorting long array with clang++18 and subscripts...
> User time = 2.22 seconds (.03700 minutes) (.00061 hours).
> RSS = 4608 KB

A new, curious participant here.

My guess is that the ints are being extended to longs inside the loop, which would require an extra sign-extension instruction.

I don't think that alone explains the time doubling, but the cost of an extra instruction isn't limited to executing it. That one instruction may be the straw that broke the L1 camel's back: without it, the loop may fit in the L1 instruction cache; with it, the cache may overflow, causing misses into L2 on every iteration of the loop. It would also occupy one of the arithmetic units, which could reduce instruction-level parallelism or leave the compiler less room to unroll the loop.

Just a theory; I have no clue. If you have code to share, I'd love to see it and try to reproduce the effect.

Gavin Howard