Replies: 1 comment
-
Looking closer at optimizing `Version.Parse`, I slapped together this implementation that avoids `int.Parse`:

```csharp
static Version Parse(string str)
{
    return TryParse(str, out var v) ? v : throw new FormatException();
}

static bool TryParse(string str, out Version ver)
{
    ver = default;
    Span<int> components = stackalloc int[4];
    int count = 0;      // number of components parsed so far
    uint v = 0;         // value of the component currently being accumulated
    var l = -1;         // index of the last '.' seen (-1 before the first one)
    var i = 0;
    for (; i < str.Length; i++)
    {
        var c = str[i];
        var d = (uint)(c - '0');
        if (d <= 9)
        {
            // Note: a component with 10+ digits can wrap v around and slip
            // past the range checks below; one of the unhandled edge cases.
            v = v * 10 + d;
            continue;
        }
        if (c == '.')
        {
            // Reject empty components ("1..2", ".1") and out-of-range values.
            if (i - l == 1 || v > short.MaxValue)
                return false;
            components[count] = (int)v;
            v = 0;
            count++;
            l = i;
            if (count >= 4)
                return false; // more than four components
            continue;
        }
        return false; // any other character is invalid
    }
    // Reject an empty string, a trailing '.', or an out-of-range last component.
    if (i - l == 1 || v > short.MaxValue)
        return false;
    components[count] = (int)v;
    count++;
    switch (count)
    {
        default:
            return false; // a single component is not a valid Version
        case 2:
            ver = new Version(components[0], components[1]);
            break;
        case 3:
            ver = new Version(components[0], components[1], components[2]);
            break;
        case 4:
            ver = new Version(components[0], components[1], components[2], components[3]);
            break;
    }
    return true;
}
```

In my benchmarks this runs about 4-5x faster than the BCL implementation.
There are probably edge cases I didn't consider that might slow it down though.
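For reference, a minimal way to exercise the `TryParse` above on a few illustrative inputs (these strings are just examples, not benchmark data):

```csharp
foreach (var s in new[] { "1.2", "10.0.3", "1.2.3.4", "1..2", "1.2.3.4.5" })
{
    Console.WriteLine(TryParse(s, out var version)
        ? $"{s} -> {version}"
        : $"{s} -> rejected");
}
```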
-
While dealing with a CSV dataset that contains a lot of single-digit numbers, I found that special-casing the single-digit case and circumventing `int.Parse` was measurably faster. I was curious whether this special-case handling would be worth putting into the number parsing routines in the BCL. A single-character string will always parse to the same integer value regardless of number style or culture, so it is possible to avoid those checks altogether (I'd love to know if I'm wrong about that assumption).
The optimization is trivial: if the string is a single character, assume it is a digit in the range 0-9; otherwise fall back to the standard implementation.
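A minimal sketch of that shape (the `ParseFast` name and the digit check are mine; the fallback is just the standard `int.Parse`):

```csharp
static int ParseFast(string s)
{
    // Fast path: a single character can only be one of the digits '0'-'9'
    // in any supported number style or culture, so skip the general parser.
    if (s.Length == 1)
    {
        var d = (uint)(s[0] - '0');
        if (d <= 9)
            return (int)d;
        throw new FormatException();
    }

    // Fallback: anything longer goes through the standard implementation.
    return int.Parse(s);
}
```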
I created a BenchmarkDotNet benchmark to compare this to the BCL implementation; the results for single-character strings show a significant performance improvement (the "s" column in those results is the string being parsed).
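Roughly the kind of comparison I mean, as a BenchmarkDotNet sketch (the class, field, and method names here are mine, not from the actual benchmark):

```csharp
using BenchmarkDotNet.Attributes;

public class SingleDigitParseBenchmark
{
    // "s" is the string being parsed, as in the results column mentioned above.
    [Params("0", "7", "12", "123", "12345")]
    public string s;

    [Benchmark(Baseline = true)]
    public int Bcl() => int.Parse(s);

    // Single-digit fast path (digit validation omitted for brevity here).
    [Benchmark]
    public int FastPath() => s.Length == 1 ? s[0] - '0' : int.Parse(s);
}
```

Run via `BenchmarkRunner.Run<SingleDigitParseBenchmark>()` as usual.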
The improvement for decimal isn't as significant because `new Decimal(...)` has to do a bit of work. I found that creating a lookup table of the ten single-digit values and indexing into it instead cut this roughly in half, to about 2 ns (a sketch of that approach follows below).
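Something like this is what I mean by the lookup table (a sketch; the field and method names are arbitrary):

```csharp
// Pre-built decimal values for '0'-'9'; indexing into the array avoids the
// cost of constructing a new decimal on the single-digit fast path.
static readonly decimal[] SingleDigitDecimals =
    { 0m, 1m, 2m, 3m, 4m, 5m, 6m, 7m, 8m, 9m };

static decimal ParseDecimalFast(string s)
{
    if (s.Length == 1)
    {
        var d = (uint)(s[0] - '0');
        if (d <= 9)
            return SingleDigitDecimals[d];
    }
    return decimal.Parse(s); // fall back to the standard implementation
}
```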
Of course, this comes at the cost of slightly slower parsing for longer strings that hit this new branch in the logic, and the results here are a little inconsistent. The 2-character integers (which I think should be the worst-case regression) show a >10% improvement (surprising), while the 3-character integers show a ~10% regression. On average, though, there appears to be a very slight regression, maybe only a couple of percent. I'm not sure whether it is possible to get more stable/consistent results out of BenchmarkDotNet (I'm a bit new to it); this is using the default configuration.
So, is that very slight regression worth the significant improvement in single-character parsing? Obviously, it depends on the dataset and use case whether this would be a net win. The breakeven point is mostly dictated by the regression, so a 1% regression means the breakeven is reached when roughly 1% of the values are single digit (a quick back-of-the-envelope check is sketched below).
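Treating the single-digit fast path as nearly free, the back-of-the-envelope version looks like this (the numbers are placeholders, not measurements):

```csharp
// r            : fractional regression on the multi-digit path (0.01 == 1%)
// fastCostRatio: single-digit fast-path cost relative to a baseline parse
// The breakeven fraction p of single-digit values solves:
//   p * fastCostRatio + (1 - p) * (1 + r) == 1
static double BreakevenFraction(double r, double fastCostRatio = 0.0)
    => r / (1.0 + r - fastCostRatio);

// BreakevenFraction(0.01) ≈ 0.0099, i.e. roughly 1% of the values need to
// be single digit for a 1% regression to pay for itself.
```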
Do enough real-world data sets meet that criterion? "0" certainly comes up a lot. For integers in an ordered list, the first 9 elements would be single digit, and the breakeven point would only be reached after 900 elements; how often that happens in reality, I have no idea. For double and decimal values, it is probably less likely to provide a gain.

In the benchmark dataset for csFastFloat, mesh.txt contains 5% single-character values; in fact, 10% of the values would be single digit if they were formatted differently, since there are a lot of "0.0" and "1.0" values. canada.txt, on the other hand, has no single-digit values.
Of course, the real benefit would be in data sets that contain a significant number of single-digit values; I just don't know if this is common enough to merit baking this into the BCL. For my specific scenario where I had a majority of '0' values, it was a win.
One specific use case where single digits are very common is in `System.Version` parsing, which currently uses `int.Parse` to convert the version components into integer values. Perhaps, at the very least, it would be worth considering applying this optimization there?