Mysterious StringReader performance

I was doing some research on the StringReader class in .NET and C#, using the implementation found here: https://referencesource.microsoft.com/#mscorlib/system/io/stringreader.cs

I made a small class which I thought used the same basic implementation for reading a string, but to my surprise my code is over twice as slow as the .NET StringReader.

Here is my class:

public class DataReader
{
    private String source;
    private int pos;
    private int length;

    public DataReader(string data)
    {
        source = data;
        length = source.Length;
    }

    public int Peek()
    {
        if (pos == length) return -1;
        return source[pos];
    }

    public int Read()
    {
        if (pos == length) return -1;
        return source[pos++];
    }
}

Here is my test code:

using System;
using System.IO;
using System.Diagnostics;
using System.Text;
using System.Collections.Generic;                   

public class Program
{
    public static void Main()
    {
        var s = new String('x', 10000000);

        StringReaderTest(s);
        
        DataReaderTest(s);
    }
    
    private static void StringReaderTest(string s)
    {
        var stopwatch = new Stopwatch();
        
        stopwatch.Start();
        
        var reader = new StringReader(s);
        
        while (reader.Peek() > -1)
        {
            reader.Read();
        }
        
        stopwatch.Stop();
        
        Console.WriteLine(stopwatch.ElapsedMilliseconds);
    }

    private static void DataReaderTest(string s)
    {
        var stopwatch = new Stopwatch();
        
        stopwatch.Start();

        var reader = new DataReader(s);

        while (reader.Peek() > -1)
        {
            reader.Read();
        }

        stopwatch.Stop();
        
        Console.WriteLine(stopwatch.ElapsedMilliseconds);
    }
}

And here is a .NET fiddle of the whole thing. https://dotnetfiddle.net/MqbU5q

This is the output from the Fiddle. My implementation is twice as slow.

77
159

I must have missed something. Can anybody please explain?

Answered by TheGeneral

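First, a single Stopwatch measurement is not a reliable benchmark. The timed region includes JIT compilation, it can be interrupted by a garbage collection, and one run tells you nothing about variance.

You should be using BenchmarkDotNet or similar, which warms the code up, makes sure it is JIT-compiled before measuring, garbage collects before each run, runs the test multiple times, and warns you when you are running a debug build, among other things.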
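To make those steps concrete, here is a rough sketch of doing the same thing by hand with Stopwatch. ManualBench and MedianMilliseconds are hypothetical names invented for this illustration, and the sketch covers only warm-up, garbage collection between runs, and repeated measurement. It is far less rigorous than BenchmarkDotNet and is not a substitute for it.

```csharp
using System;
using System.Diagnostics;

public static class ManualBench
{
    // Hypothetical helper (not part of any library): runs `action` a few times
    // as warm-up, then times it `runs` times and returns the middle sample in
    // milliseconds (the median when `runs` is odd).
    public static double MedianMilliseconds(Action action, int warmups = 3, int runs = 11)
    {
        if (action == null) throw new ArgumentNullException(nameof(action));
        if (runs < 1) throw new ArgumentOutOfRangeException(nameof(runs));

        // Warm-up calls give the JIT a chance to compile the code under test
        // before anything is measured.
        for (var i = 0; i < warmups; i++)
            action();

        var samples = new double[runs];
        for (var i = 0; i < runs; i++)
        {
            // Collect before each run so garbage from an earlier run is less
            // likely to trigger a collection inside the timed region.
            GC.Collect();
            GC.WaitForPendingFinalizers();
            GC.Collect();

            var stopwatch = Stopwatch.StartNew();
            action();
            stopwatch.Stop();
            samples[i] = stopwatch.Elapsed.TotalMilliseconds;
        }

        Array.Sort(samples);
        return samples[runs / 2];
    }
}
```

With the question's code you could call it as ManualBench.MedianMilliseconds(() => { var r = new DataReader(s); while (r.Peek() > -1) r.Read(); }) and do the same for StringReader. Even then the numbers only mean something in an optimized (Release) build run without a debugger attached.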

Here is an example of how you might produce a more reliable benchmark.

Disclaimer: There is more that should be reasoned about in a good benchmark, like using a more realistic representation of your actual data and use cases.

Test Code

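// Requires the BenchmarkDotNet NuGet package and the DataReader class from the question.
using System;
using System.IO;
using System.Linq;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Jobs;
using BenchmarkDotNet.Running;

[SimpleJob(RuntimeMoniker.Net60, baseline: true)]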
public class ReaderTest
{ 
    private const string Chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
    private string _s;
    public static Random R = new(42);

    [Params(1000, 10000, 100000)] public int N;

    [GlobalSetup]
    public void Setup() => _s = new string(Enumerable.Repeat(Chars, N).Select(s => s[R.Next(s.Length)]).ToArray());

    public static void Main() => BenchmarkRunner.Run<ReaderTest>();

    [Benchmark]
    public int StringReader()
    {
        var stringReader = new StringReader(_s);
        var result = 0;
        while (stringReader.Peek() > -1)
            result ^= stringReader.Read();
        return result;
    }

    [Benchmark]
    public int DataReader()
    {
        var dataReader = new DataReader(_s);
        var result = 0;
        while (dataReader.Peek() > -1)
            result ^= dataReader.Read();      
        return result;
    }   
}

Benchmarks

Environment

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19044.1348 (21H2)
Intel Core i7-7700 CPU 3.60GHz (Kaby Lake), 1 CPU, 8 logical and 4 physical cores
.NET SDK=6.0.100
  [Host]   : .NET 6.0.0 (6.0.21.52210), X64 RyuJIT  [AttachedDebugger]
  .NET 6.0 : .NET 6.0.0 (6.0.21.52210), X64 RyuJIT

Job=.NET 6.0  Runtime=.NET 6.0

Results

Method             N        Mean      Error     StdDev  Ratio
StringReader    1000    3.424 us  0.0186 us  0.0174 us   1.00
DataReader      1000    1.459 us  0.0126 us  0.0098 us   1.00
StringReader   10000   34.157 us  0.2244 us  0.2099 us   1.00
DataReader     10000   14.693 us  0.0577 us  0.0540 us   1.00
StringReader  100000  348.650 us  6.7857 us  7.2606 us   1.00
DataReader    100000  146.257 us  1.0318 us  0.9651 us   1.00

Summary

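As you can see (and as you would expect), your implementation is faster: roughly 2.3 to 2.4 times faster at every N (for example, 1.459 us versus 3.424 us at N = 1000). Read the Mean column; Ratio is 1.00 in every row because the baseline was set on the job rather than on one of the methods.

Your class does fewer checks (StringReader also has to verify that the reader has not been closed), and it doesn't go through a layer of inheritance. StringReader.Peek and StringReader.Read are overrides of virtual TextReader methods, whereas your methods are non-virtual, so the JIT can bind them directly and inline them more easily. Strictly speaking, the C# compiler emits callvirt for most instance method calls, virtual or not; what matters is whether the target has to be looked up through the vtable at runtime.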

The callvirt instruction calls a late-bound method on an object. That is, the method is chosen based on the runtime type of obj rather than the compile-time class visible in the method pointer.
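As a small illustration of that difference, here is a sketch with made-up types. BaseReader, TextSourceReader, PlainReader and DispatchDemo are hypothetical names, not the real TextReader or StringReader. The first call goes through a virtual method, so the override that runs depends on the runtime type of the argument. The second call targets a non-virtual method on a sealed class, so the target is known statically.

```csharp
using System;

// Hypothetical types for illustration only; they are not the real
// TextReader/StringReader implementations.
public abstract class BaseReader
{
    // Virtual: the method body that runs is chosen from the object's runtime type.
    public virtual int Read() => -1;
}

public class TextSourceReader : BaseReader
{
    private readonly string _source;
    private int _pos;

    public TextSourceReader(string source) => _source = source;

    // Override reached through the vtable when called via a BaseReader reference.
    public override int Read() => _pos < _source.Length ? _source[_pos++] : -1;
}

public sealed class PlainReader
{
    private readonly string _source;
    private int _pos;

    public PlainReader(string source) => _source = source;

    // Non-virtual method on a sealed class: there is exactly one possible
    // target, which makes the call a simple candidate for inlining by the JIT.
    public int Read() => _pos < _source.Length ? _source[_pos++] : -1;
}

public static class DispatchDemo
{
    // The compile-time type is BaseReader; which Read() runs is decided at runtime.
    public static int ReadViaBase(BaseReader reader) => reader.Read();

    // The compile-time type is PlainReader; the target is fixed.
    public static int ReadDirect(PlainReader reader) => reader.Read();
}
```

Both DispatchDemo.ReadViaBase(new TextSourceReader("A")) and DispatchDemo.ReadDirect(new PlainReader("A")) return 65, the code of 'A'; only the way the call is resolved differs. Modern JITs can sometimes devirtualize the first kind of call when they can prove the exact type, so how much this costs in practice is something to measure rather than assume.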