Using System.Text.Json DeserializeAsyncEnumerable to deserialize non-root items


I'd like to process a large JSON response (a large list of measurements) from a web server in a streaming fashion using JsonSerializer.DeserializeAsyncEnumerable(). The problem is that the array of measurements is wrapped in a header JSON document. For example:

public record Header(string Id, Measurement[] Measurements);
public record Measurement(string Timestamp, decimal Value);

The documentation for DeserializeAsyncEnumerable() specifies that it only works for root-level JSON arrays. Is it possible to still use this method and somehow skip the wrapping class?

I've looked into writing a custom JsonConverter, but that doesn't seem to solve the problem.

I've also tried adding a property of type IAsyncEnumerable<Measurement> to the Header, but even if I never iterate the collection, all the measurement objects have already been created.

As for my scenario: I want to go through the file without loading the entire file into memory. A simple example: calculating an average over the measurement values. I don't need the header, but since it's produced by an external service I cannot change the contract. In the past I could do this relatively easily with an XML reader, but it appears to be harder with System.Text.Json.

Best answer

As of .NET 8, System.Text.Json only implements streaming SAX-like parsing for root-level JSON arrays. As stated in Announcing .NET 6 Preview 4: Streaming deserialization:

JsonSerializer.DeserializeAsyncEnumerable... only supports reading from root-level JSON arrays, although that could be relaxed in the future based on feedback.

Unfortunately the restriction has not been relaxed as of .NET 8. For confirmation, see [API Proposal]: Support streaming deserialization of JSON objects #64182 which was closed as a duplicate of Developers should be able to pass state to custom converters. #63795 -- which is still open.
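
For reference, the supported root-level case looks roughly like the following minimal sketch. It assumes the payload is a bare JSON array of measurements, which is not the shape in the question. The RootArrayExample class and AverageAsync method are hypothetical names used only for illustration.

```csharp
using System.IO;
using System.Text.Json;
using System.Threading.Tasks;

public record Measurement(string Timestamp, decimal Value);

// Hypothetical helper, for illustration only.
public static class RootArrayExample
{
    // Streams a ROOT-LEVEL JSON array such as
    //   [{"Timestamp":"t1","Value":1.5},{"Timestamp":"t2","Value":2.5}]
    // and averages the values without keeping the items in a list.
    public static async Task<decimal> AverageAsync(Stream utf8Json)
    {
        int count = 0;
        decimal total = 0;

        // DeserializeAsyncEnumerable yields items as the stream is read.
        await foreach (var measurement in JsonSerializer.DeserializeAsyncEnumerable<Measurement>(utf8Json))
        {
            if (measurement is null)
                continue; // A literal JSON null inside the array.
            count++;
            total += measurement.Value;
        }

        // Avoid dividing by zero for an empty array.
        return count > 0 ? total / count : 0m;
    }
}
```

Calling await RootArrayExample.AverageAsync(stream) on a stream that starts with a JSON object (such as the Header in the question) instead of an array is expected to fail, which is exactly the restriction described above.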

So what are some possible workarounds?

Firstly, you could use Utf8JsonStreamReader from this answer by mtosh to Parsing a JSON file with .NET core 3.0/System.text.Json to stream through the measurements, deserialize each one, and process it as required:

using var jsonStreamReader = new Utf8JsonStreamReader(stream, 32 * 1024);

int totalCount = 0;
decimal totalValue = 0;

while (jsonStreamReader.Read())
{
    if (jsonStreamReader.CurrentDepth == 1 && jsonStreamReader.TokenType == JsonTokenType.PropertyName)
    {
        var propertyName = jsonStreamReader.GetString();             
        if (string.Equals(propertyName, "Measurements", StringComparison.OrdinalIgnoreCase))
        {
            if (!jsonStreamReader.Read())
                throw new JsonException();
            if (jsonStreamReader.TokenType == JsonTokenType.StartArray)
            {
                while (jsonStreamReader.Read() && jsonStreamReader.TokenType != JsonTokenType.EndArray)
                {
                    var measurement = jsonStreamReader.Deserialize<Measurement>();
                    // Do something with Measurement, such as compute the total measurement value and count.
                    totalCount++;
                    totalValue += measurement.Value;
                }
            }
        }
    }
}

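// Guard against an empty or missing Measurements array to avoid a DivideByZeroException.
var average = totalCount > 0 ? totalValue / totalCount : 0m;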

Demo fiddle #1 here.

Secondly, you could use a pseudo-collection that implements ICollection<Measurement> but only processes the added measurements without actually accumulating them.

E.g., define the following classes:

using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

public record Measurement(string Timestamp, decimal Value);
public record Header<TCollection>(string Id, TCollection Measurements) where TCollection : IEnumerable<Measurement>;

public class TotalMeasurementCollection : AggregatingCollectionBase<Measurement>
{
    public int TotalAdded { get; set; }
    public decimal TotalValue { get; set; }

    // Do something with Measurement, such as compute the total measurement value and count.
    public override void Add(Measurement item) => (this.TotalAdded, this.TotalValue) = (this.TotalAdded + 1, this.TotalValue + item.Value);
}

// A collection that never stores anything: the serializer calls Add() for each item,
// and subclasses aggregate the item there instead of accumulating it.
public class AggregatingCollectionBase<TItem> : ICollection<TItem>
{
    public virtual void Add(TItem item) {}
    public bool Contains(TItem item) => false;
    public void CopyTo(TItem[] array, int arrayIndex) => ArgumentNullException.ThrowIfNull(array);
    public int Count => 0;
    public bool IsReadOnly => false;
    public bool Remove(TItem item) => false;
    public void Clear() {}
    public IEnumerator<TItem> GetEnumerator() => Enumerable.Empty<TItem>().GetEnumerator();
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

And now you will be able to do:

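var result = await JsonSerializer.DeserializeAsync<Header<TotalMeasurementCollection>>(stream);
var totals = result!.Measurements!;
// Guard against an empty Measurements array to avoid a DivideByZeroException.
var average = totals.TotalAdded > 0 ? totals.TotalValue / totals.TotalAdded : 0m;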

This trick is non-obvious but has the advantage of working with any serializer.

Demo fiddle #2 here.

Thirdly, you could switch to Json.NET, which natively supports streaming deserialization of individual objects inside a huge JSON file by using JsonTextReader to read through the file. See e.g.:
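
Independently of those references, a minimal sketch of the JsonTextReader approach might look like the following. It assumes the Newtonsoft.Json NuGet package is referenced and that the array lives in a top-level property named "Measurements", as in the question. The JsonNetStreamingExample class and Average method are hypothetical names used only for illustration.

```csharp
using System;
using System.IO;
using Newtonsoft.Json;

public record Measurement(string Timestamp, decimal Value);

// Hypothetical helper, for illustration only.
public static class JsonNetStreamingExample
{
    // Walks the JSON token by token and deserializes one Measurement at a time,
    // so the full array is never materialized in memory.
    public static decimal Average(TextReader textReader)
    {
        var serializer = JsonSerializer.CreateDefault();
        using var reader = new JsonTextReader(textReader);

        int count = 0;
        decimal total = 0;

        while (reader.Read())
        {
            // Look for the "Measurements" property of the root object (depth 1).
            if (reader.Depth == 1
                && reader.TokenType == JsonToken.PropertyName
                && string.Equals(reader.Value as string, "Measurements", StringComparison.OrdinalIgnoreCase))
            {
                if (!reader.Read())
                    throw new JsonException("Unexpected end of JSON.");
                if (reader.TokenType != JsonToken.StartArray)
                    continue; // Not an array (e.g. null): nothing to stream.

                // Each iteration positions the reader on the start of one array item.
                while (reader.Read() && reader.TokenType != JsonToken.EndArray)
                {
                    var measurement = serializer.Deserialize<Measurement>(reader);
                    if (measurement is null)
                        continue; // A literal JSON null inside the array.
                    count++;
                    total += measurement.Value;
                }
            }
        }

        // Avoid dividing by zero when no measurements were found.
        return count > 0 ? total / count : 0m;
    }
}
```

A typical call would wrap the response stream in a StreamReader, e.g. JsonNetStreamingExample.Average(new StreamReader(stream)). Disposing the JsonTextReader also closes the underlying reader by default.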