C# distributed tracing with Jaeger / OpenTelemetry, but only the caller should report finally

72 Views Asked by At

I have the following scenario:

Service 1 is an ASP.Net service based on the worker template (See https://learn.microsoft.com/en-us/dotnet/core/extensions/workers). This means that there is a simple background service that is added as hosted service (services.AddHostedService<TestService>();) and runs some dummy HTTP requests against service 2.

Example:

public sealed class TestService : BackgroundService
{
    private readonly HttpClient httpClient = new()
    {
        BaseAddress = new Uri("http://localhost:5001")
    };

    public override async Task StartAsync(CancellationToken cancellationToken)
    {
        Log.Logger.Information("Starting test service");
        await base.StartAsync(cancellationToken);
    }

    public override async Task StopAsync(CancellationToken cancellationToken)
    {
        Log.Logger.Information("Stopping test service");
        await base.StopAsync(cancellationToken);
    }

    protected override async Task ExecuteAsync(CancellationToken cancellationToken)
    {
        Log.Logger.Information("Executing test service");
        
        while (!cancellationToken.IsCancellationRequested)
        {
            Log.Logger.Information("Running test service request");
            await this.httpClient.GetAsync("/", cancellationToken);
            await Task.Delay(10000, cancellationToken);
        }
    }
}

Service 1 is configured to do tracing to console and OLTP like this within the ASP.Net pipeline:

services.AddOpenTelemetry()
    .ConfigureResource(resource => resource.AddService(serviceName: "Test1Service"))
    .WithTracing(tracing => 
        tracing
            .AddAspNetCoreInstrumentation()
            .AddConsoleExporter()
            .AddOtlpExporter());

Service 2 is implemented in the same way, but doesn't report to Jaeger directly, e.g. it only sets the activity source like this around the dummy HTTP requests:

private static readonly ActivitySource traceSpan = new("Tracing.Test2");

app.UseEndpoints(endpoints =>
{
    endpoints.MapGet(
        "/",
        async context =>
        {
            using var activity = traceSpan.StartActivity("Request");
            activity?.SetTag("Beer", "Manfred");
            activity?.SetStatus(ActivityStatusCode.Ok);

            var sb = new StringBuilder();
            sb.Append("Hello there!");
            await context.Response.WriteAsync($"{sb}");
        });
});

Jaeger is run locally on Windows using the official binaries (jaeger-1.55.0-windows-amd64) and started with jaeger-all-in-one.exe --collector.otlp.enabled.

What I expected to happen: Start service 2 first, then service 1. Service 1 sends HTTP requests to service 2 all the time (With 10 seconds delay). Service 1 logs to the local Jaeger instance and reports the requests including traces from itself and service 2. Service 2 only adds its trace data to the requests and doesn't report directly (But implicit) to Jaeger.

What happens: Service 1 sends HTTP requests properly to service 2 and waits. However, Jaeger doesn't seem to receive anything at all except some traces for a service called jaeger-all-in-one which seems to be something default? So, basically, no distributed tracing is received... What do I need to check / change here?

Screenshot of Jaeger

Just as reference, I followed the following links:

Process example with two services

Update: I have added AddHttpClientInstrumentation() from OpenTelemetry.Instrumentation.Http and now the traces from service 1 are sent to Jaeger properly. However, I still search for a way to add values from service 2 to the trace without having to add AddOtlpExporter() for service 2 explicitely (See image above) as service 2 is inside a separate network segment where no trace collector can be run.

2

There are 2 best solutions below

0
FranzHuber23 On BEST ANSWER

I have done this differently now:

Service 1 now gets the 'traces' from a certain header that service 2 sets in a middleware.

Logic for service 1 (HttpClient side):

using System.Diagnostics;
using System.Diagnostics.CodeAnalysis;
using System.Reflection;
using System.Text.Json;
using System.Text.Json.Serialization;
using ILogger = Serilog.ILogger;

public static async Task<List<TraceContextModel>> GetTracesFromResponse(HttpResponseMessage message, ILogger? logger = null)
{
    var result = new List<TraceContextModel>();
    var headers = message.Headers.Where(x => x.Key.StartsWith("X-Tracing")).ToList();

    foreach (var header in headers)
    {
        foreach (var headerValue in header.Value)
        {
            if (!string.IsNullOrWhiteSpace(headerValue))
            {
                try
                {
                    var deserializedTrace = JsonSerializer.Deserialize<TraceContextModel?>(headerValue);

                    if (deserializedTrace is not null)
                    {
                        result.Add(deserializedTrace);
                    }
                }
                catch (Exception ex)
                {
                    logger?.Error(ex, "Failed to deserialize trace span");
                }
            }
        }
    }

    return result;
}

public sealed record class TraceContextModel
{
    public TraceContextModel()
    {
    }

    public TraceContextModel(Activity? activity)
    {
        if (activity is null)
        {
            return;
        }

        this.TraceId = activity.TraceId.ToString();
        this.SpanId = activity.SpanId.ToString();
        this.TraceFlags = activity.ActivityTraceFlags.ToString();
        this.ParentSpanId = activity.ParentSpanId.ToString();
        this.ActivitySourceName = activity.Source.Name;
        this.DisplayName = activity.DisplayName;
        this.Kind = activity.Kind.ToString();
        this.StartTime = activity.StartTimeUtc;
        this.Duration = activity.Duration;
        this.Tags = activity.Tags;
    }

    [JsonPropertyName("TraceId")]
    public string TraceId { get; init; } = string.Empty;

    [JsonPropertyName("SpanId")]
    public string SpanId { get; init; } = string.Empty;

    [JsonPropertyName("TraceFlags")]
    public string TraceFlags { get; init; } = string.Empty;

    [JsonPropertyName("ParentSpanId")]
    public string ParentSpanId { get; init; } = string.Empty;

    [JsonPropertyName("ActivitySourceName")]
    public string ActivitySourceName { get; init; } = string.Empty;

    [JsonPropertyName("DisplayName")]
    public string DisplayName { get; init; } = string.Empty;

    [JsonPropertyName("Kind")]
    public string Kind { get; init; } = string.Empty;

    [JsonPropertyName("StartTime")]
    public DateTime StartTime { get; init; }

    [JsonPropertyName("Duration")]
    public TimeSpan Duration { get; init; }

    [JsonPropertyName("Tags")]
    public IEnumerable<KeyValuePair<string, string?>> Tags { get; init; } = [];
}

On service 2 side, this is done with a custom middleware like this:

using System.Diagnostics;
using System.Diagnostics.CodeAnalysis;
using System.Reflection;
using System.Text.Json;
using System.Text.Json.Serialization;
using ILogger = Serilog.ILogger;

public sealed class CustomTracingMiddleware
{
    private static readonly string serviceName = Assembly.GetEntryAssembly()?.GetName()?.Name ?? string.Empty;

    private readonly ActivitySource activitySource = new(serviceName);

    private readonly RequestDelegate next;

    public CustomTracingMiddleware(RequestDelegate next)
    {
        // Add activity listener. Don't ask why, followed https://github.com/dotnet/runtime/issues/45070.
        var activityListener = new ActivityListener
        {
            ShouldListenTo = s => true,
            SampleUsingParentId = (ref ActivityCreationOptions<string> activityOptions) => ActivitySamplingResult.AllData,
            Sample = (ref ActivityCreationOptions<ActivityContext> activityOptions) => ActivitySamplingResult.AllData
        };
        ActivitySource.AddActivityListener(activityListener);

        // Add the next delegate.
        this.next = next;
    }

    public async Task Invoke(HttpContext context)
    {
        using var activity = this.activitySource.StartActivity(context.Request.Path);
        var value = JsonSerializer.Serialize(new TraceContextModel(activity));
        context.Response.Headers.Append($"X-Tracing-{serviceName}", value);
        await this.next.Invoke(context);
    }
}

and used like this in Configure(IApplicationBuilder app, IWebHostEnvironment env):

app.UseMiddleware<IvuTracingMiddleware>();

The solution works as I wanted, but with 2 downgrades, of course:

  1. It's not really tracing (as we can't write this to Jaeger, etc. but just log it)
  2. The Duration of the traces is always 00:00:00 as the middleware can only inject headers before the next action is done (Otherwise the headers might have been already written and an exception is thrown).
4
Cine On

OpenTelemetry works by each service reporting their own tracing to /some/endpoint(s). In this they used a TraceId, SpanId, and StartTime.

If a service A calls another service B, A can choose to inform that B what their TraceId is, so that the telemetry emitted by B uses the same Id. This mean that the telemetry is emitted by A is totally separated from the telemetry emitted by B. In DotNet you can do this for HttpClient using .AddHttpClientInstrumentation().

The Endpoint (Jaeger, Grafana, etc) is then the one responsible for combining the traces from all the various services and presenting it to your in a way that makes sense.

TLDR;

Each service must emit their own telemetry.