Micrometer's DefaultMeterObservationHandler in the Observation API doesn't use high-cardinality values


I'm using the new Observation API in Spring Boot 3.2.2. When I create an Observation.Context I supply my own high-cardinality values (e.g. requestId, conversationId), and I thought these would be added as tags and published with the metric, which is pushed to Elastic via the ElasticMeterRegistry. But when I looked at the source of io.micrometer.core.instrument.observation.DefaultMeterObservationHandler, I saw that it only creates tags from the low-cardinality values in the Observation.Context. I suspect the reason for this is how registration with the MeterRegistry works:

The JavaDoc of io.micrometer.core.instrument.Counter.Builder.register states that a new counter is returned only if a counter with the same tag values does not yet exist. This is because each registry is guaranteed to create only one counter for the same combination of name and tags.

...therefore, if tags were used for high-cardinality values, the MeterRegistry could end up adding a new meter for almost every measurement, and you would get a memory leak. Jonatan Ivanov (Spring Engineering/Micrometer) discusses this in his post https://develotters.com/posts/high-cardinality/. I think others have asked for a feature to add tags dynamically, such as this approach (https://dzone.com/articles/spring-boot-metrics-with-dynamic-tag-values).
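To make the registration rule above concrete, here is a minimal sketch in plain Java. `ToyRegistry` and its methods are hypothetical names invented for illustration; this is a toy model of "one counter per name and tags", not Micrometer's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical stand-in for a meter registry; NOT Micrometer's implementation. */
final class ToyRegistry {

    // One counter per unique (name, tags) combination, mirroring the rule quoted above.
    private final Map<List<String>, LongAdder> counters = new HashMap<>();

    /** Returns the existing counter for this name and tags, or creates and stores a new one. */
    LongAdder counter(String name, String... tags) {
        List<String> id = new ArrayList<>();
        id.add(name);
        id.addAll(Arrays.asList(tags));
        return counters.computeIfAbsent(id, key -> new LongAdder());
    }

    int meterCount() {
        return counters.size();
    }

    public static void main(String[] args) {
        ToyRegistry bounded = new ToyRegistry();
        ToyRegistry unbounded = new ToyRegistry();
        for (int i = 0; i < 10_000; i++) {
            // Bounded tag values: only two distinct counters are ever stored.
            bounded.counter("http.requests", "status", i % 2 == 0 ? "200" : "500").increment();
            // A unique requestId per call: every call stores a new counter.
            unbounded.counter("http.requests", "requestId", "req-" + i).increment();
        }
        System.out.println(bounded.meterCount());   // 2
        System.out.println(unbounded.meterCount()); // 10000
    }
}
```

With bounded tag values the map stays at two entries no matter how many requests arrive. With a per-request tag it grows by one entry per request, which is the leak I'm worried about.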

If this is the case then what's the point of having cardinality values in the Observation.Context? And is there any way to just publish these high-cardinality values as extra context around the metric? I think Jonatan Ivanov is suggesting this kind of information shouldn't be in the metric itself but should be in the logging, but it seems a big drawback of Micrometer if you can't add this extra contextual info. And if it must be in the application logging, then how can you link your metric to your log statement?

Jonatan Ivanov On

Question 1

If this is the case then what's the point of having cardinality values in the Observation.Context?

I'm not 100% sure I get this, but being able to attach tags dynamically is not the same as attaching high-cardinality data. You can do the former in multiple ways (see https://github.com/micrometer-metrics/micrometer/pull/4097), but you should not do the latter; my blog post calls out why:

This is usually what we mean by high cardinality: a lot of data is ok but infinite data will cause problems since you cannot store an endless amount of data in non-infinite space, either your service or your metrics backend will suffer.

This is not unique to Micrometer but is true for every metric library and metric backend. The point of being able to add high cardinality data to an Observation is being able to use it everywhere else other than metrics: tracing, logging, etc. The Observation API is not just an extra layer that creates metrics for you, it is an API you can use to create any data from your observations you want, metrics is only one of them.
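As a rough illustration of that split, here is a plain-Java sketch. `ToyObservation`, `metricTags`, and `logLine` are hypothetical names made up for this example and are not Micrometer types; the idea is only that one observation can feed several outputs, and each output picks the key-values it can afford.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model of an observation; illustrative only, not a Micrometer type. */
final class ToyObservation {

    final String name;
    final Map<String, String> lowCardinality = new LinkedHashMap<>();
    final Map<String, String> highCardinality = new LinkedHashMap<>();

    ToyObservation(String name) {
        this.name = name;
    }

    /** What a metrics output would use: only the bounded values become tags. */
    Map<String, String> metricTags() {
        return new LinkedHashMap<>(lowCardinality);
    }

    /** What a logging or tracing output could use: every key-value, bounded or not. */
    String logLine() {
        Map<String, String> all = new LinkedHashMap<>(lowCardinality);
        all.putAll(highCardinality);
        return name + " " + all;
    }

    public static void main(String[] args) {
        ToyObservation observation = new ToyObservation("checkout");
        observation.lowCardinality.put("status", "200");
        observation.highCardinality.put("requestId", "req-42");

        System.out.println(observation.metricTags()); // {status=200}
        System.out.println(observation.logLine());    // checkout {status=200, requestId=req-42}
    }
}
```

The metric keeps a bounded set of tags, while the log line (or a span, or an event) carries the request-specific detail.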

Question 2

And is there any way to just publish these high-cardinality values as extra context around the metric?

There is: you can write your own ObservationHandler, and where DefaultMeterObservationHandler attaches only the low-cardinality key-values as tags, yours can attach all of them. You will need to face the consequences above, though: if you do this and your data is truly high cardinality, your JVM will run out of heap and your metrics backend will run out of memory or disk space.

Question/Statement 3

I think Jonatan Ivanov is suggesting this kind of information shouldn't be in the metric itself but should be in the logging...

I was suggesting using a signal that can handle high-cardinality data; logging is one of them, but not the only one:

Instead of trying to attach this data to your metrics, try to use a different output that was designed to contain high cardinality data, e.g.: logging, distributed tracing, event store, etc.

Question 3.5

but it seems a big drawback of micrometer if you can't add this extra contextual info and if it must be in the application logging then how can you link your metric to your log statement?

As I mentioned above, this is a property of metrics, and it is not unique to Micrometer. I also gave a hint at the end to correlate metrics to other signals:

You can correlate this data (logs, traces, etc.) using Exemplars but that will be a topic for another post.

I haven't written a blog post about this but you can see this in action in one of my talks: https://www.youtube.com/watch?v=HQHuFnKvk_U#t=42m46s (I recommend watching the whole talk to have a better understanding about what is happening in the section I linked and also to see how to move between other signals.)
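Until then, here is a very rough plain-Java sketch of the idea behind an exemplar. `ToyExemplarCounter` is a hypothetical class invented for illustration and is not a Micrometer or Prometheus type; it assumes the simplest possible policy of keeping only the most recent trace ID.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReference;

/** Toy counter with an exemplar slot; illustrative only, not a real library class. */
final class ToyExemplarCounter {

    private final AtomicLong count = new AtomicLong();

    // The exemplar is a sample attached to the meter. It is not a tag,
    // so a new trace ID never creates a new meter or time series.
    private final AtomicReference<String> exemplarTraceId = new AtomicReference<>();

    void increment(String traceId) {
        count.incrementAndGet();
        // Keep only the latest trace ID, so memory use stays constant.
        exemplarTraceId.set(traceId);
    }

    long count() {
        return count.get();
    }

    String exemplarTraceId() {
        return exemplarTraceId.get();
    }

    public static void main(String[] args) {
        ToyExemplarCounter requests = new ToyExemplarCounter();
        requests.increment("trace-a");
        requests.increment("trace-b");

        System.out.println(requests.count());           // 2
        System.out.println(requests.exemplarTraceId()); // trace-b
    }
}
```

The trace ID stored next to the measurement is what lets you jump from a metric to a matching trace or log entry, without turning the ID into a tag.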

John Edwards On

Thanks for your lengthy reply. I really appreciate it. I think I get the point: according to the docs, "each registry is guaranteed to only create one counter for the same combination of name and tags", so if I use high-cardinality tags I'll leak memory because a new meter will be created each time. But if I use the withTags feature in Micrometer 1.12.x (see #Issue 535), then I presume I can circumvent this problem? So I implemented my own ObservationHandler as follows:

    import io.micrometer.common.KeyValues;
    import io.micrometer.core.instrument.Counter;
    import io.micrometer.core.instrument.LongTaskTimer;
    import io.micrometer.core.instrument.Meter;
    import io.micrometer.core.instrument.MeterRegistry;
    import io.micrometer.core.instrument.Tag;
    import io.micrometer.core.instrument.Timer;
    import io.micrometer.core.instrument.observation.MeterObservationHandler;
    import io.micrometer.observation.Observation;

    import java.util.ArrayList;
    import java.util.List;
    import java.util.stream.Collectors;

    public class ObservationHandlerWithHighCardinalityTagging implements MeterObservationHandler<Observation.Context> {

        private final MeterRegistry meterRegistry;

        public ObservationHandlerWithHighCardinalityTagging(MeterRegistry meterRegistry) {
            this.meterRegistry = meterRegistry;
        }

        // ObservationHandler requires this method; this handler accepts every context type.
        @Override
        public boolean supportsContext(Observation.Context context) {
            return true;
        }

        @Override
        public void onStart(Observation.Context context) {
            // Low-cardinality tags go on the builder; high-cardinality tags are added via withTags.
            // Note: withTags still resolves to one registered meter per distinct tag combination.
            Meter.MeterProvider<LongTaskTimer> meterProvider =
                    LongTaskTimer.builder(context.getName() + ".active")
                            .tags(tags(context.getLowCardinalityKeyValues()))
                            .withRegistry(meterRegistry);
            LongTaskTimer registeredTimer = meterProvider.withTags(tags(context.getHighCardinalityKeyValues()));
            LongTaskTimer.Sample longTaskSample = registeredTimer.start();
            context.put(LongTaskTimer.Sample.class, longTaskSample);

            Timer.Sample sample = Timer.start(meterRegistry);
            context.put(Timer.Sample.class, sample);
        }

        @Override
        public void onStop(Observation.Context context) {
            List<Tag> tags = tags(context.getLowCardinalityKeyValues());
            tags.add(Tag.of("error", getErrorValue(context)));
            Timer.Sample sample = context.getRequired(Timer.Sample.class);
            Meter.MeterProvider<Timer> meterProvider =
                    Timer.builder(context.getName()).tags(tags).withRegistry(meterRegistry);
            Timer registeredTimer = meterProvider.withTags(tags(context.getHighCardinalityKeyValues()));
            sample.stop(registeredTimer);

            LongTaskTimer.Sample longTaskSample = context.getRequired(LongTaskTimer.Sample.class);
            longTaskSample.stop();
        }

        @Override
        public void onEvent(Observation.Event event, Observation.Context context) {
            Meter.MeterProvider<Counter> meterProvider =
                    Counter.builder(context.getName() + "." + event.getName())
                            .tags(tags(context.getLowCardinalityKeyValues()))
                            .withRegistry(meterRegistry);
            Counter registeredCounter = meterProvider.withTags(tags(context.getHighCardinalityKeyValues()));
            registeredCounter.increment();
        }

        private String getErrorValue(Observation.Context context) {
            Throwable error = context.getError();
            return error != null ? error.getClass().getSimpleName() : "none";
        }

        // Converts key-values to tags; returns a mutable list because onStop appends to it.
        private List<Tag> tags(KeyValues keyValues) {
            return keyValues.stream()
                    .map(keyValue -> Tag.of(keyValue.getKey(), keyValue.getValue()))
                    .collect(Collectors.toCollection(ArrayList::new));
        }
    }