I am having an intermittent issue with the telegraf processors.regex plugin (at least that's my best guess).
We are using the following telegraf config layout:
- /etc/telegraf
  - telegraf.conf (only configures [[agent]])
  - telegraf.d
    - inputs.conf
    - output.conf
    - processors.conf
inputs.conf
[[inputs.http]]
  urls = [
    "http://myserver.mycompany.com:8080/some/rest/api",
  ]
  username = "user"
  password = "password"
  name_override = "monitor"
  interval = "600s"
  timeout = "3s"
  data_format = "json"
  json_query = "rows"
  json_string_fields = ["size"]
  tagexclude = ["host"]
outputs.conf
[[outputs.influxdb]]
  database = "metrics"
  urls = ["http://influxdb.mycompany.com:8086"]
processors.conf
[[processors.converter]]
  [processors.converter.fields]
    integer = ["size"]

# Process order is VERY important here
# Rename the url tag to target
[[processors.rename]]
  [[processors.rename.replace]]
    tag = "url"
    dest = "target"

# Extract the target name from the url (I know we just renamed it ... weird)
[[processors.regex]]
  [[processors.regex.tags]]
    key = "url"
    pattern = '^http://(?P<target>[^:/]+).+'
    replacement = "${target}"
When I run:
telegraf --config telegraf.conf --config-directory telegraf.d --test --debug --input-filter http
I get back the data I expect, and the url has been replaced by the regex match and renamed to target, i.e.
monitor,target=myserver.mycompany.com size=123456789i 1627647959000000000
The problem is that the Grafana graph I have created shows the original full url http://myserver.mycompany.com:8080/some/rest/api rather than the processed myserver.mycompany.com. Also, very occasionally, the telegraf test itself returns target with the full unprocessed url, i.e.
monitor,target=http://myserver.mycompany.com:8080/some/rest/api size=123456789i 1627647959000000000
The rest of the processing always works: the size string returned in the json is always converted to an integer, and url is always renamed to target.
Even stranger, I have pushed this config (with different urls in inputs.http depending on the region) to a number of servers, and the majority work exactly as expected; only a few show this behaviour. I have checked that every server runs the same telegraf version (1.19.1), and all of them run CentOS 7. I have also tried clearing the data from InfluxDB.
The few servers that store the full url in target always do so, even though the telegraf test on those servers shows the host extracted as it should be.
Any hints as to where to look next?
I have found the cause!
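The telegraf docs note that processors run in no guaranteed order unless the order option is set on each of them.
Even my own comment in processors.conf hints at the problem: "I know we just renamed it ... weird".
Yes, it is weird. It only worked because my tests kept landing on the same side of a 50:50 chance, with the regex running first; the other order is equally likely. In that other order the url tag has already been renamed to target, so the regex finds no url tag to process and the full url passes through unchanged.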
The solution is to set the order option on each processor in processors.conf, giving the regex a lower number than the rename.
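The config that went with this answer did not survive, so the following is only a sketch of what an ordered processors.conf could look like, assuming the same three processors as in the question. The specific order values (1, 2, 3) are illustrative; only their relative order should matter. The converter works on a field rather than the url tag, so its position is probably not critical, but it is given an explicit order here for consistency.

```toml
# processors.conf (sketch): explicit order so the regex runs before the rename.
# The order values are illustrative assumptions, not taken from the original post.

[[processors.converter]]
  order = 1
  [processors.converter.fields]
    integer = ["size"]

# Extract the host name while the tag is still called "url"
[[processors.regex]]
  order = 2
  [[processors.regex.tags]]
    key = "url"
    pattern = '^http://(?P<target>[^:/]+).+'
    replacement = "${target}"

# Rename the (now shortened) url tag to target
[[processors.rename]]
  order = 3
  [[processors.rename.replace]]
    tag = "url"
    dest = "target"
```

Running the same telegraf --test command several times on one of the affected servers should then show the short host name in target every time.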
Now the regex will always run before the rename.