I am trying to parse an ID with Grok expressions, but it seems that Grok first tries to parse the captured ID as a number and, if that succeeds, returns the number instead of the ID. If the capture cannot be read as a number (e.g. because it contains letters), everything works as expected. However, if the ID has an 'e' between digits, Grok treats it as a number written in scientific notation and converts it, which leads to unexpected behaviour.
For example, given the pattern definition 'UNIQUE_ID ([A-Za-z0-9]+)' and the Grok expression %{UNIQUE_ID:uid}, the following input:
3a564e500920343446
4e4
4e56
47564e500920343446
evaluates to
[
{
"uid": "3a564e500920343446"
},
{
"uid": 40000
},
{
"uid": 4e+56
},
{
"uid": null
}
]
This is not what I expected. I expected it to evaluate to:
[
{
"uid": "3a564e500920343446"
},
{
"uid": "4e4"
},
{
"uid": "4e56"
},
{
"uid": "47564e500920343446"
}
]
What is especially curious is that Grok cannot convert 47564e500920343446 to a number, presumably because the value (47564 × 10^500920343446) overflows whatever data type it uses internally. Instead of falling back to the original string, it quietly returns null.
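To make the suspected behaviour concrete, here is a minimal Python sketch of a "try a number first, otherwise keep the string" coercion that reproduces all four results above. It is only a hypothetical model of what the tool appears to do, not Grok's actual implementation; the regex and the function name coerce_like_observed are my own assumptions.

```python
import math
import re

# Hypothetical model of the observed coercion, NOT Grok's real code.
# A capture is treated as a number if the whole string looks like a decimal
# literal, optionally in scientific notation (e.g. "4e4").
NUMBER_RE = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?")


def coerce_like_observed(capture: str):
    """Return a float if the capture looks numeric, otherwise the raw string."""
    if NUMBER_RE.fullmatch(capture):
        value = float(capture)
        # float("47564e500920343446") overflows to inf; a JSON serializer
        # with no Infinity literal (e.g. JavaScript's JSON.stringify) writes null.
        return None if math.isinf(value) else value
    # Anything with letters other than the exponent 'e' stays a string.
    return capture


for uid in ["3a564e500920343446", "4e4", "4e56", "47564e500920343446"]:
    print(repr(coerce_like_observed(uid)))
```

If the tool really works like this, any purely numeric-looking ID is at risk, not only the ones containing 'e'.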
So, the question is: how can I force Grok to keep captures as strings instead of converting them to numbers? That would solve the problem of strings that resemble scientific notation but are not meant to be read as numbers.
Alternatively, is there a way to parse these IDs with Grok that is robust against such inputs and works as expected?
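For comparison, applying the same character class with a plain regular expression engine gives the desired result, because a regex capture is always a string and is never coerced. The sketch below uses Python's re module; the names UNIQUE_ID and records are illustrative only, and this does not change how the Grok tool itself types its captures.

```python
import json
import re

# Same character class as the UNIQUE_ID grok pattern, as a named group.
UNIQUE_ID = re.compile(r"(?P<uid>[A-Za-z0-9]+)")

lines = ["3a564e500920343446", "4e4", "4e56", "47564e500920343446"]

# re returns every capture as str, so "4e4" is never read as 40000.
records = [{"uid": UNIQUE_ID.fullmatch(line).group("uid")} for line in lines]

# json.dumps quotes every value, matching the expected output above.
print(json.dumps(records, indent=1))
```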