Regex to catch email addresses in email header

168 Views Asked by Clodoaldo Neto At 03 June 2023 at 11:52

I'm trying to parse a To email header with a regex. If there are no <> characters, I want the whole string; otherwise, I want what is inside the <> pair.

import re
re_destinatario = re.compile(r'^.*?<?(?P<to>.*)>?')
addresses = [
    'XKYDF/ABC (Caixa Corporativa)',
    'Fulano de Tal | Atlantica Beans <[email protected]>'
]
for address in addresses:
    m = re_destinatario.search(address)
    print(m.groups())
    print(m.group('to'))

But the regex is wrong:

('XKYDF/ABC (Caixa Corporativa)',)
XKYDF/ABC (Caixa Corporativa)
('Fulano de Tal | Atlantica Beans <[email protected]>',)
Fulano de Tal | Atlantica Beans <[email protected]>

What am I missing?

There are 2 best solutions below

anubhava On 03 June 2023 at 12:03 BEST ANSWER

You may use this regex:

<?(?P<to>[^<>]+)>?$

RegEx Details:

<?: Match an optional <
(?P<to>[^<>]+): Named capture group to, matching 1+ characters that are not < or >
>?: Match an optional >
$: End of the string
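To see why the pattern in the question captures the whole string, it may help to run both patterns on the same input. This is only a sketch: the header below is a made-up sample and user@example.com is a placeholder address, not taken from the question.

```python
import re

# Pattern from the question: ^.*? and <? can both match nothing, so the
# greedy .* inside the group swallows the entire string, and >? matches nothing.
original = re.compile(r'^.*?<?(?P<to>.*)>?')

# Pattern from this answer: the group may not contain < or > and must be
# followed by an optional > and the end of the string.
fixed = re.compile(r'<?(?P<to>[^<>]+)>?$')

# Hypothetical sample header (placeholder address)
header = 'Some Name <user@example.com>'

print(original.search(header).group('to'))  # Some Name <user@example.com>
print(fixed.search(header).group('to'))     # user@example.com
```

The key difference is that [^<>]+ cannot run past a bracket and $ pins the match to the end, so search has to start at the last bracketed part.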

Code:

import re
re_destinatario = re.compile(r'<?(?P<to>[^<>]+)>?$')
addresses = [
    'XKYDF/ABC (Caixa Corporativa)',
    'Fulano de Tal | Atlantica Beans <[email protected]>'
]
for address in addresses:
    m = re_destinatario.search(address)
    print(m.group('to'))

Output:

XKYDF/ABC (Caixa Corporativa)
[email protected]
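Note that search returns None when nothing matches (for instance on an empty string or on an empty <> pair), so m.group('to') would raise an AttributeError there. A minimal sketch of a guarded helper follows; the name extract_to and the sample inputs are hypothetical, and stripping whitespace first is an assumption about what is wanted, since a trailing space after > would otherwise be captured as the result.

```python
import re
from typing import Optional

RE_TO = re.compile(r'<?(?P<to>[^<>]+)>?$')

def extract_to(header: str) -> Optional[str]:
    """Return the bracketed part of a To header, or the whole string if there are no brackets."""
    # Strip surrounding whitespace so a trailing space is not captured instead of the address
    m = RE_TO.search(header.strip())
    # search() returns None when the pattern does not match at all
    return m.group('to') if m else None

print(extract_to('Some Name <user@example.com> '))  # user@example.com
print(extract_to('XKYDF/ABC (Caixa Corporativa)'))  # XKYDF/ABC (Caixa Corporativa)
print(extract_to(''))                               # None
```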

The fourth bird On 03 June 2023 at 12:03

You should not make the individual angle brackets optional; make the whole angle-bracket part optional instead.

^.*?(?:<(?P<to>.*)>)?$

Explanation

^ Start of string
.*? Match any character, as few as possible
(?: Non-capturing group, so the bracketed part is matched as a whole
- <(?P<to>.*)> Match <, capture any characters in the named group to, then match > (note that .* can also match across other < and > characters)
)? Close the non-capturing group and make it optional
$ End of string
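Because the whole bracketed part is optional, the named group is None when the header has no brackets, and the full match has to be used instead. A small sketch of that behaviour, using made-up sample headers (user@example.com is a placeholder address):

```python
import re

pattern = re.compile(r'^.*?(?:<(?P<to>.*)>)?$')

# Hypothetical sample headers
plain = pattern.search('XKYDF/ABC (Caixa Corporativa)')
bracketed = pattern.search('Some Name <user@example.com>')

print(plain.group('to'))      # None: the optional <...> part was skipped
print(plain.group())          # XKYDF/ABC (Caixa Corporativa)
print(bracketed.group('to'))  # user@example.com

# Fall back to the whole match when the group is None (or empty)
print(plain.group('to') or plain.group())  # XKYDF/ABC (Caixa Corporativa)
```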

For example (this code uses the [^<>\n]* variant of the pattern, shown at the end of this answer):

import re

re_destinatario = re.compile(r'^.*?(?:<(?P<to>[^<>\n]*)>)?$')
addresses = [
    'XKYDF/ABC (Caixa Corporativa)',
    'Fulano de Tal | Atlantica Beans <[email protected]>'
]

for address in addresses:
    m = re_destinatario.search(address)
    if m:
        if m.group('to'):
            print(m.group('to'))
        else:
            print(m.group())

Output:

XKYDF/ABC (Caixa Corporativa)
[email protected]

If you don't want the match to cross angle brackets or a newline:

^.*?(?:<(?P<to>[^<>\n]*)>)?$
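The difference between the two variants only shows up when the string contains more than one bracketed part. A rough illustration, with a made-up header (the names greedy and strict are just labels for this sketch):

```python
import re

greedy = re.compile(r'^.*?(?:<(?P<to>.*)>)?$')
strict = re.compile(r'^.*?(?:<(?P<to>[^<>\n]*)>)?$')

# Hypothetical header with two bracketed parts
header = 'A <x> B <y@example.com>'

# .* runs from the first < to the last >, crossing the brackets in between
print(greedy.search(header).group('to'))  # x> B <y@example.com

# [^<>\n]* cannot cross a bracket, so only the last <...> pair can reach $
print(strict.search(header).group('to'))  # y@example.com
```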
