Python - get TLD


I have a problem with a function that should remove the TLD from a domain. If the domain has a subdomain, it works correctly. For example:

Input: asdf.xyz.example.com

Output: asdf.xyz.example

The problem is that when the domain has no subdomain, there is a dot in front of the domain:

Input: example.com

Output: .example

This is my code:

 res = get_tld(domain, as_object=True, fail_silently=True, fix_protocol=True)
 domain = '.'.join([res.subdomain, res.domain])

The get_tld function is from the tld library.

Could someone help me solve this problem?


There are 3 best solutions below

Mathieu On

With a very simple string manipulation, is this what you are looking for?

d1 = 'asdf.xyz.example.com'
output = '.'.join(d1.split('.')[:-1])
# output = 'asdf.xyz.example'

d2 = 'example.com'
output = '.'.join(d2.split('.')[:-1])
# output = 'example'
Dmytro Chasovskyi On

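You can use filtering. It looks like get_tld works as intended, but the join is incorrect: when there is no subdomain, res.subdomain is an empty string, so joining it with res.domain produces a leading dot.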

domain = '.'.join(filter(lambda x: len(x), [res.subdomain, res.domain]))
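
To see the effect without installing anything, here is a minimal sketch that uses plain strings in place of res.subdomain and res.domain. The helper name join_nonempty is hypothetical and not part of the tld library; filter(None, ...) drops empty strings just like the lambda above, and also drops None.

```python
def join_nonempty(*parts):
    """Join the given labels with dots, skipping empty or None entries."""
    return ".".join(filter(None, parts))


# Plain strings stand in for res.subdomain and res.domain (assumption).
print(".".join(["", "example"]))             # .example (the original problem)
print(join_nonempty("", "example"))          # example
print(join_nonempty("asdf.xyz", "example"))  # asdf.xyz.example
```

Since the question passes fail_silently=True, get_tld should return None rather than raise when it cannot parse the input, so it is probably worth checking that res is not None before reading its attributes.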
hiro protagonist On

Another simple version is this:

def remove_tld(url):
    *base, tld = url.split(".")
    return ".".join(base)


url = "asdf.xyz.example.com"
print(remove_tld(url))    # asdf.xyz.example

url = "example.com"
print(remove_tld(url))    # example

*base, tld = url.split(".") puts the TLD in tld and everything else in base. Then you just join that with ".".join(base).
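
One thing to keep in mind is that this treats only the last dot-separated label as the TLD. The sketch below (split_last_label is a hypothetical name used for illustration) shows what the unpacking yields for a few inputs, including a multi-label suffix such as co.uk, where a suffix-aware library like tld would give a different result.

```python
def split_last_label(host):
    """Return (everything before the last dot, the last label)."""
    *base, last = host.split(".")
    return ".".join(base), last


print(split_last_label("example.com"))    # ('example', 'com')
print(split_last_label("example.co.uk"))  # ('example.co', 'uk')
print(split_last_label("localhost"))      # ('', 'localhost')
```

If the input only ever has single-label TLDs, plain splitting is enough; otherwise, staying with get_tld and fixing the join is likely the safer route.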