Couldn't find the right Regex code to extract the exact numbers


I have extracted a string containing 64-bit Steam IDs and a friend list using web scraping. I want to get the unique steamids so that I can store them in a separate file. I used a regex, but I think I made a mistake in the pattern.

This is the string.

{"friendslist":{"friends":[{"steamid":"7656xxxxxxx80x76","relationship":"friend","friend_since":1552765824},{"steamid":"76561xxxxxxx4xx89","relationship":"friend","friend_since":1508594830},{"steamid":"765xxxxxxxxxxx3194","relationship":"friend","friend_since":1543773569}]}}

I used regex as this:

import re
re.findall("[^:[0-9]+[0-9]+", soup.text)

However, I got this result:

['"7656xxxxxxx80x76',
'"76561xxxxxxx4xx89',
'"765xxxxxxxxxxx3194']

How can I get rid of the double quotes (") at the beginning of the numbers?

furas On BEST ANSWER

You have a JSON string, so use the json module:

import json

text = '{"friendslist":{"friends":[{"steamid":"7656xxxxxxx80x76","relationship":"friend","friend_since":1552765824},{"steamid":"76561xxxxxxx4xx89","relationship":"friend","friend_since":1508594830},{"steamid":"765xxxxxxxxxxx3194","relationship":"friend","friend_since":1543773569}]}}'

data = json.loads(text)

for friend in data["friendslist"]['friends']:
    print(friend['steamid'])

Result:

7656xxxxxxx80x76
76561xxxxxxx4xx89
765xxxxxxxxxxx3194
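Since the goal in the question is to store the unique IDs in a separate file, the loop above could be extended along these lines. This is only a sketch: the shortened sample text, the helper names unique_steam_ids and save_ids, and the file name used in the note below are assumptions made for illustration, not part of the original answer.

```python
import json

# Shortened stand-in for the scraped text, with one ID repeated on purpose.
text = '{"friendslist":{"friends":[{"steamid":"111"},{"steamid":"222"},{"steamid":"111"}]}}'


def unique_steam_ids(raw):
    """Return the steamid values in first-seen order, without duplicates."""
    data = json.loads(raw)
    ids = [friend["steamid"] for friend in data["friendslist"]["friends"]]
    # dict.fromkeys keeps insertion order (Python 3.7+) and drops repeats.
    return list(dict.fromkeys(ids))


def save_ids(ids, path):
    """Write one ID per line to the file at path."""
    with open(path, "w") as out:
        out.write("\n".join(ids) + "\n")


ids = unique_steam_ids(text)
print(ids)
```

Calling save_ids(ids, "steam_ids.txt") would then write one ID per line; the file name is a placeholder.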
SM Abu Taher Asif On

I wrote a recursive function that takes the data and a key, then builds a list of the results:

data = {"friendslist":{"friends":[{"steamid":"7656xxxxxxx80x76","relationship":"friend","friend_since":1552765824},{"steamid":"76561xxxxxxx4xx89","relationship":"friend","friend_since":1508594830},{"steamid":"765xxxxxxxxxxx3194","relationship":"friend","friend_since":1543773569}]}}

def getDataFromNestedDict(data, dictKey, results=None):
    # Collect every value stored under dictKey, at any nesting depth.
    if results is None:
        results = []
    if isinstance(data, dict):
        if dictKey in data:
            results.append(data[dictKey])
        # Recurse into every value; anything that is not a dict or list is ignored.
        for value in data.values():
            getDataFromNestedDict(value, dictKey, results)
    elif isinstance(data, list):
        for item in data:
            getDataFromNestedDict(item, dictKey, results)
    return results

steamDataList = getDataFromNestedDict(data, 'steamid')
print(steamDataList)

Output:

['7656xxxxxxx80x76', '76561xxxxxxx4xx89', '765xxxxxxxxxxx3194']
Kellen On

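The regex you're providing isn't doing what you expect. The first [ is closed by the first ], so [^:[0-9] is a single negated character class: it matches any character other than :, [ or a digit, and that includes the double quote.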
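A quick way to see this is to test the first bracket expression on its own. The sketch below uses made-up digits in place of the masked ID; the sample string is an assumption, not the asker's data.

```python
import re

# [^:[0-9] is one negated class: any character except ':', '[' or a digit.
first_class = re.compile(r"[^:[0-9]")

print(bool(first_class.fullmatch('"')))  # is the double quote accepted?
print(bool(first_class.fullmatch(":")))  # is the colon accepted?
print(bool(first_class.fullmatch("7")))  # is a digit accepted?

# Made-up sample: run the pattern from the question and inspect the match.
sample = '{"steamid":"1234567890","friend_since":1552765824}'
matches = re.findall(r"[^:[0-9]+[0-9]+", sample)
print(matches)
```

The quote is accepted by the class while the colon and digits are not, which is why each match starts with the quote that precedes the ID and why the friend_since number, preceded by a colon, is skipped.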

Use lookbehind and lookahead to require the double quotes without including them in the match:

(?<=\")(\d+[x\d]+\d)(?=\")
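For completeness, here is how that pattern could be used from Python. This is a sketch with assumed sample text that keeps the masked x characters from the question; the x inside [x\d] is only there to cope with that masking, and unmasked IDs should be matched by \d+ between the same lookarounds.

```python
import re

# Sample in the same masked form as the question (assumed for illustration).
text = '{"steamid":"7656xxxxxxx80x76","relationship":"friend","friend_since":1552765824}'

# The lookbehind and lookahead require the quotes but keep them out of the match.
pattern = re.compile(r'(?<=")(\d+[x\d]+\d)(?=")')

ids = pattern.findall(text)
print(ids)
```

The friend_since value is not returned because it is preceded by a colon rather than a double quote.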

@Furas is right, though. You should just be parsing the JSON instead.

ecavard On

I recommend following @furas's answer (use a JSON parser).

But if you really want to use Regex: [^ ["]+[0-9]+[0-9]+
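One thing to watch for: on the string from the question, the pattern above also appears to pick up the friend_since numbers together with their leading colon, because a colon is allowed by the first character class. A possible alternative is to anchor on the key name instead of on the shape of the digits. The second pattern below is a suggested variation, not part of the original answer, and it assumes there is no whitespace around the colon, as in the string from the question.

```python
import re

# Two entries from the question's string, in the same masked form.
text = '{"friendslist":{"friends":[{"steamid":"7656xxxxxxx80x76","relationship":"friend","friend_since":1552765824},{"steamid":"76561xxxxxxx4xx89","relationship":"friend","friend_since":1508594830}]}}'

# The pattern quoted above, kept for comparison.
loose = re.findall(r'[^ ["]+[0-9]+[0-9]+', text)
print(loose)

# Suggested variation: capture whatever sits between the quotes after "steamid".
ids = re.findall(r'"steamid":"([^"]+)"', text)
print(ids)
```

Comparing the two printed lists shows the difference: only the keyed pattern is limited to steamid values.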