How to parse values that appear after the same string in Python?

1.6k Views Asked by weefwefwqg3 At 30 December 2016 at 07:44

I have an input text like this (the actual text file also contains tons of garbage characters surrounding these two strings):

(random_garbage_char_here)**value=xxx**;(random_garbage_char_here)**value=yyy**;(random_garbage_char_here)

I am trying to parse the text and store something like this: value1="xxx" and value2="yyy". I wrote the following Python code:

value1_start = content.find('value')
value1_end = content.find(';', value1_start)

value2_start = content.find('value')
value2_end = content.find(';', value2_start)


print "%s" %(content[value1_start:value1_end])
print "%s" %(content[value2_start:value2_end])

But it always returns:

value=xxx
value=xxx

Could anyone tell me how I can parse the text so that the output is:

value=xxx
value=yyy


Wiktor Stribiżew On 30 December 2016 at 07:54 BEST ANSWER

Use a regex approach:

re.findall(r'\bvalue=[^;]*', s)

Or, if the key can be any sequence of one or more word (letter/digit/underscore) characters:

re.findall(r'\b\w+=[^;]*', s)

See the regex demo

Details:

\b - word boundary
value= - a literal char sequence value=
[^;]* - zero or more chars other than ;.

See the Python demo:

import re
rx = re.compile(r"\bvalue=[^;]*")
s = "$%$%&^(&value=xxx;$%^$%^$&^%^*value=yyy;%$#^%"
res = rx.findall(s)
print(res)
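
If only the part after value= is needed (as in the question's value1="xxx" and value2="yyy"), a capturing group can be added to the same pattern. This is a minimal sketch that reuses the sample string above; the names value1 and value2 come from the question, and the unpacking assumes exactly two matches are present:

```python
import re

s = "$%$%&^(&value=xxx;$%^$%^$&^%^*value=yyy;%$#^%"

# With one capturing group, findall returns only the captured text,
# i.e. whatever follows "value=" up to (not including) the next ";".
values = re.findall(r"\bvalue=([^;]*)", s)

# Assumption: the input contains exactly two such pairs.
value1, value2 = values
print(value1)
print(value2)
```

If the number of pairs is not known in advance, keeping the list in values is safer than unpacking it.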

Mike Müller On 30 December 2016 at 07:51

For this input:

content = '(random_garbage_char_here)**value=xxx**;(random_garbage_char_here)**value=yyy**;(random_garbage_char_here)'

use a simple regex and manually strip off the first and last two characters:

import re

values = [x[2:-2] for x in re.findall(r'\*\*value=.*?\*\*', content)]
for value in values:
    print(value)

Output:

value=xxx
value=yyy

Here the assumption is that there are always two leading and two trailing * as in **value=xxx**.
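
As an alternative to slicing, the surrounding asterisks can be left outside a capturing group so that findall returns the inner text directly. This is only a sketch under the same assumption of two leading and two trailing * characters:

```python
import re

content = ('(random_garbage_char_here)**value=xxx**;'
           '(random_garbage_char_here)**value=yyy**;'
           '(random_garbage_char_here)')

# The group captures "value=..." lazily up to the next "**";
# the asterisks themselves are matched but not captured.
values = re.findall(r'\*\*(value=.*?)\*\*', content)
for value in values:
    print(value)
```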

Christian Dean On 30 December 2016 at 07:55

Use regex to filter the data you want from the "junk characters":

>>> import re
>>> _input = '#4@5%value=xxx#38u952035983049;3^&^*(^%$3#value=yyy#%$#^&*^%;$#%$#^'
>>> matches = re.findall(r'[a-zA-Z0-9]+=[a-zA-Z0-9]+', _input)
>>> matches
['value=xxx', 'value=yyy']
>>> for match in matches:
...     print(match)
...
value=xxx
value=yyy

Summary of the regular expression:

[a-zA-Z0-9]+: One or more alphanumeric characters
=: literal equal sign
[a-zA-Z0-9]+: One or more alphanumeric characters
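
One thing to keep in mind with this pattern is that alphanumeric garbage directly touching the key or the value becomes part of the match, so it works best when each pair is delimited by non-alphanumeric characters. The sketch below illustrates this with two made-up sample strings (they are not from the question):

```python
import re

pattern = re.compile(r'[a-zA-Z0-9]+=[a-zA-Z0-9]+')

# Pairs separated from the garbage by punctuation: clean matches.
clean = pattern.findall('#%value=xxx#38u;$3#value=yyy#%;')

# Alphanumeric garbage adjacent to the pairs is absorbed into the match.
noisy = pattern.findall('#%value=xxx38u;$3value=yyy#%;')

print(clean)
print(noisy)
```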

Serge Ballesta On 30 December 2016 at 08:35

You already have good answers based on the re module. That would certainly be the simplest way.

If for any reason (performance?) you prefer to use str methods, that is indeed possible. But you must start searching for the second string past the end of the first one:

value2_start = content.find('value', value1_end)
value2_end = content.find(';', value2_start)
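
The same idea extends to any number of occurrences by looping and restarting each search where the previous match ended. This is a rough sketch, not part of the original answer; the helper name find_pairs, its parameters, and the sample string are hypothetical:

```python
def find_pairs(content, key='value', terminator=';'):
    """Collect every 'key...' substring up to the next terminator."""
    pairs = []
    pos = 0
    while True:
        start = content.find(key, pos)
        if start == -1:          # no further occurrence of the key
            break
        end = content.find(terminator, start)
        if end == -1:            # no terminator: take the rest of the text
            end = len(content)
        pairs.append(content[start:end])
        pos = end                # resume the search past this match
    return pairs


content = '#%&value=xxx;@!$value=yyy;^&*'
pairs = find_pairs(content)
print(pairs)
```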
