Convert a string that contains a substring to a dictionary

I am trying to convert strings in a particular format to a Python dictionary. The string format is shown below:

st1 = 'key1 key2=value2 key3="key3.1, key3.2=value3.2 , key3.3 = value3.3, key3.4" key4'

I want to parse it and convert it to a dictionary like this:

dict1 = {
    key1: None,
    key2: value2,
    key3: {
            key3.1: None,
            key3.2: value3.2,
            key3.3: value3.3,
            key3.4: None
          },
    key4: None
}

I tried using Python's re package and the string split function, but was not able to achieve the result. I have thousands of strings in the same format and am trying to automate this. Could someone help?

There are 2 best solutions below

Nolan Walker On BEST ANSWER

If all your strings are consistent and have only one layer of sub-dict, the code below should do the trick; you may need to tweak it for your data.

import json

st1 = 'key1 key2=item2 key3="key3.1, key3.2=item3.2 , key3.3 = item3.3, key3.4" key4'
st1 = st1.replace(' = ', '=')
st1 = st1.replace(' ,', ',')
new_dict = {}
no_keys=False

while not no_keys:
    st1 = st1.lstrip()
    
    if " " in st1:
        item = st1.split(" ")[0]
    else:
        item = st1
    
    if '=' in item:
        if '="' in item:
            item = item.split('=')[0]
            new_dict[item] = {}     
            
            st1 = st1.replace(f'{item}=','')
            sub_items = st1.split('"')[1]
            sub_values = sub_items.split(',')

            for sub_item in sub_values:
                if "=" in sub_item:
                    sub_key, sub_value = sub_item.split('=')
                    new_dict[item].update({sub_key.strip():sub_value.strip()})
                else:
                    new_dict[item].update({sub_item.strip(): None})
            
            st1 = st1.replace(f'"{sub_items}"', '')
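        else:
            key, value = item.split('=')
            new_dict.update({key:value})
            # Slice off the consumed token; replace() would miss a final key=value with no trailing space and loop forever.
            st1 = st1[len(item):]
    else:
        new_dict.update({item: None})
        # Slice instead of replace() so later keys that contain this key as a prefix are left intact.
        st1 = st1[len(item):]
        
    # Stop once only whitespace (or nothing) is left.
    if st1.strip() == "":
        no_keys=True    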
    
print(json.dumps(new_dict, indent=4))
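
For comparison, the same one-level parsing can be sketched with a single regular expression from the standard re module. This is only a minimal sketch, not a drop-in replacement: the names TOKEN and parse_pairs are made up for illustration, and it assumes that double quotes never nest and that keys and unquoted values contain no whitespace, commas, or quotes.

```python
import re

# One token is a key, optionally followed by "=" and either a
# double-quoted group of sub-items or a plain unquoted value.
# Spaces around "=" are allowed because the sample input uses " = ".
TOKEN = re.compile(r'([^\s=,"]+)(?:\s*=\s*(?:"([^"]*)"|([^\s,"]+)))?')


def parse_pairs(text):
    """Parse 'k k=v k="a, b=c"' style text into a dict (hypothetical helper)."""
    result = {}
    for match in TOKEN.finditer(text):
        key, quoted, plain = match.groups()
        if quoted is not None:
            # The quoted part uses the same syntax, so parse it the same way.
            result[key] = parse_pairs(quoted)
        else:
            # plain is None for a bare key with no "=value" part.
            result[key] = plain
    return result


st1 = 'key1 key2=value2 key3="key3.1, key3.2=value3.2 , key3.3 = value3.3, key3.4" key4'
parsed = parse_pairs(st1)
print(parsed)
```

Because commas and whitespace are simply skipped between tokens, the same function handles both the space-separated top level and the comma-separated quoted part. All values stay strings; convert them afterwards if you need numbers.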
HALF9000 On

Consider using a parsing tool like lark. A simple example for your case:

from lark import Lark

_grammar = r'''
    ?start: value
    
    ?value: object
           | NON_SEPARATOR_STRING?

    object : "\"" [pair (_SEPARATOR pair)*] "\""
    pair : NON_SEPARATOR_STRING [_PAIRTOR] value

    
    NON_SEPARATOR_STRING: /[a-zA-Z0-9_\.]+/
    _SEPARATOR: /[,  ]+/
            | ","
    _PAIRTOR: " = "
            | "="
'''

parser = Lark(_grammar)

st1 = 'key1 key2=value2 key3="key3.1, key3.2=value3.2 , key3.3 = value3.3, key3.4" key4'

tree = parser.parse(f'"{st1}"')
print(tree.pretty())

"""
object
  pair
    key1
    value
  pair
    key2
    value2
  pair
    key3
    object
      pair
        key3.1
        value
      pair
        key3.2
        value3.2
      pair
        key3.3
        value3.3
      pair
        key3.4
        value
  pair
    key4
    value

"""

Then you can write your own Transformer to transform this tree into your desired data type.