I have a data frame in which one column contains list values. I have attached a picture of the data in Excel format, as well as the data frame itself.
column
"[
""Hello""
]"
"[
""Hello"",
""Hi""
]"
"[
""Hello"",
""Hi"",
""""
]"
"[
"""",
""Hello"",
""Hi""
]"
"[
""Hello"",
""""
]"
"[
"""",
""Hello""
]"
The column values look like this:
column
------
[\n "Hello" \n]
[\n "Hello", \n "Hi"\n]
[\n "Hello", \n "Hi"\n, \n ""\n]
[\n ""\n, \n "Hello", \n "Hi"\n]
[\n "Hello" \n, \n ""\n]
[\n ""\n, \n "Hello" \n]
I want to remove the \n characters and the empty strings ("") from each list, so that the values become:
column
------
["Hello"]
["Hello", "Hi"]
["Hello", "Hi"]
["Hello", "Hi"]
["Hello"]
["Hello"]
How can I obtain this result using pandas and Python?
I'm not sure how to handle the input data that you have because that is not correctly formatted Python. However, I think there are a couple of ways to solve the problem.
Input data (as correct Python)
Code: First map then List Comprehension
The map removes the whitespace, including the newline \n characters. The list comprehension then removes the empty entries from each row ("").
Output
Note that the end result has single quotes rather than double quotes. Let me know if this matters for what you're doing.
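The map-then-list-comprehension approach described above can be sketched roughly as follows. This is a minimal illustration, not necessarily the exact code originally posted: the `rows` data and the `clean_row` helper are hypothetical names, and it assumes each cell already holds a real Python list of strings with stray whitespace and newlines.

```python
# Hypothetical input: each row is a real Python list of strings that
# still carries stray spaces, newlines, and empty entries.
rows = [
    ["\n Hello \n"],
    ["\n Hello", "\n Hi\n"],
    ["\n Hello", "\n Hi\n", "\n \n"],
    ["\n \n", "\n Hello", "\n Hi\n"],
    ["\n Hello \n", "\n \n"],
    ["\n \n", "\n Hello \n"],
]


def clean_row(row):
    """Strip whitespace from every entry, then drop the empty ones."""
    stripped = map(str.strip, row)              # removes spaces and "\n"
    return [item for item in stripped if item]  # keeps non-empty entries only


cleaned = [clean_row(row) for row in rows]
for row in cleaned:
    print(row)

# With pandas, the same helper could be applied to a column, e.g.
# df["column"] = df["column"].apply(clean_row)
```

Printing a Python list shows its strings with single quotes, which is why the result differs from the double-quoted form in the question.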
For fun
Just for fun, I took your input data absolutely literally, and wrote a set of replacements to result in exactly the output you have in the question.
Input data
Code
Output
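One possible way to treat the input literally, as plain text, is sketched below. It uses a regular expression instead of a chain of replacements, so it is a stand-in for the idea rather than the code originally posted. The `raw_rows` data and the `clean_literal` helper are hypothetical, and the sketch assumes each cell is a string containing the characters exactly as shown in the question, with `\n` as a literal backslash followed by `n`.

```python
import re

# Hypothetical input: every cell is a plain string, copied literally
# from the question (raw strings keep "\n" as backslash + n).
raw_rows = [
    r'[\n "Hello" \n]',
    r'[\n "Hello", \n "Hi"\n]',
    r'[\n "Hello", \n "Hi"\n, \n ""\n]',
    r'[\n ""\n, \n "Hello", \n "Hi"\n]',
    r'[\n "Hello" \n, \n ""\n]',
    r'[\n ""\n, \n "Hello" \n]',
]


def clean_literal(text):
    """Rebuild the list text from its non-empty double-quoted entries."""
    items = re.findall(r'"([^"]*)"', text)         # every quoted entry, even ""
    kept = [item for item in items if item.strip()]  # drop the empty ones
    return "[" + ", ".join(f'"{item}"' for item in kept) + "]"


cleaned_text = [clean_literal(text) for text in raw_rows]
for text in cleaned_text:
    print(text)
```

Because the result is built as text, the double quotes from the question are preserved. The same function also copes with real newline characters, since everything outside the quoted entries is ignored.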