How to remove \n and empty string in a column in a dataframe?

Question

580 Views Asked by Bad Coder At 02 November 2022 at 23:41

I have a DataFrame in which one column holds list values. The column is shown below as it appears in Excel format, followed by how it appears in the DataFrame.

column
"[
""Hello""
]"
"[
""Hello"", 
 ""Hi""
]"
"[
""Hello"", 
 ""Hi"",
 """"
]"
"[
"""",
""Hello"", 
 ""Hi""
]"
"[
""Hello"",
""""
]"
"[
"""",
""Hello""

]"

In the DataFrame, the column values look like this:

column
------
[\n "Hello" \n]
[\n "Hello", \n "Hi"\n]
[\n "Hello", \n "Hi"\n, \n ""\n]
[\n ""\n, \n "Hello", \n "Hi"\n]
[\n "Hello" \n, \n ""\n]
[\n ""\n, \n "Hello" \n]

I want to remove the \n characters and the empty strings ("") from each list, so that the values become:

column
------
["Hello"]
["Hello", "Hi"]
["Hello", "Hi"]
["Hello", "Hi"]
["Hello"]
["Hello"]

How can I obtain this result using pandas and Python?


There are 3 best solutions below

**Utkonos** · Answer 1 · 2022-11-03T00:33:00.780000

I'm not sure how to handle the input data that you have because that is not correctly formatted Python. However, I think there are a couple of ways to solve the problem.

Input data (as correct Python)

column = [
    ['\n "Hello" \n'],
    ['\n "Hello"', '\n "Hi"\n'],
    ['\n "Hello"',' \n "Hi"\n', '\n ""\n'],
    ['\n ""\n', '\n "Hello"', '\n "Hi"\n'],
    ['\n "Hello" \n', '\n ""\n'],
    ['\n ""\n', '\n "Hello" \n']
]

Code: First `map` then List Comprehension

The `map` call strips the surrounding whitespace (including the newline \n characters) and the enclosing double quotes from each element. The list comprehension then drops the empty strings ("") from each row.

def stripper(text):
    return text.strip().strip('"')

for row in column:
    output = list(map(stripper, row))
    print([i for i in output if i])

Output

['Hello']
['Hello', 'Hi']
['Hello', 'Hi']
['Hello', 'Hi']
['Hello']
['Hello']

Note that the end result has single quotes rather than double quotes. Let me know if this matters for what you're doing.
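On the quoting point: the single quotes are only how Python prints a list of strings; the strings themselves contain no quote characters. If the double-quoted form from the question matters for display, one option is to serialize each cleaned row with `json.dumps` from the standard library. This is a small sketch rather than part of the answer above, and the names `cleaned_rows` and `as_text` are illustrative.

```python
import json

# Two cleaned rows, as produced by the map + list comprehension above.
cleaned_rows = [["Hello"], ["Hello", "Hi"]]

# json.dumps renders each string with double quotes.
as_text = [json.dumps(row) for row in cleaned_rows]

for line in as_text:
    print(line)
# ["Hello"]
# ["Hello", "Hi"]
```

Note that `as_text` holds strings, not lists, so this is only useful for display or export.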

For fun

Just for fun, I took your input data absolutely literally, and wrote a set of replacements to result in exactly the output you have in the question.

Input data

column = r"""[\n "Hello" \n]
[\n "Hello", \n "Hi"\n]
[\n "Hello", \n "Hi"\n, \n ""\n]
[\n ""\n, \n "Hello", \n "Hi"\n]
[\n "Hello" \n, \n ""\n]
[\n ""\n, \n "Hello" \n]""".splitlines()

Code

for row in column:
    # The final replace must keep the closing quote, otherwise "Hi" loses it.
    print(row.replace('\\n "', '"').replace('" \\n', '"').replace('""\\n, ', '').replace(', ""\\n', '').replace('"\\n', '"'))

Output

["Hello"]
["Hello", "Hi"]
["Hello", "Hi"]
["Hello", "Hi"]
["Hello"]
["Hello"]
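The chained replacements depend on the exact spacing around each \n. A less spacing-sensitive way to treat the same literal text, offered here as a sketch using a different technique (a regular expression), is to pull out every double-quoted item and drop the empty ones. It assumes the items never contain embedded double quotes; `raw_rows`, `extract_items`, and `cleaned` are illustrative names.

```python
import json
import re

# The literal text from the question; r"""...""" keeps \n as two characters.
raw_rows = r"""[\n "Hello" \n]
[\n "Hello", \n "Hi"\n]
[\n "Hello", \n "Hi"\n, \n ""\n]
[\n ""\n, \n "Hello", \n "Hi"\n]
[\n "Hello" \n, \n ""\n]
[\n ""\n, \n "Hello" \n]""".splitlines()


def extract_items(row):
    # Find every double-quoted item (possibly empty), then drop the empty ones.
    return [item for item in re.findall(r'"([^"]*)"', row) if item]


# json.dumps formats each list with double quotes, as in the question.
cleaned = [json.dumps(extract_items(row)) for row in raw_rows]
for line in cleaned:
    print(line)
```

The pattern matches empty pairs too (`[^"]*` rather than `[^"]+`) so that quotes are always consumed in pairs; the empty matches are filtered out afterwards.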

**Dhruv Awasthi** · Answer 2 · 2023-06-04T22:28:20.457000

Taking your example, with a DataFrame `df` whose column is named `column`, you can use the following code snippet:

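def remove_empty_line(row):
    # Strip newlines and surrounding whitespace from every element of the list.
    updated_list = list()
    for elem in row:
        updated_list.append(elem.replace("\n", "").strip())
    return updated_list

# Series.apply calls the function once per cell, so no lambda wrapper is needed.
df["column"] = df["column"].apply(remove_empty_line)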

Now you can check your df with df.head()
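The function above strips whitespace but keeps elements that end up empty. A possible variant that also drops them is sketched below. It assumes each cell already holds a Python list of strings (not a single string); the sample DataFrame and the name `remove_empty_items` are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data: each cell is a list of strings.
df = pd.DataFrame({"column": [["\n Hello \n", ""], ["", "\n Hello", "\n Hi\n"]]})


def remove_empty_items(row):
    # str.strip() removes leading/trailing whitespace, including newlines.
    stripped = (elem.strip() for elem in row)
    # Keep only the elements that are non-empty after stripping.
    return [elem for elem in stripped if elem]


df["column"] = df["column"].apply(remove_empty_items)
print(df["column"].tolist())
# [['Hello'], ['Hello', 'Hi']]
```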

**Ian Thompson** · Answer 3 · 2023-12-10T02:41:17.130000

How about this?

from ast import literal_eval

import pandas as pd


# Recreating data.
column = [
    '[\n "Hello" \n]',
    '[\n "Hello", \n "Hi"\n]',
    '[\n "Hello", \n "Hi"\n, \n ""\n]',
    '[\n ""\n, \n "Hello", \n "Hi"\n]',
    '[\n "Hello" \n, \n ""\n]',
    '[\n ""\n, \n "Hello" \n]',
]
df = pd.DataFrame({"column": column})

out = df.assign(
    # We map the following operations to "column".
    column=df.column
    .map(
        # Iterate over each value -- each will become a list of literals (in this case, strings).
        lambda x: [
            # We unpack the filtered values into the list.
            *filter(
                # Check the bool value of each item produced by `literal_eval`, keeping only those that are True.
                bool, literal_eval(x)
            )
        ]
    )
)
print(out)

        column
0      [Hello]
1  [Hello, Hi]
2  [Hello, Hi]
3  [Hello, Hi]
4      [Hello]
5      [Hello]
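Because the strings in this recreated data also happen to be valid JSON, `json.loads` from the standard library could be swapped in for `literal_eval`. The sketch below shows that alternative on plain strings; it assumes every cell is a JSON array of strings, and `parse_cell`, `cells`, and `parsed` are illustrative names.

```python
import json


def parse_cell(text):
    """Parse a JSON-style list string and drop empty or whitespace-only items."""
    items = json.loads(text)
    # item.strip() is falsy for "" and for strings made only of whitespace.
    return [item for item in items if item.strip()]


cells = [
    '[\n "Hello" \n]',
    '[\n "Hello", \n "Hi"\n, \n ""\n]',
    '[\n ""\n, \n "Hello" \n]',
]
parsed = [parse_cell(cell) for cell in cells]
print(parsed)
# [['Hello'], ['Hello', 'Hi'], ['Hello']]
```

On a DataFrame, the same function could be applied with `df["column"].map(parse_cell)`.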
