I'm trying to go through the data in a column, extract only the company names using the MeCab library, and list them in a new column. The target column is a comment column that contains employee names, company names, invoice numbers, etc., either all together or on their own, depending on the transaction. Below is my code for extracting only the company names. Please note that the code is still a work in progress, but I wanted to post something to start with. Sorry in advance for my messy code...
Thank you,
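```python
import MeCab  # the mecab-python3 package is imported under the name "MeCab"
import ipadic
import pandas as pd

df = pd.read_csv("")
m = MeCab.Tagger(ipadic.MECAB_ARGS)

def kaiseki(column):
    """Parse each comment in `column` and list the organization names in "column2"."""
    companies = []
    for text in df[column].fillna("").astype(str):  # MeCab cannot parse NaN
        found = []
        # Each output line is "surface\tfeature1,feature2,..."; the last line is "EOS".
        for line in m.parse(text).splitlines():
            if "\t" not in line:
                continue
            surface, features = line.split("\t", 1)
            # IPAdic tags organization (company) names as 名詞,固有名詞,組織,
            # so the category is in the third feature field, not the first.
            if features.split(",")[:3] == ["名詞", "固有名詞", "組織"]:
                found.append(surface)
        companies.append(", ".join(found))
    df["column2"] = companies
    return df["column2"]

columns = ['column']
for column in columns:
    kaiseki(column)
```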
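The filtering step can also be split out into a function that runs without MeCab installed. This is a sketch that assumes IPAdic's default output format, in which organization names carry the features 名詞,固有名詞,組織. The function name `extract_organizations` and its `merge_adjacent` option are hypothetical. Because MeCab sometimes splits one company name into several tokens, the function can optionally glue back together organization tokens that appear consecutively.

```python
def extract_organizations(parsed: str, merge_adjacent: bool = True) -> list[str]:
    """Return the surfaces tagged 名詞,固有名詞,組織 in MeCab's IPAdic output.

    `parsed` is the string returned by MeCab.Tagger.parse(): one
    "surface<TAB>feature,feature,..." line per token, ending with "EOS".
    """
    names: list[str] = []
    previous_was_org = False
    for line in parsed.splitlines():
        if "\t" not in line:  # skips the "EOS" line and blank lines
            previous_was_org = False
            continue
        surface, features = line.split("\t", 1)
        is_org = features.split(",")[:3] == ["名詞", "固有名詞", "組織"]
        if is_org and merge_adjacent and previous_was_org:
            names[-1] += surface  # rejoin an organization name split into tokens
        elif is_org:
            names.append(surface)
        previous_was_org = is_org
    return names


# Hand-written sample in the IPAdic output format (not real MeCab output).
sample = (
    "山田\t名詞,固有名詞,人名,姓,*,*,山田,ヤマダ,ヤマダ\n"
    "様\t名詞,接尾,人名,*,*,*,様,サマ,サマ\n"
    "ABC\t名詞,固有名詞,組織,*,*,*,*\n"
    "商事\t名詞,固有名詞,組織,*,*,*,商事,ショウジ,ショージ\n"
    "EOS\n"
)
print(extract_organizations(sample))  # ['ABC商事']
```

With a real tagger, the function can be applied per row, for example `df["column2"] = df["column"].fillna("").astype(str).map(lambda t: ", ".join(extract_organizations(m.parse(t))))`. Whether merging adjacent tokens helps depends on how the dictionary in use segments the company names in this data.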