I have a dataset in the form of a txt file that looks like this:
beer_name: Legbiter
beer_id: 19827
brewery_name: Strangford Lough Brewing Company Ltd
brewery_id: 10093
style: English Pale Ale
abv: 4.8
date: 1357729200
user_name: AgentMunky
user_id: agentmunky.409755
appearance: 4.0
aroma: 3.75
palate: 3.5
taste: 3.5
overall: 3.75
rating: 3.64
text: Poured from a 12 ounce bottle into a pilsner glass.A: A finger of creamy head with clear-dark amber body.S: Rich brown sugar. Malty...T: Slight sugars, dry malt, vague hops. Big malty-brown with sugar.M: Dry and slightly astringent before a boring endtaste.O: Solid beer. Drinkable and interesting. Still vaguely bland.
review: True
I am using the following call to try to turn it into a proper df (with a little more processing afterwards, but this is where it throws an error):
rb_file_data = pd.read_csv(os.path.join(MATCHED_BEER_DIR, 'ratings_with_text_rb.txt'), sep=":", header=None, names=["Key", "Value"])
The issue I have is that some reviews use ":" in the text part (I specifically chose to show you one containing some), which throws the following error:
ParserError: Error tokenizing data. C error: Expected 2 fields in line 34, saw 7
I have enough data to get rid of the whole review if needed, but would be happy to keep it if possible.
Is there a way to use the separator only on the first time it appears in a line, or anything else?
You can try the code below:
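A minimal sketch of one way to split only on the first ":" in each line, so that colons inside the review text stay in the value. It reads the file line by line instead of going through pd.read_csv. The helper names split_first_colon and parse_records are made up for illustration, and the grouping assumes that every review starts with a beer_name line, as in the sample above.

```python
def split_first_colon(line):
    """Split 'key: value' at the first ':' only; later colons stay in the value."""
    key, sep, value = line.partition(":")
    if not sep:
        return None  # no colon at all, e.g. a blank line between reviews
    return key.strip(), value.strip()


def parse_records(lines, first_key="beer_name"):
    """Group key/value lines into one dict per review.

    Assumption: each review begins with a `first_key` line.
    """
    records = []
    current = None
    for line in lines:
        pair = split_first_colon(line)
        if pair is None:
            continue
        key, value = pair
        if key == first_key or current is None:
            current = {}
            records.append(current)
        current[key] = value
    return records


# Shortened sample in the same shape as the file from the question.
sample = [
    "beer_name: Legbiter",
    "abv: 4.8",
    "text: Poured from a 12 ounce bottle.A: A finger of creamy head.S: Rich brown sugar.",
    "review: True",
]
records = parse_records(sample)
```

To use it on the real file, you could pass the open file object in place of sample (for example, with open(path, encoding="utf-8") as f: records = parse_records(f)) and then build the frame with pd.DataFrame(records), which gives one row per review and one column per key. All values come back as strings, so numeric columns such as abv or rating would still need converting afterwards.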