Get phrases with 3 words


I have tried to figure this one out for some time now.

I want to take a large text/string and split it into phrases of 3 words, and add them to an array.

I have tried using split(), but it doesn't work as I hoped.

Here is what I was thinking of doing to get it to work:

Start with the first 3 words in the string and put them in an array, then move forward 1 word and take the next 3 words, and so on.

Is this a bad way of doing this?

Kind regards :)


There are 2 best solutions below

1
Scott Mermelstein On BEST ANSWER
my_really_long_string = "this is a really long string"
split_string = my_really_long_string.split()
phrase_array = [" ".join(split_string[i:i+3]) for i in range(len(split_string) - 2)]

The first line just represents your string.

After that, just split on whitespace (which is what split() does when called with no arguments), assuming that's all you care about for defining the end of words. (@andrew_reece's comments about edge cases are highly relevant.)

The last line iterates over the indices 0 to n-3, where n is the number of words in split_string. For each index i, it takes the 3 consecutive words starting at i and joins them back together with spaces.

This is almost certainly not the fastest way to do things, since it has a split and a join, but it is very straightforward.
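If you want the same idea wrapped in a reusable function, here is a minimal sketch. The name word_ngrams and the parameter n are my own, not anything from the question; it zips shifted copies of the word list instead of slicing by index, and I have not measured whether that is actually faster.

```python
def word_ngrams(text, n=3):
    """Return every run of n consecutive words in text, each joined into a string."""
    words = text.split()  # no argument: split on any run of whitespace
    # shifted[k] is the word list with its first k words dropped.
    shifted = [words[k:] for k in range(n)]
    # zip stops at the shortest list, so a text with fewer than n words
    # simply produces an empty result.
    return [" ".join(group) for group in zip(*shifted)]


print(word_ngrams("this is a really long string"))
```

Calling it with n=2 gives word pairs instead of triples.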

>>> my_really_long_string = "this is a really long string"
>>> split_string = my_really_long_string.split()
>>> phrases = [" ".join(split_string[i:i+3]) for i in range(len(split_string) - 2)]
>>> phrases
['this is a', 'is a really', 'a really long', 'really long string']
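On the edge cases: a plain split() leaves punctuation attached to words, so "string." and "string" count as different words. One possible way around that, sketched below, is to pull the words out with a regular expression first. The helper name clean_words and the choice to lowercase everything and keep only letters, digits, and apostrophes are assumptions on my part; adjust the pattern to whatever your data needs.

```python
import re


def clean_words(text):
    """Lowercase text and return its runs of letters, digits, and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


words = clean_words("Hello, world! It's a really long string.")
# Same sliding window as above, applied to the cleaned word list.
phrases = [" ".join(words[i:i+3]) for i in range(len(words) - 2)]
print(phrases)
```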
2
eatmeimadanish On

This would work. You might want to strip unwanted characters (such as punctuation) from the text first; I'm not sure what your data looks like.

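x = 'alt bot cot dot eat fat got hot iot jot kot lot mot not'
words = x.strip().split()
# Step through the words (not the characters) 3 at a time; use a step of 1 for overlapping phrases.
x = [words[i:i+3] for i in range(0, len(words), 3)]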
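Note that the snippet above advances 3 positions at a time, while the question asks to move 1 word at a time. A rough sketch that makes the step explicit is below. The function name word_windows and its size and step parameters are hypothetical, and unlike a plain slice it drops any leftover group shorter than size.

```python
def word_windows(words, size=3, step=1):
    """Return lists of `size` consecutive words, advancing `step` words each time."""
    # The range ends early enough that every window has exactly `size` words.
    return [words[i:i + size] for i in range(0, len(words) - size + 1, step)]


words = "alt bot cot dot eat fat".split()
overlapping = word_windows(words, step=1)  # what the question describes
separate = word_windows(words, step=3)     # non-overlapping groups
print(overlapping)
print(separate)
```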