My task is to create combinations, essentially a Cartesian product, from some attribute lines of a library file. I am currently stuck on grouping lines that share the same attribute name (the parameters that follow it differ, of course) into sublists of a list. Note that my input may contain a thousand attribute lines, which I need to extract from the library file.
######################
Example input:
attr1 apple 1
attr1 banana 2
attr2 grapes 1
attr2 oranges 2
attr3 watermelon 0
######################
Example output:
[['attr1 apple 1','attr1 banana 2'], ['attr2 grapes 1','attr2 oranges 2'], ['attr3 watermelon 0']]
The result I am getting:
['attr1 apple 1','attr1 banana 2', 'attr2 grapes 1','attr2 oranges 2', 'attr3 watermelon 0']
Below is the code:
import re

# Regex pattern definition
pattern = re.compile(r'attr\d+')

# Open the file for reading
with open(r"file path") as file:
    # Initialize an empty list to store matching lines
    matching_lines = []
    # Read each line
    for line in file:
        # Keep only the lines that match the regex pattern
        if pattern.search(line):
            matching_lines.append(line.strip())

# Grouping the elements based on the regex pattern
# The required list
grouped_elements = []
# Temporary list for sublist grouping
current_group = []
for sentence in matching_lines:
    if pattern.search(sentence):
        current_group.append(sentence)
    else:
        if current_group:
            grouped_elements.append(current_group)
        current_group = [sentence]
if current_group:
    grouped_elements.append(current_group)

# Print the grouped elements
for group in grouped_elements:
    print(group)
When the file is already sorted, there is a simple solution:
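One way to do this is sketched below with `itertools.groupby`, which merges consecutive items that share a key. The helper names `attribute_name` and `group_adjacent` are made up for illustration, and the sketch assumes the attribute name is always the first whitespace-separated token of a non-empty line. The sample lines are the ones from the question.

```python
from itertools import groupby, product


def attribute_name(line):
    # Assumption: the attribute name is the first whitespace-separated token.
    return line.split()[0]


def group_adjacent(lines):
    # groupby only merges *consecutive* items with equal keys, so the
    # lines must already be sorted (or at least clustered) by attribute.
    return [list(group) for _, group in groupby(lines, key=attribute_name)]


lines = [
    "attr1 apple 1",
    "attr1 banana 2",
    "attr2 grapes 1",
    "attr2 oranges 2",
    "attr3 watermelon 0",
]

grouped = group_adjacent(lines)
print(grouped)

# Cartesian product: one line picked from each attribute group.
combinations = list(product(*grouped))
print(len(combinations))
```

The original loop never starts a new group because every line in `matching_lines` already matches `attr\d+`, so the `else` branch is never reached. Keying on the attribute name itself avoids that. If the lines are not sorted, calling `sorted(lines, key=attribute_name)` first should give the same grouping, at the cost of reordering the groups.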