How do I copy all the duplicate lines of a file to a new file in Python?


I'm trying to write some code that copies all the duplicate lines of a file to a new file. The program I wrote checks the first 3 characters of each line and compares them to the next line.

f=open(r'C:\Users\xamer\Desktop\file.txt','r')
data=f.readlines()
f.close()
lines=data.copy()
dup=open(r'C:\Users\xamer\Desktop\duplicate.txt','a')
for x in data:
    for y in data:
        if (y[0]==x[0]) and (y[1]==x[1]) and (y[2]==x[2]):
            lines.append(y)
        else:
            lines.remove(y)
dup.write(lines)
dup.close()

I'm getting the following error:

Traceback (most recent call last):
  File "C:\Users\xamer\Desktop\file.py", line 80, in <module>
    lines.remove(y)
ValueError: list.remove(x): x not in list

Any suggestions?

Best answer (Antonino):

These snippets should do the job you were asking for. At first I thought of creating a duplicated_lines list and writing it all out at the end, but then I realized I could improve performance and avoid an additional final loop by writing the repeated items on the fly.

As another user pointed out, it is not entirely clear whether you want to check only adjacent duplicate entries or repeated lines regardless of their position.

In the first case, where repetitions immediately follow each other, this is the code:

# opening the source file
with open('hello.txt', 'r') as f:
    # readlines() returns a list containing the original lines (with trailing newlines)
    data = f.readlines()

# creating the file to host the repeated lines
with open('duplicated.txt', 'a') as f:
    for i in range(0, len(data) - 1):
        # stripping the newline avoids a mismatch when the last line is a repeated item,
        # since the last line of a file often has no trailing '\n'
        if data[i].strip('\n') == data[i + 1].strip('\n'):
            print("Lines {}: {}".format(i, data[i]))
            print("Lines {}: {}".format(i + 1, data[i + 1]))
            # duplicated_lines.append(data[i])  # the list approach I initially considered
            print("Line repeated: " + data[i])
            # write the stripped line plus a single newline, so lines that already
            # end in '\n' don't get a blank line after them
            f.write(data[i].strip('\n') + '\n')

If instead you want to check for repeated lines anywhere in the file, this is the code:

# opening the source file
with open('hello.txt', 'r') as f:
    # readlines() returns a list containing the original lines (with trailing newlines)
    data = f.readlines()

# creating the file to host the repeated lines
with open('duplicated.txt', 'a') as f:
    for i in range(0, len(data) - 1):
        for j in range(i + 1, len(data)):
            # stripping the newline avoids a mismatch when the last line is a repeated item
            if data[i].strip('\n') == data[j].strip('\n'):
                print("Lines {}: {}".format(i, data[i]))
                print("Lines {}: {}".format(j, data[j]))
                # duplicated_lines.append(data[i])
                print("Line repeated: " + data[i])
                # write the stripped line plus a single newline to avoid doubled newlines
                f.write(data[i].strip('\n') + '\n')
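As a side note, the nested loop above is O(n²) in the number of lines. Below is a minimal alternative sketch, not part of the accepted answer, that finds repeated lines in a single pass by keeping a set of lines already seen; it writes each repeated occurrence after the first. The file names reuse hello.txt and duplicated.txt from the snippets above, and the key is the full stripped line; if you really only want to compare the first 3 characters, as in your original code, you could use the first 3 characters of the stripped line as the key instead.

# sketch only: single-pass duplicate detection using a set of lines already seen
with open('hello.txt', 'r') as src, open('duplicated.txt', 'a') as dup:
    seen = set()
    for i, line in enumerate(src):
        key = line.strip('\n')  # or line.strip('\n')[:3] to mimic the 3-character check
        if key in seen:
            print("Line {} repeated: {}".format(i, key))
            dup.write(key + '\n')
        else:
            seen.add(key)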