Filter overlapping entries in bed file

I have a bed file that looks like this:

1   183113  183114  chr1:183113-183240  0   +
1   187286  187287  chr1:187128-187287  0   -
1   187576  187587  chr1:187375-187577  0   -
1   187580  187590  chr1:187379-187577  0   -

My aim is to extract only those rows whose entries do not overlap with any other entry. For some time I have been trying bedtools merge, following the docs. I wanted to use its flags to count the entries that contributed to each "merged" fragment and then keep only the fragments with a count of 1. Here is the problem: I don't know how to keep the strand, the score (this should always be 0) and the name (this might be reconstructed from the first 3 columns). Does anyone know how to put these things together?

The output should be in exactly the same format as the input bed (above), but contain only the rows that do not overlap with anything else:

1   183113  183114  chr1:183113-183240  0   +
1   187286  187287  chr1:187128-187287  0   -

Answer by maciek:

OK, I worked this out:

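1) Sort the input (bedtools merge expects it sorted by chromosome, then by start) and count how many entries make up each merged interval

sort -k1,1 -k2,2n IN.bed > IN.sorted.bed
bedtools merge -i IN.sorted.bed -c 1 -o count > counted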

2) Keep only the merged intervals that consist of a single entry (count of 1), i.e. entries that do not overlap with anything

awk '/\t1$/{print}' counted > filtered

3) Intersect the original input with the filtered intervals and keep only the original rows (name, score and strand intact) that fall within them

bedtools intersect -a IN.bed -b filtered -wa > OUT.bed
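
If bedtools is not at hand, the same idea (group overlapping entries, keep only groups of size 1) can be sketched with just sort and awk. This is not part of the steps above, only a rough equivalent under some assumptions: the function name non_overlapping_bed is made up for illustration, fields are whitespace-separated with no spaces inside a field, and book-ended entries (start equal to the previous end) are treated as overlapping, which is how I understand the default behaviour of bedtools merge.

```shell
# Hypothetical helper: print the rows of a BED file that overlap no other row.
# Usage: non_overlapping_bed IN.bed > OUT.bed
non_overlapping_bed() {
    # Sort by chromosome, then numerically by start, so that overlapping
    # entries end up on adjacent lines.
    sort -k1,1 -k2,2n "$1" | awk '
        # A new group starts on the first row, on a chromosome change, or
        # when the row starts strictly after the end of the current group.
        NR == 1 || $1 != chrom || $2 + 0 > max_end {
            # The previous group held a single row: it overlapped nothing.
            if (NR > 1 && count == 1) print held
            chrom = $1; max_end = $3 + 0; held = $0; count = 1
            next
        }
        # Otherwise the row belongs to the current group; extend the group.
        {
            count++
            if ($3 + 0 > max_end) max_end = $3 + 0
        }
        # Flush the last group.
        END { if (count == 1) print held }
    '
}
```

Note that the rows come out in sorted order rather than in the original order of the file, and each surviving row is printed unchanged, so name, score and strand are preserved.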