I have two files that contain genomic intervals. One is the master index, and the other contains a subset of genomic intervals, some of which overlap with intervals in the master index. Often, more than one interval from the second file overlaps a single interval in the master index. I know how to use bedtools intersect and that sort of thing, but I don't want the number of rows in the master index to increase; instead, I'd like to append all of the overlapping intervals to the same line.
So, for example, here is a snippet of the master index file:
chr2L 10239 10488
chr2L 10906 11238
chr2L 11389 11538
chr2L 11790 12138
chr2L 14489 14688
chr2L 18139 18438
chr2L 20939 21338
chr2L 25402 25801
chr2L 26052 26201
And here is a snippet of one of the second files:
chr2L 18002 18367 .034 18 0
chr2L 18401 18600 .02 20 2
chr2L 26000 26100 .01 10 0
And this would be the desired output:
chr2L 10239 10488
chr2L 10906 11238
chr2L 11389 11538
chr2L 11790 12138
chr2L 14489 14688
chr2L 18139 18438 chr2L 18002 18367 .034 18 0,chr2L 18401 18600 .02 20 2
chr2L 20939 21338
chr2L 25402 25801
chr2L 26052 26201 chr2L 26000 26100 .01 10 0
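A minimal Python sketch of the grouping idea, assuming whitespace-separated columns with chromosome, start and end first, and BED-style half-open coordinates (so intervals that merely touch do not count as overlapping). The function name `append_overlaps` is hypothetical. Each master line is emitted exactly once, and every overlapping record from the second file is appended to it, with records joined by commas as in the desired output above.

```python
from collections import defaultdict


def append_overlaps(master_lines, other_lines, sep=","):
    """For each master interval, append every overlapping record from the other file.

    Assumes whitespace-separated columns (chrom, start, end, ...) and half-open
    [start, end) coordinates. Master row count and order are preserved.
    """
    # Group the second file's records by chromosome so each master line
    # only scans records on the same chromosome.
    by_chrom = defaultdict(list)
    for line in other_lines:
        fields = line.split()
        if not fields:  # skip blank lines
            continue
        by_chrom[fields[0]].append((int(fields[1]), int(fields[2]), " ".join(fields)))

    out = []
    for line in master_lines:
        fields = line.split()
        if not fields:
            continue
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        # Two half-open intervals overlap when each starts before the other ends.
        hits = [rec for s, e, rec in by_chrom[chrom] if start < e and s < end]
        base = " ".join(fields)
        out.append((base + " " + sep.join(hits)) if hits else base)
    return out


# Example data copied from the question.
master = """chr2L 10239 10488
chr2L 10906 11238
chr2L 11389 11538
chr2L 11790 12138
chr2L 14489 14688
chr2L 18139 18438
chr2L 20939 21338
chr2L 25402 25801
chr2L 26052 26201""".splitlines()

other = """chr2L 18002 18367 .034 18 0
chr2L 18401 18600 .02 20 2
chr2L 26000 26100 .01 10 0""".splitlines()

for row in append_overlaps(master, other):
    print(row)
```

This compares every master interval against every record on the same chromosome, which is fine for modest file sizes. For real files, pass open file handles instead of the inline strings; columns are rejoined with single spaces, so use `"\t".join(...)` if tab-separated output is preferred.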
Changing the delimiters in the second file is fine if that's necessary; for example, all the columns after the chromosome interval could be comma-separated. I don't have example code of what I've tried, because nothing I've attempted comes even close to working. My guess is that awk can do this in some way, but any insight is much appreciated.
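Splitting on any whitespace means the second file's delimiters probably don't need to change at all. For larger files, a per-chromosome sweep avoids comparing every pair of intervals. The sketch below is a hypothetical standalone script (`read_intervals`, `sweep_overlaps`, `main` and the file names are illustrative, not from any library); it assumes half-open coordinates and at least three columns per line, sorts each chromosome's intervals by start, and keeps a sliding window of candidate records. Overlapping records are listed by start position rather than file order.

```python
import sys
from collections import defaultdict


def read_intervals(path):
    """Read a whitespace-separated file into (chrom, start, end, fields) tuples."""
    rows = []
    with open(path) as fh:
        for line in fh:
            fields = line.split()
            if fields:  # skip blank lines
                rows.append((fields[0], int(fields[1]), int(fields[2]), fields))
    return rows


def sweep_overlaps(master, other):
    """For each master row (original order), return the overlapping 'other' rows.

    Uses a per-chromosome sweep over start-sorted intervals; assumes half-open
    [start, end) coordinates, like BED.
    """
    others_by_chrom = defaultdict(list)
    for rec in other:
        others_by_chrom[rec[0]].append(rec)

    masters_by_chrom = defaultdict(list)
    for idx, (chrom, start, end, _) in enumerate(master):
        masters_by_chrom[chrom].append((start, end, idx))

    hits = [[] for _ in master]
    for chrom, queries in masters_by_chrom.items():
        recs = sorted(others_by_chrom.get(chrom, []), key=lambda r: r[1])
        active, j = [], 0
        for start, end, idx in sorted(queries):
            # Pull in every record that starts before this master interval ends.
            while j < len(recs) and recs[j][1] < end:
                active.append(recs[j])
                j += 1
            # Drop records ending at or before this start; since queries are
            # visited in increasing start order, they cannot overlap later ones.
            active = [r for r in active if r[2] > start]
            # A record pulled in for an earlier, longer query may start past this end.
            hits[idx] = [r for r in active if r[1] < end]
    return hits


def main(master_path, other_path, out=sys.stdout):
    master = read_intervals(master_path)
    other = read_intervals(other_path)
    for (_, _, _, fields), found in zip(master, sweep_overlaps(master, other)):
        row = " ".join(fields)
        if found:
            # Records joined by commas, as in the desired output.
            row += " " + ",".join(" ".join(r[3]) for r in found)
        out.write(row + "\n")


if __name__ == "__main__":
    if len(sys.argv) == 3:
        main(sys.argv[1], sys.argv[2])
    else:
        sys.stderr.write("usage: python append_overlaps.py MASTER OTHER\n")
```

Something like `python append_overlaps.py master.bed subset.txt > annotated.txt` (hypothetical file names) should then print one line per master interval.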