I have a table (simplified output from a program) that I need to filter:
id hit from to value
A hit1 56 102 0.00085
B hit2 89 275 0.00034
B hit3 240 349 0.00034
C hit4 332 480 3.40E-15
D hit5 291 512 3.80E-24
D hit6 287 313 0.00098
D hit7 381 426 0.00098
D hit8 287 316 0.0029
D hit9 373 422 0.0029
D hit10 514 600 0.0021
For each id, the rows should be sorted by from and, where hits overlap, only the hit with the lower value should be kept.
So far, this is my code, which sorts first by from and then by value:
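import pandas
# Read the whitespace-separated table (a raw string avoids an invalid-escape warning for \s)
df = pandas.read_csv("table", sep=r'\s+', names=["id", "hit", "from", "to", "value"])
# Sort by start position, then by value; the groupby result is not used yet
df.sort_values(["from", "value"]).groupby("id")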
But how do I check for overlaps between the from-to ranges and remove the hit with the higher value?
This is my expected output:
id hit from to value
A hit1 56 102 0.00085
C hit4 332 480 3.40E-15
D hit5 291 512 3.80E-24
D hit10 514 600 0.0021
Please note that id B has two overlapping hits with equal value, so both entries are to be removed.
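To make the rule concrete, here is a minimal sketch in plain Python (no pandas) of one possible reading of it. It assumes that from and to are inclusive endpoints, and that a hit is kept only if its value is strictly lower than that of every hit it overlaps within the same id. The names overlaps and filter_hits are hypothetical helpers introduced for illustration, and the rows are copied from the table above.

```python
from collections import defaultdict

# Rows copied from the table above: (id, hit, from, to, value).
ROWS = [
    ("A", "hit1", 56, 102, 0.00085),
    ("B", "hit2", 89, 275, 0.00034),
    ("B", "hit3", 240, 349, 0.00034),
    ("C", "hit4", 332, 480, 3.40e-15),
    ("D", "hit5", 291, 512, 3.80e-24),
    ("D", "hit6", 287, 313, 0.00098),
    ("D", "hit7", 381, 426, 0.00098),
    ("D", "hit8", 287, 316, 0.0029),
    ("D", "hit9", 373, 422, 0.0029),
    ("D", "hit10", 514, 600, 0.0021),
]


def overlaps(a, b):
    """True if the ranges of rows a and b share at least one position.

    Assumption: both endpoints are inclusive.
    """
    return a[2] <= b[3] and b[2] <= a[3]


def filter_hits(rows):
    """Keep a hit only if it beats every overlapping hit with the same id.

    Assumption: "beats" means a strictly lower value, so overlapping hits
    with equal values are all dropped.
    """
    by_id = defaultdict(list)
    for row in rows:
        by_id[row[0]].append(row)

    kept = []
    for hits in by_id.values():
        hits.sort(key=lambda r: r[2])  # sort by "from" within each id
        for i, h in enumerate(hits):
            rivals = [g for j, g in enumerate(hits) if j != i and overlaps(h, g)]
            if all(h[4] < g[4] for g in rivals):
                kept.append(h)
    return kept


result = filter_hits(ROWS)
for row in result:
    print(*row)
```

On the sample rows this keeps hit1, hit4, hit5 and hit10, which matches the expected output. The pairwise comparison is quadratic in the number of hits per id, which is probably fine for small groups. Under a different reading of the rule, for example one where a hit that has already been removed can no longer knock out others, the result could differ on other data.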
from Python 3: remove overlaps in table