Thursday, 20 December 2018

pandas DataFrame.query expression that returns all rows by default

I have discovered the pandas DataFrame.query method, and it does almost exactly what I need. (I had already implemented my own parser for this before I realized query existed, but I really should be using the standard method.)

I would like my users to be able to specify the query in a configuration file. The syntax seems intuitive enough that I can expect my non-programmer (but engineer) users to figure it out.

There's just one thing missing: a way to select every row in the dataframe. Sometimes my users want to use every row, so they would put 'All' (or something similar) into that configuration option. In fact, 'All' will be the default.
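As an illustration of the configuration side, the sketch below uses Python's standard configparser module. The section name `selection`, the key `query` and the sample expression are hypothetical. `fallback="All"` supplies the default when the option is missing. `interpolation=None` stops a `%` in a query (for example a modulo) from being read as interpolation syntax.

```python
import configparser

# Hypothetical config file contents; section and key names are assumptions.
CONFIG_TEXT = """
[selection]
query = temperature > 20 and site == 'A'
"""

# interpolation=None keeps '%' characters in query strings literal.
parser = configparser.ConfigParser(interpolation=None)
parser.read_string(CONFIG_TEXT)

# Use 'All' as the default when the user leaves the option out.
query_string = parser.get("selection", "query", fallback="All")
print(query_string)

# A config with no query option at all falls back to the default.
empty = configparser.ConfigParser(interpolation=None)
empty.read_string("[selection]\n")
default_query = empty.get("selection", "query", fallback="All")
print(default_query)
```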

I tried df.query('True') but that raised a KeyError. I tried df.query('1') but that returned the row with index 1. The empty string raised a ValueError.
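One expression-only workaround, offered as a sketch rather than an official pandas idiom, is a condition that is true for every row. `index == index` compares each index label with itself. This assumes a single-level index with no NaN labels, because NaN does not compare equal to itself and those rows would be dropped.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

# 'index' refers to the row index inside query expressions; comparing it
# with itself yields a mask that is True for every (non-NaN) label.
everything = df.query("index == index")
print(everything)
```

This keeps everything inside the query syntax, although the expression is less intuitive for users than typing 'All'.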

The only options I can think of are: 1) put an if clause everywhere I need this kind of query (probably 3 or 4 places in the code), or 2) subclass DataFrame and either reimplement query or add a query_with_all method:

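import pandas as pd

class MyDataFrame(pd.DataFrame):
    @property
    def _constructor(self):
        # Make pandas operations such as query() return MyDataFrame, not pd.DataFrame
        return MyDataFrame

    def query_with_all(self, query_string):
        # Treat 'all' (any case, surrounding spaces ignored) as "select every row"
        if query_string.strip().lower() == 'all':
            return self
        return self.query(query_string)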

I would then have to use my own class everywhere instead of the pandas one. Is this the only way to do this?
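A lighter alternative to subclassing is a plain module-level function. It keeps the single if clause in one place and works on ordinary DataFrames. The sketch below uses the hypothetical name `query_or_all` and the same "'all' means no filter" convention as the subclass above.

```python
import pandas as pd


def query_or_all(df: pd.DataFrame, query_string: str) -> pd.DataFrame:
    """Hypothetical helper: return every row for 'all', else delegate to df.query."""
    # Normalize so 'All', ' all ' and 'ALL' all mean "no filter".
    if query_string.strip().lower() == "all":
        return df
    return df.query(query_string)


df = pd.DataFrame({"temperature": [15, 22, 30], "site": ["A", "A", "B"]})

print(query_or_all(df, "All"))               # every row
print(query_or_all(df, "temperature > 20"))  # only rows where temperature > 20
```

The 'all' branch returns the original object rather than a copy. If callers might modify the result, returning `df.copy()` there would be the safer choice.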



from pandas DataFrame.query expression that returns all rows by default
