Saturday, 30 March 2019

Using python lime as a udf on spark

I'm looking to use lime's explainer within a udf on pyspark. I've previously trained the tabular explainer, and stored is as a dill model as suggested in link

loaded_explainer = dill.load(open('location_to_explainer','rb'))

def lime_explainer(*cols):
    selected_cols = np.array([value for value in cols])
    exp = loaded_explainer.explain_instance(selected_cols, loaded_model.predict_proba, num_features = 10)
    mapping = exp.as_map()[1]

    return str(mapping)

This however takes a lot of time, as it appears a lot of the computation happens on the driver. I've then been trying to use spark broadcast to broadcast the explainer to the executors.

broadcasted_explainer= sc.broadcast(loaded_explainer)

def lime_explainer(*col):
    selected_cols = np.array([value for value in cols])
    exp = broadcasted_explainer.value.explain_instance(selected_cols, loaded_model.predict_proba, num_features = 10)
    mapping = exp.as_map()[1]

    return str(mapping)        

However, I run into a pickling error, on broadcast.

PicklingError: Can't pickle at 0x7f69fd5680d0>: attribute lookup on lime.discretize failed

Can anybody help with this? Is there something like dill that we can use instead of the cloudpickler used in spark?



from Using python lime as a udf on spark

No comments:

Post a Comment