I'm looking to use lime's explainer within a UDF on PySpark. I previously trained the tabular explainer and stored it as a dill file, as suggested in link:
    loaded_explainer = dill.load(open('location_to_explainer', 'rb'))

    def lime_explainer(*cols):
        selected_cols = np.array([value for value in cols])
        exp = loaded_explainer.explain_instance(selected_cols, loaded_model.predict_proba, num_features=10)
        mapping = exp.as_map()[1]
        return str(mapping)
This, however, takes a lot of time, as much of the computation appears to happen on the driver. I've since been trying to use Spark broadcast to ship the explainer to the executors:
    broadcasted_explainer = sc.broadcast(loaded_explainer)

    def lime_explainer(*cols):
        selected_cols = np.array([value for value in cols])
        exp = broadcasted_explainer.value.explain_instance(selected_cols, loaded_model.predict_proba, num_features=10)
        mapping = exp.as_map()[1]
        return str(mapping)
However, I run into a pickling error on broadcast:

    PicklingError: Can't pickle <function <lambda> at 0x7f69fd5680d0>: attribute lookup <lambda> on lime.discretize failed
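For context, this error pattern is typical of the standard pickler hitting a lambda: lime's discretizers store lambdas as instance attributes, and plain pickle cannot serialize those. A minimal reproduction with a stand-in class (not lime itself, whose internals I'm only assuming here):

```python
import pickle

# Stand-in for a lime.discretize object that keeps a lambda as an
# attribute (hypothetical shape, for illustration only).
class FakeDiscretizer:
    def __init__(self):
        self.undiscretize = lambda x: x * 10.0

try:
    pickle.dumps(FakeDiscretizer())
    pickling_failed = False
except (pickle.PicklingError, AttributeError):
    # The lambda attribute defeats the standard pickler, which can only
    # serialize functions importable by name.
    pickling_failed = True

print(pickling_failed)
```

Spark's broadcast path serializes with a pickle-compatible serializer, so any lambda buried inside the explainer triggers the same failure.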
Can anybody help with this? Is there something like dill that we can use instead of the cloudpickle serializer that Spark uses?
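One workaround worth noting: rather than swapping Spark's serializer, broadcast the raw dill bytes (bytes always pickle cleanly) and call `dill.loads` inside the UDF on the executors. A sketch under that assumption, again using a stand-in object in place of the real explainer:

```python
import dill

# Stand-in for a lambda-bearing lime object (illustrative only).
class FakeDiscretizer:
    def __init__(self):
        self.undiscretize = lambda x: x * 10.0

# On the driver: serialize once with dill, which handles lambdas.
explainer_bytes = dill.dumps(FakeDiscretizer())
# In Spark you would broadcast the plain bytes instead of the object:
# broadcasted = sc.broadcast(explainer_bytes)

def lime_explainer(*cols):
    # Inside the UDF: rebuild the object from the broadcast bytes
    # (broadcasted.value in Spark). Caching the result in a module-level
    # variable would avoid re-deserializing on every row.
    explainer = dill.loads(explainer_bytes)
    return explainer.undiscretize(cols[0])

print(lime_explainer(3.0))
```

This avoids the broadcast-time pickling error entirely, since Spark only ever sees a bytes object.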
from Using python lime as a udf on spark