Friday 23 July 2021

Different results on anomaly detection between PyCaret and H2O

I'm working on detecting anomalies in the following data:
[Image: plot of the processed signal, with the anomalous points marked by red boxes]

It comes from a processed signal of a hydraulic system, so I know that the dots in the red boxes are anomalies that occur when the system fails.

I'm using the first 3k records to train a model in both PyCaret and H2O. These 3k records cover 5 cycles of data, as shown in the image below:

[Image: the first 3k records, covering 5 cycles]

To train the model in PyCaret I'm using the following code:

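import pandas as pd
from pycaret.anomaly import setup, create_model, predict_model

# df is the DataFrame holding the processed signal (loaded elsewhere).
feature = ["Pressure_median_mw_2500_ac"]

# Set up the experiment on the first 3k records only, with normalization.
exp_ano101 = setup(df[feature][0:3000], normalize=True, session_id=123)

# Train an Isolation Forest, then score the whole series.
iforest = create_model('iforest')
unseen_predictions = predict_model(iforest, data=df[feature])
unseen_predictions = unseen_predictions.reset_index()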

The results I get from pycaret are pretty good:

[Image: PyCaret Isolation Forest results]

And with a bit of post-processing I can get the following, which is quite close to the ideal:

[Image: PyCaret results after post-processing]

On the other hand, using H2O with the following code:

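import h2o
import pandas as pd
from h2o.estimators import H2OIsolationForestEstimator

h2o.init()
# Assumed setup (not shown in the original post): hf holds the first 3k training records.
hf = h2o.H2OFrame(df[["Pressure_median_mw_2500_ac"]][0:3000])
ifr = H2OIsolationForestEstimator()
ifr.train(x="Pressure_median_mw_2500_ac", training_frame=hf)
# Assumed step: score every record to obtain the mean_length column used below.
pred = ifr.predict(h2o.H2OFrame(df[["Pressure_median_mw_2500_ac"]])).as_data_frame()
df["mean_length"] = pred["mean_length"].values
th = df["mean_length"][0:3000].quantile(0.05)
df["anomaly"] = df["mean_length"].apply(lambda x: "1" if x > th else "0")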

I get this:

[Image: H2O Isolation Forest results]

This is a huge difference, since it does not flag this block as anomalous:

[Image: the block of anomalies that H2O misses]

My question is: how can I get results similar to the ones from PyCaret, given that I'm using the same algorithm (Isolation Forest)? Even with SVM in PyCaret I get closer results than with Isolation Forest in H2O:

[Image: PyCaret SVM results]
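
One thing that may be worth checking in the H2O snippet above is the direction of the threshold. In H2O's Isolation Forest output, mean_length is the average number of splits needed to isolate a row, so shorter paths point to more anomalous rows. The following is only a minimal sketch of thresholding in that direction, written in plain Python so it runs without an H2O cluster. The function name flag_short_paths, its parameters, and the sample values are hypothetical and not taken from the original post.

```python
from statistics import quantiles

def flag_short_paths(mean_lengths, train_size, percent=5):
    """Flag rows whose mean path length falls below a training-set percentile."""
    # Percentile cut points over the training rows only (linear interpolation,
    # which matches the default behaviour of pandas Series.quantile).
    cuts = quantiles(mean_lengths[:train_size], n=100, method="inclusive")
    threshold = cuts[percent - 1]
    # Shorter average path = easier to isolate = more anomalous.
    return ["1" if length < threshold else "0" for length in mean_lengths]

# Hypothetical mean_length values, for illustration only.
mean_lengths = [6.8, 7.1, 6.9, 7.0, 7.2, 6.7, 7.3, 7.0, 6.9, 7.1, 2.4, 2.9, 7.0]
flags = flag_short_paths(mean_lengths, train_size=10)
```

With the comparison reversed (x > th), the same 5% threshold would label roughly 95% of the training rows as anomalies, which could explain part of the gap between the two libraries.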


