Hemant Vishwakarma: Different results on anomaly detection between pycaret and H2O

Friday, 23 July 2021


I'm working on detecting anomalies in the following data:

It comes from a processed signal of a hydraulic system, and I know that the points in the red boxes are anomalies that occur when the system fails.

I'm using the first 3,000 records to train a model in both PyCaret and H2O. These 3,000 records cover 5 cycles of the data, as shown in the image below:

To train the model in pycaret I'm using the following code:

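```python
import pandas as pd
from pycaret.anomaly import setup, create_model, predict_model

# df is assumed to be the pandas DataFrame holding the processed pressure signal.
# Train on the first 3000 rows (about 5 cycles); normalize=True z-scores the feature.
exp_ano101 = setup(df[["Pressure_median_mw_2500_ac"]][0:3000], normalize=True,
                   session_id=123)

# Isolation Forest; by default PyCaret labels the top 5% of scores as anomalies (fraction=0.05)
iforest = create_model('iforest')
# Score the full series, including the rows not seen during training
unseen_predictions = predict_model(iforest, data=df[["Pressure_median_mw_2500_ac"]])
unseen_predictions = unseen_predictions.reset_index()
```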
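For comparison outside PyCaret, here is a minimal sketch of roughly what `create_model('iforest')` does. It rests on two assumptions: that PyCaret's 'iforest' wraps PyOD's `IForest`, which in turn uses scikit-learn's `IsolationForest`, and that PyCaret labels the top 5% of training scores as anomalies (its default `fraction=0.05`). The function `fit_and_flag` and the synthetic signal are hypothetical stand-ins, not the original data or PyCaret's internals. Normalization is left out because, in principle, Isolation Forest splits on a single feature are unaffected by a positive linear rescaling such as z-scoring.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest


def fit_and_flag(series: pd.Series, n_train: int = 3000, fraction: float = 0.05,
                 seed: int = 123) -> pd.DataFrame:
    """Hypothetical helper: fit on the first n_train values, score every value,
    and flag the most anomalous `fraction` (threshold taken from the training window)."""
    X = series.to_numpy().reshape(-1, 1)
    model = IsolationForest(n_estimators=100, contamination=fraction, random_state=seed)
    model.fit(X[:n_train])
    # In scikit-learn, a lower (more negative) score_samples value means more anomalous
    scores = model.score_samples(X)
    # predict() returns -1 for outliers and +1 for inliers, using the contamination offset
    labels = (model.predict(X) == -1).astype(int)
    return pd.DataFrame({"value": series.to_numpy(), "score": scores, "anomaly": labels})


# Synthetic stand-in for the pressure signal (hypothetical data, not the author's)
rng = np.random.default_rng(0)
signal = pd.Series(rng.normal(0.0, 1.0, 4000))
signal.iloc[3500:3550] = 8.0  # an injected block of extreme values after the training window
result = fit_and_flag(signal)
print(result["anomaly"].iloc[3500:3550].mean())
```

Running the same helper on the real `Pressure_median_mw_2500_ac` column and comparing its labels with PyCaret's anomaly column would help show whether the gap comes from the model itself or from how the H2O output is thresholded.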

The results I get from pycaret are pretty good:

With a bit of post-processing I can get the following, which is quite close to the ideal result:

On the other hand, using H2O with the following code:

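```python
import pandas as pd
import h2o
from h2o.estimators import H2OIsolationForestEstimator

h2o.init()
# df is assumed to be the same pandas DataFrame used above; train on the first 3000 rows
hf = h2o.H2OFrame(df[["Pressure_median_mw_2500_ac"]])
ifr = H2OIsolationForestEstimator()
ifr.train(x=["Pressure_median_mw_2500_ac"], training_frame=hf[0:3000, :])
# predict() returns "predict" (normalized score) and "mean_length" (average path length)
df["mean_length"] = ifr.predict(hf).as_data_frame()["mean_length"].values
th = df["mean_length"][0:3000].quantile(0.05)
# Shorter paths are easier to isolate, so values at or below th (the most anomalous 5%
# of the training window) get "0" here and everything else gets "1"
df["anomaly"] = df["mean_length"].apply(lambda x: "1" if x > th else "0")
```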

I get this:

This is a big difference, since it does not detect this block as anomalous:

My question is: how can I get results similar to the ones I get from PyCaret, given that I'm using the same algorithm (Isolation Forest) in both? Even using SVM in PyCaret, I get closer results than with Isolation Forest in H2O.
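One way to sanity-check the H2O thresholding is to recall how Isolation Forest turns path lengths into scores. In the original Isolation Forest paper (Liu, Ting and Zhou, 2008) the score is s(x, n) = 2^(-E[h(x)] / c(n)), where E[h(x)] is the average path length over all trees and c(n) = 2H(n-1) - 2(n-1)/n normalizes for the subsample size n. A short path therefore means a high score, so anomalies sit in the lowest quantile of the path length. The sketch below assumes that H2O's `mean_length` corresponds to E[h(x)], which I have not verified against H2O's internals, and H2O's normalized `predict` column may be scaled differently. The helper names are hypothetical. Default settings also differ: to my knowledge H2O's Isolation Forest defaults to `ntrees=50` and `sample_size=256`, while scikit-learn's `IsolationForest` uses 100 trees and `max_samples` of min(256, n), so setting these explicitly on both sides is worth trying.

```python
import math

import numpy as np

EULER_GAMMA = 0.5772156649015329


def avg_path_length(n: int) -> float:
    """c(n): average path length of an unsuccessful binary-search-tree lookup
    over n points, used to normalize isolation depths."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def anomaly_score(mean_length, sample_size: int = 256):
    """s(x, n) = 2 ** (-E[h(x)] / c(n)); values closer to 1 are more anomalous."""
    return 2.0 ** (-np.asarray(mean_length, dtype=float) / avg_path_length(sample_size))


def flag_short_paths(mean_length, n_train: int = 3000, fraction: float = 0.05):
    """Hypothetical helper: label 1 (anomaly) for points whose mean path length is
    at or below the `fraction` quantile of the training window, else 0."""
    lengths = np.asarray(mean_length, dtype=float)
    threshold = np.quantile(lengths[:n_train], fraction)
    return (lengths <= threshold).astype(int)
```

For example, a point whose mean path length equals c(256) gets a score of exactly 0.5, and shorter paths push the score towards 1. Using `flag_short_paths` with 1 meaning "anomaly" avoids any ambiguity about which string label marks the anomalous points.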
