I'm trying to use the pandas_profiling package to automagically describe some data frames from inside Apache Zeppelin.
The code I'm running is:
%pyspark
import sys
print(sys.version_info)
import numpy as np
print("numpy: ", np.__version__)
import pandas as pd
print("pandas: ", pd.__version__)
import pandas_profiling as pp
print("pandas_profiling: ", pp.__version__)
from pandas_profiling import ProfileReport
df = spark.sql("SELECT * FROM database.table")
profile = ProfileReport(df, title="Report: table")
profile.to_widgets()
My result is:
sys.version_info(major=3, minor=6, micro=8, releaselevel='final', serial=0)
numpy: 1.19.5
pandas: 1.1.5
pandas_profiling: 3.1.0
Fail to execute line 19: profile.to_widgets()
Traceback (most recent call last):
  File "/tmp/1662648724242-0/zeppelin_python.py", line 158, in <module>
    exec(code, _zcUserQueryNameSpace)
  File "<stdin>", line 19, in <module>
  File "/usr/local/lib/python3.6/site-packages/pandas_profiling/profile_report.py", line 414, in to_widgets
    display(self.widgets)
  File "/usr/local/lib/python3.6/site-packages/pandas_profiling/profile_report.py", line 197, in widgets
    self._widgets = self._render_widgets()
  File "/usr/local/lib/python3.6/site-packages/pandas_profiling/profile_report.py", line 315, in _render_widgets
    report = self.report
  File "/usr/local/lib/python3.6/site-packages/pandas_profiling/profile_report.py", line 179, in report
    self._report = get_report_structure(self.config, self.description_set)
  File "/usr/local/lib/python3.6/site-packages/pandas_profiling/profile_report.py", line 166, in description_set
    self._sample,
  File "/usr/local/lib/python3.6/site-packages/pandas_profiling/model/describe.py", line 56, in describe
    check_dataframe(df)
  File "/usr/local/lib/python3.6/site-packages/multimethod/__init__.py", line 209, in __call__
    return self[tuple(map(self.get_type, args))](*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/pandas_profiling/model/dataframe.py", line 10, in check_dataframe
    raise NotImplementedError()
NotImplementedError
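The last frames of the traceback suggest this is a type-dispatch miss: `check_dataframe` is called through the `multimethod` package, which picks an implementation based on the argument's type, and the generic implementation at line 10 of `dataframe.py` raises `NotImplementedError`. Since `spark.sql()` returns a Spark DataFrame rather than a pandas one, it appears no matching implementation is found. A minimal stdlib sketch of that pattern, using `functools.singledispatch` in place of `multimethod` and hypothetical stand-in classes instead of the real pandas and Spark types:

```python
from functools import singledispatch


class PandasLikeFrame:
    """Hypothetical stand-in for pandas.DataFrame (a registered type)."""


class SparkLikeFrame:
    """Hypothetical stand-in for pyspark.sql.DataFrame (not registered)."""


@singledispatch
def check_dataframe(df):
    # Fallback for any type without a registered implementation,
    # mirroring the generic version that raises in the traceback.
    raise NotImplementedError()


@check_dataframe.register(PandasLikeFrame)
def _(df):
    # Registered type: validation passes (real checks omitted in this sketch).
    return None


def accepts(df):
    """Return True if check_dataframe has an implementation for df's type."""
    try:
        check_dataframe(df)
        return True
    except NotImplementedError:
        return False


print(accepts(PandasLikeFrame()))  # True
print(accepts(SparkLikeFrame()))   # False
```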
Is there any way to work around this, ideally from inside Zeppelin?
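One possible workaround is to convert the Spark DataFrame to pandas with `toPandas()` before building the report. A minimal sketch, assuming the table (or a capped subset of it) fits in driver memory; `to_profilable`, `zeppelin_html` and the `max_rows` cap are hypothetical helpers, not part of pandas_profiling or Zeppelin. It also assumes that `to_widgets()`, which relies on ipywidgets, may not render in Zeppelin, so the usage comment prints the `to_html()` output behind Zeppelin's `%html` display directive instead:

```python
# Sketch only: assumes this pandas_profiling version accepts pandas DataFrames,
# not Spark DataFrames, so the Spark result is collected to the driver first.


def to_profilable(df, max_rows=None):
    """Return something ProfileReport can accept.

    Assumption: `df` is either already a pandas DataFrame or a Spark
    DataFrame exposing `limit()` and `toPandas()` (pyspark.sql.DataFrame does).
    `max_rows` is a hypothetical cap on how many rows are pulled to the driver.
    """
    if not hasattr(df, "toPandas"):
        # Already pandas (or another type ProfileReport will reject on its own).
        return df
    if max_rows is not None:
        df = df.limit(max_rows)
    return df.toPandas()


def zeppelin_html(html):
    """Prefix an HTML string so Zeppelin's display system renders it as HTML."""
    return "%html " + html


# Usage inside a %pyspark paragraph (not executed here):
#   from pandas_profiling import ProfileReport
#   sdf = spark.sql("SELECT * FROM database.table")
#   pdf = to_profilable(sdf, max_rows=100000)
#   profile = ProfileReport(pdf, title="Report: table")
#   print(zeppelin_html(profile.to_html()))
```

Note that `limit()` returns whichever rows Spark produces first, not a random sample; `sdf.sample(fraction=...)` is an alternative if the profile should be representative of the whole table.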