Consider the following DataFrame:
import numpy as np
import pandas as pd

np.random.seed(0)
s1 = pd.Series([1, 2, 'a', 'b', [1, 2, 3]])
s2 = np.random.randn(len(s1))
s3 = np.random.choice(list('abcd'), len(s1))
df = pd.DataFrame({'A': s1, 'B': s2, 'C': s3})
df
           A         B  C
0          1  1.764052  a
1          2  0.400157  d
2          a  0.978738  c
3          b  2.240893  a
4  [1, 2, 3]  1.867558  a
Column "A" has mixed data types, and I would like a really quick way of detecting this. It is not as simple as checking whether dtype == object, because that would also flag "C", which is a false positive.
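To make the false positive concrete, here is a small sketch of the naive dtype check on a frame shaped like the one above. The name `naive` and the hard-coded values in "B" and "C" are illustrative only. On pandas versions that store strings as object, both "A" and "C" are flagged; versions with a dedicated string dtype may report "C" differently.

```python
import pandas as pd

# Illustrative frame: "A" mixes ints, strings and a list; "C" holds only strings.
df = pd.DataFrame({
    'A': pd.Series([1, 2, 'a', 'b', [1, 2, 3]]),
    'B': [0.1, 0.2, 0.3, 0.4, 0.5],
    'C': list('adcaa'),
})

# Naive check: object dtype says "arbitrary Python objects",
# not "more than one Python type".
naive = df.dtypes == object
print(naive)
```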
I can think of doing this with
df.applymap(type).nunique() > 1
A     True
B    False
C    False
dtype: bool
But calling type through applymap is pretty slow, especially for larger frames.
%timeit df.applymap(type).nunique() > 1
3.95 ms ± 88 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Can we do better (perhaps with NumPy)? I can accept "No" if your argument is convincing enough. :-)
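One possible direction, offered as a sketch and not benchmarked here: a column with a non-object dtype cannot hold mixed Python types, so the per-element type check only needs to run on object columns. The helper name `mixed_type_columns` is hypothetical, and whether this beats the applymap version on a given frame is an assumption that needs timing.

```python
import pandas as pd

def mixed_type_columns(df):
    """Return a boolean Series: True where a column holds more than one Python type."""
    flags = {}
    for name, col in df.items():
        if col.dtype != object:
            # Non-object columns are homogeneous, so skip them.
            flags[name] = False
        else:
            # Collect the distinct Python types of the raw values.
            flags[name] = len(set(map(type, col.to_numpy()))) > 1
    return pd.Series(flags, dtype=bool)

# Illustrative frame with the same shape as the one in the question.
example = pd.DataFrame({
    'A': pd.Series([1, 2, 'a', 'b', [1, 2, 3]]),
    'B': [0.1, 0.2, 0.3, 0.4, 0.5],
    'C': list('adcaa'),
})
result = mixed_type_columns(example)
print(result)
```

Like the applymap approach, this compares exact Python types, so an object column mixing int and float values would also be reported as mixed.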
Source: Is there an efficient method of checking whether a column has mixed dtypes?