Understand the difference between NaN and None in Pandas once and for all


Pandas and NumPy are widely used libraries for data mining and data science, but people are often confused by None and NaN, two similar but distinct ways of representing a missing value. Here we sort out the difference once and for all with some examples.

Main difference

The distinction between None and NaN in Pandas can be summarized as:

  1. None represents a missing entry, but it is a Python object, not a number. Any column (a Pandas Series) that actually holds a None value therefore has the object dtype, not a numeric dtype such as int or float.
  2. NaN, which stands for not-a-number, is on the other hand a floating-point value. This means that NaN can live in a numeric column, although the column must be of float type: an int column that receives a NaN is upcast to float64.
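
The dtype behaviour behind these two points can be checked directly. The following is a minimal sketch; the variable names are illustrative, and the last part assumes pandas 1.0 or later, where the nullable "Int64" dtype is available.

```python
import numpy as np
import pandas as pd

# NaN is a float, so integers mixed with NaN are upcast to float64.
with_nan = pd.Series([1, 2, np.nan])
print(with_nan.dtype)  # float64

# Forcing the object dtype keeps None as a plain Python object.
with_none = pd.Series([1, 2, None], dtype=object)
print(with_none.dtype)  # object
print(with_none.iloc[2] is None)  # True

# Assumption: pandas >= 1.0. The nullable "Int64" dtype stores a
# missing integer as pd.NA, so the column stays an integer column.
nullable_int = pd.Series([1, 2, None], dtype="Int64")
print(nullable_int.dtype)  # Int64
```

So a default integer column cannot hold NaN without becoming float64, while the opt-in nullable integer dtype can represent a missing entry and stay integer.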

Tests in action

In the following test, the None value is automatically converted to NaN, because the other value in the series is numeric.
This makes the series a numeric (float64) type, which is much easier to work with in many
later operations.

import pandas as pd
pd.Series([1,None])
0    1.0
1    NaN
dtype: float64

In the following test, the other value in the series is a string, so the None value stays as None. This makes the whole
series an object type.

import pandas as pd
pd.Series(["1",None])
0       1
1    None
dtype: object

None type can lead to more arithmetic errors

Why did we claim that NaN makes many operations useful to data science much easier?
It simply causes fewer errors in arithmetic operations. For example, the following operation raises an error:

None + 1
---------------------------------------------------------------------------

TypeError                                 Traceback (most recent call last)

<ipython-input-8-3fd8740bf8ab> in <module>
----> 1 None + 1


TypeError: unsupported operand type(s) for +: 'NoneType' and 'int'

The same operation with NaN is fine: we just get another NaN, and no error.

import numpy as np
np.nan + 1
nan
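
The same propagation applies to a whole Series. The sketch below uses a small made-up Series to show element-wise arithmetic, the default NaN-skipping behaviour of sum(), and the fact that NaN never compares equal to itself.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])

# Element-wise arithmetic propagates NaN instead of raising an error.
shifted = s + 1
print(shifted.tolist())  # [2.0, nan, 4.0]

# Reductions such as sum() skip NaN by default (skipna=True).
print(s.sum())  # 4.0
print(s.sum(skipna=False))  # nan

# NaN is not equal to itself, so == cannot be used to detect it.
print(np.nan == np.nan)  # False
```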

How to check for None and NaN

There are several ways to check whether a value is None or NaN.
First, in NumPy, the function np.isnan() checks whether a value is NaN, but it does not work with None: np.isnan(None) raises a TypeError.

np.isnan(np.nan)
True
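
Because np.isnan() raises a TypeError for None, a scalar check that may meet either value needs a guard. The helper below, is_nan_safe, is a hypothetical name introduced only for illustration and is not part of NumPy; it is a sketch for scalar inputs, not arrays.

```python
import numpy as np


def is_nan_safe(value):
    """Hypothetical helper: True only for a NaN scalar, False otherwise."""
    try:
        return bool(np.isnan(value))
    except TypeError:
        # np.isnan() does not support None or other non-numeric objects.
        return False


print(is_nan_safe(np.nan))  # True
print(is_nan_safe(None))  # False
print(is_nan_safe(1.0))  # False
```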

In Pandas, the functions isnull() and isna() do exactly the same thing; isnull() is just an alias of isna(). Both detect missing values, so both NaN and None give True.

pd.isnull(np.nan)
True
pd.isnull(None)
True
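
The same checks work on a whole Series, which is the more common case in practice. A minimal sketch with made-up data, also showing how the resulting mask can be used to count, drop, or fill the missing entries:

```python
import numpy as np
import pandas as pd

# None is converted to NaN here because the other values are numeric.
s = pd.Series([1.0, None, np.nan])

mask = s.isna()
print(mask.tolist())  # [False, True, True]

# isnull() gives the same result as isna().
print(s.isnull().equals(mask))  # True

# Count, drop, or fill the missing entries.
print(int(mask.sum()))  # 2
print(s.dropna().tolist())  # [1.0]
print(s.fillna(0).tolist())  # [1.0, 0.0, 0.0]
```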

Author: robot learner
Reprint policy: Unless otherwise stated, all articles in this blog are licensed under CC BY 4.0. If reproduced, please credit the source: robot learner.