A custom scikit-learn transformer example with a parameter, and how to avoid a possible None error


A custom scikit-learn transformer example with a parameter.
It takes a pandas DataFrame with a column called input and returns a DataFrame with a column called output containing the transformed data:

Define the custom transformer

import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin

class PowerTransformer(BaseEstimator, TransformerMixin):
    def __init__(self, power=1):
        # Store the argument unchanged under the same name so that
        # get_params() and clone() can recover it
        self.power = power

    def fit(self, X, y=None):
        # Fit simply returns self, nothing else to do
        return self

    def transform(self, X):
        # Check if input is a DataFrame
        if isinstance(X, pd.DataFrame):
            # If so, return a DataFrame with the transformed data
            return pd.DataFrame({'output': X['input'] ** self.power})
        else:
            # If not, treat the input as array-like and return a plain NumPy array
            return np.asarray(X) ** self.power
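
Because the class inherits from BaseEstimator, it should also pick up get_params and set_params without any extra code, as long as __init__ stores each argument under its own name. The snippet below is a minimal sketch of that behavior. It restates the class so it can run on its own, and it keeps only the DataFrame branch of transform for brevity.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class PowerTransformer(BaseEstimator, TransformerMixin):
    def __init__(self, power=1):
        self.power = power

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        # Simplified: DataFrame input only
        return pd.DataFrame({'output': X['input'] ** self.power})


transformer = PowerTransformer(power=3)

# get_params() reads the constructor arguments back from the instance
params = transformer.get_params()

# set_params() updates the attribute of the same name
transformer.set_params(power=2)
updated_power = transformer.power

print(params)
print(updated_power)
```

These two methods are what scikit-learn utilities rely on when they copy or tune an estimator.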

To use this transformer, we would first instantiate it with the desired power parameter, and then call its fit_transform method on a DataFrame with an input column. For example:

Generate some sample data and a DataFrame

import numpy as np
import pandas as pd

# Generate some sample data
data = np.random.randn(5)

# Create a DataFrame with an 'input' column
X = pd.DataFrame({'input': data})

# Print the DataFrame
print(X)

      input
0 -1.460135
1  0.188532
2 -0.272600
3  0.306880
4 -0.221020

Use the transformer, with parameter initialization

# Create the transformer
transformer = PowerTransformer(power=2)

# Fit and transform the data
X_transformed = transformer.fit_transform(X)

print(X_transformed)
     output
0  2.131994
1  0.035544
2  0.074311
3  0.094175
4  0.048850
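
Since the transformer follows the fit/transform convention, it can presumably be used as a step in a scikit-learn Pipeline. The following is a rough sketch under assumptions that are not part of the original example: the LinearRegression model, the step names 'power' and 'model', and the synthetic target y are all made up for illustration.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline


class PowerTransformer(BaseEstimator, TransformerMixin):
    def __init__(self, power=1):
        self.power = power

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        # Simplified: DataFrame input only
        return pd.DataFrame({'output': X['input'] ** self.power})


# Hypothetical data: the target is an exact quadratic function of the input
X = pd.DataFrame({'input': [1.0, 2.0, 3.0, 4.0]})
y = 3.0 * X['input'] ** 2 + 1.0

# Square the input first, then fit a linear model on the squared values
pipeline = Pipeline([
    ('power', PowerTransformer(power=2)),
    ('model', LinearRegression()),
])
pipeline.fit(X, y)
predictions = pipeline.predict(X)

print(predictions)
```

With power=2 the linear model sees the squared input, so it can fit this quadratic target closely.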

Possible error: the parameter is seen as None

The above example shows two practices: (1) give the parameter a default value in the __init__ function; (2) when creating the transformer, pass the parameter in the parameter=value format.

For example, we can skip the parameter name like this, and it will work for one-time use:
transformer = PowerTransformer(2)

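But it can raise a None error when the transformer is used with other methods such as sklearn.model_selection.cross_val_score. These methods clone the transformer: they read its parameters back with get_params() and rebuild it from them, so every __init__ argument needs to be stored unchanged in an attribute of the same name.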
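
To check that a parameter survives this kind of use, one option is to clone the transformer directly and then run it through cross_val_score. The code below is a minimal sketch, not a definitive test: the pipeline, the LinearRegression model, the synthetic data, and cv=3 are assumptions added for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin, clone
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline


class PowerTransformer(BaseEstimator, TransformerMixin):
    def __init__(self, power=1):
        # Stored unchanged under the same name as the argument
        self.power = power

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        # Simplified: DataFrame input only
        return pd.DataFrame({'output': X['input'] ** self.power})


# clone() builds a fresh, unfitted copy from get_params()
transformer = PowerTransformer(power=2)
cloned_power = clone(transformer).power

# Hypothetical data: the target is an exact quadratic function of the input
rng = np.random.default_rng(0)
X = pd.DataFrame({'input': rng.normal(size=30)})
y = 3.0 * X['input'] ** 2 + 1.0

pipeline = Pipeline([
    ('power', PowerTransformer(power=2)),
    ('model', LinearRegression()),
])

# cross_val_score clones the pipeline (and the transformer in it) for each fold
scores = cross_val_score(pipeline, X, y, cv=3)

print(cloned_power)
print(scores)
```

If the cloned copy still reports the expected power and the scores look reasonable, the parameter is being carried through cloning as intended.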



Author: robot learner
Reprint policy: Unless otherwise stated, all articles in this blog are licensed under CC BY 4.0. If reproduced, please credit the source: robot learner.