DATA_CLEANER

Scaling

The Scaling module rescales the numerical features in your dataset. It handles common data issues such as skewness and outliers, which can significantly improve the performance of many machine learning models.

PYTHON

from noventis.data_cleaner import NoventisScaler

Parameters

Parameter	Type	Default	Description
method	{'auto', 'standard', 'minmax', 'robust', 'power'}	"auto"	The scaling algorithm to use. The options are described below the table.
optimize	bool	True	If True, the scaler's internal parameters will be fine-tuned.
custom_params	Optional[dict]	None	Overrides the default or optimized parameters for specific scaling methods.
skew_threshold	float	2.0	Absolute-skewness threshold for treating a column as "highly skewed".
outlier_threshold	float	0.01	Proportion of data points that must be outliers for a column to be categorized as "having outliers".
normality_alpha	float	0.05	The significance level (alpha) used in the statistical test for normality.
verbose	bool	True	If True, a summary of the scaling process is printed after fitting.

Options for method:

auto (default): Automatically selects a scaling strategy for each column based on its statistical properties (e.g., skewness, outliers).
standard: Uses StandardScaler. Scales to mean 0 and standard deviation 1. Best when the data is already close to normal.
minmax: Uses MinMaxScaler. Scales to a fixed range (typically [0, 1]). Good for models that expect bounded features (e.g., many neural nets).
robust: Uses RobustScaler. Resistant to outliers because it uses the median and IQR. Suitable when outliers are present.
power: Uses PowerTransformer. Transforms skewed data to be closer to Gaussian, helping models that assume normality.
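
To get a feel for what the four named methods do, the sketch below applies the scikit-learn transformers mentioned in the method description directly to one small column. It does not use NoventisScaler itself; it assumes the named methods behave like the scikit-learn classes with their default settings, and the sample values are purely illustrative.

```python
import numpy as np
from sklearn.preprocessing import (
    MinMaxScaler,
    PowerTransformer,
    RobustScaler,
    StandardScaler,
)

# One feature column with a single large outlier (illustrative values).
x = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

# Assumption: each method name corresponds to the scikit-learn class
# with default arguments. NoventisScaler may configure them differently.
scalers = {
    "standard": StandardScaler(),
    "minmax": MinMaxScaler(),
    "robust": RobustScaler(),
    "power": PowerTransformer(),  # Yeo-Johnson transform by default
}

# Fit each scaler on the same column and collect the transformed values.
results = {name: s.fit_transform(x).ravel() for name, s in scalers.items()}

for name, values in results.items():
    print(f"{name:>8}: {np.round(values, 3)}")
```

Comparing the rows shows how differently the same outlier is treated: minmax squeezes the four small values toward 0, while robust keeps them evenly spaced around 0 and lets the outlier sit far away.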

Model Usage Examples

Automatic Scaling

PYTHON

import numpy as np
import pandas as pd
from noventis.data_cleaner import NoventisScaler

# Sample data with different distributions
df = pd.DataFrame({
  'normal_data':  np.random.normal(100, 15, 1000),
  'skewed_data':  np.random.exponential(2, 1000),
  'with_outliers': np.concatenate([np.random.normal(50, 10, 950),
                                   np.random.normal(200, 10, 50)])
})

# initialize & fit/transform
scaler = NoventisScaler(method='auto')
df_scaled = scaler.fit_transform(df)

# see chosen methods per column
print(scaler.fitted_methods_)
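
The skew_threshold, outlier_threshold, and normality_alpha parameters suggest three per-column diagnostics behind the auto mode. The sketch below computes such diagnostics for one column. It is not NoventisScaler's actual logic: the function name column_diagnostics, the 1.5 × IQR outlier rule, the Shapiro-Wilk test, and the strict ">" comparisons are all assumptions made for illustration, and how the library combines the flags into a final choice is not shown here.

```python
import numpy as np
from scipy import stats


def column_diagnostics(values, skew_threshold=2.0, outlier_threshold=0.01,
                       normality_alpha=0.05):
    """Hypothetical per-column checks; not the library's implementation."""
    values = np.asarray(values, dtype=float)

    # Sample skewness, compared in absolute value against skew_threshold.
    skewness = float(stats.skew(values))

    # Share of points outside the 1.5 * IQR fences (assumed outlier rule).
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    outside = (values < q1 - 1.5 * iqr) | (values > q3 + 1.5 * iqr)
    outlier_share = float(outside.mean())

    # Shapiro-Wilk normality test (assumed choice of test).
    # A p-value above alpha means normality is not rejected.
    _, p_value = stats.shapiro(values)

    return {
        "skewness": skewness,
        "outlier_share": outlier_share,
        "normality_p": float(p_value),
        "highly_skewed": bool(abs(skewness) > skew_threshold),
        "has_outliers": bool(outlier_share > outlier_threshold),
        "looks_normal": bool(p_value > normality_alpha),
    }


# Usage: inspect a right-skewed sample similar to 'skewed_data' above.
rng = np.random.default_rng(0)
print(column_diagnostics(rng.exponential(2, 1000)))
```

A selector could then, for example, prefer power for highly skewed columns, robust for columns with outliers, and standard for columns that look normal, but that ordering is a guess rather than documented behavior.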

Force Specific Method

PYTHON

import numpy as np
import pandas as pd
from noventis.data_cleaner import NoventisScaler

df = pd.DataFrame({
  'normal_data':  np.random.normal(100, 15, 1000),
  'skewed_data':  np.random.exponential(2, 1000),
  'with_outliers': np.concatenate([np.random.normal(50, 10, 950),
                                   np.random.normal(200, 10, 50)])
})

scaler = NoventisScaler(method='robust')
df_scaled = scaler.fit_transform(df)
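
For reference, the median-and-IQR arithmetic behind robust scaling can be worked by hand. The sketch below uses scikit-learn's RobustScaler directly, on the assumption that method='robust' relies on it with default settings, and checks that the manual formula gives the same result. The sample values are illustrative.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

# Five ordinary values plus one extreme outlier (illustrative data).
x = np.array([10.0, 12.0, 14.0, 16.0, 18.0, 500.0])

# Manual robust scaling: subtract the median, divide by the IQR.
median = np.median(x)
q1, q3 = np.percentile(x, [25, 75])
manual = (x - median) / (q3 - q1)

# Same computation through scikit-learn (expects a 2-D column).
sk = RobustScaler().fit_transform(x.reshape(-1, 1)).ravel()

print("median:", median, "IQR:", q3 - q1)
print("manual :", manual)
print("sklearn:", sk)
print("match  :", np.allclose(manual, sk))
```

Because the median and IQR barely move when the outlier is added, the five ordinary values keep a sensible spread after scaling, which is the property the robust method description refers to.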