DATA_CLEANER

Scaling

The Scaling module rescales the numerical features in your dataset. It handles common data issues like skewness and outliers, which can significantly degrade the performance of many machine learning models.

PYTHON
from noventis.data_cleaner import NoventisScaler
Parameters
method
Type
{'auto', 'standard', 'minmax', 'robust', 'power'}
Default
"auto"

The scaling algorithm to be used.

  • auto (default): Automatically selects the best scaling strategy for each column based on its statistical properties (e.g., skewness, outliers).
  • standard: Uses StandardScaler. Best when the data is already close to normal; scales each column to mean 0 and standard deviation 1.
  • minmax: Uses MinMaxScaler. Scales to a fixed range (typically [0, 1]). Good for models that expect bounded features (e.g., many neural nets).
  • robust: Uses RobustScaler. Resistant to outliers by centering on the median and scaling by the IQR. Suitable when outliers are present.
  • power: Uses PowerTransformer. Transforms skewed data to be closer to Gaussian, helping models that assume normality.
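To see how these four strategies differ in practice, the sketch below applies the scikit-learn scalers named above directly to one right-skewed column. This is plain scikit-learn for illustration, not NoventisScaler itself:

```python
import numpy as np
from sklearn.preprocessing import (
    MinMaxScaler, PowerTransformer, RobustScaler, StandardScaler,
)

rng = np.random.default_rng(0)
x = rng.exponential(2.0, size=(1000, 1))  # one right-skewed column

standard = StandardScaler().fit_transform(x)  # centered to mean 0, std 1
minmax = MinMaxScaler().fit_transform(x)      # squeezed into [0, 1]
robust = RobustScaler().fit_transform(x)      # median 0, scaled by the IQR
power = PowerTransformer().fit_transform(x)   # reshaped toward Gaussian

print(minmax.min(), minmax.max())
print(np.median(robust))
```

Note that none of these change the *ordering* of values within a column; only PowerTransformer changes the shape of the distribution.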
optimize
Type
bool
Default
True
If True, the scaler's internal parameters are fine-tuned during fitting.
custom_params
Type
Optional[dict]
Default
None
Allows you to override the default or optimized parameters for specific scaling methods.
skew_threshold
Type
float
Default
2.0
Absolute-skewness threshold above which a column is considered "highly skewed".
outlier_threshold
Type
float
Default
0.01
Minimum proportion of outlier data points for a column to be categorized as "having outliers".
normality_alpha
Type
float
Default
0.05
The significance level (alpha) used in the statistical test for normality.
verbose
Type
bool
Default
True
If True, a summary of the scaling process will be printed after fitting.
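How 'auto' combines skew_threshold, outlier_threshold, and normality_alpha into a method choice is internal to the library. The following is a plausible heuristic sketched with scipy; the function name and the decision order are assumptions here, not NoventisScaler's actual rules:

```python
import numpy as np
from scipy import stats


def choose_method(col, skew_threshold=2.0, outlier_threshold=0.01,
                  normality_alpha=0.05):
    """Hypothetical sketch of an 'auto' heuristic; the library may differ."""
    # Highly skewed column -> PowerTransformer
    if abs(stats.skew(col)) > skew_threshold:
        return 'power'
    # Too many 1.5*IQR outliers -> RobustScaler
    q1, q3 = np.percentile(col, [25, 75])
    iqr = q3 - q1
    outlier_frac = np.mean((col < q1 - 1.5 * iqr) | (col > q3 + 1.5 * iqr))
    if outlier_frac > outlier_threshold:
        return 'robust'
    # Passes a normality test at alpha -> StandardScaler, else MinMaxScaler
    if stats.normaltest(col).pvalue > normality_alpha:
        return 'standard'
    return 'minmax'


rng = np.random.default_rng(42)
print(choose_method(rng.lognormal(0.0, 1.5, 1000)))  # -> 'power' (heavily skewed)
```

Raising skew_threshold or outlier_threshold makes a heuristic like this fall through to the milder scalers more often.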
Usage Examples
01

Automatic Scaling

PYTHON
import numpy as np
import pandas as pd

from noventis.data_cleaner import NoventisScaler

# Sample data with different distributions
df = pd.DataFrame({
    'normal_data': np.random.normal(100, 15, 1000),
    'skewed_data': np.random.exponential(2, 1000),
    'with_outliers': np.concatenate([
        np.random.normal(50, 10, 950),
        np.random.normal(200, 10, 50),
    ]),
})

# Initialize, then fit and transform in one step
scaler = NoventisScaler(method='auto')
df_scaled = scaler.fit_transform(df)

# Inspect the method chosen for each column
print(scaler.fitted_methods_)
02

Force Specific Method

PYTHON
import numpy as np
import pandas as pd

from noventis.data_cleaner import NoventisScaler

df = pd.DataFrame({
    'normal_data': np.random.normal(100, 15, 1000),
    'skewed_data': np.random.exponential(2, 1000),
    'with_outliers': np.concatenate([
        np.random.normal(50, 10, 950),
        np.random.normal(200, 10, 50),
    ]),
})

# Skip auto-detection and apply RobustScaler to every column
scaler = NoventisScaler(method='robust')
df_scaled = scaler.fit_transform(df)