Common techniques for feature selection and transformation

Feature selection and transformation are crucial steps in feature engineering to enhance machine learning model performance.
1️⃣ Feature Selection Techniques
Feature selection helps in choosing the most relevant features while eliminating redundant or irrelevant ones.
🔹 1. Filter Methods
These techniques evaluate features independently of the model using statistical tests.
✅ Methods:
- Correlation Analysis → Keep features that correlate strongly with the target and drop features that correlate strongly with each other (redundant).
- Chi-Square Test → Measures dependency between categorical features and the target variable.
- Mutual Information (MI) → Evaluates how much information a feature provides about the target.
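📌 Example (Mutual Information in Python)
Mutual information from the list above can be estimated with scikit-learn's mutual_info_classif; a minimal sketch on illustrative toy data:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

# Toy data: both features separate the two classes cleanly (illustrative values)
X = np.array([[1, 10], [2, 20], [3, 30], [4, 40],
              [5, 50], [6, 60], [7, 70], [8, 80]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

mi = mutual_info_classif(X, y, random_state=0)
print(mi)  # One non-negative MI score per feature; higher means more informative
```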
📌 Example (Correlation in Python)
```python
import pandas as pd

df = pd.DataFrame({'Feature1': [1, 2, 3, 4, 5],
                   'Feature2': [10, 20, 30, 40, 50],
                   'Target': [0, 1, 0, 1, 0]})
correlation_matrix = df.corr()
print(correlation_matrix)
```
🔹 2. Wrapper Methods
These methods use a machine learning model to evaluate feature subsets.
✅ Methods:
- Recursive Feature Elimination (RFE) → Iteratively removes the least important features.
- Forward/Backward Selection → Adds/removes features step by step based on model performance.
📌 Example (Using RFE in Python)
```python
from sklearn.feature_selection import RFE
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier()
selector = RFE(model, n_features_to_select=2)  # Select top 2 features
selector.fit(df[['Feature1', 'Feature2']], df['Target'])
print(selector.support_)  # True for selected features
```
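📌 Example (Forward Selection in Python)
Forward selection from the list above can be sketched with scikit-learn's SequentialFeatureSelector (the iris dataset and logistic-regression estimator are illustrative choices):

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Greedily add features one at a time, keeping the 2 that help CV accuracy most
sfs = SequentialFeatureSelector(LogisticRegression(max_iter=1000),
                                n_features_to_select=2, direction='forward')
sfs.fit(X, y)
print(sfs.get_support())  # Boolean mask over the 4 iris features
```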
🔹 3. Embedded Methods
These methods incorporate feature selection within model training.
✅ Examples:
- Lasso Regression (L1 Regularization) → Shrinks coefficients of less important features to zero.
- Decision Trees & Random Forest Feature Importance → Selects features based on their contribution to model performance.
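📌 Example (Lasso in Python)
Lasso's coefficient shrinkage can be seen on synthetic data where only the first two features matter (a sketch; the data and alpha value are illustrative):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
# Target depends only on the first two features; the other three are noise
y = 3 * X[:, 0] + 2 * X[:, 1] + rng.normal(scale=0.1, size=100)

lasso = Lasso(alpha=0.1).fit(X, y)
print(lasso.coef_)  # Coefficients of the three irrelevant features shrink to zero
```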
📌 Example (Feature Importance in Random Forest)
```python
model.fit(df[['Feature1', 'Feature2']], df['Target'])
print(model.feature_importances_)  # Higher values indicate more important features
```
2️⃣ Feature Transformation Techniques
Feature transformation modifies data to improve model accuracy and efficiency.
🔹 1. Normalization & Standardization
Ensures numerical features are on the same scale.
✅ Methods:
- Min-Max Scaling → Scales values between 0 and 1.
- Z-score Standardization → Rescales data to zero mean and unit standard deviation.
📌 Example (Scaling in Python)
```python
from sklearn.preprocessing import MinMaxScaler, StandardScaler

scaler = MinMaxScaler()
df[['Feature1', 'Feature2']] = scaler.fit_transform(df[['Feature1', 'Feature2']])
```
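📌 Example (Standardization in Python)
Z-score standardization follows the same pattern with StandardScaler; a sketch on illustrative values:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])  # Illustrative column
scaled = StandardScaler().fit_transform(X)
print(scaled.mean(), scaled.std())  # Mean ~0 and standard deviation ~1
```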
🔹 2. Encoding Categorical Variables
Converts categorical data into numerical format for ML models.
✅ Methods:
- One-Hot Encoding → Creates binary columns for each category.
- Label Encoding → Assigns an integer to each category (implies an ordering, so best suited to ordinal data).
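📌 Example (Label Encoding in Python)
Label encoding from the list above can be sketched with scikit-learn's LabelEncoder (the category values are illustrative):

```python
from sklearn.preprocessing import LabelEncoder

colors = ['red', 'green', 'blue', 'green']  # Illustrative categories
encoded = LabelEncoder().fit_transform(colors)
print(list(encoded))  # [2, 1, 0, 1]: categories are numbered alphabetically
```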
📌 Example (One-Hot Encoding in Python)
```python
df = pd.get_dummies(df, columns=['Category'])
```
🔹 3. Feature Extraction (Dimensionality Reduction)
Reduces the number of features while retaining important information.
✅ Methods:
- Principal Component Analysis (PCA) → Converts features into uncorrelated components.
- Autoencoders (Deep Learning) → Uses neural networks to learn compressed representations.
📌 Example (PCA in Python)
```python
from sklearn.decomposition import PCA

pca = PCA(n_components=2)
reduced_features = pca.fit_transform(df[['Feature1', 'Feature2']])
```
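📌 Example (Explained Variance in PCA)
Whether two components are enough depends on how much variance they retain; a sketch on the iris dataset (an illustrative choice):

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)
pca = PCA(n_components=2).fit(X)
print(pca.explained_variance_ratio_)  # Fraction of total variance per component
```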
🔹 4. Log & Power Transformations
Used to make skewed data more normally distributed.
✅ Methods:
- Log Transformation → Helps normalize right-skewed data.
- Box-Cox Transformation → A power transformation that finds the exponent that best normalizes strictly positive data.
📌 Example (Log Transformation in Python)
```python
import numpy as np

df['Feature1'] = np.log(df['Feature1'] + 1)  # Avoid log(0) by adding 1
```
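📌 Example (Box-Cox Transformation in Python)
Box-Cox from the list above can be sketched with SciPy, which fits the power parameter lambda itself (the right-skewed sample here is illustrative):

```python
import numpy as np
from scipy.stats import boxcox

skewed = np.random.default_rng(0).lognormal(size=1000)  # Right-skewed, positive data
transformed, lam = boxcox(skewed)  # Box-Cox requires strictly positive input
print(lam)  # Fitted lambda; a value near 0 recovers the log transform
```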
Conclusion
✅ Feature Selection helps remove irrelevant or redundant features.
✅ Feature Transformation rescales, encodes, or reshapes features so models can learn from them more effectively.
WEBSITE: https://www.ficusoft.in/deep-learning-training-in-chennai/