
Handling Class Imbalance in Random Forest

Python Scikit-learn SMOTE Classification

Context

Standard Random Forest classifiers degrade on the heavily imbalanced datasets common in fraud detection and credit risk: with few minority examples, the trees bias toward the majority class and minority-class recall collapses.

Data & Modeling

Systematically compared SMOTE, ADASYN, Tomek links, cost-sensitive learning, and ensemble balancing across multiple imbalance ratios.
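
A minimal sketch of how such a comparison might be wired up with imbalanced-learn pipelines; the synthetic dataset, resampler settings, and forest hyperparameters here are illustrative assumptions, not the project's actual configuration.

```python
from imblearn.over_sampling import SMOTE, ADASYN
from imblearn.under_sampling import TomekLinks
from imblearn.pipeline import Pipeline
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Stand-in 1:9 imbalanced dataset (the study swept much wider ratios).
X, y = make_classification(n_samples=5000, weights=[0.9, 0.1], random_state=0)

resamplers = {
    "smote": SMOTE(random_state=0),
    "adasyn": ADASYN(random_state=0),
    "tomek": TomekLinks(),
}

for name, resampler in resamplers.items():
    # Resampling inside the pipeline keeps it confined to the training folds.
    pipe = Pipeline([
        ("resample", resampler),
        ("rf", RandomForestClassifier(random_state=0)),
    ])
    scores = cross_val_score(pipe, X, y, cv=5, scoring="f1")
    print(f"{name}: F1 = {scores.mean():.3f}")
```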

Results

A cost-sensitive Random Forest combined with SMOTE achieved a 15–20% F1 improvement over the baseline on highly skewed datasets.
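
A hedged sketch of the winning configuration as described: SMOTE oversampling feeding a class-weighted (cost-sensitive) Random Forest. The sampling ratio and tree count are placeholder values, not the tuned settings behind the reported numbers.

```python
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Stand-in 1:19 imbalanced dataset for demonstration.
X, y = make_classification(n_samples=5000, weights=[0.95, 0.05], random_state=0)

model = Pipeline([
    # Oversample the minority class up to half the majority's size.
    ("smote", SMOTE(sampling_strategy=0.5, random_state=0)),
    # class_weight="balanced" adds the cost-sensitive component on top.
    ("rf", RandomForestClassifier(
        n_estimators=300, class_weight="balanced", random_state=0)),
])
model.fit(X, y)
```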

Takeaways

Extend the comparison to gradient-boosted ensembles and evaluate on real-world credit default data.
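
One way the gradient-boosting extension could look, sketched with scikit-learn's HistGradientBoostingClassifier and balanced per-sample weights; since this is future work, everything here is an assumption rather than something evaluated in the project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.utils.class_weight import compute_sample_weight

X, y = make_classification(n_samples=5000, weights=[0.95, 0.05], random_state=0)

# Upweight minority-class samples inversely to their frequency,
# mirroring the cost-sensitive treatment used for the Random Forest.
sample_weight = compute_sample_weight(class_weight="balanced", y=y)

gb = HistGradientBoostingClassifier(random_state=0)
gb.fit(X, y, sample_weight=sample_weight)
```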

Evaluation

  • 5-fold stratified CV across imbalance ratios (1:10 to 1:100); see the sketch after this list
  • Metrics: F1, AUC-ROC, and precision-recall AUC
  • Tested on synthetic and real-world credit datasets
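
A minimal sketch of that protocol: 5-fold stratified cross-validation scored on all three metrics, shown at one illustrative imbalance ratio (1:19, standing in for the 1:10 to 1:100 range).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate

X, y = make_classification(n_samples=10_000, weights=[0.95, 0.05], random_state=0)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scoring = {"f1": "f1", "auc_roc": "roc_auc", "pr_auc": "average_precision"}

results = cross_validate(
    RandomForestClassifier(random_state=0), X, y, cv=cv, scoring=scoring)
for metric in scoring:
    print(f"{metric}: {results[f'test_{metric}'].mean():.3f}")
```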