metadata
license: mit
language:
  - en
pipeline_tag: tabular-classification
tags:
  - finance
metrics:
  - accuracy
library_name: sklearn

MF3Classifier

Model Overview

MF3Classifier is a scikit-learn pipeline that predicts mutual fund performance from one numerical feature (AUM) and five categorical features. It chains preprocessing (scaling, one-hot encoding, and feature selection) with a Random Forest classifier in a single pipeline for tabular financial data.

Model Architecture

The model uses a two-branch preprocessing pipeline followed by a Random Forest classifier:

Preprocessing Pipeline

  1. Numerical Features Branch

    • Features: ['AUM']
    • Transformation: StandardScaler
  2. Categorical Features Branch

    • Features: ['AMC', 'Fund Category', 'Sub-Sheme', 'Investment Type', 'Growth Option']
    • Transformations:
      • OneHotEncoder (non-sparse output, handles unknown categories)
      • Feature Selection (SelectKBest with mutual_info_classif, k=30)

Classifier

  • Model: RandomForestClassifier
  • Key Parameters:
    • n_estimators: 30
    • max_depth: 20
    • min_samples_split: 10
    • min_samples_leaf: 5
    • n_jobs: -1 (parallel processing)
    • random_state: 42
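A minimal sketch of the classifier with the parameters listed above, fitted on synthetic data from `make_classification`. The 31-column input width is an assumption (one scaled AUM column plus the 30 columns kept by SelectKBest); the data is illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Hyperparameters as listed in this README.
clf = RandomForestClassifier(
    n_estimators=30,       # number of trees in the ensemble
    max_depth=20,          # no tree grows deeper than 20 levels
    min_samples_split=10,  # a node needs at least 10 samples to be split
    min_samples_leaf=5,    # each leaf must hold at least 5 samples
    n_jobs=-1,             # build trees and predict on all CPU cores
    random_state=42,       # reproducible bootstrap samples and feature draws
)

# Synthetic stand-in for the preprocessed matrix (assumed 31 columns wide).
X, y = make_classification(n_samples=500, n_features=31, n_informative=8, random_state=0)
clf.fit(X, y)

print(len(clf.estimators_))                                      # 30
print(max(tree.get_depth() for tree in clf.estimators_) <= 20)   # True
```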

Use Cases

  • Mutual fund performance prediction
  • Investment strategy optimization
  • Portfolio management
  • Risk assessment

Model Parameters

Preprocessing Configuration

  • Numerical Features:

    • StandardScaler with default parameters
    • Centers AUM on its training mean and scales it to unit variance
  • Categorical Features:

    • OneHotEncoder:
      • handle_unknown: 'ignore'
      • sparse_output: False
      • dtype: numpy.float64
    • Feature Selection:
      • Method: SelectKBest with mutual_info_classif
      • Number of features: 30

Random Forest Configuration

  • Tree Structure:

    • Maximum depth: 20
    • Minimum samples for split: 10
    • Minimum samples per leaf: 5
  • Ensemble Settings:

    • Number of trees: 30
    • Features considered per split (max_features): 'sqrt' (the default; older scikit-learn releases also accepted 'auto' for this)
    • Bootstrap: True
    • Criterion: gini
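With max_features='sqrt', each split considers floor(sqrt(n_features)) randomly drawn columns. Assuming the classifier receives 31 columns (1 scaled AUM column plus the 30 selected one-hot columns), that is 5 candidates per split. The sketch computes this and checks it against scikit-learn on random data of the same width.

```python
import math

import numpy as np
from sklearn.ensemble import RandomForestClassifier

n_features = 31  # assumption: 1 scaled AUM column + 30 selected one-hot columns
print(math.isqrt(n_features))  # 5 candidate features per split

# Confirm on random data with the same width, spelling out the listed defaults.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, n_features))
y = rng.integers(0, 2, size=100)
rf = RandomForestClassifier(
    n_estimators=30, max_features="sqrt", bootstrap=True, criterion="gini", random_state=42
).fit(X, y)
print(rf.estimators_[0].max_features_)  # 5
```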

Technical Details

File Information

  • Model Type: Scikit-learn Pipeline
  • Last Updated: November 3, 2024

Input Features

  1. Numerical Features:

    • AUM (Assets Under Management)
  2. Categorical Features:

    • AMC (Asset Management Company)
    • Fund Category
    • Sub-Scheme
    • Investment Type
    • Growth Option
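A hypothetical helper for assembling input rows. Column names follow the Model Architecture list, which spells the sub-scheme column 'Sub-Sheme'; confirm the exact names against the trained pipeline (for example its `feature_names_in_` attribute) before use. The helper name `make_input_frame` and the sample values are illustrative.

```python
import pandas as pd

# Hypothetical: column names copied from the Model Architecture feature lists.
REQUIRED_COLUMNS = ["AUM", "AMC", "Fund Category", "Sub-Sheme", "Investment Type", "Growth Option"]


def make_input_frame(records: list[dict]) -> pd.DataFrame:
    """Build a DataFrame with exactly the expected columns, in order; fail on missing ones."""
    frame = pd.DataFrame(records)
    missing = [c for c in REQUIRED_COLUMNS if c not in frame.columns]
    if missing:
        raise ValueError(f"missing input columns: {missing}")
    frame["AUM"] = pd.to_numeric(frame["AUM"])  # the numerical feature must be numeric
    return frame[REQUIRED_COLUMNS]


# Hypothetical example row; values are illustrative only.
row = {
    "AUM": 1250.0,
    "AMC": "Example AMC",
    "Fund Category": "Equity",
    "Sub-Sheme": "Large Cap",
    "Investment Type": "Direct",
    "Growth Option": "Growth",
}
X_new = make_input_frame([row])
print(X_new.columns.tolist())
```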

Limitations and Considerations

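  • SelectKBest scores each one-hot column independently with mutual_info_classif, so it can drop columns that are informative only in combination with other features
  • Only the 30 highest-scoring one-hot columns are kept; all remaining category indicators are discarded before classification
  • Categories unseen during training are encoded as all zeros (handle_unknown='ignore'), so predictions for new AMCs or sub-schemes rely on the remaining features and may be less reliable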

Usage Notes

  • Training and prediction use all available CPU cores (n_jobs=-1)
  • Unknown categories in categorical features are encoded as all zeros instead of raising an error
  • Uses standard scaling for numerical features
  • Intended to be saved and loaded with joblib for production use
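A sketch of the joblib round trip, using a small stand-in pipeline rather than the real model; the file name `mf3classifier.joblib` is hypothetical. Because joblib files are pickles, load only files from trusted sources, ideally with the same scikit-learn version used to save them.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Small stand-in pipeline; the real MF3Classifier uses the preprocessing described in this README.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
model = Pipeline([
    ("scale", StandardScaler()),
    ("rf", RandomForestClassifier(n_estimators=30, n_jobs=-1, random_state=42)),
]).fit(X, y)

# Save and reload with joblib (hypothetical file name, temporary directory).
path = os.path.join(tempfile.mkdtemp(), "mf3classifier.joblib")
joblib.dump(model, path)
restored = joblib.load(path)

print(np.array_equal(model.predict(X), restored.predict(X)))  # True
```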

Download Model

To download the pre-trained MF3Classifier model, use the link below:

Download MF3Classifier Model