metadata
license: mit
language:
- en
pipeline_tag: tabular-classification
tags:
- finance
metrics:
- accuracy
library_name: sklearn
MF3Classifier
Model Overview
This is a scikit-learn pipeline that predicts mutual fund performance from one numerical feature (AUM) and five categorical features. It combines preprocessing (scaling, one-hot encoding, and feature selection) with a Random Forest classifier in a single estimator.
Model Architecture
The model uses a two-branch preprocessing pipeline followed by a Random Forest classifier:
Preprocessing Pipeline
Numerical Features Branch
- Features: ['AUM']
- Transformation: StandardScaler
Categorical Features Branch
- Features: ['AMC', 'Fund Category', 'Sub-Sheme', 'Investment Type', 'Growth Option']
- Transformations:
  - OneHotEncoder (non-sparse output, handles unknown categories)
  - Feature Selection (SelectKBest with mutual_info_classif, k=30)
Classifier
- Model: RandomForestClassifier
- Key Parameters:
  - n_estimators: 30
  - max_depth: 20
  - min_samples_split: 10
  - min_samples_leaf: 5
  - n_jobs: -1 (parallel processing)
  - random_state: 42
Use Cases
- Mutual fund performance prediction
- Investment strategy optimization
- Portfolio management
- Risk assessment
Model Parameters
Preprocessing Configuration
Numerical Features:
- StandardScaler with default parameters
- Centers each numerical feature to zero mean and scales it to unit variance
Categorical Features:
- OneHotEncoder:
  - handle_unknown: 'ignore'
  - sparse_output: False
  - dtype: numpy.float64
- Feature Selection:
  - Method: SelectKBest with mutual_info_classif
  - Number of features: 30
Random Forest Configuration
Tree Structure:
- Maximum depth: 20
- Minimum samples for split: 10
- Minimum samples per leaf: 5
Ensemble Settings:
- Number of trees: 30
- Features considered per split (max_features): sqrt
- Bootstrap: True
- Criterion: gini
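As a worked example of what the sqrt setting implies, the sketch below fits a forest with the listed settings on random data and inspects how many features each split considers. It assumes the classifier receives 31 columns (1 scaled numerical feature plus the 30 selected one-hot columns), which is inferred from the architecture described in this card; the random data is a stand-in, not fund data.

```python
import math

import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Assumption: 1 numerical column + 30 selected one-hot columns reach the forest.
n_features = 1 + 30

# Random stand-in data, used only so the forest can be fitted.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, n_features))
y = rng.integers(0, 2, size=200)

forest = RandomForestClassifier(
    n_estimators=30,
    max_depth=20,
    min_samples_split=10,
    min_samples_leaf=5,
    max_features="sqrt",
    bootstrap=True,
    criterion="gini",
    n_jobs=-1,
    random_state=42,
).fit(X, y)

# Each tree records the resolved number of features it samples per split.
features_per_split = forest.estimators_[0].max_features_
print(features_per_split, math.isqrt(n_features))  # 5 5
```

With 31 input columns, each split therefore samples 5 candidate features, which decorrelates the 30 trees at the cost of occasionally skipping the most informative column.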
Technical Details
File Information
- Model Type: Scikit-learn Pipeline
- Last Updated: November 3, 2024
Input Features
Numerical Features:
- AUM (Assets Under Management)
Categorical Features:
- AMC (Asset Management Company)
- Fund Category
- Sub-Scheme
- Investment Type
- Growth Option
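A minimal sketch of the expected input layout, assuming the pipeline is fed a pandas DataFrame. The column names follow the pipeline's feature list earlier in this card (where the sub-scheme column is spelled 'Sub-Sheme'); the helper `missing_columns` and every value in the sample row are hypothetical placeholders, not real fund data.

```python
import pandas as pd

# Column names as they appear in the pipeline's feature list.
EXPECTED_COLUMNS = [
    "AUM",
    "AMC",
    "Fund Category",
    "Sub-Sheme",
    "Investment Type",
    "Growth Option",
]


def missing_columns(frame):
    """Return the expected columns that are absent from the frame (hypothetical helper)."""
    return [column for column in EXPECTED_COLUMNS if column not in frame.columns]


# Placeholder values for illustration only.
sample = pd.DataFrame([{
    "AUM": 1250.0,
    "AMC": "Example AMC",
    "Fund Category": "Equity",
    "Sub-Sheme": "Large Cap",
    "Investment Type": "Open Ended",
    "Growth Option": "Growth",
}])

print(missing_columns(sample))
```

Checking column names before calling `predict` gives a clearer error than the one raised from inside the preprocessing step.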
Limitations and Considerations
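- The model uses mutual_info_classif for feature selection, which scores each feature independently and may therefore miss features that are only informative in combination with others
- Feature selection keeps only the top 30 one-hot encoded categorical columns; the remaining columns are discarded before classification
- Categories not seen during training are encoded as all-zero vectors because of the 'ignore' setting in OneHotEncoder, so predictions for funds with unseen categories may be less reliable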
Usage Notes
- The model supports parallel processing (n_jobs=-1)
- Handles unknown categories in categorical features gracefully
- Uses standard scaling for numerical features
- Designed for production use with joblib serialization
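A joblib round trip might look like the sketch below. The file name `model.joblib` is a placeholder, since this card does not state the released file's name, and a small forest fitted on random data stands in for the full pipeline.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Stand-in model fitted on random data; the real artifact is the full pipeline.
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 4))
y = rng.integers(0, 2, size=100)
model = RandomForestClassifier(n_estimators=30, random_state=42).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.joblib")  # hypothetical file name
    joblib.dump(model, path)      # serialize the fitted estimator
    restored = joblib.load(path)  # load it back into memory

# The restored estimator should reproduce the original predictions.
same = np.array_equal(model.predict(X), restored.predict(X))
print(same)
```

Loading generally works most reliably with the same scikit-learn version that was used to save the model.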
Download Model
To download the pre-trained MF3Classifier model, use the link below: