---
license: mit
language:
- en
pipeline_tag: tabular-classification
tags:
- finance
metrics:
- accuracy
library_name: sklearn
---
# MF3Classifier
## Model Overview
This is a scikit-learn pipeline that predicts mutual fund performance from one numerical feature (AUM) and five categorical fund attributes. It chains preprocessing steps with a Random Forest classifier and is intended for tabular financial data.
## Model Architecture
The model uses a two-branch preprocessing pipeline followed by a Random Forest classifier:
### Preprocessing Pipeline
1. **Numerical Features Branch**
   - Features: ['AUM']
   - Transformation: StandardScaler
2. **Categorical Features Branch**
   - Features: ['AMC', 'Fund Category', 'Sub-Sheme', 'Investment Type', 'Growth Option']
   - Transformations:
     - OneHotEncoder (non-sparse output, handles unknown categories)
     - Feature Selection (SelectKBest with mutual_info_classif, k=30)
### Classifier
- **Model**: RandomForestClassifier
- **Key Parameters**:
  - n_estimators: 30
  - max_depth: 20
  - min_samples_split: 10
  - min_samples_leaf: 5
  - n_jobs: -1 (parallel processing)
  - random_state: 42
## Use Cases
- Mutual fund performance prediction
- Investment strategy optimization
- Portfolio management
- Risk assessment
## Model Parameters
### Preprocessing Configuration
- **Numerical Features**:
  - StandardScaler with default parameters
  - Centers AUM to zero mean and scales it to unit variance, using statistics learned from the training data
- **Categorical Features**:
  - OneHotEncoder:
    - handle_unknown: 'ignore'
    - sparse_output: False
    - dtype: numpy.float64
  - Feature Selection:
    - Method: SelectKBest with mutual_info_classif
    - Number of features: 30
### Random Forest Configuration
- **Tree Structure**:
  - Maximum depth: 20
  - Minimum samples for split: 10
  - Minimum samples per leaf: 5
- **Ensemble Settings**:
  - Number of trees: 30
  - Features considered per split (max_features): 'sqrt' (the classifier default; older scikit-learn releases named it 'auto')
  - Bootstrap: True
  - Criterion: gini
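The `'sqrt'` setting can be made concrete. Assuming the categorical branch produces at least 30 one-hot columns, the forest receives 1 + 30 = 31 features, so each split considers int(sqrt(31)) = 5 randomly chosen features. The sketch below checks that arithmetic against scikit-learn on random placeholder data of the same width; the data is not from the real model.

```python
import math

import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Scaled AUM plus the 30 one-hot columns kept by SelectKBest (assumed width).
n_features = 1 + 30
print(max(1, int(math.sqrt(n_features))))  # 5

# Placeholder data with the same number of columns.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, n_features))
y = rng.integers(0, 2, size=200)

forest = RandomForestClassifier(
    n_estimators=30,
    max_depth=20,
    min_samples_split=10,
    min_samples_leaf=5,
    max_features="sqrt",
    bootstrap=True,
    criterion="gini",
    n_jobs=-1,
    random_state=42,
).fit(X, y)

# Each fitted tree records the resolved per-split feature count.
print(forest.estimators_[0].max_features_)  # 5
print(len(forest.estimators_))  # 30
```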
## Technical Details
### File Information
- **Model Type**: Scikit-learn Pipeline
- **Last Updated**: November 3, 2024
### Input Features
1. **Numerical Features**:
   - AUM (Assets Under Management)
2. **Categorical Features**:
   - AMC (Asset Management Company)
   - Fund Category
   - Sub-Scheme
   - Investment Type
   - Growth Option
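A prediction call needs a pandas DataFrame containing all six input columns. The helper below is hypothetical (not part of the released model): it builds such a frame and fails early if a column is missing. Its column names follow the architecture feature list, which spells the sub-scheme column `Sub-Sheme`; the spelling the released pipeline expects is an assumption, so check it against the loaded pipeline's `feature_names_in_` attribute (set by scikit-learn when a pipeline is fitted on a DataFrame). The example values are made up.

```python
import pandas as pd

# Assumed column names; verify against the loaded pipeline's feature_names_in_.
REQUIRED_COLUMNS = ["AUM", "AMC", "Fund Category", "Sub-Sheme", "Investment Type", "Growth Option"]


def make_input_frame(records: list[dict]) -> pd.DataFrame:
    """Hypothetical helper: build a model-ready DataFrame from plain dicts."""
    frame = pd.DataFrame.from_records(records)
    missing = [col for col in REQUIRED_COLUMNS if col not in frame.columns]
    if missing:
        raise ValueError(f"missing input columns: {missing}")
    # AUM feeds StandardScaler, so it must be numeric.
    frame["AUM"] = pd.to_numeric(frame["AUM"])
    return frame[REQUIRED_COLUMNS]


example = make_input_frame([{
    "AUM": 1520.5,
    "AMC": "Example AMC",
    "Fund Category": "Equity",
    "Sub-Sheme": "Large Cap",
    "Investment Type": "Direct",
    "Growth Option": "Growth",
}])
print(example)
```

The resulting frame can then be passed to the loaded pipeline's `predict` method.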
## Limitations and Considerations
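- Feature selection scores each one-hot column independently with mutual_info_classif, so interactions between categorical features are ignored when choosing which columns to keep
- Only the 30 highest-scoring one-hot columns are kept; the remaining category indicators never reach the classifier (AUM is not subject to this selection)
- Because OneHotEncoder uses handle_unknown='ignore', a category not seen during training is encoded as all zeros, so predictions for such rows rely on the remaining features and may be less reliable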
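To see why univariate scoring can miss useful features, consider two binary columns whose XOR defines the label. This toy data set is an illustration, not drawn from the fund data: each column alone has zero mutual information with the label, so `SelectKBest` would rank both at the bottom, even though together they determine the label exactly. `discrete_features=True` is set here so the scores are exact plug-in estimates; if `mutual_info_classif` is used with its defaults on dense one-hot output, scikit-learn instead treats the columns as continuous and uses a nearest-neighbour estimator.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

# Every (x1, x2) combination appears 25 times; the label is their XOR.
x1 = np.repeat([0, 0, 1, 1], 25)
x2 = np.repeat([0, 1, 0, 1], 25)
y = x1 ^ x2
X = np.column_stack([x1, x2])

# Univariate scores, as SelectKBest sees them: each feature alone looks useless.
scores = mutual_info_classif(X, y, discrete_features=True)
print(scores)  # both approximately 0

# Jointly, the pair determines y exactly: MI equals H(y) = ln 2 nats.
joint = (2 * x1 + x2).reshape(-1, 1)
joint_score = mutual_info_classif(joint, y, discrete_features=True)[0]
print(round(joint_score, 4), round(np.log(2), 4))  # 0.6931 0.6931
```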
## Usage Notes
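- The classifier uses all available CPU cores for training and prediction (n_jobs=-1)
- Unknown categories in categorical features do not raise an error; they are encoded as all-zero vectors
- AUM is standardized with StandardScaler before it reaches the classifier
- The fitted pipeline is serialized with joblib and can be loaded with joblib.load for deployment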
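A joblib round trip can be sketched as follows. A small placeholder forest stands in for the released pipeline (an assumption for illustration); the downloaded `.joblib` file would be loaded with the same `joblib.load` call. Loading a joblib file unpickles arbitrary Python objects, so only load files from a trusted source, and use a scikit-learn version compatible with the one used for training (the card does not state which version that was).

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Placeholder model standing in for the released pipeline.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = rng.integers(0, 2, size=100)
model = RandomForestClassifier(n_estimators=30, n_jobs=-1, random_state=42).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "model.joblib"
    joblib.dump(model, path)      # serialize to disk
    restored = joblib.load(path)  # deserialize (runs pickle under the hood)

# The restored model makes identical predictions.
print(np.array_equal(model.predict(X), restored.predict(X)))  # True
```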
## Download Model
To download the pre-trained **MF3Classifier** model, use the link below:
[**Download MF3Classifier Model**](https://huggingface.co/alokpandey/MF3Classifier/resolve/main/fund_predictor_model_20241103_230654.joblib)