---
license: mit
language:
- en
pipeline_tag: tabular-classification
tags:
- finance
metrics:
- accuracy
library_name: sklearn
---

# MF3Classifier

## Model Overview

This is a machine learning pipeline designed to predict mutual fund performance using both numerical and categorical features. The model combines preprocessing steps with a Random Forest classifier, making it suitable for financial data analysis.

## Model Architecture

The model uses a two-branch preprocessing stage, one branch for numerical and one for categorical features, followed by a Random Forest classifier:

### Preprocessing Pipeline

1. **Numerical Features Branch**
   - Features: ['AUM']
   - Transformation: StandardScaler

2. **Categorical Features Branch**
   - Features: ['AMC', 'Fund Category', 'Sub-Sheme', 'Investment Type', 'Growth Option']
   - Transformations:
     - OneHotEncoder (dense output, unknown categories ignored)
     - Feature selection (SelectKBest with mutual_info_classif, k=30)

### Classifier

- **Model**: RandomForestClassifier
- **Key Parameters**:
  - n_estimators: 30
  - max_depth: 20
  - min_samples_split: 10
  - min_samples_leaf: 5
  - n_jobs: -1 (use all available CPU cores)
  - random_state: 42

## Use Cases

- Mutual fund performance prediction
- Investment strategy optimization
- Portfolio management
- Risk assessment

## Model Parameters

### Preprocessing Configuration

- **Numerical Features**:
  - StandardScaler with default parameters
  - Centers each feature to zero mean and scales it to unit variance, using statistics learned from the training data

- **Categorical Features**:
  - OneHotEncoder:
    - handle_unknown: 'ignore'
    - sparse_output: False
    - dtype: numpy.float64
  - Feature Selection:
    - Method: SelectKBest with mutual_info_classif
    - Number of features: 30

### Random Forest Configuration

- **Tree Structure**:
  - Maximum depth: 20
  - Minimum samples for split: 10
  - Minimum samples per leaf: 5

- **Ensemble Settings**:
  - Number of trees: 30
  - Features considered per split (max_features): 'sqrt' (the scikit-learn default, formerly 'auto')
  - Bootstrap: True
  - Criterion: gini

## Technical Details

### File Information

- **Model Type**: Scikit-learn Pipeline
- **Last Updated**: November 3, 2024

### Input Features

1. **Numerical Features**:
   - AUM (Assets Under Management)

2. **Categorical Features**:
   - AMC (Asset Management Company)
   - Fund Category
   - Sub-Scheme
   - Investment Type
   - Growth Option
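
If, as seems likely, the pipeline selects its input columns by name, a prediction DataFrame must contain every feature listed above. The sketch below is a hypothetical helper that checks for missing columns before the data reaches the model. The column names are copied verbatim from the Preprocessing Pipeline list (including the spelling 'Sub-Sheme') on the assumption that they match the training data, and the sample record is invented.

```python
import pandas as pd

# Expected input columns, copied verbatim from the feature lists in this card.
NUMERICAL_FEATURES = ["AUM"]
CATEGORICAL_FEATURES = ["AMC", "Fund Category", "Sub-Sheme", "Investment Type", "Growth Option"]
EXPECTED_COLUMNS = NUMERICAL_FEATURES + CATEGORICAL_FEATURES


def prepare_input(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: validate the schema and return the expected columns in a fixed order."""
    missing = [col for col in EXPECTED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"Missing required columns: {missing}")
    out = df[EXPECTED_COLUMNS].copy()
    out["AUM"] = out["AUM"].astype(float)  # the scaler expects a numeric column
    return out


# One invented fund record; extra or reordered columns are handled by prepare_input.
sample = pd.DataFrame([{
    "AMC": "Example AMC",
    "Fund Category": "Equity",
    "Sub-Sheme": "Large Cap",
    "Investment Type": "Direct",
    "Growth Option": "Growth",
    "AUM": 1500,
}])
X = prepare_input(sample)
```

With a loaded model, prediction would then be `model.predict(prepare_input(df))`.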
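
## Limitations and Considerations

- Feature selection uses mutual_info_classif, which scores each one-hot column independently against the target, so relationships that only appear when features are combined are not captured
- Only the 30 highest-scoring one-hot columns are kept; all other categorical columns are discarded before classification
- Categories not seen during training are encoded as all-zero vectors (the 'ignore' setting in OneHotEncoder), so the model has no information about them and predictions for such inputs may be less reliable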

## Usage Notes

- The random forest trains and predicts in parallel on all available CPU cores (n_jobs=-1)
- Categories not seen during training do not raise an error at prediction time (handle_unknown='ignore')
- Numerical features are standardized with StandardScaler
- The fitted pipeline is serialized with joblib for deployment and can be loaded with joblib.load
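
The joblib round trip that the published file relies on can be sketched as follows. The snippet dumps a small stand-in RandomForestClassifier (not the published pipeline) to a temporary file, loads it back, and confirms that predictions are unchanged. Loading is most reliable with the same scikit-learn version that saved the model; scikit-learn emits a warning when the versions differ.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical toy model standing in for the real pipeline.
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(100, 4))
y_toy = (X_toy[:, 0] > 0).astype(int)
model = RandomForestClassifier(n_estimators=30, n_jobs=-1, random_state=42).fit(X_toy, y_toy)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.joblib")
    joblib.dump(model, path)       # serialize the fitted estimator to disk
    restored = joblib.load(path)   # deserialize; this unpickles, so only load trusted files
    preds_match = np.array_equal(model.predict(X_toy), restored.predict(X_toy))
```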

## Download Model

To download the pre-trained **MF3Classifier** model, use the link below:

[**Download MF3Classifier Model**](https://huggingface.co/alokpandey/MF3Classifier/resolve/main/fund_predictor_model_20241103_230654.joblib)
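
As an alternative to the direct link, the file can be fetched programmatically with `hf_hub_download` from the huggingface_hub package, which caches it locally. The repository ID and file name below are taken from the link above. The helper name `load_mf3classifier` is hypothetical, and the packages are imported inside the function so the snippet can be imported without them installed.

```python
# Repository and file name taken from the download link above.
REPO_ID = "alokpandey/MF3Classifier"
FILENAME = "fund_predictor_model_20241103_230654.joblib"


def load_mf3classifier():
    """Hypothetical helper: download the serialized pipeline from the Hub and load it.

    Requires huggingface_hub, joblib, and a scikit-learn version compatible with
    the one used to save the model.
    """
    import joblib
    from huggingface_hub import hf_hub_download

    # Downloads on the first call, then reuses the local cache.
    local_path = hf_hub_download(repo_id=REPO_ID, filename=FILENAME)
    return joblib.load(local_path)


# Usage (needs network access on the first call):
# model = load_mf3classifier()
# predictions = model.predict(input_df)  # input_df: DataFrame with the Input Features columns
```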