---
license: mit
language:
- en
pipeline_tag: text-classification
tags:
- finance
---

# Model Card: Fund Predictor Pipeline Model

## Model Overview

This is a scikit-learn pipeline that predicts mutual fund performance from numerical and categorical features. It combines the preprocessing steps and a Random Forest classifier in a single object, so raw fund attributes can be passed directly to the model.

## Model Architecture

The model uses a two-branch preprocessing pipeline followed by a Random Forest classifier.

### Preprocessing Pipeline

1. **Numerical Features Branch**
   - Features: ['AUM']
   - Transformation: StandardScaler
2. **Categorical Features Branch**
   - Features: ['AMC', 'Fund Category', 'Sub-Sheme', 'Investment Type', 'Growth Option']
   - Transformations:
     - OneHotEncoder (non-sparse output, handles unknown categories)
     - Feature Selection (SelectKBest with mutual_info_classif, k=30)

### Classifier

- **Model**: RandomForestClassifier
- **Key Parameters**:
  - n_estimators: 30
  - max_depth: 20
  - min_samples_split: 10
  - min_samples_leaf: 5
  - n_jobs: -1 (parallel processing)
  - random_state: 42

## Use Cases

- Mutual fund performance prediction
- Investment strategy optimization
- Portfolio management
- Risk assessment

## Model Parameters

### Preprocessing Configuration

- **Numerical Features**:
  - StandardScaler with default parameters
  - Centers each feature to zero mean and scales it to unit variance
- **Categorical Features**:
  - OneHotEncoder:
    - handle_unknown: 'ignore'
    - sparse_output: False
    - dtype: numpy.float64
  - Feature Selection:
    - Method: SelectKBest with mutual_info_classif
    - Number of features: 30

### Random Forest Configuration

- **Tree Structure**:
  - Maximum depth: 20
  - Minimum samples for split: 10
  - Minimum samples per leaf: 5
- **Ensemble Settings**:
  - Number of trees: 30
  - Features considered per split (max_features): sqrt
  - Bootstrap: True
  - Criterion: gini

## Technical Details

### File Information

- **Model Path**: C:\Users\alokp\models\fund_predictor_model_20241103_230654.joblib
- **Model Type**: Scikit-learn Pipeline
- **Last Updated**: November 3, 2024

### Input Features

1. **Numerical Features**:
   - AUM (Assets Under Management)
2. **Categorical Features**:
   - AMC
   - Fund Category
   - Sub-Scheme
   - Investment Type
   - Growth Option

## Limitations and Considerations

- Feature selection relies on mutual_info_classif, which scores each feature independently and may miss relationships that only appear when features are combined
- Only the top 30 one-hot encoded categorical features are kept; the rest are discarded before training
- Categories not seen during training are encoded as all zeros (handle_unknown='ignore'), so predictions for such rows may be less reliable

## Usage Notes

- Training and prediction run in parallel across all available cores (n_jobs=-1)
- Unknown categories in categorical features do not raise an error
- Numerical features are standardized inside the pipeline, so inputs should be passed unscaled
- The fitted pipeline is serialized with joblib

## Model Location

```
C:\Users\alokp\models\fund_predictor_model_20241103_230654.joblib
```