---
# For reference on model card metadata, see the spec: https://github.com/huggingface/hub-docs/blob/main/modelcard.md?plain=1
# Doc / guide: https://huggingface.co/docs/hub/model-cards
{}
---

# Model Card for Model ID

NOTE: This is NOT our final model. It is one of the secondary models we explored while developing our final model, which is in the GBTrees repository on Hugging Face.

## Model Details

This model classifies news headlines as coming from either NBC News or Fox News.

### Model Description

- **Developed by:** Jack Bader, Kaiyuan Wang, Pairan Xu
- **Task:** Binary classification (NBC News vs. Fox News)
- **Preprocessing:** TF-IDF vectorization applied to the headline text, with:
  - `stop_words="english"`
  - `max_features=1000`
- **Model type:** Random Forest
- **Framework:** scikit-learn

#### Metrics

- Accuracy score (reported by `classification_report`, alongside per-class precision, recall, and F1)

### Model Evaluation

The script below is written for Google Colab. It reads the test set from Google Drive, logs in to Hugging Face, downloads the trained model and the fitted TF-IDF vectorizer from the `CIS5190FinalProj/RandomForest` repository, and prints a classification report.

```python
import joblib
import pandas as pd
from huggingface_hub import hf_hub_download
from sklearn.metrics import classification_report

# Mount Google Drive (Colab only)
from google.colab import drive
drive.mount('/content/drive')

# Load the test set
test_df = pd.read_csv("/content/drive/MyDrive/test_data_random_subset.csv", encoding="Windows-1252")

# Log in with a Hugging Face token (Colab shell command)
# The token can be found in the repo as Token.docx
!huggingface-cli login

# Download the model and the fitted vectorizer; each call returns a local file path
model_path = hf_hub_download(repo_id="CIS5190FinalProj/RandomForest", filename="best_rf_model.pkl")
vectorizer_path = hf_hub_download(repo_id="CIS5190FinalProj/RandomForest", filename="tfidf_vectorizer.pkl")

# Load the Random Forest model and the TF-IDF vectorizer
model = joblib.load(model_path)
tfidf_vectorizer = joblib.load(vectorizer_path)

# Extract the headlines from the test set
X_test = test_df['title']

# Map the headlines to TF-IDF features with the already-fitted vectorizer (transform, not fit_transform)
X_test_transformed = tfidf_vectorizer.transform(X_test)

# Make predictions
y_pred = model.predict(X_test_transformed)

# Extract 'label' as the target
y_test = test_df['label']

# Print the classification report
print(classification_report(y_test, y_pred))
```
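The card does not include the training code. As a rough sketch, assuming only the settings listed under Model Description, the TF-IDF + Random Forest setup and the two-file save/load pattern used in Model Evaluation might look like the following. The headlines, the label values (`"NBC"` / `"Fox"`), and the Random Forest hyperparameters are hypothetical placeholders, not the data or tuned values behind `best_rf_model.pkl`.

```python
import os
import tempfile

import joblib
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score

# Hypothetical toy headlines and labels; the real training set and label encoding are not in this card.
train_titles = [
    "Senate passes the spending bill after a late-night vote",
    "Markets rally as inflation cools",
    "Border crossings surge under the new policy",
    "Officials respond to wildfire evacuations",
]
train_labels = ["NBC", "NBC", "Fox", "Fox"]

# TF-IDF settings taken from the Preprocessing bullets in Model Description.
vectorizer = TfidfVectorizer(stop_words="english", max_features=1000)
X_train = vectorizer.fit_transform(train_titles)

# Assumed hyperparameters (scikit-learn's default tree count plus a fixed seed);
# the tuned values used for the published model are not documented here.
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, train_labels)

# Save the model and the fitted vectorizer as two separate files,
# mirroring the two .pkl files downloaded in Model Evaluation.
out_dir = tempfile.mkdtemp()
model_path = os.path.join(out_dir, "best_rf_model.pkl")
vectorizer_path = os.path.join(out_dir, "tfidf_vectorizer.pkl")
joblib.dump(clf, model_path)
joblib.dump(vectorizer, vectorizer_path)

# Reload both and score held-out headlines with accuracy, the metric listed under Metrics.
loaded_model = joblib.load(model_path)
loaded_vectorizer = joblib.load(vectorizer_path)
test_titles = ["Senate vote on spending delayed", "New border policy announced"]
test_labels = ["NBC", "Fox"]
y_pred = loaded_model.predict(loaded_vectorizer.transform(test_titles))
acc = accuracy_score(test_labels, y_pred)
print(f"Accuracy: {acc:.2f}")
```

Saving the vectorizer separately matters because test headlines must be mapped onto the same vocabulary (at most 1,000 terms) the model was trained on; fitting a fresh vectorizer on the test data would produce feature columns that no longer line up with what the forest learned.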