{ "nbformat": 4, "nbformat_minor": 0, "metadata": { "colab": { "name": "Kunwar-Saaim Coding Challenge for Fatima Fellowship", "provenance": [], "collapsed_sections": [] }, "kernelspec": { "name": "python3", "display_name": "Python 3" }, "accelerator": "TPU" }, "cells": [ { "cell_type": "markdown", "metadata": { "id": "eBpjBBZc6IvA" }, "source": [ "# Fatima Fellowship Quick Coding Challenge (Pick 1)\n", "\n", "Thank you for applying to the Fatima Fellowship. To help us select the Fellows and assess your ability to do machine learning research, we are asking that you complete a short coding challenge. Please pick **1 of these 5** coding challenges, whichever is most aligned with your interests. \n", "\n", "**Due date: 1 week**\n", "\n", "**How to submit**: Please make a copy of this colab notebook, add your code and results, and submit your colab notebook to the submission link below. If you have never used a colab notebook, [check out this video](https://www.youtube.com/watch?v=i-HnvsehuSw).\n", "\n", "**Submission link**: https://airtable.com/shrXy3QKSsO2yALd3" ] }, { "cell_type": "markdown", "source": [ "" ], "metadata": { "id": "vFNnwRYul8xh" } }, { "cell_type": "markdown", "metadata": { "id": "braBzmRpMe7_" }, "source": [ "# 1. Deep Learning for Vision" ] }, { "cell_type": "markdown", "metadata": { "id": "1IWw-NZf5WfF" }, "source": [ "**Upside down detector**: Train a model to detect if images are upside down\n", "\n", "* Pick a dataset of natural images (we suggest looking at datasets on the [Hugging Face Hub](https://huggingface.co/datasets?task_categories=task_categories:image-classification&sort=downloads))\n", "* Synthetically turn some of the images upside down. 
Create a training and test set.\n", "* Build a neural network (using TensorFlow, PyTorch, or any framework you like)\n", "* Train it to classify image orientation until a reasonable accuracy is reached\n", "* [Upload the model to the Hugging Face Hub](https://huggingface.co/docs/hub/adding-a-model), and add a link to your model below.\n", "* Look at some of the images that were classified incorrectly. Please explain what you might do to improve your model's performance on these images in the future (you do not need to implement these suggestions)\n", "\n", "**Submission instructions**: Please write your code below and include some examples of images that were classified incorrectly" ] }, { "cell_type": "code", "source": [ "### WRITE YOUR CODE TO TRAIN THE MODEL HERE" ], "metadata": { "id": "K2GJaYBpw91T" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "**Write up**: \n", "* Link to the model on Hugging Face Hub: \n", "* Include some examples of misclassified images. Please explain what you might do to improve your model's performance on these images in the future (you do not need to implement these suggestions)" ], "metadata": { "id": "qSeLed2JxvGI" } }, { "cell_type": "markdown", "metadata": { "id": "sFU9LTOyMiMj" }, "source": [ "# 2. Deep Learning for NLP\n", "\n", "**Fake news classifier**: Train a text classification model to detect fake news articles!\n", "\n", "* Download the dataset here: https://www.kaggle.com/clmentbisaillon/fake-and-real-news-dataset\n", "* Develop an NLP model for classification that uses a pretrained language model\n", "* Finetune your model on the dataset, and generate an AUC curve of your model on the test set of your choice. \n", "* [Upload the model to the Hugging Face Hub](https://huggingface.co/docs/hub/adding-a-model), and add a link to your model below.\n", "* *Answer the following question*: Look at some of the news articles that were classified incorrectly. 
Please explain what you might do to improve your model's performance on these news articles in the future (you do not need to implement these suggestions)" ] }, { "cell_type": "markdown", "source": [ "## Model on the Hugging Face Hub: https://huggingface.co/kunwwarsaaim/distill-bert-fake-news-detection" ], "metadata": { "id": "uVAL-L6mmaEd" } }, { "cell_type": "code", "source": [ "!pip install transformers" ], "metadata": { "id": "CRKM9SyZsWkl" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "import os\n", "import tensorflow as tf\n", "import pandas as pd\n", "from sklearn.utils import shuffle\n", "import transformers\n", "from transformers import AutoTokenizer, TFAutoModelForSequenceClassification" ], "metadata": { "id": "11B2b71RqwOG" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "try:\n", "    tpu = tf.distribute.cluster_resolver.TPUClusterResolver()  # TPU detection\n", "    print('Running on TPU ', tpu.cluster_spec().as_dict()['worker'])\n", "except ValueError:\n", "    # No TPU was found, so the distributed setup below cannot proceed\n", "    raise RuntimeError('Not connected to a TPU runtime; in Colab, select a TPU under Runtime > Change runtime type.')\n", "\n", "tf.config.experimental_connect_to_cluster(tpu)\n", "tf.tpu.experimental.initialize_tpu_system(tpu)\n", "tpu_strategy = tf.distribute.TPUStrategy(tpu)  # stable API; the experimental variant is deprecated\n", "\n", "\n", "AUTOTUNE = tf.data.AUTOTUNE" ], "metadata": { "id": "OlwYIn9_pgnJ" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "# Global batch size: 32 samples per TPU replica\n", "batch_size = 32 * tpu_strategy.num_replicas_in_sync\n", "print('Batch size:', batch_size)" ], "metadata": { "id": "by_5W3761YsW", "outputId": "ce809a60-fc25-4c31-adb2-ba9fa98d77a6", "colab": { "base_uri": "https://localhost:8080/" } }, "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Batch size: 256\n" ] } ] }, { "cell_type": "code", "source": [ "from google.colab import drive\n", "drive.mount('/content/drive')\n", 
"os.chdir('/content/drive/MyDrive/news_dataset')" ], "metadata": { "id": "QkGuEO-4jnav", "outputId": "2af82882-512a-4b82-e50c-27b6fc3f27fd", "colab": { "base_uri": "https://localhost:8080/" } }, "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Mounted at /content/drive\n" ] } ] }, { "cell_type": "code", "source": [ "### WRITE YOUR CODE TO TRAIN THE MODEL HERE\n", "fake_news = pd.read_csv('Fake.csv')\n", "true_news = pd.read_csv('True.csv') " ], "metadata": { "id": "E90i018KyJH3" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "true_news['class_'] = [0]*len(true_news)\n", "fake_news['class_'] = [1]*len(fake_news)\n", "dataset = pd.concat([true_news,fake_news]).reset_index(drop=True)\n", "dataset = shuffle(dataset)\n", "\n", "dataset['data'] = dataset['title']+ ' ' + dataset['text']\n", "dataset.drop(['title','text','subject','date'],axis=1,inplace=True)\n", "dataset.drop_duplicates(inplace=True)" ], "metadata": { "id": "xkmNq2E1kUoq" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "dataset.keys()" ], "metadata": { "id": "40h-FjdKssww", "outputId": "396b3f77-dc2a-4b4c-8239-4e93bd5acfd6", "colab": { "base_uri": "https://localhost:8080/" } }, "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "Index(['class_', 'data'], dtype='object')" ] }, "metadata": {}, "execution_count": 122 } ] }, { "cell_type": "code", "source": [ "data = list(dataset.data)\n", "label = dataset.class_" ], "metadata": { "id": "DZzugpjusirx" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "tokenizer = AutoTokenizer.from_pretrained(\"distilbert-base-uncased\") #Tokenizer\n", "inputs = tokenizer(data, padding=True, truncation=True, return_tensors='tf') #Tokenized text" ], "metadata": { "id": "qpGP3JkNs7Dn" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "len(inputs['input_ids']) == 
len(label)" ], "metadata": { "id": "8eeYQv02w_l9", "outputId": "464101d2-979d-4bea-d711-9561d3a19e32", "colab": { "base_uri": "https://localhost:8080/" } }, "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "True" ] }, "metadata": {}, "execution_count": 125 } ] }, { "cell_type": "code", "source": [ "dataset = tf.data.Dataset.from_tensor_slices((dict(inputs), label))  # Create a TensorFlow dataset\n", "# Train/validation split: the first 20% of the (already shuffled) data is held out for validation\n", "val_data_size = int(0.2 * len(label))\n", "val_ds = dataset.take(val_data_size).batch(batch_size, drop_remainder=True)\n", "train_ds = dataset.skip(val_data_size).batch(batch_size, drop_remainder=True)\n", "train_ds = train_ds.prefetch(buffer_size=AUTOTUNE)" ], "metadata": { "id": "U9ScngQYvrt3" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "# Build and compile the model inside the TPU strategy scope so its variables are replicated\n", "with tpu_strategy.scope():\n", "    model = TFAutoModelForSequenceClassification.from_pretrained(\"distilbert-base-uncased\", num_labels=2)\n", "    model.compile(\n", "        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5, clipnorm=1.),\n", "        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),\n", "        metrics=[tf.metrics.SparseCategoricalAccuracy()],\n", "    )\n", "\n", "history = model.fit(train_ds, validation_data=val_ds, epochs=5, verbose=1)\n" ], "metadata": { "id": "dhTNraaNyhCS", "outputId": "c1d121a4-def4-4e54-8d66-2c20569ec523", "colab": { "base_uri": "https://localhost:8080/" } }, "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stderr", "text": [ "Some layers from the model checkpoint at distilbert-base-uncased were not used when initializing TFDistilBertForSequenceClassification: ['activation_13', 'vocab_layer_norm', 'vocab_projector', 'vocab_transform']\n", "- This IS expected if you are initializing TFDistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. 
initializing a BertForSequenceClassification model from a BertForPreTraining model).\n", "- This IS NOT expected if you are initializing TFDistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).\n", "Some layers of TFDistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['pre_classifier', 'classifier', 'dropout_39']\n", "You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n" ] }, { "output_type": "stream", "name": "stdout", "text": [ "Epoch 1/5\n", "122/122 [==============================] - 106s 457ms/step - loss: 0.1616 - sparse_categorical_accuracy: 0.9687 - val_loss: 0.0124 - val_sparse_categorical_accuracy: 0.9993\n", "Epoch 2/5\n", "122/122 [==============================] - 50s 414ms/step - loss: 0.0096 - sparse_categorical_accuracy: 0.9991 - val_loss: 0.0042 - val_sparse_categorical_accuracy: 0.9997\n", "Epoch 3/5\n", "122/122 [==============================] - 50s 414ms/step - loss: 0.0039 - sparse_categorical_accuracy: 0.9996 - val_loss: 0.0030 - val_sparse_categorical_accuracy: 0.9996\n", "Epoch 4/5\n", "122/122 [==============================] - 51s 414ms/step - loss: 0.0019 - sparse_categorical_accuracy: 0.9998 - val_loss: 0.0014 - val_sparse_categorical_accuracy: 0.9999\n", "Epoch 5/5\n", "122/122 [==============================] - 51s 415ms/step - loss: 0.0016 - sparse_categorical_accuracy: 0.9997 - val_loss: 0.0036 - val_sparse_categorical_accuracy: 0.9993\n" ] } ] }, { "cell_type": "code", "source": [ "model.save_weights('./distill_bert_fake_news_saved_weights_epoch_5_0-8.h5')" ], "metadata": { "id": "LuTypVNSz8oA" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "model.evaluate(val_ds)" ], "metadata": { "id": "niOM7TYi__N6", 
"outputId": "6fea2853-4c43-44fb-8b6b-52a4eca4ef8c", "colab": { "base_uri": "https://localhost:8080/" } }, "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "30/30 [==============================] - 5s 121ms/step - loss: 0.0036 - sparse_categorical_accuracy: 0.9993\n" ] }, { "output_type": "execute_result", "data": { "text/plain": [ "[0.0036058793775737286, 0.9993489980697632]" ] }, "metadata": {}, "execution_count": 129 } ] }, { "cell_type": "code", "source": [ "output = model.predict(val_ds)" ], "metadata": { "id": "p8IY5uTcATkX" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "val_labels = list(val_ds.as_numpy_iterator())" ], "metadata": { "id": "YD1sez_xIdh9" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "val_true = []\n", "for i in range(len(val_labels)):\n", " val_true.append(val_labels[i][1])" ], "metadata": { "id": "kbtefx9QIvOz" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "val_text = []\n", "for i in range(len(val_labels)):\n", " val_text.append(val_labels[i][0]['input_ids'])" ], "metadata": { "id": "-5DN3sUla3zK" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "import numpy as np\n", "val_true_ = np.concatenate(val_true)" ], "metadata": { "id": "jNTnsNwnJc0Q" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "val_text_ = np.concatenate(val_text)" ], "metadata": { "id": "6bZo_FDkbbaY" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "from scipy.special import softmax" ], "metadata": { "id": "7LuMPSGWKzdF" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "output_softmax = softmax(output.logits,axis=1)" ], "metadata": { "id": "f-YBX5JWKmsU" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "output_values = np.argmax(output_softmax,axis=1)" ], "metadata": { "id": 
"QQCEkX9eK5Kl" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "from sklearn.metrics import roc_curve\n", "fpr, tpr, thresholds = roc_curve(val_true_, output_softmax[:,1])\n", "from sklearn.metrics import roc_auc_score\n", "auc = roc_auc_score(val_true_, output_softmax[:,1])\n", "print('AUC: %.3f' % auc)" ], "metadata": { "id": "yIOR8wfAKgMs", "outputId": "05f98fac-c218-4180-f8a8-54540120ea9f", "colab": { "base_uri": "https://localhost:8080/" } }, "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "AUC: 1.000\n" ] } ] }, { "cell_type": "code", "source": [ "import matplotlib.pyplot as plt\n", "\n", "#create ROC curve\n", "plt.plot(fpr,tpr)\n", "plt.ylabel('True Positive Rate')\n", "plt.xlabel('False Positive Rate')\n", "plt.show()" ], "metadata": { "id": "dd711OzxLpoi", "outputId": "5ac29de9-6ebf-4705-c00a-779ebe8ad504", "colab": { "base_uri": "https://localhost:8080/", "height": 279 } }, "execution_count": null, "outputs": [ { "output_type": "display_data", "data": { "text/plain": [ "
" ], "image/png": "\n" }, "metadata": { "needs_background": "light" } } ] }, { "cell_type": "code", "source": [ "bool_ = output_values == val_true_" ], "metadata": { "id": "PdJ6rBPRN14W" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "bool_ = list(bool_)" ], "metadata": { "id": "Lv68D_g5N9G7" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "indices = [i for i, x in enumerate(bool_) if x == False]" ], "metadata": { "id": "-0N0Pg2nOam5" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "indices = [1339, 2615, 6600, 7026] #wrong prediction" ], "metadata": { "id": "QxxBij34OJC2" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "for i in indices: # 0 -> True News 1-> False News\n", " print('Sample ',i)\n", " print('Predicted Value: ',np.argmax(softmax(model.predict(val_text_[i,:].reshape(1,512)).logits,axis=1),axis=1),' True Value: ',val_true_[i])\n" ], "metadata": { "id": "_b9RiItGcKI1", "outputId": "0badb610-f732-445a-b53b-8bec9d297463", "colab": { "base_uri": "https://localhost:8080/" } }, "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Sample 1339\n", "Predicted Value: [0] True Value: 1\n", "Sample 2615\n", "Predicted Value: [0] True Value: 1\n", "Sample 6600\n", "Predicted Value: [0] True Value: 1\n", "Sample 7026\n", "Predicted Value: [0] True Value: 1\n" ] } ] }, { "cell_type": "code", "source": [ "print('Wrong Prediction')\n", "for i in indices:\n", " print(tokenizer.decode(val_text_[i]))" ], "metadata": { "id": "ruoYIFXkipMX", "outputId": "fe11069a-c0d3-42e9-bbcb-9d21bc538399", "colab": { "base_uri": "https://localhost:8080/" } }, "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Wrong Prediction\n", "[CLS] trump tells state department to make cut more than 50 % of funding to u. n. 
president trump s administration has told the state department to cut more than 50 percent of u. s. funding to united nations programs, foreign policy reported. the push for the drastic reductions comes as the white house is scheduled to release its 2018 topline budget proposal thursday, which is expected to include a 37 percent cut to the state department and u. s. agency for international development budgets. it s not clear if trump s budget plan, from the office of management and budget, would reflect the full extent of trump s proposed cuts to the u. n. richard gowan, a u. n. expert at the european council on foreign relations, said the alterations would spark chaos if true. [ it would ] leave a gaping hole that other big donors would struggle to fill, he told fp, pointing to how the u. s. provided $ 1. 5 billion of the u. n. refugee agency s $ 4 billion budget last year. via : the hill [SEP] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] 
[PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD]\n", "[CLS] u. s. news and world report publishes list of top 10 most popular nations where refugees want to live more than 21, 000 people from all regions of the world participated in the best countries survey, in which they assessed how closely they associated 80 countries with specific characteristics. four of these economically stable, good job market, income equality and is a place i would live were included in the best countries to be an immigrant ranking. countries also were scored in relation to others on the share of migrants in their population ; the amount of remittances the migrants they host sent home ; and graded on a united nations assessment of integration measures provided for immigrants, such as language training and transfers of job certifications, and the rationale behind current integration policies. [ lol! not assimilation! ed ] note how important remittances are! those are dollars sent out of the host economy and i will bet a buck that any economic study that seeks to justify migration benefits to a country, never factors in how much of the migrants earnings ( or their welfare payments! ) are sent out of the country ( and thus lost to the host country s economy! ). ann corcoran refugee resettlement watch1. 
sweden ( this, coincidentally, is the country i have for a long time ranked as # 1 to fall to the islamists! ) 2. canada ( just be sure our northern border is fortified! ) 3. switzerland4. australia5. germany6. norway7. us8. netherlands9. finland10. denmarknotice that arab ( mostly muslim ) countries are not a desired destination. gee, why is thatn", "[CLS] energy department to close office of international climate and technology in response to the u. s. withdrawing from the paris climate agreement earlier this month, the energy department is shutting down the office of international climate and technology, a department that works with other countries to develop clean energy technology. an agency spokesman tried to justify the energy department shutting down the office by stating that the doe is looking for ways to consolidate the many duplicative programs that currently exist within doe, thus the office of international climate and technology is getting the chop. the 11 - person office has been in operation since 2010, operating as a means for the u. s. to work with international partners on energy sector technology in an effort to reduce greenhouse gases. the employees of the office of international climate and technology also play a large part in the clean energy ministerial, a conference for high - polluting nations to focus on making the energy sector greener. doe spokeswoman shaylyn hynes said there are numerous international offices within the energy department that could take on the work of the office of international climate and technology, however, she failed to acknowledge whether one actually would. the office of energy efficiency and renewable energy ( eere ) has an international affairs team, while the international affairs office has a renewables team, hynes said. the department is looking for ways to eliminate this kind of unnecessary duplication just like any responsible american business would. 
the closing of this particular office is most likely a direct result of trump s 2018 budget proposal, which slashes funding for both the doe and the environmental protection agency, particularly cuts for climate change initiatives and research efforts. naturally, environmentalists are horrified by the news of the international office s closure. willfully ignoring the climate crisis is recklessly and unnecessarily dangerous for families and communities across the country, and it s clear that trump will stop at nothing to completely isolate the united states and irreparably damage our reputation with the rest of the world, said john coequyt, the global climate policy director at the sierra club. ignorance is not diplomacy, and if trump were acting like a leader, he would know that. hynes responded by saying that the trump administration is not bringing an end to its clean energy efforts, drawing particular attention to energy secretary rick perry s support for carbon capture storage and nuclear energy efforts at a recent clean energy ministerial. featured image via kevin frayer / getty images [SEP] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD]\n", "[CLS] democrats go on the attack against voter id laws virginia s voter id law is being challenged in court and with the loss of a conservative majority on the supreme court, this might just be the first domino to fall, signaling the end of wide - spread voter suppression by republicans : ( reuters ) a virginia law requiring voters to show photo identification goes on trial in federal court on monday, with democratic officials claiming it is discriminatory and aimed at keeping party voters from casting ballots. defenders of the 2013 virginia law say that it is aimed at preventing voter fraud. the trial, in u. s. 
district court in richmond, virginia, is one of several voting rights legal battles in process as democrats and republicans square off ahead of november s presidential election. the democratic party of virginia and two party activists are suing the virginia state board of elections and want judge henry hudson to strike down the law. the problem for the right is that the premise behind these laws has always been a lie. republicans only became concerned with voter fraud when it became clear that their white hegemony was in immediate danger. adding to this problem was the over - the - top way republicans attacked people s ability to vote. if voter id had been a stand alone law, then republicans might have been able to maintain the fiction that it was just about voter fraud. however, they also cut voting hours, the number of polling places ( but only in democratic - leaning areas ), made the ids difficult to get and, of course, kept bragging about how it would hurt the democrats. and just to be absolutely clear that voter id laws are bullshit, here s my favorite quote from the new york times ( 4 / 12 / 2007 ) : five years after the bush administration began a crackdown on voter fraud, the justice department has turned up virtually no evidence of any organized effort to skew federal elections, according to court records and interviews. although republican activists have repeatedly said fraud is so widespread that it has corrupted the political process and, possibly, cost the party election victories, about 120 people have been charged and 86 convicted as of last year. widespread voter fraud does not exist. the only way for a court to uphold voter id is to willfully turn a blind eye to the glaring pattern of voter suppression republicans have openly laid out. if you re wondering why they were so brazen, it s because they were quite confidant that the supreme court would back their partisan attack on democracy. 
after all, they gutted the voting rights act by claiming, in all seriousness, [SEP]\n" ] } ] }, { "cell_type": "markdown", "source": [ "The model misclassifies only 4 validation samples; all four are fake news articles that it predicted as true news. It performs well on the held-out set, although that set is not very large (20% of the data). Hyperparameter tuning, particularly automated tuning with a library such as Optuna, could yield more accurate results on a larger dataset." ], "metadata": { "id": "QzeGqPGWj2sz" } }, { "cell_type": "markdown", "source": [ "**Write up**: \n", "* Link to the model on Hugging Face Hub: \n", "* Include some examples of misclassified news articles. Please explain what you might do to improve your model's performance on these news articles in the future (you do not need to implement these suggestions)" ], "metadata": { "id": "kpInVUMLyJ24" } }, { "cell_type": "markdown", "metadata": { "id": "jTfHpo6BOmE8" }, "source": [ "# 3. Deep RL / Robotics" ] }, { "cell_type": "markdown", "metadata": { "id": "saB64bbTXWgZ" }, "source": [ "**RL for Classical Control:** Using any of the [classical control](https://github.com/openai/gym/blob/master/docs/environments.md#classic-control) environments from OpenAI's `gym`, implement a deep NN that learns an optimal policy which maximizes the reward of the environment.\n", "\n", "* Describe the NN you implemented and the behavior you observe from the agent as the model converges (or diverges).\n", "* Plot the reward as a function of steps (or epochs).\n", "Compare your results to a random agent.\n", "* Discuss whether you think your model has learned the optimal policy and potential methods for improving it and/or where it might fail.\n", "* (Optional) [Upload the model to the Hugging Face Hub](https://huggingface.co/docs/hub/adding-a-model), and add a link to your model below.\n", "\n", "\n", "You may use any frameworks you like, but you must implement your NN on your own (no pre-defined/trained models 
like [`stable_baselines`](https://stable-baselines.readthedocs.io/en/master/)).\n", "\n", "You may use any simulator other than `gym` _however_:\n", "* The environment has to be similar to the classical control environments (or more complex like [`robosuite`](https://github.com/ARISE-Initiative/robosuite)).\n", "* You cannot choose a game/Atari/text based environment. The purpose of this challenge is to demonstrate an understanding of basic kinematic/dynamic systems." ] }, { "cell_type": "code", "source": [ "### WRITE YOUR CODE TO TRAIN THE MODEL HERE" ], "metadata": { "id": "CUhkTcoeynVv" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "**Write up**: \n", "* (Optional) link to the model on Hugging Face Hub: \n", "* Discuss whether you think your model has learned the optimal policy and potential methods for improving it and/or where it might fail." ], "metadata": { "id": "bWllPZhJyotg" } }, { "cell_type": "markdown", "metadata": { "id": "rbrRbrISa5J_" }, "source": [ "# 4. Theory / Linear Algebra " ] }, { "cell_type": "markdown", "metadata": { "id": "KFkLRCzTXTzL" }, "source": [ "**Implement Contrastive PCA** Read [this paper](https://www.nature.com/articles/s41467-018-04608-8) and implement contrastive PCA in Python.\n", "\n", "* First, please discuss what kind of dataset this would make sense to use this method on\n", "* Implement the method in Python (do not use previous implementations of the method if they already exist)\n", "* Then create a synthetic dataset and apply the method to the synthetic data. Compare with standard PCA.\n" ] }, { "cell_type": "markdown", "source": [ "**Write up**: Discuss what kind of dataset it would make sense to use Contrastive PCA" ], "metadata": { "id": "TpyqWl-ly0wy" } }, { "cell_type": "code", "source": [ "### WRITE YOUR CODE HERE" ], "metadata": { "id": "1CQzUSfQywRk" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "# 5. 
Systems" ], "metadata": { "id": "dlqmZS5Hy6q-" } }, { "cell_type": "markdown", "source": [ "**Inference on the edge**: Measure the inference times in various computationally-constrained settings\n", "\n", "* Pick a few different speech detection models (we suggest looking at models on the [Hugging Face Hub](https://huggingface.co/models?pipeline_tag=automatic-speech-recognition&sort=downloads))\n", "* Simulate different memory constraints and CPU allocations that are realistic for edge devices that might run such models, such as smart speakers or microcontrollers, and measure what is the average inference time of the models under these conditions \n", "* How does the inference time vary with (1) choice of model (2) available system memory (3) available CPU (4) size of input?\n", "\n", "Are there any surprising discoveries? (Note that this coding challenge is fairly open-ended, so we will be considering the amount of effort invested in discovering something interesting here)." ], "metadata": { "id": "QW_eiDFw1QKm" } }, { "cell_type": "code", "source": [ "### WRITE YOUR CODE HERE" ], "metadata": { "id": "OYp94wLP1kWJ" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "**Write up**: What surprising discoveries do you see?" ], "metadata": { "id": "yoHmutWx2jer" } } ] }