{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "lawNHLqffR_m"
},
"source": [
"# SCC0633/SCC5908 - Processamento de Linguagem Natural\n",
"> **Docente:** Thiago Alexandre Salgueiro Pardo \\\n",
"> **Estagiário PAE:** Germano Antonio Zani Jorge\n",
"\n",
"\n",
"# Integrantes do Grupo: GPTrouxas\n",
"> André Guarnier De Mitri - 11395579 \\\n",
"> Daniel Carvalho - 10685702 \\\n",
"> Fernando - 11795342 \\\n",
"> Lucas Henrique Sant'Anna - 10748521 \\\n",
"> Magaly L Fujimoto - 4890582"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Abordagem Estatístico\n",
"A arquitetura da solução estatística/neural envolve duas abordagens que\n",
"serão descritas neste documento. A primeira abordagem envolve utilizar\n",
"TF-IDF e Naive Bayes. E a segunda abordagem irá utilizar Word2Vec e um\n",
"modelo transformers pré-treinado da família BERT, realizando finetuning do\n",
"modelo.\n",
"\n",
"Na primeira abordagem, utilizaremos o TF-IDF, que leva em consideração a\n",
"frequência de ocorrência dos termos em um corpus e gera uma sequência de\n",
"vetores que serão fornecidos ao Naive Bayes para classificação da review como\n",
"positiva ou negativa.\n",
"\n",
"\n",
"Na segunda abordagem, utilizaremos o Word2Vec para vetorizar as reviews.\n",
"Após dividir em treino e teste, faremos o fine tuning de um modelo do tipo BERT\n",
"para o nosso problema e dataset específico. Com o BERT adaptado, faremos a\n",
"classificação de nossos textos, medindo o seu desempenho com F1 score e\n",
"acurácia.\n",
"\n",
"![alt text](../imagens/BERT_TDIDF.png)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "vfP54aryxZBg"
},
"source": [
"\n",
"## # Etapas da Abordagem Estatística\n",
"\n",
"1. **Bibliotecas**: Importamos as bibliotecas necessárias, considerando pandas para manipulação de dados, train_test_split para dividir o conjunto de dados em conjuntos de treinamento e teste, TfidfVectorizer para vetorização de texto usando TF-IDF, MultinomialNB para implementar o classificador Naive Bayes Multinomial e algumas métricas de avaliação.\n",
"\n",
"2. **Conjunto de dados**: Carregar o conjunto de dados e armazená-lo em um dataframe usando pandas.\n",
"\n",
"3. **Dividir o conjunto de dados**: Usamos `train_test_split` para dividir o DataFrame em conjuntos de treinamento e teste.\n",
"\n",
"4. **TF-IDF**: Usamos `TfidfVectorizer` para converter as revisões de texto em vetores numéricos usando a técnica TF-IDF. Em seguida, ajustamos e transformamos tanto o conjunto de treinamento quanto o conjunto de teste.\n",
"\n",
"5. **Naive Bayes**: Treinamos um classificador Naive Bayes Multinomial e usamos o modelo treinado para prever os sentimentos no conjunto de teste usando `predict`.\n",
"\n",
"6. **Avaliação e Resultados**: Salvamos os resultados em um novo dataframe `results_df` contendo as revisões do conjunto de teste, os sentimentos originais e os sentimentos previstos pelo modelo. Além disso, avaliamos o modelo verificando algumas métricas e a matriz de confusão.\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "TbLraa4UhWDJ"
},
"source": [
"\n",
"## # Baixando, Carregando os dados e Pré Processamento\n",
"\n",
"1. Transformar todos os textos em lowercase \\\\\n",
"2. Remoção de caracteres especiais \\\\\n",
"3. Remoção de stop words \\\\\n",
"4. Lematização (Lemmatization) \\\\\n",
"5. Tokenização \\\\"
]
},
{
"cell_type": "code",
"execution_count": 51,
"metadata": {
"id": "bIWmIe0qfTbE"
},
"outputs": [],
"source": [
"import pandas as pd"
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 206
},
"id": "Wf0n2yPdAn4C",
"outputId": "37eb3c4d-40c1-41a0-9b1a-d93ed6e272f3"
},
"outputs": [
{
"data": {
"application/vnd.google.colaboratory.intrinsic+json": {
"summary": "{\n \"name\": \"db\",\n \"rows\": 50000,\n \"fields\": [\n {\n \"column\": \"review\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 49582,\n \"samples\": [\n \"\\\"Soul Plane\\\" is a horrible attempt at comedy that only should appeal people with thick skulls, bloodshot eyes and furry pawns.
The plot is not only incoherent but also non-existent, acting is mostly sub sub-par with a gang of highly moronic and dreadful characters thrown in for bad measure, jokes are often spotted miles ahead and almost never even a bit amusing. This movie lacks any structure and is full of racial stereotypes that must have seemed old even in the fifties, the only thing it really has going for it is some pretty ladies, but really, if you want that you can rent something from the \\\"Adult\\\" section. OK?
I can hardly see anything here to recommend since you'll probably have a lot a better and productive time chasing rats with a sledgehammer or inventing waterproof teabags or whatever.
2/10\",\n \"Guest from the Future tells a fascinating story of time travel, friendship, battle of good and evil -- all with a small budget, child actors, and few special effects. Something for Spielberg and Lucas to learn from. ;) A sixth-grader Kolya \\\"Nick\\\" Gerasimov finds a time machine in the basement of a decrepit building and travels 100 years into the future. He discovers a near-perfect, utopian society where robots play guitars and write poetry, everyone is kind to each other and people enjoy everything technology has to offer. Alice is the daughter of a prominent scientist who invented a device called Mielophone that allows to read minds of humans and animals. The device can be put to both good and bad use, depending on whose hands it falls into. When two evil space pirates from Saturn who want to rule the universe attempt to steal Mielophone, it falls into the hands of 20th century school boy Nick. With the pirates hot on his tracks, he travels back to his time, followed by the pirates, and Alice. Chaos, confusion and funny situations follow as the luckless pirates try to blend in with the earthlings. Alice enrolls in the same school Nick goes to and demonstrates superhuman abilities in PE class. The catch is, Alice doesn't know what Nick looks like, while the pirates do. Also, the pirates are able to change their appearance and turn literally into anyone. (Hmm, I wonder if this is where James Cameron got the idea for Terminator...) Who gets to Nick -- and Mielophone -- first? Excellent plot, non-stop adventures, and great soundtrack. I wish Hollywood made kid movies like this one...\",\n \"\\\"National Treasure\\\" (2004) is a thoroughly misguided hodge-podge of plot entanglements that borrow from nearly every cloak and dagger government conspiracy clich\\u00e9 that has ever been written. 
The film stars Nicholas Cage as Benjamin Franklin Gates (how precious is that, I ask you?); a seemingly normal fellow who, for no other reason than being of a lineage of like-minded misguided fortune hunters, decides to steal a 'national treasure' that has been hidden by the United States founding fathers. After a bit of subtext and background that plays laughably (unintentionally) like Indiana Jones meets The Patriot, the film degenerates into one misguided whimsy after another \\u0096 attempting to create a 'Stanley Goodspeed' regurgitation of Nicholas Cage and launch the whole convoluted mess forward with a series of high octane, but disconnected misadventures.
The relevancy and logic to having George Washington and his motley crew of patriots burying a king's ransom someplace on native soil, and then, going through the meticulous plan of leaving clues scattered throughout U.S. currency art work, is something that director Jon Turteltaub never quite gets around to explaining. Couldn't Washington found better usage for such wealth during the start up of the country? Hence, we are left with a mystery built on top of an enigma that is already on shaky ground by the time Ben appoints himself the new custodian of this untold wealth. Ben's intentions are noble \\u0096 if confusing. He's set on protecting the treasure. For who and when?\\u0085your guess is as good as mine.
But there are a few problems with Ben's crusade. First up, his friend, Ian Holmes (Sean Bean) decides that he can't wait for Ben to make up his mind about stealing the Declaration of Independence from the National Archives (oh, yeah \\u0096 brilliant idea!). Presumably, the back of that famous document holds the secret answer to the ultimate fortune. So Ian tries to kill Ben. The assassination attempt is, of course, unsuccessful, if overly melodramatic. It also affords Ben the opportunity to pick up, and pick on, the very sultry curator of the archives, Abigail Chase (Diane Kruger). She thinks Ben is clearly a nut \\u0096 at least at the beginning. But true to action/romance form, Abby's resolve melts quicker than you can say, \\\"is that the Hope Diamond?\\\" The film moves into full X-File-ish mode, as the FBI, mistakenly believing that Ben is behind the theft, retaliate in various benign ways that lead to a multi-layering of action sequences reminiscent of Mission Impossible meets The Fugitive. Honestly, don't those guys ever get 'intelligence' information that is correct? In the final analysis, \\\"National Treasure\\\" isn't great film making, so much as it's a patchwork rehash of tired old bits from other movies, woven together from scraps, the likes of which would make IL' Betsy Ross blush.
The Buena Vista DVD delivers a far more generous treatment than this film is deserving of. The anamorphic widescreen picture exhibits a very smooth and finely detailed image with very rich colors, natural flesh tones, solid blacks and clean whites. The stylized image is also free of blemishes and digital enhancements. The audio is 5.1 and delivers a nice sonic boom to your side and rear speakers with intensity and realism. Extras include a host of promotional junket material that is rather deep and over the top in its explanation of how and why this film was made. If only, as an audience, we had had more clarification as to why Ben and co. were chasing after an illusive treasure, this might have been one good flick. Extras conclude with the theatrical trailer, audio commentary and deleted scenes. Not for the faint-hearted \\u0096 just the thick-headed.\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"sentiment\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 2,\n \"samples\": [\n \"negative\",\n \"positive\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}",
"type": "dataframe",
"variable_name": "db"
},
"text/html": [
"\n",
"
\n",
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
review
\n",
"
sentiment
\n",
"
\n",
" \n",
" \n",
"
\n",
"
0
\n",
"
One of the other reviewers has mentioned that ...
\n",
"
positive
\n",
"
\n",
"
\n",
"
1
\n",
"
A wonderful little production. <br /><br />The...
\n",
"
positive
\n",
"
\n",
"
\n",
"
2
\n",
"
I thought this was a wonderful way to spend ti...
\n",
"
positive
\n",
"
\n",
"
\n",
"
3
\n",
"
Basically there's a family where a little boy ...
\n",
"
negative
\n",
"
\n",
"
\n",
"
4
\n",
"
Petter Mattei's \"Love in the Time of Money\" is...
\n",
"
positive
\n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"\n",
"
\n",
" \n",
"\n",
" \n",
"\n",
" \n",
"
\n",
"\n",
"\n",
"
\n",
" \n",
"\n",
"\n",
"\n",
" \n",
"
\n",
"\n",
"
\n",
"
\n"
],
"text/plain": [
" review sentiment\n",
"0 One of the other reviewers has mentioned that ... positive\n",
"1 A wonderful little production.
The... positive\n",
"2 I thought this was a wonderful way to spend ti... positive\n",
"3 Basically there's a family where a little boy ... negative\n",
"4 Petter Mattei's \"Love in the Time of Money\" is... positive"
]
},
"execution_count": 52,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"db = pd.read_csv('imdb_reviews.csv')\n",
"db.head(5)"
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "6PlfPScGMF1_",
"outputId": "2a0bd4a1-e22a-429d-82a4-5984eeab7b9d"
},
"outputs": [
{
"data": {
"text/plain": [
"sentiment\n",
"positive 25000\n",
"negative 25000\n",
"Name: count, dtype: int64"
]
},
"execution_count": 53,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"db['sentiment'].value_counts()"
]
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "Kev0EaSmMa4N",
"outputId": "eab73a61-ba36-4d72-e4f2-82236f9f2880"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Quantidade de valores faltantes para cada variável do dataset:\n",
"review 0\n",
"sentiment 0\n",
"dtype: int64\n"
]
}
],
"source": [
"valores_ausentes = db.isnull().sum(axis=0)\n",
"print('Quantidade de valores faltantes para cada variável do dataset:')\n",
"print(valores_ausentes)"
]
},
{
"cell_type": "code",
"execution_count": 55,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 276
},
"id": "1AI3rN0KMuUq",
"outputId": "7ea5c91b-362e-49eb-82a7-6e8535f0e591"
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"[nltk_data] Downloading package stopwords to /root/nltk_data...\n",
"[nltk_data] Package stopwords is already up-to-date!\n",
"[nltk_data] Downloading package wordnet to /root/nltk_data...\n",
"[nltk_data] Package wordnet is already up-to-date!\n"
]
},
{
"data": {
"application/vnd.google.colaboratory.intrinsic+json": {
"summary": "{\n \"name\": \"db\",\n \"rows\": 50000,\n \"fields\": [\n {\n \"column\": \"review\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 49574,\n \"samples\": [\n \"moving intriguing absorbing however story little choppy hard follow time although two principal actor great job seeing senn penn acting every fiber stealing every frame made memorable movie later movie revealed one role actor also showed comedic flair sweet lowdown surprisingly talented light weight used think \",\n \"gem go direct video fabulous art direction mood never miss beat truman show meet metropolis excellent cast never seen laura dern better bill macy always fabulous said david paymer meat loaf incredible film \",\n \"watched movie dismayed say least movie failed communicate audience language would put shame street loafer plot father forcing none son marry seems far fetched idea grandmother asking grand kid mess enemy would draw feeble minded attention waiting whole movie laugh laugh stupidity waste 3 hour convince movie even worth first look hope save time \"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"sentiment\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 2,\n \"samples\": [\n \"negative\",\n \"positive\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}",
"type": "dataframe",
"variable_name": "db"
},
"text/html": [
"\n",
"
\n",
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
review
\n",
"
sentiment
\n",
"
\n",
" \n",
" \n",
"
\n",
"
0
\n",
"
one reviewer mentioned watching 1 oz episode h...
\n",
"
positive
\n",
"
\n",
"
\n",
"
1
\n",
"
wonderful little production filming technique ...
\n",
"
positive
\n",
"
\n",
"
\n",
"
2
\n",
"
thought wonderful way spend time hot summer we...
\n",
"
positive
\n",
"
\n",
"
\n",
"
3
\n",
"
basically family little boy jake think zombie ...
\n",
"
negative
\n",
"
\n",
"
\n",
"
4
\n",
"
petter mattei love time money visually stunnin...
\n",
"
positive
\n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"\n",
"
\n",
" \n",
"\n",
" \n",
"\n",
" \n",
"
\n",
"\n",
"\n",
"
\n",
" \n",
"\n",
"\n",
"\n",
" \n",
"
\n",
"\n",
"
\n",
"
\n"
],
"text/plain": [
" review sentiment\n",
"0 one reviewer mentioned watching 1 oz episode h... positive\n",
"1 wonderful little production filming technique ... positive\n",
"2 thought wonderful way spend time hot summer we... positive\n",
"3 basically family little boy jake think zombie ... negative\n",
"4 petter mattei love time money visually stunnin... positive"
]
},
"execution_count": 55,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import re\n",
"import nltk\n",
"from nltk.corpus import stopwords\n",
"from nltk.stem import PorterStemmer\n",
"from nltk.stem import WordNetLemmatizer\n",
"\n",
"def lowercase_text(text):\n",
" return text.lower()\n",
"\n",
"def remove_html(text):\n",
" return re.sub(r'<[^<]+?>', '', text)\n",
"\n",
"def remove_url(text):\n",
" return re.sub(r'http[s]?://\\S+|www\\.\\S+', '', text)\n",
"\n",
"def remove_punctuations(text):\n",
" tokens_list = '!\"#$%&\\'()*+,-./:;<=>?@[\\\\]^_`{|}~'\n",
" for char in text:\n",
" if char in tokens_list:\n",
" text = text.replace(char, ' ')\n",
"\n",
" return text\n",
"\n",
"def remove_emojis(text):\n",
" emojis = re.compile(\"[\"\n",
" u\"\\U0001F600-\\U0001F64F\"\n",
" u\"\\U0001F300-\\U0001F5FF\"\n",
" u\"\\U0001F680-\\U0001F6FF\"\n",
" u\"\\U0001F1E0-\\U0001F1FF\"\n",
" u\"\\U00002500-\\U00002BEF\"\n",
" u\"\\U00002702-\\U000027B0\"\n",
" u\"\\U00002702-\\U000027B0\"\n",
" u\"\\U000024C2-\\U0001F251\"\n",
" u\"\\U0001f926-\\U0001f937\"\n",
" u\"\\U00010000-\\U0010ffff\"\n",
" u\"\\u2640-\\u2642\"\n",
" u\"\\u2600-\\u2B55\"\n",
" u\"\\u200d\"\n",
" u\"\\u23cf\"\n",
" u\"\\u23e9\"\n",
" u\"\\u231a\"\n",
" u\"\\ufe0f\"\n",
" u\"\\u3030\"\n",
" \"]+\", re.UNICODE)\n",
"\n",
" text = re.sub(emojis, '', text)\n",
" return text\n",
"\n",
"def remove_stop_words(text):\n",
" stop_words = stopwords.words('english')\n",
" new_text = ''\n",
" for word in text.split():\n",
" if word not in stop_words:\n",
" new_text += ''.join(f'{word} ')\n",
"\n",
" return new_text.strip()\n",
"\n",
"def lem_words(text):\n",
" lemma = WordNetLemmatizer()\n",
" new_text = ''\n",
" for word in text.split():\n",
" new_text += ''.join(f'{lemma.lemmatize(word)} ')\n",
"\n",
" return new_text\n",
"\n",
"def preprocess_text(text):\n",
" text = lowercase_text(text)\n",
" text = remove_html(text)\n",
" text = remove_url(text)\n",
" text = remove_punctuations(text)\n",
" text = remove_emojis(text)\n",
" text = remove_stop_words(text)\n",
" text = lem_words(text)\n",
"\n",
" return text\n",
"\n",
"nltk.download('stopwords')\n",
"nltk.download('wordnet')\n",
"db['review'] = db['review'].apply(preprocess_text)\n",
"db.head()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "QgufZpgHnPa4"
},
"source": [
"# **Conjunto de Treino e teste**"
]
},
{
"cell_type": "code",
"execution_count": 56,
"metadata": {
"id": "s0lJ6Q0tnPka"
},
"outputs": [],
"source": [
"from sklearn.model_selection import train_test_split\n",
"\n",
"X= db['review']\n",
"y= db['sentiment']\n",
"\n",
"X_train, X_test, y_train, y_test = train_test_split(X, y, test_size= 0.2, random_state= 12)"
]
},
{
"cell_type": "code",
"execution_count": 57,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "nz4erCEJuD4-",
"outputId": "88d57536-66e7-4d9b-e016-bf40183d4c45"
},
"outputs": [
{
"data": {
"text/plain": [
"35235 disagree people saying lousy horror film good ...\n",
"36936 husband wife doctor team carole nile nelson mo...\n",
"46486 like cast pretty much however story sort unfol...\n",
"27160 movie awful bad bear expend anything word avoi...\n",
"19490 purchased blood castle dvd ebay buck knowing s...\n",
" ... \n",
"36482 strange thing see film scene work rather weakl...\n",
"40177 saw cheap dvd release title entity force since...\n",
"19709 one peculiar oft used romance movie plot one s...\n",
"38555 nothing positive say meandering nonsense huffi...\n",
"14155 low moment life bewildered depressed sitting r...\n",
"Name: review, Length: 40000, dtype: object"
]
},
"execution_count": 57,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X_train"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "6LX-6e-QlioJ"
},
"source": [
"# **TD-IDF e Naive Bayes**"
]
},
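{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of how a multinomial Naive Bayes classifier reaches its decision, assuming raw token counts with add-one (Laplace) smoothing rather than the TF-IDF weights that the notebook feeds to `MultinomialNB`. The names `train_nb` and `predict_nb` and the toy data are hypothetical.\n",
"\n",
"```python\n",
"import math\n",
"from collections import Counter, defaultdict\n",
"\n",
"\n",
"def train_nb(docs, labels, alpha=1.0):\n",
"    # Count documents per class and token occurrences per class.\n",
"    class_docs = Counter(labels)\n",
"    token_counts = defaultdict(Counter)\n",
"    for doc, label in zip(docs, labels):\n",
"        token_counts[label].update(doc.split())\n",
"    vocab = {tok for counts in token_counts.values() for tok in counts}\n",
"    model = {}\n",
"    for label, n_docs in class_docs.items():\n",
"        # Smoothed denominator: class token total plus alpha per vocabulary entry.\n",
"        denom = sum(token_counts[label].values()) + alpha * len(vocab)\n",
"        model[label] = {\n",
"            'log_prior': math.log(n_docs / len(docs)),\n",
"            'log_lik': {\n",
"                t: math.log((token_counts[label][t] + alpha) / denom) for t in vocab\n",
"            },\n",
"        }\n",
"    return model\n",
"\n",
"\n",
"def predict_nb(model, doc):\n",
"    # Score each class by log prior plus summed token log likelihoods.\n",
"    # Tokens never seen in training are skipped.\n",
"    scores = {}\n",
"    for label, params in model.items():\n",
"        log_lik = params['log_lik']\n",
"        scores[label] = params['log_prior'] + sum(\n",
"            log_lik[t] for t in doc.split() if t in log_lik\n",
"        )\n",
"    return max(scores, key=scores.get)\n",
"\n",
"\n",
"train_docs = [\n",
"    'great movie loved cast',\n",
"    'wonderful great fun',\n",
"    'awful boring movie',\n",
"    'terrible awful plot',\n",
"]\n",
"train_labels = ['positive', 'positive', 'negative', 'negative']\n",
"model = train_nb(train_docs, train_labels)\n",
"```\n",
"\n",
"Here `predict_nb(model, 'great fun cast')` returns `positive`, since those tokens occur only in the positive toy reviews."
]
},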
{
"cell_type": "code",
"execution_count": 58,
"metadata": {
"id": "gscB9-obNusA"
},
"outputs": [],
"source": [
"from sklearn.metrics import confusion_matrix,classification_report\n",
"from sklearn.feature_extraction.text import TfidfVectorizer\n",
"from sklearn.preprocessing import StandardScaler as encoder\n",
"from sklearn.metrics import (\n",
" accuracy_score,\n",
" confusion_matrix,\n",
" ConfusionMatrixDisplay,\n",
" f1_score,\n",
")\n",
"\n",
"\n",
"tfidf = TfidfVectorizer()\n",
"tfidf_train = tfidf.fit_transform(X_train)\n",
"tfidf_test = tfidf.transform(X_test)\n",
"\n",
"from sklearn.naive_bayes import MultinomialNB\n",
"\n",
"naive_bayes = MultinomialNB()\n",
"\n",
"naive_bayes.fit(tfidf_train, y_train)\n",
"y_pred = naive_bayes.predict(tfidf_test)\n",
"\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 59,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 206
},
"id": "RfJ7AHMZvAb8",
"outputId": "685701e1-b1e8-47fb-9dc5-1bc04dd3894b"
},
"outputs": [
{
"data": {
"application/vnd.google.colaboratory.intrinsic+json": {
"summary": "{\n \"name\": \"results_df\",\n \"rows\": 10000,\n \"fields\": [\n {\n \"column\": \"review\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 9990,\n \"samples\": [\n \"saw lot film charles dickens christmas carol one best atmosphere exactly book actor george c scott others great unfortunately often watch film germany switzerland \",\n \"loved first season quality went little bit second season however great middle pegasus third season fairly novel original ok fourth season started going downhill fast never even began giving u explanation really starting need hell cylon plan two cylon faction point angel kara leading fleet devastated earth 1 kind past last five cylons survive reincarnation question everywhere answer nowhere come end earth 2 earth past well okay destroying fleet giving technology giving kind urban life spreading thousand people paper thinly across planet anti science anti reason anti life philosophy show seems humanity forever trapped cycle going nature romanticism decadent capitalist society inventing destructive ruin everything without vision without hope grander future humanity antithetical proper science fiction even get started angel religious claptrap worst kind ultimate disappointment whole happened happen thing related previous incarnation series earth know making new show somehow consistent old would definitive stroke genius frakkin shame 1 10 \",\n \"guest future tell fascinating story time travel friendship battle good evil small budget child actor special effect something spielberg lucas learn sixth grader kolya nick gerasimov find time machine basement decrepit building travel 100 year future discovers near perfect utopian society robot play guitar write poetry everyone kind people enjoy everything technology offer alice daughter prominent scientist invented device called mielophone allows read mind human animal device put good bad use depending whose hand fall two evil space pirate saturn want rule 
universe attempt steal mielophone fall hand 20th century school boy nick pirate hot track travel back time followed pirate alice chaos confusion funny situation follow luckless pirate try blend earthling alice enrolls school nick go demonstrates superhuman ability pe class catch alice know nick look like pirate also pirate able change appearance turn literally anyone hmm wonder james cameron got idea terminator get nick mielophone first excellent plot non stop adventure great soundtrack wish hollywood made kid movie like one \"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"original sentiment\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 2,\n \"samples\": [\n \"positive\",\n \"negative\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"predicted sentiment\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 2,\n \"samples\": [\n \"positive\",\n \"negative\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}",
"type": "dataframe",
"variable_name": "results_df"
},
"text/html": [
"\n",
"