{ "cells": [ { "cell_type": "markdown", "id": "a420b5d2", "metadata": {}, "source": [ "# Öğrenme Ajansı Laboratuvarı Otomatik Deneme Puanlaması 2.0 - Learning Agency Lab Automated Essay Scoring 2.0" ] }, { "cell_type": "markdown", "id": "e91c5ef7", "metadata": {}, "source": [ "TR = Her yorum satırı kendi üstündeki koda aittir. Yorumlar önce Türkçe, sonra İngilizce yazıldı.\n", "\n", "EN = Each comment refers to the code above it. Comments are written first in Turkish, then in English.\n", "\n", "TR = Bu proje, derin öğrenme tekniklerini kullanarak gelişmiş bir otomatik makale puanlama (AES) sistemi geliştirmeyi amaçlamaktadır. Mevcut çerçeveye dayanarak, \"Learning Agency Lab AES 2.0\" öğrenci makalelerini daha fazla hassasiyet ve adaletle değerlendirme ve notlandırma yeteneğini geliştirir. Sistem makaleleri içerik, dil bilgisi, tutarlılık ve yazım tarzı açısından analiz ederek bütünsel bir değerlendirme sağlar. Sinir ağları, doğal dil işleme (NLP) ve regresyon modellerini kullanarak, bu AES sistemi eğitimciler ve öğrenciler için daha doğru, tutarlı ve etkili geri bildirim sunmak, kişiselleştirilmiş öğrenmeyi ve nesnel değerlendirmeyi desteklemek üzere tasarlanmıştır.\n", "\n", "EN = This project aims to develop an advanced automated essay scoring (AES) system using deep learning techniques. Building upon the existing framework, \"Learning Agency Lab AES 2.0\" enhances the ability to assess and grade student essays with greater precision and fairness. The system analyzes essays for content, grammar, coherence, and writing style, providing a holistic evaluation. 
By utilizing neural networks, natural language processing (NLP), and regression models, this AES system is designed to offer more accurate, consistent, and efficient feedback for educators and learners alike, supporting personalized learning and objective evaluation.\n", "\n", "Kaynak/Source = https://www.kaggle.com/competitions/learning-agency-lab-automated-essay-scoring-2" ] }, { "cell_type": "code", "execution_count": 1, "id": "17231e0f", "metadata": {}, "outputs": [], "source": [ "#pip install autocorrect" ] }, { "cell_type": "code", "execution_count": 2, "id": "34dfbaa8", "metadata": {}, "outputs": [], "source": [ "#pip install langdetect" ] }, { "cell_type": "code", "execution_count": 3, "id": "0f3cde94", "metadata": {}, "outputs": [], "source": [ "#pip install googletrans==4.0.0-rc1" ] }, { "cell_type": "code", "execution_count": 4, "id": "4afc940c", "metadata": {}, "outputs": [], "source": [ "#pip install wordcloud matplotlib" ] }, { "cell_type": "code", "execution_count": 5, "id": "a6dfba98", "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import seaborn as sns\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "import nltk\n", "import time\n", "import math\n", "import warnings\n", "warnings.filterwarnings('ignore') \n", "import re\n", "import pickle\n", "\n", "from sklearn.feature_extraction.text import CountVectorizer\n", "from autocorrect import spell\n", "from textblob import TextBlob\n", "from langdetect import detect\n", "from googletrans import Translator\n", "from PIL import Image\n", "from wordcloud import WordCloud, STOPWORDS\n", "from sklearn.feature_extraction.text import TfidfVectorizer\n", "\n", "from sklearn.model_selection import train_test_split\n", "from sklearn.preprocessing import StandardScaler ,MinMaxScaler\n", "from tensorflow.keras.models import Sequential\n", "from tensorflow.keras.layers import Dense, Flatten, Dropout, BatchNormalization, LeakyReLU\n", "from tensorflow.keras.callbacks import 
EarlyStopping\n", "from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error" ] }, { "cell_type": "code", "execution_count": 6, "id": "0d9bde94", "metadata": {}, "outputs": [], "source": [ "pd.set_option(\"display.max_columns\",None) \n", "# TR = DataFrame görüntülenirken tüm sütunların gösterilmesini sağlar (sütun sınırı kaldırılır). \n", "# EN = Shows all columns when a DataFrame is displayed (removes the column limit)." ] }, { "cell_type": "code", "execution_count": 7, "id": "830e9a44", "metadata": {}, "outputs": [], "source": [ "df=pd.read_csv('train.csv')\n", "df_test=pd.read_csv('test.csv')" ] }, { "cell_type": "markdown", "id": "2638b690", "metadata": {}, "source": [ "## Keşifsel Veri Analizi (EDA) - Exploratory Data Analysis (EDA)" ] }, { "cell_type": "code", "execution_count": 8, "id": "e27f4dd1", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " essay_id full_text score\n", "0 000d118 Many people have car where they live. The thin... 3\n", "1 000fe60 I am a scientist at NASA that is discussing th... 3\n", "2 001ab80 People always wish they had the same technolog... 4\n", "3 001bdc0 We all heard about Venus, the planet without a... 4\n", "4 002ba53 Dear, State Senator\\n\\nThis is a letter to arg... 3\n", "-----------------------------------\n", " essay_id full_text\n", "0 000d118 Many people have car where they live. The thin...\n", "1 000fe60 I am a scientist at NASA that is discussing th...\n", "2 001ab80 People always wish they had the same technolog...\n" ] } ], "source": [ "print(df.head())\n", "print('-----------------------------------')\n", "print(df_test.head())" ] }, { "cell_type": "code", "execution_count": 9, "id": "6a418b8f", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
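<div>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>essay_id</th>\n", " <th>full_text</th>\n", " <th>score</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n",
 " <tr>\n", " <th>5879</th>\n", " <td>5859154</td>\n", " <td>Dear Senator,\\n\\nWhat do you think about desca...</td>\n", " <td>2</td>\n", " </tr>\n",
 " <tr>\n", " <th>11105</th>\n", " <td>a3ad983</td>\n", " <td>The \"Evening Star\" or otherwise known as Venus...</td>\n", " <td>2</td>\n", " </tr>\n",
 " <tr>\n", " <th>12432</th>\n", " <td>b6f9938</td>\n", " <td>The coming of the future is upon us, and as th...</td>\n", " <td>4</td>\n", " </tr>\n",
 " <tr>\n", " <th>3503</th>\n", " <td>33e61e9</td>\n", " <td>Let me tell you about my life i am a young kid...</td>\n", " <td>2</td>\n", " </tr>\n",
 " <tr>\n", " <th>14302</th>\n", " <td>d2ab150</td>\n", " <td>If you were asked to join a promgram called Se...</td>\n", " <td>3</td>\n", " </tr>\n",
 " </tbody>\n", "</table>\n", "</div>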
" ], "text/plain": [ " essay_id full_text score\n", "5879 5859154 Dear Senator,\\n\\nWhat do you think about desca... 2\n", "11105 a3ad983 The \"Evening Star\" or otherwise known as Venus... 2\n", "12432 b6f9938 The coming of the future is upon us, and as th... 4\n", "3503 33e61e9 Let me tell you about my life i am a young kid... 2\n", "14302 d2ab150 If you were asked to join a promgram called Se... 3" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.sample(5)" ] }, { "cell_type": "code", "execution_count": 10, "id": "0e1c7b08", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
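<div>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>essay_id</th>\n", " <th>full_text</th>\n", " <th>score</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n",
 " <tr>\n", " <th>17302</th>\n", " <td>ffd378d</td>\n", " <td>the story \" The Challenge of Exploing Venus \" ...</td>\n", " <td>2</td>\n", " </tr>\n",
 " <tr>\n", " <th>17303</th>\n", " <td>ffddf1f</td>\n", " <td>Technology has changed a lot of ways that we l...</td>\n", " <td>4</td>\n", " </tr>\n",
 " <tr>\n", " <th>17304</th>\n", " <td>fff016d</td>\n", " <td>If you don't like sitting around all day than ...</td>\n", " <td>2</td>\n", " </tr>\n",
 " <tr>\n", " <th>17305</th>\n", " <td>fffb49b</td>\n", " <td>In \"The Challenge of Exporing Venus,\" the auth...</td>\n", " <td>1</td>\n", " </tr>\n",
 " <tr>\n", " <th>17306</th>\n", " <td>fffed3e</td>\n", " <td>Venus is worthy place to study but dangerous. ...</td>\n", " <td>2</td>\n", " </tr>\n",
 " </tbody>\n", "</table>\n", "</div>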
" ], "text/plain": [ " essay_id full_text score\n", "17302 ffd378d the story \" The Challenge of Exploing Venus \" ... 2\n", "17303 ffddf1f Technology has changed a lot of ways that we l... 4\n", "17304 fff016d If you don't like sitting around all day than ... 2\n", "17305 fffb49b In \"The Challenge of Exporing Venus,\" the auth... 1\n", "17306 fffed3e Venus is worthy place to study but dangerous. ... 2" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.tail()" ] }, { "cell_type": "code", "execution_count": 11, "id": "435d4831", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(17307, 3)\n", "-----------------------------------\n", "(3, 2)\n" ] } ], "source": [ "print(df.shape)\n", "print('-----------------------------------')\n", "print(df_test.shape)" ] }, { "cell_type": "code", "execution_count": 12, "id": "d364189c", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "RangeIndex: 17307 entries, 0 to 17306\n", "Data columns (total 3 columns):\n", " # Column Non-Null Count Dtype \n", "--- ------ -------------- ----- \n", " 0 essay_id 17307 non-null object\n", " 1 full_text 17307 non-null object\n", " 2 score 17307 non-null int64 \n", "dtypes: int64(1), object(2)\n", "memory usage: 405.8+ KB\n", "None\n", "-----------------------------------\n", "\n", "RangeIndex: 3 entries, 0 to 2\n", "Data columns (total 2 columns):\n", " # Column Non-Null Count Dtype \n", "--- ------ -------------- ----- \n", " 0 essay_id 3 non-null object\n", " 1 full_text 3 non-null object\n", "dtypes: object(2)\n", "memory usage: 180.0+ bytes\n", "None\n" ] } ], "source": [ "print(df.info())\n", "print('-----------------------------------')\n", "print(df_test.info())" ] }, { "cell_type": "code", "execution_count": 13, "id": "04e3a3a3", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "essay_id 0\n", "full_text 0\n", "score 0\n", "dtype: 
int64\n", "-----------------------------------\n", "essay_id 0\n", "full_text 0\n", "dtype: int64\n" ] } ], "source": [ "print(df.isnull().sum().sort_values(ascending=False))\n", "print('-----------------------------------')\n", "print(df_test.isnull().sum().sort_values(ascending=False))" ] }, { "cell_type": "raw", "id": "e48bb25b", "metadata": {}, "source": [] }, { "cell_type": "markdown", "id": "31396181", "metadata": {}, "source": [ "## Gereksiz Verileri Silme - Removing Unnecessary Data" ] }, { "cell_type": "code", "execution_count": 14, "id": "d2c6159e", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " full_text score\n", "0 Many people have car where they live. The thin... 3\n", "-----------------------------------\n", " full_text\n", "0 Many people have car where they live. The thin...\n" ] } ], "source": [ "df=df.drop('essay_id',axis=1)\n", "df_test=df_test.drop('essay_id',axis=1)\n", "print(df.head(1))\n", "print('-----------------------------------')\n", "print(df_test.head(1))" ] }, { "cell_type": "raw", "id": "43638708", "metadata": {}, "source": [] }, { "cell_type": "markdown", "id": "d95258c3", "metadata": {}, "source": [ "## Eksik Verileri Doldurma ve Metni Düzeltme - Filling Gaps and Correcting the Text" ] }, { "cell_type": "code", "execution_count": 15, "id": "eaf1b8a6", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Many people have car where they live. The thing they don\'t know is that when you use a car alot of thing can happen\xa0like you can get in accidet or\xa0the smoke that the car has is bad to breath\xa0on if someone is walk but in VAUBAN,Germany they dont have that proble because 70 percent of vauban\'s families do not own cars,and 57 percent sold a car to move there. 
Street parkig ,driveways and home garages are forbidden\\xa0on the outskirts of freiburd that near the French and Swiss borders. You probaly won\\'t see a car in Vauban\\'s streets because they are completely \"car free\" but\\xa0If some that lives in VAUBAN that owns a car ownership is allowed,but there are only two places that you can park a large garages at the edge of the development,where a car owner buys a space but it not cheap to buy one they sell the space for you car for $40,000 along with a home. The vauban people completed this in 2006 ,they said that this an example of a growing trend in Europe,The untile states and some where else are suburban life from auto use this is called \"smart planning\". The current efforts to drastically reduce greenhouse gas emissions from tailes the passengee cars are responsible for 12 percent of greenhouse gas emissions in Europe and up to 50 percent in some car intensive in the United States. I honeslty think that good idea that they did that is Vaudan because that makes cities denser and better for walking and in VAUBAN there are 5,500 residents within a rectangular square mile. In the artical David Gold berg said that \"All of our development since World war 2 has been centered on the cars,and that will have to change\" and i think that was very true what David Gold said because alot thing we need cars to do we can go anyway were with out cars beacuse some people are a very lazy to walk to place thats why they alot of people use car and i think that it was a good idea that that they did that in VAUBAN so people can see how we really don\\'t need car to go to place from place because we can walk from were we need to go or we can ride bycles with out the use of a car. It good that they are doing that if you thik about your help the earth in way and thats a very good thing to. 
In the United states ,the Environmental protection Agency is promoting what is called \"car reduced\"communtunties,and the legislators are starting to act,if cautiously. Maany experts expect pubic transport serving suburbs to play a much larger role in a new six years federal transportation bill to approved this year. In previous bill,80 percent of appropriations have by law gone to highways and only 20 percent to other transports. There many good reason why they should do this. '" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['full_text'][0]" ] }, { "cell_type": "code", "execution_count": 16, "id": "98f192d6", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Many people have car where they live. The thing they don\\'t know is that when you use a car alot of thing can happen\\xa0like you can get in accidet or\\xa0the smoke that the car has is bad to breath\\xa0on if someone is walk but in VAUBAN,Germany they dont have that proble because 70 percent of vauban\\'s families do not own cars,and 57 percent sold a car to move there. Street parkig ,driveways and home garages are forbidden\\xa0on the outskirts of freiburd that near the French and Swiss borders. You probaly won\\'t see a car in Vauban\\'s streets because they are completely \"car free\" but\\xa0If some that lives in VAUBAN that owns a car ownership is allowed,but there are only two places that you can park a large garages at the edge of the development,where a car owner buys a space but it not cheap to buy one they sell the space for you car for $40,000 along with a home. The vauban people completed this in 2006 ,they said that this an example of a growing trend in Europe,The untile states and some where else are suburban life from auto use this is called \"smart planning\". 
The current efforts to drastically reduce greenhouse gas emissions from tailes the passengee cars are responsible for 12 percent of greenhouse gas emissions in Europe and up to 50 percent in some car intensive in the United States. I honeslty think that good idea that they did that is Vaudan because that makes cities denser and better for walking and in VAUBAN there are 5,500 residents within a rectangular square mile. In the artical David Gold berg said that \"All of our development since World war 2 has been centered on the cars,and that will have to change\" and i think that was very true what David Gold said because alot thing we need cars to do we can go anyway were with out cars beacuse some people are a very lazy to walk to place thats why they alot of people use car and i think that it was a good idea that that they did that in VAUBAN so people can see how we really don\\'t need car to go to place from place because we can walk from were we need to go or we can ride bycles with out the use of a car. It good that they are doing that if you thik about your help the earth in way and thats a very good thing to. In the United states ,the Environmental protection Agency is promoting what is called \"car reduced\"communtunties,and the legislators are starting to act,if cautiously. Maany experts expect pubic transport serving suburbs to play a much larger role in a new six years federal transportation bill to approved this year. In previous bill,80 percent of appropriations have by law gone to highways and only 20 percent to other transports. There many good reason why they should do this. 
'" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_test['full_text'][0]" ] }, { "cell_type": "code", "execution_count": 17, "id": "0866d8b9", "metadata": {}, "outputs": [], "source": [ "def algo_text(df):\n", "\n",
 "    for col in df.columns:\n",
 "        if df[col].dtype=='object':\n",
 "            df[col] = df[col].str.lower()\n",
 "            df[col] = df[col].str.replace(r'[^\\w\\s]', '', regex=True)\n",
 "            df[col] = df[col].str.replace(r'\\n', '', regex=True)\n",
 "            df[col] = df[col].str.replace(r'\\d+', '', regex=True)\n",
 "            df[col] = df[col].str.replace(r'\\r', '', regex=True)\n",
 "            df[col] = df[col].str.replace(r'\\\\', '', regex=False)\n",
 "            df[col] = df[col].str.replace(r'.', '', regex=False)\n",
 "            df[col] = df[col].str.replace(r',', '', regex=False)\n",
 "    return df\n",
 "    # TR = Bu kod, veri tipi object olan sütunları buluyor ve onlardaki istenmeyen işaretleri kaldırıyor.\n",
 "    # EN = This code finds the columns whose data type is object and removes the unwanted characters from them.\n",
 "\n", "df=algo_text(df)" ] }, { "cell_type": "code", "execution_count": 18, "id": "67cc79ae", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'many people have car where they live the thing they dont know is that when you use a car alot of thing can happen\xa0like you can get in accidet or\xa0the smoke that the car has is bad to breath\xa0on if someone is walk but in vaubangermany they dont have that proble because percent of vaubans families do not own carsand percent sold a car to move there street parkig driveways and home garages are forbidden\xa0on the outskirts of freiburd that near the french and swiss borders you probaly wont see a car in vaubans streets because they are completely car free but\xa0if some that lives in vauban that owns a car ownership is allowedbut there are only two places that you can park a large garages at the edge of the developmentwhere a car owner buys a space but it not cheap to buy one they sell the space for you car for along with a home the vauban people 
completed this in they said that this an example of a growing trend in europethe untile states and some where else are suburban life from auto use this is called smart planning the current efforts to drastically reduce greenhouse gas emissions from tailes the passengee cars are responsible for percent of greenhouse gas emissions in europe and up to percent in some car intensive in the united states i honeslty think that good idea that they did that is vaudan because that makes cities denser and better for walking and in vauban there are residents within a rectangular square mile in the artical david gold berg said that all of our development since world war has been centered on the carsand that will have to change and i think that was very true what david gold said because alot thing we need cars to do we can go anyway were with out cars beacuse some people are a very lazy to walk to place thats why they alot of people use car and i think that it was a good idea that that they did that in vauban so people can see how we really dont need car to go to place from place because we can walk from were we need to go or we can ride bycles with out the use of a car it good that they are doing that if you thik about your help the earth in way and thats a very good thing to in the united states the environmental protection agency is promoting what is called car reducedcommuntuntiesand the legislators are starting to actif cautiously maany experts expect pubic transport serving suburbs to play a much larger role in a new six years federal transportation bill to approved this year in previous bill percent of appropriations have by law gone to highways and only percent to other transports there many good reason why they should do this '" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['full_text'][0]" ] }, { "cell_type": "code", "execution_count": 19, "id": "1dcb914a", "metadata": {}, "outputs": [], "source": [ "def 
algo_text(df_test):\n", "\n",
 "    for col in df_test.columns:\n",
 "        if df_test[col].dtype=='object':\n",
 "            df_test[col] = df_test[col].str.lower()\n",
 "            df_test[col] = df_test[col].str.replace(r'[^\\w\\s]', '', regex=True)\n",
 "            df_test[col] = df_test[col].str.replace(r'\\n', '', regex=True)\n",
 "            df_test[col] = df_test[col].str.replace(r'\\d+', '', regex=True)\n",
 "            df_test[col] = df_test[col].str.replace(r'\\r', '', regex=True)\n",
 "            df_test[col] = df_test[col].str.replace(r'\\\\', '', regex=False)\n",
 "            df_test[col] = df_test[col].str.replace(r'.', '', regex=False)\n",
 "            df_test[col] = df_test[col].str.replace(r',', '', regex=False)\n",
 "    return df_test\n",
 "    # TR = Bu kod, veri tipi object olan sütunları buluyor ve onlardaki istenmeyen işaretleri kaldırıyor.\n",
 "    # EN = This code finds the columns whose data type is object and removes the unwanted characters from them.\n",
 "\n", "df_test=algo_text(df_test)" ] }, { "cell_type": "raw", "id": "9e230725", "metadata": {}, "source": [] }, { "cell_type": "markdown", "id": "1dc8a50d", "metadata": {}, "source": [ "## Duygu Analizi - Sentiment Analysis" ] }, { "cell_type": "markdown", "id": "09780faf", "metadata": {}, "source": [ "### Yorumların Olumlu mu ya da Olumsuz mu Olduğunu Tespit Etme - Determining Whether Comments Are Positive or Negative" ] }, { "cell_type": "code", "execution_count": 20, "id": "39a27dea", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([3, 4, 2, 1, 5, 6], dtype=int64)" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['score'].unique()" ] }, { "cell_type": "code", "execution_count": 21, "id": "1b46f633", "metadata": {}, "outputs": [], "source": [ "df['sentiment']=df['score']\n", "df['sentiment']=df['sentiment'].replace([5,6],'olumlu')\n", "df['sentiment']=df['sentiment'].replace([1,2],'olumsuz')\n", "df['sentiment']=df['sentiment'].replace([3,4],'notr')\n",
 "# TR = sentiment adında yeni bir sütun oluşturup score sütunundaki verilere eşitledik. Sonra [5,6] -> 'olumlu', [1,2] -> 'olumsuz', [3,4] -> 'notr' olarak eşledik.\n",
 "# EN = We created a new column called sentiment and copied the score column into it. Then we mapped [5,6] to 'olumlu' (positive), [1,2] to 'olumsuz' (negative) and [3,4] to 'notr' (neutral)." ] }, { "cell_type": "code", "execution_count": 22, "id": "792b59e0", "metadata": {}, "outputs": [], "source": [ "df=df[['score','full_text','sentiment']]" ] }, { "cell_type": "code", "execution_count": 23, "id": "f728c724", "metadata": {}, "outputs": [], "source": [ "df=df[(df['sentiment']=='olumlu')|(df['sentiment']=='notr')|(df['sentiment']=='olumsuz')]\n",
 "# TR = Yalnızca sentiment değeri olumlu, notr ya da olumsuz olan satırları tuttuk.\n",
 "# EN = We kept only the rows whose sentiment is positive, neutral or negative." ] }, { "cell_type": "code", "execution_count": 24, "id": "ded4992a", "metadata": {}, "outputs": [], "source": [ "df.reset_index(drop=True,inplace=True)\n",
 "# TR = Yukarıdaki filtreleme sonucunda satır indeksleri ardışık olmayabilir. Bu yüzden indeksleri sıfırlayıp yeniden verdik.\n",
 "# EN = After the filtering above, the row index may no longer be consecutive, so we reset it and numbered the rows again."
] }, { "cell_type": "code", "execution_count": 25, "id": "7720767e", "metadata": {}, "outputs": [], "source": [ "x=df['full_text']\n", "y=df['sentiment']" ] }, { "cell_type": "code", "execution_count": 26, "id": "eb3fe280", "metadata": {}, "outputs": [], "source": [ "yelpbw = df[(df.score == 1) | (df.score == 2) | (df.score == 3) | (df.score == 4) | (df.score == 5) | (df.score == 6)]" ] }, { "cell_type": "code", "execution_count": 27, "id": "aed587a2", "metadata": {}, "outputs": [], "source": [ "yelpbw.reset_index(drop=True,inplace=True)" ] }, { "cell_type": "code", "execution_count": 28, "id": "3ed2fe88", "metadata": {}, "outputs": [], "source": [ "vect=CountVectorizer(stop_words='english',ngram_range=(1,2))" ] }, { "cell_type": "code", "execution_count": 29, "id": "c2318c0d", "metadata": {}, "outputs": [], "source": [ "x=yelpbw[\"full_text\"]\n", "y=yelpbw[\"score\"]" ] }, { "cell_type": "code", "execution_count": 30, "id": "02adcc31", "metadata": {}, "outputs": [], "source": [ "vect=CountVectorizer()\n", "x=vect.fit_transform(x)" ] }, { "cell_type": "raw", "id": "58e06619", "metadata": {}, "source": [] }, { "cell_type": "markdown", "id": "c5110c7a", "metadata": {}, "source": [ "## Verileri Görselleştirme - Visualizing the Data" ] }, { "cell_type": "code", "execution_count": 31, "id": "9ad9a32c", "metadata": {}, "outputs": [ { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "sns.countplot(data=df, x='score',palette='bright');\n", "# TR = Kategorik verilerin her bir sınıfındaki gözlem sayısını görselleştirmek için kullanılır ve her kategorinin frekansını çubuklarla gösterir.\n", "# EN = It is used to visualize the number of observations in each class of categorical data and shows the frequency of each category with bars.\n", "\n", "# TR = (data=df) Veriyi df adlı DataFrame alacak.\n", "# TR = (x='score') column değişkenindeki sütunları alıp yatay eksenine eşitleyecek\n", "\n", "# EN = (data=df) Will take the data from the DataFrame named df.\n", "# EN = (x='score') Will take the columns in the column variable and assign them to the x-axi" ] }, { "cell_type": "code", "execution_count": null, "id": "bf27b1c6", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "e81a35ab", "metadata": {}, "source": [ "## En çok Sayıdaki Kelimeleri Kap İçine Alma - Containing the Most Numbered Words " ] }, { "cell_type": "code", "execution_count": 32, "id": "8dc1ec87", "metadata": {}, "outputs": [], "source": [ "#wc=wordcloud\n", "def wc(data,bgcolor):\n", " plt.figure(figsize=(10,10))\n", " # TR = Kabımızın boyutunu belirttik.\n", " # EN = We specified the size of our container.\n", " \n", " mask=np.array(Image.open('cloud.png'))\n", " # TR = Image.open ile resmimizi açtık. np.array resmi diziye çevirdik ve mask değişkenine atadık.\n", " # EN = We opened our image with Image.open. We converted the np.array image to an array and assigned it to the mask variable.\n", " \n", " wc=WordCloud(background_color=bgcolor,stopwords=STOPWORDS,mask=mask)\n", " # TR = Bir WordCloud tanımladık. Arka plan rengini bgcolor eşitledik. stopwords=STOPWORDS ile gereksiz kelimeleri atıp anahtar kelimeleri sakladık.\n", " # EN = We defined a WordCloud. We set the background color equal to bgcolor. 
We removed unnecessary words and kept keywords with stopwords=STOPWORDS\n", "\n", " # TR = mask=mask yukarıda tanımladığımız mask değişkenini kullan.\n", " # EN = mask=mask use the mask variable we defined above.\n", " \n", " wc.generate(''.join(data))\n", " # TR = .join(data) ile bütün sütündaki text alıp birleştirecek. \n", " # EN = With .join(data) it will take the text in all columns and combine them.\n", "\n", " # TR = İçinde geçen tüm kelimeleri sayacak ve hafızada tutup generate ile tanımladığımız WordCloud oluşturduk ona eşitleyecek. \n", " # EN = It will count all the words in it, keep it in memory and synchronize it with the WordCloud we created with generate.\n", " \n", " plt.imshow(wc)\n", " plt.axis('off')\n", " # TR = Bunla kod ile x ve y gözükmüyor.\n", " # EN = With this code, x and y do not appear." ] }, { "cell_type": "code", "execution_count": 33, "id": "8e9edd42", "metadata": {}, "outputs": [], "source": [ "olumlu = df[df['score'].isin([5, 6])]['full_text']\n", "nötr = df[df['score'].isin([3, 4])]['full_text']\n", "olumsuz = df[df['score'].isin([1, 2])]['full_text']\n", "# TR = 5 ya da 6 score olanları olumlu, 3 ya da 4 score olan nötr ve 1 ya da 2 score olanları olumsuz değişkenine eşitledik.\n", "# EN = We equated those with 5 or 6 score to positive, those with 3 or 4 score to neutral, and those with 1 or 2 score to negative." 
] }, { "cell_type": "code", "execution_count": 34, "id": "59deb041", "metadata": {}, "outputs": [ { "ename": "FileNotFoundError", "evalue": "[Errno 2] No such file or directory: 'C:\\\\Users\\\\ErenK\\\\OneDrive\\\\Belgeler\\\\Yapay Zeka\\\\Proje\\\\Natural Language Processing (NLP) 1\\\\laboratuvar\\\\cloud.png'", "output_type": "error", "traceback": [ "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[1;31mFileNotFoundError\u001b[0m Traceback (most recent call last)", "Cell \u001b[1;32mIn[34], line 1\u001b[0m\n\u001b[1;32m----> 1\u001b[0m wc(olumlu,\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mwhite\u001b[39m\u001b[38;5;124m'\u001b[39m)\n", "Cell \u001b[1;32mIn[32], line 7\u001b[0m, in \u001b[0;36mwc\u001b[1;34m(data, bgcolor)\u001b[0m\n\u001b[0;32m 3\u001b[0m plt\u001b[38;5;241m.\u001b[39mfigure(figsize\u001b[38;5;241m=\u001b[39m(\u001b[38;5;241m10\u001b[39m,\u001b[38;5;241m10\u001b[39m))\n\u001b[0;32m 4\u001b[0m \u001b[38;5;66;03m# TR = Kabımızın boyutunu belirttik.\u001b[39;00m\n\u001b[0;32m 5\u001b[0m \u001b[38;5;66;03m# EN = We specified the size of our container.\u001b[39;00m\n\u001b[1;32m----> 7\u001b[0m mask\u001b[38;5;241m=\u001b[39mnp\u001b[38;5;241m.\u001b[39marray(Image\u001b[38;5;241m.\u001b[39mopen(\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mcloud.png\u001b[39m\u001b[38;5;124m'\u001b[39m))\n\u001b[0;32m 8\u001b[0m \u001b[38;5;66;03m# TR = Image.open ile resmimizi açtık. np.array resmi diziye çevirdik ve mask değişkenine atadık.\u001b[39;00m\n\u001b[0;32m 9\u001b[0m \u001b[38;5;66;03m# EN = We opened our image with Image.open. 
We converted the np.array image to an array and assigned it to the mask variable.\u001b[39;00m\n\u001b[0;32m 11\u001b[0m wc\u001b[38;5;241m=\u001b[39mWordCloud(background_color\u001b[38;5;241m=\u001b[39mbgcolor,stopwords\u001b[38;5;241m=\u001b[39mSTOPWORDS,mask\u001b[38;5;241m=\u001b[39mmask)\n", "File \u001b[1;32m~\\anaconda3\\Lib\\site-packages\\PIL\\Image.py:3277\u001b[0m, in \u001b[0;36mopen\u001b[1;34m(fp, mode, formats)\u001b[0m\n\u001b[0;32m 3274\u001b[0m filename \u001b[38;5;241m=\u001b[39m os\u001b[38;5;241m.\u001b[39mpath\u001b[38;5;241m.\u001b[39mrealpath(os\u001b[38;5;241m.\u001b[39mfspath(fp))\n\u001b[0;32m 3276\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m filename:\n\u001b[1;32m-> 3277\u001b[0m fp \u001b[38;5;241m=\u001b[39m builtins\u001b[38;5;241m.\u001b[39mopen(filename, \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mrb\u001b[39m\u001b[38;5;124m\"\u001b[39m)\n\u001b[0;32m 3278\u001b[0m exclusive_fp \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mTrue\u001b[39;00m\n\u001b[0;32m 3280\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n", "\u001b[1;31mFileNotFoundError\u001b[0m: [Errno 2] No such file or directory: 'C:\\\\Users\\\\ErenK\\\\OneDrive\\\\Belgeler\\\\Yapay Zeka\\\\Proje\\\\Natural Language Processing (NLP) 1\\\\laboratuvar\\\\cloud.png'" ] }, { "data": { "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "wc(olumlu,'white')" ] }, { "cell_type": "code", "execution_count": null, "id": "0083f2f8", "metadata": {}, "outputs": [], "source": [ "wc(nötr,'white')" ] }, { "cell_type": "code", "execution_count": null, "id": "2782ea94", "metadata": {}, "outputs": [], "source": [ "wc(olumsuz,'white')" ] }, { "cell_type": "code", "execution_count": null, "id": "0219a1f1", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "3510eadb", "metadata": {}, "source": [ "## Öznitelik Mühendisliği - Feature Engineering" ] }, { "cell_type": "markdown", "id": "f3bfe2a0", "metadata": {}, "source": [ "### Model - Modelling" ] }, { "cell_type": "code", "execution_count": 40, "id": "c1956cce", "metadata": {}, "outputs": [], "source": [ "x = df['full_text']\n", "y = df['score'].values" ] }, { "cell_type": "code", "execution_count": 41, "id": "28aa7c6e", "metadata": {}, "outputs": [], "source": [ "vectorizer = TfidfVectorizer(max_features=1000)\n", "vectorizer.fit(x)\n", "x = vectorizer.transform(x)\n", "x_test = vectorizer.transform(df_test['full_text'])" ] }, { "cell_type": "code", "execution_count": 42, "id": "ffed7293", "metadata": {}, "outputs": [], "source": [ "x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=.20,random_state=42)\n", "# TR = modelimizi eğittik. \n", "# EN = We trained our model." ] }, { "cell_type": "code", "execution_count": 44, "id": "92336248", "metadata": {}, "outputs": [], "source": [ "x = x.toarray() # Convert sparse matrix to dense\n", "x_test = x_test.toarray()" ] }, { "cell_type": "code", "execution_count": 39, "id": "9ca2d123-2dbc-44ba-b234-1106b4ab65f9", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[0. , 0.0229158 , 0.03214887, ..., 0. , 0. ,\n", " 0. ],\n", " [0. , 0. , 0.02156446, ..., 0.02787769, 0. ,\n", " 0. ],\n", " [0. , 0. , 0. , ..., 0. , 0. ,\n", " 0. ],\n", " ...,\n", " [0. , 0. , 0.02142796, ..., 0.02770123, 0. ,\n", " 0. 
],\n", " [0. , 0. , 0. , ..., 0. , 0. ,\n", " 0. ],\n", " [0. , 0.18880245, 0.04414556, ..., 0. , 0.10770003,\n", " 0. ]])" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x_test" ] }, { "cell_type": "code", "execution_count": 45, "id": "ab4dc089", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Epoch 1/100\n", "\u001b[1m433/433\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m7s\u001b[0m 6ms/step - loss: 7.2721 - mae: 2.2782 - val_loss: 0.9428 - val_mae: 0.7806\n", "Epoch 2/100\n", "\u001b[1m433/433\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 7ms/step - loss: 1.4098 - mae: 0.9450 - val_loss: 0.6211 - val_mae: 0.6367\n", "Epoch 3/100\n", "\u001b[1m433/433\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m2s\u001b[0m 5ms/step - loss: 1.0417 - mae: 0.8096 - val_loss: 0.5800 - val_mae: 0.6160\n", "Epoch 4/100\n", "\u001b[1m433/433\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m2s\u001b[0m 5ms/step - loss: 0.8533 - mae: 0.7347 - val_loss: 0.5536 - val_mae: 0.5986\n", "Epoch 5/100\n", "\u001b[1m433/433\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m2s\u001b[0m 4ms/step - loss: 0.7583 - mae: 0.6895 - val_loss: 0.5348 - val_mae: 0.5896\n", "Epoch 6/100\n", "\u001b[1m433/433\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m2s\u001b[0m 4ms/step - loss: 0.6790 - mae: 0.6529 - val_loss: 0.5181 - val_mae: 0.5802\n", "Epoch 7/100\n", "\u001b[1m433/433\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m2s\u001b[0m 5ms/step - loss: 0.6534 - mae: 0.6433 - val_loss: 0.5078 - val_mae: 0.5724\n", "Epoch 8/100\n", "\u001b[1m433/433\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 6ms/step - loss: 0.6189 - mae: 0.6245 - val_loss: 0.5110 - val_mae: 
0.5717\n", "Epoch 9/100\n", "\u001b[1m433/433\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m2s\u001b[0m 5ms/step - loss: 0.5741 - mae: 0.6003 - val_loss: 0.4986 - val_mae: 0.5653\n", "Epoch 10/100\n", "\u001b[1m433/433\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m2s\u001b[0m 5ms/step - loss: 0.5656 - mae: 0.5960 - val_loss: 0.5026 - val_mae: 0.5663\n", "Epoch 11/100\n", "\u001b[1m433/433\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 6ms/step - loss: 0.5473 - mae: 0.5845 - val_loss: 0.4964 - val_mae: 0.5610\n", "Epoch 12/100\n", "\u001b[1m433/433\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 6ms/step - loss: 0.5346 - mae: 0.5784 - val_loss: 0.5081 - val_mae: 0.5695\n", "Epoch 13/100\n", "\u001b[1m433/433\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 3ms/step - loss: 0.5184 - mae: 0.5717 - val_loss: 0.5176 - val_mae: 0.5781\n", "Epoch 14/100\n", "\u001b[1m433/433\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 3ms/step - loss: 0.5036 - mae: 0.5631 - val_loss: 0.5118 - val_mae: 0.5734\n", "Epoch 15/100\n", "\u001b[1m433/433\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m2s\u001b[0m 4ms/step - loss: 0.4888 - mae: 0.5567 - val_loss: 0.4971 - val_mae: 0.5618\n", "Epoch 16/100\n", "\u001b[1m433/433\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m2s\u001b[0m 5ms/step - loss: 0.4895 - mae: 0.5557 - val_loss: 0.5141 - val_mae: 0.5711\n", "Epoch 17/100\n", "\u001b[1m433/433\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m2s\u001b[0m 4ms/step - loss: 0.4646 - mae: 0.5386 - val_loss: 0.5111 - val_mae: 0.5715\n", "Epoch 18/100\n", "\u001b[1m433/433\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 3ms/step - loss: 0.4443 - 
mae: 0.5291 - val_loss: 0.5233 - val_mae: 0.5754\n", "Epoch 19/100\n", "\u001b[1m433/433\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m2s\u001b[0m 6ms/step - loss: 0.4356 - mae: 0.5183 - val_loss: 0.5177 - val_mae: 0.5753\n", "Epoch 20/100\n", "\u001b[1m433/433\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m9s\u001b[0m 15ms/step - loss: 0.4206 - mae: 0.5113 - val_loss: 0.5128 - val_mae: 0.5702\n", "Epoch 21/100\n", "\u001b[1m433/433\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m2s\u001b[0m 5ms/step - loss: 0.3988 - mae: 0.5011 - val_loss: 0.5229 - val_mae: 0.5807\n", "Epoch 22/100\n", "\u001b[1m433/433\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m2s\u001b[0m 4ms/step - loss: 0.3920 - mae: 0.4946 - val_loss: 0.5093 - val_mae: 0.5683\n", "Epoch 23/100\n", "\u001b[1m433/433\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m2s\u001b[0m 5ms/step - loss: 0.3941 - mae: 0.4915 - val_loss: 0.5084 - val_mae: 0.5691\n", "Epoch 24/100\n", "\u001b[1m433/433\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m10s\u001b[0m 21ms/step - loss: 0.3832 - mae: 0.4887 - val_loss: 0.5160 - val_mae: 0.5727\n", "Epoch 25/100\n", "\u001b[1m433/433\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m6s\u001b[0m 13ms/step - loss: 0.3688 - mae: 0.4800 - val_loss: 0.5474 - val_mae: 0.5928\n", "Epoch 26/100\n", "\u001b[1m433/433\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m5s\u001b[0m 11ms/step - loss: 0.3582 - mae: 0.4729 - val_loss: 0.5224 - val_mae: 0.5793\n" ] } ], "source": [ "model=Sequential()\n", "model.add(Dense(128,activation='relu',input_dim=x_train.shape[1]))\n", "# TR = Bu katman, tüm giriş nöronlarına bağlantı kurar ve her nöronun ağırlıklarını öğrenir. 128 nöron var.\n", "# Aktivasyon fonksiyonunu ReLU (Rectified Linear Unit) olarak ayarlar. 
ReLU fonksiyonu, negatif değerleri sıfıra dönüştürür ve pozitif değerleri olduğu gibi bırakır.\n", "# EN = This fully connected layer links every input feature to each of its 128 neurons and learns their weights.\n", "# It uses the ReLU (Rectified Linear Unit) activation, which maps negative values to zero and leaves positive values unchanged.\n", "\n", "model.add(BatchNormalization())\n", "# TR = Bu katman, modelin eğitim sürecini daha stabil hale getirmek için kullanılır.\n", "# EN = This layer normalizes the previous layer's activations to make training more stable.\n", "\n", "model.add(Dropout(0.5))\n", "# TR = Derin öğrenme modelinde aşırı uyumu (overfitting) azaltmak için kullanılır. Genelde 0.2 ile 0.5 arasında olur.\n", "# EN = Dropout reduces overfitting in the deep learning model. The rate is typically between 0.2 and 0.5.\n", "\n", "model.add(Dense(64,activation='relu'))\n", "model.add(BatchNormalization())\n", "model.add(Dropout(0.2))\n", "model.add(Dense(32,activation='relu'))\n", "model.add(BatchNormalization())\n", "model.add(Dropout(0.2))\n", "\n", "model.add(Dense(1, activation='linear'))\n", "# TR = Regresyon görevleri için lineer aktivasyon kullanıyoruz.\n", "# EN = We use linear activation for regression tasks. 
\n", "\n", "model.compile(loss='mse', optimizer='adam', metrics=['mae']) \n", "# TR = Modelin kayıp fonksiyonu olarak 'mse' (ortalama kare hatası), optimizer olarak 'adam' ve performans metriği olarak 'mae' (ortalama mutlak hata) kullanılarak derlenmesini sağlar \n", "# EN = Compiles the model using 'mse' (mean squared error) as the loss function, 'adam' as the optimizer, and 'mae' (mean absolute error) as the performance metric\n", "\n", "early_stopping = EarlyStopping(monitor='val_loss', patience=15, restore_best_weights=True)\n", "# TR = EarlyStopping: Eğitim sırasında model performansı iyileşmediğinde erken durması için kullanıyoruz.\n", "# EN = EarlyStopping: We use it to stop training early when model performance stops improving.\n", "\n", "# TR = val_loss 15 epoch boyunca iyileşmezse eğitimi durduruyor ve en iyi ağırlıkları geri yüklüyor.\n", "# EN = If val_loss does not improve for 15 epochs, it stops training and restores the best weights.\n", "\n", "history=model.fit(x_train, y_train, validation_data=(x_test, y_test), batch_size=32, epochs=100, callbacks=[early_stopping])\n", "# TR = Modeli en fazla 100 epoch boyunca eğitiyoruz, fakat EarlyStopping eğitimi daha erken durdurabilir. Batch boyutu 32 olarak belirlenmiş.\n", "# EN = We train the model for up to 100 epochs, but EarlyStopping may stop it sooner. Batch size is set to 32." ] }, { "cell_type": "code", "execution_count": 46, "id": "2698a26b", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
Model: \"sequential\"\n",
       "
\n" ], "text/plain": [ "\u001b[1mModel: \"sequential\"\u001b[0m\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓\n",
       "┃ Layer (type)                          Output Shape                         Param # ┃\n",
       "┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩\n",
       "│ dense (Dense)                        │ (None, 128)                 │         128,128 │\n",
       "├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤\n",
       "│ batch_normalization                  │ (None, 128)                 │             512 │\n",
       "│ (BatchNormalization)                 │                             │                 │\n",
       "├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤\n",
       "│ dropout (Dropout)                    │ (None, 128)                 │               0 │\n",
       "├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤\n",
       "│ dense_1 (Dense)                      │ (None, 64)                  │           8,256 │\n",
       "├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤\n",
       "│ batch_normalization_1                │ (None, 64)                  │             256 │\n",
       "│ (BatchNormalization)                 │                             │                 │\n",
       "├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤\n",
       "│ dropout_1 (Dropout)                  │ (None, 64)                  │               0 │\n",
       "├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤\n",
       "│ dense_2 (Dense)                      │ (None, 32)                  │           2,080 │\n",
       "├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤\n",
       "│ batch_normalization_2                │ (None, 32)                  │             128 │\n",
       "│ (BatchNormalization)                 │                             │                 │\n",
       "├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤\n",
       "│ dropout_2 (Dropout)                  │ (None, 32)                  │               0 │\n",
       "├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤\n",
       "│ dense_3 (Dense)                      │ (None, 1)                   │              33 │\n",
       "└──────────────────────────────────────┴─────────────────────────────┴─────────────────┘\n",
       "
\n" ], "text/plain": [ "┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓\n", "┃\u001b[1m \u001b[0m\u001b[1mLayer (type) \u001b[0m\u001b[1m \u001b[0m┃\u001b[1m \u001b[0m\u001b[1mOutput Shape \u001b[0m\u001b[1m \u001b[0m┃\u001b[1m \u001b[0m\u001b[1m Param #\u001b[0m\u001b[1m \u001b[0m┃\n", "┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩\n", "│ dense (\u001b[38;5;33mDense\u001b[0m) │ (\u001b[38;5;45mNone\u001b[0m, \u001b[38;5;34m128\u001b[0m) │ \u001b[38;5;34m128,128\u001b[0m │\n", "├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤\n", "│ batch_normalization │ (\u001b[38;5;45mNone\u001b[0m, \u001b[38;5;34m128\u001b[0m) │ \u001b[38;5;34m512\u001b[0m │\n", "│ (\u001b[38;5;33mBatchNormalization\u001b[0m) │ │ │\n", "├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤\n", "│ dropout (\u001b[38;5;33mDropout\u001b[0m) │ (\u001b[38;5;45mNone\u001b[0m, \u001b[38;5;34m128\u001b[0m) │ \u001b[38;5;34m0\u001b[0m │\n", "├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤\n", "│ dense_1 (\u001b[38;5;33mDense\u001b[0m) │ (\u001b[38;5;45mNone\u001b[0m, \u001b[38;5;34m64\u001b[0m) │ \u001b[38;5;34m8,256\u001b[0m │\n", "├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤\n", "│ batch_normalization_1 │ (\u001b[38;5;45mNone\u001b[0m, \u001b[38;5;34m64\u001b[0m) │ \u001b[38;5;34m256\u001b[0m │\n", "│ (\u001b[38;5;33mBatchNormalization\u001b[0m) │ │ │\n", "├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤\n", "│ dropout_1 (\u001b[38;5;33mDropout\u001b[0m) │ (\u001b[38;5;45mNone\u001b[0m, \u001b[38;5;34m64\u001b[0m) │ \u001b[38;5;34m0\u001b[0m │\n", "├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤\n", "│ dense_2 (\u001b[38;5;33mDense\u001b[0m) │ (\u001b[38;5;45mNone\u001b[0m, 
\u001b[38;5;34m32\u001b[0m) │ \u001b[38;5;34m2,080\u001b[0m │\n", "├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤\n", "│ batch_normalization_2 │ (\u001b[38;5;45mNone\u001b[0m, \u001b[38;5;34m32\u001b[0m) │ \u001b[38;5;34m128\u001b[0m │\n", "│ (\u001b[38;5;33mBatchNormalization\u001b[0m) │ │ │\n", "├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤\n", "│ dropout_2 (\u001b[38;5;33mDropout\u001b[0m) │ (\u001b[38;5;45mNone\u001b[0m, \u001b[38;5;34m32\u001b[0m) │ \u001b[38;5;34m0\u001b[0m │\n", "├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤\n", "│ dense_3 (\u001b[38;5;33mDense\u001b[0m) │ (\u001b[38;5;45mNone\u001b[0m, \u001b[38;5;34m1\u001b[0m) │ \u001b[38;5;34m33\u001b[0m │\n", "└──────────────────────────────────────┴─────────────────────────────┴─────────────────┘\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
 Total params: 417,285 (1.59 MB)\n",
       "
\n" ], "text/plain": [ "\u001b[1m Total params: \u001b[0m\u001b[38;5;34m417,285\u001b[0m (1.59 MB)\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
 Trainable params: 138,945 (542.75 KB)\n",
       "
\n" ], "text/plain": [ "\u001b[1m Trainable params: \u001b[0m\u001b[38;5;34m138,945\u001b[0m (542.75 KB)\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
 Non-trainable params: 448 (1.75 KB)\n",
       "
\n" ], "text/plain": [ "\u001b[1m Non-trainable params: \u001b[0m\u001b[38;5;34m448\u001b[0m (1.75 KB)\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
 Optimizer params: 277,892 (1.06 MB)\n",
       "
\n" ], "text/plain": [ "\u001b[1m Optimizer params: \u001b[0m\u001b[38;5;34m277,892\u001b[0m (1.06 MB)\n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "model.summary()" ] }, { "cell_type": "code", "execution_count": 43, "id": "89b83035", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[1m109/109\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 656us/step - loss: 0.5064 - mae: 0.5738\n", "Test MAE: 0.5679\n" ] } ], "source": [ "test_loss, test_mae = model.evaluate(x_test, y_test)\n", "# TR = test_loss değişkeni, test verileri üzerinde hesaplanan kayıp (MSE) değerini içerir. test_mae değişkeni, test verileri üzerinde hesaplanan ortalama mutlak hatayı (MAE) içerir; bu bir doğruluk değeri değildir.\n", "# EN = The test_loss variable contains the loss (MSE) calculated on the test data. The test_mae variable contains the mean absolute error (MAE) calculated on the test data; it is not an accuracy value.\n", "\n", "print(f\"Test MAE: {test_mae:.4f}\")" ] }, { "cell_type": "code", "execution_count": 44, "id": "ab86c012", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[1m109/109\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 2ms/step\n" ] } ], "source": [ "pred=model.predict(x_test)\n", "# TR = Modelin predict metodunu x_test üzerinde çalıştırdık ve tahminleri pred değişkenine atadık. \n", "# EN = We ran the model's predict method on x_test and stored the predictions in pred." ] }, { "cell_type": "code", "execution_count": 45, "id": "a7d76d61", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.5477825403213501" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "r2_score(y_test,pred) \n", "# TR = R² skoru: tahminlerin (pred), gerçek (y_test) değerlerdeki varyansın ne kadarını açıkladığını ölçer. 
\n", "# EN = R² score: measures how much of the variance in the actual (y_test) values is explained by the predictions (pred)." ] }, { "cell_type": "code", "execution_count": 46, "id": "22688e54", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.7063537801134147" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mean_squared_error(y_test,pred)**.5 \n", "# TR = Burada kök ortalama kare hatasını (RMSE) hesapladık: gerçek (y_test) ve tahmin edilen (pred) değerler arasındaki ortalama kare hatasını bulup **.5 ile karekökünü aldık.\n", "# EN = Here we computed the Root Mean Square Error (RMSE): the mean squared error between the actual (y_test) and predicted (pred) values, with the square root taken via **.5." ] }, { "cell_type": "code", "execution_count": 48, "id": "b460b159", "metadata": {}, "outputs": [], "source": [ "loss_f=pd.DataFrame(history.history)" ] }, { "cell_type": "code", "execution_count": 49, "id": "1b964eab", "metadata": {}, "outputs": [ { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "loss_f.plot();" ] }, { "cell_type": "code", "execution_count": 47, "id": "cfc42c0b-0495-40de-bb22-a859dbb6ded6", "metadata": {}, "outputs": [], "source": [ "with open('laboratuvar_model.pkl', 'wb') as f:\n", " pickle.dump(model, f)\n", "\n", "with open('laboratuvar_vectorizer.pkl', 'wb') as f:\n", " pickle.dump(vectorizer, f)" ] } ], "metadata": { "kaggle": { "accelerator": "none", "dataSources": [ { "databundleVersionId": 8059942, "sourceId": 71485, "sourceType": "competition" } ], "dockerImageVersionId": 30762, "isGpuEnabled": false, "isInternetEnabled": true, "language": "python", "sourceType": "notebook" }, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.4" } }, "nbformat": 4, "nbformat_minor": 5 }