diff --git "a/Machine_Learning_Problem_Framing_HCK_015.ipynb" "b/Machine_Learning_Problem_Framing_HCK_015.ipynb"
new file mode 100644
--- /dev/null
+++ "b/Machine_Learning_Problem_Framing_HCK_015.ipynb"
@@ -0,0 +1,5998 @@
+{
+ "nbformat": 4,
+ "nbformat_minor": 0,
+ "metadata": {
+ "colab": {
+ "provenance": []
+ },
+ "kernelspec": {
+ "name": "python3",
+ "display_name": "Python 3"
+ },
+ "language_info": {
+ "name": "python"
+ }
+ },
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "source": [
+ "# 1. Introduction\n",
+ "\n",
+ "> The introduction must state the author's identity, an overview of the dataset used, and the objective to be achieved.\n",
+ "\n",
+ "Name: Hana\n",
+ "\n",
+ "Batch: 015-HCK\n",
+ "\n",
+ "Objective: According to the FIFA 2022 report (...), there were roughly 100,000 football players in 2021. However, the dataset used here covers only about 20,000 of them. This project aims to predict FIFA 2023 player ratings so that a rating can be estimated for every professional football player, which may also surface new wonderkids. The project uses the Linear Regression algorithm, and the evaluation metric is MAE (mean absolute error)."
+ ],
+ "metadata": {
+ "id": "y5lsr3ymhTq6"
+ }
+ },
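+ {
+ "cell_type": "markdown",
+ "source": [
+ "For reference, the MAE metric named in the objective is conventionally defined as shown here. The symbols $n$, $y_i$, and $\\hat{y}_i$ are notation assumed for illustration and are not used elsewhere in this notebook.\n",
+ "\n",
+ "$$\\mathrm{MAE} = \\frac{1}{n} \\sum_{i=1}^{n} \\left| y_i - \\hat{y}_i \\right|$$\n",
+ "\n",
+ "Here $y_i$ is a player's actual rating, $\\hat{y}_i$ is the rating predicted by the model, and $n$ is the number of players evaluated. Under this definition, an MAE of 2 would mean the predictions miss the true rating by two points on average."
+ ],
+ "metadata": {}
+ },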
+ {
+ "cell_type": "markdown",
+ "source": [
+ "# 2. Import Libraries\n",
+ "\n",
+ "> The first code cell in the notebook must contain all of the libraries used in the project, and nothing else."
+ ],
+ "metadata": {
+ "id": "knpqjaesitZl"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "_ZVuHRuNYo8a"
+ },
+ "outputs": [],
+ "source": [
+ "# Import libraries\n",
+ "\n",
+ "import pandas as pd\n",
+ "import seaborn as sns\n",
+ "import matplotlib.pyplot as plt\n",
+ "import numpy as np"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "# 3. Data Loading\n",
+ "\n",
+ "> This section covers preparing the data before further exploration. Data loading can include renaming columns, checking the size of the dataset, and so on.\n",
+ "\n"
+ ],
+ "metadata": {
+ "id": "5wGZa7GFjBOE"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# Load the dataset (CSV hosted in the course repository on GitHub)\n",
+ "\n",
+ "data = pd.read_csv('https://raw.githubusercontent.com/FTDS-learning-materials/phase-1/master/w1/P1W1D1PM%20-%20Machine%20Learning%20Problem%20Framing.csv')\n",
+ "data"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 496
+ },
+ "id": "lmHY3ZcLjO_B",
+ "outputId": "8e34597a-d0d2-4bf2-cea2-f7baa1f49981"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ " Name Age Height Weight ValueEUR AttackingWorkRate \\\n",
+ "0 L. Messi 34 170 72 78000000 Medium \n",
+ "1 R. Lewandowski 32 185 81 119500000 High \n",
+ "2 Cristiano Ronaldo 36 187 83 45000000 High \n",
+ "3 K. Mbappé 22 182 73 194000000 High \n",
+ "4 J. Oblak 28 188 87 112000000 Medium \n",
+ "... ... ... ... ... ... ... \n",
+ "19255 S. Black 19 180 75 100000 Medium \n",
+ "19256 Ma Zhen 23 196 85 50000 Medium \n",
+ "19257 Yang Haoyu 20 183 77 90000 Medium \n",
+ "19258 He Siwei 20 174 69 100000 Medium \n",
+ "19259 Chen Guoliang 22 186 70 70000 Medium \n",
+ "\n",
+ " DefensiveWorkRate PaceTotal ShootingTotal PassingTotal \\\n",
+ "0 Low 85 92 91 \n",
+ "1 Medium 78 92 79 \n",
+ "2 Low 87 94 80 \n",
+ "3 Low 97 88 80 \n",
+ "4 Medium 87 92 78 \n",
+ "... ... ... ... ... \n",
+ "19255 Medium 56 27 29 \n",
+ "19256 Medium 49 47 45 \n",
+ "19257 Medium 57 26 29 \n",
+ "19258 Medium 61 25 32 \n",
+ "19259 Medium 55 27 29 \n",
+ "\n",
+ " DribblingTotal DefendingTotal PhysicalityTotal Overall \n",
+ "0 95 34 65 93 \n",
+ "1 85 44 82 92 \n",
+ "2 87 34 75 91 \n",
+ "3 92 36 77 91 \n",
+ "4 90 52 90 91 \n",
+ "... ... ... ... ... \n",
+ "19255 33 48 53 48 \n",
+ "19256 46 54 44 48 \n",
+ "19257 28 51 56 48 \n",
+ "19258 32 49 51 48 \n",
+ "19259 30 50 54 48 \n",
+ "\n",
+ "[19260 rows x 14 columns]"
+ ],
+ "application/vnd.google.colaboratory.intrinsic+json": {
+ "type": "dataframe",
+ "variable_name": "data",
+ "summary": "{\n \"name\": \"data\",\n \"rows\": 19260,\n \"fields\": [\n {\n \"column\": \"Name\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 18058,\n \"samples\": [\n \"R. Bouallak\",\n \"M. Beier\",\n \"D. Peri\\u0107\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Age\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 4,\n \"min\": 16,\n \"max\": 54,\n \"num_unique_values\": 29,\n \"samples\": [\n 42,\n 23,\n 20\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Height\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 6,\n \"min\": 155,\n \"max\": 206,\n \"num_unique_values\": 50,\n \"samples\": [\n 191,\n 161,\n 186\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Weight\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 7,\n \"min\": 49,\n \"max\": 110,\n \"num_unique_values\": 57,\n \"samples\": [\n 72,\n 70,\n 77\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"ValueEUR\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 7604532,\n \"min\": 0,\n \"max\": 194000000,\n \"num_unique_values\": 252,\n \"samples\": [\n 3700000,\n 129000000,\n 31500000\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"AttackingWorkRate\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 3,\n \"samples\": [\n \"Medium\",\n \"High\",\n \"Low\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"DefensiveWorkRate\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 3,\n \"samples\": [\n \"Low\",\n \"Medium\",\n \"High\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"PaceTotal\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 10,\n \"min\": 28,\n \"max\": 97,\n \"num_unique_values\": 70,\n \"samples\": [\n 79,\n 85,\n 32\n 
],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"ShootingTotal\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 13,\n \"min\": 18,\n \"max\": 94,\n \"num_unique_values\": 76,\n \"samples\": [\n 83,\n 58,\n 89\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"PassingTotal\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 9,\n \"min\": 25,\n \"max\": 93,\n \"num_unique_values\": 67,\n \"samples\": [\n 61,\n 89,\n 93\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"DribblingTotal\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 9,\n \"min\": 26,\n \"max\": 95,\n \"num_unique_values\": 69,\n \"samples\": [\n 68,\n 95,\n 51\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"DefendingTotal\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 16,\n \"min\": 14,\n \"max\": 91,\n \"num_unique_values\": 77,\n \"samples\": [\n 64,\n 78,\n 43\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"PhysicalityTotal\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 9,\n \"min\": 29,\n \"max\": 92,\n \"num_unique_values\": 62,\n \"samples\": [\n 43,\n 38,\n 65\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Overall\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 6,\n \"min\": 48,\n \"max\": 93,\n \"num_unique_values\": 46,\n \"samples\": [\n 54,\n 68,\n 67\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}"
+ }
+ },
+ "metadata": {},
+ "execution_count": 2
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# Keep an untouched copy of the raw dataset before modifying it\n",
+ "data_duplicate = data.copy()"
+ ],
+ "metadata": {
+ "id": "sQroZhBXjmFw"
+ },
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# Rename columns: ValueEUR -> Price, Overall -> Rating\n",
+ "\n",
+ "data.rename(columns={'ValueEUR': 'Price', 'Overall': 'Rating'}, inplace=True)\n",
+ "data"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 496
+ },
+ "id": "DP_MuhlqjWZ1",
+ "outputId": "a1582996-c9b0-488c-941d-1876485d1268"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ " Name Age Height Weight Price AttackingWorkRate \\\n",
+ "0 L. Messi 34 170 72 78000000 Medium \n",
+ "1 R. Lewandowski 32 185 81 119500000 High \n",
+ "2 Cristiano Ronaldo 36 187 83 45000000 High \n",
+ "3 K. Mbappé 22 182 73 194000000 High \n",
+ "4 J. Oblak 28 188 87 112000000 Medium \n",
+ "... ... ... ... ... ... ... \n",
+ "19255 S. Black 19 180 75 100000 Medium \n",
+ "19256 Ma Zhen 23 196 85 50000 Medium \n",
+ "19257 Yang Haoyu 20 183 77 90000 Medium \n",
+ "19258 He Siwei 20 174 69 100000 Medium \n",
+ "19259 Chen Guoliang 22 186 70 70000 Medium \n",
+ "\n",
+ " DefensiveWorkRate PaceTotal ShootingTotal PassingTotal \\\n",
+ "0 Low 85 92 91 \n",
+ "1 Medium 78 92 79 \n",
+ "2 Low 87 94 80 \n",
+ "3 Low 97 88 80 \n",
+ "4 Medium 87 92 78 \n",
+ "... ... ... ... ... \n",
+ "19255 Medium 56 27 29 \n",
+ "19256 Medium 49 47 45 \n",
+ "19257 Medium 57 26 29 \n",
+ "19258 Medium 61 25 32 \n",
+ "19259 Medium 55 27 29 \n",
+ "\n",
+ " DribblingTotal DefendingTotal PhysicalityTotal Rating \n",
+ "0 95 34 65 93 \n",
+ "1 85 44 82 92 \n",
+ "2 87 34 75 91 \n",
+ "3 92 36 77 91 \n",
+ "4 90 52 90 91 \n",
+ "... ... ... ... ... \n",
+ "19255 33 48 53 48 \n",
+ "19256 46 54 44 48 \n",
+ "19257 28 51 56 48 \n",
+ "19258 32 49 51 48 \n",
+ "19259 30 50 54 48 \n",
+ "\n",
+ "[19260 rows x 14 columns]"
+ ],
+ "application/vnd.google.colaboratory.intrinsic+json": {
+ "type": "dataframe",
+ "variable_name": "data",
+ "summary": "{\n \"name\": \"data\",\n \"rows\": 19260,\n \"fields\": [\n {\n \"column\": \"Name\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 18058,\n \"samples\": [\n \"R. Bouallak\",\n \"M. Beier\",\n \"D. Peri\\u0107\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Age\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 4,\n \"min\": 16,\n \"max\": 54,\n \"num_unique_values\": 29,\n \"samples\": [\n 42,\n 23,\n 20\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Height\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 6,\n \"min\": 155,\n \"max\": 206,\n \"num_unique_values\": 50,\n \"samples\": [\n 191,\n 161,\n 186\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Weight\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 7,\n \"min\": 49,\n \"max\": 110,\n \"num_unique_values\": 57,\n \"samples\": [\n 72,\n 70,\n 77\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Price\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 7604532,\n \"min\": 0,\n \"max\": 194000000,\n \"num_unique_values\": 252,\n \"samples\": [\n 3700000,\n 129000000,\n 31500000\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"AttackingWorkRate\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 3,\n \"samples\": [\n \"Medium\",\n \"High\",\n \"Low\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"DefensiveWorkRate\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 3,\n \"samples\": [\n \"Low\",\n \"Medium\",\n \"High\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"PaceTotal\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 10,\n \"min\": 28,\n \"max\": 97,\n \"num_unique_values\": 70,\n \"samples\": [\n 79,\n 85,\n 32\n ],\n 
\"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"ShootingTotal\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 13,\n \"min\": 18,\n \"max\": 94,\n \"num_unique_values\": 76,\n \"samples\": [\n 83,\n 58,\n 89\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"PassingTotal\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 9,\n \"min\": 25,\n \"max\": 93,\n \"num_unique_values\": 67,\n \"samples\": [\n 61,\n 89,\n 93\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"DribblingTotal\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 9,\n \"min\": 26,\n \"max\": 95,\n \"num_unique_values\": 69,\n \"samples\": [\n 68,\n 95,\n 51\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"DefendingTotal\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 16,\n \"min\": 14,\n \"max\": 91,\n \"num_unique_values\": 77,\n \"samples\": [\n 64,\n 78,\n 43\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"PhysicalityTotal\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 9,\n \"min\": 29,\n \"max\": 92,\n \"num_unique_values\": 62,\n \"samples\": [\n 43,\n 38,\n 65\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Rating\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 6,\n \"min\": 48,\n \"max\": 93,\n \"num_unique_values\": 46,\n \"samples\": [\n 54,\n 68,\n 67\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}"
+ }
+ },
+ "metadata": {},
+ "execution_count": 4
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# Check head and tail: first five rows here, last five rows in the next cell\n",
+ "\n",
+ "data.head()"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 261
+ },
+ "id": "hDTse1ELkIxv",
+ "outputId": "33da85f6-fb7d-4dbf-cf6c-b6b739507595"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ " Name Age Height Weight Price AttackingWorkRate \\\n",
+ "0 L. Messi 34 170 72 78000000 Medium \n",
+ "1 R. Lewandowski 32 185 81 119500000 High \n",
+ "2 Cristiano Ronaldo 36 187 83 45000000 High \n",
+ "3 K. Mbappé 22 182 73 194000000 High \n",
+ "4 J. Oblak 28 188 87 112000000 Medium \n",
+ "\n",
+ " DefensiveWorkRate PaceTotal ShootingTotal PassingTotal DribblingTotal \\\n",
+ "0 Low 85 92 91 95 \n",
+ "1 Medium 78 92 79 85 \n",
+ "2 Low 87 94 80 87 \n",
+ "3 Low 97 88 80 92 \n",
+ "4 Medium 87 92 78 90 \n",
+ "\n",
+ " DefendingTotal PhysicalityTotal Rating \n",
+ "0 34 65 93 \n",
+ "1 44 82 92 \n",
+ "2 34 75 91 \n",
+ "3 36 77 91 \n",
+ "4 52 90 91 "
+ ],
+ "application/vnd.google.colaboratory.intrinsic+json": {
+ "type": "dataframe",
+ "variable_name": "data",
+ "summary": "{\n \"name\": \"data\",\n \"rows\": 19260,\n \"fields\": [\n {\n \"column\": \"Name\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 18058,\n \"samples\": [\n \"R. Bouallak\",\n \"M. Beier\",\n \"D. Peri\\u0107\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Age\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 4,\n \"min\": 16,\n \"max\": 54,\n \"num_unique_values\": 29,\n \"samples\": [\n 42,\n 23,\n 20\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Height\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 6,\n \"min\": 155,\n \"max\": 206,\n \"num_unique_values\": 50,\n \"samples\": [\n 191,\n 161,\n 186\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Weight\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 7,\n \"min\": 49,\n \"max\": 110,\n \"num_unique_values\": 57,\n \"samples\": [\n 72,\n 70,\n 77\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Price\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 7604532,\n \"min\": 0,\n \"max\": 194000000,\n \"num_unique_values\": 252,\n \"samples\": [\n 3700000,\n 129000000,\n 31500000\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"AttackingWorkRate\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 3,\n \"samples\": [\n \"Medium\",\n \"High\",\n \"Low\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"DefensiveWorkRate\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 3,\n \"samples\": [\n \"Low\",\n \"Medium\",\n \"High\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"PaceTotal\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 10,\n \"min\": 28,\n \"max\": 97,\n \"num_unique_values\": 70,\n \"samples\": [\n 79,\n 85,\n 32\n ],\n 
\"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"ShootingTotal\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 13,\n \"min\": 18,\n \"max\": 94,\n \"num_unique_values\": 76,\n \"samples\": [\n 83,\n 58,\n 89\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"PassingTotal\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 9,\n \"min\": 25,\n \"max\": 93,\n \"num_unique_values\": 67,\n \"samples\": [\n 61,\n 89,\n 93\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"DribblingTotal\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 9,\n \"min\": 26,\n \"max\": 95,\n \"num_unique_values\": 69,\n \"samples\": [\n 68,\n 95,\n 51\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"DefendingTotal\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 16,\n \"min\": 14,\n \"max\": 91,\n \"num_unique_values\": 77,\n \"samples\": [\n 64,\n 78,\n 43\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"PhysicalityTotal\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 9,\n \"min\": 29,\n \"max\": 92,\n \"num_unique_values\": 62,\n \"samples\": [\n 43,\n 38,\n 65\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Rating\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 6,\n \"min\": 48,\n \"max\": 93,\n \"num_unique_values\": 46,\n \"samples\": [\n 54,\n 68,\n 67\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}"
+ }
+ },
+ "metadata": {},
+ "execution_count": 5
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "data.tail()"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 261
+ },
+ "id": "pw2njKqXkFGX",
+ "outputId": "264d455f-9640-422e-a5f6-521b20708d45"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ " Name Age Height Weight Price AttackingWorkRate \\\n",
+ "19255 S. Black 19 180 75 100000 Medium \n",
+ "19256 Ma Zhen 23 196 85 50000 Medium \n",
+ "19257 Yang Haoyu 20 183 77 90000 Medium \n",
+ "19258 He Siwei 20 174 69 100000 Medium \n",
+ "19259 Chen Guoliang 22 186 70 70000 Medium \n",
+ "\n",
+ " DefensiveWorkRate PaceTotal ShootingTotal PassingTotal \\\n",
+ "19255 Medium 56 27 29 \n",
+ "19256 Medium 49 47 45 \n",
+ "19257 Medium 57 26 29 \n",
+ "19258 Medium 61 25 32 \n",
+ "19259 Medium 55 27 29 \n",
+ "\n",
+ " DribblingTotal DefendingTotal PhysicalityTotal Rating \n",
+ "19255 33 48 53 48 \n",
+ "19256 46 54 44 48 \n",
+ "19257 28 51 56 48 \n",
+ "19258 32 49 51 48 \n",
+ "19259 30 50 54 48 "
+ ],
+ "application/vnd.google.colaboratory.intrinsic+json": {
+ "type": "dataframe",
+ "summary": "{\n \"name\": \"data\",\n \"rows\": 5,\n \"fields\": [\n {\n \"column\": \"Name\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 5,\n \"samples\": [\n \"Ma Zhen\",\n \"Chen Guoliang\",\n \"Yang Haoyu\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Age\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 1,\n \"min\": 19,\n \"max\": 23,\n \"num_unique_values\": 4,\n \"samples\": [\n 23,\n 22,\n 19\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Height\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 8,\n \"min\": 174,\n \"max\": 196,\n \"num_unique_values\": 5,\n \"samples\": [\n 196,\n 186,\n 183\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Weight\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 6,\n \"min\": 69,\n \"max\": 85,\n \"num_unique_values\": 5,\n \"samples\": [\n 85,\n 70,\n 77\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Price\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 21679,\n \"min\": 50000,\n \"max\": 100000,\n \"num_unique_values\": 4,\n \"samples\": [\n 50000,\n 70000,\n 100000\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"AttackingWorkRate\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 1,\n \"samples\": [\n \"Medium\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"DefensiveWorkRate\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 1,\n \"samples\": [\n \"Medium\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"PaceTotal\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 4,\n \"min\": 49,\n \"max\": 61,\n \"num_unique_values\": 5,\n \"samples\": [\n 49\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": 
\"ShootingTotal\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 9,\n \"min\": 25,\n \"max\": 47,\n \"num_unique_values\": 4,\n \"samples\": [\n 47\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"PassingTotal\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 6,\n \"min\": 29,\n \"max\": 45,\n \"num_unique_values\": 3,\n \"samples\": [\n 29\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"DribblingTotal\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 7,\n \"min\": 28,\n \"max\": 46,\n \"num_unique_values\": 5,\n \"samples\": [\n 46\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"DefendingTotal\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 2,\n \"min\": 48,\n \"max\": 54,\n \"num_unique_values\": 5,\n \"samples\": [\n 54\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"PhysicalityTotal\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 4,\n \"min\": 44,\n \"max\": 56,\n \"num_unique_values\": 5,\n \"samples\": [\n 44\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Rating\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0,\n \"min\": 48,\n \"max\": 48,\n \"num_unique_values\": 1,\n \"samples\": [\n 48\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}"
+ }
+ },
+ "metadata": {},
+ "execution_count": 6
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "data.info()"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "ulBl25AikfZs",
+ "outputId": "93200c90-f1a4-4540-9d5f-2b318c01d456"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "<class 'pandas.core.frame.DataFrame'>\n",
+ "RangeIndex: 19260 entries, 0 to 19259\n",
+ "Data columns (total 14 columns):\n",
+ " # Column Non-Null Count Dtype \n",
+ "--- ------ -------------- ----- \n",
+ " 0 Name 19260 non-null object\n",
+ " 1 Age 19260 non-null int64 \n",
+ " 2 Height 19260 non-null int64 \n",
+ " 3 Weight 19260 non-null int64 \n",
+ " 4 Price 19260 non-null int64 \n",
+ " 5 AttackingWorkRate 19260 non-null object\n",
+ " 6 DefensiveWorkRate 19260 non-null object\n",
+ " 7 PaceTotal 19260 non-null int64 \n",
+ " 8 ShootingTotal 19260 non-null int64 \n",
+ " 9 PassingTotal 19260 non-null int64 \n",
+ " 10 DribblingTotal 19260 non-null int64 \n",
+ " 11 DefendingTotal 19260 non-null int64 \n",
+ " 12 PhysicalityTotal 19260 non-null int64 \n",
+ " 13 Rating 19260 non-null int64 \n",
+ "dtypes: int64(11), object(3)\n",
+ "memory usage: 2.1+ MB\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "#check dataset\n",
+ "data.describe().T"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 394
+ },
+ "id": "hrA8NszNkh6L",
+ "outputId": "055236e5-54f1-4471-825d-7e8223eb0cca"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ " count mean std min 25% \\\n",
+ "Age 19260.0 2.518468e+01 4.737340e+00 16.0 21.0 \n",
+ "Height 19260.0 1.813050e+02 6.866151e+00 155.0 176.0 \n",
+ "Weight 19260.0 7.495078e+01 7.066864e+00 49.0 70.0 \n",
+ "Price 19260.0 2.857652e+06 7.604532e+06 0.0 475000.0 \n",
+ "PaceTotal 19260.0 6.791023e+01 1.065645e+01 28.0 62.0 \n",
+ "ShootingTotal 19260.0 5.353551e+01 1.381348e+01 18.0 44.0 \n",
+ "PassingTotal 19260.0 5.785332e+01 9.835494e+00 25.0 52.0 \n",
+ "DribblingTotal 19260.0 6.302871e+01 9.704853e+00 26.0 58.0 \n",
+ "DefendingTotal 19260.0 5.005810e+01 1.638880e+01 14.0 35.0 \n",
+ "PhysicalityTotal 19260.0 6.467658e+01 9.626269e+00 29.0 58.0 \n",
+ "Rating 19260.0 6.581563e+01 6.817297e+00 48.0 62.0 \n",
+ "\n",
+ " 50% 75% max \n",
+ "Age 25.0 29.0 54.0 \n",
+ "Height 181.0 186.0 206.0 \n",
+ "Weight 75.0 80.0 110.0 \n",
+ "Price 975000.0 2000000.0 194000000.0 \n",
+ "PaceTotal 68.0 75.0 97.0 \n",
+ "ShootingTotal 56.0 64.0 94.0 \n",
+ "PassingTotal 58.0 65.0 93.0 \n",
+ "DribblingTotal 64.0 69.0 95.0 \n",
+ "DefendingTotal 54.0 63.0 91.0 \n",
+ "PhysicalityTotal 66.0 72.0 92.0 \n",
+ "Rating 66.0 70.0 93.0 "
+ ],
+ "application/vnd.google.colaboratory.intrinsic+json": {
+ "type": "dataframe",
+ "summary": "{\n \"name\": \"data\",\n \"rows\": 11,\n \"fields\": [\n {\n \"column\": \"count\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0.0,\n \"min\": 19260.0,\n \"max\": 19260.0,\n \"num_unique_values\": 1,\n \"samples\": [\n 19260.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"mean\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 861593.1275479158,\n \"min\": 25.184683281412255,\n \"max\": 2857651.5549325026,\n \"num_unique_values\": 11,\n \"samples\": [\n 53.535514018691586\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"std\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 2292849.817214946,\n \"min\": 4.737340478154564,\n \"max\": 7604532.0956287095,\n \"num_unique_values\": 11,\n \"samples\": [\n 13.813476196758511\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"min\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 41.59195726448696,\n \"min\": 0.0,\n \"max\": 155.0,\n \"num_unique_values\": 11,\n \"samples\": [\n 18.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"25%\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 143198.6578001471,\n \"min\": 21.0,\n \"max\": 475000.0,\n \"num_unique_values\": 9,\n \"samples\": [\n 58.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"50%\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 293952.06575964665,\n \"min\": 25.0,\n \"max\": 975000.0,\n \"num_unique_values\": 10,\n \"samples\": [\n 54.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"75%\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 602999.3835594467,\n \"min\": 29.0,\n \"max\": 2000000.0,\n \"num_unique_values\": 11,\n \"samples\": [\n 64.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"max\",\n \"properties\": {\n 
\"dtype\": \"number\",\n \"std\": 58493169.94318504,\n \"min\": 54.0,\n \"max\": 194000000.0,\n \"num_unique_values\": 10,\n \"samples\": [\n 91.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}"
+ }
+ },
+ "metadata": {},
+ "execution_count": 8
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "- The average height of people in Asia is roughly 170-175 cm, while the summary statistics above show a mean height of 181 cm in this dataset, which suggests that it contains relatively few Asian players"
+ ],
+ "metadata": {
+ "id": "HRbLL3sQk1v1"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "# 4. Exploratory Data Analysis (EDA)\n",
+ "> This section explores the dataset above using queries, grouping, simple visualizations, and so on."
+ ],
+ "metadata": {
+ "id": "rO_HE-53lgIX"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "#create Histogram and Scatter plot\n",
+ "\n",
+ "plt.figure(figsize = (16,5))\n",
+ "plt.subplot(1,2,1)\n",
+ "sns.histplot(data['Rating'], kde = True, bins = 20)\n",
+ "plt.title('Rating Histogram')\n",
+ "\n",
+ "plt.subplot(1,2,2)\n",
+ "sns.scatterplot(x = 'Weight', y = 'Height', data = data)\n",
+ "plt.title('Height and Weight Proportion')\n",
+ "plt.show()"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 331
+ },
+ "id": "rO5N6098kqes",
+ "outputId": "641ffb0f-fbc2-4ba8-a592-228334fffcf3"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "display_data",
+ "data": {
+ "text/plain": [
+ ""
+ ],
+ "image/png": "\n"
+ },
+ "metadata": {}
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "- The histogram shows that ratings are approximately normally distributed, with a mean of around 65\n",
+ "- Player height and weight are fairly proportional: in the scatter plot, taller players generally weigh more"
+ ],
+ "metadata": {
+ "id": "yQNJ-fXUne6B"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "# 5. Feature Engineering\n",
+ "\n",
+ "> This section prepares the data for model training: splitting it into train and test sets, transforming it (normalization, encoding, etc.), and any other steps that are needed."
+ ],
+ "metadata": {
+ "id": "g3AFgZprn2_p"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "## Handling Cardinality\n",
+ "\n",
+ "To be covered on Wednesday"
+ ],
+ "metadata": {
+ "id": "XCqzlG_bn894"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "## Splitting Between Train-set and Test-set"
+ ],
+ "metadata": {
+ "id": "bYEP9M8FoaYV"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "#Split between X & y\n",
+ "\n",
+ "X = data.drop('Rating', axis = 1)\n",
+ "y = data['Rating']\n",
+ "X"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 496
+ },
+ "id": "U-la1TikojAQ",
+ "outputId": "b82ec5a5-9427-4285-d7bd-8971fbd4218f"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ " Name Age Height Weight Price AttackingWorkRate \\\n",
+ "0 L. Messi 34 170 72 78000000 Medium \n",
+ "1 R. Lewandowski 32 185 81 119500000 High \n",
+ "2 Cristiano Ronaldo 36 187 83 45000000 High \n",
+ "3 K. Mbappé 22 182 73 194000000 High \n",
+ "4 J. Oblak 28 188 87 112000000 Medium \n",
+ "... ... ... ... ... ... ... \n",
+ "19255 S. Black 19 180 75 100000 Medium \n",
+ "19256 Ma Zhen 23 196 85 50000 Medium \n",
+ "19257 Yang Haoyu 20 183 77 90000 Medium \n",
+ "19258 He Siwei 20 174 69 100000 Medium \n",
+ "19259 Chen Guoliang 22 186 70 70000 Medium \n",
+ "\n",
+ " DefensiveWorkRate PaceTotal ShootingTotal PassingTotal \\\n",
+ "0 Low 85 92 91 \n",
+ "1 Medium 78 92 79 \n",
+ "2 Low 87 94 80 \n",
+ "3 Low 97 88 80 \n",
+ "4 Medium 87 92 78 \n",
+ "... ... ... ... ... \n",
+ "19255 Medium 56 27 29 \n",
+ "19256 Medium 49 47 45 \n",
+ "19257 Medium 57 26 29 \n",
+ "19258 Medium 61 25 32 \n",
+ "19259 Medium 55 27 29 \n",
+ "\n",
+ " DribblingTotal DefendingTotal PhysicalityTotal \n",
+ "0 95 34 65 \n",
+ "1 85 44 82 \n",
+ "2 87 34 75 \n",
+ "3 92 36 77 \n",
+ "4 90 52 90 \n",
+ "... ... ... ... \n",
+ "19255 33 48 53 \n",
+ "19256 46 54 44 \n",
+ "19257 28 51 56 \n",
+ "19258 32 49 51 \n",
+ "19259 30 50 54 \n",
+ "\n",
+ "[19260 rows x 13 columns]"
+ ],
+ "application/vnd.google.colaboratory.intrinsic+json": {
+ "type": "dataframe",
+ "variable_name": "X",
+ "summary": "{\n \"name\": \"X\",\n \"rows\": 19260,\n \"fields\": [\n {\n \"column\": \"Name\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 18058,\n \"samples\": [\n \"R. Bouallak\",\n \"M. Beier\",\n \"D. Peri\\u0107\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Age\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 4,\n \"min\": 16,\n \"max\": 54,\n \"num_unique_values\": 29,\n \"samples\": [\n 42,\n 23,\n 20\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Height\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 6,\n \"min\": 155,\n \"max\": 206,\n \"num_unique_values\": 50,\n \"samples\": [\n 191,\n 161,\n 186\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Weight\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 7,\n \"min\": 49,\n \"max\": 110,\n \"num_unique_values\": 57,\n \"samples\": [\n 72,\n 70,\n 77\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Price\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 7604532,\n \"min\": 0,\n \"max\": 194000000,\n \"num_unique_values\": 252,\n \"samples\": [\n 3700000,\n 129000000,\n 31500000\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"AttackingWorkRate\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 3,\n \"samples\": [\n \"Medium\",\n \"High\",\n \"Low\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"DefensiveWorkRate\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 3,\n \"samples\": [\n \"Low\",\n \"Medium\",\n \"High\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"PaceTotal\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 10,\n \"min\": 28,\n \"max\": 97,\n \"num_unique_values\": 70,\n \"samples\": [\n 79,\n 85,\n 32\n ],\n 
\"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"ShootingTotal\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 13,\n \"min\": 18,\n \"max\": 94,\n \"num_unique_values\": 76,\n \"samples\": [\n 83,\n 58,\n 89\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"PassingTotal\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 9,\n \"min\": 25,\n \"max\": 93,\n \"num_unique_values\": 67,\n \"samples\": [\n 61,\n 89,\n 93\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"DribblingTotal\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 9,\n \"min\": 26,\n \"max\": 95,\n \"num_unique_values\": 69,\n \"samples\": [\n 68,\n 95,\n 51\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"DefendingTotal\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 16,\n \"min\": 14,\n \"max\": 91,\n \"num_unique_values\": 77,\n \"samples\": [\n 64,\n 78,\n 43\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"PhysicalityTotal\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 9,\n \"min\": 29,\n \"max\": 92,\n \"num_unique_values\": 62,\n \"samples\": [\n 43,\n 38,\n 65\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}"
+ }
+ },
+ "metadata": {},
+ "execution_count": 10
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "#Splitting between train and test\n",
+ "\n",
+ "from sklearn.model_selection import train_test_split\n",
+ "\n",
+ "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 42)\n",
+ "print('Train_size: ', X_train.shape)\n",
+ "print('Test_size: ', X_test.shape)"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "wREuIfJVpYJI",
+ "outputId": "ee066263-8295-405a-d7a9-c2dcb5eec897"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Train_size: (15408, 13)\n",
+ "Test_size: (3852, 13)\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "X_train"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 513
+ },
+ "id": "qpJk6q_psGfQ",
+ "outputId": "e3e6c462-3ed7-4d69-eafe-1ca9e9c97a19"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ " Name Age Height Weight Price AttackingWorkRate \\\n",
+ "3035 C. Robinson 26 178 75 4000000 Medium \n",
+ "3964 Samu 25 173 70 2500000 High \n",
+ "511 Ander Herrera 31 182 71 14500000 High \n",
+ "17897 L. Beckemeyer 21 182 75 250000 Medium \n",
+ "4230 I. Jakobs 21 184 75 4200000 High \n",
+ "... ... ... ... ... ... ... \n",
+ "11284 R. Meißner 21 181 78 1300000 High \n",
+ "11964 J. Leutwiler 32 196 80 300000 Medium \n",
+ "5390 Heitinho Zanon 25 187 79 1500000 Medium \n",
+ "860 E. Eze 23 178 67 16500000 Medium \n",
+ "15795 J. Rodríguez 19 179 62 500000 Medium \n",
+ "\n",
+ " DefensiveWorkRate PaceTotal ShootingTotal PassingTotal \\\n",
+ "3035 Medium 79 73 65 \n",
+ "3964 Medium 77 58 64 \n",
+ "511 High 65 72 77 \n",
+ "17897 Medium 59 57 51 \n",
+ "4230 High 86 58 58 \n",
+ "... ... ... ... ... \n",
+ "11284 Medium 68 65 43 \n",
+ "11964 Medium 63 61 62 \n",
+ "5390 Medium 61 34 47 \n",
+ "860 Medium 77 69 73 \n",
+ "15795 Medium 72 28 40 \n",
+ "\n",
+ " DribblingTotal DefendingTotal PhysicalityTotal \n",
+ "3035 75 32 61 \n",
+ "3964 77 44 56 \n",
+ "511 79 78 75 \n",
+ "17897 58 34 51 \n",
+ "4230 73 61 69 \n",
+ "... ... ... ... \n",
+ "11284 62 22 63 \n",
+ "11964 66 46 65 \n",
+ "5390 39 70 77 \n",
+ "860 81 47 68 \n",
+ "15795 40 59 60 \n",
+ "\n",
+ "[15408 rows x 13 columns]"
+ ],
+ "application/vnd.google.colaboratory.intrinsic+json": {
+ "type": "dataframe",
+ "variable_name": "X_train",
+ "summary": "{\n \"name\": \"X_train\",\n \"rows\": 15408,\n \"fields\": [\n {\n \"column\": \"Name\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 14572,\n \"samples\": [\n \"\\u00c1lex Blanco\",\n \"Kwoun Sun Tae\",\n \"Park Joo Ho\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Age\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 4,\n \"min\": 16,\n \"max\": 54,\n \"num_unique_values\": 29,\n \"samples\": [\n 42,\n 30,\n 27\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Height\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 6,\n \"min\": 155,\n \"max\": 206,\n \"num_unique_values\": 49,\n \"samples\": [\n 180,\n 156,\n 160\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Weight\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 7,\n \"min\": 49,\n \"max\": 110,\n \"num_unique_values\": 57,\n \"samples\": [\n 75,\n 65,\n 66\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Price\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 7413730,\n \"min\": 0,\n \"max\": 137500000,\n \"num_unique_values\": 241,\n \"samples\": [\n 240000,\n 325000,\n 63500000\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"AttackingWorkRate\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 3,\n \"samples\": [\n \"Medium\",\n \"High\",\n \"Low\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"DefensiveWorkRate\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 3,\n \"samples\": [\n \"Medium\",\n \"High\",\n \"Low\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"PaceTotal\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 10,\n \"min\": 28,\n \"max\": 96,\n \"num_unique_values\": 69,\n \"samples\": [\n 81,\n 79,\n 37\n 
],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"ShootingTotal\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 13,\n \"min\": 18,\n \"max\": 94,\n \"num_unique_values\": 76,\n \"samples\": [\n 42,\n 36,\n 64\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"PassingTotal\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 9,\n \"min\": 25,\n \"max\": 93,\n \"num_unique_values\": 67,\n \"samples\": [\n 81,\n 61,\n 58\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"DribblingTotal\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 9,\n \"min\": 26,\n \"max\": 91,\n \"num_unique_values\": 66,\n \"samples\": [\n 44,\n 34,\n 75\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"DefendingTotal\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 16,\n \"min\": 14,\n \"max\": 91,\n \"num_unique_values\": 77,\n \"samples\": [\n 61,\n 26,\n 65\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"PhysicalityTotal\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 9,\n \"min\": 29,\n \"max\": 92,\n \"num_unique_values\": 62,\n \"samples\": [\n 36,\n 89,\n 61\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}"
+ }
+ },
+ "metadata": {},
+ "execution_count": 12
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "## Handling Outliers\n",
+ "\n",
+ "To be covered on Wednesday"
+ ],
+ "metadata": {
+ "id": "o-acJJfPoC8L"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "## Handling Missing Values\n",
+ "\n",
+ "To be covered on Wednesday"
+ ],
+ "metadata": {
+ "id": "Go7do_7QoJkE"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "X_train.isnull().sum()"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "1aSmuk55nYMN",
+ "outputId": "45676420-afbb-4166-cf8d-e0f29171cc5d"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "Name 0\n",
+ "Age 0\n",
+ "Height 0\n",
+ "Weight 0\n",
+ "Price 0\n",
+ "AttackingWorkRate 0\n",
+ "DefensiveWorkRate 0\n",
+ "PaceTotal 0\n",
+ "ShootingTotal 0\n",
+ "PassingTotal 0\n",
+ "DribblingTotal 0\n",
+ "DefendingTotal 0\n",
+ "PhysicalityTotal 0\n",
+ "dtype: int64"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 13
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "X_test.isnull().sum()"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "Px9-yFJLmrW3",
+ "outputId": "57b21f86-07d2-44c1-8acf-17be599748b3"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "Name 0\n",
+ "Age 0\n",
+ "Height 0\n",
+ "Weight 0\n",
+ "Price 0\n",
+ "AttackingWorkRate 0\n",
+ "DefensiveWorkRate 0\n",
+ "PaceTotal 0\n",
+ "ShootingTotal 0\n",
+ "PassingTotal 0\n",
+ "DribblingTotal 0\n",
+ "DefendingTotal 0\n",
+ "PhysicalityTotal 0\n",
+ "dtype: int64"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 14
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "y_train.isnull().sum()"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "mBUWxfdwsm_o",
+ "outputId": "5de58cd4-d744-4b99-d4bc-1fb203b5316f"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "0"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 15
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "y_test.isnull().sum()"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "PQcqfq7EsqP0",
+ "outputId": "23a1ca03-34a7-4796-f22f-074627227e61"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "0"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 16
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "The data has no missing values, so feature engineering can proceed to the next step"
+ ],
+ "metadata": {
+ "id": "qI7tKMr7st-H"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "## Split Between Numeric Columns and Categorical Columns"
+ ],
+ "metadata": {
+ "id": "2h9mNYf4s5Vt"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "#get numeric and categorical column\n",
+ "\n",
+ "cat_columns = X_train.select_dtypes(include=['object']).columns.tolist()\n",
+ "num_columns = X_train.select_dtypes(include = np.number).columns.tolist()\n",
+ "\n",
+ "print('Numerical Columns: ', num_columns)\n",
+ "print('Categorical Columns: ', cat_columns)"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "psBpffmassT2",
+ "outputId": "736d5465-8d76-446e-c01b-034aa4d505d3"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Numerical Columns: ['Age', 'Height', 'Weight', 'Price', 'PaceTotal', 'ShootingTotal', 'PassingTotal', 'DribblingTotal', 'DefendingTotal', 'PhysicalityTotal']\n",
+ "Categorical Columns: ['Name', 'AttackingWorkRate', 'DefensiveWorkRate']\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "#Split numerical columns and categorical columns\n",
+ "\n",
+ "X_train_num = X_train[num_columns]\n",
+ "X_train_cat = X_train[cat_columns]\n",
+ "\n",
+ "X_test_num = X_test[num_columns]\n",
+ "X_test_cat = X_test[cat_columns]\n",
+ "\n",
+ "X_train_cat"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 423
+ },
+ "id": "xRiBv0Matoxt",
+ "outputId": "38dba6f4-2020-4167-e068-c18e976519c9"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ " Name AttackingWorkRate DefensiveWorkRate\n",
+ "3035 C. Robinson Medium Medium\n",
+ "3964 Samu High Medium\n",
+ "511 Ander Herrera High High\n",
+ "17897 L. Beckemeyer Medium Medium\n",
+ "4230 I. Jakobs High High\n",
+ "... ... ... ...\n",
+ "11284 R. Meißner High Medium\n",
+ "11964 J. Leutwiler Medium Medium\n",
+ "5390 Heitinho Zanon Medium Medium\n",
+ "860 E. Eze Medium Medium\n",
+ "15795 J. Rodríguez Medium Medium\n",
+ "\n",
+ "[15408 rows x 3 columns]"
+ ],
+ "application/vnd.google.colaboratory.intrinsic+json": {
+ "type": "dataframe",
+ "variable_name": "X_train_cat",
+ "summary": "{\n \"name\": \"X_train_cat\",\n \"rows\": 15408,\n \"fields\": [\n {\n \"column\": \"Name\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 14572,\n \"samples\": [\n \"\\u00c1lex Blanco\",\n \"Kwoun Sun Tae\",\n \"Park Joo Ho\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"AttackingWorkRate\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 3,\n \"samples\": [\n \"Medium\",\n \"High\",\n \"Low\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"DefensiveWorkRate\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 3,\n \"samples\": [\n \"Medium\",\n \"High\",\n \"Low\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}"
+ }
+ },
+ "metadata": {},
+ "execution_count": 18
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "## Feature Selection\n",
+ "\n",
+ "This topic will be covered in more depth on Wednesday."
+ ],
+ "metadata": {
+ "id": "v-ICKADJuovy"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "X_train_cat"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 423
+ },
+ "id": "8w-CBYfmuSa9",
+ "outputId": "f7ffec39-ec15-400d-9678-44e1d15ac128"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ " Name AttackingWorkRate DefensiveWorkRate\n",
+ "3035 C. Robinson Medium Medium\n",
+ "3964 Samu High Medium\n",
+ "511 Ander Herrera High High\n",
+ "17897 L. Beckemeyer Medium Medium\n",
+ "4230 I. Jakobs High High\n",
+ "... ... ... ...\n",
+ "11284 R. Meißner High Medium\n",
+ "11964 J. Leutwiler Medium Medium\n",
+ "5390 Heitinho Zanon Medium Medium\n",
+ "860 E. Eze Medium Medium\n",
+ "15795 J. Rodríguez Medium Medium\n",
+ "\n",
+ "[15408 rows x 3 columns]"
+ ],
+ "application/vnd.google.colaboratory.intrinsic+json": {
+ "type": "dataframe",
+ "variable_name": "X_train_cat",
+ "summary": "{\n \"name\": \"X_train_cat\",\n \"rows\": 15408,\n \"fields\": [\n {\n \"column\": \"Name\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 14572,\n \"samples\": [\n \"\\u00c1lex Blanco\",\n \"Kwoun Sun Tae\",\n \"Park Joo Ho\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"AttackingWorkRate\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 3,\n \"samples\": [\n \"Medium\",\n \"High\",\n \"Low\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"DefensiveWorkRate\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 3,\n \"samples\": [\n \"Medium\",\n \"High\",\n \"Low\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}"
+ }
+ },
+ "metadata": {},
+ "execution_count": 19
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Based on personal judgement, a player's name has no bearing on their rating. For example, a player named Fadhil Ronaldo has no connection to Cristiano Ronaldo beyond the shared name, so their ratings will differ."
+ ],
+ "metadata": {
+ "id": "fM-_zzwsvdhT"
+ }
+ },
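+ {
+ "cell_type": "markdown",
+ "source": [
+ "The judgement above can also be checked against the data. The Colab summary of `X_train_cat` reports 14572 unique names in 15408 rows, so `Name` behaves almost like a row identifier. The following is a minimal sketch of that check, assuming a small made-up frame (`toy`) and an illustrative helper (`unique_ratio`); neither is part of the original notebook.\n",
+ "\n",
+ "```python\n",
+ "import pandas as pd\n",
+ "\n",
+ "# Hypothetical toy data standing in for X_train_cat (not the real dataset).\n",
+ "toy = pd.DataFrame({\n",
+ "    'Name': ['A. One', 'B. Two', 'C. Three', 'D. Four'],\n",
+ "    'AttackingWorkRate': ['High', 'Medium', 'Medium', 'Low'],\n",
+ "})\n",
+ "\n",
+ "\n",
+ "def unique_ratio(frame):\n",
+ "    # Share of distinct values per column; values near 1.0 suggest an identifier.\n",
+ "    return frame.nunique() / len(frame)\n",
+ "\n",
+ "\n",
+ "ratios = unique_ratio(toy)\n",
+ "print(ratios)\n",
+ "```\n",
+ "\n",
+ "A ratio close to 1.0 suggests the column offers almost no repeatable signal for a linear model, which supports dropping it."
+ ],
+ "metadata": {}
+ },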
+ {
+ "cell_type": "code",
+ "source": [
+ "# Drop the 'Name' column; reassigning instead of using inplace=True avoids pandas' SettingWithCopyWarning\n",
+ "\n",
+ "X_train_cat = X_train_cat.drop(columns='Name')\n",
+ "X_test_cat = X_test_cat.drop(columns='Name')\n",
+ "\n",
+ "X_train_cat\n"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 617
+ },
+ "id": "euFZ5eJ0vaQJ",
+ "outputId": "d80faab5-5cfc-44cd-e897-6224ae7c738f"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ " AttackingWorkRate DefensiveWorkRate\n",
+ "3035 Medium Medium\n",
+ "3964 High Medium\n",
+ "511 High High\n",
+ "17897 Medium Medium\n",
+ "4230 High High\n",
+ "... ... ...\n",
+ "11284 High Medium\n",
+ "11964 Medium Medium\n",
+ "5390 Medium Medium\n",
+ "860 Medium Medium\n",
+ "15795 Medium Medium\n",
+ "\n",
+ "[15408 rows x 2 columns]"
+ ],
+ "application/vnd.google.colaboratory.intrinsic+json": {
+ "type": "dataframe",
+ "variable_name": "X_train_cat",
+ "summary": "{\n \"name\": \"X_train_cat\",\n \"rows\": 15408,\n \"fields\": [\n {\n \"column\": \"AttackingWorkRate\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 3,\n \"samples\": [\n \"Medium\",\n \"High\",\n \"Low\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"DefensiveWorkRate\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 3,\n \"samples\": [\n \"Medium\",\n \"High\",\n \"Low\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}"
+ }
+ },
+ "metadata": {},
+ "execution_count": 20
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "num_columns = X_train_num.columns.tolist()\n",
+ "cat_columns = X_train_cat.columns.tolist()\n",
+ "\n",
+ "print('Num Columns : ', num_columns)\n",
+ "print('Cat Columns : ', cat_columns)"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "6saaV_9Wv7v4",
+ "outputId": "c18cf602-27b2-4a49-cda2-9b3baf262b2b"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Num Columns : ['Age', 'Height', 'Weight', 'Price', 'PaceTotal', 'ShootingTotal', 'PassingTotal', 'DribblingTotal', 'DefendingTotal', 'PhysicalityTotal']\n",
+ "Cat Columns : ['AttackingWorkRate', 'DefensiveWorkRate']\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "## Feature Scaling"
+ ],
+ "metadata": {
+ "id": "HHE-J_SgwlxL"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "#Feature Scaling using MinMaxScaler\n",
+ "from sklearn.preprocessing import MinMaxScaler\n",
+ "\n",
+ "scaler = MinMaxScaler()\n",
+ "scaler.fit(X_train_num)\n",
+ "\n",
+ "X_train_num_scaled = scaler.transform(X_train_num)\n",
+ "X_test_num_scaled = scaler.transform(X_test_num)\n",
+ "\n",
+ "X_train_num_scaled"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "yzbCbZmnwn6N",
+ "outputId": "7842c55b-6bc9-430c-aac5-947c722758c8"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "array([[0.26315789, 0.45098039, 0.42622951, ..., 0.75384615, 0.23376623,\n",
+ " 0.50793651],\n",
+ " [0.23684211, 0.35294118, 0.3442623 , ..., 0.78461538, 0.38961039,\n",
+ " 0.42857143],\n",
+ " [0.39473684, 0.52941176, 0.36065574, ..., 0.81538462, 0.83116883,\n",
+ " 0.73015873],\n",
+ " ...,\n",
+ " [0.23684211, 0.62745098, 0.49180328, ..., 0.2 , 0.72727273,\n",
+ " 0.76190476],\n",
+ " [0.18421053, 0.45098039, 0.29508197, ..., 0.84615385, 0.42857143,\n",
+ " 0.61904762],\n",
+ " [0.07894737, 0.47058824, 0.21311475, ..., 0.21538462, 0.58441558,\n",
+ " 0.49206349]])"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 22
+ }
+ ]
+ },
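+ {
+ "cell_type": "markdown",
+ "source": [
+ "As a minimal sketch of what `MinMaxScaler` does with its default `feature_range=(0, 1)`: each column is mapped with `(x - min) / (max - min)`, where `min` and `max` are learned from the training data only. The arrays `train` and `test` and the name `scaler_demo` below are made-up illustrations, not values from the FIFA dataset.\n",
+ "\n",
+ "```python\n",
+ "import numpy as np\n",
+ "from sklearn.preprocessing import MinMaxScaler\n",
+ "\n",
+ "# Hypothetical single-feature data (for example, ages); not the real dataset.\n",
+ "train = np.array([[18.0], [25.0], [38.0]])\n",
+ "test = np.array([[40.0]])\n",
+ "\n",
+ "# Fit on the training data only, then reuse the same statistics for the test data.\n",
+ "scaler_demo = MinMaxScaler().fit(train)\n",
+ "scaled_train = scaler_demo.transform(train)\n",
+ "scaled_test = scaler_demo.transform(test)\n",
+ "\n",
+ "# The same transformation written out by hand.\n",
+ "col_min = train.min(axis=0)\n",
+ "col_max = train.max(axis=0)\n",
+ "manual = (train - col_min) / (col_max - col_min)\n",
+ "\n",
+ "print(scaled_train.ravel(), scaled_test.ravel())\n",
+ "```\n",
+ "\n",
+ "Because the statistics come from the training split, a test value outside the training range (40 here) is mapped outside [0, 1]. That is expected behaviour and avoids leaking test information into preprocessing."
+ ],
+ "metadata": {}
+ },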
+ {
+ "cell_type": "markdown",
+ "source": [
+ "## Feature Encoding\n",
+ "\n",
+ "Explain the reason for choosing the encoding technique."
+ ],
+ "metadata": {
+ "id": "Ur7I-vKsxzGf"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "X_train_cat"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 423
+ },
+ "id": "oJS_tXFxx4zR",
+ "outputId": "5276a6ce-249b-450d-82f0-72437221faa6"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ " AttackingWorkRate DefensiveWorkRate\n",
+ "3035 Medium Medium\n",
+ "3964 High Medium\n",
+ "511 High High\n",
+ "17897 Medium Medium\n",
+ "4230 High High\n",
+ "... ... ...\n",
+ "11284 High Medium\n",
+ "11964 Medium Medium\n",
+ "5390 Medium Medium\n",
+ "860 Medium Medium\n",
+ "15795 Medium Medium\n",
+ "\n",
+ "[15408 rows x 2 columns]"
+ ],
+ "application/vnd.google.colaboratory.intrinsic+json": {
+ "type": "dataframe",
+ "variable_name": "X_train_cat",
+ "summary": "{\n \"name\": \"X_train_cat\",\n \"rows\": 15408,\n \"fields\": [\n {\n \"column\": \"AttackingWorkRate\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 3,\n \"samples\": [\n \"Medium\",\n \"High\",\n \"Low\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"DefensiveWorkRate\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 3,\n \"samples\": [\n \"Medium\",\n \"High\",\n \"Low\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}"
+ }
+ },
+ "metadata": {},
+ "execution_count": 23
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "#Feature Encoding using Ordinal Encoder\n",
+ "\n",
+ "from sklearn.preprocessing import OrdinalEncoder\n",
+ "\n",
+ "encoder = OrdinalEncoder(categories=[['Low', 'Medium', 'High'],\n",
+ " ['Low', 'Medium', 'High']])\n",
+ "encoder.fit(X_train_cat)"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 92
+ },
+ "id": "qsIbjcLrwaRh",
+ "outputId": "5faf4606-c7cb-437c-e8a3-286d0942c1ee"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "OrdinalEncoder(categories=[['Low', 'Medium', 'High'],\n",
+ " ['Low', 'Medium', 'High']])"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 24
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "X_train_cat_encoded = encoder.transform(X_train_cat)\n",
+ "X_test_cat_encoded = encoder.transform(X_test_cat)\n",
+ "\n",
+ "X_train_cat_encoded"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "kR1vfyTSy3ZO",
+ "outputId": "e90e1976-3383-4dfa-d737-6e1932fda6f4"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "array([[1., 1.],\n",
+ " [2., 1.],\n",
+ " [2., 2.],\n",
+ " ...,\n",
+ " [1., 1.],\n",
+ " [1., 1.],\n",
+ " [1., 1.]])"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 25
+ }
+ ]
+ },
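+ {
+ "cell_type": "markdown",
+ "source": [
+ "One possible justification for the encoder choice, offered here as an assumption rather than the author's stated reason: work rates have a natural order (Low < Medium < High), so an ordinal encoding keeps that order in a single numeric column. The sketch below uses made-up rows (`toy_cat`) and an illustrative encoder (`encoder_demo`) to show the mapping that an explicit `categories` list produces.\n",
+ "\n",
+ "```python\n",
+ "import pandas as pd\n",
+ "from sklearn.preprocessing import OrdinalEncoder\n",
+ "\n",
+ "# Explicit order: index 0 = Low, 1 = Medium, 2 = High.\n",
+ "levels = ['Low', 'Medium', 'High']\n",
+ "\n",
+ "# Hypothetical rows standing in for X_train_cat (not the real dataset).\n",
+ "toy_cat = pd.DataFrame({\n",
+ "    'AttackingWorkRate': ['Medium', 'High', 'Low'],\n",
+ "    'DefensiveWorkRate': ['Medium', 'Medium', 'High'],\n",
+ "})\n",
+ "\n",
+ "# One category list per column, in column order.\n",
+ "encoder_demo = OrdinalEncoder(categories=[levels, levels])\n",
+ "encoded = encoder_demo.fit_transform(toy_cat)\n",
+ "print(encoded)\n",
+ "```\n",
+ "\n",
+ "Without the explicit `categories` argument the encoder would sort labels alphabetically (High, Low, Medium), which would not reflect the intended order."
+ ],
+ "metadata": {}
+ },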
+ {
+ "cell_type": "markdown",
+ "source": [
+ "## Concatenate Numeric and Categorical Columns"
+ ],
+ "metadata": {
+ "id": "xNoVXJYozb_h"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# Concatenate columns\n",
+ "\n",
+ "X_train_final = np.concatenate([X_train_num_scaled, X_train_cat_encoded], axis = 1)\n",
+ "X_test_final = np.concatenate([X_test_num_scaled, X_test_cat_encoded], axis = 1)\n",
+ "\n",
+ "X_train_final.shape"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "G4Gc6qlSzg1q",
+ "outputId": "77a930d5-6460-48b4-bdf6-7069ca4e7c89"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "(15408, 12)"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 26
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "X_train_final_datframe = pd.DataFrame(X_train_final, columns = num_columns + cat_columns)\n",
+ "X_train_final_datframe"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 443
+ },
+ "id": "yiXSdbrdzEOJ",
+ "outputId": "39b6775c-2f41-4b49-f94d-a49823b67734"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ " Age Height Weight Price PaceTotal ShootingTotal \\\n",
+ "0 0.263158 0.450980 0.426230 0.029091 0.750000 0.723684 \n",
+ "1 0.236842 0.352941 0.344262 0.018182 0.720588 0.526316 \n",
+ "2 0.394737 0.529412 0.360656 0.105455 0.544118 0.710526 \n",
+ "3 0.131579 0.529412 0.426230 0.001818 0.455882 0.513158 \n",
+ "4 0.131579 0.568627 0.426230 0.030545 0.852941 0.526316 \n",
+ "... ... ... ... ... ... ... \n",
+ "15403 0.131579 0.509804 0.475410 0.009455 0.588235 0.618421 \n",
+ "15404 0.421053 0.803922 0.508197 0.002182 0.514706 0.565789 \n",
+ "15405 0.236842 0.627451 0.491803 0.010909 0.485294 0.210526 \n",
+ "15406 0.184211 0.450980 0.295082 0.120000 0.720588 0.671053 \n",
+ "15407 0.078947 0.470588 0.213115 0.003636 0.647059 0.131579 \n",
+ "\n",
+ " PassingTotal DribblingTotal DefendingTotal PhysicalityTotal \\\n",
+ "0 0.588235 0.753846 0.233766 0.507937 \n",
+ "1 0.573529 0.784615 0.389610 0.428571 \n",
+ "2 0.764706 0.815385 0.831169 0.730159 \n",
+ "3 0.382353 0.492308 0.259740 0.349206 \n",
+ "4 0.485294 0.723077 0.610390 0.634921 \n",
+ "... ... ... ... ... \n",
+ "15403 0.264706 0.553846 0.103896 0.539683 \n",
+ "15404 0.544118 0.615385 0.415584 0.571429 \n",
+ "15405 0.323529 0.200000 0.727273 0.761905 \n",
+ "15406 0.705882 0.846154 0.428571 0.619048 \n",
+ "15407 0.220588 0.215385 0.584416 0.492063 \n",
+ "\n",
+ " AttackingWorkRate DefensiveWorkRate \n",
+ "0 1.0 1.0 \n",
+ "1 2.0 1.0 \n",
+ "2 2.0 2.0 \n",
+ "3 1.0 1.0 \n",
+ "4 2.0 2.0 \n",
+ "... ... ... \n",
+ "15403 2.0 1.0 \n",
+ "15404 1.0 1.0 \n",
+ "15405 1.0 1.0 \n",
+ "15406 1.0 1.0 \n",
+ "15407 1.0 1.0 \n",
+ "\n",
+ "[15408 rows x 12 columns]"
+ ],
+ "application/vnd.google.colaboratory.intrinsic+json": {
+ "type": "dataframe",
+ "variable_name": "X_train_final_datframe",
+ "summary": "{\n \"name\": \"X_train_final_datframe\",\n \"rows\": 15408,\n \"fields\": [\n {\n \"column\": [\n \"Age\"\n ],\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0.1250756424264932,\n \"min\": 0.0,\n \"max\": 1.0,\n \"num_unique_values\": 29,\n \"samples\": [\n 0.6842105263157894,\n 0.368421052631579,\n 0.2894736842105263\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": [\n \"Height\"\n ],\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0.13511683958682044,\n \"min\": 0.0,\n \"max\": 1.0,\n \"num_unique_values\": 49,\n \"samples\": [\n 0.4901960784313726,\n 0.019607843137254832,\n 0.0980392156862746\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": [\n \"Weight\"\n ],\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0.11630874378986696,\n \"min\": 0.0,\n \"max\": 1.0,\n \"num_unique_values\": 57,\n \"samples\": [\n 0.42622950819672134,\n 0.2622950819672132,\n 0.278688524590164\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": [\n \"Price\"\n ],\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0.053918041617480414,\n \"min\": 0.0,\n \"max\": 1.0,\n \"num_unique_values\": 241,\n \"samples\": [\n 0.0017454545454545455,\n 0.0023636363636363638,\n 0.46181818181818185\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": [\n \"PaceTotal\"\n ],\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0.15643674960778606,\n \"min\": 0.0,\n \"max\": 0.9999999999999999,\n \"num_unique_values\": 69,\n \"samples\": [\n 0.7794117647058824,\n 0.7500000000000001,\n 0.13235294117647056\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": [\n \"ShootingTotal\"\n ],\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0.18170763986167868,\n \"min\": 0.0,\n \"max\": 0.9999999999999999,\n \"num_unique_values\": 76,\n \"samples\": [\n 0.3157894736842105,\n 0.23684210526315788,\n 
0.6052631578947368\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": [\n \"PassingTotal\"\n ],\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0.144647930074823,\n \"min\": 0.0,\n \"max\": 1.0,\n \"num_unique_values\": 67,\n \"samples\": [\n 0.8235294117647058,\n 0.5294117647058825,\n 0.4852941176470588\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": [\n \"DribblingTotal\"\n ],\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0.14960200478003224,\n \"min\": 0.0,\n \"max\": 1.0,\n \"num_unique_values\": 66,\n \"samples\": [\n 0.27692307692307694,\n 0.12307692307692308,\n 0.7538461538461539\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": [\n \"DefendingTotal\"\n ],\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0.2124390247731555,\n \"min\": 0.0,\n \"max\": 1.0,\n \"num_unique_values\": 77,\n \"samples\": [\n 0.6103896103896105,\n 0.15584415584415584,\n 0.6623376623376624\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": [\n \"PhysicalityTotal\"\n ],\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0.1529038081244891,\n \"min\": 0.0,\n \"max\": 1.0,\n \"num_unique_values\": 62,\n \"samples\": [\n 0.1111111111111111,\n 0.9523809523809523,\n 0.5079365079365079\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": [\n \"AttackingWorkRate\"\n ],\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0.5309693277316405,\n \"min\": 0.0,\n \"max\": 2.0,\n \"num_unique_values\": 3,\n \"samples\": [\n 1.0,\n 2.0,\n 0.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": [\n \"DefensiveWorkRate\"\n ],\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0.5053118673079292,\n \"min\": 0.0,\n \"max\": 2.0,\n \"num_unique_values\": 3,\n \"samples\": [\n 1.0,\n 2.0,\n 0.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}"
+ }
+ },
+ "metadata": {},
+ "execution_count": 27
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "# 6. Model Definition\n",
+ "\n",
+ ">This section contains the cells that define the model. Explain why the algorithm/model was chosen, which hyperparameters are used, which metrics are used, and anything else related to the model."
+ ],
+ "metadata": {
+ "id": "BshWmfA30T5D"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# Define algorithm\n",
+ "\n",
+ "from sklearn.linear_model import LinearRegression\n",
+ "\n",
+ "model_lin_reg = LinearRegression()"
+ ],
+ "metadata": {
+ "id": "Iu5abiwA0NO_"
+ },
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "# 7. Model Training\n",
+ "\n",
+ "> Cells in this section contain only the code for training the model and the resulting output. Run training several times with different hyperparameters to compare the results. Analyze and discuss these results in the Model Evaluation section."
+ ],
+ "metadata": {
+ "id": "A-tb4KUY0n8M"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "#Train the model\n",
+ "\n",
+ "model_lin_reg.fit(X_train_final, y_train)\n"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 75
+ },
+ "id": "K7sYKDBB0waG",
+ "outputId": "895627e1-2928-4af4-b9c6-acf9e323942f"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "LinearRegression()"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 29
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "# 8. Model Evaluation\n",
+ "\n",
+ ">This section evaluates the model and must show how it performs on the chosen metrics. This must be supported by visualizations of performance trends and/or model error rates. Analyze the model's results and write up the findings."
+ ],
+ "metadata": {
+ "id": "5bM6ohpx126V"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "y_pred_train = model_lin_reg.predict(X_train_final)\n",
+ "y_pred_test = model_lin_reg.predict(X_test_final)\n",
+ "\n",
+ "y_pred_train"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "KWPueXjF1odE",
+ "outputId": "16f5e2b2-6b54-478e-86ff-d1264127551e"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "array([68.73327461, 67.88839569, 79.77402457, ..., 61.58921355,\n",
+ " 75.46784852, 55.39850764])"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 30
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "#Model Evaluation using MAE\n",
+ "\n",
+ "from sklearn.metrics import mean_absolute_error\n",
+ "\n",
+ "print('Error - train set: ', mean_absolute_error(y_train, y_pred_train))\n",
+ "print('Error - test set: ', mean_absolute_error(y_test, y_pred_test))"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "j5FBuVyB2Y23",
+ "outputId": "b9f2ae68-25ba-45dd-b6bf-2b8859ca8e0b"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Error - train set: 2.3455452086421467\n",
+ "Error - test set: 2.341414564002488\n"
+ ]
+ }
+ ]
+ },
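+ {
+ "cell_type": "markdown",
+ "source": [
+ "The MAE values above (about 2.35 on the train set and 2.34 on the test set) mean predictions are off by roughly 2.3 rating points on average, and the two sets agree closely. As a minimal sketch of what the metric computes, the example below compares a hand-written mean of absolute errors with `mean_absolute_error`; the arrays `y_true_demo` and `y_pred_demo` are made-up numbers, not notebook results.\n",
+ "\n",
+ "```python\n",
+ "import numpy as np\n",
+ "from sklearn.metrics import mean_absolute_error\n",
+ "\n",
+ "# Hypothetical ratings and predictions, for illustration only.\n",
+ "y_true_demo = np.array([70.0, 65.0, 80.0])\n",
+ "y_pred_demo = np.array([72.0, 64.0, 77.0])\n",
+ "\n",
+ "# MAE is the average of the absolute differences: (2 + 1 + 3) / 3.\n",
+ "mae_manual = np.mean(np.abs(y_true_demo - y_pred_demo))\n",
+ "mae_sklearn = mean_absolute_error(y_true_demo, y_pred_demo)\n",
+ "\n",
+ "print(mae_manual, mae_sklearn)\n",
+ "```\n",
+ "\n",
+ "Because MAE is expressed in the same unit as the target, it can be read directly as an average error in rating points."
+ ],
+ "metadata": {}
+ },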
+ {
+ "cell_type": "markdown",
+ "source": [
+ "# 9. Model Saving\n",
+ "\n",
+ "> This section saves the model and any other files produced while building it."
+ ],
+ "metadata": {
+ "id": "xK6aSy_E4i9M"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "import pickle\n",
+ "import json\n",
+ "\n",
+ "with open('list_num_cols.txt', 'w') as file_1:\n",
+ " json.dump(num_columns, file_1)\n",
+ "\n",
+ "with open('list_cat_cols.txt', 'w') as file_2:\n",
+ " json.dump(cat_columns, file_2)\n",
+ "\n",
+ "with open('scaler.pkl', 'wb') as file_3:\n",
+ " pickle.dump(scaler, file_3)\n",
+ "\n",
+ "with open('encoder.pkl', 'wb') as file_4:\n",
+ " pickle.dump(encoder, file_4)\n",
+ "\n",
+ "with open('model_lin_reg.pkl', 'wb') as file_5:\n",
+ " pickle.dump(model_lin_reg, file_5)"
+ ],
+ "metadata": {
+ "id": "QuQk0TnK3P0h"
+ },
+ "execution_count": null,
+ "outputs": []
+ },
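+ {
+ "cell_type": "markdown",
+ "source": [
+ "The saved artifacts are only useful if they can be reloaded and applied in the same order as during training: scale the numeric columns, encode the categorical columns, concatenate, then predict. The following is a minimal round-trip sketch under stated assumptions: the data, the temporary directory, and every `*_demo` name are hypothetical, and the file names only mirror the pattern used above.\n",
+ "\n",
+ "```python\n",
+ "import os\n",
+ "import pickle\n",
+ "import tempfile\n",
+ "\n",
+ "import numpy as np\n",
+ "import pandas as pd\n",
+ "from sklearn.linear_model import LinearRegression\n",
+ "from sklearn.preprocessing import MinMaxScaler, OrdinalEncoder\n",
+ "\n",
+ "# Hypothetical miniature dataset (not the real FIFA data).\n",
+ "levels = ['Low', 'Medium', 'High']\n",
+ "num = pd.DataFrame({'Age': [20.0, 25.0, 30.0, 35.0]})\n",
+ "cat = pd.DataFrame({'AttackingWorkRate': ['Low', 'Medium', 'High', 'Medium']})\n",
+ "y = np.array([60.0, 65.0, 72.0, 70.0])\n",
+ "\n",
+ "# Fit the three objects in the same order as the notebook.\n",
+ "scaler_demo = MinMaxScaler().fit(num)\n",
+ "encoder_demo = OrdinalEncoder(categories=[levels]).fit(cat)\n",
+ "X = np.concatenate([scaler_demo.transform(num), encoder_demo.transform(cat)], axis=1)\n",
+ "model_demo = LinearRegression().fit(X, y)\n",
+ "\n",
+ "# Save each object to a temporary directory.\n",
+ "tmp_dir = tempfile.mkdtemp()\n",
+ "artifacts = {'scaler.pkl': scaler_demo, 'encoder.pkl': encoder_demo, 'model.pkl': model_demo}\n",
+ "for name, obj in artifacts.items():\n",
+ "    with open(os.path.join(tmp_dir, name), 'wb') as fh:\n",
+ "        pickle.dump(obj, fh)\n",
+ "\n",
+ "# Reload everything, as an inference script would.\n",
+ "loaded = {}\n",
+ "for name in artifacts:\n",
+ "    with open(os.path.join(tmp_dir, name), 'rb') as fh:\n",
+ "        loaded[name] = pickle.load(fh)\n",
+ "\n",
+ "# Apply the reloaded preprocessing, then predict with the reloaded model.\n",
+ "X_new = np.concatenate([\n",
+ "    loaded['scaler.pkl'].transform(num),\n",
+ "    loaded['encoder.pkl'].transform(cat),\n",
+ "], axis=1)\n",
+ "pred_original = model_demo.predict(X)\n",
+ "pred_reloaded = loaded['model.pkl'].predict(X_new)\n",
+ "```\n",
+ "\n",
+ "Matching predictions before and after the round trip are a quick sanity check that the scaler, encoder, and model were all saved correctly. Pickle files should only be loaded from trusted sources."
+ ],
+ "metadata": {}
+ },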
+ {
+ "cell_type": "markdown",
+ "source": [
+ "# Conclusion\n",
+ "\n",
+ ">This final section must contain conclusions that relate the results obtained to the objective stated in the introduction.\n",
+ "\n",
+ "1. Narrative based on EDA\n",
+ "2. Narrative based on Model Evaluation and Analysis\n",
+ "3. Further Improvement\n",
+ "4. Etc."
+ ],
+ "metadata": {
+ "id": "8G4lz5_-5_4R"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [],
+ "metadata": {
+ "id": "DWHji0hn56KY"
+ },
+ "execution_count": null,
+ "outputs": []
+ }
+ ]
+}
\ No newline at end of file