{"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"name":"jupyter.ipynb","provenance":[],"collapsed_sections":[]},"kernelspec":{"display_name":"Python 3","language":"python","name":"python3"}},"cells":[{"cell_type":"markdown","metadata":{"id":"LrsOlHwGZSXt","colab_type":"text"},"source":["$$\\Huge{\\textit{Formation Multi-Perceptron}}$$"]},{"cell_type":"code","metadata":{"id":"lIJpM8NjdQT8","colab_type":"code","colab":{}},"source":["#@title Imports et fonctions d'affichages { run: \"auto\", display-mode: \"form\" }\n","import numpy as np\n","import random\n","import matplotlib.pyplot as plt\n","\n","from matplotlib import animation, rc\n","from IPython.display import HTML\n","\n","\n","def generate_classifier_data_set(poles, random_points=100, spread=0.25, seed=299792458):\n"," \"\"\"\n"," Generates a data set for a simple classifier\n"," :param poles: the position of the poles of interest\n"," :param random_points: the number of points per poles\n"," :param spread: the size of the spread around the poles\n"," :param seed: the seed used by the random number generator\n"," :return: the input and output matrices\n"," \"\"\"\n","\n"," np.random.seed(seed)\n","\n"," x = np.zeros((len(poles) * random_points, 2))\n"," y = np.zeros((len(poles) * random_points))\n","\n"," for k in range(len(poles)):\n"," pole = poles[k]\n","\n"," for i in range(random_points):\n"," radius = np.random.uniform(0, spread)\n"," angle = np.random.uniform(0, 2 * np.pi)\n","\n"," x[k * random_points + i, 0] = pole[0] + radius * np.cos(angle)\n"," x[k * random_points + i, 1] = pole[1] + radius * np.sin(angle)\n"," y[k * random_points + i] = k\n","\n"," return np.transpose(x), y\n","\n","\n","def generate_regression_data_set(f, low=0.0, high=1.0, random_points=100, spread=0.5, seed=299792458):\n"," \"\"\"\n"," Generates random points around the curve of the function f\n"," :param f: the approximated function\n"," :param low: the starting point of the interval\n"," :param high: the ending point of the interval\n"," :param random_points: the number of points\n"," :param spread: the size of the spread around the curve\n"," :param seed: the seed used by the random number generator\n"," :return: the input and output matrices\n"," \"\"\"\n","\n"," np.random.seed(seed)\n","\n"," x = np.zeros(random_points)\n"," y = np.zeros(random_points)\n","\n"," for k in range(random_points):\n"," x[k] = np.random.normal((low+high)/2, (high-low)/2)\n"," y[k] = np.random.normal(f(x[k]), spread)\n","\n"," return x, y\n","\n","\n","def plot_classifier(x, y):\n"," \"\"\"\n"," Plots the classifier data set as a scatter plot\n"," :param x: the input matrix\n"," :param y: the output matrix\n"," \"\"\"\n","\n"," for pole in range(y.shape[0]):\n"," point_x = []\n"," point_y = []\n","\n"," for sample in y.shape[1]:\n"," if y[sample] == pole:\n"," point_x.append(x[0, sample])\n"," point_y.append(x[1, sample])\n","\n"," plt.scatter(point_x, point_y, label=\"pole \" + str(pole))\n","\n"," plt.legend()\n"," plt.show()\n","\n"," \n","def plot_classifier_with_approx(X, Y, approx, start=(-1, -1), stop=(1, 1), n=100):\n"," \"\"\"\n"," Plots the classifier data set with the regions predicted by the approximation function\n"," :param x: the input matrix\n"," :param y: the output matrix\n"," :param approx: the approximation function used to classify each point of the plane\n"," :param start: the staring point\n"," :param stop: the ending point\n"," :param n: the number of points per axis\n"," \"\"\"\n","\n"," x = np.linspace(start[0], stop[0], n)\n"," y = 
{"cell_type":"markdown","metadata":{"id":"TTjAT09zZSXv","colab_type":"text"},"source":["# Step 1: The Neuron"]},
{"cell_type":"markdown","metadata":{"id":"xo55TKBLZSXx","colab_type":"text"},"source":["In this section we will simply perform an affine regression, which corresponds to the work done by a single neuron with a bias.\n","\n","$$h_{b,w}(x^{(i)}) = b + wx^{(i)}$$\n","\n","Having no idea of the values of $b$ and $w$, it is best to initialize them at random."]},
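{"cell_type":"markdown","metadata":{},"source":["Concretely, this model and its random initialization fit in a few lines. The sketch below (with illustrative names) is the hypothesis $h$ used throughout this section."]},
{"cell_type":"code","metadata":{},"source":["#@title Sketch: the affine hypothesis with random initialization\n","# Illustrative sketch: a single neuron with bias computes h(x) = b + w * x\n","w_init = np.random.uniform(-1, 1)  # random initial weight\n","b_init = np.random.uniform(-1, 1)  # random initial bias\n","\n","\n","def h(x, w=w_init, b=b_init):\n","    \"\"\"Affine hypothesis of a single neuron with bias.\"\"\"\n","    return b + w * x\n","\n","\n","print(\"h(0.5) =\", h(0.5))"],"execution_count":0,"outputs":[]},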
{"cell_type":"markdown","metadata":{"id":"JWDthcrNZSX5","colab_type":"text"},"source":["## Cost function\n","\n","The weights must be corrected based on the training examples. To do this, we quantify the error with a cost function: the squared error!\n","\n","This is by no means the only way to do it (one could also consider the absolute error), and most problems start with this question: what do we want to optimize? (Here: minimize the squared error.)\n","\n","Here is the squared error on a single example (the factor 1/2 simplifies later computations):\n","\n","$$J^{(i)}(w, b) = \\frac{1}{2} (h_{w,b}(x^{(i)}) - y^{(i)})^2$$\n","\n","Averaging over the $m$ training examples, we obtain in total:\n","\n","$$J(w, b) = \\frac{1}{2m} \\sum_{i=1}^m (h_{w,b}(x^{(i)}) - y^{(i)})^2$$\n","\n","Why average rather than sum? So that the numbers do not blow up when there are many examples! Besides, this brings us back to the well-studied framework of the mean squared error.\n","\n","## Gradient descent\n","\n","We will then use this cost function to correct the weights thanks to a famous optimization method, $\\textbf{gradient descent}$:\n","\n","For each example, we correct the weights by descending the slope, that is, the gradient:\n","\n","$$ w := w - \\frac{\\partial J^{(i)}}{\\partial w}(w, b)$$\n","$$ b := b - \\frac{\\partial J^{(i)}}{\\partial b}(w, b)$$\n","\n","We do this while averaging over $k$ examples among the $m$ training examples, hence in total:\n","\n","$$ w := w - \\frac{1}{k}\\sum_{i=1}^k \\frac{\\partial J^{(i)}}{\\partial w}(w, b)$$\n","$$ b := b - \\frac{1}{k}\\sum_{i=1}^k \\frac{\\partial J^{(i)}}{\\partial b}(w, b)$$\n","\n","This is what we call using $\\textbf{batches}$, which avoids useless steps.\n","The batch size $k$ is thus our first hyperparameter!\n","\n","Then, replacing $J^{(i)}$ by its expression, we obtain:\n","\n","$$ w := w - \\frac{1}{k} \\sum_{i=1}^k (h_{w, b}(x^{(i)}) - y^{(i)})\\frac{\\partial h_{w, b}}{\\partial w}(x^{(i)})$$\n","$$ b := b - \\frac{1}{k} \\sum_{i=1}^k (h_{w, b}(x^{(i)}) - y^{(i)})\\frac{\\partial h_{w, b}}{\\partial b}(x^{(i)})$$\n","\n","And so, in our affine problem:\n","\n","$$ w := w - \\frac{1}{k} \\sum_{i=1}^k (h_{w, b}(x^{(i)}) - y^{(i)})x^{(i)}$$\n","$$ b := b - \\frac{1}{k} \\sum_{i=1}^k (h_{w, b}(x^{(i)}) - y^{(i)})$$\n","\n","A lot of talk, but in the end, all that remains is to implement this last formula, as sketched in the next cell!"]},
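{"cell_type":"markdown","metadata":{},"source":["Before the step-by-step implementations, here is a compact, vectorized NumPy sketch of that batch update (illustrative names, and the implicit learning rate of 1 used throughout this notebook)."]},
{"cell_type":"code","metadata":{},"source":["#@title Sketch: one vectorized batch update\n","# Illustrative sketch of the final update formula on a batch (x_b, y_b),\n","# assuming an implicit learning rate of 1 as in the implementations below\n","def batch_update(w, b, x_b, y_b):\n","    errors = (w * x_b + b) - y_b   # h(x^(i)) - y^(i) over the whole batch\n","    w = w - np.mean(errors * x_b)  # w := w - (1/k) * sum(error * x)\n","    b = b - np.mean(errors)        # b := b - (1/k) * sum(error)\n","    return w, b"],"execution_count":0,"outputs":[]},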
{"cell_type":"code","metadata":{"id":"VIKVxD8cZSX6","colab_type":"code","cellView":"form","outputId":"5b4397ad-451c-4df0-b1db-9b2b7151d576","executionInfo":{"status":"ok","timestamp":1570200087506,"user_tz":-120,"elapsed":3384,"user":{"displayName":"Mathïs Fédérico","photoUrl":"","userId":"00650757110992636682"}},"colab":{"base_uri":"https://localhost:8080/","height":35}},"source":["#@title One-by-one descent implementation\n","\n","# Affine regression with dynamic display, here one example at a time (no batches)\n","\n","\n","# Definition of the function used to generate our data set\n","def f(x):\n","    \"\"\"\n","    Affine function that we will try to recover through gradient descent\n","    :param x: the evaluation point\n","    :return: the value of the function at x\n","    \"\"\"\n","\n","    return 2 * x - 1\n","\n","\n","# Generate the data set: a hundred points scattered around the curve of f\n","X, Y = generate_regression_data_set(f, 0, 1, 100, 0.3)\n","\n","# Random initial values for the weight and the bias\n","weight = np.random.uniform(-1, 1)\n","bias = np.random.uniform(-1, 1)\n","\n","parameters = [[weight, bias]]\n","costs = []\n","\n","# Affine regression without batches: one update per example\n","for k in range(X.shape[0]):\n","    x = X[k]\n","    y = Y[k]\n","\n","    h = weight * x + bias\n","\n","    # Cost over the whole data set, recorded for plotting\n","    cost = 0\n","\n","    for i in range(X.shape[0]):\n","        cost += (weight * X[i] + bias - Y[i]) ** 2\n","\n","    costs.append(np.log(cost / (2 * X.shape[0])))\n","\n","    cost_derivative = h - y\n","\n","    # Update with an implicit learning rate of 1\n","    weight = weight - cost_derivative * x\n","    bias = bias - cost_derivative\n","\n","    parameters.append([weight, bias])\n","\n","print(\"Without batches, obtained function: f(x)=\", weight, \"* x +\", bias)"],"execution_count":2,"outputs":[{"output_type":"stream","text":["Without batches, obtained function: f(x)= 2.181758006827202 * x + -1.0021571493779027\n"],"name":"stdout"}]},
{"cell_type":"code","metadata":{"id":"e7zJPyiymwLp","colab_type":"code","cellView":"form","outputId":"0cd75095-cc28-43db-a89b-92b7d3b0831a","executionInfo":{"status":"ok","timestamp":1570200105161,"user_tz":-120,"elapsed":19031,"user":{"displayName":"Mathïs Fédérico","photoUrl":"","userId":"00650757110992636682"}},"colab":{"base_uri":"https://localhost:8080/","height":380,"output_embedded_package_id":"1UexdPjRASxTmrpqXdAuv37WMVDbFD4V1"}},"source":["#@title One-by-one gradient descent animation\n","\n","# First set up the figure, the axes, and the plot element\n","fig = plt.figure(1)\n","ax1 = fig.add_subplot(121)\n","ax2 = fig.add_subplot(122)\n","\n","plt.close()\n","\n","ax1.set_xlim((-0.5, 2))\n","ax1.set_ylim((-2, 2))\n","data, = ax1.plot([], [], linestyle='', marker='o', color='b')\n","used_data, = ax1.plot([], [], linestyle='', marker='o', color='r', markersize=15)\n","prediction, = ax1.plot([], [], color='g', marker='', lw=2, label='prediction')\n","ax1.set_title(\"Model\")\n","ax1.set_xlabel(\"X\")\n","ax1.set_ylabel(\"Y\")\n","\n","ax2.set_xlim((0, len(costs)))\n","ax2.set_ylim((-5, 3))\n","curve, = ax2.plot([], [], color='b')\n","cost_point, = ax2.plot([], [], linestyle='', marker='o', color='r', markersize=5)\n","ax2.set_title(\"Cost evolution\")\n","ax2.set_xlabel(\"Iteration\")\n","ax2.set_ylabel(\"Cost function (log)\")\n","\n","# initialization function: plot the background of each frame\n","def init():\n","    data.set_data(X[:], Y[:])\n","    curve.set_data(range(len(costs)), costs)\n","    return (data, curve,)\n","\n","# animation function: this is called sequentially\n","def animate(i):\n","    x = np.linspace(-0.5, 2, 100)\n","    w = parameters[i][0]\n","    b = parameters[i][1]\n","    given = w * x + b\n","\n","    prediction.set_data(x, given)\n","    used_data.set_data([X[i]], [Y[i]])\n","    cost_point.set_data([i], [costs[i]])\n","    return (prediction, used_data,)\n","\n","anim = animation.FuncAnimation(fig, animate, init_func=init, frames=len(parameters)-2, interval=500, blit=True)\n","HTML(anim.to_jshtml())\n"],"execution_count":3,"outputs":[{"output_type":"display_data","data":{"text/plain":"Output hidden; open in https://colab.research.google.com to view."},"metadata":{}}]},
{"cell_type":"code","metadata":{"id":"KT4e5kFMtdBy","colab_type":"code","cellView":"form","outputId":"aa84dfc4-fb4d-4fb2-cb9d-5abbae175195","executionInfo":{"status":"ok","timestamp":1570200159242,"user_tz":-120,"elapsed":879,"user":{"displayName":"Mathïs Fédérico","photoUrl":"","userId":"00650757110992636682"}},"colab":{"base_uri":"https://localhost:8080/","height":35}},"source":["#@title Stochastic (mini-batch) gradient descent implementation\n","# Same as before, but using batches this time\n","weight = np.random.uniform(-1, 1)\n","bias = np.random.uniform(-1, 1)\n","\n","parameters = [[weight, bias]]\n","costs = []\n","\n","# Set the batch size\n","batch_size = 20\n","\n","# Set the number of batch updates (one randomly drawn batch per epoch)\n","epochs = 50\n","\n","# Affine regression with batches\n","batches = []\n","for k in range(epochs):\n","    sum_weight = 0\n","    sum_bias = 0\n","    batch = []\n","\n","    for i in range(batch_size):\n","        # Draw a random example (named idx so as not to shadow the epoch counter k)\n","        idx = np.random.randint(0, X.shape[0])\n","\n","        x = X[idx]\n","        y = Y[idx]\n","\n","        h = weight * x + bias\n","\n","        cost_derivative = h - y\n","\n","        sum_weight += cost_derivative * x\n","        sum_bias += cost_derivative\n","\n","        batch.append([x, y])\n","\n","    # Cost over the whole data set, recorded for plotting\n","    cost = 0\n","\n","    for i in range(X.shape[0]):\n","        cost += (weight * X[i] + bias - Y[i]) ** 2\n","    costs.append(np.log(cost / (2 * X.shape[0])))\n","\n","    batches.append(batch)\n","    weight -= sum_weight / batch_size\n","    bias -= sum_bias / batch_size\n","\n","    parameters.append([weight, bias])\n","\n","batches = np.array(batches)\n","print(\"With batches, obtained function: f(x)=\", weight, \"* x +\", bias)\n"],"execution_count":4,"outputs":[{"output_type":"stream","text":["With batches, obtained function: f(x)= 2.032851994076927 * x + -1.1306217178202012\n"],"name":"stdout"}]},
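{"cell_type":"markdown","metadata":{},"source":["As a quick sanity check (an illustrative addition, not required by the walkthrough), the learned line can be compared against the closed-form least-squares fit that NumPy provides."]},
{"cell_type":"code","metadata":{},"source":["#@title Sanity check: closed-form least-squares fit\n","# np.polyfit(X, Y, 1) returns [slope, intercept] of the least-squares line,\n","# which the gradient descent above should approach\n","slope, intercept = np.polyfit(X, Y, 1)\n","print(\"Least-squares fit: f(x)=\", slope, \"* x +\", intercept)"],"execution_count":0,"outputs":[]},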
{"cell_type":"code","metadata":{"id":"H8MZtm_WukWT","colab_type":"code","cellView":"form","outputId":"e6ba798a-9f04-44c9-9bb5-e71db4d9983a","executionInfo":{"status":"ok","timestamp":1570200188313,"user_tz":-120,"elapsed":9459,"user":{"displayName":"Mathïs Fédérico","photoUrl":"","userId":"00650757110992636682"}},"colab":{"base_uri":"https://localhost:8080/","height":380}},"source":["#@title Stochastic (mini-batch) gradient descent animation\n","\n","# First set up the figure, the axes, and the plot element\n","fig = plt.figure(1)\n","ax1 = fig.add_subplot(121)\n","ax2 = fig.add_subplot(122)\n","\n","plt.close()\n","\n","ax1.set_xlim((-0.5, 2))\n","ax1.set_ylim((-2, 2))\n","data, = ax1.plot([], [], linestyle='', marker='o', color='b')\n","used_data, = ax1.plot([], [], linestyle='', marker='o', color='r', markersize=5)\n","prediction, = ax1.plot([], [], color='g', marker='', lw=2, label='prediction')\n","ax1.set_title(\"Model\")\n","ax1.set_xlabel(\"X\")\n","ax1.set_ylabel(\"Y\")\n","\n","ax2.set_xlim((0, len(costs)))\n","ax2.set_ylim((-5, 3))\n","curve, = ax2.plot([], [], color='b')\n","cost_point, = ax2.plot([], [], linestyle='', marker='o', color='r', markersize=5)\n","ax2.set_title(\"Cost evolution\")\n","ax2.set_xlabel(\"Epoch\")\n","ax2.set_ylabel(\"Cost function (log)\")\n","\n","# initialization function: plot the background of each frame\n","def init():\n","    data.set_data(X[:], Y[:])\n","    curve.set_data(range(len(costs)), costs)\n","    return (data, curve,)\n","\n","# animation function: this is called sequentially\n","def animate(i):\n","    x = np.linspace(-0.5, 2, 100)\n","    w = parameters[i][0]\n","    b = parameters[i][1]\n","    given = w * x + b\n","\n","    prediction.set_data(x, given)\n","    used_data.set_data(batches[i, :, 0], batches[i, :, 1])\n","    cost_point.set_data([i], [costs[i]])\n","    return (prediction, used_data,)\n","\n","anim = animation.FuncAnimation(fig, animate, init_func=init, frames=len(parameters)-2, interval=500, blit=True)\n","HTML(anim.to_jshtml())"],"execution_count":6,"outputs":[{"output_type":"execute_result","data":{"text/html":["\n","\n","\n","\n","