Mini-Batch Gradient Descent with Python
Gradient descent is a popular optimization algorithm used in machine learning to minimize the cost function of a model. However, when dealing with large datasets, batch gradient descent can be slow, because every single update requires processing all of the training examples. Stochastic gradient descent, at the other extreme, updates the parameters after each individual training example, which makes every update cheap but noisy. Mini-batch gradient descent is a compromise between the two: it processes a small subset of the training examples in each iteration, striking a balance between speed and stability. In this blog, we will discuss mini-batch gradient descent in detail and provide an example of implementing it in Python.
What is Mini-Batch Gradient Descent?
Mini-batch gradient descent is a variant of gradient descent that uses a small subset of the training examples in each iteration to update the parameters of the model. This subset is called a mini-batch and typically ranges from 32 to 512 examples. The algorithm iterates over the entire dataset multiple times, with each iteration called an epoch. In each epoch, the algorithm randomly shuffles the training examples and divides them into mini-batches, then updates the parameters using the gradient of the cost function computed on each mini-batch.
Mini-batch gradient descent sits between batch gradient descent and stochastic gradient descent. Batch gradient descent updates the parameters only after processing all the training examples, which makes each iteration computationally expensive on large datasets. Stochastic gradient descent updates the parameters after every single training example, which makes each update cheap but noisy. Mini-batch gradient descent updates the parameters after processing a small subset of the training examples, providing a balance between speed and stability.
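To make the loop structure concrete, here is a minimal, model-agnostic sketch of the shuffle/split/update cycle described above; the function name and the compute_gradient callback are placeholders for whatever model and cost function you happen to be fitting.

```python
import numpy as np

def run_mini_batch_gd(X, y, params, compute_gradient,
                      learning_rate=0.01, batch_size=32, n_epochs=100, seed=0):
    """Schematic mini-batch loop: shuffle, split into mini-batches, update."""
    rng = np.random.default_rng(seed)
    n_samples = X.shape[0]
    for epoch in range(n_epochs):
        indices = rng.permutation(n_samples)           # reshuffle once per epoch
        for start in range(0, n_samples, batch_size):  # walk over the mini-batches
            batch = indices[start:start + batch_size]
            grad = compute_gradient(params, X[batch], y[batch])
            params = params - learning_rate * grad     # gradient step on this batch
    return params
```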
Advantages of Mini-Batch Gradient Descent
Mini-batch gradient descent has several advantages over other optimization algorithms:
Computational Efficiency
Mini-batch gradient descent is computationally efficient, as it processes only a small subset of the training examples in each iteration, which makes it faster than batch gradient descent.
Stability
Mini-batch gradient descent is more stable than stochastic gradient descent, as it processes a small subset of the training examples in each iteration, which reduces the variance of the gradient and helps the algorithm to converge more smoothly.
Flexibility
Mini-batch gradient descent is more flexible than batch gradient descent, as it allows us to adjust the size of the mini-batch and the learning rate to find the optimal balance between convergence speed and stability.
Implementing Mini-Batch Gradient Descent in Python
In this section, we will provide an example of implementing mini-batch gradient descent in Python using the scikit-learn library. We will use the Boston Housing dataset, which contains information about the median value of owner-occupied homes in various Boston neighborhoods. Our goal is to predict the median value of owner-occupied homes based on 13 features such as crime rate, average number of rooms per dwelling, and others.
Step 1: Loading the Dataset
We first load the Boston Housing dataset using the load_boston() function from the scikit-learn library (note that this function was removed in scikit-learn 1.2, so newer versions need a different dataset or data source). We then split the dataset into training and testing sets using the train_test_split() function from the same library.
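A minimal sketch of this step is shown below. Because load_boston() is no longer available in scikit-learn 1.2 and later, the sketch falls back to the California Housing data on newer versions so the example still runs; the 80/20 split and the random_state value are illustrative choices, not something fixed by the text.

```python
from sklearn.model_selection import train_test_split

try:
    # Available in scikit-learn < 1.2
    from sklearn.datasets import load_boston
    data = load_boston()
except ImportError:
    # load_boston() was removed in scikit-learn 1.2; use a comparable
    # regression dataset instead so the example still runs.
    from sklearn.datasets import fetch_california_housing
    data = fetch_california_housing()

X, y = data.data, data.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```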
Step 2: Scaling the Data
We then scale the training and testing data using the StandardScaler() function from the scikit-learn library. Scaling the data helps to normalize the features and improve the convergence of the algorithm.
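One way this step might look, continuing from the variables created in Step 1; the scaler is fit on the training set only so that no information from the test set leaks into the preprocessing.

```python
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)  # learn mean/std from the training data only
X_test = scaler.transform(X_test)        # apply the same transformation to the test data
```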
Step 3: Defining the Mini-Batch Gradient Descent Algorithm
We define the mini-batch gradient descent algorithm as a Python function that takes the training data, the learning rate, the mini-batch size, and the number of epochs as input. The algorithm iterates over the entire training set multiple times and updates the parameters using the gradient of the cost function computed on each mini-batch.
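A sketch of such a function for a linear regression model with a mean squared error cost is given below; the exact signature, the zero initialization, and the fixed random seed are illustrative choices rather than the only way to write it.

```python
import numpy as np

def mini_batch_gradient_descent(X, y, learning_rate=0.01, batch_size=32,
                                n_epochs=100, seed=0):
    """Fit a linear model y ~ X @ w + b by minimizing MSE with mini-batch gradient descent."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    w = np.zeros(n_features)  # weights
    b = 0.0                   # intercept

    for epoch in range(n_epochs):
        # Shuffle the training examples at the start of every epoch.
        indices = rng.permutation(n_samples)
        X_shuffled, y_shuffled = X[indices], y[indices]

        for start in range(0, n_samples, batch_size):
            X_batch = X_shuffled[start:start + batch_size]
            y_batch = y_shuffled[start:start + batch_size]

            # Gradient of the MSE cost on this mini-batch.
            error = X_batch @ w + b - y_batch
            grad_w = 2.0 * X_batch.T @ error / len(y_batch)
            grad_b = 2.0 * np.mean(error)

            # Update the parameters with one gradient step.
            w -= learning_rate * grad_w
            b -= learning_rate * grad_b

    return w, b
```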
Step 4: Training and Evaluating the Model
We use the mini_batch_gradient_descent() function to train the model on the training data and evaluate its performance on the testing data using the mean squared error (MSE) metric.
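Continuing the sketch, training and evaluation might look like this; the particular hyperparameter values are placeholders to be tuned in the next step.

```python
from sklearn.metrics import mean_squared_error

w, b = mini_batch_gradient_descent(X_train, y_train,
                                   learning_rate=0.01, batch_size=32, n_epochs=200)

y_pred = X_test @ w + b
mse = mean_squared_error(y_test, y_pred)
print(f"Test MSE: {mse:.3f}")
```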
Step 5: Tuning the Hyperparameters
We can tune the hyperparameters of the mini-batch gradient descent algorithm to find the optimal balance between convergence speed and stability. The most important hyperparameters are the learning rate and the mini-batch size. A learning rate that is too high may cause the algorithm to oscillate around or overshoot the optimal solution, while one that is too low makes convergence slow. A small mini-batch size makes each update cheap but increases the variance (noise) of the gradient estimate, while a large mini-batch size reduces that variance and gives smoother convergence at a higher cost per update.
We can then use the hyperparameters that give the lowest mean squared error to train the final model; ideally that error is measured on a separate validation set (or via cross-validation) rather than on the test data reserved for the final evaluation.
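As a sketch, a simple grid search over these two hyperparameters could look like the following; the candidate values are arbitrary, and in practice you would score candidates on a held-out validation set or with cross-validation rather than on the final test set.

```python
import itertools

best_mse, best_params = float("inf"), None
for lr, bs in itertools.product([0.001, 0.01, 0.1], [16, 32, 64, 128]):
    w, b = mini_batch_gradient_descent(X_train, y_train,
                                       learning_rate=lr, batch_size=bs, n_epochs=200)
    mse = mean_squared_error(y_test, X_test @ w + b)
    if mse < best_mse:
        best_mse, best_params = mse, (lr, bs)

print("Best (learning_rate, batch_size):", best_params, "with MSE", round(best_mse, 3))
```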
Conclusion
Mini-batch gradient descent is a popular optimization algorithm for training machine learning models on large datasets. It is faster than batch gradient descent and more stable than stochastic gradient descent. In this article, we discussed the principles of mini-batch gradient descent and provided a Python implementation of the algorithm. We also showed how to train and evaluate a linear regression model using mini-batch gradient descent with Python and scikit-learn. Finally, we discussed how to tune the hyperparameters of the algorithm to find the optimal balance between convergence speed and stability.