Gradient Descent (GD) is an optimization algorithm used to minimize the cost function in Machine Learning and Deep Learning.
An easy way to understand it: Gradient Descent is one of the algorithms used to train your model in Machine Learning and Deep Learning.
2. Gradient Descent Algorithm
As I said, Gradient Descent is an algorithm used to minimize the cost function. So if we have a cost function like this (the squared-error cost for linear regression):

J(θ0, θ1) = (1/2m) * Σ_{i=1..m} (h_θ(x^(i)) − y^(i))^2

then we can define Gradient Descent like this:

repeat until convergence: θj := θj − α * (∂/∂θj) J(θ0, θ1)

where j = 0 and j = 1, and α is the learning rate.
3. How Gradient Descent works
First, we initialize θj, then plug θj into the update rule above to compute a new θj, and repeat until we find θ such that the cost function J(θ) is small enough.
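The loop above can be sketched in code. This is an illustrative example (not from the original post), assuming the linear-regression setup with two parameters θ0 and θ1; the function name and the sample data are my own:

```python
# Sketch of batch Gradient Descent for linear regression with
# hypothesis h_theta(x) = theta0 + theta1 * x.
import numpy as np

def gradient_descent(x, y, alpha=0.1, n_iters=1000):
    m = len(x)
    theta0, theta1 = 0.0, 0.0               # step 1: initialize theta_j
    for _ in range(n_iters):                # step 2: repeat the update
        h = theta0 + theta1 * x             # predictions h_theta(x)
        grad0 = np.mean(h - y)              # dJ/dtheta0
        grad1 = np.mean((h - y) * x)        # dJ/dtheta1
        theta0 -= alpha * grad0             # simultaneous update of both
        theta1 -= alpha * grad1             # parameters
    return theta0, theta1

# Noiseless samples from y = 2x + 1; GD should recover the line
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2 * x + 1
t0, t1 = gradient_descent(x, y)
```

With enough iterations and a suitable learning rate, θ0 and θ1 approach the true intercept 1 and slope 2.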
4. Types of Gradient Descent
Batch Gradient Descent : uses the whole training set to compute each update
Stochastic Gradient Descent : uses a single training example per update
Mini Batch Gradient Descent : uses a small subset (mini-batch) of the training set per update
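The three variants differ only in how many examples feed each gradient step, so one sketch can cover all of them. This is an illustrative example with my own helper names and sample data, assuming the same linear-regression setup as above:

```python
# The batch size selects the variant: m -> Batch GD, 1 -> Stochastic GD,
# something in between -> Mini Batch GD.
import numpy as np

def gd_step(theta, xb, yb, alpha):
    """One update of theta = (theta0, theta1) on a batch (xb, yb)."""
    h = theta[0] + theta[1] * xb
    grad = np.array([np.mean(h - yb), np.mean((h - yb) * xb)])
    return theta - alpha * grad

def train(x, y, batch_size, alpha=0.1, n_epochs=1000, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.zeros(2)
    m = len(x)
    for _ in range(n_epochs):
        idx = rng.permutation(m)                  # shuffle each epoch
        for start in range(0, m, batch_size):
            b = idx[start:start + batch_size]
            theta = gd_step(theta, x[b], y[b], alpha)
    return theta

x = np.linspace(0, 3, 8)
y = 2 * x + 1
theta_batch = train(x, y, batch_size=len(x))   # Batch GD: all m examples
theta_sgd   = train(x, y, batch_size=1)        # Stochastic GD: 1 example
theta_mini  = train(x, y, batch_size=4)        # Mini Batch GD: subset
```

On this noiseless data all three variants converge to the same line; on large, noisy datasets the stochastic and mini-batch variants are cheaper per update, at the price of noisier steps.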
5. Tips for Gradient Descent
Choose the learning rate : good learning rates are often in ranges like 0.0001~0.0003, 0.001~0.003, 0.01~0.03 or 0.1~0.3
Debugging Gradient Descent : plot the cost function value against the number of iterations and check it; if the cost function value increases, you probably need to decrease the learning rate
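This debugging tip can be sketched by recording J(θ) each iteration and comparing two learning rates. The data, function name, and the specific α values below are illustrative, not from the original post:

```python
# Track the cost J(theta) per iteration; a too-large alpha makes it grow.
import numpy as np

def run_gd(x, y, alpha, n_iters=50):
    m = len(x)
    theta = np.zeros(2)
    history = []
    for _ in range(n_iters):
        h = theta[0] + theta[1] * x
        history.append(np.sum((h - y) ** 2) / (2 * m))   # J(theta)
        grad = np.array([np.mean(h - y), np.mean((h - y) * x)])
        theta -= alpha * grad
    return history

x = np.linspace(0, 3, 8)
y = 2 * x + 1
good = run_gd(x, y, alpha=0.1)   # cost shrinks every iteration
bad  = run_gd(x, y, alpha=0.6)   # too large: cost grows instead
```

Plotting `good` and `bad` against the iteration number makes the diagnosis obvious: the first curve decreases steadily, the second climbs, which signals that the learning rate must be reduced.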
To make Gradient Descent run faster : Feature Normalization. We can speed up Gradient Descent by having each of our input values in roughly the same range. Two techniques that help with this are feature scaling and mean normalization.
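Mean normalization and feature scaling can be combined in one step, as sketched below. The function name and the example feature matrix (house size vs. number of rooms) are my own illustration:

```python
# Subtract each column's mean (mean normalization) and divide by its
# standard deviation (feature scaling) so all features share one scale.
import numpy as np

def normalize(X):
    """Return (X - mean) / std per column, plus the means and stds."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    return (X - mu) / sigma, mu, sigma

# Two features on very different scales
X = np.array([[2104.0, 3.0],
              [1600.0, 3.0],
              [2400.0, 4.0],
              [1416.0, 2.0]])
X_norm, mu, sigma = normalize(X)
# Each column of X_norm now has mean ~0 and standard deviation ~1
```

Remember to keep `mu` and `sigma`: new inputs at prediction time must be normalized with the same statistics used during training.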
Sometimes using Gradient Descent with Momentum works even better: the momentum term smooths the updates, damping oscillations and speeding up progress in directions where the gradient is consistently small.
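A minimal sketch of the momentum variant, assuming the same linear-regression setup as the earlier examples; the function name and the hyperparameter values α and β are my own choices:

```python
# Gradient Descent with Momentum: keep a running velocity v that
# accumulates past gradients and step along it instead of the raw gradient.
import numpy as np

def gd_momentum(x, y, alpha=0.05, beta=0.9, n_iters=300):
    theta = np.zeros(2)
    v = np.zeros(2)                     # velocity (accumulated gradients)
    for _ in range(n_iters):
        h = theta[0] + theta[1] * x
        grad = np.array([np.mean(h - y), np.mean((h - y) * x)])
        v = beta * v + grad             # momentum update
        theta -= alpha * v              # step along the smoothed direction
    return theta

x = np.linspace(0, 3, 8)
y = 2 * x + 1
theta = gd_momentum(x, y)
```

With β = 0 this reduces to plain Gradient Descent; values of β around 0.9 are a common starting point.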