Linear Regression, Gradient Descent, Batch Gradient Descent, and Stochastic Gradient Descent
Linear Regression
Concepts
- Linear regression is a statistical method used to model the relationship between a dependent variable and one or more independent variables.
- Assumes a linear relationship between variables.
- Commonly used for prediction, forecasting, and estimating relationships between variables (note that regression alone establishes association, not causation).
Basic Algorithm
- Find the line of best fit (regression line) that minimizes the sum of squared errors (SSE) or, equivalently, the mean squared error (MSE).
- Regression equation: y = b0 + b1*x1 + b2*x2 + ... + bn*xn + e, where b0, b1, ..., bn are the coefficients, x1, x2, ..., xn are the independent variables, y is the dependent variable, and e is the error term.
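In practice the coefficients can be estimated in closed form. Below is a minimal sketch using NumPy's least-squares solver on synthetic data; the true coefficients (2, 3, -1), the noise level, and the sample size are all invented for illustration.

```python
import numpy as np

# Synthetic data: y = 2 + 3*x1 - 1*x2 + noise (values invented for illustration)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))            # 100 samples, 2 independent variables
y = 2 + 3 * X[:, 0] - 1 * X[:, 1] + rng.normal(scale=0.1, size=100)

# Prepend a column of ones so the intercept b0 is estimated alongside b1, b2
X_design = np.column_stack([np.ones(len(X)), X])

# Ordinary least squares: the coefficients that minimize the SSE
coeffs, *_ = np.linalg.lstsq(X_design, y, rcond=None)
print("b0, b1, b2 =", coeffs)            # approximately [2, 3, -1]

# Mean squared error of the fitted line
residuals = y - X_design @ coeffs
print("MSE =", np.mean(residuals ** 2))
```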
Use Cases
- Predicting housing prices based on features like size, location, and age.
- Forecasting sales based on marketing budget and economic factors.
- Estimating the effect of a new policy on a target population.
Gradient Descent
Concepts
- Gradient descent is an optimization algorithm used to minimize a function iteratively.
- Applicable to any differentiable function.
- Seeks a minimum by iteratively updating the variables in the direction of the negative gradient, the direction of steepest descent.
Basic Algorithm
1. Initialize the variables (weights) with random values.
2. Calculate the gradient of the function with respect to each variable.
3. Update the variables by subtracting a fraction (the learning rate) of the gradient.
4. Repeat steps 2 and 3 until convergence or until a maximum number of iterations is reached.
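As a minimal sketch of the steps above, the example below minimizes a hand-picked convex function f(w) = (w0 - 3)^2 + (w1 + 1)^2 using its analytic gradient; the learning rate, iteration cap, and tolerance are arbitrary choices for illustration.

```python
import numpy as np

def f(w):
    # A simple convex function with its minimum at (3, -1)
    return (w[0] - 3) ** 2 + (w[1] + 1) ** 2

def grad_f(w):
    # Analytic gradient of f
    return np.array([2 * (w[0] - 3), 2 * (w[1] + 1)])

w = np.random.default_rng(0).normal(size=2)   # step 1: random initialization
learning_rate = 0.1
for _ in range(1000):                         # step 4: repeat up to a max number of iterations
    g = grad_f(w)                             # step 2: compute the gradient
    w = w - learning_rate * g                 # step 3: step against the gradient
    if np.linalg.norm(g) < 1e-8:              # ...or stop early once (nearly) converged
        break

print("w =", w, "f(w) =", f(w))  # w approaches [3, -1]
```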
Use Cases
- Optimizing the cost function in linear regression and other machine learning models.
- Minimizing error in neural networks.
- Solving other optimization problems in various domains.
Batch Gradient Descent
Concepts
- Batch gradient descent is a variant of gradient descent that calculates the gradient using the entire dataset.
- Computationally expensive per iteration, but with a suitably small learning rate it converges to the global minimum for convex functions.
Basic Algorithm
1. Initialize the variables (weights) with random values.
2. Calculate the gradient of the function with respect to each variable, using the entire dataset.
3. Update the variables by subtracting a fraction (the learning rate) of the gradient.
4. Repeat steps 2 and 3 until convergence or until a maximum number of iterations is reached.
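A minimal sketch of these steps for the linear-regression MSE, where each update uses the gradient computed over the entire dataset; the synthetic data, learning rate, and iteration count are assumptions for illustration.

```python
import numpy as np

# Synthetic regression data: y = 2 + 3*x1 - 1*x2 + noise (invented for illustration)
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(200), rng.normal(size=(200, 2))])  # intercept column + 2 features
y = X @ np.array([2.0, 3.0, -1.0]) + rng.normal(scale=0.1, size=200)

w = rng.normal(size=3)                       # step 1: random initialization
learning_rate = 0.05
for _ in range(2000):                        # step 4: repeat
    # Steps 2-3: gradient of the MSE over the ENTIRE dataset, then update
    grad = (2 / len(X)) * X.T @ (X @ w - y)
    w -= learning_rate * grad

print("w =", w)  # close to [2, 3, -1]
```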
Use Cases
- When the dataset is small enough to fit in memory.
- When a precise solution is required and computational resources are not a concern.
Stochastic Gradient Descent
Concepts
- Stochastic gradient descent (SGD) is a variant of gradient descent that calculates the gradient using a single data point at a time.
- Each update is much cheaper and less memory-intensive than in batch gradient descent.
- Gradient updates are noisier, which can help escape local minima but may prevent exact convergence to the minimum unless the learning rate is decayed.
Basic Algorithm
1. Initialize the variables (weights) with random values.
2. Randomly select a data point from the dataset.
3. Calculate the gradient of the function with respect to each variable, using only the selected data point.
4. Update the variables by subtracting a fraction (the learning rate) of the gradient.
5. Repeat steps 2-4 until convergence or until a maximum number of iterations is reached.
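A minimal sketch of these steps on the same kind of synthetic linear-regression problem, updating the weights from one randomly chosen data point per step; the data and hyperparameters are again invented for illustration.

```python
import numpy as np

# Synthetic regression data as before (invented for illustration)
rng = np.random.default_rng(2)
X = np.column_stack([np.ones(500), rng.normal(size=(500, 2))])
y = X @ np.array([2.0, 3.0, -1.0]) + rng.normal(scale=0.1, size=500)

w = rng.normal(size=3)                    # step 1: random initialization
learning_rate = 0.01
for _ in range(20000):                    # step 5: repeat
    i = rng.integers(len(X))              # step 2: pick one data point at random
    xi, yi = X[i], y[i]
    grad = 2 * (xi @ w - yi) * xi         # step 3: gradient of the squared error on that point
    w -= learning_rate * grad             # step 4: update

print("w =", w)  # noisy, but close to [2, 3, -1]
```

Note that the constant learning rate leaves some noise in the final weights; decaying it over time is the usual way to settle closer to the minimum.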
Use Cases
- When the dataset is too large to fit in memory.
- When a faster, approximate solution is acceptable.
- In online learning, where the model is updated as new data becomes available.