Linear Regression, Gradient Descent, Batch Gradient Descent, and Stochastic Gradient Descent
Linear Regression
Concepts
- Linear regression is a statistical method used to model the relationship between a dependent variable and one or more independent variables.
- Assumes a linear relationship between variables.
- Commonly used for prediction, forecasting, and estimating relationships between variables (note that regression alone establishes association, not causation).
Basic Algorithm
- Find the line of best fit (regression line) that minimizes the sum of squared errors (SSE) or, equivalently, the mean squared error (MSE).
- Regression equation: y = b0 + b1*x1 + b2*x2 + ... + bn*xn + e, where b0, b1, ..., bn are the coefficients, x1, x2, ..., xn are the independent variables, y is the dependent variable, and e is the error term.
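In practice the coefficients can be estimated in closed form. Below is a minimal sketch using NumPy's least-squares solver on synthetic data; the true coefficients (2, 3, -1), the noise level, and the sample size are all invented for illustration.

```python
import numpy as np

# Synthetic data: y = 2 + 3*x1 - 1*x2 + noise (values invented for illustration)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))            # 100 samples, 2 independent variables
y = 2 + 3 * X[:, 0] - 1 * X[:, 1] + rng.normal(scale=0.1, size=100)

# Prepend a column of ones so the intercept b0 is estimated alongside b1, b2
X_design = np.column_stack([np.ones(len(X)), X])

# Ordinary least squares: the coefficients that minimize the SSE
coeffs, *_ = np.linalg.lstsq(X_design, y, rcond=None)
print("b0, b1, b2 =", coeffs)            # approximately [2, 3, -1]

# Mean squared error of the fitted line
residuals = y - X_design @ coeffs
print("MSE =", np.mean(residuals ** 2))
```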
Use Cases
- Predicting housing prices based on features like size, location, and age.
- Forecasting sales based on marketing budget and economic factors.
- Estimating the effect of a new policy on a target population.
Gradient Descent
Concepts
- Gradient descent is an optimization algorithm used to minimize a function iteratively.
- Applicable to any differentiable function.
- Seeks a minimum by iteratively updating the variables in the direction of the negative gradient, the direction of steepest descent.
Basic Algorithm
1. Initialize the variables (weights) with random values.
2. Calculate the gradient of the function with respect to each variable.
3. Update the variables by subtracting a fraction (the learning rate) of the gradient.
4. Repeat steps 2 and 3 until convergence or until a maximum number of iterations is reached.
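As a minimal sketch of the steps above, the example below minimizes a hand-picked convex function f(w) = (w0 - 3)^2 + (w1 + 1)^2 using its analytic gradient; the learning rate, iteration cap, and tolerance are arbitrary choices for illustration.

```python
import numpy as np

def f(w):
    # A simple convex function with its minimum at (3, -1)
    return (w[0] - 3) ** 2 + (w[1] + 1) ** 2

def grad_f(w):
    # Analytic gradient of f
    return np.array([2 * (w[0] - 3), 2 * (w[1] + 1)])

w = np.random.default_rng(0).normal(size=2)   # step 1: random initialization
learning_rate = 0.1
for _ in range(1000):                         # step 4: repeat up to a max number of iterations
    g = grad_f(w)                             # step 2: compute the gradient
    w = w - learning_rate * g                 # step 3: step against the gradient
    if np.linalg.norm(g) < 1e-8:              # ...or stop early once (nearly) converged
        break

print("w =", w, "f(w) =", f(w))  # w approaches [3, -1]
```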
Use Cases
- Optimizing the cost function in linear regression and other machine learning models.
- Minimizing error in neural networks.
- Solving other optimization problems in various domains.
Batch Gradient Descent
Concepts
- Batch gradient descent is a variant of gradient descent that calculates the gradient using the entire dataset.
- Computationally expensive per iteration, but with a suitably small learning rate it converges to the global minimum for convex functions.
Basic Algorithm
1. Initialize the variables (weights) with random values.
2. Calculate the gradient of the function with respect to each variable, using the entire dataset.
3. Update the variables by subtracting a fraction (the learning rate) of the gradient.
4. Repeat steps 2 and 3 until convergence or until a maximum number of iterations is reached.
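A minimal sketch of these steps for the linear-regression MSE, where each update uses the gradient computed over the entire dataset; the synthetic data, learning rate, and iteration count are assumptions for illustration.

```python
import numpy as np

# Synthetic regression data: y = 2 + 3*x1 - 1*x2 + noise (invented for illustration)
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(200), rng.normal(size=(200, 2))])  # intercept column + 2 features
y = X @ np.array([2.0, 3.0, -1.0]) + rng.normal(scale=0.1, size=200)

w = rng.normal(size=3)                       # step 1: random initialization
learning_rate = 0.05
for _ in range(2000):                        # step 4: repeat
    # Steps 2-3: gradient of the MSE over the ENTIRE dataset, then update
    grad = (2 / len(X)) * X.T @ (X @ w - y)
    w -= learning_rate * grad

print("w =", w)  # close to [2, 3, -1]
```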
Use Cases
- When the dataset is small enough to fit in memory.
- When a precise solution is required and computational resources are not a concern.
Stochastic Gradient Descent
Concepts
- Stochastic gradient descent (SGD) is a variant of gradient descent that calculates the gradient using a single data point at a time.
- Each update is much cheaper and less memory-intensive than in batch gradient descent.
- Gradient updates are noisier, which can help escape local minima but may prevent exact convergence to the minimum unless the learning rate is decayed.
Basic Algorithm
1. Initialize the variables (weights) with random values.
2. Randomly select a data point from the dataset.
3. Calculate the gradient of the function with respect to each variable, using only the selected data point.
4. Update the variables by subtracting a fraction (the learning rate) of the gradient.
5. Repeat steps 2-4 until convergence or until a maximum number of iterations is reached.
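A minimal sketch of these steps on the same kind of synthetic linear-regression problem, updating the weights from one randomly chosen data point per step; the data and hyperparameters are again invented for illustration.

```python
import numpy as np

# Synthetic regression data as before (invented for illustration)
rng = np.random.default_rng(2)
X = np.column_stack([np.ones(500), rng.normal(size=(500, 2))])
y = X @ np.array([2.0, 3.0, -1.0]) + rng.normal(scale=0.1, size=500)

w = rng.normal(size=3)                    # step 1: random initialization
learning_rate = 0.01
for _ in range(20000):                    # step 5: repeat
    i = rng.integers(len(X))              # step 2: pick one data point at random
    xi, yi = X[i], y[i]
    grad = 2 * (xi @ w - yi) * xi         # step 3: gradient of the squared error on that point
    w -= learning_rate * grad             # step 4: update

print("w =", w)  # noisy, but close to [2, 3, -1]
```

Note that the constant learning rate leaves some noise in the final weights; decaying it over time is the usual way to settle closer to the minimum.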
Use Cases
- When the dataset is too large to fit in memory.
- When a faster, approximate solution is acceptable.
- In online learning, where the model is updated as new data becomes available.