- Published on
Understanding Logistic Regression
- Authors
- Name
- Tat Tran
- @tattran22
Hello, passionate learners! After our deep dive into linear regression, it's time to unravel another critical machine learning algorithm: Logistic Regression. While it has "regression" in its name, it's primarily used for classification tasks. Let's explore!
Introduction: What is Logistic Regression?
Logistic Regression is a supervised learning algorithm used for binary classification problems, i.e., when the output can be of two classes, like "Yes" or "No", "1" or "0".
How is it different from Linear Regression?
Linear regression is used for predicting continuous values, whereas logistic regression predicts the probability that a given instance belongs to a particular category.
The Sigmoid Function
Logistic Regression utilizes the Sigmoid function, which maps any input into a value between 0 and 1.
Where is the base of natural logarithms, and is the input to the function (a linear combination of weights and input features).
Mathematics of Logistic Regression
Given ,
The predicted probability is:
To classify this probability into one of the two classes, a threshold (commonly 0.5) is applied.
Estimating the Coefficients
The process of training in logistic regression involves adjusting the coefficients to maximize the likelihood of observing the given data. The method used is called Maximum Likelihood Estimation (MLE).
Cost Function & Gradient Descent
In logistic regression, the cost function used is the log loss:
To find the coefficients, we aim to minimize this cost function. Gradient Descent, as in linear regression, can be employed here.
Assumptions of Logistic Regression
- Binary Outcome: Response variable is binary.
- Linearity: Each predictor's log odds are a linear combination of its values.
- No Multicollinearity: Little to no multicollinearity among predictors.
- Independence: Observations are independent.
Applications of Logistic Regression
Logistic regression is widely used in fields like medicine (disease diagnosis), finance (credit approval), marketing (customer churn prediction), and many more.
Limitations
- Assumes linearity of independent variables and log odds.
- Requires large sample sizes because maximum likelihood estimates are less powerful at low sample sizes than ordinary least square.
Wrapping Up
Logistic regression is a powerful tool for binary classification tasks. While it has its set of assumptions and limitations, when used wisely, it can be incredibly effective.
Happy learning and exploring!