0
The Median Isn’t the Message - Stephen Jay Gould
When we think of regression, the most common one, which we all know, is linear regression. It is a fairly popular and simple technique for
estimating the mean of some variable conditional on the values of independent variables.

Now imagine if you are a grocery delivery or ride-hailing service and want to show the customer the estimated delivery
or wait times. If the distance is smaller, there will be less variability in the waiting time, but if the distance is
longer, many things can go wrong, and due to that there can be a lot of variability in the estimate time. If we have to
create a model to predict that, we may not want to apply linear regression as that will only tell us the average time.
It’s important to note that one of the key assumptions for applying linear regression is a constant variance
(Homoskedasticity). However, many times this is often not the case. The variability is not constant (Heteroscedastic),
which violates the linear regression assumption (Linear Regression Notes).
Motivation
Let’s look at a running data for the distance vs. the time it takes to finish. We clearly know Continue reading