Suppose the data we have is $(x_1, y_1), \dots, (x_n, y_n)$ and we fit the linear regression model $y = X\beta + \varepsilon$. Assume that the $i$-th observation $(x_i, y_i)$ lies exactly on the fitted line. Would $\hat\beta$ change if we delete $(x_i, y_i)$ and perform linear regression on the remaining observations?
Initially, I thought $\hat\beta$ would not change due to the following reasoning.
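Let $X_{(i)}$ denote the matrix $X$ with the $i$-th row deleted and let $y_{(i)}$ denote the vector $y$ with the $i$-th component deleted. Write $x_i^\top$ for the $i$-th row of $X$. Suppose that $\hat\beta$ is the ordinary least squares estimator of $\beta$ based on the original data set. Then we have:
$$\|y - X\hat\beta\|^2 = \|y_{(i)} - X_{(i)}\hat\beta\|^2 + (y_i - x_i^\top\hat\beta)^2 = \|y_{(i)} - X_{(i)}\hat\beta\|^2,$$
where the last equality follows from the fact that $(x_i, y_i)$ lies on the fitted line.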
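The decomposition above can be checked numerically. The following is only a minimal sketch, assuming NumPy and a small made-up data set that is not from the question: the values were chosen so that the fitted line is $y = 1 + 2x$ and the observation at index `i = 2` has zero residual. The names `X_del`, `y_del`, `sse_full` and `sse_del` are illustrative. Both sums of squares are evaluated at the same $\hat\beta$; nothing is refitted here.

```python
import numpy as np

# Hypothetical data (an assumption for illustration only).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 1.0, 5.0, 9.0, 8.0])
i = 2  # index of the observation that lies on the fitted line

# Design matrix with an intercept column; OLS fit on the full data.
X = np.column_stack([np.ones_like(x), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Residual of the i-th observation at beta_hat.
resid_i = y[i] - X[i] @ beta_hat

# Delete the i-th row of X and the i-th component of y.
X_del = np.delete(X, i, axis=0)
y_del = np.delete(y, i)

# Sums of squared residuals, both evaluated at the SAME beta_hat.
sse_full = np.sum((y - X @ beta_hat) ** 2)
sse_del = np.sum((y_del - X_del @ beta_hat) ** 2)

print("beta_hat:", beta_hat)
print("residual of observation i:", resid_i)
print("full SSE:", sse_full)
print("reduced SSE + residual_i^2:", sse_del + resid_i ** 2)
```

The two printed sums agree up to floating-point error, which is exactly the chain of equalities written above. The check says nothing yet about which $\beta$ minimizes the reduced sum of squares.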
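Then I made a wrong assertion:
$\hat\beta$ not only minimizes the residual sum of squares on the full data, $\|y - X\beta\|^2$, but also minimizes the one on the reduced data, $\|y_{(i)} - X_{(i)}\beta\|^2$. Indeed, the fact that $\hat\beta$ minimizes both $\|y - X\beta\|^2$ and $(y_i - x_i^\top\beta)^2$ does not imply that $\hat\beta$ minimizes $\|y_{(i)} - X_{(i)}\beta\|^2$.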
In general, $\hat\beta$ would change. Consider the data set given by . Then the regression line is simply . (There are multiple ways to check this. A direct one is the geometric approach.) Now, lies on this line. However, its removal will change the regression line dramatically.