**Regression:** In regression analysis, one usually finds the so-called "line of best fit" by minimizing the distance, as measured vertically, between a set of data points and an arbitrary line in the plane. The reason for minimizing the *vertical* distance rather than the *actual* (perpendicular) distance is mostly historical; before the time of computers it was simply easier to calculate the former than the latter. In addition, the second method (minimizing the actual distance) gives two possible answers, the worse of which must then be discarded. Since that might require actual thought, this method has generally been shunned.
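To see where the "two possible answers" come from, here is a short derivation, assuming the data have already been shifted so that their centroid is at the origin and the line has the form $y = mx$. The symbols $S_{xx}$, $S_{yy}$, $S_{xy}$ and $m_{\pm}$ are notation introduced for this sketch only.

$$
S_{xx}=\sum_{i=1}^{n}x_i^{2},\qquad S_{yy}=\sum_{i=1}^{n}y_i^{2},\qquad S_{xy}=\sum_{i=1}^{n}x_i y_i
$$

$$
\text{vertical:}\quad \min_{m}\sum_{i=1}^{n}(y_i-mx_i)^{2}
\;\Longrightarrow\; m=\frac{S_{xy}}{S_{xx}}
$$

$$
\text{actual:}\quad \min_{m}\sum_{i=1}^{n}\frac{(y_i-mx_i)^{2}}{1+m^{2}}
=\min_{m}\frac{S_{yy}-2mS_{xy}+m^{2}S_{xx}}{1+m^{2}}
\;\Longrightarrow\; S_{xy}\,m^{2}+(S_{xx}-S_{yy})\,m-S_{xy}=0
$$

$$
m_{\pm}=\frac{(S_{yy}-S_{xx})\pm\sqrt{(S_{yy}-S_{xx})^{2}+4S_{xy}^{2}}}{2S_{xy}},\qquad m_{+}\,m_{-}=-1
$$

Setting the derivative to zero gives a quadratic in $m$, hence two roots. Their product is $-1$, so the two candidate lines are perpendicular to each other: when $S_{xy}\neq 0$, the root $m_{+}$ gives the minimum and $m_{-}$ gives the maximum, which is the one to discard.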

The following animation compares the two methods by rotating a set of data points about its centroid. Here f is the number of frames in the animation, n is the number of data points, and the (x[i], y[i]) are the data points themselves. Rather than center the animation at the centroid, the data points are instead shifted so that their centroid is at the origin, and then rescaled to fit in the unit disc. The red line is the "standard" regression line obtained by minimizing the sum of the squared vertical distances. The blue and green lines are the lines obtained by minimizing the sum of the squared actual distances; though we have graphed both, in actual practice the non-minimal one (usually the green one) should be discarded.
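The preprocessing described above (center, rescale, rotate) can be sketched in a few lines of Python. This is a standalone illustration, not a translation of the Maple worksheet; the function names `center_and_scale` and `rotate` are made up for this sketch, and the choice of frame angles (odd multiples of π/f) assumes the animation samples its angle range at f evenly spaced values.

```python
import math

def center_and_scale(points):
    """Shift the centroid to the origin, then scale so the farthest point has radius 1."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    shifted = [(x - cx, y - cy) for x, y in points]
    r = max(math.hypot(x, y) for x, y in shifted)
    return [(x / r, y / r) for x, y in shifted]

def rotate(points, k):
    """Rotate every point counterclockwise by the angle k (radians) about the origin."""
    c, s = math.cos(k), math.sin(k)
    return [(x * c - y * s, x * s + y * c) for x, y in points]

# The four data points used in the worksheet.
data = [(8.0, 0.0), (6.0, 1.0), (4.0, 0.0), (2.0, 1.0)]
f = 48  # number of frames

unit = center_and_scale(data)

# One rotated copy of the data per frame (assumed sampling: k = pi/f, 3*pi/f, ...).
frames = [rotate(unit, math.pi * (2 * j + 1) / f) for j in range(f)]
```

Because the rotation is about the origin and the centroid already sits there, every frame keeps the centroid fixed and all points stay inside the unit disc.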

The interesting thing to notice is that, although most of the time the red line is reasonably close to the actual line of best fit (either blue or green), sometimes it is not very close at all. This usually happens when the line of best fit is quite steep, demonstrating visually that standard regression analysis is especially weak when the data involved change rapidly.
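The effect can also be checked numerically. The following Python sketch rotates the worksheet's four data points and measures the angle between the vertical-distance line and the actual-distance line. It is an illustration under stated assumptions: the names `fit_slopes` and `gap_degrees` are hypothetical, the perpendicular-fit slope uses the standard closed form for orthogonal regression (the minimizing root), and the code does not handle the degenerate cases where the cross term `sxy` or the spread `sxx` is zero.

```python
import math

def rotate(points, k):
    """Rotate every point counterclockwise by the angle k (radians) about the origin."""
    c, s = math.cos(k), math.sin(k)
    return [(x * c - y * s, x * s + y * c) for x, y in points]

def fit_slopes(points):
    """Return (vertical-distance slope, actual-distance slope) for the point set."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    ols = sxy / sxx  # minimizes squared vertical distances
    # Minimizing root of sxy*m^2 + (sxx - syy)*m - sxy = 0 (assumes sxy != 0).
    tls = (syy - sxx + math.hypot(syy - sxx, 2 * sxy)) / (2 * sxy)
    return ols, tls

def gap_degrees(points):
    """Angle in degrees (0 to 90) between the two fitted lines."""
    ols, tls = fit_slopes(points)
    d = abs(math.atan(ols) - math.atan(tls))
    return math.degrees(min(d, math.pi - d))

data = [(8.0, 0.0), (6.0, 1.0), (4.0, 0.0), (2.0, 1.0)]  # the worksheet's data
for deg in (10, 45, 85):
    pts = rotate(data, math.radians(deg))
    ols, tls = fit_slopes(pts)
    print(f"rotation {deg:2d} deg: vertical-fit slope {ols:7.3f}, "
          f"actual-fit slope {tls:7.3f}, gap {gap_degrees(pts):5.2f} deg")
```

For this data set the gap between the two lines is a fraction of a degree at a 10 degree rotation and more than ten degrees at 85 degrees, where the line of best fit is steep.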

restart: with(plots):
a:=0: f:=48: n:=4:               # frames, no. of data points
x[1]:=8.: y[1]:=0.:              # data point 1
x[2]:=6.: y[2]:=1.:              # data point 2
x[3]:=4.: y[3]:=0.:              #
x[4]:=2.: y[4]:=1.:              # data point n (=4)
for i from 1 to n do             # shifts data in
  a:=a+x[i]:                     # x-direction to
end do: unassign('i'): a:=a/n:   # center at origin
for i from 1 to n do             #
  x[i]:=x[i]-a:                  #
end do: unassign('i'): a:=0:     #
for i from 1 to n do             # shifts data in
  a:=a+y[i]:                     # y-direction to
end do: unassign('i'): a:=a/n:   # center at origin
for i from 1 to n do             #
  y[i]:=y[i]-a:                  #
end do: unassign('i'): a:=0:     #
for i from 1 to n do
  a:=`if`(evalf(a)<evalf(sqrt(x[i]^2+y[i]^2)),
    sqrt(x[i]^2+y[i]^2),a):
end do: unassign('i'):
for i from 1 to n do             # rescales data to
  x[i]:=x[i]/a:                  # fit in unit disc
  y[i]:=y[i]/a:                  #
end do: unassign('i'):           #
m[0] := k -> -(sum(x[i],i = 1 .. n)*sum(y[i],i = 1 .. n)
  -2*sum(x[i],i = 1 .. n)*sum(y[i],i = 1 .. n)*cos(k)^2
  +sin(k)*sum(y[i],i = 1 .. n)^2*cos(k)
  -cos(k)*sum(x[i],i = 1 .. n)^2*sin(k)
  -sum(x[i]*y[i],i = 1 .. n)*n
  +2*sum(x[i]*y[i],i = 1 .. n)*n*cos(k)^2
  +sin(k)*cos(k)*sum(x[i]^2,i = 1 .. n)*n
  -sin(k)*cos(k)*sum(y[i]^2,i = 1 .. n)*n)
  /(sum(y[i],i = 1 .. n)^2
  -sum(y[i],i = 1 .. n)^2*cos(k)^2
  -2*sin(k)*sum(y[i],i = 1 .. n)*cos(k)*sum(x[i],i = 1 .. n)
  +sum(x[i],i = 1 .. n)^2*cos(k)^2
  +2*cos(k)*sin(k)*sum(x[i]*y[i],i = 1 .. n)*n
  -sum(y[i]^2,i = 1 .. n)*n
  +sum(y[i]^2,i = 1 .. n)*n*cos(k)^2
  -sum(x[i]^2,i = 1 .. n)*n*cos(k)^2):
m[1] := k -> -1/2*(-sum(x[i]^2,i = 1 .. n)*n
  +sum(x[i],i = 1 .. n)^2
  -2*sum(x[i],i = 1 .. n)^2*cos(k)^2
  -4*cos(k)*sin(k)*sum(x[i]*y[i],i = 1 .. n)*n
  +4*sin(k)*sum(x[i],i = 1 .. n)*cos(k)*sum(y[i],i = 1 .. n)
  +sum(y[i]^2,i = 1 .. n)*n
  -2*sum(y[i]^2,i = 1 .. n)*n*cos(k)^2
  -sum(y[i],i = 1 .. n)^2
  +2*sum(y[i],i = 1 .. n)^2*cos(k)^2
  +2*sum(x[i]^2,i = 1 .. n)*n*cos(k)^2
  +sqrt(-8*sum(x[i],i = 1 .. n)*sum(y[i],i = 1 .. n)*sum(x[i]*y[i],i = 1 .. n)*n
  -2*sum(x[i]^2,i = 1 .. n)*n*sum(x[i],i = 1 .. n)^2
  -2*sum(x[i]^2,i = 1 .. n)*n^2*sum(y[i]^2,i = 1 .. n)
  +2*sum(x[i]^2,i = 1 .. n)*n*sum(y[i],i = 1 .. n)^2
  +2*sum(x[i],i = 1 .. n)^2*sum(y[i],i = 1 .. n)^2
  +sum(x[i]^2,i = 1 .. n)^2*n^2
  +sum(y[i]^2,i = 1 .. n)^2*n^2
  -2*sum(y[i]^2,i = 1 .. n)*n*sum(y[i],i = 1 .. n)^2
  +2*sum(x[i],i = 1 .. n)^2*sum(y[i]^2,i = 1 .. n)*n
  +4*sum(x[i]*y[i],i = 1 .. n)^2*n^2
  +sum(y[i],i = 1 .. n)^4
  +sum(x[i],i = 1 .. n)^4))
  /(-sum(x[i]*y[i],i = 1 .. n)*n
  +2*sum(x[i]*y[i],i = 1 .. n)*n*cos(k)^2
  +sum(x[i],i = 1 .. n)*sum(y[i],i = 1 .. n)
  -2*sum(y[i],i = 1 .. n)*sum(x[i],i = 1 .. n)*cos(k)^2
  -cos(k)*sin(k)*sum(y[i]^2,i = 1 .. n)*n
  -cos(k)*sum(x[i],i = 1 .. n)^2*sin(k)
  +sin(k)*sum(y[i],i = 1 .. n)^2*cos(k)
  +cos(k)*sin(k)*sum(x[i]^2,i = 1 .. n)*n):
m[2] := k -> 1/2*(sum(x[i]^2,i = 1 .. n)*n
  -sum(x[i],i = 1 .. n)^2
  +2*sum(x[i],i = 1 .. n)^2*cos(k)^2
  +4*cos(k)*sin(k)*sum(x[i]*y[i],i = 1 .. n)*n
  -4*sin(k)*sum(x[i],i = 1 .. n)*cos(k)*sum(y[i],i = 1 .. n)
  -sum(y[i]^2,i = 1 .. n)*n
  +2*sum(y[i]^2,i = 1 .. n)*n*cos(k)^2
  +sum(y[i],i = 1 .. n)^2
  -2*sum(y[i],i = 1 .. n)^2*cos(k)^2
  -2*sum(x[i]^2,i = 1 .. n)*n*cos(k)^2
  +sqrt(-8*sum(x[i],i = 1 .. n)*sum(y[i],i = 1 .. n)*sum(x[i]*y[i],i = 1 .. n)*n
  -2*sum(x[i]^2,i = 1 .. n)*n*sum(x[i],i = 1 .. n)^2
  -2*sum(x[i]^2,i = 1 .. n)*n^2*sum(y[i]^2,i = 1 .. n)
  +2*sum(x[i]^2,i = 1 .. n)*n*sum(y[i],i = 1 .. n)^2
  +2*sum(x[i],i = 1 .. n)^2*sum(y[i],i = 1 .. n)^2
  +sum(x[i]^2,i = 1 .. n)^2*n^2
  +sum(y[i]^2,i = 1 .. n)^2*n^2
  -2*sum(y[i]^2,i = 1 .. n)*n*sum(y[i],i = 1 .. n)^2
  +2*sum(x[i],i = 1 .. n)^2*sum(y[i]^2,i = 1 .. n)*n
  +4*sum(x[i]*y[i],i = 1 .. n)^2*n^2
  +sum(y[i],i = 1 .. n)^4
  +sum(x[i],i = 1 .. n)^4))
  /(-sum(x[i]*y[i],i = 1 .. n)*n
  +2*sum(x[i]*y[i],i = 1 .. n)*n*cos(k)^2
  +sum(x[i],i = 1 .. n)*sum(y[i],i = 1 .. n)
  -2*sum(y[i],i = 1 .. n)*sum(x[i],i = 1 .. n)*cos(k)^2
  -cos(k)*sin(k)*sum(y[i]^2,i = 1 .. n)*n
  -cos(k)*sum(x[i],i = 1 .. n)^2*sin(k)
  +sin(k)*sum(y[i],i = 1 .. n)^2*cos(k)
  +cos(k)*sin(k)*sum(x[i]^2,i = 1 .. n)*n):
an0:=animate(m[0](k)*x,x=-1.1..1.1,k=Pi/f..Pi*(2*f-1)/f,
  frames=f,color=red,thickness=2):
an1:=animate(m[1](k)*x,x=-1.1..1.1,k=Pi/f..Pi*(2*f-1)/f,
  frames=f,color=green,thickness=2):
an2:=animate(m[2](k)*x,x=-1.1..1.1,k=Pi/f..Pi*(2*f-1)/f,
  frames=f,color=blue,thickness=2):
for i from 1 to n do
  an3:=animate([x[i]*cos(k)-y[i]*sin(k)+.01*cos(t),
    x[i]*sin(k)+y[i]*cos(k)+.01*sin(t),t=0..2*Pi],
    k=Pi/f..Pi*(2*f-1)/f,frames=f,color=black,
    thickness=2):
  an2:=display(an2,an3):
end do:
display(an2,an0,an1,view=[-1.1..1.1,-1.1..1.1],axes=none,
  scaling=constrained);