Unit 6: Matrix Algebra
Chapter 34: Matrix Factorizations
Section 34.6: minimum-length least-squares solutions, and the pseudoinverse
Copyright
Copyright * 2001 by Addison Wesley Longman, Inc.
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of the publisher. Printed in the United States of America.
>
Initializations
> restart;
>
with(linalg):
with(plots):
with(plottools):
read(`C:/Program Files/Maple 6.01/pvac.txt`):
Warning, the protected names norm and trace have been redefined and unprotected
Warning, the name changecoords has been redefined
> alias(c[1]=_t[1],c[2]=_t[2]);
> f := rand(-9..9):
>
Least Squares Problem: Formulation
In this section we again solve a least squares problem as an example of an over-determined system. However, we will look at the general case where the rank of the coefficient matrix is less than maximal. Since the system is over-determined, there are more rows than columns in the system matrix, that is,
. If the rank of the system matrix is
, the columns of the matrix are not independent. (In Section 33.6 we solved the least-squares problem for a system whose matrix was of maximal rank, that is, whose columns were linearly independent.)
The system to be solved is A x = y , where A , the rank-2 matrix used in Example 34.7 in Section 34.5, is listed in Table 34.4, along with the vector y , and rref([ A , y ]), the reduced row-echelon form of [ A , y ].
,
y
=
, rref([
A
,
y
]) =
__________________________________________________________________________________________
Table 34.4
To re-construct A , execute the following Maple commands
>
_seed := 125:
a1 := randmatrix(5,2,entries=f):
a2 := randmatrix(2,4,entries=f):
A := evalm(a1 &* a2);
>
To generate the vector y , execute the following Maple commands.
>
_seed := 718377433241:
y := randvector(5,entries=f);
>
The matrix A indeed has rank 2, as we saw in Section 34.5.
> rank(A);
>
Least Squares Problem: Solution
The third row of rref([
A
,
y
]) given in Table 34.4 indicates an inconsistent equation
, so we know there is no solution to the over-determined system of linear
A
x
=
y
. Hence, we seek a least squares "best fit" solution.
The normal equations are given by
x
=
y
, where
,
b
=
y
, and the solution
x
=
X
, are
,
b
=
,
X
=
The least-squares solution
X
contains two free parameters because
, like
A
, has rank 2. There are only two independent equations in the set of four normal equations. Note, however, that the four normal equations are now consistent, but under-determined. There is a solution, but not a unique solution.
We now write the solution as
X
=
v
+
v
+
v
where
v
, are given by
v
=
,
v
=
,
v
=
The vectors
v
and
v
are in the null space of
A
because
A
v
=
0
,
.
The vector
v
is
not
orthogonal to
v
and
v
because
v
.
v
=
and
v
.
v
=
Thus,
v
is not orthogonal to the null space, so
v
is not in the row space of
A
. The vector
v
has a component in the row space of
A
, but it also has a component in the null space of
A
.
Before we move on to the next conceptual segment of our discussion, we pause to implement these calculations in Maple.
There are several ways to approach solving the linear system A x = y . For example, Maple's linsolve command can be used. Here, the outcome is
> linsolve(A,y);
>
The "null" response suggests there is no solution to the set of equations. We can verify that the equations are inconsistent by row-reducing the augmented matrix [ A , y ] to its reduced row-echelon form.
> rref(augment(A,y));
>
The third row indicates an inconsistent equation 0 = 1, so we know there is no solution to this over-determined system of linear equations. That is why we will seek a least squares "best fit" solution.
First, we try creating and solving the normal equations
A
A
x
=
A
y
.
Since
A
is 5
x
4,
A
A
is the 4
x
4 matrix
> ATA := evalm(transpose(A) &*A);
>
and the vector
b
=
A
y
, of dimension (4
x
5)(5
x
1) = 4
x
1, is
> b := evalm(transpose(A) &* y);
>
Hence, the normal equations form a "square" system for which linsolve yields the solution
> X := linsolve(ATA,b);
>
The solution of the normal equations contains two free parameters. The system is 4
x
4, and like
A
, the rank of
A
A
is also
of rank two, as seen by
> rank(ATA);
>
That is why there are two free parameters. There are only two independent equations in the set of four normal equations. Note, however, that the four normal equations are now consistent, but under-determined. There is a solution, but not a unique solution.
Next, we split the solution
X
into a vector
v
, and vectors
v
and
v
in the null space of
A
. The splitting is accomplished by first setting
= 0, thereby suppressing the part of the solution lying in the null space of
A
. Consequently, what's then left must be
v
.
> v[1] := subs(c[1]=0,c[2]=0,op(X));
>
Then, after subtracting out
v
from
X
, obtaining
> vv := evalm(X-v[1]);
>
we alternatively set either
or
to zero in the difference to extract
v
and
v
, the individual vectors spanning the null space of
A
.
>
v[2] := subs(c[1]=1,c[2]=0,op(vv));
v[3] := subs(c[1]=0,c[2]=1,op(vv));
>
To demonstrate that
v
and
v
are in the null space of
A
, show
A
v
=
0
and
A
v
=
0
. Hence,
> print(evalm(A &* v[2]),evalm(A &* v[3]));
>
Next, show that the same least-squares solution is generated by applying Maple's leastsqrs command to the original system A x = y .
>
Z := leastsqrs(A,y):
print(Z,X);
>
Observe, if you will, that the vector
v
is
not
orthogonal to
v
and
v
. Thus,
v
is not orthogonal to the null space, so
v
is
not
in the row space of
A
. The vector
v
has a component in the row space of
A
, but it also has a component in the null space of
A
.
>
dotprod(v[1],v[2]);
dotprod(v[1],v[3]);
>
Minimum Length Solution
When A has less than maximal rank, the solution to the least-squares problem is not unique. One way to select a unique solution from the resulting family of least-squares solutions is to pick the least-squares solution of minimum length. Therefore, minimize
|| X ||
=
that is,
> q := simplify(norm(X,2),symbolic)^2;
>
with respect to the two parameters
and
. The usual techniques of multivariable calculus, namely, equating partial derivatives to zero and solving simultaneously, yield
,
, and
X
=
that is, using the calculus, we obtain
> qq := solve({diff(q,c[1]),diff(q,c[2])},{c[1],c[2]});
>
and substituting the optimizing values of
and
X
to determine the least-squares solution of minimal length. Texts typically call this vector
X
which, in Maple, we will call
> Xplus := subs(qq,op(X));
>
Adjusting the component of
X
in the null space of
A
to minimize the length of
X
is equivalent to projecting
X
onto the row space of
A
. But the only part of
X
having a component in the row space of
A
is
v
. Hence, the minimization process we just went through is nothing more than finding the projection of
v
onto the row space of
A
. We can demonstrate this claim by forming
P
', a matrix which projects onto the row space of
A
. (Distinguish between
P
' and
P
, the matrix which projects vectors in
R
=
R
onto the column space of
A
.) If the columns of the matrix
C
are a basis for the row space of
A
, then
,
P
' =
C
(
C
C
)
C
=
and
P
'
X
=
P
'
v
=
X
. Figure 34.3 shows the relationships between the vectors and the four fundamental subspaces.
>
g1 := plot([[-1,4],[1,4],[1,2],[-1,2],[-1,4]],color=black):
g2 := plot([[1,2],[2.5,2],[2.5,0],[1,0],[1,2]],color=black):
g3 := plot([[3,4],[4.5,4],[4.5,2],[3,2],[3,4]],color=black):
g4 := plot([[4.5,2],[6,2],[6,0],[4.5,0],[4.5,2]],color=black):
g5 := arrow([1,3],[3,3],.05,.2,.1,color=black):
g6 := arrow([2.5,.5],[4.5,2],.05,.2,.1,color=black):
g7 := textplot({[0,4.2,`R`],[3.75,4.2,`R`]},font=[TIMES,ROMAN,12]):
g8 := textplot({[.15,4.4,`4`],[3.9,4.4,`5`]},font=[TIMES,ROMAN,8]):
g9 := textplot({[.7,3.8,`Row space of A`],[.65,3.5,`dimension = 2`],[4,3.8,`Column`],[4.2,3.6,`space of A`],[4.2,3.3,`dimension`],[4,3.05,`= 2 `],[2.25,.5,`Null space`],[1.9,.2,`of A`],[2.5,-.2,`dimension = 2`]},font=[TIMES,ROMAN,8], align=LEFT):
g10 := textplot({[4.8,.6,`Null space`],[5,.2,`of A`],[5.4,.3,`T`],[4.58,-.2,`dimension = 3`]},font=[TIMES,ROMAN,8],align=RIGHT):
g11 := arrow([1,2],[-.6,2.6],.05,.2,.1,color=yellow):
g12 := arrow([1,2],[1.3,1],.05,.2,.1,color=red):
g13 := arrow([1,2],[1.7,1.4],.05,.2,.1,color=red):
g14 := arrow([1,2],[2,2.5],.05,.2,.1,color=green):
g15 := arrow([1,2],[1.6,3.7],.05,.2,.1,color=black):
g16 := arrow([4.5,2],[3.3,2.6],.05,.2,.1,color=magenta):
g17 := arrow([4.5,2],[4.9,3.7],.05,.2,.1,color=blue):
g18 := arrow([4.5,2],[4.8,1],.05,.2,.1,color=orange):
g19 := textplot({[-.4,2.9,`X`],[1.85,3.6,`X`], [2.1,2.5,`v`], [1.8,1.6,`v`],[1.5,1,`v`],[5.2,3.6,`y`],[5,1.25,`y`],[3.9,2.7,`y`], [.5,2.9,`v`]}, font=[TIMES,BOLD,10]):
g20 := textplot({[2.2,2.4,`1`],[.6,2.8,`1`],[1.93,1.5,`2`],[1.63,.9,`3`], [-.18,3,`+`]}, font=[TIMES,ROMAN,8]):
g21 := textplot({[.15,2.9,`= P'`],[3.7,2.7,`P`]},font=[TIMES,ROMAN,10]):
g22 := textplot([5.1,1.1,`_|_`],font=[TIMES,ROMAN,5]):
g23 := textplot([5.45,1.1,`(P )`],font=[TIMES,ROMAN,6]):
g24 := textplot([5.5,1.1,`y`],font=[TIMES,BOLD,6]):
display([g||(1..24)],scaling=constrained, axes=none);
>
The basis for the row space of
can be obtained Maple as
> rsA := rowspace(A);
>
As in Section 34.5, the order of the vectors is not fixed. To be sure our calculations here agree with what is written in the text, we note that the vector with the larger 1-norm is on the left in the matrix
in the text. Hence, we for
> C := augment(op(sort([op(rsA)], (a,b)->is(norm(a,1)>norm(b,1)))));
>
From
C
, form
P
' =
C
(
C
C
)
C
, the matrix which projects vectors onto the subspace spanned by the columns of
C
, that is, onto the row space of
A
.
> `P'` := evalm(C &* inverse(transpose(C) &* C) &* transpose(C));
>
The matrix
P
' should project
v
and
X
(the general least-squares solution) onto the row space of
A
. This projection should be
X
, the least-squares solution of shortest length. Thus, the vectors
P
'
v
,
P
'
X
, and
X
should all be the same, as the following calculuations confirm.
> print(evalm(`P'` &* v[1]), evalm(`P'` &* X), Xplus);
>
Pseudoinverse
The over-determined system we are solving is
A
x
=
y
. There is no solution - the equations are inconsistent - because
y
is not in the column space of
A
. We settled for a least-squares solution
X
which was not unique. To extract a unique solution from the two-parameter family of solutions contained in
X
, we found
X
, the least-squares solution of minimum length, and confirmed that this unique solution resides in the row space of
A
, and is the projection of
X
onto that row space.
If the system
A
x
=
y
were a square system of maximal rank, there would be an inverse matrix
A
, multiplication by which would yield the solution
x
=
A
y
via
A
A
x
=
A
y
=
x
For the over-determined case, we seek a matrix
A
with which
y
could be multiplied so as to yield the minimum-length least-squares solution
X
=
A
y
. Such a matrix will be called a
pseudoinverse
, and we now give two prescriptions for constructing such a matrix.
>
Pseudoinverse by Curious Formula
We seek a matrix, the application of which to
y
, yields the minimum-norm least-squares solution
X
. The matrix
A
can be computed by the formula
discussed briefly in [1]. When applied to the matrix A in Table 34.4, it yields
which, because Maple interprets
evalm
(
A
A
+
) as
A
A
+
I
, we can implement with
> Aplus := map(limit,evalm(inverse(transpose(A) &* A + s^2) &* transpose(A)), s=0);
>
A calculation then shows
X
= A
y
, that is,
> print(Xplus, evalm(Aplus &* y));
>
Moreover, we can also show the rank of
is 2, via
> rank(Aplus);
>
Subsequently, we will show how
is related to
, the singular value decomposition of
A
.
The matrix
A
behaves as an inverse on the column space of
A
, acting as follows. Vectors in
R
=
R
are projected onto the column space of
A
, and those projections are "pulled back" to pre-images in
R
=
R
. These pre-images are then projected onto the row space of
A
. A vector
y
in
R
=
R
is projected onto the two-dimensional column space of
A
, and that projection is inverted to a pre-image in
R
=
R
which is not unique. By projecting this pre-image onto the row space of
A
, the unique, shortest least-squares solution,
X
= A
y
is obtained.
>
Pseudoinverse from SVD
If the singular value decomposition of
A
is given by
, the pseudoinverse is given by
where
is the transpose of that version of
in which eqch nonzero singular value has been replaced by its reciprocal. Since
, like
A
, has dimensions
x
= 5
x
4, then
has dimensions
x
= 4
x
5. For our example, we have
and a calculation shows that
.
To implement such a calculation in Maple, we must have access to the matrices
, and
. To this end, we have copied and pasted the expressions for these matrices. Thus, the matrix
is given by
> V := matrix([[739/2*(3277+23*204983^(1/2))/(52054667238495+80926940385*204983^(1/2))^(1/2),-739/2*(-3277+23*204983^(1/2))/(52054667238495-80926940385*204983^(1/2))^(1/2),0,1/910*910^(1/2)*237^(1/2)], [-1478*(2833+5*204983^(1/2))/ (52054667238495+80926940385*204983^(1/2))^(1/2), 1478*(-2833+5*204983^(1/2))/(52054667238495-80926940385*204983^(1/2))^(1/2), 1/1185*1185^(1/2), 2/1659*910^(1/2)*237^(1/2)], [1638363/(52054667238495+80926940385*204983^(1/2))^(1/2), 1638363/(52054667238495-80926940385*204983^(1/2))^(1/2), 28/1185*1185^(1/2), 17/21567*910^(1/2)*237^(1/2)], [739/2*(-5641+204983^(1/2))/(52054667238495+80926940385*204983^(1/2))^(1/2),-739/2*(5641+204983^(1/2))/(52054667238495-80926940385*204983^(1/2))^(1/2), 4/237*1185^(1/2), -251/215670*910^(1/2)*237^(1/2)]]);
>
the matris
is given by
> U := matrix([[232785*(2833+5*204983^(1/2))/(20950+30*204983^(1/2))^(1/2)/(52054667238495+80926940385*204983^(1/2))^(1/2), -232785*(-2833+5*204983^(1/2))/(20950-30*204983^(1/2))^(1/2)/(52054667238495-80926940385*204983^(1/2))^(1/2), 8/7273*7273^(1/2), -693/1576163*1517^(1/2)*1039^(1/2), 468/3029449*1997^(1/2)*1517^(1/2)], [3695*(114083+181*204983^(1/2))/(20950+30*204983^(1/2))^(1/2)/(52054667238495+80926940385*204983^(1/2))^(1/2), -3695*(-114083+181*204983^(1/2))/(20950-30*204983^(1/2))^(1/2)/(52054667238495-80926940385*204983^(1/2))^(1/2), 0, 1/1517*1517^(1/2)*1039^(1/2), 244/3029449*1997^(1/2)*1517^(1/2)], [29560*(4903+29*204983^(1/2))/(20950+30*204983^(1/2))^(1/2)/(52054667238495+80926940385*204983^(1/2))^(1/2), -29560*(-4903+29*204983^(1/2))/(20950-30*204983^(1/2))^(1/2)/(52054667238495-80926940385*204983^(1/2))^(1/2), 45/7273*7273^(1/2), 128/1576163*1517^(1/2)*1039^(1/2), 544/3029449*1997^(1/2)*1517^(1/2)], [-29560*(12367+32*204983^(1/2))/(20950+30*204983^(1/2))^(1/2)/(52054667238495+80926940385*204983^(1/2))^(1/2), 29560*(-12367+32*204983^(1/2))/(20950-30*204983^(1/2))^(1/2)/(52054667238495-80926940385*204983^(1/2))^(1/2), 0, 0, 1/1997*1997^(1/2)*1517^(1/2)], [22170*(7391+30*204983^(1/2))/(20950+30*204983^(1/2))^(1/2)/(52054667238495+80926940385*204983^(1/2))^(1/2), -22170*(-7391+30*204983^(1/2))/(20950-30*204983^(1/2))^(1/2)/(52054667238495-80926940385*204983^(1/2))^(1/2), -72/7273*7273^(1/2), 3/1576163*1517^(1/2)*1039^(1/2), 392/3029449*1997^(1/2)*1517^(1/2)]]);
>
and the matrix
is given by
> Sigma := matrix([[(20950+30*204983^(1/2))^(1/2), 0, 0, 0], [0, (20950-30*204983^(1/2))^(1/2), 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]);
>
Just to be sure the factors are correctly reproduced, test that
A
=
U
V
.
> print(A, map(radnormal,evalm(U &* Sigma &* transpose(V))));
>
The factors U and V are the important ones; we also need the singular values themselves, so we again obtain them exactly with the singularvals command.
> sv := singularvals(A);
>
For our example, the simplest way to create
is directly.
> SigmaPlus := extend(diag(1/sv[4],1/sv[5]),2,3,0);
>
Next, compute the product
V
U
and verify that it reproduces
A
.
>
map(rationalize,evalm(V &* SigmaPlus &* transpose(U)));
lprint(`and from before, we had`);
print(Aplus);
`and from before, we had`
>
As predicted, the pseudoinverse
can be obtained from the factors in the singular value decomposition of
A
.
>
Numeric SVD and the Pseudoinverse
In practice, the singular value decomposition is computed numerically, not exactly. We therefore consider how
A
might be assembled from the factors
and
which were obtained in floating-point form in Section 34.5. If
, a numerically computed version of
is obtained from the singular values
> SV := evalf(Svd(A,Uf,Vf));
>
the reciprocals which appear in
would include a term with magnitude
. Clearly, the numerically determined singular values must be filtered through a threshold barrier to eliminate values which are probably zero. If this is done with a threshold of
as
is constructed from
, then
and if
, then
which agrees to 8 significant figures with
expressed in floating-point form. Moreover,
X
, the floating-point form of the exact solution, and
y
, the minimum-norm least-squares solution computed entirely in floating-point arithmetic, appear below, and can be seen to agree to 8 significant figures.
X
=
,
y
=
To execute these calculations in Maple, we first note that
is given by
> Sigmaf := matrix(5,4,(i,j) -> if i=j then SV[i] else 0 fi);
>
Setting the threshold as
, we build
directly as
> SigmaPlusf := transpose(matrix(5,4, (i,j) -> if i=j and Sigmaf[i,i]>10^(-10) then 1/Sigmaf[i,i] else 0 fi));
>
We next build
as
> Aplusf := evalm(Vf &* SigmaPlusf &* transpose(Uf));
>
Floating the exact version of
A
we find
> evalf(op(Aplus));
>
Finally, we compute
X
by converting
X
to floating-point form, and we compare it to
y
, as follows.
> print(evalf(op(Xplus)),evalm(Aplusf &* evalf(op(y))));
>
References
[1] Richard Pavelle, editor, Applications of Computer Algebra, Kluwer Academic Publishers, 1985.
>