Section 39-06.mws

Unit 8: Numerical Methods

Chapter 39: Systems of Equations

Section 39.6: relaxation and SOR

Copyright

Copyright * 2001 by Addison Wesley Longman, Inc.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of the publisher. Printed in the United States of America.

Initializations

> restart;

> with(linalg):
with(plots):
read(`C:/Program Files/Maple 6.01/pvac.txt`):

Warning, the protected names norm and trace have been redefined and unprotected

Warning, the name changecoords has been redefined

>

Relaxation - the Concept

The idea of relaxation has been attributed to the British engineer, Richard Southwell, [1], although some claim that Gauss was familiar with the concept. Conceived in the "by-hand" era before computers, the idea underlies the more modern method of successive overrelaxation (SOR). Our exposition will be in terms of the linear system A x = y for which the matrix A is the strictly diagonally dominant

> A := matrix(3,3,[-12,-1,1,1,13,7,3,4,8]);

A := MATRIX([[-12, -1, 1], [1, 13, 7], [3, 4, 8]])

>

and for which the vector y is

> y := vector([-11,48,35]);

y := MATRIX([[-11], [48], [35]])

>

Then, the solution is

> xe := linsolve(A,y);

xe := MATRIX([[1], [2], [3]])

>

the system variables are

> vars := [seq(x[k],k=1..3)];

vars := [x[1], x[2], x[3]]

>

and the equations of the system are

> for k from 1 to 3 do
q||k := evalm(A &* vars)[k] = y[k];
od;

q1 := -12*x[1]-x[2]+x[3] = -11

q2 := x[1]+13*x[2]+7*x[3] = 48

q3 := 3*x[1]+4*x[2]+8*x[3] = 35

>

The residual vector for this system would be R = A x ``-`` y , which we make into a function in Maple via

> R := map(unapply,evalm(A&*vars - y),op(vars));

R := MATRIX([[proc (x_1, x_2, x_3) options operator...

>

Clearly, R would be zero if x = x ``[e] . Indeed, we find

> R(op(convert(xe,list)));

MATRIX([[0], [0], [0]])

>

If we start an iteration process at x ``[0] = 0 , the residual vector would initially be R = ``-`` y , in which the largest absolute value is 48, occurring in the second equation. In fact, we see that

> R(0,0,0);

MATRIX([[11], [-48], [-35]])

>

If we could make the largest residual zero, that is, if we could relax the largest residual to zero , we should be led to a more accurate solution. Our strategy, then, is to keep x[1] = x[3] = 0, and adjust x[2] to make the second component of R become zero. We would be aided in this task if we solved each equation for the variable with the coefficient of greatest magnitude in that equation, obtaining

x[1] = (-x[2]+x[3]+11)/12

x[2] = (-x[1]-7*x[3]+48)/13

x[3] = (-3*x[1]-4*x[2]+35)/8

which we implement as functions in Maple via

> for k from 1 to 3 do
X[k] := unapply(solve(q||k,x[k]),op(remove(has,vars,k)));
od;

X[1] := proc (x_2, x_3) options operator, arrow; -1...

X[2] := proc (x_1, x_3) options operator, arrow; -1...

X[3] := proc (x_1, x_2) options operator, arrow; -3...

>

So far, the astute reader will recognize the fixed-point iteration formula x = g ( x ). However, the actual iteration we do will be different.

The value of x[2] which drives the second component of R to zero is

> X[2](0,0);

48/13

>

If x[2] is set equal to this value while keeping x[1] = x[3] = 0, then the next member of the sequence of iterates is

x ``[1] = (0, x[2](0,0) , 0) = ``(0,48/13,0)

which we represent in Maple as the list

> z := [0,X[2](0,0),0];

z := [0, 48/13, 0]

>

Since the vector x has changed, the residual vector R will change to

> `R` = evalf(R(op(z)));

R = MATRIX([[7.307692308], [0.], [-20.23076923]])

>

Now, the third component of the residual vector has the largest magnitude, so x[3] must be varied to bring this component of the residual to zero. The appropriate value of x[3] is computed by

> new := X[3](z[1],z[2]);

new := 263/104

>

so x ``[2] , the next iterate, is given by

> z := evalf([z[1],z[2],new]);

z := [0., 3.692307692, 2.528846154]

>

The residual generated by x ``[2] is then

> `R` = R(op(z));

R = MATRIX([[9.836538462], [17.70192308], [0.]])

>

The second component of R has the largest magnitude, so x[2] is adjusted to bring this component of the residual to zero. This is done by giving x[2] the value

> new := X[2](z[1],z[3]);

new := 2.330621301

>

so x ``[3] , the next iterate is given by

> z := [z[1],new,z[3]];

z := [0., 2.330621301, 2.528846154]

>

The residual generated by x ``[3] is then

> `R` = R(op(z));

R = MATRIX([[11.19822485], [-.1e-7], [-5.44674557]]...

>

The first component of the residual now has the largest magnitude, so x[1] must be adjusted to bring this component of R to zero. This is done by assigning x[1] the value

> new := X[1](z[2],z[3]);

new := .9331854045

>

so x ``[4] , the next iterate, is given by

> z := [new,z[2],z[3]];

z := [.9331854045, 2.330621301, 2.528846154]

>

The residual generated by x ``[4] is then

> `R` = R(op(z));

R = MATRIX([[0.], [.93318539], [-2.64718935]])

>

The component of R with the largest absolute value is the third, so x[3] must be adjusted to bring this component of the residual to zero. This is done by assigning x[3] the value

> new := X[3](z[1],z[2]);

new := 2.859744823

>

so x ``[5] , the next iterate, is given by

> z := [z[1],z[2],new];

z := [.9331854045, 2.330621301, 2.859744823]

>

The residual generated by x ``[5] is then

> `R` = R(op(z));

R = MATRIX([[.33089867], [3.24947607], [0.]])

>

The second component of R now has the largest magnitude, so x[2] must be adjusted to bring this component of the residual vector to zero. This is done by assigning x[2] the value

> new := X[2](z[1],z[3]);

new := 2.080661602

>

so that x ``[6] , the next iterate, is given by

> z := [z[1],new,z[3]];

z := [.9331854045, 2.080661602, 2.859744823]

>

The residual generated by x ``[6] is then

> `R` = R(op(z));

R = MATRIX([[.58085837], [-.1e-7], [-.99983880]])

>

in which the third component has the largest magnitude. We therefore drive this third component of R to zero by assigning to the variable x[3] the value

> new := X[3](z[1],z[2]);

new := 2.984724672

>

giving us as the next iterate the vector x ``[7] , represented in Maple as

> z := [z[1],z[2],new];

z := [.9331854045, 2.080661602, 2.984724672]

>

The residual vector generated by x ``[7] is then

> `R` = R(op(z));

R = MATRIX([[.70583822], [.87485893], [0.]])

>

The second component of R now has the largest magnitude, so it is driven to zero by assigning to x[2] the value

> new := X[2](z[1],z[3]);

new := 2.013364760

>

This gives x ``[8] , the next iterate, as

> z:=[z[1],new,z[3]];

z := [.9331854045, 2.013364760, 2.984724672]

>

and the iteration continues until it converges to the exact solution

> print(xe);

MATRIX([[1], [2], [3]])

>

Although it can become tedious, selecting the component with the largest magnitude in the residual vector is something which can be done "by-hand" for a system of modest size. Clearly, having to pick, repeatedly, the largest number from a list of 10,000 would be highly susceptible to error. Moreover, having a computer make the many comparisons required to select the largest magnitude is not efficient, and this version of the relaxation algorithm is not a viable method for solving large problems on a computer.

However, the underlying concept does motivate the method of successive overrelaxation, the SOR technique, which is capable of being implemented on a computer.

>

Successive Overrelaxation

Southwell discovered that often, when driving a component of the residual to zero, the other components would move in the same direction. For example, the residual generated by x ``[1] is

R = MATRIX([[7.3], [0], [-20]])

The third component is driven to zero by increasing it from -20 to 0. The residual generated by x ``[2] is

R = MATRIX([[9.8], [17.7], [0]])

Driving the third component of the residual upwards from -20 to 0 also drove the first component upwards from 7.3 to 9.8, and the second component upwards from 0 to 17.7.

This led Southwell to "overcompensate" or overrelax , pushing the residual, not to zero, but beyond zero. All that is needed is a more efficient way of implementing the calculations on a computer. Formulating the system A x = y as if for performing Gauss-Seidel iteration is the basis for this efficiency.

Assuming that the k th equation has been solved for x[k] , we write the linear system in the form

x[k](m+1) = x[k](m)+1/a[kk] ``(y[k]-Sum(a[ks]*x[s](m),s = 1 .. n))

as in the Jacobi method. Then, recognizing that the expression in parentheses is actually the k th component of the residual vector R , we write

x[k](m+1) = x[k](m)+1/a[kk] R ``[m]

By introducing omega , the relaxation factor, this equation is modified to

x[k](m+1) = x[k](m)+omega/a[kk] R ``[m]

Finally, write the general equation in the form

x[k](m+1) = (1-omega)*x[k](m)+omega/a[kk] ``(y[k]-Sum(a[ks]*x[s](m+1),s = 1 .. k-1)-Sum(a[ks]...

which is now consonant with the Gauss-Seidel method, except for the introduction of omega , the relaxation factor. Clearly, if omega = 1 , the SOR technique is precisely the Gauss-Seidel method. If A is a symmetric positive definite matrix and 0 < omega `` < 2 , then the SOR technique converges from any initial vector. (See [2] for a proof.)

>

Implementation

Implementing the SOR method for small systems is more conveniently done in matrix form. To accomplish this, write the linear system A x = y in the form

y ``-`` A x = 0

and multiply though by omega , giving

0 = omega ( y ``-`` A x)

Write A = L + D + U , where L is zero on and above the main diagonal, U is zero on and below the main diagonal, and D is zero off the main diagonal. In addition, add D x to each side of the equation, thereby obtaining

D x = D x -omega ( L + D + U ) x + omega y

Bring the term -omega L x to the left, giving

( D + omega L ) x = [ ``(1-omega) D -omega U ] x + omega y

Let the vector x on the left side be the updated one, namely, x ``(m+1) , while keeping the x on the right side as the "old" one, namely, x ``(m) . This gives

( D + omega L ) x ``(m+1) = [ ``(1-omega) D -omega U ] x ``(m) + omega y

Solve for x ``(m+1) to obtain

x ``(m+1) = ( D + omega L) ``^(-1) [ ``(1-omega) D -omega U ] x ``(m) + omega ( D + omega L ) ``^(-1) y

which has the fixed-point form

x ``(m+1) = B x ``(m) + C = g ( x ``(m) )

where

B = ( D + omega L) ``^(-1) [ ``(1-omega) D -omega U ]

and

C = omega ( D + omega L ) ``^(-1) y

>

Example 39.12

Consider the linear system A x = y for which A is the symmetric, positive definite matrix



> A := matrix(3,3,[8,-5,-7,-5,14,1,-7,1,8]);

A := MATRIX([[8, -5, -7], [-5, 14, 1], [-7, 1, 8]])...

>

and y is the vector

> y := vector([-23,26,19]);

y := MATRIX([[-23], [26], [19]])

>

We see that A is symmetric. That it is positive definite also, we see from

> definite(A,positive_def);

true

>

or from the actual eigenvalues

> eigenvals(evalf(op(A)));

.3556300699, 10.67012566, 18.97424427

>

which are clearly all positive.

the exact solution is

> xe := linsolve(A,y);

xe := MATRIX([[1], [2], [3]])

>

Decompose A as L + D + U , where L , D , and U are given by

> L := matrix(3,3, (i,j) -> if i>j then A[i,j] else 0 fi);
U := matrix(3,3, (i,j) -> if i<j then A[i,j] else 0 fi);
d := diag(seq(A[k,k],k=1..3));

L := MATRIX([[0, 0, 0], [-5, 0, 0], [-7, 1, 0]])

U := MATRIX([[0, -5, -7], [0, 0, 1], [0, 0, 0]])

d := MATRIX([[8, 0, 0], [0, 14, 0], [0, 0, 8]])

>

Notice that we have used d in place of D , which in Maple is reserved for the differentiation operator.

The matrix B = ( D + omega L ) ``^(-1) [ ``(1-omega) D -omega U ] is then

> B := map(simplify,evalm(inverse(d+omega*L)&*((1-omega)*d-omega*U)));

B := MATRIX([[1-omega, 5/8*omega, 7/8*omega], [-5/1...

>

and the vector C = omega ( D + omega L) ``^(-1) y is then

> C := evalm(omega*inverse(d+omega*L)&*y);

C := MATRIX([[-23/8*omega], [omega*(-115/112*omega+...

>

The eigenvalues of B determine the rate of convergence of the SOR iteration. In terms of omega , they are

> q := eigenvals(B);

q := 1/2688*(-758661120*omega^3+4360511232*omega^4-...
q := 1/2688*(-758661120*omega^3+4360511232*omega^4-...
q := 1/2688*(-758661120*omega^3+4360511232*omega^4-...
q := 1/2688*(-758661120*omega^3+4360511232*omega^4-...
q := 1/2688*(-758661120*omega^3+4360511232*omega^4-...
q := 1/2688*(-758661120*omega^3+4360511232*omega^4-...
q := 1/2688*(-758661120*omega^3+4360511232*omega^4-...
q := 1/2688*(-758661120*omega^3+4360511232*omega^4-...
q := 1/2688*(-758661120*omega^3+4360511232*omega^4-...
q := 1/2688*(-758661120*omega^3+4360511232*omega^4-...
q := 1/2688*(-758661120*omega^3+4360511232*omega^4-...
q := 1/2688*(-758661120*omega^3+4360511232*omega^4-...
q := 1/2688*(-758661120*omega^3+4360511232*omega^4-...
q := 1/2688*(-758661120*omega^3+4360511232*omega^4-...
q := 1/2688*(-758661120*omega^3+4360511232*omega^4-...
q := 1/2688*(-758661120*omega^3+4360511232*omega^4-...
q := 1/2688*(-758661120*omega^3+4360511232*omega^4-...
q := 1/2688*(-758661120*omega^3+4360511232*omega^4-...
q := 1/2688*(-758661120*omega^3+4360511232*omega^4-...
q := 1/2688*(-758661120*omega^3+4360511232*omega^4-...
q := 1/2688*(-758661120*omega^3+4360511232*omega^4-...
q := 1/2688*(-758661120*omega^3+4360511232*omega^4-...
q := 1/2688*(-758661120*omega^3+4360511232*omega^4-...
q := 1/2688*(-758661120*omega^3+4360511232*omega^4-...
q := 1/2688*(-758661120*omega^3+4360511232*omega^4-...

>

Since we wish to pick omega so that the largest magnitude of the eigenvalues is as small as possible, we begin with Figure 39.5, a graph of the three eigenvalues as a function of omega .

> p1 := plot(map(abs,[q]),omega=0..2,color=[black,red,green], thickness=[3,1,1], labels=[``,``]):
p2 := textplot([1.75,-.07,`w`], font=[SYMBOL,12]):
display([p1,p2],xtickmarks=5, ytickmarks=5);

[Maple Plot]

>

The optimal value of omega appears to be approximately 1.55, determined from the graph. For omega = 1.55 the matrix B becomes

> B1 := subs(omega=1.55,op(B));

B1 := MATRIX([[-.55, .9687500000, 1.356250000], [-....

>

and the vector C becomes

> C1 := subs(omega=1.55,op(C));

C1 := MATRIX([[-4.456250000], [.4117187500], [-2.44...

>

Starting with the initial vector [10, 10, 10]^T , we obtain the first ten iterates and the norms of the differences between the iterate and the exact solution.

> x := evalf(vector([10,10,10]));
for k from 1 to 10 do
x := evalm(B1&*x+C1);
print(k,x,norm(x-xe));
od:

x := MATRIX([[10.], [10.], [10.]])

1, MATRIX([[13.29375000], [3.630468746], [15.507495...

2, MATRIX([[12.78124436], [6.240244067], [11.277643...

3, MATRIX([[9.854605440], [3.653069007], [10.136072...

4, MATRIX([[7.409676390], [3.848960551], [7.4100474...

5, MATRIX([[5.246985350], [2.845826189], [6.1705689...

6, MATRIX([[3.783636339], [2.724709861], [4.8910813...

7, MATRIX([[2.735841730], [2.352952244], [4.2457561...

8, MATRIX([[2.076766276], [2.264020312], [3.7240444...

9, MATRIX([[1.645533524], [2.131975677], [3.4517101...

10, MATRIX([[1.385439825], [2.090771090], [3.256725...

>

To verify that the iteration converges more slowly for other values of omega , we try omega = 1.75 and obtain

> B2 := subs(omega=1.75,op(B));

B2 := MATRIX([[-.75, 1.093750000, 1.531250000], [-....

>

for the matrix B , and

> C2 := subs(omega=1.75,op(C));

C2 := MATRIX([[-5.031250000], [.1054687498], [-3.57...

>

for the vector C . Again iterating from the initial vector [10, 10, 10]^T , we find

> x := evalf(vector([10,10,10]));
for k from 1 to 10 do
x := evalm(B2&*x+C2);
print(k,x,norm(x-xe));
od:

x := MATRIX([[10.], [10.], [10.]])

1, MATRIX([[13.71875000], [3.074218750], [16.990600...

2, MATRIX([[14.05897141], [7.607367999], [11.276987...

3, MATRIX([[10.01296773], [2.392955360], [10.507407...

4, MATRIX([[6.165786130], [3.995473932], [4.8430448...

5, MATRIX([[2.130372397], [.9794966968], [3.5718342...

6, MATRIX([[-.88333643e-1], [2.013689674], [.901618...

7, MATRIX([[-1.381922849], [.7633286097], [1.196988...

8, MATRIX([[-1.327028739], [1.698487033], [.8549519...

9, MATRIX([[-.869113126], [1.326070032], [1.8941287...

10, MATRIX([[-.28641384e-1], [2.000780516], [2.2541...

>

With omega = 1.75 , the tenth iterate is farther away from the exact solution than with omega = 1.55 . We conclude that our choice of omega based on the graph of the magnitudes of the eigenvalues was appropriate.

>

Equivalence of Forms

When omega = 1 , the SOR method reduces to the Gauss-Seidel method, an equivalence we will now verify for the system in Example 39.12 whose Gauss-Seldel equations are

x[1]^(m+1) = ``(5/8)*x[2]^m+``(7/8)*x[3]^m-23/8

x[2]^(m+1) = ``(5/14)*x[1]^m-``(1/14)*x[3]^m+13/7

x[3]^(m+1) = ``(7/8)*x[1]^m-``(1/8)*x[2]^m+19/8

When omega = 1 , the matrix B becomes

> BB := subs(omega=1,op(B));

BB := MATRIX([[0, 5/8, 7/8], [0, 25/112, 27/112], [...

>

and the vector C becomes

> CC := subs(omega=1,op(C));

CC := MATRIX([[-23/8], [93/112], [-219/896]])

>

Consequently, the SOR equations in the fixed-point iteration x ``(m+1) = B x ``(m) + C are

x[1]^(m+1) = ``(5/8)*x[2]^m+``(7/8)*x[3]^m-23/8

x[2]^(m+1) = ``(25/112)*x[2]^m+``(27/112)*x[3]^m+93...

x[3]^(m+1) = ``(465/896)*x[2]^m+``(659/896)*x[3]^m-...

and we want to show these equations are indeed the Gauss-Seidel equations. Clearly, the first equation in each formulation is the same. In the second Gauss-Seidel equation, replace x[1]^(m+1) from the first, obtaining

x[2]^(m+1) = ``(25/112)*x[2]^m+``(27/112)*x[3]^m+93...

the second equation in the SOR formulation. In the third Gauss-Seidel equation, replace x[2]^(m+1) from the second equation. This introduces another x[1]^(m+1) . Now replace all x[1]^(m+1) from the first equation, and the result will be

x[3]^(m+1) = ``(465/896)*x[2]^m+``(659/896)*x[3]^m-...

the third equation of the SOR formulation. Thus, we have verified that the matrix form of the SOR iteration actually uses the updated values x ``[m+1] in the same way that Gauss-Seidel iteration does..

To execute these substitutions in Maple, write the Gauss-Seidel equations as

> unassign('x','X');
for k from 1 to 3 do
X[k] := solve(evalm(A &* vars-y)[k], x[k]);
od;

X[1] := 5/8*x[2]+7/8*x[3]-23/8

X[2] := 5/14*x[1]-1/14*x[3]+13/7

X[3] := 7/8*x[1]-1/8*x[2]+19/8

>

Substitute for x[1] in the second equation, using its value from the first equation, thereby obtaining

> X2 := subs(x[1]=X[1],X[2]);

X2 := 25/112*x[2]+27/112*x[3]+93/112

>

This is the second equation in the SOR formulation.

Finally, in the third equation of the Gauss-Seidel formulation the variables x[1] and x[2] both appear. The variable x[2] must be updated with the most recent version of this variable. The variable x[1] must likewise be replaced with its updated version, but that update contains the old value of x[2] . Thus, the replacement of x[1] must be done after the replacement of x[2] , lest the x[2] in x[1] be updated twice. With this due care, we get

> subs([x[1]=X[1], x[2]=X2], X[3]);

465/896*x[2]+659/896*x[3]-219/896

>

on the righthand side of the third equation, and we have verified that the matrix form of the SOR iteration actually uses the updated values x ``(m+1) in the same way that Gauss-Seidel iteration does.

>

References

[1] Richard V. Southwell, Relaxation Methods in Engineering Science, Oxford University Press, 1940.

[2] James M. Ortega, Numerical Analysis, A Second Course, Academic Press, 1992.

>