$$\newcommand{\dint}{\mathrm{d}} \newcommand{\vphi}{\boldsymbol{\phi}} \newcommand{\vpi}{\boldsymbol{\pi}} \newcommand{\vpsi}{\boldsymbol{\psi}} \newcommand{\vomg}{\boldsymbol{\omega}} \newcommand{\vsigma}{\boldsymbol{\sigma}} \newcommand{\vzeta}{\boldsymbol{\zeta}} \renewcommand{\vx}{\mathbf{x}} \renewcommand{\vy}{\mathbf{y}} \renewcommand{\vz}{\mathbf{z}} \renewcommand{\vh}{\mathbf{h}} \renewcommand{\b}{\mathbf} \renewcommand{\vec}{\mathrm{vec}} \newcommand{\vecemph}{\mathrm{vec}} \newcommand{\mvn}{\mathcal{MN}} \newcommand{\G}{\mathcal{G}} \newcommand{\M}{\mathcal{M}} \newcommand{\N}{\mathcal{N}} \newcommand{\S}{\mathcal{S}} \newcommand{\diag}[1]{\mathrm{diag}(#1)} \newcommand{\diagemph}[1]{\mathrm{diag}(#1)} \newcommand{\tr}[1]{\text{tr}(#1)} \renewcommand{\C}{\mathbb{C}} \renewcommand{\R}{\mathbb{R}} \renewcommand{\E}{\mathbb{E}} \newcommand{\D}{\mathcal{D}} \newcommand{\inner}[1]{\langle #1 \rangle} \newcommand{\innerbig}[1]{\left \langle #1 \right \rangle} \newcommand{\abs}[1]{\lvert #1 \rvert} \newcommand{\norm}[1]{\lVert #1 \rVert} \newcommand{\two}{\mathrm{II}} \newcommand{\GL}{\mathrm{GL}} \newcommand{\Id}{\mathrm{Id}} \newcommand{\grad}[1]{\mathrm{grad} \, #1} \newcommand{\gradat}[2]{\mathrm{grad} \, #1 \, \vert_{#2}} \newcommand{\Hess}[1]{\mathrm{Hess} \, #1} \newcommand{\T}{\text{T}} \newcommand{\dim}[1]{\mathrm{dim} \, #1} \newcommand{\partder}[2]{\frac{\partial #1}{\partial #2}} \newcommand{\rank}[1]{\mathrm{rank} \, #1}$$

# Numerical Solution of Linear Algebraic Equations with Direct Methods

## Introduction

The most basic task in linear algebra, and perhaps in all of scientific computing, is to solve for the unknowns in a set of linear algebraic equations. Linear, algebraic equations occur in almost all branches of numerical analysis. But their most visible application in engineering is in the analysis of linear systems. Any system whose response is proportional to the input is deemed to be linear. Linear systems include structures, elastic solids, heat flow, seepage of fluids, electromagnetic fields and electric circuits. If the system is discrete, such as a truss or an electric circuit, then its analysis leads directly to linear algebraic equations.

In the case of a statically determinate truss, like the Eiffel tower which is a three-dimensional truss structure, the equations arise when the equilibrium conditions of the joints are written down. The unknowns $x_1, x_2,\ldots, x_n$ represent the forces in the members and the support reactions, and the constants $b_1, b_2,\ldots, b_n$ are the prescribed external loads. The behavior of continuous systems is described by differential equations, rather than algebraic equations. However, because numerical analysis can deal only with discrete variables, it is first necessary to approximate a differential equation with a system of algebraic equations. The well-known finite difference, finite element and boundary element methods of analysis work in this manner. They use different approximations to achieve the “discretization,” but in each case the final task is the same: solve a system (often a very large system) of linear, algebraic equations.

In summary, the modeling of linear systems invariably gives rise to equations of the form $\mathbf{A}.\mathbf{x} = \mathbf{b}$, where $\mathbf{b}$ is the input and $\mathbf{x}$ represents the response of the system. The coefficient matrix $\mathbf{A}$, which reflects the characteristics of the system, is independent of the input. In other words, if the input is changed, the equations have to be solved again with a different $\mathbf{b}$, but the same $\mathbf{A}$.

Therefore, it is desirable to have an equation solving algorithm that can handle any number of constant vectors with minimal computational effort. In general, a set of linear algebraic equations looks like this:

$\left\{ \begin{array} [c]{r}% a_{1,1}x_{1}+\ldots+a_{1,n}x_{n}=b_{1}\\ \vdots\\ a_{m,1}x_{1}+\ldots+a_{m,n}x_{n}=b_{m} \end{array} \right.$

The $n$ unknowns $x_j$ , $j=1,\ldots,n$ are related by $m$ equations. The coefficients $a_{i,j}$ with $i=1,\ldots,m$ and $j=1,\ldots,n$ are known numbers, as are the right-hand side quantities $b_i$ with $i=1,\ldots,m$. If $n=m$ then there are as many equations as unknowns, and there is a good chance of solving for a unique solution set of $x_j$’s. Otherwise, if $n\neq m$, things are even more interesting; we’ll have more to say about this below. If we write the coefficients $a_{i,j}$ as a matrix, and the right-hand sides $b_i$ as a column vector the previous system can be written in matrix form as $\mathbf{A}.\mathbf{x}=\mathbf{b}$ where $\mathbf{A}=\left[ \begin{array} [c]{cccc}% a_{1,1} & a_{1,2} & \cdots & a_{1,n}\\ a_{2,1} & a_{2,2} & \cdots & a_{2,n}\\ \vdots & \vdots & & \vdots\\ a_{m,1} & a_{m,2} & \cdots & a_{m,n} \end{array} \right].$ Throughout this page, we use a raised dot to denote matrix multiplication, or the multiplication of a matrix and a vector, or the dot product of two vectors.

With python, we will use the numpy library to deal with matrices. For instance, the following so called Wilson matrix $\mathbf{A}=\left[ \begin{array} [c]{cccc}% 10 & 7 & 8 & 7\\ 7 & 5 & 6 & 5\\ 8 & 6 & 10 & 9\\ 7 & 5 & 9 & 10 \end{array} \right]$ can be defined in python like this:


import numpy as np

""" Define the linear system elements: matrix  A and vector b """

A = np.array([[10, 7, 8, 7],[7, 5, 6, 5],[8, 6, 10, 9],[7, 5, 9, 10]])
b = np.array([1, 1, 1, 1])


We can also create a matrix by using some numpy functions


import numpy as np

A = np.zeros((3,3),dtype=np.float)
B = np.zeros((3,3),dtype=np.float64)

# Modification of the diagonal of matrix A
n = A.shape[0]
for i in range(n):
A[i,i] = 2.0
print(A)



A particularly useful representation of the equations for computational purposes is the augmented coefficient matrix obtained by adjoining the constant vector $\mathbf{b}$ to the coefficient matrix $\mathbf{A}$ in the following fashion: $$\left[\mathbf{A}|\mathbf{b}\right]= \left[ \begin{array}[c]{cccc} a_{1,1} & a_{1,2} & \cdots & a_{1,n}\\ a_{2,1} & a_{2,2} & \cdots & a_{2,n}\\ \vdots & \vdots & & \vdots\\ a_{m,1} & a_{m,2} & \cdots & a_{m,n} \end{array} \left| \begin{array}{c} b_1 \\ b_2 \\ \vdots \\ b_m \end{array}\right.\right]$$

For $m=n$, we can solve the unknows except when one or more of the $m$ equations is a linear combination of the others. We call this condition row degeneracy. When all equations contain certain variables only in exactly the same linear combination, we call this condition column degeneracy. It turns out that, for square matrices, row degeneracy implies column degeneracy, and vice versa. A set of equations that is degenerate is called singular.

## Backsubstitution method

If matrix $\mathbf{A}$ is an upper triangular matrix (or lower triangular matrix) i.e. $\left\{ \begin{array} [c]{r}% a_{11}x_{1}+\ldots a_{1,n-1}x_{n-1}+a_{1n}x_{n}=b_{1}\\ \vdots\\ a_{n-1,n-1}x_{n-1}+a_{n-1,n}x_{n}=b_{n-1}\\ a_{nn}x_{n}=b_{n}% \end{array} \right.$ and $\det\left( A\right) =a_{11}a_{22}\ldots a_{nn}\neq0$ then the solution can be obtained in the following manner: $\left\{ \begin{array} [c]{l}% x_{n}=a_{nn}^{-1}b_{n}\\ x_{n-1}=a_{n-1,n-1}^{-1}\left( b_{n-1}-a_{n-1,n}x_{n}\right) \\ \vdots\\ x_{1}=a_{11}^{-1}\left( b_{1}-a_{12}x_{2}-\cdots-a_{1,n-1}x_{n-1}-a_{1n}% x_{n}\right) \end{array} \right.$ Consequently, to solve this system we need to do $\left\{ \begin{array} [c]{c}% 1+2+\cdots+\left( n-1\right) =\dfrac{n\left( n-1\right) }{2}\text{ additions}\\ 1+2+\cdots+\left( n-1\right) =\dfrac{n\left( n-1\right) }{2}\text{ multiplications}\\ n\text{ divisions.}% \end{array} \right.$ Hence, we have the following algorithm:
$\begin{array}[c]{l} x_n =\dfrac{b_n}{a_{n,n}}\\ x_i =\dfrac{b_i-\displaystyle{\sum_{j=i+1}^{j=n}a_{i,j}\ x_j}}{a_{i,i}} \ \ \text{ for all } i = n − 1, . . . , 1 \end{array}$

### Exercice [PY-1]

In this exercice we want to define the Backsubstitution function. For this, we need to know the dimension of the matrix by using the shape function. We assume that the matrix is a square matrix. We will also test the accuracy of the solution and consequently the sensibility of the method according to the dimension of the matrix.

1) Define the python Backsubstitution function. This function will return the $\mathbf{x}$ solution to the linear system $\mathbf{A}.\mathbf{x}=\mathbf{b}$ where $\mathbf{A}$ is assumed to be a upper triangular matrix.

                
import numpy as np

""" Definition of the Backsubstitution function """

def Backsubstitution(A,b):
n = A.shape[0]
if b.shape[0] == n:
x = np.zeros(n,dtype=np.float)

for i in range(n-1,-1,-1):
#TO DO

else:
raise ValueError("shape index error")

return x

# Use this function
A = np.array([[10, 7, 8, 7],[0, 5, 6, 5],[0, 0, 10, 9],[0, 0, 0, 10]])
b = np.array([1, 1, 1, 1])
print(A)
print(b)
X = Backsubstitution(A,b)
print("Approximation solution of the system: ",X)



                            
def Backsubstitution(A,b):
n = A.shape[0]
if b.shape[0] == n:
x = np.zeros(n,dtype=np.float)

for i in range(n-1,-1,-1):
try:
s = 0.0
for j in range(n-1,i,-1):
s = s +  x[j]*A[i,j]
x[i] = (b[i]-s)/A[i,i]
except ZeroDivisionError:
print("Error: Division by Zero")
except IndexError:
print("Error: index ",i," out of bounds")

else:
raise ValueError("shape index error")

return x



2) In order to test the accuracy of the function, we define the coefficients of $\mathbf{A}$ by using a continuous uniform distribution on the interval $[-1,1]$. We also set the vector solution $x$ to $x=(1.0,..,1.0)^T$. By this way we are able to compare the exact solution $\mathbf{x}$ to the one obtained by using the Backsubstitution function according to different values of the dimension $n$ of the system.

• Create a list where for each value $n=1,..,50$, you evaluate all the maximum error for $100$ random linear systems. To do this, you can use the python code below.
•             
# define a random matrix A

for i in range(n):
for j in range(i,n):
# select one random value according to the continuous
# uniform distribution on -1.0 and 1.0
A[i,j] = np.random.uniform(-1.0,1.0,1)

# Create a vector with only "1"
x = np.ones(n,dtype=np.float)

# Calculate the associated vector b such that b=Ax
b = np.dot(A,x)

X = Backsubstitution(A,b)

# Compare the exact solution x with X
# which is obtained by the use of the previous function
err = np.max(np.abs(x-X))
print("Max error of the solution = ",err)




The np.dot function allows to calculate the dot product of a matrix with a matrix or a vector.

                            
# create an empty list
ERR = []
# define the maximum dimension of the linear systems
dim = 50
# define the number of random system for each value of n
nb = 100

for n in range(1,dim+1):
temp = []
for k in range(nb):
A = np.zeros((n,n),dtype=np.float)

for i in range(n):
for j in range(i,n):
# select one random value according to the uniform distribution
# on the interval -1.0 and 1.0
A[i,j] = np.random.uniform(-1.0,1.0,1)

# Create a vector with only "1"
x = np.ones(n,dtype=np.float)

b = np.dot(A,x)
X = Backsubstitution(A,b)
err = np.max(np.abs(X-x))
temp.append(err)
ERR.append(temp)


• By using the matplotlib library draw all the value errors of the previous list. For this you can use a semilogarithmic scale as in the figure below.
•                         
import matplotlib.pyplot as plt

# transpose the ERR list
ERR = np.transpose(ERR)

# Display grid
plt.grid(True, which="both")

# create a list of dim integer values
t = np.linspace(1,dim)

# Linear X axis, Logarithmic Y axis
for k in range(nb):
plt.semilogy(t,ERR[k],'ro')

#plt.ylim([0,1000])
plt.xlim([1,51])

# Provide the title for the semilog plot
plt.title('Axis in Semilog')

# Give x axis label for the semilog plot
plt.xlabel('dimension of the system')

# Give y axis label for the semilog plot
plt.ylabel('Backsubstitution Error')

# Display the semilog plot
plt.show()
#plt.savefig('BacksubstitutionErrors.png')


• Explain why for even small values of $n$ (around 40), the error is so big...(see below)
• ## Gauss-Jordan elimination method

The idea behind Gauss-Jordan elimination is to use elementary row operations on the system that replace a system with an equivalent but simpler sytem for which the answers to the questions about solutions are "obvious". The Gauss elimination method needs three steps:

1. algebraic operations by swapping rows for successively eliminating the unknows which is similar to define an invertible matrix $\mathbf{M}$ such that matrix $\mathbf{M}.\mathbf{A}$ becomes an upper triangular matrix
2. calculating vector $\mathbf{M}.\mathbf{b}$
3. solving the linear system $\mathbf{M}.\mathbf{A}.\mathbf{x}=\mathbf{M}.\mathbf{b}$ where $\mathbf{M}.\mathbf{A}$ is a upper triangular matrix
Let $\left\{ \begin{array} [c]{r} a_{11}x_{1}+a_{12}x_{2}+\ldots+a_{1n}x_{n}=b_{1}\;\;\;\left( L_{1}\right) \\ a_{21}x_{1}+a_{22}x_{2}+\ldots+a_{2n}x_{n}=b_{2}\;\;\;\left( L_{2}\right) \\ \vdots\\ a_{n1}x_{1}+a_{n2}x_{2}+\ldots+a_{nn}x_{n}=b_{n}\;\;\;\left( L_{n}\right) \end{array} \right.$ be a linear system of algebraic equations. If $a_{11}\neq0$ then $\left\{ \begin{array} [c]{c}% 1.x_{1}+a_{12}^{1}x_{2}+\ldots+a_{1n}^{1}x_{n}=b_{1}^{1}\;\;\;\left( L_{1}^{1}\leftarrow\dfrac{L_{1}}{a_{11}}\right) \\ \left. \begin{array} [c]{c}% 0.x_{1}+a_{22}^{1}x_{2}+\ldots+a_{2n}^{1}x_{n}=b_{2}^{1}\\ \vdots\\ 0.x_{1}+a_{n2}^{1}x_{2}+\ldots+a_{nn}^{1}x_{n}=b_{n}^{1}% \end{array} \right\} \begin{array} [c]{c}% \forall i\in\left[ 2,n\right] \\ \left( L_{i}^{1}\leftarrow L_{i}-a_{i1}L_{1}^{1}\right) \end{array} \end{array} \right.$ After applying $n$ time the previous method we finally obtain: $\left[ \begin{array} [c]{cccc}% 1 & a_{12}^{n} & \ldots & a_{1n}^{n}\\ 0 & 1 & & \\ \vdots & \ddots & \ddots & a_{n-1,n}^{n}\\ 0 & \cdots & 0 & 1 \end{array} \right] x=b^{\left[ n\right] }%$

By applying the backsubstitution algorithm we can then solve this linear system. In case one pivot is null, we search in which column we have a non-null pivot and we swap these columns. If it is impossible then we have $det(\mathbf{A})=0$.

The Gauss-Jordan elimination algorithm can be defined like this:
for k=1 to n-1 do
if |akk| < eps then
Error Message
else
for i=k+1 to n do
c= aik/akk
bi= bi - c bk
aik = 0
for j = k+1 to n do
aij = aij - c.akj
if |ann|< eps then
Error message



### Exercice [PY-2]

• Define the GaussJordan python function which enables to apply the Gauss-Jordan elimination to a linear system.
•         
import numpy as np
import time

def GaussJordan(A,b, eps = 0.000001):
n = A.shape[0]
test = true
# TO DO
return test

n = 50

# select random values according to the uniform distribution
# between -1.0 and 1.0 for each coefficient of matrix A
A = np.random.uniform(-1.0,1.0,size=(n,n))

# Create a random vector according the uniform distribution
x = np.random.uniform(-1.0,1.0,n)
b = np.dot(A,x)

# Solve the linear system and give the calculation time needed to solve the system

start_time = time.time()
if GaussJordan(A,b):
X = Backsubstitution(A,b)
t =(time.time() - start_time)
err = np.max(np.abs(X-x))
print("n= ",n," time= ",t," err= ",err)



                        

def GaussJordan(A,b, eps = 0.000001):
n = A.shape[0]
test = True

if b.shape[0] == n:

for k in range(n-1):
try:
if np.abs(A[k,k]) > eps:
for i in range(k+1,n):
c = A[i,k]/A[k,k]
b[i] = b[i] - c * b[k]
A[i,k] = 0.0
for j in range(k+1,n):
A[i,j] = A[i,j] - c * A[k,j]
else:
test = False
raise ValueError("Division by Zero")

except ZeroDivisionError:
test = False
print("Error: Division by Zero")
except IndexError:
test = False
print("Error: index ",i," out of bounds")
if np.abs(A[n-1,n-1]) < eps:
test = False
raise ValueError("Division by Zero")
else:
test = False
raise ValueError("shape index error")

return test


• Plot the calculation time curve for $n=1,...,100$.
• Plot all the maximum errors obtained for $100$ random matrix $A$ with different sizes $n=1,...,50$.
• Do the same for orthogonal matrices $\mathbf{Q}$. For this, define a matrix $\mathbf{A}$ where the $a_{ij}$ are issued from a uniform distribution on $[-1,1]$ for $j>i$ (with $a_{ji}=-a_{ij}$). To ensure that $\mathbf{Q}$ be orthogonal, we set $\mathbf{Q} = (\mathbf{I}-\mathbf{A})^{-1}(\mathbf{I}+\mathbf{A})$.

• We assume now that we have a computer with three significative digits. If we solve the following linear system $% \begin{array} [c]{r}% 10^{-4}x_{1}+x_{2}=1\\ x_{1}+x_{2}=2 \end{array}$ we obtain the 'exact' solutions: $x_{1}=1,00010...\;\;,\;x_{2}=0,99990...$

By applying the Gauss elimination method, we obtain $% \begin{array} [c]{r}% x_{1}+10^{4}x_{2}=10^{4}\\ -9990x_{2}=-9990 \end{array}$ because $-10^{4}+1=-9999$ and $-10^{4}+2=-9998$ are roundoff to $-9990$. It yields $x_{1}\simeq0\text{ et }x_{2}\simeq1$ If we swap the two rows: $% \begin{array} [c]{r}% x_{1}+x_{2}=2\\ 10^{-4}x_{1}+x_{2}=1 \end{array}$ then we have: $% \begin{array} [c]{r}% x_{1}+x_{2}=2\\ 0,999x_{2}=0,999 \end{array}$ Indeed, $-10^{-4}+1=0,9999\simeq0,999$ and $-2.10^{-4}+1=0,9998\simeq 0,999.$ It yields the more accurate values $x_{1}\simeq1\text{ et }x_{2}\simeq1$ Consequently, by swaping rows or columns such that the "Gauss pivot" be the larger as possible, we can obtain a more accurate result.

### Exercice [PY-3]

• Define a new GaussJordan python function where for each step, we search the maximum absolute value in the current pivot column.
• ## Cramer's rule

We are now going to derive explicit formulas for the solution of $\mathbf{A}x=\mathbf{b}$. The matrix $\mathbf{A}$ of this linear system is a square matrix of order $n$, and we shall assume that is not singular. We set $x_{i}=\frac{\det\left( \mathbf{B}_{i}\right) }{\det\left( \mathbf{A}\right) }%$ where $\mathbf{B}_{i}=\left[ \begin{array} [c]{ccccccc}% a_{11} & \cdots & a_{1,i-1} & b_{1} & a_{1,i+1} & \cdots & a_{1n}\\ a_{21} & \cdots & a_{2,i-1} & b_{2} & a_{2,i+1} & \cdots & a_{2n}\\ \vdots & & \vdots & \vdots & \vdots & & \vdots\\ a_{n1} & \cdots & a_{n,i-1} & b_{n} & a_{n,i+1} & \cdots & a_{nn}% \end{array} \right]$ We then calculate $n+1$ determinants and $n$ divisions. For calculating one determinant, we must do $n!-1$ additions, $n!(n-1)$ multiplications. This yields to: \begin{align*} & \left( n+1\right) !\text{ additions,}\\ & \left( n+2\right) !\text{ multiplications,}\\ & \text{and } n\text{ divisions.} \end{align*} For $n=10$ we need to do \begin{align*} & 900\;\text{operations for the Gauss elimination}\\ & 400\;000\;000\;\text{operations for the Cramer method}\\ & \text{No comment ....}% \end{align*}

## Inverse of a matrix

What is a singular or noninvertible matrix?

There might be cases where the matrix is not invertible and then the system cannot be solved. Consider the following linear system of equations: $$\mathbf{A}=\left[\begin{array}{ccc} 1 & 1 & -1\\ 1 & -2 & 3\\ 2 & -1 & 2 \end{array}\right],\qquad\mathbf{b}=\left[\begin{array}{c} 1\\ -2\\ 3 \end{array}\right]$$ where only the last element of $\mathbf{A}$ changed. Let us try to apply the Gauss-Jordan elimination here: $$\mathbf{A}=\left[\begin{array}{ccc} 1 & 1 & -1\\ 1 & -2 & 3\\ 2 & -1 & 2 \end{array}\left|\begin{array}{ccc} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1 \end{array}\right.\right]$$ Operations 2 and 3: Multiply row 1 by -1 and sum to row 2: $$\mathbf{A}=\left[\begin{array}{ccc} 1 & 1 & -1\\ 0 & -3 & 4\\ 2 & -1 & 2 \end{array}\left|\begin{array}{ccc} 1 & 0 & 0\\ -1 & 1 & 0\\ 0 & 0 & 1 \end{array}\right.\right]$$ Operations 2 and 3: Multiply row 1 by -2 and sum to row 3: $$\mathbf{A}=\left[\begin{array}{ccc} 1 & 1 & -1\\ 0 & -3 & 4\\ 0 & -3 & 4 \end{array}\left|\begin{array}{ccc} 1 & 0 & 0\\ -1 & 1 & 0\\ -2 & 0 & 1 \end{array}\right.\right]$$ Consequently two rows are equal on the left side... Multiply row 2 by -1 and sum to row 3: $$\mathbf{A}=\left[\begin{array}{ccc} 1 & 1 & -1\\ 0 & -3 & 4\\ 0 & 0 & 0 \end{array}\left|\begin{array}{ccc} 1 & 0 & 0\\ -1 & 1 & 0\\ -1 & -1 & 1 \end{array}\right.\right]$$ We then lost one row. No matter the path we take, we will always have a zero-ed row in this case and cannot make the left side equal to the identity matrix. The reason for this is that any row in this matrix can be made by a linear combination of the other two (e.g. sum rows 1 and 2 of $\mathbf{A}$ and you get row 3). In other words, the system is really made from two equations and the other is generated by the original two rows. Then we have more variables than equations and the system cannot be solved. The fact that the matrix $\mathbf{A}$ cannot be inverted is a sign that the system is not solvable. In those situations, it is said the matrix is noninvertible or singular. We can also state that the rows of the matrix are linearly dependent, because we can make one by a linear combination of the others.

Now a piece of important trivia regarding matrix inversion:

• If a matrix is non-invertible, its transpose is non-invertible too
• From the previous, the columns of a non-invertible matrix are linearly dependent
• If the determinant of a matrix is zero, then the matrix is not invertible
• The rank of an invertible matrix of size $n\times n$ is $n$ (full rank)
• The eigenvalues of the an invertible matrix are all different from zero

Numerically, at least two additional things prevent us from getting a good solution:

• While not exact linear combinations of each other, some of the equations may be so close to linearly dependent that roundoff errors in the machine render them linearly dependent at some stage in the solution process. In this case your numerical procedure will fail, and it can tell you that it has failed.

• Accumulated roundoff errors in the solution process can swamp the true solution. This problem particularly emerges if $n$ is too large. The numerical procedure does not fail algorithmically. However, it returns a set of $x$’s that are wrong, as can be discovered by direct substitution back into the original equations. The closer a set of equations is to being singular, the more likely this is to happen, since increasingly close cancellations will occur during the solution. In fact, the preceding item can be viewed as the special case in which the loss of significance is unfortunately total.

Much of the sophistication of well-written linear equation-solving packages is devoted to the detection and/or correction of these two pathologies. It is difficult to give any firm guidelines for when such sophistication is needed, since there is no such thing as a “typical” linear problem. But here is a rough idea: Linear sets with $n$ no larger than $20$ or $50$ are routine if they are not close to singular; they rarely require more than the most straightforward methods, even in only single precision or float. With double precision, this number can readily be extended to $n$ as large as perhaps $1000$, after which point the limiting factor anyway soon becomes machine time, not accuracy. Even larger linear sets, $n$ in the thousands or millions, can be solved when the coefficients are sparse (that is, mostly zero), by methods that take advantage of the sparseness. Unfortunately, one seems just as often to encounter linear problems that, by their underlying nature, are close to singular. In this case, you might need to resort to sophisticated methods even for the case of $n=10$. Singular value decomposition is a technique that can sometimes turn singular problems into nonsingular ones, in which case additional sophistication becomes unnecessary.

There are many algorithms dedicated to the solution of large sets of equations, each one being tailored to a particular formof the coefficient matrix (symmetric, banded, sparse...). A well-known collection of these routines is LAPACK—Linear Algebra PACKage, originally written in Fortran77. The NumPy linear algebra functions rely on BLAS and LAPACK to provide efficient low level implementations of standard linear algebra algorithms. Those libraries may be provided by NumPy itself using C versions of a subset of their reference implementations but, when possible, highly optimized libraries that take advantage of specialized processor functionality are preferred (see numpy.linalg).

A system of $n$ linear equations in $n$ unknowns has a unique solution, provided that the determinant of the coefficient matrix is nonsingular; that is, $det(A)=0$. The rows and columns of a nonsingular matrix are linearly independent in the sense that no row (or column) is a linear combination of other rows (or columns). If the coefficient matrix is singular, the equations may have an infinite number of solutions, or no solutions at all, depending on the constant vector.

Let's say we have the following system of linear equations: $$\begin{array}{ccccc} x_{1} & +x_{2} & -x_{3} & = & 1\\ x_{1} & -2x_{2} & +3x_{3} & = & -2\\ -x_{1} & 2x_{2} & -x_{2} & = & 3 \end{array}$$ This can be represented in matrix form as: $$\mathbf{Ax=b},$$ where $$\mathbf{A}=\left[\begin{array}{ccc} 1 & 1 & -1\\ 1 & -2 & 3\\ -1 & 2 & -1 \end{array}\right],\qquad\mathbf{x}=\left[\begin{array}{c} x_{1}\\ x_{2}\\ x_{3} \end{array}\right],\qquad\mathbf{b}=\left[\begin{array}{c} 1\\ -2\\ 3 \end{array}\right].$$ To find $\mathbf{x}$, one needs to do the equivalent to divide $\mathbf{b}$ by $\mathbf{A}$. Since $\mathbf{A}$ is a matrix, we cannot simply divide by it. Instead, we make use of the notion of inverse of a matrix. The inverse of a matrix $\mathbf{A}$ is a matrix such that, when one is multiplied by the other, the result is the identity matrix $\mathbf{I}$ (a special matrix with 1's in the diagonal and 0's everywhere else): $$\mathbf{A^{-1}A=AA^{-1}=I}$$ In our original problem, we can then premultiply each side of the equation by the inverse of $\mathbf{A}$ to get: $$\mathbf{A^{-1}Ax=x=A^{-1}b}$$ To find this inverse, we need to find each element in the matrix $\mathbf{A^{-1}}$ that, when multiplied by the matrix $\mathbf{A}$, will produce the identity matrix. To do that, we can use the widely known Gauss-Jordan elimination method. We will use a joint matrix $\left[\mathbf{A|I}\right]$ by concatenating the columns of $\mathbf{A}$ and $\mathbf{I}$. Then, we perform a set of operations that converts $\mathbf{A}$ into $\mathbf{I}$. In the process, $\mathbf{I}$ is converted into $\mathbf{A^{-1}}$, concluding the joint matrix $\left[\mathbf{I|A^{-1}}\right]$. For our example: $$\mathbf{A}=\left[\begin{array}{ccc} 1 & 1 & -1\\ 1 & -2 & 3\\ -1 & 2 & -1 \end{array}\left|\begin{array}{ccc} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1 \end{array}\right.\right]$$ We can do the following operations to the joint matrix:

1. Swap two rows
2. Multiply a row by a non-zero scalar
3. Sum two rows and replace one of them with the result
Now let us apply these operations to our joint matrix.

Operation 3: Sum rows 2 and 3 and store the result in row 3: $$\mathbf{A}=\left[\begin{array}{ccc} 1 & 1 & -1\\ 1 & -2 & 3\\ 0 & 0 & 2 \end{array}\left|\begin{array}{ccc} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 1 & 1 \end{array}\right.\right]$$ Operations 2 and 3: Multiply row 1 by -1, sum with row 2 and store result in 2: $$\mathbf{A}=\left[\begin{array}{ccc} 1 & 1 & -1\\ 0 & -3 & 4\\ 0 & 0 & 2 \end{array}\left|\begin{array}{ccc} 1 & 0 & 0\\ -1 & 1 & 0\\ 0 & 1 & 1 \end{array}\right.\right]$$ Operations 2 and 3: Multiply row 3 by -2, sum with row 2 and store result in 2: $$\mathbf{A}=\left[\begin{array}{ccc} 1 & 1 & -1\\ 0 & -3 & 0\\ 0 & 0 & 2 \end{array}\left|\begin{array}{ccc} 1 & 0 & 0\\ -1 & -1 & -2\\ 0 & 1 & 1 \end{array}\right.\right]$$ Operation 2: Multiply row 2 by -1/3 and row 3 by 1/2: $$\mathbf{A}=\left[\begin{array}{ccc} 1 & 1 & -1\\ 0 & 1 & 0\\ 0 & 0 & 1 \end{array}\left|\begin{array}{ccc} 1 & 0 & 0\\ 1/3 & 1/3 & 2/3\\ 0 & 1/2 & 1/2 \end{array}\right.\right]$$ Operations 2 and 3: Mutiply row 2 by -1 and sum to row 1, then sum row 3 to row 1: $$\mathbf{A}=\left[\begin{array}{ccc} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1 \end{array}\left|\begin{array}{ccc} 2/3 & 1/6 & -1/6\\ 1/3 & 1/3 & 2/3\\ 0 & 1/2 & 1/2 \end{array}\right.\right]$$ We end with the inverse: $$\mathbf{A^{-1}}=\left[\begin{array}{ccc} 2/3 & 1/6 & -1/6\\ 1/3 & 1/3 & 2/3\\ 0 & 1/2 & 1/2 \end{array}\right]$$ Now we can solve the system of equations at the beginning by $\mathbf{x=A^{-1}b}$. We could have solved the original problem by joining $\mathbf{A}$ and $\mathbf{b}$, and solving with the same method $\left[\mathbf{A|b}\right]$ (we would end up with $\left[\mathbf{I|x}\right]$). One of the benefit of calculating the inverse of $\mathbf{A}$ is that, in case we change $\mathbf{b}$, we only need to apply again $\mathbf{x=A^{-1}b}$ to solve the new system of equations.

To solve the inverse of a matrix $A$, we know that $A.A^{-1}=I$ where $I$ is the identity matrix. If we set $e_{i}$ the vector with null coordinates except the $i$-coordinate which is equal to $1$ then we can find the inverse matrix of $A$ by solving the $n$ following linear systems \begin{align*} A.X_{i} & =e_{i} \end{align*} where $X_{i}$ is the $i$-column of $A^{-1}$. To solve iteratively this inverse problem, we can use the $L.U$ decomposition method.

Let $L$ be a lower triangular matrix with $1$ on its diagonal and $U$ be a upper triangular matrix then with some assumptions matrix $A$ can be written as $A=L.U$. Hence, \begin{align*} Ax & =b\\ LUx & =b \end{align*} where, $A=\left[ \begin{array} [c]{cccc}% 1 & 0 & \ldots & 0\\ \times & 1 & & \vdots\\ \vdots & \ddots & \ddots & 0\\ \times & \cdots & \times & 1 \end{array} \right] \left[ \begin{array} [c]{cccc}% \times & \times & \ldots & \times\\ 0 & \times & & \vdots\\ \vdots & \ddots & \ddots & \times\\ 0 & \cdots & 0 & \times \end{array} \right]$ If we set $y=U.x$ then we firstly solve $L.y=b$ by using a forward substitution methodology. Hence, as we know the vector $y$, we can solve $U.x=y$ by using a back substitution method. Consequently, we finally obtain the unknow vector $x$. The main interest of this method is that if we have many linear systems with the same matrix $A$ we can use iteratively solve the systems with this $L.U$ decomposition. It can be obtained by using the Gauss elimination.

## What is an ill-conditioned matrix?

Let us consider that the vector $\mathbf{b}$ is data collected by some sensors. This data comes with some error $\Delta\mathbf{b}$ attached to it. Our solution to the problem will be: $$\mathbf{x=A^{-1}(b+\Delta b)=A^{-1}b+A^{-1}\Delta b}=\mathbf{x^{\star}+A^{-1}\Delta b}$$ where $\mathbf{x^{\star}}$ is the true solution. The error of our solution caused by the error in the data is $$\mathbf{\Delta x=x-x^{\star}=A^{-1}\Delta b}$$ The error in $\mathbf{\Delta b}$ may get amplified by $\mathbf{A^{-1}}$ and produce a large error in $\mathbf{x}$. In those situations (where large error is a subjective criterion), we say the problem is ill-posed or ill-conditioned. Otherwise, the matrix is well-posed or well-conditioned. To make it simple, you can imagine $\mathbf{A}$ as a scalar $A$. If $\mathbf{A}$ is very small, then its inverse is very large. Then, even a small error in the data gets amplified by the large inverse of $\mathbf{A}$, producing a large deviation in the solution. For a pratical example: $$\mathbf{A}=\frac{1}{2}\left[\begin{array}{cc} 1 & 1\\ 1+10^{-10} & 1-10^{-10} \end{array}\right],\qquad\mathbf{A^{-1}}=\left[\begin{array}{cc} 1-10^{10} & 10^{10}\\ 1+10^{10} & -10^{10} \end{array}\right]$$ The problem with this matrix is that it is very close to being singular, although it is not. This is a condition of the problem and nothing can be done to solve it.

In order to evalute if a matrix is well-conditioned or not we need to introduce the matrix norm framework.
Let us recall that function $N$ is a norm on a vector space $E$ if $\left\{ \begin{array} [c]{l}% N:E\rightarrow\mathbb{R}^{+}\\ \forall x\in E\;\;N\left( x\right) =0\Leftrightarrow x=0\\ \forall x\in E\;\forall\lambda\in\mathbb{K}\;\;N\left( \lambda.x\right) =\left| \lambda\right| .N\left( x\right) \\ \forall x,y\in E\;\;N\left( x+y\right) \leq N\left( x\right) +N\left( y\right) \end{array} \right.$

Definition. Let $N_{1}$ and $N_{2}$ be two norms on $E$. $N_{1}$ and $N_{2}$ are said equivalent iff \begin{align*} \exists\left( \alpha,\beta\right) & \in\left( \mathbb{R}^{+\ast}\right) ^{2}\; such\ that\\ \forall x & \in E\;\;\;\alpha N_{2}\left( x\right) \leq N_{1}\left( x\right) \leq\beta N_{2}\left( x\right) . \end{align*}

Theorem. if $E$ is a finite vector space then all norms on $E$ are equivalents.

On $\mathbb{R}^{n}$ or $\mathbb{C}^{n}$ we have for any $p\geq 1$: \begin{align*} \left\| x\right\| _{p} & =\left( \underset{i=1}{\overset{n}{\sum}}\left| x_{i}\right| ^{p}\right) ^{\frac{1}{p}} \end{align*} Moreover we have: \begin{align*} \left\| x\right\| _{1} & =\underset{i=1}{\overset{n}{\sum}}\left| x_{i}\right| \\ \left\| x\right\| _{2} & =\left( \underset{i=1}{\overset{n}{\sum}}\left| x_{i}\right| ^{2}\right) ^{\frac{1}{2}}\\ \left\| x\right\| _{\infty} & =\underset{i}{\max}\left( \left| x_{i}\right| \right) \end{align*} The square matrix ring of order $n$ is denoted by $\mathcal{M}_{n}\left( \mathbb{K}\right)$. It is also a vector space of dimension $n^{2}$.

Theorem. Let $\left( E,\left\| .\right\| _{E}\right)$, $\left( F,\left\|.\right\| _{F}\right)$ be vector spaces on the same field $\mathbb{K}=\mathbb{R}$ or $\mathbb{C}$. Let $A\in L\left( E,F\right)$ (linear continuous functions on $E\rightarrow F$).
The following properties are the all equivalents
a) $A$ is continuous at $x\in E$, for example at exemple en $x=\vec{0}$.
b) $A$ is continuous on $E$
c) $A$ is uniformly continue on $E$, $A$ is lipschitz on $E$
d) the image $A\left( x\right)$ of all bounded parts $X$ of $E$ is a bounded part of $F$
e) There exists a real value $m\geq0$ such that $\forall x\in E$ $\left\| A\left( x\right) \right\| _{F}\leq m.\left\| x\right\| _{E}$.

The fact that $\mathbf{A}$ is associated to a linear continuous function allows us to define a matrix norm induced by a vector space norm. $\left\| \mathbf{A}\right\|_p :=\underset{% \begin{array} [c]{c}% x\in\mathbb{C}^{n}\\ x\neq0 \end{array} }{\sup}\frac{\left\| \mathbf{A}x\right\|_p }{\left\| x\right\|_p }=\underset{% \begin{array} [c]{c}% x\in\mathbb{C}^{n}\\ \left\| x\right\|_p \leq1 \end{array} }{\sup}\left\| \mathbf{A}x\right\|_p =\underset{% \begin{array} [c]{c}% x\in\mathbb{C}^{n}\\ \left\| x\right\|_p =1 \end{array} }{\sup}\left\| Ax\right\|_p$

Theorem. Let $A=\left( a_{ij}\right)$ be a matrix of $\mathcal{M}_{n}\left( \mathbb{R}\right)$ then \begin{align*} \left\| A\right\| _{1} & =\sup\frac{\left\| Ax\right\| _{1}}{\left\| x\right\| _{1}}=\underset{j}{\max}\left( \underset{i}{\sum}\left| a_{ij}\right| \right) \\ \left\| A\right\| _{2} & =\sup\frac{\left\| Ax\right\| _{2}}{\left\| x\right\| _{2}}=\sqrt{\varrho\left( A^{\ast}A\right) }=\left\| A^{\ast }\right\| _{2}\\ \left\| A\right\| _{\infty} & =\sup\frac{\left\| Ax\right\| _{\infty}% }{\left\| x\right\| _{\infty}}=\underset{i}{\max}\left( \underset{j}{\sum }\left| a_{ij}\right| \right) \end{align*} Moreover, if $A^{T}A=AA^{T}$ (or $A^{\ast}A=AA^{\ast}$ in $\mathbb{C}$) then $\left\| A\right\| _{2}=\varrho\left( A\right) .$
Remark. $\varrho\left( A\right) =\max\left\{ \left| \lambda_{i}\left( A\right) \right| ;\;1\leq i\leq n\right\}$ is called the spectral radius of $\mathbf{A}$.
There exists matrix norms that are non induced by vector norm. For example
$\left\| A\right\|_F =\left( \underset{i=1}{\overset{n}{\sum}}\underset {i=1}{\overset{m}{\sum}}\left| a_{ij}\right| ^{2}\right) ^{\dfrac{1}{2}}.$ Here $\left\| I\right\|_F =\sqrt{n}\neq1$. The fact that $\left\| I\right\| =1$ is a property of matrix norms induced by vector norms.

## Condition number of a linear system

Let $\mathbf{A}.\mathbf{x}=\mathbf{b}$ be a linear system. We assume that vector $\mathbf{b}$ is perturbated by vector $\mathbf{\Delta b}$. It yields the new linear system to be solved: $\mathbf{A}\left( \mathbf{x} + \mathbf{\Delta x} \right) = \mathbf{b} + \mathbf{\Delta b}$ For any matrix norm induced by the vector norm $p$, we can calculate the relative error $\dfrac{\left\| \mathbf{\Delta x} \right\|_p}{\left\|\mathbf{x}\right\|_p }$ \begin{align*} \mathbf{A}.\mathbf{\Delta x} & = \mathbf{\Delta b}\\ \mathbf{\Delta x} & = \mathbf{A^{-1}}.\mathbf{\Delta b} \end{align*} Consequently $\left\| \mathbf{\Delta x} \right\|_p \leq\left\| \mathbf{A^{-1}}\right\|_p .\left\| \mathbf{\Delta} \mathbf{b}\right\|_p$ and \begin{align*} \mathbf{b} & =\mathbf{A}.\mathbf{x}\Rightarrow\left\| \mathbf{b}\right\|_p \leq\left\| \mathbf{A}\right\|_p .\left\| \mathbf{x}\right\| \\ \frac{1}{\left\| \mathbf{x}\right\|_p } & \leq\dfrac{\left\| \mathbf{A}\right\|_p }{\left\| \mathbf{b}\right\|_p } \end{align*} It yields: $\dfrac{\left\| \mathbf{\Delta x}\right\|_p }{\left\| \mathbf{x}\right\|_p }\leq\left\| \mathbf{A}\right\|_p .\left\| \mathbf{A^{-1}}\right\|_p \dfrac{\left\| \mathbf{\Delta b}\right\|_p }{\left\| \mathbf{b}\right\|_p }$ Definition. The positive real value $\left\| \mathbf{A}\right\|_p .\left\| \mathbf{A^{-1}}\right\|_p$ is called condition number of $A$ of order $p$. It is denoted $cond_p(\mathbf{A})$.

### Exercice [PY-4]

1) Define the Wilson matrix $\mathbf{A}$. For this, use the numpy and linalg python libraries.

        
import numpy as np
from numpy import linalg as LA

A=np.array([# To Do])
b=np.array([32,23,33,31])
c=np.array([32.1,22.9,33.1,30.9])
# Solve the linear system
x=LA.solve(A,b)
# solve the perturbated linear system
y=LA.solve(A,c)
print("Error : ",x-y)



2) Calculate the condition number of $\mathbf{A}$ for the $1$-norm and $2$-norm. For this, you need to create the following functions:


""" 1-Norm function """
def Norm1(A):
#TO DO
return

""" 2-Norm function """
def Norm2(A):
#TO DO
return value

For the $2$-norm you can use the eigenvals function of linalg.

### Exercice [PY-5]

Define the Hilbert(n) function (see wikipedia for the definition of the Hilbert matrix) which allows to create the Hilbert matrix of dimension $n$. In this exercice, we want to show that this matrix is very ill-conditionned even for $n=5,10$ and $15$. We then solve the following linear system:

        
import numpy as np
from numpy import linalg as LA

""" Hilbert matrix function """
def Hilbert(n):
#TO DO
return value

n=5
H=Hilbert(n)
x=np.ones(n)
b=np.dot(H,x)
X=LA.solve(H,b)

print("Error = ",x-X)



### Exercice [AL-1]

Let $\left\Vert \mathbf{.}\right\Vert_p: \mathbb{R}^n \longrightarrow \mathbb{R}$ (with $p≥1$) be a function defined by $$\left\Vert \mathbf{x}\right\Vert_p = \left( \sum_{i=1}^{i=n} \left| x_i \right|^p \right)^\frac{1}{p}$$
• Show that this function is a norm
• For any $p>1$ and any real value $q$ such that $\dfrac{1}{p}+\dfrac{1}{q}=1$ show that we have the following inequality: $$\text{for any }\mathbf{x},\mathbf{y}\in \mathbb{R}^n\ \ \ \sum_{i=1}^{i=n}\left| x_i.y_i \right| \le \left(\sum_{i=1}^{i=n}\left| x_i \right|^p \right)^\frac{1}{p}\left(\sum_{i=1}^{i=n}\left| y_i \right|^q\right)^\frac{1}{q}$$
• Case $p=1$ is obvious.
• For $p>1$, we denote by $q>1$ the real value such that $$\dfrac{1}{p}+\dfrac{1}{q}=1$$ As the exponential function is convex we then have:
for any $u,v \in \mathbb{R}$ and $\theta \in ]0,1[$ $e^{\theta u+(1-\theta)v} \le \theta e^{u} + (1-\theta)e^{v}$

By substituting $\theta$ by $\dfrac{1}{p}$, $u$ by $p.\ln(\alpha)$ and $v$ by $q.\ln(\beta)$ it yields:
for any $\alpha,\beta \in \mathbb{R}_{+}$ $\alpha\beta \le \dfrac{\alpha^{p}}{p} + \dfrac{\beta^{q}}{q}$
Consequently for any $\mathbf{x},\mathbf{y} \in \mathbb{R}^n$ $$\dfrac{\left| x_i.y_i \right|}{\left\Vert \mathbf{x}\right\Vert_p\left\Vert \mathbf{y}\right\Vert_q} \le \dfrac{1}{p} \dfrac{\left| x_i \right|^{p}}{\left\Vert \mathbf{x}\right\Vert_p^p} + \dfrac{1}{q} \dfrac{\left| y_i \right|^{q}}{\left\Vert \mathbf{y}\right\Vert_q^q}\text{ for all }i=1,...,n$$ By summation we obtain $$\sum_{i=1}^{n}{\left| x_i.y_i \right|} \leq \left\Vert \mathbf{x}\right\Vert_p \left\Vert \mathbf{y}\right\Vert_q$$ Moreover we have for $i=1,...,n$ $$(\left| x_i \right|+ \left|y_i \right|)^p = \left| x_i \right| (\left| x_i \right|+ \left|y_i \right|)^{p-1} + \left| y_i \right| (\left| x_i \right|+ \left|y_i \right|)^{p-1}$$ By using the previous inequality and by summation it yields $$\sum_{i=1}^{n}(\left| x_i \right|+ \left|y_i \right|)^p \leq (\left\Vert \mathbf{x}\right\Vert_p + \left\Vert \mathbf{y}\right\Vert_p) \left( \sum_{i=1}^{n}(\left| x_i \right|+ \left|y_i \right|)^{(p-1)q} \right)^\frac{1}{q}$$ As $(p-1)q=p$, we then obtain the triangular inequality $$\left\Vert \mathbf{x} + \mathbf{y} \right\Vert_p \leq \left\Vert \mathbf{x}\right\Vert_p + \left\Vert \mathbf{y}\right\Vert_p$$
• ### Exercice [AL-2]

Let $\left\Vert \mathbf{.}\right\Vert_p$ (with $p≥1$) be a matrix norm induced by the vector norm $p$.
Show that, for any $p \ge 1$,
• for any invertible $\mathbf{A}\in \mathcal{M}_n(\mathbb{R})$, $\left\Vert \mathbf{A}\right\Vert_p.\left\Vert \mathbf{A^{-1}}\right\Vert_p \ge 1$
• $\left\Vert \mathbf{I_d}\right\Vert_p = 1$
• for any $\mathbf{A},\mathbf{B}\in \mathcal{M}_n(\mathbb{R})$, $\left\Vert \mathbf{A.B}\right\Vert_p \le \left\Vert \mathbf{A}\right\Vert_p.\left\Vert \mathbf{B}\right\Vert_p$
• By using the matrix norm induced by a vector norm definition you can obtain these results.
• ### Exercice [AL-3]

Let $\mathbf{A} \in \mathcal{M}_n(\mathbb{R})$ and $\mathbf{b}$ be a non zero vector of $\mathbb{R}$. We denote by $\mathbf{x}$ the solution to the following linear system $\mathbf{A}.\mathbf{x}=\mathbf{b}$.

The main objective of this exercice is to know how the solution $\mathbf{x}$ will be modified according to perturbations $\mathbf{\Delta A}$ of matrix $\mathbf{A}$ and $\mathbf{\Delta b}$ of vector $\mathbf{b}$ $$(\mathbf{A} + \mathbf{\Delta A}).\mathbf{y}=\mathbf{b} + \mathbf{\Delta b}$$ where $\mathbf{\Delta A}\in \mathcal{M}_n(\mathbb{R})$ and $\mathbf{\Delta b}\in \mathbb{R}^n$. In this exercice we will consider a matrix norm induced by a vector norm of order $p$ (with $p \geq 1$). We also denote by $\mathbf{cond_p}(\mathbf{A})= \left\Vert \mathbf{A^{-1}}\right\Vert_p.\left\Vert \mathbf{A}\right\Vert_p$ the condition number of matrix $\mathbf{A}$.
Moreover, we assume that $\left\Vert \mathbf{A^{-1}}.\mathbf{\Delta A}\right\Vert_p \le 1$.

• Show that the linear system $$(\mathbf{A} + \mathbf{\Delta A}).\mathbf{y}=\mathbf{b} + \mathbf{\Delta b}$$ has a unique solution. We denote this solution by $\mathbf{y} = \mathbf{x} + \mathbf{\Delta x}$.
• Show that the relative perturbation of the initial solution is bounded by $\dfrac{\left\| \mathbf{\Delta x}\right\|_p }{\left\| \mathbf{x}\right\|_p }\leq \dfrac{\mathbf{cond_p}(\mathbf{A})}{1-\left\Vert \mathbf{A^{-1}}.\mathbf{\Delta A}\right\Vert_p} \left( \dfrac{\left\| \mathbf{\Delta b}\right\|_p}{\left\| \mathbf{b}\right\|_p } + \dfrac{\left\| \mathbf{\Delta A}\right\|_p}{\left\| \mathbf{A}\right\|_p } \right)$
• Is it possible in previous inequality to replace $\mathbf{cond_p}(\mathbf{A})$ by a smaller constant positive real value? In this case, we assume that this inequality is verified for all vectors $\mathbf{b}$, $\mathbf{\Delta b}$ and all matrices $\mathbf{\Delta A}$ such that $\left\Vert \mathbf{A^{-1}}.\mathbf{\Delta A}\right\Vert_p \le 1$.
• To DO.
• ## Linear Algebra framework

Let $E$ and $F$ be two vector space on the same field $\mathbb{K=R}$ or $\mathbb{K=C}$. A linear map $f$ from $E$ to $F$ satisfies the following properties: $\left\{ \begin{array} [c]{l}% \forall x\in E,\;\forall y\in E,\;\;f\left( x+y\right) =f\left( x\right) +f\left( y\right) \\ \forall x\in E,\;\forall\lambda\in\mathbb{K},\;\;f\left( \lambda x\right) =\lambda f\left( x\right) \end{array} \right.$

Let $\mathcal{L}\left( E,F\right)$ be the set of linear maps from $E$ to $F$. We can define the following operators in $\mathcal{L}\left( E,F\right)$:

$\left\{ \begin{array} [c]{l}% f+g\;\;\text{par}\;\forall x\in E\;\;\left( f+g\right) \left( x\right) =f\left( x\right) +f\left( y\right) \\ \lambda f\in\mathbb{K}\;\text{par }\forall x\in E,\;\forall\lambda \in\mathbb{K},\;\;\left( \lambda f\right) \left( x\right) =\lambda f\left( x\right) . \end{array} \right.$

Consequently $\mathcal{L}\left( E,F\right)$ is a vector space (also called linear space) in the field $\mathbb{K}$. If $E$ et $F$ are finite vector spaces then for any basis $\mathcal{B}% _{E}=\left( e_{j}\right) _{1\leq j\leq n}$ of $E$ and $\mathcal{B} _{F}=\left( f_{i}\right) _{1\leq i\leq m}$ of $F$, the linear map $g\in\mathcal{L}\left( E,F\right)$ is defined by the coefficients $\left( a_{ij}\right)$ such that: $g\left( e_{j}\right) =\sum_{i=1}^{m}a_{ij}f_{i}.$ We set these coefficients on the following table: $\begin{array}[c]{l|llll} & g\left( e_{1}\right) & g\left( e_{2}\right) & \cdots & g\left(e_{n}\right)\\ \hline f_{1} & a_{1,1} & a_{1,2} & \cdots & a_{1,n}\\ f_{2} & a_{2,1} & a_{2,2} & \cdots & a_{2,n}\\ \vdots & \vdots & \vdots & & \vdots\\ f_{m} & a_{m,1} & a_{m,2} & \cdots & a_{m,n} \end{array}$ where the values of the $j$-column of the table are the coordinates of the vector $g\left( e_{j}\right)$ in $\mathcal{B}_{F}$. Consequently, $A=\left( \begin{array} [c]{cccc}% a_{11} & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} & \cdots & a_{2n}\\ \vdots & \vdots & & \vdots\\ a_{m1} & a_{m2} & \cdots & a_{mn}% \end{array} \right) ,$

is then called the $\bf{matrix}$ associated to the linear map $g\in \mathcal{L}\left( E,F\right)$ written in the basis $\left( e_{j}\right) _{1\leq j\leq n}$ and $\left( f_{i}\right) _{1\leq i\leq m}$. For all $x\in E$ such that $x=\sum\limits_{j=1}^{n}x_{j}e_{j}$, it yields

\begin{align*} g\left( x\right) & =\sum\limits_{j=1}^{n}x_{j}g\left( e_{j}\right) \\ & =\sum\limits_{j=1}^{n}x_{j}\sum_{i=1}^{m}a_{ij}f_{i}\\ & =\sum_{i=1}^{m}\left( \sum\limits_{j=1}^{n}a_{ij}x_{j}\right) f_{i}\\ & =\sum_{i=1}^{m}y_{i}f_{i}% \end{align*} where $y_{i}=\sum\limits_{j=1}^{n}a_{ij}x_{j}$ for all $1\leq i\leq m$. With $y_{i}$ being the coordinates of the vector $y$ of $\mathbb{K}^{m}$ we write $y=Ax$ or \begin{align*} \left( \begin{array} [c]{c}% y_{1}\\ y_{2}\\ \\ y_{m}% \end{array} \right) & =x_{1}\left( \begin{array} [c]{c}% a_{11}\\ a_{21}\\ \\ a_{m1}% \end{array} \right) +\cdots+x_{n}\left( \begin{array} [c]{c}% a_{1n}\\ a_{1n}\\ \\ a_{mn}% \end{array} \right) ,\\ y & =\sum\limits_{j=1}^{n}x_{j}A_{j}, \end{align*} where $A_{j}$ is the $j$-column vector of matrix $A$.

If $m$ is not equal to $n$ then matrix $A$ is rectangular. We denote by $\mathcal{M}_{m,n}\left( \mathbb{K}\right)$ the matrices set with $m$ rows and $n$ columns where all the coefficients belong to $\mathbb{K}$. $\mathcal{M}_{m,n}\left( \mathbb{K}\right)$, with the vector addition operator and the scalar multiplication on $\mathbb{K}$ is a vector space. We can show that the following matrices $E_{ij}=\left( e_{kl}\right) _{1\leq k\leq m,1\leq l\leq n}$ with $0$ coefficients except for $k=i$ and $l=j$ where $e_{kl}=1$ generate a basis for $\mathcal{M}_{m,n}\left( \mathbb{K}\right)$. The dimension of this space is then $mn.$

## References

1. P.G. Ciarlet (1982), Introduction à l’analyse numérique matricielle et à l’optimisation, Masson
2. J.J. Dongarra, C.B. Moler, J.R. Bunch & G.W. Stewart (1979), LINPACK Users’ Guide. SIAM.
3. D.K. Faddeev & V.N. Faddeeva (1963), Computational Methods of Linear Algebra. Freeman & Co.
4. G.H. Golub & C.F. Van Loan (1989), Matrix Computations. Second edition. John Hopkins Univ. Press.
5. N.J. Higham (1996), Accuracy and Stability of Numerical Algorithms. SIAM.
6. A.S. Householder (1964), The Theory of Matrices in Numerical Analysis. Blaisdell Publ. Comp.
7. G.W. Stewart (1973), Introduction to Matrix Computations. Academic Press.
8. L.N. Trefethen & D. Bau (1997), Numerical Linear Algebra. SIAM.
9. J.H. Wilkinson (1969), Rundungsfehler. Springer-Verlag.
10. J.H. Wilkinson & C. Reinsch (1971), Handbook for Automatic Computation, Volume II, Linear Algebra. Springer-Verlag.