By Richard Maxwell
I wrote this because the OpenGL and DirectX tutorials on the net and on Stack Overflow are confusing and needlessly complicated. They conflate row and column matrices with matrix storage in computer memory, pre- and post-multiplication, and right-hand and left-hand coordinates, and they lean on unexplained mathematical shortcuts and assumptions about the reader. That makes it hard to get anything right the first time. So I wrote this document while I figured it all out myself, so that I would have a reference for later.
I assume that you know how to do 4th form (8th grade/year) matrix multiplication, and have access to Wikipedia and the net.
M = Matrix, and from now on I'm assuming that's a 4 row by 4 column matrix (a 4x4 square matrix).
M = [ M11 M21 M31 M41 ] where Mx.y = Mcolumn.row
[ M12 M22 M32 M42 ]
[ M13 M23 M33 M43 ]
[ M14 M24 M34 M44 ]
I = [ 1 0 0 0 ] = Identity Matrix
[ 0 1 0 0 ]
[ 0 0 1 0 ]
[ 0 0 0 1 ]
Mtranspose = M mirrored along the identity diagonal
= [ M11 M12 M13 M14 ]
[ M21 M22 M23 M24 ]
[ M31 M32 M33 M34 ]
[ M41 M42 M43 M44 ]
Note: Mtranspose is also written as M^T.
If Ma and Mb are 4x4 matrices, then multiplying them together gives a 4x4 matrix P:
MaMb = Ma.Mb = Ma*Mb = P = [ P11 P21 P31 P41 ]
[ P12 P22 P32 P42 ]
[ P13 P23 P33 P43 ]
[ P14 P24 P34 P44 ]
Each entry is a row of Ma multiplied by a column of Mb (remember Mx.y = Mcolumn.row):
P11 = Ma11Mb11 + Ma21Mb12 + Ma31Mb13 + Ma41Mb14
P21 = Ma11Mb21 + Ma21Mb22 + Ma31Mb23 + Ma41Mb24
P31 = Ma11Mb31 + Ma21Mb32 + Ma31Mb33 + Ma41Mb34
P41 = Ma11Mb41 + Ma21Mb42 + Ma31Mb43 + Ma41Mb44
P12 = Ma12Mb11 + Ma22Mb12 + Ma32Mb13 + Ma42Mb14
P22 = Ma12Mb21 + Ma22Mb22 + Ma32Mb23 + Ma42Mb24
P32 = Ma12Mb31 + Ma22Mb32 + Ma32Mb33 + Ma42Mb34
P42 = Ma12Mb41 + Ma22Mb42 + Ma32Mb43 + Ma42Mb44
P13 = Ma13Mb11 + Ma23Mb12 + Ma33Mb13 + Ma43Mb14
P23 = Ma13Mb21 + Ma23Mb22 + Ma33Mb23 + Ma43Mb24
P33 = Ma13Mb31 + Ma23Mb32 + Ma33Mb33 + Ma43Mb34
P43 = Ma13Mb41 + Ma23Mb42 + Ma33Mb43 + Ma43Mb44
P14 = Ma14Mb11 + Ma24Mb12 + Ma34Mb13 + Ma44Mb14
P24 = Ma14Mb21 + Ma24Mb22 + Ma34Mb23 + Ma44Mb24
P34 = Ma14Mb31 + Ma24Mb32 + Ma34Mb33 + Ma44Mb34
P44 = Ma14Mb41 + Ma24Mb42 + Ma34Mb43 + Ma44Mb44
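The multiplication above can be sketched in a few lines of Python. This is only an illustration and not part of the original notes: the name mat_mul and the nested-list storage are assumptions, and the lists are indexed m[row][column], which is the opposite order to the Mcolumn.row labels used in this article.

```python
def mat_mul(a, b):
    """Multiply two 4x4 matrices stored as nested lists, indexed m[row][col].

    Each result entry is one row of `a` multiplied by one column of `b`.
    """
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]


IDENTITY = [[1, 0, 0, 0],
            [0, 1, 0, 0],
            [0, 0, 1, 0],
            [0, 0, 0, 1]]

# An arbitrary example matrix, chosen purely for illustration.
m = [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12],
     [13, 14, 15, 16]]

# Multiplying by the identity leaves the matrix unchanged.
print(mat_mul(m, IDENTITY) == m)
```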
Properties (for square matrices)
- MI = M
- MaMb != MbMa
- Ma(MbMc) = (MaMb)Mc
- (MaMb)^T = (Mb^T)(Ma^T) (note: order is swapped)
- M^(-1) == matrix inverse, where M.M^(-1) == I
- (MaMb)^(-1) = Mb^(-1)Ma^(-1) (note: order is swapped)
If Q is a matrix that is orthonormal then: Q^(-1) = Q^T
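A few of these properties can be checked numerically. The sketch below is an illustration only: mat_mul and transpose are hypothetical helper names, the matrices are indexed m[row][column], and the two example matrices (a translation and a 90 degree rotation about z) are picked just because they have whole-number entries.

```python
def mat_mul(a, b):
    """4x4 matrix product, nested lists indexed m[row][col]."""
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]


def transpose(m):
    """Mirror the matrix along its diagonal."""
    return [list(column) for column in zip(*m)]


IDENTITY = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

# Example matrices (assumed for illustration).
ma = [[1, 0, 0, 2], [0, 1, 0, 3], [0, 0, 1, 4], [0, 0, 0, 1]]   # a translation
mb = [[0, -1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]  # an orthonormal rotation

print(mat_mul(ma, mb) != mat_mul(mb, ma))                      # order matters
print(transpose(mat_mul(ma, mb)) ==
      mat_mul(transpose(mb), transpose(ma)))                   # order is swapped
print(mat_mul(mb, transpose(mb)) == IDENTITY)                  # Q^(-1) = Q^T
```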
All rotations are counterclockwise.
sin == sin(theta) and cos == cos(theta), where theta = angle of rotation.
Rx = [ 1 0 0 0 ] = Rpitch
[ 0 cos -sin 0 ]
[ 0 sin cos 0 ]
[ 0 0 0 1 ]
Ry = [ cos 0 sin 0 ] = Ryaw
[ 0 1 0 0 ]
[ -sin 0 cos 0 ]
[ 0 0 0 1 ]
Rz = [ cos -sin 0 0 ] = Rroll
[ sin cos 0 0 ]
[ 0 0 1 0 ]
[ 0 0 0 1 ]
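As a minimal sketch, the three rotation matrices can be built like this. The function names rot_x, rot_y, rot_z and transform are assumptions for illustration, angles are taken in radians, and the matrices are nested lists indexed m[row][column].

```python
import math


def rot_x(theta):
    """Rotation about the x axis (pitch)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[1.0, 0.0, 0.0, 0.0],
            [0.0, c, -s, 0.0],
            [0.0, s, c, 0.0],
            [0.0, 0.0, 0.0, 1.0]]


def rot_y(theta):
    """Rotation about the y axis (yaw)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0.0, s, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [-s, 0.0, c, 0.0],
            [0.0, 0.0, 0.0, 1.0]]


def rot_z(theta):
    """Rotation about the z axis (roll)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0, 0.0],
            [s, c, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]


def transform(m, v):
    """Multiply a 4x4 matrix by a 4-component column vector."""
    return [sum(m[r][k] * v[k] for k in range(4)) for r in range(4)]


# Rotating the x axis by 90 degrees about z should land (approximately,
# because of floating point rounding) on the y axis.
rotated = transform(rot_z(math.radians(90)), [1, 0, 0, 1])
print([round(c, 6) for c in rotated])
```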
Nifty Fact: You can think of a rotation as defining the new x, y and z axes (which are all unit length and normal to each other, therefore orthonormal); the matrix merely moves from the old axes to the new axes.
R = [ NewX.x NewY.x NewZ.x 0 ]
[ NewX.y NewY.y NewZ.y 0 ]
[ NewX.z NewY.z NewZ.z 0 ]
[ 0 0 0 1 ]
Since it's orthonormal we also get R^(-1) = R^T
Rinverse = [ NewX.x NewX.y NewX.z 0 ]
[ NewY.x NewY.y NewY.z 0 ]
[ NewZ.x NewZ.y NewZ.z 0 ]
[ 0 0 0 1 ]
If you have a position that's at x,y,z in 3D, you can treat it as a matrix:
V = [x]
[y]
[z]
If a is a scalar (ie not a matrix) then:
Va = [x*a]
[y*a]
[z*a]
- The magnitude of V is: M(V) = sqrt(x^2 + y^2 + z^2)
- The normalised vector is: Vn = V(1/M(V))
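The magnitude and normalisation formulas translate directly into code. This is a small sketch with assumed function names (magnitude, normalise); it does not guard against a zero-length vector, which would cause a division by zero.

```python
import math


def magnitude(v):
    """Length of a vector: sqrt(x^2 + y^2 + z^2)."""
    return math.sqrt(sum(c * c for c in v))


def normalise(v):
    """Scale a vector by 1 / magnitude so that its length becomes 1."""
    length = magnitude(v)
    return [c / length for c in v]


v = [3.0, 4.0, 0.0]
print(magnitude(v))             # 5.0
print(magnitude(normalise(v)))  # approximately 1
```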
You cannot do translations (moving the vector from a to b) via a matrix if the matrices are only 3x3. To do this we use homogeneous coordinates. Basically we add another dimension to our vector, set its value to 1, and then we can do translations with a 4x4 matrix.
V = [ x ]
[ y ]
[ z ]
[ 1 ]
If we don't want a vector to be able to be translated, we set the last value to 0. So you'll see people treating directions as a normalised vector with a last value of 0, and positions with a last value of 1.
Translation Matrices
T = [ 1 0 0 Tx ] = Translation Matrix
[ 0 1 0 Ty ]
[ 0 0 1 Tz ]
[ 0 0 0 1 ]
Where Tx is the translation in the x direction, Ty in y, and Tz in z when applied like thus:
TV = [ 1 0 0 Tx ] x [ x ] = Lots of maths, then simplification = [ x + Tx ]
[ 0 1 0 Ty ] [ y ] [ y + Ty ]
[ 0 0 1 Tz ] [ z ] [ z + Tz ]
[ 0 0 0 1 ] [ 1 ] [ 1 ]
But if we set the last value to 0 then:
TV = [ 1 0 0 Tx ] x [ x ] = Lots of maths, then simplification = [ x ]
[ 0 1 0 Ty ] [ y ] [ y ]
[ 0 0 1 Tz ] [ z ] [ z ]
[ 0 0 0 1 ] [ 0 ] [ 0 ]
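The effect of the last component can be seen in a short sketch. The helper names translation and transform are assumptions for illustration, and the numbers are arbitrary.

```python
def translation(tx, ty, tz):
    """Build a 4x4 translation matrix, nested lists indexed m[row][col]."""
    return [[1, 0, 0, tx],
            [0, 1, 0, ty],
            [0, 0, 1, tz],
            [0, 0, 0, 1]]


def transform(m, v):
    """Multiply a 4x4 matrix by a 4-component column vector."""
    return [sum(m[r][k] * v[k] for k in range(4)) for r in range(4)]


t = translation(10, 20, 30)
position = [1, 2, 3, 1]   # last value 1: gets translated
direction = [1, 2, 3, 0]  # last value 0: translation is ignored

print(transform(t, position))   # [11, 22, 33, 1]
print(transform(t, direction))  # [1, 2, 3, 0]
```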
Properties of translation matrices:
- TaTb = TbTa (pay attention!)
- T^(-1) = T where the Tx, Ty and Tz values are -ve, i.e.:
Tinverse = [ 1 0 0 -Tx ]
[ 0 1 0 -Ty ]
[ 0 0 1 -Tz ]
[ 0 0 0 1 ]
- TaTb adds the two translations together:
TaTb = [ 1 0 0 Tax + Tbx ]
[ 0 1 0 Tay + Tby ]
[ 0 0 1 Taz + Tbz ]
[ 0 0 0 1 ]
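These translation properties are easy to confirm with whole numbers. The sketch below uses assumed helper names (translation, mat_mul) and arbitrary example values.

```python
def translation(tx, ty, tz):
    """Build a 4x4 translation matrix, nested lists indexed m[row][col]."""
    return [[1, 0, 0, tx],
            [0, 1, 0, ty],
            [0, 0, 1, tz],
            [0, 0, 0, 1]]


def mat_mul(a, b):
    """4x4 matrix product."""
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]


ta = translation(1, 2, 3)
tb = translation(10, 20, 30)

# Translations commute, and the product simply adds the offsets.
print(mat_mul(ta, tb) == mat_mul(tb, ta))
print(mat_mul(ta, tb) == translation(11, 22, 33))

# Negating the offsets gives the inverse.
print(mat_mul(ta, translation(-1, -2, -3)) == translation(0, 0, 0))
```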
Right, now let's finally get to some graphics stuff...
A model is, amongst other things, a collection of points around its local origin. We can move (translate) and rotate those points using the matrices above.
- To move a vertex we just do: V' = TV
- To rotate a vertex we just do: V' = RV
- To rotate then move a vertex we do: V' = TRV
- To move then rotate a vertex we do: V' = RTV
So when reading the multiplications, read from right to left to get the order it happens.
Note that for matrices TR != RT. Imagine it yourself: if you have an object at the origin, rotate it, then move it, it turns on the spot and then ends up at T. Now imagine moving it, then rotating it about the origin: it'll move around a circle with a radius of M(T).
Nifty Fact: Rotate then move (TR) can be combined into one matrix like this, where the Rs are the 3x3 rotation part of R:
TR = [ R R R Tx ]
[ R R R Ty ]
[ R R R Tz ]
[ 0 0 0 1 ]
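The difference between TR and RT shows up as soon as you push a point through both. This sketch is illustrative only: the helper names are assumptions, R is a 90 degree rotation about z (so cos = 0 and sin = 1, keeping the numbers whole), and T moves 5 units along x.

```python
def mat_mul(a, b):
    """4x4 matrix product, nested lists indexed m[row][col]."""
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]


def transform(m, v):
    """Multiply a 4x4 matrix by a 4-component column vector."""
    return [sum(m[r][k] * v[k] for k in range(4)) for r in range(4)]


R = [[0, -1, 0, 0],
     [1, 0, 0, 0],
     [0, 0, 1, 0],
     [0, 0, 0, 1]]
T = [[1, 0, 0, 5],
     [0, 1, 0, 0],
     [0, 0, 1, 0],
     [0, 0, 0, 1]]

point = [1, 0, 0, 1]
rotate_then_move = transform(mat_mul(T, R), point)  # read right to left: R, then T
move_then_rotate = transform(mat_mul(R, T), point)  # read right to left: T, then R

print(rotate_then_move)  # [5, 1, 0, 1]
print(move_then_rotate)  # [0, 6, 0, 1]

# TR keeps the rotation in the upper-left 3x3 and the translation in the last column.
print(mat_mul(T, R))
```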
When we think of the camera, we think we need to rotate it around its origin, then translate it to where it's looking from. But what actually happens is that we move the entire world and then rotate it so that the camera is located at 0, 0, 0 pointing down the -ve z axis (using the right hand rule), into your monitor.
So, if your camera rotation then translation matrix is C, then the actual view matrix that you apply to all your vertices would be V, where V = C^(-1). Let's do the maths to figure out how to get V.
C = TR (where we rotate by R, then translate by T)
V = C^(-1)
V = (TR)^(-1)
V = R^(-1)T^(-1) (note that order is now swapped)
V = R^(T)T^(-1)
Right, now we are really close to writing down a view transformation matrix that matches what you can find on the internet. If we assume the following:
R = [ NewX.x NewY.x NewZ.x 0 ]
[ NewX.y NewY.y NewZ.y 0 ]
[ NewX.z NewY.z NewZ.z 0 ]
[ 0 0 0 1 ]
Therefore
Rinverse = [ NewX.x NewX.y NewX.z 0 ] = Rtranspose (since it's orthonormal).
[ NewY.x NewY.y NewY.z 0 ]
[ NewZ.x NewZ.y NewZ.z 0 ]
[ 0 0 0 1 ]
As well as
T = [ 1 0 0 Tx ]
[ 0 1 0 Ty ]
[ 0 0 1 Tz ]
[ 0 0 0 1 ]
Therefore
Tinverse = [ 1 0 0 -Tx ]
[ 0 1 0 -Ty ]
[ 0 0 1 -Tz ]
[ 0 0 0 1 ]
So V = R^(T)T^(-1) would mean that V is:
Rinverse * Tinverse
Which is
[ NewX.x NewX.y NewX.z 0 ][ 1 0 0 -Tx ]
[ NewY.x NewY.y NewY.z 0 ][ 0 1 0 -Ty ]
[ NewZ.x NewZ.y NewZ.z 0 ][ 0 0 1 -Tz ]
[ 0 0 0 1 ][ 0 0 0 1 ]
resulting in
[ NewX.x NewX.y NewX.z (NewX.x * -Tx + NewX.y * -Ty + NewX.z * -Tz) ]
[ NewY.x NewY.y NewY.z (NewY.x * -Tx + NewY.y * -Ty + NewY.z * -Tz) ]
[ NewZ.x NewZ.y NewZ.z (NewZ.x * -Tx + NewZ.y * -Ty + NewZ.z * -Tz) ]
[ 0 0 0 1 ]
Now when you have two vectors (with components x,y,z), the dot product of those two vectors is:
dot(Va, Vb) = ( Vax * Vbx ) +
( Vay * Vby ) +
( Vaz * Vbz )
So we can now rewrite our view matrix like:
[ NewX.x NewX.y NewX.z dot(NewX, -T) ]
[ NewY.x NewY.y NewY.z dot(NewY, -T) ]
[ NewZ.x NewZ.y NewZ.z dot(NewZ, -T) ]
[ 0 0 0 1 ]
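One way to convince yourself of this result is to build both the camera matrix C = TR and the view matrix, and check that multiplying them gives the identity. The sketch below is illustrative only: the function names are assumptions, and the example axes are a 90 degree yaw, chosen so that every entry is a whole number.

```python
def dot(a, b):
    """Dot product of two 3-component vectors."""
    return sum(x * y for x, y in zip(a, b))


def mat_mul(a, b):
    """4x4 matrix product, nested lists indexed m[row][col]."""
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]


def camera_matrix(new_x, new_y, new_z, t):
    """C = TR: the new axes as columns, the camera position in the last column."""
    return [[new_x[0], new_y[0], new_z[0], t[0]],
            [new_x[1], new_y[1], new_z[1], t[1]],
            [new_x[2], new_y[2], new_z[2], t[2]],
            [0, 0, 0, 1]]


def view_matrix(new_x, new_y, new_z, t):
    """V = C^(-1): the new axes as rows, with dot(axis, -T) in the last column."""
    neg_t = [-c for c in t]
    return [[new_x[0], new_x[1], new_x[2], dot(new_x, neg_t)],
            [new_y[0], new_y[1], new_y[2], dot(new_y, neg_t)],
            [new_z[0], new_z[1], new_z[2], dot(new_z, neg_t)],
            [0, 0, 0, 1]]


# Example camera: yawed 90 degrees and positioned at (1, 2, 3).
new_x, new_y, new_z = [0, 0, -1], [0, 1, 0], [1, 0, 0]
position = [1, 2, 3]

c = camera_matrix(new_x, new_y, new_z, position)
v = view_matrix(new_x, new_y, new_z, position)
print(mat_mul(v, c))  # the identity matrix
```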
But we still have a problem. What exactly are NewX, NewY and NewZ?
For most FPS games we want to look side to side (yaw) and then up and down (pitch). Initially I thought that meant applying Ryaw (Ry) then Rpitch (Rx). However that doesn't work, as Rx rotates around the original x axis, not the one rotated by Ry. Therefore I need to do Rx first, then Ry. Remember, read right to left to see what happens first.
R = RyRx
R = [ cos(yaw) 0 sin(yaw) 0 ] [ 1 0 0 0 ]
[ 0 1 0 0 ] [ 0 cos(pitch) -sin(pitch) 0 ]
[ -sin(yaw) 0 cos(yaw) 0 ] [ 0 sin(pitch) cos(pitch) 0 ]
[ 0 0 0 1 ] [ 0 0 0 1 ]
R = [ cos(yaw) sin(yaw).sin(pitch) sin(yaw).cos(pitch) 0 ]
[ 0 cos(pitch) -sin(pitch) 0 ]
[ -sin(yaw) cos(yaw).sin(pitch) cos(yaw).cos(pitch) 0 ]
[ 0 0 0 1 ]
Remember that RxRy != RyRx. Rotation order matters. Secondly, you can get things like gimbal lock and degenerate rotation matrices if you don't watch out. This is where you start reading about quaternions, but for this document that's optional. All you need to know is that pitch should be kept in the range -89 to 89 degrees (inclusive), as a pitch of 90 degrees would make a degenerate view matrix (one that produces wacky results) due to gimbal lock.
So knowing the original R, we can get NewX, NewY, NewZ.
NewX = [ cos(yaw) ]
[ 0 ]
[ -sin(yaw) ]
NewY = [ sin(yaw).sin(pitch) ]
[ cos(pitch) ]
[ cos(yaw).sin(pitch) ]
NewZ = [ sin(yaw).cos(pitch) ]
[ -sin(pitch) ]
[ cos(yaw).cos(pitch) ]
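In code, the three axes fall straight out of the yaw and pitch angles. This is a minimal sketch with an assumed function name (camera_axes) and angles in radians; it simply reads off the columns of R = RyRx.

```python
import math


def camera_axes(yaw, pitch):
    """Return NewX, NewY, NewZ: the columns of R = Ry(yaw) * Rx(pitch)."""
    cy, sy = math.cos(yaw), math.sin(yaw)
    cp, sp = math.cos(pitch), math.sin(pitch)
    new_x = [cy, 0.0, -sy]
    new_y = [sy * sp, cp, cy * sp]
    new_z = [sy * cp, -sp, cy * cp]
    return new_x, new_y, new_z


def dot(a, b):
    """Dot product of two 3-component vectors."""
    return sum(x * y for x, y in zip(a, b))


# Example angles, arbitrary but inside the safe pitch range.
new_x, new_y, new_z = camera_axes(math.radians(30), math.radians(-20))

# Orthonormal axes: each pair is perpendicular and each axis has length 1.
print(round(dot(new_x, new_y), 6), round(dot(new_x, new_z), 6), round(dot(new_y, new_z), 6))
print(round(dot(new_x, new_x), 6), round(dot(new_y, new_y), 6), round(dot(new_z, new_z), 6))
```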
But wait! I see a lot of code that uses an up vector and lots of cross products. That's because it works by using an origin and a point to look at to get the rotation matrix. But since it doesn't know the yaw or pitch angles, it can't use our cos and sin stuff. What it does instead is:
- Get the direction of the new Z axis from Vtarget - VcameraPosition.
- Using the Y axis (up) vector, find the new X axis by doing a cross product between up and the direction. (The cross product of two vectors returns a vector that is normal to the plane containing the two vectors.)
- Using the direction and the new X axis, do a cross product to find the rotated Y axis.
NewZ = Normalise(Vtarget - VcameraPosition)
NewX = Normalise(Cross(Vup, NewZ))
NewY = Normalise(Cross(NewZ, NewX))
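The three steps can be sketched as follows. This is an illustration with assumed function names (cross, normalise, look_at_axes), following the formulas exactly as written above. It does not handle the case where the up vector is parallel to the view direction, where the cross product has zero length and the normalise step divides by zero.

```python
import math


def cross(a, b):
    """Cross product: a vector normal to the plane containing a and b."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def normalise(v):
    """Scale a vector to length 1."""
    length = math.sqrt(sum(c * c for c in v))
    return [c / length for c in v]


def look_at_axes(camera_position, target, up):
    """Return NewX, NewY, NewZ from a camera position, a target and an up vector."""
    new_z = normalise([t - c for t, c in zip(target, camera_position)])
    new_x = normalise(cross(up, new_z))
    new_y = normalise(cross(new_z, new_x))
    return new_x, new_y, new_z


# Example: camera at the origin, target straight along +z, y as up.
new_x, new_y, new_z = look_at_axes([0, 0, 0], [0, 0, 10], [0, 1, 0])
print(new_x, new_y, new_z)
```

With this example the three axes come out as the plain x, y and z axes, which is a handy sanity check when experimenting with the sign conventions discussed later.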
Ok, I'm going to be honest here. I haven't worked through the maths to actually understand what's happening. I just looked at the pretty picture on the net and got the general gist of it.
So for us our projection matrix is as follows
Znear = distance to the front Z clip plane
Zfar = distance to the back Z clip plane
Zd = Zfar - Znear
Znid = 1 / -Zd (note the negative)
fovx = side to side field of view angle
aspect = screen width / screen height
fovy = up and down field of view angle = 2.atan(tan(fovx/2) / aspect) (fovx / aspect is only a rough approximation)
f = cot(fovy/2) = 1 / tan(fovy/2)
[ f / aspect 0 0 0 ]
[ 0 f 0 0 ]
[ 0 0 (Zfar + Znear).Znid 2.Znear.Zfar.Znid ]
[ 0 0 -1 0 ]
notes: Znear and Zfar are distances, not positions, so they are always +ve.
Znear > 0
Z-fighting happens if Zd is too large, or Znear is too close to 0.
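Here is a minimal sketch of the projection matrix above. The function name projection is an assumption, it takes fovy directly (in radians) rather than deriving it from fovx, and the example values are arbitrary. The check at the end relies on the usual perspective divide (dividing by the last component), which this article does not cover: a point on the near plane should end up with a depth of -1 and a point on the far plane with a depth of 1.

```python
import math


def projection(fovy, aspect, z_near, z_far):
    """OpenGL-style perspective projection, nested lists indexed m[row][col]."""
    f = 1.0 / math.tan(fovy / 2.0)
    z_nid = 1.0 / -(z_far - z_near)  # note the negative
    return [[f / aspect, 0.0, 0.0, 0.0],
            [0.0, f, 0.0, 0.0],
            [0.0, 0.0, (z_far + z_near) * z_nid, 2.0 * z_near * z_far * z_nid],
            [0.0, 0.0, -1.0, 0.0]]


def transform(m, v):
    """Multiply a 4x4 matrix by a 4-component column vector."""
    return [sum(m[r][k] * v[k] for k in range(4)) for r in range(4)]


p = projection(math.radians(60), 16 / 9, 1.0, 100.0)

# The camera looks down the -ve z axis, so the planes sit at z = -Znear and z = -Zfar.
near = transform(p, [0.0, 0.0, -1.0, 1.0])
far = transform(p, [0.0, 0.0, -100.0, 1.0])
near_depth = near[2] / near[3]
far_depth = far[2] / far[3]
print(round(near_depth, 6), round(far_depth, 6))
```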
So, to multiply our vertex by the model transform, then the view, then the projection we need to do the following:
V' = Mp.Mv.Mm.V
M = Mp.Mv.Mm
V' = M.V
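A quick sketch shows that combining the matrices first gives the same answer as applying them one at a time. The matrices here are simple whole-number stand-ins chosen for illustration (a translation for Mm, a 90 degree rotation about z for Mv, and a plain scale in place of a real projection for Mp), and the helper names are assumptions.

```python
def mat_mul(a, b):
    """4x4 matrix product, nested lists indexed m[row][col]."""
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]


def transform(m, v):
    """Multiply a 4x4 matrix by a 4-component column vector."""
    return [sum(m[r][k] * v[k] for k in range(4)) for r in range(4)]


mm = [[1, 0, 0, 1], [0, 1, 0, 2], [0, 0, 1, 3], [0, 0, 0, 1]]   # stand-in model matrix
mv = [[0, -1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]  # stand-in view matrix
mp = [[2, 0, 0, 0], [0, 2, 0, 0], [0, 0, 2, 0], [0, 0, 0, 1]]   # stand-in "projection"

vertex = [1, 1, 1, 1]

# One step at a time, reading right to left: model, then view, then projection.
step_by_step = transform(mp, transform(mv, transform(mm, vertex)))

# Combined once, then applied to every vertex.
m = mat_mul(mat_mul(mp, mv), mm)
combined = transform(m, vertex)

print(step_by_step, combined)  # both are [-6, 4, 8, 1]
```

Combining once per object and reusing M for every vertex is the reason the product is usually precomputed.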
Yay, both DirectX and OpenGL expect the same memory layout for your 4x4 matrix.
M = [ M11 M21 M31 M41 ] where Mx.y = Mcolumn.row
[ M12 M22 M32 M42 ]
[ M13 M23 M33 M43 ]
[ M14 M24 M34 M44 ]
Therefore in memory:
[ M11 M12 M13 M14 M21 M22 M23 M24 M31 M32 M33 M34 M41 M42 M43 M44 ]
This is called Column-major memory storage.
Notice how I wrote my x,y,z vectors in a column? That's because it's a column vector. You can also write them in a row, as a row vector.
Vcolumn = [x]
[y]
[z]
Vrow = [x y z]
Why does this matter? Because when you treat the vector as a matrix, the order of multiplication is important.
M.Vcolumn is ok
Vcolumn.M is illegal
M.Vrow is illegal
Vrow.M is ok
Hey, remember that (MaMb)^T = (Mb^T)(Ma^T)? That means that:
(M.V)^T = V^T.M^T
that is (the left side gives a column and the right side gives a row holding the same numbers):
[ 1 0 0 Tx ] [ x ] = [ x y z 1 ] [ 1 0 0 0 ]
[ 0 1 0 Ty ] [ y ] [ 0 1 0 0 ]
[ 0 0 1 Tz ] [ z ] [ 0 0 1 0 ]
[ 0 0 0 1 ] [ 1 ] [ Tx Ty Tz 1 ]
So all the following actions will produce the same numbers (the row vector versions use M^T):
- A row vector is post-multiplied by M
- M is pre-multiplied by a row vector
- A column vector is pre-multiplied by M
- M is post-multiplied by a column vector
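The equivalence can be checked with a translation matrix. This sketch uses assumed helper names and arbitrary numbers; the row vector version multiplies by the transposed matrix, as in the example above.

```python
def transpose(m):
    """Mirror the matrix along its diagonal."""
    return [list(column) for column in zip(*m)]


def mat_times_column(m, v):
    """M.Vcolumn: matrix on the left, column vector on the right."""
    return [sum(m[r][k] * v[k] for k in range(4)) for r in range(4)]


def row_times_mat(v, m):
    """Vrow.M: row vector on the left, matrix on the right."""
    return [sum(v[k] * m[k][c] for k in range(4)) for c in range(4)]


t = [[1, 0, 0, 10],
     [0, 1, 0, 20],
     [0, 0, 1, 30],
     [0, 0, 0, 1]]
v = [1, 2, 3, 1]

print(mat_times_column(t, v))          # [11, 22, 33, 1]
print(row_times_mat(v, transpose(t)))  # [11, 22, 33, 1]
```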
Bah, the whole lot is confusing.
Notice how earlier I stored the matrix in memory using column-major storage:
[ M11 M12 M13 M14 M21 M22 M23 M24 M31 M32 M33 M34 M41 M42 M43 M44 ]
Well, you can also store it as row major (some C++ libraries do this)
[ M11 M21 M31 M41 M12 M22 M32 M42 M13 M23 M33 M43 M14 M24 M34 M44 ]
Even better. If you confuse post and pre multiplication with row or column major storage, your code might just work anyway.
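The two storage orders, and why mixing them up can cancel out, can be sketched like this. The function names are assumptions and the matrix is an arbitrary translation. The last line shows that storing the transposed matrix in row-major order produces exactly the same 16 numbers as storing the original in column-major order, which is why swapping both conventions at once can appear to work.

```python
def to_column_major(m):
    """Flatten a 4x4 matrix column by column."""
    return [m[r][c] for c in range(4) for r in range(4)]


def to_row_major(m):
    """Flatten a 4x4 matrix row by row."""
    return [m[r][c] for r in range(4) for c in range(4)]


def transpose(m):
    """Mirror the matrix along its diagonal."""
    return [list(column) for column in zip(*m)]


t = [[1, 0, 0, 10],
     [0, 1, 0, 20],
     [0, 0, 1, 30],
     [0, 0, 0, 1]]

print(to_column_major(t))  # translation ends up in the last four values
print(to_row_major(t))     # translation is spread through the array
print(to_row_major(transpose(t)) == to_column_major(t))
```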
Paraphrasing from Wikipedia.
- RH. Using your right hand, your thumb points along the +ve x axis, the index finger along the +ve y axis and the middle finger pointing normal to your palm in the +ve z axis direction.
- LH. Using your left hand, your thumb points along the +ve x axis, the index finger along the +ve y axis and the middle finger pointing normal to your palm in the +ve z axis direction.
Why is this important? Because for 3D rendering we need to know if the +ve z axis points into your monitor or out of it. If you place your thumb and index finger flat on the monitor, your middle finger will point in the direction of the +ve z axis. And whenever you have a choice, there will be different standards choosing different options.
- DirectX uses the LH rule
- OpenGL uses the RH rule
For this article, I used the right hand rule.
The reason this causes so much confusion for me is the different ways the view matrix is calculated with the Z axis negated or not.
NewZ = Normalise(Vtarget - VcameraPosition)
vs
NewZ = Normalise(VcameraPosition - Vtarget)
or
[ NewX.x NewX.y NewX.z dot(NewX, -T) ]
[ NewY.x NewY.y NewY.z dot(NewY, -T) ]
[ NewZ.x NewZ.y NewZ.z dot(NewZ, -T) ]
[ 0 0 0 1 ]
vs
[ NewX.x NewX.y -NewX.z dot(NewX, -T) ]
[ NewY.x NewY.y -NewY.z dot(NewY, -T) ]
[ NewZ.x NewZ.y -NewZ.z dot(NewZ, -T) ]
[ 0 0 0 1 ]
Also, apparently the fixed function OpenGL negated the Z axis in the view matrix, and then negated it again in the projection matrix, cancelling the negation. So when people started using the maths for vertex shaders, they only copied one of the matrices and therefore ended up with a negative z axis. Ugh.
What I do know is that the projection matrices for DirectX and OpenGL are different. They can be summed up thus:
- DirectX: Uses LH coordinate system, and converts X and Y to be between -1 and 1, but Z to be between 0 and 1.
- OpenGL: Uses RH coordinate system, and converts X and Y and Z to be between -1 and 1.
** For this article I used the OpenGL projection matrix **