The standard can be downloaded from the ISO website at this direct link
A DOCX document is a zipped folder containing several interacting components. The main ones are:

`word/document.xml`
: The main document content

`word/styles.xml`
: Named style information (e.g. "Heading 1"), similar to CSS

`word/numbering.xml`
: Sort of like CSS for numbering styles (e.g., "a)" vs "iii.")

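
Because the package is an ordinary zip archive, the parts listed above can be inspected with standard zip tooling. The following is a minimal sketch using only the Python standard library. The helper names `list_main_parts` and `build_stub_package` are hypothetical, and the stub archive contains placeholder XML rather than a valid document; it only stands in for a real `.docx` file so the example is self-contained.

```python
import io
import zipfile

# The three parts described above.
MAIN_PARTS = ("word/document.xml", "word/styles.xml", "word/numbering.xml")


def list_main_parts(docx_bytes: bytes) -> dict[str, bool]:
    """Report which of the main parts are present in a DOCX package."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        names = set(package.namelist())
    return {part: part in names for part in MAIN_PARTS}


def build_stub_package() -> bytes:
    """Build a stand-in zip with placeholder parts (NOT a valid DOCX)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("word/document.xml", "<document/>")
        package.writestr("word/styles.xml", "<styles/>")
    return buffer.getvalue()


if __name__ == "__main__":
    for part, present in list_main_parts(build_stub_package()).items():
        print(f"{part}: {'present' if present else 'missing'}")
```

For a real file, the bytes could be read with `open(path, "rb").read()` and passed to `list_main_parts` in the same way.
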
Note on measures: The fundamental unit in DOCX is the TWIP, a "twentieth of a point", where a point ("pt") is 1/72 of an inch. Typically, a property referring to a physical length accepts either a number of TWIPS or a string made up of a number followed by one of "mm|cm|in|pt|pc|pi" to indicate the units.
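
The arithmetic behind these measures can be sketched as follows. This is an illustration under stated assumptions rather than a complete parser: the function name `to_twips` is hypothetical, and both `pc` and `pi` are assumed to mean picas (12 points each), which should be checked against the spec before relying on it.

```python
import re

TWIPS_PER_POINT = 20

# Points per unit. ASSUMPTION: "pc" and "pi" are both picas (12 points).
POINTS_PER_UNIT = {
    "mm": 72 / 25.4,
    "cm": 72 / 2.54,
    "in": 72.0,
    "pt": 1.0,
    "pc": 12.0,
    "pi": 12.0,
}

# A number followed by one of the unit suffixes named in the note above.
_MEASURE = re.compile(r"(-?[0-9]+(?:\.[0-9]+)?)(mm|cm|in|pt|pc|pi)")


def to_twips(value: str) -> float:
    """Convert a bare TWIPS number or a number + unit string to TWIPS."""
    text = value.strip()
    match = _MEASURE.fullmatch(text)
    if match:
        number, unit = match.groups()
        return float(number) * POINTS_PER_UNIT[unit] * TWIPS_PER_POINT
    # No unit suffix: the value is already a number of TWIPS.
    return float(text)


if __name__ == "__main__":
    for sample in ("720", "1in", "12pt", "2.54cm"):
        print(sample, "->", round(to_twips(sample), 3), "TWIPS")
```

One inch works out to 72 × 20 = 1440 TWIPS, which is a handy reference point when reading raw attribute values.
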
The main document content consists of a sequence of block-level items wrapped in a `body` element. There are other "stories" you can include beyond `body`, such as comments, headers, etc. The main types of block-level content are paragraphs (`p`) and tables (`tbl`). Block-level elements have a sub-element specifying their "properties" (`pPr` for paragraphs and `tblPr` for tables), which include different options for styling and layout of the element. Each option corresponds to a child element of the properties element. Paragraphs and tables have different properties available, listed below for paragraphs first and then for tables:

- Style name (`pStyle`): A reference to an entry in `word/styles.xml`. Sort of like a CSS class.
- Numbering info (`numPr`): A reference to an entry in `word/numbering.xml`. The reference is both an id (`numId`) and a level number (`ilvl`). Paragraphs with this property have a number/bullet placed before the beginning of text.
- Tab stops (`tabs`): Contains a list of tab stops (`tab`) to set on the given paragraph. Each tab stop specifies a distance (`pos`), a stop type (`val`), and an optional `leader`, indicating the fill character. Valid values for these attributes are:
  - `val`: "clear", "start", "center", "end", "decimal", "bar", "num"
  - `leader`: "none", "dot", "hyphen", "underscore", "heavy", "middleDot"
  - `pos`: The distance from the left margin to the tab stop
- Indentation (`ind`): An element with the following attributes:
  - `start`: Indentation from the left margin. Negative values move text backwards.
  - `firstLine`: Additional indentation for the first line. If `hanging` is also given, this is ignored.
  - `hanging`: Negative indentation for the first line. Trumps `firstLine` if also given.
  - `end`: Additional margin to leave empty on the right. Negative values move the margin backwards.

  All of these attributes accept a TWIPS number or a number + unit value. They also all have alternates suffixed with `Chars` (e.g., `startChars`) to specify the indentation in "character units".
- Spacing (`spacing`): Controls spacing between lines and above/below the paragraph. The core attributes are as follows:
  - `before`: Similar to CSS margin-top, in TWIPS or measure + unit.
  - `beforeLines`: Similar to CSS margin-top, measured in hundredths of a line.
  - `after`: Similar to CSS margin-bottom, in TWIPS or measure + unit.
  - `afterLines`: Similar to CSS margin-bottom, measured in hundredths of a line.
  - `line`: Similar to CSS line-height, in 240ths of a line. The meaning of this attribute can change if `lineRule` is not blank or `auto` (see the spec for details).
  - There are a couple more attributes; see section 17.3.1.33 of the spec.

There are many more possible properties for a paragraph. See the spec for details on the following:

- adjustRightInd
- autoSpaceDE
- autoSpaceDN
- bidi
- cnfStyle
- contextualSpacing
- divId
- framePr
- jc
- keepLines
- keepNext
- kinsoku
- mirrorIndents
- outlineLvl
- overflowPunct
- pBdr
- pageBreakBefore
- shd
- snapToGrid
- suppressAutoHyphens
- suppressLineNumbers
- suppressOverlap
- textAlignment
- textDirection
- textboxTightWrap
- topLinePunct
- widowControl
- wordWrap
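
To make the nesting of the paragraph properties concrete, here is a small sketch that builds a `body` containing one `p` whose `pPr` carries the five properties described in detail above, using Python's standard `xml.etree.ElementTree`. Several things here are assumptions for illustration: the style id `ListParagraph`, the numbering id `1`, and all numeric values are made up, and the helpers `w`, `make_paragraph`, and `first_line_offset` are hypothetical names. The last helper handles bare TWIPS numbers only, not number + unit strings.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, conventionally bound to the "w" prefix.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W_NS)


def w(name: str) -> str:
    """Qualify a local element or attribute name with the w: namespace."""
    return f"{{{W_NS}}}{name}"


def make_paragraph() -> ET.Element:
    """Build a paragraph whose pPr uses the properties described above."""
    p = ET.Element(w("p"))
    ppr = ET.SubElement(p, w("pPr"))
    # Hypothetical style id; it would have to exist in word/styles.xml.
    ET.SubElement(ppr, w("pStyle"), {w("val"): "ListParagraph"})
    # Hypothetical numbering reference into word/numbering.xml.
    num_pr = ET.SubElement(ppr, w("numPr"))
    ET.SubElement(num_pr, w("ilvl"), {w("val"): "0"})
    ET.SubElement(num_pr, w("numId"), {w("val"): "1"})
    # One tab stop with a dotted leader, 9000 TWIPS from the margin.
    tabs = ET.SubElement(ppr, w("tabs"))
    ET.SubElement(
        tabs, w("tab"), {w("val"): "end", w("leader"): "dot", w("pos"): "9000"}
    )
    # 120 TWIPS above and below; line=360 is 360/240 = 1.5 lines.
    ET.SubElement(
        ppr, w("spacing"), {w("before"): "120", w("after"): "120", w("line"): "360"}
    )
    # Half-inch start indent with a quarter-inch hanging first line.
    ET.SubElement(ppr, w("ind"), {w("start"): "720", w("hanging"): "360"})
    return p


def first_line_offset(ind: ET.Element) -> int:
    """First-line offset relative to `start`, in TWIPS (bare numbers only).

    Follows the rule above: `hanging` trumps `firstLine` when both exist.
    """
    hanging = ind.get(w("hanging"))
    if hanging is not None:
        return -int(hanging)
    return int(ind.get(w("firstLine"), "0"))


if __name__ == "__main__":
    body = ET.Element(w("body"))
    body.append(make_paragraph())
    print(ET.tostring(body, encoding="unicode"))
```

The paragraph here has no text content; it only shows where each property element sits relative to `p` and `pPr`.
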

Table properties (`tblPr`):

- Style (`tblStyle`): A reference to a style in `word/styles.xml`. Sort of like a CSS class.
- Left indent (`tblInd`): This element has two attributes used to specify the leading indentation for a table, `type` and `w`. Depending on the value of `type`, `w` takes on a different meaning as follows:
  - `dxa`: `w` is interpreted as a number of TWIPS.
  - `pct`: If `w` is a number, it is interpreted as fiftieths of a percent of the document width (excluding margins). If it ends in "%", it specifies the percentage of the document width directly.
  - `nil`: `w` is ignored and the margin is 0.
  - `auto`: `w` is ignored and the margin is deferred to parent styles.
- Borders (`tblBorders`): Contains up to six elements: `top`, `start`, `bottom`, `end`, `insideH`, and `insideV` (the first four correspond to the CSS top, left, bottom, and right). If there is a conflict between a cell border and the table border, cell borders typically win (but see `tblPrEx` for the corner case where they don't). Each element has the following attributes:
  - `color`: "RRGGBB" in hex (no leading "#") or "auto"
  - `sz`: Border size in eighths of a point. Minimum border size is 0.25pt and maximum border size is 12pt.
  - `val`: Type of border, e.g., "single", "dashed", "dotted", "double", etc. See the spec for the full list (17.18.2).
  - Many more. See the spec (17.3.4).
- Width (`tblW`): An indication of the preferred width, which is an input into the overall layout algorithm. This element has the same two attributes as `tblInd` above.

Other table properties include:

- bidiVisual
- jc
- shd
- tblCaption
- tblCellMar
- tblCellSpacing
- tblDescription
- tblLayout
- tblLook
- tblOverlap
- tblStyleColBandSize
- tblStyleRowBandSize
- tblW
- tblpPr
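
The `type`/`w` rules shared by `tblInd` and `tblW`, and the eighths-of-a-point unit used by border `sz`, reduce to a little arithmetic. The sketch below is illustrative only: `resolve_table_measure` and `border_size_points` are hypothetical helper names, `available_twips` is an assumed input standing for the document width excluding margins, and `dxa` values are assumed to be bare TWIPS numbers.

```python
from typing import Optional


def resolve_table_measure(
    type_: str, w: str, available_twips: float
) -> Optional[float]:
    """Resolve a tblInd/tblW style (type, w) pair to TWIPS.

    Returns None for "auto", where the value is deferred elsewhere.
    """
    if type_ == "dxa":
        # w is a number of TWIPS (ASSUMPTION: no unit suffix).
        return float(w)
    if type_ == "pct":
        if w.endswith("%"):
            # Direct percentage of the available width.
            return available_twips * float(w[:-1]) / 100
        # Fiftieths of a percent: 5000 means 100%.
        return available_twips * float(w) / 5000
    if type_ == "nil":
        return 0.0
    if type_ == "auto":
        return None
    raise ValueError(f"unknown type: {type_!r}")


def border_size_points(sz: int) -> float:
    """Convert a border sz value (eighths of a point) to points."""
    return sz / 8


if __name__ == "__main__":
    width = 9000.0  # hypothetical available width in TWIPS
    print(resolve_table_measure("pct", "2500", width))
    print(resolve_table_measure("pct", "50%", width))
    print(border_size_points(4))
```

Under these rules the stated border limits of 0.25pt and 12pt correspond to `sz` values of 2 and 96.
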