@michael
Last active August 29, 2015 14:21
TAHI Article Format Specification

Top level article structure

HTML

<article>
  <head>
    <title>Article short title</title>
    <!-- Analogous to HTML, we use meta tags with unique name identifiers -->
    <meta name="doi" content="10.7554/eLife.00005"/>
    <!-- TODO: include contributors metadata (like authors etc.) -->
  </head>
  <body>
    <h1>Heading 1</h1>
    <p>Para 1</p>
    <h2>Heading 1.1</h2>
    ...
  </body>
</article>

XML

<article>
  <head>
    <title>Article short title</title>
    <!-- Analogous to HTML, we use meta tags with unique name identifiers -->
    <meta name="doi" content="10.7554/eLife.00005"/>
    <!-- TODO: include contributors metadata (like authors etc.) -->
  </head>
  <body>
    <h id="h1" level="1">Level 1 heading</h>
    <p></p>
    <h id="h2" level="2">Level 2 heading</h>
  </body>
</article>
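
The two variants above differ mainly in how headings are written: `<h level="N">` in the XML versus `<hN>` in the HTML. As a minimal sketch of that mapping, assuming Python's standard `xml.etree.ElementTree` and a hypothetical helper named `headings_to_html` (neither is part of the specification), the conversion could look like this. Carrying the `id` attribute over to the HTML heading is also an assumption.

```python
import xml.etree.ElementTree as ET

# Body taken from the XML example above.
XML_BODY = """<body>
  <h id="h1" level="1">Level 1 heading</h>
  <p></p>
  <h id="h2" level="2">Level 2 heading</h>
</body>"""


def headings_to_html(body_xml: str) -> str:
    """Rename every <h level="N"> element to <hN> (hypothetical helper)."""
    body = ET.fromstring(body_xml)
    # Materialise the matches first because the tags are renamed in the loop.
    for heading in list(body.iter("h")):
        level = int(heading.attrib.pop("level"))
        if not 1 <= level <= 6:
            raise ValueError(f"unsupported heading level: {level}")
        heading.tag = f"h{level}"
    # method="html" keeps empty elements such as <p></p> in open/close form.
    return ET.tostring(body, encoding="unicode", method="html")


html_body = headings_to_html(XML_BODY)
print(html_body)
```

The reverse direction (HTML to XML) would be the same rename in the opposite order, which illustrates that neither form carries more information than the other for headings.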

Annotated Paragraphs

<p>
  The <em>PRC2 complex</em> has been the focus of a significant number of biochemical and molecular studies (for a recent review see
  <cite ref-type="bibr" rid="bib13">Margueron and Reinberg, 2011</cite>) ...
</p>
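
To illustrate why inline annotations such as `<cite>` are useful for processing, here is a small sketch that lists the citations in the paragraph above. It assumes the paragraph is well-formed XML and uses only the standard library; `collect_citations` is a hypothetical name, not part of the format.

```python
import xml.etree.ElementTree as ET

# Paragraph taken from the example above.
PARAGRAPH = """<p>
  The <em>PRC2 complex</em> has been the focus of a significant number of biochemical and molecular studies (for a recent review see
  <cite ref-type="bibr" rid="bib13">Margueron and Reinberg, 2011</cite>) ...
</p>"""


def collect_citations(paragraph_xml: str) -> list:
    """Return (rid, ref-type, label) for every <cite> in the paragraph."""
    paragraph = ET.fromstring(paragraph_xml)
    return [
        (cite.get("rid"), cite.get("ref-type"), "".join(cite.itertext()).strip())
        for cite in paragraph.iter("cite")
    ]


citations = collect_citations(PARAGRAPH)
print(citations)
```

The `rid` value is what links the inline citation to a reference entry, so a tool can resolve or re-render the label without parsing the visible text.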

Citations

TAHI HTML

presentation-oriented and semantically annotated

<div data-ref-id="bib13" data-ref-type="journal-article">
  <div class="article-title">Characterization and development of courtship in zebrafish, <em>Danio rerio</em></div>
  <div class="contributors">
    <span data-type="given-names">KO</span> <span data-type="surname">Darrow</span>,
    <span data-type="given-names">WA</span> <span data-type="surname">Harris</span>
  </div>
  <div>
    <span data-type="source">Zebrafish</span>, <span data-type="volume">1</span>: <span data-type="fpage">40</span>-<span data-type="lpage">45</span>, <span data-type="year">2004</span>
  </div>
  <div>
    <span data-type="doi">http://dx.doi.org/10.1089/154585404774101662</span>
  </div>
</div>

Custom HTML (structured data, presentation-agnostic)

<div data-ref-id="bib13" data-ref-type="journal-article">
  <div class="article-title">Characterization and development of courtship in zebrafish, <em>Danio rerio</em></div>
  <div class="contributors">
    <span data-type="author">
      <span data-type="surname">Darrow</span>
      <span data-type="given-names">KO</span>
    </span>
    <span data-type="author">
      <span data-type="surname">Harris</span>
      <span data-type="given-names">WA</span>
    </span>
  </div>
  <span data-type="year">2004</span>
  <span data-type="source">Zebrafish</span>
  <span data-type="volume">1</span>
  <span data-type="fpage">40</span>
  <span data-type="lpage">45</span>
  <span data-type="doi">10.1089/154585404774101662</span>
</div>

XML

structured data

<ref id="bib13" ref-type="journal-article">
  <contributors>
    <author>
      <surname>Darrow</surname>
      <given-names>KO</given-names>
    </author>
    <author>
      <surname>Harris</surname>
      <given-names>WA</given-names>
    </author>
  </contributors>
  <year>2004</year>
  <article-title>Characterization and development of courtship in zebrafish, <em>Danio rerio</em></article-title>
  <source>Zebrafish</source>
  <volume>1</volume>
  <fpage>40</fpage>
  <lpage>45</lpage>
  <doi>10.1089/154585404774101662</doi>
</ref>
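
As a rough sketch of how the structured XML reference could be turned into the presentation-oriented HTML shown earlier, the following uses only Python's standard library. The function names (`span`, `inner_markup`, `ref_to_html`) are hypothetical, and the exact output layout is an assumption modelled on the TAHI HTML example, not a defined transformation.

```python
import xml.etree.ElementTree as ET
from html import escape

REF_XML = """<ref id="bib13" ref-type="journal-article">
  <contributors>
    <author><surname>Darrow</surname><given-names>KO</given-names></author>
    <author><surname>Harris</surname><given-names>WA</given-names></author>
  </contributors>
  <year>2004</year>
  <article-title>Characterization and development of courtship in zebrafish, <em>Danio rerio</em></article-title>
  <source>Zebrafish</source>
  <volume>1</volume>
  <fpage>40</fpage>
  <lpage>45</lpage>
  <doi>10.1089/154585404774101662</doi>
</ref>"""


def span(parent, field):
    """Wrap the text of a child element in a <span data-type="...">."""
    text = escape(parent.findtext(field, default=""))
    return f'<span data-type="{field}">{text}</span>'


def inner_markup(element):
    """Serialise an element's content, keeping inline tags such as <em>."""
    if element is None:
        return ""
    children = "".join(ET.tostring(child, encoding="unicode") for child in element)
    return escape(element.text or "") + children


def ref_to_html(ref_xml):
    ref = ET.fromstring(ref_xml)
    authors = ", ".join(
        f'{span(author, "given-names")} {span(author, "surname")}'
        for author in ref.iterfind("contributors/author")
    )
    doi = escape(ref.findtext("doi", default=""))
    pages = f'{span(ref, "fpage")}-{span(ref, "lpage")}'
    lines = [
        f'<div data-ref-id="{escape(ref.get("id", ""))}" '
        f'data-ref-type="{escape(ref.get("ref-type", ""))}">',
        f'  <div class="article-title">{inner_markup(ref.find("article-title"))}</div>',
        f'  <div class="contributors">{authors}</div>',
        f'  <div>{span(ref, "source")}, {span(ref, "volume")}: {pages}, {span(ref, "year")}</div>',
        f'  <div><span data-type="doi">http://dx.doi.org/{doi}</span></div>',
        "</div>",
    ]
    return "\n".join(lines)


ref_html = ref_to_html(REF_XML)
print(ref_html)
```

The ordering and punctuation live entirely in `ref_to_html`, which is the point of keeping the source structured: a different presentation only needs a different function, not different data.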

Findings

Requirements

  • journal-wide configuration for citation styles
  • references need to be stored in a structured, presentation-agnostic format
  • ability to display references in different citation styles (applying CSL)
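
The requirements above can be sketched with a toy example. This is not CSL: the style templates, the style names, and the journal identifiers (`journal-a`, `journal-b`) are all made up for illustration. It only shows the shape of the idea, one structured reference rendered differently depending on a journal-wide setting.

```python
# Structured, presentation-agnostic reference (data from the examples above).
REFERENCE = {
    "authors": [("Darrow", "KO"), ("Harris", "WA")],
    "year": "2004",
    "title": "Characterization and development of courtship in zebrafish, Danio rerio",
    "source": "Zebrafish",
    "volume": "1",
    "fpage": "40",
    "lpage": "45",
}

# Hypothetical style templates; a real system would apply CSL styles instead.
STYLES = {
    "author-year": "{authors} ({year}). {title}. {source} {volume}: {fpage}-{lpage}.",
    "year-last": "{authors}. {title}. {source}. {year};{volume}:{fpage}-{lpage}.",
}

# Hypothetical journal-wide configuration: one citation style per journal.
JOURNAL_CONFIG = {
    "journal-a": "author-year",
    "journal-b": "year-last",
}


def format_reference(reference, journal):
    """Render a reference using the style configured for the given journal."""
    template = STYLES[JOURNAL_CONFIG[journal]]
    authors = ", ".join(f"{surname} {given}" for surname, given in reference["authors"])
    fields = {key: value for key, value in reference.items() if key != "authors"}
    return template.format(authors=authors, **fields)


for journal in JOURNAL_CONFIG:
    print(journal, "->", format_reference(REFERENCE, journal))
```

Because the reference itself never changes, switching a journal to another style is a one-line configuration change.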

Realizations

  • a format is either renderable or processable (both at the same time is not possible without content redundancy)
  • a tool is always involved in getting from the data (figures and references in a database) to the presentation (HTML)

Contra HTML

  • there is no way around pre- and post-processing tools anyway
  • ownership of certain components (figures, references) cannot be in the HTML
    • a tool is always involved in getting from the data to the presentation
  • HTML does not have any advantages for data processing
  • HTML can never be guaranteed to be up to date because external data (figures) might have changed
  • HTML parsers silently swallow errors when the markup doesn't conform to HTML5
  • it seems we are trying to solve with HTML what XML already solves
  • HTML is a dead end (no toolchains available for data processing except our editor)
  • using HTML as a data format is misleading (it suggests a free-form format when we actually want a very restricted schema)
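
The point about silently swallowed errors can be demonstrated with a small sketch. It feeds the same broken fragment (a `div` closed with the wrong tag, chosen here as an illustration) to a strict XML parser and to Python's lenient `html.parser.HTMLParser`. The latter is only a tokenizer and stands in for forgiving HTML handling in general; it is not a full HTML5 parser.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# A fragment whose closing tag does not match its opening tag.
BROKEN = '<div data-ref-id="bib13"><span data-type="year">2004</span></ref>'
FIXED = '<div data-ref-id="bib13"><span data-type="year">2004</span></div>'


def xml_accepts(markup):
    """True if the strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


class EndTagCollector(HTMLParser):
    """Records end tags; the lenient parser never rejects the input."""

    def __init__(self):
        super().__init__()
        self.end_tags = []

    def handle_endtag(self, tag):
        self.end_tags.append(tag)


def html_end_tags(markup):
    collector = EndTagCollector()
    collector.feed(markup)
    collector.close()
    return collector.end_tags


print("XML accepts broken fragment:", xml_accepts(BROKEN))
print("HTML tokenizer saw end tags:", html_end_tags(BROKEN))
```

With XML the mistake surfaces at parse time, while the lenient path carries on and leaves the mismatch for a later tool (or a reader) to trip over.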

Suggestions

  • choose a data representation format that suits the main part of the toolchain (source information), not the output (presentation)
  • deliver an EPUB file that has source information (XML) + a readable HTML representation (generated HTML) + assets (images)
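
The second suggestion can be sketched as a ZIP-based bundle. This is a simplified illustration, not a valid EPUB: a real EPUB also needs `META-INF/container.xml` and a package document, which are omitted here. The directory names (`source/`, `html/`, `assets/`) and the function `build_bundle` are assumptions made for the example.

```python
import io
import zipfile


def build_bundle(source_xml, generated_html, assets):
    """Pack source XML, generated HTML and assets into one archive (bytes)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # EPUB containers expect the mimetype entry first and uncompressed.
        archive.writestr(
            zipfile.ZipInfo("mimetype"),
            "application/epub+zip",
            compress_type=zipfile.ZIP_STORED,
        )
        # Source of truth: the structured XML.
        archive.writestr("source/article.xml", source_xml,
                         compress_type=zipfile.ZIP_DEFLATED)
        # Readable representation, generated from the source.
        archive.writestr("html/article.html", generated_html,
                         compress_type=zipfile.ZIP_DEFLATED)
        # Binary assets such as figures.
        for name, data in assets.items():
            archive.writestr(f"assets/{name}", data,
                             compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()


bundle = build_bundle(
    source_xml="<article><head/><body/></article>",
    generated_html="<article><head></head><body></body></article>",
    assets={"figure1.png": b"placeholder bytes, not a real image"},
)
with zipfile.ZipFile(io.BytesIO(bundle)) as archive:
    print(archive.namelist())
```

Keeping the XML and the generated HTML side by side makes the redundancy explicit: the HTML can always be regenerated, while the XML remains the part tools operate on.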