D★Mark

A project by

D★Mark is a language for marking up prose. It facilitates writing semantically meaningful text, without limiting itself to the semantics provided by HTML or Markdown. If you’re a technical writer looking for a flexible markup language, D★Mark might be a good fit.

Here’s an example of D★Mark:

#para This is a paragraph; an element in block form containing some text.

#note[only=web] This is a note that will %em{only} show up on web.

For development details on D★Mark, see its GitHub repository. Please open an issue for any problems that you find.

Cheat sheet

This cheat sheet covers the common uses of D★Mark. For more details on the syntax, see the Syntax section.

An element is marked up in block form with #, and in inline form with %:

#para It said %quote{Destroy all humans!}, I believe.

An element in block form can contain elements and/or text by indenting them with two spaces:

#section
  #header Example

  #listing
    content = File.read(ARGV[0])
    nodes = DMark::Parser.new(content).run

Elements, both in block and inline form, can have attributes inside square brackets:

#listing[lang=shell]
  $ ls -l

Use cases

D★Mark is particularly well-suited for some use cases that don’t work well in other markup languages, as they lack the flexibility to express certain ideas.

First term

On the Nanoc web site, the first occurrence of a term is marked up using the firstterm element. For example, the first time the term “identifier” is used, it is marked up as %firstterm{identifier}.

When translated into HTML, this element is converted into a span with the class firstterm: <span class="firstterm">identifier</span>. The CSS for the firstterm class ensures that it is printed in italics.

Additionally, a term that is marked up as firstterm will end up in the index at the back of the book that is generated from the Nanoc documentation.

Admonitions

Admonitions, such as notes, tips, warnings and hints, can be expressed as elements in D★Mark. For example, the Nanoc web site contains the following caution admonition:

#caution This will remove all files and directories that do not correspond to Nanoc items from the output directory.

The stylesheet renders this admonition with a red background, and a warning icon, to attract attention. The D★Mark documentation, which you are looking at now, contains note admonitions. For example:

This is an example note.

Cross-references

One way of marking up a hyperlink in D★Mark is to use an a element. For example, the following code snippet represents a hyperlink to the Nanoc web site:

#p I love the design of the %a[href=http://nanoc.ws/]{Nanoc web site}.

Because D★Mark itself does not prescribe any vocabulary, there is no single right way to mark up hyperlinks. For example, this document uses a link element with a target attribute for hyperlinks, rather than a more traditional a element.

The Nanoc documentation, however, does not use hyperlinks to link to other pages. While hyperlinks work well on the web, they are more cumbersome to use in print. Because a (distant) goal of the Nanoc documentation is to be readily convertible into a print book, it uses cross-references instead.

A reference is marked up using a ref element, and points to a chapter or section. For example, the following paragraph contains a reference to the Patterns chapter:

#p For more information on patterns, see %ref[chapter=/doc/patterns.*]{}.

When generating a web version of a document that contains a reference, the reference will be translated into a hyperlink. The name of the chapter is filled in automatically. The above example could be rendered as follows:

For more information on patterns, see the Patterns page.

In print, however, the reference is translated into the name of the chapter, along with the page number. Additionally, rather than referring to the Patterns page, it refers to the Patterns chapter, in order to prevent confusion between web pages and print pages. For example:

For more information on patterns, see the Patterns chapter on page 87.

In addition to chapter references, the Nanoc web site also supports references to sections and subsections.
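The difference between the web and print renderings of a reference could be sketched in Ruby as follows. This is only an illustration, not the code used by the Nanoc web site: the Chapter struct, the render_ref and escape_html functions, and the /doc/patterns/ URL are all assumptions made for the example.

```ruby
# Hypothetical data about the chapter that a reference points to.
Chapter = Struct.new(:title, :url, :page, keyword_init: true)

# Minimal HTML escaping for the web rendering.
def escape_html(text)
  text.gsub(/[&<>"]/, '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;')
end

# Renders a reference as a hyperlink on the web, and as a chapter name
# with a page number in print.
def render_ref(chapter, target)
  case target
  when :web
    %(the <a href="#{escape_html(chapter.url)}">#{escape_html(chapter.title)}</a> page)
  when :print
    "the #{chapter.title} chapter on page #{chapter.page}"
  else
    raise ArgumentError, "unknown target: #{target.inspect}"
  end
end

patterns = Chapter.new(title: 'Patterns', url: '/doc/patterns/', page: 87)
puts "For more information on patterns, see #{render_ref(patterns, :web)}."
puts "For more information on patterns, see #{render_ref(patterns, :print)}."
```

Keeping the target as an explicit argument means the same D★Mark source can feed both outputs without any change to the markup.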

Goals

Be extensible
Define only the syntax of the markup language, and don’t bother with semantics. Do not define a vocabulary.
Be simple
Be easy to write, easy to read, and easy to parse. Be unambiguous. Be easy to syntax highlight.
Be compact
Introduce as little extra syntax as possible.

Syntax

D★Mark knows two constructs: elements and text. An element has a name, attributes, and wraps elements and/or text in order to give them meaning. Text is just that—text.

An element in D★Mark can take two forms: block-level, and inline.

Block form

An element in block form consists of the # symbol, the name of the element, optionally attributes enclosed in square brackets, a space character, and finally the content. For example:

#para This is a paragraph; an element in block form containing some text.

#note[only=web] This is an example “note” element with an “only” attribute.

Inline form

Inside an element, text can be marked up using elements in inline form. An element in inline form consists of the % symbol, the name of the element, optionally attributes enclosed in square brackets, and finally the content within braces. For example:

#para I am a paragraph with an %em{amazing} inline element.

An element name starts with a letter (lowercase or uppercase), followed by zero or more letters, digits, dashes, or underscores. For instance, em, h2, section-header, SectionHeader and section_header are valid element names, while _section, 2 and hello/world are not.
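The naming rule above can be expressed as a regular expression. The following is a minimal sketch in Ruby; the ELEMENT_NAME constant and the valid_element_name? function are hypothetical names and are not part of the D★Mark API.

```ruby
# A letter, followed by zero or more letters, digits, dashes, or underscores.
# Hypothetical helper for illustration; the D★Mark parser may check names differently.
ELEMENT_NAME = /\A[A-Za-z][A-Za-z0-9_-]*\z/

def valid_element_name?(name)
  ELEMENT_NAME.match?(name)
end

%w[em h2 section-header SectionHeader section_header _section 2 hello/world].each do |name|
  puts "#{name}: #{valid_element_name?(name) ? 'valid' : 'invalid'}"
end
```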

At the top level, a D★Mark document consists only of elements in block form.

Elements in block form can be nested. To do so, indent the nested block two spaces deeper than the enclosing block. For example, the following defines a list element with three item elements inside it:

#list[unordered]
  #item glob patterns
  #item regular expression patterns
  #item legacy patterns

An element in block form can also include text on indented lines following the element. In this case, the content is not wrapped inside a nested block-level element. This is particularly useful for source code listings. For example:

#listing[lang=ruby]
  identifier = Nanoc::Identifier.new('/about.md')

  identifier.without_ext
  # => "/about"

  identifier.ext
  # => "md"

An element in block form can always be expressed in inline form and vice versa, with the exception of a top-level element, which always needs to be in block form.

Attributes

Both block and inline elements can also have attributes. Attributes are enclosed in square brackets after the element name, as a comma-separated list of key-value pairs separated by an equal sign. The value part, along with the equal sign, can be omitted, in which case the value will be equal to the key name.

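For example, #listing[lang=ruby] has a lang attribute with the value ruby, while #list[unordered] has an unordered attribute whose value, because it is omitted, is also unordered.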

An attribute key starts with a letter (lowercase or uppercase), followed by zero or more letters, digits, dashes, or underscores. For instance, lang, only-for, Audience and data_type are valid attribute keys, while -except and hello/world are not.
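To make the attribute rules concrete, here is a minimal sketch in Ruby of how the text between the square brackets could be turned into a hash. It is not the parser that ships with D★Mark: the parse_attributes function is hypothetical, it does not validate keys, and its treatment of whitespace and of %-escaped characters is an assumption.

```ruby
# Hypothetical sketch: parses "key=value,key" into a hash of strings.
def parse_attributes(source)
  attributes = {}
  key = +''
  value = nil # stays nil until an equal sign is seen for the current pair
  chars = source.chars
  i = 0

  while i < chars.length
    char = chars[i]
    if char == '%' && i + 1 < chars.length
      # Escaped character: take the next character literally.
      i += 1
      (value || key) << chars[i]
    elsif char == ','
      # End of a pair; an omitted value defaults to the key name.
      attributes[key] = value || key
      key = +''
      value = nil
    elsif char == '=' && value.nil?
      value = +''
    else
      (value || key) << char
    end
    i += 1
  end

  attributes[key] = value || key unless key.empty?
  attributes
end

parse_attributes('lang=ruby,only')
# maps "lang" to "ruby", and "only" to "only"
```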

Escaping

Characters that have a special meaning in D★Mark, such as %, #, and }, need to be escaped when they are meant literally. To escape a character, prefix it with %.

The following is an example of escaping inline content:

#p To escape a character, prefix it with %code{%%}.

The following is a listing element containing escaped D★Mark:

#listing
  %#para This is a paragraph element in block form.

Here’s an example of escaped characters in an attribute value:

#para[kind=joke%, ha ha] They say 20%% of all statistics are made up.
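Reading escaped text back is the reverse operation: drop each escaping % and keep the character that follows it. The following Ruby sketch illustrates the idea; the unescape function is a hypothetical helper and does not reflect how the D★Mark parser is implemented.

```ruby
# Hypothetical helper: removes one level of %-escaping from a string by
# replacing every "%" plus the following character with that character alone.
def unescape(text)
  text.gsub(/%(.)/m, '\1')
end

puts unescape('They say 20%% of all statistics are made up.')
puts unescape('kind=joke%, ha ha')
puts unescape('%#para This is a paragraph element in block form.')
```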

Comparison with other languages

D★Mark takes inspiration from a variety of other languages.

HTML

HTML is syntactically unambiguous, but comparatively more verbose than other languages. It also prescribes only a small set of elements, which makes it awkward to use for prose that requires more thorough markup. It is possible to use span or div elements with custom classes, but this approach turns an already verbose language into something even more verbose.

<p>A glob pattern that matches every item is <span class="pattern attr-kind-glob">/**/*</span>.</p>
#para A glob pattern that matches every item is %pattern[glob]{/**/*}.

XML

XML is similar to HTML, with the major difference that XML does not prescribe a set of elements.

<para>A glob pattern that matches every item is <pattern kind="glob">/**/*</pattern>.</para>
#para A glob pattern that matches every item is %pattern[glob]{/**/*}.

Markdown

Markdown has a compact syntax, but is complex and ambiguous, as evidenced by the many different mutually incompatible implementations. It prescribes a small set of elements (smaller even than HTML). It supports embedding raw HTML, which in theory makes it possible to combine the best of both worlds, but in practice leads to markup that is harder to read than either Markdown or HTML separately, and occasionally trips up the parser and syntax highlighter.

A glob pattern that matches every item is <span class="pattern attr-kind-glob">/**/*</span>.
#para A glob pattern that matches every item is %pattern[glob]{/**/*}.

AsciiDoc

AsciiDoc and its Asciidoctor variant are syntactically unambiguous but complex languages. They prescribe a comparatively large set of elements, which translates well to DocBook and HTML. They do not support custom markup or embedding raw HTML, which makes them harder to use for prose that requires more complex markup.

There is no AsciiDoc example, as this example cannot be represented with AsciiDoc.

TeX, LaTeX

TeX is a Turing-complete programming language intended for typesetting, rather than a markup language. This makes it impractical as a source format for conversion to other formats. Its syntax is simple and compact, and served as an inspiration for D★Mark.

A glob pattern that matches every item is \pattern[glob]{/**/*}.
#para A glob pattern that matches every item is %pattern[glob]{/**/*}.

JSON, YAML

JSON and YAML are data interchange formats rather than markup languages, and thus are not well-suited for marking up prose.

[
  "A glob pattern that matches every item is ",
  ["pattern", {"kind": "glob"}, ["/**/*"]],
  "."
]
#para A glob pattern that matches every item is %pattern[glob]{/**/*}.

Samples

The samples/ directory contains some sample D★Mark files. Each can be processed by running the script with the same base name. For example:

% bundle exec ruby samples/trivial.rb
<p>I’m a <em>trivial</em> example!</p>

Programmatic usage

Handling a D★Mark file consists of two stages: parsing and translating.

The parsing stage converts text into a list of nodes. Construct a parser with the text of the document as input, and call #run to get the list of nodes.

content = File.read(ARGV[0])
nodes = DMark::Parser.new(content).run

Translating means converting the list of nodes into something else. For example, the translation step could translate each element into HTML or LaTeX.

D★Mark does not come with any translators. It does, however, provide a class named DMark::Translator, which is intended as the base class for translators.

For example, the following translator will convert the tree into XML:

class MyXMLLikeTranslator < DMark::Translator
  def handle_string(string, _context)
    [escape(string)]
  end

  def handle_element(element, context)
    # Wrap the translated children in tags named after the element.
    [
      "<#{element.name}>",
      handle_children(element, context),
      "</#{element.name}>",
    ]
  end

  def escape(string)
    string.gsub('&', '&amp;').gsub('<', '&lt;')
  end
end

result = MyXMLLikeTranslator.translate(nodes)
puts result

To create a translator, create a subclass of DMark::Translator and implement #handle_string and #handle_element. Both should return an (optionally nested) array of strings, which is joined into a single string after processing.

#handle_string(string, context)

This function translates strings. The string argument is the string to convert. Typically, this returns an array with the escaped string, e.g. [escape(string)], where the #escape function performs escaping (such as replacing & and < with &amp; and &lt; in HTML and XML).

The context argument is a hash that is passed down from each parent to its children. It can be used to change translation logic depending on context. By default, it is an empty hash.

#handle_element(element, context)

This function translates elements. The element argument is the element to convert.

The way an element is translated often depends on the element name, element.name (a string), and might depend on the element’s attributes, element.attributes (a hash).

When handling an element, make sure to handle all its child elements. The built-in #handle_children function can be used for this, and is typically called like handle_children(element, context). Handling child elements does not happen automatically, in order to provide the possibility of conditional output.

As with #handle_string, the context argument is a hash that is passed down from each parent to its children.
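The conditional output mentioned above could look like the following sketch, which drops elements whose only attribute does not match the current target. To keep the example runnable without the dmark gem, it does not subclass DMark::Translator: the Element struct and the ConditionalSketch class are stand-ins that merely mimic the #handle_string, #handle_element and #handle_children shape, and the :target context key is an assumption.

```ruby
# Stand-in for parsed element nodes (an assumption, not the DMark node class).
Element = Struct.new(:name, :attributes, :children)

# Hypothetical translator with the same method shape as described above.
class ConditionalSketch
  def translate(nodes, context = {})
    nodes.map { |node| handle(node, context) }.flatten.join
  end

  def handle(node, context)
    node.is_a?(String) ? handle_string(node, context) : handle_element(node, context)
  end

  def handle_children(element, context)
    element.children.map { |child| handle(child, context) }
  end

  def handle_string(string, _context)
    [string.gsub('&', '&amp;').gsub('<', '&lt;')]
  end

  def handle_element(element, context)
    # Conditional output: skip the element, and therefore its children,
    # when it is restricted to a different target.
    only = element.attributes['only']
    return [] if only && only != context[:target]

    ["<#{element.name}>", handle_children(element, context), "</#{element.name}>"]
  end
end

nodes = [
  Element.new('para', {}, ['Shown everywhere.']),
  Element.new('note', { 'only' => 'web' }, ['Shown on the web only.'])
]
puts ConditionalSketch.new.translate(nodes, { target: 'print' })
puts ConditionalSketch.new.translate(nodes, { target: 'web' })
```

Because the children are only handled when #handle_children is called, returning early is enough to leave a whole subtree out of the output.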

Tips and tricks

The context argument of #handle_element is useful in cases where the resulting output depends on the nesting level. For example, this page uses nested section elements that start with an h (header) element, which is translated to one of the HTML header elements (such as h1) depending on the number of section ancestors:

def handle_element(element, context)
  case element.name
  when 'h'
    # The header level follows the number of enclosing section elements.
    depth = context.fetch(:depth, 1)
    [
      "<h#{depth}>",
      handle_children(element, context),
      "</h#{depth}>",
    ]
  when 'section'
    depth = context.fetch(:depth, 1)
    [
      '<section>',
      # Children of a section are one level deeper.
      handle_children(element, context.merge(depth: depth + 1)),
      '</section>',
    ]
  # … handle other elements here …
  end
end

It can be useful to do some further processing on child nodes before returning them. To get a string containing translated child nodes’ content, call #translate, passing in the element’s children, along with the context. Here is an example of this function being used to syntax-highlight source code listings:

def handle_element(element, context)
  case element.name
  when 'listing'
    [
      '<pre><code>',
      syntax_highlight(element, context),
      '</code></pre>',
    ]
  # … handle other elements here …
  end
end

def syntax_highlight(element, context)
  content = translate(element.children, context)
  language = element.attributes['lang']

  # … implementation here …
end

The context argument can be used to change translation logic for an element based on its parent. For example, strings might be escaped by default, except when they’re inside a listing element, where the strings will be captured and passed into a syntax-highlighting function that expects non-escaped content.

The syntax-highlighting example given above can be modified as follows, for situations where #syntax_highlight expects unescaped content:

def handle_string(string, context)
  if context[:raw]
    [string]
  else
    [html_escape(string)]
  end
end

def handle_element(element, context)
  case element.name
  when 'listing'
    [
      '<pre><code>',
      syntax_highlight(element, context.merge(raw: true)),
      '</code></pre>',
    ]
  # … handle other elements here …
  end
end

Error handling

Parse errors are instances of DMark::Parser::ParserError, which implements #fancy_message. This method is similar to #message, but returns a multi-line string with additional diagnostic information that makes it easier to identify and fix errors. For example, the following D★Mark snippet is invalid:

#p Stuff

#p More stuff }

… and raises an error, whose #fancy_message returns a string with this content:

parse error at line 3, col 15: unexpected } -- try escaping it as "%}"

#p More stuff }
              ↑
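A message in this shape can be assembled from the error text, the source, and a 1-based line and column. The following Ruby sketch shows one way to do that; format_parse_error is a hypothetical helper written for illustration and is not how DMark::Parser::ParserError builds its message.

```ruby
# Hypothetical helper: builds a multi-line diagnostic with the offending
# source line and an arrow under the offending column (both 1-based).
def format_parse_error(message, source, line_nr, col_nr)
  line = source.lines[line_nr - 1].chomp
  [
    "parse error at line #{line_nr}, col #{col_nr}: #{message}",
    '',
    line,
    "#{' ' * (col_nr - 1)}↑"
  ].join("\n")
end

source = "#p Stuff\n\n#p More stuff }\n"
puts format_parse_error('unexpected } -- try escaping it as "%}"', source, 3, 15)
```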