Composing Office documents without Office

Buchachon Petthanya, 123RF

Simple Complexity

Pandoc lets you convert Markdown documents to DOCX and work conveniently with the editor of your choice. It is even possible to keep track of changes made by various authors.

Many people appreciate graphical word processors (think LibreOffice Writer or Microsoft Word) because they simplify the process of creating appealing texts. However, even when used efficiently, these programs can make entering formatted text complicated, since they tie styles closely and invisibly to the text. As a result, you often cannot see where one style ends and the next begins.

The free tool Pandoc offers a different approach by clearly separating form and content. The tools presented here also work with file types such as HTML, ODT, TeX/LaTeX, and PDF, as well as DOCX (the Office Open XML format that Microsoft Word has used by default since Word 2007).

Excellent

Tools like Pandoc and AsciiDoc differ from classical word processors in that formatting is written visibly into the text, with as few markup tags as possible. Highlights and headings are labeled with short strings (tags) in the text.

As the name "markup language" (ML) suggests, HTML belongs to this language family, as do LaTeX, RTF, and others. Simplified markup languages use fewer marks and symbols, so they adapt more smoothly to the flow of the text.

A good place to start is Markdown. Programs such as IPython Notebook/Jupyter, R/knitr, and others support it. Table 1 summarizes important commands; a comprehensive list is available online [1].

Table 1

Markdown

Markup                    Function
*<Word>*                  Italic
**<Word>**                Bold
# <Heading>               First-level heading
## <Heading>              Second-level heading
*                         Bullet for a bulleted list
1.                        Entry in a numbered list
`<Code>`                  Monospace (for code)
---                       Horizontal line
>                         Indented block of text
[<Link-Text>](<URL>)      Hyperlink
![<imagetext>](<file>)    Include an image
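
To try the markup from Table 1, it can help to have a small file to experiment with. The following is a minimal sketch that writes such a file from the shell; the file name sample.md, the wording, and the example.com link are arbitrary placeholders and not part of the original article. The image syntax is left out so that no image file is needed.

```shell
# Write a small sample file (sample.md is an arbitrary, hypothetical name).
# Quoting 'EOF' keeps the shell from expanding backticks or $ signs.
cat > sample.md <<'EOF'
# Project Notes

## Overview

This text contains *italic* and **bold** words as well as `inline code`.

* first bullet
* second bullet

1. first step
2. second step

---

> An indented block of text.

[Placeholder link](https://example.com)
EOF

# Print the file to confirm that the markup was written literally.
cat sample.md
```

The file can be opened in any text editor, and all formatting remains visible as plain characters.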

Pandoc

Pandoc has undergone significant development over the years. Note that users with 32-bit systems are limited to version 1.2, which will not cope well with all of the examples presented in this article.

The repositories for most distributions offer the program in a stable 1.16.0.2 version for 64-bit systems. In addition, the web page [2] links to the newest packages for Ubuntu and Debian. Even if you don't need the functions right away, it is a good idea to install the packages pandoc-citeproc and python-pandocfilters at the same time:

sudo apt update
sudo apt install pandoc python-pandocfilters pandoc-citeproc
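
Because the available features depend on the installed version, a quick check after installation can be useful. The sketch below assumes only that pandoc, if present, is on the PATH; the variable name pandoc_status is a hypothetical helper for this example.

```shell
# Report the installed version if pandoc is on the PATH; otherwise say so.
if command -v pandoc >/dev/null 2>&1; then
    pandoc_status="installed"
    # The first line of the version output names the release.
    pandoc --version | head -n 1
else
    pandoc_status="missing"
    echo "pandoc not found on PATH"
fi
```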

Pandoc expects text files in UTF-8 encoding, which is the default on Linux anyway. The following call converts a Markdown text into the MS Word format DOCX.

$ pandoc test.md -o test.docx
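
The same pattern should carry over to the other output formats mentioned earlier. The following sketch assumes a reasonably recent Pandoc and creates its own tiny test.md so that it is self-contained; the file contents are placeholders. The calls are skipped if Pandoc is not installed.

```shell
# Create a tiny input file so the example is self-contained.
printf '# Title\n\nSome *emphasized* text.\n' > test.md

if command -v pandoc >/dev/null 2>&1; then
    # Pandoc infers the output format from the file extension.
    pandoc test.md -o test.html
    pandoc test.md -o test.odt
    # The formats can also be named explicitly with -f (from) and -t (to).
    pandoc -f markdown -t docx test.md -o test.docx
else
    echo "pandoc not installed; skipping conversion"
fi
```

Naming the formats explicitly is mainly useful when the file extension does not identify the format.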

Originally, Markdown was only supposed to simplify the entry of HTML code. Even today, the original intent is still recognizable in the language. As with HTML, the software interprets spaces and line breaks as simple word separators. However, there are some exceptions: If a line begins with four spaces, the software interprets it as a block of code, set in monospace font with its line breaks and spacing preserved. If a line ends with at least two spaces, the software adds a line break in the output.

You introduce a heading with an empty line followed by one or more pound signs at the beginning of the next line; a double pound sign marks a second-level heading. You should enclose bulleted lists, indented text, and text blocks with empty lines. At least one space must follow the marker of a list entry.
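
Since trailing and leading spaces are invisible in most editors, the whitespace rules are easier to see when the spaces are written out explicitly. The sketch below is an illustration under that assumption; the file name spacing.md and the sample sentences are hypothetical.

```shell
# printf makes the otherwise invisible spaces explicit.
{
    printf 'These two lines\n'
    printf 'form one paragraph.\n'
    printf '\n'
    printf 'This line ends with two spaces  \n'   # requests a line break
    printf 'so this text starts a new line.\n'
    printf '\n'
    printf '    four leading spaces = code block\n'
} > spacing.md

# sed -n l marks every line end with $, which reveals trailing spaces.
sed -n l spacing.md

# Optional: render to HTML to see how the whitespace is interpreted.
if command -v pandoc >/dev/null 2>&1; then
    pandoc spacing.md -t html
fi
```

In the HTML output, the first two lines should appear as a single paragraph, while the line with two trailing spaces should be followed by a break and the indented line should appear as preformatted code.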
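
The layout rules for headings, lists, and indented text can be sketched in a short file in which every block element is set off by empty lines. The file name structure.md and its contents are placeholders chosen for this example.

```shell
# Each block element (heading, list, block quote) is surrounded by
# empty lines, and each list marker is followed by one space.
cat > structure.md <<'EOF'
Introductory paragraph.

## Second-level heading

Text before the list.

* first item
* second item

> Indented block, set off by empty lines.

Closing paragraph.
EOF

# Optional: render to HTML to check that every block is recognized.
if command -v pandoc >/dev/null 2>&1; then
    pandoc structure.md -t html
fi
```

If an element is not recognized in your own documents, a missing empty line before it is a likely cause worth checking first.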
