How To Create Documents with LaTeX
It’s time to write documents and, of course, me being me, I can’t just open up Word and start typing. I prefer to open up Notepad++ and start writing Markdown, then convert it to whatever my final format might be. I’ve written up directions on how to generate DOCX documents from Markdown using Pandoc, but the standard conversion from Markdown to DOCX just isn’t working for me. I need a fancier title page and a lot more control over the formatting than the standard conversion gives me, so I’m going to use an approach that I saw in a Stack Overflow answer: convert the Markdown to LaTeX, add in a LaTeX title page and formatting, then convert both of those into a DOCX via Pandoc. The LaTeX should give me the control over the final deliverable that I want, and hopefully the conversion from LaTeX to DOCX won’t be awful. If it is, well, we’ll just have to go straight to PDF.
The purpose of this post is to capture my notes and initial outputs for generating LaTeX documents before I get all specific and fancy.
Directory Structure
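The first thing I’m going to do is create a directory for this project with the following layout: a root directory holding a docx subdirectory and a res subdirectory, with tex and txt subdirectories inside res.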
The root directory contains subdirectories that will hold the final output. These directories are separated by output format, and currently only the docx directory is present. The res subdirectory contains all of the resources needed to generate the final output. Inside it, the tex subdirectory will contain any .tex files and the txt subdirectory will contain Markdown files that will be converted into .tex files. In the future, if necessary, there could be an img folder or a gv folder. For now, I’m going to generate a straight LaTeX document in the tex folder and then work on turning it into a title page that can be integrated into a .tex file generated by Pandoc from the text.
Basic LaTeX Document
Here is a basic LaTeX document:
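A minimal sketch of such a file, using only the five commands discussed in this section. The title and section text are placeholder assumptions rather than the exact content used for this post:

```latex
% Minimal report: placeholder title and body text (assumed for illustration)
\documentclass{report}   % document type; the (empty) preamble follows this line
\begin{document}         % start of the document content
\title{Basic Report}     % placeholder title
\section{Introduction}   % the section heading goes in the braces
This is the text of the first section.
\end{document}           % end of the document
```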
I’ve saved this as basicReport.tex in the tex subdirectory. Then I open a command prompt, navigate to the root directory, and type this command to generate the DOCX:
pandoc -f latex -t docx -o docx\basicReport.docx res\tex\basicReport.tex
And… SUCCESS! I have a very simple document. Let’s regroup for a second and discuss each of the commands.
\documentclass
This command defines what type of document we’re creating. Apparently most people go with article, but for engineering documents with lots of content I’m going to try report. The full list of available document classes can be found here.
\begin{document}
This command starts the content of the document. Everything between \documentclass and \begin{document} is called the preamble, and it contains document-wide settings. Currently we don’t have any such settings, but we will soon!
\title
Yeah… this is the title of the document. In the DOCX it gets put in big bold letters at the start of the document.
\section
\section denotes a… section (man this is redundant!). The title of the section goes in the curly braces and the text goes under.
\end{document}
The document is now over, done, finished.
Combining Markdown and LaTeX
The above is a nice document and all, and it demonstrates a basic ‘Hello World’ functionality for the LaTeX->DOCX workflow, but I need some specific things for this to be useful to me. The first thing I need to do is to figure out how to combine LaTeX files and Markdown files into one DOCX file. To do this, I’m going to generate a basic Markdown document and place it in the txt subdirectory, turn it into a LaTeX file, then pass those both to Pandoc to generate a DOCX.
Here’s the markdown file:
The first step is to convert this to a .tex file with Pandoc:
pandoc -f markdown -t latex -o res\tex\mdSection2.tex res\txt\section2.md
And this is the LaTeX fragment that is produced:
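As a rough sketch of what such a fragment looks like, assume a hypothetical Markdown file that holds only a level-one heading, Section 2, followed by one sentence. Pandoc emits a fragment by default, so there is no \documentclass or \begin{document}. The output is roughly of this shape, though the exact markup varies by Pandoc version (some versions also wrap the heading in \hypertarget):

```latex
% Approximate Pandoc output for the assumed Markdown input:
%   # Section 2
%   Text for the second section.
\section{Section 2}\label{section-2}

Text for the second section.
```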
Looks legit, so let’s use this command line to combine the two files:
pandoc -f latex -t docx -o docx\combinedFile.docx res\tex\basicReport.tex res\tex\mdSection2.tex
This produces a combined DOCX file. We’re doing great so far! (When are we going to utterly fail?)
It’s worth noting at this point that the basicReport.tex file contains all the markup necessary for a standalone document, but Pandoc can still combine it with another .tex fragment file into a usable DOCX file. I do not know whether Pandoc reads the document-wide settings contained in the basicReport.tex file and applies them to the whole document or discards them and uses its own. Let’s find out!
One of the most basic and obvious settings available through the \documentclass command is a document-wide font size. 12pt font is pretty standard, but if we want to make sure that the settings are getting to the final document we need to put something absurd in there so we can see the result.
You can modify the basicReport.tex file to change the font size. It looks like this:
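A sketch of one way to write that change. The 20pt value and the extreport class (from the extsizes package) are assumptions: the standard report class only accepts 10pt, 11pt and 12pt, so a size big enough to be obvious needs a class that supports it.

```latex
% Assumed variant: extreport (extsizes package) accepts 20pt,
% whereas the standard report class only knows 10pt, 11pt and 12pt.
\documentclass[20pt]{extreport}
\begin{document}
\title{Basic Report}     % placeholder title
\section{Introduction}
This text should be noticeably larger when the option is honored.
\end{document}
```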
After I save this as customReport.tex in the proper subfolder, I run this command to combine them:
pandoc -f latex -t docx -o docx\customFile.docx res\tex\customReport.tex res\tex\mdSection2.tex
Aaaaand… (this is the point I start to fail, though not necessarily utterly) It doesn’t work: the font size is 12pt in the DOCX. The whole point of this exercise is to figure out how to customize the DOCX output via LaTeX, so this is a problem that needs to be fixed NOW!
Customizing LaTeX Output
After a bit of research I’ve determined that what I want is to generate a custom LaTeX template that has all of the options that I want and then pass that to Pandoc to format my document properly.
I found a Stack Overflow answer that supposedly tells me how to do that. I’m going to document my attempt to follow its advice.
The first thing that I need to do is to generate the Pandoc LaTeX template so I can see what I’m dealing with. I’ve added a new template directory in the res directory to hold the template. You can generate the template like this:
pandoc -D latex > res\template\template.tex
I won’t paste the template here because it’s large, but looking at it tells me that there are lots of variables I can mess with to customize the output without necessarily changing the template. Let’s start with one setting and then work our way to the others.
The first thing that I want to change is the document class. The default document class for Pandoc LaTeX output is apparently article, whereas I want report. Let’s figure out how to change that.
I’ve found a self-answered question on a Stack Overflow site about how you set the various options available in the Pandoc LaTeX template. Basically, you add YAML front matter to the Markdown document to specify the settings. I’ve generated a new Markdown file called customLaTeX.md and given it the following content:
I know I only wanted to change the documentclass but the fontsize will be an obvious change in the document that I can see immediately in the DOCX output.
Below are the commands I use to generate the DOCX from the input markdown file.
The first command generates a standalone LaTeX file from the Markdown. This ensures that the front matter options are present in the LaTeX output:
pandoc -f markdown -t latex --standalone -o res\tex\mdSection2.tex res\txt\section2.md
This command line takes the standalone LaTeX and generates the DOCX.
pandoc -f latex -t docx -o docx\customReport.docx res\tex\mdSection2.tex
And the result of this is… failure (NOW I’m utterly failing)
Let’s look at some reasoning. I’ve found a Stack Overflow question asking why a LaTeX template can’t be used with DOCX output. Turns out, it’s not possible. DOCX support in Pandoc is basic, and apparently the only way to style the resulting DOCX is via the --reference-docx command line option (renamed --reference-doc in Pandoc 2.0).
That’s not too bad or difficult (and I’ve used the reference docx option before to okay effect) but I still need to incorporate a fancy title page with Markdown content. This should be as basic as centering and font size. Surely that works, yes?
Let’s try a simple LaTeX file:
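A minimal sketch of that kind of file, assuming placeholder title and author text. The center environment and the \Huge and \Large size commands are standard LaTeX:

```latex
\documentclass{report}
\begin{document}
\begin{center}
  {\Huge Project Title}\\[1em]   % placeholder title in the largest standard size
  {\Large Author Name}           % placeholder author line
\end{center}
\end{document}
```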
And then we’ll convert this to DOCX via Pandoc:
pandoc -f latex -t docx -o docx\titlePage.docx res\tex\titlePage.tex
Aaaaaaand….
Sigh. No luck: no formatting in the DOCX. At this point the best I can hope for is PDF output that Word can read and process properly (i.e., keeping formatting).
So, let’s figure out how to get PDF output!
Markdown/LaTeX to PDF Conversion via Pandoc
As a Windows user, the first thing I need to do is install a LaTeX distribution. I’ve chosen MiKTeX for no reason other than I’ve used it before. There are instructions on how to download and install on Windows and it’s a pretty basic process so I won’t include those here. (If I run into any difficulties I’ll document them)
First thing is to get ANY PDF output, so I’m just using one of the files I generated previously as the guinea pig for conversion. Here’s the command line:
pandoc -f markdown -t latex -o pdf\test.pdf res\txt\customLatex.md
Aaaaand (man this is a recurring theme here…) it takes FOREVER. Seriously, like a minute so far. Nuts. Will it ever end? With any luck this is a first-time thing and not going to happen all the time.
So I Ctrl+C’d that and tried again - turns out there were popups for updating packages that I wasn’t seeing. The packages that were installed are:
- upquote
- microtype
- url
And it keeps asking to install that. Repeatedly. Something’s wrong.
The log file shows this:
2018-01-10 12:42:44,881-0700 FATAL pdflatex - Windows API error 87: The parameter is incorrect.
What the heck is this?
This Stack Overflow answer says that there are multiple MiKTeX processes running. Sure enough, there are. Finishing off those processes fixes the issue and I get to install more packages.
And it keeps happening. Sometimes you just have to click Install again, sometimes you have to restart the entire task after it fails. But eventually, all the packages get installed and the PDF file gets created. Finally.
But does Word properly convert the PDF to something that I can edit?
No. The answer is no. And Pandoc can’t read PDFs and convert them to DOCX.
Sigh. What a waste of a morning. Can I live with PDF output? Or can I use ODT and then convert that to DOCX more easily?
What’s worse than this: Pandoc ALWAYS munges LaTeX when converting it to other formats. My font sizes and centering tags are GONE from the output no matter what it is. If you do a LaTeX to LaTeX conversion, Pandoc ‘helpfully’ completely obliterates your formatting.
So, I’m stuck with PDF output or nothing. Worse, if I use Pandoc to produce that output then I lose all of my precious formatting. Of course, I can just:
- Generate a LaTeX template which Pandoc can ingest to roughly generate the document I want
- Create a hard-coded LaTeX title page which will be hard-included by the template (so that Pandoc keeps its dirty mitts off of it)
- Convert the Markdown to PDF through Pandoc using the LaTeX template, passing a PDF file name as the output so that Pandoc invokes pdflatex itself. The resulting PDF should have all of the styles that I want, along with the custom title page.
The sad part is that this still leaves me with a PDF when I really want a DOCX.
Anyway, I figured out how to do the steps above. Here’s a rundown:
- Create res\tex\titlePage.tex and put something in it that you will be able to find. My content looked like this:
- Edit the template.tex file that you generated and stored in res\template\template.tex: find the \begin{document} command and add this right after it (the input directive is the added line):
- Run this command:
pandoc -o pdf\test.pdf --template=res\template\template.tex res\txt\section2.md
This produces a PDF including the title page formatting from titlePage.tex intact with the content from the Markdown file in the PDF. This is what I wanted (other than DOCX output… and a pony…).
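The first two steps above can be sketched as follows. The title page content is a placeholder assumption, and the forward-slash path in \input assumes Pandoc is run from the project root so that pdflatex can resolve it from there:

```latex
% --- res\tex\titlePage.tex (placeholder content, assumed) ---
% No \documentclass or \begin{document} here: the template supplies them.
\begin{titlepage}
  \centering
  {\Huge Project Title\par}
  \vspace{2cm}
  {\Large Author Name\par}
\end{titlepage}

% --- excerpt from res\template\template.tex, shown as comments ---
% \begin{document}                 <- already present in the template
% \input{res/tex/titlePage.tex}    <- the added line (path is an assumption)
```

Keeping the title page in its own file means Pandoc never parses it; pdflatex reads it directly when it processes the filled-in template.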
LaTeX Template Variables and Modifications
Okay, so the whole goal of this is to have professional-looking documents with a minimum of effort, right? Seems to me I have to start customizing the hell out of these templates and this process. I’m going to work one setting at a time until I have something I wouldn’t mind submitting for credit in front of an auditor. I’ve done it before, I’ll do it again. Except this time I’ll document it. I’m going to start with the front matter YAML settings that I can change that work with the existing template.
Here’s a list of settings that I’ve changed:
Setting | Description | Desired Value | Other settings |
---|---|---|---|
fontfamily | Picks the | ||
header-includes | | | |
I’ve generated a ‘raw’ list of variables that can be configured in the YAML front matter by exporting the Pandoc LaTeX template and grep’ing out all of the variables. Here’s the list:
YAML LaTeX Variable | Type | Description | Desired Value |
---|---|---|---|
abstract | Text | Document abstract | |
author | Text | Author name | |
author-meta | Text | Hyperlink author meta-information | |
biblatex | Yes/No | Include biblatex and produce bibliography | |
biblio-files | List | List of files included for bibliography | |
biblio-style | Text | Content of \bibliographystyle command | |
biblio-title | Text | Bibliography title | |
body | Text | Document body | |
book-class | Yes/No | Document is a book-class | |
citecolor | Text | ||
classoption | List | List of options for the documentclass | |
date | Date | Content of the \date command | |
documentclass | |||
euro | |||
fontfamily | |||
fontsize | |||
geometry | |||
graphics | |||
header-includes | |||
highlighting-macros | |||
include-after | |||
include-before | |||
lang | |||
lhs | |||
linestretch | |||
linkcolor | |||
links-as-notes | |||
listings | |||
lof | |||
lot | |||
mainfont | |||
mainlang | |||
mathfont | |||
monofont | |||
natbib | |||
numbersections | |||
papersize | |||
sansfont | |||
sep | |||
strikeout | |||
subtitle | |||
tables | |||
title | |||
title-meta | |||
toc | |||
toc-depth | |||
urlcolor | |||
verbatim-in-note |
Generating Figure References
This code example creates a figure anchor and reference in the final PDF document:
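A minimal sketch, written as a standalone LaTeX file with a placeholder box instead of a real image. The label name fig:example and the caption text are assumptions:

```latex
\documentclass{report}
\begin{document}
\begin{figure}[htbp]
  \centering
  \fbox{Placeholder for the figure content}
  \caption{An example figure}
  \label{fig:example}   % anchor: must come after \caption
\end{figure}

See Figure~\ref{fig:example}.   % reference: resolves to the figure number
\end{document}
```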
Execute the Pandoc command above and the resulting PDF will have a figure reference after the ‘See’.
Generating Formatted Code Snippets
The list of languages that can be highlighted with Pandoc can be found by executing this command-line: