Authoring scientific documents with Markdown and Quarto
This workshop will show you how to easily create beautiful scientific documents (HTML, PDF, websites, books…), complete with formatted text, dynamic code, and figures, using Quarto: an open-source tool that combines the power of Jupyter or knitr with Pandoc to turn your text and code blocks into fully dynamic and formatted documents.
Markup and Markdown
Markup languages
Markup languages control the formatting of text documents. They are powerful but complex and the raw text (before it is rendered into its formatted version) is visually cluttered and hard to read.
Examples of markup languages include LaTeX and HTML.
- TeX (often with the macro package LaTeX) is used to create PDFs.
Example LaTeX:
\documentclass{article}
\title{My title}
\author{My name}
\usepackage{datetime}
\newdate{date}{24}{11}{2022}
\date{\displaydate{date}}
\begin{document}
\maketitle
\section{First section}
Some text in the first section.
\end{document}
- HTML (often with CSS or SCSS files to customize the format) is used to create webpages.
Example HTML:
<!DOCTYPE html>
<html lang="en-US">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width" />
<title>My title</title>
</head>
<body>
<address class="author">My name</address>
<time datetime="2022-11-24">2022-11-24</time>
<h1>First section</h1>
<p>Some text in the first section.</p>
</body>
</html>
Markdown
A number of minimalist markup languages intend to remove all the visual clutter and complexity to create raw texts that are readable prior to rendering. Markdown (note the pun with “markup”), created in 2004, is the most popular of them. Due to its simplicity, it has become quasi-ubiquitous. Many implementations exist which add a varying number of features (as you can imagine, a very simple markup language is also fairly limited).
Markdown files are simply text files and they use the .md extension.
Basic Markdown syntax
In its basic form, Markdown is mostly used to create webpages. Conveniently, raw HTML can be included whenever the limited Markdown syntax isn’t sufficient.
Here is an overview of the Markdown syntax supported by many applications.
Pandoc and its extended Markdown syntax
While the basic syntax is good enough for HTML outputs, it is very limited for other formats.
Pandoc is a free and open-source markup format converter. Pandoc supports an extended Markdown syntax with functionality for figures, tables, callout blocks, LaTeX mathematical equations, citations, and YAML metadata blocks. In short, everything needed for the creation of scientific documents.
Such documents remain as readable as basic Markdown documents (thus respecting the Markdown philosophy), but they can now be rendered as sophisticated PDFs, books, entire websites, Word documents, etc.
And of course, as such documents remain text files, you can put them under version control with Git.
Previous example using Pandoc’s Markdown:
---
title: My title
author: My name
date: 2022-11-24
---
# First section
Some text in the first section.
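To make the structure of such a file concrete, here is a minimal Python sketch that separates the YAML metadata block from the Markdown body. The function name `split_front_matter` is made up for this illustration, and it only handles flat `key: value` pairs: it is not how Pandoc itself parses documents.

```python
def split_front_matter(text):
    """Split a document into (metadata dict, body string).

    Only flat `key: value` pairs are handled; nested YAML needs a real parser.
    """
    lines = text.splitlines()
    # The metadata block must start on the very first line
    if not lines or lines[0].strip() != "---":
        return {}, text
    try:
        end = lines.index("---", 1)  # position of the closing delimiter
    except ValueError:
        return {}, text
    meta = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:])
    return meta, body


doc = """---
title: My title
author: My name
date: 2022-11-24
---
# First section
Some text in the first section.
"""

meta, body = split_front_matter(doc)
print(meta["title"])         # the title taken from the metadata block
print(body.splitlines()[0])  # the first line of the Markdown body
```

Everything between the two `---` lines is metadata; everything after is regular Markdown.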
Literate programming
Literate programming is a methodology that combines snippets of code and written text. While first introduced in 1984, this approach to the creation of documents has truly exploded in popularity in recent years thanks to the development of new tools such as R Markdown and, later, Jupyter notebooks.
Quarto
How it works
Quarto files are transformed into Pandoc’s extended Markdown by Jupyter (when used with Python or Julia) or by knitr (when used with R), then Pandoc turns the Markdown document into the output of your choice.
Julia and Python make use of the Jupyter engine:
R uses the knitr engine:
Quarto files use the extension .qmd.
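The two-stage process described in this section can be summarized with a small Python sketch. The names `ENGINE_FOR_LANGUAGE` and `describe_pipeline` are purely illustrative and are not part of Quarto: the sketch only restates which engine handles which language and in what order the steps happen.

```python
# Illustrative only: these names are not part of Quarto's API
ENGINE_FOR_LANGUAGE = {"python": "jupyter", "julia": "jupyter", "r": "knitr"}


def describe_pipeline(language, output_format):
    """Return the two rendering steps for a given language and output format."""
    engine = ENGINE_FOR_LANGUAGE[language.lower()]
    return [
        f"{engine}: run the code in the .qmd file and produce Pandoc Markdown",
        f"pandoc: convert the Markdown to {output_format}",
    ]


for step in describe_pipeline("R", "html"):
    print(step)
```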
When using R, you can use Quarto directly from RStudio: if you are used to R Markdown, Quarto is the new and better R Markdown.
When using Python or Julia, you can use Quarto directly from a Jupyter notebook (with .ipynb extension).
Using Quarto directly from a Jupyter notebook:
In this workshop, we will see the most general workflow: simply using a text editor.
Installation
Download Quarto here.
Download the language(s) (R, Python, or Julia) you will want to use with Quarto as well as their corresponding engine (knitr for R; Jupyter for Python and Julia):
If you want to use Quarto with R, you will need:
- R (download here if you don’t have R already on your system),
- the rmarkdown package. For this, launch R and run:
install.packages("rmarkdown")
If you want to use it with Python, you will need:
- Python 3 (download here if you don’t have it on your system),
- JupyterLab. For this, open a terminal and run:
python3 -m pip install jupyter # if you are on macOS or Linux
python -m pip install jupyter # if you are on Windows
Finally, if you want to use Quarto with Julia, you will need:
- Julia (download here if you don’t have Julia),
- the IJulia and Revise packages. For this, launch Julia and run:
] add IJulia Revise
# press <Backspace> to leave the package mode
using IJulia
notebook() # to install a minimal Python+Jupyter distribution
Running notebook() allows you to install Jupyter if you don’t already have it.
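Once everything is installed, a quick way to see which tools are reachable from your terminal is to look them up on your PATH. The following Python sketch does this with the standard library; the helper name `find_tools` is made up for the example, and the executable names are assumptions that may differ on your system (on Windows, for instance, Python is usually `python` rather than `python3`).

```python
import shutil


def find_tools(names):
    """Map each executable name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


# Executable names are assumptions; adjust them to your system
status = find_tools(["quarto", "R", "python3", "jupyter", "julia"])
for name, path in status.items():
    print(f"{name}: {path or 'not found'}")
```

You only need the tools for the language(s) you plan to use, so a few "not found" entries are expected.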
Document structure and syntax
Front matter
The front matter is written in YAML and sets the options for the document. Let’s see a few examples.
HTML output:
---
title: "My title"
author: "My name"
format: html
---
HTML output with a few options:
---
title: "My title"
author: "My name"
format:
  html:
    toc: true
    css: <my_file>.css
---
MS Word output with Python code blocks:
---
title: "My title"
author: "My name"
format: docx
jupyter: python3
---
revealjs output with some options and Julia code blocks:
---
title: "Some title"
subtitle: "Some subtitle"
institute: "Simon Fraser University"
date: "2022-11-24"
execute:
  error: true
  echo: true
format:
  revealjs:
    theme: [default, custom.scss]
    highlight-style: monokai
    code-line-numbers: false
    embed-resources: true
jupyter: julia-1.8
---
See the Quarto documentation for an exhaustive list of options for all formats.
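In YAML, nesting is expressed purely through indentation: options for a format are indented under that format, which is itself indented under `format`. The Python sketch below makes this mapping explicit by turning a nested dictionary into a front matter block. The function `to_yaml` is a toy written for this illustration (it does not quote strings or handle lists), and `styles.css` is a placeholder file name.

```python
def to_yaml(options, indent=0):
    """Turn a nested dict into a list of YAML lines (toy version: no quoting, no lists)."""
    lines = []
    pad = "  " * indent  # two spaces per nesting level
    for key, value in options.items():
        if isinstance(value, dict):
            lines.append(f"{pad}{key}:")
            lines.extend(to_yaml(value, indent + 1))
        elif isinstance(value, bool):
            # YAML booleans are lowercase
            lines.append(f"{pad}{key}: {'true' if value else 'false'}")
        else:
            lines.append(f"{pad}{key}: {value}")
    return lines


options = {
    "title": "My title",
    "format": {"html": {"toc": True, "css": "styles.css"}},  # placeholder file name
}
front_matter = "\n".join(["---", *to_yaml(options), "---"])
print(front_matter)
```

For real documents, a YAML library is a safer choice than hand-written output like this.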
Written sections
Written sections are written in Pandoc’s extended Markdown.
Code blocks
If all you want is syntax highlighting of the code blocks, use this syntax:
```{.language}
<some code>
```
If you want syntax highlighting of the blocks and for the code to run, use instead:
```{language}
<some code>
```
In addition, options can be added to individual code blocks:
```{language}
#| <some option>: <some option value>
<some code>
```
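To show how the `#|` lines relate to the rest of a block, here is a small Python sketch that separates the leading option lines from the code that follows. It is only an illustration of the layout, not Quarto's actual parser; `split_cell_options` is a made-up name, and `echo` and `fig-cap` are used as example options.

```python
def split_cell_options(cell_source):
    """Separate leading `#|` option lines from the code that follows them."""
    options = {}
    lines = cell_source.splitlines()
    i = 0
    # Options must come first: stop at the first line that is not a `#|` line
    while i < len(lines) and lines[i].startswith("#|"):
        key, _, value = lines[i][2:].partition(":")
        options[key.strip()] = value.strip()
        i += 1
    return options, "\n".join(lines[i:])


cell = """#| echo: false
#| fig-cap: "A simple plot"
print(1 + 1)
"""

options, code = split_cell_options(cell)
print(options)
print(code)
```

Because `#` starts a comment in Python, R, and Julia, the option lines are harmless if the code is run outside Quarto.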
Rendering
Using Quarto is very simple: there are only two commands you need to know.
In a terminal, simply run either of:
quarto render <file>.qmd # Render the document
quarto preview <file>.qmd # Display a live preview
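If you want to render documents from a script rather than by hand, the same command can be launched from Python. This is a minimal sketch: `render_command` and `render` are names invented for the example, and the optional `--to` flag selects the output format on the command line.

```python
import shutil
import subprocess


def render_command(path, to=None):
    """Build the argument list for `quarto render`."""
    cmd = ["quarto", "render", path]
    if to is not None:
        cmd += ["--to", to]  # choose the output format explicitly
    return cmd


def render(path, to=None):
    """Run `quarto render`, failing early if Quarto is not installed."""
    if shutil.which("quarto") is None:
        raise RuntimeError("quarto was not found on PATH")
    subprocess.run(render_command(path, to), check=True)


# Only build and show the command here; call render(...) to actually run it
print(render_command("test.qmd", to="html"))
```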
Let’s create a webpage together
First, create a file called test.qmd with the text editor of your choice.
Example:
nano test.qmd
Add a minimal front matter with the title of your document and the output format (html here since we are creating a webpage):
---
title: Test webpage
format: html
---
Then open a new terminal, cd to the location of the file, and run the command:
quarto preview test.qmd
This will open the rendered document in your browser.
We will play with this test.qmd file and see how it is rendered by Quarto as we go.
Examples
Below are a few basic example files and their outputs.
Revealjs presentation
Rendered document (click on it to open it in a new tab):
In order to export to PDF, you need a TeX distribution. You probably already have one installed on your machine, so you should first try to render or preview a document to PDF to see whether it works. If it doesn’t work, you can install the minimalist distribution TinyTeX by running in your terminal:
quarto install tool tinytex
HTML with R code blocks
Rendered document (click on it to open it in a new tab):
Beamer with Python code blocks
Beamer is a LaTeX presentation framework: a way to create beautiful PDF slides.
Rendered document (click on it to open it in a new tab):