Create PDF reports with Python and Jinja2

 

This post is a summary of the code I wrote in Python for my then-desperate wife to automatically generate hundreds of invoices.

Suppose you have an excel sheet with hundreds of rows and a couple of columns (a sample is shown below), and would like to generate pdf reports for individual rows according to column values. With MS office, the sheet can be imported into a word template and the column names of interest can be placed accordingly in the template. Then you can print pdf files for every row. However the task can easily become tedious when you have additional requirements. For instance, grouping customers by attributes like regions to different folders and naming the printed files with column values would need manual and repetitive efforts. To automate the process with Python, there are many options to choose from. The major tool we’ll make use of is Jinja2. In addition, we use pandas to handle tables and test a couple of html-to-pdf tools.

What is Jinja2?

This is excerpted from Jinja2’s documentation:

Jinja2 is a modern and designer-friendly templating language for Python, modelled after Django’s templates. It is fast, widely used and secure with the optional sandboxed template execution environment.

The name Jinja was chosen because it’s the name of a Japanese temple and temple and template share a similar pronunciation. It is not named after the city in Uganda.

In a nutshell, Jinja bridges our Python code and html files which will be shown to end users. By placing placeholders `` in a html template, in Python, Jinja can pass actual values to the placeholders when rendering html files. If this sounds too abstract, the concept will become clear when we see the code later.

A concrete but simple example

Let’s generate invoices according to the following sales table:

ID Invoice Name Address Item Cost
1 A0001 Alix Volkstr. 1 book 12Euro
2 A0002 Juli Volkstr. 2 laptop 500Euro
3 A0628 Ruo Volkstr. 3 laptop 1245Euro

Each customer needs an invoice pdf file which is named by the customer’s name. Our basic idea is to first generate html files and then convert them to pdf files.

To start with, I have already composed a template html file based on this repo. This is how the template looks like:

template

As you may see from the template, the column names in our table have corresponding placeholders.

Use Jinja to render html files

Now we can write Python code to pass the values in the table to html files. The following function can render one html file for a given row.

import jinja2

def render_html(row):
    """
    Render html page using jinja
    """
    template_loader = jinja2.FileSystemLoader(searchpath="./")
    template_env = jinja2.Environment(loader=template_loader)
    template_file = "layout.html"
    template = template_env.get_template(template_file)
    output_text = template.render(
        name=row.Name,
        address=row.Address,
        date=get_date(),
        invoice=row.Invoice,
        item=row.Item,
        amount=row.Cost
        )

    html_path = f'{row.Name}.html'
    html_file = open(html_path, 'w')
    html_file.write(output_text)
    html_file.close()

What this code does:

  1. tell Jinja where the template is;
  2. pass values to the placeholder in the template when rendering;
  3. write the rendered output to a html file.

Read the table using Pandas

The table is stored as sample.csv, we can use pandas to iterate through every row to have named tuples which can then be passed to render_html:

df = pd.read_csv('sample.csv')
for row in df.itertuples():
    render_html(row)

and three html files will be generated.

htmls.

To check if the values are passed correctly, let’s view the ruo.html:

ruo

The output actually looks all right, which is great.

Convert html to pdf

In python, there are also several options for converting html to pdf, pdfkit, weasyprint, xhtml2pdf, to name but a few.

Several factors like the template css style and the browser for viewing the html files can make the pdfs look quite differently from what you see from the browser. For instance, since the template html above has English, German and Chinese, we may need to specify encoding schemes for rendering all characters correctly. After playing different tools for a while, I have chosen pdfkit, which is a python wrapper for wkhtmltopdf. An exhaustive list of configurations can be found here.

Again I have written a function for converting:

def html2pdf(html_path, pdf_path):
    """
    Convert html to pdf using pdfkit which is a wrapper of wkhtmltopdf
    """
    options = {
        'page-size': 'Letter',
        'margin-top': '0.35in',
        'margin-right': '0.75in',
        'margin-bottom': '0.75in',
        'margin-left': '0.75in',
        'encoding': "UTF-8",
        'no-outline': None,
        'enable-local-file-access': None
    }
    with open(html_path) as f:
        pdfkit.from_file(f, pdf_path, options=options

Note that the specified options are from the wkhtmltopdf configuration list. For entries without values, simply specify them to be None.

Put them all together

To have clean working folder, directories res and tables are created for saving generated files and the original table respectively.

import os
import jinja2
import pdfkit
import pandas as pd
import numpy as np
from datetime import date

def render_html(row):
    """
    Render html page using jinja based on layout.html
    """
    template_loader = jinja2.FileSystemLoader(searchpath="./")
    template_env = jinja2.Environment(loader=template_loader)
    template_file = "layout.html"
    template = template_env.get_template(template_file)
    output_text = template.render(
        name=row.Name,
        address=row.Address,
        date=get_date(),
        invoice=row.Invoice,
        item=row.Item,
        amount=row.Cost
        )

    html_path = f'./res/{row.Name}.html'
    html_file = open(html_path, 'w')
    html_file.write(output_text)
    html_file.close()
    print(f"Now converting {row.Name} ... ")
    pdf_path = f'./res/{row.Name}.pdf'    
    html2pdf(html_path, pdf_path)   

def html2pdf(html_path, pdf_path):
    """
    Convert html to pdf using pdfkit which is a wrapper of wkhtmltopdf
    """
    options = {
        'page-size': 'Letter',
        'margin-top': '0.35in',
        'margin-right': '0.75in',
        'margin-bottom': '0.75in',
        'margin-left': '0.75in',
        'encoding': "UTF-8",
        'no-outline': None,
        'enable-local-file-access': None
    }
    with open(html_path) as f:
        pdfkit.from_file(f, pdf_path, options=options)

def get_date():
    "Get today's date in German format"
    today = date.today()
    return today.strftime("%d.%m.%Y")

if __name__ == "__main__":

    df = pd.read_csv('tables/sample.csv')
    for row in df.itertuples():
        render_html(row)

Running the script would generate both html files and pdf files for all rows. The code is also available on github.

Code dependencies

System wide: wkhtmltopdf, installers for different OS can be found here.

Python3:

  • pdfkit
  • jinja2
  • pandas