Documental Breakdown – Using Python to Generate Word Documents and PDFs

Python is a very popular programming language, known for its versatility, straightforward syntax and flexibility. One of its greatest advantages is its ability to automate repetitive, tedious tasks. For example, if you need to edit the same letter many times to include personalised fields, such as a person’s name or address, Python can automate the process and significantly reduce the manual burden.

At Bays, we have already published two blogs outlining the process of creating Word Documents from scratch using Python, in addition to filling in the gaps of an already existing Word Document.

This process is quite straightforward, with lots of open-source documentation available online to help. However, when following these tutorials, a couple of things came up that seemed to be slightly more challenging at first glance:

  • Adding a hyperlink to the Word Document
  • Converting the Word Document to a PDF using Python

In this blog, I will cover how to do both of these additional steps.

Before anything else, a Word Document template should be created.

Any information you want to add to this document should be entered in the format {{ x }} as seen in the image below.

Here I have added {{ rt }} and formatted it the way I want the hyperlink to appear in the document.
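
To make the placeholder idea concrete, here is a minimal standard-library sketch of how a {{ x }} tag can be swapped for a value from a dictionary. This is only an illustration of the concept, not how docxtpl or Jinja2 work internally; the function name `fill_placeholders` and the sample letter text are made up for this example.

```python
import re

# Matches {{ name }} with optional spaces inside the braces
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def fill_placeholders(text: str, context: dict) -> str:
    """Replace each {{ key }} in text with context[key].

    Keys missing from the context are left untouched (a choice made
    for this sketch so that gaps stay visible).
    """
    def substitute(match):
        key = match.group(1)
        return str(context.get(key, match.group(0)))

    return PLACEHOLDER.sub(substitute, text)


# Hypothetical letter text, purely for illustration
letter = "Dear {{ name }}, welcome to {{ company_name }}."
print(fill_placeholders(letter, {"name": "Alex", "company_name": "Bays Consulting"}))
# prints: Dear Alex, welcome to Bays Consulting.
```

In the real workflow the template engine does this work for you, and also preserves the formatting applied to the placeholder in Word.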

For the purpose of this blog, the Python package Faker has been used to generate fake data to fill in the missing fields.

The following cells import all of the required libraries and create the data we want to use to fill in the gaps.

# install packages (notebook syntax); pywin32 provides win32com and only works on Windows
! pip install docxtpl faker pywin32
# importing relevant libraries
import os

import jinja2
from win32com import client
from docxtpl import DocxTemplate, RichText
from faker import Faker

fake = Faker()
# Generate a single fake address so both address lines belong together
address_lines = fake.address().split('\n')
# Creating a dict that contains all information to be added to the word doc
# ('name' is also used later to name the output file)
context = {'name': fake.name(),
        'company_name': "Bays Consulting",
        'first_line_address': address_lines[0],
        'second_line_address': address_lines[1],
        'my_name': "Holly"}

The data created above is simply inserted into the template document in the following step (explained in more detail in the previous blog).
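
When the same letter goes to several people, one context dictionary is needed per recipient. The sketch below shows one possible way to build them with a small helper; `build_context` and the recipient list are hypothetical and invented for illustration, while the dictionary keys mirror the ones used in this blog.

```python
def build_context(name: str, address: str,
                  company_name: str = "Bays Consulting",
                  my_name: str = "Holly") -> dict:
    """Build one template context; the address is split once so both lines match."""
    lines = address.split("\n")
    return {
        "name": name,
        "company_name": company_name,
        "first_line_address": lines[0],
        # Guard against addresses that only have a single line
        "second_line_address": lines[1] if len(lines) > 1 else "",
        "my_name": my_name,
    }


# Made-up recipients, purely for illustration
recipients = [
    ("Alex Example", "1 Sample Street\nTestville, TS1 1AA"),
    ("Sam Placeholder", "22 Demo Road\nMocktown, MK2 2BB"),
]

contexts = [build_context(name, address) for name, address in recipients]
for ctx in contexts:
    print(ctx["name"], "|", ctx["first_line_address"], "|", ctx["second_line_address"])
```

Each dictionary in `contexts` could then be rendered into its own copy of the template.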

However, in this next step we are also adding a hyperlink to the Word Document by using a RichText object.

jinja_env = jinja2.Environment(autoescape=True)
template = DocxTemplate("example_template.docx")

# the hyperlink we want inserted
url = ""

# adding hyperlink to text
rt = RichText()
rt.add(url, url_id=template.build_url_id(url))
context["rt"] = rt

# rendering document and saving it under the recipient's name
template.render(context, jinja_env)
template.save(f"{context['name']}.docx")
print(f'Finished rendering {context["name"]}')
Finished rendering Mrs. Jacqueline Medina

Once the above code has been run, a Word Document should be saved where all of the missing fields are now filled in.

An example of this can be seen below.
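
Beyond checking the output by eye, it is also possible to test programmatically whether any placeholder was left unfilled. The sketch below relies on the fact that a .docx file is a zip archive whose main body is stored in word/document.xml. The function name `leftover_placeholders` is made up for this example, and the check only covers the main body (not headers or footers).

```python
import io
import re
import zipfile


def leftover_placeholders(docx_path) -> list:
    """Return any {{ ... }} tags still present in the main body of a .docx file."""
    # A .docx file is a zip archive; the body text lives in word/document.xml
    with zipfile.ZipFile(docx_path) as archive:
        xml = archive.read("word/document.xml").decode("utf-8")
    # Drop XML tags so a placeholder split across several runs is still detected
    text = re.sub(r"<[^>]+>", "", xml)
    return re.findall(r"\{\{.*?\}\}", text)


# Demo with a tiny in-memory stand-in for a real .docx (illustrative only)
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as archive:
    archive.writestr("word/document.xml", "<w:p><w:t>Dear {{ name }}</w:t></w:p>")
demo.seek(0)
print(leftover_placeholders(demo))
# prints: ['{{ name }}']
```

For a rendered letter, calling the function with the saved file's path should return an empty list.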

Saving to PDF

Once the Word Document has been saved, it can be converted to a PDF using the following function. Because it drives Word through win32com, it needs to run on Windows with Microsoft Word installed.

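def convert_to_pdf(filepath: str):
    """Save a pdf of a docx file."""
    # Word is driven through COM and expects absolute paths
    filepath = os.path.abspath(filepath)
    target_path = os.path.splitext(filepath)[0] + ".pdf"
    word = client.DispatchEx("Word.Application")
    try:
        word_doc = word.Documents.Open(filepath)
        # FileFormat=17 is Word's constant for PDF (wdFormatPDF)
        word_doc.SaveAs(target_path, FileFormat=17)
        word_doc.Close()
    finally:
        # always shut down the background Word process, even on error
        word.Quit()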
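
If many letters have been generated, the conversion can be applied to a whole folder. The following is a minimal sketch under stated assumptions: `convert_folder` is a hypothetical helper, and the converter is passed in as a parameter so that the demo can run with a stand-in instead of Word. In practice you would pass the PDF conversion function from this blog.

```python
import tempfile
from pathlib import Path
from typing import Callable, List


def convert_folder(folder, convert: Callable[[str], None]) -> List[Path]:
    """Call `convert` on every .docx in `folder` and return the expected PDF paths."""
    targets = []
    for docx in sorted(Path(folder).glob("*.docx")):
        # Skip the temporary lock files Word creates while a document is open
        if docx.name.startswith("~$"):
            continue
        docx = docx.resolve()
        convert(str(docx))
        targets.append(docx.with_suffix(".pdf"))
    return targets


# Demo with a stand-in converter that only records what it was asked to do
with tempfile.TemporaryDirectory() as tmp:
    for filename in ("letter_a.docx", "~$letter_a.docx", "notes.txt"):
        (Path(tmp) / filename).touch()
    calls = []
    pdfs = convert_folder(tmp, calls.append)
    print([p.name for p in pdfs])
    # prints: ['letter_a.pdf']
```

Using `Path.with_suffix` only changes the file extension, which avoids accidentally rewriting a ".docx" that happens to appear elsewhere in the path.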


In conclusion, mastering the art of generating Word documents and PDFs through Python can significantly streamline your workflow, saving you valuable time and effort. By incorporating hyperlinks and seamlessly converting documents to PDFs, you enhance the functionality and accessibility of your documents. As you dive deeper into Python’s capabilities, you’ll discover endless possibilities for automation and efficiency. Stay tuned for more tutorials and tips to improve your Python skills and boost your productivity. Happy coding!

By Holly Jones
