Generating Word Documents Using a Template in Python

This blog is an extension of my previous one titled “How to create word documents (.docx) using Python“. In the previous blog, I explained about how we can generate MS Word reports (or documents in general) from scratch using Python. While building a document from scratch in Python gives you all the flexibility of the world, it can be a tedious process to actually “create” the document using codes. What if we make the template in MS Word, load it in Python, and then fill in only the bits that we are interested in using codes? That would be much simpler, wouldn’t it?

In this blog, I will guide you through how we can use a ready-made template and populate it with the values or texts that we need.

Static Approach:

The obvious first step is to create the template in MS Word. In the template, we need to use keywords in between curly braces {} to indicate the values (or text) that need to be imputed. I will refer to these as “tags” in this blog. The tags can be used anywhere in the document (for e.g., main body, tables, header/footer).

Note:

The keywords of the tags should NOT contain white spaces.
Formatting applied on the tags will be applied to their corresponding values (or text) as well.

Below document shows the template that will be used in the first section of the blog. Note that some of the tags appear more than once in the template. Their corresponding values will fill in all of the tags after running the script.

For structural consistency in the MS Word template, we can make use of tables to ensure that the text do not get distorted and remain consistent after running the script (and this is applicable even in general). For two columns, we start with a 2×1 table. The within each cell, the tables with the appropriate number of columns and rows are inserted. Once completed, by selecting the table, the borders can then be removed. By selecting the table, under the “Table Design” tab, go to “Borders > No Border” to remove the borders.

The top section of the template was created using this approach as shown below. Once completed, the borders were removed to make it just like in the template.

Now that the MS Word template is ready, we can start on the coding side. At this point, we will assume that we know the values and will just assign them to the tags. In other cases, these values can potentially be the output of some data analysis.

from datetime import datetime
from docxtpl import DocxTemplate
import jinja2

tpl = DocxTemplate('PATH/TO/THE/TEMPLATE/DOCUMENT')

YEAR = 2023  # this is just a dummy variable to showcase that variables can be used as well

context = {
    "today_date": datetime.now().strftime(format='%B %d, %Y'),
    "report_ref": "REP2023",
    "current_year": YEAR,
    "previous_year": YEAR-1,
    "current_year_total_sales": 150000,
    "previous_year_total_sales": 120000,
    "change_in_sales": "increase",
    "percentage_change": 25,
    "current_year_profit": 19500,
    "previous_year_profit": 10800,
    "footer_text": "Tags can be used in headers/footers as well.",
    }

jinja_env = jinja2.Environment(autoescape=True)
tpl.render(context, jinja_env)
tpl.save('PATH/TO/SAVE/THE/RESULTS/DOCUMENT')

Few things to note here:

The size of the script

In the previous blog, together with all the functions required to make it work, we were talking about more than a hundred lines of code! In this one, if we exclude the library imports, the dummy variable YEAR, and the context dictionary (which I will cover just after), we are using only four lines of code!

The context dictionary

There is a dictionary called context in the scipt. This dictionary will contain all the tags from the MS Word template as keys, and their corresponding values (or text) as the values of the dictionary. Note that the tags in context must be exactly as they are in the MS Word template. Otherwise it will not replace the tag in the template, and the result will still contain the tag in its raw format ({{ tag_name }}).

Data type in the context dictionary

The values of the dictionary, which are to replace the tags in the template, can be of any of the following type:

String
Integer/Float
Pre-defined variable (The variable YEAR was used in the script to demonstrate that)
Functions which return a string/integer/float

Below is the output of the script:

Things to note:

All the tags have been replaced by their assigned values in the context dictionary.
The formatting of tags in the template have been applied to the values as well.
All the other elements of the template did not get affected.

Dynamic Approach:

Now you might be wondering whether there is any way to make the tables dynamic, as everything was pretty static in the previous example. Well it turns out that there IS a way to do that, especially when we are not sure how many rows the tables are going to have (or simply if we do not want to end up with hundreds of tags if we need to create a tag for each cell of a big table).

First, I will go through the general framework of how it should be described in the template, and in the context dictionary. Then I will use pandas to create some dataframes which will then be translated to tables in a dynamic way.

Below is the template that I will be using for this section of the blog:

Things to note:

I have included some dummy standalone tags just for the sake of representation.

The tables in the template (as shown above) are described in a 3 stages process.a. The first row is a merged row.To merge the rows in a template, select the rows. Under the “Layout” tab, in the “Merge” section, click on “Merge Cells”.The tag in the merged cell should be {%tr for f in framework %}. This is used to indicate the beginning of the table.Note that I am using framework1 and framework2 in the template for the first and second tables respectively (since there are 2 tables).b. The second row is NOT merged and contains the column reference tags.The reference tags that are being used are month, sales, and perc_profit. The f is just whatever letter that was used in step (a).c. The last row is a merged row.The last row is a merged row as well. The tag in the merged cell should be {%tr endfor %}. This is used to indicate the end of the table. Note that these can be the same for both tables.

In the context dictionary, the framework should be described as a list of dictionaries.

The key will be the name of the name of the framework. In our case, using the first table as example, the key in the context dictionary will be framework1.

The values will be in the form of a list of dictionaries. The keys of each element should be exactly as the reference tags that were used in the table in the MS Word template. In our case, the keys will be month, sales, and perc_profit.

The first element of the list of dictionaries will be the title of the table in the document (IMPORTANT).

Below shows the general form of how the list of dictionaries of the first table might look like. Note that each element (dictionary) corresponds to a row in the table.

context = {

    'framework1': [
        {
            'month': 'Month',
            'sales': 'Sales (£)',
            'perc_profit': 'Percentage Profit (%)',
        },
        {
            'month': 'month_1',
            'sales': 'sales_1',
            'perc_profit': "perc_1",
        },
        {
            'month': 'month_2',
            'sales': 'sales_2',
            'perc_profit': "perc_2",
        },
        {
            'month': 'month_3',
            'sales': 'sales_3',
            'perc_profit': "perc_3",
        },
        {
            'month': 'month_4',
            'sales': 'sales_4',
            'perc_profit': "perc_4",
        },
    ]
}

With that being said, let us make 2 dataframes of different sizes to showcase how it works:

import pandas as pd
# creating the first dataframe
# note that the column names are the same as the reference tags
df1 = pd.DataFrame({'month': ['January', 'February', 'March', 'April'],
                   'sales': [12750, 18750, 11900, 14800],
                   'perc_profit': [10.4, 12.1, 9.8, 10.6]
                   })
df1
#output
month	sales	perc_profit
0	January	12750	10.4
1	February	18750	12.1
2	March	11900	9.8
3	April	14800	10.6

We need to make some changes to the df before we convert it to a list of dictionaries. We need to get the correct table headers as the very first row of the df. Below is one way of doing that:

# shifting the idx by 1
df1.index = df1.index + 1
# assigning the table headers to 'index 0' in the right order
df1.loc[0] = ['Month', 'Sales (£)', 'Percentage Profit (%)']
# sorting the df by idx to make the newly created row the first row of the df
df1 = df1.sort_index()
df1
#output
month	sales	perc_profit
0	Month	Sales (£)	Percentage Profit (%)
1	January	12750	10.4
2	February	18750	12.1
3	March	11900	9.8
4	April	14800	10.6

Once we get it in this form, we can convert the df to a list of dictionaries by using the below line of code:

nd_array_df1 = df1.to_dict(orient='records')
nd_array_df1
#output
[{'month': 'Month',
  'sales': 'Sales (£)',
  'perc_profit': 'Percentage Profit (%)'},
 {'month': 'January', 'sales': 12750, 'perc_profit': 10.4},
 {'month': 'February', 'sales': 18750, 'perc_profit': 12.1},
 {'month': 'March', 'sales': 11900, 'perc_profit': 9.8},
 {'month': 'April', 'sales': 14800, 'perc_profit': 10.6}]

Now let us do something similar for the second dataframe:

# creating the second dataframe
# note that the column names are the same as the reference tags
df2 = pd.DataFrame({'month': ['February', 'March', 'April', 'May', 'June', 'July'],
                   'sales': [123750, 187500, 131900, 214800, 223500, 176400],
                   'perc_profit': [12.5, 18.1, 13.8, 23.2, 26.4, 17.6]
                   })
# shifting the idx by 1
df2.index = df2.index + 1
# assigning the table headers to 'index 0' in the right order
df2.loc[0] = ['Month', 'Sales (£)', 'Percentage Profit (%)']
# sorting the df by idx to make the newly created row the first row of the df
df2 = df2.sort_index()
df2
#output
month	sales	perc_profit
0	Month	Sales (£)	Percentage Profit (%)
1	February	123750	12.5
2	March	187500	18.1
3	April	131900	13.8
4	May	214800	23.2
5	June	223500	26.4
6	July	176400	17.6

nd_array_df2 = df2.to_dict(orient='records')
nd_array_df2
#output
[{'month': 'Month',
  'sales': 'Sales (£)',
  'perc_profit': 'Percentage Profit (%)'},
 {'month': 'February', 'sales': 123750, 'perc_profit': 12.5},
 {'month': 'March', 'sales': 187500, 'perc_profit': 18.1},
 {'month': 'April', 'sales': 131900, 'perc_profit': 13.8},
 {'month': 'May', 'sales': 214800, 'perc_profit': 23.2},
 {'month': 'June', 'sales': 223500, 'perc_profit': 26.4},
 {'month': 'July', 'sales': 176400, 'perc_profit': 17.6}]

Now that we have both dataframes converted to a list of dictionaries, we can put them in our script to generate our document.

from docxtpl import DocxTemplate
import jinja2

tpl = DocxTemplate('PATH/TO/THE/TEMPLATE/DOCUMENT')

context = {
    "header_tag": "You can add tags in the header as well.",
    "table_type": "Dynamic",
    "current_year": 2023,
    "framework1": nd_array_df1,
    "n_rows_1": df1.shape[0]-1,  # note: minus 1 here because we added an extra row for the headers
    "n_columns_1": df1.shape[1],
    "framework2": nd_array_df2,
    "n_rows_2": df2.shape[0]-1,  # note: minus 1 here because we added an extra row for the headers
    "n_columns_2": df2.shape[1],
        }

jinja_env = jinja2.Environment(autoescape=True)
tpl.render(context, jinja_env)
tpl.save('PATH/TO/SAVE/THE/RESULTS/DOCUMENT')

YAY! Just like that, all the dataframes have been converted to tables in a dynamic way.

Conclusion

As we saw in this blog, it is much easier to work with a template rather than having to create the whole document from scratch in Python. Even better, when we have the ability to create the tables in a dynamic way. It saves us a lot of time, especially when we need to generate A LOT of the same type of document (e.g., customer receipts, reports, ).

Well, thank you for taking the time to read my blog! I hope you found this blog insightful and that it helped you in one way or the other.

Cheers All!