This blog is an extension of my previous one titled “How to create word documents (.docx) using Python“. In the previous blog, I explained about how we can generate MS Word reports (or documents in general) from scratch using Python. While building a document from scratch in Python gives you all the flexibility of the world, it can be a tedious process to actually “create” the document using codes. What if we make the template in MS Word, load it in Python, and then fill in only the bits that we are interested in using codes? That would be much simpler, wouldn’t it?
In this blog, I will guide you through how we can use a ready-made template and populate it with the values or texts that we need.
Static Approach:
The obvious first step is to create the template in MS Word. In the template, we need to use keywords in between curly braces {}
to indicate the values (or text) that need to be imputed. I will refer to these as “tags” in this blog. The tags can be used anywhere in the document (for e.g., main body, tables, header/footer).
Note:
- The keywords of the tags should NOT contain white spaces.
- Formatting applied on the tags will be applied to their corresponding values (or text) as well.
Below document shows the template that will be used in the first section of the blog. Note that some of the tags appear more than once in the template. Their corresponding values will fill in all of the tags after running the script.
For structural consistency in the MS Word template, we can make use of tables to ensure that the text do not get distorted and remain consistent after running the script (and this is applicable even in general). For two columns, we start with a 2×1 table. The within each cell, the tables with the appropriate number of columns and rows are inserted. Once completed, by selecting the table, the borders can then be removed. By selecting the table, under the “Table Design” tab, go to “Borders > No Border” to remove the borders.
The top section of the template was created using this approach as shown below. Once completed, the borders were removed to make it just like in the template.
Now that the MS Word template is ready, we can start on the coding side. At this point, we will assume that we know the values and will just assign them to the tags. In other cases, these values can potentially be the output of some data analysis.
from datetime import datetime
from docxtpl import DocxTemplate
import jinja2
tpl = DocxTemplate('PATH/TO/THE/TEMPLATE/DOCUMENT')
YEAR = 2023 # this is just a dummy variable to showcase that variables can be used as well
context = {
"today_date": datetime.now().strftime(format='%B %d, %Y'),
"report_ref": "REP2023",
"current_year": YEAR,
"previous_year": YEAR-1,
"current_year_total_sales": 150000,
"previous_year_total_sales": 120000,
"change_in_sales": "increase",
"percentage_change": 25,
"current_year_profit": 19500,
"previous_year_profit": 10800,
"footer_text": "Tags can be used in headers/footers as well.",
}
jinja_env = jinja2.Environment(autoescape=True)
tpl.render(context, jinja_env)
tpl.save('PATH/TO/SAVE/THE/RESULTS/DOCUMENT')
Few things to note here:
- The size of the script
In the previous blog, together with all the functions required to make it work, we were talking about more than a hundred lines of code! In this one, if we exclude the library imports, the dummy variable YEAR
, and the context dictionary (which I will cover just after), we are using only four lines of code!
- The
context
dictionary
There is a dictionary called context
in the scipt. This dictionary will contain all the tags from the MS Word template as keys, and their corresponding values (or text) as the values of the dictionary. Note that the tags in context
must be exactly as they are in the MS Word template. Otherwise it will not replace the tag in the template, and the result will still contain the tag in its raw format ({{ tag_name }}
).
- Data type in the
context
dictionary
The values of the dictionary, which are to replace the tags in the template, can be of any of the following type:
- String
- Integer/Float
- Pre-defined variable (The variable
YEAR
was used in the script to demonstrate that) - Functions which return a string/integer/float
Below is the output of the script:
Things to note:
- All the tags have been replaced by their assigned values in the
context
dictionary. - The formatting of tags in the template have been applied to the values as well.
- All the other elements of the template did not get affected.
Dynamic Approach:
Now you might be wondering whether there is any way to make the tables dynamic, as everything was pretty static in the previous example. Well it turns out that there IS a way to do that, especially when we are not sure how many rows the tables are going to have (or simply if we do not want to end up with hundreds of tags if we need to create a tag for each cell of a big table).
First, I will go through the general framework of how it should be described in the template, and in the context
dictionary. Then I will use pandas
to create some dataframes which will then be translated to tables in a dynamic way.
Below is the template that I will be using for this section of the blog:
Things to note:
- I have included some dummy standalone tags just for the sake of representation.
- The tables in the template (as shown above) are described in a 3 stages process.a. The first row is a merged row.To merge the rows in a template, select the rows. Under the “Layout” tab, in the “Merge” section, click on “Merge Cells”.The tag in the merged cell should be
{%tr for f in framework %}
. This is used to indicate the beginning of the table.Note that I am usingframework1
andframework2
in the template for the first and second tables respectively (since there are 2 tables).b. The second row is NOT merged and contains the column reference tags.The reference tags that are being used aremonth
,sales
, andperc_profit
. Thef
is just whatever letter that was used in step (a).c. The last row is a merged row.The last row is a merged row as well. The tag in the merged cell should be{%tr endfor %}
. This is used to indicate the end of the table. Note that these can be the same for both tables.
In the context
dictionary, the framework should be described as a list of dictionaries.
The key will be the name of the name of the framework. In our case, using the first table as example, the key in the context
dictionary will be framework1
.
The values will be in the form of a list of dictionaries. The keys of each element should be exactly as the reference tags that were used in the table in the MS Word template. In our case, the keys will be month
, sales
, and perc_profit
.
The first element of the list of dictionaries will be the title of the table in the document (IMPORTANT).
Below shows the general form of how the list of dictionaries of the first table might look like. Note that each element (dictionary) corresponds to a row in the table.
context = {
'framework1': [
{
'month': 'Month',
'sales': 'Sales (£)',
'perc_profit': 'Percentage Profit (%)',
},
{
'month': 'month_1',
'sales': 'sales_1',
'perc_profit': "perc_1",
},
{
'month': 'month_2',
'sales': 'sales_2',
'perc_profit': "perc_2",
},
{
'month': 'month_3',
'sales': 'sales_3',
'perc_profit': "perc_3",
},
{
'month': 'month_4',
'sales': 'sales_4',
'perc_profit': "perc_4",
},
]
}
With that being said, let us make 2 dataframes of different sizes to showcase how it works:
import pandas as pd
# creating the first dataframe
# note that the column names are the same as the reference tags
df1 = pd.DataFrame({'month': ['January', 'February', 'March', 'April'],
'sales': [12750, 18750, 11900, 14800],
'perc_profit': [10.4, 12.1, 9.8, 10.6]
})
df1
#output
month sales perc_profit
0 January 12750 10.4
1 February 18750 12.1
2 March 11900 9.8
3 April 14800 10.6
We need to make some changes to the df before we convert it to a list of dictionaries. We need to get the correct table headers as the very first row of the df. Below is one way of doing that:
# shifting the idx by 1
df1.index = df1.index + 1
# assigning the table headers to 'index 0' in the right order
df1.loc[0] = ['Month', 'Sales (£)', 'Percentage Profit (%)']
# sorting the df by idx to make the newly created row the first row of the df
df1 = df1.sort_index()
df1
#output
month sales perc_profit
0 Month Sales (£) Percentage Profit (%)
1 January 12750 10.4
2 February 18750 12.1
3 March 11900 9.8
4 April 14800 10.6
Once we get it in this form, we can convert the df to a list of dictionaries by using the below line of code:
nd_array_df1 = df1.to_dict(orient='records')
nd_array_df1
#output
[{'month': 'Month',
'sales': 'Sales (£)',
'perc_profit': 'Percentage Profit (%)'},
{'month': 'January', 'sales': 12750, 'perc_profit': 10.4},
{'month': 'February', 'sales': 18750, 'perc_profit': 12.1},
{'month': 'March', 'sales': 11900, 'perc_profit': 9.8},
{'month': 'April', 'sales': 14800, 'perc_profit': 10.6}]
Now let us do something similar for the second dataframe:
# creating the second dataframe
# note that the column names are the same as the reference tags
df2 = pd.DataFrame({'month': ['February', 'March', 'April', 'May', 'June', 'July'],
'sales': [123750, 187500, 131900, 214800, 223500, 176400],
'perc_profit': [12.5, 18.1, 13.8, 23.2, 26.4, 17.6]
})
# shifting the idx by 1
df2.index = df2.index + 1
# assigning the table headers to 'index 0' in the right order
df2.loc[0] = ['Month', 'Sales (£)', 'Percentage Profit (%)']
# sorting the df by idx to make the newly created row the first row of the df
df2 = df2.sort_index()
df2
#output
month sales perc_profit
0 Month Sales (£) Percentage Profit (%)
1 February 123750 12.5
2 March 187500 18.1
3 April 131900 13.8
4 May 214800 23.2
5 June 223500 26.4
6 July 176400 17.6
nd_array_df2 = df2.to_dict(orient='records')
nd_array_df2
#output
[{'month': 'Month',
'sales': 'Sales (£)',
'perc_profit': 'Percentage Profit (%)'},
{'month': 'February', 'sales': 123750, 'perc_profit': 12.5},
{'month': 'March', 'sales': 187500, 'perc_profit': 18.1},
{'month': 'April', 'sales': 131900, 'perc_profit': 13.8},
{'month': 'May', 'sales': 214800, 'perc_profit': 23.2},
{'month': 'June', 'sales': 223500, 'perc_profit': 26.4},
{'month': 'July', 'sales': 176400, 'perc_profit': 17.6}]
Now that we have both dataframes converted to a list of dictionaries, we can put them in our script to generate our document.
from docxtpl import DocxTemplate
import jinja2
tpl = DocxTemplate('PATH/TO/THE/TEMPLATE/DOCUMENT')
context = {
"header_tag": "You can add tags in the header as well.",
"table_type": "Dynamic",
"current_year": 2023,
"framework1": nd_array_df1,
"n_rows_1": df1.shape[0]-1, # note: minus 1 here because we added an extra row for the headers
"n_columns_1": df1.shape[1],
"framework2": nd_array_df2,
"n_rows_2": df2.shape[0]-1, # note: minus 1 here because we added an extra row for the headers
"n_columns_2": df2.shape[1],
}
jinja_env = jinja2.Environment(autoescape=True)
tpl.render(context, jinja_env)
tpl.save('PATH/TO/SAVE/THE/RESULTS/DOCUMENT')
YAY! Just like that, all the dataframes have been converted to tables in a dynamic way.
Conclusion
As we saw in this blog, it is much easier to work with a template rather than having to create the whole document from scratch in Python. Even better, when we have the ability to create the tables in a dynamic way. It saves us a lot of time, especially when we need to generate A LOT of the same type of document (e.g., customer receipts, reports, ).
Well, thank you for taking the time to read my blog! I hope you found this blog insightful and that it helped you in one way or the other.
Cheers All!