With coding and technology being so popular in 2022, people working in the industry are spoilt for choice with the countless options for programming languages. As if the choice of language wasn’t difficult enough, there are also so many IDE (Integrated Development Environment) and code editors to choose from.
Jupyter Notebook is a web-based IDE and has become increasingly popular among the data science community over the past couple of years. It is an interactive web-based notebook allowing you to split up your code into cells, executing one cell at a time. There are so many benefits of working with Jupyter Notebook, these being a few:
- Can run code line by line to get a better understanding of how the code works
- Great for exploratory analysis and visualisations
- Great way to present code alongside outputs
- They have markdown cells allowing you to document work and format nicely
However, it’s not always rainbows and butterflies. When working in Jupyter Notebooks it is easy to get carried away, writing your code cell by cell without thinking too much about the structure. You can easily jump between cells which often leads to variables being overwritten. This can result in errors, incorrect outputs and usually a big mess! If not used cautiously, Jupyter Notebook can encourage bad practice, cause problems with reproducibility and can be a nightmare for somebody wanting to review code.
This is where writing functions can be a game changer. A function is a block of code that is used to perform a specific task. Storing functions in a separate .py script and using the notebooks as complimentary tools to document analysis makes working in Jupyter Notebook much more efficient.
When working on a project, it is difficult not to dwell on the big picture and everything that has to be carried out. Writing functions forces you decompose your problems, performing only one action at a time. Additionally, when you only have to consider small pieces of code, the debugging and unit testing process becomes a little less painful.
All in all, functions prevent duplication of code, increase readability and reproducibility and if saved on a common environment, can be used across multiple different projects.