Data Science | AI | DataOps | Engineering
backgroundGrey.png

Blog

Data Science & Data Engineering blogs

High Value, Low Effort: Automating Code Documentation

The world of generative AI has recently exploded, from early GANs 9 years ago to image generators like Stable Diffusion ControlNet and DALL-E to large language models like ChatGPT, with each new model, the field is constantly improving and it will be fascinating to see how many industries standards will change in the next five years.

I have been quietly exploring and experimenting and have put together a script for a proof of concept that I believe to be a high value low effort use case of the ChatGPT API.

In today's fast-paced software development environment, maintaining documentation for a codebase can be a time-consuming and labour-intensive task, and developers often struggle to keep up with it while also working on critical tasks like developing new features or fixing bugs. This can lead to inconsistencies in documentation, making it challenging for new developers to understand and work with the codebase.

However, the ChatGPT API can help automate the process of code documentation and save businesses a significant amount of time and money. With the new gpt-3.5-turbo-0301 or codex models you can quickly and easily add in-line documentation and comments to your code files, making it easier for developers to understand and modify the code.

Code Documentation with ChatGPT API

The following script connects to a GitHub repository using an access token. Once connected, it retrieves all the contents of the repository, including the contents of subdirectories. It then iterates through each file and checks if it has a programming file extension. If it does, the file's contents are read and sent to the ChatGPT API with a message requesting that comments be added to the file.

The model then generates natural language responses to the request, describing the purpose of each function or section of code. These comments are then automatically added to the code file, enhancing its readability and maintainability.

Once the comments are added, a pull request is created for each file, allowing developers to review and confirm the added comments meet their standards and provide value.

Steps to Run the Code:

1.      Obtain API keys for OpenAI and GitHub.

2.      Add your OpenAI and GitHub API keys, as well as the repository name.

3.      Run the script in your favourite IDE. The script will automatically create a new branch with the added comments and create a pull request for each file.

4.      Review the pull requests and confirm that the added comments meet your standards and provide value.

It's important to note that this script is a proof of concept, a real-world implementation of this functionality would use a self-hosted model from Azure OpenAI. At the moment, the "gpt-3.5-turbo-0301” model is not available on Azure OpenAI but one could fine-tune the Codex models and use the edits endpoint for this use case however for ease of use and to prove the concept I used Chat completions.   

Additionally, the code itself can be refactored and optimized in multiple ways, such as grouping changes on a single pull request at the directory level or ignoring certain files. These options can help streamline the code review process and make it easier for developers to provide feedback on the added comments.

After running the script, you will see PRs created on your repository.

And with comments added in:

Benefits of Automating Code Documentation with ChatGPT API

Automating code documentation can provide several benefits, including:

  • Time-Savings: Automating the process of code documentation can save businesses a significant amount of time and money. Instead of manually documenting each code file, the ChatGPT API can generate natural language responses in a matter of seconds.

  • Improved Code Quality: By adding in-line documentation and comments to your code files, you can improve its readability and maintainability. This, in turn, can improve the overall quality of your codebase and make it easier for developers to work with.

  • Consistent Documentation: Automated code documentation ensures that all code files have consistent and accurate documentation. This is particularly important when onboarding new developers or when working on a large codebase.

  • Enhanced Developer Productivity: By automating code documentation, you can free up developers' time to work on more critical tasks, such as developing new features or fixing bugs.

Example of ChatGPT API Code Documentation at Scale

Let's take a hypothetical scenario where a company has 100 developers who have to spend at least one hour every week documenting code. Assuming a 40-hour workweek, this equates to 4,000 hours of development time spent on documentation each year.

If we estimate the average developer's hourly rate to be $50, that's $200,000 per year that the company is spending on documentation alone. However, if the company were to automate the documentation process using the ChatGPT API, it could save all of those 4,000 hours each year, freeing up developers to work on more critical tasks.

Additionally, the company can ensure that the documentation is consistent, accurate, and up-to-date. This can be challenging to achieve when developers are responsible for documenting their own code, as it often becomes a low-priority task that gets neglected or skipped.

ChatGPT API and other Azure OpenAI models can become valuable tools for businesses looking to streamline their code documentation processes and improve their development workflows. By automating code documentation, businesses can save time and money while ensuring that their codebase is well-documented and maintainable for the long term.  

Do you believe that automating code documentation will become the new standard? Will it lead to more efficient and effective development workflows?