Refactoring the repository of a former employee: any advice?

My objective is to rework the code in the ML repository that I acquired from a former employee in order to replicate their previous final findings and make it simpler and easier for our team and others to adapt to comparable projects.

I’m inheriting a lot of solutions, because the prior owner was astute and had built a solid model. But I’m also inheriting a lot of issues: an unorganized repository with about 50 scripts, peculiar coding techniques, unresolved TODOs, lines commented out without a clear explanation, internal redundancies, a scant README, and no documentation explaining how future users should use the repository.
Fortunately, my new team has been very accommodating and the expectations are reasonable; they know the codebase is messy and have given me enough leeway to work things out. However, this is the first time I’ve had to restructure a codebase this size, and with the lack of documentation, I’m feeling a little overwhelmed trying to hold everything in my head.

How would you advise handling a situation like this?

1 Like

Personally, I would start over: rebuild the project from scratch, drawing heavily on the previous owner’s code. I know that piecemeal refactoring is the usual technique, but I doubt I would use it for an ML project.

2 Likes

Yeah this is probably what I’d do. Form a plan of how I would have done this project, use their code where possible, fill in the rest myself.

But before you make any code changes, write unit and integration tests (assuming there are none) so you know you’re not breaking anything in the process.
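
Since the goal is to replicate the previous final findings, one cheap test to start with is a "golden output" check: record the metrics the original code produces once, then compare every later run against that file. Below is a minimal sketch, assuming the pipeline can return its final metrics as a dictionary of floats. The function names, the `baseline.json` path, and the tolerance are all made up for illustration, and you would likely need to fix random seeds for the comparison to be stable.

```python
import json
import math
from pathlib import Path


def save_baseline(metrics: dict[str, float], path: str) -> None:
    """Store the metrics from the untouched original code as the reference."""
    Path(path).write_text(json.dumps(metrics, indent=2, sort_keys=True))


def compare_to_baseline(
    metrics: dict[str, float], path: str, rel_tol: float = 1e-6
) -> list[str]:
    """Return a list of human-readable differences (empty means no change)."""
    baseline = json.loads(Path(path).read_text())
    problems = []
    for key in sorted(set(baseline) | set(metrics)):
        if key not in metrics:
            problems.append(f"missing metric: {key}")
        elif key not in baseline:
            problems.append(f"unexpected metric: {key}")
        elif not math.isclose(metrics[key], baseline[key], rel_tol=rel_tol):
            problems.append(f"{key}: expected {baseline[key]}, got {metrics[key]}")
    return problems
```

In a test you would then assert that `compare_to_baseline(current_metrics, "baseline.json")` returns an empty list after each refactoring step, where `current_metrics` comes from whatever entry point your pipeline ends up having.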

1 Like

Do you have any tools you’d suggest for building a dependency tree between the scripts? I started building one manually in a sketchpad app like Figma, but I’m sure this can be automated to make it faster and less prone to human error.
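
If the scripts are plain Python files in one folder, a first pass can be automated with the standard library's `ast` module, which parses each file and exposes its import statements. This is only a rough sketch under that assumption: the function names are made up, it only sees `import` statements, and it will miss scripts that are linked through files they read and write or through subprocess calls.

```python
import ast
from pathlib import Path


def local_imports(script: Path, local_modules: set[str]) -> set[str]:
    """Return the sibling modules that `script` imports."""
    tree = ast.parse(script.read_text(encoding="utf-8"), filename=str(script))
    found = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # "from x import y" names module x; "from . import y" names y.
            names = [node.module] if node.module else [a.name for a in node.names]
        else:
            continue
        for name in names:
            top_level = name.split(".")[0]
            if top_level in local_modules:
                found.add(top_level)
    found.discard(script.stem)  # ignore self-references
    return found


def build_dependency_graph(repo_dir: str) -> dict[str, set[str]]:
    """Map each script in `repo_dir` to the local scripts it imports."""
    scripts = sorted(Path(repo_dir).glob("*.py"))
    local_modules = {p.stem for p in scripts}
    return {p.stem: local_imports(p, local_modules) for p in scripts}


def format_graph(graph: dict[str, set[str]]) -> str:
    """Render the graph as one 'script -> dependencies' line per script."""
    return "\n".join(
        f"{name} -> {', '.join(sorted(deps)) or '(none)'}"
        for name, deps in sorted(graph.items())
    )
```

Calling `print(format_graph(build_dependency_graph("path/to/repo")))` gives a text listing you can diff as the refactor progresses; scripts that nothing imports and that import nothing are good candidates for dead code or standalone entry points.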

I’m about to finish one of these jobs. I was handed a project with several subfolders, each containing roughly five to six Jupyter notebooks. The team had to run over thirty different notebooks to obtain the predicted outcome :person_facepalming:. You have to wait for each notebook to finish, and the results take 1.5 days to obtain. Everything is hardcoded, and identical code blocks are repeated all over the place. There isn’t even a README or any comments.

So I had to understand the project’s goal, plan my approach, use the existing scripts as a guide, and proceed step by step. I visualized the model in my head; it helps me work better. I could only work on it sporadically because of other projects, and I finished it in around four months. Instead of taking more than a day to complete, the project now runs with just one line of code and finishes in a few hours.
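
To give an idea of what "one line of code" can look like, here is a minimal sketch of the general pattern, not the actual project: each notebook becomes a function, the hardcoded values move into one config object, and a small runner calls the stages in order. Every name and default value here is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Config:
    # Hypothetical settings that would otherwise be hardcoded in each notebook.
    data_path: str = "data/input.csv"
    n_folds: int = 5


# A stage takes the config and the state so far, and returns the updated state.
Stage = Callable[[Config, dict], dict]


def run_pipeline(config: Config, stages: list[Stage]) -> dict:
    """Run each stage in order, passing along a shared state dictionary."""
    state: dict = {}
    for stage in stages:
        state = stage(config, state)
    return state


def load_data(config: Config, state: dict) -> dict:
    # Placeholder: a real stage would read from config.data_path.
    return {**state, "rows": [1.0, 2.0, 3.0]}


def summarize(config: Config, state: dict) -> dict:
    # Placeholder for a modeling step; here it just averages the rows.
    rows = state["rows"]
    return {**state, "mean": sum(rows) / len(rows)}
```

The whole run is then `run_pipeline(Config(), [load_data, summarize])`, and each stage can be tested on its own with a small hand-built state.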

Mind you, this project was delivered by a vendor that is regarded as one of the TOP consulting firms, and the firm was paid millions of dollars for it. The execution is quite poor, but the results are not horrible.