Python Machine Learning Data Science Project Structure

I'm looking for information on how a Python machine learning project should be organized. For general Python projects there is Cookiecutter, and for R there is ProjectTemplate.

This is my current folder structure, but I'm mixing Jupyter notebooks with actual Python code and the result does not seem very clear.

├── cache
├── data
├── my_module
├── logs
├── notebooks
├── scripts
├── snippets
└── tools

I work in the scripts folder and currently add all the functions to files under my_module, but that leads to errors when loading data (relative vs. absolute paths) and other problems.

I could not find proper best practices or good examples on this topic besides this Kaggle competition solution and some notebooks that have all the functions condensed at the start of the notebook.

Mar 26 in Machine Learning by Nandini
1 answer to this question.

To reuse code from files inside notebooks, I have found that appending to the system path works best. This may make some people shudder, but it appears to be the cleanest way of importing code into a notebook without a pip install -e and a lot of module boilerplate.
Along with that, one tip is to use the %autoreload and %aimport magics. Here's an illustration:

# Load the "autoreload" extension
%load_ext autoreload

# always reload modules marked with "%aimport"
%autoreload 1

import os
import sys

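# add the 'src' directory as one where we can import modules
# (assumes the notebook runs one level below the project root)
source_dir = os.path.join(os.getcwd(), os.pardir, 'src')
sys.path.append(source_dir)

# import the module from the source code and mark it for autoreload
%aimport preprocess.build_features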
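
The snippet above depends on the notebook's working directory, which is also where the relative/absolute path errors mentioned in the question tend to come from. As a rough sketch (not part of any library), the hypothetical helpers below walk upward from a starting directory until they find a marker directory, then put it on sys.path. The names find_project_root and add_source_dir, and the default marker 'src', are assumptions made for illustration.

```python
import sys
from pathlib import Path


def find_project_root(start: Path, marker: str = "src") -> Path:
    """Walk upward from `start` until a directory containing `marker` is found.

    Hypothetical helper: `marker` is whatever directory identifies the
    project root in your layout ('src' here, by assumption).
    """
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).is_dir():
            return candidate
    raise FileNotFoundError(f"no '{marker}' directory found at or above {start}")


def add_source_dir(start: Path, marker: str = "src") -> Path:
    """Append <root>/<marker> to sys.path (only once) and return the root."""
    root = find_project_root(start, marker)
    source_dir = str(root / marker)
    if source_dir not in sys.path:
        sys.path.append(source_dir)
    return root


# Possible use at the top of a notebook or script:
#   root = add_source_dir(Path.cwd())
#   data_dir = root / "data"   # data paths built from the root, not the cwd
```

Building data paths from the returned root means a notebook and a script open the same file no matter which folder they are started from. In the layout from the question, where my_module sits directly in the project root, the root itself would go on sys.path rather than a 'src' subdirectory.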
answered Mar 30 by Dev
