Python Machine Learning Data Science Project Structure

I'm looking for information on how a Python machine learning project should be organized. For regular Python projects there is Cookiecutter, and for R there is ProjectTemplate.

This is my current folder structure, but I'm mixing Jupyter notebooks with actual Python code and it does not seem very clear.

├── cache
├── data
├── my_module
├── logs
├── notebooks
├── scripts
├── snippets
└── tools

I work in the scripts folder and am currently adding all the functions to files under my_module, but that leads to errors when loading data (relative/absolute paths) and other problems.

I could not find proper best practices or good examples on this topic besides this Kaggle competition solution and some notebooks that have all their functions condensed at the start.

Mar 26 in Machine Learning by Nandini
• 5,480 points

1 answer to this question.

To reuse code from .py files inside notebooks, I have found that appending the source directory to the system path is the most successful method. This may make some people shudder, but it appears to be the cleanest way of importing code into a notebook without a pip install -e step and a lot of module boilerplate.
A related tip is to use the %autoreload and %aimport magics, so that edits to a marked module are reloaded without restarting the kernel. Here's an illustration:

# Load the "autoreload" extension
%load_ext autoreload

# always reload modules marked with "%aimport"
%autoreload 1

import os
import sys

# add the 'src' directory as one where we can import modules
source_dir = os.path.join(os.getcwd(), os.pardir, 'src')
sys.path.append(source_dir)

# import the module from the source code and mark it for autoreload
%aimport preprocess.build_features
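
The same idea can be made less dependent on the notebook's working directory, which may also help with the relative/absolute data path errors mentioned in the question. The following is only a minimal sketch: the helper names find_project_root, add_src_to_path and data_path are hypothetical, and it assumes the project root is the first parent folder that contains a 'src' directory and that data files live under a 'data' folder as in the question's layout.

```python
import sys
from pathlib import Path


def find_project_root(start: Path, marker: str = "src") -> Path:
    """Walk upward from `start` until a directory containing `marker` is found."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).is_dir():
            return candidate
    raise FileNotFoundError(f"no parent of {start} contains a '{marker}' directory")


def add_src_to_path(root: Path, marker: str = "src") -> Path:
    """Append the source directory to sys.path (only once) and return it."""
    src_dir = root / marker
    if str(src_dir) not in sys.path:
        sys.path.append(str(src_dir))
    return src_dir


def data_path(root: Path, *parts: str) -> Path:
    """Build an absolute path under the assumed 'data' folder of the project."""
    return root.joinpath("data", *parts)


# Typical use at the top of a notebook or script (file names are placeholders):
# root = find_project_root(Path.cwd())
# add_src_to_path(root)
# csv_file = data_path(root, "raw", "train.csv")
```

Because every path is built from the detected root, the same lines should behave the same whether they run from the notebooks folder or the scripts folder.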

answered Mar 30 by Dev
• 6,000 points
