The Data Science Method (DSM): A framework for taking your data science projects to the next level.

Have you just landed your first data science job? Are you in the midst of a data science boot camp, or are you a seasoned machine learning professional? In any case, applying the DSM in your projects will elevate your work, increase its impact, and propel your success as a data scientist.

The biggest difference between people who succeed as data scientists and those who do not is their ability to frame data science projects effectively and communicate project outcomes.

Of course, you must have the prerequisite core knowledge of machine learning algorithms, the programming ability, and the drive to become a professional data scientist. The ability to frame projects and communicate outcomes is sometimes referred to as data storytelling, but that term describes only step nine, Review the Results, in which you share your findings.

Good stories also provide plenty of context, and you can think of much of the DSM as identifying the context of your data science story. Starting with the end in mind is one way to find guidance: you must know where you are headed in order to take the appropriate steps along the way. This can be difficult, depending on the complexity of your data and the business needs behind the project. The scientific method offers a useful framework here because it provides clear steps for proceeding along an experimental path.

Based on the scientific method, I have developed the Data Science Method (DSM) as a way to improve data science project outcomes and take your work to the next level. The DSM is detailed below in well-defined steps.


The Data Science Method

  1. Problem Identification
  2. Data Collection, Organization, and Definitions
  3. Exploratory Data Analysis
  4. Pre-processing and Training Data Development
  5. Fit Models with Training Data Set
  6. Review Model Outcomes — Iterate over additional models as needed.
  7. Identify the Final Model
  8. Apply the Model to the Complete Data Set
  9. Review the Results — Share your findings
  10. Finalize Code and Documentation

Now we will address the first and most important step, problem identification, in more detail. The additional steps of the DSM will be described in detail in future articles. Respond to this article with your own Problem Identification answers as you go through the process to help hold you accountable for completing the step.

1. Problem Identification

Problem identification is the first step to a well-positioned data science project.

Start by identifying the goal of the data science project. Ask the question: Is this an exploratory project or a predictive modeling project?

If the answer is exploratory, then less planning may be needed at the outset to ensure interesting and meaningful outcomes. It still helps to identify the expected use of the final product. For an exploratory project, try to hypothesize the kinds of findings that would be of value before you get started. This is especially true for a clustering or other unsupervised project. If your goal is to evaluate the variable correlations and multi-dimensional interactions in your data set, then the initial motivations of the project need to be more firmly defined.

Outlined here is a step-by-step approach to Problem Identification, the first step in the DSM.

1. Is the goal of this project exploratory or predictive?

2. Identify the use of the completed model or expected outcome of exploratory work; consider supervised or unsupervised methods.

3. Determine whether the data you have will answer number two, or whether the data you need is available.

4. Define the data timeline or temporal scale of interest.

5. Identify and describe the modeling response variable.

6. Ask yourself whether this is a classification or a regression problem.

7. What deliverables will be provided at the completion of this modeling project?

Developing answers to the above questions not only gives your work a focused trajectory, but also provides the key details for model documentation. Answering these questions forces you to connect your data analysis work with the business need that may have motivated it in the first place. If you put your data science work in clearly defined terms, you will have a framework for successful implementation within any industry.
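One way to keep these answers next to your project is to record them in a small structure that can flag what is still missing and be pasted into your documentation. The Python sketch below is only an illustration, not part of the DSM itself: the class name ProblemIdentification, its field names, and the example answers are all hypothetical and simply mirror the seven questions.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class ProblemIdentification:
    """Hypothetical record of answers to the seven Problem Identification questions."""

    goal: str = ""               # Q1: "exploratory" or "predictive"
    intended_use: str = ""       # Q2: use of the model or expected outcome
    data_availability: str = ""  # Q3: is the needed data on hand or obtainable?
    timeline: str = ""           # Q4: data timeline or temporal scale
    response_variable: str = ""  # Q5: the modeling response variable
    problem_type: str = ""       # Q6: "classification" or "regression"
    deliverables: List[str] = field(default_factory=list)  # Q7

    def unanswered(self) -> List[str]:
        """Return the names of the questions that have no answer yet."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def to_text(self) -> str:
        """Render the answers as plain text, marking gaps with TODO."""
        lines = []
        for f in fields(self):
            value = getattr(self, f.name)
            if isinstance(value, list):
                value = ", ".join(value)
            lines.append(f"{f.name}: {value or 'TODO'}")
        return "\n".join(lines)


# Invented example answers, for illustration only.
example = ProblemIdentification(
    goal="predictive",
    problem_type="classification",
    deliverables=["trained model", "summary report"],
)
print(example.unanswered())
print(example.to_text())
```

Calling unanswered() before you start modeling is a quick check that the step is complete, and the output of to_text() can serve as the opening section of your model documentation.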
