Sunday, 1 January 2023

Separate script's functions into modules, callable by 2 separate mains

I have a single script that:

  1. imports 2 sets of data: df_height['user_id', 'height'], df_age['user_id', 'age']
  2. cleans the data
  3. analyses the data: i) sum(height), ii) mean(age), iii) sum(height) * mean(age)
  4. displays the data.

I want to:

  • separate the functions out into modules
  • give each analysis its own 'main'
  • for each analysis, divide the work into i) import and clean, ii) process, iii) display

Here is the complete script (comments marked with #-> indicate the file each function or step will be moved to):

import pandas as pd
import numpy as np

#1. functions for importing data #-> these functions go into src/import_data/import_data.py
def get_data_age():  
    df = pd.DataFrame({
        "user_id":     ['1', '2', '3', '4', '5'], 
        "age":         [10,  20,  30, "55", 50], 
    })
    return df

def get_data_height(): 
    df = pd.DataFrame({
        "user_id":     ['5', '7', '12', '5'], 
        "height":      [160, 170, 180, 'replace_this_with_190']
    })
    return df

#2. functions for cleaning data #-> these functions go into src/clean_data/clean_data.py
def clean_age (df): 
    df['age'] = pd.to_numeric(df['age'])
    return df 

def clean_height (df): 
    # The placeholder string asks for 190, so replace it with 190 (the original used 200)
    df['height'] = df['height'].replace("replace_this_with_190", 190)
    return df 

#3. functions for processing data #-> these functions go into src/alghorithms/calculations.py
def alghorithm_age (df):
    return df['age'].mean()

def alghorithm_height (df):
    return df['height'].sum()

#4. functions in common (display data) #-> this function goes into src/display_data/display_data.py
def common_function_display_data (data): 
    print (data)

#5. function that combines data from alghorithm_height and alghorithm_age #-> this function goes into src/alghorithms/calculations.py
def product_age_mean_and_height_sum(mean_age, sum_height): 
    return mean_age * sum_height


#main 1 (age)
df_age = get_data_age()    # -> this step into file main_age/00_import_and_clean_age.py
df_age_clean = clean_age(df_age)  # -> this step into file main_age/00_import_and_clean_age.py
age_mean = alghorithm_age(df_age_clean)  # -> this step into file main_age/01_process_age.py
common_function_display_data(age_mean)  # -> this step into file main_age/02_display_age.py

#main 2 (height)
df_height = get_data_height()  # -> this step into file main_height/00_import_and_clean_height.py
df_height_clean = clean_height(df_height)  # -> this step into file main_height/00_import_and_clean_height.py
height_sum = alghorithm_height(df_height_clean)  # -> this step into file main_height/01_process_height.py
common_function_display_data(height_sum)  # -> this step into file main_height/02_display_height.py

#main 3 (combined)
age_mean_height_sum_product = product_age_mean_and_height_sum(age_mean, height_sum) # -> this step into file main_display_combined/display_combined.py
common_function_display_data(age_mean_height_sum_product)  # -> this step into file main_display_combined/display_combined.py

Here is the final project structure I had in mind.

Project structure

Data flow

Problem: However, when I structure the project as above, I am unable to import the modules into the main scripts. I believe this is because the main folders and the module folders sit on parallel levels (sibling directories). I get the following error:

# EXAMPLE for file main_one_age/00_import_and_clean_age.py
---
from ..import_data.import_data import get_data_age
from ..clean_data.clean_data import clean_age

df_age = get_data_age()    # -> this step into file main_age/00_import_and_clean_age.py
df_age_clean = clean_age(df_age)  # -> this step into file main_age/00_import_and_clean_age.py

---
OUT:
    from ..import_data.import_data import get_data_age
ImportError: attempted relative import beyond top-level package
PS C:\Users\leodt\LH_REPOS\src\src>
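
For reference, the error can be reproduced in isolation. The sketch below is an assumption about what is happening, not a confirmed diagnosis: it builds a throwaway miniature of the layout in a temporary directory and starts the same module twice with `python -m`, once from inside src/ and once from the directory above it. The names `project`, `run_age.py`, `build_project` and `run_module` are hypothetical, and a plain list stands in for the DataFrame so that nothing beyond the standard library is needed.

```python
import subprocess
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical miniature of the layout: project/src/{import_data,main_age}/
FILES = {
    "src/__init__.py": "",
    "src/import_data/__init__.py": "",
    "src/import_data/import_data.py": """
        def get_data_age():
            # Plain list instead of a DataFrame to keep the sketch dependency-free
            return [10, 20, 30, 55, 50]
    """,
    "src/main_age/__init__.py": "",
    "src/main_age/run_age.py": """
        from ..import_data.import_data import get_data_age

        print(sum(get_data_age()))
    """,
}


def build_project(root: Path) -> Path:
    """Write FILES under root/project and return the project directory."""
    project = root / "project"
    for rel_path, body in FILES.items():
        target = project / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(textwrap.dedent(body))
    return project


def run_module(module: str, cwd: Path) -> subprocess.CompletedProcess:
    """Run `python -m <module>` with cwd as the working directory."""
    return subprocess.run(
        [sys.executable, "-m", module],
        cwd=cwd, capture_output=True, text=True,
    )


with tempfile.TemporaryDirectory() as tmp:
    project = build_project(Path(tmp))
    # Started inside src/: main_age is the top-level package, so ".." has nowhere to go.
    from_src = run_module("main_age.run_age", cwd=project / "src")
    # Started one level up: src is the top-level package, so ".." resolves to it.
    from_project = run_module("src.main_age.run_age", cwd=project)

# Last line of the traceback from the first run, then the output of the second run
print(from_src.returncode, from_src.stderr.strip().splitlines()[-1])
print(from_project.returncode, from_project.stdout.strip())
```

In this sketch the first run should end with the same "attempted relative import beyond top-level package" message, while the second run succeeds, which suggests that the directory the interpreter is started from decides what counts as the top-level package. The file is called `run_age.py` rather than `00_import_and_clean_age.py` because a module name starting with a digit cannot be written in an `import` statement.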

QUESTIONS

Q: How can I separate the script into modules and mains within a common structure? I am not looking for a workaround to this specific ImportError, but for a solution that lets me split the above code into modules and mains, with all the steps needed to make it work, structured the way a programmer would do it (I studied physics...).

Once the question becomes eligible for a bounty, I will add one, as it is a very long question.



