freeports_analysis.formats.algorithms.commons
Common utilities and data structures for algorithm pipeline management.
This module provides shared functionality for handling format and pipeline identifiers, including validation schemas and index manipulation utilities.
Functions
add_format_name_index(df): Extract format name from ID column and add as separate column.
add_pipe_name(df): Extract pipe name from ID column and add as separate column.
create_index_format_name_pipe(df): Create complete format-pipe-name index from ID column.
set_index_format_name_pipe(df): Set multi-index using Format name, Pipe name, and ID columns.
- freeports_analysis.formats.algorithms.commons.add_format_name_index(df: DataFrame) → DataFrame
Extract format name from ID column and add as separate column.
- Parameters:
df (pd.DataFrame) – DataFrame containing algorithm IDs in ‘ID’ column
- Returns:
DataFrame with added ‘Format name’ column
- Return type:
pd.DataFrame
Notes
The format name is extracted by removing any pipeline suffix from the ID. For example, ‘Amundi-IT23’ has no suffix and stays ‘Amundi-IT23’, while ‘Amundi-IT23(pipeline1)’ is reduced to ‘Amundi-IT23’.
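The suffix-stripping rule can be illustrated on plain strings. This is only a sketch: the regular expression and the helper name `strip_pipe_suffix` are assumptions made for illustration, not the module's actual implementation.

```python
import re

# Assumed ID layout: "<format name>" optionally followed by "(<pipe name>)".
_PIPE_SUFFIX = re.compile(r"\(.*\)$")


def strip_pipe_suffix(algorithm_id: str) -> str:
    """Hypothetical helper: drop a trailing "(...)" suffix, if present."""
    return _PIPE_SUFFIX.sub("", algorithm_id)


# Both example IDs from the note reduce to the same format name.
format_names = [
    strip_pipe_suffix(algorithm_id)
    for algorithm_id in ["Amundi-IT23", "Amundi-IT23(pipeline1)"]
]
print(format_names)  # ['Amundi-IT23', 'Amundi-IT23']
```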
- freeports_analysis.formats.algorithms.commons.add_pipe_name(df: DataFrame) → DataFrame
Extract pipe name from ID column and add as separate column.
- Parameters:
df (pd.DataFrame) – DataFrame containing algorithm IDs in ‘ID’ column
- Returns:
DataFrame with added ‘Pipe name’ column
- Return type:
pd.DataFrame
Notes
The pipe name is extracted from the pipeline suffix in parentheses. For example, ‘Amundi-IT23(pipeline1)’ becomes ‘pipeline1’, while ‘Amundi-IT23’ without a pipeline suffix becomes NaN.
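One way to obtain this behaviour with pandas is `Series.str.extract`, which leaves non-matching rows as missing values. The pattern below is an assumption about the ID layout, and the snippet is a sketch rather than the function's real body.

```python
import pandas as pd

df = pd.DataFrame({"ID": ["Amundi-IT23", "Amundi-IT23(pipeline1)"]})

# Assumed pattern: capture whatever sits inside a trailing "(...)".
# With expand=False and a single capture group, str.extract returns a
# Series; rows that do not match the pattern are filled with NaN.
df["Pipe name"] = df["ID"].str.extract(r"\((.*)\)$", expand=False)

print(df)
```

The first row ends up with a missing pipe name, the second with ‘pipeline1’.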
- freeports_analysis.formats.algorithms.commons.create_index_format_name_pipe(df: DataFrame) → DataFrame
Create complete format-pipe-name index from ID column.
This is a convenience function that combines three steps:
1. Extracting format name from ID
2. Extracting pipe name from ID
3. Setting the multi-index
- Parameters:
df (pd.DataFrame) – DataFrame containing algorithm IDs in ‘ID’ column
- Returns:
DataFrame with multi-index set to (Format name, Pipe name, ID)
- Return type:
pd.DataFrame
Notes
This function converts algorithm IDs into a structured multi-index suitable for algorithm lookup and management. It handles IDs both with and without a pipeline suffix.
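The three steps can be sketched end to end as follows. The function `sketch_create_index`, the regular expressions, and the `enabled` column are hypothetical and stand in for the documented behaviour; the real function may differ, for example in whether it copies its input.

```python
import pandas as pd


def sketch_create_index(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for the three steps; not the library's code."""
    out = df.copy()
    # Step 1: format name = ID without a trailing "(...)" suffix.
    out["Format name"] = out["ID"].str.replace(r"\(.*\)$", "", regex=True)
    # Step 2: pipe name = text inside the trailing parentheses, NaN if absent.
    out["Pipe name"] = out["ID"].str.extract(r"\((.*)\)$", expand=False)
    # Step 3: move the three columns into a hierarchical index.
    return out.set_index(["Format name", "Pipe name", "ID"])


ids = pd.DataFrame(
    {
        "ID": ["Amundi-IT23", "Amundi-IT23(pipeline1)"],
        "enabled": [True, False],  # illustrative payload column
    }
)
indexed = sketch_create_index(ids)
print(list(indexed.index.names))  # ['Format name', 'Pipe name', 'ID']
```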
- freeports_analysis.formats.algorithms.commons.set_index_format_name_pipe(df: DataFrame) → DataFrame
Set multi-index using Format name, Pipe name, and ID columns.
- Parameters:
df (pd.DataFrame) – DataFrame with ‘Format name’, ‘Pipe name’, and ‘ID’ columns
- Returns:
DataFrame with multi-index set to (Format name, Pipe name, ID)
- Return type:
pd.DataFrame
Notes
This creates a hierarchical index that allows efficient lookup of algorithms by format name, pipeline name, and algorithm ID.
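As an illustration of the lookups such an index allows, the sketch below builds a small frame by hand with plain `DataFrame.set_index` and queries it at each level. The ‘Other-FR01’ IDs, the ‘pipeline2’ name, and the `priority` column are made up for the example.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Format name": ["Amundi-IT23", "Amundi-IT23", "Other-FR01"],
        "Pipe name": ["pipeline1", "pipeline2", "pipeline1"],
        "ID": [
            "Amundi-IT23(pipeline1)",
            "Amundi-IT23(pipeline2)",
            "Other-FR01(pipeline1)",
        ],
        "priority": [1, 2, 3],  # illustrative payload column
    }
)
# Sorting the index keeps label-based slicing fast and warning-free.
indexed = df.set_index(["Format name", "Pipe name", "ID"]).sort_index()

# All rows for one format (partial key on the first level).
amundi = indexed.loc["Amundi-IT23"]

# One specific algorithm (full key on all three levels, one column).
priority = indexed.loc[
    ("Amundi-IT23", "pipeline2", "Amundi-IT23(pipeline2)"), "priority"
]

# All rows for one pipe name, regardless of format.
pipeline1 = indexed.xs("pipeline1", level="Pipe name")
```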