freeports_analysis.formats.algorithms.commons

Common utilities and data structures for algorithm pipeline management.

This module provides shared functionality for handling format and pipeline identifiers, including validation schemas and index manipulation utilities.

Functions

add_format_name_index(df)

Extract format name from ID column and add as separate column.

add_pipe_name(df)

Extract pipe name from ID column and add as separate column.

create_index_format_name_pipe(df)

Create complete format-pipe-name index from ID column.

set_index_format_name_pipe(df)

Set multi-index using Format name, Pipe name, and ID columns.

freeports_analysis.formats.algorithms.commons.add_format_name_index(df: DataFrame) → DataFrame

Extract format name from ID column and add as separate column.

Parameters:

df (pd.DataFrame) – DataFrame containing algorithm IDs in ‘ID’ column

Returns:

DataFrame with added ‘Format name’ column

Return type:

pd.DataFrame

Notes

The format name is the ID with any parenthesized pipeline suffix removed. For example, ‘Amundi-IT23(pipeline1)’ becomes ‘Amundi-IT23’, while ‘Amundi-IT23’, which has no suffix, is left unchanged.
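The suffix-stripping rule can be sketched in plain Python. The regular expression and the helper name `format_name_of` are assumptions made for illustration; the module's actual implementation may differ.

```python
import re

# Assumed pattern: a trailing "(...)" group is treated as the pipeline suffix.
_PIPE_SUFFIX = re.compile(r"\([^()]*\)\s*$")


def format_name_of(algorithm_id: str) -> str:
    """Hypothetical helper: drop a trailing parenthesized pipeline suffix."""
    return _PIPE_SUFFIX.sub("", algorithm_id).strip()


names = [format_name_of(i) for i in ["Amundi-IT23", "Amundi-IT23(pipeline1)"]]
print(names)
```

Both IDs map to the same format name, which is what lets several pipelines be grouped under one format.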

freeports_analysis.formats.algorithms.commons.add_pipe_name(df: DataFrame) → DataFrame

Extract pipe name from ID column and add as separate column.

Parameters:

df (pd.DataFrame) – DataFrame containing algorithm IDs in ‘ID’ column

Returns:

DataFrame with added ‘Pipe name’ column

Return type:

pd.DataFrame

Notes

The pipe name is extracted from the pipeline suffix in parentheses. For example, ‘Amundi-IT23(pipeline1)’ becomes ‘pipeline1’, while ‘Amundi-IT23’ without a pipeline suffix becomes NaN.
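One way to reproduce this behaviour with pandas is `Series.str.extract`, which yields NaN for rows that do not match. The pattern below is an assumption chosen to fit the documented examples, not the function's confirmed implementation.

```python
import pandas as pd

df = pd.DataFrame({"ID": ["Amundi-IT23", "Amundi-IT23(pipeline1)"]})

# Assumed pattern: capture the text inside a trailing pair of parentheses.
# With a single capture group and expand=False the result is a Series;
# IDs without a suffix do not match and become missing values.
pipe = df["ID"].str.extract(r"\(([^()]*)\)\s*$", expand=False)

result = df.copy()
result["Pipe name"] = pipe
print(result)
```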

freeports_analysis.formats.algorithms.commons.create_index_format_name_pipe(df: DataFrame) → DataFrame

Create complete format-pipe-name index from ID column.

This is a convenience function that combines:

1. Extracting format name from ID
2. Extracting pipe name from ID
3. Setting the multi-index

Parameters:

df (pd.DataFrame) – DataFrame containing algorithm IDs in ‘ID’ column

Returns:

DataFrame with multi-index set to (Format name, Pipe name, ID)

Return type:

pd.DataFrame

Notes

This function provides a complete pipeline for converting algorithm IDs into a structured multi-index suitable for algorithm lookup and management. It handles IDs both with and without a pipeline name.
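The three steps can be approximated in a few lines of pandas. `build_index_sketch` is a hypothetical stand-in, the regular expressions are assumptions, and the `value` column is invented payload data; only the column and index names come from the documentation above.

```python
import pandas as pd


def build_index_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Stand-in for the documented steps; not the library's own code."""
    out = df.copy()
    # Step 1 (assumed regex): strip a trailing "(...)" to get the format name.
    out["Format name"] = out["ID"].str.replace(r"\([^()]*\)\s*$", "", regex=True)
    # Step 2 (assumed regex): capture the text inside the trailing parentheses.
    out["Pipe name"] = out["ID"].str.extract(r"\(([^()]*)\)\s*$", expand=False)
    # Step 3: move the three columns into a hierarchical index.
    return out.set_index(["Format name", "Pipe name", "ID"])


df = pd.DataFrame(
    {"ID": ["Amundi-IT23", "Amundi-IT23(pipeline1)"], "value": [1, 2]}
)
indexed = build_index_sketch(df)
print(indexed)
```

The sketch assumes `set_index` is called with its default `drop=True`, so 'ID' lives only in the index afterwards; whether the real function keeps an 'ID' column as well is not stated here.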

freeports_analysis.formats.algorithms.commons.set_index_format_name_pipe(df: DataFrame) → DataFrame

Set multi-index using Format name, Pipe name, and ID columns.

Parameters:

df (pd.DataFrame) – DataFrame with ‘Format name’, ‘Pipe name’, and ‘ID’ columns

Returns:

DataFrame with multi-index set to (Format name, Pipe name, ID)

Return type:

pd.DataFrame

Notes

This creates a hierarchical index that allows efficient lookup of algorithms by format name, pipeline name, and algorithm ID.
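As an illustration of such a lookup, the sketch below builds the same three-level index directly with `DataFrame.set_index` and selects one format with `DataFrame.xs`. The ID ‘Other-FR24(pipeline1)’ and the `value` column are made-up sample data, and the snippet does not call the library itself.

```python
import pandas as pd

# Sample frame that already carries the three required columns.
df = pd.DataFrame(
    {
        "Format name": ["Amundi-IT23", "Amundi-IT23", "Other-FR24"],
        "Pipe name": ["pipeline1", "pipeline2", "pipeline1"],
        "ID": [
            "Amundi-IT23(pipeline1)",
            "Amundi-IT23(pipeline2)",
            "Other-FR24(pipeline1)",
        ],
        "value": [10, 20, 30],
    }
)

indexed = df.set_index(["Format name", "Pipe name", "ID"])

# Select every pipeline registered for one format; the matched level is
# dropped from the result, leaving (Pipe name, ID) as the index.
subset = indexed.xs("Amundi-IT23", level="Format name")
print(subset)
```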