freeports_analysis.formats.algorithms.unstructured

Unstructured algorithm pipeline management.

This module handles the loading and configuration of unstructured PDF processing algorithms for formats with complex or variable layouts that require custom parsing logic.

Functions

get_pipes(format_name)

Get processing pipelines for a specific unstructured format.

freeports_analysis.formats.algorithms.unstructured.get_pipes(format_name: str) → Tuple[Dict[str, List[Callable]], Dict[str, List[Callable]], Dict[str, List[Callable]]]

Get processing pipelines for a specific unstructured format.

Parameters:

format_name (str) – Name of the format to get pipelines for

Returns:

Tuple containing three dictionaries for pdf_filter, text_extract, and deserialize segments. Each dictionary maps pipeline names to lists of processing functions.

Return type:

Tuple[Dict[str, List[Callable]], Dict[str, List[Callable]], Dict[str, List[Callable]]]
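The return shape described above can be illustrated with a minimal sketch. The stand-in function `fake_get_pipes`, the pipeline name "default", the helper `run_pipeline`, and the string-processing steps are all hypothetical; the sketch also assumes that the functions in each list are meant to be applied in order, which this page does not state.

```python
from typing import Any, Callable, Dict, List, Tuple

Pipes = Dict[str, List[Callable]]


def fake_get_pipes(format_name: str) -> Tuple[Pipes, Pipes, Pipes]:
    """Hypothetical stand-in that mimics the documented return shape."""
    pdf_filter: Pipes = {"default": [str.strip]}
    text_extract: Pipes = {"default": [str.split]}
    deserialize: Pipes = {"default": [len]}
    return pdf_filter, text_extract, deserialize


def run_pipeline(steps: List[Callable], value: Any) -> Any:
    """Apply each step to the output of the previous one (an assumption)."""
    for step in steps:
        value = step(value)
    return value


# Unpack the three segment dictionaries in the documented order.
pdf_filter, text_extract, deserialize = fake_get_pipes("EXAMPLE")

# Chain the "default" pipeline of each segment on a toy input.
steps = pdf_filter["default"] + text_extract["default"] + deserialize["default"]
result = run_pipeline(steps, "  a b c  ")
```

In real use the three dictionaries would come from get_pipes itself, and the steps would operate on PDF content rather than on strings.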

Notes

The function dynamically imports the format-specific module and extracts its processing functions. If the format module is not found, it returns three empty dictionaries.
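The dynamic-import behaviour described in the note could be sketched roughly as follows. This is not the package's actual implementation: the function name `load_pipes`, the lower-casing of the format name, and the module attributes `PDF_FILTER`, `TEXT_EXTRACT`, and `DESERIALIZE` are assumptions made for illustration only.

```python
import importlib
from typing import Callable, Dict, List, Tuple

Pipes = Dict[str, List[Callable]]


def load_pipes(package: str, format_name: str) -> Tuple[Pipes, Pipes, Pipes]:
    """Import ``package.<format_name>`` and collect its pipelines.

    Falls back to three empty dictionaries when the module is missing.
    """
    # Assumption: submodule names are the lower-cased format name.
    module_name = f"{package}.{format_name.lower()}"
    try:
        module = importlib.import_module(module_name)
    except ModuleNotFoundError:
        # Format module not found: return empty pipelines.
        return {}, {}, {}

    # Hypothetical attribute names; missing ones default to an empty dict.
    return (
        getattr(module, "PDF_FILTER", {}),
        getattr(module, "TEXT_EXTRACT", {}),
        getattr(module, "DESERIALIZE", {}),
    )


# A package that does not exist yields the documented empty fallback.
missing = load_pipes("no_such_package_xyz", "ANY_FORMAT")
```

One design consequence of catching ModuleNotFoundError this broadly is that a missing dependency inside an existing format module would also produce empty dictionaries, so a real implementation might want to log or distinguish that case.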

Modules

anima_en23

ANIMA_EN23 format submodule

anima_sgr_it24_a

Custom pipeline for ANIMA_SGR-IT24.A

anima_sgr_it24_b

Custom pdf filter for ANIMA_SGR-IT24.B

anima_sicav_en24

ANIMA_SICAV-EN24 format submodule

arca_it24

Custom pdf filter for ARCA-IT24 format

carne_en23

CARNE-EN23 custom functions

fineco_en23_ir

Custom pdf filter for FINECO-EN23[IR] format

kairos_en23

KAIROS-EN23 format submodule

mediolanum_es24_a

MEDIOLANUM_ES24_A format submodule

mediolanum_es24_b

MEDIOLANUM_ES24_B format submodule

mediolanum_it24_b

MEDIOLANUM_IT24_B format submodule