freeports_analysis.formats.algorithms.unstructured
Unstructured algorithm pipeline management.
This module handles the loading and configuration of unstructured PDF processing algorithms for formats with complex or variable layouts that require custom parsing logic.
Functions
- get_pipes(format_name): Get processing pipelines for a specific unstructured format.
- freeports_analysis.formats.algorithms.unstructured.get_pipes(format_name: str) → Tuple[Dict[str, List[Callable]], Dict[str, List[Callable]], Dict[str, List[Callable]]]
Get processing pipelines for a specific unstructured format.
- Parameters:
format_name (str) – Name of the format to get pipelines for
- Returns:
Tuple containing three dictionaries for pdf_filter, text_extract, and deserialize segments. Each dictionary maps pipeline names to lists of processing functions.
- Return type:
Tuple[Dict[str, List[Callable]], Dict[str, List[Callable]], Dict[str, List[Callable]]]
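The shape of the return value can be illustrated with a minimal sketch. The tuple below is a hypothetical stand-in for what get_pipes might return, not real output; the pipeline name "default", the string-based functions, and the run_pipeline helper are all assumptions made for illustration. In particular, chaining the functions of a list in order is an assumed convention that the documentation above does not confirm.

```python
from typing import Callable, Dict, List, Tuple

Pipes = Dict[str, List[Callable]]


def run_pipeline(functions: List[Callable], value):
    """Hypothetical helper: apply each function in order, chaining results."""
    for func in functions:
        value = func(value)
    return value


# Stand-in for the tuple returned by get_pipes (illustrative values only).
example_result: Tuple[Pipes, Pipes, Pipes] = (
    {"default": [str.strip]},             # pdf_filter segment
    {"default": [str.lower, str.split]},  # text_extract segment
    {},                                   # deserialize segment, no pipelines
)

# Unpack the three segments in the documented order.
pdf_filter, text_extract, deserialize = example_result

cleaned = run_pipeline(pdf_filter["default"], "  Fund Name  ")
tokens = run_pipeline(text_extract["default"], cleaned)
```

Each dictionary is keyed by pipeline name, so a caller can check which pipelines exist for a segment with an ordinary membership test such as `"default" in text_extract`.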
Notes
The function dynamically imports format-specific modules and extracts processing functions. Returns empty dictionaries if the format module is not found.
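The behaviour described in the note can be sketched with importlib. This is not the library's implementation: the load_pipes name, the package argument, and the convention that a format module exposes one dictionary attribute per segment are assumptions chosen to show the import-with-fallback pattern.

```python
import importlib
from typing import Callable, Dict, List, Tuple

Pipes = Dict[str, List[Callable]]

# Assumed convention: a format module exposes one dict attribute per segment.
SEGMENTS = ("pdf_filter", "text_extract", "deserialize")


def load_pipes(package: str, format_name: str) -> Tuple[Pipes, Pipes, Pipes]:
    """Hypothetical loader: import package.format_name and collect its pipelines."""
    try:
        module = importlib.import_module(f"{package}.{format_name}")
    except ModuleNotFoundError:
        # Format module not found: fall back to three empty dictionaries.
        return {}, {}, {}
    # A segment attribute that is absent yields an empty dictionary.
    pdf_filter, text_extract, deserialize = (
        dict(getattr(module, segment, {})) for segment in SEGMENTS
    )
    return pdf_filter, text_extract, deserialize


# The standard-library json package has no such submodule, so the fallback applies.
empty = load_pipes("json", "NO_SUCH_FORMAT")
```

Catching ModuleNotFoundError rather than a bare Exception keeps genuine errors inside an existing format module visible instead of silently returning empty pipelines.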
Modules
- ANIMA_EN23 format submodule
- Custom pipeline for ANIMA_SGR-IT23.A
- Custom pdf filter for ANIMA_SGR-IT24.B
- ANIMA_SICAV-EN24 format submodule
- Custom pdf filter for ARCA-IT24 format
- CANE-EN23 custom functions
- Custom pdf filter for FINECO-EN23[IR] format
- KAIROS-EN23 format submodule
- MEDIOLANUM_ES24_A format submodule
- MEDIOLANUM_ES24_B format submodule
- MEDIOLANUM_IT24_B format submodule