freeports_analysis.formats.algorithms.unstructured.mediolanum_es24_b
MEDIOLANUM_ES24_B format submodule.
This module provides processing functions for the MEDIOLANUM_ES24_B format, which handles Spanish financial documents with specific layout characteristics.
Functions
| deserialize(txt_blk) | Deserialize text blocks into structured data for MEDIOLANUM_ES24_B format. |
| pdf_filter(xml_root) | Filter PDF content for MEDIOLANUM_ES24_B format. |
| text_extract(pdf_blocks, targets) | Extract text content from PDF blocks for MEDIOLANUM_ES24_B format. |
Classes
| PdfBlockType(*values) | Types of PDF blocks for MEDIOLANUM_ES24_B format. |
| TextBlockType(*values) | Types of text blocks for MEDIOLANUM_ES24_B format. |
- class freeports_analysis.formats.algorithms.unstructured.mediolanum_es24_b.PdfBlockType(*values)
Types of PDF blocks for MEDIOLANUM_ES24_B format.
- class freeports_analysis.formats.algorithms.unstructured.mediolanum_es24_b.TextBlockType(*values)
Types of text blocks for MEDIOLANUM_ES24_B format.
- freeports_analysis.formats.algorithms.unstructured.mediolanum_es24_b.deserialize(txt_blk: TextBlock | None) → Any | None
Deserialize text blocks into structured data for MEDIOLANUM_ES24_B format.
- Parameters:
txt_blk (Optional[TextBlock]) – Text block to deserialize, or None
- Returns:
Deserialized data object or None if input is None
- Return type:
Optional[Any]
Notes
Resolves the subfund context for each text block and scales market values by a factor of 1000.
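The None pass-through and the factor-of-1000 scaling can be illustrated with a minimal sketch. The helper name `scale_market_value` is hypothetical and not part of this module, and the sketch assumes that raw values arrive as strings in Spanish number format (`.` as thousands separator, `,` as decimal separator), which this page does not confirm.

```python
from decimal import Decimal
from typing import Optional

# Scaling factor stated in the Notes above.
MARKET_VALUE_SCALE = 1000


def scale_market_value(raw: Optional[str]) -> Optional[Decimal]:
    """Hypothetical helper: parse a raw market value and scale it by 1000.

    Assumes Spanish number formatting; returns None for None input,
    mirroring the Optional contract of deserialize().
    """
    if raw is None:
        return None
    # Assumption: '.' groups thousands and ',' marks the decimal part.
    normalized = raw.strip().replace(".", "").replace(",", ".")
    return Decimal(normalized) * MARKET_VALUE_SCALE
```

Using `Decimal` rather than `float` in this sketch avoids rounding drift when monetary amounts are multiplied; the real implementation may use a different numeric type.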
- freeports_analysis.formats.algorithms.unstructured.mediolanum_es24_b.pdf_filter(xml_root: Element) → List[PdfBlock]
Filter PDF content for MEDIOLANUM_ES24_B format.
This function processes XML content from PDF to extract relevant blocks, handling both subfund information and standard financial data.
- Parameters:
xml_root (etree.Element) – Root element of the PDF XML content
- Returns:
List of extracted PDF blocks with their types and metadata
- Return type:
List[PdfBlock]
Notes
The function first checks for specific Spanish regulatory markers (CNMV), then falls back to standard PDF filtering for financial data extraction.
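The marker-first, fallback-second order described in the note might look roughly like the sketch below. It is an illustration only: it uses the standard library's `xml.etree.ElementTree` in place of whatever XML library the module actually uses, and the `text` tag name, the `SUBFUND`/`DATA` labels, the tuple return type, and the link between a CNMV marker and a subfund block are all assumptions.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Regulatory marker named in the Notes above.
CNMV_MARKER = "CNMV"


def filter_blocks(xml_root: ET.Element) -> List[Tuple[str, str]]:
    """Hypothetical filter: label each non-empty text node.

    Nodes containing the CNMV marker are labelled "SUBFUND" (assumed);
    all other non-empty nodes fall back to the generic "DATA" label.
    """
    blocks: List[Tuple[str, str]] = []
    for node in xml_root.iter("text"):  # 'text' tag name is an assumption
        content = "".join(node.itertext()).strip()
        if not content:
            continue  # skip whitespace-only nodes
        if CNMV_MARKER in content:
            blocks.append(("SUBFUND", content))
        else:
            blocks.append(("DATA", content))
    return blocks
```

Checking the marker before the generic branch keeps subfund headers from being misread as ordinary financial rows.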
- freeports_analysis.formats.algorithms.unstructured.mediolanum_es24_b.text_extract(pdf_blocks: List[PdfBlock], targets: List[str]) → List[TextBlock]
Extract text content from PDF blocks for MEDIOLANUM_ES24_B format.
- Parameters:
pdf_blocks (List[PdfBlock]) – List of PDF blocks to extract text from
targets (List[str]) – List of target identifiers for text extraction
- Returns:
List of extracted text blocks with their types and metadata
- Return type:
List[TextBlock]
Notes
Handles both subfund information extraction and standard financial data extraction with specific column positions.
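Extraction by fixed column position can be sketched as follows. The column indices, the row representation (a sequence of cell strings), the output dictionary keys, and the case-insensitive substring match against `targets` are all assumptions made for illustration; the actual positions and matching rule used by `text_extract` are not documented here.

```python
from typing import Dict, List, Sequence

# Hypothetical column positions; the real layout is not documented on this page.
NAME_COL = 0
VALUE_COL = 2


def extract_rows(rows: Sequence[Sequence[str]], targets: List[str]) -> List[Dict[str, str]]:
    """Hypothetical extractor: keep rows whose name cell mentions a target."""
    wanted = [t.lower() for t in targets]
    extracted: List[Dict[str, str]] = []
    for cells in rows:
        # Skip rows too short to hold both expected columns.
        if len(cells) <= max(NAME_COL, VALUE_COL):
            continue
        name = cells[NAME_COL].strip()
        if any(t in name.lower() for t in wanted):
            extracted.append({"name": name, "market_value": cells[VALUE_COL].strip()})
    return extracted
```

The market value is kept as raw text in this sketch, leaving numeric parsing and scaling to the deserialization step.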