Golden Standard Methodology

Purpose

The golden_standard methodology provides comprehensive validation through explicit, manual verification of all input data and output results.

Scope

This methodology extends basic_check with rigorous manual verification, consisting of:

  • Review of all pdf_blks and txt_blks input files

  • Verification that data extraction is complete

  • Explicit confirmation that no data is missing from results

Covered File Types

  • All files covered by basic_check methodology

  • Additional verification of input block files (pdf_blks, txt_blks)

Protocol Steps

  1. Complete basic_check protocol (all steps)

  2. Input Block Verification:

    • Manually review every pdf_blk and txt_blk file

    • Verify that all relevant data from input blocks is captured in results.pkl

    • Explicitly confirm no data is omitted or incorrectly parsed

  3. Log Analysis:

    • Verify that .log.csv contains only true anomalies or is empty

    • Each warning in the log must be reviewed and confirmed as legitimate

  4. Comprehensive Data Validation:

    • Cross-reference every data point in output files with source blocks

    • Verify data consistency across all output formats (CSV, YAML, pickle)

  5. Edge Case Verification:

    • Check handling of unusual data formats or structures

    • Verify error handling and logging for problematic inputs
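The Log Analysis and Comprehensive Data Validation steps remain manual, but a small script can help surface the items a reviewer still has to look at. The following is a minimal Python sketch, not part of the methodology itself. It assumes, hypothetically, that the log CSV has a header row with an `id` column, that the pickle output holds a list of dicts, and that records in both output formats share a key field; none of these layouts is specified in this document, and all function names are illustrative.

```python
import csv
import pickle


def unreviewed_log_rows(log_path, reviewed_ids, id_column="id"):
    """Return log rows that no reviewer has signed off yet.

    Assumption: the log has a header row and an identifier column named
    `id_column`; `reviewed_ids` is a set the reviewer maintains by hand.
    An empty log file yields an empty list.
    """
    with open(log_path, newline="", encoding="utf-8") as fh:
        return [row for row in csv.DictReader(fh)
                if row.get(id_column) not in reviewed_ids]


def load_csv_records(path, key):
    """Load a CSV output file as {key value: row dict}."""
    with open(path, newline="", encoding="utf-8") as fh:
        return {row[key]: row for row in csv.DictReader(fh)}


def load_pickle_records(path, key):
    """Load a pickle output file as {key value: row dict}.

    Assumption: the pickle holds a list of dicts. Values are converted to
    strings because CSV values are always strings; this can hide type
    differences, so a reported match is a hint, not a certification.
    Only unpickle files from a trusted source.
    """
    with open(path, "rb") as fh:
        records = pickle.load(fh)
    return {str(rec[key]): {k: str(v) for k, v in rec.items()}
            for rec in records}


def cross_format_mismatches(csv_records, pkl_records):
    """List (key, reason) pairs where the two formats disagree."""
    problems = []
    for k in sorted(set(csv_records) | set(pkl_records)):
        if k not in csv_records:
            problems.append((k, "missing from CSV"))
        elif k not in pkl_records:
            problems.append((k, "missing from pickle"))
        elif csv_records[k] != pkl_records[k]:
            problems.append((k, "values differ"))
    return problems
```

An empty result from either helper does not replace the manual review: it only indicates that the script found nothing to flag under the assumptions above. A YAML output could be compared the same way once it is loaded into the same dict-of-dicts shape with whatever YAML parser the project uses.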

Trust Level

This methodology provides strong certification: it represents thorough manual verification that all data is correctly processed and no information is lost during extraction.

Applicable Context

Use this methodology for:

  • Release candidate validation

  • Critical data processing verification

  • Situations requiring high confidence in data accuracy

  • Validation of core algorithm functionality

Relationship to basic_check

The golden_standard methodology includes and extends all basic_check requirements, providing a superset of validation guarantees.