Lifecycle Hooks

Register callbacks at different stages of the extraction process.

Before Extract

Called before extraction begins.

from typing import Any

def before_extract(obj_type: type[Any]) -> None:
    # obj_type is the class whose extraction is about to start.
    print(f"Starting extraction for {obj_type.__name__}")

extractor.register_before_extract_hook(before_extract)

After Extract

Called after field extraction, before derived field hooks execute.

def after_extract(obj_type: type[Any], metadata: dict[str, FieldMetadata]) -> None:
    # Tag every string-typed field (compare types with `is`, not `==`).
    for field_meta in metadata.values():
        if field_meta.field_type is str:
            field_meta.extra["string_field"] = True

extractor.register_after_extract_hook(after_extract)

After Derived

Called after all derived field hooks. Receives the complete metadata including derived fields.

def after_derived(obj_type: type[Any], metadata: dict[str, FieldMetadata]) -> None:
    for field_meta in metadata.values():
        field_meta.extra["processed"] = True

extractor.register_after_derived_hook(after_derived)
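
The order of the three type-level hooks can be illustrated with a small stand-in. This is only a sketch: ToyExtractor, its hook lists, the field_count derived field, and the reduced FieldMetadata class are hypothetical and exist just to make the example runnable. The real extractor's internals may differ.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class FieldMetadata:
    """Hypothetical stand-in with only the attributes used on this page."""
    field_type: Any
    extra: dict[str, Any] = field(default_factory=dict)


class ToyExtractor:
    """Illustrative only; not the library's extractor."""

    def __init__(self) -> None:
        self.before_extract: list[Callable[..., None]] = []
        self.after_extract: list[Callable[..., None]] = []
        self.after_derived: list[Callable[..., None]] = []

    def extract(self, obj_type: type) -> dict[str, FieldMetadata]:
        for hook in self.before_extract:
            hook(obj_type)
        # Stage 1: plain fields, read from the class annotations.
        metadata = {
            name: FieldMetadata(tp) for name, tp in obj_type.__annotations__.items()
        }
        for hook in self.after_extract:
            hook(obj_type, metadata)
        # Stage 2: a made-up derived field, standing in for derived field hooks.
        metadata["field_count"] = FieldMetadata(int, {"derived": True})
        for hook in self.after_derived:
            hook(obj_type, metadata)
        return metadata


class User:
    name: str
    age: int


calls: list[Any] = []
toy = ToyExtractor()
toy.before_extract.append(lambda t: calls.append("before_extract"))
toy.after_extract.append(lambda t, m: calls.append(("after_extract", sorted(m))))
toy.after_derived.append(lambda t, m: calls.append(("after_derived", sorted(m))))
toy.extract(User)
print(calls)
```

In this sketch the after_extract callback sees only name and age, while the after_derived callback also sees field_count, which mirrors the distinction described above.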

Field-Level Hooks

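Called once for each field, including nested fields. The before_field hook runs before a field is extracted and the after_field hook runs once its metadata is available.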

def before_field(field_name: str, field_type: type[Any], parent_type: type[Any]) -> None:
    print(f"Extracting {field_name}")

def after_field(field_name: str, field_meta: FieldMetadata, parent_type: type[Any]) -> None:
    # Tag string-typed fields (compare types with `is`, not `==`).
    if field_meta.field_type is str:
        field_meta.extra["string_field"] = True

extractor.register_before_field_hook(before_field)
extractor.register_after_field_hook(after_field)

The after_field hook is a good place to populate custom metadata for each field.
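
To show how field-level hooks reach nested fields, here is a minimal sketch of a recursive field walk. The walk_fields helper, the "nested" key, and the reduced FieldMetadata class are assumptions made for illustration. They are not the library's API, and the real traversal order may differ.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class FieldMetadata:
    """Hypothetical stand-in with only the attributes used on this page."""
    field_type: Any
    extra: dict[str, Any] = field(default_factory=dict)


def walk_fields(
    parent_type: type,
    before_field: Callable[[str, Any, type], None],
    after_field: Callable[[str, FieldMetadata, type], None],
) -> dict[str, FieldMetadata]:
    """Call both hooks for every annotated field, recursing into nested classes."""
    metadata: dict[str, FieldMetadata] = {}
    for name, tp in parent_type.__annotations__.items():
        before_field(name, tp, parent_type)
        meta = FieldMetadata(tp)
        # Treat any class that declares annotations of its own as nested.
        if isinstance(tp, type) and getattr(tp, "__annotations__", None):
            meta.extra["nested"] = walk_fields(tp, before_field, after_field)
        after_field(name, meta, parent_type)
        metadata[name] = meta
    return metadata


class Address:
    city: str


class User:
    name: str
    address: Address


seen: list[str] = []


def before_field(field_name: str, field_type: Any, parent_type: type) -> None:
    seen.append(f"{parent_type.__name__}.{field_name}")


def after_field(field_name: str, field_meta: FieldMetadata, parent_type: type) -> None:
    if field_meta.field_type is str:
        field_meta.extra["string_field"] = True


result = walk_fields(User, before_field, after_field)
print(seen)
```

Here the hooks fire for User.name, User.address, and then Address.city, so the nested city field gets the same string_field tag as the top-level name field.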

Notes

  • Lifecycle hooks execute only during actual extraction, not on cache hits.
  • Pass refresh_cache=True to bypass the cache and force the hooks to run again.
  • When using custom metadata classes, hook callbacks receive instances of the custom type in place of FieldMetadata.
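
The caching behaviour in the notes above can be sketched with a toy extractor. The names register_before_extract_hook and refresh_cache come from this page. The CachingToyExtractor class, its extract method, and its dictionary cache are assumptions made so the example is self-contained, and the library's own caching may work differently.

```python
from typing import Any, Callable


class CachingToyExtractor:
    """Illustrative only; not the library's extractor."""

    def __init__(self) -> None:
        self._cache: dict[type, dict[str, Any]] = {}
        self._before_hooks: list[Callable[[type], None]] = []

    def register_before_extract_hook(self, hook: Callable[[type], None]) -> None:
        self._before_hooks.append(hook)

    def extract(self, obj_type: type, refresh_cache: bool = False) -> dict[str, Any]:
        # Cache hit: return the stored result without running any hook.
        if not refresh_cache and obj_type in self._cache:
            return self._cache[obj_type]
        for hook in self._before_hooks:
            hook(obj_type)
        metadata = dict(obj_type.__annotations__)
        self._cache[obj_type] = metadata
        return metadata


class User:
    name: str


runs: list[str] = []
toy = CachingToyExtractor()
toy.register_before_extract_hook(lambda t: runs.append(t.__name__))

toy.extract(User)                      # cache miss: hook runs
toy.extract(User)                      # cache hit: hook skipped
toy.extract(User, refresh_cache=True)  # forced refresh: hook runs again
print(len(runs))  # prints 2
```

A practical consequence is that hooks with side effects, such as logging or counters, should not assume they run once per call.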