Lifecycle Hooks
Register callbacks at different stages of the extraction process.
Before Extract
Called before extraction begins.
def before_extract(obj_type: type[Any]) -> None:
    print(f"Starting extraction for {obj_type.__name__}")

extractor.register_before_extract_hook(before_extract)
After Extract
Called after field extraction, before derived field hooks execute.
def after_extract(obj_type: type[Any], metadata: dict[str, FieldMetadata]) -> None:
    for field_meta in metadata.values():
        if field_meta.field_type == str:
            field_meta.extra["string_field"] = True

extractor.register_after_extract_hook(after_extract)
After Derived
Called after all derived field hooks. Receives the complete metadata including derived fields.
def after_derived(obj_type: type[Any], metadata: dict[str, FieldMetadata]) -> None:
    for field_meta in metadata.values():
        field_meta.extra["processed"] = True

extractor.register_after_derived_hook(after_derived)
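The ordering of the three type-level stages can be sketched with a small stand-in model. Everything here is hypothetical and exists only for illustration: `ToyExtractor`, its hook lists, the simplified `FieldMetadata` dataclass, and the hard-coded `display_name` derived field are assumptions, not the library's implementation. The point is only that an after-extract callback sees the extracted fields, while an after-derived callback also sees the derived ones.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class FieldMetadata:
    """Simplified stand-in for the library's FieldMetadata (assumption)."""
    name: str
    field_type: type
    extra: dict[str, Any] = field(default_factory=dict)


Metadata = dict[str, FieldMetadata]


class ToyExtractor:
    """Hypothetical model of the three type-level lifecycle stages."""

    def __init__(self) -> None:
        self.before_extract: list[Callable[[type], None]] = []
        self.after_extract: list[Callable[[type, Metadata], None]] = []
        self.after_derived: list[Callable[[type, Metadata], None]] = []

    def extract(self, obj_type: type) -> Metadata:
        for hook in self.before_extract:
            hook(obj_type)
        # Field extraction: one entry per annotated attribute.
        metadata = {
            name: FieldMetadata(name, tp)
            for name, tp in obj_type.__annotations__.items()
        }
        for hook in self.after_extract:
            hook(obj_type, metadata)
        # Stand-in for the derived field hooks: add one computed field.
        metadata["display_name"] = FieldMetadata("display_name", str)
        for hook in self.after_derived:
            hook(obj_type, metadata)
        return metadata


class User:
    name: str
    age: int


# Record which stage ran and which field names were visible to it.
seen: list[tuple[str, list[str]]] = []
toy = ToyExtractor()
toy.before_extract.append(lambda t: seen.append(("before_extract", [])))
toy.after_extract.append(lambda t, m: seen.append(("after_extract", sorted(m))))
toy.after_derived.append(lambda t, m: seen.append(("after_derived", sorted(m))))
result = toy.extract(User)
```

In this sketch, `seen` ends up listing the stages in the order they ran, and only the `after_derived` entry includes `display_name`.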
Field-Level Hooks
Called for each individual field (including nested fields).
def before_field(field_name: str, field_type: type[Any], parent_type: type[Any]) -> None:
    print(f"Extracting {field_name}")

def after_field(field_name: str, field_meta: FieldMetadata, parent_type: type[Any]) -> None:
    if field_meta.field_type == str:
        field_meta.extra["string_field"] = True

extractor.register_before_field_hook(before_field)
extractor.register_after_field_hook(after_field)
The after_field hook is a good place to populate custom metadata on individual fields.
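To make "including nested fields" concrete, here is a rough sketch of how a per-field hook might fire while walking a nested type. The `walk` helper, the dotted key naming, and the simplified `FieldMetadata` dataclass are assumptions made for this example; the library's own traversal and naming may differ.

```python
from dataclasses import dataclass, field, fields, is_dataclass
from typing import Any


@dataclass
class FieldMetadata:
    """Simplified stand-in for the library's FieldMetadata (assumption)."""
    field_type: Any
    extra: dict[str, Any] = field(default_factory=dict)


@dataclass
class Address:
    city: str
    zip_code: int


@dataclass
class Customer:
    name: str
    address: Address


def after_field(field_name: str, field_meta: FieldMetadata, parent_type: type) -> None:
    # Tag string fields, mirroring the hook shown above.
    if field_meta.field_type is str:
        field_meta.extra["string_field"] = True


def walk(obj_type: type, prefix: str = "") -> dict[str, FieldMetadata]:
    """Hypothetical traversal: call after_field once per field, recursing into nested dataclasses."""
    collected: dict[str, FieldMetadata] = {}
    for f in fields(obj_type):
        meta = FieldMetadata(f.type)
        after_field(f.name, meta, obj_type)
        collected[prefix + f.name] = meta
        if is_dataclass(f.type):
            # Nested fields get a dotted key (an illustrative naming choice).
            collected.update(walk(f.type, prefix + f.name + "."))
    return collected


collected = walk(Customer)
```

Here the hook runs for `name`, `address`, and both fields of `Address`, and only the `str` fields end up with `string_field` set in `extra`.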
Notes
- Lifecycle hooks execute only during actual extraction, not on cache hits.
- Use refresh_cache=True to force re-execution of hooks.
- When using custom metadata classes, hook callbacks receive and return the custom type.
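The cache behavior described in these notes can be illustrated with a minimal model. `CachingToyExtractor` is hypothetical, and passing `refresh_cache` as a keyword argument to `extract` is an assumption about where the flag goes; check the library's own signature before relying on it.

```python
from typing import Callable


class CachingToyExtractor:
    """Hypothetical extractor that caches results per type."""

    def __init__(self) -> None:
        self._cache: dict[type, dict[str, type]] = {}
        self._before_hooks: list[Callable[[type], None]] = []

    def register_before_extract_hook(self, hook: Callable[[type], None]) -> None:
        self._before_hooks.append(hook)

    def extract(self, obj_type: type, refresh_cache: bool = False) -> dict[str, type]:
        # Cache hit: return the stored result without running any hooks.
        if not refresh_cache and obj_type in self._cache:
            return self._cache[obj_type]
        for hook in self._before_hooks:
            hook(obj_type)
        metadata = dict(getattr(obj_type, "__annotations__", {}))
        self._cache[obj_type] = metadata
        return metadata


class Point:
    x: int
    y: int


calls: list[str] = []
toy = CachingToyExtractor()
toy.register_before_extract_hook(lambda t: calls.append(t.__name__))

toy.extract(Point)                      # real extraction, hook runs
count_after_first = len(calls)
toy.extract(Point)                      # cache hit, hook skipped
count_after_cached = len(calls)
toy.extract(Point, refresh_cache=True)  # forced re-extraction, hook runs again
count_after_refresh = len(calls)
```

One practical consequence: a hook registered after a type has already been extracted would not run for that type until the cache is refreshed, at least in a design like this one.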