Processors
All processors live in lib/processors/. Each has:
- A *Option dataclass extending AbstractProcessorOption (lib/processors/abstraction.py).
- A class extending AbstractImageProcessor.
Processor contract (AbstractImageProcessor)
| Member | Required | Purpose |
|---|---|---|
| SUMMARY: str | yes (UI-exposed) | One-liner for the add-step menu. |
| OPTIONS: dict[str, ParamSpec] | yes (UI-exposed) | Option specs; the registry’s discovery gate. |
| REPLAY_TRAIT: ReplayTrait | for replayable steps | COORDINATE / PIXEL_VALUE / ROI. Drives the replay engine (see pipeline.md → Replay pass). |
| process(buffer) -> ImageBuffer \| list[ImageBuffer] \| None | yes | The transform. Mutate buffer and return it; return a list (or set buffer.children) to branch; return None to stop the branch. |
| replay(buffer) | no | End-of-chain reconstruction. Default re-runs process(); geometric processors stamp replay_kind/replay_params so the engine fuses their warp instead. |
| OPTION_CLASS | no | Explicit options dataclass; default is synthesised from OPTIONS. |
| REGISTRY_NAME | no | Registry key; default is the class name. |
The chain calls run(buffer), which wraps process() and enforces the
output-format contract (ImageBuffer, list, or None). __init_subclass__
validates the contract at import (warns on a missing SUMMARY, or OPTIONS
keys with no matching option field).
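The import-time check can be pictured with a small sketch. It is not the project's code: ContractBase, BaseOption, the warning texts and the use of the warnings module are assumptions made for illustration. Only the rule being checked (warn on a missing SUMMARY, or on OPTIONS keys with no matching option field) comes from the description above.

```python
import warnings
from dataclasses import dataclass, fields


@dataclass
class BaseOption:  # hypothetical stand-in for AbstractProcessorOption
    debug: bool = False


class ContractBase:  # hypothetical stand-in for AbstractImageProcessor
    SUMMARY: str = ""
    OPTIONS: dict = {}
    OPTION_CLASS = BaseOption

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Runs once per subclass definition, i.e. when the module is imported.
        if not cls.SUMMARY:
            warnings.warn(f"{cls.__name__}: missing SUMMARY")
        known = {f.name for f in fields(cls.OPTION_CLASS)}
        for key in cls.OPTIONS:
            if key not in known:
                warnings.warn(f"{cls.__name__}: OPTIONS key {key!r} has no option field")


@dataclass
class ThresholdOption(BaseOption):
    threshold: int = 42


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")

    class Good(ContractBase):
        SUMMARY = "Threshold the image."
        OPTIONS = {"threshold": None}
        OPTION_CLASS = ThresholdOption

    class Bad(ContractBase):
        OPTIONS = {"treshold": None}  # typo: ThresholdOption has no such field
        OPTION_CLASS = ThresholdOption

messages = [str(w.message) for w in caught]
```

Good defines cleanly; Bad triggers one warning for the missing SUMMARY and one for the misspelled key. Both surface as soon as the module is imported, before any pipeline runs.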
Common option fields
AbstractProcessorOption contributes only plumbing fields every processor
inherits: debug (bool), debug_dir (str?), timeout_s (float). They are
hidden from the parameter descriptions.
DPIfixer (lib/processors/DPIfixer.py)
Clamp ImageBuffer.dpi into [min_dpi, max_dpi] via cv2.resize. INTER_CUBIC for upsampling, INTER_AREA for downsampling. Updates meta["roi"] (point coords scaled). No-op if change <1 dpi.
    options:
      min_dpi: 100
      max_dpi: 300

Use it both as input clamp (early) and normalize (e.g. force exactly 300dpi by setting both bounds equal).
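The clamp rule can be sketched without OpenCV. The helper names clamp_dpi and scale_roi are hypothetical, and the interpolation is returned as a string naming the cv2 flag rather than calling cv2.resize, so the sketch runs on the standard library alone.

```python
def clamp_dpi(dpi, min_dpi, max_dpi):
    """Return (target_dpi, scale, interpolation), or None when nothing changes."""
    target = min(max(dpi, min_dpi), max_dpi)
    if abs(target - dpi) < 1.0:
        return None  # change below 1 dpi: treat as a no-op
    scale = target / dpi  # pixel dimensions scale linearly with dpi
    # Cubic when enlarging, area averaging when shrinking.
    interpolation = "INTER_CUBIC" if scale > 1.0 else "INTER_AREA"
    return target, scale, interpolation


def scale_roi(points, scale):
    """Scale ROI point coordinates by the same factor as the image."""
    return [(x * scale, y * scale) for x, y in points]


downsample = clamp_dpi(600, 100, 300)   # 600 dpi scan, clamped down to 300
upsample = clamp_dpi(72, 100, 300)      # 72 dpi scan, clamped up to 100
untouched = clamp_dpi(200, 100, 300)    # already inside the range
```

Setting min_dpi and max_dpi to the same value makes every input land on that single target, which is the normalize use mentioned above.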
SkewFinder (lib/processors/SkewFinder.py)
Two-pass projection-profile deskew:
- Downscale to 400px tall, then binarize (Otsu) — estimation only; the output buffer is untouched by this downscale.
- Coarse search: ±max_angle in 1° steps. Score = sum(diff(row_sums)^2) on a sheared copy (rows align when the angle matches).
- Fine search: ±1° around the best angle in accuracy steps.
- If apply_rotation and |angle| ≥ min_angle, rotate via cv2.warpAffine. The border value is white by default (k_cluster: 0); k_cluster > 1 detects the background color by k-means clustering instead.
Stored as meta["skew"]. meta["roi"] polygon is transformed by the same matrix.
    options:
      max_angle: 30.0       # Search range in degrees
      min_angle: 0.1        # Minimum angle to actually apply rotation
      accuracy: 0.1         # Fine-search step
      apply_rotation: true
      k_cluster: 0          # 0 = white background. >1 = k-means cluster count for bg color detection

estimate_skew(image) is a module-level helper used by ImageBuffer.deskew.
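The coarse-then-fine search can be sketched independently of the image code. Here profile_score follows the score formula quoted above, while two_pass_search and its score_at callback are hypothetical names; in the real processor the callback would shear the binarized image and score its row sums, which this sketch replaces with a synthetic function.

```python
def profile_score(row_sums):
    """Sum of squared differences between neighbouring row sums."""
    return sum((b - a) ** 2 for a, b in zip(row_sums, row_sums[1:]))


def two_pass_search(score_at, max_angle=30.0, accuracy=0.1):
    """Coarse pass in 1 degree steps, then a fine pass of +/-1 degree around the winner."""
    coarse = [float(a) for a in range(-int(max_angle), int(max_angle) + 1)]
    best = max(coarse, key=score_at)
    steps = round(1.0 / accuracy)
    fine = [best + i * accuracy for i in range(-steps, steps + 1)]
    return max(fine, key=score_at)


# Aligned text gives alternating ink / blank rows, hence a high score.
sharp = profile_score([0, 10, 0, 10])
# A skewed page smears ink across rows, flattening the profile.
flat = profile_score([5, 5, 5, 5])

# Synthetic score peaking at 3.4 degrees (stand-in for the image-based score).
angle = two_pass_search(lambda a: -(a - 3.4) ** 2)
```

The coarse pass needs 61 evaluations for the default ±30° range and the fine pass another 21, against 601 for a single sweep at 0.1° resolution.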
LayoutBackend (text detection)
lib/processors/layout_backends/ — pluggable abstraction picked via the backend: YAML option on PageDetector.
| Backend | Model | GPU |
|---|---|---|
| apple_vision | macOS Vision framework | Neural Engine / GPU automatic |
| east | frozen_east_text_detection.pb (~95 MB) in ./model/ or ./models/ | CUDA via cv2.dnn if OpenCV is CUDA-built |
| dbnet | PP-OCR det ONNX (v3/v4/v5) in ./model/ or ./models/ | same as above |
| heuristic | none | CPU only |
| auto | macOS: apple_vision → east → dbnet → heuristic. Linux/Windows: east → dbnet → heuristic. | inherits |
Each backend reports uses_gpu. The chain stamps meta['gpu'] = True on every node produced by a GPU-backed processor; the web UI then renders a 🚀 next to the step in the per-scan timing bar.
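The auto row of the table amounts to a first-available lookup. A minimal sketch, assuming a hypothetical resolve_backend helper that receives the set of backends whose dependencies and model files were found; how the project actually probes availability is not shown here.

```python
AUTO_ORDER = {
    "darwin": ["apple_vision", "east", "dbnet", "heuristic"],
    "other": ["east", "dbnet", "heuristic"],
}


def resolve_backend(requested, platform, available):
    """Map the YAML `backend:` value to a concrete backend name."""
    if requested != "auto":
        return requested  # explicit choice: no fallback applied in this sketch
    order = AUTO_ORDER["darwin" if platform == "darwin" else "other"]
    for name in order:
        if name in available:
            return name
    raise RuntimeError("no layout backend available")


# macOS without the Vision bindings, EAST model present.
mac_choice = resolve_backend("auto", "darwin", {"east", "heuristic"})
# Linux with no model files at all: only the model-free heuristic remains.
linux_choice = resolve_backend("auto", "linux", {"heuristic"})
```

The platform strings follow sys.platform ("darwin", "linux", "win32").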
PageDetector (lib/processors/PageDetector.py)
Apple Vision text detection → merge overlapping x-spans → optional reduce-to-N → emit child buffers.

- Optionally downscale to processing_dpi for detection (defaults to no downscale).
- The configured LayoutBackend (auto → apple_vision on macOS): detect(img) returns bounding boxes.
- smart_merge(boxes) merges horizontally overlapping / close boxes into column groups, scored by the merge_* weights.
- If max_pages > 0 and result count exceeds it, the over_cap strategy reduces the surplus: "merge" folds the best-scoring pair together (default), while "discard" drops the smallest page. Single-page modes (sheet, book_flat_x1) use over_cap: discard so marginal text bleeding in from a facing page is thrown away rather than merged into the kept page. Genuine over-splits scoring ≥ merge_threshold always merge regardless of strategy.
- For each page, crop the original image with margin_mm margin, build a child ImageBuffer. Intersect the parent’s meta["roi"] polygon with the crop rect via cv2.intersectConvexConvex and propagate the result.
- Returns input_buf.children (list of buffers). The chain re-injects each child into the pipeline starting at the next step and prunes the parent file from output.
YAML options:

    options:
      margin_mm: 2.0            # crop margin around each page bbox
      roi_margin_mm: 1.0        # tighter ROI margin propagated downstream (≤ margin_mm)
      max_pages: 2              # 0 = infinity
      over_cap: merge           # over-cap reduction: merge | discard (drop smallest)
      processing_dpi: 150.0     # null = no downscale
      rescale_threshold: 0.01
      merge_threshold: 0.60     # column-merge score cutoff
      merge_gap_weight: 0.4     # gap term weight in the merge score
      merge_width_weight: 0.6   # width-similarity term weight
      merge_gap_norm_cap: 0.15  # cap on the normalized inter-box gap

Requires macOS + Apple Vision (the apple_vision backend). If the Vision backend is unavailable, auto falls back to EAST → DBnet → heuristic.
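The merge and over-cap steps can be illustrated with a deliberately simpler technique than the scored smart_merge: plain interval merging on the x axis, followed by the "discard" reduction. Both function names and the fixed max_gap tolerance are assumptions for this sketch; the weighted merge score (merge_gap_weight, merge_width_weight) is not reproduced.

```python
def merge_x_spans(boxes, max_gap=0):
    """Merge (x0, y0, x1, y1) boxes whose x-ranges overlap or lie within max_gap."""
    groups = []
    for x0, y0, x1, y1 in sorted(boxes):  # left to right
        if groups and x0 <= groups[-1][2] + max_gap:
            g = groups[-1]
            groups[-1] = (g[0], min(g[1], y0), max(g[2], x1), max(g[3], y1))
        else:
            groups.append((x0, y0, x1, y1))
    return groups


def discard_over_cap(pages, max_pages):
    """over_cap: discard. Drop the smallest pages until at most max_pages remain."""
    if max_pages <= 0 or len(pages) <= max_pages:
        return list(pages)

    def area(box):
        return (box[2] - box[0]) * (box[3] - box[1])

    keep = sorted(range(len(pages)), key=lambda i: area(pages[i]), reverse=True)[:max_pages]
    return [pages[i] for i in sorted(keep)]  # preserve left-to-right order


boxes = [
    (0, 0, 100, 20), (50, 30, 180, 50),      # left page
    (300, 0, 400, 20), (390, 40, 500, 60),   # right page
    (520, 0, 530, 5),                        # stray mark at the edge
]
columns = merge_x_spans(boxes)
pages = discard_over_cap(columns, max_pages=2)
```

The stray mark forms its own tiny group and is the one thrown away, which is the behaviour the single-page modes rely on.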
Binarizer (lib/processors/Binarizer.py)

Local adaptive thresholds (Wolf/Sauvola/…) compute a per-window cutoff, so a shadow or lighting gradient across the page doesn’t swallow text the way a single global threshold would.
Dispatcher with four modes:
- NONE: pass through.
- GRAY: convert to grayscale (no thresholding).
- DOXA: any algorithm name from doxapy.Binarization.Algorithms (e.g. WOLF, SAUVOLA, NIBLACK, BERNSEN, BHT, BRADLEY_ROTH, OTSU). Family-specific params (window_mm_<family>, window_px_<family>, k_<family>) go through doxapy.Binarization.to_binary(binary, params).
YAML options:

    options:
      method: "wolf++"
      window_mm_wolf: 3.2   # Wolf window in mm; each family has its own window_mm_<family>/k_<family>
      k_wolf: 0.5           # Threshold bias for the wolf family
      roi_shrink: 5         # Erode meta['roi'] mask N×, force out-of-ROI pixels white
      morpho_close: 2       # Morphological close (0–10) after threshold

ROI masking (_apply_roi_mask): if the input buffer carries a meta["roi"] polygon (set by SkewFinder / PageDetector), pixels outside the polygon are forced to white after binarization. roi_shrink is the number of cv2.erode iterations applied to the mask first.
BW inputs are a no-op (pass-through).
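Specifying the window in millimetres keeps it the same physical size at any scan resolution. A minimal sketch of the conversion, assuming a hypothetical window_mm_to_px helper; rounding up to an odd size (so the window has a centre pixel) and the minimum of 3 are assumptions, not documented behaviour of the Binarizer.

```python
MM_PER_INCH = 25.4


def window_mm_to_px(window_mm, dpi):
    """Convert a window size in mm to an odd pixel count at the given dpi."""
    px = round(window_mm / MM_PER_INCH * dpi)
    px |= 1  # bump even sizes to the next odd value
    return max(px, 3)


at_300 = window_mm_to_px(3.2, 300)  # 3.2 mm is about 37.8 px at 300 dpi
at_150 = window_mm_to_px(3.2, 150)  # and about 18.9 px at 150 dpi
```

The same window_mm_wolf: 3.2 therefore covers roughly the same number of text strokes whether the page was scanned at 150 or 300 dpi.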
TrapezoidalCorrection (lib/processors/TrapezoidalCorrection.py)
Keystone (pure perspective) rectification: text-line baselines → vanishing point (RANSAC + TLS) → column quadrilateral → Zhang-He metric aspect recovery → single cv2.warpPerspective.
All detection (binarize, connected components, morphology, span assembly, baselines, quad) runs at processing_dpi (default 150 — same convention as PageDewarper/PageDetector); the 4 quad corners scale back exactly and only the final warp touches the full-res buffer. Stamps replay_kind: "perspective" with the full 3×3 homography for the replay pass.

    options:
      line_source: connectivity   # connectivity | meta (PageDetector boxes)
      min_line_count: 4
      processing_dpi: 150.0       # analysis resolution; final warp is full-res
      ransac_trials: 200
      margin_mm: 2.0
      zhang_he_min_skew: 0.05     # skip metric upgrade on near-axis-aligned quads

Falls back to passthrough (Status.REVIEW) when too few baselines or the quad fails convexity/area/aspect sanity checks.
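Two of the steps above are simple enough to sketch in plain Python: scaling the quad found at processing_dpi back to full resolution, and the convexity part of the sanity check. The names scale_quad and is_convex are hypothetical, and the cross-product test is a standard convexity check, not necessarily the one the processor uses.

```python
def scale_quad(quad, processing_dpi, full_dpi):
    """Map quad corners from analysis resolution to full resolution."""
    s = full_dpi / processing_dpi
    return [(x * s, y * s) for x, y in quad]


def is_convex(quad):
    """True when every turn along the polygon has the same orientation."""
    signs = []
    n = len(quad)
    for i in range(n):
        (ax, ay), (bx, by), (cx, cy) = quad[i], quad[(i + 1) % n], quad[(i + 2) % n]
        cross = (bx - ax) * (cy - by) - (by - ay) * (cx - bx)
        if cross != 0:
            signs.append(cross > 0)
    return bool(signs) and (all(signs) or not any(signs))


column = [(0, 0), (100, 0), (100, 50), (0, 50)]    # a plausible text column
bow_tie = [(0, 0), (100, 50), (100, 0), (0, 50)]   # corners in crossed order

full_res = scale_quad(column, processing_dpi=150, full_dpi=600)
```

Because the scaling is a pure multiplication of four points, no precision is lost by detecting at 150 dpi and warping at the scan's native resolution.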
PageDewarper (lib/processors/PageDewarper.py)
Dewarping removes the curvature + perspective that make a photographed book page look bowed: near the spine the page curves and tilts, so every text line bends into a banana shape a single deskew rotation cannot undo. PageDewarper fits a 3-D sheet model to the detected text baselines and re-projects it flat, recovering straight lines. The chain runs deskew first (cheap global tilt), then dewarp for the residual curvature.

| | Deskew | Dewarp |
|---|---|---|
| Model | single rotation angle | 3-D sheet (cubic / sine / B-spline) |
| Fixes | whole-page tilt | curvature and perspective |
| Cost | cheap | optimisation (MLX / JAX / Powell) |
| When | flat sheets, light skew | bound books, curled pages |
Twist is off by default. A free twist gain invents phantom curl on pages that are actually flat, so it is disabled unless a page genuinely needs it.
Sheet-model dewarping built on the page-dewarp library; optimizer backend is
MLX (Apple Metal) → padded JAX → SciPy Powell (backend: auto).
Four sheet models (sheet_model, see lib/processors/sheet_models.py):
- cylindrical (default): stock page-dewarp cubic z(x); every horizontal slice shares one height profile (2 shape DOF: α, β).
- sine_twist (alias spline_twist also accepted): Fourier-sine height profile (spline_modes = K modes, zero at both page edges) modulated linearly in y by a twist gain γ: z(x,y) = (1 + γ·η)·Σ c_k·sin(kπx/W), η = y/H − 0.5. Captures non-cubic gutter walls and curl that varies top-to-bottom.
- bspline_twist: clamped cubic B-spline height profile (spline_modes = K free interior control points, endpoints pinned to 0) with the same twist gain γ. Local basis support: a steep gutter wall doesn’t ripple into the flat field the way high sine modes do.
- flat_spline: bspline_twist specialised for post-trapezoidal pages, assuming the sheet is flat except for curl at the binding:
  - graded knots (knot_grading g ≥ 1, interior knots at 1 − (1 − u)^g toward the binding): coarse spline over the flat field, high resolution at the gutter wall;
  - outer-flatness penalty (flat_outer_penalty λ): adds λ·Σ w_i·c_i² with w_i = (1 − ξ_i)² (ξ = Greville abscissa, binding at basis t = 1). Far-from-binding control points are pulled to z = 0, the gutter wall stays free. 0 disables;
  - binding side per page (binding_side: auto|left|right): auto reads the page_side meta the PageDetector stamps on two-page spreads (left page → bound on its right edge, and vice versa; decided from page coordinates, not A/B order). Pages bound on the left evaluate the basis at t = 1 − x/W (flip). The compiled objective is the same: flip + weights are runtime inputs, so alternating A/B pages never recompiles (grading itself is JIT-baked). Unresolved auto logs and degrades to plain bspline behaviour (no flip, no penalty) for that page.
For the spline models the K+1 extra params ride at the pvec tail (keypoint indexing untouched). Supported by MLX / padded-JAX / Powell (vendored objective); raw unpadded JAX falls back to cylindrical.
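Two of the formulas above are easy to evaluate directly. The sketch below implements the sine_twist height z(x,y) and the graded knot placement 1 − (1 − u)^g exactly as written; the function names, and the choice of uniformly spaced u in (0, 1), are assumptions rather than the contents of sheet_models.py.

```python
import math


def sine_twist_height(x, y, coeffs, gamma, width, height):
    """z(x, y) = (1 + gamma * eta) * sum_k c_k * sin(k * pi * x / W), eta = y / H - 0.5."""
    eta = y / height - 0.5
    profile = sum(
        c * math.sin(k * math.pi * x / width)
        for k, c in enumerate(coeffs, start=1)
    )
    return (1.0 + gamma * eta) * profile


def graded_knots(count, grading):
    """Interior knots at 1 - (1 - u)^g for `count` uniformly spaced u in (0, 1)."""
    return [1.0 - (1.0 - i / (count + 1)) ** grading for i in range(1, count + 1)]


W, H = 200.0, 300.0
mid_row = sine_twist_height(W / 2, H / 2, [2.0], gamma=0.5, width=W, height=H)
bottom_row = sine_twist_height(W / 2, H, [2.0], gamma=0.5, width=W, height=H)
left_edge = sine_twist_height(0.0, H, [2.0, 1.0], gamma=0.5, width=W, height=H)

uniform = graded_knots(3, 1.0)
graded = graded_knots(3, 2.0)
```

With γ = 0.5 the bottom row is lifted 25 % more than the mid row, while the page edge stays at zero height for any coefficients. With g = 2 the knots move toward t = 1, the binding side.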
Pipeline:
- Pad input by dewarp_margin mm with white border.
- Downscale to processing_dpi (default 150) for span analysis; the remap reads full-res pixels.
- Build a text mask (mm-sized MORPH_CLOSE, char-scale adaptive); assemble spans; fall back to line-mask morphology if <3 text spans.
- Sample span curves via robust span-level fits (fit_span_baseline: IRLS Tukey cubic over each text line’s ink profile; descenders/dashes rejected by the loss, keypoints reach line ends). baseline_source selects what feeds the model: bottom (baselines), top (x-height toplines), average (midlines), or both (default: baseline + topline as separate spans, doubling vertical constraints). Toplines are validated to sit 0.3–2.5 x-heights above the baseline.
- Optimize the sheet + per-span/per-point coords. With use_huber (default on) the reprojection loss is pseudo-Huber (huber_delta, normalized units) on MLX/padded-JAX backends, so stray spans (footers, captions) can’t drag the sheet. cubic_cost regularizes the shape params against phantom curl on flat input (α/β L2 for cylindrical; bending energy for the twist models: Σ(k²c_k)² for sine_twist, second differences of the control polygon for bspline_twist; γ unpenalised). The whole geometry path (solvePnP init, optimise, page dims, remap) runs under the configured focal_length.
- Remap with an arc-length-uniform x grid (sheet_models.arclength_x, mid-row profile): output width sized from the sheet’s arc length, so text near the steep gutter side keeps its true width instead of stretching by √(1+z′²). Replay mirrors this via the arc_len flag in replay_params. Replay params also carry sheet_model / spline_modes / model_dims / focal_length / support_x / support_y / support_decay / knot_grading / binding_flip (the flat outer penalty is fit-time only, nothing to replay). The supports clamp evaluation outside the fitted span range: decaying tangent extension in x (excursion bounded by |slope|·λ, λ = support_decay ≈ 0.05·W), twist-factor freeze in y (margins otherwise pick up amplified phantom curl).
- Sanity check: if remap goes out of bounds by more than max_oob px, abandon dewarp and return grayscale of padded input (Status.ERROR). Span-count guard (min_spans) passes through with Status.WARNING instead of running an under-constrained fit.
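Two quantities from the pipeline steps above can be written down directly: the pseudo-Huber loss and the arc length that sizes the output width. The sketch uses the standard pseudo-Huber form δ²·(√(1 + (r/δ)²) − 1) and a midpoint-rule integral of √(1 + z′²); that the project uses exactly this form and this integration scheme is an assumption, and both function names are hypothetical.

```python
import math


def pseudo_huber(residual, delta):
    """Quadratic near zero (about r^2 / 2), linear in the tails (about delta * |r|)."""
    return delta ** 2 * (math.sqrt(1.0 + (residual / delta) ** 2) - 1.0)


def arc_length(slope_at, width, samples=1000):
    """Integrate sqrt(1 + z'(x)^2) over [0, width] with the midpoint rule."""
    step = width / samples
    return sum(
        math.sqrt(1.0 + slope_at((i + 0.5) * step) ** 2) * step
        for i in range(samples)
    )


delta = 0.005                            # huber_delta from the options
inlier = pseudo_huber(1e-4, delta)       # well inside delta: behaves like r^2 / 2
outlier = pseudo_huber(1.0, delta)       # stray span: grows linearly, not as 0.5

flat_width = arc_length(lambda x: 0.0, 200.0)    # flat sheet: no stretch
steep_width = arc_length(lambda x: 1.0, 200.0)   # 45 degree wall: sqrt(2) longer
```

A residual of 1.0 costs about 0.005 under pseudo-Huber instead of 0.5 under a squared loss, which is why a stray footer cannot dominate the fit.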
YAML options:

    options:
      backend: auto               # auto | mlx | jax | powell
      sheet_model: cylindrical    # cylindrical | sine_twist | bspline_twist | flat_spline
      spline_modes: 4             # shape DOF K: sine modes / B-spline ctrl points (params = K+1)
      twist: false                # default off; fit γ only for true open-book fan pages
      binding_side: auto          # flat_spline: auto (PageDetector page_side meta) | left | right
      knot_grading: 2.5           # flat_spline: knot density toward the binding (1 = uniform)
      flat_outer_penalty: 1.0     # flat_spline: outer-flatness weight λ (0 = off)
      baseline_source: both       # bottom | top | average | both
      use_huber: true             # robust pseudo-Huber reprojection loss
      huber_delta: 0.005          # pseudo-Huber scale (when use_huber)
      max_oob: 400.0
      page_margin_mm: 5.0
      dewarp_margin: 5.0
      remap_decimate: 4
      shear_cost: 40.0
      cubic_cost: 0.0             # shape regularizer (α/β or spline bending); 0 = off
      focal_length: 1.3           # Overridden by camera calibration if loaded
      processing_dpi: 150.0       # Span analysis downscale
      min_spans: 4
      min_span_width_ratio: 0.5   # drop partial-width spans (footers, page numbers)
      kernel_char_mult: 2.0
      thickness_char_mult: 3.0
      edge_max_length_char_mult: 3.0
      line_join_mm: 4.0           # fallback kernel when char scale unknown

When debug: true or --debug is set, the processor writes intermediate visualizations to <workspace>/debug/:
- <stem>_0_spans.jpg: colored span overlays.
- <stem>_1_initial.jpg: keypoint projection from initial params (side-by-side with input).
- <stem>_2_optimized.jpg: initial vs. optimized keypoint projection.
JAX cache lives at ./.jax_cache/ (auto-created). Persisting compilation across runs saves ~5s startup.
Apple Vision detection (lib/processors/layout_backends/apple_vision.py)
AppleVisionBackend is the apple_vision LayoutBackend used by
PageDetector for text-box detection:
- detect(img_rgb): returns the list of bounding boxes (no text).
It uses VNRecognizeTextRequest with RecognitionLevelAccurate and language
correction disabled, wrapped in objc.autorelease_pool to keep memory bounded.
Text recognition for OCR is a separate concern — the apple_vision /
apple_docs OCR engines live under lib/workers/ocr/ (see OCR).
Writing a new processor
- Add a file in lib/processors/.
- Define an option dataclass:

      from dataclasses import dataclass
      from lib.processors.abstraction import AbstractProcessorOption, AbstractImageProcessor
      from lib.ImageBuffer import ImageBuffer

      @dataclass
      class MyOption(AbstractProcessorOption):
          threshold: int = 42

- Define the processor:

      from lib.processors.abstraction import ReplayTrait
      from lib.processors.option_specs import _i

      class MyProcessor(AbstractImageProcessor):
          name: str = "MyProcessor"
          SUMMARY = "One-line description for the add-step menu."
          OPTIONS = {"threshold": _i(42, 0, 255, "Threshold value.")}
          OPTION_CLASS = MyOption
          REPLAY_TRAIT = ReplayTrait.PIXEL_VALUE  # omit if the step isn't replayable

          def __init__(self, options: MyOption):
              super().__init__(options)
              self.threshold = options.threshold

          def process(self, buf: ImageBuffer) -> ImageBuffer:
              # mutate buf.buffer / buf.meta / set buf.children
              return buf

- Nothing to register. lib/processors/registry.py auto-discovers the class on first access: it scans lib/processors/*.py for AbstractImageProcessor subclasses that declare OPTIONS. The GUI add-step menu, the pipeline loader (Initializer), and the worker chain (IntegratedProcessingChain) all read through the registry; no OPTION_MAP / PROCESSOR_REGISTRY edits, no if name == "X" branches. (To extend without touching the repo, drop the file in <APP_DATA>/plugins/processors/ and approve it in the trust prompt.)
- Reference it in a pipeline YAML by its registered name: REGISTRY_NAME if set, else the class name (case-sensitive).
Return semantics:
- Return the same buffer → chain continues with that buffer.
- Set buffer.children = [child1, child2, ...] (or return a list) → chain branches; each child re-enters at the next step.
- Return None → branch stops, warning logged.
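The three return cases can be sketched as a tiny chain runner. Buf, run_step and run_chain are hypothetical stand-ins for ImageBuffer and the real chain (IntegratedProcessingChain); logging, status handling and output pruning are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Buf:  # hypothetical stand-in for ImageBuffer
    name: str
    children: list = field(default_factory=list)


def run_step(process, buf):
    """Normalize a processor result into the list of buffers that continue."""
    result = process(buf)
    if result is None:
        return []                      # branch stops
    if isinstance(result, list):
        return result                  # explicit branch
    if not isinstance(result, Buf):
        raise TypeError(f"unexpected processor output: {type(result).__name__}")
    if result.children:
        return list(result.children)   # branch via buffer.children
    return [result]                    # same buffer continues


def run_chain(steps, buf):
    """Run buf through steps; each child re-enters at the next step."""
    if not steps:
        return [buf]
    out = []
    for nxt in run_step(steps[0], buf):
        out.extend(run_chain(steps[1:], nxt))
    return out


def split(buf):
    buf.children = [Buf(buf.name + "_A"), Buf(buf.name + "_B")]
    return buf


def tag(buf):
    buf.name += "!"
    return buf


names = [b.name for b in run_chain([split, tag], Buf("scan"))]
stopped = run_chain([lambda b: None, tag], Buf("scan"))
```

After split, the parent buffer itself does not continue: only its two children reach tag, mirroring how a page-splitting step hands off to the rest of the pipeline.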
Drop-in user plugins (no repo edit)
Users add processors (and OCR engines) without modifying the repo by
dropping a *.py file into the per-user plugin dirs:

    <APP_DATA>/plugins/processors/   AbstractImageProcessor subclass (SUMMARY + OPTIONS)
    <APP_DATA>/plugins/ocr/          OcrEngine subclass decorated @register

(<APP_DATA> = lib/app_data.plugins_dir(); on macOS
~/Library/Application Support/Aglaia/plugins/….)
Trust gate. Code is not run blindly. At GUI startup
lib/gui/plugin_trust.py (wired in aglaia.py:_qt_app) shows a warning
for every file that is new or whose content changed since it was
accepted, offering Add / Delete / Skip. Accepted files are recorded
in the plugins table of aglaia-config.db with a sha256.
Invariant — import == code execution. Discovery
(lib.app_data.plugins.import_accepted(), called by the processor
registry and lib/workers/ocr/__init__.py) imports only accepted,
sha-matching files; an unacknowledged or modified file is never imported.
Plugin dirs are placed on sys.path so a plugin’s module name resolves
identically inside spawned pipeline workers (spawn re-imports by name).
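The accept-then-verify rule can be sketched with hashlib. This is an illustration only: classify_plugins, file_sha256 and the in-memory accepted dict are hypothetical, standing in for import_accepted() and the plugins table of aglaia-config.db, and nothing is actually imported here.

```python
import hashlib
import tempfile
from pathlib import Path


def file_sha256(path):
    """Hex sha256 of a file's current content."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def classify_plugins(plugin_dir, accepted):
    """Split *.py files into (importable, pending) given {filename: sha256}."""
    importable, pending = [], []
    for path in sorted(Path(plugin_dir).glob("*.py")):
        if accepted.get(path.name) == file_sha256(path):
            importable.append(path.name)   # accepted and unchanged
        else:
            pending.append(path.name)      # new, or modified since acceptance
    return importable, pending


with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp, "good.py")
    good.write_text("X = 1\n")
    changed = Path(tmp, "changed.py")
    changed.write_text("X = 1\n")
    accepted = {"good.py": file_sha256(good), "changed.py": file_sha256(changed)}

    changed.write_text("X = 2\n")            # edited after it was accepted
    Path(tmp, "new.py").write_text("X = 3\n")  # never seen by the trust prompt

    importable, pending = classify_plugins(tmp, accepted)
```

Only good.py would be imported; the edited and the new file stay pending until they are acknowledged, which matches the headless behaviour described below.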
Headless/CLI has no popup — it loads only already-accepted plugins and prints a “pending” warning for the rest. Acknowledge them once via the GUI.
Threat model: stop a user from blindly running a file they dropped (or that something dropped for them). It is not a defense against an attacker with write access to the data dir, hence no signing.