Export
Aglaïa exports a project either to a searchable PDF or to Markdown. Both exporters read the currently chosen stage of each visible page from the project database.
PDF (lib/workers/pdf_export.py)
Each page is encoded according to its colour type:
- Bitonal (BW) → build_bitonal_pdf(rows, out, engine=...):
  - "jbig2" (default, smallest): lossless JBIG2 via the in-tree aglaia_jbig2 PyO3 wrapper (encode_page_lossless), embedded as a /JBIG2Decode image XObject.
  - "g4": CCITT Group 4 fallback, used when JBIG2 is unavailable. The encoder probe (from aglaia_jbig2 import encode_page_lossless) degrades gracefully to G4 if the native extension isn't built.
- Colour / gray → JPEG (/DCTDecode).
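The encoder probe and the per-page filter choice can be sketched as follows. This is a minimal illustration under assumptions, not the code in pdf_export.py: bitonal_engine and pdf_filter_for are hypothetical helpers, and the colour-type labels "BW", "gray" and "colour" are assumed. Only the aglaia_jbig2 import mirrors the probe described above. The filter names are the standard PDF ones; Group 4 is expressed as /CCITTFaxDecode with a negative /K.

```python
# Hypothetical helpers; not the actual pdf_export.py code.

# Probe for the optional native JBIG2 encoder. If the PyO3 extension was
# not built, the import fails and the bitonal path falls back to G4.
try:
    from aglaia_jbig2 import encode_page_lossless  # noqa: F401
    HAVE_JBIG2 = True
except ImportError:
    HAVE_JBIG2 = False


def bitonal_engine(requested: str = "jbig2", have_jbig2: bool = HAVE_JBIG2) -> str:
    """Return the bitonal engine that can actually be used: 'jbig2' or 'g4'."""
    if requested not in ("jbig2", "g4"):
        raise ValueError(f"unknown bitonal engine: {requested!r}")
    if requested == "jbig2" and not have_jbig2:
        return "g4"  # graceful degradation when the extension is missing
    return requested


def pdf_filter_for(colour_type: str, engine: str = "jbig2") -> dict:
    """Return the image filter entries for one page (PDF names as plain strings)."""
    if colour_type == "BW":
        if engine == "jbig2":
            return {"/Filter": "/JBIG2Decode"}
        # Group 4 is CCITTFaxDecode with K < 0; real decode parms also need
        # /Columns (the image width) and usually /Rows.
        return {"/Filter": "/CCITTFaxDecode", "/DecodeParms": {"/K": -1}}
    if colour_type in ("gray", "colour"):
        return {"/Filter": "/DCTDecode"}  # baseline JPEG
    raise ValueError(f"unknown colour type: {colour_type!r}")
```

With the extension missing, bitonal_engine("jbig2") returns "g4", so callers never have to handle the ImportError themselves.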
The compression mode (PDFprocessor.create_pdf_from_db) is one of "jbig2",
"g4", "native" or "auto". "auto" uses the bitonal path (JBIG2 if
installed, else G4) when every page is BW, and otherwise falls back to
"native" (all pages as JPEG). "jbig2" and "g4" skip non-BW rows;
"native" embeds every row as a JPEG regardless of colour type.
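The mode rules can be restated as a small planning function. This is a sketch under assumptions: plan_pages is hypothetical, pages are represented only by their colour-type labels, None marks a skipped row, and an explicit "jbig2" request is assumed to degrade to G4 when the extension is missing, as the encoder probe does.

```python
from __future__ import annotations

# Hypothetical planner restating the documented compression-mode rules.
MODES = ("jbig2", "g4", "native", "auto")


def plan_pages(mode: str, colour_types: list[str], have_jbig2: bool) -> list[str | None]:
    """Return the encoding for each page: 'jbig2', 'g4', 'jpeg', or None (skipped)."""
    if mode not in MODES:
        raise ValueError(f"unknown compression mode: {mode!r}")
    if mode == "auto":
        # Bitonal path only if every page is BW; otherwise everything is JPEG.
        all_bw = all(t == "BW" for t in colour_types)
        mode = ("jbig2" if have_jbig2 else "g4") if all_bw else "native"
    if mode == "native":
        return ["jpeg"] * len(colour_types)
    # Assumed: an explicit "jbig2" request also degrades to G4 without the extension.
    engine = "g4" if mode == "jbig2" and not have_jbig2 else mode
    # Bitonal modes skip non-BW rows.
    return [engine if t == "BW" else None for t in colour_types]
```

Note the asymmetry this makes visible: a single colour page turns "auto" into an all-JPEG document, whereas "jbig2" or "g4" keep the compact bitonal encoding and drop that page instead.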
When OCR results exist, an invisible OCR text layer (Helvetica,
WinAnsi encoding, text render mode 3) is overlaid on each page, so the
text can be selected and searched while the scanned image stays
visible. PDF object assembly and the text layer go through pikepdf
(qpdf); page rendering for previews uses pypdfium2 (PDFium).
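The text layer relies on the PDF text render mode operator: 3 Tr draws glyphs with neither fill nor stroke, so they are invisible but still extractable. The sketch below builds the content-stream bytes for one word. invisible_word and pdf_literal are hypothetical helpers, /F1 is an assumed resource name that would have to point at a Helvetica font with /WinAnsiEncoding in the page's /Resources, and Python's cp1252 codec is used as a close approximation of WinAnsi. It does not show how a word is scaled to fit its OCR bounding box.

```python
# Hypothetical helpers; /F1 must name a Helvetica font with /WinAnsiEncoding
# in the page's /Resources for these operators to work as intended.


def pdf_literal(text: str) -> bytes:
    """Encode text as a PDF literal string (cp1252 approximates WinAnsi)."""
    raw = text.encode("cp1252", errors="replace")  # unmappable chars become '?'
    # Escape backslash first, then parentheses, so escapes are not doubled.
    raw = raw.replace(b"\\", b"\\\\").replace(b"(", b"\\(").replace(b")", b"\\)")
    return b"(" + raw + b")"


def invisible_word(text: str, x: float, y: float, size: float,
                   font: str = "/F1") -> bytes:
    """Content-stream bytes placing `text` invisibly with its baseline at (x, y)."""
    return b"".join([
        b"BT\n",
        f"{font} {size:g} Tf\n".encode("ascii"),
        b"3 Tr\n",  # render mode 3: neither fill nor stroke, so invisible
        f"1 0 0 1 {x:g} {y:g} Tm\n".encode("ascii"),  # text matrix: translation only
        pdf_literal(text) + b" Tj\n",
        b"ET\n",
    ])
```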
JBIG2 ships only when the build environment was synced with
--extra jbig2 (which compiles the Rust crate). See distribution.md; the
encoder is credited in ../ABOUT.md.
Markdown
write_markdown turns OCR text into free-flowing Markdown, handling
headings, paragraphs, dehyphenation, footnotes, lists and merging text
across page breaks. The full heuristics are described in
markdown_export.md.
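A naive sketch of two of these steps, dehyphenation and continuing a paragraph onto the next page, is shown below. It is not write_markdown's algorithm (markdown_export.md describes that); the helper names are hypothetical, and unlike a real heuristic it would wrongly join a genuine compound such as "well-known" broken at its hyphen.

```python
import re

# Hypothetical helpers; a naive illustration, not write_markdown itself.
_HYPHEN_BREAK = re.compile(r"(\w)-\n(\w)")


def dehyphenate(text: str) -> str:
    """Join words split by a hyphen at a line end (drops every such hyphen)."""
    return _HYPHEN_BREAK.sub(r"\1\2", text)


def to_paragraphs(text: str) -> str:
    """Reflow lines: blank lines separate paragraphs, single newlines become spaces."""
    blocks = re.split(r"\n\s*\n", dehyphenate(text))
    return "\n\n".join(" ".join(b.split()) for b in blocks if b.strip())


def merge_pages(pages: list[str]) -> str:
    """Concatenate pages, continuing a paragraph when a page ends mid-sentence."""
    out = ""
    for page in pages:
        page = to_paragraphs(page)
        if not page:
            continue
        if not out:
            out = page
        elif out.endswith("-") and page[0].isalpha():
            out = out[:-1] + page  # word hyphenated across the page break
        elif out[-1] in ".!?":
            out += "\n\n" + page  # previous page ended a sentence: new paragraph
        else:
            out += " " + page  # sentence continues on the next page
    return out
```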
What gets exported
Only visible pages of non-deleted scans are exported: the queries
filter on scans.deleted_at IS NULL AND branches.trashed_at IS NULL.
Per-page visibility is set with the eye toggle (see gui.md), and each
page exports its currently chosen stage.
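The filter can be illustrated with an in-memory SQLite database. Both the schema and the use of SQLite are assumptions: only scans.deleted_at and branches.trashed_at come from the text above, while the pages table, its visible and chosen_stage columns, the join keys and the stage names are hypothetical.

```python
import sqlite3

# Hypothetical schema: only scans.deleted_at and branches.trashed_at are
# documented; the pages table, its columns and the join keys are assumed.
EXPORT_QUERY = """
SELECT p.id, p.chosen_stage
FROM pages AS p
JOIN scans AS s ON s.id = p.scan_id
JOIN branches AS b ON b.id = s.branch_id
WHERE s.deleted_at IS NULL      -- scan not deleted
  AND b.trashed_at IS NULL      -- branch not in the trash
  AND p.visible = 1             -- eye toggle on (assumed column)
ORDER BY s.id, p.page_no
"""


def exportable_pages(conn: sqlite3.Connection) -> list[tuple[int, str]]:
    """Return (page id, chosen stage) for every page that would be exported."""
    return conn.execute(EXPORT_QUERY).fetchall()


def demo_db() -> sqlite3.Connection:
    """Build a tiny in-memory database exercising each filter."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE branches (id INTEGER PRIMARY KEY, trashed_at TEXT);
        CREATE TABLE scans (id INTEGER PRIMARY KEY, branch_id INTEGER,
                            deleted_at TEXT);
        CREATE TABLE pages (id INTEGER PRIMARY KEY, scan_id INTEGER,
                            page_no INTEGER, visible INTEGER,
                            chosen_stage TEXT);
        INSERT INTO branches VALUES (1, NULL), (2, '2024-01-01');
        INSERT INTO scans VALUES (10, 1, NULL), (11, 1, '2024-02-01'),
                                 (12, 2, NULL);
        INSERT INTO pages VALUES
            (100, 10, 1, 1, 'cleaned'),  -- exported
            (101, 10, 2, 0, 'raw'),      -- hidden by the eye toggle
            (102, 11, 1, 1, 'raw'),      -- scan deleted
            (103, 12, 1, 1, 'raw');      -- branch trashed
    """)
    return conn
```

In this toy data only page 100 survives; each of the other pages is excluded by exactly one of the three conditions.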