Document Pipeline

2,940 records · 81,724 pages
Scrapers
Czech National Archives · Idle
Czech Security Archives · Idle
Findbuch Austria · Idle
Austrian State Archives · Idle
Matricula Online · Idle
Manual Import · Idle
Ingested
Documents received and queued for processing
0 records · 0 pages
OCR
Reading each page image with AI to extract text and its position on the page
0 records · 0 pages
85,680 / 85,680 jobs · 100%
PaddleOCR · 0/2 busy
Qwen VL · no workers
PDF Build
Building searchable PDFs by overlaying extracted text onto the original scanned images
0 records · 0 pages
5,812 / 5,812 jobs · 100%
0/2 busy
Translation
Translating German and Czech text into English using neural machine translation
0 records · 0 pages
170,081 / 170,081 jobs · 100%
0/4 busy · 58 failed
Embedding
Generating semantic embeddings from translated text for intelligent search
1 record · 289 pages
9,010 / 9,010 jobs · 100%
0/1 busy · 1 failed
Complete
Fully processed and searchable
2,939 records · 81,435 pages
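
The header totals can be cross-checked against the per-stage counts, on the assumption that every record sits in exactly one stage at a time. Below is a minimal Python sketch of that check using the figures shown on this page. The `stages` dictionary and the `reconcile` helper are illustrative names only and are not part of the pipeline itself.

```python
# Hypothetical consistency check; the counts are copied from the figures above.
# Each entry maps a stage name to (records, pages) currently in that stage.
stages = {
    "Ingested": (0, 0),
    "OCR": (0, 0),
    "PDF Build": (0, 0),
    "Translation": (0, 0),
    "Embedding": (1, 289),
    "Complete": (2939, 81435),
}


def reconcile(stages, total_records, total_pages):
    """Return True when the per-stage counts add up to the header totals."""
    records = sum(r for r, _ in stages.values())
    pages = sum(p for _, p in stages.values())
    return records == total_records and pages == total_pages


# Header totals: 2,940 records and 81,724 pages.
print(reconcile(stages, 2940, 81724))  # prints True
```

With these figures the one record outside Complete is the one held in Embedding, which is also the stage reporting 1 failed job.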