Glossary
IDP Glossary — Intelligent Document Processing Terms Explained
The world of document AI is full of specialised terminology. This glossary explains the most important concepts — from OCR and classification to visual grounding and straight-through processing.
Table of Contents
Bordereaux · CLOUD Act · Confidence Score · Dark Processing · Data Extraction · DATEV Interface · Document AI · Document Classification · EU AI Act · GDPR-Compliant AI · GoBD · HS Code · Human-in-the-Loop · ICR · IDP · Invoice Capture · Invoice Recognition · Mailroom Automation · Master Data Matching · OCR · Posting Suggestion · Recognition Rate · §203 StGB · Steuergeheimnis (§30 AO) · Straight-Through Processing · Visual Grounding
Bordereaux
A bordereau (plural: bordereaux) is a tabular listing of insurance policies or claims exchanged between insurers and brokers. Bordereaux arrive in hundreds of different Excel formats and need to be normalised. feld.ai detects table structures automatically and converts heterogeneous bordereaux into a uniform format. → Excel Table Processing
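As a rough illustration of what normalising heterogeneous bordereaux can involve, the sketch below maps differing column headers onto one canonical schema. The alias table, the column names and both functions are hypothetical and are not feld.ai's actual method, which detects table structures automatically.

```python
from typing import Dict, Optional

# Hypothetical alias table: header spellings seen in incoming bordereaux,
# grouped under one canonical column name. All names are illustrative.
COLUMN_ALIASES = {
    "policy_number": {"policy no", "policy number", "pol. no.", "policy ref"},
    "insured_name": {"insured", "insured name", "policyholder"},
    "gross_premium": {"gross premium", "premium (gross)", "gwp"},
}


def normalise_header(header: str) -> Optional[str]:
    """Return the canonical column name for a raw header, or None if unknown."""
    # Lower-case and collapse whitespace so "  Policy   No " matches "policy no".
    cleaned = " ".join(header.strip().lower().split())
    for canonical, aliases in COLUMN_ALIASES.items():
        if cleaned in aliases:
            return canonical
    return None


def normalise_row(row: Dict[str, str]) -> Dict[str, str]:
    """Rename known columns to canonical names and drop unknown columns."""
    result = {}
    for header, value in row.items():
        canonical = normalise_header(header)
        if canonical is not None:
            result[canonical] = value.strip()
    return result
```

A fixed alias table only covers headers that have been seen before; it is shown here because it makes the target of the normalisation step easy to see.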
CLOUD Act
The US CLOUD Act (Clarifying Lawful Overseas Use of Data Act) of 2018 allows US authorities to compel US companies to hand over data in their possession, custody or control, regardless of where the data is physically located. For European organisations, this means that data held by US cloud providers (Azure, AWS, GCP) can be subject to US disclosure orders even when it is stored in EU data centres. → EU-Sovereign Document AI
Confidence Score
A confidence score indicates how certain an AI model is about a recognition or classification — typically as a percentage between 0 and 100. A high score (e.g. 98%) means high certainty; a low score triggers manual review. Thresholds are configurable and determine the level of automation. → Automated Document Verification
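The threshold logic can be sketched in a few lines. The function name, the return labels and the default threshold of 95 are assumptions chosen for illustration, not values taken from any product.

```python
def route_by_confidence(score_percent: float, threshold_percent: float = 95.0) -> str:
    """Return "auto" when the score meets the threshold, else "manual_review".

    The 95.0 default is an illustrative assumption; the threshold is
    meant to be configured per field or per document type.
    """
    if not 0.0 <= score_percent <= 100.0:
        raise ValueError("score must be between 0 and 100")
    return "auto" if score_percent >= threshold_percent else "manual_review"
```

Raising the threshold sends more results to manual review and lowers the level of automation; lowering it does the opposite.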
Dark Processing
Dark processing refers to fully automatic document processing without human intervention. A document is ingested, classified, extracted, validated and posted — entirely without manual review. The dark processing rate is a key metric for the degree of automation in document workflows.
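One plausible way to compute the dark processing rate is the share of documents that passed through without any manual touch. The function below is a minimal sketch under that assumption; the input format (one boolean per document) is hypothetical.

```python
from typing import Iterable


def dark_processing_rate(manually_touched: Iterable[bool]) -> float:
    """Share of documents processed without manual intervention (0.0 to 1.0).

    Each flag is True if a human reviewed or corrected that document.
    An empty batch yields 0.0 by convention in this sketch.
    """
    flags = list(manually_touched)
    if not flags:
        return 0.0
    untouched = sum(1 for touched in flags if not touched)
    return untouched / len(flags)
```

For example, a batch of four documents in which one needed a correction gives a rate of 0.75.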
Data Extraction
Data extraction is the automatic reading of structured information from documents. This includes recognising fields (e.g. name, date, amount), locating their position in the document, and converting them into a machine-readable format. Modern systems work without templates and recognise fields even in unseen layouts. → Data Extraction from Files
DATEV Interface / DATEV Format
The DATEV interface connects document processing with DATEV software used by German tax firms. feld.ai provides a DATEV-compatible export of captured document data (e.g. posting batches); native DATEV integration is in preparation. This lets extracted documents and posting suggestions flow into the firm's workflow. → AI Mailroom for Tax Firms
Document AI
Document AI is the umbrella term for AI systems that can read, understand and process documents. This encompasses OCR, classification, data extraction, validation and semantic understanding. Unlike rule-based systems, AI-based solutions learn from examples and can process previously unseen documents.
Document Classification
Document classification is the automatic assignment of a document to a predefined category — e.g. invoice, contract, notice or reminder. Classification is typically the first step in a document processing pipeline and controls the downstream extraction and filing logic. → Document Classification
EU AI Act
The EU AI Act is the European legal framework for artificial intelligence. It classifies AI systems into risk categories and attaches graduated obligations for transparency, documentation and oversight. For tax firms and legal departments, it is relevant that any AI in use is traceable, documented and operated under their own control — an advantage of sovereign, EU-hosted solutions. → Digital Sovereignty
GDPR-Compliant AI
GDPR-compliant AI refers to AI systems that meet the requirements of the General Data Protection Regulation. This includes data minimisation, purpose limitation, transparency and the right to erasure. For document processing, this typically also means keeping US cloud providers out of the processing chain and ensuring that customer data is not used to train general-purpose models. → EU-Sovereign Document AI
GoBD
GoBD (Grundsätze zur ordnungsmäßigen Führung und Aufbewahrung von Büchern, Aufzeichnungen und Unterlagen in elektronischer Form sowie zum Datenzugriff) are principles issued by the German Federal Ministry of Finance that govern digital bookkeeping. They require traceability, immutability and proper retention of digitised documents. Document processing systems that serve the German market must comply with GoBD archiving requirements.
HS Code (Customs Tariff Number)
An HS code (Harmonized System code) is an internationally standardised numeric code for classifying goods in international trade. feld.ai automatically extracts HS codes from customs documents, certificates of origin and commercial invoices and matches them against reference databases. → Customs Document Automation
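The first six digits of an HS code are harmonised internationally: two digits for the chapter, four for the heading and six for the subheading. National tariff numbers append further digits. The sketch below splits an extracted code along those lines; the function name, the accepted separators and the 6 to 10 digit range are assumptions for illustration.

```python
import re
from typing import Dict


def split_hs_code(raw: str) -> Dict[str, str]:
    """Split a tariff number into its HS levels.

    Accepts digits optionally separated by dots or spaces, e.g. "8471.30.00".
    Digits beyond the sixth are returned as a national extension.
    """
    digits = re.sub(r"[\s.]", "", raw)
    if not re.fullmatch(r"\d{6,10}", digits):
        raise ValueError(f"not a plausible HS-based code: {raw!r}")
    return {
        "chapter": digits[:2],
        "heading": digits[:4],
        "subheading": digits[:6],
        "national_extension": digits[6:],
    }
```

A format check like this only confirms that a code is well-formed. Whether the code exists and fits the goods still has to be checked against a reference database.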
Human-in-the-Loop
Human-in-the-loop (HITL) is a concept where humans review, correct and approve AI results. In document processing, this means uncertain recognitions are presented for manual review. Corrections feed directly back into model training and improve future recognition. Your data trains only your own model.
ICR (Intelligent Character Recognition)
ICR extends traditional OCR with the ability to recognise handwritten characters. Where OCR handles printed text, ICR reads handwriting in forms, notes and annotations. Modern multimodal AI models combine OCR and ICR in a single system and achieve good results even with difficult handwriting.
IDP (Intelligent Document Processing)
IDP is the market term for AI-powered systems that read, understand and process documents. IDP combines OCR, classification, data extraction, validation and integration. Gartner, Everest Group and other analysts regularly evaluate IDP vendors in market reports.
Invoice Capture
Invoice capture is the process of systematically ingesting incoming business documents (invoices, receipts, delivery notes) and converting their contents into structured data. Modern invoice capture uses AI instead of manual entry. Extracted data (supplier, amount, date, line items) flows directly into the accounting or ERP system. → Invoice & Receipt Processing
Invoice Recognition
Invoice recognition is the automatic extraction of header and line-item data from invoices — supplier, invoice number, date, individual items, amounts, IBAN. feld.ai achieves header field recognition rates above 96% and line-item accuracy above 90%, even on unseen invoice layouts. → Invoice & Receipt Processing
Mailroom Automation
Mailroom automation refers to the automatic processing of incoming mail — from classification through data extraction to filing in the correct system. The goal is to reduce manual screening and sorting. feld.ai provides a complete solution for digital mailrooms in businesses and professional firms. → AI-Powered Mailroom
Master Data Matching
Master data matching is the process of automatically checking extracted document data against existing master data (suppliers, customers, accounts, cost centres). Matching identifies assignments, detects discrepancies and keeps master data current. At feld.ai, master data matching is an integral part of document processing. → Master Data Cleansing
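A simple form of master data matching can be sketched with fuzzy string comparison from Python's standard library. This is an illustrative stand-in, not feld.ai's matching logic; the function name and the 0.85 similarity cut-off are assumptions.

```python
from difflib import SequenceMatcher
from typing import Iterable, Optional, Tuple


def best_supplier_match(
    extracted_name: str,
    master_names: Iterable[str],
    min_ratio: float = 0.85,  # illustrative cut-off, tune on real data
) -> Tuple[Optional[str], float]:
    """Return the most similar master record and its similarity ratio.

    The name is None when no candidate reaches min_ratio, which signals
    a discrepancy that should go to manual review.
    """
    def norm(text: str) -> str:
        # Lower-case and collapse whitespace before comparing.
        return " ".join(text.lower().split())

    best_name, best_ratio = None, 0.0
    for name in master_names:
        ratio = SequenceMatcher(None, norm(extracted_name), norm(name)).ratio()
        if ratio > best_ratio:
            best_name, best_ratio = name, ratio
    if best_ratio >= min_ratio:
        return best_name, best_ratio
    return None, best_ratio
```

In practice, name similarity is usually combined with harder identifiers such as VAT ID or IBAN, since two different suppliers can have near-identical names.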
OCR (Optical Character Recognition)
OCR converts images of text into machine-readable text. OCR is the foundation of all document processing — without text recognition, no extraction or classification can take place. Modern OCR systems use neural networks and achieve high accuracy even on poor-quality scans.
Posting Suggestion
A posting suggestion is the automatic assignment of a document to general ledger accounts, cost centres and tax codes for accounting. feld.ai generates posting suggestions based on extracted document data and your master data. The suggestion is validated via human-in-the-loop and handed to accounting. → Invoice & Receipt Processing
Recognition Rate
The recognition rate indicates what proportion of fields or documents were correctly recognised. feld.ai achieves header field recognition rates above 96% in production environments. The rate improves continuously through human-in-the-loop corrections that feed directly into model training.
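At field level, a recognition rate can be computed by comparing extracted values against a manually verified reference. The sketch below assumes exact string equality counts as correct, which is one possible definition; the function and field names are hypothetical.

```python
from typing import Dict


def recognition_rate(predicted: Dict[str, str], expected: Dict[str, str]) -> float:
    """Fraction of expected fields whose extracted value matches exactly.

    Fields missing from `predicted` count as errors.
    """
    if not expected:
        raise ValueError("expected fields must not be empty")
    correct = sum(1 for name, value in expected.items() if predicted.get(name) == value)
    return correct / len(expected)
```

With three of four reference fields extracted correctly, the rate is 0.75. How strictly a match is defined (exact string, normalised amount, tolerated whitespace) changes the reported number, so published rates are only comparable when that definition is known.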
§203 StGB (Breach of Private Secrets)
§203 StGB is a German criminal-law provision that penalises professional secret-holders, such as tax advisors, lawyers and doctors, for unlawfully disclosing entrusted secrets. This includes passing data to service providers. For AI use, this means that client data must not flow uncontrolled into third-party clouds. EU hosting and processing under your own control are therefore essential. → Document AI for Tax & Legal
Steuergeheimnis (§30 AO — German Tax Secrecy)
Tax secrecy under §30 of the German Fiscal Code (Abgabenordnung) obliges officials and equivalent persons to keep confidential the circumstances of taxpayers that become known to them in proceedings. For tax firms it underscores the duty of care when handling client data. AI-powered document processing must therefore rely on EU hosting and strict access control so that no data flows to third parties. → AI Mailroom for Tax Firms
Straight-Through Processing (STP)
Straight-through processing refers to the fully automatic processing of a business transaction from start to finish — without manual intervention. In document processing: a document is ingested, extracted, validated, posted and filed in one continuous process. The STP rate is a key efficiency metric.
Visual Grounding
Visual grounding is the ability of an AI system to trace every extracted piece of information or answer back to its exact location in the original document, visually highlighted on the original page. At feld.ai, this means that every answer in Document Chat and every extracted field references its original location in the document. Trust through traceability. → Document Chat
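One common way to represent a grounded field is to store the extracted value together with a page number and a bounding box. The data structure below is a minimal sketch under that assumption; the class names, the normalised 0 to 1 coordinate convention and the helper function are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class BoundingBox:
    """Location on a page, in coordinates normalised to the range 0..1."""
    page: int
    x0: float
    y0: float
    x1: float
    y1: float


@dataclass(frozen=True)
class GroundedField:
    """An extracted value plus the place in the document it came from."""
    name: str
    value: str
    box: BoundingBox


def to_pixels(box: BoundingBox, page_width: int, page_height: int) -> Tuple[int, int, int, int]:
    """Convert a normalised box to pixel coordinates for highlighting."""
    return (
        round(box.x0 * page_width),
        round(box.y0 * page_height),
        round(box.x1 * page_width),
        round(box.y1 * page_height),
    )
```

Normalised coordinates keep the reference valid regardless of the resolution at which the page is later rendered.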