AI FEATURES
Document intelligence
Extract structured cost data from tender documents, bills of quantities, specifications, and scanned drawings. Processing runs on a local OCR and document-parsing pipeline with no external API dependencies.
Supported formats
Extraction pipeline
Documents pass through four processing stages:
Format detection and routing
The pipeline identifies the file format and routes it to the appropriate parser. PDFs are checked for an embedded text layer; if absent, the OCR stage is invoked. DXF/DWG files go to the geometry parser. IFC files go to the property set extractor.
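The routing step can be sketched as follows. This is a minimal illustration, not the pipeline's actual code. It assumes detection works on the file's leading bytes rather than its extension, and that the text-layer check for PDFs has already been done and is passed in as a flag. The helper names (`detect_format`, `route`) and stage labels are hypothetical.

```python
# Hypothetical stage labels for non-PDF formats; the real pipeline's identifiers are not documented.
ROUTES = {
    "dwg": "geometry_parser",
    "dxf": "geometry_parser",
    "ifc": "property_set_extractor",
}

def detect_format(head: bytes) -> str:
    """Classify a file from its first bytes (hypothetical helper)."""
    if head.startswith(b"%PDF-"):
        return "pdf"
    if head.startswith(b"AC10"):  # modern DWG files open with a version string such as AC1032
        return "dwg"
    if head.lstrip().startswith(b"ISO-10303-21;"):  # IFC uses the STEP physical file format
        return "ifc"
    first_two = [line.strip() for line in head.split(b"\n")[:2]]
    if first_two == [b"0", b"SECTION"]:  # ASCII DXF opens with group code 0 followed by SECTION
        return "dxf"
    return "unknown"

def route(head: bytes, has_text_layer: bool = False) -> str:
    """Pick the next stage; PDFs without a text layer are sent to OCR."""
    fmt = detect_format(head)
    if fmt == "pdf":
        return "text_extraction" if has_text_layer else "ocr"
    if fmt in ROUTES:
        return ROUTES[fmt]
    raise ValueError("unsupported or unrecognised format")
```

Sniffing content instead of trusting the extension means a mislabelled file still reaches the right parser; binary DXF and very old DWG versions are not covered by this sketch.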
Text extraction or OCR
For PDFs with a text layer, content is extracted with its layout preserved: tables, column structures, and section headings are retained. For scanned documents, the OCR engine processes each page and reconstructs the reading order using spatial analysis.
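One simple form of spatial reading-order reconstruction is to group OCR word boxes into lines by their vertical centres, then read lines top to bottom and words left to right. The sketch below assumes a single-column page and a hypothetical `Word` box type with the origin at the top-left; it is an illustration of the idea, not the engine's actual algorithm, which the documentation does not describe.

```python
from dataclasses import dataclass

@dataclass
class Word:
    text: str
    x: float  # left edge of the box, in page units
    y: float  # top edge of the box (y grows downward)
    h: float  # box height

def reading_order(words: list[Word], tolerance: float = 0.5) -> list[str]:
    """Return text lines in reading order.

    Two words share a line when their vertical centres differ by less than
    `tolerance` times the smaller box height. Multi-column layouts are not handled.
    """
    lines: list[list[Word]] = []
    for w in sorted(words, key=lambda w: w.y + w.h / 2):
        centre = w.y + w.h / 2
        if lines:
            ref = lines[-1][0]  # first word of the current line anchors its centre
            if abs(centre - (ref.y + ref.h / 2)) < tolerance * min(ref.h, w.h):
                lines[-1].append(w)
                continue
        lines.append([w])
    # Within each line, order words by their left edge.
    return [" ".join(w.text for w in sorted(line, key=lambda w: w.x)) for line in lines]
```

For example, boxes for "slab" at x=120 and "Concrete" at x=10 with nearly equal vertical centres come out as the single line "Concrete slab".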
Construction-domain parsing
The extracted text is processed by a construction-domain model that identifies work item descriptions, quantities and units, reference codes (NBS, CAWS, CWICR, CSI MasterFormat, etc.), cost figures, and specification clauses. Tables are parsed column by column so that each row keeps its code, description, quantity, unit, and rate together.
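The row-association part of table parsing can be illustrated by mapping each column to an output field once, from the header row, and then reading every row through that mapping. This is a minimal sketch: the header aliases, the `parse_boq_table` name, and the assumption that cells arrive already split into strings are all hypothetical. The field names follow the output schema further down this page.

```python
import re
from typing import Optional

# Hypothetical header aliases; real BOQ tables use many other labels.
HEADER_ALIASES = {
    "code": "rawCode", "item": "rawCode", "ref": "rawCode",
    "description": "rawDescription",
    "qty": "quantity", "quantity": "quantity",
    "unit": "unit",
    "rate": "rawRate",
}

def parse_number(cell: str) -> Optional[float]:
    """Parse '840' or '1,250.50' into a float; blank or non-numeric cells give None."""
    cleaned = cell.replace(",", "").strip()
    return float(cleaned) if re.fullmatch(r"-?\d+(\.\d+)?", cleaned) else None

def parse_boq_table(header: list[str], rows: list[list[str]]) -> list[dict]:
    """Turn table rows into item dicts, keeping each row's cells together."""
    fields = [HEADER_ALIASES.get(h.strip().lower()) for h in header]
    items = []
    for row in rows:
        item = {"rawCode": None, "rawDescription": None, "quantity": None, "unit": None, "rawRate": None}
        for field, cell in zip(fields, row):
            if field is None:
                continue  # unrecognised column, e.g. a running total
            if field in ("quantity", "rawRate"):
                item[field] = parse_number(cell)
            else:
                item[field] = cell.strip() or None
        items.append(item)
    return items
```

Because the column-to-field mapping is fixed per table, an empty rate cell becomes `None` for that row instead of shifting later values into the wrong fields.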
CWICR matching
Each extracted item is matched against the CWICR database using semantic similarity. Items whose best match meets the confidence threshold are assigned that CWICR code, together with its title and a confidence score. Items below the threshold are returned as unmatched and flagged for manual assignment in the BOQ editor.
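Threshold-based matching can be sketched with cosine similarity over embedding vectors. The sketch assumes the item and catalogue vectors have already been produced by some embedding model (not shown), and the default threshold of 0.8 is a placeholder; the documentation does not state the real model or threshold. All function names are hypothetical.

```python
import math
from typing import Optional, Sequence

Vector = Sequence[float]

def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def best_match(vec: Vector, catalogue: list, threshold: float) -> Optional[tuple]:
    """Return (code, title, score) for the closest entry, or None if it falls below threshold."""
    if not catalogue:
        return None
    scored = [(cosine(vec, entry_vec), code, title) for code, title, entry_vec in catalogue]
    score, code, title = max(scored)
    return (code, title, round(score, 2)) if score >= threshold else None

def split_matches(items: list, catalogue: list, threshold: float = 0.8):
    """items: (description, vector) pairs. Returns (matched, unmatched) lists."""
    matched, unmatched = [], []
    for description, vec in items:
        hit = best_match(vec, catalogue, threshold)
        if hit is None:
            unmatched.append({"rawDescription": description, "note": "No CWICR match above threshold"})
        else:
            code, title, score = hit
            matched.append({"rawDescription": description, "cwicrCode": code,
                            "cwicrTitle": title, "confidence": score})
    return matched, unmatched
```

Only the single best catalogue entry is compared against the threshold, so an item is either assigned one code or left for manual assignment, mirroring the `extractedItems` and `unmatched` split in the output.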
Output schema
```json
{
  "documentTitle": "Specification Section 03 — Concrete Works",
  "processedPages": 24,
  "extractedItems": [
    {
      "rawCode": "E10",
      "rawDescription": "In situ concrete in reinforced suspended slabs, thickness 200mm",
      "quantity": 840,
      "unit": "m²",
      "rawRate": null,
      "cwicrCode": "02.01.020",
      "cwicrTitle": "Reinforced concrete suspended slab, 200mm, C25/30",
      "confidence": 0.91
    }
  ],
  "unmatched": [
    {
      "rawDescription": "Specialist post-tensioning system by nominated sub-contractor",
      "note": "No CWICR match above threshold — assign manually"
    }
  ],
  "processingTimeMs": 4820
}
```
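A client consuming this output might build a review list from it. The sketch below loads a trimmed copy of the example above and collects every unmatched item plus any matched item whose confidence is under a cut-off. The 0.85 cut-off and the `review_queue` name are hypothetical client-side choices, not documented settings of the pipeline.

```python
import json

# Trimmed copy of the example output above.
SAMPLE = """{
  "processedPages": 24,
  "extractedItems": [
    {"rawCode": "E10",
     "rawDescription": "In situ concrete in reinforced suspended slabs, thickness 200mm",
     "quantity": 840, "unit": "m²", "rawRate": null,
     "cwicrCode": "02.01.020",
     "cwicrTitle": "Reinforced concrete suspended slab, 200mm, C25/30",
     "confidence": 0.91}
  ],
  "unmatched": [
    {"rawDescription": "Specialist post-tensioning system by nominated sub-contractor"}
  ],
  "processingTimeMs": 4820
}"""

def review_queue(result: dict, review_below: float = 0.85) -> list[str]:
    """Descriptions worth checking in the BOQ editor: unmatched items, then low-confidence matches."""
    queue = [u["rawDescription"] for u in result.get("unmatched", [])]
    queue += [
        i["rawDescription"]
        for i in result.get("extractedItems", [])
        if i.get("confidence", 0.0) < review_below
    ]
    return queue

if __name__ == "__main__":
    result = json.loads(SAMPLE)  # JSON null becomes Python None
    for description in review_queue(result):
        print("Review:", description)
```

Raising `review_below` widens the queue to include matched items the user may still want to confirm.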