Construction Document Understanding - An AI-Powered Solution
Document understanding is a powerful technology in document management designed to automate the manual work performed by document controllers. In construction, general contractors typically employ teams of document controllers who organize documents and manually enter information about them after uploading them to a shared central location. Procore's Artificial Intelligence team saw an opportunity to use document understanding technology to process construction documents and auto-suggest values for standard construction text fields. Our AI solution relieves document controllers of repetitive, labor-intensive tasks and significantly reduces the time they spend on data entry.
Document understanding is one of the most challenging topics in artificial intelligence. This technology automates the processing of documents, most of which are unstructured, such as PDF files and document images. Its capabilities include, but are not limited to, document classification, information extraction, layout detection, and optical character recognition. Current state-of-the-art approaches make extensive use of natural language processing models, computer vision models, or a combination of both.
General contractors can have teams of document controllers who work full-time on uploading construction documents to their platforms and manually processing them. That includes sorting uploaded documents into categories such as engineering drawings, specifications, and others. It also includes manual data entry of fields such as document name, description, and discipline, which is a time-consuming and error-prone process.
Furthermore, construction documents are not always in a simple line-by-line, left-to-right format. Their text can be vertical, horizontal, curved, or scattered. There can also be a mixture of text, tables, graphs, plans, and diagrams, which makes the problem even more complex. Figure 1 shows an example of a typical engineering document that Procore's customers upload.
We will discuss how the document understanding technology developed by Procore's AI team addresses our customers' pain points. We start with document classification, then move on to entity recognition, and finally cover optical character recognition.
Figure 1 - An example of a construction document with complex content, i.e., a complex layout, vertical text, and scattered text inside tables and plans
First, let's start with the classification problem. When our customers upload their construction documents, they often upload them as a batch consisting of different types of documents. For example, a single upload could include drawings, specifications, and submittals. Our customers want uploaded documents to be classified automatically.
Figure 2 shows examples of the types of documents that our customers upload. The document on the top is a drawing, while the document on the bottom is a specification. While these documents look very different, they contain similar textual content. Both are about codes and standards and contain similar groups of words, e.g., "code," "electrical," "fire," and "requirements." As a result, a classification model that relies on text alone cannot reliably tell them apart. Instead, Procore’s AI team developed a high-accuracy model that takes both textual and graphical features into account. The graphical features include the area, orientation, proportion of textual content, and proportion of lines, paths, and curves. This model learns that drawings usually contain more graphical components, while specifications contain specification-related keywords.
Figure 2 - A sample portion of an engineering drawing (top) and a sample portion of an engineering specification (bottom)
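As a rough illustration of combining textual and graphical features, the Python sketch below concatenates keyword frequencies with a few page-level graphical statistics into one feature vector. Everything in it is an assumption made for illustration: the `PageGraphics` fields, the `KEYWORDS` list, and the sample numbers are hypothetical, and this is not Procore's actual model or feature set.

```python
import re
from collections import Counter
from dataclasses import dataclass

# Hypothetical keyword list built from the words quoted above; a trained
# model would learn its own text representation instead.
KEYWORDS = ("code", "electrical", "fire", "requirements")


@dataclass
class PageGraphics:
    """Hypothetical page-level graphical statistics (field names are assumptions)."""
    is_landscape: bool          # page orientation
    text_area_fraction: float   # share of the page area covered by text
    line_fraction: float        # share of drawing operations that are straight lines
    curve_fraction: float       # share of drawing operations that are curves


def text_features(text: str) -> list[float]:
    """Relative frequency of each keyword among the tokens of the page text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    total = max(len(tokens), 1)  # avoid division by zero on empty pages
    return [counts[word] / total for word in KEYWORDS]


def graphic_features(graphics: PageGraphics) -> list[float]:
    """Graphical statistics as a flat list of numbers."""
    return [
        float(graphics.is_landscape),
        graphics.text_area_fraction,
        graphics.line_fraction,
        graphics.curve_fraction,
    ]


def combined_features(text: str, graphics: PageGraphics) -> list[float]:
    """Concatenate text and graphical features into a single vector."""
    return text_features(text) + graphic_features(graphics)


# Two pages with identical wording but made-up, very different graphics.
shared_text = "Fire code requirements for electrical work"
drawing_page = PageGraphics(True, 0.10, 0.70, 0.20)
spec_page = PageGraphics(False, 0.85, 0.02, 0.00)

drawing_vector = combined_features(shared_text, drawing_page)
spec_vector = combined_features(shared_text, spec_page)
```

The first four entries of both vectors are identical, so a classifier that sees only text cannot separate the two pages. The remaining entries differ, and that difference is the kind of signal a downstream classifier could learn from.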
Entity Recognition Problem
When document controllers upload documents, they have to enter information related to each document, such as its title, description, and document number. We wanted to help our customers automate this process with an auto-suggestion feature, and we approached it as an entity recognition problem. Traditionally, entity recognition converts a document into a sequence of tokens and determines whether each token is part of an entity of interest. However, in certain construction documents such as drawings, a sequence of tokens by itself is not sufficient. Figure 3 below illustrates this. Suppose we want to extract the title of this drawing. From the sequence of text alone, it is difficult to determine which words (if any) are part of the document title (e.g., "KEY PLAN" or "BUILDING SECTIONS" could be a valid title). To distinguish between them, we have to rely on graphical information, such as the layout of the document, the font size, or the position of words in the document. Procore’s AI team tackled this by developing a highly accurate AI model that understands not only words but also their positions and the document layout. We trained this model on our hand-crafted construction dataset.
Figure 3 - A sample engineering drawing on which the auto-suggest model struggles. Both "BUILDING SECTIONS" and "KEY PLAN" could be the drawing title.
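One way to picture "words plus their positions" is sketched below in Python: each token carries its bounding box, and simple layout features (normalized position and relative text height) are derived from it. The `Token` class, the feature names, and the coordinates are hypothetical and chosen only for illustration. They are not taken from Figure 3 or from Procore's model.

```python
from dataclasses import dataclass


@dataclass
class Token:
    """A word with its bounding box in page coordinates (origin at top-left)."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def layout_features(token: Token, page_width: float, page_height: float) -> dict[str, float]:
    """Position and size features, normalized by the page dimensions."""
    return {
        "x_center": (token.x0 + token.x1) / 2 / page_width,
        "y_center": (token.y0 + token.y1) / 2 / page_height,
        "rel_height": (token.y1 - token.y0) / page_height,  # rough proxy for font size
    }


# Made-up boxes on a 1000 x 800 page: one large word near the bottom-right
# and one small word near the top.
PAGE_W, PAGE_H = 1000.0, 800.0
building = Token("BUILDING", 700, 720, 900, 760)
key = Token("KEY", 700, 100, 760, 112)

building_feats = layout_features(building, PAGE_W, PAGE_H)
key_feats = layout_features(key, PAGE_W, PAGE_H)
```

In this made-up example, the two words are equally plausible title candidates from text alone, but `rel_height` and `y_center` differ clearly, which gives a layout-aware model extra signal to learn from.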
Optical Character Recognition
As discussed above, both the classification problem and the entity recognition problem first require extracting text from documents. Documents uploaded into Procore are typically PDFs, which are either vector or raster and require different text extraction techniques. Vector PDF files are usually converted from document editing files (e.g., Microsoft Word®). Since the textual content is embedded in vector PDF files, we can extract it directly with a PDF text extractor. In contrast, raster PDF files are scanned images and require optical character recognition (OCR) to extract text.
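The routing between the two extraction paths could look roughly like the Python sketch below. It assumes a simple heuristic that is not described in this article: if a page yields almost no embedded text, treat it as raster and fall back to OCR. The threshold `MIN_EMBEDDED_CHARS` and the callables `extract_embedded` and `run_ocr` are hypothetical placeholders for whatever PDF text extractor and OCR engine are in use.

```python
from typing import Callable, TypeVar

Page = TypeVar("Page")  # stands in for whatever page object the PDF library provides

# Hypothetical threshold: pages with fewer visible embedded characters than
# this are treated as scanned images.
MIN_EMBEDDED_CHARS = 20


def needs_ocr(embedded_text: str, min_chars: int = MIN_EMBEDDED_CHARS) -> bool:
    """Return True when the embedded text layer is empty or nearly empty."""
    visible = "".join(embedded_text.split())  # drop all whitespace
    return len(visible) < min_chars


def extract_page_text(
    page: Page,
    extract_embedded: Callable[[Page], str],
    run_ocr: Callable[[Page], str],
) -> str:
    """Use embedded text for vector pages and OCR for raster pages."""
    embedded = extract_embedded(page)
    if needs_ocr(embedded):
        return run_ocr(page)
    return embedded
```

Deciding per page rather than per file is a design choice of this sketch. It lets a single PDF mix vector and scanned pages, although a production system would likely need a more robust check than a character count.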
Unfortunately, most off-the-shelf OCR engines, which often perform well on typical documents, do not necessarily perform well on construction documents. Procore's AI team developed an in-house OCR engine that outperforms off-the-shelf OCR engines on our construction documents. Our deep-learning-based model is trained on our massive construction dataset so that it can detect text with higher accuracy. Even in extreme cases such as Figure 4, where the text is curved and surrounded by graphical components, our OCR engine is able to detect words correctly.
Figure 4 - A portion of an engineering drawing with text on which most OCR engines do not work
Procore’s AI team helps our customers automate labor-intensive document management by providing a cutting-edge AI solution: document understanding. The solution includes document classification, entity recognition, and custom OCR. The key technical challenge is that construction documents differ from typical documents, so off-the-shelf approaches do not work well. We overcome this challenge by integrating graphical and textual information into our deep learning models. Our AI solution is highly accurate and scales to support our customers across the world.