Funded by the Construction Innovation Hub (CIH), the i-ReC (intelligent regulatory compliance) project was a collaborative effort between BRE Product Validation, Northumbria University, and Heriot-Watt University. The UK is investing heavily in Modern Methods of Construction (MMC) and offsite construction techniques, driven by growing evidence that this form of manufacturing reduces carbon emissions and mitigates the impact of extreme weather. A study, Life Cycle Assessments of The Valentine, found that offsite construction can reduce a project’s carbon emissions by up to 45%, while other studies have shown that extreme weather can extend project timescales by up to 21%, causing economic impact throughout the supply chain (Ballesteros-Perez, Smith, Lloyd-Papworth and Cooke, 2018). The effects of climate change are pushing the construction industry towards offsite manufacturing, which can help keep projects on target while reducing carbon emissions. A key barrier to its adoption, however, is being able to access and monitor changes to BSI standards simply and efficiently.
The aim of the i-ReC project was to develop an automated process for gathering and checking standards, made easily searchable through a semantic search engine. At present, building standard databases depend on operators manually researching and collecting standards. Building a search engine and automating this process increases project efficiency and reduces the risk of human error. The team’s approach was innovative for the industry, applying Natural Language Processing (NLP) alongside Machine Learning (ML) to reduce the manual effort of database maintenance.
The search engine was created by taking a database of standards and constructing a Knowledge Graph based on an ontology relevant to the industry. An ontology is a formal way to generalise large amounts of data into categories such as individuals, classes, properties and axioms. Having a defined ontology makes adding further documents easier, as it acts as a schema indicating how the Knowledge Graph is to be mapped. It can also be used to automate the addition of further documents to a constructed Knowledge Graph, as the structure of the content has been predefined.
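To make the idea of an ontology-as-schema concrete, here is a minimal sketch. The class names, properties and example documents are invented for illustration; they are not the project's actual ontology.

```python
# A tiny ontology: each class declares the properties its nodes may carry.
# These names are illustrative only, not i-ReC's real schema.
ONTOLOGY = {
    "StandardDocument": {"title", "publisher", "covers"},
    "ProductTerm": {"label"},
}

def validate_node(node_class: str, properties: dict) -> bool:
    """Check that a node conforms to the ontology before it enters the graph."""
    allowed = ONTOLOGY.get(node_class)
    if allowed is None:
        return False          # class not defined in the ontology
    return set(properties) <= allowed

# A conforming standard document is accepted:
ok = validate_node("StandardDocument",
                   {"title": "BS EN 13108-1", "publisher": "BSI", "covers": "asphalt"})
# A node with an undeclared property is rejected:
bad = validate_node("ProductTerm", {"label": "asphalt", "colour": "black"})
```

Because every new document is checked against the same predefined structure, adding further standards to the graph can be automated rather than mapped by hand.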
The database was structured as a Knowledge Graph because this helps produce highly relevant search results. A Knowledge Graph is a way to integrate large amounts of data from an array of sources, enhancing the semantic capability of a search engine. The best-known example is arguably Google’s ‘Knowledge Graph’, described on the company’s blog in 2012 as Google’s move from search based on strings to search based on things. For example, a search for “The Shard” in Google returns results about the London building.
With its Knowledge Graph, Google understands that The Shard is a specific building, because the database is a graph of nodes connected by relationships. Our knowledge graph, for example, is made up of source nodes representing the standard documents, with target nodes comprising specific product terms such as ‘asphalt’.
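The node-and-relationship structure can be sketched in a few lines. The standard numbers and product terms below are made up for illustration, and a real graph store would replace the plain list of edges:

```python
# Toy knowledge graph: (source node, relationship, target node) triples.
# Source nodes are standard documents, target nodes are product terms.
edges = [
    ("BS EN 13108-1", "references", "asphalt"),
    ("BS EN 13108-1", "references", "bitumen"),
    ("BS 8204-2", "references", "concrete"),
]

def standards_for(term: str) -> list[str]:
    """Walk the graph backwards from a product term to the standards citing it."""
    return [src for src, rel, tgt in edges
            if rel == "references" and tgt == term]

standards_for("asphalt")  # ['BS EN 13108-1']
```

Because the relationships are explicit, a query for a product term retrieves documents by what they are about, not merely by matching strings.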
To search the knowledge graph, BERT (Bidirectional Encoder Representations from Transformers) is applied. BERT is a Natural Language Processing (NLP) method that researchers at Google presented in 2018. What makes BERT distinctive is that it uses only the encoder part of the Transformer, a type of neural network architecture designed to resolve sequence-to-sequence tasks. Rather than reading tokens sequentially, one element at a time, BERT reads all the tokens of a sentence at once, which allows it to build contextual relationships between the words in a sentence. Broken down, the process consists of positional encoding, attention, and self-attention (Devlin, Chang, Lee and Toutanova, 2019).
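The self-attention step can be illustrated with a stripped-down sketch. For clarity the token embeddings serve directly as queries, keys and values (a real Transformer learns separate projections for each), and the 2-dimensional vectors are toy values:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Scaled dot-product self-attention: every token attends to every
    other token simultaneously, which is how BERT reads a sentence at once."""
    d = len(tokens[0])
    output = []
    for q in tokens:
        # Similarity of this token's query to every token's key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in tokens]
        weights = softmax(scores)
        # Output is a weighted mix of all token values.
        output.append([sum(w * v[i] for w, v in zip(weights, tokens))
                       for i in range(d)])
    return output

# Three toy 2-d token embeddings, each updated in the context of the others.
mixed = self_attention([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
```

Each output vector is a weighted average of all the input vectors, so every word's representation carries context from the whole sentence rather than only from the words before it.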
For the search engine, BERT was pre-trained on a text corpus of sentences drawn from documents relevant to the domain. The aim was to gather a large number of documents for pre-training, as the more content the model sees, the more accurate its results become. Once trained, the model continues to learn from the text it receives, improving its performance over time. This enables the search to interpret the input text more accurately and gather all the relevant standard documents within the knowledge graph.
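Once each sentence has an embedding, semantic search reduces to ranking documents by vector similarity. The sketch below uses hand-made 3-dimensional vectors in place of real BERT embeddings (which are typically 768-dimensional), and the document names are invented:

```python
import math

# Stand-ins for BERT sentence embeddings; vectors and titles are illustrative.
doc_vectors = {
    "BS EN 13108-1 (asphalt mixtures)":       [0.9, 0.1, 0.0],
    "BS 8204-2 (concrete wearing surfaces)":  [0.1, 0.9, 0.1],
    "BS 5534 (slating and tiling)":           [0.0, 0.2, 0.9],
}

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def search(query_vector, top_k=2):
    """Return the top_k standards most similar to the embedded query."""
    ranked = sorted(doc_vectors,
                    key=lambda d: cosine(query_vector, doc_vectors[d]),
                    reverse=True)
    return ranked[:top_k]

# A query embedded near the "asphalt" direction retrieves the asphalt standard first.
results = search([1.0, 0.0, 0.1])
```

In the real pipeline the query vector would come from encoding the user's input text with the trained model, so documents are retrieved by meaning rather than exact keyword overlap.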
Development of the standards search engine formed one piece of the puzzle. It needed a ‘front-end’ to work for users in the field: a way to extract information from design data, check it against the standards held in the search engine, and return results the user can understand and act on. Xbim worked with the university teams to integrate the search engine with our Flex platform. An automated workflow extracts the item terms from the IFC schema of Building Information Models (BIM) and passes them via the search engine API to query the database. The search returns the standards that reference the individual items used within the model. Because the search engine is built on the BERT model, it considers each sentence within the collated documents to return the standards most relevant to the text extracted from the BIM model. The standards are then displayed in our communication channel, giving your wider team the ability to comment, and a copy of the information is also sent to users’ registered email addresses.
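To give a flavour of the extraction step, here is a minimal sketch that pulls material names out of an IFC (STEP) file fragment with a regular expression. This is not xbim's actual pipeline (a production workflow would traverse the IFC model through a proper toolkit such as the xbim libraries), and the entity lines below are invented:

```python
import re

# Invented fragment of an IFC STEP file: two material entities and a wall.
ifc_text = """
#10=IFCMATERIAL('Asphalt');
#11=IFCMATERIAL('Concrete, cast in place');
#12=IFCWALL('2O2Fr$t4X7Zf8NOew3FL9r',$,$,$,$,$,$,$,$);
"""

def extract_material_terms(ifc: str) -> list[str]:
    """Pull material names out of IFCMATERIAL entities - the kind of item
    terms the workflow would send to the search engine API as query text."""
    return re.findall(r"IFCMATERIAL\('([^']+)'\)", ifc)

terms = extract_material_terms(ifc_text)  # ['Asphalt', 'Concrete, cast in place']
```

Each extracted term would then be submitted to the search engine, and the returned standards surfaced back to the team alongside the model.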