SPAR.txt, a cheap Shallow Parsing approach for Regulatory texts

Krupier, Ruben, Konstas, Ioannis, Gray, Alasdair, Sadeghineko, Farhad, Watson, Richard and Kumar, Bimal (2021) SPAR.txt, a cheap Shallow Parsing approach for Regulatory texts. In: Proceedings of the Natural Legal Language Processing Workshop 2021. Association for Computational Linguistics (ACL), Punta Cana, Dominican Republic, pp. 129-143.

2021.nllp-1.14.pdf - Published Version
Available under License Creative Commons Attribution 4.0.

Automated Compliance Checking (ACC) systems aim to semantically parse building regulations into a set of rules. However, semantic parsing is known to be hard and requires large amounts of training data. The complexity of creating such training data has led to research that focuses on small sub-tasks, such as shallow parsing or the extraction of a limited subset of rules. This study introduces a shallow parsing task for which training data is relatively cheap to create, with the aim of learning a lexicon for ACC. We annotate a small domain-specific dataset of 200 sentences, SPAR.txt, and train a sequence tagger that achieves an F1-score of 79.93 on the test set. We then show through manual evaluation that the model identifies most defined terms (89.84%) in a set of building regulation documents, and that both contiguous and discontiguous Multi-Word Expressions (MWEs) are discovered with reasonable accuracy (70.3%).
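Sequence-tagger scores such as the 79.93 F1 above are commonly computed at the span level: predicted BIO tags are decoded into labelled spans, and a span counts as correct only if its label, start and end all match the gold annotation. The sketch below illustrates that convention under stated assumptions. The label name `TERM`, the example tags and the helper names `bio_to_spans` and `span_f1` are hypothetical and do not come from the SPAR.txt tag set, and the paper's own evaluation script may differ. BIO tags only encode contiguous spans, so this sketch does not cover the discontiguous MWEs mentioned in the abstract.

```python
from typing import List, Optional, Tuple

# Hypothetical span representation: (label, start token index, end index exclusive)
Span = Tuple[str, int, int]


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Decode a BIO tag sequence into labelled contiguous spans.

    Assumption: an I- tag that does not continue a span with the same label
    starts a new span (a lenient convention; other tools may treat it differently).
    """
    spans: List[Span] = []
    label: Optional[str] = None
    start = 0
    for i, tag in enumerate(tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label):
            # Close any open span, then open a new one at token i.
            if label is not None:
                spans.append((label, start, i))
            label, start = tag[2:], i
        elif tag == "O":
            # Outside tag: close any open span.
            if label is not None:
                spans.append((label, start, i))
            label = None
        # Otherwise an I- tag continues the current span.
    if label is not None:
        spans.append((label, start, len(tags)))
    return spans


def span_f1(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Exact-match span precision, recall and F1 (on a 0-1 scale) over a corpus."""
    tp = n_pred = n_gold = 0
    for gold_tags, pred_tags in zip(gold, pred):
        gold_spans = set(bio_to_spans(gold_tags))
        pred_spans = set(bio_to_spans(pred_tags))
        tp += len(gold_spans & pred_spans)
        n_pred += len(pred_spans)
        n_gold += len(gold_spans)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical 6-token sentence; the prediction truncates the first two-token span.
    gold = [["O", "B-TERM", "I-TERM", "O", "O", "B-TERM"]]
    pred = [["O", "B-TERM", "O", "O", "O", "B-TERM"]]
    print(span_f1(gold, pred))  # (0.5, 0.5, 0.5)
```

Here the truncated span earns no partial credit, which is why exact-match span F1 is stricter than token-level accuracy. Multiplying the result by 100 gives the 0-100 scale used in the abstract.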

Item Type: Book Section
Subjects: G400 Computer Science
G500 Information Systems
Department: Faculties > Engineering and Environment > Architecture and Built Environment
Depositing User: Elena Carlaw
Date Deposited: 22 Feb 2023 11:26
Last Modified: 22 Feb 2023 11:30