Krupier, Ruben, Konstas, Ioannis, Gray, Alasdair, Sadeghineko, Farhad, Watson, Richard and Kumar, Bimal (2021) SPAR.txt, a cheap Shallow Parsing approach for Regulatory texts. In: Proceedings of the Natural Legal Language Processing Workshop 2021. Association for Computational Linguistics (ACL), Punta Cana, Dominican Republic, pp. 129-143.
Text: 2021.nllp-1.14.pdf (Published Version), available under the Creative Commons Attribution 4.0 License.
Abstract
Automated Compliance Checking (ACC) systems aim to semantically parse building regulations into a set of rules. However, semantic parsing is known to be hard and requires large amounts of training data. The complexity of creating such training data has led to research that focuses on small sub-tasks, such as shallow parsing or the extraction of a limited subset of rules. This study introduces a shallow parsing task for which training data is relatively cheap to create, with the aim of learning a lexicon for ACC. We annotate a small domain-specific dataset of 200 sentences, SPAR.txt, and train a sequence tagger that achieves an F1-score of 79.93 on the test set. We then show through manual evaluation that the model identifies most (89.84%) of the defined terms in a set of building regulation documents, and that both contiguous and discontiguous Multi-Word Expressions (MWEs) are discovered with reasonable accuracy (70.3%).
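The abstract reports an F1-score for a sequence tagger. As a hedged illustration only: this record does not state the tagging scheme or the scorer used, so the sketch below assumes BIO tags and exact-match, span-level F1 in the style of CoNLL evaluation. The `TERM` label and the function names `bio_to_spans` and `span_f1` are hypothetical. It returns F1 as a fraction (the abstract's 79.93 is on a 0 to 100 scale) and handles contiguous spans only; discontiguous MWEs would need a different encoding.

```python
from typing import List, Optional, Set, Tuple

# A labelled span: (label, start index, end index exclusive). Hypothetical format.
Span = Tuple[str, int, int]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect contiguous labelled spans from a BIO tag sequence (assumed scheme)."""
    spans: Set[Span] = set()
    label: Optional[str] = None
    start = 0
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing span
        opens_new = tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label)
        if tag == "O" or opens_new:
            if label is not None:
                spans.add((label, start, i))
                label = None
            if opens_new:
                # An I- tag with no matching open span is leniently treated as a new start.
                label, start = tag[2:], i
    return spans


def span_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro-averaged exact-match span F1 over a list of tagged sentences."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        gs, ps = bio_to_spans(g), bio_to_spans(p)
        tp += len(gs & ps)  # spans predicted with the exact label and boundaries
        fp += len(ps - gs)  # predicted spans absent from the gold annotation
        fn += len(gs - ps)  # gold spans the tagger missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


gold = [["B-TERM", "I-TERM", "O", "B-TERM"]]
pred = [["B-TERM", "I-TERM", "O", "O"]]
print(span_f1(gold, pred))  # one of two gold spans found: precision 1.0, recall 0.5
```

Exact-match scoring gives no credit for partially overlapping spans; a lenient overlap-based metric is a common alternative, but which one (if either) the authors used is not stated here.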
| Field | Value |
|---|---|
| Item Type | Book Section |
| Subjects | G400 Computer Science; G500 Information Systems |
| Department | Faculties > Engineering and Environment > Architecture and Built Environment |
| Depositing User | Elena Carlaw |
| Date Deposited | 22 Feb 2023 11:26 |
| Last Modified | 22 Feb 2023 11:30 |
| URI | https://nrl.northumbria.ac.uk/id/eprint/51471 |