Abstract
In business settings, there is a growing need for automated long-form question answering (LFQA) systems over business documents; however, data for training such systems is not readily available. Developing such datasets requires a costly human annotation stage in which (question, answer, related document passage) triplets must be created. In this paper, we present a method to rapidly develop an LFQA dataset from existing help-desk logs without a manual human annotation stage. The method first builds a Siamese-BERT encoder to relate recorded answers to passages in business documents. For this purpose, the Siamese-BERT encoder is trained on a synthetically created dataset that imitates paraphrased document passages using a noise model. The encoder is then used to create the necessary LFQA triplets from the business documents. We train a Dense Passage Retrieval (DPR) system using a bi-encoder architecture for the retrieval stage and a cross-encoder for re-ranking the retrieved document passages. The results show that the proposed method is successful at rapidly developing LFQA systems for business use cases, yielding a top-1 recall of 85% for the correct answer.
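The dataset-construction idea in the abstract can be sketched as follows. This is a minimal, hypothetical illustration under stated assumptions, not the authors' implementation: `add_noise` is an assumed noise model (random word dropout plus adjacent-word swaps) for producing synthetic paraphrase pairs, `toy_embed` is a bag-of-words stand-in for the trained Siamese-BERT encoder, and the 0.5 similarity threshold in `build_triplets` is an arbitrary choice. `recall_at_k` shows how a top-1 recall figure such as the reported 85% is computed.

```python
import math
import random
import re
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Dict[str, float]


def add_noise(passage: str, rng: random.Random,
              p_drop: float = 0.1, p_swap: float = 0.1) -> str:
    """Hypothetical noise model: drop words at random and swap adjacent
    words to imitate how a help-desk agent might paraphrase a passage."""
    words = passage.split()
    kept = [w for w in words if rng.random() >= p_drop] or words[:1]
    for i in range(len(kept) - 1):
        if rng.random() < p_swap:
            kept[i], kept[i + 1] = kept[i + 1], kept[i]
    return " ".join(kept)


def synthetic_pairs(passages: Sequence[str], seed: int = 0) -> List[Tuple[str, str]]:
    """(noisy paraphrase, original passage) pairs for training a Siamese encoder."""
    rng = random.Random(seed)
    return [(add_noise(p, rng), p) for p in passages]


def toy_embed(text: str) -> Vector:
    """Stand-in for the Siamese-BERT encoder: an L2-normalised bag of words."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {w: c / norm for w, c in counts.items()}


def cosine(a: Vector, b: Vector) -> float:
    # Both vectors are unit-length, so their dot product is the cosine similarity.
    return sum(v * b.get(k, 0.0) for k, v in a.items())


def build_triplets(logs: Sequence[Tuple[str, str]], passages: Sequence[str],
                   embed: Callable[[str], Vector],
                   threshold: float = 0.5) -> List[Tuple[str, str, str]]:
    """Link each logged answer to its most similar passage; keep confident matches."""
    if not passages:
        return []
    passage_vecs = [embed(p) for p in passages]
    triplets = []
    for question, answer in logs:
        a_vec = embed(answer)
        scores = [cosine(a_vec, pv) for pv in passage_vecs]
        best = max(range(len(passages)), key=scores.__getitem__)
        if scores[best] >= threshold:
            triplets.append((question, answer, passages[best]))
    return triplets


def recall_at_k(ranked_ids: Sequence[Sequence[int]], gold_ids: Sequence[int],
                k: int = 1) -> float:
    """Fraction of queries whose gold passage appears in the top-k results."""
    if not gold_ids:
        return 0.0
    hits = sum(1 for ranked, gold in zip(ranked_ids, gold_ids) if gold in ranked[:k])
    return hits / len(gold_ids)


passages = [
    "To reset your password open the account settings page and choose reset password.",
    "Invoices are emailed on the first working day of each month.",
]
logs = [
    ("How do I change my password?", "Open the account settings page and choose reset password."),
    ("Where is my invoice?", "Invoices are emailed on the first working day of the month."),
    ("Can I get a discount?", "Please contact sales."),
]
triplets = build_triplets(logs, passages, toy_embed)
# The unrelated "Can I get a discount?" log matches no passage and is dropped.
for question, answer, passage in triplets:
    print(question, "->", passage)
```

In the described pipeline, the bag-of-words stand-in would be replaced by the Siamese-BERT encoder trained on pairs like those from `synthetic_pairs`, and the resulting triplets would then feed the DPR bi-encoder and cross-encoder re-ranker.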
Field | Value |
---|---|
Original language | English |
Title of host publication | 14th International Conference on Knowledge Discovery and Information Retrieval, KDIR 2022 as part of IC3K 2022 - Proceedings of the 14th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management |
Editors | Frans Coenen, Ana Fred, Joaquim Filipe |
Publisher | Science and Technology Publications, Lda |
Pages | 75-82 |
Number of pages | 8 |
ISBN (Electronic) | 9789897586149 |
Publication status | Published - 2022 |
Event | 14th International Conference on Knowledge Discovery and Information Retrieval, KDIR 2022 as part of 14th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management, IC3K 2022 - Valletta, Malta. Duration: 24 Oct 2022 → 26 Oct 2022 |
Publication series
Field | Value |
---|---|
Name | International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management, IC3K - Proceedings |
Volume | 1 |
ISSN (Electronic) | 2184-3228 |
Conference
Field | Value |
---|---|
Conference | 14th International Conference on Knowledge Discovery and Information Retrieval, KDIR 2022 as part of 14th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management, IC3K 2022 |
Country/Territory | Malta |
City | Valletta |
Period | 24/10/22 → 26/10/22 |
Bibliographical note
Publisher Copyright: Copyright © 2022 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved.
Keywords
- Dense Passage Retrieval
- Long Form Question Answering
- Siamese-BERT