Long Form Question Answering Dataset Creation for Business Use Cases using Noise-Added Siamese-BERT

Tolga Çekiç, Yusufcan Manav, Batu Helvacıoğlu, Enes Burak Dündar, Onur Deniz, Gülşen Eryiğit

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

In business use cases, there is an increasing need for automated long form question answering (LFQA) systems that draw on business documents; however, data for training such systems are not easy to obtain. Developing such datasets requires a costly human annotation stage in which (question, answer, related document passage) triplets must be created. In this paper, we present a method to rapidly develop an LFQA dataset from existing help-desk logs without the need for a manual annotation stage. The method first builds a Siamese-BERT encoder to relate recorded answers to passages of business documents. For this purpose, the Siamese-BERT encoder is trained on a synthetically created dataset that imitates paraphrased document passages by means of a noise model. The encoder is then used to create the triplets needed for LFQA from the business documents. We train a Dense Passage Retrieval (DPR) system that uses a bi-encoder architecture for the retrieval stage and a cross-encoder for re-ranking the retrieved document passages. The results show that the proposed method is successful at rapidly developing LFQA systems for business use cases, yielding 85% recall of the correct answer at the top-1 position of the returned results.
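The dataset-creation idea in the abstract can be illustrated with a minimal Python sketch, under stated assumptions. The noise model below (random word dropping plus adjacent-word swaps), the function names `add_noise`, `make_pairs`, `build_triplets` and `recall_at_k`, and the 0.5 similarity threshold are all hypothetical and not taken from the paper. To keep the example runnable with the standard library alone, a bag-of-words cosine similarity stands in for the trained Siamese-BERT encoder; training the encoder itself on the generated pairs, and the DPR bi-encoder and cross-encoder stages, are not shown.

```python
import math
import random
from collections import Counter


def add_noise(passage: str, rng: random.Random,
              p_drop: float = 0.1, p_swap: float = 0.1) -> str:
    """Hypothetical noise model: randomly drop words and swap adjacent words
    so the output imitates a loosely paraphrased copy of the passage."""
    words = passage.split()
    kept = [w for w in words if rng.random() >= p_drop] or words[:1]
    for i in range(len(kept) - 1):
        if rng.random() < p_swap:
            kept[i], kept[i + 1] = kept[i + 1], kept[i]
    return " ".join(kept)


def make_pairs(passages: list[str], rng: random.Random) -> list[tuple[str, str, float]]:
    """Synthetic Siamese training pairs: (noisy copy, source passage, 1.0) as a
    positive and (noisy copy, a different passage, 0.0) as a negative."""
    pairs = []
    for i, passage in enumerate(passages):
        noisy = add_noise(passage, rng)
        pairs.append((noisy, passage, 1.0))
        others = passages[:i] + passages[i + 1:]
        if others:
            pairs.append((noisy, rng.choice(others), 0.0))
    return pairs


def embed(text: str) -> Counter:
    """Stand-in for the trained Siamese-BERT encoder (assumption): a bag-of-words vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse bag-of-words vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_triplets(logs: list[tuple[str, str]], passages: list[str],
                   threshold: float = 0.5) -> list[tuple[str, str, str]]:
    """Link each logged (question, answer) pair to its most similar passage and
    keep the (question, answer, passage) triplet only if the score clears the threshold."""
    if not passages:
        return []
    passage_vecs = [embed(p) for p in passages]
    triplets = []
    for question, answer in logs:
        answer_vec = embed(answer)
        scores = [cosine(answer_vec, pv) for pv in passage_vecs]
        best = max(range(len(passages)), key=scores.__getitem__)
        if scores[best] >= threshold:
            triplets.append((question, answer, passages[best]))
    return triplets


def recall_at_k(ranked: list[list[str]], gold: list[str], k: int) -> float:
    """Fraction of queries whose gold passage appears among the top-k results."""
    hits = sum(g in r[:k] for r, g in zip(ranked, gold))
    return hits / len(gold) if gold else 0.0


if __name__ == "__main__":
    rng = random.Random(0)
    passages = [
        "reset your password from the account settings page",
        "invoices are sent monthly by email",
    ]
    logs = [("How do I reset my password?",
             "You can reset your password from the account settings page")]
    print(make_pairs(passages, rng))
    print(build_triplets(logs, passages))
    print(recall_at_k([["p1", "p2"], ["p3", "p1"]], ["p1", "p1"], k=1))  # 0.5
```

Thresholding the best match, rather than always accepting it, is one simple way to avoid turning help-desk answers that paraphrase no document passage into noisy training triplets; the actual selection criterion used in the paper may differ.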

Original language: English
Title of host publication: 14th International Conference on Knowledge Discovery and Information Retrieval, KDIR 2022 as part of IC3K 2022 - Proceedings of the 14th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management
Editors: Frans Coenen, Ana Fred, Joaquim Filipe
Publisher: Science and Technology Publications, Lda
Pages: 75–82
Number of pages: 8
ISBN (Electronic): 9789897586149
Publication status: Published - 2022
Event: 14th International Conference on Knowledge Discovery and Information Retrieval, KDIR 2022 as part of 14th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management, IC3K 2022 - Valletta, Malta
Duration: 24 Oct 2022 – 26 Oct 2022

Publication series

Name: International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management, IC3K - Proceedings
Volume: 1
ISSN (Electronic): 2184-3228

Conference

Conference: 14th International Conference on Knowledge Discovery and Information Retrieval, KDIR 2022 as part of 14th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management, IC3K 2022
Country/Territory: Malta
City: Valletta
Period: 24/10/22 – 26/10/22

Bibliographical note

Publisher Copyright:
Copyright © 2022 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved.

Keywords

  • Dense Passage Retrieval
  • Long Form Question Answering
  • Siamese-BERT

