Long Form Question Answering Dataset Creation for Business Use Cases using Noise-Added Siamese-BERT

Tolga Çekiç, Yusufcan Manav, Batu Helvacıoğlu, Enes Burak Dündar, Onur Deniz, Gülşen Eryiğit

Research output: Conference contribution, peer-reviewed

Abstract

In business use cases, there is an increasing need for automated long form question answering (LFQA) systems that work over business documents; however, data for training such systems is not easily obtainable. Developing such datasets requires a costly human annotation stage in which <question, answer, related document passage> triplets must be created. In this paper, we present a method to rapidly develop an LFQA dataset from existing logs of help-desk data without the need for a manual human annotation stage. The method first trains a Siamese-BERT encoder to relate recorded answers to passages in business documents. For this purpose, the Siamese-BERT encoder is trained on a synthetically created dataset that imitates paraphrased document passages using a noise model. The encoder is then used to create the necessary triplets for LFQA from business documents. We train a Dense Passage Retrieval (DPR) system using a bi-encoder architecture for the retrieval stage and a cross-encoder for re-ranking the retrieved document passages. The results show that the proposed method is successful at rapidly developing LFQA systems for business use cases, yielding an 85% recall of the correct answer at the top 1 of the returned results.
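The "recall at the top 1" figure reported in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact evaluation protocol, so the code below assumes one gold passage per question and a ranked list of passage ids per question; the function name `recall_at_k` and all toy ids are hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def recall_at_k(ranked: Dict[str, List[str]], gold: Dict[str, str], k: int = 1) -> float:
    """Fraction of questions whose gold passage id appears in the top-k results.

    Assumption (not from the paper): exactly one gold passage per question.
    """
    if not gold:
        raise ValueError("gold must contain at least one question")
    hits = sum(1 for qid, pid in gold.items() if pid in ranked.get(qid, [])[:k])
    return hits / len(gold)


# Hypothetical toy data: question id -> passage ids returned by the system, best first.
ranked_passages = {
    "q1": ["p3", "p1", "p7"],
    "q2": ["p2", "p9", "p4"],
    "q3": ["p5", "p6", "p8"],
    "q4": ["p1", "p4", "p2"],
}
# Hypothetical gold labels: question id -> the single correct passage id.
gold_passages = {"q1": "p3", "q2": "p9", "q3": "p5", "q4": "p2"}

top1 = recall_at_k(ranked_passages, gold_passages, k=1)
top3 = recall_at_k(ranked_passages, gold_passages, k=3)
print(f"recall@1 = {top1:.2f}, recall@3 = {top3:.2f}")
```

On this toy data only q1 and q3 have the gold passage ranked first, so recall at the top 1 is 0.50, while all four gold passages fall within the top 3.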

Original language: English
Title of host publication: 14th International Conference on Knowledge Discovery and Information Retrieval, KDIR 2022 as part of IC3K 2022 - Proceedings of the 14th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management
Editors: Frans Coenen, Ana Fred, Joaquim Filipe
Publisher: Science and Technology Publications, Lda
Pages: 75-82
Number of pages: 8
ISBN (Electronic): 9789897586149
Publication status: Published - 2022
Event: 14th International Conference on Knowledge Discovery and Information Retrieval, KDIR 2022 as part of 14th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management, IC3K 2022 - Valletta, Malta
Duration: 24 Oct 2022 to 26 Oct 2022

Publication series

Name: International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management, IC3K - Proceedings
Volume: 1
ISSN (Electronic): 2184-3228

Conference

Conference: 14th International Conference on Knowledge Discovery and Information Retrieval, KDIR 2022 as part of 14th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management, IC3K 2022
Country/Territory: Malta
City: Valletta
Period: 24/10/22 to 26/10/22

Bibliographical note

Publisher Copyright:
Copyright © 2022 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved.
