Abstract
When a new request arrives for existing software, two important problems are determining whether reuse is possible and determining where the new request maps onto the existing design. In our working context developers do this manually, so the process depends on experience and domain knowledge, and the human factor makes it error-prone and time-consuming. The main purpose of this study is to correctly predict which new requests in the System Design Document (SDD) match which feature set in the Software Requirement Specification (SRS) document of the existing software. We treat the feature mapping between SDD items and SRS requirements as a multi-label multi-class classification problem. Zemberek, a Turkish natural language processing library, is used for preprocessing and feature extraction on the SRS document of the existing software and on three SDD documents of the different systems to which this software will be delivered. The features extracted from the SRS document are grouped under a fixed number of feature topics using the LDA algorithm. The FastText algorithm and ICSIBoost, an AdaBoost-based classifier, are used to decide which of the topics from the SRS document represent each feature in the SDD document, and the predictions are compared with topics determined manually by experts. ICSIBoost achieves 67% to 90% precision in topic predictions, whereas the FastText algorithm does not meet our expectations on small and imbalanced data.
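The comparison between predicted topics and expert-assigned topics can be illustrated with a small precision calculation. The following is only a sketch under stated assumptions: the abstract does not say how precision is averaged, so micro-averaging and a per-topic breakdown are both shown as possibilities, and the function names and topic labels (`t1`, `t2`, `t3`) are hypothetical.

```python
from typing import Dict, List, Set


def micro_precision(predicted: List[Set[str]], expected: List[Set[str]]) -> float:
    """Fraction of all predicted topic labels that the experts also assigned.

    Assumption: micro-averaging over every (SDD item, topic) prediction.
    """
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    true_positives = sum(len(p & e) for p, e in zip(predicted, expected))
    total_predicted = sum(len(p) for p in predicted)
    return true_positives / total_predicted if total_predicted else 0.0


def per_topic_precision(
    predicted: List[Set[str]], expected: List[Set[str]]
) -> Dict[str, float]:
    """Precision computed separately for each topic that was predicted at least once."""
    predicted_counts: Dict[str, int] = {}
    correct_counts: Dict[str, int] = {}
    for p, e in zip(predicted, expected):
        for topic in p:
            predicted_counts[topic] = predicted_counts.get(topic, 0) + 1
            if topic in e:
                correct_counts[topic] = correct_counts.get(topic, 0) + 1
    return {t: correct_counts.get(t, 0) / n for t, n in predicted_counts.items()}


# Hypothetical example: three SDD items, each with a set of predicted topics
# and a set of topics assigned by experts.
predicted_topics = [{"t1", "t2"}, {"t2"}, {"t3"}]
expert_topics = [{"t1"}, {"t2", "t3"}, {"t1"}]

print(micro_precision(predicted_topics, expert_topics))
print(per_topic_precision(predicted_topics, expert_topics))
```

A per-topic view of this kind can be useful on small, imbalanced data, where a single aggregate figure may hide topics that are rarely predicted correctly.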
Original language | English |
---|---|
Title of host publication | Lecture Notes on Data Engineering and Communications Technologies |
Publisher | Springer Science and Business Media Deutschland GmbH |
Pages | 175-186 |
Number of pages | 12 |
Publication status | Published - 2022 |
Publication series
Name | Lecture Notes on Data Engineering and Communications Technologies |
---|---|
Volume | 143 |
ISSN (Print) | 2367-4512 |
ISSN (Electronic) | 2367-4520 |
Bibliographical note
Publisher Copyright: © 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.
Keywords
- Feature mapping
- Multi-label multi-class classification
- Software product line
- Topic modeling
- Turkish NLP