Litera

Aspects of creating a corporate question-and-answer system using generative pre-trained language models

Golikov Aleksei

Postgraduate student, Department of Russian Language and Literature, Kazan (Volga Region) Federal University (Yelabuga Institute)

109316, Russia, Moscow, Volgogradsky Ave., 42

ag@mastercr.ru
Akimov Dmitrii

ORCID: 0009-0004-2800-4430

PhD in Technical Sciences

Analyst, LLC "Digital solutions workshop"

109316, Russia, Moscow, Volgogradsky Ave., 42

akimovdmitry1@mail.ru
Romanovskii Maksim

Sr. Technology Manager, Deutsche Bank AG

10243, Germany, Berlin, Koppenstraße, 93

maksim.s.romanovskii@gmail.com
Trashchenkov Sergei

ORCID: 0000-0001-8786-8336

Head of the Department of Programming and Computing Technologies of the Academy of Digital Education, LLC «Mobile e-Learning»

127018, Russia, Moscow, Sushchevsky Val, 16, bld. 4

trashchenkov@gmail.com

DOI: 10.25136/2409-8698.2023.12.69353

EDN: FSTHRW

Received: 17-12-2023

Published: 25-12-2023

Abstract: The article describes ways of using generative pre-trained language models to build a corporate question-and-answer system. A significant limitation of current generative pre-trained language models is the cap on the number of input tokens, which prevents them from working "out of the box" with a large document or a large collection of documents. To overcome this limitation, the paper considers indexing the documents and then answering search queries over the index, based on the two currently most popular open-source solutions, the Haystack and LlamaIndex frameworks. It is shown that the open-source Haystack framework at its best settings yields more accurate answers for a corporate question-and-answer system than the open-source LlamaIndex framework, but requires more tokens on average. The article uses comparative analysis to evaluate the effectiveness of generative pre-trained language models in corporate question-and-answer systems built with the Haystack and LlamaIndex frameworks; the results were evaluated with the EM (exact match) metric. The main conclusions of the study are: 1. Hierarchical indexing is currently extremely expensive in terms of tokens (about 160,000 tokens versus about 30,000 tokens on average for sequential indexing), since the answer is generated by sequentially processing parent and child nodes. 2. The Haystack framework at its best settings gives somewhat more accurate answers than the LlamaIndex framework (0.7 vs. 0.67 at the best settings). 3. Answer accuracy with the Haystack framework is less sensitive to the number of tokens per chunk. 4. On average, the Haystack framework consumes about 4 times more tokens than the LlamaIndex framework. 5. The "create and refine" and "tree summarize" answer generation modes of the LlamaIndex framework are approximately equal in answer accuracy, but the "tree summarize" mode requires more tokens.


Keywords: generative language models, information retrieval system, QA system, indexing, Haystack, LlamaIndex, chunk, exact match, token, retriever


1 Introduction

Question-and-answer systems appeared in the 1960s [1] and, like other areas of computational linguistics, have changed significantly in recent years with the development of machine learning technologies. There are two types of question-and-answer systems: extractive and generative [2]. Extractive systems generally return a short answer to the question posed, often as a quote from the set of input documents. For example, to the question "In what year was Lord Byron born?" such a system can answer "in 1788", provided the materials given to it included a biography of Lord Byron, most likely containing the sentence "Lord Byron was born in 1788". It is worth noting that many search engines have a similar function: Google, for example, will display such an exact answer above the links to various sites.

Interest in generative language models (and generative question-and-answer systems in particular) has grown dramatically since the appearance of the large pre-trained models GPT-3 and ChatGPT [3], impressive for their "erudition" and capacity for complex reasoning. Generative question-and-answer systems can answer more complex questions in greater detail. Thus, to the question "What do Lermontov and Byron have in common?" an extractive question-and-answer system will most likely be unable to give an answer if no such comparison appears in the materials given to it, while the ChatGPT model (which in this context can be considered a generative question-and-answer system) gives a detailed answer: both were romantic poets, had reputations as rebels, were at some point exiles, and are known for their lyricism and ability to convey the beauty of nature, etc.

At the moment (December 2023), ChatGPT-3.5 (the free version of ChatGPT) has been trained on a huge amount of publicly available material that existed before January 2022, so it is able to answer a large number of questions across various branches of knowledge, but by default it cannot answer questions about events after January 2022 or about data that was not provided during training. An additional significant advantage would therefore be the ability to feed such models your own data, whether corporate documentation, financial reports or new scientific articles, in order to receive answers and reasoning based on them.

However, powerful and attractive language models such as GPT-3 and ChatGPT have a limit on the number of input tokens: for example, the GPT-3 subspecies text-davinci-003 has a limit of 4,000 tokens per input, i.e., about 3,000 English words. Thus, a large number of documents, or a large document containing more than 4,000 tokens, cannot be submitted directly to the GPT-3 and ChatGPT language models to get answers to questions about them. Another approach is to further train the model on your own additional data, which is not always possible either technically, since it requires significant computing resources, or organizationally, since it requires highly qualified data analysis specialists on the company's staff. A third possible approach is to summarize the text data down to fewer than 4,000 tokens, but a significant part of the information would obviously be lost. In many cases, the most attractive solution is to index the documents and then answer search queries over the index, which can be done either entirely from scratch or with the popular open-source frameworks Haystack and LlamaIndex discussed in this article.
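To make the token limit concrete: the length of a document in tokens can be checked before it is submitted to a model. Below is a minimal sketch using the tiktoken library; the file name document.txt is a placeholder.

```python
import tiktoken

# Tokenizer matching the text-davinci-003 model (the p50k_base encoding)
encoding = tiktoken.encoding_for_model("text-davinci-003")

with open("document.txt", encoding="utf-8") as f:  # placeholder file name
    text = f.read()

n_tokens = len(encoding.encode(text))
print(f"{n_tokens} tokens; text-davinci-003 accepts about 4,000 per request")
```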

It is worth noting that many publications are devoted to research on question-and-answer systems, but most concern extractive systems, since sufficiently high-quality generative large language models appeared later. Among the most recent and relevant scientific materials covering generative question-and-answer systems, the articles [4-6] and the dissertation [7] stand out. The author of that dissertation even created a separate service (https://demo.caire.ust.hk/) that works as a generative question-and-answer system over a large number of articles about coronavirus. However, with due respect to the author of that dissertation and service, thanks to the LlamaIndex framework that appeared after the release of ChatGPT, as well as the possibility of using the Haystack framework together with GPT-3, building such a generative question-and-answer system has become much easier and more accessible, which makes a comparison of the frameworks and their settings relevant and of considerable interest.

2 Expanding the capabilities of large language models through document indexing

The main way to build question-and-answer systems is to use a retriever to find the most relevant parts of the text and then synthesize the answer from the found parts using a so-called reader (in an extractive question-and-answer system) or a generator (in a generative one).

For a more efficient search, it is advisable to pre-index the document or set of documents to be searched. Indexing means identifying and storing some key information about parts of the documents, which later makes it convenient to determine how well a given part of the text matches a search query (Figure 1).

Figure 1: Simplified scheme of the question-answer system using indexing

A simple example is indexing by keywords: for each part of the text, its key terms are stored, and at query time the query terms are compared against the stored keywords of the text sections. Thus, when searching the Wikipedia page about Lord Byron with the query "In what year was Lord Byron born?", a retriever using the keywords "birth" and "born" can find the text section "George Gordon Byron was born on January 22, 1788". The task of the reader is then to extract the required information from the found section, i.e., "1788" in this case.

Such keyword indexing, although simple, is not very effective, since it is not entirely clear which words should count as keywords. In most cases, a preferable indexing method uses the statistical measure TF-IDF, which reflects the importance of a word within the corpus, or BM25, a variation of the TF-IDF algorithm [8, 9]. With TF-IDF, in the question above, the words "year", "born", "lord" and "Byron" are automatically assigned more weight in the search, since they occur less often than the words "in" and "what". However, these methods have a significant drawback: they ignore word order, context, the possibility of replacing words with synonyms, etc.
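As an illustration of this idea, the minimal sketch below ranks text passages against a query by TF-IDF weights using scikit-learn (a stand-in for the production retrievers discussed later); the passages are invented for the Byron example above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy corpus of text sections (invented for illustration)
passages = [
    "George Gordon Byron was born on January 22, 1788.",
    "Byron travelled widely across Europe and settled in Italy.",
    "His narrative poem Don Juan remained unfinished.",
]

vectorizer = TfidfVectorizer()
passage_vectors = vectorizer.fit_transform(passages)

# Rare terms like "born" get higher TF-IDF weight than "in" or "what"
query_vector = vectorizer.transform(["In what year was Lord Byron born?"])
scores = cosine_similarity(query_vector, passage_vectors)[0]
print(passages[scores.argmax()])  # -> the passage with the birth date
```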

With the invention of vector semantic models, it became possible to index a document by mapping a section of text to a representation in a vector space, a so-called embedding [10, 11]. This indexing method captures the semantic context and thereby overcomes the drawbacks of indexing with keywords, TF-IDF and BM25. With the advent of large language models such as BERT, GPT and their variants, it became possible to build sufficiently accurate embeddings in a high-dimensional vector space. In this work, embeddings were built using ada-002, a subspecies of GPT-3 from OpenAI [12], which maps the input text to a vector in a space of dimension 1536.
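A minimal sketch of building such an embedding with the ada-002 model is shown below, assuming the pre-1.0 openai Python client that was current at the time of writing; the API key is a placeholder.

```python
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

response = openai.Embedding.create(
    model="text-embedding-ada-002",
    input=["George Gordon Byron was born on January 22, 1788."],
)
embedding = response["data"][0]["embedding"]
print(len(embedding))  # 1536, the dimension mentioned above
```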

Besides the model for building the index, the index structure can also be chosen: either a set of consecutive embeddings corresponding to consecutive parts of the text (vector store index [13]) (Figure 2), or a hierarchical tree-like index of indexes (tree index) (Figure 3), built by successively summarizing parts of the text from the bottom up.

Figure 2: Sequential indexing

Figure 3: Hierarchical indexing

The LlamaIndex framework also offers two modes of generating an answer from the selected relevant parts of the text: iterative refinement of the answer with each subsequent relevant part ("create and refine" mode) (Figure 4), and hierarchical summarization of the answer over the relevant parts ("tree summarize" mode) (Figure 5).

Figure 4: Iterative improvement of the response

Figure 5: Hierarchical summarization of the response
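As a rough illustration, sequential (vector store) indexing and the two answer generation modes can be configured in LlamaIndex along the following lines. This is a sketch assuming a late-2023 release of llama_index; class and parameter names have changed between versions, and the ./docs folder is a placeholder.

```python
from llama_index import SimpleDirectoryReader, ServiceContext, VectorStoreIndex

documents = SimpleDirectoryReader("./docs").load_data()  # placeholder folder

# Chunk size and overlap comparable to the experiments described below
service_context = ServiceContext.from_defaults(chunk_size=200, chunk_overlap=3)
index = VectorStoreIndex.from_documents(documents, service_context=service_context)

# "refine" corresponds to the create-and-refine mode (Figure 4),
# "tree_summarize" to hierarchical summarization (Figure 5)
for mode in ("refine", "tree_summarize"):
    engine = index.as_query_engine(response_mode=mode)
    print(mode, engine.query("In what year was Lord Byron born?"))
```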

It is worth noting that Haystack, the second popular document indexing framework, does not allow the indexing method and answer generation mode to be chosen as flexibly: it uses sequential indexing by default. Both frameworks can work with various optimized vector stores, such as Weaviate, Pinecone, FAISS and others [14, 15].
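A comparable pipeline in Haystack might look roughly as follows; this is a sketch assuming the Haystack 1.x API that was current at the time, with the API key and the chunks list as placeholders.

```python
from haystack.document_stores import InMemoryDocumentStore
from haystack.nodes import EmbeddingRetriever, OpenAIAnswerGenerator
from haystack.pipelines import GenerativeQAPipeline

chunks = ["...part 1...", "...part 2..."]  # placeholder text chunks

store = InMemoryDocumentStore(embedding_dim=1536)  # ada-002 embedding size
store.write_documents([{"content": c} for c in chunks])

retriever = EmbeddingRetriever(
    document_store=store,
    embedding_model="text-embedding-ada-002",
    api_key="YOUR_API_KEY",  # placeholder
)
store.update_embeddings(retriever)

generator = OpenAIAnswerGenerator(api_key="YOUR_API_KEY", model="text-davinci-003")
pipeline = GenerativeQAPipeline(generator=generator, retriever=retriever)

result = pipeline.run(query="In what year was Lord Byron born?",
                      params={"Retriever": {"top_k": 3}})
print(result["answers"][0].answer)
```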

3 Test results of the question-and-answer system

To assess the quality of the methods described above on text data of sufficiently large volume, a document from April 2022 was selected: the English translation of the so-called "White Paper on Artificial Intelligence" from the Chinese Academy of Information and Communication Technologies (https://cset.georgetown.edu/wp-content/uploads/t0442_AI_white_paper_2022_EN.pdf), containing about 12 thousand words (roughly 16 thousand tokens), which is approximately 4 times the limit that the current GPT-3 model can process at a time. A dataset of questions and answers for this document was compiled manually (human answers were taken as the reference).

The open-source Haystack and LlamaIndex frameworks were used for testing; for the different test scenarios, the source text was divided into parts (so-called chunks) of 20, 100, 200 and 1000 tokens. To minimize the loss of information when splitting the source text, adjacent parts were overlapped by 3 tokens.
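The splitting itself amounts to sliding an overlapping window over the token sequence. The sketch below reproduces it with tiktoken; the article does not state which tokenizer was used, so the davinci-003 encoding is assumed.

```python
import tiktoken

def split_into_chunks(text: str, chunk_size: int, overlap: int = 3) -> list[str]:
    """Split text into token windows of chunk_size with the given overlap."""
    encoding = tiktoken.encoding_for_model("text-davinci-003")  # assumed tokenizer
    tokens = encoding.encode(text)
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(tokens), step):
        chunks.append(encoding.decode(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break
    return chunks

# Chunk sizes used in the experiments: 20, 100, 200 and 1000 tokens
print(len(split_into_chunks("some long document text " * 100, chunk_size=100)))
```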

The full parameters of testing the question-and-answer system with generative pre-trained language models are thus as follows (Table 1).

 

Parameter | Value
Framework | Haystack; LlamaIndex
Retriever | ada-002 (a subspecies of the GPT-3 model)
Generator | davinci-003 (a subspecies of the GPT-3 model)
Index structure (LlamaIndex only) | sequential; hierarchical
Answer synthesis mode (LlamaIndex only) | create and refine; tree summarize
Chunk size | 20; 100; 200; 1000 tokens
Chunk overlap | 3 tokens
Metric | Exact match

Table 1: Test parameters of the question-and-answer system

During testing of the question-and-answer system, the number of tokens consumed was measured in order to assess the potential costs of using the OpenAI models. Since a very inexpensive GPT-3 subspecies, ada-002, was used to create the embeddings, tokens spent on building embeddings were not counted; only the tokens involved in generating answers with the more expensive GPT-3 subspecies davinci-003 were taken into account.

It is worth noting that with the hierarchical tree index structure (Figure 3), 167,510 tokens were spent on the answers of the question-and-answer system, significantly more than the average consumption for answers without this index structure (about 30,000 tokens). Hierarchical indexing was therefore abandoned, and only sequential indexing was used in the rest of this study.

Table 2 shows examples of answers given by the question-and-answer system with different settings to one of the questions asked about the selected document: "By 2025, will there be more data in the world, structured or unstructured, and by how many times?"

No. | Parameters of the question-and-answer system | Answer | Correct
1 | Framework: Haystack; chunk: 20 tokens | By 2025, there will be 163ZB of data in the world, with unstructured data accounting for the majority of the data, estimated to be 80%-90%. | Yes
2 | Framework: Haystack; chunk: 100 tokens | By 2025, there will be more unstructured data, estimated to be 80%-90%, with the global data scale estimated to be 163ZB. | Yes
3 | Framework: Haystack; chunk: 200 tokens | By 2025, the global data scale is estimated to reach 163ZB, with unstructured data accounting for 80%-90%. | Yes
4 | Framework: Haystack; chunk: 1000 tokens | By 2025, the global data scale will reach 163ZB, with unstructured data accounting for 80%-90%. | Yes
5 | Framework: LlamaIndex; indexing: sequential; chunk: 20 tokens; mode: create and refine | By 2025, it is likely that there will be more unstructured data in the world, potentially up to five times more than there is today. This is due to the fact that unstructured data currently accounts for 80% of the world's data, and this percentage is expected to increase as more data is generated. | No
6 | Framework: LlamaIndex; indexing: sequential; chunk: 20 tokens; mode: tree summarize | By 2025, there will likely be more unstructured data in the world, potentially up to five times more than there is today. | No
7 | Framework: LlamaIndex; indexing: sequential; chunk: 100 tokens; mode: create and refine | By 2025, there will be more unstructured data in the world, by a ratio of 80%-90%. | Yes
8 | Framework: LlamaIndex; indexing: sequential; chunk: 100 tokens; mode: tree summarize | By 2025, there will be more unstructured data in the world, by a ratio of 80%-90%. This means that there will be at least 80% more unstructured data than structured data in the world by 2025. | Yes
9 | Framework: LlamaIndex; indexing: sequential; chunk: 200 tokens; mode: create and refine | By 2025, there will be more unstructured data in the world, by a factor of 8-9 times. | Yes
10 | Framework: LlamaIndex; indexing: sequential; chunk: 200 tokens; mode: tree summarize | By 2025, there will be more unstructured data in the world, estimated to be 80%-90% of the total data. This is an increase of approximately 163ZB, which is approximately 163 times more data than there is currently. | Yes
11 | Framework: LlamaIndex; indexing: sequential; chunk: 1000 tokens; mode: create and refine | It is difficult to predict the amount of data in the world by 2025. However, it is likely that the amount of data will increase significantly, both in terms of structured and unstructured data. The growth of data is expected to be driven by the increasing use of artificial intelligence and machine learning technologies, as well as the increasing availability of data sources. | No
12 | Framework: LlamaIndex; indexing: sequential; chunk: 1000 tokens; mode: tree summarize | It is impossible to predict by how many times the amount of data will increase by 2025, but it is likely that there will be more data in the world, both structured and unstructured, by 2025. | No

Table 2: Examples of answers given by the question-and-answer system with different settings to the question "By 2025, will there be more data in the world, structured or unstructured, and by how many times?"

For the question-and-answer system, the proportion of correct answers (see Table 2 for examples) was determined (Figure 6) using the EM (exact match) metric [16, 17]:

EM = M / N,

where M is the number of correct answers and N is the total number of questions in the evaluation dataset.
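In code, the metric is trivial; the sketch below computes it from per-question correctness judgments of the kind shown in the last column of Table 2 (the example counts are hypothetical).

```python
def exact_match(judgments: list[bool]) -> float:
    """EM = M / N: the share of questions answered correctly."""
    return sum(judgments) / len(judgments)

# Hypothetical example: 7 of 10 answers judged correct gives EM = 0.7,
# the best accuracy reported below
print(exact_match([True] * 7 + [False] * 3))
```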

Figure 6: The proportion of correct answers under different settings of the question-and-answer system (blue: Haystack framework; orange: LlamaIndex; no hatching: "create and refine" generation mode; hatching: "tree summarize" generation mode)

The number of tokens spent generating answers with the GPT-3 subspecies davinci-003 was also measured (Figure 7).

Figure 7: The number of tokens spent on answers under different settings of the question-and-answer system (blue: Haystack framework; orange: LlamaIndex; no hatching: "create and refine" generation mode; hatching: "tree summarize" generation mode)

Thus, the highest answer accuracy was shown by the question-and-answer system using the open-source Haystack framework with chunk sizes of 100, 200 and 1000 tokens (in all three cases the accuracy was the same, 0.7). At the same time, as Figure 7 shows, the more tokens in a chunk, the more tokens are required to generate an answer, which is logical: the generator creates an answer by processing the chunks selected by the retriever, and these are larger when the chunks contain more tokens. For the LlamaIndex framework, the "create and refine" and "tree summarize" generation modes are approximately equal in answer accuracy, but the "tree summarize" mode requires more tokens.

4 Conclusion

Generative pre-trained language models (such as ChatGPT) have revolutionized the field of natural language processing. However, their significant limitation is the cap on the number of input tokens, which can be overcome with index data structures. This paper considered building a question-and-answer system on generative pre-trained language models using the two main open-source frameworks, Haystack and LlamaIndex. Based on the "White Paper on Artificial Intelligence" from the Chinese Academy of Information and Communication Technologies, a dataset of questions and answers was compiled to assess the quality of the question-and-answer system under various settings using the exact match metric.

The main results of the study are as follows:

1. Hierarchical indexing is currently extremely expensive in terms of tokens (about 160,000 tokens versus about 30,000 tokens on average for sequential indexing), since the answer is generated by sequentially processing parent and child nodes.

2. The Haystack framework at its best settings gives slightly more accurate answers than the LlamaIndex framework (0.7 vs. 0.67 at the best settings).

3. Answer accuracy with the Haystack framework is less sensitive to the number of tokens per chunk: for chunks of 100, 200 and 1000 tokens, the accuracy was the same, 0.7.

4. On average, the Haystack framework consumes about 4 times more tokens than the LlamaIndex framework.

5. The "create and refine" and "tree summarize" answer generation modes of the LlamaIndex framework are approximately equal in answer accuracy, but the "tree summarize" mode requires more tokens.

Thus, the open-source Haystack framework at its best settings yields more accurate answers when building a corporate question-and-answer system than the open-source LlamaIndex framework, though on average it requires more tokens.

References
1. Simmons, R. F., Klein, S., & McConlogue, K. (1964). Indexing and dependency logic for answering English questions. American Documentation, 15(3), 196-204.
2. Luo, M., Hashimoto, K., Yavuz, S., Liu, Z., Baral, C., & Zhou, Y. (2022). Choose your QA model wisely: A systematic study of generative and extractive readers for question answering. arXiv preprint arXiv:2203.07522.
3. Zhou, C., Li, Q., Li, C., Yu, J., Liu, Y., Wang, G., ... & Sun, L. (2023). A comprehensive survey on pretrained foundation models: A history from BERT to ChatGPT. arXiv preprint arXiv:2302.09419.
4. Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., ... & Kiela, D. (2020). Retrieval-augmented generation for knowledge-intensive NLP tasks. Advances in Neural Information Processing Systems, 33, 9459-9474.
5. Maslyuhin, S. M. (2023). A spoken dialogue-based system with access to an unstructured knowledge base. Scientific and Technical Bulletin of Information Technologies, Mechanics and Optics, 23(1), 88-95.
6. Evseev, D. A., & Burcev, M. S. (2022). Use of graph and text knowledge bases in the dialogue assistant DREAM. Proceedings of the Moscow Institute of Physics and Technology, 14(3), 21-33.
7. Su, D. (2022). Generative long-form question answering: Relevance, faithfulness and succinctness. arXiv preprint arXiv:2211.08386.
8. Kim, M. Y., Rabelo, J., Okeke, K., & Goebel, R. (2022). Legal information retrieval and entailment based on BM25, transformer and semantic thesaurus methods. The Review of Socionetwork Strategies, 16(1), 157-174.
9. Ke, W. (2022, December). Alternatives to classic BM25-IDF based on a new information theoretical framework. In 2022 IEEE International Conference on Big Data (Big Data) (pp. 36-44). IEEE.
10. Rodriguez, P. L., & Spirling, A. (2022). Word embeddings: What works, what doesn't, and how to tell the difference for applied research. The Journal of Politics, 84(1), 101-115.
11. Zherebcova, Yu. A., & Chizhik, A. V. (2020). Comparison of text vector representation models in the task of chatbot creation. Bulletin of Novosibirsk State University. Series: Linguistics and Intercultural Communication, 18(3), 16-34.
12. Digutsch, J., & Kosinski, M. (2023). Overlap in meaning is a stronger predictor of semantic activation in GPT-3 than in humans. Scientific Reports, 13(1), 5035.
13. Kamnis, S. (2023). Generative pre-trained transformers (GPT) for surface engineering. Surface and Coatings Technology, 129680.
14. Khadija, M. A., Aziz, A., & Nurharjadmo, W. (2023, October). Automating information retrieval from faculty guidelines: Designing a PDF-driven chatbot powered by OpenAI ChatGPT. In 2023 International Conference on Computer, Control, Informatics and its Applications (IC3INA) (pp. 394-399). IEEE.
15. Johnson, J., Douze, M., & Jégou, H. (2019). Billion-scale similarity search with GPUs. IEEE Transactions on Big Data, 7(3), 535-547.
16. Rajpurkar, P., Zhang, J., Lopyrev, K., & Liang, P. (2016). SQuAD: 100,000+ questions for machine comprehension of text. arXiv preprint arXiv:1606.05250.
17. Bai, Y., & Wang, D. Z. (2021). More than reading comprehension: A survey on datasets and metrics of textual question answering. arXiv preprint arXiv:2109.12264.

Peer Review


The topic of the reviewed article is certainly relevant; the author addresses the use of question-and-answer systems built on generative pre-trained language model platforms. As noted at the beginning of the study, "question-and-answer systems appeared in the 1960s, and, like other areas of computational linguistics, with the development of machine learning technologies, they have undergone significant changes in recent years. There are two types of question-and-answer systems: extractive and generative. Extractive question-and-answer systems generally provide a short answer to a given question, often in the form of a quote from a set of input documents"; "interest in generative language models (and generative question-and-answer systems in particular) has increased dramatically after the appearance of the large pre-trained GPT-3 and ChatGPT models [3], impressive for their 'erudition' and capacity for complex reasoning." The article is well structured; its volume is sufficient to disclose the topic, present the argumentation and state the conclusions. The author examines in detail the question-and-answer mechanism built on generative pre-trained language models such as ChatGPT, verifies and systematizes the main body of critical sources, and evaluates the productivity of this approach. The style of the work is properly scientific; the article is divided into semantic blocks, and the overall analytical logic is maintained throughout. The material is quite informative: "the main way to build question-and-answer systems is to use a retriever to determine the most relevant parts of the text, and then synthesize the answer from the found parts of the text using a so-called reader (for an extractive question-and-answer system) or a generator (for a generative question-and-answer system)." The quotations are accompanied by commentary; I believe the work can be useful for new research of related thematic focus. The practical component of the material is that "the open-source Haystack and LlamaIndex frameworks were used for testing, the source text for various testing scenarios was divided into parts (so-called chunks) of 20, 100, 200 and 1000 tokens. In order to minimize the loss of information when dividing the source text into parts, one part of the text was superimposed on neighboring parts for 3 tokens"; "when testing the question-and-answer system, the number of spent tokens was measured, which is necessary to assess the potential costs of using OpenAI models. At the same time, since a very inexpensive subspecies of the GPT-3, ada-002, was used to create embeddings, no tokens were counted for building embeddings. The work took into account only the tokens involved in generating the response using the more expensive subspecies of the GPT-3, davinci-003." The data obtained during the analysis is structured in tabular form, and the consolidation of the data into a single block is justified. The formatting standard has been maintained and the necessary notes have been made: for example, "Figure 6: The proportion of correct answers under different settings of the question-and-answer system (blue: Haystack framework; orange: LlamaIndex; no hatching: 'create and refine' generation mode; hatching: 'tree summarize' generation mode)", etc.
The results of the work are summarized as follows: "generative pre-trained language models (such as ChatGPT) have revolutionized the field of natural language processing. However, their significant limitation is the cap on the number of input tokens, which can be overcome with index data structures. The paper considered building a question-and-answer system on generative pre-trained language models using the two main open-source frameworks, Haystack and LlamaIndex. Based on the 'White Paper on Artificial Intelligence' from the Chinese Academy of Information and Communication Technologies, a dataset of questions and answers was compiled to assess the quality of the question-and-answer system under various settings using the exact match metric..."; "the open-source Haystack framework at its best settings yields more accurate answers when building a corporate question-and-answer system than the open-source LlamaIndex framework, though on average it requires more tokens." The list of sources is reflected in the main text, and the citation format is observed. I believe the work is complete, the research topic has been disclosed, and the material may be useful to interested readers and researchers of this problem. I recommend the article "Aspects of creating a corporate question-and-answer system using generative pre-trained language models" for open publication in the scientific journal "Litera" of the publisher "Nota Bene".