Software systems and computational methods
Reference:

The evolution of the Semantic Web technologies: problems and prospects

Lukichev Ruslan Vladimirovich

ORCID: 0000-0002-2293-2410

PhD in Philosophy

Master's degree; Faculty of Software Engineering and Computer Engineering; ITMO University
Director of Development; LLC 'First Festival Company'

198332, Russia, Saint Petersburg, Leninsky ave., 100k3, block 113

ruslanlukichev@gmail.com

DOI: 10.7256/2454-0714.2024.3.71719

EDN: JJXDYW

Received: 16-09-2024

Published: 05-10-2024


Abstract: The article examines key Semantic Web technologies, analyzing their features, problem areas and growth points, a topic that is especially relevant in the context of import substitution and strengthening national information security. Special attention is paid to RDF graphs, which rest on an ontology-oriented approach, and to the OWL language as the main tool for organizing machine-readable data structures with complex relationships between entities and a hierarchy of classes and properties. The article also analyzes the limitations associated with the security of semantic databases and the need to simplify and standardize these technologies and to develop specialized software that meets usability criteria. In addition, it outlines the prospects for further improvement of these technologies in the context of the Internet of Things and artificial intelligence. The article relies on a comprehensive methodological framework built mainly on general scientific methods, in particular the systematic and analytical ones. It summarizes and analyzes current developments in Semantic Web technologies, which made it possible to identify a number of problems that need to be solved. First, the tools available today often have a high entry threshold and an overly complex, inexpressive interface that lacks completion hints and query visualization. Second, the Semantic Web languages need standardization and a common protocol to simplify work with multiformat data aggregated from different sources. Other important issues are ensuring the reliability and relevance of information, its integrity and confidentiality, as well as the context dependence of logical inferences and their correspondence to user requests.
Among the key prospects is the creation of an intelligent autonomous environment in which devices can freely exchange data and interact with each other at the semantic level in order to provide high-quality personalized services. The provisions of the article can serve as a basis for developing domestic systems for structuring and describing data available for machine processing, as well as specialized lecture courses in higher education institutions.


Keywords:

semantic web, ontologies, graph databases, data models, semantic web of things, RDF, RDFS, OWL, SPARQL, XML

This article is automatically translated.

Relevance of the topic and research methodology

The semantic web is a concept proposed exactly a quarter of a century ago by Sir Tim Berners-Lee, the creator of the World Wide Web. On the eve of the Web 2.0 era, he presciently declared that computers must be taught to describe phenomena on their own, then to draw inferences and, finally, to reason [1, p. 184]. In his book "Weaving the Web: The Original Design and Ultimate Destiny of the World Wide Web", the founder and long-standing head of the W3C discussed the need to establish the semantic web as an information network in which data are presented in, or converted into, a machine-readable format so that they can be processed directly or indirectly by computers that constantly interact with one another [1, p. 177].

The semantic web is still taking shape and comprises a number of technologies, including RDF, OWL and SPARQL, which are reviewed and analyzed in this article. This research task is relevant for several reasons.

Firstly, a deeper understanding of the nature of such semantic tools makes it possible to improve the mechanisms of processing big data in a wide variety of subject areas, including medicine, economics, the military-industrial complex, science, education and much more, as well as to improve the process of selecting and interpreting information requested by the user in search engines. All this helps to free up human and time resources allocated to work with data, and automates the exchange of information between various devices, thereby realizing the concept of the Internet of Things.

Secondly, most articles on the semantic web published by Russian authors in recent years either address only some of its aspects, without treating the problems and prospects of its development as a whole, or study the use of such technologies in one particular type of professional activity, ranging from online education [2] and finance and investment [3] to archaeological research [4] and the geoanalytics of road networks [5]. However, the breadth and diversity of uses of semantic tools and of the associated ontology-oriented approach only emphasize the need for an in-depth and comprehensive treatment.

Thirdly, the chosen topic is of particular importance in the context of the problem of import substitution. The study of semantic web technologies will help to create Russian analogues of foreign web services, ensuring independence from foreign companies and increasing the information security of our country. The main conclusions and provisions of this study can be taken as a theoretical basis for the development of domestic systems for structuring and describing data available for machine processing, as well as specialized lecture courses in higher educational institutions.

As for the research methodology, the article relies on a comprehensive methodological framework built mainly on general scientific methods, in particular the systematic and analytical ones, in order to systematize and generalize relevant theoretical and applied research by both domestic and foreign authors. The inductive method also plays an important role in this work: considering the key features of several fundamental semantic technologies in turn helps to form a more holistic view of their current limitations and prospects for further development.

Overview of the main semantic technologies

The starting point in the development of the semantic web can rightfully be considered the creation in 1996 of XML (eXtensible Markup Language), a markup language designed to create, store and transmit structured information in a machine-readable format. An XML document typically begins with a prolog that declares the language version and must contain a single root element, which in turn holds an arbitrary number of nested elements. Each element consists of an opening and a closing tag, may carry attributes and a value, and has a name chosen by the user. For all the advantages of XML, including platform independence, extensibility and a focus on fast and reliable information exchange between programs and devices, it has a significant drawback: it can describe only the structure of data, not the meaning they carry.
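The structure described above (prolog, root element, nested elements with attributes) can be sketched with Python's standard `xml.etree.ElementTree` parser. The document and its tag names (`library`, `book`, `title`, `author`) are invented for illustration. The parser recovers the nesting and the attribute values, but nothing in the markup tells it what `author` means.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: tag and attribute names are chosen by the user.
XML_DOC = """<?xml version="1.0" encoding="UTF-8"?>
<library>
  <book id="b1">
    <title>Weaving the Web</title>
    <author>Tim Berners-Lee</author>
  </book>
</library>"""


def describe(xml_text):
    """Return (tag, attributes, text) for every element in document order."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    return [
        (el.tag, dict(el.attrib), (el.text or "").strip())
        for el in root.iter()
    ]


elements = describe(XML_DOC)
for tag, attributes, text in elements:
    print(tag, attributes, text)
```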

For this reason, the main component of the semantic web has become RDF (Resource Description Framework). It is independent of the subject area and supports a graph data model built from triples, each consisting of a subject (an entity or resource), an object and the relationship between them, called a predicate. Such triples allow a machine to form logical statements from the information provided to it.
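A minimal sketch of the triple model, using only the Python standard library, may make this concrete. The `http://example.org/` identifiers and the predicates `created` and `authorOf` are hypothetical; a real application would normally use an RDF library instead of plain tuples.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical namespace and resources, used only for illustration.
EX = "http://example.org/"

graph = {
    Triple(EX + "TimBernersLee", EX + "created", EX + "WorldWideWeb"),
    Triple(EX + "TimBernersLee", EX + "authorOf", EX + "WeavingTheWeb"),
}


def to_ntriples(triples):
    """Serialize IRI-only triples as N-Triples lines, sorted for stable output."""
    return "\n".join(
        f"<{t.subject}> <{t.predicate}> <{t.obj}> ." for t in sorted(triples)
    )


def objects(triples, subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return {
        t.obj for t in triples
        if t.subject == subject and t.predicate == predicate
    }


print(to_ntriples(graph))
```

Each line of the output is one statement of the form subject, predicate, object, which is the unit a machine can reason over.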

While the RDF model describes individual subject-object relations, relationships between classes of entities are expressed in its extension, RDF Schema (RDFS), a simple ontology language. It makes it possible to define the semantics of a specific area of knowledge by specifying a vocabulary of terms organized into a hierarchy of classes and properties.
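The effect of a class hierarchy can be illustrated with a toy inference loop. The sketch below applies only two RDFS entailment patterns (transitivity of `rdfs:subClassOf` and propagation of `rdf:type` to superclasses). The `ex:` class names are hypothetical, and prefixed names are kept as plain strings for brevity.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

# Hypothetical vocabulary: a small class hierarchy and one instance.
triples = {
    ("ex:Novel", SUBCLASS_OF, "ex:Book"),
    ("ex:Book", SUBCLASS_OF, "ex:Publication"),
    ("ex:WarAndPeace", RDF_TYPE, "ex:Novel"),
}


def rdfs_closure(facts):
    """Apply two RDFS entailment patterns until no new triples appear."""
    inferred = set(facts)
    while True:
        new = set()
        for s, p, o in inferred:
            for s2, p2, o2 in inferred:
                if o != s2 or p2 != SUBCLASS_OF:
                    continue
                if p == SUBCLASS_OF:
                    # A subClassOf B, B subClassOf C  =>  A subClassOf C
                    new.add((s, SUBCLASS_OF, o2))
                elif p == RDF_TYPE:
                    # x type A, A subClassOf B  =>  x type B
                    new.add((s, RDF_TYPE, o2))
        if new <= inferred:
            return inferred
        inferred |= new


closure = rdfs_closure(triples)
```

After the loop, the instance `ex:WarAndPeace` is also known to be an `ex:Book` and an `ex:Publication`, although neither fact was stated explicitly.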

The concept of ontology is one of the key concepts in the semantic web. It denotes a hierarchically structured, formalized body of knowledge about a particular subject area, including a description of its characteristic classes, subclasses and their instances, as well as the properties and dependencies between them. For these purposes, the working groups of the W3C consortium developed a separate language that is more expressive and effective than RDFS: OWL (Web Ontology Language), which offers a wide range of tools for building complex ontologies and is based on description logics. This is "a family of logics created specifically to represent terminological knowledge, which has its own rich history and features well known in the relevant professional community" [6, p. 88].

Today the current version is OWL 2, which extends the original standard and provides additional features for modeling complex relationships between data, such as equivalence and equality, logical combinations of classes, their disjointness and consistency, special property characteristics and support for logical inference. OWL has several dialects, including OWL 2 DL (Description Logic), which offers computational efficiency and effective support for inference based on description logic but is not fully compatible with RDF, and OWL 2 Full, which, by contrast, offers maximum expressive power and complete structural and semantic compatibility with RDF, but without guarantees of effective support for logical inference [6, p. 94].
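One of the listed features, class disjointness, lends itself to a very small illustration. The sketch below is not an OWL reasoner; it only shows the kind of consistency check that a disjointness axiom makes possible. The class and individual names are hypothetical.

```python
from itertools import combinations

# Hypothetical axiom: no individual may be both a Person and an Organization.
disjoint_pairs = {frozenset({"ex:Person", "ex:Organization"})}

# Hypothetical class assertions for three individuals.
types = {
    "ex:Acme": {"ex:Organization"},
    "ex:Alice": {"ex:Person"},
    "ex:Mistake": {"ex:Person", "ex:Organization"},
}


def inconsistent_individuals(assertions, disjoint):
    """Return individuals asserted to belong to two disjoint classes."""
    bad = set()
    for individual, classes in assertions.items():
        for a, b in combinations(sorted(classes), 2):
            if frozenset({a, b}) in disjoint:
                bad.add(individual)
    return bad


violations = inconsistent_individuals(types, disjoint_pairs)
```

A full OWL 2 DL reasoner performs far richer inference, but the principle is the same: axioms about classes let software detect data that contradicts the ontology.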

Another important tool must also be mentioned: just as relational databases have SQL, semantic databases have their own query language for working with RDF graphs, SPARQL (SPARQL Protocol and RDF Query Language). It has an SQL-like syntax and supports various types of operations, including retrieving data from triple stores and creating, modifying and deleting records. SPARQL builds queries from graph patterns and allows these patterns to be combined and split apart, which makes data search and processing more flexible and permits complex queries.
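To show what matching a graph pattern means, the sketch below evaluates a conjunction of triple patterns over a small in-memory graph. It is a toy matcher, not a SPARQL engine. The data and the `ex:` names are hypothetical, and the query it imitates is given in the comment.

```python
# The patterns below imitate this SPARQL query (prefix declarations omitted):
#   SELECT ?book ?author
#   WHERE { ?book rdf:type ex:Novel . ?book ex:author ?author . }

triples = [
    ("ex:Novel", "rdfs:subClassOf", "ex:Book"),
    ("ex:WarAndPeace", "rdf:type", "ex:Novel"),
    ("ex:AnnaKarenina", "rdf:type", "ex:Novel"),
    ("ex:WarAndPeace", "ex:author", "ex:Tolstoy"),
]


def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`; None on conflict."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the value it was bound to earlier.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def select(patterns, data):
    """Evaluate a conjunction of triple patterns (a basic graph pattern)."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in data
            if (extended := match_pattern(pattern, triple, binding)) is not None
        ]
    return bindings


rows = select(
    [("?book", "rdf:type", "ex:Novel"), ("?book", "ex:author", "?author")],
    triples,
)
```

Only `ex:WarAndPeace` satisfies both patterns, because no author is recorded for `ex:AnnaKarenina`.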

Semantic Web: Limitations and growth points

The technologies discussed above ensure efficient work with data on the semantic web, making the data more informative, connected and accessible for machine processing. However, despite significant progress in this area, a number of problems still need to be addressed, and there are promising directions for further improvement and development.

One of the perhaps less obvious problems hindering the widespread adoption of semantic technologies is their complexity and high entry threshold, as well as the lack of intuitive tools. The existing semantic web standards should be simplified to make them more accessible and attractive to novice developers [7, p. 14]. Interfaces should be simple and easy to use for both specialists and ordinary users and should meet usability criteria [8, p. 14587]. Some domestic authors offer their own developments in this regard, for example a SPARQL query editor. Existing solutions lack an expressive interface and offer neither completion hints nor query visualization [9, p. 87]; specialists from Perm State National Research University proposed these features in their own version of such an editor.

In addition, the languages used in the semantic web have many dialects, which significantly complicates their study and standardization. Obviously, it is necessary to introduce a single common protocol, since the aggregation of data from various sources remains a difficult task due to the wide variety of their formats and structures.
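The aggregation problem can be sketched as follows: two sources describe the same kind of entity in different formats and with different field names, and each needs its own mapping before the data can be merged into a single set of triples. The sources, field names and the `ex:` vocabulary here are all hypothetical.

```python
import csv
import io
import json

# Two hypothetical sources with different formats and field names.
CSV_SOURCE = "id,name\n1,Thermostat\n"
JSON_SOURCE = '[{"id": 2, "label": "Lamp"}]'


def from_csv(text):
    """Map CSV rows to (subject, predicate, object) triples."""
    return {
        (f"ex:device/{row['id']}", "ex:name", row["name"])
        for row in csv.DictReader(io.StringIO(text))
    }


def from_json(text):
    """Map JSON objects to the same triple vocabulary."""
    return {
        (f"ex:device/{item['id']}", "ex:name", item["label"])
        for item in json.loads(text)
    }


# Once both sources use one vocabulary, merging is a plain set union.
merged = from_csv(CSV_SOURCE) | from_json(JSON_SOURCE)
```

The per-source mapping functions are exactly the part that a shared protocol or vocabulary would make unnecessary.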

Another problem is the reliability and relevance of data: rapidly changing information requires constant updating and synchronization with primary sources, a very time-consuming process whose performance also needs to be optimized. Equally important are the relevance of the aggregated information to the task set by the user and the context dependence of logical inferences. As foreign researchers note, semantic web standards do facilitate data exchange and integration, but they reveal their full potential only when data can be adapted for various purposes [10, p. 3389].

Finally, one of the key issues for any type of data, including data on the semantic web, is security, including integrity and confidentiality. For example, the problem of preventing attacks based on injecting malicious code into a SPARQL query remains unresolved, and the available proposals for protecting semantic networks do not appear effective [11, p. 41]. A possible solution is to create special cryptographic tools, which are already being developed, in particular, by researchers at the Southern Federal University.
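The injection problem itself can be illustrated with a short sketch. It shows the conventional escaping defence, not the cryptographic approach mentioned above. The query template, the `ex:` predicates and the helper names are hypothetical. Where a library offers parameter binding, for example the `initBindings` argument of rdflib's `Graph.query`, that is usually preferable to manual escaping.

```python
def sparql_string_literal(value):
    """Quote `value` as a double-quoted SPARQL string literal."""
    escaped = (
        value.replace("\\", "\\\\")  # backslash first, so later escapes survive
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def unsafe_query(name):
    # Vulnerable: user input is pasted into the query text unchanged.
    return 'SELECT ?p WHERE { ?p ex:name "' + name + '" }'


def safer_query(name):
    # User input can only ever appear inside one string literal.
    return "SELECT ?p WHERE { ?p ex:name " + sparql_string_literal(name) + " }"


# Hypothetical attack: close the literal early and smuggle in a new pattern.
payload = 'x" . ?p ex:role "admin'

print(unsafe_query(payload))
print(safer_query(payload))
```

In the unsafe variant the payload adds a second triple pattern to the query; in the escaped variant the whole payload remains a single, harmless literal.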

As for the prospects, the main directions for further applied integration of the semantic web today are big data, machine learning, artificial intelligence and the Internet of Things. The concepts and technologies of the semantic web are being actively integrated into networks of connected objects to solve the compatibility problems that limit the development of the Internet of Things, which has led to the emergence of a new term, the Semantic Web of Things (SWoT) [12, p. 265]. Automatic extraction and aggregation of information, the possibility of creating intelligent agents capable of providing personalized services adapted to user needs, and interoperability with a wide range of devices open the way to an intelligent autonomous environment in which devices can freely exchange data and interact with each other at the semantic level.

Conclusion

The semantic web is a key element of the Internet of the future, where knowledge will be equally accessible to both people and machines, with its context and meaning taken into account. The technologies used for this purpose, including XML, RDF, RDFS, OWL and SPARQL, provide opportunities for personalized search, aggregation of information from different sources, accelerated and efficient processing of large amounts of data and information exchange between various devices, and they are used in a wide variety of fields such as education, science, healthcare, transport and finance. Solving the existing problems of the semantic web and pursuing its promising development directions will make it possible to reach a new level of interaction with data, which opens up wide opportunities for subsequent technological innovations.

References
1. Berners-Lee, T. (2000). Weaving the Web: The Original Design and Ultimate Destiny of the World Wide Web. New York, USA: HarperCollins Publishers.
2. Shpolyanskaya, I. Yu., & Seredkina, T. A. (2020). Semantic Web Technologies in Organizing Online Learning Support. Systems Analysis in Design and Management, XXIV(3), 343-350. doi:10.18720/SPBPU/2/id20-231
3. Ivashchuk, Yu. S., Orlyanskaya, N. P., & Teshev, V. A. (2023). Development of a Knowledge Base for Investment Activities Based on Ontological Modeling. Bulletin of Adyghe State University, Series 5: Economics, 2(320), 90-98. doi:10.53598/2410-3683-2023-2-320-90-98
4. Petrov, I. D., & Terekhova, Yu. V. (2020). Expansion of the logical model of the subject area of archaeological research using ontologies. Advances in Chemistry and Chemical Technology, 34(6), 133-135.
5. Smirnov, A. V., & Teslya, N. N. (2023). Ontology-oriented geoanalytics for determining the locations of traffic accidents on sections of the street and road network. Proceedings of the Kola Science Center of the Russian Academy of Sciences. Series: Technical Sciences, 14(7), 79-85. doi:10.37614/2949-1215.2023.14.7.008
6. Antoniou, G., Groth, P., van Harmelen, F., & Hoekstra, R. (2016). The Semantic Web. Moscow, Russia: DMK Press.
7. Hogan, A. (2020). The Semantic Web: Two decades on. Semantic Web, 11, 169-185. doi:10.3233/SW-190387
8. Hassan, B. (2015). Towards Semantic Web: Challenges and Needs. International Journal Of Engineering And Computer Science, 4(10), 14585-14588. doi:10.48550/arXiv.2105.02708
9. Turova, I. A., & Postanogov, I. S. (2021). Development of an intelligent editor for SPARQL queries. Bulletin of the Novosibirsk State University. Series: Information Technologies, 19(4), 85-95.
10. Jat, A. (2020). Semantic web technologies: challenges and applications. Journal of Critical Reviews, 7(17), 3388-3390. doi:10.31838/jcr.07.17.417
11. Chudinov, P. Yu., Babenko, L. K., & Rogozov, Yu. I. (2022). Analysis of information security problems in semantic networks. Bulletin of the Southern Federal University. Technical sciences, 5(229), 37-47.
12. Amara, F. Z., Hemam, M., Djezzar, M., & Maimor, M. (2022). Semantic Web and Internet of Things: Challenges, Applications and Perspectives. Journal of ICT Standardization, 10(2), 261-291. doi:10.13052/jicts2245-800X.1029

Peer Review

Peer reviewers' evaluations remain confidential and are not disclosed to the public. Only external reviews, authorized for publication by the article's author(s), are made public. Typically, these final reviews are conducted after the manuscript's revision. Adhering to our double-blind review policy, the reviewer's identity is kept confidential.

The reviewed article is devoted to the evolution of semantic web technologies and examines the problems and prospects of their development. The research methodology is based on general scientific methods, in particular a systematic approach and an analytical method, which are used to systematize and generalize relevant theoretical and applied research by domestic and foreign authors, as well as an inductive method for considering the features of various semantic technologies in turn. The authors rightly associate the relevance of the work with the need to understand the nature of the semantic tools in use, to present the problems and prospects of semantic web technologies comprehensively, and to create Russian analogues of foreign web services. The scientific novelty of the research consists in identifying and systematizing the problems and prospects of the evolution of semantic web technologies. Structurally, the article comprises the following sections: Relevance of the topic and research methodology, Overview of the main semantic technologies, Semantic Web: limitations and growth points, Conclusion, and Bibliography. The publication provides an overview of the following technologies: XML (eXtensible Markup Language), RDF (Resource Description Framework), OWL (Web Ontology Language), OWL 2 DL (Description Logic), OWL 2 Full, and SPARQL (SPARQL Protocol and RDF Query Language). The problems hindering the widespread adoption of semantic technologies include their complexity and high entry threshold and the lack of intuitive tools; the many dialects that make them difficult to study and standardize; and the need for constant updating and synchronization with primary sources in a rapidly changing information environment.
In conclusion, the authors state that the XML, RDF, RDFS, OWL and SPARQL technologies provide opportunities for personalized search, aggregation of information from different sources, accelerated and efficient processing of large amounts of data and information exchange between various devices, and that they are used in a variety of fields such as education, science, healthcare, transport and finance. Naturally, the publication does not contain solutions to all the problems of semantic web technologies, but the work done to generalize and systematize them seems necessary. Speaking about the prospects for the development of the technologies under consideration, the authors note the main directions for further applied integration of the semantic web: big data, machine learning, artificial intelligence and the Internet of Things. The bibliographic list includes 21 sources, publications by domestic and foreign scientists in Russian and English on the topic under consideration, which are cited at the appropriate points in the text, confirming that the authors engage with the positions of other researchers. The reviewed material corresponds to the scope of the journal "Software Systems and Computational Methods", reflects the results of the work carried out by the authors, contains elements of scientific novelty and practical significance, may be of interest to readers, and is recommended for publication.