








This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Philology: scientific researches
Reference:
Sokolov, A.V. (2025). Computational Methods in Comparative Media Analysis: Operationalization of Conflict Narrative Research Using the Example of CNN and Al Jazeera. Philology: scientific researches, 12, 103–120. https://doi.org/10.7256/2454-0749.2025.12.77328
Computational Methods in Comparative Media Analysis: Operationalization of Conflict Narrative Research Using the Example of CNN and Al Jazeera
DOI: 10.7256/2454-0749.2025.12.77328
EDN: PXYKFT
Received: 12/16/2025
Published: 12/25/2025

Abstract: The subject of the research is the application of computational methods to the comparative analysis of media narratives that shape representations of armed conflicts in the global media landscape. The object of the study is news material from the international media outlets CNN and Al Jazeera related to the Israeli-Palestinian conflict. The author examines the operationalization of theoretical concepts in media analysis, the comparison of narrative strategies across media, and the possibilities for scalable analysis of media discourse. Special attention is given to how media not only reflect events but also construct interpretive models of conflict through framing, tonality, and attribution of agency. The article examines differences in thematic priorities, emotional coloring, and narrative structure in material from two media systems that represent different political and institutional contexts. It discusses the limitations of traditional qualitative methods when working with large text corpora and the need to integrate computational tools into media research. The study aims to identify persistent narrative differences and to build a reproducible analytical framework for comparative studies of media discourse on conflicts. The methodology is a hybrid approach that combines natural language processing, topic modeling, and interpretative analysis using large language models. The main findings reveal systematic differences in the narrative and discursive strategies of CNN and Al Jazeera when covering the same conflict.
It has been established that Al Jazeera's materials are characterized by a consistently negative tonal background and an emphasis on the humanitarian dimension of the conflict, while CNN creates a more fragmented narrative that alternates between negative and positive storylines related to diplomatic events. The novelty of the research lies in the development and testing of a meta-model for comparative media analysis that combines quantitative computational methods with qualitative interpretation of media texts. A significant contribution of the author is the operationalization of classic theories of media studies through specific NLP metrics and procedures, which enhances the reproducibility and transparency of the analysis. It has been shown that the integration of large language models expands the possibilities for semantic comparison of narratives without losing analytical depth. The author concludes that institutional factors in media systems continue to shape media reality even in the context of digitalization and automated analysis.

Keywords: comparative media analysis, computational hermeneutics, natural language processing, large language models, media narratives, framing, operationalization, meta-model, conflict discourse, media analysis

This article is automatically translated.

Introduction

The modern global media sphere shows an intensifying information confrontation in which media organizations construct competing narrative realities. The Middle East conflict is a paradigmatic case of a mediatized confrontation, where the information dimension becomes no less significant than the military-political one [25]. Al Jazeera and CNN, as global media brands, convey diametrically opposed geopolitical positions: the Qatari channel focuses on the Palestinian perspective [6], while the American corporation supports the strategic alliance of the United States with Israel.
The relevance of the study is determined by several factors. First, the critical theory of media holds that the media are not a neutral mirror of reality but act as an element of institutional power that shapes public opinion and ideology: by selecting and emphasizing certain frames they construct meanings and thereby participate in the production of social reality rather than merely reflecting it [15, 14]. Second, traditional qualitative methods of media analysis have limited scalability, which hinders the processing of large corpora. Third, modern computational methods integrate natural language processing (NLP), topic modeling, and large language models (LLMs), providing new tools for large-scale analysis of media discourse [12, 8, 24].

A review of current work shows a lack of research in the Russian academic tradition that operationalizes the theoretical concepts of media analysis through computational methods. Foreign works on computational media studies focus on automated content analysis [9], LLM-assisted coding [10], and sentiment analysis of large datasets [17]. The Russian-language field is dominated by theoretical and review works on framing and agenda-setting [5, 2, 3, 4], whereas systematic operationalization through NLP and LLMs remains sporadic [1].

The scientific novelty of the present study lies in the development of a meta-model for the comparative study of media narratives that integrates critical media theory, computational social science, and hermeneutic interpretation. The study attempts a systematic operationalization of media analysis based on the synthesis of NLP libraries with generative AI models, ensuring reproducibility together with theoretical validity of the conclusions. The aim of the research is to develop and empirically validate a meta-model for comparative analysis of media narratives using computational hermeneutics.
The research objectives include: theoretical substantiation of the concept of computational hermeneutics in media analysis; operationalization of framing, agenda-setting theory, and the propaganda model through NLP metrics; development of the architecture of a hybrid analytical platform; and empirical testing of the meta-model on a comparative analysis of Al Jazeera and CNN. The theoretical significance lies in the validation of classical media research paradigms in the digital media environment and the expansion of the methodological tools of computational media research. The practical significance lies in the development of a set of software tools for automated comparative analysis, relevant for journalistic practice, media education, and political analytics.

Research methodology

The methodological framework of the research is based on the concept of computational hermeneutics[1], that is, interpretation through computation, and synthesizes objectifying NLP procedures with a hermeneutic understanding of semantic structures [27]. The meta-model of comparative research is structured in four interrelated dimensions. The ontological dimension postulates that media narratives do not reflect but constitute social reality through selection, framing, and attribution of meanings to events. The epistemological dimension is based on methodological pluralism: quantitative NLP methods objectify patterns in the surface structure of the text, qualitative interpretation reveals deep semantic structures, and LLM-mediated analysis implements scalable hermeneutics. The methodological dimension operationalizes three levels of comparison.
The first level is a quantitative comparison of linguistic patterns: the frequency distribution of lexical units, the tonality of discourse measured with the VADER sentiment analyzer [17], the density of named entities obtained with spaCy Named Entity Recognition[2], and thematic clusters obtained with Latent Dirichlet Allocation[7]. The second level is a semantic comparison of narrative structures: dominant frames according to R. Entman's model [14], attribution of agency, causal schemes, and modality of statements. The third level is a pragmatic comparison of discursive effects: legitimizing strategies, construction of an enemy image, and formation of affective communities. The institutional dimension integrates the structural analysis of media systems through the propaganda model of E. Herman and N. Chomsky [15]: the filters of ownership, advertising, sources, and ideology are operationalized as independent variables that determine narrative production. Comparative analysis reveals how different filter configurations produce divergent narratives for Al Jazeera (government funding, regional geopolitical position) and CNN (corporate ownership, advertising dependence, alignment with American foreign policy).

The research is implemented through a two-component analytical platform that integrates computational linguistics methods, topic modeling, and transformer-based semantic comparison. The first module is a web scraping and NLP processing system written in Python with the following libraries: requests for HTTP communication, beautifulsoup4 for parsing the DOM structure, newspaper3k for structured content extraction, spaCy for linguistic annotation, VADER [17] for sentiment analysis, geopy for geocoding, and gensim for topic modeling with Latent Dirichlet Allocation (LDA). The second module is a React-based web application for comparative AI analysis using the Google Gemini API.
The empirical basis of the study is a two-part corpus of news materials from the "Middle East" sections of the Al Jazeera and CNN news outlets for October 31, 2025[3]. Corpus parameters: Al Jazeera, 15 publications; CNN, 20 publications; total volume, 35 news items; total tokenized length, approximately 89,000 tokens. The representativeness of the sample rests on the identical collection period for both sources and on complete coverage of the publications in the relevant section for that period.

The data collection procedure included: HTTP requests to the target pages of the "Middle East" categories; parsing with BeautifulSoup using site-specific selectors; filtering of relevant URLs by structural patterns; concurrent processing via ThreadPoolExecutor; and structured data extraction with newspaper3k. The linguistic annotation included: Named Entity Recognition with the spaCy en_core_web_sm model to identify persons, geopolitical entities, and organizations; sentiment analysis with VADER to calculate a compound tonality score in the range from -1 to +1 (sentiment score); geocoding via the Nominatim API to convert geographical entities into coordinates; and topic modeling using LDA with the following parameters: number of topics K = 5, passes = 10, Variational Bayes algorithm. LLM integration was carried out through the Google Gemini API with the following configuration: the gemini-2.5-pro model, temperature 0.2 to minimize stochasticity, responseMimeType application/json for structured output, and responseSchema for validating the response structure.
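As a minimal sketch of the LLM configuration described above, the generation settings can be held in a plain dictionary, and a small client-side check can confirm that a JSON response carries the expected top-level fields. The field names are the ones the article specifies for the prompt's output format; treating all of them as required, and the helper name `parse_llm_response`, are assumptions made for illustration. The sketch does not call the Gemini API and does not reproduce the article's actual responseSchema.

```python
import json

# Generation settings as reported in the article.
GENERATION_CONFIG = {
    "temperature": 0.2,
    "responseMimeType": "application/json",
}

# Field names taken from the article's output-format description.
# Assumption: every field is treated as required in this sketch.
REQUIRED_FIELDS = (
    "overallSummary",
    "alJazeeraAnalysis",
    "cnnAnalysis",
    "comparisons",
    "comparativeInsight",
)


def parse_llm_response(raw_text: str) -> dict:
    """Parse a JSON response and verify that all expected fields are present."""
    data = json.loads(raw_text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    missing = [name for name in REQUIRED_FIELDS if name not in data]
    if missing:
        raise ValueError(f"response is missing fields: {missing}")
    return data


if __name__ == "__main__":
    # Hypothetical response used only to exercise the check.
    example = json.dumps({name: "..." for name in REQUIRED_FIELDS})
    print(sorted(parse_llm_response(example)))
```

A check of this kind only verifies structure; it says nothing about whether the interpretation inside each field is sound, which is why the workflow still relies on human review.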
The prompt architecture is structured in four sections: role specification (an expert in media analysis of Middle Eastern journalism), task formulation (identification of dominant narratives, comparison of sentiment scores, analysis of keyword divergence), data injection (serialized corpora in JSON format), and definition of the output format (a JSON Schema specification with the fields overallSummary, alJazeeraAnalysis, cnnAnalysis, comparisons, comparativeInsight). The results were validated through computational triangulation, that is, the convergence of the conclusions of different methods. If statistical NLP analysis, LDA topic modeling, and LLM interpretation produce congruent conclusions about narrative differences, the validity of the conclusion increases. Divergence between the methods indicates the need for in-depth qualitative interpretation to resolve the contradictions.

Results

The analysis of tonality with the VADER sentiment analyzer revealed a systematic divergence in the emotional polarity of the texts[4]. The average tone of Al Jazeera publications shows a high degree of negativity: M = -0.67, SD = 0.45. The average tonality of CNN is M = -0.24, SD = 0.78. In absolute terms the Al Jazeera mean is about 2.8 times the CNN mean (0.67 / 0.24 ≈ 2.8); in other words, Al Jazeera's negative bias exceeds CNN's by roughly 180% in relative terms, or is almost three times as strong. This figure compares the absolute values of the two means and should not be read as a direct percentage reduction in negative tonality. The categorical distribution of tonality shows that Al Jazeera is characterized by a uniformly negative discourse: 93.3% of publications are classified as highly negative, which indicates a strategy of constructing an image of humanitarian catastrophe and systemic violence.
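The arithmetic behind the comparison of the two mean tonality scores can be made explicit. The sketch below uses only the means reported above; the function name `compare_negativity` is hypothetical. It separates three quantities that are easy to conflate: the ratio of absolute means, the relative excess of the more negative source, and the relative reduction of the less negative one.

```python
AL_JAZEERA_MEAN = -0.67  # mean VADER compound score reported for Al Jazeera
CNN_MEAN = -0.24         # mean VADER compound score reported for CNN


def compare_negativity(mean_a: float, mean_b: float) -> tuple[float, float, float]:
    """Compare two negative means by absolute value.

    Returns (ratio, relative_excess, relative_reduction), where
    ratio              = |a| / |b|
    relative_excess    = (|a| - |b|) / |b|   (how much more negative a is than b)
    relative_reduction = (|a| - |b|) / |a|   (how much less negative b is than a)
    """
    abs_a, abs_b = abs(mean_a), abs(mean_b)
    ratio = abs_a / abs_b
    relative_excess = (abs_a - abs_b) / abs_b
    relative_reduction = (abs_a - abs_b) / abs_a
    return ratio, relative_excess, relative_reduction


if __name__ == "__main__":
    ratio, excess, reduction = compare_negativity(AL_JAZEERA_MEAN, CNN_MEAN)
    print(f"ratio of absolute means: {ratio:.2f}")      # 2.79
    print(f"Al Jazeera relative excess: {excess:.0%}")  # 179%
    print(f"CNN relative reduction: {reduction:.0%}")   # 64%
```

The "about 180%" figure therefore describes how much more negative the Al Jazeera mean is relative to the CNN mean; expressed the other way round, the CNN mean is about 64% less negative.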
CNN demonstrates a bipolar distribution: 60% of materials are highly negative and 35% are highly positive, which indicates an editorial policy of balancing critical reporting (strikes, casualties) with positive diplomatic narratives (truce, hostage release). The absence of a gray area in both corpora indicates a high degree of polarization of the media discourse. The most negative Al Jazeera publication received a score of -0.9985, with the headline "Acute trauma: The ever-present wounds of Gaza's children from Israel's war"; it focuses on a clinical description of the psychological trauma of children and uses medical terminology to heighten the emotional impact. The most positive CNN publication received a score of +0.9983, with the headline "As Israel celebrates the hostages' homecoming, Trump basks in the spotlight"; it focuses on diplomatic triumph, the emotions of liberation, and the role of the United States as a mediator.

Frequency analysis of named entities with spaCy NER revealed contrasting geopolitical optics in the two sources[5]. Al Jazeera shows the dominance of Israel (70 mentions) as the primary aggressor; a high representation of the RSF (Rapid Support Forces) and El Fasher (35 and 34 mentions, respectively) as a focus on the Sudanese conflict; the Palestinians (18 mentions) as a collective victim; and Al Jazeera self-reference (20 mentions) as a meta-discursive marker of its own role as an information actor. CNN shows the dominance of Gaza (207 mentions) and Hamas (188 mentions) as a territorial and organizational focus rather than an ethnic one; D. Trump (63 mentions) as a focus on the role of the United States as a diplomatic mediator; Iran (64 mentions) as a geopolitical extension of the narrative to regional players; and the UN (35 mentions) as an emphasis on international institutions.
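The entity frequency counts above can be reproduced in outline with a simple counter. The sketch assumes the entities have already been extracted as (text, label) pairs, which in spaCy correspond to `ent.text` and `ent.label_` for each `ent` in `doc.ents`; the function name `top_entities`, the choice of labels to keep, and the sample data are hypothetical and are not taken from the article's corpus.

```python
from collections import Counter


def top_entities(entities, keep_labels=("PERSON", "GPE", "ORG", "NORP"), n=5):
    """Count entity mentions and return the n most frequent (text, count) pairs.

    entities: iterable of (text, label) pairs, e.g. from spaCy's doc.ents.
    keep_labels: entity labels to include (an assumption for this sketch).
    """
    counts = Counter(
        text.strip() for text, label in entities if label in keep_labels
    )
    return counts.most_common(n)


if __name__ == "__main__":
    # Hypothetical, hand-written sample; not data from the study.
    sample = [
        ("Gaza", "GPE"), ("Hamas", "ORG"), ("Gaza", "GPE"),
        ("Friday", "DATE"), ("Gaza", "GPE"), ("Hamas", "ORG"),
        ("Iran", "GPE"),
    ]
    for text, count in top_entities(sample, n=3):
        print(text, count)
```

Raw counts of this kind depend on corpus size (15 versus 20 articles here), so comparing sources fairly may call for normalizing by the number of articles or tokens.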
A contrasting discursive structure is manifested in chains of actors: Al Jazeera constructs the sequence Israel – US – Palestinians as oppressor – ally – victim, whereas CNN constructs the sequence Gaza – Hamas – Israel as territory – organization – state. This differentiation points to different ways of assigning agent and patient roles in the media narratives.

The extraction of keywords with the TF-IDF algorithm revealed semantic patterns of lexical priorities[6]. Al Jazeera uses the vocabulary of humanitarian crisis: "prisoners" (5 mentions), "bodies" (5), and "killed" (3) show the dominance of lexical units of death and imprisonment; "ceasefire" (8) and "truce" (6) show a focus on ending violence as an imperative. CNN uses the vocabulary of the diplomatic process: "hostages" (10 mentions) quantitatively exceeds "prisoners" and acts as a terminological frame that legitimizes Israeli operations through the semantics of hostage-taking, implying the legal validity of the release; "deal" (3) and "Trump" (3) mark the commercialization of diplomacy and the personification of the process; "war" (7) explicitly designates the conflict as military action. The terminological divergence between "prisoners" and "hostages" represents a semantic struggle for legitimacy: "prisoners" depersonalizes and criminalizes the status of detainees, implying the illegality of detention; "hostages", in turn, legally justifies the Israeli position through the framework of anti-terrorism, constructing a legal reality in which military operations are legitimate as the release of captives.

Topic modeling using LDA with the parameters K = 5 and passes = 10[7] revealed structural differences in latent topics.
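The TF-IDF weighting used for the keyword extraction described above can be sketched in a few lines of standard-library Python. The article does not state which TF-IDF variant or implementation was used, so the formula here (term frequency normalized by document length, multiplied by the natural logarithm of N / df) is an assumption, as are the function name `tf_idf` and the toy documents.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Compute TF-IDF scores for tokenized documents.

    docs: list of token lists.
    Returns one {term: score} dict per document, using
    tf = count / document length and idf = ln(N / document frequency).
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # each document counts a term at most once

    scores = []
    for tokens in docs:
        term_counts = Counter(tokens)
        length = len(tokens)
        scores.append({
            term: (count / length) * math.log(n_docs / doc_freq[term])
            for term, count in term_counts.items()
        })
    return scores


if __name__ == "__main__":
    # Toy documents for illustration only; not drawn from the study's corpus.
    toy_docs = [
        ["hostages", "deal", "war"],
        ["prisoners", "bodies", "war"],
    ]
    for doc_scores in tf_idf(toy_docs):
        print(sorted(doc_scores.items(), key=lambda item: -item[1]))
```

In this toy case "war" occurs in both documents and scores zero, while source-specific terms such as "hostages" and "prisoners" receive positive weight, which is the property that makes TF-IDF useful for surfacing divergent vocabulary.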
Al Jazeera constructs a multi-location narrative: "The humanitarian crisis in Gaza" as the dominant topic (probability 0.030 for Israel, 0.023 for nuclear), "Sudan/Darfur conflict" as a parallel humanitarian disaster, "Lebanon/Hezbollah dynamics" as a regional extension, "Middle East geopolitics" as contextualization, and "Journalist targeting" as a meta-discourse of repression against the media. CNN shows a single-focus structure: all five topics are centered on the Israel–Hamas–Gaza triad with varying accents: "Hostage negotiations", "Gaza military operations", "Trump diplomacy", "Ceasefire implementation", and "Regional security threats".

Geographical analysis with Nominatim geocoding revealed the spatial structure of the discourse[8]. Al Jazeera shows a dispersed geography: "Sudan" (3 mentions), "Darfur" (2), "Khartoum" (2), and "North Darfur" (2), for a total of 9 mentions of Sudanese locations, which indicates a strategy of conflict parallelism in which Gaza and Darfur appear as equivalent zones of humanitarian crisis. CNN shows a concentrated geography: "Gaza" (15 mentions), "Gaza City" (10), and "Gaza Strip" (5), for a total of 30 mentions of Gaza territories; Israel is detailed through "Tel Aviv" (6 mentions) and "Jerusalem" (5) as a representation of a differentiated space; and diplomatic hubs appear through Egypt (9 mentions) and Qatar (4) as an emphasis on intermediary states.

The integration of AI analytics through the Google Gemini API with a temperature of 0.2 revealed implicit narrative structures[9]. Construction of agency: Al Jazeera attributes active agency to Israel as the subject of violence, while the Palestinians are represented as objects of suffering; CNN distributes agency among Hamas as the initiator, Israel as a reacting actor, and D. Trump as a mediator.
The temporality of the narrative: Al Jazeera constructs a continuum of trauma through the discursive markers "ever-present wounds" and "ongoing" and through the historical context of colonialism; CNN creates discrete events through the markers "ceasefire enters second week" and "following agreement" and through the procedural nature of negotiations. Legitimization strategies: Al Jazeera legitimizes resistance by dehumanizing the occupier through the terms "no mercy" and "mass killings"; CNN legitimizes Israeli actions through the legal framing "after violating the ceasefire" and "staging hostage discovery".

Discussion of the results

The results obtained demonstrate the heuristic validity of the developed meta-model for the comparative study of media narratives. The convergence of the conclusions of three methodological paradigms (statistical NLP analysis, topic modeling, and LLM interpretation) supports the concept of computational triangulation as a validation mechanism in computational media research. The quantitative metrics of tonality correlate with the qualitative findings of the Gemini analysis: a statistically significant difference in the average tonality scores (-0.67 versus -0.24; t-test, p < 0.001) corresponds to the LLM's identification of the construction of humanitarian disaster versus diplomatic management.

The results confirm the theory of framing in the context of the digital media environment. Al Jazeera implements a frame of systemic violence: diagnosing the problem as "genocide of the Palestinian people," attributing causality to "Israel with the support of the United States," giving the normative assessment "morally unacceptable," and prescribing the solution "ending the occupation."
CNN implements a frame of geopolitical management: diagnosing the problem as a "protracted military conflict," attributing causality as "Hamas as initiator, Israel as reacting actor," giving the normative assessment "tragic but requires an understanding of security concerns," and prescribing the solution "a diplomatic settlement mediated by the United States." This divergence supports the thesis that frames are determined not only by what they include but also by what they omit, registering the "imprint of power" in news texts.

The propaganda model (Herman and Chomsky) remains relevant for explaining systematic patterns of media production. CNN exhibits the classic corporate filters: ownership by Warner Bros. Discovery creates shareholder pressure for profitability, which intensifies the focus on "advertiser-friendly content"; advertising dependence creates economic pressure toward content that attracts a wealthy audience; dependence on official sources is manifested in the predominance of quotations from Israeli government agencies, the Israel Defense Forces (IDF), and American diplomats; and the ideology of fear is realized through the "terrorist framing of Hamas." A modified filtering dynamic operates for Al Jazeera: funding by the government of Qatar eliminates advertising dependence but creates alignment with the geopolitical interests of a regional power; sources are mainly Palestinian witnesses, UN reports, and human rights organizations; and ideology is constructed through a frame of anti-imperialism and support for national liberation movements. The systematic action of these filters produces different information realities for audiences, which supports the model's thesis that the media act as ideological apparatuses of influence groups (states).

Second-level agenda-setting theory [20] is validated through the differential attribution of characteristics to actors.
Al Jazeera attributes to Israel the characteristics "aggressive", "occupier", and "perpetrator" through high-frequency co-occurrence with the expressions "no mercy" and "mass killings"; to the Palestinians it attributes the characteristics "victims", "traumatized", and "suffering" through collocations with "acute trauma" and "ever-present wounds". CNN attributes to Hamas the characteristics "violator", "militant", and "hostage-taker" through the collocations "violating ceasefire" and "staging discovery"; to Israel it attributes the characteristics "responding actor", "security-concerned", and "negotiator"; and to D. Trump it attributes the characteristics "dealmaker", "mediator", and "spotlight-seeker". This attributive differentiation determines not only what the audience thinks about but also how it thinks about these actors.

The methodological innovation of the research is the operationalization of the concept of computational hermeneutics through a hybrid workflow. Automated NLP procedures make the analysis of large corpora scalable while objectifying quantitative patterns, which traditional qualitative methods cannot achieve because of resource constraints. LLM integration provides semantic comparison, identifying articles about identical events even when the terminology differs ("prisoners returned" versus "hostages released"), and framing analysis that recognizes the rhetorical strategies of "victim framing" and "legitimization". Human interpretation provides theoretical validity and critical reflection on the limitations of the methods. Triangulation of the three approaches minimizes the weaknesses of each: NLP provides objective metrics but does not interpret semantics, an LLM provides depth but can hallucinate, and a human analyst brings theory but is subjective.
The limitations of the study include: the small size of the corpus (35 articles), which requires validation on larger datasets to give the conclusions statistical power; the single-day slice of October 31, 2025, which does not capture the diachronic evolution of the narrative; the English-language focus, which leaves out the Arabic-language publications of Al Jazeera Arabic, where the narrative may differ; the lack of multimodal analysis of visual content (photographs, video), which limits the completeness of the picture of media constructions; and the limitations of VADER in detecting irony and sarcasm, which call for BERT-based sentiment models in future research.

Comparison with the results of other researchers shows a convergence of conclusions. M. Elmasry found a 60-fold difference between the framing of Israeli actions as self-defense and of Palestinian actions as aggression in Instagram[10] posts by Western media, which is consistent with our conclusions about the terminological struggle between "prisoners" and "hostages" [13]. K. El Damanhouri and F. Saleh established that Al Jazeera America quoted only Palestinian citizens and always distinguished between militants and civilians when covering Palestinian casualties, while about 15% of CNN articles did not specify whether the deceased was a militant or a civilian [11]; this supports our conclusion about the humanization of victims in the Al Jazeera discourse. Hossain et al., in a computational text analysis of Facebook[11] posts, identified differential framing based on organizational filters, which supports the applicability of the propaganda model to modern digital media [16].
The theoretical implications of the research include: the expansion of the conceptual apparatus of computational media research through the introduction of the term computational hermeneutics as a methodological paradigm that synthesizes quantitative and qualitative approaches; the operationalization of classical theories of media analysis (framing, agenda-setting, the propaganda model) through NLP metrics and LLM interpretation, which makes theoretical postulates empirically testable; and the development of a meta-model for the comparative study of media narratives as a theoretical and methodological framework for reproducible analysis. The practical implications include: the development of software tools for automated collection, processing, and comparative analysis of news corpora, which can be adapted to the study of other media conflicts; relevance for journalistic practice through increased reflexivity about one's own framing strategies and the structural constraints of media production; applicability to media education through the development of critical media literacy in audiences that are aware of the media construction of reality; and relevance for diplomatic intelligence through the understanding of media frames as indicators of the geopolitical positions of states.

Conclusion

In this study, a meta-model for comparative analysis of media narratives was developed and empirically validated using computational hermeneutics, which integrates natural language processing, topic modeling, and LLM-mediated semantic comparison.
The key results demonstrate a systematic divergence between the media narratives of Al Jazeera and CNN: Al Jazeera's mean tonality is about 180% more negative than CNN's in relative terms (-0.67 versus -0.24, a ratio of roughly 2.8); the terminological struggle between "prisoners" and "hostages" manifests competing frames of legitimacy; attribution of agency constructs different causal schemes (passive victim versus active aggressor in Al Jazeera, the mutual agency of Hamas, Israel, and Trump in CNN); and geographical ideology is implemented through the dispersed, multi-location structure of Al Jazeera (Sudan, Lebanon) versus the concentrated Gaza-centric structure of CNN.

The developed meta-model provides a theoretical and methodological framework for reproducible analysis of media constructions of reality through four dimensions: ontological (media narratives constitute reality), epistemological (methodological pluralism), methodological (three levels of comparison: linguistic patterns, narrative structures, discursive effects), and institutional (the filters of the propaganda model). Operationalization is carried out through a two-component analytical platform that automates web scraping, NLP annotation, LDA topic modeling, and LLM comparison.

The methodological innovation consists in the introduction of the concept of computational hermeneutics as a synthesis of objectifying NLP procedures with a hermeneutic understanding of semantic structures, implemented through computational triangulation, that is, the convergence of conclusions from statistical analysis, topic modeling, and AI interpretation. This approach addresses the limitations of traditional methods: automation provides scalability, LLMs provide depth of semantic analysis, and human interpretation provides theoretical validity.
The theoretical significance is determined by the validation of classical media research paradigms (Entman framing, McCombs second-level agenda-setting, Herman-Chomsky propaganda model) in the context of the digital media environment, which confirms the stable explanatory power of these theories. The structural filters of media systems continue to determine information production despite technological democratization: algorithms and platform ownership create new filtering mechanisms in addition to the classic corporate and government ones. Prospects for further research include expanding the corpus to 500 plus articles over a three-month period for statistical power of conclusions; longitudinal design with weekly sampling to track the diachronic evolution of narrative; integration of multimodal analysis of visual content through CLIP (Contrastive Language-Image Pre-training); application of BERT-based sentiment models to overcome the limitations of VADER in detecting irony; cross-national expansion on BBC, Reuters, RT, Al Arabiya to identify global media framing patterns; integration of multilingual NLP models (XLM-RoBERTa) to analyze Arabic-language publications. The final thesis of the study is that in the context of the global mediatization of conflicts, information warfare is becoming no less significant than military warfare. Al Jazeera and CNN don't just report on conflict – they constitute different interpretations of conflict through divergent narrative practices, forming incompatible epistemological realities for their audiences. Computational hermeneutics as a methodological paradigm provides a scalable study of these media structures while maintaining theoretical validity and interpretative depth, which expands the toolkit of contemporary computational media studies. 
[1] Computational hermeneutics is an iterative method of text analysis that combines computational techniques and hermeneutic understanding to identify hidden semantic structures and interpret discourse through the correlation of parts and the whole. See [21, 19, 23].

[2] spaCy NER and named entity density are used at the level of the NLP toolkit; see [26].

[3] The choice of a one-day period (31.10.2025) is motivated by the need to fix a coherent media context that excludes the cumulative effects of the news cycle and calendar distortions, and by the representativeness of a point of high geopolitical saturation. The Middle East section was chosen for its discursive density, conflict sensitivity, and global significance, which make it possible to identify the structural mechanisms of framing, semantic divergence, and narrative models in a synchronous media stream. This design conforms to the procedures of event-based sampling, micro-temporal discourse analysis, and computational hermeneutics, ensuring methodological consistency and comparability of discursive patterns. See [22, 18].

[4] VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon- and rule-based algorithm for tonality analysis, originally tuned for social media text and applied here to news texts. It assigns positive, negative, and neutral valence scores to each utterance, as well as an aggregate compound score. This makes it possible to identify the direction and intensity of emotional assessments in the corpus and to compare them between sources.

[5] Named Entity Recognition (NER) in the spaCy library automatically identifies and classifies entities in the text (for example, states, cities, organizations, persons). Frequency counting of detected entities makes it possible to capture thematic focuses and map the geopolitical accents of sources, correlating the intensity of mentions with the narrative orientation of the corpus.
[6] The TF-IDF (Term Frequency-Inverse Document Frequency) algorithm estimates the significance of words in a corpus, increasing the weight of lexemes specific to a particular document and decreasing the weight of lexemes that occur in many other texts. It thus highlights the key terms that reflect the local thematic emphases and semantic priorities of the discourse.
[7] The Latent Dirichlet Allocation (LDA) algorithm, a probabilistic model that represents documents as mixtures of latent topics and topics as distributions over words, was used to extract hidden thematic structures. The parameter K = 5 fixes the number of extracted topics, which keeps the results interpretable for a corpus of limited size; the parameter passes = 10 means ten complete passes over the corpus during training, which stabilizes the distributions and improves the convergence of the model.
[8] Nominatim is a geocoder that uses OpenStreetMap data to convert geographical mentions to coordinates (geocoding) and coordinates to place names (reverse geocoding). Automatically matching toponyms with spatial metadata makes it possible to reconstruct the geographical configuration of events and visualize the territorial focus of the media discourse.
[9] The Google Gemini model is used in controlled generation mode with the parameter temperature = 0.2, which reduces the stochasticity of the output and makes the interpretations more reproducible. The low temperature supports more stable identification of hidden semantic connections and recurring narrative patterns, making it possible to capture the implicit structures of discourse while keeping the responses semantically disciplined.
[10] The American multinational holding company Meta Platforms Inc. (owner of the Facebook and Instagram social networks) has been recognized as extremist, and its activities are prohibited in the Russian Federation.
[11] The American multinational holding company Meta Platforms Inc. (owner of the Facebook and Instagram social networks) has been recognized as extremist, and its activities are prohibited in the Russian Federation.
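The TF-IDF weighting described in note [6] can be illustrated with a minimal sketch. It assumes the common variant in which relative term frequency is multiplied by log(N / df); the article does not state which variant or library was used, so this is not a reproduction of the study's pipeline. The function name tf_idf and the toy documents are hypothetical and are not taken from the article's corpus.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: relative term frequency times log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: the number of documents in which each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy corpus, invented for illustration only.
toy_docs = [
    "ceasefire talks resume in cairo".split(),
    "ceasefire talks stall over hostages".split(),
    "aid convoys enter gaza".split(),
]
toy_weights = tf_idf(toy_docs)
```

In this toy corpus, "cairo" occurs in only one document and therefore receives a higher weight in the first document than "ceasefire", which occurs in two; a term present in every document would receive a weight of zero under this variant.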
The article is published in its final version as approved following the last positive peer review recommending acceptance for publication. It incorporates revisions made by the author in response to prior negative peer review reports that did not recommend publication. All peer review reports, including initial negative reviews, are published in open access alongside the article. All versions of the author’s revisions are archived in the publisher’s repository and may be made available upon reasonable request in accordance with Elsevier’s editorial policies and applicable data availability requirements.
References
1. Alekyan, M. V., & Erisyanz, D. E. (2024). The application of the automated sentiment analysis method for studying the specifics of Telegram channel content. Systemic Transformations of Journalism, 5-6.
2. Aslanov, I. A. (2020). Metaphorical framing in media texts and communication about depression: Results of content analysis and experiment. Bulletin of Moscow University. Series 10: Journalism, 6, 3-22. https://doi.org/10.30547/vestnik.journ.6.2020.322
3. Dunas, D. V., Salikhova, E. A., Tolokonnikova, A. V., & Babyna, D. A. (2022). Establishing the agenda and the framing effect: On the necessity of conceptual unity in media research of "digital youth." Bulletin of Moscow University. Series 10: Journalism, 4, 47-78. https://doi.org/10.30547/vestnik.journ.4.2022.4778
4. Dunas, D. V., Tolokonnikova, A. V., Babyna, D. A., Boiko, O. A., & Sidorov, E. A. (2025). The agenda in social media and federal news agencies: The thematic gap and semantic unity (on the example of "digital youth" in Russia). Journal of Siberian Federal University. Humanities, 18(1), 178-193.
5. Kazakov, A. A. (2015). Agenda-setting theory vs. framing: On the correlation of approaches. Politia: Analysis. Chronicle. Forecast, 1(76), 103-113.
6. Sokolov, A. V. (2024). Contextualization of the pan-Arab narrative "Al-Jazeera": Thematic classification, framing, and sentiment analysis of news content. Journalist. Social Communications, 4(56), 97-110.
7. Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3, 993-1022.
8. Bojić, L., Zagovora, O., Zelenkauskaite, A., et al. (2025). Comparing large language models and human annotators in latent content analysis of sentiment, political leaning, emotional intensity, and sarcasm. Scientific Reports, 15, Article 11477. https://doi.org/10.1038/s41598-025-96508-3
9. Burggraaff, C., & Trilling, D. (2020). Through a different gate: An automated content analysis of how online news and print news differ. Journalism, 21(1), 112-129. https://doi.org/10.1177/1464884917716699
10. Chew, R., Bollenbacher, J., Wenger, M., Speer, J., & Kim, A. (2023). LLM-assisted content analysis: Using large language models to support deductive coding. arXiv. https://doi.org/10.48550/arXiv.2306.14924
11. Damanhoury, K. E., & Saleh, F. (2017). Is it the same fight? Comparative analysis of CNN and Al Jazeera America's online coverage of the 2014 Gaza War. Journal of Arab & Muslim Media Research, 10(1), 85-103. https://doi.org/10.1386/jammr.10.1.85_1
12. Dunivin, Z. O. (2025). A computational qualitative approach to large-scale characterization of cultural identities on social media. SocArXiv. https://doi.org/10.31235/osf.io/5wvhx_v1
13. Elmasry, M. H. (2024). Images of the Israel-Gaza war on Instagram: A content analysis of Western broadcast news posts. Journalism & Mass Communication Quarterly, 102(3), 695-721. https://doi.org/10.1177/10776990241287155
14. Entman, R. M. (1993). Framing: Toward clarification of a fractured paradigm. Journal of Communication, 43(4), 51-58. https://doi.org/10.1111/j.1460-2466.1993.tb01304.x
15. Herman, E. S., & Chomsky, N. (2002). Manufacturing consent: The political economy of the mass media. Pantheon Books.
16. Hossain, A., Abdul Wahab, J., & Khan, M. S. R. A. (2022). A computer-based text analysis of Al Jazeera, BBC, and CNN news shares on Facebook: Framing analysis on COVID-19 issues. SAGE Open, 12(1). https://doi.org/10.1177/21582440211068497
17. Hutto, C. J., & Gilbert, E. (2014). VADER: A parsimonious rule-based model for sentiment analysis of social media text. Proceedings of the International AAAI Conference on Weblogs and Social Media, 8(1), 216-225. https://doi.org/10.1609/icwsm.v8i1.14550
18. Kim, H., Jang, S. M., Kim, S.-H., & Wan, A. (2018). Evaluating sampling methods for content analysis of Twitter data. Social Media + Society, 4(2). https://doi.org/10.1177/2056305118772836
19. Kommers, S., et al. (2025). Computational hermeneutics: Evaluating generative AI as a cultural technology. SSRN. https://doi.org/10.2139/ssrn.5409144
20. McCombs, M., Llamas, J. P., Lopez-Escobar, E., & Rey, F. (1997). Candidate images in Spanish elections: Second-level agenda-setting effects. Journalism & Mass Communication Quarterly, 74, 703-717. https://doi.org/10.1177/107769909707400404
21. Mohr, J. W., Wagner-Pacifici, R., & Breiger, R. L. (2015). Toward a computational hermeneutics. Big Data & Society, 2(2). https://doi.org/10.1177/2053951715613809
22. Pavelko, R. L., & Grabe, M. E. (2017). Sampling, content analysis. In The International Encyclopedia of Communication Research Methods (pp. 1-10). Wiley-Blackwell. https://doi.org/10.1002/9781118901731.iecrm0223
23. Picca, D., Schnyder, A., & Romele, A. (2024). Computational hermeneutics of emotion: A comparative study of emotional landscapes in Dostoevsky's Crime and Punishment. Humanities and Social Sciences Communications, 11, Article 1428. https://doi.org/10.1057/s41599-024-03955-w
24. Schroeder, H., Aubin Le Quéré, M., Randazzo, C., Mimno, D., & Schoenebeck, S. (2025). Large language models in qualitative research: Uses, tensions, and intentions. Proceedings of the CHI Conference on Human Factors in Computing Systems, 1-15.
25. Seib, P. (2008). The Al Jazeera effect: How the new global media are reshaping world politics. Potomac Books.
26. Taher, H. A., Alabid, N., & Hasan, B. M. (2025). Integration named entity recognition and latent Dirichlet allocation to enhance topic modeling. Annals of Emerging Technologies in Computing, 9(2), 20-30. https://doi.org/10.33166/AETiC.2025.02.002
27. van Atteveldt, W., Welbers, K., & Van der Velden, M. A. C. G. (2019). Studying political decision making with automatic text analysis. In Oxford Research Encyclopedia of Politics (pp. 1-11). Oxford University Press. https://doi.org/10.1093/acrefore/9780190228637.013.957
First Peer Review
Peer reviewers' evaluations remain confidential and are not disclosed to the public. Only external reviews, authorized for publication by the article's author(s), are made public. Typically, these final reviews are conducted after the manuscript's revision. Adhering to our double-blind review policy, the reviewer's identity is kept confidential.
Second Peer Review
Peer reviewers' evaluations remain confidential and are not disclosed to the public. Only external reviews, authorized for publication by the article's author(s), are made public. Typically, these final reviews are conducted after the manuscript's revision. Adhering to our double-blind review policy, the reviewer's identity is kept confidential.
