Social and Information Networks
See recent articles
Showing new listings for Thursday, 12 June 2025
- [1] arXiv:2506.09764 [pdf, html, other]
-
Title: Alice and the Caterpillar: A more descriptive null model for assessing data mining resultsJournal-ref: Knowledge and Information Systems, 2024Subjects: Social and Information Networks (cs.SI); Machine Learning (cs.LG)
We introduce novel null models for assessing the results obtained from observed binary transactional and sequence datasets, using statistical hypothesis testing. Our null models maintain more properties of the observed dataset than existing ones. Specifically, they preserve the Bipartite Joint Degree Matrix of the bipartite (multi-)graph corresponding to the dataset, which ensures that the number of caterpillars, i.e., paths of length three, is preserved, in addition to other properties considered by other models. We describe Alice, a suite of Markov chain Monte Carlo algorithms for sampling datasets from our null models, based on a carefully defined set of states and efficient operations to move between them. The results of our experimental evaluation show that Alice mixes fast and scales well, and that our null model finds different significant results than ones previously considered in the literature.
- [2] arXiv:2506.09866 [pdf, html, other]
-
Title: ELRUHNA: Elimination Rule-basedHypergraph AlignmentSubjects: Social and Information Networks (cs.SI)
Hypergraph alignment is a well-known NP-hard problem with numerous practical applications across domains such as bioinformatics, social network analysis, and computer vision. Despite its computational complexity, practical and scalable solutions are urgently needed to enable pattern discovery and entity correspondence in high-order relational data. The problem remains understudied in contrast to its graph based counterpart. In this paper, we propose ELRUHNA, an elimination rule-based framework for unsupervised hypergraph alignment that operates on the bipartite representation of hypergraphs. We introduce the incidence alignment formulation, a binary quadratic optimization approach that jointly aligns vertices and hyperedges. ELRUHNA employs a novel similarity propagation scheme using local matching and cooling rules, supported by an initialization strategy based on generalized eigenvector centrality for incidence matrices. Through extensive experiments on real-world datasets, we demonstrate that ELRUHNA achieves higher alignment accuracy compared to state-of-the-art algorithms, while scaling effectively to large hypergraphs.
New submissions (showing 2 of 2 entries)
- [3] arXiv:2506.09221 (cross-list from cs.IR) [pdf, other]
-
Title: In Crowd Veritas: Leveraging Human Intelligence To Fight MisinformationComments: PhD thesis, University of Udine, defended May 2023, 458 pagesSubjects: Information Retrieval (cs.IR); Computers and Society (cs.CY); Social and Information Networks (cs.SI)
The spread of online misinformation poses serious threats to democratic societies. Traditionally, expert fact-checkers verify the truthfulness of information through investigative processes. However, the volume and immediacy of online content present major scalability challenges. Crowdsourcing offers a promising alternative by leveraging non-expert judgments, but it introduces concerns about bias, accuracy, and interpretability. This thesis investigates how human intelligence can be harnessed to assess the truthfulness of online information, focusing on three areas: misinformation assessment, cognitive biases, and automated fact-checking systems. Through large-scale crowdsourcing experiments and statistical modeling, it identifies key factors influencing human judgments and introduces a model for the joint prediction and explanation of truthfulness. The findings show that non-expert judgments often align with expert assessments, particularly when factors such as timing and experience are considered. By deepening our understanding of human judgment and bias in truthfulness assessment, this thesis contributes to the development of more transparent, trustworthy, and interpretable systems for combating misinformation.
- [4] arXiv:2506.09726 (cross-list from eess.SP) [pdf, html, other]
-
Title: Don't be Afraid of Cell Complexes! An Introduction from an Applied PerspectiveComments: Preprint version, comments welcome!Subjects: Signal Processing (eess.SP); Computational Geometry (cs.CG); Social and Information Networks (cs.SI); Algebraic Topology (math.AT)
Cell complexes (CCs) are a higher-order network model deeply rooted in algebraic topology that has gained interest in signal processing and network science recently. However, while the processing of signals supported on CCs can be described in terms of easily-accessible algebraic or combinatorial notions, the commonly presented definition of CCs is grounded in abstract concepts from topology and remains disconnected from the signal processing methods developed for CCs. In this paper, we aim to bridge this gap by providing a simplified definition of CCs that is accessible to a wider audience and can be used in practical applications. Specifically, we first introduce a simplified notion of abstract regular cell complexes (ARCCs). These ARCCs only rely on notions from algebra and can be shown to be equivalent to regular cell complexes for most practical applications. Second, using this new definition we provide an accessible introduction to (abstract) cell complexes from a perspective of network science and signal processing. Furthermore, as many practical applications work with CCs of dimension 2 and below, we provide an even simpler definition for this case that significantly simplifies understanding and working with CCs in practice.
- [5] arXiv:2506.09947 (cross-list from cs.CY) [pdf, html, other]
-
Title: KI4Demokratie: An AI-Based Platform for Monitoring and Fostering Democratic DiscourseRudy Alexandro Garrido Veliz, Till Nikolaus Schaland, Simon Bergmoser, Florian Horwege, Somya Bansal, Ritesh Nahar, Martin Semmann, Jörg Forthmann, Seid Muhie YimamSubjects: Computers and Society (cs.CY); Social and Information Networks (cs.SI)
Social media increasingly fuel extremism, especially right-wing extremism, and enable the rapid spread of antidemocratic narratives. Although AI and data science are often leveraged to manipulate political opinion, there is a critical need for tools that support effective monitoring without infringing on freedom of expression. We present KI4Demokratie, an AI-based platform that assists journalists, researchers, and policymakers in monitoring right-wing discourse that may undermine democratic values. KI4Demokratie applies machine learning models to a large-scale German online data gathered on a daily basis, providing a comprehensive view of trends in the German digital sphere. Early analysis reveals both the complexity of tracking organized extremist behavior and the promise of our integrated approach, especially during key events.
Cross submissions (showing 3 of 3 entries)
- [6] arXiv:2404.05861 (replaced) [pdf, html, other]
-
Title: The increasing fragmentation of global science limits the diffusion of ideasComments: 37 pages (main text), 4 figures (main text), 1 table (main text)Subjects: Social and Information Networks (cs.SI); Physics and Society (physics.soc-ph)
Global science is often portrayed as a unified system of shared knowledge and open exchange. Yet this vision contrasts with emerging evidence that scientific recognition is uneven and increasingly fragmented along regional and cultural lines. Traditional models emphasize Western dominance in knowledge production but overlook regional dynamics, reinforcing a core-periphery narrative that sustains disparities and marginalizes less prominent countries. In this study, we introduce a rank-based signed measure of national citation preferences, enabling the construction of a global recognition network that distinguishes over- and under-recognition between countries. Using a multinomial logistic link prediction model, we assess how economic, cultural, and scientific variables shape the presence and direction of national citation preferences. We uncover a global structure composed of multiple scientific communities, characterized by strong internal citation preferences and negative preferences between them-revealing growing fragmentation in the international scientific system. A separate weighted logistic regression framework suggests that this network significantly influences the international diffusion of scientific ideas, even after controlling for common covariates. Together, these findings highlight the structural barriers to equitable recognition and underscore the importance of scientific community membership in shaping influence, offering valuable insights for policymakers aiming to foster inclusive and impactful global science.
- [7] arXiv:2307.14530 (replaced) [pdf, html, other]
-
Title: Optimal Noise Reduction in Dense Mixed-Membership Stochastic Block Models under Diverging Spiked Eigenvalues ConditionSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Social and Information Networks (cs.SI)
Community detection is one of the most critical problems in modern network science. Its applications can be found in various fields, from protein modeling to social network analysis. Recently, many papers appeared studying the problem of overlapping community detection, where each node of a network may belong to several communities. In this work, we consider Mixed-Membership Stochastic Block Model (MMSB) first proposed by Airoldi et al. MMSB provides quite a general setting for modeling overlapping community structure in graphs. The central question of this paper is to reconstruct relations between communities given an observed network. We compare different approaches and establish the minimax lower bound on the estimation error. Then, we propose a new estimator that matches this lower bound. Theoretical results are proved under fairly general conditions on the considered model. Finally, we illustrate the theory in a series of experiments.
- [8] arXiv:2501.06137 (replaced) [pdf, html, other]
-
Title: Supervision policies can shape long-term risk management in general-purpose AI modelsComments: 24 pages, 14 figuresSubjects: Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Social and Information Networks (cs.SI)
The rapid proliferation and deployment of General-Purpose AI (GPAI) models, including large language models (LLMs), present unprecedented challenges for AI supervisory entities. We hypothesize that these entities will need to navigate an emergent ecosystem of risk and incident reporting, likely to exceed their supervision capacity. To investigate this, we develop a simulation framework parameterized by features extracted from the diverse landscape of risk, incident, or hazard reporting ecosystems, including community-driven platforms, crowdsourcing initiatives, and expert assessments. We evaluate four supervision policies: non-prioritized (first-come, first-served), random selection, priority-based (addressing the highest-priority risks first), and diversity-prioritized (balancing high-priority risks with comprehensive coverage across risk types). Our results indicate that while priority-based and diversity-prioritized policies are more effective at mitigating high-impact risks, particularly those identified by experts, they may inadvertently neglect systemic issues reported by the broader community. This oversight can create feedback loops that amplify certain types of reporting while discouraging others, leading to a skewed perception of the overall risk landscape. We validate our simulation results with several real-world datasets, including one with over a million ChatGPT interactions, of which more than 150,000 conversations were identified as risky. This validation underscores the complex trade-offs inherent in AI risk supervision and highlights how the choice of risk management policies can shape the future landscape of AI risks across diverse GPAI models used in society.
- [9] arXiv:2502.01340 (replaced) [pdf, html, other]
-
Title: Human-Agent Interaction in Synthetic Social Networks: A Framework for Studying Online PolarizationSubjects: Physics and Society (physics.soc-ph); Social and Information Networks (cs.SI)
Online social networks have dramatically altered the landscape of public discourse, creating both opportunities for enhanced civic participation and risks of deepening social divisions. Prevalent approaches to studying online polarization have been limited by a methodological disconnect: mathematical models excel at formal analysis but lack linguistic realism, while language model-based simulations capture natural discourse but often sacrifice analytical precision. This paper introduces an innovative computational framework that synthesizes these approaches by embedding formal opinion dynamics principles within LLM-based artificial agents, enabling both rigorous mathematical analysis and naturalistic social interactions. We validate our framework through comprehensive offline testing and experimental evaluation with 122 human participants engaging in a controlled social network environment. The results demonstrate our ability to systematically investigate polarization mechanisms while preserving ecological validity. Our findings reveal how polarized environments shape user perceptions and behavior: participants exposed to polarized discussions showed markedly increased sensitivity to emotional content and group affiliations, while perceiving reduced uncertainty in the agents' positions. By combining mathematical precision with natural language capabilities, our framework opens new avenues for investigating social media phenomena through controlled experimentation. This methodological advancement allows researchers to bridge the gap between theoretical models and empirical observations, offering unprecedented opportunities to study the causal mechanisms underlying online opinion dynamics.
- [10] arXiv:2503.03047 (replaced) [pdf, html, other]
-
Title: Stochastic block models with many communities and the Kesten--Stigum boundComments: 46 pages, 1 figure, added discussion and minor corrections, extended abstract in COLT 2025Subjects: Probability (math.PR); Social and Information Networks (cs.SI); Statistics Theory (math.ST)
We study the inference of communities in stochastic block models with a growing number of communities. For block models with $n$ vertices and a fixed number of communities $q$, it was predicted in Decelle et al. (2011) that there are computationally efficient algorithms for recovering the communities above the Kesten--Stigum (KS) bound and that efficient recovery is impossible below the KS bound. This conjecture has since stimulated a lot of interest, with the achievability side proven in a line of research that culminated in the work of Abbe and Sandon (2018). Conversely, recent work by Sohn and Wein (2025) provides evidence for the hardness part using the low-degree paradigm.
In this paper we investigate community recovery in the regime $q=q_n \to \infty$ as $n\to\infty$ where no such predictions exist. We show that efficient inference of communities remains possible above the KS bound. Furthermore, we show that recovery of block models is low-degree hard below the KS bound when the number of communities satisfies $q\ll \sqrt{n}$. Perhaps surprisingly, we find that when $q \gg \sqrt{n}$, there is an efficient algorithm based on non-backtracking walks for recovery even below the KS bound. We identify a new threshold and ask if it is the threshold for efficient recovery in this regime. Finally, we show that detection is easy and identify (up to a constant) the information-theoretic threshold for community recovery as the number of communities $q$ diverges.
Our low-degree hardness results also naturally have consequences for graphon estimation, improving results of Luo and Gao (2024).