Tight Sampling in Unbounded Networks

Last updated Jul 11, 2023

Introduces snowball-type sampling scheme to prioritize cohesive communities around given seed nodes.

# Introduction

# Paragraph 1

Social networks valuable source
Obtaining full-sized networks is impossible
- limited access
- volume
Sampling is a fundamental challenge

# Paragaph 2

Sampling schemes
1. Attribute-based sampling (e.g. demographics or tweet content)
2. Connectivity-based sampling (e.g. seed-set expansion)
Connectivity-based sampling quickly covers distant parts of graph on small-world networks
Unwanted if one wants to sample from cohesive groups (as in local clustering with seeds)

# Paragaph 3

Proposal of sampling scheme that prioritizes sampling within cohesive subgroups
Similar to seeded community detection
- Determine locally dense subgraph containing a set of seed nodes
- Difference: graph is only partially known
Common clustering objectives (e.g. conductance, modularity) are difficult to optimize
- rely on the whole graph (page rank, random walk)

# Paragraph 4

Generalization of maximum adjacencey search
- prominently used to find minimum graph cuts
Replace maximum-adjacency criterion with a weighted interaction-based priority
Iteratively include profiles that show maximum priority with profiles inside
Priority function is iteratively calibrated with empirical data
Repeated until stopping criterion is met

# Paragraph 5

Contributions:
- Snowball-sampling mechanism
- Behavior of sampling scheme vs. other connectivity-based on synthetic data
- Generalize the adjacency criterion
- Provide Twitter Interaction dataset with local clusters

# Node-based sampling methods

Simple: Random vertex sampling
- samples nodes with equal probability
- constructs induced subgraph
- good to estimate i.e. degree distribution but not connectivity
Sophisticated: Page-Rank weighted sampling

# Traversal-based sampling methods

Fix seed nodes and expand on them
Expand by
- Traditonal Graph traversal:
  - BFS: biased towards high-degree nodes
  - DFS
- Snowball sampling
  - based on BFS
  - peripheral nodes have many missing neighbors
- Random Walks
  - modeled by markov chains
  - biased towards high-degree nodes, which many try to solve:
  - Metropolis-Hastings
    - visit low-degree nodes
  - Liu et al.
    - hybrid jump mechanism
    - avoids repetititve sampling within small connected components
  - Forest Fire
    - hyrbird combination of RW/SS sampling
    - “burns down” fraction of outgoing links
  - Maiya et al.
    - uses concepts from expander graphs
    - greedily construct with maximal expansion
    - preserves community structure
  - Zhang et al.
    - based on edge connectivity
    - requires large parts of the graph surrounding the seeded nodes

# Methodology

# Terminologies

Seed users
- Given set of influencers
- Civil society category (DISMISS dataset)
- Denoted by $V_s$
Interactions
- interactions X consists of like, retweet, reply, quote
- interaction $I_t(i, j) = x \in X$
Insiders
- same as frontier
- Denotes as $V_i \supseteq V_s$
Outsiders
- set of nodes that interact with insiders
- Denoted as $V_o$ disjoint $V_i$

# Priority scheme

# Data

DISMISS dataset
- 11.580 invidiuals from India
- use civil society category
- tweets from July 2022
50,379 tweets sampled
used only 90% bottom tweets to remove outlier
379.514 interactors
1 to 534 tweets per author

# Sampling strategy

4 sampling strategies:
- uniform (U) or weighted(W) outsider selection
- direct (D) or staged(S - choose insider then outsider) outsider selection
Combinations: “DU”, “DW”, “SU”, “SW”

# Synthetic dataset

For assessment
Stochastic Block Model: 5000 nodes with 3 communities (1.5,, 1.5k, 2k), 58,799 edges
Sampling
- no interactions therefore based on maximum adjacency search
- MAS: select outsider with highest adjecency to insider set
- RI_MAS (analogous SW): select ousider with highest adjencency to random insider
- RO (similar DU): select random ousider
- RI_RO (similar SU): select random outsider incident to random insider

# Experimentation

Louvain community detection algorithm

My Notes

Tight Sampling in Unbounded Networks

# Introduction

# Paragraph 1

# Paragaph 2

# Paragaph 3

# Paragraph 4

# Paragraph 5

# Node-based sampling methods

# Traversal-based sampling methods

# Methodology

# Terminologies

# Priority scheme

# Data

# Sampling strategy

# Synthetic dataset

# Experimentation

Backlinks

Interactive Graph

Tight Sampling in Unbounded Networks

# Introduction

# Paragraph 1

# Paragaph 2

# Paragaph 3

# Paragraph 4

# Paragraph 5

# Related Work

# Node-based sampling methods

# Traversal-based sampling methods

# Methodology

# Terminologies

# Priority scheme

# Data

# Sampling strategy

# Synthetic dataset

# Experimentation

Backlinks

Interactive Graph