Parametric tests

June 3, 2023

SOST71032 Social Network Analysis
Random Graph Models
Dr Termeh Shafie
Department of Social Statistics
School of Social Sciences
The University of Manchester
Day 4
SOST71032 Social Network Analysis Random Graph Models Day 4 1 / 31
parametric vs. non-parametric
• parametric methods
– fitting a model
– parametric tests based on the theoretical distribution of the summary statistics (only available for some models)
– the data are assumed to follow some theoretical probability distribution
– from simple models to more complicated models that incorporate dependencies among tie variables
• non-parametric methods
– statistical tests for model fit
– used when we do not think that the assumptions of parametric tests are satisfied (distribution-free methods)
– evaluate H0 against H1 without assuming any parametric model
– p-values have the same interpretation: the probability of seeing data at least this extreme given that the null hypothesis is true
– tests: shuffling edges while fixing an observed summary measure
last lecture
this lecture
example.
Knecht (2008): Friendship Selection and Friends’ Influence
friendship networks at 4 time points for 25 pupils at a school
running hypotheses:
H1 pupils choose friends with the same gender
H2 pupils reciprocate friendship
H3 the friend of a friend is a friend
H4 pupils choose friends with similar delinquency behaviour
H5 pupils adopt delinquent behaviour from their friends
example.
first test: social selection by gender
hypothesis: pupils choose friends with the same gender
more precisely: the probability of friendship between pupils with the same gender is higher
method: divide pairs of pupils (dyads) into two categories
$D_1 = \{(A, B) : \mathrm{gender}(A) = \mathrm{gender}(B)\}$
$D_2 = \{(A, B) : \mathrm{gender}(A) \neq \mathrm{gender}(B)\}$
compare the ratio of friendship ties in the two groups
$\frac{\#\,\text{ties in } D_1}{\#\,\text{dyads in } D_1}$ vs. $\frac{\#\,\text{ties in } D_2}{\#\,\text{dyads in } D_2}$
results:
$\frac{105}{312} = 0.337$ vs. $\frac{31}{288} = 0.108$
example.
significance of the observed difference
0.11: probability of friendship between pupils with different gender
0.34: probability of friendship between pupils with the same gender
could this difference be just accidental?
if we divided pupils into two meaningless groups, the tie probabilities would not be exactly equal either
the non-parametric approach:
repeat the analysis 1000 times with random gender assignment
⇒ average difference is 0.035; maximum is 0.142
maybe friendship is only seemingly influenced by gender equality; the “true” explanatory variable might be
• behaviour
• other ties in the network
we need a model that controls for the influence of other variables
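sketch. the random gender assignment could be coded roughly as below. This is a minimal illustration, not the analysis behind the slide: the toy network, the function names and the one-sided comparison are all assumptions, and the Knecht data are not reproduced here.

```python
import random

def tie_rate_difference(ties, gender):
    """Tie rate among same-gender ordered dyads minus tie rate among different-gender ones."""
    n = len(gender)
    same_ties = same_dyads = diff_ties = diff_dyads = 0
    for u in range(n):
        for v in range(n):
            if u == v:
                continue
            if gender[u] == gender[v]:
                same_dyads += 1
                same_ties += (u, v) in ties
            else:
                diff_dyads += 1
                diff_ties += (u, v) in ties
    # assumes both dyad categories are non-empty
    return same_ties / same_dyads - diff_ties / diff_dyads

def permutation_test(ties, gender, n_perm=1000, seed=0):
    """Shuffle the gender labels n_perm times while keeping the network fixed."""
    rng = random.Random(seed)
    observed = tie_rate_difference(ties, gender)
    shuffled = list(gender)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null.append(tie_rate_difference(ties, shuffled))
    # one-sided p-value: share of shuffles at least as extreme as the observed difference
    p_value = sum(d >= observed for d in null) / n_perm
    return observed, null, p_value

# hypothetical toy data: 10 pupils, ties only within gender groups
gender = ["f"] * 5 + ["m"] * 5
ties = {(u, v) for u in range(10) for v in range(10)
        if u != v and gender[u] == gender[v]}

observed, null, p_value = permutation_test(ties, gender)
print(observed, max(null), p_value)
```

Shuffling the labels keeps the number of pupils per gender fixed, so only the pairing of gender and network position is randomised.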
logistic regression
classic starting point: why not treat edges as independent, with log-odds as a linear function of covariates?
⇒ modelling the occurrence of ties with logistic regression
random variable $Y_{uv}$ for the tie from node u to node v
$$Y_{uv} = \begin{cases} 1 & \text{with probability } p_{uv} \\ 0 & \text{with probability } 1 - p_{uv} \end{cases}$$
$p_{uv} = \mathrm{FunctionOf}(\text{parameters}, \text{statistics})$
– statistics quantify characteristics of the dyad (u, v) in the observed network
– parameters quantify the influence of those statistics on the tie probability:
• a positive (negative) parameter means: the higher the statistic, the higher (lower) the tie probability
• a zero parameter means: the statistic has no influence on the tie probability
parameters are estimated from the observed network
logistic regression
the probability $p_{uv}$ of a tie from u to v is specified as
$$p_{uv} = \mathrm{logit}^{-1}(q \cdot s) = \frac{\exp(q \cdot s)}{1 + \exp(q \cdot s)}$$
where
$s = (s_1, \dots, s_k) \in \mathbb{R}^k$ are the statistics
$q = (q_1, \dots, q_k) \in \mathbb{R}^k$ are the parameters
$q \cdot s = \sum_{i=1}^{k} q_i \cdot s_i$
the statistics $s_i = s_i(u, v; y)$ are functions of the observed data
the parameters are estimated to maximize the probability of the observed network y:
$$P(Y = y) = \prod_{u \neq v} p_{uv}^{y_{uv}} (1 - p_{uv})^{1 - y_{uv}}$$
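sketch. in practice the logarithm of this probability is maximised. Taking logs of the product and inserting the logistic form of the tie probability gives the expression below, where $s_{uv}$ is shorthand (introduced here, not on the slide) for the statistics vector $s(u, v; y)$ of the dyad (u, v).

```latex
\ell(q) = \log P(Y = y)
        = \sum_{u \neq v} \Big[ y_{uv} \log p_{uv} + (1 - y_{uv}) \log (1 - p_{uv}) \Big]
        = \sum_{u \neq v} \Big[ y_{uv}\,(q \cdot s_{uv}) - \log\big(1 + \exp(q \cdot s_{uv})\big) \Big]
```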
example.
results from logistic regression
gender model: friendship ties explained by gender equality
$p_{uv} = \mathrm{logit}^{-1}(q_0 + q_1 \cdot \mathrm{SameGender}(u, v))$
output:
statistic      parameter   st. error   Pr(>|z|)
(intercept)    -2.1151     0.1901      <2e-16 ***
SameGender      1.4363     0.2247      1.64e-10 ***
implied probability for ties by gender equality:
p = 0.1076 for friendship between pupils with different gender
p = 0.3365 for friendship between pupils with same gender
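sketch. the implied probabilities follow from plugging the two estimated parameters into the inverse logit. The snippet below is only a check of that arithmetic; the function and variable names are illustrative.

```python
import math

def inv_logit(x):
    """Inverse logit: maps log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

# parameter estimates from the gender model output
q0 = -2.1151   # intercept
q1 = 1.4363    # SameGender

p_diff = inv_logit(q0)        # SameGender = 0
p_same = inv_logit(q0 + q1)   # SameGender = 1
odds_ratio = math.exp(q1)     # multiplicative change in the odds of a tie

print(round(p_diff, 4), round(p_same, 4))
```

The two values should agree with the 0.1076 and 0.3365 on the slide, which are also the raw tie ratios 31/288 and 105/312: with a single binary covariate the logistic model simply reproduces the two group proportions.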
example.
results from logistic regression
fit more complex models and control for alternative explanations
$$p_{uv} = \mathrm{logit}^{-1}\left(\sum_{i=1}^{k} q_i \cdot s_i(u, v; y)\right)$$
with
s_i(u, v; y)                 interpretation
1                            constant (intercept)
SameGender(u, v)             gender homophily
SimDelinquency(u, v)         behaviour homophily
$y_{vu}$                     reciprocity
$\sum_{w} y_{uw} y_{wv}$     transitivity
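sketch. the network statistics can be read directly off the adjacency matrix. The code below is an illustration on a hypothetical 4-node network; SimDelinquency is left out because its exact definition is not given on the slide, and all names are assumptions.

```python
def dyad_statistics(y, gender, u, v):
    """Statistics for the ordered dyad (u, v): intercept, SameGender, reciprocity, transitivity."""
    n = len(y)
    same_gender = int(gender[u] == gender[v])
    reciprocity = y[v][u]                       # is there a tie back from v to u?
    transitivity = sum(y[u][w] * y[w][v]        # number of two-paths u -> w -> v
                       for w in range(n) if w not in (u, v))
    return [1, same_gender, reciprocity, transitivity]

# hypothetical directed network, y[u][v] = 1 if u names v as a friend
y = [
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
gender = ["f", "f", "m", "m"]

s_01 = dyad_statistics(y, gender, 0, 1)
s_23 = dyad_statistics(y, gender, 2, 3)
print(s_01, s_23)
```

One row of such statistics per ordered dyad, together with the observed tie indicator, is the input an ordinary logistic regression routine would need.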
output:
statistic        parameter   st. error   Pr(>|z|)
(intercept)      -4.3664     0.3915      <2e-16 ***
SameGender        1.2644     0.3036      3.12e-10 ***
SimDelinquency   -0.0009     0.3595      0.998
reciprocity       2.0622     0.2839      3.76e-13 ***
transitivity      0.9420     0.0918      <2e-16 ***
moving beyond logistic regression
the logistic model can be powerful, but it is still very limiting
• logistic regression is only valid for independent observations
• here, the different observations (tie variables) are not independent
• in many cases the existence of an edge (or several edges) changes the probability of other edges
randomly drawing ties from the logistic regression model:
[figure: observed network vs. simulated network]
random graph models
a random graph model
• assigns probabilities to entire graphs (rather than to individual edges)
• implies edge probabilities (but is not determined by them)
outline for some simple random graph models
• Bernoulli graphs
• configuration model
• small world model
random graph models
definition.
a graph is a pair G = (V, E), where V is a finite vertex set and E is the edge set
definition.
a random graph model is a probability space $(\mathcal{G}, P)$, where $\mathcal{G}$ is a (finite) set of graphs.
example.
let $\mathcal{G}$ be the set of all undirected, loopless graphs with vertex set $V = \{1, \dots, n\}$ and let P be defined by
$$P : \mathcal{G} \to \mathbb{R}; \quad P(G) = \frac{1}{2^{n(n-1)/2}}$$
then $(\mathcal{G}, P)$ is a random graph model
the set of vertices is fixed; all the randomness is in the edges
the Bernoulli graph G(n, p)
(a.k.a. the Erdős–Rényi model)
definition.
G(n, p) is the random graph model on the set of undirected, loopless graphs with vertex set $V = \{1, \dots, n\}$ that defines the probability of a graph G with m edges by
$$P(G) = p^{m} (1 - p)^{\frac{n(n-1)}{2} - m}$$
1. the edge probability of every dyad is equal to p
2. the model is fully independent
3. there is just one model satisfying properties (1) and (2).
remark.
the uniform random graph model is identical with $G(n, \tfrac{1}{2})$
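sketch. the remark can be checked by setting p = 1/2 in the definition: the number of edges m drops out, so every graph on n vertices gets the same probability.

```latex
P(G) = \left(\tfrac{1}{2}\right)^{m} \left(1 - \tfrac{1}{2}\right)^{\frac{n(n-1)}{2} - m}
     = \left(\tfrac{1}{2}\right)^{\frac{n(n-1)}{2}}
     = \frac{1}{2^{\,n(n-1)/2}}
```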
the Bernoulli graph G(n, p): simple summary
a graph with n nodes where each possible edge exists independently with probability 0 < p < 1
• the model is fully independent
• the tie probability of every dyad is equal to p
• the degree distribution is binomial
• the expected degree of a node is $(n - 1)p \approx np$
what is the most likely parameter value p?
⇒ maximum likelihood estimation (MLE)
the MLE is the density of the observed graph: $\hat{p} = \frac{L}{M}$
where L = number of edges and $M = \frac{n(n-1)}{2}$ is the number of dyads
note. this is the same as the non-parametric U|E(L) null model, where the density parameter is estimated using MLE: $\hat{p} = E(L)$
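sketch. the density estimate follows from the G(n, p) probability of a graph with L edges out of M dyads. Writing the log-likelihood and setting its derivative to zero (a standard argument, spelled out here for completeness):

```latex
\ell(p) = L \log p + (M - L) \log (1 - p)

\frac{d\ell}{dp} = \frac{L}{p} - \frac{M - L}{1 - p} = 0
\quad\Longrightarrow\quad
L (1 - p) = (M - L)\, p
\quad\Longrightarrow\quad
\hat{p} = \frac{L}{M}
```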
the Bernoulli graph G(n, p)
properties of Bernoulli graphs:
• most nodes have close to the average number of links (think normal distribution)
• the average distance between two nodes is small
• nodes do not tend to form clusters (no hubs)
are Bernoulli graphs representative of real networks?
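sketch. a Bernoulli graph is easy to simulate, which makes these properties easy to inspect. The code below draws one graph and looks at its density and degrees; n = 200, p = 0.1 and the seed are arbitrary illustrative choices.

```python
import random
from itertools import combinations

def sample_gnp(n, p, rng):
    """Draw one undirected, loopless graph from G(n, p) as a set of edges (u, v) with u < v."""
    return {(u, v) for u, v in combinations(range(n), 2) if rng.random() < p}

rng = random.Random(42)
n, p = 200, 0.1
edges = sample_gnp(n, p, rng)

M = n * (n - 1) // 2          # number of dyads
p_hat = len(edges) / M        # observed density

degree = [0] * n
for u, v in edges:
    degree[u] += 1
    degree[v] += 1
mean_degree = sum(degree) / n  # equals (n - 1) * p_hat

print(p_hat, mean_degree, min(degree), max(degree))
```

In a draw like this the minimum and maximum degree typically stay fairly close to the mean, which is the "most nodes are average" property; heavy-tailed degree distributions with hubs are very unlikely to arise.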
plausibility of G(n, p)
example. Florentine business network
can such a network be drawn from a G(n, p) model?
• which G(n, p)?
• what is the most likely value for the parameter p?
number of edges: L = 15
number of dyads: $M = \frac{n(n-1)}{2} = \frac{16 \cdot 15}{2} = 120$
density: $\hat{p} = 15/120 = 0.125$
plausibility of G(n, p)
Florentine business network
[figure: the observed network and 9 simulated Bernoulli graphs with 15 ties]
plausibility of G(n, p)
Florentine business network
[figure: triad census of the observed network and of 9 simulated Bernoulli graphs with 15 ties; counts shown: 381, 153, 21, 5]
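sketch. for an undirected graph the triad census counts, for every triple of nodes, how many of its three possible ties are present (0, 1, 2 or 3). The code below computes it for 9 simulated graphs with 16 nodes and exactly 15 ties; the sampling scheme and all names are assumptions, and the Florentine data themselves are not included.

```python
import random
from itertools import combinations
from math import comb

def triad_census(n, edges):
    """Return [t0, t1, t2, t3]: number of node triples with 0, 1, 2, 3 ties present."""
    edge_set = {frozenset(e) for e in edges}
    census = [0, 0, 0, 0]
    for a, b, c in combinations(range(n), 3):
        k = sum(frozenset(pair) in edge_set for pair in ((a, b), (a, c), (b, c)))
        census[k] += 1
    return census

def sample_fixed_size(n, m, rng):
    """Pick m distinct dyads uniformly at random (a graph with exactly m ties)."""
    return rng.sample(list(combinations(range(n), 2)), m)

rng = random.Random(1)
n, m = 16, 15
simulated = [triad_census(n, sample_fixed_size(n, m, rng)) for _ in range(9)]

for census in simulated:
    print(census)
```

Every census sums to C(16, 3) = 560 triples, as do the four counts 381, 153, 21, 5 shown on the slide. Comparing the observed census with the simulated ones shows whether the observed network has more closed triads than chance would produce.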
plausibility of G(n, p)
example. facebook network with 769 vertices and 16 600 edges
can such a network be drawn from a G(n, p) model?
• which G(n, p)?
• what is the most likely value for the parameter p?
plausibility of G(n, p)
facebook example
both graphs have 769 vertices and 16 600 edges
the maximum likelihood estimate for p is 0.056
which graph is more likely drawn from a G(n, p) model?
both graphs have the same (very small) probability in G(n, p)
⇒ the probability of the graph is not a good criterion
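sketch. the G(n, p) probability of a graph depends only on its number of vertices and edges, so any two graphs with 769 vertices and 16 600 edges are equally likely. The snippet works on the log scale because the probability itself is far too small for floating point; the function name is illustrative.

```python
import math

n, L = 769, 16600
M = n * (n - 1) // 2      # number of dyads
p_hat = L / M             # maximum likelihood estimate of p

def log_prob_gnp(m, n_dyads, p):
    """Log-probability under G(n, p) of any one graph with m edges out of n_dyads dyads."""
    return m * math.log(p) + (n_dyads - m) * math.log(1 - p)

log_prob = log_prob_gnp(L, M, p_hat)
print(M, round(p_hat, 3), log_prob)
```

Because only the edge count enters the formula, the probability cannot tell a structured network apart from a structureless one. That is why the comparison has to be made on network properties instead.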
let’s look at three network properties
1. inhomogeneity of the graph density
facebook example
colours encode the dorm variable (grey for missing value)
density of the whole network is 0.056
1. inhomogeneity of the graph density
facebook example
subnetworks induced by the 8 dorms have much higher densities:
0.21, 0.37, 0.20, 0.35, 0.31, 0.24, 0.37, 0.25
can this happen in a G(n, p) model?
not likely: the probability that randomly drawn subnetworks of that size have such high densities is very small
2. degree distributions
facebook example
3. number of triads
facebook example
expected: 13 000 observed: 119 000
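sketch. the expected value can be reproduced under the assumption that "triads" here means closed triangles: each of the C(n, 3) node triples is a triangle with probability p³ in G(n, p). The variable names are illustrative.

```python
from math import comb

n, L = 769, 16600
p_hat = L / comb(n, 2)                    # density of the facebook network

# expected number of triangles in G(n, p_hat)
expected_triangles = comb(n, 3) * p_hat ** 3

observed_triangles = 119000               # rounded value from the slide
ratio = observed_triangles / expected_triangles

print(round(expected_triangles), round(ratio, 1))
```

The result is a little over 13 000, in line with the rounded figure above, so the observed network has roughly nine times as many triangles as a Bernoulli graph of the same density would be expected to have.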
plausibility of G(n, p)
facebook example
how plausible is the G(n, p) model?
address this question by looking at some network properties:
1. inhomogeneity of the graph density
2. skewness of the degree distribution
3. number of triads
all three properties are very different for the facebook network than for the G(n, p) model
configuration model
(Bender & Canfield, 1978)
to account for the unrealistic degree distribution generated from the G(n, p) model
random networks constrained by the observed degree sequence
• half-edges (stubs) are assigned to the nodes according to the observed degrees
• stubs are randomly joined to form edges
limitations
• loops (c) and multi-edges (d)
• the sum of degrees must be even
note. this is the same as the non-parametric U|d null model, i.e. random networks generated given a fixed degree sequence
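sketch. stub matching can be written in a few lines. The code below is a minimal illustration with a hypothetical degree sequence; it makes no attempt to avoid loops or multi-edges, so it shows the limitations as well as the method.

```python
import random

def configuration_model(degrees, rng):
    """Join half-edges (stubs) uniformly at random; returns a list of edges (may contain loops and repeats)."""
    if sum(degrees) % 2 != 0:
        raise ValueError("sum of degrees must be even")
    # one stub per unit of degree, labelled by its node
    stubs = [node for node, d in enumerate(degrees) for _ in range(d)]
    rng.shuffle(stubs)
    # pair up consecutive stubs in the shuffled list
    return [(stubs[i], stubs[i + 1]) for i in range(0, len(stubs), 2)]

rng = random.Random(7)
degrees = [3, 2, 2, 2, 1]                 # hypothetical observed degree sequence
edges = configuration_model(degrees, rng)

realised = [0] * len(degrees)
for u, v in edges:
    realised[u] += 1
    realised[v] += 1                      # a loop (u == v) counts twice for its node

loops = sum(u == v for u, v in edges)
repeated = len(edges) - len({frozenset(e) for e in edges})
print(edges, loops, repeated)
```

The realised degrees always equal the input sequence. Whether a particular draw contains loops or repeated ties depends on the random pairing.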
small world model
(Watts & Strogatz, 1998)
to account for small graph diameter (short average path length) and high clustering (the small world phenomenon)
1. start with a circle network where each node is connected to a specified number of nearest neighbours
2. rewire ties to new nodes with a given probability p
limitations:
p 0 0.5: all nodes have same degrees
p 1: G(n, p) graph
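sketch. the two construction steps could be coded as below. This is a simplified variant written for illustration (it rewires one end of each lattice tie to a uniformly chosen new partner and avoids loops and duplicate ties); n, k and the seed are arbitrary, and k is assumed to be even and smaller than n.

```python
import random

def watts_strogatz(n, k, p, rng):
    """Ring lattice with k nearest neighbours per node; each tie is rewired with probability p."""
    edges = set()
    # step 1: circle network, each node tied to its k/2 neighbours on either side
    for u in range(n):
        for j in range(1, k // 2 + 1):
            edges.add(frozenset((u, (u + j) % n)))
    # step 2: rewire the far end of each lattice tie with probability p
    for u in range(n):
        for j in range(1, k // 2 + 1):
            old = frozenset((u, (u + j) % n))
            if old in edges and rng.random() < p:
                candidates = [w for w in range(n)
                              if w != u and frozenset((u, w)) not in edges]
                if candidates:
                    edges.remove(old)
                    edges.add(frozenset((u, rng.choice(candidates))))
    return edges

def degrees(n, edges):
    deg = [0] * n
    for e in edges:
        for node in e:
            deg[node] += 1
    return deg

lattice = watts_strogatz(20, 4, 0.0, random.Random(0))
rewired = watts_strogatz(20, 4, 0.2, random.Random(0))
print(sorted(set(degrees(20, lattice))), sorted(set(degrees(20, rewired))))
```

With p = 0 nothing is rewired and every node keeps degree k. Rewiring never changes the number of ties, only where they go.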
beyond the basic random graph models
we need to introduce dependencies among the network tie variables
• these express various types of network self-organisation
• a dependence assumption picks out certain types of network patterns, so-called network configurations, that are possible in the model
• in other words: we assume that the network is built up of these configurations
dependency among dyads (or higher-order structures)
• is what makes network modelling difficult
• is what makes network modelling interesting
• is often the essence of social network theories
four generations of dependence assumptions
• Bernoulli dependence: network variables are independent of each other
• dyadic dependence: for directed graphs, dependence within dyads
• Markov dependence: network variables are conditionally independent unless they share at least one node
• social circuit dependence: network variables are conditionally dependent if they create 4-cycles
(also, dependence arising from actor attributes)
simple random graph models
nested random graph models
