Mathematical Engineering - Bayesian Statistics
Full exam
BAYESIAN STATISTICS — A. Guglielmi & M. Beraha — 24.01.2022
Properly justify all your answers.

Exercise 1

Consider the beta-binomial distribution, i.e. $M \mid N, p \sim \mathrm{Bin}(N, p)$, where $N$ is a fixed positive integer and $p \sim \mathrm{beta}(\alpha, \beta)$, $\alpha, \beta > 0$.

1. Show that the prior marginal distribution of the r.v. $M$ is the beta-binomial density, i.e.,
$$P(M = m) = \binom{N}{m}\,\frac{B(\alpha+m,\,\beta+N-m)}{B(\alpha,\beta)}, \qquad m = 0, 1, \dots, N, \tag{1}$$
where $B(\alpha, \beta) = \Gamma(\alpha)\Gamma(\beta)/\Gamma(\alpha+\beta)$ is the beta function.

2. Compute $E[M]$ and $\mathrm{Var}[M]$.

Consider now an urn with $N = 4$ balls, of which $M$ are blue and the remaining $4 - M$ are red. Let $X$ denote the number of blue balls obtained when sampling $n = 2$ balls without replacement from the urn. The r.v. $X$ has the hypergeometric distribution, i.e.
$$P(X = x \mid M) = \frac{\binom{M}{x}\binom{4-M}{2-x}}{\binom{4}{2}}\,\mathbf{1}_{S_M}(x), \tag{2}$$
where $S_M = \{\max\{0, M-2\},\, \max\{0, M-2\}+1,\, \dots,\, \min\{2, M\}\}$.

The statistical goal here is to make inference on $M$, with data $X$ under likelihood (2). A priori, we assume that $M$ (the only unknown parameter) is the beta-binomial r.v. with hyperparameters $N = 4, \alpha, \beta$, i.e. its (discrete) density is (1).

3. Find the values of the prior hyperparameters $\alpha$ and $\beta$ such that $E[M] = N/2 = 2$ and $\mathrm{Var}[M] = N^2/10 = 16/10$. Assume those values for the rest of the exercise.

We observe data $x = 1$.

4. Derive the support of the posterior distribution of $M$, given $X = 1$. (Hint: check for which values of $m = 0, 1, 2, 3, 4$ the indicator function in (2) assumes value 1.)

5. Compute the posterior probability $P(M = m \mid X = 1)$, $m = 0, 1, 2, 3, 4$. (Hint: first ignore factors that do not depend on $m$ in the expression, and make this calculation for a general $m$ in the support of the posterior.)

6. Test the hypotheses $H_0\colon M = 2$ vs $H_1\colon M = 1$ by computing the Bayes factor of the model (2)-(1) with the available data. Write down your conclusion.

Solution of Exercise 1.

1. For $m = 0, 1, \dots, N$:
$$\begin{aligned}
P(M = m) &= \int_0^1 P(M = m \mid p)\,\pi(dp) = \int_0^1 \binom{N}{m} p^m (1-p)^{N-m}\,\frac{p^{\alpha-1}(1-p)^{\beta-1}}{B(\alpha,\beta)}\,dp \\
&= \binom{N}{m}\frac{1}{B(\alpha,\beta)} \int_0^1 p^{\alpha+m-1}(1-p)^{\beta+N-m-1}\,dp = \binom{N}{m}\frac{B(\alpha+m,\,\beta+N-m)}{B(\alpha,\beta)}.
\end{aligned}$$

2. By standard properties of the conditional distribution, we have
$$E[M] = E\big[E[M \mid p]\big] = E[Np] = N\,E[p] = \frac{N\alpha}{\alpha+\beta},$$
$$\mathrm{Var}[M] = E\big[\mathrm{Var}[M \mid p]\big] + \mathrm{Var}\big[E[M \mid p]\big] = E[Np(1-p)] + \mathrm{Var}[Np] = N\,E[p(1-p)] + N^2\,\mathrm{Var}[p].$$
We have
$$E[p - p^2] = \frac{\alpha}{\alpha+\beta} - \frac{\alpha(\alpha+1)}{(\alpha+\beta)(\alpha+\beta+1)} = \frac{\alpha\beta}{(\alpha+\beta)(\alpha+\beta+1)}
\quad\text{and}\quad
\mathrm{Var}[p] = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)},$$
so that
$$\mathrm{Var}[M] = \frac{N\alpha\beta}{(\alpha+\beta)(\alpha+\beta+1)} + \frac{N^2\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}
= \frac{N\alpha\beta(\alpha+\beta) + N^2\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}
= \frac{N\alpha\beta(\alpha+\beta+N)}{(\alpha+\beta)^2(\alpha+\beta+1)}.$$

3. From the condition on $E[M]$ we get $\alpha = \beta$. Substituting into the expression for the variance:
$$\mathrm{Var}[M] = \frac{N\alpha^2(2\alpha+N)}{4\alpha^2(2\alpha+1)} = \frac{2N\alpha + N^2}{8\alpha + 4} = \frac{N^2}{10}.$$
Solving for $\alpha$ yields
$$\beta = \alpha = \frac{6N}{8N - 20} = \frac{3N}{4N - 10} = 2.$$

4. We check that $S_0 = \{0\}$, $S_1 = \{0,1\}$, $S_2 = \{0,1,2\}$, $S_3 = \{1,2\}$, $S_4 = \{2\}$. This implies that $\mathbf{1}_{S_m}(1) = 1$ iff $m = 1, 2, 3$. The posterior of $M$ is proportional to the product of (2) and (1). However, since $x = 1$, the posterior discrete density is equal to 0 when $m = 0, 4$, and it is larger than zero for $m = 1, 2, 3$. Hence, the support of the posterior of $M$, given $X = 1$, is $\{1, 2, 3\}$.

5. From the support found at point 4, we have that $P(M = m \mid X = 1) = 0$ for $m = 0, 4$. The posterior masses when $m = 1, 2, 3$ are proportional to
$$\begin{aligned}
P(M = m \mid X = 1) &\propto \binom{m}{1}\binom{4-m}{2-1} \times \binom{4}{m}\,B(\alpha+m,\,\beta+4-m) \\
&= \binom{m}{1}\binom{4-m}{1}\binom{4}{m}\,\Gamma(m+2)\,\Gamma(6-m) && \text{(since } \alpha = \beta = 2\text{)} \\
&= \binom{m}{1}\binom{4-m}{1}\binom{4}{m}\,(m+1)!\,(5-m)! && \text{for } m = 1, 2, 3.
\end{aligned}$$
Hence $P(M = 1 \mid X = 1) \propto 576$, $P(M = 2 \mid X = 1) \propto 864$ and $P(M = 3 \mid X = 1) \propto 576$. Normalizing the numbers above ($576 + 864 + 576 = 2016$) we get
$$P(M = 1 \mid X = 1) = P(M = 3 \mid X = 1) = \frac{576}{2016} = \frac{2}{7}, \qquad P(M = 2 \mid X = 1) = \frac{3}{7}.$$
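As a quick numerical sanity check of points 5 and 6 (not part of the exam solution), the posterior masses and the Bayes factor can be reproduced in a few lines of Python; the helper names below (`beta_fn`, `prior`, `lik`) are illustrative.

```python
from math import comb, gamma, log

def beta_fn(a, b):
    # Beta function B(a, b) = Gamma(a) * Gamma(b) / Gamma(a + b)
    return gamma(a) * gamma(b) / gamma(a + b)

N, n, alpha, beta = 4, 2, 2.0, 2.0   # hyperparameters found at point 3
x = 1                                # observed datum

def prior(m):
    # beta-binomial prior mass, equation (1)
    return comb(N, m) * beta_fn(alpha + m, beta + N - m) / beta_fn(alpha, beta)

def lik(x, m):
    # hypergeometric likelihood, equation (2); zero outside the support S_M
    if x < max(0, m - n) or x > min(n, m):
        return 0.0
    return comb(m, x) * comb(N - m, n - x) / comb(N, n)

unnorm = [lik(x, m) * prior(m) for m in range(N + 1)]
post = [u / sum(unnorm) for u in unnorm]
print(post)                          # [0.0, 2/7, 3/7, 2/7, 0.0]

BF01 = lik(x, 2) / lik(x, 1)         # ratio of the likelihoods at M = 2 and M = 1
print(BF01, 2 * log(BF01))           # 1.333..., 0.5754
```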
6. We compute the Bayes factor as the ratio between posterior and prior odds:
$$BF_{01} = \frac{P(M = 2 \mid X = 1)}{P(M = 1 \mid X = 1)} \cdot \frac{P(M = 1)}{P(M = 2)},$$
where
$$P(M = 1) = \binom{4}{1}\frac{B(3,5)}{B(2,2)} = \binom{4}{1}\,\frac{\Gamma(2+1)\,\Gamma(2+4-1)}{\Gamma(2+1+2+4-1)}\cdot\frac{\Gamma(4)}{\Gamma(2)\Gamma(2)} = \frac{8}{35},$$
$$P(M = 2) = \binom{4}{2}\frac{B(4,4)}{B(2,2)} = \binom{4}{2}\,\frac{\Gamma(2+2)\,\Gamma(2+4-2)}{\Gamma(2+2+2+4-2)}\cdot\frac{\Gamma(4)}{\Gamma(2)\Gamma(2)} = \frac{9}{35},$$
while $P(M = 1 \mid X = 1)$ and $P(M = 2 \mid X = 1)$ have been computed at point 5. Therefore we obtain
$$BF_{01} = \frac{3}{2}\cdot\frac{8}{9} = \frac{4}{3}$$
and $2\log(BF_{01}) \simeq 0.5754$. There is weak evidence in favour of $H_0$.

Note that, by Bayes' theorem, $P(M = j \mid X = 1)/P(M = j) = P(X = 1 \mid M = j)/P(X = 1)$; the factor $P(X = 1)$ cancels in the ratio, and hence the BF is the ratio of the likelihood at $M = 2$ and at $M = 1$:
$$BF_{01} = \frac{P(X = 1 \mid M = 2)}{P(X = 1 \mid M = 1)} = \frac{4}{3}.$$

Exercise 2

Consider a regression problem where, starting from covariates $x_i = (x_{i1}, \dots, x_{ip}) \in \mathbb{R}^p$, we model responses $Y_i$ from a categorical distribution, with $i = 1, \dots, n$. In the simplest case, we assume only two categories, so that $Y_i \in \{0, 1\}$. In this case, we consider the following probit regression model:
$$P(Y_i = 1 \mid \beta = (\beta_1, \dots, \beta_p)) = \Phi(x_i^t \beta) = \Phi(x_{i1}\beta_1 + \dots + x_{ip}\beta_p), \qquad i = 1, \dots, n,$$
$$\beta_j \overset{iid}{\sim} N(0, \sigma^2), \qquad j = 1, \dots, p,$$
where $\Phi(\cdot)$ denotes the cumulative distribution function of the standard Gaussian distribution. As usual, we assume $Y_1, \dots, Y_n$ independent, conditionally on $\beta$.

1. Introducing suitable auxiliary variables, describe a Gibbs sampler algorithm to simulate from the posterior distribution of $\beta$, given data $y_1, \dots, y_n$, according to the probit regression model described above.

Consider now the case where $Y_i$ is a categorical r.v. assuming $k$ ordered categories, $Y_i \in \{1, \dots, k\}$, where $k$ is a positive integer larger than 2. To help with the intuition, consider the categories to be age bins, for instance $Y_i = 1$ if subject $i$ is less than 10 years old, $Y_i = 2$ if the age of subject $i$ is between 10 and 15, etc., while $Y_i = k$ if subject $i$ is older than 85. Introducing latent variables $Z_1, \dots, Z_n$ and a set of fixed cut-points $c_1 < c_2 < \dots < c_{k-1}$, we assume the following regression model for $Y_1, \dots, Y_n$:
$$Y_i = j \ \text{iff}\ Z_i \in (c_{j-1}, c_j), \quad \text{where } c_0 = -\infty,\ c_k = +\infty, \qquad i = 1, \dots, n, \tag{3}$$
$$Z_i \mid \beta \overset{ind}{\sim} N(x_i^t \beta, 1), \qquad i = 1, \dots, n, \tag{4}$$
$$\beta_j \overset{iid}{\sim} N(0, \sigma^2), \qquad j = 1, \dots, p. \tag{5}$$
We also assume $Y_1, \dots, Y_n$ independent, conditionally on $Z = (Z_1, \dots, Z_n)$.

2. Derive the conditional distribution of a single $Y_i$ given $\beta$ and $x_i$. (Hint: start from (3)-(4) and then marginalize $Z_i$ out.)

3. Derive the full-conditionals of the Gibbs sampler to simulate from the posterior of $\beta, Z$, given data $y_1, \dots, y_n$, and propose a simulation technique for the full-conditional of $Z$.

Solution of Exercise 2.

1. See LESSON 18 - 2021-10-28.

2. For $j = 1, \dots, k$, $Y_i = j$ iff $Z_i \in (c_{j-1}, c_j)$. Hence
$$P(Y_i = j \mid \beta) = \Phi(c_j - x_i^T \beta) - \Phi(c_{j-1} - x_i^T \beta) \qquad \text{for any } j = 1, \dots, k.$$

3. We start from the joint model
$$L(Y, Z, \beta) = \prod_{i=1}^{n} \left\{\sum_{j=1}^{k} I[Y_i = j]\, I[Z_i \in (c_{j-1}, c_j)]\right\} N(Z_i \mid x_i^T \beta, 1)\ \prod_{\ell=1}^{p} N(\beta_\ell \mid 0, \sigma^2).$$
Both full-conditionals, of $\beta$ and of $Z$, are proportional to the joint distribution above. A posteriori, the $Z_i$'s are independent random variables and hence the full-conditional of $Z$ is the product of the conditional distributions of the $Z_i$'s, given the data and $\beta$. Specifically, letting $j^*_i$ denote the category of the $i$-th observation, we get that
$$L(Z_i \mid \cdots) \propto N(Z_i \mid x_i^T \beta, 1)\, I[Z_i \in (c_{j^*_i - 1}, c_{j^*_i})],$$
that is, $Z_i$ follows a truncated normal distribution with mean $x_i^T \beta$ and variance 1, restricted to the interval $(c_{j^*_i - 1}, c_{j^*_i})$. We can simulate each $Z_i$ efficiently using the inverse-cdf method or (less efficiently) using rejection sampling.
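A minimal sketch of the inverse-cdf draw from the truncated-normal full-conditional of $Z$, assuming NumPy/SciPy; the function name `sample_Z` and the argument layout are illustrative, not part of the exam:

```python
import numpy as np
from scipy.stats import norm

def sample_Z(y, X, beta, cuts, rng):
    # Draw Z_i | y_i, beta from N(x_i^T beta, 1) truncated to (c_{y_i - 1}, c_{y_i}),
    # via the inverse-cdf method. `y` holds categories 1..k and `cuts` is the
    # array (c_0, c_1, ..., c_k) with c_0 = -inf and c_k = +inf.
    mu = X @ beta                                  # means x_i^T beta
    lo, hi = cuts[y - 1] - mu, cuts[y] - mu        # truncation bounds, centered
    u = rng.uniform(norm.cdf(lo), norm.cdf(hi))    # uniform on (Phi(lo), Phi(hi))
    return mu + norm.ppf(u)                        # invert the standard normal cdf
```

Each draw costs two evaluations of $\Phi$ and one of $\Phi^{-1}$ per observation, which is why the inverse-cdf route is the efficient choice mentioned above.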
As far as $\beta$ is concerned, given $Z$ its full-conditional does not depend on the $Y_i$'s, and it can be sampled following standard results from Bayesian linear regression, using the $Z_i$'s as observations. In particular, we get
$$\begin{aligned}
L(\beta \mid \cdots) &\propto \prod_{i=1}^{n} e^{-(Z_i - x_i^T \beta)^2/2} \times e^{-\beta^T\beta/(2\sigma^2)}
= \exp\left\{-\frac{1}{2}\left[(Z - X\beta)^T(Z - X\beta) + \frac{1}{\sigma^2}\beta^T\beta\right]\right\} \\
&\propto \exp\left\{-\frac{1}{2}\left[\frac{1}{\sigma^2}\beta^T\beta + \beta^T(X^T X)\beta - 2\beta^T(X^T Z)\right]\right\}
= \exp\left\{-\frac{1}{2}\left[\beta^T\left(\frac{1}{\sigma^2}I + X^T X\right)\beta - 2\beta^T(X^T Z)\right]\right\},
\end{aligned}$$
where the last expression displays the kernel of a multivariate normal distribution with variance
$$\Sigma_\beta = \left(\frac{1}{\sigma^2}I + X^T X\right)^{-1}$$
and mean
$$\mu_\beta = \left(\frac{1}{\sigma^2}I + X^T X\right)^{-1} X^T Z.$$
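To complete the picture, a hedged sketch of the $\beta$ update and of one full Gibbs sweep, reusing `sample_Z` from the sketch above; `sigma2`, `n_iter` and `seed` are illustrative defaults, not values prescribed by the exam:

```python
def sample_beta(Z, X, sigma2, rng):
    # Draw beta | Z from N(mu_beta, Sigma_beta) with
    # Sigma_beta = (I/sigma2 + X^T X)^{-1} and mu_beta = Sigma_beta X^T Z.
    p = X.shape[1]
    Sigma = np.linalg.inv(np.eye(p) / sigma2 + X.T @ X)   # posterior covariance
    mu = Sigma @ (X.T @ Z)                                # posterior mean
    return rng.multivariate_normal(mu, Sigma)

def gibbs(y, X, cuts, sigma2=10.0, n_iter=5000, seed=0):
    # Alternate the two full-conditionals derived above.
    rng = np.random.default_rng(seed)
    beta = np.zeros(X.shape[1])
    draws = np.empty((n_iter, X.shape[1]))
    for t in range(n_iter):
        Z = sample_Z(y, X, beta, cuts, rng)       # truncated-normal step
        beta = sample_beta(Z, X, sigma2, rng)     # Gaussian step
        draws[t] = beta
    return draws
```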