Mathematical Engineering - Bayesian Statistics
Full exam
BAYESIAN STATISTICS - A. Guglielmi & M. Gianella - 10.06.2024

Properly justify all your answers. Use the indicator function to denote the support of a distribution.

Exercise 1

We study the distribution of eye color among people with black hair. There are $M = 4$ color categories: $v_1$ = black, $v_2$ = blue, $v_3$ = brown and $v_4$ = green. We rule out cases in which the two eyes have different colors. We select a sample of people (with black hair) of size $n$ and assume that the eye color of each individual is described by iid random variables, conditionally on the parameter $p := (p_1, \dots, p_4)$, where $p_1 + \cdots + p_4 = 1$ and $p_j > 0$ for all $j = 1, \dots, 4$. Let $Y = (Y_1, \dots, Y_4)$ be the vector of the associated counts, i.e., $Y_j$ is the number of individuals in the sample of size $n$ with eye color in the $j$-th category. It is well known that $Y \sim \mathrm{Multinomial}(n, p)$, that is,

\[
P(Y_1 = y_1, \dots, Y_4 = y_4 \mid p) = \frac{n!}{y_1! \cdots y_4!} \, p_1^{y_1} \cdots p_4^{y_4}, \qquad \text{if } \sum_{i=1}^{4} y_i = n. \tag{1}
\]

1. Find a conjugate prior density $\pi$ for $p$ under model (1). Denote by $\alpha = (\alpha_1, \dots, \alpha_4)$ its hyperparameters, writing down the set of values they can assume. Now assume $n = 50$, $\alpha_1 + \alpha_2 + \alpha_3 + \alpha_4 = 5$ and $\alpha_2 = \alpha_3 = 1$.

2. Fix the hyperparameter $\alpha_1$ of the prior $\pi$ found at point 1 such that the marginal prior variance of $Y_1$ is equal to 20.

3. We have registered the eye color of people with black hair, obtaining $y_1 = 28$, $y_2 = 10$, $y_3 = 7$. Find the posterior mean of each $p_j$, $j = 1, 2, 3, 4$, using the available data and the prior derived above. Compute also the posterior variance of $p_1$ and of $p_2$.

4. Test the hypothesis that the percentage of blue eyes among black-haired people is larger than 20% versus the hypothesis that this percentage is smaller than or equal to 20%. Compute the Bayes factor and draw your conclusion. (Hint: use a proper approximation of the posterior of the parameter of interest.)

Solution of Exercise 1

1. The likelihood in (1) has the same mathematical structure as the density of a Dirichlet distribution: if $(p_1, p_2, p_3, p_4) \sim \mathrm{Dirichlet}(\alpha)$ with $p_1 + p_2 + p_3 + p_4 = 1$, then the density of $(p_1, p_2, p_3)$ on $\mathbb{R}^3$ is

\[
\pi(p_1, p_2, p_3; \alpha) = \frac{\Gamma(\alpha_1 + \alpha_2 + \alpha_3 + \alpha_4)}{\Gamma(\alpha_1) \cdots \Gamma(\alpha_4)} \, p_1^{\alpha_1 - 1} p_2^{\alpha_2 - 1} p_3^{\alpha_3 - 1} (1 - p_1 - p_2 - p_3)^{\alpha_4 - 1}, \qquad 0 < p_1, p_2, p_3 < 1, \; 0 < p_1 + p_2 + p_3 < 1,
\]

with $\alpha_1, \alpha_2, \alpha_3, \alpha_4 > 0$. It is straightforward to prove that this prior is conjugate to (1) and that the posterior distribution of $p$ is $\mathrm{Dirichlet}(\alpha_1 + y_1, \dots, \alpha_4 + y_4)$.

2. It is straightforward to prove that if $(p_1, p_2, p_3, p_4) \sim \mathrm{Dirichlet}(\alpha)$, then $p_j \sim \mathrm{beta}(\alpha_j, \bar\alpha - \alpha_j)$ for $j = 1, 2, 3, 4$, where $\bar\alpha := \alpha_1 + \alpha_2 + \alpha_3 + \alpha_4$. Hence

\[
E(p_j) = \frac{\alpha_j}{\alpha_j + \bar\alpha - \alpha_j} = \frac{\alpha_j}{\bar\alpha}.
\]

The information we have is that, a priori, $\operatorname{Var}(Y_1) = 20$. We compute $\operatorname{Var}(Y_1)$ by conditioning on $p_1$:

\[
\begin{aligned}
\operatorname{Var}(Y_1) &= E(\operatorname{Var}(Y_1 \mid p_1)) + \operatorname{Var}(E(Y_1 \mid p_1)) = E(n p_1 (1 - p_1)) + \operatorname{Var}(n p_1) \\
&= n E(p_1) - n E(p_1^2) + n^2 \operatorname{Var}(p_1) = n E(p_1) - n (E p_1)^2 - n \operatorname{Var}(p_1) + n^2 \operatorname{Var}(p_1) \\
&= n E p_1 (1 - E p_1) + n(n - 1) \operatorname{Var}(p_1) \\
&= n \frac{\alpha_1}{\bar\alpha} \Bigl(1 - \frac{\alpha_1}{\bar\alpha}\Bigr) + n(n - 1) \frac{\alpha_1 (\bar\alpha - \alpha_1)}{\bar\alpha^2 (\bar\alpha + 1)}.
\end{aligned}
\]

We need to solve the following equation:

\[
50 \cdot \frac{\alpha_1}{5} \cdot \frac{5 - \alpha_1}{5} + 50 \cdot 49 \cdot \frac{\alpha_1 (5 - \alpha_1)}{5^2 \cdot 6} = 20
\;\Leftrightarrow\;
2\alpha_1 (5 - \alpha_1) + \frac{49}{3} \alpha_1 (5 - \alpha_1) = 20
\;\Leftrightarrow\;
55\alpha_1^2 - 275\alpha_1 + 60 = 0.
\]

The equation has two solutions:

\[
\frac{275 - 249.851}{110} \approx 0.2286, \qquad \frac{275 + 249.851}{110} \approx 4.7714.
\]

The second solution gives no positive value for $\alpha_4 = \bar\alpha - (\alpha_1 + \alpha_2 + \alpha_3)$, so that $\alpha_1 = 0.2286$, $\alpha_2 = 1$, $\alpha_3 = 1$, $\alpha_4 = 2.7714$.

3. With these data, a posteriori $(p_1, p_2, p_3, p_4) \sim \mathrm{Dirichlet}(\alpha_1 + y_1, \dots, \alpha_4 + y_4)$, where $\alpha_1 + y_1 = 28.2286$, $\alpha_2 + y_2 = 11$, $\alpha_3 + y_3 = 8$, $\alpha_4 + y_4 = 7.7714$, and $\bar\alpha + n = 5 + 50 = 55$. Since $E(p_j \mid y) = (\alpha_j + y_j)/(\bar\alpha + n)$, we find

\[
E(p_1 \mid y) = \frac{28.2286}{55} \approx 0.5132, \quad E(p_2 \mid y) = \frac{11}{55} = 0.2, \quad E(p_3 \mid y) = \frac{8}{55} \approx 0.1455, \quad E(p_4 \mid y) = \frac{7.7714}{55} \approx 0.1413.
\]

Since $p_1 \mid y \sim \mathrm{beta}(28.2286, \, 55 - 28.2286 = 26.7714)$, we get $\operatorname{Var}(p_1 \mid y) \approx 0.0045$. Similarly $p_2 \mid y \sim \mathrm{beta}(\alpha_2 + y_2, \, \bar\alpha + n - \alpha_2 - y_2) = \mathrm{beta}(11, 44)$, so that $\operatorname{Var}(p_2 \mid y) = 1/350$ and $\sqrt{\operatorname{Var}(p_2 \mid y)} \approx 0.0535$.

4. We need to test the hypotheses $H_0\colon p_2 > 0.2$ vs $H_1\colon p_2 \le 0.2$. The Bayes factor is

\[
BF_{01} = \frac{P(p_2 > 0.2 \mid y) / P(p_2 \le 0.2 \mid y)}{P(p_2 > 0.2) / P(p_2 \le 0.2)}.
\]

A priori $p_2 \sim \mathrm{beta}(\alpha_2, \bar\alpha - \alpha_2) = \mathrm{beta}(1, 4)$, i.e., the prior density is $\pi_{p_2}(x) = 4(1 - x)^3 \, \mathbb{1}_{(0,1)}(x)$, and

\[
P(p_2 > 0.2) = \int_{0.2}^{1} 4(1 - x)^3 \, dx = \left. -(1 - x)^4 \right|_{0.2}^{1} = 0.8^4 = 0.4096,
\]

so the prior odds are $0.4096 / 0.5904 \approx 0.6938$. A posteriori $p_2 \mid y \sim \mathrm{beta}(11, 44)$, which is approximately $N(0.2, 0.0535^2)$, so that $P(p_2 \le 0.2 \mid y) \approx \Phi\bigl(\frac{0.2 - 0.2}{0.0535}\bigr) = 0.5$ and the posterior odds are equal to 1. Hence $BF_{01} \approx 1 / 0.6938 \approx 1.44$. The BF is so close to 1 that we cannot make any decision in favor of $H_0$ or $H_1$. The exact value of $P(p_2 \le 0.2 \mid y)$ is 0.5270, so that $BF_{01} = (0.4730 / 0.5270) / 0.6938 \approx 1.29$ with no approximation. The conclusion does not change.
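As a numerical cross-check of points 2-4, here is a minimal Python sketch, assuming only numpy and scipy are available; the variable names are illustrative. It solves the quadratic for $\alpha_1$, verifies the prior variance constraint, and recomputes the posterior means and the exact Bayes factor.

```python
import numpy as np
from scipy import stats

n, alpha_bar = 50, 5.0                       # sample size and alpha_1 + ... + alpha_4

# Point 2: solve 55*a1^2 - 275*a1 + 60 = 0; keep the root that leaves alpha_4 > 0
alpha1 = (275 - np.sqrt(275**2 - 4 * 55 * 60)) / (2 * 55)    # ~0.2286
alpha = np.array([alpha1, 1.0, 1.0, alpha_bar - alpha1 - 2.0])

# Check that Var(Y1) = n*Ep1*(1-Ep1) + n*(n-1)*Var(p1) equals 20
Ep1 = alpha1 / alpha_bar
Vp1 = alpha1 * (alpha_bar - alpha1) / (alpha_bar**2 * (alpha_bar + 1))
print(n * Ep1 * (1 - Ep1) + n * (n - 1) * Vp1)               # ~20.0

# Point 3: posterior Dirichlet parameters and means (y4 = 50 - 28 - 10 - 7 = 5)
y = np.array([28, 10, 7, 5])
a_post = alpha + y
print(a_post / a_post.sum())                 # ~(0.5132, 0.2, 0.1455, 0.1413)

# Point 4: exact Bayes factor for H0: p2 > 0.2 vs H1: p2 <= 0.2
p2_prior = stats.beta(alpha[1], alpha_bar - alpha[1])        # beta(1, 4)
p2_post = stats.beta(a_post[1], a_post.sum() - a_post[1])    # beta(11, 44)
prior_odds = p2_prior.sf(0.2) / p2_prior.cdf(0.2)            # 0.4096 / 0.5904
post_odds = p2_post.sf(0.2) / p2_post.cdf(0.2)
print(post_odds / prior_odds)                # exact BF_01, ~1.29
```

The last line reproduces the exact Bayes factor of about 1.29 obtained above.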
Exercise 2

In a linear regression context, unidimensional continuous data $y_1, \dots, y_n$ are modelled as

\[
Y_i \mid x_i, \beta \overset{\text{ind}}{\sim} N(x_i^T \beta, \sigma^2), \qquad i = 1, \dots, n,
\]

where $x_i$ is a $p$-dimensional vector of covariates for the $i$-th individual, $i = 1, \dots, n$. We assume a marginal prior $\pi_{\sigma^2}(\sigma^2)$ for $\sigma^2$.

1. Describe at least two different types of marginal priors for $\beta$ to perform covariate selection.

2. Describe at least three different criteria, based on the posterior $\pi(\beta \mid y_1, \dots, y_n, x_1, \dots, x_n)$ when $\pi(\beta)$ is one of the priors described at point 1, for effectively selecting covariates.

3. Describe how you would predict a new response $Y^{\mathrm{new}}$ with associated covariates $x^{\mathrm{new}}$. Give details on the computational method you would use.

Solution of Exercise 2

1. Priors of the regression parameter $\beta = (\beta_1, \dots, \beta_p)$ for covariate selection include the Bayesian lasso prior, the spike-and-slab prior and the SSVS prior. The last two priors can be straightforwardly expressed by introducing auxiliary variables $\gamma = (\gamma_1, \dots, \gamma_p)^T$ such that $\pi(\beta, \gamma) = \pi(\beta \mid \gamma)\pi(\gamma)$, where each $\gamma_j \in \{0, 1\}$. Each visited regression model is uniquely characterized by the vector $\gamma$ of binary inclusion variables indicating whether or not the corresponding covariate enters the model. For more details, see Section 5.3.1 of the Lecture Notes.

2. If we use the parameter $\gamma$ as above, criteria for selecting covariates are based on the posterior probability that $\gamma$ assumes a value $\gamma_0$, i.e., $\pi(\gamma_0 \mid y)$. The most popular criteria are:

Highest Posterior Probability (HPD). Choose the model with the highest posterior probability.

Median Probability Model (MPM). Pick all the covariates with estimated marginal posterior inclusion probability larger than 0.5, i.e., all $j$ such that

\[
\pi(\gamma_j = 1 \mid y) \approx \frac{1}{m} \sum_{t=1}^{m} \mathbb{1}\bigl\{\gamma_j^{(t)} = 1\bigr\} > 0.5
\]

(a Monte Carlo sketch of this computation is given after the solution).

Hard Shrinkage (HS). Pick all the covariates such that 0 does not belong to the marginal posterior credible interval of the corresponding regressor.

For more details, see Section 5.3.1 of the Lecture Notes.

3. We predict the response $Y^{\mathrm{new}}$ with new covariates $x^{\mathrm{new}}$ by computing the predictive distribution of $Y^{\mathrm{new}}$, given the data and $x^{\mathrm{new}}$:

\[
p(y^{\mathrm{new}} \mid y, X, x^{\mathrm{new}}) = \int p(y^{\mathrm{new}}, \beta, \sigma^2 \mid y, X, x^{\mathrm{new}}) \, d\beta \, d\sigma^2 = \int p(y^{\mathrm{new}} \mid \beta, \sigma^2, x^{\mathrm{new}}) \, p(\beta, \sigma^2 \mid y, X) \, d\beta \, d\sigma^2.
\]

Note that $p(y^{\mathrm{new}} \mid \beta, \sigma^2, x^{\mathrm{new}})$ is $N((x^{\mathrm{new}})^T \beta, \sigma^2)$. Hence, if we have an MCMC sample $\{(\beta^{(l)}, \sigma^{2(l)}), \; l = 1, \dots, L\}$ from the posterior distribution, it is easy to obtain a sample from the posterior predictive distribution of $Y^{\mathrm{new}}$ by independently sampling from the $L$ univariate densities $N((x^{\mathrm{new}})^T \beta^{(l)}, \sigma^{2(l)})$.
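A minimal sketch of the predictive computation at point 3, assuming an MCMC sample of $(\beta, \sigma^2)$ is already available; the synthetic arrays below are illustrative placeholders for the output of a real sampler.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder posterior draws; in practice beta_draws and sigma2_draws
# come from your MCMC sampler for the linear model.
L, p = 5000, 3
beta_draws = rng.normal(size=(L, p))          # beta^(l), shape (L, p)
sigma2_draws = rng.gamma(2.0, 0.5, size=L)    # sigma^2(l), shape (L,)

x_new = np.array([1.0, 0.4, -1.2])            # covariates of the new individual

# One draw from N(x_new' beta^(l), sigma^2(l)) per posterior draw l
mean_l = beta_draws @ x_new
y_new_draws = rng.normal(mean_l, np.sqrt(sigma2_draws))

# Point prediction and a 95% predictive interval
print(y_new_draws.mean(), np.quantile(y_new_draws, [0.025, 0.975]))
```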
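And the sketch promised at point 2 for the Median Probability Model: the marginal inclusion probabilities are estimated as Monte Carlo frequencies over the MCMC draws of $\gamma$; the array gamma_draws is again an illustrative placeholder for the output of an SSVS or spike-and-slab sampler.

```python
import numpy as np

rng = np.random.default_rng(1)

# Placeholder draws of the binary inclusion vector gamma; in practice this
# (m, p) array comes from an SSVS / spike-and-slab sampler.
m, p = 2000, 3
gamma_draws = rng.integers(0, 2, size=(m, p))

# pi(gamma_j = 1 | y) estimated by the frequency of gamma_j = 1 across draws
incl_prob = gamma_draws.mean(axis=0)

# MPM: keep the covariates whose inclusion probability exceeds 0.5
selected = np.where(incl_prob > 0.5)[0]
print(incl_prob, selected)
```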