Mathematical Engineering - Bayesian Statistics
Full exam
BAYESIAN STATISTICS
A. Guglielmi & M. Beraha, 24.01.2023
Properly justify all your answers. Use the indicator function to denote the support of a distribution.

Exercise 1
Let $X_1, X_2, \dots, X_n$ be (conditionally) independent and identically distributed (iid) absolutely continuous random variables with density
\[
f_X(x;\alpha,\beta) = \frac{\alpha\,\beta^{\alpha}}{x^{\alpha+1}}\,\mathbb{1}_{[\beta,+\infty)}(x), \qquad \alpha,\beta>0. \tag{1}
\]
For the moment, consider $\beta=3$ fixed.

1. Compute the likelihood $L(\alpha; x_1,\dots,x_n)$ and show that the gamma distribution is a conjugate prior for $\alpha$. Derive the hyperparameters of the associated posterior distribution. (Hint: denote by $(a,b)$ the hyperparameters of the prior gamma density $\pi$, with prior variance for $\alpha$ given by the ratio $a/b^2$.)

2. Compute the marginal distribution of $X_1$.

3. For the prior $\pi$ as at point 1, consider prior hyperparameters $(1,b)$ (i.e., $a=1$) and fix $b$ such that $E_\pi[\alpha]=5$. Propose a two-step method to sample from the marginal distribution of $X_1$ using the inverse-CDF method. You sample two iid random variables from the Uniform distribution $\mathcal{U}([0,1])$ and observe $U_1=0.235$ and $U_2=0.901$. Which value of $X_1$ do you get?

We observe data $(x_1,\dots,x_7)$ such that $\sum_i \log(x_i)=9.73$.

4. Test the hypotheses $H_0:\alpha=1.5$ vs $H_1:\alpha\neq 1.5$ using the Bayes factor, making explicit your conclusion under the available data and the prior at points 1 and 3.

For any integer $n$, if $X_1, X_2, \dots, X_n$ are (conditionally) iid with density (1), assume that both $\alpha$ and $\beta$ are random and a priori independent. For $\alpha$ assume the marginal prior derived at point 1.

5. Propose a prior for $\beta$ and find the associated full conditional distribution, in such a way that you can easily sample from it.

6. Describe a Gibbs sampling algorithm to sample from the posterior distribution of $(\alpha,\beta)$, with any dataset $(x_1,\dots,x_n)$.

Solution of Exercise 1.

1. For $x_1,\dots,x_n > \beta = 3$, the likelihood is
\[
L(\alpha; x_1,\dots,x_n) = \prod_{i=1}^n f(x_i;\alpha)
= \frac{\beta^{n\alpha}\,\alpha^n}{\left(\prod_{i=1}^n x_i\right)^{\alpha+1}}
= \frac{1}{\prod_{i=1}^n x_i}\;\alpha^n \left(\frac{\beta^n}{\prod_{i=1}^n x_i}\right)^{\alpha}\,\mathbb{1}_{(0,+\infty)}(\alpha).
\]
Hence, for $\alpha>0$, we have
\[
L(\alpha; x_1,\dots,x_n) \propto \left(\frac{\prod_{i=1}^n x_i}{\beta^n}\right)^{-\alpha} \alpha^n
= \alpha^n\, e^{-\alpha \sum_{i=1}^n \log(x_i/\beta)}.
\]
It is clear that a conjugate prior for $\alpha$ is the gamma$(a,b)$, $a,b>0$, so that the posterior density is computed by Bayes' theorem:
\[
\pi(\alpha \mid x_1,\dots,x_n) \propto \alpha^n e^{-\alpha \sum_i \log(x_i/\beta)}\,\alpha^{a-1} e^{-b\alpha}\,\mathbb{1}_{(0,+\infty)}(\alpha)
= \alpha^{a+n-1}\, e^{-\alpha\left(b+\sum_i \log(x_i/\beta)\right)}\,\mathbb{1}_{(0,+\infty)}(\alpha),
\]
which is a gamma density as well, with posterior hyperparameters
\[
a_p = a+n, \qquad b_p = b + \sum_{i=1}^n \log\frac{x_i}{\beta}.
\]

2. To compute the marginal distribution of $X_1$, for $x>\beta=3$, we write
\[
f_{X_1}(x) = \int_0^{+\infty} \frac{\alpha\beta^{\alpha}}{x^{\alpha+1}}\,\frac{b^a}{\Gamma(a)}\,e^{-b\alpha}\,\alpha^{a-1}\,d\alpha
= \frac{b^a}{\Gamma(a)\,x}\int_0^{+\infty} \alpha^{(a+1)-1}\exp\!\left(-\alpha\left(b+\log(x/\beta)\right)\right)d\alpha
= \frac{b^a}{\Gamma(a)\,x}\,\frac{\Gamma(a+1)}{\left(b+\log(x/\beta)\right)^{a+1}}.
\]
Hence $f_{X_1}(x) = \dfrac{a\,b^a}{x}\left(b+\log(x/\beta)\right)^{-(a+1)}\,\mathbb{1}_{[\beta,+\infty)}(x)$. It is straightforward to compute the marginal distribution function of $X_1$, for $x\ge\beta$, as
\[
F_{X_1}(x) = \int_\beta^x \frac{a\,b^a}{s}\left(b+\log(s/\beta)\right)^{-(a+1)} ds
= a\,b^a\left[-\frac{\left(b+\log(s/\beta)\right)^{-a}}{a}\right]_{s=\beta}^{s=x}
= b^a\left(b^{-a} - \left(b+\log(x/\beta)\right)^{-a}\right)
= 1 - \frac{b^a}{\left(b+\log(x/\beta)\right)^{a}}, \qquad x\ge\beta.
\]
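For illustration only (not part of the required solution), the following Python sketch checks the marginal CDF derived above by Monte Carlo: it samples $\alpha$ from the gamma prior, then $X\mid\alpha$ from the Pareto density (1) by inverse CDF, and compares the empirical CDF with $1-\left(b/(b+\log(x/\beta))\right)^a$. The values of a, b, beta and the evaluation point x0 are arbitrary illustrative choices.

```python
# Minimal Monte Carlo check of the marginal CDF of X_1 (illustrative values only).
import numpy as np

rng = np.random.default_rng(0)
a, b, beta = 2.0, 0.5, 3.0          # illustrative hyperparameters and support bound
n_sim = 200_000

alpha = rng.gamma(shape=a, scale=1.0 / b, size=n_sim)          # alpha ~ gamma(a, b) (rate b)
x = beta * (1.0 - rng.uniform(size=n_sim)) ** (-1.0 / alpha)   # inverse-CDF draw from (1) given alpha

x0 = 6.0                                                        # illustrative evaluation point
empirical = np.mean(x <= x0)
analytical = 1.0 - (b / (b + np.log(x0 / beta))) ** a
print(f"empirical F(x0) = {empirical:.4f}, analytical F(x0) = {analytical:.4f}")
```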
3. Since $E[\alpha]=a/b=1/b=5$, we get $b=0.2$. To sample from the marginal of $X \stackrel{d}{=} X_1$, we first sample $\alpha\sim$ gamma$(1,b)$ and then sample $X\mid\alpha$. Remember that if $U\sim\mathcal{U}(0,1)$ and $F$ is a distribution function with quantile function $F^{-1}$, then the random variable $F^{-1}(U)\sim F$. To sample from the gamma distribution, recall that gamma$(1,b)\equiv\mathcal{E}(b)$. Since the distribution function of the exponential is $1-e^{-b\alpha}$ for $\alpha>0$, its inverse function, i.e., its quantile function, is $-\log(1-y)/b$.

The conditional distribution function of the density (1) can instead be computed as
\[
F_X(x;\alpha) = \int_\beta^x \frac{\alpha\beta^{\alpha}}{t^{\alpha+1}}\,dt
= \beta^{\alpha}\left(\beta^{-\alpha}-x^{-\alpha}\right)
= 1-\left(\frac{\beta}{x}\right)^{\alpha}, \qquad x>\beta=3,
\]
whose inverse is $F^{-1}(y) = \beta(1-y)^{-1/\alpha} = \beta\,e^{-\log(1-y)/\alpha} = 3\,e^{-\log(1-y)/\alpha}$.

Hence, for $U_1=0.235$ we obtain $\alpha = -\log(1-U_1)/b \approx 1.339$ and, with $U_2=0.901$, $X = 3\,e^{-\log(1-U_2)/1.339} \approx 16.865$.

The one-step method considers the quantile function of the marginal distribution function $F_{X_1}(x) = \left(1 - b/(b+\log(x/3))\right)\mathbb{1}_{[3,+\infty)}(x)$ derived above, that is
\[
F^{-1}_{X_1}(u) = 3\,e^{bu/(1-u)}, \qquad u\in(0,1).
\]

4. We compute the Bayes factor as
\[
BF_{01} = \frac{L(1.5; x_1,\dots,x_7)}{m(\mathbf{x})},
\]
where
\[
m(\mathbf{x}) = \int_0^{+\infty} \frac{\alpha^n\,\beta^{n\alpha}}{\left(\prod_i x_i\right)^{\alpha+1}}\,\frac{b^a}{\Gamma(a)}\,\alpha^{a-1}\,e^{-b\alpha}\,d\alpha
= \frac{\Gamma(n+a)}{\Gamma(a)}\,\frac{b^a}{\left(\prod_{i=1}^n x_i\right)\left(b+\sum_{i=1}^n \log(x_i/\beta)\right)^{a+n}},
\]
so that
\[
BF_{01} = \frac{\dfrac{1}{\prod_{i=1}^n x_i}\,(1.5)^n\left(\dfrac{\beta^n}{\prod_{i=1}^n x_i}\right)^{1.5}}
{\dfrac{\Gamma(a+n)}{\Gamma(a)}\,\dfrac{1}{\prod_{i=1}^n x_i}\,\dfrac{b^a}{\left(b+\sum_{i=1}^n \log(x_i/\beta)\right)^{a+n}}}
= \frac{(1.5)^n\, e^{-1.5\sum_{i=1}^n \log(x_i/\beta)}}{\dfrac{\Gamma(a+n)}{\Gamma(a)}\,\dfrac{b^a}{\left(b+\sum_{i=1}^n \log(x_i/\beta)\right)^{a+n}}}
= \frac{(1.5)^7\, e^{-1.5\left(\sum_{i=1}^7 \log x_i - 7\log\beta\right)}}{\dfrac{7!\,b}{\left(b+\sum_{i=1}^7 \log x_i - 7\log\beta\right)^{8}}}
= \frac{0.8015}{1.5919} = 0.5035,
\]
and $\log BF_{01} = -0.6861716$. We conclude that there is very mild/weak evidence in favor of $H_1$.

5. Now both $\alpha$ and $\beta$ are random. Writing the likelihood as a function of $\beta$, we get
\[
L(\beta; x_1,\dots,x_n) = \prod_{i=1}^n f(x_i;\alpha,\beta) \propto \beta^{n\alpha}\prod_{i=1}^n \mathbb{1}_{[\beta,+\infty)}(x_i) = \beta^{n\alpha}\,\mathbb{1}_{[0,\min_i x_i]}(\beta).
\]
The expression of the last indicator function follows since $\prod_{i=1}^n \mathbb{1}_{[\beta,+\infty)}(x_i)=1$ if and only if all the $x_i$ are larger than or equal to $\beta$, i.e., $\min_i x_i\ge\beta$.

One possible prior for $\beta$, which yields an easy-to-sample full conditional, is $\beta\sim$ gamma$(\theta,\lambda)$. This prior gives the full conditional
\[
f_\beta(\beta\mid\alpha,\mathbf{x}) \propto e^{-\lambda\beta}\,\beta^{n\alpha+\theta-1}\,\mathbb{1}_{[0,\min_i x_i]}(\beta), \tag{2}
\]
which is the kernel of a gamma$(\theta+n\alpha,\lambda)$ truncated to the interval $[0,\min_i x_i]$ and can be efficiently sampled via rejection sampling.

Alternatively, another prior for $\beta$ is $\beta\sim\mathcal{U}([0,c])$, where $c$ is a positive constant. The associated full conditional is
\[
f_\beta(\beta\mid\alpha,\mathbf{x}) \propto \beta^{n\alpha}\,\mathbb{1}_{[0,\min_i x_i]}(\beta)\times\frac{1}{c}\,\mathbb{1}_{[0,c]}(\beta) \propto \beta^{n\alpha}\,\mathbb{1}_{[0,\min(c,\min_i x_i)]}(\beta), \tag{3}
\]
which is a re-scaled (on the interval $[0,\min(c,\min_i x_i)]$) beta distribution with parameters $(n\alpha+1, 1)$. The associated distribution function can be derived analytically, and sampling from this full conditional is obtained through the inverse-CDF method.

6. First of all, observe that the full conditional of $\alpha$ is such that $L(\alpha\mid\beta,\mathbf{x}) \propto L(\mathbf{x}\mid\alpha,\beta)\,\pi(\alpha)\,\pi(\beta) \propto L(\mathbf{x}\mid\alpha,\beta)\,\pi(\alpha)$. This distribution is equal to the posterior of $\alpha$ derived at point 1 when $\beta$ was kept fixed, i.e., $\alpha\mid\beta,\mathbf{x}\sim$ gamma$\left(a+n,\, b+\sum_{i=1}^n \log\frac{x_i}{\beta}\right)$.

To sample $M$ realizations from the posterior distribution of $(\alpha,\beta)$, a Gibbs sampling algorithm is as follows. Initialize $\alpha^{(0)}$ and $\beta^{(0)}$ to admissible values; for instance, sample $\alpha^{(0)}\sim$ gamma$(a,b)$ and $\beta^{(0)}\sim$ Uniform$(0,\min_i x_i)$. For $i=1,\dots,M$:
- Sample $\alpha^{(i)}\mid\beta^{(i-1)},\mathbf{x}\sim$ gamma$\left(a+n,\, b+\sum_{j=1}^n \log(x_j/\beta^{(i-1)})\right)$, which with the data and prior of points 3 and 4 is gamma$\left(8,\; 0.2+9.73-7\log(\beta^{(i-1)})\right)$.
- Sample $\beta^{(i)}\mid\alpha^{(i)},\mathbf{x}$ from the distribution (2), using rejection sampling or a Metropolis-Hastings step; if you use the prior in (3) for $\beta$, sampling can be done with the inverse-CDF method.
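For illustration only, the following Python sketch implements the Gibbs sampler just described, using the Uniform$([0,c])$ prior of (3) for $\beta$ so that its full conditional can be drawn exactly by the inverse-CDF method. The function name gibbs_pareto, the hyperparameter values (a, b, c), the number of iterations M and the synthetic dataset are illustrative assumptions, not part of the exam.

```python
# Gibbs sampler sketch for (alpha, beta) under the Pareto likelihood (1),
# with alpha ~ gamma(a, b) and beta ~ Uniform([0, c]) (full conditional (3)).
import numpy as np

rng = np.random.default_rng(1)

def gibbs_pareto(x, a=1.0, b=0.2, c=10.0, M=5000):
    x = np.asarray(x, dtype=float)
    n = len(x)
    alpha = rng.gamma(a, 1.0 / b)                 # alpha^(0) ~ gamma(a, b)
    beta = rng.uniform(0.0, x.min())              # beta^(0) ~ Uniform(0, min x_i)
    draws = np.empty((M, 2))
    for i in range(M):
        # alpha | beta, x ~ gamma(a + n, b + sum_j log(x_j / beta))
        alpha = rng.gamma(a + n, 1.0 / (b + np.sum(np.log(x / beta))))
        # beta | alpha, x has density prop. to beta^{n*alpha} on [0, min(c, min x_i)],
        # i.e. a Beta(n*alpha + 1, 1) rescaled to that interval; inverse-CDF draw:
        upper = min(c, x.min())
        beta = upper * rng.uniform() ** (1.0 / (n * alpha + 1.0))
        draws[i] = alpha, beta
    return draws

# Example usage with synthetic data from model (1) with alpha = 2, beta = 3:
x_data = 3.0 * (1.0 - rng.uniform(size=50)) ** (-1.0 / 2.0)
samples = gibbs_pareto(x_data)
print("posterior means (alpha, beta):", samples[1000:].mean(axis=0))
```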
Exercise 2 - Only for the 10CFU-course students

Consider a Dirichlet process mixture of normal distributions with fixed variance, i.e.:
\[
\begin{aligned}
X_i \mid \mu_i &\;\stackrel{\text{ind}}{\sim}\; N(\mu_i, 1), \qquad i=1,\dots,n,\\
\mu_1,\dots,\mu_n \mid P &\;\stackrel{\text{iid}}{\sim}\; P,\\
P &\;\sim\; \mathcal{D}_{\alpha P_0},
\end{aligned}
\]
where $\alpha>0$ and $P_0$ is a probability on $\mathbb{R}$.

1. Derive the marginal distribution of $(\mu_1,\dots,\mu_n)$ under the model $\mu_1,\dots,\mu_n\mid P \stackrel{\text{iid}}{\sim} P$, $P\sim\mathcal{D}_{\alpha P_0}$.

2. Based on the marginal distribution found at the previous point, describe a marginal Gibbs sampling strategy to sample from the posterior distribution of $(\mu_1,\dots,\mu_n)$ given $X_1,\dots,X_n$. Make explicit which parametrization you adopt, i.e., which variables are in the state of your MCMC algorithm.

Solution of Exercise 2.

1. The marginal distribution of the sample $\mu_1,\dots,\mu_n$ is equivalently described by the law of the distinct values $\mu^*_1,\dots,\mu^*_k$ and a set of "cluster allocation" variables $c_i$, $i=1,\dots,n$, such that $c_i=h$ if and only if $\mu_i=\mu^*_h$. We have
\[
L(\mu_1,\dots,\mu_n) = L(\mu^*_1,\dots,\mu^*_k, c_1,\dots,c_n)
= \frac{\alpha^k}{(\alpha)_n}\prod_{j=1}^k (n_j-1)!\;\prod_{j=1}^k P_0(\mu^*_j), \tag{4}
\]
where $n_j=\sum_{i=1}^n \mathbb{1}[c_i=j]$ and $(\alpha)_n=\alpha(\alpha+1)\cdots(\alpha+n-1)$. See the TA lesson of Dec 02 for details.

2. The state of the MCMC algorithm consists of $(\mu_1,\dots,\mu_n)\leftrightarrow(\mu^*_1,\dots,\mu^*_k, c_1,\dots,c_n)$. From the marginal distribution of the $\mu_i$'s we have that
\[
L(\mu_i\mid\boldsymbol{\mu}_{-i}) = \sum_{h=1}^{k^{-i}} \frac{n^{-i}_h}{\alpha+n-1}\,\delta_{\mu^*_h} + \frac{\alpha}{\alpha+n-1}\,P_0(\cdot),
\]
where the $-i$ superscript indicates that the $i$-th observation $\mu_i$ has been removed from the MCMC state. Bayes' theorem implies that
\[
L(\mu_i\mid\mathbf{X},\boldsymbol{\mu}_{-i}) \propto L(\mathbf{X}\mid\boldsymbol{\mu})\,L(\boldsymbol{\mu}) \propto L(X_i\mid\mu_i)\,L(\mu_i\mid\boldsymbol{\mu}_{-i}) = N(X_i\mid\mu_i,1)\,L(\mu_i\mid\boldsymbol{\mu}_{-i}).
\]
We expand the last expression as
\[
L(\mu_i\mid\mathbf{X},\boldsymbol{\mu}_{-i}) \propto \sum_{h=1}^{k^{-i}} n^{-i}_h\,\delta_{\mu^*_h}(\mu_i)\,N(X_i\mid\mu_i,1) + \alpha\,P_0(\mu_i)\,N(X_i\mid\mu_i,1). \tag{5}
\]
Letting
\[
m(X_i) = \int_{\mathbb{R}} N(X_i\mid\mu,1)\,P_0(\mu)\,d\mu,
\qquad
H(\mu_i) = \frac{N(X_i\mid\mu_i,1)\,P_0(\mu_i)}{\int_{\mathbb{R}} N(X_i\mid\mu,1)\,P_0(\mu)\,d\mu},
\]
we have that (5) is a mixture of the measures $\{\delta_{\mu^*_1}(\cdot),\dots,\delta_{\mu^*_{k^{-i}}}(\cdot), H(\cdot)\}$ with weights proportional to $\{n^{-i}_1 N(X_i\mid\mu^*_1,1),\dots, n^{-i}_{k^{-i}} N(X_i\mid\mu^*_{k^{-i}},1),\, \alpha\,m(X_i)\}$.

Then, we can sample $\mu_i$ by first sampling $c_i$ from a categorical distribution over $1,\dots,k^{-i}+1$ with these weights. If $c_i\le k^{-i}$ we set $\mu_i=\mu^*_{c_i}$; otherwise we set $k=k^{-i}+1$, sample $\mu^*_{k}\sim H$ and set $\mu_i=\mu^*_{k}$, $c_i=k$.

This alone suffices to produce an MCMC chain with target distribution equal to the posterior of the $\mu_i$'s. We can obtain a more efficient algorithm by including an "acceleration" step: after having sampled the $\mu_i$'s as before, sample each $\mu^*_h$ independently from
\[
f_h(\cdot) \propto \prod_{i:\,c_i=h} N(X_i\mid\mu^*_h,1)\;P_0(\mu^*_h).
\]
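For illustration only, the following Python sketch implements the marginal (Polya-urn) Gibbs sampler above, including the acceleration step, assuming a conjugate base measure $P_0=N(0,\tau^2)$ so that $m(X_i)$ and $H$ are Gaussian in closed form. The function name, the values of alpha_dp and tau2, and the synthetic data are illustrative assumptions; for brevity only the final state of the chain is returned.

```python
# Marginal Gibbs sampler sketch for a DP mixture of N(mu_i, 1) kernels,
# with conjugate base measure P_0 = N(0, tau2).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)

def marginal_gibbs_dpm(X, alpha_dp=1.0, tau2=4.0, n_iter=1000):
    X = np.asarray(X, dtype=float)
    n = len(X)
    c = np.zeros(n, dtype=int)            # cluster allocations c_i (start from one cluster)
    mu_star = [X.mean()]                  # distinct values mu*_h
    for _ in range(n_iter):
        for i in range(n):
            # remove observation i; drop clusters left empty and relabel allocations
            counts = np.bincount(np.delete(c, i), minlength=len(mu_star))
            keep = counts > 0
            relabel = np.cumsum(keep) - 1
            mu_star = [m for m, k in zip(mu_star, keep) if k]
            c = relabel[c]
            counts = counts[keep]
            # weights: n_h^{-i} N(X_i | mu*_h, 1) for old clusters, alpha * m(X_i) for a new one
            w_old = counts * norm.pdf(X[i], loc=np.array(mu_star), scale=1.0)
            w_new = alpha_dp * norm.pdf(X[i], loc=0.0, scale=np.sqrt(1.0 + tau2))
            w = np.append(w_old, w_new)
            h = rng.choice(len(w), p=w / w.sum())
            if h == len(mu_star):          # open a new cluster, drawing mu* from H(. | X_i)
                post_var = tau2 / (1.0 + tau2)
                mu_star.append(rng.normal(post_var * X[i], np.sqrt(post_var)))
            c[i] = h
        # acceleration step: resample each mu*_h from its full conditional f_h
        for h in range(len(mu_star)):
            idx = (c == h)
            post_var = 1.0 / (idx.sum() + 1.0 / tau2)
            mu_star[h] = rng.normal(post_var * X[idx].sum(), np.sqrt(post_var))
    return np.array(mu_star)[c], c        # final state of (mu_1, ..., mu_n) and allocations

# Example usage on synthetic two-component data:
X_data = np.concatenate([rng.normal(-2.0, 1.0, 30), rng.normal(3.0, 1.0, 30)])
mu_draw, c_draw = marginal_gibbs_dpm(X_data, n_iter=200)
print("clusters in final state:", len(np.unique(c_draw)))
```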
Exercise 3 - Only for 8CFU-course students

Consider the probit regression model
\[
\begin{aligned}
Y_i \mid x_i, \beta &\;\stackrel{\text{ind}}{\sim}\; \text{Be}(p_i), \qquad i=1,\dots,n,\\
p_i &= \Phi(x_i^t\beta) = \Phi(\beta_1 x_{i1} + \cdots + \beta_p x_{ip}),\\
\beta &\sim N_p(b_0, B_0),
\end{aligned}
\tag{6}
\]
where $b_0\in\mathbb{R}^p$, $B_0$ is a $p\times p$ covariance matrix, $x_i$ is a $p$-dimensional vector of covariates and $\Phi$ is the standard Gaussian distribution function.

Introducing suitable latent variables, describe a Gibbs sampler algorithm to simulate from the posterior distribution of $\beta$, given data $y_1,\dots,y_n$, according to model (6).

Solution of Exercise 3.

We introduce latent variables $Z_1,\dots,Z_n$ such that
\[
Y_i \mid x_i = \begin{cases} 1 & \text{if } Z_i>0\\ 0 & \text{otherwise,}\end{cases} \qquad i=1,\dots,n, \tag{7}
\]
\[
Z_i \;\stackrel{\text{ind}}{\sim}\; N(x_i^t\beta, 1). \tag{8}
\]
It is straightforward to check that the likelihood in (6) is equivalent to (7)-(8). The parameter is now $(\beta,\mathbf{Z})$, where $\mathbf{Z}=(Z_1,\dots,Z_n)^t$. We build a Gibbs sampler to simulate from the joint posterior law $L(\beta,\mathbf{Z}\mid y_1,\dots,y_n)$, but in the end we will be interested only in the marginal posterior $L(\beta\mid y_1,\dots,y_n)$.

In order to derive a Gibbs sampler, we need the full conditionals, which are proportional to the joint law of data and parameter vector:
\[
L(Y_1,\dots,Y_n,\beta,\mathbf{Z}) \propto L(Y_1,\dots,Y_n\mid\beta,\mathbf{Z})\times L(\beta,\mathbf{Z}) = L(Y_1,\dots,Y_n\mid\mathbf{Z})\times L(\mathbf{Z}\mid\beta)\times\pi(\beta).
\]
Then the full conditionals are as follows.

$L(\beta\mid\mathbf{Z},\mathbf{Y}) \propto L(\mathbf{Z}\mid\beta)\times\pi(\beta)$. This last distribution is the posterior of a linear model with Gaussian likelihood, where the data are $Z_1,\dots,Z_n$, i.e.,
\[
L(\beta\mid\mathbf{Z},\mathbf{Y}) = N_p(b_n, B_n), \qquad B_n = (X^t X + B_0^{-1})^{-1}, \qquad b_n = B_n(X^t X\hat\beta + B_0^{-1} b_0),
\]
where $X$ is the design matrix with the $x_i$'s as rows, and $\hat\beta = (X^t X)^{-1} X^t \mathbf{Z}$.

$L(\mathbf{Z}\mid\beta,\mathbf{Y}) \propto L(Y_1,\dots,Y_n\mid\beta,\mathbf{Z})\times L(\mathbf{Z}\mid\beta) = \prod_{i=1}^n\left\{\left(\mathbb{1}(y_i=1)\,\mathbb{1}(Z_i>0)+\mathbb{1}(y_i=0)\,\mathbb{1}(Z_i<0)\right)\varphi(Z_i;x_i^t\beta,1)\right\}$, i.e., $Z_1,\dots,Z_n$ are independent given $\beta,\mathbf{y}$, and
\[
Z_i\mid\beta,\mathbf{y} \;\sim\; \begin{cases} \mathbb{1}(Z_i>0)\,\varphi(Z_i;x_i^t\beta,1) & \text{if } y_i=1,\\ \mathbb{1}(Z_i<0)\,\varphi(Z_i;x_i^t\beta,1) & \text{if } y_i=0.\end{cases}
\]
The last two expressions denote the truncated Gaussian densities with support $(0,+\infty)$ and $(-\infty,0)$, respectively.
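For illustration only, the following Python sketch implements the Gibbs sampler just derived (the data-augmentation scheme often referred to as Albert-Chib), assuming scipy's truncnorm for the truncated Gaussian draws of the $Z_i$'s. The function name, the prior values (b0, B0), the number of iterations and the synthetic data are illustrative assumptions.

```python
# Gibbs sampler sketch for the probit model (6) via the latent variables (7)-(8).
import numpy as np
from scipy.stats import truncnorm, norm

rng = np.random.default_rng(3)

def probit_gibbs(y, X, b0, B0, M=2000):
    y = np.asarray(y)
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    B0_inv = np.linalg.inv(B0)
    Bn = np.linalg.inv(X.T @ X + B0_inv)          # posterior covariance of beta | Z
    beta = rng.multivariate_normal(b0, B0)         # initialise beta from the prior
    draws = np.empty((M, p))
    for m in range(M):
        # 1) Z_i | beta, y_i: N(x_i' beta, 1) truncated to (0, inf) if y_i = 1, (-inf, 0) if y_i = 0
        mean = X @ beta
        lower = np.where(y == 1, 0.0, -np.inf)
        upper = np.where(y == 1, np.inf, 0.0)
        Z = truncnorm.rvs(lower - mean, upper - mean, loc=mean, scale=1.0, random_state=rng)
        # 2) beta | Z ~ N_p(b_n, B_n), with b_n = B_n (X'Z + B0^{-1} b0) since X'X betahat = X'Z
        bn = Bn @ (X.T @ Z + B0_inv @ b0)
        beta = rng.multivariate_normal(bn, Bn)
        draws[m] = beta
    return draws

# Example usage on synthetic probit data with p = 2 covariates:
Xd = np.column_stack([np.ones(200), rng.normal(size=200)])
beta_true = np.array([0.5, -1.0])
yd = (rng.uniform(size=200) < norm.cdf(Xd @ beta_true)).astype(int)
out = probit_gibbs(yd, Xd, b0=np.zeros(2), B0=10.0 * np.eye(2))
print("posterior mean of beta:", out[500:].mean(axis=0))
```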