logo
  • userLoginStatus

Welcome

Our website is made possible by displaying online advertisements to our visitors.
Please disable your ad blocker to continue.

Current View

Computer Engineering - Machine Learning

Full exam

DIPARTIMENTO DI ELETTRONICA, INFORMAZIONE E BIOINGEGNERIA Politecnico di Milano Machine Learning (Code: 097683)September 10, 2022 Surname: Name: Student ID: Row: Column:Time:2hoursMaximumMarks:33 ˆThe following exam is composed of8 exercises(one per page). The rst page needs to be lled with yourname, surname and student ID. The following pages should be used only in the large squarespresent on each page. Any solution provided either outside these spaces orwithout a motivationwill not be considered for the nal mark. ˆDuring this exam you arenot allowed to use electronic devices, such as laptops, smartphones, tablets and/or similar. As well, you are not allowed to bring with you any kind of note, book, written scheme, and/or similar. You are also not allowed to communicate with other students during the exam. ˆThe rst reported violation of the above-mentioned rules will be annotated on the exam and will be considered for the nal mark decision. The second reported violation of the above-mentioned rules will imply the immediate expulsion of the student from the exam room and theannulment of the exam. ˆYou are allowed to write the exam either with a pen (black or blue) or a pencil. It is your responsibility to provide a readable solution. We will not be held accountable for accidental partial or total cancellation of the exam. ˆThe exam can be written either inEnglishorItalian. ˆYou are allowed to withdraw from the exam at any time without any penalty. You are allowed to leave the room not early than half the time of the duration of the exam. You are not allowed to keep the text of the exam with you while leaving the room. ˆThree of the points will be given on the basis on how quick you are in solving the exam.If you nish earlier than 30 min before the end of the exam you will get 3 points, if you nish earlier than 20 min you will get 2 points and if you nish earlier than 10 min you will get 1 point (the points cannot be accumulated). ˆThe box on Page 10 can only be used to complete the Exercises 7, and/or 8.Ex. 1Ex. 2Ex. 3Ex. 4Ex. 5Ex. 6Ex. 7Ex. 8TimeTot. / 7/ 7/ 2/ 2/ 2/ 2/ 4/ 4/ 3/ 33 Student's name: Please go on to the next page. . . Computer Science Computer Science | Machine Learning Page 2 of ??Exercise 1 (7marks) DescribeSupport Vector Machines(SVMs) for supervised classi cation problems. In particular explain how do they work, their strengths and weaknesses.Student's name: Please go on to the next page. . . Computer Science Computer Science | Machine Learning Page 3 of ??Exercise 2 (7marks) Describe thePCAtechnique and what it is used for.Student's name: Please go on to the next page. . . Computer Science Computer Science | Machine Learning Page 4 of ??Exercise 3 (2marks) Consider the following snippet of code: 1 V1 = np . l i n a l g . i n v ( np . e y e ( nS )=gamma*p i @ Ps a s ) @ ( p i @ Rs a ) 2 3 Vo l d = np . z e r o s ( nS ) 4 t o l = 0 . 0 0 0 1 5 V2 = p i @ Rs a 6w h i l enp .any( np .abs( Vo l d =V2 )>t o l ) : 7 Vo l d = V2 8 V2 = p i @ ( Rs a =gamma*Ps a s @ V) 1.Describe the purpose of the procedure of line1and the purpose of the procedure of lines 3-8. Are they correct? If not, propose a correction. 2.What is the main disadvantage of the procedure of line1compared to the one of lines3-8? 3.What happens to the two procedures whengamma= 1? 1.The procedure of line1computes the closed-form solution of the state value functionV of policypiin the MDP with transition modelPsas , rewardRsa and discount factorgamma. The procedure of lines3-8performs the iterative application of the Bellman expectation operator to compute the same value functionV . The iterative procedure is stopped when a given thresholdtolbetween consecutive approximation is reached. However, line8does contain a mistake and should be corrected as follows: V2 = pi @ (Rsa + gamma * Psas @ V) . 2.The main disadvantage of the procedure of line1compared to the one of lines3-8is a computational one, i.e., the computation of the closed-form solution might be infeasible when the number of states/actions is large. 3.Whengamma= 1 the procedure of line1might lead to a singular matrix (attempting to invert it), whereas the procedure of lines3-8might never reach the requested tolerancetol.Student's name: Please go on to the next page. . . Computer Science Computer Science | Machine Learning Page 5 of ??Exercise 4 (2marks) Tell if the following statements about bias-variance trade-o are true or false. Motivate your answers. Consider a regression problem with input variablesx 1, x 2, and x 3, that are linearly independent. 1.In linear regression, if we replace variablex 1with x 1+ x 2, we do not change the bias of the model. 2.In linear regression, if we replace variablex 1with x2 1= 100, we might increase the variance of the model. 3.For an arbitrary model, if we remove variablex 2, we do not increase the variance of the model. 4.For an arbitrary model, if we add variablex2 3, we cannot increase the bias of the model. 1.TRUE. Indeed, since the variables are linearly independent, the linear models representablewithfx 1; x 2; x 3g are the same as the one representable withfx 1+ x 2; x 2; x 3g . 2.TRUE. It might be the case that the variance is larger, since the linear models representablewithfx 1; x 2; x 3g are not comparable with the ones representable withfx2 1= 100; x 2; x 3g . 3.TRUE. Indeed, regardless the used model, the space of representable models with a subsetof the variables, i.e.,fx 1; x 3g is contained in the space of representable models with all the variables. 4.TRUE. Indeed, adding variables enlarges (or, possibly, leaves unchanged) the space ofrepresentable models.Student's name: Please go on to the next page. . . Computer Science Computer Science | Machine Learning Page 6 of ??Exercise 5 (2marks) Tell if the following functions are valid kernels. Motivate your answers. Letx;y2Rd : 1.k 1( x;y) =xT y+xT 1+yT 1+d, where12Rd is the vector of all ones. 2.k 2( x;y) =xT y kxk2 3.k 3( x;y) =k 1(cos( x);cos(y))3 , where the cos() function is applied element-wise. 4.k 4( x;y) = exp (k 2( x;y) +k 2( y;x)) 1.TRUE. By de nition, selecting(x) =x+1, we havek 1( x;y) =(x)T (y). 2.FALSE.k 2( x;y) is not symmetric. 3.TRUE. Since we are considering the same transformation cos() applied to both arguments, then ()3 is a polynomial transformation with non-negative coecients, andk 1is a kernel. 4.TRUE. With simple computations, we obtain: k4( x;y) = exp (k 2( x;y) +k 2( y;x)) = exp 2xT y kxk2 kyk2  = exp kxyk2  that is the Gaussian kernel with2 =12 .Student's name: Please go on to the next page. . . Computer Science Computer Science | Machine Learning Page 7 of ??Exercise 6 (2marks) A pool has the problem that many customers are not wearing bathing caps, which is against the rules. The currently implemented method to prevent such a behaviour is that the lifeguards are checking the pool and notify to the person not wearing the bathing cap that she/he is in violation of the pool rules. However, this task is boring and prone to errors (lifeguards are commonly distracted by other issues). You are required to design a new system to automatically detect such a violation. 1.Provide a solution to tell if at least one person is not wearing a bathing cap. De ne the datayou are required to collect, the type of problem you are solving and suggest a method that is suitable for the situation. 2.Provide a solution if, instead, you want to count the number of people not wearing a bathingcap. Does this problem requires new data? Is it a di erent problem? Are you required to use a di erent technique? 1.The solution would be to install a camera on the ceiling of the pool and use the images asinput for a classi er. This process would require the acquisition of a dataset of images with the corresponding labels (there is someone in violation of the pool rules/ everything is ne). A common way to solve classi cation problems involving images is to use CNN. 2.In this case we would require a di erent label for the images. We need to couple each imagewith the number of people not wearing a bathing cap. In this case we are solving a regression problem. Assuming we are able to extract a set of meaningful features from the images,we might resort to any regression method (such as Linear Regression). Otherwise, we can still use CNN to solve the regression problem directly from the images.Student's name: Please go on to the next page. . . Computer Science Computer Science | Machine Learning Page 8 of ??Exercise 7 (4marks) Consider an initial parameter for a Linear Regressionw(0) = [0 1 0]> and a loss function of the form: J(w) =12 NN X n=1( w> xn t n)2 ; whereNis the number of data. 1.Given a learning rate of = 0:2, derive the update given from the gradient descent algorithm to minimize the loss when it is applied to single data; 2.Apply sequentially the above rule on the following dataset (one at a time):x1= [1 3 2]> ; t1= 4 ;x 2= [1 1 1]> ; t2= 1 ;x 3= [5 10 5]> ; t3= 14 ; 3.Do we have any issue about using the gradient descend algorithm if we use the following lossfunction instead? J(w) =12 NN X n=1j w> xn t nj 1.The loss for a single data isJ(w) =12 ( w> xn t n)2 and the gradient of the loss becomes rJ(w) = (w> xn t n) x n. The update of the gradient descent methods is then: wk+1 w k rJ(w k) = w k 0:2(w> kx k t k) x k 2.w1=2 40 1 03 50:2(1)2 41 3 23 5=2 40 :2 1:6 0:43 5 w2=2 40 :2 1:6 0:43 50:2(1:2)2 41 1 13 5=2 4 0:04 1:36 0:163 5 w3=2 4 0:04 1:36 0:163 50:2(0:2)2 45 1053 5=2 4 0:24 0:96 0:043 5 3.Since the provided loss function has a gradient almost everywhere, we can apply such astrategy to nd the optimal solution even in this case. The only issue is if we have parameter vectors s.t.jw> xn t nj = 0.Student's name: Please go on to the next page. . . Computer Science Computer Science | Machine Learning Page 9 of ??Exercise 8 (4marks) Suppose we collect data for a group of customers of a car insurance company with variables agex 1, number of accidents occurred so far x 2and the probability to have a car accident in the next yeart. We t a logistic regression and produce estimated coecients:w 0= 6,w 1= 0 :05 andw 2= 0 :5. 1.Compute the probability that a 20-years old customer who did 4 accidents will have anew accident in the next year (provide the formula and substitute the values, leave the calculations). 2.How many accidents would a 30-years old customer need to have to have 50% chance ofgetting another one in the next year? 3.Do you think that the parameters learned by the model are meaningful w.r.t. the problem itaddresses? 4.Do you think that this model is missing some crucial information about the customer? Doyou need to retrain the model from scratch in the case you added a new variablex 3to it? 1.The probability is equal to: P(y= 1jx) =11 + e (w 0+ w 1x 1+ w 2x 2)=11 + e (6+200:05+40:5)=11 + e3 2.On the boundary of the Logistic Regression we have a 50% chance to get an accident in thenext year, therefore: w0+ w 1x 1+ w 2x 2= 0 ! 6 + 1:5 + 0:5x 2= 0 !x 2= 9 3.It seems to be correct that the more the accidents I had in the past the more the chance Iwill have a new accident. Conversely an increase of the age should decrease the chance of having an accident since drivers get skilled over time, reducing the chance of having a new accident. 4.It seems that the features included are meaningful w.r.t. the problem solved. Probably weare missing the information about the amount of time the customer had the insurance, since old customers have more accidents only because they have a longer period over which they had the possibility to damage their car. Since there is no easy way of decoupling the loss function w.r.t. a single feature, one has to retrain everything from scratch.Student's name: Please go on to the next page. . . Computer Science Computer Science | Machine Learning Page 10 of ??Student's name: Please go on to the next page. . .