Welcome

Our website is made possible by displaying online advertisements to our visitors.
Please disable your ad blocker to continue.

Computer Engineering - Machine Learning

Full exam

DIPARTIMENTO DI ELETTRONICA, INFORMAZIONE E BIOINGEGNERIA Politecnico di Milano Machine Learning (Code: 097683)June 20, 2023 Surname: Name: Student ID: Row: Column:Time:2hoursMaximumMarks:33 The following exam is composed of8 exercises(one per page). The rst page needs to be lled with yourname, surname and student ID. The following pages should be used only in the large squarespresent on each page. Any solution provided either outside these spaces orwithout a motivationwill not be considered for the nal mark. During this exam you arenot allowed to use electronic devices, such as laptops, smartphones, tablets and/or similar. As well, you are not allowed to bring with you any kind of note, book, written scheme, and/or similar. You are also not allowed to communicate with other students during the exam. The rst reported violation of the above-mentioned rules will be annotated on the exam and will be considered for the nal mark decision. The second reported violation of the above-mentioned rules will imply the immediate expulsion of the student from the exam room and theannulment of the exam. You are allowed to write the exam either with a pen (black or blue) or a pencil. It is your responsibility to provide a readable solution. We will not be held accountable for accidental partial or total cancellation of the exam. The exam can be written either inEnglishorItalian. You are allowed to withdraw from the exam at any time without any penalty. You are allowed to leave the room not early than half the time of the duration of the exam. You are not allowed to keep the text of the exam with you while leaving the room. Three of the points will be given on the basis on how quick you are in solving the exam.If you nish earlier than 30 min before the end of the exam you will get 3 points, if you nish earlier than 20 min you will get 2 points and if you nish earlier than 10 min you will get 1 point (the points cannot be accumulated). The box on Page 10 can only be used to complete the Exercises 7, and/or 8.Ex. 1Ex. 2Ex. 3Ex. 4Ex. 5Ex. 6Ex. 7Ex. 8TimeTot. / 7/ 7/ 2/ 2/ 2/ 2/ 4/ 4/ 3/ 33 Student's name: Please go on to the next page. . . Computer Science Computer Science | Machine Learning Page 2 of 10 Exercise 1 (7marks) Describe the Bayesian Linear Regression approach and compare it to the least squares method.Student's name: Please go on to the next page. . . Computer Science Computer Science | Machine Learning Page 3 of 10 Exercise 2 (7marks) Illustrate the PAC bound for consistent learners (train error equal to zero) and its applications.Student's name: Please go on to the next page. . . Computer Science Computer Science | Machine Learning Page 4 of 10 Exercise 3 (2marks) Consider the following snippet of code: 1 x = z s c o r e ( d a t a s e t [ ' p e t a l=l e n g t h ' ] . v a l u e s ) . r e s h a p e (=1 , 1 ) 2 y = z s c o r e ( d a t a s e t [ ' c l a s s ' ] . v a l u e s ) 3 4 ns a m p l e s = l e n( x ) 5 P h i = np . o n e s ( ( ns a m p l e s , 2 ) ) 6 P h i [ : , 1 ] = x . f l a t t e n ( )# t h e s e c o n d c o l u m n i s t h e f e a t u r e 7 w = i n v ( P h i . T @ P h i ) @ ( P h i . T . d o t ( y ) ) 1.Describe the operations executed and the purpose (which problem and which method hasbeen used) of the snippet reported above. 2.Do you think the methods used above are appropriate for the considered problem? If not,propose an alternative approach. If yes, propose other approaches to solve the same problem. 3.Are there any issue intrinsic to Line 7? Describe the issue and propose a method to dealwith such a problem. 1.The snippet is applying OLS to a classication problem. 2.Logistic Regression or other classication methods will be more suited than OLS. 3.In line 7, the inversion of T is feasible only if the matrix is not singular. This problem can be avoided using ridge regression.Student's name: Please go on to the next page. . . Computer Science Computer Science | Machine Learning Page 5 of 10 Exercise 4 (2marks) The gure below was produced by changing a hyperparameter (on the x-axis) in a classier/regressor on a given training/test dataset.For each one of the following methods along with the description of how hyperparameters was changed (from left to right), tell if the picture could have been produced by it: 1.Ridge regression, increasing regularizationparameter. 2.k-nearest neighbor classier, decreasingnumber of neighborsk. 3.Logistic regressor, increasing the numberof principal components extracted from the dataset. 4.SVM, decreasing the number of featuresselected by a ltering method. 1.NO: Increasing the regularization would increase the training error, therefore the graphcannot be corresponding to this case. 2.YES: The more theKparameter is small the more it is prone to overt the original dataset. WithK= 1 all the data in the training set are perfectly classied. 3.YES: The more the number of features we are considering, the more the model il complexand prone to overtting. 4.NO: It would decrease the complexity of the model and therefore the training error shouldincrease.Student's name: Please go on to the next page. . . Computer Science Computer Science | Machine Learning Page 6 of 10 Exercise 5 (2marks) Tell if the following expressions are valid kernels, motivating your answer: 1.k(x; x0 ) =ak 1( x; x0 ) +bk 2( x0 ; x), wherek 1( ;) andk 2(( ;) are valid kernels 2.k(x; x0 ) =jjxjj2 2+ jjxjj 2jj x0 jj2 3.k(x; x0 ) =j xx0 jxx 0 + 2 for x; x0 2[1 1] 4.k(x; x0 ) =x> Ax0 withA= 3 1 14 andx; x0 2R2 The formation of the kernel should follow the Mercer's theorem conditions (continuous, symmetric and semidenite positive function). 1.As long asa; b >0 we are assured that the formation of the above kernel provides a valid kernel. 2.The kernel form is not symmetric (due to the rst term). 3.The kernel is not continuous inx= 0 for eachx0 2[1 1] 4.Since the matrixAis not semidenite positive (det(A) = 3(4)11 =13 and a learning rate set to= 1. You are given the following dataset: x1= (2 1)> t1= 1 x2= (1 2)> t2= 1 x3= (1 3)> t3= 1 x4= ( 3 2)> t4= 1 1.Compute the vectorwonce we run the gradient descent algorithm on the provided dataset once. 2.Do you think that the computedwis optimal for the given dataset? 3.Can you tell if the perceptron training procedure would converge eventually? 4.In general, given 4 distinct points with arbitrary labels in a 2D space, can they always becorrectly classied by a perceptron? 1.The update prescribed by the online update given by the gradient descent algorithm is: w( k+1) =w( k) rL(w) =w( k) +xt n if the prediction of the perceptron is wrong^ tn6 =t n. Therefore: (a)x 1correctly classied ( sign(x> 1w ) =sign((2 1)(1 0)> ) =sign(2) = 1 =t 1), no update. (b)x 2misclassied ( sign(x> 2w ) =sign((12)(1 0)> ) =sign(1) = 16 =t 2), therefore: w(1) = (1 0) + 1(12)(1) = (0 2): (c)x 3misclassied, therefore: w(2) = (0 2) + 1(1 3)(1) = (11): (d)x 4correctly classied, no update. 2.Since the rst sample is not yet correctly classied by the last vectorw(2) the convergence has not yet achieved. 3.Since the four points are not linearly separable (you should draw them to prove it), theprocedure will not converge. 4.NO, since the VC dimension of a linear classier in 2D is 3, in general 4 points cannot beshattered by a perceptron in 2D.Student's name: Please go on to the next page. . . Computer Science Computer Science | Machine Learning Page 9 of 10 Exercise 8 (4marks) Consider the following episode obtained by an agent interacting with an MDP having three statesS=fA; B; Cgand two actionsA=fl; rg, (A; l;1)!(C; l;2)!(A; r;0)!(B; r;5)!(B; l;0)!(C; l;0)!(B; l;2)!(A; r): Consider an initial state-action valuesQ(s; a) = 0 for every state-action pair, a learning rate = 0:5, and a discount factor = 1. Answer to the following questions providing adequate motivations. 1.Execute theQ-learningalgorithm on the given episode (break ties using the rst action listed in the action set). 2.Execute theSARSAalgorithm on the given episode. 3.Provide the best policy according to the output ofSARSA. Is it dierent from the one provided byQ-learning? 1.The formula fo Q-learning isQ(s; a) Q(s; a) +(R+ max a0 Q(s0 ; a0 )Q(s; a)), which in our case is: Q(A; l) = 0 + 0:5(1 + 100) = 1=2 Q(C; l) = 0 + 0:5(2 + 10:50) = 5=4 Q(A; r) = 0 + 0:5(0 + 100) = 0 Q(B; r) = 0 + 0:5(5 + 100) = 5=2 Q(B; l) = 0 + 0:5(0 + 15=40) = 5=8 Q(C; l) = 5=4 + 0:5(0 + 15=25=4)) = 15=8 Q(B; l) = 5=8 + 0:5(2 + 11=25=8) =7=16 2.The formula for SARSA isQ(s; a) Q(s; a) +(R+ Q(s0 ; a0 )Q(s; a)), wherea0 is the currently chosen action. In our case: Q(A; l) = 0 + 0:5(1 + 100) = 1=2 Q(C; l) = 0 + 0:5(2 + 100) = 1 Q(A; r) = 0 + 0:5(0 + 100) = 0 Q(B; r) = 0 + 0:5(5 + 100) = 5=2 Q(B; l) = 0 + 0:5(0 + 110) = 1=2 Q(C; l) = 1 + 0:5(0 + 11=21)) = 3=4 Q(B; l) = 1=2 + 0:5(2 + 101=2) =3=4 3.The policy prescribed by SARSA is to do the following action in the corresponding states:A!l,B!r, andC!l. Instead, Q-learning would suggestA!l,B!r, andC!l. Therefore, their suggestions would be the same.Student's name: Please go on to the next page. . . Computer Science Computer Science | Machine Learning Page 10 of 10 Student's name: End of exam