DIPARTIMENTO DI ELETTRONICA, INFORMAZIONE E BIOINGEGNERIA
Politecnico di Milano
Machine Learning (Code: 097683)
February 1, 2023

Surname:            Name:            Student ID:            Row:        Column:
Time: 2 hours                                        Maximum Marks: 33

- The following exam is composed of 8 exercises (one per page). The first page needs to be filled in with your name, surname and student ID. The following pages should be used only in the large squares present on each page. Any solution provided either outside these spaces or without a motivation will not be considered for the final mark.
- During this exam you are not allowed to use electronic devices, such as laptops, smartphones, tablets and/or similar. Likewise, you are not allowed to bring with you any kind of note, book, written scheme, and/or similar. You are also not allowed to communicate with other students during the exam.
- The first reported violation of the above-mentioned rules will be annotated on the exam and will be considered for the final mark decision. The second reported violation will imply the immediate expulsion of the student from the exam room and the annulment of the exam.
- You are allowed to write the exam either with a pen (black or blue) or a pencil. It is your responsibility to provide a readable solution. We will not be held accountable for accidental partial or total cancellation of the exam.
- The exam can be written either in English or in Italian.
- You are allowed to withdraw from the exam at any time without any penalty. You are allowed to leave the room no earlier than half the duration of the exam. You are not allowed to keep the text of the exam with you while leaving the room.
- Three of the points will be given on the basis of how quick you are in solving the exam. If you finish earlier than 30 minutes before the end of the exam you will get 3 points, if you finish earlier than 20 minutes you will get 2 points, and if you finish earlier than 10 minutes you will get 1 point (the points cannot be accumulated).
- The box on Page 10 can only be used to complete Exercises 7 and/or 8.

Ex. 1: / 7   Ex. 2: / 7   Ex. 3: / 2   Ex. 4: / 2   Ex. 5: / 2   Ex. 6: / 2   Ex. 7: / 4   Ex. 8: / 4   Time: / 3   Tot.: / 33

Student's name:                Please go on to the next page.
Computer Science | Machine Learning, Page 2 of 10

Exercise 1 (7 marks)

Illustrate the AdaBoost algorithm, explain its purpose and in which cases it is useful.

Exercise 2 (7 marks)

Describe and compare Ridge regression and Lasso regression.

Exercise 3 (2 marks)

Consider the following snippet of code:

1  X = zscore(dataset[['sepal-length', 'sepal-width']].values)
2  t = dataset['class'].values == 'Iris-setosa'
3  X, t = shuffle(X, t, random_state=0)
4
5  w = np.ones(3)
6  for i, (xi, ti) in enumerate(zip(X, t)):
7      ext_x = np.concatenate([np.ones(1), xi.flatten()])
8      if np.sign(w.dot(ext_x)) != ti:
9          w = w + ext_x * ti

1. Describe the procedure presented above. What is the purpose of such a procedure? Which problem does it solve?
2. Tell whether the procedure above is correct and, in case it is not, propose a modification to fix it.
3. Do you think that Line 3 is fundamental for this procedure? Can you describe an example where it can be removed, and motivate why it is not useful in such a case?

Solution:

1. The snippet applies the perceptron online learning algorithm to the Iris dataset. At first, it standardizes the data and shuffles the samples. Then it initializes the weight vector w = [1, 1, 1] and applies the gradient descent update over each of the samples in the training set. It solves a binary classification problem.
2. The procedure should be repeated multiple times over the dataset.
In this case, a single iteration over the dataset does not guarantee that the algorithm converges to a meaningful solution, even if the dataset is linearly separable. We should add a for loop that performs the procedure multiple times over the dataset, until the solution w does not change for an entire epoch (an iteration over the entire dataset) or a maximum number of iterations is reached. In addition, the encoding of the target variable (Line 2) is not correct: the target variable should be encoded as -1 and 1 instead of 0 and 1.
3. Here it is important to shuffle the data, since it might occur that the samples are ordered (e.g., the ones of the positive class are all in the first positions of the dataset), which might slow down the optimization procedure. The same holds if we want to apply a train/validation approach. Instead, if we are using a closed-form solution, like the one for linear regression, the shuffling procedure is not mandatory.

Exercise 4 (2 marks)

Tell whether the following statements about SVMs are true or false. Motivate your answers.

1. There exists a closed-form solution providing the optimal weights of the SVM.
2. There exists a unique solution to the problem of optimizing the weights of the SVM.
3. Since they have the same activation function, it is equivalent to train an SVM and a perceptron if we have a linearly separable dataset.
4. The boundary of a nonlinear-kernel SVM is linear in a specific high-dimensional feature space.

Solution:

1. FALSE. Finding the optimal parameters requires transforming the primal optimization problem into the dual one and applying an iterative procedure (e.g., SMO) to find the optimal weights.
2. TRUE. The objective function of the SVM is convex, therefore it has a unique minimum.
3. FALSE. Even if they both use the sign() activation function, they are optimizing different objectives.
The perceptron minimizes the misclassification error, while the SVM maximizes the margin between the classes. This holds also for linearly separable datasets.
4. TRUE. The idea of using a kernel is that, by expanding the features into a complex space, it is more likely that the data can all be correctly classified by a straight boundary (i.e., a hyperplane).

Exercise 5 (2 marks)

You are given the following class of regression models M_φ over the variables {x1, x2} and an additional feature φ(x1, x2):

    M_φ: y_w(x1, x2) = w0 + w1 x1 + w2 x2^2 + w3 φ(x1, x2),   w ∈ R^4.

Consider three possible choices of the feature φ:

    φ1(x1, x2) = 1,   φ2(x1, x2) = x1 - 3 x2^2,   φ3(x1, x2) = x1 x2.

Tell whether the following statements are true or false. Motivate your answers.

1. Models M_φ1 and M_φ2 have the same bias.
2. The bias of model M_φ1 is larger than the bias of model M_φ3.
3. The VC dimension of model M_φ2 is larger than the VC dimension of model M_φ3.
4. The VC dimension of model M_φ1+φ2 is larger than the VC dimension of model M_φ3.

Solution:

1. TRUE. Indeed, they are the very same model after a renaming of the weights:

    M_φ1: y_w(x1, x2) = w0 + w1 x1 + w2 x2^2 + w3 · 1 = (w0 + w3) + w1 x1 + w2 x2^2, with effective weights w(1) = (w0 + w3, w1, w2) ∈ R^3;
    M_φ2: y_w(x1, x2) = w0 + w1 x1 + w2 x2^2 + w3 (x1 - 3 x2^2) = w0 + (w1 + w3) x1 + (w2 - 3 w3) x2^2, with effective weights w(2) = (w0, w1 + w3, w2 - 3 w3) ∈ R^3.

2. TRUE. Indeed, models M_φ3 are a superset of models M_φ1. Formally:

    M_φ1: y_w(x1, x2) = w0 + w1 x1 + w2 x2^2 + w3;
    M_φ3: y_w(x1, x2) = w'0 + w'1 x1 + w'2 x2^2 + w'3 x1 x2.

To get M_φ1 from M_φ3, simply set w'0 = w0 + w3, w'1 = w1, w'2 = w2, and w'3 = 0. Since the richer class can represent everything M_φ1 can, its bias can only be smaller or equal, so the bias of M_φ1 is larger.
3. FALSE. For the same reason as the previous question, recalling that models M_φ2 are the same as models M_φ1: a subset of M_φ3 cannot have a larger VC dimension.
4. FALSE. Indeed, models M_φ1+φ2 are the same as models M_φ1 (or models M_φ2), since φ1 + φ2 = 1 + x1 - 3 x2^2 still lies in the span of {1, x1, x2^2}.
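The reparameterization argument in points 1 and 4 can also be checked numerically. The following sketch is our own illustration, not part of the exam: the data are random synthetic points, and the claim being tested is that least-squares fits of M_φ1 and M_φ2 coincide, because both design matrices span the same subspace {1, x1, x2^2}, while φ3 adds a genuinely new direction.

```python
import numpy as np

rng = np.random.default_rng(0)
x1, x2 = rng.normal(size=(2, 50))   # synthetic inputs (illustrative only)
y = rng.normal(size=50)             # arbitrary regression targets

def fit_predict(phi):
    """Least-squares fit of M_phi with design matrix [1, x1, x2^2, phi(x1, x2)]."""
    Phi = np.column_stack([np.ones_like(x1), x1, x2**2, phi(x1, x2)])
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return Phi @ w

pred1 = fit_predict(lambda a, b: np.ones_like(a))  # phi_1 = 1
pred2 = fit_predict(lambda a, b: a - 3 * b**2)     # phi_2 = x1 - 3 x2^2
pred3 = fit_predict(lambda a, b: a * b)            # phi_3 = x1 * x2

# M_phi1 and M_phi2 collapse to span{1, x1, x2^2}: identical fitted values.
print(np.allclose(pred1, pred2))   # True
# M_phi3 adds the x1*x2 direction, so its fit differs on generic data.
```

Both rank-deficient design matrices project y onto the same column space, which is exactly why the fitted values agree even though the raw weight vectors differ.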
Exercise 6 (2 marks)

We are given the problem of evaluating the time and materials necessary for the construction of a building. Even if in principle all the quantities are known, the possibility of delays at the construction site and the fact that some of the materials might be lost during transportation allow us to consider all the quantities involved in the process as random variables. Let us assume to have a dataset of previous building constructions, describing the entire process and also including the intermediate steps, e.g., design, foundation building, structure building, furnishing of the apartments.

1. Model the problem of evaluating the time and materials used for a new construction.
2. Model the problem of selecting the best way of performing the construction works to minimize the time and materials required for the construction.

In both cases, specify the problem, the features, the solution, and all the relevant information needed to solve the task using ML-based approaches.

Solution:

1. Multiple regression problem: we can use as inputs the characteristics of the building ground and of the location where we plan to build, to predict the amount of materials and the time required (outputs) to finish the building. A method to solve such a problem is multiple linear regression. Notice that, with this approach, we are only considering the process as a whole, while we are not taking into account the intermediate steps of the process.
2. Reinforcement learning problem: given the actions required to finish a building, select the ones that are the most fruitful from a time and economic point of view.
The state is the current state of the building; the actions are the decisions one may take to advance the building construction; the reward is the time or cost incurred by the specific action; the transitions are stochastic; the discount factor might be set to 1, since we are assuming a finite horizon for the process. We can use either SARSA or Q-learning to find the optimal policy. If we want to make use of the dataset, we need to use an offline off-policy approach, or an offline approach with importance sampling.

Exercise 7 (4 marks)

Consider the following set of trajectories, collected with a policy π in a Markov decision process with state space S = {A, B, C}, where C is an absorbing zero-reward state, and action space A = {x, y}:

    (A, x, -2) -> (B, y, 3) -> (A, x, 1) -> (C)
    (B, x, -1) -> (B, y, 3) -> (C)
    (A, y, 6) -> (C)

1. Execute first-visit Monte Carlo for estimating the value function V^π(s) for all states s ∈ S, keeping the discount factor γ parametric.
2. Execute TD(0), on the first trajectory only, for estimating the value function V^π for all states s ∈ S, keeping the discount factor γ parametric, starting with a zero initialization of the value functions and with learning rate α = 1.
3. Can the provided trajectories be generated by a deterministic policy? Justify your answer.

Solution:

1. Let us apply first-visit Monte Carlo:

    V^π(C) = 0 (absorbing zero-reward state);
    V^π(B) = 1/2 ((3 + γ) + (-1 + 3γ)) = 1 + 2γ;
    V^π(A) = 1/2 ((-2 + 3γ + γ^2) + 6) = 2 + (3/2)γ + (1/2)γ^2.

2. Let us apply TD(0) starting with a zero initialization of the value functions and with learning rate α = 1:

    (A, x, -2) -> (B, y, 3):  V^π(A) <- R + γ V^π(B) = -2 + γ·0 = -2
    (B, y, 3) -> (A, x, 1):   V^π(B) <- R + γ V^π(A) = 3 + γ(-2) = 3 - 2γ
    (A, x, 1) -> (C):         V^π(A) <- R + γ V^π(C) = 1 + γ·0 = 1
    (C):                      V^π(C) = 0 (absorbing zero-reward state)

3. No because, for instance, in state A both actions x and y are played at least once, while a deterministic policy always plays the same action in a given state.
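The hand computations above can be replicated mechanically. The sketch below is our own illustration (the trajectory encoding as (state, reward) pairs is ours, not the exam's); it keeps γ parametric by representing values as polynomials in γ via numpy.

```python
import numpy as np
from numpy.polynomial import Polynomial as P

g = P([0, 1])  # the discount factor gamma, kept as a polynomial variable

# Each trajectory is a list of (state, reward) pairs; the absorbing state C
# (with V(C) = 0) is left implicit at the end of every trajectory.
trajectories = [
    [('A', -2), ('B', 3), ('A', 1)],
    [('B', -1), ('B', 3)],
    [('A', 6)],
]

# First-visit Monte Carlo: average the discounted return observed from the
# first visit of each state.
returns = {'A': [], 'B': []}
for traj in trajectories:
    seen = set()
    for i, (s, _) in enumerate(traj):
        if s not in seen:
            seen.add(s)
            G = sum((r * g**k for k, (_, r) in enumerate(traj[i:])), P([0]))
            returns[s].append(G)
V_mc = {s: sum(Gs, P([0])) / len(Gs) for s, Gs in returns.items()}
# V_mc['A'] = 2 + (3/2)γ + (1/2)γ^2 and V_mc['B'] = 1 + 2γ, as derived above.

# TD(0) on the first trajectory only, alpha = 1, zero-initialized values:
# with alpha = 1, each update simply overwrites V(s) with R + γ V(s').
V = {'A': P([0]), 'B': P([0]), 'C': P([0])}
for (s, r), nxt in zip(trajectories[0], ['B', 'A', 'C']):
    V[s] = r + g * V[nxt]
# Final values: V['A'] = 1, V['B'] = 3 - 2γ, V['C'] = 0.
```

Note the order dependence in the TD(0) loop: V(B) is updated using the value V(A) = -2 from the first step, before V(A) is overwritten by the third step.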
Exercise 8 (4 marks)

Consider a binary classifier defined in terms of a threshold θ:

    y(x) = 1 if w^T x ≥ θ, 0 otherwise,

where the weights w were trained on a dataset made of N = 100 samples, and θ = 0.

1. Suppose that Accuracy = Precision = 0.5 and that the correctly classified negative samples are 10; draw the confusion matrix.
2. Compute the Recall.
3. Suppose that we move the threshold θ to the value 0.2. How will the Recall change? Provide a formal justification for your answer. (Suggestion: the function x/(x + y), for x, y ≥ 0, is increasing in x and decreasing in y.)

Solution:

1. Knowing that Accuracy = Precision = 0.5, we deduce:

    Accuracy = (TP + TN) / (TP + TN + FP + FN) = (TP + TN) / 100 = 0.5  =>  TP + TN = 50;
    Precision = TP / (TP + FP) = 0.5  =>  (1 - 0.5) TP = 0.5 FP  =>  TP = FP.

Thus, recalling that TN = 10, we have the system {TP + TN = 50, TP = FP, TN = 10}, which gives TP = 40, FP = 40, TN = 10. From this, FN = N - TP - FP - TN = 10. Thus, the confusion matrix becomes:

                        Actual class: 1    Actual class: 0
    Predicted class: 1        40                 40
    Predicted class: 0        10                 10

2. Using the confusion matrix, we can compute the Recall: Recall = TP / (TP + FN) = 40 / (40 + 10) = 0.8.

3. If we increase θ (the actual value 0.2 does not matter), some points previously classified as positive will be classified as negative. Formally, TP' ≤ TP, FP' ≤ FP, FN' ≥ FN and TN' ≥ TN, having denoted with ' the quantities after the change of θ. The Recall then becomes

    Recall' = TP' / (TP' + FN') ≤ TP / (TP + FN) = Recall,

having applied the suggestion with x = TP' and y = FN', observing that TP' ≤ TP (the function is increasing in x) and FN' ≥ FN (the function is decreasing in y).

End of exam.
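The derivation in points 1 and 2 is plain arithmetic on the four confusion-matrix entries, and the monotonicity claim of point 3 can be checked on synthetic scores. The sketch below is our own illustration (the random scores and labels are stand-ins, not exam data):

```python
import numpy as np

# Points 1-2: solve the exam's constraints for the confusion matrix.
N = 100
TN = 10                  # given: correctly classified negative samples
TP = 50 - TN             # Accuracy = (TP + TN)/N = 0.5  =>  TP + TN = 50
FP = TP                  # Precision = TP/(TP + FP) = 0.5  =>  FP = TP
FN = N - TP - FP - TN    # the four entries sum to N
recall = TP / (TP + FN)
print(TP, FP, FN, TN, recall)  # 40 40 10 10 0.8

# Point 3: for any fixed scores w^T x, raising the threshold can only turn
# predicted positives into predicted negatives, so recall cannot increase.
rng = np.random.default_rng(0)
scores = rng.normal(size=1000)           # stand-ins for w^T x
labels = rng.integers(0, 2, size=1000)   # stand-ins for the true classes

def recall_at(theta):
    pred = scores >= theta
    tp = np.sum(pred & (labels == 1))
    fn = np.sum(~pred & (labels == 1))
    return tp / (tp + fn)

print(recall_at(0.2) <= recall_at(0.0))  # True, for any choice of scores
```

The denominator tp + fn is the number of actual positives, which does not depend on θ; only tp shrinks as θ grows, which is exactly the x/(x + y) argument of the solution.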