Computer Engineering - Machine Learning
Full exam
DIPARTIMENTO DI ELETTRONICA, INFORMAZIONE E BIOINGEGNERIA
Politecnico di Milano
Machine Learning (Code: 097683)
July 14, 2022

Surname: Name: Student ID: Row: Column:
Time: 2 hours Maximum Marks: 33

The following exam is composed of 8 exercises (one per page). The first page needs to be filled in with your name, surname and student ID. The following pages should be used only in the large squares present on each page. Any solution provided either outside these spaces or without a motivation will not be considered for the final mark.

During this exam you are not allowed to use electronic devices, such as laptops, smartphones, tablets and/or similar. Likewise, you are not allowed to bring with you any kind of note, book, written scheme, and/or similar. You are also not allowed to communicate with other students during the exam. The first reported violation of the above-mentioned rules will be annotated on the exam and will be considered in the final mark decision. The second reported violation will imply the immediate expulsion of the student from the exam room and the annulment of the exam.

You are allowed to write the exam either with a pen (black or blue) or a pencil. It is your responsibility to provide a readable solution; we will not be held accountable for accidental partial or total cancellation of the exam. The exam can be written either in English or in Italian.

You are allowed to withdraw from the exam at any time without any penalty. You are allowed to leave the room no earlier than half the duration of the exam. You are not allowed to keep the text of the exam with you while leaving the room.

Three of the points are awarded based on how quickly you solve the exam: if you finish more than 30 minutes before the end of the exam you get 3 points, more than 20 minutes before the end 2 points, and more than 10 minutes before the end 1 point (the points cannot be accumulated).
The box on Page 10 can only be used to complete Exercises 7 and/or 8.

Ex. 1: / 7   Ex. 2: / 7   Ex. 3: / 2   Ex. 4: / 2   Ex. 5: / 2   Ex. 6: / 2   Ex. 7: / 4   Ex. 8: / 4   Time: / 3   Tot.: / 33

Exercise 1 (7 marks)
Provide an overview of the feature selection methods that you know and explain their pros and cons.

Exercise 2 (7 marks)
Describe the perceptron model and how it is trained.

Exercise 3 (2 marks)
Consider the following snippet of code:

1 for t in range(1, T + 1):
2     pulled = np.argwhere(criterion == criterion.max()).reshape(-1)
3     reward = rew(pulled)
4
5     n_pulls[pulled] = n_pulls[pulled] + 1
6     exp_payoffs[pulled] = ((exp_payoffs[pulled] *
7         (n_pulls[pulled] - 1.0) + reward) / n_pulls[pulled])
8     for k in range(0, n_options):
9         criterion[k] = exp_payoffs[k] + np.sqrt(2 * t / n_pulls[k])

1. Describe the procedure (what algorithm it is implementing) and the purpose (which kind of problem it is solving) of the above snippet of code.
2. Is it correct? In case the algorithm is not correct, propose a modification to fix the problem. In case the algorithm is correct, state the theoretical guarantees of such an algorithm.
3. Are there other available methods to solve this problem? Do they require some specific assumptions to be applied in the same setting in which the algorithm in the snippet is used?

Solution:
1. The code is implementing an Upper Confidence Bound based algorithm to solve the stochastic MAB problem.
It uses the criterion variable (i.e., the upper confidence bound) to select the arm (Line 2) and, after observing the reward, it updates the number of pulls of the specific arm (Line 5) and the expected payoff (Line 6); finally, it recomputes the bound (Line 9). This process is repeated over a time horizon of T rounds.
2. The algorithm is not correct, since the numerator of the term under the square root should be 2 * np.log(t). As written, this version of the bound increases that term too much over the execution of the algorithm.
3. A viable option for the stochastic MAB problem is the Thompson Sampling algorithm, which adopts a Bayesian approach to the MAB scenario. The version of the algorithm shown during the lectures requires that the rewards are drawn from a Bernoulli distribution, otherwise the prior-posterior conjugacy trick would not apply. Moreover, the other strong assumption is that the environment is stationary over time, otherwise stochastic MAB techniques would not apply; in that case one might use the EXP3 algorithm instead.

Exercise 4 (2 marks)
Indicate whether the following statements about Linear Regression are true or false. Motivate your answers.
1. Ordinary Least Squares is more prone to overfitting than Lasso.
2. Ridge regression performs an implicit feature selection procedure, leading to sparse solutions.
3. Lasso admits a closed-form solution for the optimal weights.
4. Both Ridge regression and Lasso introduce a bias (from a bias-variance trade-off perspective) compared to Ordinary Least Squares.

Solution:
1. TRUE. Ordinary Least Squares does not employ any regularization and, therefore, is more prone to overfitting than Lasso, which employs a regularization on the L1-norm of the weights ∥w∥1.
2. FALSE.
Although Ridge regression employs a regularization, such regularization is not intended to perform feature selection, and the resulting solution is not, in general, sparse. Indeed, such a property is enforced by Lasso.
3. FALSE. Lasso employs a regularization based on the L1-norm of the weights ∥w∥1. This makes the resulting objective function non-differentiable and prevents obtaining a closed-form solution. Nevertheless, the objective function remains convex and can be easily optimized numerically.
4. TRUE. They both make use of regularization techniques compared to Ordinary Least Squares. As such, from a bias-variance trade-off perspective, they increase the bias with a possible reduction of the variance.

Exercise 5 (2 marks)
Indicate whether the following statements about Monte Carlo and Temporal Difference are true or false. Motivate your answers.
1. If I am trying to evaluate a policy on a small number of interactions, it is generally better to use TD rather than MC methods.
2. MC evaluation would be a reasonable choice for providing an online estimation of a policy on a stream of data.
3. The MC every-visit evaluation suffers a smaller bias in comparison to MC first-visit evaluation.
4. The TD evaluation method is consistent, which means that it will surely converge to the optimal value function if sufficient data is available.

Solution:
1. TRUE. TD evaluation generally suffers a smaller variance in comparison to MC methods. If I can only access a few interactions, the variance would likely be the main issue for the value estimation.
2. FALSE. MC evaluation requires full episodes to provide an estimate of the value function, which is not a good fit for an online setting. Instead, one could use TD, which employs bootstrapping to provide an online estimate of the value function.
3. FALSE.
MC first-visit is unbiased but suffers from a large variance, whereas MC every-visit is slightly biased but usually reduces the variance of the estimation.
4. FALSE. TD is a consistent policy evaluation method. This means that it will converge to the exact evaluation of the policy underlying the data, which can be different from the optimal policy.

Exercise 6 (2 marks)
This summer, the problem of water shortages is becoming more and more crucial in Italy as well. You are given a physical model of the water distribution network, which is able to simulate the evolution of the water reservoirs over time, given the consumption of civil and industrial entities and the amount of water generated by rivers and other water sources.
1. Do you think the same prediction can be provided by means of ML models? Provide a modelization of the problem, the possible data you would need, and the methods you would use to solve the problem.
2. Provide a reason for preferring the developed ML model over the physical one.
3. Would you use a different ML approach based on the amount of data available?

Solution:
1. This scenario can be modeled as a regression problem, where the water sources and consumers are given as input and the output is the water available in the different reservoirs. In this case, a solution might be to use Linear Regression or Gaussian Processes.
2. A motivation for applying an ML model instead of a physical one is that the physical model may be too simple and not capture the complexity of the phenomenon. Another reason is that we may have some uncertainty about the water distribution system that the physical model cannot include, e.g., leaks over the network.
3. On the one hand, the more data is available, the more complex a model (e.g., one computing additional features with basis functions) could be employed without fear of overfitting the data.
On the other hand, a large dataset could make some approaches, such as Gaussian Processes, computationally unfeasible, requiring methods that allow for incremental training, such as Linear Regression.

Exercise 7 (4 marks)
Consider a linear, hard-margin, two-class SVM classifier with parameters w = [2, −1], b = 1. Answer the following questions, providing adequate motivations.
1. How is the point p1 = [2, 1] classified according to the trained SVM?
2. Is the point p2 = [0, 2] a support vector of the given SVM?
3. Provide a point that falls on the boundary of the given SVM.
4. Let us suppose we collect new data that make the classification task not linearly separable. How would you change the SVM model to deal with the task?

Solution:
1. We have w⊤·p1 + b = [2, −1]·[2, 1]⊤ + 1 = 4 − 1 + 1 = 4 ≥ 1, thus the point is classified as positive.
2. We have w⊤·p2 + b = [2, −1]·[0, 2]⊤ + 1 = 0 − 2 + 1 = −1, thus the point falls on the negative margin and it is a support vector of the SVM.
3. We look for a point x such that w⊤·x + b = 0. We can take x2 = 0 and then compute [2, −1]·[x1, 0]⊤ + 1 = 0 → x1 = −1/2.
4. We could either consider a soft-margin SVM or a different kernel function. In any case we should retrain the model to update the parameters accordingly.

Exercise 8 (4 marks)
Consider a binary classifier trained on a dataset made of N = 100 samples.
1. Suppose that the Precision = 0.25 and the F1 = 0.4; compute the Recall.
2. Knowing, in addition, that the Accuracy = 0.85, compute the full confusion matrix.
3. In which circumstances is the Accuracy not a reliable index to assess the quality of the trained model?

Solution:
1. First of all, we compute the Recall from the F1 definition:
F1 = 2 · Precision · Recall / (Precision + Recall)
⇒ Recall = F1 · Precision / (2 · Precision − F1) = (0.4 · 0.25) / (2 · 0.25 − 0.4) = 0.1 / 0.1 = 1.
2. To get the confusion matrix, we simply apply the definitions of Accuracy, Precision, and Recall, keeping in mind that the total number of samples is N = 100:
Accuracy = (TP + TN) / (TP + TN + FP + FN) = (TP + TN) / 100 = 0.85 ⇒ TP + TN = 85
Recall = TP / (TP + FN) = 1 ⇒ FN = 0
Precision = TP / (TP + FP) = 0.25 ⇒ (1 − 0.25) TP = 0.25 FP ⇒ 3 TP = FP
Now, using TP + TN + FP + FN = 100, recalling that TP + TN = 85 and FN = 0, we get FP = 15. Using 3 TP = FP, we get TP = 5. Finally, using TP + TN = 85, we get TN = 80. Thus, the confusion matrix is given by:

                     Actual Class: 1   Actual Class: 0
Predicted Class: 1          5                15
Predicted Class: 0          0                80

3. Accuracy is not a reliable index of the quality of the trained model mainly in two scenarios: (i) when the dataset is unbalanced; (ii) when the cost of wrongly predicting positive-class samples is different from that of wrongly predicting negative-class samples.

End of exam
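As a closing aside, the numerical answers of Exercises 7 and 8 can be verified with a few lines of Python. This is a quick sketch, not part of the original exam; the variable names (w, b, precision, f1, TP, FP, FN, TN) are chosen here for illustration.

import numpy as np

# Exercise 7: hard-margin SVM with w = [2, -1], b = 1.
w, b = np.array([2.0, -1.0]), 1.0
assert w @ np.array([2.0, 1.0]) + b == 4.0    # p1 = [2, 1]: score >= 1, positive class
assert w @ np.array([0.0, 2.0]) + b == -1.0   # p2 = [0, 2]: on the negative margin
assert w @ np.array([-0.5, 0.0]) + b == 0.0   # [-1/2, 0]: on the decision boundary

# Exercise 8.1: invert F1 = 2 * P * R / (P + R) for the Recall.
precision, f1 = 0.25, 0.4
recall = f1 * precision / (2 * precision - f1)
assert abs(recall - 1.0) < 1e-9               # Recall = 1, up to rounding

# Exercise 8.2: the derived confusion matrix satisfies all three metrics.
TP, FP, FN, TN = 5, 15, 0, 80
assert TP + FP + FN + TN == 100               # N = 100
assert (TP + TN) / 100 == 0.85                # Accuracy
assert TP / (TP + FP) == 0.25                 # Precision
assert TP / (TP + FN) == 1.0                  # Recall
print("all checks passed")

Running the script prints "all checks passed", confirming that the point classifications, the recall inversion, and the confusion matrix entries are mutually consistent.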