Computer Engineering - Machine Learning
Full exam
DIPARTIMENTO DI ELETTRONICA, INFORMAZIONE E BIOINGEGNERIA
Politecnico di Milano
Machine Learning (Code: 097683)
June 24, 2024

Surname: _______  Name: _______  Student ID: _______  Row: ___  Column: ___
Time: 2 hours.  Maximum Marks: 33

• The following exam is composed of 8 exercises (one per page). The first page needs to be filled in with your name, surname and student ID. The following pages should be used only in the large squares present on each page. Any solution provided either outside these spaces or without a motivation will not be considered for the final mark.
• During this exam you are not allowed to use electronic devices, such as laptops, smartphones, tablets and/or similar. Likewise, you are not allowed to bring with you any kind of note, book, written scheme, and/or similar. You are also not allowed to communicate with other students during the exam.
• The first reported violation of the above-mentioned rules will be annotated on the exam and will be considered for the final mark decision. The second reported violation will imply the immediate expulsion of the student from the exam room and the annulment of the exam.
• You are allowed to write the exam either with a pen (black or blue) or a pencil. It is your responsibility to provide a readable solution. We will not be held accountable for accidental partial or total cancellation of the exam.
• The exam can be written either in English or Italian.
• You are allowed to withdraw from the exam at any time without any penalty. You are allowed to leave the room no earlier than half the duration of the exam. You are not allowed to keep the text of the exam with you while leaving the room.
• Three of the points will be given on the basis of how quick you are in solving the exam. If you finish earlier than 30 minutes before the end of the exam you will get 3 points; if you finish earlier than 20 minutes you will get 2 points; and if you finish earlier than 10 minutes you will get 1 point (the points cannot be accumulated).
• The box on Page 10 can only be used to complete Exercises 7 and/or 8.

Ex. 1: /7   Ex. 2: /7   Ex. 3: /2   Ex. 4: /2   Ex. 5: /2   Ex. 6: /2   Ex. 7: /4   Ex. 8: /4   Time: /3   Tot.: /33

Student's name: _______  Please go on to the next page. . .  Computer Science — Machine Learning, Page 2 of 10

Exercise 1 (7 marks)
Discuss at least three methods used for feature selection and compare their advantages and disadvantages.

Exercise 2 (7 marks)
Explain the concept of bagging (Bootstrap Aggregating) in machine learning. Describe a scenario where bagging would be particularly beneficial.

Exercise 3 (2 marks)
Consider the following snippet of code:

    1   X = dataset[['sepal-length', 'sepal-width', 'petal-length']].values
    2   t = dataset['petal-width'].values
    3
    4   N, M = X.shape
    5   lambda_ = 0.1
    6
    7   Phi = np.hstack((np.ones((N, 1)), zscore(X)))
    8   w1 = inv(Phi.T @ Phi) @ (Phi.T @ t)
    9   w2 = inv(Phi.T @ Phi - lambda_ * np.eye(M + 1)) @ (Phi.T @ t)
    10  w3 = (np.abs(w1) > 1/lambda_) * 1/lambda_ + (np.abs(w1) <= 1/lambda_) * w1

1. Describe what the code is doing from lines 1 to 9. What kind of machine learning problem is it attempting to solve? What machine learning techniques are being used?
2. Tell if lines 7-9 are correct. If not, identify the mistakes and propose a correction.
3. What happens to w2 and w3 when increasing the value of lambda_? Describe the role of lambda_ in terms of the bias-variance trade-off.

Solution:
1. From line 1 to line 9, the code is loading the Iris dataset, using three variables as input and the fourth one as target.
Then, it constructs the design matrix Phi, including an intercept term and normalizing the other variables with zscore. Then, OLS is applied to compute w1, while Ridge regression is used to compute w2, with regularization parameter lambda_. The code is attempting to solve a regression problem. The techniques are OLS (ordinary least squares), i.e., linear regression without regularization, and linear regression with Ridge regularization.
2. Line 9 is not correct because of the minus sign in the application of the regularization. It should be replaced with w2 = inv(Phi.T @ Phi + lambda_ * np.eye(M + 1)) @ (Phi.T @ t).
3. When increasing the value of lambda_ we are enforcing more regularization. This is straightforward for Ridge regression when computing w2. For w3, instead, 1/lambda_ defines a clipping threshold: if a weight obtained by OLS exceeds 1/lambda_ in absolute value, it is clipped to 1/lambda_, and this threshold approaches zero as lambda_ increases. In terms of the bias-variance trade-off, increasing lambda_ reduces the variance but increases the bias of the model.

Exercise 4 (2 marks)
Tell if the following statements about Markov Decision Processes (MDPs) are true or false. Motivate your answers.
1. If the MDP has stochastic rewards, it may not admit a deterministic optimal policy.
2. V*(s) ≥ Q*(s, a) for all states s and actions a, where V* is the optimal value function and Q* is the optimal state-action value function.
3. A larger discount factor assigns more importance to the long-term effects of actions.
4. Value iteration is guaranteed to converge to the optimal value function in a finite number of steps.

Solution:
1. False. Every MDP admits a deterministic optimal policy. Stochastic rewards do not matter, since the optimal policy can be defined in terms of the expected rewards of state-action pairs.
2. True, since V*(s) = max_a Q*(s, a).
3. True, provided some actions do have long-term effects, since a larger discount factor gives more weight to rewards further in the future.
4. False. Value iteration is only guaranteed to converge to the optimal value function asymptotically.

Exercise 5 (2 marks)
Consider the parametric classification model f_w(x) = sign(Σ_{i=0}^{d} w_i x^i), where w = [w_0, ..., w_d]^T and d ∈ N, that is trained on a dataset D = {(x_i, t_i)}_{i=1}^{N} made of N samples. Tell if the following statements are true or false. Motivate your answers.
1. Keeping N fixed and increasing d, the variance of the model reduces.
2. Keeping d fixed and increasing N, the bias of the model reduces.
3. Increasing both N and d, the variance of the model increases.
4. Decreasing both N and d, the bias of the model increases.

Solution:
1. FALSE: By increasing the degree d of the polynomial, we are considering increasingly large hypothesis spaces, and, therefore, the variance does not reduce.
2. FALSE: By increasing the number of samples N, we are not affecting the bias, which remains fixed since the degree d does not change.
3. FALSE: By increasing the number of samples N with a fixed degree d, we reduce the variance, but by increasing d with a fixed N, we increase the variance. Therefore, we can conclude nothing about the variance in this case.
4. TRUE: The number of samples N does not affect the bias, but by reducing the degree d we are considering simpler models and, therefore, we are increasing the bias.

Exercise 6 (2 marks)
Consider the following real-world problem. An online bookstore wants to recommend books to its potential clients. Whenever a user enters the homepage of the website, they are immediately shown a single book from the bookstore's catalog. We want to display the book that maximizes the probability that the user will end up buying it.
For each of the following situations, briefly discuss how the problem could be modeled and solved using the techniques seen in class:
1. The bookstore does not have any previous purchase data, nor any information about the current visitor.
2. The bookstore has access to detailed information (such as age, location...) and previous purchases of registered users.

Solution:
1. This is a hard problem, but we can still model it as a multi-armed bandit where the arms are the books, and try to recommend to all users the single book that has the largest probability of being bought overall. We could solve this with UCB or Thompson sampling.
2. The purchase history of each registered user can be seen as a Markov decision process. The state can be designed to summarize the user's data and purchase history, and the actions are the books to recommend at each visit of the same user. To solve the problem, we can use reinforcement learning control methods such as SARSA or Q-learning, deploying a separate instance for each registered user. If we already have a large amount of purchase data, we could also use an off-policy (offline) RL approach. Alternatively, we can also use multi-class classification with features based on the user's data and history, with the classes being the books from the catalog.

Exercise 7 (4 marks)
Consider a binary classifier, defined in terms of the threshold ι ∈ [0, 1]:

    y_w(x) = 1 if σ(w^T x) ≥ ι, and 0 otherwise,

where σ(·) is the sigmoid function. The weights w are trained with Logistic Regression on a training set made of N = 100 samples, and the threshold is set to ι = 1/2.
1. Suppose that Accuracy = 0.4, Recall = 0.5, and that the positive samples correctly classified are 20; draw the confusion matrix.
2. Compute the F1 score.
3. Suppose we move the threshold ι to the value 1/4, and we discover that the number of samples wrongly classified as positive does not change.
What can we conclude about the Precision? Provide a formal justification.

Solution:
1. Knowing that Accuracy = 0.4 and Recall = 0.5, we deduce:

    Accuracy = (TP + TN) / N = 0.4  =>  TP + TN = 40,
    Recall = TP / (TP + FN) = 0.5  =>  TP = FN.

Thus, exploiting that TP = 20, we have the system {TP + TN = 40, TP = FN, TP = 20}, which gives TP = 20, TN = 20, FN = 20. From this, FP = N − TP − TN − FN = 40. Thus, the confusion matrix becomes:

                     Actual Class: 1    Actual Class: 0
    Predicted Class: 1    TP = 20           FP = 40
    Predicted Class: 0    FN = 20           TN = 20

2. We first compute the Precision:

    Precision = TP / (TP + FP) = 20 / (20 + 40) = 1/3.

The F1 score is the harmonic mean of Precision and Recall:

    F1 = 2 / (1/Precision + 1/Recall) = 2 / (3 + 2) = 2/5.

3. By decreasing the threshold to ι = 1/4, more samples will be classified as positive; in particular, TP' ≥ TP. Since the text says that FP' = FP, we have:

    Precision' = TP' / (TP' + FP') = 1 / (1 + FP/TP') ≥ 1 / (1 + FP/TP) = TP / (TP + FP) = Precision.

Thus, the Precision cannot decrease (and it strictly increases whenever TP' > TP).

Exercise 8 (4 marks)
Consider a linear binary SVM classifier that, after training, has weights w = [−1, 2] and bias b = −3. Motivate all answers and write down any necessary computation.
1. How is the point [0.5, 1] classified?
2. Provide an example of a support vector.
3. Suppose the point [1, 1] is added to the dataset, with a negative label. Do we need to retrain the SVM?
4. Can we discard the point [−3, 2] from the dataset if all we care about is predicting the class of new points?

Solution:
1. f([0.5, 1]) = −1 · 0.5 + 2 · 1 − 3 = −1.5 < 0, so the point is classified in the negative class.
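The computation above, and the margin condition that characterizes support vectors, can be checked with a short NumPy sketch. The weights w = [−1, 2] and bias b = −3 come from the exercise; the helper name classify and the candidate support vector [0, 2] are our own illustrative choices, not part of the original text:

```python
import numpy as np

# Trained linear SVM from the exercise: w = [-1, 2], b = -3.
w = np.array([-1.0, 2.0])
b = -3.0

def classify(x):
    """Return the decision value w.x + b and the predicted label sign(w.x + b)."""
    f = float(w @ np.asarray(x, dtype=float) + b)
    return f, (1 if f >= 0 else -1)

# Question 1: f([0.5, 1]) = -0.5 + 2 - 3 = -1.5 < 0, so the negative class.
print(classify([0.5, 1.0]))   # (-1.5, -1)

# Question 2: a support vector lies exactly on a margin hyperplane,
# i.e. w.x + b = +1 or w.x + b = -1. For instance, x = [0, 2] gives
# 0 + 4 - 3 = +1, so it is a valid candidate (assuming it belongs to the
# training set with a positive label).
print(classify([0.0, 2.0]))   # (1.0, 1)
```

Checking the sign of w·x + b in this way is all the trained SVM needs at prediction time; the training data only matters through the support vectors that determined w and b.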