DIPARTIMENTO DI ELETTRONICA, INFORMAZIONE E BIOINGEGNERIA
Politecnico di Milano
Machine Learning (Code: 097683)
July 24, 2024

Surname: ________  Name: ________  Student ID: ________  Row: ____  Column: ____
Time: 2 hours. Maximum Marks: 33

•The following exam is composed of 8 exercises (one per page). The first page needs to be filled in with your name, surname and student ID. The following pages should be used only in the large squares present on each page. Any solution provided either outside these spaces or without a motivation will not be considered for the final mark.
•During this exam you are not allowed to use electronic devices, such as laptops, smartphones, tablets and/or similar. Likewise, you are not allowed to bring with you any kind of note, book, written scheme, and/or similar, nor to communicate with other students during the exam.
•The first reported violation of the above-mentioned rules will be annotated on the exam and will be considered in the final mark decision. The second reported violation will imply the immediate expulsion of the student from the exam room and the annulment of the exam.
•You are allowed to write the exam either with a pen (black or blue) or a pencil. It is your responsibility to provide a readable solution. We will not be held accountable for accidental partial or total cancellation of the exam.
•The exam can be written either in English or in Italian.
•You are allowed to withdraw from the exam at any time without any penalty. You are not allowed to leave the room earlier than half the duration of the exam, and you are not allowed to keep the text of the exam with you while leaving the room.
•Three of the points will be given on the basis of how quickly you solve the exam: if you finish earlier than 30 minutes before the end of the exam you will get 3 points, if you finish earlier than 20 minutes you will get 2 points, and if you finish earlier than 10 minutes you will get 1 point (the points cannot be accumulated).
•The box on Page 10 can only be used to complete Exercises 7 and/or 8.

    Ex. 1  Ex. 2  Ex. 3  Ex. 4  Ex. 5  Ex. 6  Ex. 7  Ex. 8  Time  Tot.
     / 7    / 7    / 2    / 2    / 2    / 2    / 4    / 4    / 3   / 33

Student's name: ________  Please go on to the next page...  Computer Science — Machine Learning, Page 2 of 10

Exercise 1 (7 marks)
Illustrate the Bias-Variance decomposition of the expected error of a regression model. Provide its complete derivation, the meaning of each term of the decomposition, and the practical significance of this decomposition.

Exercise 2 (7 marks)
Explain what a Gram Matrix is and its role in Kernel Methods within Machine Learning.

Exercise 3 (2 marks)
Consider the following snippet of code:

    1  SVM_model_1 = svm.SVC(kernel="linear")
    2  SVM_model_1.fit(X_train, t_train)
    3  prediction_1 = SVM_model_1.predict(X_test)
    4
    5  def custom_kernel(X_1, X_2):
    6      A = np.array([[-0.5, 0], [0, 2]])
    7      return np.dot(np.dot(X_1, A), X_2.T)
    8
    9  SVM_model_2 = svm.SVC(kernel=custom_kernel)
    10 SVM_model_2.fit(X_train, t_train)
    11 prediction_2 = SVM_model_2.predict(X_test)
    12
    13 print(accuracy_score(t_test, prediction_1), accuracy_score(t_test, prediction_2))

1. What kind of problem is this code trying to solve? What methods are being used?
2. What is being printed in line 13, and what could we conclude from this output?
3. Are there any problems with the solution implemented here (that are visible from this code fragment)? If so, propose a fix.
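[Editor's note, not part of the exam text: the validity issue probed by question 3 can be checked empirically. A kernel of the form k(x, y) = x⊤Ay is valid only if A is positive semidefinite, which in turn makes every Gram matrix K = XAX⊤ positive semidefinite. A minimal sketch, using synthetic two-feature data invented for illustration:]

```python
import numpy as np

def custom_kernel(X_1, X_2, A):
    # Kernel of the form k(x, y) = x^T A y; valid iff A is PSD
    return X_1 @ A @ X_2.T

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))  # synthetic 2-feature sample, for illustration only

A_bad = np.array([[-0.5, 0], [0, 2]])   # eigenvalues -0.5 and 2: not PSD
A_good = np.array([[0.5, 0], [0, 2]])   # eigenvalues 0.5 and 2: PSD

for name, A in [("A_bad", A_bad), ("A_good", A_good)]:
    K = custom_kernel(X, X, A)               # Gram matrix on the sample
    min_eig = np.linalg.eigvalsh(K).min()    # most negative eigenvalue of K
    verdict = "valid kernel" if min_eig >= -1e-9 else "NOT a valid kernel"
    print(name, "min Gram eigenvalue:", round(min_eig, 4), "->", verdict)
```

With A_bad the Gram matrix acquires a clearly negative eigenvalue, which is exactly why the snippet's custom kernel is invalid, while A_good yields a matrix that is PSD up to numerical noise.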
Solution.
1. All we can say is that it is a classification problem. The method used is Support Vector Machines, with two different kernels.
2. Line 13 prints the accuracy of the two models on a test set, so it shows which of the two kernels yields a more accurate classifier.
3. Line 6: this is not a valid kernel. It is a kernel of the kind x⊤Ay, but A is not positive semidefinite. A possible fix is A = np.array([[0.5, 0], [0, 2]]).

Exercise 4 (2 marks)
Tell if the following statements about reinforcement learning are true or false. Motivate your answers.
1. When the model of the environment is known, it is never convenient to use sample-based reinforcement learning methods;
2. SARSA and Q-learning are off-policy algorithms for control;
3. Q-learning can be considered the sample-based version of value iteration;
4. SARSA necessitates playing an ε-greedy policy with a vanishing ε in order to converge to the optimal policy.

Solution.
1. FALSE: we may use MC or TD learning (instead of dynamic programming) even for known environments, when the state space is too large or infinite;
2. FALSE: they are both algorithms for control, but SARSA is on-policy while Q-learning is off-policy;
3. TRUE: indeed, it updates the estimate of the optimal Q-function by resorting to a sample-based version of the optimal Bellman equation;
4. TRUE: indeed, SARSA is on-policy and thus learns the Q-function of the policy it is playing. For this reason, to converge to the optimal policy, the exploration rate ε must vanish.

Exercise 5 (2 marks)
Consider a regression problem in two variables x1, x2 with features

    ϕ1(x1, x2) = x1,    ϕ2(x1, x2) = x2,    ϕ3(x1, x2) = x1·x2

and a single output, solved with Ordinary Least Squares (OLS). Tell if the following statements are true or false. Motivate your answers.
1. The learned function is a hyperplane in input space.
2. OLS admits a closed-form solution.
3. The learned model has 3 parameters.
4. The overall significance of the model can be established with a single Student's t-test.

Solution.
1. FALSE, because the third feature is nonlinear with respect to the input variables. The learned function is a hyperplane in feature space.
2. TRUE, since the regressor is still linear with respect to the weights.
3. FALSE, the parameters are 4, due to the intercept.
4. FALSE, a Fisher-Snedecor test is needed to establish the overall significance of the model. A single Student's t-test can be used to establish the significance of a single weight.

Exercise 6 (2 marks)
You are the CTO of a trading startup, and you are requested to address the following scenarios using machine learning techniques. For each scenario, classify it as a machine learning problem, specify the data (features and target, or states, actions, and reward), and a specific method to solve it.
1. Predict the price of multiple stocks for the next day, given historical data of the price of the same stocks in the past;
2. Decide the amount of money to invest in each stock, given the current price of the same stocks and a limited investment budget.

Solution.
1. The scenario can be classified as a REGRESSION problem with multiple targets (the stocks). The features are the prices of all the stocks in the past n days (n to be tuned), and the targets are the prices of the stocks on the next day. A method to solve it is, for instance, LINEAR REGRESSION.
2. The scenario can be classified as a REINFORCEMENT LEARNING problem, specifically CONTROL.
The state is the price of the stocks on the current day together with the remaining investment budget, the action is the amount of money to be invested in each stock, and the reward is the profit, i.e., the sum over all the stocks of the invested amount times the difference in the price of the stock between two consecutive days. A method to solve it (assuming states and actions are discretized) is Q-LEARNING.

Exercise 7 (4 marks)
Consider a Multi-Armed Bandit (MAB) algorithm selecting among K = 2 actions for n = 10 rounds. The reward distributions for arms a1 and a2 are Bernoullis with parameters p1 = 0.6 and p2 = 0.4, respectively. The realization of arm pulls and rewards up to round 8 is the following (in each round, only the reward of the selected arm is observed by the learner):

    Round:         1   2   3   4   5   6   7   8   9   10
    Reward a1:     0   1   1   1   1   0   1   0   ?   ?
    Reward a2:     1   0   0   1   0   1   0   1   ?   ?
    Selected arm:  a1  a2  a2  a2  a1  a2  a1  a1  ?   ?

1. What is the (random) regret of the algorithm at the end of round 8? And the expected pseudo-regret?
2. Suppose the algorithm was Thompson sampling, initialized with uniform priors. What are the posterior distributions for the two arms at the end of round 8? What can we say about the probability of Thompson sampling selecting arm a1 at round 9? You can assume ties are broken in favor of arm a1.
3. Instead, assume the algorithm was UCB1, and it also played a1 at round 9, observing a reward of 1. What arm would it play next, at round 10? Hint: √(2 ln(10)/4) ≈ 1.07.

Motivate all answers and report all necessary computations.

Solution.
1. The optimal arm is a1, so the random regret is

    Σ_{t=1}^{8} (R_t(a1) − R_t(A_t)) = (0−0) + (1−0) + (1−0) + (1−1) + (1−1) + (0−1) + (1−1) + (0−0) = 1,

while the pseudo-regret is N_8(a2)·(E[R(a1)] − E[R(a2)]) = 4·(0.6 − 0.4) = 0.8.
2. Since we observed two successes and two failures for both arms, the posterior is Beta(3, 3) for both a1 and a2.
TS will then draw two independent samples θ1, θ2 from these posteriors and pull arm a1 if θ1 ≥ θ2. By symmetry, Pr(θ1 ≥ θ2) = Pr(θ2 > θ1), and since the two probabilities sum to one, they must both equal 0.5. So TS will pull arm a1 with probability 0.5.
3. For arm a1 we observed a total reward of 3 over 5 pulls, so its UCB index is

    3/5 + √(2 ln(10)/5) < 1.56.

For arm a2 we observed a total reward of 2 over 4 pulls, so its UCB index is

    2/4 + √(2 ln(10)/4) > 1.57.

UCB1 will play a2, given its larger index.

Exercise 8 (4 marks)
Suppose you train a logistic regression classifier for binary classification with a training set composed of N = 1000 samples, using 29 features. The training classification error is 0.1, while the test classification error is 0.4. You know that, in the true data process, the positive and negative classes are generated with equal probability. Answer the following questions, reporting all the steps and calculations:
1. Looking at the training classification error only, what is the maximum probability 1 − δ that the trained classifier is better than a random-guess classifier?
2. Looking at the test classification error only, what is the minimum number of samples in the test set to ensure that the trained classifier is better than a random-guess classifier, with probability at least 1 − δ = 1 − e^(−8)?

Hint. The VC dimension of a linear classifier in d dimensions is d + 1. Recall the bounds (J denotes the number of test samples, L̃ the empirical test error, L̂ the empirical training error):

    L ≤ L̃ + √( log(1/δ) / (2J) )
    L ≤ L̂ + √( (VC·log(2eN/VC) + log(4/δ)) / N )        (1)

If needed, 30·log(200e/3) ≈ 156.

Solution.
Since, in the true data process, the positive and negative classes are generated with equal probability, the random-guess classifier has a classification error equal to 0.5.
1. First of all, we recall that the VC dimension of a logistic regression classifier (which is a linear classifier) with 29 features is 29 + 1 = 30.
We apply the VC-dimension bound to compute an upper bound on the true classification error. Then, we enforce that the upper bound is smaller than 0.5, solving for the probability δ:

    0.1 + √( (VC·log(2eN/VC) + log(4/δ)) / N ) < 0.5
    ⇒ 0.1 + √( (156 + log(4/δ)) / 1000 ) < 0.5
    ⇒ 156 + log(4/δ) < 160
    ⇒ log(4/δ) < 4
    ⇒ δ > 4e^(−4).

Thus, the maximum probability is 1 − δ = 1 − 4e^(−4) ≈ 0.93.
2. We apply the test-set bound with L̃ = 0.4 and log(1/δ) = 8, again enforcing that the upper bound is smaller than 0.5:

    0.4 + √( 8 / (2J) ) < 0.5  ⇒  4/J < 0.01  ⇒  J > 400.

Thus, we need at least J = 401 test samples.

End of exam