Welcome

Our website is made possible by displaying online advertisements to our visitors.
Please disable your ad blocker to continue.

Computer Engineering - Machine Learning

Full exam

DIPARTIMENTO DI ELETTRONICA, INFORMAZIONE E BIOINGEGNERIA Politecnico di Milano Machine Learning (Code: 097683)January 10, 2023 Surname: Name: Student ID: Row: Column:Time:2hoursMaximumMarks:33 The following exam is composed of8 exercises(one per page). The rst page needs to be lled with yourname, surname and student ID. The following pages should be used only in the large squarespresent on each page. Any solution provided either outside these spaces orwithout a motivationwill not be considered for the nal mark. During this exam you arenot allowed to use electronic devices, such as laptops, smartphones, tablets and/or similar. As well, you are not allowed to bring with you any kind of note, book, written scheme, and/or similar. You are also not allowed to communicate with other students during the exam. The rst reported violation of the above-mentioned rules will be annotated on the exam and will be considered for the nal mark decision. The second reported violation of the above-mentioned rules will imply the immediate expulsion of the student from the exam room and theannulment of the exam. You are allowed to write the exam either with a pen (black or blue) or a pencil. It is your responsibility to provide a readable solution. We will not be held accountable for accidental partial or total cancellation of the exam. The exam can be written either inEnglishorItalian. You are allowed to withdraw from the exam at any time without any penalty. You are allowed to leave the room not early than half the time of the duration of the exam. You are not allowed to keep the text of the exam with you while leaving the room. Three of the points will be given on the basis on how quick you are in solving the exam.If you nish earlier than 30 min before the end of the exam you will get 3 points, if you nish earlier than 20 min you will get 2 points and if you nish earlier than 10 min you will get 1 point (the points cannot be accumulated). The box on Page 10 can only be used to complete the Exercises 7, and/or 8.Ex. 1Ex. 2Ex. 3Ex. 4Ex. 5Ex. 6Ex. 7Ex. 8TimeTot. / 7/ 7/ 2/ 2/ 2/ 2/ 4/ 4/ 3/ 33 Student's name: Please go on to the next page. . . Computer Science Computer Science | Machine Learning Page 2 of 10 Exercise 1 (7marks) Describe and compare Q-learning and SARSA.Student's name: Please go on to the next page. . . Computer Science Computer Science | Machine Learning Page 3 of 10 Exercise 2 (7marks) What is theVC dimensionof a hypothesis space? What can it be used for?Student's name: Please go on to the next page. . . Computer Science Computer Science | Machine Learning Page 4 of 10 Exercise 3 (2marks) Consider the following snippet of code: 1 c o e f = [ ] 2 mse = [ ] 3f o ra l p h ai n[ 0 . 0 0 1 , 0 . 0 1 , 0 . 1 , 0 . 2 , 0 . 5 ] : 4 m o d e l = l i n e a rm o d e l . L a s s o ( a l p h a=a l p h a ) 5 m o d e l . f i t ( x , y ) 6 mse . a p p e n d ( m e a ns q u a r e de r r o r ( y , m o d e l . p r e d i c t ( x ) ) ) 7 c o e f . a p p e n d ( m o d e l . c o e f) 8 i = np . a r g m i n ( mse ) 9p r i n t( i , mse [ i ] , c o e f [ i ] ) 1.Describe the procedure provided by the snippet above. Is it correct? If not, propose acorrection. 2.What is the purpose of the procedure above? Can you tell which property have the coecientscoefcomputed in the snippet above? 3.Can you list at least two other approaches that can be used for purposes similar to the oneprovided by the procedure above? Motivate your answers. 1.The above procedure is evaluating the regularization parameter for a LASSO model for theproblem of regression. It ts models with dierent values ofand computes the training RSS to decide which one is the most suitable for the given dataset. Clearly this procedure would give as a result that the model with the smallest parameter= 0:001 is the one to chose, since we are validating our possible models on the training set. A correct procedure would have to validate on independent data (for instance, using validation or crossvalidation). 2.The purpose of the procedure is to introduce regularization into a regression model to avoidovertting. The LASSO procedure produces a nal parameter vector which is sparse. Indeed in Line 5 the resulting printed parameter would have less and less non-zero elements as the value of() increases. 3.To avoid overtting one may apply other regularization techniques, like Ridge or modelselection techniques, like the backward/forward feature selection approach. Dierently from the LASSO we would have that the nal solution is not necessarily sparse with Ridge, while it might be with the wrapper approaches.Student's name: Please go on to the next page. . . Computer Science Computer Science | Machine Learning Page 5 of 10 Exercise 4 (2marks) Tell if the following statement about the Principal Component Analysis (PCA) procedure are true or false. Motivate your answers. 1.The set of the Principal Components vectors are providing an orthonormal base for theoriginal feature space. 2.Using as features for regression/classication problems the pro jection of the original featuresinto the principal components provided by the PCA reduces the phenomenon of overtting. 3.The percentage of the variance explained by a Principal Component is inversely proportionalto the value of the corresponding eigenvalue. 4.The procedure to apply PCA to a dataset is deterministic. 1.TRUE: The procedure for the PCA looks for the direction in the dataset which is providingthe most variance and extracts the (rst) Principal Component (PC) as the unit vector identifying that direction. Iteratively, it checks the direction, orthogonal to the previous one, with the most variance and extracts another (second) PC. The process iterates over a number of PC equal to the number of dimensions of the dataset. This produces an orthogonal base to the initial dataset. 2.FALSE: only selecting the rstKPC one has the chance to remove the noise from the data and avoid overtting. If we are keeping all the PC, we would have a linear transformation of the original dataset, which is likely to behave as good as the original one for the supervised task. 3.FALSE: The variance explained by each PC is directly proportional to the correspondingeigenvalue. Indeed, its formula is iP i i, where iis the eigenvalues corresponding to the i-th PC. 4.TRUE: The procedure includes a mean-scaling procedure and the inversion of the covariancematrix, which are both deterministic procedures.Student's name: Please go on to the next page. . . Computer Science Computer Science | Machine Learning Page 6 of 10 Exercise 5 (2marks) Tell if the following statement about the Gaussian Processes (GP) are true or false. Motivate your answers. 1.The main dierence between the Nadaraya-Watson Model and the GP is that the probabilisticmodel is dened over the joint input/output for the former model and only on the output for the latter model. 2.The GP model is a linear model w.r.t. the optimal parameter vector learned using a maximumlikelihood estimation procedure. 3.Assuming to have the inverse of the Gram matrix, the computational complexity of providinga new prediction is quadratic in the number of samples present in the training set. 4.The prediction provided by a GP with a linear kernel, i.e.,k(x; y) =x> y, corresponds to the one provided by a specic linear regression model. 1.TRUE: The Nadaraya-Watson Model imposes a prior onp(x; y), wherexare the input and yis the output, while GPs are only taking into account a probabilistic model on the target y. They both derive the value of the mean prediction starting from such priors. 2.FALSE: GPs are nonparametric models, and therefore it is not always possible (see answerto 5:4) to dene a parameter vector summarizing the GP model. Instead, the model is given by the training dataset itself. 3.TRUE: The inversion of the matrix would have costedO(n3 ), wherenis the number of samples present in the dataset. Instead, the formula to compute the mean prediction includes a vector/matrix multiplication and a vector/vector multiplication, which areO(n2 ) andO(n), respectively. 4.TRUE: The nal formula would bem(x) =x> X C t=x> , whereXis the matrix of the training inputs. Note that the value ofis entirely dependent on the dataset.Student's name: Please go on to the next page. . . Computer Science Computer Science | Machine Learning Page 7 of 10 Exercise 6 (2marks) You are an ML specialist working for an online shipper that accepts orders and delivers them relying on several trucks. You are facing the following problems: 1.Estimating the shipping time, based on the order characteristics (e.g., arrival time, weight,dimensions, pick up location, destination location). 2.Identifying which are the characteristics of the order that are irrelevant for accomplishingthe task described in the previous point. 3.Deciding the price to charge the client in order to maximize the long-term prot, accountingalso for the customer delization phenomenon. Model these three problems as ML problems, specifying the data (input, output, state, action, reward), the problem class, and a specic method to solve them. 1.This is a SUPERVISED LEARNING problem, specically, MULTIVARIATEREGRESSION. The input variables are the order characteristics (e.g., arrival time, weight, dimensions, pick up location, destination location) and the output variable is the shipping time (continuous variable). A possible method for solving it is linear regression. 2.This is a FEATURE SELECTION problem. The input variables are the order characteristics(e.g., arrival time, weight, dimensions, pick up location, destination location). The output variable might be missing in the case of unsupervised feature selection or corresponding to the shipping time for supervised feature selection. A possible method for solving it is a wrapper approach that uses linear regression ad inner algorithm and forward feature selection. 3.This is a REINFORCEMENT LEARNING problem. The state is represented by thehistorical information about customers and orders (from which deriving information about the delization) as well as the characteristics of the current order. The action is the price to be assigned to the current order. The reward is the prot (revenue - costs) related to the current order. It would have been inappropriate to consider the problem a MULTI- ARMED BANDIT because of the delization phenomenon that is such that our actions (prices) determine a modication of the environment state. A possible method for solving it is Q-learning.Student's name: Please go on to the next page. . . Computer Science Computer Science | Machine Learning Page 8 of 10 Exercise 7 (4marks) Consider the linear two-class SVM classier dened by the parametersw= [5;3] andb=3 and trained on alinearly separabledataset. Answer the following questions, providing adequate motivation. 1.How is the pointx 1= [2 ;3] classied according to the trained SVM? 2.Suppose that the true label ofx 1is t 1= +1. Is it possible that x 1was contained in the training set? What about the caset 1= 1? 3.Consider the set of points of the formx= [a;2]. For which values ofathe pointxlies on the margin? 1.Let us compute the expressionwT x1+ b= 52 + 3(3)3 =2