Question 1 :
Various ____ methods and techniques are used to calculate outliers.
- distance calculation
- prediction
- optimization
- integration
Question 2 :
Which of the following is a disadvantage of decision trees?
- Decision trees require less preprocessing.
- Decision trees are robust to outliers.
- Decision trees are prone to overfitting.
- Decision trees trace all possible alternatives.
Question 3 :
Which of the following techniques would perform worst for reducing dimensions of a data set?
- Removing columns which have high variance in data
- Removing columns which have too many missing values
- Removing columns with redundant data
- Removing columns with similar data trends
Question 4 :
Below are the 8 actual values of the target variable in the training file: [0,0,0,1,1,1,1,1]. What is the entropy of the target variable?
- -(5/8 log(5/8) + 3/8 log(3/8))
- 5/8 log(5/8) + 3/8 log(3/8)
- 3/8 log(5/8) + 5/8 log(3/8)
- 5/8 log(3/8) – 3/8 log(5/8)
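Note: the entropy of a label list can be checked numerically. The snippet below is a minimal sketch, assuming base-2 logarithms (the question does not state the base); the function name `entropy` is illustrative only.

```python
import math
from collections import Counter


def entropy(labels, base=2):
    """Shannon entropy of a list of class labels (log base is an assumption)."""
    counts = Counter(labels)
    n = len(labels)
    # H = -sum(p_i * log(p_i)) over the observed classes
    return -sum((c / n) * math.log(c / n, base) for c in counts.values())


target = [0, 0, 0, 1, 1, 1, 1, 1]
h = entropy(target)
print(round(h, 4))  # about 0.9544 bits for a 3/8 vs 5/8 split
```

A different log base only rescales the value; the structure of the formula stays the same.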
Question 5 :
Predicting whether or not it will rain tomorrow evening at a particular time is a type of _________ problem.
- Clustering
- Regression
- Unsupervised learning
- Supervised learning
Question 6 :
Which of the following problems can be solved by supervised learning? Assume an appropriate dataset is available.
- From a large collection of spam emails, discover if there are subtypes of spam emails.
- Given data on how 1000 medical patients respond to an experimental medicine, discover whether there are different categories of patients in terms of how they respond to it, and if so, what these categories are
- Given a large dataset of medical records from patients suffering from heart disease, try to learn whether there might be different groups of patients for which customised treatment is required
- Given genetic (DNA) data from a person, predict the odds of the person developing diabetes over the next 10 years
Question 7 :
You run gradient descent for 20 iterations with learning rate = 0.2 and compute the cost at each iteration. You observe that the cost decreases after each iteration. Based on this, which conclusion is most suitable?
- 0.2 is an effective choice of learning rate.
- Try larger values of learning rate like 1.
- 0.2 is not an effective choice of learning rate.
- The model is overfitting.
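Note: the experiment described in this question can be reproduced on a toy problem. The sketch below assumes a hypothetical one-parameter cost J(w) = w**2 and a starting point of 5.0, neither of which comes from the question; it only illustrates how to record the cost per iteration and check that it keeps decreasing.

```python
def gradient_descent(learning_rate, iterations=20, w0=5.0):
    """Minimise the toy cost J(w) = w**2 and record the cost after each step."""
    w = w0
    costs = []
    for _ in range(iterations):
        grad = 2.0 * w             # dJ/dw for the assumed cost J(w) = w**2
        w -= learning_rate * grad  # standard gradient descent update
        costs.append(w ** 2)
    return costs


costs = gradient_descent(learning_rate=0.2)
# True when every iteration lowers the cost compared with the previous one
is_decreasing = all(b < a for a, b in zip(costs, costs[1:]))
print(is_decreasing)
```

Behaviour for other learning rates depends on the cost function, so results on this toy cost should not be generalised.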
Question 8 :
Support Vector Machine (SVM) performs well in _____-dimensional spaces.
- high
- low
- wide
- single
Question 9 :
K-fold cross-validation is ____.
- linear in K
- quadratic in K
- cubic in K
- exponential in K
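Note: the cost of K-fold cross-validation is easiest to reason about by counting how many train/validation splits it produces. The sketch below is a plain-Python illustration with a hypothetical helper `k_fold_splits`; it is not taken from any particular library.

```python
def k_fold_splits(n_samples, k):
    """Return k (train_indices, validation_indices) pairs over range(n_samples)."""
    indices = list(range(n_samples))
    # Spread any remainder over the first few folds so sizes differ by at most 1
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    splits = []
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        splits.append((train, validation))
        start += size
    return splits


splits = k_fold_splits(n_samples=10, k=5)
print(len(splits))  # one model fit is needed per split
```

Each sample lands in exactly one validation fold, and one model is trained per split.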
Question 10 :
Why are SVMs more accurate than logistic regression?
- SVM gives more weight to wrongly classified data points.
- SVM gives more weight to data points which are correctly classified.
- SVM uses all the data points assuming a probabilistic model.
- SVM uses the concept of a large-margin separator, and for nonlinearity it uses kernel functions
Question 11 :
What is the approach of the basic algorithm for decision tree induction?
- Greedy
- Bottom up
- Procedural
- Step by Step
Question 12 :
Which of the following is not a supervised learning algorithm?
- PCA
- Decision Tree
- Bayes Theorem
- Linear regression
Question 13 :
When comparing reinforcement learning and supervised learning, which of the following statements is true?
- In both reinforcement and supervised learning, decisions are taken sequentially
- Supervised learning is best suited where human interaction is prevalent, whereas reinforcement learning is best suited for software systems.
- Reinforcement learning works by interacting with the environment, whereas supervised learning works on sample data
- In both reinforcement and supervised learning, decisions taken at one time step are independent of the previous time step.
Question 14 :
For a trained logistic classifier, given a sample x, it gives a prediction of 0.8. This means that ___.
- P(Y=0|x)=0.8
- P(Y=1|x)=0.8
- P(Y=0|x)=0.2
- P(Y=1|x)=0.2
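Note: a logistic classifier passes a linear score through the sigmoid function. The sketch below assumes the common convention that the sigmoid output models the probability of the positive class, and uses a hypothetical score z = ln(4) chosen only so that the output equals 0.8.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real-valued score to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


z = math.log(4.0)              # hypothetical linear score w.x + b
p_positive = sigmoid(z)        # P(Y=1 | x) under the assumed convention
p_negative = 1.0 - p_positive  # the two class probabilities sum to 1
print(round(p_positive, 3), round(p_negative, 3))
```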
Question 15 :
Which algorithm is a state-transition-based algorithm?
- K-Nearest neighbor
- Hidden Markov model
- Bayes theorem
- Linear regression
Question 16 :
Principal component analysis (PCA) is used for ___.
- Dimensionality Enhancement
- LU Decomposition
- QR Decomposition
- Dimensionality Reduction
Question 17 :
What is true about the discount factor in reinforcement learning?
- discount factor should be greater than 1
- discount factor should always be negative
- discount factor should be in the range 0 to 1
- discount factor can be any real number
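Note: the role of the discount factor is easiest to see in the discounted return, G = r_0 + γ r_1 + γ² r_2 + ... The sketch below uses a hypothetical reward sequence and γ = 0.5 purely for illustration.

```python
def discounted_return(rewards, gamma):
    """Sum of rewards where the reward at step t is weighted by gamma**t."""
    total = 0.0
    for t, r in enumerate(rewards):
        total += (gamma ** t) * r
    return total


rewards = [1.0, 1.0, 1.0]  # hypothetical rewards for three time steps
g = discounted_return(rewards, gamma=0.5)
print(g)  # 1 + 0.5 + 0.25 = 1.75
```

With a smaller γ, later rewards contribute less to the total.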
Question 18 :
What are support vectors?
- These are the data points which help the SVM to generate the optimal hyperplane.
- It is an intermediate vector generated during calculation of the optimal hyperplane
- In SVM all the data points are called support vectors.
- These are predefined vectors used in calculating the hyperplane
Question 19 :
The process of obtaining the best result under given constraints is called
- Optimization
- Generalization
- Summation
- Regularization
Question 20 :
A and B are two events. If P(A, B) decreases while P(A) increases, which of the following is true?
- P(A|B) decreases
- P(B|A) decreases
- P(B) decreases
- P(B|A) increases
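Note: the relationship this question relies on is the definition of conditional probability, P(B|A) = P(A, B) / P(A). The sketch below plugs in hypothetical numbers in which the joint probability falls while P(A) rises.

```python
def conditional(p_joint, p_given):
    """P(B|A) computed as P(A, B) / P(A); P(A) must be positive."""
    if p_given <= 0:
        raise ValueError("P(A) must be positive")
    return p_joint / p_given


# Hypothetical values: P(A, B) drops from 0.2 to 0.1 while P(A) rises from 0.4 to 0.5
before = conditional(p_joint=0.2, p_given=0.4)
after = conditional(p_joint=0.1, p_given=0.5)
print(before, after)
```

A smaller numerator over a larger denominator always gives a smaller ratio; nothing comparable can be said about P(B) or P(A|B) without more information.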
Question 21 :
Machine Learning comes under which of the following domains?
- Artificial Intelligence
- Network Security
- Engineering sciences
- System programming
Question 22 :
Which of the following option(s) is/are true? 1. You need to randomize parameters in PCA. 2. You don’t need to randomize parameters in PCA. 3. PCA can be trapped in the local maxima problem. 4. PCA can’t be trapped in the local minima problem.
- 1 and 3
- 1 and 4
- 2 and 3
- 2 and 4
Question 23 :
What is the major component of PCA?
- All the eigenvectors for the projection space
- The average of the eigenvectors for the projection space
- Value of the last among the eigenvectors for the projection space
- Value of the first among the eigenvectors for the projection space
Question 24 :
Which of the following is a clustering algorithm in machine learning?
- Expectation Maximization
- CART
- Gaussian Naïve Bayes
- Apriori
Question 25 :
A machine learning model gives 95% accuracy on an unbalanced dataset. What can be concluded about the classifier?
- Since accuracy is 95%, the classifier will perform well in real-life scenarios
- The classifier will give good accuracy on the validation dataset.
- Unbalanced Dataset will not affect the performance of classifier
- Because of an unbalanced dataset the classifier will predict only one class of samples accurately.
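Note: why accuracy alone can mislead on an unbalanced dataset is easy to show with made-up numbers. The sketch below assumes a hypothetical 95/5 class split and a trivial classifier that always predicts the majority class.

```python
# Hypothetical labels: 95 negatives and 5 positives
y_true = [0] * 95 + [1] * 5
# A trivial classifier that always predicts the majority class
y_pred = [0] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Recall on the minority class: fraction of true positives that were found
true_pos = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
minority_recall = true_pos / sum(t == 1 for t in y_true)

print(accuracy, minority_recall)
```

The accuracy looks high even though no minority sample is detected, which is why per-class metrics such as recall are usually reported alongside it.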
Question 26 :
Choose the correct application of reinforcement learning.
- Aircraft Control
- Sentiment analysis
- House price prediction
- Spam Email Filtering
Question 27 :
Which algorithm is used for performing probabilistic reasoning on temporal data?
- Hill-climbing search
- Hidden Markov model
- Naïve Method
- Support Vector Machine
Question 28 :
You are training an RBF SVM with the following parameters: C (slack penalty) and γ = 1/(2σ²) (where σ² is the variance of the RBF kernel). How should you tweak the parameters to reduce overfitting?
- Increase C and/or reduce γ
- Reduce C and/or increase γ
- Reduce C and/or reduce γ
- Reduce C only (γ has no predictable effect on overfitting)
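Note: the effect of γ can be inspected directly on the RBF kernel, k(x, y) = exp(-γ ||x - y||²). The sketch below is a plain-Python illustration with hypothetical points and γ values; the helper names are not from any library.

```python
import math


def gamma_from_sigma(sigma):
    """gamma = 1 / (2 * sigma**2), as defined in the question."""
    return 1.0 / (2.0 * sigma ** 2)


def rbf_kernel(x, y, gamma):
    """RBF kernel value exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


x, y = (0.0, 0.0), (1.0, 1.0)  # hypothetical pair of points
for gamma in (0.1, 1.0, 10.0):
    # Larger gamma means similarity decays faster with distance
    print(gamma, rbf_kernel(x, y, gamma))
```

A large γ (small σ²) makes each training point influence only a small neighbourhood, which gives a more flexible decision boundary.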
Question 29 :
The ___________ phenomenon means that a model neither fits the training data well nor generalizes properly to new data.
- good fitting
- overfitting
- moderate fitting
- underfitting
Question 30 :
Neural networks:
- Optimize a convex objective function
- Can use a mix of different activation functions
- Are not suitable for learning
- Can only be trained with stochastic gradient descent