Top questions on data science

Most Important Data Science Interview Questions Based on Technical Concepts:

  • What are the differences between supervised and unsupervised learning?

  • How is logistic regression done?

  • Explain the steps in making a decision tree.

  • How do you build a random forest model?

  • How can you avoid overfitting your model?

  • Differentiate between univariate, bivariate, and multivariate analysis.

  • What are the feature selection methods used to select the right variables?

  • In a language of your choice, write a program that prints the numbers from 1 to 50.

  • You are given a data set consisting of variables with more than 30 percent missing values. How will you deal with them?

  • What is dimensionality reduction, and what are its benefits?

  • How should you maintain a deployed model?

  • What are recommender systems?

  • How do you find RMSE and MSE in a linear regression model?

  • How can you select k for k-means?

  • What is the significance of p-value?

  • How can outlier values be treated?

  • How can time-series data be declared stationary?

  • 'People who bought this also bought…' recommendations seen on Amazon are a result of which algorithm?

  • You are given a dataset on cancer detection. You have built a classification model and achieved an accuracy of 96 percent. Why shouldn't you be happy with your model performance? What can you do about it?

  • Which machine learning algorithms can be used for imputing missing values of both categorical and continuous variables?

  • We want to predict the probability of death from heart disease based on three risk factors: age, gender, and blood cholesterol level. What is the most appropriate algorithm for this case?

  • After studying the behaviour of a population, you have identified four specific individual types that are valuable to your study. You would like to find all users who are most similar to each individual type. Which algorithm is most appropriate for this study?

  • Your organization has a website where visitors randomly receive one of two coupons. It is also possible that visitors to the website will not receive a coupon. You have been asked to determine if offering a coupon to website visitors has any impact on their purchase decisions. Which analysis method should you use?

Questions Based on Basic Concepts:

  • What are feature vectors?

  • What is root cause analysis?

  • What is logistic regression?

  • Explain cross-validation.

  • What is collaborative filtering?

  • Do gradient descent methods always converge to similar points?

  • What is the goal of A/B Testing?

  • What are the drawbacks of the linear model?

  • What is the law of large numbers?

  • What are confounding variables?

  • What is a star schema?

  • How regularly must an algorithm be updated?

  • What are eigenvalues and eigenvectors?

  • Why is resampling done?

  • What is selection bias?

  • What are the types of biases that can occur during sampling?

  • What is survivorship bias?

  • How do you work towards a random forest?

  • What do you understand by true positive rate and false positive rate?

  • Why is R used in data visualization?

  • What is k-fold cross-validation?

  • What is precision?
