Data Mining - MCQ Questions 2

Question Bank for Data Mining Course
Course: Data Mining

Academic year: 2023/2024

-"Efficiency and scalability of data mining algorithms" issues come under? A. Mining Methodology and User Interaction Issues B. Performance Issues C. Diverse Data Types Issues D. None of the above

-Data used to build a data mining model. a. validation data b. training data c. test data d. hidden data

-In smoothing by bin means, each value in a bin is replaced by the mean value of the bin. a. True b. False
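
The smoothing technique named in the question above can be sketched as follows. This is a minimal illustration, assuming equal-frequency (equal-depth) bins and a value count that divides evenly by the number of bins; the function name `smooth_by_bin_means` and the sample data are hypothetical.

```python
def smooth_by_bin_means(values, n_bins):
    """Sort the values, split them into equal-frequency bins,
    and replace every value with the mean of its bin."""
    ordered = sorted(values)
    # Assumption for this sketch: the values split evenly across the bins.
    assert len(ordered) % n_bins == 0, "value count must be divisible by n_bins"
    size = len(ordered) // n_bins
    smoothed = []
    for start in range(0, len(ordered), size):
        bin_values = ordered[start:start + size]
        mean = sum(bin_values) / len(bin_values)
        smoothed.extend([mean] * len(bin_values))
    return smoothed


prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]  # illustrative data
print(smooth_by_bin_means(prices, 3))
# [9.0, 9.0, 9.0, 22.0, 22.0, 22.0, 29.0, 29.0, 29.0]
```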

-The basic algorithm for decision tree induction is a greedy algorithm. a. True b. False

-Noise is a random error or variance in measured variables. a. True b. False

  • Consider discretizing a continuous attribute whose values are listed below: 3, 4, 5, 10, 21, 32, 43, 44, 46, 52, 59, 67 Using equal-width partitioning and four bins, how many values are there in the first bin (the bin with small values)? A. 1 B. 2 C. 3 D. 4 not sure yet
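
One way to check the bin counts for the question above is to work through the partitioning directly. The sketch below assumes left-closed intervals of width (max - min) / number of bins, with the maximum value placed in the last bin; other boundary conventions exist, and the function name `equal_width_bins` is hypothetical.

```python
def equal_width_bins(values, n_bins):
    """Partition values into n_bins intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    bins = [[] for _ in range(n_bins)]
    for v in values:
        # The maximum value falls exactly on the upper edge,
        # so clamp its index into the last bin.
        index = min(int((v - lo) / width), n_bins - 1)
        bins[index].append(v)
    return bins


values = [3, 4, 5, 10, 21, 32, 43, 44, 46, 52, 59, 67]
bins = equal_width_bins(values, 4)  # width = (67 - 3) / 4 = 16
for b in bins:
    print(b)
# [3, 4, 5, 10]
# [21, 32]
# [43, 44, 46]
# [52, 59, 67]
```

With these values the bin edges are 3, 19, 35, 51 and 67, and none of the interior edges coincides with a data point, so the counts do not depend on which side of each interval is closed.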

-Supervised learning differs from unsupervised clustering in that supervised learning requires a. at least one input attribute. b. input attributes to be categorical. c. at least one output attribute. d. output attributes to be categorical.

  • Which statement about outliers is true? a. Outliers should be identified and removed from a dataset. b. Outliers should be part of the training dataset but should not be present in the test data. c. Outliers should be part of the test dataset but should not be present in the training data. d. The nature of the problem determines how outliers are used.

  • Data mining can be used to improve ___________. a) Efficiency b) Quality of data c) Marketing d) All the above

  • Which of the following is true for Classification? a) A subdivision of set b) A measure of the accuracy c) The task of assigning a classification d) All of these

  • Which one of the following is an alternative search strategy for mining multiple-level associations with reduced support? a) Level-by-level independent b) Level-cross filtering by a single item c) Level-cross filtering by k-itemset d) All the above

  • In association rule mining, the generation of the frequent itemsets is the computationally intensive step. a. True b. False

  • What is true about data mining? A. Data Mining is defined as the procedure of extracting information from huge sets of data B. Data mining also involves other processes such as Data Cleaning, Data Integration, Data Transformation C. Data mining is the procedure of mining knowledge from data. D. All of the above

  • A type of data that represents the numeric values of specific variables, for example age, number of children, etc. a. Ratio data b. Numeric data “not sure yet” c. Nominal data d. Interval data

  • Which of the following statements about Naive Bayes is incorrect? A. Attributes are equally important. B. Attributes are statistically dependent of one another given the class value. C. Attributes are statistically independent of one another given the class value. D. All of the above

  • The information gain measure is not biased towards attributes with a large number of values. a. True “not sure” b. False
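
A small experiment can help with the question above. The sketch below computes information gain for two attributes on a toy dataset: one with a distinct value per row (like an ID column) and one with two values. The data and the names `entropy` and `information_gain` are illustrative assumptions, not taken from the question bank.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(attribute, labels):
    """Entropy of the labels minus the weighted entropy after splitting on the attribute."""
    total = len(labels)
    remainder = 0.0
    for value in set(attribute):
        subset = [lab for a, lab in zip(attribute, labels) if a == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


labels = ["yes", "yes", "no", "no", "yes", "no"]
row_id = [1, 2, 3, 4, 5, 6]             # many values: one per row
windy = ["t", "f", "t", "f", "t", "f"]  # two values

print(information_gain(row_id, labels))  # every subset is pure, so the gain is maximal
print(information_gain(windy, labels))
```

On this toy data the ID-like attribute receives the highest possible gain even though it has no predictive value, which is the behaviour that the gain ratio measure was introduced to counteract.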

  • Prediction can be viewed as the construction and use of a model to assess the class of an unlabeled sample, or to assess the value or value ranges of an attribute that a given sample is likely to have. a. True b. False

  • This step of the KDD process model deals with noisy data. a. Creating a target dataset b. data preprocessing c. data transformation d. data mining

  • Relevance analysis (feature selection) removes the irrelevant or redundant attributes. a. True b. False

  • The correlation coefficient for two real-valued attributes is – 0. What does this value tell you? a. The attributes are not linearly related. b. As the value of one attribute increases the value of the second attribute also increases. c. As the value of one attribute decreases the value of the second attribute increases. d. The attributes show a curvilinear relationship.

  • The FP-Tree Growth algorithm can be implemented in two phases. a. True b. False

  • The problem of finding hidden structure in unlabeled data is called A. Supervised learning B. Unsupervised learning C. Reinforcement learning

  • Classification accuracy is A. A subdivision of a set of examples into a number of classes “not sure” B. A measure of the accuracy of the classification of a concept that is given by a certain theory C. The task of assigning a classification to a set of examples D. None of these

  • Data Mining System Classification consists of A. Database Technology B. Machine Learning C. computer Vision D. All of the above

  • Given a rule of the form IF X THEN Y, rule confidence is defined as the conditional probability that a. Y is true when X is known to be true. b. X is true when Y is known to be true. c. Y is false when X is known to be false. d. X is false when Y is known to be false.

  • Association rule support is defined as a. the percentage of instances that contain the antecedent conditional items listed in the association rule. b. the percentage of instances that contain the consequent conditions listed in the association rule. c. the percentage of instances that contain all items listed in the association rule. d. the percentage of instances in the database that contain at least one of the antecedent conditional items listed in the association rule.
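
The two rule measures asked about above, confidence and support, can be computed on a toy transaction set. This is a minimal sketch using the standard definitions (support as the fraction of transactions containing every item in the rule, confidence as the conditional probability of the consequent given the antecedent); the transactions and function names are illustrative assumptions.

```python
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """P(consequent | antecedent) = support(X and Y together) / support(X)."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)


# Rule: IF bread THEN milk
print(support({"bread", "milk"}, transactions))       # 0.5
print(confidence({"bread"}, {"milk"}, transactions))  # 2 of the 3 bread transactions also contain milk
```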

  • The most commonly used algorithm to discover association rules by recursively identifying frequent item sets a. Apriori algorithm b. Ordinal data c. Nominal data d. Categorical data
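
For reference, the level-wise search used by Apriori can be sketched in a few lines. This is a simplified illustration rather than an optimized implementation: it assumes transactions are Python sets and a minimum support given as an absolute count, and the function name `apriori` and the sample data are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {frequent itemset: support count} using level-wise candidate generation."""
    items = sorted({item for t in transactions for item in t})
    level = [frozenset([item]) for item in items]  # candidate 1-itemsets
    frequent = {}
    k = 1
    while level:
        # Count each candidate and keep those meeting the minimum support.
        counts = {c: sum(c <= t for t in transactions) for c in level}
        survivors = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(survivors)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in survivors for b in survivors if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        level = [
            c for c in candidates
            if all(frozenset(s) in survivors for s in combinations(c, k - 1))
        ]
    return frequent


baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
for itemset, count in apriori(baskets, min_count=2).items():
    print(sorted(itemset), count)
```

The repeated counting pass over all transactions at each level is why frequent itemset generation dominates the cost of association rule mining.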

  • In decision tree Induction, tree is constructed in a top-down recursive divide-and-conquer manner a. True b. False

-The first steps involved in the knowledge discovery is?

a. True b. False

  • An itemset X is closed if X is frequent and there exists super-pattern Y ⊃ X, with the same support as X a. True b. False
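
To evaluate the statement above, it helps to compare it with the standard textbook definition: an itemset X is closed when no proper super-itemset of X has the same support as X. The sketch below checks that condition on toy data; the function names and transactions are illustrative assumptions, and the frequency threshold is left out for brevity.

```python
def support_count(itemset, transactions):
    """Number of transactions that contain every item in the itemset."""
    return sum(itemset <= t for t in transactions)


def is_closed(itemset, transactions):
    """True when no proper superset has the same support count as the itemset."""
    base = support_count(itemset, transactions)
    other_items = set().union(*transactions) - itemset
    # Checking one-item extensions is sufficient: support never increases as
    # items are added, so any larger superset with equal support implies a
    # one-item extension with equal support.
    return all(support_count(itemset | {item}, transactions) < base for item in other_items)


transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "b", "d"}, {"c"}]

print(is_closed({"a"}, transactions))       # False: {a, b} has the same support (3)
print(is_closed({"a", "b"}, transactions))  # True: every superset has lower support
```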

  • The term data mining was originally used to ______. a. include most forms of data analysis in order to increase sales b. describe the process through which previously unknown patterns in data were discovered c. describe the analysis of huge datasets stored in data warehouses d. All of the above

  • This approach is best when we are interested in finding all possible interactions among a set of attributes. a. decision tree b. association rules c. K-Means algorithm d. genetic learning

  • Which of the following is not a data mining functionality?

A) Characterization and Discrimination B) Classification and regression C) Selection and interpretation D) Clustering and Analysis

  • Naïve Bayesian prediction requires each conditional prob. be zero. a. True “not sure” b. False
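
As background for the question above: naive Bayes scores a class by multiplying conditional probabilities, so a single zero estimate makes the whole product zero. Laplace (add-one) smoothing is a common remedy. The sketch below is a minimal illustration; the counts and the function name `conditional_probability` are assumptions chosen for the example.

```python
def conditional_probability(count, class_total, n_values, laplace=0):
    """Estimate P(attribute = value | class), optionally with Laplace smoothing.

    count       -- tuples in the class that have this attribute value
    class_total -- tuples in the class
    n_values    -- number of distinct values the attribute can take
    laplace     -- pseudo-count added to every value (0 disables smoothing)
    """
    return (count + laplace) / (class_total + laplace * n_values)


# Illustrative counts: 1000 tuples in a class, income = low (0), medium (990), high (10).
unsmoothed = conditional_probability(0, 1000, 3)
smoothed = conditional_probability(0, 1000, 3, laplace=1)

other_factor = 0.4  # hypothetical probability for some other attribute
print(unsmoothed * other_factor)  # 0.0: the zero estimate cancels all other evidence
print(smoothed * other_factor)    # small but non-zero
```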

University: Assiut University
