Data mining ch2
Course: Data Mining
University: Assiut University
1. Data in the real world is dirty:
• Incomplete
• Noisy
• Inconsistent
2. Why Is Data Preprocessing Important?
• No quality data, no quality mining results
• Data extraction, cleaning, and transformation make up the majority of the work of
building a data warehouse.
3. Multi-Dimensional Measure of Data Quality:
• Accuracy
• Completeness
• Consistency
• Timeliness
• Believability
• Value added
• Interpretability
• Accessibility
4. Major Tasks in Data Preprocessing:
i. Data cleaning
ii. Data integration
iii. Data transformation
iv. Data reduction
v. Data discretization
5. Mining Data Descriptive Characteristics:
• Motivation: To better understand the data: central tendency, variation and spread.
• Data dispersion characteristics: median, max, min, quantiles, outliers, variance.
• Numerical dimensions: values form sorted intervals, so use boxplot or quantile analysis.
• Dispersion analysis can also be applied to computed measures.
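The dispersion characteristics listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample values are invented, the helper names `five_number_summary` and `iqr_outliers` are hypothetical, and the 1.5 × IQR fence is a common boxplot convention that these notes do not themselves specify.

```python
import statistics


def five_number_summary(values):
    """Return min, Q1, median, Q3, max: the values a boxplot draws."""
    data = sorted(values)
    # method="inclusive" treats the data as the whole population,
    # so the minimum and maximum are the 0% and 100% quantiles.
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {"min": data[0], "q1": q1, "median": q2, "q3": q3, "max": data[-1]}


def iqr_outliers(values, k=1.5):
    """Flag values more than k * IQR outside the quartiles (assumed rule)."""
    s = five_number_summary(values)
    iqr = s["q3"] - s["q1"]
    low, high = s["q1"] - k * iqr, s["q3"] + k * iqr
    return [v for v in values if v < low or v > high]


# Invented sample data, for illustration only.
sample = [30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110]

summary = five_number_summary(sample)
outliers = iqr_outliers(sample)
variance = statistics.pvariance(sample)  # population variance

print(summary)
print("outliers:", outliers)
print("variance:", variance)
```

Other quantile conventions (for example the default `method="exclusive"`) give slightly different quartiles, so the flagged outliers can differ on small samples.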
6. Measuring the Central Tendency:
• Mean
• Median: Middle value of the sorted data if the number of values is odd; otherwise
the average of the two middle values
• Mode: Value that occurs most frequently in the data
Three types: Unimodal, bimodal, trimodal.
• Empirical formula (moderately skewed, unimodal data): mean - mode ≈ 3 × (mean - median)
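The measures above can be computed with Python's standard `statistics` module. The following is a minimal sketch, not a definitive implementation: the sample values are invented, the helper names `central_tendency`, `modality`, and `empirical_mode` are hypothetical, and the empirical formula is assumed to be the commonly cited relation mean - mode ≈ 3 × (mean - median), which is only a rough guide for unimodal, moderately skewed data.

```python
import statistics
from collections import Counter


def central_tendency(values):
    """Return the mean, median, and list of modes of the data."""
    return {
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "modes": statistics.multimode(values),
    }


def modality(values):
    """Name the distribution by how many modes it has."""
    counts = Counter(values)
    if max(counts.values()) == 1:
        return "no mode"  # every value occurs exactly once
    names = {1: "unimodal", 2: "bimodal", 3: "trimodal"}
    return names.get(len(statistics.multimode(values)), "multimodal")


def empirical_mode(mean, median):
    """Estimate the mode from mean - mode ≈ 3 * (mean - median) (assumed relation)."""
    return mean - 3 * (mean - median)


# Invented sample data, for illustration only.
sample = [30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110]

stats = central_tendency(sample)
kind = modality(sample)
estimate = empirical_mode(stats["mean"], stats["median"])

print(stats)
print(kind)
print("empirical mode estimate:", estimate)
```

Because this sample has two modes, the empirical estimate is shown only to illustrate the arithmetic; the relation is not expected to hold for bimodal data.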