Decision Support System DSS - Lec 2 Data Mining
Data Mining
Assiut University
Decision Support Systems
Data Mining
Why Data Mining?
The explosive growth of data: from terabytes to petabytes
Data collection and data availability
Automated data collection tools, database systems, Web, computerized society
Major sources of abundant data
Business: Web, e-commerce, transactions, stocks, ...
Science: remote sensing, bioinformatics, scientific simulation, ...
Society and everyone: news, digital cameras, YouTube
We are drowning in data, but starving for knowledge!
“Necessity is the mother of invention”: data mining, the automated analysis of massive data sets
Evolution of Database Technology
1960s: Data collection, database creation, IMS and network DBMS
1970s: Relational data model, relational DBMS implementation
1980s: RDBMS, advanced data models (extended-relational, OO, deductive, etc.); application-oriented DBMS (spatial, scientific, engineering, etc.)
1990s: Data mining, data warehousing, multimedia databases, and Web databases
2000s: Stream data management and mining; data mining and its applications; Web technology (XML, data integration) and global information systems
Types of Data Sets
####### Record
Relational records
Data matrix, e.g., numerical matrix, crosstabs
Document data: text documents represented as term-frequency vectors
Transaction data
####### Graph and network
World Wide Web
Social or information networks
Molecular Structures
####### Ordered
Video data: sequence of images
Temporal data: time-series
Sequential Data: transaction sequences
Genetic sequence data
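As a small illustration of two of the record types listed above, the sketch below builds term-frequency vectors for document data and stores transaction data as item sets. The vocabulary, documents, and transactions are invented for illustration, and the whitespace tokenizer is a simplifying assumption, not a prescribed method.

```python
from collections import Counter

def term_frequency_vector(text, vocabulary):
    """Count how often each vocabulary term occurs in a text document."""
    counts = Counter(text.lower().split())  # naive whitespace tokenization (assumption)
    return [counts[term] for term in vocabulary]

# Hypothetical vocabulary and documents, invented for illustration.
vocabulary = ["data", "mining", "web", "knowledge"]
documents = [
    "Data mining turns data into knowledge",
    "Web mining applies data mining to the Web",
]

# Document data: each document becomes one row of a numerical data matrix.
data_matrix = [term_frequency_vector(doc, vocabulary) for doc in documents]

# Transaction data: each transaction ID maps to the set of items it contains.
transactions = {
    1: {"bread", "milk"},
    2: {"bread", "eggs", "milk"},
}

for row in data_matrix:
    print(row)
```

Each row of `data_matrix` has one entry per vocabulary term, so documents of different lengths become vectors of the same size.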
Data Objects
Data sets are made up of data objects.
A data object represents an entity.
Examples:
sales database: customers, store items, sales
medical database: patients, treatments
university database: students, professors, courses
Also called samples, examples, instances, data points, objects, tuples.
Data objects are described by attributes.
Database rows -> data objects; columns -> attributes.
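The mapping from rows to data objects and from columns to attributes can be sketched as follows. The table name, column names, and values are hypothetical, chosen only to mirror the university database example.

```python
# Hypothetical "students" table: column names and rows are invented for illustration.
columns = ["student_id", "name", "major"]
rows = [
    (1, "Amira", "Computer Science"),
    (2, "Omar", "Statistics"),
]

# Each database row becomes one data object (here, a dict keyed by attribute name).
data_objects = [dict(zip(columns, row)) for row in rows]

# The columns are the attributes that describe every data object.
attributes = list(data_objects[0].keys())

print(data_objects[0])
print(attributes)
```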
Attributes
Attribute (or dimension, feature, variable): a data field representing a characteristic or feature of a data object.
E.g., customer_ID, name, address
Types:
Nominal
Binary
Numeric: quantitative
Interval-scaled
Ratio-scaled
Numeric Attribute Types
Quantity (integer or real-valued)
Interval
Measured on a scale of equal-sized units
Values have order
E.g., temperature in °C or °F, calendar dates
No true zero-point
Ratio
Inherent zero-point
We can speak of a value as being a multiple of another value (10 K is twice as high as 5 K)
E.g., temperature in Kelvin, length, counts, monetary quantities
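The difference between interval and ratio scales can be checked with a short calculation. The sketch below uses the standard Celsius-to-Kelvin offset of 273.15; the two temperatures are arbitrary example values.

```python
def celsius_to_kelvin(celsius):
    """Convert an interval-scaled Celsius value to the ratio-scaled Kelvin scale."""
    return celsius + 273.15

a_c, b_c = 10.0, 5.0  # arbitrary example temperatures in degrees Celsius

# Differences are meaningful on both scales: the gap is the same size either way.
difference_c = a_c - b_c
difference_k = celsius_to_kelvin(a_c) - celsius_to_kelvin(b_c)

# Ratios are only meaningful where there is a true zero-point.
naive_ratio = a_c / b_c                                        # 2.0, but 10 °C is not "twice as hot" as 5 °C
true_ratio = celsius_to_kelvin(a_c) / celsius_to_kelvin(b_c)   # about 1.018

print(difference_c, round(difference_k, 2))
print(naive_ratio, round(true_ratio, 3))
```

The Celsius ratio of 2.0 is an artifact of where the scale places its zero; on the Kelvin scale the same two temperatures differ by less than 2 percent.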
What Is Data Mining?
Data mining (knowledge discovery from data)
Extraction of interesting (non-trivial, implicit, previously unknown, and potentially useful) patterns or knowledge from huge amounts of data
Data mining: a misnomer?
Alternative names
Knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, business intelligence, etc.
Watch out: Is everything “data mining”? The following are not:
Simple search and query processing
(Deductive) expert systems
Data selection for data mining
Data mining
Presentation of the mining results
Patterns and knowledge to be used or stored into knowledge-base
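The four steps listed above (selection, mining, presentation, storage) can be read as a pipeline. The sketch below is only one possible reading: the records, function names, and the knowledge-base dictionary are hypothetical, and simple item counting stands in for a real mining algorithm.

```python
from collections import Counter

# Hypothetical transaction records, invented for illustration.
records = [
    {"store": "A", "items": ["bread", "milk"]},
    {"store": "A", "items": ["bread", "eggs"]},
    {"store": "B", "items": ["milk"]},
]

def select_data(records, store):
    """Data selection: keep only the transactions relevant to the task."""
    return [r["items"] for r in records if r["store"] == store]

def mine(transactions, min_count):
    """Data mining (stand-in): find items occurring in at least min_count transactions."""
    counts = Counter(item for t in transactions for item in set(t))
    return {item: n for item, n in counts.items() if n >= min_count}

def present(patterns):
    """Presentation: turn mined patterns into readable lines."""
    return [f"{item}: appears in {n} transactions" for item, n in sorted(patterns.items())]

knowledge_base = {}  # hypothetical store for patterns that will be reused later

selected = select_data(records, "A")
patterns = mine(selected, min_count=2)
report = present(patterns)
knowledge_base["store_A_frequent_items"] = patterns

print(report)
```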
Data Mining in Business Intelligence
KDD Process: A Typical View from ML and Statistics
This is a view from typical machine learning and statistics communities
Multi-Dimensional View of Data Mining
Data to be mined
Database data (extended-relational, object-oriented, heterogeneous, legacy), data warehouse, transactional data, stream, spatiotemporal, time-series, sequence, text and Web, multimedia, graphs, and social and information networks
Knowledge to be mined (or: data mining functions)
Characterization, discrimination, association, classification, clustering, outlier analysis, etc.
Descriptive vs. predictive data mining
Multiple/integrated functions and mining at multiple levels
Techniques utilized
Data-intensive, data warehouse, machine learning, statistics, pattern recognition, visualization, high-performance, etc.
Applications adapted
Retail, telecommunication, banking, fraud analysis, bio-data mining, stock market analysis, text mining, Web mining, etc.
Data Mining: On What Kinds of Data?
Database-oriented data sets and applications
Relational database, data warehouse, transactional database
Advanced data sets and advanced applications
Data streams and sensor data
Time-series data, temporal data, sequence data (incl. bio-sequences)
Structured data, graphs, social networks and multi-linked data
Object-relational databases
Heterogeneous databases and legacy databases
Spatial data and spatiotemporal data
Multimedia database
Text databases
The World-Wide Web