
Decision Support System DSS - Lec 2 Data Mining

Course: Data Mining

Academic year: 2023/2024

Decision Support Systems

Data Mining

Why Data Mining?

The explosive growth of data: from terabytes to petabytes

Data collection and data availability: automated data collection tools, database systems, the Web, a computerized society

Major sources of abundant data:

Business: Web, e-commerce, transactions, stocks, ...

Science: remote sensing, bioinformatics, scientific simulation, ...

Society and everyone: news, digital cameras, YouTube

We are drowning in data, but starving for knowledge!

“Necessity is the mother of invention”: data mining, the automated analysis of massive data sets

Evolution of Database Technology

1960s: Data collection, database creation, IMS and network DBMS 1970s: Relational data model, relational DBMS implementation 1980s: RDBMS, advanced data models (extended- relational, OO, deductive, etc.)

Application-oriented DBMS (spatial, scientific, engineering, etc.) 1990s: Data mining, data warehousing, multimedia databases, and Web databases 2000s Stream data management and mining

Data mining and its applications

Web technology (XML, data integration) and global information systems

Types of Data Sets

Record

Relational records

Data matrix, e.g., numerical matrix, crosstabs

Document data: text documents represented as term-frequency vectors

Transaction data

Graph and network

World Wide Web

Social or information networks

Molecular Structures

Ordered

Video data: sequence of images

Temporal data: time-series

Sequential Data: transaction sequences

Genetic sequence data
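The record-style data sets listed above can be sketched in a few lines of Python. This is a minimal illustration on made-up toy data: the documents, transactions, and the helper name `term_frequency_vector` are hypothetical, not from the lecture. It shows a data matrix (one row per object, one column per numeric attribute), term-frequency vectors over a shared vocabulary, and transaction data as sets of items.

```python
from collections import Counter

# Data matrix: each row is a data object, each column a numeric attribute (toy values).
data_matrix = [
    [1.0, 2.5, 0.3],
    [0.7, 1.9, 0.8],
]

def term_frequency_vector(doc, vocabulary):
    """Hypothetical helper: count how often each vocabulary term occurs in a document."""
    counts = Counter(doc.lower().split())
    return [counts[term] for term in vocabulary]

docs = ["data mining finds patterns", "mining data mining data"]
# Shared vocabulary: every distinct term across all documents, sorted for a stable column order.
vocabulary = sorted({word for d in docs for word in d.lower().split()})
tf_matrix = [term_frequency_vector(d, vocabulary) for d in docs]

# Transaction data: each transaction is the set of items bought together.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
]

print(vocabulary)  # ['data', 'finds', 'mining', 'patterns']
print(tf_matrix)   # [[1, 1, 1, 1], [2, 0, 2, 0]]
# Number of transactions that contain both "diapers" and "beer" (subset test per transaction).
print(sum({"diapers", "beer"} <= t for t in transactions))  # 2
```

Each document becomes a row of the same length, so a text collection can be treated as a (usually sparse) data matrix.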

Data Objects

Data sets are made up of data objects.
A data object represents an entity.
Examples:
sales database: customers, store items, sales
medical database: patients, treatments
university database: students, professors, courses
Also called samples, examples, instances, data points, objects, tuples.
Data objects are described by attributes.
Database rows -> data objects; columns -> attributes.
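To make the row/column correspondence concrete, here is a minimal sketch of a hypothetical `students` table from the university example: each tuple is one data object and each field name is one attribute. The table contents and the helper `column` are illustrative assumptions.

```python
from typing import NamedTuple

class Student(NamedTuple):
    """One data object (a database row); each field is an attribute (a column)."""
    student_id: str
    name: str
    major: str
    gpa: float

# A tiny, made-up 'students' table: 3 data objects described by 4 attributes.
students = [
    Student("S1", "Alice", "CS", 3.6),
    Student("S2", "Bassem", "Math", 3.1),
    Student("S3", "Chen", "CS", 3.9),
]

def column(table, attribute):
    """Return every value of one attribute, i.e., one column of the table."""
    return [getattr(row, attribute) for row in table]

print(len(students), len(Student._fields))  # 3 4
print(column(students, "major"))            # ['CS', 'Math', 'CS']
```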

Attributes

Attribute (or dimensions, features, variables): a data field representing a characteristic or feature of a data object.
E.g., customer_ID, name, address
Types:
Nominal
Binary
Numeric: quantitative
Interval-scaled
Ratio-scaled
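One way to see why the attribute type matters is to tag each attribute with its type and ask which operations make sense on its values. This is a sketch under stated assumptions: only customer_ID, name, and address come from the slide, the other attributes are hypothetical, and the operation sets follow the usual convention (nominal and binary values support only equality tests; interval-scaled values add ordering and differences; ratio-scaled values also support ratios).

```python
from enum import Enum

class AttrType(Enum):
    NOMINAL = "nominal"
    BINARY = "binary"             # a nominal attribute with only two states
    INTERVAL = "interval-scaled"  # ordered, equal-sized units, no true zero
    RATIO = "ratio-scaled"        # ordered, equal-sized units, true zero

# Operations conventionally treated as meaningful for each attribute type.
MEANINGFUL_OPS = {
    AttrType.NOMINAL: {"==", "!="},
    AttrType.BINARY: {"==", "!="},
    AttrType.INTERVAL: {"==", "!=", "<", ">", "-"},
    AttrType.RATIO: {"==", "!=", "<", ">", "-", "/"},
}

# Hypothetical customer schema; only customer_ID, name and address come from the slide.
customer_schema = {
    "customer_ID": AttrType.NOMINAL,
    "name": AttrType.NOMINAL,
    "address": AttrType.NOMINAL,
    "is_member": AttrType.BINARY,      # hypothetical yes/no attribute
    "signup_date": AttrType.INTERVAL,  # hypothetical; calendar dates have no true zero
    "total_spent": AttrType.RATIO,     # hypothetical; monetary quantity with a true zero
}

def is_meaningful(attribute: str, op: str) -> bool:
    """Return True if applying `op` to values of `attribute` makes sense for its type."""
    return op in MEANINGFUL_OPS[customer_schema[attribute]]

print(is_meaningful("customer_ID", "<"))  # False: IDs are labels, not quantities
print(is_meaningful("signup_date", "-"))  # True: the gap between two dates is meaningful
print(is_meaningful("signup_date", "/"))  # False: no true zero, so ratios are not
print(is_meaningful("total_spent", "/"))  # True
```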

Numeric Attribute Types

Quantity (integer or real-valued)
Interval
Measured on a scale of equal-sized units
Values have order
E.g., temperature in °C or °F, calendar dates
No true zero-point
Ratio
Inherent zero-point
We can speak of a value as a multiple of another, or as being an order of magnitude larger than the unit of measurement (10 K is twice as high as 5 K).
E.g., temperature in Kelvin, length, counts, monetary quantities
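A quick numeric check shows why ratios are meaningful on a ratio scale but not on an interval scale. The sketch below compares 20 °C and 10 °C directly and after converting to Kelvin (K = °C + 273.15); the specific temperatures are chosen only for illustration.

```python
import math

def celsius_to_kelvin(c: float) -> float:
    """Shift a Celsius (interval-scaled) reading onto the Kelvin (ratio-scaled) scale."""
    return c + 273.15

# On the interval scale the ratio depends on where 0 °C was placed, so it is not meaningful.
naive_ratio = 20.0 / 10.0

# On the ratio scale 0 K is a true zero, so the ratio is meaningful.
kelvin_ratio = celsius_to_kelvin(20.0) / celsius_to_kelvin(10.0)

# Differences agree on both scales (equal-sized units), so they are valid for interval data.
diff_c = 20.0 - 10.0
diff_k = celsius_to_kelvin(20.0) - celsius_to_kelvin(10.0)

print(naive_ratio)                   # 2.0, yet 20 °C is not "twice as hot" as 10 °C
print(round(kelvin_ratio, 4))        # 1.0353
print(math.isclose(diff_c, diff_k))  # True
print(10.0 / 5.0)                    # 2.0: 10 K really is twice 5 K
```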

What Is Data Mining?

Data mining (knowledge discovery from data)

Extraction of interesting (non-trivial, implicit, previously unknown, and potentially useful) patterns or knowledge from huge amounts of data

Data mining: a misnomer?

Alternative names

Knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, business intelligence, etc.

Watch out: is everything “data mining”? No; for example, the following are not:

Simple search and query processing

(Deductive) expert systems

Data selection for data mining
Data mining
Presentation of the mining results
Patterns and knowledge to be used or stored in a knowledge base
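The steps listed above (select the data, mine it, present the results, store the patterns in a knowledge base) can be chained into a toy pipeline. This is a minimal sketch, not the lecture's method: the mining step simply counts item pairs that occur together in at least `min_support` transactions, a stand-in for real association mining, and the `region` filter, the records, and all function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical raw records: (region, items purchased together)
records = [
    ("north", {"bread", "milk"}),
    ("north", {"bread", "milk", "eggs"}),
    ("south", {"beer", "chips"}),
    ("north", {"milk", "eggs"}),
]

def select_data(rows, region):
    """Data selection: keep only the transactions relevant to the analysis task."""
    return [items for r, items in rows if r == region]

def mine_pairs(transactions, min_support):
    """Toy mining step: item pairs that co-occur in at least min_support transactions."""
    counts = Counter(pair for t in transactions for pair in combinations(sorted(t), 2))
    return {pair: n for pair, n in counts.items() if n >= min_support}

def present(patterns):
    """Presentation: one readable line per pattern, most frequent first, ties alphabetical."""
    ranked = sorted(patterns.items(), key=lambda kv: (-kv[1], kv[0]))
    return [f"{a} & {b}: {n}" for (a, b), n in ranked]

knowledge_base = {}  # stored patterns, keyed by a task name (a plain dict in this sketch)

selected = select_data(records, "north")
patterns = mine_pairs(selected, min_support=2)
for line in present(patterns):
    print(line)  # "bread & milk: 2", then "eggs & milk: 2"
knowledge_base["north_pairs"] = patterns
```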

Data Mining in Business Intelligence

KDD Process: A Typical View from ML and Statistics

This is the view typical of the machine learning and statistics communities.

Multi-Dimensional View of Data Mining

Data to be mined Database data (extended-relational, object- oriented, heterogeneous, legacy), data warehouse, transactional data, stream, spatiotemporal, time-series, sequence, text and web, multi-media, graphs & social and information networks Knowledge to be mined (or: Data mining functions) Characterization, discrimination, association, classification, clustering, outlier analysis, etc.

Descriptive vs. predictive data mining

Multiple/integrated functions and mining at multiple levels Techniques utilized Data-intensive, data warehouse, machine learning, statistics, pattern recognition, visualization, high-performance, etc. Applications adapted Retail, telecommunication, banking, fraud analysis, bio-data mining, stock market analysis, text mining, Web mining, etc.
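The descriptive vs. predictive distinction can be illustrated on one tiny data set. A descriptive task summarizes the data already in hand (here, characterizing each class by its mean attribute values); a predictive task uses that data to label a new object (here, a 1-nearest-neighbor classifier, chosen only as a simple example of classification). The records, attributes, and labels are made up.

```python
import math
from statistics import mean

# Hypothetical customers: (age, yearly_spend_in_thousands) -> spending class
data = [
    ((25, 2.0), "low"),
    ((30, 2.5), "low"),
    ((45, 8.0), "high"),
    ((50, 9.5), "high"),
]

def characterize(rows):
    """Descriptive: summarize each class by the mean of each attribute."""
    summary = {}
    for label in sorted({lbl for _, lbl in rows}):
        members = [x for x, lbl in rows if lbl == label]
        summary[label] = tuple(mean(col) for col in zip(*members))
    return summary

def predict(rows, new_point):
    """Predictive: give new_point the label of its nearest (Euclidean) neighbor.
    Attributes are not normalized here, so age dominates the distance."""
    _, label = min(rows, key=lambda row: math.dist(row[0], new_point))
    return label

print(characterize(data))        # {'high': (47.5, 8.75), 'low': (27.5, 2.25)}
print(predict(data, (40, 7.0)))  # high
```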

Data Mining: On What Kinds of Data?

Database-oriented data sets and applications:
Relational database, data warehouse, transactional database

Advanced data sets and advanced applications:
Data streams and sensor data
Time-series data, temporal data, sequence data (incl. bio-sequences)
Structured data, graphs, social networks, and multi-linked data
Object-relational databases
Heterogeneous databases and legacy databases
Spatial data and spatiotemporal data
Multimedia databases
Text databases
The World Wide Web


University: Assiut University
