Machine Learning Notes 1,2,3,4,5
Electronic and communication (ECE)
Visvesvaraya Technological University
||Jai Sri Gurudev|| Sri Adichunchanagiri Shikshana Trust(R)
SJB INSTITUTE OF TECHNOLOGY
BGS HEALTH AND EDUCATION CITY Kengeri, Bengaluru-
Department of Electronics and Communication Engineering
B., VIII Semester
[As per Choice Based Credit System (CBCS) scheme]
Machine Learning
(17EC834)
ACADEMIC YEAR – 2020 -
Faculty Name Dr. Supreeth H S G
Designation Associate Professor
Course Machine Learning
Course Code 17EC
Semester VIII
1 Dept. of ECE, SJBIT, Bengaluru
MODULE 1
INTRODUCTION
Ever since computers were invented, we have wondered whether they might be made to learn. If we could understand how to program them to learn, that is, to improve automatically with experience, the impact would be dramatic. Imagine:
- Computers learning from medical records which treatments are most effective for new diseases.
- Houses learning from experience to optimize energy costs based on the particular usage patterns of their occupants.
- Personal software assistants learning the evolving interests of their users in order to highlight especially relevant stories from the online morning newspaper.
A successful understanding of how to make computers learn would open up many new uses of computers and new levels of competence and customization.
Some successful applications of machine learning
- Learning to recognize spoken words
- Learning to drive an autonomous vehicle
- Learning to classify new astronomical structures
- Learning to play world-class backgammon
Why is Machine Learning Important?
- Some tasks cannot be defined well except by examples (e.g., recognizing people).
- Relationships and correlations can be hidden within large amounts of data. Machine learning and data mining may be able to find these relationships.
- Human designers often produce machines that do not work as well as desired in the environments in which they are used.
- The amount of knowledge available about certain tasks might be too large for explicit encoding by humans (e.g., medical diagnosis).
- Environments change over time.
- New knowledge about tasks is constantly being discovered by humans. It may be difficult to continuously redesign systems "by hand".
DESIGNING A LEARNING SYSTEM
The basic design issues and approaches to machine learning are illustrated by designing a program to learn to play checkers, with the goal of entering it in the world checkers tournament.
1. Choosing the Training Experience
2. Choosing the Target Function
3. Choosing a Representation for the Target Function
4. Choosing a Function Approximation Algorithm
   1. Estimating training values
   2. Adjusting the weights
5. The Final Design
1. Choosing the Training Experience
The first design choice is to choose the type of training experience from which the system will learn. The type of training experience available can have a significant impact on the success or failure of the learner.
There are three attributes of the training experience that affect the success or failure of the learner:
- Whether the training experience provides direct or indirect feedback regarding the choices made by the performance system.
For example, in the checkers game:
Direct training examples consist of individual checkers board states and the correct move for each.
Indirect training examples consist of the move sequences and final outcomes of various games played. In this case, the correctness of specific moves early in the game must be inferred indirectly from the fact that the game was eventually won or lost.
With indirect feedback the learner faces an additional problem of credit assignment, that is, determining the degree to which each move in the sequence deserves credit or blame for the final outcome. Credit assignment can be a particularly difficult problem because the game can be lost even when early moves are optimal, if these are followed later by poor moves. Hence, learning from direct training feedback is typically easier than learning from indirect feedback.
- The degree to which the learner controls the sequence of training examples.
For example, in the checkers game:
The learner might rely on the teacher to select informative board states and to provide the correct move for each.
Alternatively, the learner might itself propose board states that it finds particularly confusing and ask the teacher for the correct move.
Or the learner may have complete control over both the board states and the (indirect) training classifications, as it does when it learns by playing against itself with no teacher present.
- How well the training experience represents the distribution of examples over which the final system performance P must be measured.
For example, in the checkers game, the performance metric P is the percent of games the system wins in the world tournament.
If its training experience E consists only of games played against itself, there is a danger that this training experience might not be fully representative of the distribution of situations over which it will later be tested. In practice, it is often necessary to learn from a distribution of examples that is somewhat different from those on which the final system will be evaluated.
2. Choosing the Target Function
The next design choice is to determine exactly what type of knowledge will be learned and how this will be used by the performance program.
Let's consider a checkers-playing program that can generate the legal moves from any board state. The program needs only to learn how to choose the best move from among these legal moves. Since we must learn to choose among the legal moves, the most obvious choice for the type of information to be learned is a program, or function, that chooses the best move for any given board state.
- Let ChooseMove be the target function, with the notation
ChooseMove : B → M, which indicates that this function accepts as input any board from the set of legal board states B and produces as output some move from the set of legal moves M.
Where,
- w0 through w6 are numerical coefficients, or weights, to be chosen by the learning algorithm.
- Learned values for the weights w1 through w6 will determine the relative importance of the various board features in determining the value of the board.
- The weight w0 will provide an additive constant to the board value.
4. Choosing a Function Approximation Algorithm
In order to learn the target function f we require a set of training examples, each describing a specific board state b and the training value Vtrain(b) for b.
Each training example is an ordered pair of the form (b, Vtrain(b)).
For instance, the following training example describes a board state b in which black has won the game (note x2 = 0 indicates that red has no remaining pieces) and for which the target function value Vtrain(b) is therefore +100.
((x1 = 3, x2 = 0, x3 = 1, x4 = 0, x5 = 0, x6 = 0), +100)
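A training pair like this can be compared against the learner's current estimate. The sketch below assumes the board value is the linear combination V̂(b) = w0 + w1·x1 + ... + w6·x6 implied by the weights w0 through w6 described above; the function name `v_hat` and the starting weights are hypothetical and chosen only for illustration.

```python
from typing import Sequence

def v_hat(weights: Sequence[float], features: Sequence[float]) -> float:
    """Linear board evaluation: w0 + w1*x1 + ... + wn*xn (assumed form)."""
    w0, rest = weights[0], weights[1:]
    if len(rest) != len(features):
        raise ValueError("expected one weight per feature plus the constant w0")
    return w0 + sum(w * x for w, x in zip(rest, features))

# The training example from the text: features x1..x6 and Vtrain(b) = +100.
features = (3, 0, 1, 0, 0, 0)
v_train = 100.0

# Hypothetical weights w0..w6, not taken from the notes.
weights = [0.0, 1.0, -1.0, 2.0, -2.0, 0.5, -0.5]

estimate = v_hat(weights, features)  # 0 + 1*3 + 2*1 = 5.0
error = v_train - estimate           # 95.0: the estimate is far too low
print(estimate, error)
```

The gap between Vtrain(b) and the estimate is the quantity the weight-adjustment step later tries to shrink.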
Function Approximation Procedure
1. Derive training examples from the indirect training experience available to the learner.
2. Adjust the weights wi to best fit these training examples.
Estimating training values
A simple approach for estimating training values for intermediate board states is to assign the training value Vtrain(b) for any intermediate board state b to be V̂(Successor(b)).
Where,
- V̂ is the learner's current approximation to V.
- Successor(b) denotes the next board state following b for which it is again the program's turn to move.
Rule for estimating training values
Vtrain(b) ← V̂(Successor(b))
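The rule can be applied to a whole game trace. This is a minimal sketch: it assumes the trace lists only the boards at which it was the program's turn to move, that boards are given as feature tuples, and that the final board receives the known game outcome. The helper names, weights, and the three-board trace are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Features = Sequence[float]

def training_values(
    trace: List[Features],
    final_value: float,
    v_hat: Callable[[Features], float],
) -> List[Tuple[Features, float]]:
    """Pair each board in a game trace with its estimated training value."""
    examples = []
    for i, board in enumerate(trace):
        if i + 1 < len(trace):
            # Intermediate board: Vtrain(b) <- V_hat(Successor(b))
            target = v_hat(trace[i + 1])
        else:
            # Final board: the game outcome is known directly
            target = final_value
        examples.append((board, target))
    return examples

# Hypothetical current weights w0..w6 and the linear evaluation they define.
weights = [0.0, 1.0, -1.0, 0.0, 0.0, 0.0, 0.0]

def current_v_hat(x: Features) -> float:
    return weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))

# Hypothetical trace of a game that black wins (final value +100).
trace = [(3, 2, 0, 0, 0, 0), (3, 1, 1, 0, 0, 0), (3, 0, 1, 0, 0, 0)]
examples = training_values(trace, 100.0, current_v_hat)
for board, target in examples:
    print(board, target)
```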
Adjusting the weights
Specify the learning algorithm for choosing the weights wi to best fit the set of training examples {(b, Vtrain(b))}. A first step is to define what we mean by the best fit to the training data. One common approach is to define the best hypothesis, or set of weights, as that which minimizes the squared error E between the training values and the values predicted by the hypothesis:
E ≡ Σ (Vtrain(b) - V̂(b))², summed over all training examples (b, Vtrain(b))
Several algorithms are known for finding weights of a linear function that minimize E. One such algorithm is called the least mean squares, or LMS, training rule. For each observed training example it adjusts the weights a small amount in the direction that reduces the error on this training example.
LMS weight update rule:
For each training example (b, Vtrain(b)):
- Use the current weights to calculate V̂(b)
- For each weight wi, update it as
wi ← wi + ƞ (Vtrain(b) - V̂(b)) xi
Here ƞ is a small constant (e.g., 0.1) that moderates the size of the weight update.
Working of weight update rule
- When the error (Vtrain(b) - V̂(b)) is zero, no weights are changed.
- When (Vtrain(b) - V̂(b)) is positive (i.e., when V̂(b) is too low), each weight is increased in proportion to the value of its corresponding feature. This raises the value of V̂(b), reducing the error.
- If the value of some feature xi is zero, then its weight is not altered regardless of the error, so that the only weights updated are those whose features actually occur on the training example board.
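A single LMS step might look like the sketch below. It assumes a linear V̂ and treats the constant weight w0 as paired with a fixed feature x0 = 1, which is a common convention but an assumption here; the function name `lms_update`, the all-zero starting weights, and the step size 0.1 are illustrative.

```python
from typing import List, Sequence

def lms_update(
    weights: Sequence[float],
    features: Sequence[float],
    v_train: float,
    eta: float = 0.1,
) -> List[float]:
    """One LMS step: w_i <- w_i + eta * (Vtrain(b) - V_hat(b)) * x_i."""
    x = [1.0, *features]  # assumed x0 = 1 so that w0 acts as the constant term
    v_hat = sum(w * xi for w, xi in zip(weights, x))
    error = v_train - v_hat
    return [w + eta * error * xi for w, xi in zip(weights, x)]

def predict(weights: Sequence[float], features: Sequence[float]) -> float:
    return weights[0] + sum(w * xi for w, xi in zip(weights[1:], features))

features = (3, 0, 1, 0, 0, 0)  # the example board from the text
v_train = 100.0
old_weights = [0.0] * 7        # hypothetical starting point
new_weights = lms_update(old_weights, features, v_train)

print(abs(v_train - predict(old_weights, features)))  # error before the step
print(abs(v_train - predict(new_weights, features)))  # smaller error after it
```

Weights whose features are zero on this board (w2, w4, w5, w6) are left unchanged, which matches the last point in the list above.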
PERSPECTIVES AND ISSUES IN MACHINE LEARNING
Issues in Machine Learning
The field of machine learning, and much of this book, is concerned with answering questions such as the following:
- What algorithms exist for learning general target functions from specific training examples? In what settings will particular algorithms converge to the desired function, given sufficient training data? Which algorithms perform best for which types of problems and representations?
- How much training data is sufficient? What general bounds can be found to relate the confidence in learned hypotheses to the amount of training experience and the character of the learner's hypothesis space?
- When and how can prior knowledge held by the learner guide the process of generalizing from examples? Can prior knowledge be helpful even when it is only approximately correct?
- What is the best strategy for choosing a useful next training experience, and how does the choice of this strategy alter the complexity of the learning problem?
- What is the best way to reduce the learning task to one or more function approximation problems? Put another way, what specific functions should the system attempt to learn? Can this process itself be automated?
- How can the learner automatically alter its representation to improve its ability to represent and learn the target function?
For each attribute, the hypothesis will either
- Indicate by a "?" that any value is acceptable for this attribute,
- Specify a single required value (e.g., Warm) for the attribute, or
- Indicate by a "Φ" that no value is acceptable.
If some instance x satisfies all the constraints of hypothesis h, then h classifies x as a positive example (h(x) = 1).
The hypothesis that PERSON enjoys his favorite sport only on cold days with high humidity is represented by the expression (?, Cold, High, ?, ?, ?)
The most general hypothesis, that every day is a positive example, is represented by (?, ?, ?, ?, ?, ?)
The most specific possible hypothesis, that no day is a positive example, is represented by (Φ, Φ, Φ, Φ, Φ, Φ)
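The representation above maps naturally onto tuples. The following sketch assumes instances are tuples ordered as (Sky, AirTemp, Humidity, Wind, Water, Forecast); the function name `satisfies` and the sample day are illustrative.

```python
ANY = "?"    # any value is acceptable
NONE = "Φ"   # no value is acceptable

def satisfies(h, x):
    """Return True when instance x meets every constraint of h, i.e. h(x) = 1."""
    return all(c == ANY or (c != NONE and c == v) for c, v in zip(h, x))

cold_humid = (ANY, "Cold", "High", ANY, ANY, ANY)
most_general = (ANY,) * 6
most_specific = (NONE,) * 6

# A hypothetical day: Sky, AirTemp, Humidity, Wind, Water, Forecast
day = ("Sunny", "Cold", "High", "Strong", "Warm", "Same")

print(satisfies(cold_humid, day))     # True: cold and humid
print(satisfies(most_general, day))   # True: every day is positive
print(satisfies(most_specific, day))  # False: no day is positive
```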
Notation
The set of items over which the concept is defined is called the set of instances, which is denoted by X.
Example: X is the set of all possible days, each represented by the attributes Sky, AirTemp, Humidity, Wind, Water, and Forecast.
The concept or function to be learned is called the target concept, which is denoted by c. c can be any Boolean-valued function defined over the instances X:
c: X → {0, 1}
Example: The target concept corresponds to the value of the attribute EnjoySport (i.e., c(x) = 1 if EnjoySport = Yes, and c(x) = 0 if EnjoySport = No).
- Instances for which c(x) = 1 are called positive examples, or members of the target concept.
- Instances for which c(x) = 0 are called negative examples, or non-members of the target concept.
- The ordered pair (x, c(x)) describes the training example consisting of the instance x and its target concept value c(x).
- D denotes the set of available training examples.
- H denotes the set of all possible hypotheses that the learner may consider regarding the identity of the target concept. Each hypothesis h in H represents a Boolean-valued function defined over X:
h: X → {0, 1}
The goal of the learner is to find a hypothesis h such that h(x) = c(x) for all x in X.
Given:
- Instances X: Possible days, each described by the attributes
  - Sky (with possible values Sunny, Cloudy, and Rainy),
  - AirTemp (with values Warm and Cold),
  - Humidity (with values Normal and High),
  - Wind (with values Strong and Weak),
  - Water (with values Warm and Cool),
  - Forecast (with values Same and Change).
- Hypotheses H: Each hypothesis is described by a conjunction of constraints on the attributes Sky, AirTemp, Humidity, Wind, Water, and Forecast. The constraints may be "?" (any value is acceptable), "Φ" (no value is acceptable), or a specific value.
- Target concept c: EnjoySport : X → {0, 1}
- Training examples D: Positive and negative examples of the target function
Determine:
- A hypothesis h in H such that h(x) = c(x) for all x in X.
Table: The EnjoySport concept learning task.
The inductive learning hypothesis
Any hypothesis found to approximate the target function well over a sufficiently large set of training examples will also approximate the target function well over other unobserved examples.
- In the figure, the box on the left represents the set X of all instances, and the box on the right represents the set H of all hypotheses.
- Each hypothesis corresponds to some subset of X: the subset of instances that it classifies positive.
- The arrows connecting hypotheses represent the more-general-than relation, with the arrow pointing toward the less general hypothesis.
- Note that the subset of instances characterized by h2 subsumes the subset characterized by h1; hence h2 is more-general-than h1.
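For conjunctive hypotheses of the kind used in EnjoySport, the generality ordering can be checked attribute by attribute. This is a minimal sketch: the two hypotheses are illustrative stand-ins for h1 and h2 (the figure itself is not reproduced here), and the function names are hypothetical.

```python
ANY = "?"    # any value is acceptable
NONE = "Φ"   # no value is acceptable

def more_general_or_equal(h1, h2):
    """True when every instance that satisfies h2 also satisfies h1."""
    if NONE in h2:
        return True  # h2 accepts no instance, so any h1 covers it
    return all(a == ANY or a == b for a, b in zip(h1, h2))

def strictly_more_general(h1, h2):
    return more_general_or_equal(h1, h2) and not more_general_or_equal(h2, h1)

# Illustrative hypotheses over (Sky, AirTemp, Humidity, Wind, Water, Forecast)
h1 = ("Sunny", ANY, ANY, "Strong", ANY, ANY)
h2 = ("Sunny", ANY, ANY, ANY, ANY, ANY)

print(strictly_more_general(h2, h1))  # True: h2 drops the Wind constraint
print(strictly_more_general(h1, h2))  # False
```

The relation is only a partial order: two hypotheses that constrain different attributes, such as (Sunny, ?, ?, ?, ?, ?) and (?, Warm, ?, ?, ?, ?), are not comparable in either direction.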
FIND-S: FINDING A MAXIMALLY SPECIFIC HYPOTHESIS
FIND-S Algorithm
1. Initialize h to the most specific hypothesis in H
2. For each positive training instance x
     For each attribute constraint ai in h
       If the constraint ai is satisfied by x
         Then do nothing
         Else replace ai in h by the next more general constraint that is satisfied by x
3. Output hypothesis h
To illustrate this algorithm, assume the learner is given the following sequence of training examples from the EnjoySport task:
Example  Sky    AirTemp  Humidity  Wind    Water  Forecast  EnjoySport
1        Sunny  Warm     Normal    Strong  Warm   Same      Yes
2        Sunny  Warm     High      Strong  Warm   Same      Yes
3        Rainy  Cold     High      Strong  Warm   Change    No
4        Sunny  Warm     High      Strong  Cool   Change    Yes
The first step of FIND-S is to initialize h to the most specific hypothesis in H:
h0 = <Φ, Φ, Φ, Φ, Φ, Φ>
Consider the first training example:
x1 = <Sunny, Warm, Normal, Strong, Warm, Same>, +
Observing the first training example, it is clear that hypothesis h is too specific. None of the "Φ" constraints in h are satisfied by this example, so each is replaced by the next more general constraint that fits the example:
h1 = <Sunny, Warm, Normal, Strong, Warm, Same>
Consider the second training example:
x2 = <Sunny, Warm, High, Strong, Warm, Same>, +
The second training example forces the algorithm to further generalize h, this time substituting a "?" in place of any attribute value in h that is not satisfied by the new example:
h2 = <Sunny, Warm, ?, Strong, Warm, Same>
Consider the third training example:
x3 = <Rainy, Cold, High, Strong, Warm, Change>, -
Upon encountering the third training example, the algorithm makes no change to h. The FIND-S algorithm simply ignores every negative example:
h3 = <Sunny, Warm, ?, Strong, Warm, Same>
Consider the fourth training example:
x4 = <Sunny, Warm, High, Strong, Cool, Change>, +
The fourth example leads to a further generalization of h:
h4 = <Sunny, Warm, ?, Strong, ?, ?>
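The trace above can be reproduced with a short program. This is a sketch for conjunctive hypotheses only, where the "next more general constraint" is either the observed value (replacing "Φ") or "?" (replacing a conflicting value); the function name `find_s` and the tuple encoding are assumptions.

```python
ANY = "?"    # any value is acceptable
NONE = "Φ"   # no value is acceptable

def find_s(examples, n_attributes):
    """Return the maximally specific conjunctive hypothesis covering the positives."""
    h = [NONE] * n_attributes
    for x, is_positive in examples:
        if not is_positive:
            continue  # FIND-S ignores negative examples
        for i, value in enumerate(x):
            if h[i] == NONE:
                h[i] = value  # first positive example: adopt its value
            elif h[i] != ANY and h[i] != value:
                h[i] = ANY    # conflicting values: generalize to "?"
    return tuple(h)

# The four EnjoySport training examples from the table (True = Yes, False = No).
D = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), True),
]

h = find_s(D, 6)
print(h)  # ('Sunny', 'Warm', '?', 'Strong', '?', '?')
```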
VERSION SPACES AND THE CANDIDATE-ELIMINATION ALGORITHM
The key idea in the CANDIDATE-ELIMINATION algorithm is to output a description of the set of all hypotheses consistent with the training examples
Representation
Definition (consistent): A hypothesis h is consistent with a set of training examples D if and only if h(x) = c(x) for each example (x, c(x)) in D.
Consistent(h, D) ≡ (∀(x, c(x)) ∈ D) h(x) = c(x)
Note the difference between the definitions of consistent and satisfies:
- An example x is said to satisfy hypothesis h when h(x) = 1, regardless of whether x is a positive or negative example of the target concept.
- An example x is said to be consistent with hypothesis h iff h(x) = c(x).
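The distinction between satisfies and consistent is easy to see in code. This sketch reuses the tuple encoding of hypotheses and the four EnjoySport training examples; the function names are illustrative.

```python
ANY = "?"    # any value is acceptable
NONE = "Φ"   # no value is acceptable

def satisfies(h, x):
    """h(x) = 1: x meets every constraint of h, whatever its label."""
    return all(c == ANY or (c != NONE and c == v) for c, v in zip(h, x))

def consistent(h, examples):
    """Consistent(h, D): h(x) = c(x) for every (x, c(x)) in D."""
    return all(satisfies(h, x) == label for x, label in examples)

D = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), True),
]

most_general = (ANY,) * 6
x3, c_x3 = D[2]

print(satisfies(most_general, x3))   # True: x3 satisfies the hypothesis
print(consistent(most_general, D))   # False: but h(x3) = 1 while c(x3) = 0
print(consistent(("Sunny", "Warm", ANY, "Strong", ANY, ANY), D))  # True
```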
Definition (version space): The version space, denoted VS_{H,D}, with respect to hypothesis space H and training examples D, is the subset of hypotheses from H consistent with the training examples in D.
VS_{H,D} ≡ {h ∈ H | Consistent(h, D)}
The LIST-THEN-ELIMINATE algorithm
The LIST-THEN-ELIMINATE algorithm first initializes the version space to contain all hypotheses in H and then eliminates any hypothesis found inconsistent with any training example.
1. VersionSpace ← a list containing every hypothesis in H
2. For each training example (x, c(x)), remove from VersionSpace any hypothesis h for which h(x) ≠ c(x)
3. Output the list of hypotheses in VersionSpace
The LIST-THEN-ELIMINATE Algorithm
LIST-THEN-ELIMINATE works in principle, so long as the version space is finite. However, since it requires exhaustive enumeration of all hypotheses, in practice it is not feasible.
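EnjoySport is small enough that the exhaustive enumeration can actually be run. The sketch below builds H from the attribute values in the task description, counting the all-"Φ" hypothesis once (an assumption: any hypothesis containing "Φ" classifies every instance as negative, so the variants are equivalent). Function and variable names are illustrative.

```python
from itertools import product

ANY = "?"    # any value is acceptable
NONE = "Φ"   # no value is acceptable

# Attribute values from the EnjoySport task description.
DOMAINS = [
    ("Sunny", "Cloudy", "Rainy"),  # Sky
    ("Warm", "Cold"),              # AirTemp
    ("Normal", "High"),            # Humidity
    ("Strong", "Weak"),            # Wind
    ("Warm", "Cool"),              # Water
    ("Same", "Change"),            # Forecast
]

def satisfies(h, x):
    return all(c == ANY or (c != NONE and c == v) for c, v in zip(h, x))

def all_hypotheses(domains):
    """Enumerate H: the empty hypothesis plus every value-or-'?' combination."""
    yield (NONE,) * len(domains)
    yield from product(*[values + (ANY,) for values in domains])

def list_then_eliminate(domains, examples):
    version_space = list(all_hypotheses(domains))
    for x, label in examples:
        # Drop every hypothesis h with h(x) != c(x)
        version_space = [h for h in version_space if satisfies(h, x) == label]
    return version_space

D = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), True),
]

hypothesis_count = len(list(all_hypotheses(DOMAINS)))
version_space = list_then_eliminate(DOMAINS, D)
print(hypothesis_count)  # 973 = 1 + 4*3*3*3*3*3
for h in version_space:
    print(h)
```

Even here the algorithm touches all 973 hypotheses for every training example; the count grows multiplicatively with each added attribute, which is why the enumeration stops being practical for larger hypothesis spaces.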
A More Compact Representation for Version Spaces
The version space is represented by its most general and least general members. These members form general and specific boundary sets that delimit the version space within the partially ordered hypothesis space.
Definition: The general boundary G, with respect to hypothesis space H and training data D, is the set of maximally general members of H consistent with D.
G ≡ {g ∈ H | Consistent(g, D) ∧ (¬∃g' ∈ H)[(g' >_g g) ∧ Consistent(g', D)]}
Definition: The specific boundary S, with respect to hypothesis space H and training data D, is the set of minimally general (i.e., maximally specific) members of H consistent with D.
S ≡ {s ∈ H | Consistent(s, D) ∧ (¬∃s' ∈ H)[(s >_g s') ∧ Consistent(s', D)]}
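Given an explicit version space, the two boundary sets can be picked out directly from these definitions. The sketch below starts from the six hypotheses of H that are consistent with the four EnjoySport training examples; that list was worked out from the table rather than quoted from the notes, and the function names are illustrative.

```python
ANY = "?"    # any value is acceptable
NONE = "Φ"   # no value is acceptable

def more_general_or_equal(h1, h2):
    """True when every instance that satisfies h2 also satisfies h1."""
    if NONE in h2:
        return True
    return all(a == ANY or a == b for a, b in zip(h1, h2))

def strictly_more_general(h1, h2):
    return more_general_or_equal(h1, h2) and not more_general_or_equal(h2, h1)

def general_boundary(version_space):
    """Members with no strictly more general consistent hypothesis."""
    return [g for g in version_space
            if not any(strictly_more_general(o, g) for o in version_space)]

def specific_boundary(version_space):
    """Members with no strictly more specific consistent hypothesis."""
    return [s for s in version_space
            if not any(strictly_more_general(s, o) for o in version_space)]

# Hypotheses consistent with the four EnjoySport examples (derived, see lead-in).
VS = [
    ("Sunny", "Warm", ANY, "Strong", ANY, ANY),
    ("Sunny", ANY, ANY, "Strong", ANY, ANY),
    ("Sunny", "Warm", ANY, ANY, ANY, ANY),
    (ANY, "Warm", ANY, "Strong", ANY, ANY),
    ("Sunny", ANY, ANY, ANY, ANY, ANY),
    (ANY, "Warm", ANY, ANY, ANY, ANY),
]

G = general_boundary(VS)
S = specific_boundary(VS)
print("G =", G)
print("S =", S)
```

Because the version space already contains every consistent member of H, searching for a more general (or more specific) consistent hypothesis inside it is equivalent to searching all of H.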
Theorem: Version space representation theorem
Let X be an arbitrary set of instances and let H be a set of Boolean-valued hypotheses defined over X. Let c: X → {0, 1} be an arbitrary target concept defined over X, and let D be an arbitrary set of training examples {(x, c(x))}. For all X, H, c, and D such that S and G are well defined,
VS_{H,D} = {h ∈ H | (∃s ∈ S)(∃g ∈ G)(g ≥_g h ≥_g s)}
To prove:
1. Every h satisfying the right-hand side of the above expression is in VS_{H,D}.
2. Every member of VS_{H,D} satisfies the right-hand side of the expression.
Sketch of proof:
1. Let g, h, s be arbitrary members of G, H, S respectively, with g ≥_g h ≥_g s. By the definition of S, s must be satisfied by all positive examples in D. Because h ≥_g s, h must also be satisfied by all positive examples in D. By the definition of G, g cannot be satisfied by any negative example in D, and because g ≥_g h, h cannot be satisfied by any negative example in D. Because h is satisfied by all positive examples in D and by no negative examples in D, h is consistent with D, and therefore h is a member of VS_{H,D}.
2. This can be proven by assuming some h in VS_{H,D} that does not satisfy the right-hand side of the expression, then showing that this leads to an inconsistency.
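The theorem can be checked numerically on EnjoySport, which illustrates the statement but does not replace the proof. The sketch assumes the S and G boundary sets for the four training examples (derived from the table, not quoted from the notes) and compares the set of hypotheses lying between them with the set found by testing consistency directly. All names are illustrative.

```python
from itertools import product

ANY = "?"    # any value is acceptable
NONE = "Φ"   # no value is acceptable

DOMAINS = [
    ("Sunny", "Cloudy", "Rainy"), ("Warm", "Cold"), ("Normal", "High"),
    ("Strong", "Weak"), ("Warm", "Cool"), ("Same", "Change"),
]
D = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), True),
]

def satisfies(h, x):
    return all(c == ANY or (c != NONE and c == v) for c, v in zip(h, x))

def consistent(h, examples):
    return all(satisfies(h, x) == label for x, label in examples)

def more_general_or_equal(h1, h2):
    if NONE in h2:
        return True
    return all(a == ANY or a == b for a, b in zip(h1, h2))

# Every hypothesis in H, counting the all-"Φ" hypothesis once.
H = [(NONE,) * 6] + list(product(*[v + (ANY,) for v in DOMAINS]))

# Assumed boundary sets for D.
S = [("Sunny", "Warm", ANY, "Strong", ANY, ANY)]
G = [("Sunny", ANY, ANY, ANY, ANY, ANY), (ANY, "Warm", ANY, ANY, ANY, ANY)]

# Right-hand side of the theorem: g >=_g h >=_g s for some s in S, g in G.
between = {
    h for h in H
    if any(more_general_or_equal(h, s) for s in S)
    and any(more_general_or_equal(g, h) for g in G)
}
# Left-hand side: the version space by direct consistency testing.
version_space = {h for h in H if consistent(h, D)}

print(between == version_space)
print(len(version_space))
```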