
Probability Theory

Sargur N. Srihari

srihari@cedar.buffalo.edu

Probability Theory in Machine Learning

" Probability is key concept is dealing with

uncertainty

3 Arises due to finite size of data sets and noise on
measurements

" Probability Theory

3 Framework for quantification and manipulation of
uncertainty
3 One of the central foundations of machine learning


Probability with Two Variables

• Key concepts:
  – conditional & joint probabilities of variables
• Random Variables: B and F
  – Box B, Fruit F
    • F has two values: orange (o) or apple (a)
    • B has two values: red (r) or blue (b)
  – Red box: 2 apples, 6 oranges
  – Blue box: 3 apples, 1 orange

Priors: Let p(B=r) = 4/10 and p(B=b) = 6/10

Given the above data we are interested in several probabilities: marginal, conditional and joint

Counts of data:

Box\Fruit  orange  apple
red        6       2
blue       1       3

CPD p(F|B) estimated from the data:

Box\Fruit  orange  apple
red        3/4     1/4
blue       1/4     3/4
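As a minimal Python sketch (not part of the original slides), the CPD table above can be derived from the raw fruit counts; the dictionary layout is an illustrative choice:

```python
# Derive the conditional distribution p(F|B) from the stated counts
# (red box: 6 oranges, 2 apples; blue box: 1 orange, 3 apples).
counts = {"red": {"orange": 6, "apple": 2},
          "blue": {"orange": 1, "apple": 3}}

cpd = {box: {fruit: n / sum(fruits.values()) for fruit, n in fruits.items()}
       for box, fruits in counts.items()}
print(cpd)  # {'red': {'orange': 0.75, 'apple': 0.25}, 'blue': {'orange': 0.25, 'apple': 0.75}}
```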

Probabilities of Interest

• Marginal probability
  – What is the probability of an apple? p(F=a)
  – Note that we have to consider p(B)
• Conditional probability
  – Given that we have an orange, what is the probability that we chose the blue box? p(B=b|F=o)
• Joint probability
  – What is the probability of orange AND blue box? p(B=b, F=o)

(Red box: 2 apples, 6 oranges; blue box: 3 apples, 1 orange.
Priors: p(B=r) = 4/10 and p(B=b) = 6/10.)
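A short sketch, assuming only the priors and CPD above, that computes all three quantities; the helper name `joint` is illustrative:

```python
# Marginal, conditional, and joint probabilities for the fruit example,
# computed from the priors p(B) and the CPD p(F|B) given above.
prior = {"r": 4/10, "b": 6/10}
cpd = {"r": {"o": 3/4, "a": 1/4}, "b": {"o": 1/4, "a": 3/4}}  # p(F|B)

def joint(b, f):                 # product rule: p(B=b, F=f) = p(F=f|B=b) p(B=b)
    return cpd[b][f] * prior[b]

p_apple = sum(joint(b, "a") for b in prior)           # marginal: 11/20 = 0.55
p_blue_and_orange = joint("b", "o")                   # joint:     3/20 = 0.15
p_blue_given_orange = p_blue_and_orange / sum(joint(b, "o") for b in prior)  # 1/3
print(p_apple, p_blue_and_orange, p_blue_given_orange)
```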

Product Rule of Probability Theory

• Consider only those instances for which X = x_i
• Then the fraction of those instances for which Y = y_j is written p(Y=y_j | X=x_i)
• This is called the conditional probability
• Relationship between joint and conditional probability, where n_ij is the number of instances having X=x_i and Y=y_j, c_i = Σ_j n_ij, and N is the total number of instances:

  p(Y=y_j | X=x_i) = n_ij / c_i

  p(X=x_i, Y=y_j) = n_ij / N = (n_ij / c_i)(c_i / N) = p(Y=y_j | X=x_i) p(X=x_i)
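A quick numerical check of the product rule (not from the slides); the count matrix here is an arbitrary toy assumption:

```python
# Check the product rule p(X=x_i, Y=y_j) = p(Y=y_j|X=x_i) p(X=x_i) on a
# toy count matrix n_ij (rows index values of X, columns index values of Y).
n = [[3, 1],
     [2, 4]]
N = sum(sum(row) for row in n)

for row in n:
    c_i = sum(row)                                   # points with X = x_i
    for n_ij in row:
        joint = n_ij / N
        assert abs(joint - (n_ij / c_i) * (c_i / N)) < 1e-12
print("product rule holds on the toy counts")
```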

Bayes' Theorem

• From the product rule, together with the symmetry property p(X, Y) = p(Y, X), we get

  p(Y|X) = p(X|Y) p(Y) / p(X)

  which is called Bayes' theorem

• Using the sum rule, the denominator is expressed as

  p(X) = Σ_Y p(X|Y) p(Y)

  It is the normalization constant that ensures the conditional probability on the LHS sums to 1 over all values of Y.
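A sketch of Bayes' theorem as a function; `bayes_posterior` is a hypothetical helper name, and the example numbers are the fruit-problem likelihood and priors:

```python
# Generic Bayes' theorem: posterior p(Y|X=x) from likelihood p(X=x|Y) and
# prior p(Y), both given as dicts keyed by the values of Y.
def bayes_posterior(likelihood, prior):
    unnorm = {y: likelihood[y] * prior[y] for y in prior}
    p_x = sum(unnorm.values())       # sum rule gives the normalizer p(X=x)
    return {y: v / p_x for y, v in unnorm.items()}

print(bayes_posterior({"r": 3/4, "b": 1/4}, {"r": 0.4, "b": 0.6}))
# {'r': 0.666..., 'b': 0.333...}
```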

Ex: Joint Distribution over two Variables

N = 60 data points drawn from a joint distribution over two variables, where X takes nine possible values and Y takes two. [Figure: histograms showing the fraction of data points having each value of Y, each value of X, and each value of X given one value of Y.]

The fractions would equal the probabilities as N → ∞.
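Since the figure's actual joint distribution is not recoverable here, the sketch below draws 60 points from an assumed toy joint purely to illustrate how histogram fractions estimate probabilities:

```python
import random
random.seed(0)

# Draw N = 60 points from an arbitrary toy joint over X in {1..9} and
# Y in {1, 2} (an illustrative assumption, not the figure's distribution),
# then form the histogram fractions described above.
N = 60
points = [(random.randint(1, 9), 1 if random.random() < 0.6 else 2)
          for _ in range(N)]

hist_y = {y: sum(1 for _, yy in points if yy == y) / N for y in (1, 2)}
x_given_y1 = [x for x, y in points if y == 1]
hist_x_given_y1 = {x: x_given_y1.count(x) / len(x_given_y1) for x in range(1, 10)}
print(hist_y)            # fractions approach p(Y) as N grows
print(hist_x_given_y1)   # conditional histogram of X given Y = 1
```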

Bayes' Rule applied to the Fruit Problem

• Conditional probability: box is red given that fruit is orange, p(B=r|F=o)
• Marginal probability: fruit is orange, p(F=o)

From the product rule of probability:

p(B=r|F=o) = p(F=o|B=r) p(B=r) / p(F=o) = (3/4 × 4/10) / (9/20) = 2/3 ≈ 0.67

From the sum rule of probability:

p(F=o) = p(F=o, B=r) + p(F=o, B=b)
       = p(F=o|B=r) p(B=r) + p(F=o|B=b) p(B=b)
       = 6/8 × 4/10 + 1/4 × 6/10 = 9/20 = 0.45

The a posteriori probability of 0.67 is different from the a priori probability of 0.4.

The marginal probability of 0.45 is lower than the simple average 7/12 ≈ 0.58 (7 oranges among 12 fruits), since we need to weight by the box priors.

p(F|B):
Box\Fruit  orange  apple
red        3/4     1/4
blue       1/4     3/4
Priors: p(B=r) = 4/10, p(B=b) = 6/10

Similarly p(F=a) = 11/20 = 0.55. Note that even though there are fewer apples, given an apple the blue box is more likely: p(B=r|F=a) = 0.1/0.55 = 2/11 ≈ 0.18.
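The same computation numerically, reproducing the slide's numbers (a minimal sketch, not part of the original slides):

```python
# Posterior p(B=r|F=o) and marginal p(F=o) for the fruit problem.
prior = {"r": 4/10, "b": 6/10}
lik_orange = {"r": 3/4, "b": 1/4}                         # p(F=o|B)

p_orange = sum(lik_orange[b] * prior[b] for b in prior)   # 9/20 = 0.45
post_red = lik_orange["r"] * prior["r"] / p_orange        # 2/3 ≈ 0.67
print(p_orange, post_red)
```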

Probability Density Function (pdf)

• Continuous variables
• If the probability that x falls in the interval (x, x+δx) is given by p(x)δx as δx → 0, then p(x) is a pdf of x
• The probability that x lies in the interval (a, b) is

  p(x ∈ (a, b)) = ∫_a^b p(x) dx

• Cumulative Distribution Function:

  P(z) = ∫_{−∞}^z p(x) dx

  is the probability that x lies in the interval (−∞, z)
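A sketch relating the pdf to the CDF numerically; the standard Gaussian is an illustrative assumption for p(x), and the truncated midpoint Riemann sum stands in for the integral:

```python
import math

# Approximate the CDF P(z) of a standard Gaussian pdf by a midpoint
# Riemann sum over a truncated range.
def p(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def P(z, lo=-10.0, steps=100_000):
    dx = (z - lo) / steps
    return sum(p(lo + (k + 0.5) * dx) for k in range(steps)) * dx

print(P(0.0))            # ≈ 0.5
print(P(1.0) - P(-1.0))  # p(x in (-1, 1)) ≈ 0.683
```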

Several Variables

• If there are several continuous variables x_1, …, x_D, denoted by the vector x, then we can define a joint probability density p(x) = p(x_1, …, x_D)
• A multivariate probability density must satisfy

  p(x) ≥ 0 and ∫ p(x) dx = 1

  where the integral is taken over the whole of x-space
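A numerical normalization check under an assumed 2-D density (a product of two standard Gaussians, an illustrative choice):

```python
import math

# Confirm that a 2-D density integrates to (approximately) 1 over x-space,
# using a midpoint grid over a truncated range.
def p(x1, x2):
    return math.exp(-(x1**2 + x2**2) / 2) / (2 * math.pi)

lo, hi, n = -6.0, 6.0, 400
h = (hi - lo) / n
total = sum(p(lo + (i + 0.5) * h, lo + (j + 0.5) * h)
            for i in range(n) for j in range(n)) * h * h
print(total)   # ≈ 1.0, and p(x) >= 0 everywhere by construction
```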

Expectation

• Expectation is the average value of some function f(x) under the probability distribution p(x), denoted E[f]
• For a discrete distribution:

  E[f] = Σ_x p(x) f(x)

• For a continuous distribution:

  E[f] = ∫ p(x) f(x) dx

• If there are N points drawn from the pdf, the expectation can be approximated as

  E[f] ≈ (1/N) Σ_{n=1}^N f(x_n)

  This approximation is extremely important when we use sampling to determine an expected value (see the sketch after this list).

• Conditional expectation with respect to a conditional distribution:

  E_x[f] = Σ_x p(x|y) f(x)

Examples of f(x) of use in ML: f(x) = x, for which E[f] is the mean; f(x) = −ln p(x), for which E[f] is the entropy; f(x) = −ln[q(x)/p(x)], for which E[f] is the K-L divergence.
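A sketch of the sampling approximation above; the choices of p (uniform on (0, 1)) and f(x) = x are illustrative assumptions:

```python
import random
random.seed(0)

# Monte Carlo estimate E[f] ≈ (1/N) Σ f(x_n) with x_n drawn from p(x).
# With p uniform on (0, 1) and f(x) = x, the exact answer is the mean 1/2.
N = 100_000
estimate = sum(random.random() for _ in range(N)) / N
print(estimate)   # ≈ 0.5
```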

Variance

• Measures how much variability there is in f(x) around its mean value E[f(x)]
• The variance of f(x) is denoted

  var[f] = E[(f(x) − E[f(x)])²]

• Expanding the square,

  var[f] = E[f(x)²] − E[f(x)]²

• The variance of the variable x itself:

  var[x] = E[x²] − E[x]²
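A quick check of the identity var[x] = E[x²] − E[x]², again with uniform samples as an illustrative assumption:

```python
import random
random.seed(0)

# Check var[x] = E[x^2] - E[x]^2 on samples from uniform(0, 1),
# whose true variance is 1/12 ≈ 0.0833.
xs = [random.random() for _ in range(100_000)]
mean = sum(xs) / len(xs)
mean_sq = sum(x * x for x in xs) / len(xs)
print(mean_sq - mean ** 2)   # ≈ 1/12
```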

Bayesian Probabilities

• Classical or Frequentist view of probabilities
  – Probability is the frequency of a random, repeatable event
  – e.g., the frequency of a tossed coin coming up heads is 1/2
• Bayesian view
  – Probability is a quantification of uncertainty
  – Degree of belief in propositions that do not involve random variables
  – Examples of uncertain events treated as probabilities:
    • Whether the Arctic Sea ice cap will disappear
    • Whether the moon was once in its own orbit around the sun
    • Whether Thomas Jefferson had a child by one of his slaves
    • Whether a signature on a check is genuine

Whether the Arctic Sea Ice Cap will Disappear

• We have some idea of how quickly polar ice is melting
• We revise it on the basis of fresh evidence (satellite observations)
• Our assessment will affect the actions we take (e.g., to reduce greenhouse gases)

This is an uncertain event, answered by the general Bayesian interpretation.

NASA Video
