Probability Theory
Sargur N. Srihari
srihari@cedar.buffalo
Probability Theory in Machine Learning
" Probability is key concept is dealing with
uncertainty
3 Arises due to finite size of data sets and noise on
measurements
" Probability Theory
3 Framework for quantification and manipulation of
uncertainty
3 One of the central foundations of machine learning
Probability with Two Variables
" Key concepts:
3 conditional & joint probabilities of variables
" Random Variables: B and F
3 Box B , Fruit F
" F has two values orange ( o ) or apple ( a )
" B has values red ( r ) or blue ( b )
2 apples 6 oranges
3 apples 1 orange
Priors: Let p ( B=r )=4/10 and p ( B=b )=6/
Given the above data we are interested in several probabilities of interest: marginal, conditional and joint
Box\Fruit orange apple red 6 2 blue 1 3
CPD: Data p ( F|B ) Box\Fruit orange apple red 3/4 1/ blue 1/4 3/
Probabilities of Interest
" Marginal Probability
3 what is the probability of an
apple? P ( F=a )
" Note that we have to consider P ( B )
" Conditional Probability
3 Given that we have an orange
what is the probability that we
chose the blue box? P ( B=b|F=o )
" Joint Probability
3 What is the probability of orange
AND blue box? P ( B=b,F=o )
2 apples 6 oranges
3 apples 1 orange
Priors: p ( B=r )=4/10 and p ( B=b )=6/
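A minimal sketch in Python of these three quantities, using the priors and the CPD p(F|B) from the table above; the dictionary layout and variable names are my own choices, not from the slides.

```python
# Sketch: marginal, conditional and joint probabilities for the box/fruit example.
# Priors p(B) and CPD p(F|B) are taken from the table above; everything else is illustrative.

p_B = {"r": 4/10, "b": 6/10}                       # priors p(B)
p_F_given_B = {                                    # CPD p(F|B)
    "r": {"o": 3/4, "a": 1/4},
    "b": {"o": 1/4, "a": 3/4},
}

# Joint probability p(B=b, F=o) = p(F=o|B=b) p(B=b)   (product rule)
joint_bo = p_F_given_B["b"]["o"] * p_B["b"]

# Marginal probability p(F=a) = sum_B p(F=a|B) p(B)   (sum rule)
p_Fa = sum(p_F_given_B[box]["a"] * p_B[box] for box in p_B)

# Conditional probability p(B=b|F=o) = p(F=o|B=b) p(B=b) / p(F=o)   (Bayes)
p_Fo = sum(p_F_given_B[box]["o"] * p_B[box] for box in p_B)
cond_b_given_o = joint_bo / p_Fo

print(joint_bo, p_Fa, cond_b_given_o)   # 0.15, 0.55, 0.333...
```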
Product Rule of Probability Theory
" Consider only those instances for which X=xi
" Then fraction of those instances for which Y=yj is
written as p ( Y=yj|X=xi )
" Called conditional probability
" Relationship between joint and conditional probability:
p ( Y = yj | X = xi )=
nij ci
p ( X = xi , Y = yj )=
nij N
=
nij ci
ci N = p ( Y = yj | X = xi ) p ( X = xi )
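A small numerical check of this relationship, using a made-up count table n_ij; the counts and names here are illustrative only, not from the slides.

```python
# Product rule check: p(X=xi, Y=yj) = p(Y=yj|X=xi) p(X=xi), computed from counts.
counts = {                        # n_ij for a made-up pair of discrete variables
    "x1": {"y1": 5, "y2": 15},
    "x2": {"y1": 30, "y2": 10},
}
N = sum(sum(row.values()) for row in counts.values())     # total instances

for xi, row in counts.items():
    c_i = sum(row.values())                               # instances with X = xi
    for yj, n_ij in row.items():
        joint = n_ij / N                                   # p(X=xi, Y=yj)
        product = (n_ij / c_i) * (c_i / N)                 # p(Y=yj|X=xi) p(X=xi)
        assert abs(joint - product) < 1e-12
```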
Bayes' Theorem
- From the product rule, together with the symmetry property p(X,Y) = p(Y,X), we get
  p(Y|X) = p(X|Y) p(Y) / p(X)
- which is called Bayes' theorem
- Using the sum rule, the denominator is expressed as
  p(X) = Σ_Y p(X|Y) p(Y)
- p(X) is the normalization constant that ensures the conditional probability on the LHS sums to 1 over all values of Y (a small sketch in code follows)
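A minimal sketch of Bayes' theorem with the sum-rule normalization for a discrete variable Y; the function name and dictionary representation are my own, and the usage example reuses the fruit-problem numbers from these notes.

```python
# Sketch: Bayes' theorem p(Y|X=x) = p(x|Y) p(Y) / p(x), with p(x) from the sum rule.

def posterior(prior, likelihood):
    """prior: p(Y) as {y: prob}; likelihood: p(X=x | Y=y) as {y: prob}.
    Returns p(Y | X=x) as {y: prob}."""
    unnormalized = {y: likelihood[y] * prior[y] for y in prior}
    p_x = sum(unnormalized.values())          # sum rule: p(x) = sum_Y p(x|Y) p(Y)
    return {y: v / p_x for y, v in unnormalized.items()}

# Usage with the fruit problem: Y = box, observation X = "fruit is orange".
prior = {"red": 4/10, "blue": 6/10}
lik_orange = {"red": 3/4, "blue": 1/4}        # p(F=o | B)
print(posterior(prior, lik_orange))           # {'red': 0.666..., 'blue': 0.333...}
```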
Ex: Joint Distribution over two Variables
(Figure: N = 60 data points; histograms of Y, of X, and of X given a fixed value of Y, each showing the fraction of data points having each value. X takes nine possible values, Y takes two values.)
These fractions would equal the probabilities as N → ∞ (illustrated in the sketch below).
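A small sketch of this limiting behaviour, assuming a made-up 2×2 joint distribution of my own choosing: the empirical fractions approach the true probabilities as N grows.

```python
# Sketch: fractions of sampled data points converge to the true probabilities.
import numpy as np

rng = np.random.default_rng(0)
p_xy = np.array([[0.1, 0.3],       # a made-up joint distribution over
                 [0.4, 0.2]])      # X in {0,1}, Y in {0,1}
flat = p_xy.ravel()

for N in (60, 6000, 600000):
    draws = rng.choice(len(flat), size=N, p=flat)          # sample N points
    fractions = np.bincount(draws, minlength=len(flat)) / N
    print(N, np.abs(fractions - flat).max())               # error generally shrinks with N
```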
Bayes' rule applied to the Fruit Problem
- Conditional probability: box is red given that fruit is orange (from the product rule / Bayes' theorem)
  p(B=r | F=o) = p(F=o | B=r) p(B=r) / p(F=o)
               = (3/4 × 4/10) / (9/20)
               = 2/3 ≈ 0.67
- Marginal probability: fruit is orange (from the sum rule)
  p(F=o) = p(F=o, B=r) + p(F=o, B=b)
         = p(F=o | B=r) p(B=r) + p(F=o | B=b) p(B=b)
         = 6/8 × 4/10 + 1/4 × 6/10
         = 9/20 = 0.45
- The a posteriori probability of 0.67 is different from the a priori probability of 0.4
- The marginal probability of 0.45 is lower than the simple average 7/12 ≈ 0.58, since we need the box priors
- (CPD p(F|B): red: orange 3/4, apple 1/4; blue: orange 1/4, apple 3/4. Priors: p(B=r)=4/10, p(B=b)=6/10)
- Similarly p(F=a) = 0.55. Note that even though there are fewer apples overall, given an apple the blue box is more likely: p(B=r | F=a) = 2/11 ≈ 0.18
Probability Density Function (pdf)
" Continuous Variables
" If probability that x falls in
interval ( x, x + · x ) is given
by p ( x ) dx for ·x à 0
then p ( x ) is a pdf of x
" Probability x lies in
interval ( a,b ) is
Cumulative Distribution Function
p ( x *( a , b ))= p ( x ) dx a
b
+
####### P ( z )= p ( x ) dx
2>
z
+
Probability that x lies in Interval( ->,z ) is
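A small numerical sketch, assuming a standard Gaussian pdf as the example density (my choice, not from the slides): it approximates p(x ∈ (a,b)) and the cumulative distribution P(z) by simple Riemann sums.

```python
# Sketch: interval probability and CDF of a continuous variable by numerical integration.
import numpy as np

def pdf(x):
    return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)   # standard normal density

# p(x in (a,b)) = integral of p(x) dx from a to b
a, b = -1.0, 1.0
xs = np.linspace(a, b, 100001)
dx = xs[1] - xs[0]
print((pdf(xs) * dx).sum())      # ~0.68

# P(z) = integral of p(x) dx from -infinity to z  (here -10 stands in for -infinity)
z = 0.0
xs = np.linspace(-10.0, z, 100001)
dx = xs[1] - xs[0]
print((pdf(xs) * dx).sum())      # ~0.5
```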
Several Variables
" If there are several continuous variables x 1 ,&,xD
denoted by vector x then we can define a joint
probability density p (x)= p ( x 1 ,..,xD )
" Multivariate probability density must satisfy
p (x) g 0
p (x) d x
2>
>
+ = 1
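A quick numerical check of these two conditions for an example two-dimensional density; the choice of two independent standard normals is mine, not from the slides.

```python
# Sketch: a 2-D density is non-negative and integrates to ~1 over a grid
# that covers almost all of its probability mass.
import numpy as np

def pdf2(x1, x2):
    return np.exp(-(x1**2 + x2**2) / 2) / (2 * np.pi)   # product of two standard normals

grid = np.linspace(-6, 6, 601)
dx = grid[1] - grid[0]
X1, X2 = np.meshgrid(grid, grid)
vals = pdf2(X1, X2)
assert (vals >= 0).all()          # p(x) >= 0
print(vals.sum() * dx * dx)       # ~1.0
```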
Expectation
" Expectation is average value of some function f ( x )under the
probability distribution p ( x ) denoted E [ f ]
" For a discrete distribution
E [ f ] = £ x p ( x ) f ( x )
" For a continuous distribution
" If there are N points drawn from a pdf, then expectation can be
approximated as
E [ f ] =(1 /N )£ nN =1 f ( xn )
" Conditional Expectation with respect to a conditional distribution
Ex [ f ] = £ x p ( x|y ) f ( x )
E [ f ]= + p ( x ) f ( x ) dx
This approximation is extremely important when we use sampling to determine expected value
Examples of f ( x ) of use in ML: f ( x )= x ; E [ f ] is mean f ( x )=ln p ( x ); E [ f ] is entropy f ( x )= - ln[ q ( x ) /p(x )]; K-L divergence
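A sketch of the sampling approximation, using a Gaussian p(x) as the example distribution (my choice): the sample average of f(x_n) approximates E[f], shown here for f(x) = x (the mean) and f(x) = −ln p(x) (the entropy).

```python
# Sketch: Monte Carlo approximation E[f] ~ (1/N) sum_n f(x_n) with x_n drawn from p(x).
import numpy as np

rng = np.random.default_rng(0)
samples = rng.normal(loc=2.0, scale=1.0, size=100000)   # x_n ~ N(2, 1)

# f(x) = x  ->  E[f] is the mean (true value 2.0)
print(samples.mean())

# f(x) = -ln p(x)  ->  E[f] is the entropy (true value ~1.419 for sigma = 1)
def neg_log_p(x, mu=2.0, sigma=1.0):
    return 0.5 * np.log(2 * np.pi * sigma**2) + (x - mu)**2 / (2 * sigma**2)
print(neg_log_p(samples).mean())
```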
Variance
" Measures how much variability there is in f ( x )
around its mean value E [ f ( x )]
" Variance of f ( x ) is denoted as
var[ f ] = E [( f ( x ) 3E [ f ( x )]) 2 ]
" Expanding the square
var[ f ] = E [( f ( x ) 2 ] 3E [ f ( x )] 2
" Variance of the variable x itself
var[ x ] = E [ x 2 ] 3E [ x ] 2
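A quick numerical check that the two expressions for the variance agree; the example f(x) and sampling distribution are my own choices.

```python
# Sketch: var[f] = E[(f - E[f])^2] equals E[f^2] - E[f]^2 (up to floating-point error).
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=100000)
f = x**2                                     # an arbitrary f(x)

lhs = ((f - f.mean())**2).mean()             # E[(f - E[f])^2]
rhs = (f**2).mean() - f.mean()**2            # E[f^2] - E[f]^2
print(lhs, rhs)                              # the two values agree
```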
Bayesian Probabilities
" Classical or Frequentist view of Probabilities
3 Probability is frequency of random, repeatable event
3 Frequency of a tossed coin coming up heads is 1/
" Bayesian View
3 Probability is a quantification of uncertainty
3 Degree of belief in propositions that do not involve random
variables
3 Examples of uncertain events as probabilities:
" Whether Arctic Sea ice cap will disappear " Whether moon was once in its own orbit around the sun " Whether Thomas Jefferson had a child by one of his slaves " Whether a signature on a check is genuine
Whether the Arctic Sea ice cap will disappear
- An uncertain event
- We have some idea of how quickly polar ice is melting
- We revise it on the basis of fresh evidence (satellite observations)
- The assessment will affect the actions we take (to reduce greenhouse gases)
- Answered by the general Bayesian interpretation
(NASA video)