Tuesday, April 2, 2019

Hypothesis Testing for ML claims

Hypothesis Testing

What is a Hypothesis:
    It is a claim made by a person or an organization.
    e.g.: The average salary of an IT employee with 4 years of experience in India is 10L (10 lakh).

What is Hypothesis Testing:
    It is a process that uses sample evidence to decide whether to reject or accept the Hypothesis.

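 How:
         Before understanding the process, we need to define the following two hypotheses from the original hypothesis:
         1. Null Hypothesis (H0): the default assumption, held until the evidence contradicts it. When a claim needs to be proven, H0 is usually that the original claim is wrong, i.e. not true. H0 always contains the equality (=, <= or >=).
         2. Alternative Hypothesis (H1): the complement of the Null Hypothesis.

         After defining the Null Hypothesis, hypothesis testing is limited to either rejecting or retaining the Null Hypothesis (rejecting H0 supports the original claim; retaining H0 means the evidence is not strong enough to support it).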

        The process is as follows:
           1. Define (identify) the Null Hypothesis for the claim.
           2. Start with the assumption that the Null Hypothesis is TRUE (i.e. the claim to be proven is assumed wrong).
           3. Check the validity of the Null Hypothesis using the sample evidence.
               Based on the results:
                   If the Null Hypothesis is retained (we fail to reject it), the evidence is not strong enough to support the original claim.
                   If the Null Hypothesis is rejected, the evidence supports the original claim.

Detailed Process:
- Define the Null Hypothesis
- Calculate the standardized distance between the estimated value and the hypothesized value
          - This is the distance, in number of standard errors (standard deviations of the sampling distribution), between the value of the parameter estimated from the samples (e.g. Mean/Average) and the value stated by the Null Hypothesis (Mean/Average)

                    Xbar : mean of the samples
                    mu : mean under the Null Hypothesis
                    sigma : standard deviation of the population
                    N : number of samples
                    standard error of the sampling distribution = sigma/sqrt(N)
                    The standardized distance between the value from the sample and the hypothesis value is:
                                $$ z = \frac{\bar{X} - \mu}{\sigma / \sqrt{N}} $$
                   This value is called the test statistic.
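A minimal Python sketch of the test statistic, assuming sigma (the population standard deviation) is known so the statistic is a z-score. The function name `z_statistic` and the numbers in the example are hypothetical, made up for illustration only (not real salary data). Note that the standard error sigma/sqrt(N) is computed first and then divides the difference.

```python
import math


def z_statistic(x_bar: float, mu: float, sigma: float, n: int) -> float:
    """Standardized distance between the sample mean and the Null-hypothesis mean.

    Assumes sigma is the known population standard deviation.
    """
    standard_error = sigma / math.sqrt(n)  # standard error of the sampling distribution
    return (x_bar - mu) / standard_error


# Hypothetical inputs (in lakhs): sample mean 10.6, H0 mean 10, sigma 2, 64 samples
z = z_statistic(x_bar=10.6, mu=10.0, sigma=2.0, n=64)
print(round(z, 2))  # 2.4
```

Here the standard error is 2 / 8 = 0.25, so the sample mean lies 0.6 / 0.25 = 2.4 standard errors above the hypothesized mean.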

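    - Calculate the p-value:
                     What is the p-value: a conditional probability, namely the probability of observing a test statistic value at least as extreme as the calculated one, assuming the Null Hypothesis is True
                       i.e. P(test statistic at least as extreme as the observed value | Null Hypothesis is True)

                      The further the test statistic lies from the Null Hypothesis value (in the direction of the Alternative Hypothesis), the smaller the p-value.

                      How to calculate the p-value:
                             It depends on the distribution of the test statistic under the Null Hypothesis (covered later)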
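As a sketch, assuming the test statistic follows a standard normal distribution under the Null Hypothesis (the z-test case), the p-value can be computed with Python's standard-library `statistics.NormalDist` (Python 3.8+). The helper name `p_value` and its `tail` argument are hypothetical, chosen for illustration.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def p_value(z: float, tail: str) -> float:
    """P(statistic at least as extreme as z | H0 true), assuming a standard normal statistic.

    tail: "right" (H1 is >), "left" (H1 is <) or "two" (H1 is !=).
    """
    if tail == "right":
        return 1.0 - STANDARD_NORMAL.cdf(z)
    if tail == "left":
        return STANDARD_NORMAL.cdf(z)
    if tail == "two":
        # Both tails count as "at least as extreme"
        return 2.0 * (1.0 - STANDARD_NORMAL.cdf(abs(z)))
    raise ValueError(f"unknown tail: {tail!r}")


# The right-tailed p-value shrinks as the statistic grows
for z in (0.0, 1.0, 2.0, 3.0):
    print(z, round(p_value(z, "right"), 4))
```

For a t-distributed statistic the same idea applies, but the CDF of the t-distribution would be needed instead of `NormalDist`.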
      - Decision:
             Choose a significance level (alpha), commonly 0.05, depending on the context of the problem. The Null Hypothesis is rejected if the p-value is below alpha, and retained otherwise.

               The value of the test statistic that marks the boundary of the rejection region at significance level alpha is called the critical value.
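A small sketch of critical values for a z-test, again assuming a standard normal test statistic and using `statistics.NormalDist.inv_cdf` (the inverse CDF). The function name `critical_value` is hypothetical.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()


def critical_value(alpha: float, tail: str) -> float:
    """Boundary of the rejection region for a standard normal test statistic."""
    if tail == "right":
        return STANDARD_NORMAL.inv_cdf(1.0 - alpha)  # area alpha to the right
    if tail == "left":
        return STANDARD_NORMAL.inv_cdf(alpha)  # area alpha to the left
    if tail == "two":
        # Area alpha/2 in each tail; reject when |z| exceeds this value
        return STANDARD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    raise ValueError(f"unknown tail: {tail!r}")


for tail in ("right", "left", "two"):
    print(tail, round(critical_value(0.05, tail), 3))
# prints:
# right 1.645
# left -1.645
# two 1.96
```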
         
               In a Right-Tailed Test (H0 is <=, i.e. the Alternative Hypothesis is >):
                  If the test statistic value (calculated statistic) > critical value => reject the Null Hypothesis
                 This is the same as:
                  If the p-value < alpha => reject the Null Hypothesis

                  The rejection region (where the Null Hypothesis is rejected) is the right tail, with area alpha.
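The claim that the two right-tailed rules agree can be checked with a short sketch, assuming a standard normal test statistic. The function `right_tailed_decision` is a hypothetical helper that returns both decisions side by side.

```python
from statistics import NormalDist
from typing import Tuple

STANDARD_NORMAL = NormalDist()


def right_tailed_decision(z: float, alpha: float = 0.05) -> Tuple[bool, bool]:
    """Return (reject by critical value, reject by p-value) for a right-tailed z-test."""
    critical = STANDARD_NORMAL.inv_cdf(1.0 - alpha)  # about 1.645 for alpha = 0.05
    p = 1.0 - STANDARD_NORMAL.cdf(z)  # right-tail area beyond z
    return z > critical, p < alpha


for z in (1.0, 1.5, 2.4):
    by_critical, by_p = right_tailed_decision(z)
    print(z, by_critical, by_p)
# prints:
# 1.0 False False
# 1.5 False False
# 2.4 True True
```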

                In a Left-Tailed Test (H0 is >=, i.e. the Alternative Hypothesis is <):

                 If the test statistic value (calculated statistic) < critical value => reject the Null Hypothesis (equivalently, if the p-value < alpha)
                 The rejection region (where the Null Hypothesis is rejected) is the left tail, with area alpha.
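The left-tailed case mirrors the right-tailed one: the critical value is negative and the p-value is the area to the left of the statistic. A sketch, assuming a standard normal statistic; `left_tailed_decision` is a hypothetical helper name.

```python
from statistics import NormalDist
from typing import Tuple

STANDARD_NORMAL = NormalDist()


def left_tailed_decision(z: float, alpha: float = 0.05) -> Tuple[bool, bool]:
    """Return (reject by critical value, reject by p-value) for a left-tailed z-test."""
    critical = STANDARD_NORMAL.inv_cdf(alpha)  # about -1.645 for alpha = 0.05
    p = STANDARD_NORMAL.cdf(z)  # left-tail area below z
    return z < critical, p < alpha


for z in (-2.4, -1.0, 1.0):
    print(z, left_tailed_decision(z))
```

A large positive statistic never leads to rejection here, because it is evidence against the "<" alternative, not for it.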

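                In a Two-Tailed Test (H0 is =, i.e. the Alternative Hypothesis is !=):
                   The rejection region (where the Null Hypothesis is rejected) is split across both tails, with area alpha/2 in each; reject the Null Hypothesis if |test statistic| > critical value at alpha/2 (equivalently, if the two-sided p-value < alpha).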
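A sketch of the two-tailed decision, assuming a standard normal statistic; `two_tailed_decision` is a hypothetical helper. Splitting alpha into alpha/2 per tail pushes the critical value out to about 1.96 for alpha = 0.05.

```python
from statistics import NormalDist
from typing import Tuple

STANDARD_NORMAL = NormalDist()


def two_tailed_decision(z: float, alpha: float = 0.05) -> Tuple[bool, bool]:
    """Return (reject by critical value, reject by p-value) for a two-tailed z-test."""
    critical = STANDARD_NORMAL.inv_cdf(1.0 - alpha / 2.0)  # about 1.96 for alpha = 0.05
    p = 2.0 * (1.0 - STANDARD_NORMAL.cdf(abs(z)))  # area in both tails beyond |z|
    return abs(z) > critical, p < alpha


for z in (-2.4, 1.8, 2.4):
    print(z, two_tailed_decision(z))
```

With z = 1.8, a right-tailed test at alpha = 0.05 would reject (1.8 > 1.645), but the two-tailed test does not (1.8 < 1.96), since each tail only gets alpha/2.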





