Tuesday, April 2, 2019

Hypothesis Testing for ML claims

                                                                Hypothesis Testing

What is a Hypothesis:
    It is a claim made by a person or organization
    e.g. the average salary of an IT employee with 4 years of experience in India is 10L (10 lakh)

What is Hypothesis Testing:
    It is a process that uses sample evidence to either reject or retain a hypothesis

 How:
         Before understanding the process, we need to define the following two hypotheses from the original hypothesis
          1. Null Hypothesis : The original hypothesis (claim) is wrong, i.e. not True
          2. Alternative Hypothesis : The complement of the Null hypothesis

         After defining the Null Hypothesis, hypothesis testing is limited to either rejecting or retaining the Null Hypothesis (i.e. the original claim is retained or rejected, respectively)

        This process is explained below:
           1. Define or identify the Null hypothesis of the claim
           2. Start with the assumption that the Null Hypothesis is TRUE (i.e. the actual claim is wrong)
           3. Check the validity of the Null hypothesis using the sample evidence
               Based on the results:
                   If the Null Hypothesis is retained, the original claim (hypothesis) is rejected
                   If the Null Hypothesis is rejected, the original claim is retained

Detailed Process:
- Define the Null Hypothesis
- Calculate the standardized distance between the estimated value and the hypothesized value
           - i.e. the distance, measured in number of standard deviations (standard errors), between the parameter value estimated from the sample (e.g. Mean/Average) and the value stated by the Null hypothesis (Mean/Average)

                    Xbar : mean of the sample
                    mu : mean stated by the Null hypothesis
                    sigma : standard deviation (of the population)
                    N : sample size (number of observations)
                    standard error of the sampling distribution = sigma / sqrt(N)
                    standardized distance between the sample value and the hypothesized value:
                                (Xbar - mu) / (sigma / sqrt(N))
                   This is also called the test statistic

    - Calculate the p-value :
                     What is the p-value : the conditional probability of observing a test statistic at least as extreme as the calculated one, given that the Null hypothesis is True
                       i.e. P(test statistic at least as extreme as the observed value | Null hypothesis is True)

                      The p-value gets smaller as the test statistic moves further from zero in the direction of the Alternative hypothesis

                      How to calculate the p-value:
                             It depends on the distribution used (covered later)
      - Decision:
             Choose a significance level (alpha), generally 0.05, depending on the context of the problem. The Null hypothesis is rejected if the p-value is less than this threshold (alpha); otherwise it is retained

               The value of the test statistic at the significance level (alpha) is called the critical value
         
               In a Right-Tailed Test ( Null Hypothesis is <= , Alternative Hypothesis is > )
                  If the test statistic (calculated value) > critical value => reject the Null hypothesis
                  This is the same as :
                  If the p-value < alpha => reject the Null Hypothesis

                  The rejection region (where the Null Hypothesis is rejected) lies to the right of the critical value, i.e. the right tail with area alpha

                In a Left-Tailed Test ( Null Hypothesis is >= , Alternative Hypothesis is < )

                 If the test statistic (calculated value) < critical value => reject the Null hypothesis
                 The rejection region (where the Null Hypothesis is rejected) lies to the left of the critical value, i.e. the left tail with area alpha

                In a Two-Tailed Test ( Null Hypothesis is = , Alternative Hypothesis is != )
                   The rejection region (where the Null Hypothesis is rejected) is split across both tails, with area alpha/2 on each side
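
The steps above can be sketched in Python. This is only a minimal illustration under an assumption the post has not made yet: the test statistic is compared against the standard normal distribution (a z-test with known sigma). The function names and the salary numbers are hypothetical and chosen just for the example.

```python
from math import sqrt
from statistics import NormalDist  # standard library, Python 3.8+

# Assumption: the test statistic follows a standard normal distribution.
STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def z_statistic(sample_mean, null_mean, sigma, n):
    """Standardized distance: (Xbar - mu) / (sigma / sqrt(N))."""
    standard_error = sigma / sqrt(n)
    return (sample_mean - null_mean) / standard_error


def p_value(z, tail):
    """P(statistic at least as extreme as z | Null hypothesis is True)."""
    if tail == "right":
        return 1.0 - STANDARD_NORMAL.cdf(z)
    if tail == "left":
        return STANDARD_NORMAL.cdf(z)
    if tail == "two-sided":
        return 2.0 * (1.0 - STANDARD_NORMAL.cdf(abs(z)))
    raise ValueError("tail must be 'right', 'left' or 'two-sided'")


def critical_value(tail, alpha=0.05):
    """Statistic value at the significance level (|z| cutoff if two-sided)."""
    if tail == "right":
        return STANDARD_NORMAL.inv_cdf(1.0 - alpha)
    if tail == "left":
        return STANDARD_NORMAL.inv_cdf(alpha)
    if tail == "two-sided":
        return STANDARD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    raise ValueError("tail must be 'right', 'left' or 'two-sided'")


def reject_null(z, tail, alpha=0.05):
    """Decision rule: reject the Null hypothesis when p-value < alpha."""
    return p_value(z, tail) < alpha


# Hypothetical data: Null hypothesis "mean salary <= 10", right-tailed test.
z = z_statistic(sample_mean=10.6, null_mean=10.0, sigma=2.0, n=64)
print(round(z, 2))               # 2.4
print(reject_null(z, "right"))   # True
```

Comparing the p-value with alpha gives the same decision as comparing the test statistic with the critical value, so only one of the two checks is needed in practice.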






Monday, April 1, 2019

yum install - same rpm of different architectures (32-bit and 64-bit) in same development environment


Requirement:
----------------

   We have one development environment (or a docker container used for development)
    For development we need the 64-bit package of a certain rpm installed
    Later, for some other requirements, we need the 32-bit package of the same rpm installed in the same environment (or docker container)

Problem :
------------
      Since the 64-bit version of the package was already installed, "yum install" of the 32-bit version returned an error

     For example, while installing the 32-bit version of libgcc, we got the following error message

Error:  Multilib version problems found. This often means that the root
       cause is something else and multilib version checking is just
       pointing out that there is a problem. Eg.:

         1. You have an upgrade for libgcc which is missing some
            dependency that another package requires. Yum is trying to
            solve this by installing an older version of libgcc of the
            different architecture. If you exclude the bad architecture
            yum will tell you what the root cause is (which package
            requires what). You can try redoing the upgrade with
            --exclude libgcc.otherarch ... this should give you an error
            message showing the root cause of the problem.

         2. You have multiple architectures of libgcc installed, but
            yum can only see an upgrade for one of those arcitectures.
            If you don't want/need both architectures anymore then you
            can remove the one with the missing update and everything
            will work.

         3. You have duplicate versions of libgcc installed already.
            You can use "yum check" to get yum show these errors.

       ...you can also use --setopt=protected_multilib=false to remove
       this checking, however this is almost never the correct thing to
       do as something else is very likely to go wrong (often causing
       much more problems).

       Protected multilib versions: libgcc-4.4.7-4.el6.i686 != libgcc-4.4.7-3.el6.x86_64
 You could try using --skip-broken to work around the problem
 You could try running: rpm -Va --nofiles --nodigest
 
 
Solution :
------------
      As the "Protected multilib versions" line in the above error shows, we get this error because the versions of the two architectures differ (4.4.7-4 for i686 vs 4.4.7-3 for x86_64)
     The solution that worked was to install the 64-bit and 32-bit versions together in the same command, so that yum can resolve both to the same version:
        yum install -y libgcc.x86_64 libgcc.i686
 
 
   

Python - important commands

How to get the list of installed packages in your Python environment

using Pip
------------

Execute the following command in the environment whose installed Python packages you would like to list

> pip freeze


If you have multiple Python versions on the same machine, make sure that you are using the pip that belongs to the Python version you want the information about.
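
One way to avoid picking the wrong pip is to ask the interpreter itself. The sketch below is an alternative to pip freeze, not a replacement for it: it uses the standard library module importlib.metadata (Python 3.8+) to list what is installed for whichever Python runs the script. The function name installed_packages is made up for this example.

```python
from importlib import metadata  # standard library since Python 3.8


def installed_packages():
    """Return sorted 'name==version' strings for the running interpreter."""
    entries = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        version = dist.metadata.get("Version")
        if name and version:  # skip entries with unreadable metadata
            entries.add(f"{name}=={version}")
    return sorted(entries, key=str.lower)


for line in installed_packages():
    print(line)
```

The output format resembles pip freeze, but it can differ in details (for example, editable installs are shown differently by pip).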

How to get the paths of the imported packages 

# python 3

import sys
import pprint

# pretty print loaded modules
pprint.pprint(sys.modules)
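
If only the file paths are of interest, the output can be narrowed down. The following is a small sketch that keeps just the modules that have a file on disk; module_paths is a hypothetical helper name, and json is imported only to have something to look up.

```python
import json  # any imported module works; json is just an example
import sys


def module_paths():
    """Map each loaded module name to its file path, where one exists."""
    paths = {}
    for name, module in list(sys.modules.items()):
        path = getattr(module, "__file__", None)
        if path:  # built-in modules such as 'sys' have no file on disk
            paths[name] = path
    return paths


# Print where the json package was loaded from
print(module_paths()["json"])
```

For a single module, checking its __file__ attribute directly (for example json.__file__) gives the same information.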