Friday, July 5, 2019

Docker Image creation using Dockerfile

Dockers


Dockerfile :

     A Dockerfile is used to build a new Docker image.

      The structure and commonly used instructions of a Dockerfile are:
  1.       Base Image [FROM]
  • Names the base image (downloaded from a repository) on which the new image is built. The base image is referenced as follows:
  • FROM       baseImageFullPath:Tag
  • where baseImageFullPath includes the repository name and the image name
  2.       Arguments [ARG]
  • We can supply a set of user-defined arguments, which makes it possible to build different versions of an image from one uniform Dockerfile
  • For example, an argument like VERSION can be used to build the image from different versions of the base image. Similarly, we can have arguments to pass the PROXY SERVER, INSTALLATION DIRECTORY, BIN DIRECTORY, etc.
  • Arguments are declared in the Dockerfile as: ARG [ArgumentName]
  • Argument values are passed to the Dockerfile while building the image with the docker build command: docker build --build-arg ArgumentName=ArgumentValue --build-arg ArgumentName2=ArgumentValue2 ....
  3.       Environments [ENV]
    • Environment variables defined here are available to the later build steps and inside every container started from the image
    • Environment variables are defined in the Dockerfile as: ENV EnvironmentVariableName=Value
  4.       Installing new packages or software into the image [RUN, COPY, ADD, ...]
    • Depending upon the requirements, we might need to add new packages, code or data to the new image
    • The RUN instruction executes any Unix command during image creation. For example, to install a new Python package we can use RUN pip install [packageName]. Similarly, to download, untar and install new software from the internet, we can chain the respective commands in one RUN: RUN curl -O -L https://......./ && tar -xvf <> ...
    • NOTE: to reduce the size of the final image, it is advisable to combine consecutive RUN instructions into a single RUN instruction. This reduces the number of intermediate layers (and cache entries) created, since each RUN instruction creates its own layer
    • Besides RUN, the COPY and ADD instructions can be used to copy files from the local machine (the build context) into the image
  5.       Clean up
    • Once the required packages are installed, it is good to remove the temporary and intermediate files created during image creation. Depending upon the OS of the image, we can use cleanup commands such as yum remove or rm. The cleanup only shrinks the image if it runs in the same RUN instruction that created the files, because files removed in a later layer still exist in the earlier one
The following instructions set up the starting environment and the startup command of the image
  1.       Working Dir [WORKDIR] : sets the working directory for the instructions that follow it and for the container when it starts
  2.       Startup command [CMD] : the default command (or the default arguments to ENTRYPOINT) run when a container starts; it can be overridden on the docker run command line
  3.       Entry Point [ENTRYPOINT] : specifies the command or script that is always executed when a container starts; the CMD values are passed to it as arguments
  4.       Ports expose [EXPOSE] : declares the ports on which the container listens. The ports still have to be published with docker run -p to be reachable from outside
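The sections above can be put together in a small sketch. The Python snippet below holds a sample Dockerfile as a string and assembles the matching docker build command with its --build-arg options. The base image (python), the argument names VERSION and INSTALL_DIR, the tag myapp:1.0, the port and the file app.py are all hypothetical placeholders, not values from a real project.

```python
import shlex

# Hypothetical Dockerfile: every name, path and version here is a placeholder.
DOCKERFILE = """\
ARG VERSION=3.9
FROM python:${VERSION}-slim
ARG INSTALL_DIR=/opt/app
ENV APP_HOME=${INSTALL_DIR}
WORKDIR ${APP_HOME}
COPY . .
RUN pip install --no-cache-dir requests \\
    && rm -rf /tmp/*
EXPOSE 8080
ENTRYPOINT ["python"]
CMD ["app.py"]
"""


def build_command(tag, build_args, context="."):
    """Return the docker build argument list for the given tag and ARG values."""
    cmd = ["docker", "build", "-t", tag]
    for name, value in build_args.items():
        # One --build-arg option per ARG declared in the Dockerfile.
        cmd += ["--build-arg", f"{name}={value}"]
    cmd.append(context)
    return cmd


cmd = build_command("myapp:1.0", {"VERSION": "3.11", "INSTALL_DIR": "/srv/app"})
print(shlex.join(cmd))
```

The list form can be passed directly to subprocess.run, which avoids shell quoting problems in the argument values.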


Useful Docker Commands


1. To list all docker images in the system

docker images

2. To list all containers that are currently running

docker ps

3. To tag an image with another name

docker tag SOURCE_IMAGE[:TAG] TARGET_IMAGE[:TAG]

4. To see more details (like tags, base, ...) and the layers of a docker image

docker inspect IMAGE

5. To pull an image from the repository to the local system

docker pull IMAGE[:TAG]

6. To push an image to the repository

docker push IMAGE[:TAG]

7. To enter an already running container (e.g. with /bin/bash as the command, if the image provides it)
    docker exec -it CONTAINER_ID COMMAND

8. To start a new container from an image and enter it
     docker run -it IMAGE[:TAG] [COMMAND]




9. To copy files to or from a docker container
     Step 1 : get the container id using the docker ps command
     Step 2 : use the docker cp command to copy files or directories between the container and the local file system. The container id is used as a prefix to the path, much like the host name in scp
              eg: copying to the container
                       docker cp file-name docker-container-id:/destination-path
                    copying from the container
                       docker cp docker-container-id:/src-path/file destination-path
                       



Tuesday, April 2, 2019

Hypothesis Testing for ML claims

                                                                Hypothesis Testing

What is a Hypothesis:
    It is a claim made by a person or an organization
    eg: The average salary of an IT employee with 4 years of experience is 10L in India

What is Hypothesis Testing:
    It is the process used to either reject or accept the hypothesis based on sample evidence

 How:
         Before going through the process, we need to derive the following two hypotheses from the original hypothesis:
         1. Null Hypothesis : The original hypothesis (claim) is wrong, i.e. not true
         2. Alternative Hypothesis : The complement of the Null Hypothesis

         Once the Null Hypothesis is defined, hypothesis testing is limited to either rejecting or retaining the Null Hypothesis (i.e. the original hypothesis is accepted or rejected, respectively)

        This process is explained below:
           1. Define or identify the Null Hypothesis of the claim
           2. Start with the assumption that the Null Hypothesis is TRUE (i.e. the actual claim is wrong)
           3. Check the validity of the Null Hypothesis using the sample evidence
               Based on the results:
                   If the Null Hypothesis is retained, the original claim (hypothesis) is rejected
                   If the Null Hypothesis is rejected, the original claim is retained

Detailed Process:
- Define the Null Hypothesis
- Calculate the standardized distance (in number of standard errors) between the value of the parameter estimated from the sample (eg: mean/average) and the value stated in the Null Hypothesis

                    Xbar : mean of the sample
                    mu : mean under the Null Hypothesis
                    sigma : standard deviation
                    N : number of samples
                    Standard error of the sampling distribution = sigma / sqrt(N)
                    Standardized distance between the sample value and the hypothesized value:
                                (Xbar - mu) / (sigma / sqrt(N))
                   This is also called the test statistic
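As a small worked example of the formula above, the sketch below computes the test statistic for the salary claim. The sample mean (10.5), sigma (2.0) and sample size (64) are made-up numbers used only for illustration.

```python
import math


def z_statistic(sample_mean, hypothesized_mean, sigma, n):
    """Standardized distance between the sample mean and the hypothesized mean."""
    standard_error = sigma / math.sqrt(n)  # sigma / sqrt(N)
    return (sample_mean - hypothesized_mean) / standard_error


# Hypothetical numbers: claim is 10 (lakhs), sample of 64 employees averages 10.5.
z = z_statistic(sample_mean=10.5, hypothesized_mean=10.0, sigma=2.0, n=64)
print(z)  # standard error is 2.0 / 8 = 0.25, so z = 0.5 / 0.25 = 2.0
```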

- Calculate the p-value:
                     What is the p-value : the conditional probability of observing a test statistic at least as extreme as the calculated one when the Null Hypothesis is true
                       i.e. P(observing the test statistic value or a more extreme one | Null Hypothesis is true)

                      The p-value decreases as the test statistic moves further away from zero in the direction of the Alternative Hypothesis

                      How to calculate the p-value:
                             It depends on the distribution to be used (covered later)
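For illustration only, the sketch below assumes the test statistic follows the standard normal distribution (a Z-test); other distributions need a different calculation. The function name p_value and its tail argument are choices made for this example.

```python
from statistics import NormalDist


def p_value(z, tail="right"):
    """p-value of a Z statistic under the standard normal distribution."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "right":
        return 1.0 - std_normal.cdf(z)
    if tail == "left":
        return std_normal.cdf(z)
    if tail == "two":
        return 2.0 * (1.0 - std_normal.cdf(abs(z)))
    raise ValueError("tail must be 'right', 'left' or 'two'")


print(round(p_value(2.0, "right"), 4))
print(round(p_value(2.0, "two"), 4))
```

A larger test statistic gives a smaller right-tailed p-value, which matches the relationship described above.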
- Decision:
             Choose a significance level (alpha), generally 0.05, depending upon the context of the problem. The Null Hypothesis is rejected if the p-value is less than alpha; otherwise it is retained

               The value of the test statistic at the significance level (alpha) is called the critical value

               Right-Tailed Test (Null Hypothesis is <= , i.e. Alternative Hypothesis is >)
                  If test statistic (calculated value) > critical value => reject the Null Hypothesis
                  This is the same as:
                  If p-value < alpha => reject the Null Hypothesis

                  The rejection region (of the Null Hypothesis) is the right tail beyond the critical value, with area alpha

               Left-Tailed Test (Null Hypothesis is >= , i.e. Alternative Hypothesis is <)

                  If test statistic (calculated value) < critical value => reject the Null Hypothesis
                  The rejection region (of the Null Hypothesis) is the left tail below the critical value, with area alpha

               Two-Tailed Test (Null Hypothesis is = , i.e. Alternative Hypothesis is !=)
                  The rejection region (of the Null Hypothesis) is split between both tails, with area alpha/2 on each side
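The three decision rules can be sketched as follows, again assuming a standard normal test statistic. The function name reject_null is hypothetical; the critical values come from the inverse CDF of the standard normal distribution.

```python
from statistics import NormalDist


def reject_null(z, alpha=0.05, tail="right"):
    """Return True if the Null Hypothesis is rejected at significance level alpha."""
    std_normal = NormalDist()
    if tail == "right":
        # Critical value leaves area alpha in the right tail.
        return z > std_normal.inv_cdf(1.0 - alpha)
    if tail == "left":
        # Critical value leaves area alpha in the left tail.
        return z < std_normal.inv_cdf(alpha)
    if tail == "two":
        # Area alpha/2 in each tail.
        return abs(z) > std_normal.inv_cdf(1.0 - alpha / 2.0)
    raise ValueError("tail must be 'right', 'left' or 'two'")


print(reject_null(2.0, tail="right"))  # 2.0 is beyond the critical value of about 1.645
print(reject_null(1.8, tail="two"))    # 1.8 is inside the critical value of about 1.96
```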






Monday, April 1, 2019

yum install - same rpm of different architectures (32-bit and 64-bit) in same development environment


Requirement:
----------------

   We have one development environment (or a docker container used for development)
    For development we need the 64-bit package of a certain rpm to be installed
    Later, for some other requirement, we need the 32-bit package of the same rpm to be installed in the same environment (or docker container)

Problem :
------------
      Since the 64-bit version of the package was already installed, installing the 32-bit version with "yum install" returned an error

     For example, while installing the 32-bit version of libgcc, we got the following error message

Error:  Multilib version problems found. This often means that the root
       cause is something else and multilib version checking is just
       pointing out that there is a problem. Eg.:

         1. You have an upgrade for libgcc which is missing some
            dependency that another package requires. Yum is trying to
            solve this by installing an older version of libgcc of the
            different architecture. If you exclude the bad architecture
            yum will tell you what the root cause is (which package
            requires what). You can try redoing the upgrade with
            --exclude libgcc.otherarch ... this should give you an error
            message showing the root cause of the problem.

         2. You have multiple architectures of libgcc installed, but
            yum can only see an upgrade for one of those arcitectures.
            If you don't want/need both architectures anymore then you
            can remove the one with the missing update and everything
            will work.

         3. You have duplicate versions of libgcc installed already.
            You can use "yum check" to get yum show these errors.

       ...you can also use --setopt=protected_multilib=false to remove
       this checking, however this is almost never the correct thing to
       do as something else is very likely to go wrong (often causing
       much more problems).

       Protected multilib versions: libgcc-4.4.7-4.el6.i686 != libgcc-4.4.7-3.el6.x86_64
 You could try using --skip-broken to work around the problem
 You could try running: rpm -Va --nofiles --nodigest
 
 
Solution :
------------
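      As the "Protected multilib versions" line of the error shows, the 32-bit package that yum wants to install (libgcc-4.4.7-4.el6.i686) has a different version than the installed 64-bit package (libgcc-4.4.7-3.el6.x86_64), and that mismatch causes the error
     The solution that worked is to install the 64-bit and 32-bit versions together in the same command, as shown below:
        yum install -y libgcc.x86_64 libgcc.i686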
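To see the mismatch at a glance, the following sketch splits the two package names from the error message into name, version, release and architecture. The helper name split_rpm_name is made up for this example, and the parsing assumes the usual name-version-release.arch layout.

```python
def split_rpm_name(package):
    """Split 'name-version-release.arch' into its four parts."""
    nvr, arch = package.rsplit(".", 1)           # the architecture follows the last dot
    name, version, release = nvr.rsplit("-", 2)  # version and release are the last two dash fields
    return name, version, release, arch


line = ("Protected multilib versions: "
        "libgcc-4.4.7-4.el6.i686 != libgcc-4.4.7-3.el6.x86_64")
left, right = line.split(": ", 1)[1].split(" != ")
pkg_32 = split_rpm_name(left)
pkg_64 = split_rpm_name(right)

print(pkg_32)
print(pkg_64)
# Multilib packages must match on name, version and release.
print("same build:", pkg_32[:3] == pkg_64[:3])
```

Here the release differs (4.el6 against 3.el6), so the two builds do not match.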
 
 
   

Python - important commands

How to get the list of installed packages in your python

using Pip
------------

Run the following command in the environment whose installed Python packages you want to list

> pip freeze


If you have multiple Python versions on the same machine, make sure that you are using the pip that belongs to the version you want information about, for example by running python -m pip freeze with the intended interpreter.
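A similar list can also be produced from inside Python itself, which guarantees that it describes the interpreter that is running. This is a minimal sketch assuming Python 3.8 or later, where importlib.metadata is part of the standard library; the function name installed_packages is made up, and the output is not guaranteed to match pip freeze exactly.

```python
from importlib import metadata


def installed_packages():
    """Return 'name==version' lines for the distributions visible to this interpreter."""
    lines = []
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions with missing or broken metadata
            lines.append(f"{name}=={dist.version}")
    return sorted(set(lines), key=str.lower)


for line in installed_packages():
    print(line)
```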

How to get the paths of the imported packages 

# python 3

import sys
import pprint

# pretty print loaded modules; the printed form of each module shows where it was loaded from
pprint.pprint(sys.modules)
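If only the file paths are needed, they can be read from each module's __file__ attribute. The sketch below is one way to do this; the function name module_paths is made up, and built-in modules are skipped because they have no file.

```python
import json  # imported only so that there is a known module to look up
import sys


def module_paths():
    """Map each loaded module name to the file it was loaded from."""
    paths = {}
    # Copy the items first, since sys.modules can change while iterating.
    for name, module in list(sys.modules.items()):
        path = getattr(module, "__file__", None)
        if path:  # built-in modules have no __file__
            paths[name] = path
    return paths


print(module_paths()["json"])
```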