Sunday, July 7, 2019
Friday, July 5, 2019
Docker Image creation using Dockerfile
Dockers
Dockerfile:
A Dockerfile is used to build a new Docker image.
The structure, sections, and commonly used commands of a Dockerfile are:
- Base Image [FROM]
- Specifies the base image (downloaded from a repository) on which the new image is built. The base image is referenced in this section as follows:
- FROM baseImageFullPath:Tag
- where baseImageFullPath includes the repository name and the image name
- Arguments [ARG]
- User-defined arguments make it possible to build different versions of an image from one uniform Dockerfile
- For example, an argument such as VERSION can be used to build an image from different versions of the base image. Similarly, arguments can pass the proxy server, installation directory, bin directory, etc.
- Arguments are declared in the Dockerfile as: ARG ArgumentName
- Argument values are passed to the Dockerfile at build time with the docker build command: docker build --build-arg ArgumentName=ArgumentValue --build-arg ArgumentName2=ArgumentValue2 ....
- Environments [ENV]
- Environment variables defined here are available as environment variables while the image is being built and in containers started from the image
- Environment variables are defined in the Dockerfile as: ENV EnvironmentVariableName=Value
- Installing new packages or software into the image [RUN, COPY, ADD, ...]
- Depending on the requirements, we might need to add new packages, code, or data to the new image
- The RUN command executes any Unix command during image creation. For example, to install a new Python package we can use RUN pip install [packageName]. Similarly, to download, untar, and install software from the internet, we can run the respective commands with RUN, as in RUN curl -O -L https://......./ && tar -xvf <> ...
- NOTE: to reduce the size of the final image, it is advisable to combine consecutive RUN commands into a single RUN command. Each RUN command creates its own layer and intermediate cache, so fewer RUN commands mean fewer layers
- In addition to RUN, the COPY and ADD commands copy files from the local machine into the image
- Clean up
- Once the required packages are installed, it is good practice to remove the temporary and intermediate files created during image creation. Depending on the OS of the image, use the appropriate cleanup commands, such as yum remove or rm. To actually shrink the image, the cleanup has to happen in the same RUN command that created the files, because files removed in a later layer still occupy space in the earlier one
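- Working Dir [WORKDIR] : specifies the current working directory for the commands that follow in the Dockerfile and for the container when it starts
- Startup command [CMD] : specifies the default command (or the default arguments to the ENTRYPOINT) that runs when a container starts; it can be overridden on the docker run command line
- Entry Point [ENTRYPOINT] : specifies the command or script that is executed when a container starts; the CMD values are passed to it as arguments
- Ports expose [EXPOSE] : specifies the ports the container listens on. To reach them from outside, they still have to be published when the container is started (docker run -p)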
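The sections above can be put together as in the following minimal sketch. It is written in Python only to keep the example self-contained: the string holds a sample Dockerfile, and the helper assembles the matching docker build command. The base image python, the VERSION default, the file names requirements.txt and app.py, the port 8000, and the tag myapp:1.0 are all assumptions made for illustration; they do not come from the notes above.

```python
import shlex

# Sample Dockerfile text. Every concrete name below is a placeholder.
SAMPLE_DOCKERFILE = """\
# ARG declared before FROM can be used to select the base image version
ARG VERSION=3.11
FROM python:${VERSION}-slim

# Available during the build and inside running containers
ENV APP_HOME=/app
WORKDIR ${APP_HOME}

# Copy the dependency list first, then install in a single RUN layer
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the application code
COPY . .

EXPOSE 8000
ENTRYPOINT ["python"]
CMD ["app.py"]
"""


def docker_build_command(tag, build_args, context="."):
    """Return the argv list for `docker build` with one --build-arg per entry."""
    command = ["docker", "build", "-t", tag]
    for name, value in build_args.items():
        command += ["--build-arg", f"{name}={value}"]
    command.append(context)
    return command


if __name__ == "__main__":
    argv = docker_build_command("myapp:1.0", {"VERSION": "3.12"})
    # Print the command as it would be typed in a shell
    print(shlex.join(argv))
```

Returning the command as a list rather than a single string keeps each argument separate, so it can be handed to subprocess.run without shell quoting issues.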
Useful Docker Commands
1. To list all Docker images in the system
docker images
2. To list all containers that are currently running
docker ps
3. To tag an image with another name
docker tag
4. To see more details (like tags, base, ...) and the layers of an image
docker inspect
5. To Pull an Image from the repository to local system
docker pull
6. To Push an Image to the repository
docker push
7. To use (enter into) a container that is already running
docker exec -it
8. To start a new container from an image and enter into it
docker run -it
9. To copy files or directories between the local file system and a container
Step1 : get the container id using the docker ps command
Step2 : use the docker cp command to copy the files or directories to/from the container, giving the container id where a host name would normally go
eg: copying to the container
docker cp file-name docker-container-id:/destination-path
copying from the container
docker cp docker-container-id:/src-path/file destination-path
Tuesday, April 2, 2019
Hypothesis Testing for ML claims
Hypothesis Testing
What is Hypothesis:
It is a claim made by a person or organization
eg: Average salary of an IT employee with 4 years of experience is 10L in India
What is Hypothesis Testing:
It is a process used to decide, based on sample evidence, whether to reject or retain the hypothesis
How:
Before understanding the process, we need to define the following two hypotheses from the original hypothesis
1. Null Hypothesis : The original hypothesis (claim) is wrong, i.e. not true
2. Alternative Hypothesis : The complement of the Null hypothesis
After defining the Null Hypothesis, hypothesis testing is limited to either rejecting or retaining the Null Hypothesis (i.e. the original hypothesis is accepted or rejected, respectively)
This process is explained below:
1. Define or identify the Null hypothesis of the claim
2. Initially start with the assumption that the Null Hypothesis is TRUE (i.e. the actual claim is wrong)
3. Check the validity of the Null hypothesis using the sample evidence
Based on the results:
If the Null Hypothesis is retained, the original claim (hypothesis) is rejected
If the Null Hypothesis is rejected, the original claim is retained
Detailed Process:
- Define the Null Hypothesis
- Calculate the standardized distance (in terms of number of standard errors) between the value of the parameter estimated from the sample (eg: Mean/Average) and the value stated in the Null hypothesis
Xbar : mean of the sample
mu : mean under the Null hypothesis
sigma : standard deviation
N : number of samples
standard error of the sampling distribution = sigma / sqrt(N)
standardized distance between the value from the sample and the hypothesis is :
(Xbar - mu) / (sigma / sqrt(N))
This is also called the test statistic value
- Calculate the p-Value :
What is p-Value : the conditional probability of observing a test statistic value at least as extreme as the calculated one when the Null hypothesis is True
i.e P(Observing a test statistic value at least as extreme as the calculated one | Null hypothesis is True)
The p-Value decreases as the test statistic value moves further into the rejection tail (eg: as it increases in a right-tailed test)
How to calculate the p-Value:
It depends on the distribution to be used (covered later)
- Decision:
Choose the significance level (alpha), generally 0.05, depending on the context of the problem. The Null hypothesis is rejected or retained depending on whether the p-value falls below this threshold (alpha) or not
The value of the test statistic at the significance level (alpha) is called the critical value
In a Right-Tailed Test ( if the Null Hypothesis is <= OR the alternative Hypothesis is > )
If test statistic value ( calculated statistic value ) > critical value => reject the Null hypothesis
This is the same as :
If p-value < alpha => reject the Null Hypothesis
The rejection region (rejection of the Null Hypothesis) is to the right of the critical value and has area alpha
In a Left-Tailed Test ( if the Null Hypothesis is >= OR the alternative Hypothesis is < )
If test statistic value ( calculated statistic value ) < critical value => reject the Null hypothesis
The rejection region (rejection of the Null Hypothesis) is to the left of the critical value and has area alpha
In a Two-Tailed Test ( if the Null Hypothesis is = OR the alternative Hypothesis is != )
The rejection region (rejecting the Null hypothesis) is split into an area of alpha/2 in each tail
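The detailed process above can be sketched in a few lines of Python. This is a minimal sketch, assuming the population standard deviation sigma is known and the standard normal distribution applies (a z-test); the notes leave the choice of distribution for later, so treat that as an assumption. The function name z_test and the sample numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_test(x_bar, mu0, sigma, n, alpha=0.05, tail="right"):
    """Return (test statistic, p-value, critical value, reject_null)."""
    std_error = sigma / sqrt(n)          # standard error = sigma / sqrt(N)
    z = (x_bar - mu0) / std_error        # (Xbar - mu) / (sigma / sqrt(N))
    std_normal = NormalDist()            # mean 0, standard deviation 1

    if tail == "right":                  # alternative hypothesis is ">"
        p_value = 1.0 - std_normal.cdf(z)
        critical = std_normal.inv_cdf(1.0 - alpha)
    elif tail == "left":                 # alternative hypothesis is "<"
        p_value = std_normal.cdf(z)
        critical = std_normal.inv_cdf(alpha)
    elif tail == "two":                  # alternative hypothesis is "!="
        p_value = 2.0 * (1.0 - std_normal.cdf(abs(z)))
        critical = std_normal.inv_cdf(1.0 - alpha / 2.0)  # compare with |z|
    else:
        raise ValueError("tail must be 'right', 'left' or 'two'")

    return z, p_value, critical, p_value < alpha


if __name__ == "__main__":
    # Hypothetical numbers: sample mean 10.6, hypothesized mean 10,
    # sigma 2, sample size 64, so the standard error is 0.25.
    z, p, crit, reject = z_test(10.6, 10.0, 2.0, 64, tail="right")
    print(f"z={z:.3f} p={p:.4f} critical={crit:.3f} reject_null={reject}")
```

With these made-up numbers the test statistic is about 2.4, which lies to the right of the critical value for alpha = 0.05, so the Null hypothesis is rejected. Comparing the p-value with alpha and comparing the statistic with the critical value lead to the same decision.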
Monday, April 1, 2019
yum install - same rpm of different architectures (32-bit and 64-bit) in same development environment
Requirement:
----------------
We have one development environment (or Docker container used for development).
For development we need the 64-bit package of a certain rpm installed.
Later, for some other requirements, we need the 32-bit package of the same rpm installed in the same environment (or Docker container).
Problem :
------------
Since the 64-bit version of the package was already installed, installing the 32-bit version returned the following error during "yum install".
For example, while installing the 32-bit version of libgcc, we got the following error message:
Error: Multilib version problems found. This often means that the root
cause is something else and multilib version checking is just
pointing out that there is a problem. Eg.:
1. You have an upgrade for libgcc which is missing some
dependency that another package requires. Yum is trying to
solve this by installing an older version of libgcc of the
different architecture. If you exclude the bad architecture
yum will tell you what the root cause is (which package
requires what). You can try redoing the upgrade with
--exclude libgcc.otherarch ... this should give you an error
message showing the root cause of the problem.
2. You have multiple architectures of libgcc installed, but
yum can only see an upgrade for one of those arcitectures.
If you don't want/need both architectures anymore then you
can remove the one with the missing update and everything
will work.
3. You have duplicate versions of libgcc installed already.
You can use "yum check" to get yum show these errors.
...you can also use --setopt=protected_multilib=false to remove
this checking, however this is almost never the correct thing to
do as something else is very likely to go wrong (often causing
much more problems).
Protected multilib versions: libgcc-4.4.7-4.el6.i686 != libgcc-4.4.7-3.el6.x86_64
You could try using --skip-broken to work around the problem
You could try running: rpm -Va --nofiles --nodigest
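Solution :
------------
As the "Protected multilib versions" line of the above error shows, the 32-bit package being installed (libgcc-4.4.7-4.el6.i686) and the 64-bit package already installed (libgcc-4.4.7-3.el6.x86_64) have different versions, which is why we get this error message.
The solution that worked was to install the 64-bit and 32-bit versions together in the same command, as shown below: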
yum install -y libgcc.x86_64 libgcc.i686
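The mismatch can be read straight from the "Protected multilib versions" line of the error. Below is a minimal sketch, assuming the package identifiers follow the usual name-version-release.arch layout; the regular expression and the helper name parse_package are illustrative and are not part of yum or rpm.

```python
import re

# name-version-release.arch, e.g. libgcc-4.4.7-4.el6.i686
PACKAGE_PATTERN = re.compile(
    r"^(?P<name>.+)-(?P<version>[^-]+)-(?P<release>[^-]+)\.(?P<arch>[^.]+)$"
)


def parse_package(identifier):
    """Split a package identifier into name, version, release and arch."""
    match = PACKAGE_PATTERN.match(identifier.strip())
    if match is None:
        raise ValueError(f"unrecognised package identifier: {identifier!r}")
    return match.groupdict()


# The line copied from the error message above
error_line = (
    "Protected multilib versions: "
    "libgcc-4.4.7-4.el6.i686 != libgcc-4.4.7-3.el6.x86_64"
)

# Take the text after the first colon and split the two sides of "!="
left_text, right_text = error_line.split(":", 1)[1].split("!=")
left = parse_package(left_text)
right = parse_package(right_text)

same_package = left["name"] == right["name"]
same_build = (left["version"], left["release"]) == (right["version"], right["release"])

if __name__ == "__main__":
    print(left)
    print(right)
    print("same package:", same_package, "| same version-release:", same_build)
```

Here both sides name the same package, but the release parts differ, so the two architectures are not at the same version. Installing both architectures in one yum command lets yum pick matching versions for them.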
Python - important commands
How to get the list of installed packages in your Python environment
using Pip
------------
Execute the following command in the environment whose installed Python packages you would like to list
> pip freeze
If you have multiple Python versions on the same machine, make sure that you are using the pip that belongs to the Python version you want the information about.
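A similar list can be produced from inside a running interpreter, which avoids any doubt about which pip belongs to which Python. This is a minimal sketch using the standard-library importlib.metadata module (Python 3.8+); the helper name installed_packages is hypothetical, and the output is only an approximation of pip freeze, which treats some kinds of installs differently.

```python
from importlib.metadata import distributions


def installed_packages():
    """Return sorted 'name==version' strings for this interpreter's packages."""
    entries = set()
    for dist in distributions():
        name = dist.metadata.get("Name")
        version = dist.metadata.get("Version")
        if name and version:  # skip broken installs with missing metadata
            entries.add(f"{name}=={version}")
    return sorted(entries, key=str.lower)


if __name__ == "__main__":
    for entry in installed_packages():
        print(entry)
```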
How to get the paths of the imported packages
# python 3
import sys
import pprint

# pretty print loaded modules
pprint.pprint(sys.modules)
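To see only the file paths rather than the full module objects, the __file__ attribute of each loaded module can be collected. This is a minimal sketch; the helper name imported_module_paths is hypothetical, and built-in modules are left out because they have no file on disk.

```python
import sys


def imported_module_paths():
    """Map each loaded module name to the file it was loaded from."""
    paths = {}
    # Copy the items first: sys.modules may change while we iterate
    for name, module in list(sys.modules.items()):
        path = getattr(module, "__file__", None)
        if path:  # built-in modules have no __file__
            paths[name] = path
    return paths


if __name__ == "__main__":
    import json  # any imported module will show up in the result

    for name, path in sorted(imported_module_paths().items()):
        print(f"{name}: {path}")
```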