Friday, July 5, 2019
Docker Image creation using Dockerfile
Dockerfile :
A Dockerfile is used to build a new Docker image.
The structure, the different sections, and the commonly used instructions in a Dockerfile are:
- Base Image [FROM]
- Contains the base image (downloaded from a repository) that is used to create the new image. The base image is referred to in this section as follows:
- FROM baseImageFullPath:tag
- where baseImageFullPath includes the repository name and the image name
- Arguments [ARG]
- We can supply a set of user-defined arguments that are useful for building different versions of an image from one uniform Dockerfile.
- For example, we can have an argument like VERSION that is used to build the image from different versions of the base image. Similarly, we can have arguments to pass the PROXY SERVER, INSTALLATION DIRECTORY, BIN DIRECTORY, etc.
- Arguments are declared in the Dockerfile as: ARG ArgumentName
- Argument values are passed to the Dockerfile while building the image, using the docker build command: docker build --build-arg ArgumentName=ArgumentValue --build-arg ArgumentName2=ArgumentValue2 ...
- Environment variables [ENV]:
- Variables that are available as environment variables when a container is started from the image.
- Environment variables are defined in the Dockerfile as: ENV EnvironmentVariableName=Value
- Installing new packages or software into the image [RUN, COPY, ADD, ...]
- Depending on the requirements, we might need to add new packages, code, or data to the new image.
- The RUN instruction is used to execute any Unix command during image creation. For example, if we want to install a new Python package, we can use RUN pip install [packageName]. Similarly, if we want to download, untar, and install any new software from the internet, we can chain the respective commands with RUN, as in RUN curl -O -L https://......./ && tar -xvf <> ...
- NOTE: To reduce the size of the final image, it is advisable to combine consecutive RUN instructions into a single RUN instruction. Each RUN instruction creates an intermediate layer, so combining them reduces the intermediate cache and layers.
- Similar to RUN, there are COPY and ADD instructions that can be used to copy files from the local machine into the image.
- Clean up
- Once the required packages are installed, it is good practice to remove the temporary and intermediate files created during image creation. Depending on the image OS type, we can use the appropriate cleanup commands, such as yum remove or rm.
- Working directory [WORKDIR] : specifies the current working directory when a container is started from the image
- Startup command [CMD] : specifies the default startup command (or the default arguments to ENTRYPOINT)
- Entry point [ENTRYPOINT] : specifies the command or script that is executed when the container starts, as part of the startup command
- Port exposure [EXPOSE] : specifies any ports that are to be exposed to the outside
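Putting the sections above together, a minimal Dockerfile sketch (the base image, package, directory names, and port here are made-up examples, not a prescribed layout):

```dockerfile
# Build-time argument, usable by FROM:
#   docker build --build-arg VERSION=3.9 -t myimage .
ARG VERSION=3.8
FROM python:${VERSION}

# Environment variable available inside running containers
ENV APP_HOME=/opt/app

# Consecutive commands combined into one RUN to reduce layers
RUN pip install --no-cache-dir requests \
 && rm -rf /tmp/*

# Copy application code from the local machine into the image
COPY . ${APP_HOME}

WORKDIR ${APP_HOME}
EXPOSE 8080
ENTRYPOINT ["python"]
CMD ["app.py"]
```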
Useful Docker Commands
1. To list all Docker images in the system
docker images
2. To list all containers that are currently running
docker ps
3. Tagging a Docker image with some other name
docker tag
4. To know more details (like tags, base, ...) and the layers of a Docker image
docker inspect
5. To Pull an Image from the repository to local system
docker pull
6. To Push an Image to the repository
docker push
7. To enter into an already running container
docker exec -it
8. To start a container from an image that is not running, and enter into it
docker run -it
9. To copy files or directories between a container and the local file-system
Step 1 : get the Docker container id using the docker ps command
Step 2 : using the docker cp command, copy the files or directories to/from the container, using the Docker container id as the host machine
eg: copying to the container
docker cp file-name docker-container-id:/destination-path
copying from the container
docker cp docker-container-id:/src-path/file destination-path
Tuesday, April 2, 2019
Hypothesis Testing for ML claims
Hypothesis Testing
What is a Hypothesis:
It is a claim made by a person or organization.
eg: The average salary of an IT employee with 4 years of experience is 10L in India.
What is Hypothesis Testing:
It is a process used to either reject or accept the hypothesis.
How:
Before understanding the process, we need to define the following two hypotheses from the original hypothesis:
1. Null Hypothesis : the original hypothesis (claim) is wrong, i.e. not true
2. Alternative Hypothesis : the complement of the Null Hypothesis
After defining the Null Hypothesis, hypothesis testing is limited to either rejecting or retaining the Null Hypothesis (i.e. the original hypothesis is rejected or accepted).
This process is explained below:
1. Define or identify the Null Hypothesis of the claim
2. Initially start with the assumption that the Null Hypothesis is TRUE (i.e. the actual claim is wrong)
3. Check the validity of the Null Hypothesis using a sample of evidence
Based on the results:
If the Null Hypothesis is retained, the original claim (hypothesis) is rejected
If the Null Hypothesis is rejected, the original claim is retained
Detailed Process:
- Define the Null Hypothesis
- Calculate the standardized distance between the estimated value and the hypothesized value
- i.e. calculate the standardized difference between the estimated value (eg: mean/average) and the hypothesized value (mean/average)
- Calculate the standard distance (in terms of the number of standard deviations) between the parameter value estimated from the samples and the value under the Null Hypothesis
xBar : mean of the samples
mu : mean under the Null Hypothesis
sigma : standard deviation
N : number of samples
standard error of the sampling distribution = sigma / sqrt(N)
standardized distance between the sample value and the hypothesized value:
z = (xBar - mu) / (sigma / sqrt(N))
This is also called the test statistic value
- Calculate the p-value :
What is the p-value : the conditional probability of observing a test statistic value at least as extreme as the calculated one, given that the Null Hypothesis is true
i.e. P(observing the test statistic value | Null Hypothesis is true)
The p-value decreases as the magnitude of the test statistic increases
How to calculate the p-value:
It depends on the distribution to be used (covered later)
- Decision:
Choose a significance level (alpha), generally 0.05, depending on the context of the problem. The rejection or acceptance of the Null Hypothesis depends on whether the p-value crosses the threshold value (alpha) or not.
The test statistic value at the significance level (alpha) is called the critical value.
In a Right-Tailed Test (if the Null Hypothesis is <= OR the Alternative Hypothesis is >):
If the calculated test statistic value > critical value => reject the Null Hypothesis
This is the same as:
If the p-value < alpha => reject the Null Hypothesis
The rejection region (for rejecting the Null Hypothesis) is on the right tail, beyond the critical value
In a Left-Tailed Test (if the Null Hypothesis is >= OR the Alternative Hypothesis is <):
If the calculated test statistic value < critical value => reject the Null Hypothesis
The rejection region is on the left tail, beyond the critical value
In a Two-Tailed Test (if the Null Hypothesis is = OR the Alternative Hypothesis is !=):
The rejection region is alpha/2 on each side
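A minimal sketch of the detailed process above as a one-sample z-test in Python (standard library only; the sample mean, sigma, and N are made-up numbers for the salary example, assuming a known population sigma):

```python
import math

def z_test(x_bar, mu, sigma, n):
    """Return the z statistic and two-tailed p-value for testing mean == mu."""
    se = sigma / math.sqrt(n)        # standard error of the sampling distribution
    z = (x_bar - mu) / se            # standardized distance = test statistic
    # Normal CDF via the error function: Phi(z) = 0.5 * (1 + erf(z / sqrt(2)))
    phi = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    p = 2 * (1 - phi)                # two-tailed p-value
    return z, p

# Hypothesized mean salary 10L; suppose 100 samples give mean 10.5L with sigma 2L
z, p = z_test(x_bar=10.5, mu=10.0, sigma=2.0, n=100)
print(f"z = {z:.2f}, p = {p:.4f}")   # here p < 0.05, so the null hypothesis is rejected
```

Note how a larger |z| drives the p-value down, matching the observation above that the p-value shrinks as the test statistic grows.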
Monday, April 1, 2019
yum install - same rpm of different architectures (32-bit and 64-bit) in same development environment
Requirement:
----------------
We have one development environment (or a Docker container used for development).
For development we need the 64-bit package of a certain rpm installed.
Later, for some other requirement, we need the 32-bit package of the same rpm installed in the same environment (or Docker container).
Problem :
------------
Since the 64-bit version of the package was already installed, installing the 32-bit version returned the following error during "yum install".
For example, while installing the 32-bit version of libgcc, we got the following error message:
Error: Multilib version problems found. This often means that the root
cause is something else and multilib version checking is just
pointing out that there is a problem. Eg.:
1. You have an upgrade for libgcc which is missing some
dependency that another package requires. Yum is trying to
solve this by installing an older version of libgcc of the
different architecture. If you exclude the bad architecture
yum will tell you what the root cause is (which package
requires what). You can try redoing the upgrade with
--exclude libgcc.otherarch ... this should give you an error
message showing the root cause of the problem.
2. You have multiple architectures of libgcc installed, but
yum can only see an upgrade for one of those arcitectures.
If you don't want/need both architectures anymore then you
can remove the one with the missing update and everything
will work.
3. You have duplicate versions of libgcc installed already.
You can use "yum check" to get yum show these errors.
...you can also use --setopt=protected_multilib=false to remove
this checking, however this is almost never the correct thing to
do as something else is very likely to go wrong (often causing
much more problems).
Protected multilib versions: libgcc-4.4.7-4.el6.i686 != libgcc-4.4.7-3.el6.x86_64
You could try using --skip-broken to work around the problem
You could try running: rpm -Va --nofiles --nodigest
Solution :
------------
As highlighted in the above error, we get this error message because the two versions are different.
The solution that worked is installing the 64-bit and 32-bit versions together at the same time, as mentioned below:
yum install -y libgcc.x86_64 libgcc.i686
Python - important commands
How to get the list of installed packages in your Python
using pip
------------
Execute the following command in the environment where you would like to know the list of installed Python packages:
> pip freeze
If you have multiple Python versions on the same machine, make sure that you are using the pip of the version you want the information about.
How to get the paths of the imported packages
# Python 3
import sys
import pprint
# pretty print the loaded modules (the values show where each module was loaded from)
pprint.pprint(sys.modules)
Thursday, June 21, 2018
Elastic Search - blog 1 ( inverted Index )
Elasticsearch uses the inverted index data structure.
Inverted index representation of the following sample documents:
Doc1: This is first sample document
Doc2: Second document for the Inverted Index
Doc3: Final sample document
Inverted Index of the above three documents
term        frequency   referred documents
this        1           1
is          1           1
first       1           1
sample      2           1, 3
document    3           1, 2, 3
second      1           2
for         1           2
the         1           2
inverted    1           2
index       1           2
final       1           3
Index :
A regular (forward) index lists the terms in a specific document; an inverted index maps each term to the documents that contain it.
Some advantages of the inverted index :
getting the list of all documents that contain a given term or terms
AND and OR queries over terms
prefix-based searching
suffix-based searching (store the reversed terms, reverse the search suffix, and search by prefix;
example: original term : fantastic, search suffix : astic, then
reversed term : citsatnaf, reversed search suffix : citsa; now do a prefix-based search, i.e. find all reversed terms starting with the reversed search suffix)
finding substrings (by splitting the terms into n-grams and searching those)
number searches, e.g. between 100 and 199 (Lucene stores 123 as "1"-hundreds, "2"-tens and "3", so searching for 100 to 199 will get all terms with the prefix "1"-hundreds and will avoid other numbers like 1234)
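The inverted index above can be sketched in a few lines of Python (the tokenization here is a naive lowercase split, just for illustration; real analyzers do far more):

```python
from collections import defaultdict

# The three sample documents from the table above
docs = {
    1: "This is first sample document",
    2: "Second document for the Inverted Index",
    3: "Final sample document",
}

# term -> set of ids of documents containing the term (the posting list)
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.lower().split():   # naive tokenization: lowercase + split
        index[term].add(doc_id)

def search(term):
    """Return the sorted list of document ids containing the term."""
    return sorted(index.get(term.lower(), set()))

print(search("document"))                            # found in all three documents
print(search("sample"))                              # found in documents 1 and 3
# AND of two terms = intersection of their posting lists
print(sorted(index["sample"] & index["document"]))
```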
References :
https://www.elastic.co/blog/found-elasticsearch-from-the-bottom-up
Tuesday, October 1, 2013
Workflow solution creation , registration and usage in CRM2011
In this blog I would like to explain some practical steps about the Workflow lifecycle in CRM 2011.
The steps include
1. Creation of the workflow solution ( C# solution ) and generation of the assembly
2. Registering the workflow assembly for CRM usage
3. Using the workflow assembly in CRM
1. Creation of the workflow solution and generation of the assembly
A workflow can be defined as a group of activities that performs the required process.
a) Creating the activities of the workflow steps in the C# solution
Workflow activities are defined in System.Activities.dll of the .NET Framework, so we need to add a reference to this dll in the workflow solution.
Define the class :
The workflow activities that are used in CRM are derived from the "CodeActivity" base class,
so define the workflow activity class as derived from the "CodeActivity" class.
eg : public class Sampleworkflow : CodeActivity { }
Override the abstract methods of the CodeActivity class :
Execute() Method:
The CodeActivity class declares the Execute() method as abstract, so all derived classes, i.e. the workflow activity classes, must implement it.
eg: protected override void Execute(CodeActivityContext executionContext)
Some common attributes and code useful in the Execute() method are:
1. IWorkflowContext (defined in Microsoft.Xrm.Sdk.Workflow.dll), which is derived from IExecutionContext, contains the details about the CRM attributes and parameters that can be used in the workflow execution. We can get the IWorkflowContext as follows:
IWorkflowContext context = executionContext.GetExtension<IWorkflowContext>();
Using the IWorkflowContext, we can obtain CRM context values like the current record's GUID, the current user's GUID, OrganizationName, OrganizationId, etc., as follows:
getting the current record Guid : context.PrimaryEntityId;
getting the current user Guid : context.UserId;
2. Organization Service and Organization Service Factory:
To execute any CRM SDK APIs, we need the organization service or the OrganizationServiceProxy. These objects can be created by getting the IOrganizationServiceFactory object from the CodeActivityContext and then creating the service/service proxy by passing the user Guid to the service factory:
IOrganizationServiceFactory serviceFactory = executionContext.GetExtension<IOrganizationServiceFactory>();
IOrganizationService service = serviceFactory.CreateOrganizationService(context.UserId);
How to define and extract the input and output parameters required in the workflow execution:
Defining the parameters: the input and output parameters for the workflow activity can be defined in the workflow activity class as follows:
a) String data type as input
[Input("Account ID")]
[Default("0")]
public InArgument<string> AccountName { get; set; }
Accessing the parameters : input and output parameters can be obtained through the CodeActivityContext passed to the Execute() method.
string accountName = AccountName.Get(executionContext);
where executionContext is the argument of type CodeActivityContext passed to Execute()
b) Entity Reference type as input
[Input("EntityReference input")]
[ReferenceTarget("account")]                                  // target entity logical name; "account" is just an example
[Default("00000000-0000-0000-0000-000000000000", "account")]  // example default record GUID and entity name
public InArgument<EntityReference> AccountRef { get; set; }   // property name is illustrative
This data type can be accessed and assigned to a local variable of type Guid within the Execute() method as :
Guid localGuid = AccountRef.Get(executionContext).Id;
After defining the execute() method with the required functionality for the workflow activity, compile and generate the workflow dll.
This dll needs to be registered with the CRM. The steps that are to be followed for workflow registration into CRM 2011 will be available in my next blog on Registering the workflow assembly for CRM usage