A few days ago I started work on the last project in the Udacity Data Analyst Nanodegree, which is about A/B testing.

An A/B test is commonly used to determine which of two options has a higher probability of producing a desired result. In this project, a hypothetical company operates a website and collects data on the conversion rate (CR) of its landing page. A landing page is a page designed to receive new visitors and lead them through the marketing funnel toward a conversion of some kind, such as signing up for a course or making a purchase.

We begin by defining some terms used in the analysis. The process itself, as mentioned earlier, is called an A/B test. It is built on a method of analysis that sets two competing theories against each other: the null hypothesis and the alternative hypothesis.

The null hypothesis, usually written H_0, represents our belief about the current situation. In this case, that belief concerns the probability that a new visitor to the existing landing page will be converted into a student enrolled in one or more courses. The null hypothesis is usually associated with the control group, the group against which the results of the alternative are compared.

The alternative hypothesis, written H_1, is associated with another landing page that we have created and collected data on as well. The test is run for a period of time in order to collect enough data to allow a statistical comparison of the conversion rates under the two hypotheses.

The data analyst’s job begins with preparing the data and getting it into a form that can be analyzed. The analyst then performs statistical calculations on the data to determine which of the two pages is more likely to provide the better conversion rate.
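One common way to carry out that comparison is a two-proportion z-test. The sketch below is only an illustration of the idea, not the method the project prescribes: the function name and the visitor counts are made up, and it assumes a one-sided test in which H_1 says the new page converts better than the old one.

```python
import math

def two_proportion_z_test(conv_old, n_old, conv_new, n_new):
    """One-sided z-test of H_0: p_new <= p_old against H_1: p_new > p_old."""
    p_old = conv_old / n_old
    p_new = conv_new / n_new
    # Pooled conversion rate, used because H_0 assumes no difference.
    p_pool = (conv_old + conv_new) / (n_old + n_new)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_old + 1 / n_new))
    z = (p_new - p_old) / se
    # Upper-tail probability of the standard normal distribution.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return p_old, p_new, z, p_value

# Hypothetical counts, invented for illustration only.
p_old, p_new, z, p_value = two_proportion_z_test(120, 1000, 150, 1000)
print(f"old CR={p_old:.3f}  new CR={p_new:.3f}  z={z:.2f}  p-value={p_value:.4f}")
```

If the p-value falls below the chosen significance level, we reject H_0 in favor of the new page; otherwise we keep the old one.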

A simple A/B test generally involves a single explanatory variable with two levels (here, the old page versus the new page), whereas a more complicated analysis, such as one using logistic regression, might involve multiple explanatory variables as well as possible interactions between two or more of them.
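To see how the two approaches connect, consider the simplest case, where the only explanatory variable is which page a visitor saw. A logistic regression with that single binary variable has a closed-form fit: the intercept is the log-odds of converting on the old page, and the slope is the difference in log-odds between the pages. The sketch below uses hypothetical counts and a made-up function name, and it does not replace a proper regression library once more variables are involved.

```python
import math

def logit(p):
    """Log-odds of a probability p."""
    return math.log(p / (1 - p))

def fit_single_binary_predictor(conv_old, n_old, conv_new, n_new):
    """Maximum likelihood fit of logit(p) = intercept + slope * is_new_page."""
    p_old = conv_old / n_old
    p_new = conv_new / n_new
    intercept = logit(p_old)
    slope = logit(p_new) - logit(p_old)
    return intercept, slope

# Hypothetical counts, invented for illustration only.
intercept, slope = fit_single_binary_predictor(120, 1000, 150, 1000)
odds_ratio = math.exp(slope)
print(f"intercept={intercept:.3f}  slope={slope:.3f}  odds ratio={odds_ratio:.3f}")
```

An odds ratio above 1 means the odds of converting are higher on the new page. Adding further explanatory variables or interactions requires fitting the model numerically.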

Another important feature of the A/B testing methodology is the alpha and beta rates. The alpha rate is the probability of a Type I error that we are willing to accept, where a Type I error means rejecting the null hypothesis when it is actually true. A courtroom example, with innocence as the null hypothesis, would be concluding that someone is guilty of a crime when in fact they are innocent, so that we convict an innocent person and potentially send them to prison.

Conversely, a Type II error is the reverse: failing to reject the null hypothesis when it is actually false. Its probability is represented by the beta rate. An example of this kind of error is concluding that a person who is actually guilty of a crime is innocent, so that we incorrectly let a guilty individual go free. Although this is also a mistake, it is generally not considered as severe as sending an innocent person to prison or, in some cases, to death.
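The two error rates are linked: once alpha is fixed, beta depends on how large the true difference is and how much data we collect. The sketch below approximates beta for a one-sided comparison of two conversion rates using the normal approximation. The function name, the rates, and the sample sizes are all assumptions chosen for illustration.

```python
import math
from statistics import NormalDist

def type_ii_error_rate(p_old, p_new, n_per_group, alpha=0.05):
    """Approximate beta for a one-sided two-proportion z-test."""
    norm = NormalDist()
    # Critical z value: we reject H_0 only above this threshold.
    z_crit = norm.inv_cdf(1 - alpha)
    # Standard error of the difference under H_0 (pooled rate).
    p_bar = (p_old + p_new) / 2
    se_null = math.sqrt(2 * p_bar * (1 - p_bar) / n_per_group)
    # Standard error of the difference under H_1 (separate rates).
    se_alt = math.sqrt(
        p_old * (1 - p_old) / n_per_group + p_new * (1 - p_new) / n_per_group
    )
    # Beta is the chance the observed difference stays below the
    # rejection threshold even though the new page really is better.
    return norm.cdf((z_crit * se_null - (p_new - p_old)) / se_alt)

# Hypothetical rates and sample sizes, invented for illustration only.
beta_small = type_ii_error_rate(0.12, 0.15, n_per_group=1000)
beta_large = type_ii_error_rate(0.12, 0.15, n_per_group=5000)
print(f"beta with 1000 per group: {beta_small:.3f}")
print(f"beta with 5000 per group: {beta_large:.3f}")
```

Collecting more data lowers beta without changing alpha, which is one reason the test has to run long enough before a decision is made.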

So far I have made good progress on this project and should be done within a few more days.