81015 A few days ago I started work on the last project in the Udacity Data Analyst Nanodegree which is about A/B Testing.

An A/B test is commonly used to determine which of two possible options has a higher probability of producing a desired result. In this project a hypothetical company operates a website and collects data about the conversion rate (CR) of its landing page, a page designed to receive new viewers and lead them through the marketing funnel toward a sale or some other conversion, such as signing up for a course.

We begin by defining some terms used in this analysis. The process itself, as mentioned earlier, is called an A/B test. It is built on hypothesis testing, which sets two competing theories against each other: the null hypothesis and the alternative hypothesis.

The null hypothesis, usually written H_0, represents our belief about the current situation: in this case, the probability that a new viewer of the existing landing page will be converted into a student enrolled in one or more courses. The null hypothesis is usually associated with the control group, the group that the results of the alternative are compared against.

The alternative hypothesis, on the other hand, is written H_1 and is associated with another possible landing page that we have created and collected data on as well. The test is run for long enough to collect the data needed for a statistical comparison of the conversion rates under the two hypotheses.

The data analyst’s job starts with preparing the data and getting it into a form that can be analyzed. The analyst then performs statistical calculations on the data to determine which of the two pages is more likely to provide the better conversion rate.
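The comparison described above can be sketched as a one-sided two-proportion z-test. This is only a minimal illustration using the Python standard library rather than the statsmodels functions used in the project; the function name `two_proportion_ztest` and the visitor and conversion counts are made up for the example and are not the project's data.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ztest(conv_old, n_old, conv_new, n_new):
    """One-sided z-test of H_0: p_new <= p_old against H_1: p_new > p_old."""
    p_old = conv_old / n_old
    p_new = conv_new / n_new
    # Pooled conversion rate: under H_0 both pages are assumed to convert equally.
    p_pool = (conv_old + conv_new) / (n_old + n_new)
    std_err = sqrt(p_pool * (1 - p_pool) * (1 / n_old + 1 / n_new))
    z = (p_new - p_old) / std_err
    # Probability of a z-score at least this large if H_0 were true.
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value


# Hypothetical counts chosen for illustration only.
z, p_value = two_proportion_ztest(conv_old=1200, n_old=10000,
                                  conv_new=1260, n_new=10000)
print(f"z = {z:.3f}, p-value = {p_value:.3f}")
```

If the p-value is larger than the chosen Alpha rate, we fail to reject the null hypothesis and keep the old page.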

A simple A/B test generally involves a single explanatory variable with two levels (which version of the page a viewer saw), whereas a more complicated analysis, such as one using logistic regression, might involve multiple explanatory variables as well as possible interactions between two or more of them.

Another important feature of the A/B testing methodology is the pair of Alpha and Beta rates. The Alpha rate is the probability of a Type I error (rejecting a null hypothesis that is actually true) that we are willing to accept. An example of a Type I error would be concluding that someone is guilty of a crime when in fact they are innocent, so that we convict an innocent person and potentially send them to prison.

Conversely, a Type II error is the reverse (failing to reject a null hypothesis that is actually false), and its probability is the Beta rate. An example of this kind of error is concluding that a person who is actually guilty of a crime is innocent, so that we incorrectly let a guilty individual go free. Although this is also a mistake, it is generally not considered as severe as sending an innocent person to prison or, in some cases, to death.
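To make the two rates concrete, here is a rough sketch of how the Beta rate could be approximated for a one-sided two-proportion z-test once an Alpha rate is chosen. It relies on a normal approximation and assumes equal group sizes; the function name `type_ii_error_rate` and the conversion rates passed to it are assumptions made for illustration, not values from the project.

```python
from math import sqrt
from statistics import NormalDist


def type_ii_error_rate(p_old, p_new, n_per_group, alpha=0.05):
    """Approximate Beta for a one-sided test of H_1: p_new > p_old."""
    norm = NormalDist()
    # Critical z-score: we reject H_0 only above this value.
    z_crit = norm.inv_cdf(1 - alpha)
    # Standard error of the difference if H_0 is true (both rates equal p_old).
    se_null = sqrt(2 * p_old * (1 - p_old) / n_per_group)
    # Standard error of the difference if the true rates are p_old and p_new.
    se_alt = sqrt((p_old * (1 - p_old) + p_new * (1 - p_new)) / n_per_group)
    # Smallest observed difference that would lead us to reject H_0.
    crit_diff = z_crit * se_null
    # Beta: chance the observed difference falls below that threshold
    # even though the new page really converts at p_new.
    return norm.cdf((crit_diff - (p_new - p_old)) / se_alt)


# Hypothetical rates: 12% for the old page, 13% for the new page.
for n in (1000, 5000, 20000):
    beta = type_ii_error_rate(0.12, 0.13, n)
    print(f"n per group = {n}: beta = {beta:.3f}, power = {1 - beta:.3f}")
```

With Alpha held fixed, collecting more data lowers Beta, which is one reason the test has to run for a period of time before the two pages can be compared.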

So far I have made good progress on this project and should be done within a few more days.

81022 Successfully completed the 4th project of the Udacity Data Analyst Nanodegree Term 1 yesterday. This project was on the topic of A/B testing, using a data set of 294,478 rows and 5 columns that was eventually joined with a second data set. The project develops a better understanding of hypothesis testing using null and alternative hypotheses, confidence intervals, and Type I and Type II error rates (the Alpha and Beta rates). The project flowed from manual probability calculations, to z-test and p-value functions from the ‘statsmodels.api’ library, and then to multivariable logistic regression analysis, with statistical significance determined from the results of all three. The project also involved bootstrapping simulated data using a numpy random binomial generator. The next step is to start Term 2 and continue with Term 1 of the Deep Learning Nanodegree.
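The simulation step can be sketched roughly as follows. The project used a numpy random binomial generator; this version stands in for it with the standard library's `random` module so that it runs without extra packages. The function name `simulate_null_diffs`, the conversion rate, the sample sizes, and the observed difference are all made up for illustration.

```python
import random


def simulate_null_diffs(p_null, n_old, n_new, n_sims, seed=42):
    """Simulate differences in conversion rate when both pages convert at p_null."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_sims):
        # Each visitor converts with probability p_null (one binomial draw per page).
        conv_old = sum(rng.random() < p_null for _ in range(n_old))
        conv_new = sum(rng.random() < p_null for _ in range(n_new))
        diffs.append(conv_new / n_new - conv_old / n_old)
    return diffs


# Hypothetical observed difference between the new and old page.
observed_diff = 0.126 - 0.120

diffs = simulate_null_diffs(p_null=0.123, n_old=500, n_new=500, n_sims=1000)

# Share of simulated differences at least as large as the observed one.
p_value = sum(d >= observed_diff for d in diffs) / len(diffs)
print(f"simulated p-value = {p_value:.3f}")
```

The simulated p-value plays the same role as the one from the z-test: it estimates how often a difference this large would show up if the null hypothesis were true.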

81021 Submitted my fourth and final project for the first term in the Udacity Data Analyst Nanodegree that I’ve been working on for several months now. Hopefully it will pass the requirements.

81006 It’s been a frustrating day concerning the progress on my 3rd project in the Udacity Data Analyst Nanodegree. The problem started this afternoon when the Jupyter notebook I have been working in for about a month crashed. In other words, I am getting an error message saying that the kernel died, and although it tries to restart, that’s not happening.

I’ve submitted an issue with the jupyter help on GitHub after researching the issue online and not finding the correct answer. Hopefully somebody will be contacting me soon on how to resolve the issue.

This all happened right after I updated Seaborn, but I am not sure whether that had anything to do with it. It might also have been caused by Apple’s Xcode, which I’ve had conflicts with many times before. Who knows?

Update 3 hrs later: After removing Anaconda and Seaborn and then reinstalling Anaconda, the kernel is working again. I’ve submitted the 3rd project for the second time, passed the requirements, and am now moving on to the fourth and last project in term one.

Update 12 hours later: The project submission received a successful review since my last update and I’ve resumed working on the fourth and final project A/B testing which will complete this Nanodegree. Two out of three down and one to go.

It’s been several months now since I began three programs with Udacity, which is based in Mountain View, California. The first was the Flying Car Nanodegree. The second is the Data Analyst Nanodegree. The third is the Deep Learning Nanodegree.

The Flying Car Nanodegree, FCND for short, was quite challenging, involving writing code for linear algebra relating to physics and geometry. I had doubts I’d complete this nanodegree up until the last project submission. Fortunately, I completed it, and my persistence paid off after many months.

Now I am in the last days before completing the Data Analyst Nanodegree and looking forward to trying to complete the Deep Learning one also.

I highly recommend enrolling in an online course about whatever topic interests you. Completing an online course is highly rewarding just for the new knowledge you will have gained. The only drawback, for me at least, is that these courses have taken a huge amount of my time, and I’ve had to give up a lot of activities to make the time I needed for nearly all of the online courses that I’ve taken in the last ten years.

Getting back to the Data Analyst Nanodegree, also referred to as the DAND: this program is divided into two terms, and I am almost done with the first one. The first term is broken into three modules: introduction to Python, introduction to data analysis, and practical statistics. Each module of instruction includes a project which requires the application of the concepts and techniques covered in the module. Completing a nanodegree should therefore give you the confidence needed to become proficient in the course topic.

Since I began taking courses involving computer programming I’ve been amazed at the results that can be derived from developing a software program. That’s it for today’s thoughts on data analysis. Thanks for stopping by!

Beginning October 2018, this blog will discuss the programming of autonomous agents, artificial intelligence, data science and analysis, and the supporting tools that help accomplish tasks in these domains.



The video clip above was my submission for Project 2 of the Udacity Flying Car Nanodegree.

This is a demonstration of autonomous 3D flight planning, using the A-star search algorithm to find a path for a simulated quadrotor through a simulated 3D section of downtown San Francisco, California.

This solution needs some improvement: reducing the number of nodes along the path would eliminate the need for the quadrotor to double back when it overshoots a given node. The process of reducing the number of waypoints or nodes is called path pruning, and there are several techniques available for achieving this.
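As a rough illustration of the planning and pruning ideas above, here is a toy 2D grid version: an A-star search followed by a collinearity check, which is one of several possible pruning techniques. This is not the project's planner, which works in 3D; the grid, the function names `a_star` and `prune_collinear`, and the 4-connected unit-cost moves are assumptions made for the sketch.

```python
import heapq


def a_star(grid, start, goal):
    """Return a list of (row, col) cells from start to goal, or None if unreachable.

    grid[r][c] == 1 marks an obstacle; moves are 4-connected with unit cost.
    """
    def heuristic(cell):
        # Manhattan distance never overestimates for 4-connected unit-cost moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    frontier = [(heuristic(start), 0, start)]
    came_from = {start: None}
    cost_so_far = {start: 0}
    while frontier:
        _, cost, current = heapq.heappop(frontier)
        if current == goal:
            # Walk back through the parents to rebuild the path.
            path = []
            while current is not None:
                path.append(current)
                current = came_from[current]
            return path[::-1]
        if cost > cost_so_far[current]:
            continue  # stale queue entry, a cheaper route was already found
        r, c = current
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if not (0 <= nr < len(grid) and 0 <= nc < len(grid[0])):
                continue
            if grid[nr][nc] == 1:
                continue
            new_cost = cost + 1
            if new_cost < cost_so_far.get(nxt, float("inf")):
                cost_so_far[nxt] = new_cost
                came_from[nxt] = current
                heapq.heappush(frontier, (new_cost + heuristic(nxt), new_cost, nxt))
    return None


def prune_collinear(path):
    """Drop any waypoint that lies on the straight line between its neighbours."""
    if len(path) < 3:
        return list(path)
    pruned = [path[0]]
    for candidate, nxt in zip(path[1:], path[2:]):
        prev = pruned[-1]
        # A zero cross product means prev, candidate and nxt are collinear.
        cross = ((candidate[0] - prev[0]) * (nxt[1] - prev[1])
                 - (candidate[1] - prev[1]) * (nxt[0] - prev[0]))
        if cross != 0:
            pruned.append(candidate)
    pruned.append(path[-1])
    return pruned


# Toy map: a wall across the middle row with a single gap on the right.
grid = [
    [0, 0, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
]
path = a_star(grid, (0, 0), (2, 0))
waypoints = prune_collinear(path)
print(f"{len(path)} cells in the raw path, {len(waypoints)} waypoints after pruning")
```

Only the corners of the route survive pruning, so a vehicle following the waypoints has fewer intermediate nodes to overshoot.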