If you’ve ever taken a graduate-level statistics class, your instructor most probably began the course with the Central Limit Theorem (CLT). Before we get started, a refresher on the CLT:

Regardless of the distribution of the population, the sampling distribution of the sample mean is approximately normal, provided that the population has finite variance, the samples are drawn at random with replacement, and the sample size is sufficiently large (n ≥ 30 is a common rule of thumb).

If the population is itself normally distributed, the sampling distribution of the sample mean is normal even for small sample sizes.

As we increase our sample size, the mean of the sampling distribution…
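To see the refresher in action, here is a minimal simulation sketch in Python (standard library only). The exponential population, the seed, the sample sizes, and the helper name `sample_means` are all assumptions picked for illustration, not anything taken from the course material. It draws repeated samples with replacement from a clearly skewed population and compares the spread of the sample means with σ/√n.

```python
import random
import statistics


def sample_means(population, n, draws, rng):
    """Return `draws` sample means, each from n values drawn with replacement."""
    return [statistics.fmean(rng.choices(population, k=n)) for _ in range(draws)]


rng = random.Random(42)  # fixed seed (assumed) so the run is repeatable

# A deliberately non-normal, right-skewed population: exponential with mean 1.
population = [rng.expovariate(1.0) for _ in range(100_000)]
pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)

# Sample means for a small, a "rule of thumb", and a large sample size.
results = {n: sample_means(population, n, draws=2_000, rng=rng) for n in (5, 30, 200)}

for n, means in results.items():
    print(
        f"n={n:>3}  mean of means={statistics.fmean(means):.3f}  "
        f"sd of means={statistics.stdev(means):.3f}  "
        f"sigma/sqrt(n)={pop_sd / n ** 0.5:.3f}"
    )
```

If the theorem holds, the mean of the sample means should sit close to the population mean for every n, and the standard deviation of the sample means should shrink roughly like σ/√n as n grows. A histogram of `results[30]` should already look far more bell-shaped than the population itself.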

Clustering is an unsupervised machine learning technique that groups data points based on their similarities. We will be focusing on perhaps the most used (or abused) technique, K-means clustering, where…

We’re predicting diamonds today, care to join?

This dataset ships with R’s ggplot2 package. We intend to predict diamond prices from the available features. There are 53,940 records in the dataset. As a ritual, let’s split the data into training (70%) and test (30%) sets.
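As a language-neutral illustration of the 70/30 split, here is a minimal sketch in Python that shuffles row indices rather than the data itself. The seed and the variable names are assumptions made for this example; only the record count and the split ratio come from the text above.

```python
import random

N_RECORDS = 53_940      # record count stated above
TRAIN_FRACTION = 0.70   # 70% train, 30% test

rng = random.Random(123)  # arbitrary seed (assumed) so the split is reproducible

# Shuffle row indices so both sets are random draws from the full dataset.
indices = list(range(N_RECORDS))
rng.shuffle(indices)

n_train = round(N_RECORDS * TRAIN_FRACTION)
train_idx = indices[:n_train]   # rows used to fit the model
test_idx = indices[n_train:]    # rows held out for evaluation

print(len(train_idx), len(test_idx))  # 37758 16182
```

Splitting on indices keeps the two sets disjoint by construction, and fixing the seed means the same rows land in the same set on every run.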

Now that we have our training data, let us check the structure of the dataset. I am using R for my analysis.

What’s the deal with the 4Cs — Carat, Cut, Color, Clarity?

A fun part of being a data analyst is the opportunity to learn across domains, and today is no different. Before we analyze the data further, let us understand what each…

All we do in this post is predict the amount of power generated (in megawatts) based on the wind speed (in meters per second). Before we dive…

