Descriptive Statistics in R
Descriptive statistics are essential tools for summarizing and understanding data in R. They provide insights into the central tendency, dispersion, and shape of your dataset.
Measures of Central Tendency
R offers several functions to calculate measures of central tendency:
Mean
The mean is the arithmetic average: the sum of all values divided by their count. Calculate it using the mean() function:
data <- c(1, 2, 3, 4, 5)
mean_value <- mean(data)
print(mean_value) # Output: 3
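As a quick check of what mean() computes, the sketch below reproduces it as the sum divided by the count. It also uses mean()'s trim argument, which drops a fraction of the smallest and largest values before averaging. The vector x with one outlier is an illustrative assumption, not data from this article.

```r
# Illustrative data with one large outlier (assumed example values)
x <- c(1, 2, 3, 4, 100)

# The arithmetic mean is the sum divided by the number of values
manual_mean <- sum(x) / length(x)
print(manual_mean)           # Output: 22
print(mean(x))               # Output: 22

# trim = 0.2 drops floor(0.2 * 5) = 1 value from each end before averaging,
# so the result is the mean of 2, 3 and 4
print(mean(x, trim = 0.2))   # Output: 3
```

The outlier pulls the ordinary mean far from the bulk of the data, while the trimmed mean stays close to the typical values.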
Median
The median is the middle value of the sorted data; with an even number of values, it is the average of the two middle values. Use the median() function:
median_value <- median(data)
print(median_value) # Output: 3
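With an even number of values there is no single middle element, so median() averages the two central ones. A small sketch with an assumed four-element vector:

```r
# Even-length vector (illustrative values)
even_data <- c(1, 2, 3, 4)

# The two middle values after sorting are 2 and 3, so the median is their average
print(median(even_data))         # Output: 2.5

# Input order does not matter: median() sorts the values internally
print(median(c(4, 1, 3, 2)))     # Output: 2.5
```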
Mode
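R's built-in mode() function returns an object's storage mode (for example "numeric"), not the statistical mode, so you need to write your own:
get_mode <- function(x) {
  unique_x <- unique(x)
  # Count occurrences of each unique value, then return the most frequent one
  # (on ties, the value that appears first in x wins)
  unique_x[which.max(tabulate(match(x, unique_x)))]
}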
mode_value <- get_mode(c(1, 2, 2, 3, 4, 4, 4, 5))
print(mode_value) # Output: 4
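Because which.max() keeps only the first maximum, get_mode() reports a single value even when several values tie for most frequent. If you need every mode, one option is a variant that compares all counts with the maximum; get_modes() below is a hypothetical helper, not a base R function.

```r
# Hypothetical helper: returns every value that shares the highest frequency
get_modes <- function(x) {
  unique_x <- unique(x)
  # Frequency of each unique value, in order of first appearance
  counts <- tabulate(match(x, unique_x))
  unique_x[counts == max(counts)]
}

tied <- c(1, 1, 2, 2, 3)
print(get_modes(tied))                        # Output: 1 2
print(get_modes(c(1, 2, 2, 3, 4, 4, 4, 5)))   # Output: 4
```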
Measures of Dispersion
These statistics describe the spread of your data:
Range
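The range is the difference between the largest and smallest values. R's range() function returns the minimum and maximum as a two-element vector rather than their difference, so compute the difference yourself: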
data_range <- max(data) - min(data)
print(data_range) # Output: 4
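The sketch below shows what range() actually returns and how diff() turns it into a single number. It also adds IQR() from base R's stats package, an outlier-resistant spread measure in the spirit of the robust-statistics advice in Best Practices.

```r
data <- c(1, 2, 3, 4, 5)

# range() returns the minimum and maximum, not their difference
print(range(data))          # Output: 1 5

# diff() on that two-element vector gives the range as one number
print(diff(range(data)))    # Output: 4

# Interquartile range: 75th percentile minus 25th percentile (4 - 2)
print(IQR(data))            # Output: 2
```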
Variance
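Variance measures the average squared deviation from the mean. Note that var() returns the sample variance, which divides the sum of squared deviations by n - 1 rather than n: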
variance <- var(data)
print(variance) # Output: 2.5
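To make the n - 1 denominator concrete, here is the same calculation done by hand, alongside the population variance that divides by n:

```r
data <- c(1, 2, 3, 4, 5)
n <- length(data)

# Squared deviations from the mean (3): 4, 1, 0, 1, 4, which sum to 10
sq_dev <- (data - mean(data))^2

# Sample variance, what var() returns: divide by n - 1
sample_var <- sum(sq_dev) / (n - 1)
print(sample_var)       # Output: 2.5

# Population variance: divide by n instead
population_var <- sum(sq_dev) / n
print(population_var)   # Output: 2
```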
Standard Deviation
The standard deviation is the square root of the variance. Calculate it with sd():
std_dev <- sd(data)
print(std_dev) # Output: 1.581139
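A short check that sd() is the square root of var(). Because of that square root, the standard deviation is in the same units as the data: multiplying every value by 10 multiplies sd() by 10, while var() grows by a factor of 100.

```r
data <- c(1, 2, 3, 4, 5)

# sd() equals the square root of the sample variance
print(all.equal(sd(data), sqrt(var(data))))   # Output: TRUE

# Rescaling the data by 10 scales sd by 10 and var by 10^2
scaled <- data * 10
print(sd(scaled))    # Output: 15.81139
print(var(scaled))   # Output: 250
```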
Measures of Shape
These statistics describe the shape of your data's distribution:
Skewness
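Skewness measures the asymmetry of the distribution: it is 0 for symmetric data, positive when the right tail is longer, and negative when the left tail is longer. Base R has no skewness function, so use the moments package (install it once with install.packages("moments")):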
library(moments)
skewness_value <- skewness(data)
print(skewness_value) # Output: 0
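To see what is being computed, here is a base R sketch of moment-based skewness, the third central moment divided by the second raised to the 3/2 power, with both moments averaged over n. This is believed to match the definition behind moments::skewness(), but treat that as an assumption and compare on your own data. skew_manual() is a hypothetical helper.

```r
# Hypothetical helper: moment-based skewness m3 / m2^(3/2),
# using population (divide-by-n) central moments
skew_manual <- function(x) {
  d <- x - mean(x)
  m2 <- mean(d^2)
  m3 <- mean(d^3)
  m3 / m2^(3 / 2)
}

print(skew_manual(c(1, 2, 3, 4, 5)))   # Output: 0 (symmetric data)

# One large value creates a long right tail, so skewness is positive
right_skewed <- c(1, 2, 2, 3, 10)
print(skew_manual(right_skewed) > 0)   # Output: TRUE
```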
Kurtosis
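Kurtosis measures the tailedness of the distribution. kurtosis() from the moments package returns Pearson's kurtosis, for which a normal distribution scores 3; subtract 3 to get excess kurtosis: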
kurtosis_value <- kurtosis(data)
print(kurtosis_value) # Output: 1.7
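The value 1.7 can be reproduced by hand as the fourth central moment divided by the squared second central moment. The sketch below assumes this divide-by-n definition is the one moments::kurtosis() uses, which is consistent with the 1.7 shown above; kurt_manual() is a hypothetical helper.

```r
# Hypothetical helper: Pearson (non-excess) kurtosis m4 / m2^2,
# using population (divide-by-n) central moments
kurt_manual <- function(x) {
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2
}

# Deviations -2, -1, 0, 1, 2 give m4 = 34 / 5 = 6.8 and m2 = 10 / 5 = 2
k <- kurt_manual(c(1, 2, 3, 4, 5))
print(k)       # Output: 1.7

# Excess kurtosis: negative means lighter tails than a normal distribution
print(k - 3)   # Output: -1.3
```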
Summary Statistics
R's summary() function gives a quick overview of your data. For a numeric vector it reports the minimum, first quartile, median, mean, third quartile, and maximum:
summary_stats <- summary(data)
print(summary_stats)
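The individual values in a summary can be extracted by name, and quantile() returns the same quartiles directly. summary() also works on a data frame, summarizing each column. The small data frame below, with its score and group columns, is an illustrative assumption.

```r
data <- c(1, 2, 3, 4, 5)

# Extract single statistics from the summary by name
stats <- summary(data)
print(stats[["Median"]])   # Output: 3
print(stats[["Mean"]])     # Output: 3

# quantile() returns the 0%, 25%, 50%, 75% and 100% points (default type = 7)
print(quantile(data))

# On a data frame, summary() summarizes each column according to its type
df <- data.frame(score = data, group = factor(c("a", "a", "b", "b", "b")))
print(summary(df))
```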
Visualizing Descriptive Statistics
Visualizations can help you understand your data better. The ggplot2 package makes informative plots straightforward:
library(ggplot2)
# Histogram of the values with a dashed red line at the mean
# (linewidth replaced size for line widths in ggplot2 3.4.0)
ggplot(data.frame(x = data), aes(x = x)) +
  geom_histogram(binwidth = 1, fill = "skyblue", color = "black") +
  geom_vline(xintercept = mean(data), color = "red", linetype = "dashed", linewidth = 1) +
  labs(title = "Histogram with Mean", x = "Value", y = "Frequency")
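A box plot shows several of the statistics above at once. In this sketch the box spans the first to third quartile and the middle line marks the median. The red point for the mean is added with stat_summary(), which is one choice among several ways to overlay it.

```r
library(ggplot2)

data <- c(1, 2, 3, 4, 5)

# Box = quartiles, line = median, red point = mean (via stat_summary)
p <- ggplot(data.frame(x = data), aes(x = "", y = x)) +
  geom_boxplot(fill = "skyblue") +
  stat_summary(fun = mean, geom = "point", color = "red", size = 3) +
  labs(title = "Box Plot with Mean", x = NULL, y = "Value")

print(p)
```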
Best Practices
- Always check for missing values before calculating statistics: mean(), median(), var(), and sd() return NA if the input contains NA, unless you set na.rm = TRUE.
- Consider using robust statistics (e.g., median instead of mean) for skewed data.
- Visualize your data to get a better understanding of its distribution.
- Use the dplyr package for efficient data manipulation before analysis.
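A minimal sketch of the missing-value check from the first practice, using an assumed vector with one NA:

```r
# Illustrative vector with one missing value
x <- c(1, 2, NA, 4)

# Count missing values before summarizing
print(sum(is.na(x)))            # Output: 1

# Without na.rm, the result is NA because one input is missing
print(mean(x))                  # Output: NA

# With na.rm = TRUE, missing values are dropped first (mean of 1, 2, 4)
print(mean(x, na.rm = TRUE))    # Output: 2.333333
print(sd(x, na.rm = TRUE))      # Output: 1.527525
```

Dropping missing values silently changes the sample size, so it is worth reporting how many were removed.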
Mastering descriptive statistics in R is crucial for exploratory data analysis and lays the foundation for more advanced statistical techniques like hypothesis testing and regression analysis.