Descriptive statistics are essential tools for summarizing and understanding data in R. They provide insights into the central tendency, dispersion, and shape of your dataset.
R offers several functions to calculate measures of central tendency:
The mean is the average of all values in a dataset. Calculate it using the mean() function:
data <- c(1, 2, 3, 4, 5)
mean_value <- mean(data)
print(mean_value) # Output: 3
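Real data often contains missing values or outliers, and mean() has arguments for both. The sketch below uses a made-up vector (scores is a hypothetical name, not part of the example above) to show the na.rm and trim arguments:

```r
# Hypothetical data: one missing value and one extreme value
scores <- c(2, 4, 6, 8, NA, 100)

plain_mean   <- mean(scores)                            # NA: a missing value propagates
clean_mean   <- mean(scores, na.rm = TRUE)              # 24: NA dropped before averaging
trimmed_mean <- mean(scores, na.rm = TRUE, trim = 0.2)  # 6: lowest and highest 20% dropped

print(plain_mean)
print(clean_mean)
print(trimmed_mean)
```

The trimmed mean discards a fraction of the observations from each end of the sorted data, which makes it less sensitive to the value 100.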
The median is the middle value when the data is ordered; for an even number of values, it is the average of the two middle values. Use the median() function:
median_value <- median(data)
print(median_value) # Output: 3
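To illustrate why the median is often preferred for skewed data, the following sketch compares it with the mean on a made-up vector (incomes is a hypothetical name chosen for illustration):

```r
# Hypothetical data with one very large value
incomes <- c(30, 32, 35, 38, 400)

income_mean   <- mean(incomes)     # 107: pulled upward by the value 400
income_median <- median(incomes)   # 35: unaffected by the size of the largest value

# With an even number of values, the two middle values are averaged
even_median <- median(c(1, 2, 3, 4))  # 2.5

print(income_mean)
print(income_median)
print(even_median)
```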
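R has no built-in function for the statistical mode (the built-in mode() returns an object's storage mode instead), but you can write one:
get_mode <- function(x) {
  # Distinct values, in order of first appearance
  unique_x <- unique(x)
  # Count each distinct value and return the most frequent one;
  # if several values tie, the first one encountered is returned
  unique_x[which.max(tabulate(match(x, unique_x)))]
}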
mode_value <- get_mode(c(1, 2, 2, 3, 4, 4, 4, 5))
print(mode_value) # Output: 4
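Because which.max() returns only the first maximum, the function above reports a single value even when several values are equally frequent. One possible variant, sketched here under the hypothetical name get_modes, returns every tied value:

```r
# Hypothetical helper: return all values that share the highest frequency
get_modes <- function(x) {
  unique_x <- unique(x)
  counts <- tabulate(match(x, unique_x))
  unique_x[counts == max(counts)]
}

tied_modes   <- get_modes(c(1, 2, 2, 3, 3))           # 2 3
single_mode  <- get_modes(c(1, 2, 2, 3, 4, 4, 4, 5))  # 4

print(tied_modes)
print(single_mode)
```

Whether to report one mode or all of them depends on the analysis; this version simply makes ties visible.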
These statistics describe the spread of your data:
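The range() function returns the minimum and maximum as a two-element vector. To get the range as a single number, subtract the minimum from the maximum: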
data_range <- max(data) - min(data)
print(data_range) # Output: 4
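As a small sketch of the difference between the two approaches, range() gives the endpoints and diff() turns them into a single width (limits and width are names chosen here for illustration):

```r
data <- c(1, 2, 3, 4, 5)

limits <- range(data)   # 1 5: the minimum and the maximum
width  <- diff(limits)  # 4: the same value as max(data) - min(data)

print(limits)
print(width)
```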
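Variance measures the average squared deviation from the mean. The var() function computes the sample variance, which divides the sum of squared deviations by n - 1 rather than n: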
variance <- var(data)
print(variance) # Output: 2.5
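To make the n - 1 denominator concrete, the sketch below recomputes the variance by hand and compares it with var(). The variable names are illustrative; base R has no separate population-variance function, so that value is computed manually here.

```r
data <- c(1, 2, 3, 4, 5)

n <- length(data)
squared_dev <- (data - mean(data))^2    # 4 1 0 1 4

sample_var     <- sum(squared_dev) / (n - 1)  # 2.5, matches var(data)
population_var <- sum(squared_dev) / n        # 2, divides by n instead

print(sample_var)
print(population_var)
print(all.equal(sample_var, var(data)))       # TRUE
```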
The standard deviation is the square root of the variance. Calculate it with sd():
std_dev <- sd(data)
print(std_dev) # Output: 1.581139
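A quick check of that relationship, together with the na.rm argument that sd() shares with mean() and var(); the vector with_missing is a made-up example:

```r
data <- c(1, 2, 3, 4, 5)

sd_direct    <- sd(data)          # standard deviation from sd()
sd_from_var  <- sqrt(var(data))   # square root of the sample variance

# Hypothetical vector with a missing value
with_missing <- c(1, 2, 3, 4, 5, NA)
sd_clean <- sd(with_missing, na.rm = TRUE)  # NA removed before computing

print(all.equal(sd_direct, sd_from_var))    # TRUE
print(sd_clean)
```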
These statistics describe the shape of your data's distribution:
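Skewness measures the asymmetry of the distribution. Base R has no skewness function, so use skewness() from the moments package (install it once with install.packages("moments")):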
library(moments)
skewness_value <- skewness(data)
print(skewness_value) # Output: 0
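Kurtosis measures the tailedness of the distribution. The kurtosis() function in moments returns Pearson's kurtosis, for which a normal distribution scores 3, not excess kurtosis: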
kurtosis_value <- kurtosis(data)
print(kurtosis_value) # Output: 1.7
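The two values above can be reproduced in base R from the central moments. The sketch below assumes the moment-based definitions (third and fourth central moments divided by powers of the second, each with an n denominator), which is my understanding of what the moments package computes; the variable names are illustrative.

```r
data <- c(1, 2, 3, 4, 5)

n   <- length(data)
dev <- data - mean(data)

# Central moments with an n denominator (not n - 1)
m2 <- sum(dev^2) / n   # 2
m3 <- sum(dev^3) / n   # 0
m4 <- sum(dev^4) / n   # 6.8

skew_manual <- m3 / m2^(3 / 2)   # 0: the data is symmetric
kurt_manual <- m4 / m2^2         # 1.7: Pearson's kurtosis
excess_kurt <- kurt_manual - 3   # -1.3: relative to a normal distribution

print(skew_manual)
print(kurt_manual)
print(excess_kurt)
```

A kurtosis below 3 (negative excess kurtosis) indicates lighter tails than a normal distribution, which fits this small, evenly spaced sample.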
R provides a convenient summary() function to get an overview of your data:
summary_stats <- summary(data)
print(summary_stats)
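For a numeric vector, summary() reports the minimum, first quartile, median, mean, third quartile, and maximum. The sketch below pulls individual entries out of the result by name and computes the same quartiles with quantile(); the variable names are illustrative.

```r
data <- c(1, 2, 3, 4, 5)

stats <- summary(data)

# Individual entries can be extracted by name
first_quartile <- stats[["1st Qu."]]  # 2
mean_entry     <- stats[["Mean"]]     # 3

# quantile() computes the same quartiles, or any other probabilities
quartiles <- quantile(data, probs = c(0.25, 0.5, 0.75))  # 2 3 4

print(first_quartile)
print(mean_entry)
print(quartiles)
```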
Visualizations can help understand your data better. Use the ggplot2 package for creating informative plots:
library(ggplot2)
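# Histogram of the values with a dashed red line at the mean
ggplot(data.frame(x = data), aes(x = x)) +
  geom_histogram(binwidth = 1, fill = "skyblue", color = "black") +
  # linewidth replaces the size argument for lines, deprecated in ggplot2 3.4.0
  geom_vline(aes(xintercept = mean(x)), color = "red", linetype = "dashed", linewidth = 1) +
  labs(title = "Histogram with Mean", x = "Value", y = "Frequency")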
Mastering descriptive statistics in R is crucial for exploratory data analysis and lays the foundation for more advanced statistical techniques like hypothesis testing and regression analysis.