K-means clustering is a technique used to partition data into groups whose members are similar to one another. The algorithm forms the groups, or clusters, so as to maximize between-group variability and minimize within-group variability.
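
To make the within-group and between-group idea concrete, here is a minimal sketch using base R's `kmeans()` on made-up data (the `toy` matrix is an assumption for illustration, not the lake data). It shows that the total sum of squares splits into a within-cluster part and a between-cluster part, so reducing one increases the other.

```r
# Toy data (made up): two well-separated groups of points in 2-D
set.seed(1)
toy <- rbind(
  matrix(rnorm(40, mean = 0, sd = 0.5), ncol = 2),
  matrix(rnorm(40, mean = 5, sd = 0.5), ncol = 2)
)

fit <- kmeans(toy, centers = 2, nstart = 10)

# Components of the sum-of-squares decomposition returned by kmeans()
fit$tot.withinss   # within-cluster sum of squares (what k-means minimizes)
fit$betweenss      # between-cluster sum of squares
fit$totss          # total sum of squares

# The two parts add up to the total
all.equal(fit$tot.withinss + fit$betweenss, fit$totss)
```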

I wanted to see how 20 lakes compared to one another in terms of their chemistry, so I ran a k-means clustering on some basic water chemistry data: dissolved oxygen (DO), temperature, and conductivity.

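# Load packages and munge the data
library(cluster)
library(factoextra)  # also attaches ggplot2, which provides geom_vline()
setwd('C:/Users/kk/Documents/ponds')
dat <- read.csv("chem.csv")
str(dat)

# Reshape from long to wide with base R's reshape():
# one row per site, one column per variable and sample date
dat2 <- reshape(dat, idvar = "site", timevar = "sampledate",
                direction = "wide")

# Use site names as row names and keep only the measurement columns
rownames(dat2) <- dat2$site
dat3 <- dat2[2:13]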
# Describe the data with basic statistics (one row per column of dat3)
desc_stats <- data.frame(
  Min  = apply(dat3, 2, min),    # minimum
  Med  = apply(dat3, 2, median), # median
  Mean = apply(dat3, 2, mean),   # mean
  SD   = apply(dat3, 2, sd),     # standard deviation
  Max  = apply(dat3, 2, max)     # maximum
)
# Generate an elbow plot: total within-cluster sum of squares vs. number of clusters
set.seed(123)
fviz_nbclust(dat3, kmeans, method = "wss") +
  geom_vline(xintercept = 5, linetype = 2)  # dashed line at the chosen k = 5

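# Compute k-means clustering with k = 5 and 25 random starts
# Note: dat3 is clustered on its original scales; scale(dat3) would standardize the columns first
set.seed(123)
km.res <- kmeans(dat3, 5, nstart = 25)
print(km.res)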

From this plot it looks like we have 5 groups of lakes.
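
For anyone curious what goes into an elbow plot, the sketch below builds one by hand with base R. It runs `kmeans()` for a range of k and records the total within-cluster sum of squares, which is roughly what `method = "wss"` plots. The `toy` data are made up (20 points around 4 hypothetical centers) and stand in for the lake measurements.

```r
# Made-up data: 20 points with 3 measurements each, drawn around 4 centers
set.seed(123)
centers <- matrix(c(0, 0, 0,
                    6, 0, 0,
                    0, 6, 0,
                    0, 0, 6), ncol = 3, byrow = TRUE)
toy <- centers[rep(1:4, each = 5), ] + matrix(rnorm(60, sd = 0.5), ncol = 3)

# Total within-cluster sum of squares for k = 1..8
ks <- 1:8
wss <- sapply(ks, function(k) kmeans(toy, centers = k, nstart = 25)$tot.withinss)

# The "elbow" is where adding another cluster stops reducing wss by much
plot(ks, wss, type = "b",
     xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")
```

The choice of k is still a judgment call: the elbow is read off the plot by eye rather than computed.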

Lake 15 may be an outlier since it is not similar to any of the other lakes.
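
One simple way to check for a point like this is to look at cluster sizes: a cluster with a single member is a point that did not group with anything else. The sketch below uses made-up data (19 similar points plus one far-away point named `lake20`, all hypothetical) rather than the real lakes.

```r
# Made-up data: 19 similar points plus one far-away point (row 20)
set.seed(42)
toy <- rbind(matrix(rnorm(38), ncol = 2), c(25, 25))
rownames(toy) <- paste0("lake", 1:20)

fit <- kmeans(toy, centers = 3, nstart = 50)

# Number of members per cluster; a size of 1 flags a point that stands alone
sizes <- table(fit$cluster)
sizes

# Names of points that sit in a cluster by themselves
lone_ids <- as.integer(names(sizes)[sizes == 1])
singletons <- names(fit$cluster)[fit$cluster %in% lone_ids]
singletons
```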

From here we can start to investigate why some lakes are more similar than others.
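
A possible first step is to compare the average chemistry of each cluster. This is a minimal sketch with made-up values for six hypothetical lakes (the column names `DO`, `temp`, and `cond` and all numbers are assumptions for illustration); with the real data, `dat3` and `km.res$cluster` would take the place of `toy` and `fit$cluster`.

```r
# Made-up measurements for six lakes (names and values are illustrative only)
toy <- data.frame(
  DO   = c(8.1, 7.9, 8.3, 3.2, 3.0, 3.5),
  temp = c(18.0, 18.5, 17.8, 24.1, 23.8, 24.5),
  cond = c(120, 125, 118, 410, 395, 420),
  row.names = paste0("lake", 1:6)
)

set.seed(123)
fit <- kmeans(toy, centers = 2, nstart = 25)

# Mean of each variable within each cluster, in the original units
cluster_means <- aggregate(toy, by = list(cluster = fit$cluster), FUN = mean)
cluster_means

# Which lakes ended up in which cluster
split(rownames(toy), fit$cluster)
```

Variables whose cluster means differ the most are the ones driving the grouping.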
