Summary Statistics

The benefit of working with R as opposed to Excel is that you can easily work with large amounts of data. The first thing to do with a large data set is to look at its structure and compute some summary statistics.

1. Grab data
I grabbed some data from the World Bank about threatened bird species.

1. Set working directory
setwd("C:/Users/kkuntz/_data/blog/bird")

2. Read data
dat <- read.csv("birds.csv")
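One thing worth knowing about read.csv() at this step: by default it rewrites column headers into valid R names, so a year header such as 2016 comes in as X2016 and a space becomes a dot. A minimal sketch with a made-up two-row table (the countries and counts are invented for illustration, not World Bank values):

```r
# Made-up CSV text standing in for birds.csv (values are invented)
csv_text <- "Country Name,Indicator Name,2016
Country A,\"Bird species, threatened\",12
Country B,\"Bird species, threatened\",250"

# check.names = TRUE is the default: headers become syntactic R names
toy <- read.csv(text = csv_text)
names(toy)      # "Country.Name" "Indicator.Name" "X2016"

# check.names = FALSE keeps the headers exactly as written in the file
toy_raw <- read.csv(text = csv_text, check.names = FALSE)
names(toy_raw)  # "Country Name" "Indicator Name" "2016"
```

Running names(dat) on the real file is the quickest way to confirm which column names you actually ended up with.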

3. Filter for threatened bird species
library(dplyr)  # provides filter()
dat2 <- filter(dat, Indicator_Name == "Bird species, threatened")
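If dplyr is not installed, the same row filter can be written in base R. A small sketch on an invented data frame (the rows are placeholders and the second indicator name is hypothetical):

```r
# Invented rows standing in for the World Bank file
toy <- data.frame(
  Country.Name   = c("Country A", "Country A", "Country B"),
  Indicator_Name = c("Bird species, threatened",
                     "Some other indicator",
                     "Bird species, threatened"),
  X2016          = c(12, 99, 250)
)

# Logical indexing: keep rows where the condition is TRUE, all columns
birds_idx <- toy[toy$Indicator_Name == "Bird species, threatened", ]

# subset() does the same and reads closer to dplyr's filter()
birds_sub <- subset(toy, Indicator_Name == "Bird species, threatened")

nrow(birds_idx)  # 2
```

One difference to keep in mind: if the condition evaluates to NA for some row, logical indexing returns a row of NAs, while subset() and filter() drop that row.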

4. Take a look at your data structure
nrow(dat2)
str(dat2)
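A few other base R calls are handy at this step alongside nrow() and str(). The sketch below uses an invented three-row data frame rather than the real file:

```r
# Invented data with one missing value
toy <- data.frame(
  Country.Name = c("Country A", "Country B", "Country C"),
  X2016        = c(12, NA, 250)
)

dim(toy)             # number of rows and columns together
head(toy, 2)         # first two rows
summary(toy$X2016)   # min, quartiles, median, mean, and NA count
colSums(is.na(toy))  # number of missing values per column
```

Checking for missing values early matters because most of the summary functions in the next step return NA when the column contains any.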

5. Perform some summary statistics
var(dat2$X2016)    # variance
sd(dat2$X2016)     # standard deviation
max(dat2$X2016)    # maximum
min(dat2$X2016)    # minimum
range(dat2$X2016)  # range (minimum and maximum)
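Two details are easy to miss with these functions: they return NA if the column contains any missing values unless na.rm = TRUE is passed, and range() returns the minimum and maximum as a pair rather than their difference. A short sketch on an invented vector:

```r
x <- c(12, 250, NA, 48, 90)     # invented counts, one missing

max(x)                          # NA, because of the missing value
max(x, na.rm = TRUE)            # 250
range(x, na.rm = TRUE)          # 12 250
diff(range(x, na.rm = TRUE))    # 238, the spread as a single number
mean(x, na.rm = TRUE)           # 100
median(x, na.rm = TRUE)         # 69
```

summary(x) reports most of these in one call, including the number of NAs.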

6. Plot a bar chart to visualize data
I kept only the rows with more than 200 threatened species in 2016, to focus on the areas with the most threatened species.

dat5 <- filter(dat4, X2016 > 200)  # dat4 is not created in the steps above; use dat2 if following along

library(ggplot2)
g <- ggplot(data = dat5, aes(x = Country.Name, y = X2016, fill = Country.Name)) + geom_bar(stat = "identity") + guides(fill = "none")

g + theme(axis.text.y = element_text(size=7, angle = 0, hjust=0)) + coord_flip()

g + theme(axis.text.x = element_text(size = 15, angle = 90, hjust = 0), axis.text.y = element_text(size = 15, angle = 90, hjust = 0)) + coord_flip()
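With coord_flip(), the bars are laid out in alphabetical order of country by default. One option, sketched here on invented data, is to order them by value with reorder() so the largest counts stand out. The axis label text is my own choice, not part of the original plot:

```r
library(ggplot2)

# Invented values standing in for the filtered data
toy <- data.frame(
  Country.Name = c("Country A", "Country B", "Country C"),
  X2016        = c(210, 340, 265)
)

# reorder() turns the names into a factor whose levels are sorted by X2016
toy$Country.Name <- reorder(toy$Country.Name, toy$X2016)

p <- ggplot(toy, aes(x = Country.Name, y = X2016, fill = Country.Name)) +
  geom_col() +                # equivalent to geom_bar(stat = "identity")
  guides(fill = "none") +     # hide the redundant legend
  coord_flip() +
  labs(x = NULL, y = "Threatened bird species, 2016")

p
```

Because the levels are sorted in ascending order, the country with the highest count ends up at the top of the flipped chart.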

[Figure: graph3]
