We will use our class.csv dataset as an example.
class <- read.csv("http://math.slu.edu/~clair/stat1300/data/class-survey.csv")
str(class)
## 'data.frame': 25 obs. of 12 variables:
## $ Timestamp : Factor w/ 24 levels "2016/08/24 10:10:41 AM EST",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ Year : Factor w/ 3 levels "Junior","Senior",..: 1 3 3 3 1 3 1 3 3 3 ...
## $ Gender : Factor w/ 2 levels "Female","Male": 2 2 1 1 2 1 1 1 1 2 ...
## $ Height..inches. : num 67 68 66 65 74 63 70 59 60 68 ...
## $ Shoe.size : num 10 11.5 8 6.5 11 7 8.5 3 6 8 ...
## $ Eye.Color : Factor w/ 5 levels "Black","Blue",..: 2 3 5 3 4 1 3 3 3 3 ...
## $ How.many.first.cousins.do.you.have. : int 11 13 1 8 30 8 5 48 8 16 ...
## $ Can.you...Roll.your.tongue.. : Factor w/ 2 levels "No","Yes": 1 2 1 1 2 2 2 2 1 2 ...
## $ Can.you...Touch.your.nose.with.your.tongue..: Factor w/ 2 levels "No","Yes": 2 1 2 1 1 1 1 1 1 1 ...
## $ Can.you...Wiggle.your.ears.. : Factor w/ 2 levels "No","Yes": 2 1 1 1 1 2 1 1 1 1 ...
## $ Can.you...Raise.one.eyebrow.. : Factor w/ 2 levels "No","Yes": 2 2 1 1 1 2 1 2 2 1 ...
## $ How.long.is.your.piece.of.string..in.inches.: num 19 2.3 9.1 10.2 10 ...
yg <- table(class$Year,class$Gender)
yg
##
## Female Male
## Junior 5 5
## Senior 3 1
## Sophomore 7 4
plot(yg)
barplot(yg,beside=TRUE,legend=TRUE)
You can use t to transpose the table (switch rows with columns):
t(yg)
##
## Junior Senior Sophomore
## Female 5 3 7
## Male 5 1 4
plot(t(yg))
barplot(t(yg),beside=TRUE,legend=TRUE)
margin.table computes marginal totals. With no second argument it sums the entire table; with 1 it sums each row, and with 2 it sums each column:
margin.table(yg)
## [1] 25
margin.table(yg,1)
##
## Junior Senior Sophomore
## 10 4 11
margin.table(yg,2)
##
## Female Male
## 15 10
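As a cross-check on the marginal totals, here is a minimal sketch that rebuilds the table from the counts printed above (so it runs without downloading the survey file) and compares margin.table with rowSums, colSums, and addmargins. The name with_totals is only illustrative, and the dimension labels Year and Gender are added here for readability; they are not part of the original yg.

```r
# Rebuild the Year-by-Gender table from the counts shown above.
# matrix() fills column by column: Female first, then Male.
yg <- as.table(matrix(
  c(5, 3, 7,   # Female: Junior, Senior, Sophomore
    5, 1, 4),  # Male:   Junior, Senior, Sophomore
  nrow = 3,
  dimnames = list(Year = c("Junior", "Senior", "Sophomore"),
                  Gender = c("Female", "Male"))
))

# margin.table(yg, 1) and rowSums(yg) give the same row totals.
margin.table(yg, 1)
rowSums(yg)

# Likewise for the column totals.
margin.table(yg, 2)
colSums(yg)

# addmargins() appends the totals to the table itself,
# adding a "Sum" row and a "Sum" column.
with_totals <- addmargins(yg)
with_totals
```

The bottom-right cell of with_totals is the grand total, the same value margin.table(yg) returns.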
prop.table computes overall or conditional proportions. Notice that, depending on the second argument, either the entire table, each row, or each column sums to 1.
prop.table(yg)
##
## Female Male
## Junior 0.20 0.20
## Senior 0.12 0.04
## Sophomore 0.28 0.16
prop.table(yg,1)
##
## Female Male
## Junior 0.5000000 0.5000000
## Senior 0.7500000 0.2500000
## Sophomore 0.6363636 0.3636364
prop.table(yg,2)
##
## Female Male
## Junior 0.3333333 0.5000000
## Senior 0.2000000 0.1000000
## Sophomore 0.4666667 0.4000000
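To make the conditional proportions concrete, the sketch below rebuilds the table from the printed counts and reproduces prop.table by dividing each row (or column) by its total with sweep. The names row_props and col_props are illustrative only.

```r
# Rebuild the Year-by-Gender table from the counts shown earlier.
yg <- as.table(matrix(
  c(5, 3, 7,   # Female: Junior, Senior, Sophomore
    5, 1, 4),  # Male:   Junior, Senior, Sophomore
  nrow = 3,
  dimnames = list(Year = c("Junior", "Senior", "Sophomore"),
                  Gender = c("Female", "Male"))
))

# Divide every row by its row total: proportions conditional on Year.
row_props <- sweep(yg, 1, rowSums(yg), "/")

# Divide every column by its column total: proportions conditional on Gender.
col_props <- sweep(yg, 2, colSums(yg), "/")

# Both agree with prop.table.
all.equal(as.vector(row_props), as.vector(prop.table(yg, 1)))
all.equal(as.vector(col_props), as.vector(prop.table(yg, 2)))

# Each row of row_props sums to 1; each column of col_props sums to 1.
rowSums(row_props)
colSums(col_props)

# The same cell answers two different questions:
row_props["Senior", "Female"]  # share of seniors who are female
col_props["Senior", "Female"]  # share of females who are seniors
```

The last two lines show why the choice of margin matters: 3 of the 4 seniors are female, but only 3 of the 15 females are seniors.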
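We will test the hypotheses \(H_0\): there is no relationship between gender and grade level, versus \(H_a\): there is a relationship between gender and grade level.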
Both chisq.test and summary perform the test, given a table:
chisq.test(yg)
## Warning in chisq.test(yg): Chi-squared approximation may be incorrect
##
## Pearson's Chi-squared test
##
## data: yg
## X-squared = 0.85227, df = 2, p-value = 0.653
summary(yg)
## Number of cases in table: 25
## Number of factors: 2
## Test for independence of all factors:
## Chisq = 0.8523, df = 2, p-value = 0.653
## Chi-squared approximation may be incorrect
Here, \(\chi^2 = 0.8523\), with 2 degrees of freedom. The chi-squared distribution with 2 degrees of freedom looks like this:
x <- seq(0,15,.1)
plot(x,dchisq(x,2),type="l")
and we see that .8523 is not far along the tail. The area to the right of .8523 is:
1-pchisq(.8523,2)
## [1] 0.6530184
which is the P-value for the test. We fail to reject \(H_0\): there is no significant evidence of a relationship between gender and grade level.
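The statistic, the P-value, and the warning reported by chisq.test can all be traced by hand. The sketch below rebuilds the table from the printed counts, computes the expected counts under independence, and sums the squared deviations. The variable names (expected, chi_sq, p_value) are illustrative only.

```r
# Rebuild the Year-by-Gender table from the counts shown earlier.
yg <- as.table(matrix(
  c(5, 3, 7,   # Female: Junior, Senior, Sophomore
    5, 1, 4),  # Male:   Junior, Senior, Sophomore
  nrow = 3,
  dimnames = list(Year = c("Junior", "Senior", "Sophomore"),
                  Gender = c("Female", "Male"))
))

n <- sum(yg)

# Expected count in each cell under independence:
# (row total * column total) / grand total.
expected <- outer(rowSums(yg), colSums(yg)) / n
expected

# Pearson's statistic: sum over cells of (observed - expected)^2 / expected.
chi_sq <- sum((yg - expected)^2 / expected)

# Degrees of freedom: (rows - 1) * (columns - 1).
df <- (nrow(yg) - 1) * (ncol(yg) - 1)

# Upper-tail area; equivalent to 1 - pchisq(chi_sq, df).
p_value <- pchisq(chi_sq, df, lower.tail = FALSE)

chi_sq
df
p_value

# chisq.test warns when any expected count is below 5.
sum(expected < 5)
```

Several expected counts here fall below 5, which is why R reports that the chi-squared approximation may be incorrect. With a class this small, the P-value should be read as approximate.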
Data sets from our book come in a row summary form, like so:
load(url("http://math.slu.edu/~clair/stat1300/data/ex2532sleepq-fixed.rda"))
ex2532sleepq
## OTC.Rx Sleep Count
## 1 yes optimal 37
## 2 yes borderline 53
## 3 yes poor 84
## 4 no optimal 266
## 5 no borderline 186
## 6 no poor 245
To transform it into a crosstable, use xtabs. The following gives a table of Count with rows OTC.Rx and columns Sleep.
sleep <- xtabs(Count ~ OTC.Rx + Sleep, data=ex2532sleepq)
sleep
## Sleep
## OTC.Rx optimal borderline poor
## no 266 186 245
## yes 37 53 84
To use all other variables for rows and columns, this shorthand works:
sleep <- xtabs(Count ~ ., data=ex2532sleepq)
sleep
## Sleep
## OTC.Rx optimal borderline poor
## no 266 186 245
## yes 37 53 84
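If the .rda file is not at hand, the same crosstable can be rebuilt from the six rows printed above. This is a minimal sketch: the data frame name sleep_counts is made up for the example, and the factor levels are set explicitly on the assumption that the original file orders them as they appear in the xtabs output (no before yes; optimal, borderline, poor).

```r
# Re-create the row summary data by typing in the printed values.
sleep_counts <- data.frame(
  OTC.Rx = factor(c("yes", "yes", "yes", "no", "no", "no"),
                  levels = c("no", "yes")),
  Sleep  = factor(rep(c("optimal", "borderline", "poor"), times = 2),
                  levels = c("optimal", "borderline", "poor")),
  Count  = c(37, 53, 84, 266, 186, 245)
)

# The left side of the formula is the column of counts to sum;
# the right side lists the variables that define rows and columns.
sleep <- xtabs(Count ~ OTC.Rx + Sleep, data = sleep_counts)
sleep

# as.data.frame() goes the other way, from a crosstable back to
# one row per cell. responseName sets the name of the count column.
back <- as.data.frame(sleep, responseName = "Count")
back
```

Without the explicit levels, R would order the factor levels alphabetically, so borderline would come before optimal in the printed table. The counts themselves would be unchanged.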