Comments on Learning Statistics with R
So I found a textbook for learning both elementary statistics, much of which I knew but had never read a textbook about, and R.
http://health.adelaide.edu.au/psychology/ccs/teaching/lsr/ (the book is legally free)
Numbers refer to the page number in the book. The book is in an early version ("0.4"), so many of these are small errors I stumbled upon while running virtually all the commands in the book in my own R window.
modeOf() and maxFreq() do not work here. This is because afl.finalists is a factor and they require a vector. One can use as.vector() to make them work.
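A minimal sketch of the as.vector() workaround, using base R only. The `teams` factor is a hypothetical stand-in for afl.finalists, and table() is swapped in for modeOf() and maxFreq() so the example runs without the book's package:

```r
# Hypothetical stand-in for afl.finalists: a small factor of team names.
teams <- factor(c("Hawthorn", "Geelong", "Hawthorn", "Carlton", "Geelong", "Hawthorn"))

# as.vector() drops the factor structure and returns a plain character vector.
teams_vec <- as.vector(teams)

# Base-R frequency table of the values.
freq <- table(teams_vec)

# Value(s) with the highest count, and that count.
most_common <- names(freq)[as.vector(freq) == max(freq)]
highest_count <- max(freq)
```

The same teams_vec could then be passed to the book's own functions.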
Worth noting that summary() gives the same output as quantile(), except that it also includes the mean.
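A quick check of that claim on a made-up vector (the numbers are assumptions chosen only for illustration):

```r
# Made-up data, for illustration only.
x <- c(2, 4, 4, 5, 7, 9, 10, 12)

q <- quantile(x)  # 0%, 25%, 50%, 75%, 100%
s <- summary(x)   # Min., 1st Qu., Median, Mean, 3rd Qu., Max.

# Strip names and classes so the raw numbers can be compared.
s_values <- as.numeric(s)
q_values <- as.numeric(q)

# Positions 1, 2, 3, 5, 6 of summary() are the five quantiles; position 4 is the mean.
same_quantiles <- isTRUE(all.equal(s_values[c(1, 2, 3, 5, 6)], q_values))
same_mean <- isTRUE(all.equal(s_values[4], mean(x)))
```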
Actually, the output of describe() does not tell us the number of NAs. It is only because the author assumes that there are 100 cases in total that he can compute 100 - n and get the number of NAs for each variable.
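The NAs can also be counted directly, without assuming the total number of cases. A small sketch with a hypothetical data frame `df` (not the book's data):

```r
# Hypothetical data frame with some missing values.
df <- data.frame(
  a = c(1, NA, 3, 4),
  b = c(NA, NA, 2, 5),
  c = 1:4
)

# is.na() gives a logical matrix; colSums() counts the TRUEs per column.
na_counts <- colSums(is.na(df))

# Number of non-missing cases per variable, comparable to the n column of describe().
n_valid <- colSums(!is.na(df))
```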
The cakes.Rdata is already transposed.
as.logical() also converts numeric 0 and 1 to FALSE and TRUE. However, oddly, it does not understand the strings “0” and “1” and returns NA for them.
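The behaviour can be checked in a few lines. Going through as.numeric() first is one possible workaround, not something the book suggests:

```r
# Numeric input: 0 becomes FALSE, 1 becomes TRUE.
num_result <- as.logical(c(0, 1))

# Character input "0" and "1": not recognised, so the result is NA.
str_result <- as.logical(c("0", "1"))

# Character input that as.logical() does recognise.
word_result <- as.logical(c("TRUE", "false", "T", "F"))

# Possible workaround: convert the strings to numbers first.
fixed <- as.logical(as.numeric(c("0", "1")))
```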
Actually, an event with probability 0 is not necessarily impossible. See: http://en.wikipedia.org/wiki/Almost_surely
Actually, 100 simulations with N = 20 will generally not result in a histogram like the one shown. Perhaps it is better to change the command to n = 1000. And why not wrap it in hist() so it can be visually compared to the theoretical one?
> hist(rbinom( n = 1000, size = 20, prob = 1/6 ))
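To go one step further, the simulated proportions can be put next to the theoretical binomial probabilities from dbinom(). This is only a sketch: the seed and the variable names are my own choices, and the plot is drawn only in an interactive session.

```r
set.seed(1)  # arbitrary seed, only to make the run repeatable

size <- 20
prob <- 1/6

# 1000 simulated experiments, each counting successes in 20 trials.
sims <- rbinom(n = 1000, size = size, prob = prob)

# Proportion of simulations for every possible outcome 0..20.
sim_prop <- as.vector(table(factor(sims, levels = 0:size))) / length(sims)

# Theoretical probabilities for the same outcomes.
theory <- dbinom(0:size, size = size, prob = prob)

# Side-by-side bars: simulated vs. theoretical.
if (interactive()) {
  barplot(rbind(sim_prop, theory), beside = TRUE, names.arg = 0:size,
          legend.text = c("simulated", "theoretical"))
}
```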
It would be nice if the code for making these simulations was shown.
“This is just bizarre: σ̂² is and unbiased estimate of the population variance”
Typo in Figure 11.6 text. “Notice that when θ actually is equal to .05 (plotted as a black dot)”
“That is, what values of X2 would lead is to reject the null hypothesis.”
It is most annoying that the author doesn't give the code for reproducing his plots. I spent 15 minutes trying to find a function to create histograms by group.
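For what it is worth, one base-R way to get histograms by group is split() plus hist(). This is not the author's code; it is a sketch that uses the built-in iris data as a stand-in for the book's data set:

```r
# Split the measurements by group (built-in iris data as a stand-in).
groups <- split(iris$Sepal.Length, iris$Species)

# Shared breaks so the panels are comparable.
breaks <- pretty(range(iris$Sepal.Length), n = 10)

# Compute one histogram per group without drawing yet.
hists <- lapply(groups, hist, breaks = breaks, plot = FALSE)

# Draw the panels side by side when running interactively.
if (interactive()) {
  old <- par(mfrow = c(1, length(hists)))
  for (g in names(hists)) {
    plot(hists[[g]], main = g, xlab = "Sepal.Length")
  }
  par(old)
}
```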
“It works for t-tests, but it wouldn’t be meaningful for chi-square testsm F -tests or indeed for most of the tests I talk about in this book.”
“we see that it is 95% certain that the true (population-wide) average improvement would lie between 0.95% and 1.86%.”
This wording is dangerous because the percent sign can be read in two ways. Read as relative changes, the figures are wrong. The author means absolute percentages, i.e. percentage points.
The code has +'s in it, which means it cannot just be copied and run. This usually isn't the case, but it happens a few times in the book.
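The two readings give different numbers. With made-up figures (not taken from the book), a rise from 50% to 51% looks like this:

```r
# Made-up proportions, for illustration only.
before <- 0.50
after <- 0.51

# Absolute change, in percentage points.
abs_change_points <- (after - before) * 100

# Relative change, in percent of the starting value.
rel_change_percent <- (after - before) / before * 100
```

Here the absolute change is 1 percentage point while the relative change is 2 percent, so an unqualified "%" leaves the reader guessing.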
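If one wants to paste such code anyway, the leading prompts can be stripped first. A rough sketch; the `pasted` lines are an invented example, and the pattern would also damage a genuine code line that happens to start with + or >:

```r
# Invented example of code copied with console prompts included.
pasted <- c(
  "> x <- c(1, 2,",
  "+        3, 4)",
  "> sum(x)"
)

# Remove one leading ">" or "+" and an optional space from each line.
cleaned <- sub("^[>+] ?", "", pasted)

# The cleaned lines now parse and run as ordinary R code.
result <- eval(parse(text = cleaned))
```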
In the description of the test, we are told to tick when the values are larger than. However, in the one sample version, the author ticks when the value is equal to. I guess this means that we tick when it is equal to or larger than.
This command doesn't work because the dataframe isn't attached as the author assumes.
> mood.gain <- list( placebo, joyzepam, anxifree)
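One way around attach() is to build the list straight from the data frame with split(). The data frame `trial` and its values below are hypothetical; only the column and group names follow the command above:

```r
# Hypothetical data frame standing in for the book's data.
trial <- data.frame(
  drug = factor(c("placebo", "placebo", "anxifree", "anxifree", "joyzepam", "joyzepam")),
  mood.gain = c(0.5, 0.3, 0.6, 0.4, 1.4, 1.7)
)

# One list element per drug, without attaching the data frame.
mood.gain <- split(trial$mood.gain, trial$drug)
```

The elements come out in the order of the factor levels (alphabetical here), not in the order placebo, joyzepam, anxifree used in the command above.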
First the author says he wants to use the unadjusted R^2, but then in the text he uses the adjusted value.
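Both values are available from summary() of a fitted lm, so it is easy to see which one is being quoted. A sketch using the built-in mtcars data rather than the book's regression:

```r
# Built-in data as a stand-in for the book's example.
fit <- lm(mpg ~ wt + hp, data = mtcars)
s <- summary(fit)

r2 <- s$r.squared          # unadjusted R^2
adj_r2 <- s$adj.r.squared  # adjusted R^2

# Adjusted R^2 by hand: n observations, p predictors.
n <- nrow(mtcars)
p <- 2
adj_manual <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
```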
Typo with “Unless” capitalized.
“(3.45 for drug and 0.92 for therapy),”
He must mean .47 for therapy. .92 is the number for residuals.
In the alternative hypothesis, the author uses “u_ij” instead of “u_rc”, which is used in the null hypothesis. I'm guessing the null hypothesis is right.
As earlier, it is ambiguous when the author talks about increases in percent. It could be relative or absolute. Again, in this case it is absolute. The author should write "percentage points" or something similar to avoid confusion.
“I find it amusing to note that the default in R is Type I and the default in SPSS is Type III (with Helmert contrasts). Neither of these appeals to me all that much. Relatedly, I find it depressing that almost nobody in the psychological literature ever bothers to report which Type of tests they ran, much less the order of variables (for Type I) or the contrasts used (for Type III). Often they don’t report what software they used either. The only way I can ever make any sense of what people typically report is to try to guess from auxiliary cues which software they were using, and to assume that they never changed the default settings. Please don’t do this... now that you know about these issues, make sure you indicate what software you used, and if you’re reporting ANOVA results for unbalanced data, then specify what Type of tests you ran, specify order information if you’ve done Type I tests and specify contrasts if you’ve done Type III tests. Or, even better, do hypotheses tests that correspond to things you really care about, and then report those!”
An example of the necessity of open methods along with open data. Science must be reproducible. The best approach is simply to share the exact source code for the analyses in a paper.
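The point about Type I tests and variable order is easy to demonstrate. The sketch below uses the built-in mtcars data as an unbalanced two-factor example (my choice, not the book's) and fits the same model with the factors in both orders:

```r
# Unbalanced two-factor design from built-in data.
d <- transform(mtcars, cyl = factor(cyl), am = factor(am))

# anova() on an lm gives sequential (Type I) sums of squares.
fit_ab <- anova(lm(mpg ~ cyl + am, data = d))
fit_ba <- anova(lm(mpg ~ am + cyl, data = d))

# Sum of squares attributed to cyl depends on whether it enters first or second.
ss_cyl_first <- fit_ab["cyl", "Sum Sq"]
ss_cyl_second <- fit_ba["cyl", "Sum Sq"]

# The residual sum of squares is the same either way.
rss_ab <- fit_ab["Residuals", "Sum Sq"]
rss_ba <- fit_ba["Residuals", "Sum Sq"]

# Worth reporting alongside the results.
software <- R.version.string
```

Sharing a script like this, together with the software version, removes the guessing the author complains about.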