Lecture #1; Definitions
I am less interested in teaching a catalogue of statistical tests than in discussing questions like:
Another site, funnelweb.utcc.utk.edu (defunct), turned up something that looks more sensible.
"Statistics is [the theory and method of analyzing quantitative data obtained from samples of observations in order to study and compare sources of variation of phenomena, to help make decisions to accept or reject hypothesized relations between phenomena, and to aid in] making [reliable] inferences from empirical observations" (Kerlinger, 1986, p. 175).
Let's condense that to
making inferences about populations from samples
If the mean height of people in the sample is 2 m, then the mean height of people in the population is close to 2 m.
Populations: of people, of haploid cells, of 100-item samples, of measurements of a person.
Samples: 50 people, 60 haploids, 70 100-person samples, 80 repeated measurements.
Robbins example: An experiment has the possible outcomes E_1, E_2, ... with unknown probabilities p_1, p_2, ... . In n independent trials suppose that E_i occurs x_i times. How can we "estimate" u, the total probability of unobserved outcomes? (The quotation marks appear because u is not a parameter in the usual statistical sense.)
Comment (and homework): What does Robbins' parenthetical statement mean?
Answer: Perform an (n+1)st trial. Note the proportion of the n+1 trials whose outcome occurred exactly one time. The proportion (in the population) of outcomes unobserved in the n-sample is the expected proportion of once-observed outcomes in the (n+1)-sample.
Comment: a declarative sentence!
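The answer above can be checked by simulation. The following is a minimal sketch, not Robbins' own presentation: the population of 200 outcomes with probabilities proportional to 1/rank, the sample size n = 50, and the function names are all assumptions chosen for illustration. It compares the average unobserved probability u after n trials with the average proportion of once-observed outcomes after n+1 trials.

```python
import random
from collections import Counter

def unobserved_mass(probs, sample):
    """Total probability u of the outcomes that never appear in `sample`."""
    seen = set(sample)
    return sum(p for i, p in enumerate(probs) if i not in seen)

def singleton_proportion(sample):
    """Proportion of trials whose outcome occurred exactly once in `sample`."""
    counts = Counter(sample)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(sample)

def simulate(probs, n, reps, seed=0):
    """Average u over n-samples and the singleton proportion over (n+1)-samples."""
    rng = random.Random(seed)
    outcomes = range(len(probs))
    total_u = total_v = 0.0
    for _ in range(reps):
        sample = rng.choices(outcomes, weights=probs, k=n + 1)
        total_u += unobserved_mass(probs, sample[:n])  # first n trials only
        total_v += singleton_proportion(sample)        # all n+1 trials
    return total_u / reps, total_v / reps

# Hypothetical population: 200 outcomes, probabilities proportional to 1/rank.
weights = [1.0 / rank for rank in range(1, 201)]
total = sum(weights)
probs = [w / total for w in weights]

mean_u, mean_v = simulate(probs, n=50, reps=2000)
print(f"average unobserved mass u:    {mean_u:.3f}")
print(f"average singleton proportion: {mean_v:.3f}")
```

The two averages should agree up to simulation noise, since both have expectation equal to the sum over i of p_i (1 - p_i)^n. In any single experiment u and the singleton proportion can differ noticeably; only their expectations coincide.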
Hypothesis: The universe is half male, half female.
Sample: 10000 individuals, of whom 5100 are female.
Test statistic: chi2 = 4
p-value = 0.046 (two-tailed test)
Comment: if 5200 were female, p = 0.00006. If 60/100 were female, p = 0.046
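The arithmetic in this example can be reproduced with a few lines of code. This is a sketch using only the Python standard library; the function names are made up for illustration. It relies on the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal variable, so its upper-tail probability is erfc(sqrt(chi2 / 2)).

```python
import math

def chi2_sex_ratio(females, total):
    """Chi-square statistic (1 df) for the hypothesis of a 50:50 sex ratio."""
    expected = total / 2
    males = total - females
    return ((females - expected) ** 2 + (males - expected) ** 2) / expected

def p_value_1df(chi2):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))

# The three cases mentioned in the lecture.
for females, total in [(5100, 10000), (5200, 10000), (60, 100)]:
    stat = chi2_sex_ratio(females, total)
    print(f"{females}/{total}: chi2 = {stat:.2f}, p = {p_value_1df(stat):.5f}")
```

Note that 5100 out of 10000 and 60 out of 100 give the same test statistic and hence the same p-value, although the observed proportions (51% and 60%) are very different.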
Discussion: accept/reject paradigm
Example: DNA forensics analysts are happy if the population is in Hardy-Weinberg equilibrium. A test statistic is calculated on a population sample, and converted to a p-value. If the p-value is small, e.g. < 0.05, that tends to indicate that the population may not be in HWE.
An analyst proudly testified that out of a large number of such population studies, in only 1% was p<0.05. What's wrong with that?
I said that there must be publication bias. He said, no, the lack of low p-values was perhaps due to the samples being rather small.
What's wrong with that?
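One way to explore both questions is by simulation. The sketch below does not implement a Hardy-Weinberg test; as a stand-in it reuses the 50:50 sex-ratio test from the earlier example, and the sample sizes (20 and 1000), the number of simulated studies, and the function names are assumptions. Every simulated study is drawn from a population in which the hypothesis is exactly true, and the code reports the fraction of studies with p < 0.05.

```python
import math
import random

def p_value_sex_ratio(females, total):
    """Two-tailed p-value of the 1-df chi-square test for a 50:50 ratio."""
    expected = total / 2
    chi2 = 2 * (females - expected) ** 2 / expected
    return math.erfc(math.sqrt(chi2 / 2))

def fraction_significant(sample_size, studies, alpha=0.05, seed=1):
    """Fraction of simulated studies with p < alpha when the hypothesis is true."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(studies):
        # The number of 1-bits among sample_size fair random bits is
        # Binomial(sample_size, 1/2): the number of females in one study.
        females = bin(rng.getrandbits(sample_size)).count("1")
        if p_value_sex_ratio(females, sample_size) < alpha:
            hits += 1
    return hits / studies

for size in (20, 1000):
    frac = fraction_significant(size, studies=5000)
    print(f"sample size {size}: fraction with p < 0.05 = {frac:.3f}")
```

When the hypothesis being tested is true, the p-value is approximately uniformly distributed, so roughly 5% of studies should show p < 0.05 whatever the sample size (not exactly 5%, because the counts are discrete). Small samples reduce the chance of detecting a real departure; they do not push the rate of small p-values below what chance alone produces.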
"We must remember, that the probability of an event is not a property of the event itself, but a mere name for the degree of ground which we, or someone else, have for expecting it. ... Every event is in itself certain, not probable: if we knew all, we should either know positively that it will happen, or positively that it will not. But its probability to us means the degree of expectation of its occurrence, which we are warranted in entertaining by our present evidence." (J.S. Mill)