Methods of Experimental Research

course codes 200300325 (B) and 200400575 (M)

2004-2005, period 3, February - April



News

  1. [2005.04.20] Corrected a minor error in my answers for session 8, p.2.
  2. [2005.04.19] Added model answers for session 8.
  3. [2005.04.19] In our last meeting I forgot to mention the online EVALUATION of this course. Please go to www.let.uu.nl/oce and fill out the evaluation form (in Dutch) for our course, "Methods of Experimental Research (2005)". This will help us to improve the course in future editions. Note that your comments are anonymous. The evaluation is available from tomorrow (20th) until 29th of April.
    As an incentive, I will release the final grades only after a reasonable but secret number of you have done your duty!
  4. [2005.04.08] Added link to GraphPad easy online statistical calculators [hyperlink by Min Que].
  5. [2005.04.08] Schedule updated, again. Deadline for the final assignment is now Thursday, April 21st, 23:55h.
  6. [2005.04.04] Why is it called "regression"? Look under April 1st.
  7. [2005.03.15] Two books on using SPSS added to Further Reading section below.

Practicalities

Teacher

Hugo Quené
e-mail Hugo Quene AT let uu nl,
Trans 10, room 1.31
office hours Tue 14:00-15:30 and by appointment

Readings

Since there is no good book that covers all topics in this course, we will use various chapters from various textbooks. These materials will be distributed either in class, or through the secretary of Linguistics, Mrs Sabine Wolfertz, who resides at Trans 10, room 21.

Schedule

academic year 2004-2005, period 3
Friday 09:00-12:00, KNG 66, room 0.07 (class)

Contents

Organization and assessment

This course will have only one class meeting per week, on Friday mornings. The focus is on independent study, assignments, and peer review.

Before each class meeting you'll have to do the following:
  1. complete the assignments on the topics covered in the last meeting;
  2. hand in your assignments, by means of WebCT Vista, by Tuesday 18:00h at the latest;
  3. review and judge the assignments of a fellow student, by Thursday 18:00h at the latest;
  4. study new materials.
During a class meeting we will discuss your assignments, using your mutual reviews and judgements. Of course we will also discuss new topics each time.

After each class meeting, assignments have to be handed in by means of WebCT Vista. We'll use the Bulletin Board, so all information is available to all.
Put your work in one document per week, and place it on the bulletin board, Tuesdays 18:00h at the latest. Send a message to the bulletin board (Compose Message), with your document as an attachment.
Retrieve the message plus attached document of your victim for this week, and write a review of his/her work in a separate document. Put your review on the bulletin board, by Thursdays 18:00h at the latest.

Peer review, commenting on the work of a peer or colleague, is a serious business. You can learn more about it through these web pages:

Your final grade is determined by the 8 weekly assignments (each 10%) and the final assignment (20%).

Schedule

(1) Fri 18 Febr: session 1

Experimenting. General methodology. Design of experiment.
Peer review.

Reading:
Before:
Assignments:
Your elaborations on the questions below have to be handed in by means of WebCT Vista, as described above. Write clearly, correctly, and concisely. Make an HTML or PDF or PS document, or plain ASCII text file, with a maximum length of about 2000 words. (I've made a short explanation about how to make a non-proprietary document, in Dutch.)
  1. Visit the Letterenbibliotheek. Take a recent issue (2004) of an experimental journal (in phonetics, psycholinguistics, etc.), such as Language and Speech, Journal of Phonetics, Speech Communication, Phonetica, etc., and choose one article in it that reports an experimental study.
    (a) Which questions does the study attempt to answer?
    (b) Which independent and dependent variables are involved in the study?
    (c) Describe the design of the experiment.
  2. A researcher wants to know whether the vowel duration in stressed vowels is longer than in unstressed vowels. There are two groups of participants, and the researcher is interested in their difference (e.g. L1 and L2 speakers). The target vowels occur in the first vs. the third syllable of three-syllable words. To prevent strategic behavior (what's that?), a speaker may not produce words with different stress patterns: all words produced by a single speaker need to have the same stress pattern.
    Provide a possible design for this experiment. Indicate which factors are between or within subjects, dependent or independent, etc. Make a graph or table to illustrate your design.
  3. Answer the following questions in the chapter by Maxwell & Delaney, Chapter 1: Exercises 1, 5, 6, 7, 10.
  4. This last assignment is not for peer review but for independent study. Now is the perfect time to brush up your statistical skills. Work through the past exams (tentamina) of my Statistics course (see above). Afterwards, check your answers against those provided on the course webpage. Determine which parts of your statistics proficiency are still deficient. Design a plan of action to remedy your shortcomings during this teaching period.
Links:

(2) Fri 25th Febr: session 2

ANOVA: general, interaction, fixed vs random factors, error terms.

Readings: Ferguson, G. A., & Takane, Y. (1989). Statistical Analysis in Psychology and Education (6th ed.). New York: McGraw-Hill. Chapter 16 "Analysis of Variance: Two-Way Classification", pp. 272-296.

Assignments:
Your answers and solutions to the questions below have to be handed in by means of WebCT Vista, as described above. As always, write clearly, correctly, and concisely.
  1. Answer the following questions in Chapter 16 by Ferguson & Takane: Exercises 1, 5, 8.
  2. In a study of cardiovascular risk factors, joggers who run at least 15 miles per week were compared with a control group described as "generally sedentary". Both men and women participated in this study. The design is a 2x2 between-subjects ANOVA, with Group and Sex as factors. There were 200 participants for each combination of factors. One of the dependent variables is the rate of heartbeat of a participant, after 6 minutes on a treadmill, expressed in beats per minute.
    Data from this study are available here in SPSS format, or as plain text (the latter file contains variable names in the first line).
    (a) Which auxiliary theories (à la Meehl) are needed for this study? Comment on construct validity.
    (b) Is it allowed to conduct an analysis of variance on these data? Motivate your answer with relevant statistical considerations.
    (c) Conduct a two-way ANOVA on these data.
    (d) Write a summary of the results of this study, and draw your conclusions clearly.
    (e) From each cell (combination of factors), draw a random sample of n=20 individuals, out of the 200 in that cell. Explain how you have performed the random sampling. Repeat the two-way ANOVA on this smaller data set.
    (f) Discuss the similarities and differences in results between the full-sample ANOVA in (c) and the reduced-sample ANOVA in (e).
    This exercise is adapted from: Moore, D.S., & McCabe, G.P. (2003). Introduction to the Practice of Statistics (4th ed.). New York: Freeman. Example 13.8, pp.813-816.
  3. In a fictitious study, the effect of a growing potion was investigated. The growing potion was administered in 5 different dosages (of 1, 3, 5, 10, and 20 units per day), to 10 men and to 10 women for each dosage, during 15 days. The dependent variable is the increase in body length of a participant, after 15 days, in cm.
    Data from this study are available here in SPSS format, or as plain text (the latter file contains variable names in the first line).
    (a) Import these data into SPSS or a statistical package of your choice. Make a graph of the increase in body length, for each of the 10 conditions. (Hint: In SPSS use a "clustered boxplot".) Discuss what the graph shows.
    (b) Conduct a two-way ANOVA on these data, with Sex and Dosage as two "fixed" factors.
    (c) What is the range of generalisation over dosages, in the ANOVA in (b)? Discuss the external validity of the dosage factor.
    (d) Conduct a two-way ANOVA, but now with Dosage as "random" factor. (Hint: SPSS does not handle "mixed" models like this one very well. It's probably easiest to calculate the F-ratios by hand, using the ANOVA results obtained under (b) above.)
    (e) What is now the range of generalisation over dosages, in the ANOVA in (d)? Again discuss the external validity of the dosage factor.
    (f) Discuss the similarities and differences in results between the two ANOVAs in this assignment. Does the growing potion have a different effect on men and women?
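A minimal sketch of the hand calculation hinted at in (d), assuming the restricted mixed-model convention found in many textbooks: the fixed factor is tested against the interaction, while the random factor and the interaction are tested against the within-cell error. The function name and the mean squares below are hypothetical and are not taken from the course data. Other conventions exist, so check which one your reading material uses.

```python
def mixed_model_f_ratios(ms_fixed, ms_random, ms_inter, ms_within,
                         levels_fixed, levels_random, n_per_cell):
    """F-ratios with their degrees of freedom for a two-way design
    with one fixed and one random factor.

    Assumption: restricted mixed model (illustration only).
    """
    df_fixed = levels_fixed - 1
    df_random = levels_random - 1
    df_inter = df_fixed * df_random
    df_within = levels_fixed * levels_random * (n_per_cell - 1)
    return {
        # fixed factor: error term is the interaction mean square
        "fixed": (ms_fixed / ms_inter, df_fixed, df_inter),
        # random factor and interaction: error term is the within-cell mean square
        "random": (ms_random / ms_within, df_random, df_within),
        "interaction": (ms_inter / ms_within, df_inter, df_within),
    }


# Hypothetical mean squares for a 2 (fixed) x 5 (random) design, 10 cases per cell.
ratios = mixed_model_f_ratios(ms_fixed=120.0, ms_random=300.0,
                              ms_inter=15.0, ms_within=5.0,
                              levels_fixed=2, levels_random=5, n_per_cell=10)
for effect, (f, df1, df2) in ratios.items():
    print(f"{effect}: F({df1},{df2}) = {f:.2f}")
```

Note that the fixed factor now has far fewer denominator degrees of freedom than in the all-fixed analysis, which is the price of generalising over dosages.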

(3) Friday 4 March: session 3

ANOVA: Repeated Measures, post-hoc tests.

Readings:
Links:
Compare these notes from similar courses in experimental research methods, at other universities:
Assignments:
Your answers and solutions to the questions below have to be handed in by means of WebCT Vista, as described above. As always, write clearly, correctly, and concisely.
  1. Answer the following questions in Chapter 19 by Ferguson & Takane: Exercises 5, 8, 9, 11, 12, 14. Use your own words. For exercise 14, answer the what-question as well as the why-question.
  2. Using SPSS, re-calculate the example of Ferguson & Takane §19.9. Import the data correctly; you'll need an additional column for factor R or Group. Describe a study that could plausibly have generated these data, and treat the data as if they are the results of that study. Discuss all relevant aspects, including possible violations of assumptions (§19.10). Draw clear conclusions.

effect size

When we compare two groups or means, as in a pairwise t-test, the effect size d is defined as: d = (m1-m2)/s (Cohen, 1969, p.18; m stands for mean, s for the common within-group standard deviation).

A value of d=.2 is considered small, d=.5 medium, and d=.8 large. It is up to the researcher to classify intermediate values (ibid., p.23-25).
The difference in body length between girls aged 15 and 16 has a small effect size, as do male-female differences on subtests of an IQ test. "A medium effect size is conceived as one large enough to be visible to the naked eye," e.g. the difference in body length between girls aged 14 and 18. Large effect sizes are "grossly perceptible", e.g. the difference in body length between girls aged 13 and 18, or the difference in IQ between PhD holders and first-year students.

When we compare k groups or means, as in an F-test (ANOVA), the effect size f is defined as: f = sm/s, where sm in turn is defined as the standard deviation of the k different group means (ibid., p.268). If k=2, then d=2f (ibid., p.278). We assume that the groups are of equal size; otherwise, criteria other than those discussed here apply.

A value of f=.10 is considered small, f=.25 medium, and f=.40 large. Again, it is up to the researcher to classify intermediate values (ibid., p.278-281).
Small effects can also be important or meaningful; sometimes larger differences lead to a small effect size because of measurement error, disturbing side effects, etc. We see a medium effect size in, e.g., differences in IQ among house painters, car mechanics, carpenters, and butchers. We see a large effect size in, e.g., differences in IQ among house painters, car mechanics, train drivers, and laboratory technicians.

Adapted from: Cohen, J. (1969). Statistical Power Analysis for the Behavioral Sciences (1st ed.). New York: Academic Press.
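The two definitions, d = (m1-m2)/s and f = sm/s, can be sketched in a few lines of Python. This is only an illustration: the function names and the numbers are hypothetical, and sm is computed as the standard deviation of the k group means with divisor k, which assumes equal group sizes.

```python
from statistics import pstdev


def cohens_d(m1, m2, s):
    """Effect size d for two means with common within-group sd s."""
    return (m1 - m2) / s


def cohens_f(group_means, s):
    """Effect size f for k group means with common within-group sd s.

    sm is the standard deviation of the k means (divisor k, equal group sizes).
    """
    return pstdev(group_means) / s


# Hypothetical example: two group means of 105 and 100, within-group sd of 10.
d = cohens_d(105.0, 100.0, 10.0)
f = cohens_f([105.0, 100.0], 10.0)
print(f"d = {d:.2f}, f = {f:.2f}")  # for k=2 the two are related by d = 2f
```

With these numbers the example lands exactly on the "medium" benchmarks for both d and f, which illustrates why the benchmark values for f are half those for d.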

Model answers for first assignment, Ferguson exercise 19.14:
my answer with annotations.
Model answers for second assignment:
input data for SPSS, my answer with annotations — in Dutch.

(4) Friday 11 March: session 4

Signal Detection Theory.

Reading:
Gelfand, S.A. (1990). Hearing: An Introduction to Psychological and Physiological Acoustics (2nd ed.). New York: Marcel Dekker. Chapter 8 "Theory of Signal Detection", pp. 207-217. [ISBN 0-8247-8368-9].

Links:
Assignments:
Your answers and solutions to the questions below have to be handed in by means of WebCT Vista, as described above. As always, write clearly, correctly, and concisely.
  1. In a so-called "Yes/No" perception test, two types of stimuli (N and N+S respectively) are both perceived according to a stochastic process with standard normal distribution. The distributions of "Yes" and "No" responses are given below for each stimulus type. Calculate d' in Z units.
              stimulus
    response  N     N+S
    "Yes"     .10   .80
    "No"      .90   .20
  2. In a so-called "Yes/No" perception test, two types of stimuli (N and N+S respectively) are both perceived according to a stochastic process with standard normal distribution. The results show that d'=2. Calculate the expected proportion of "false alarms", if P(hit) = 90.15%.
  3. True or false?
    1. The points on one ROC curve correspond to different d' values, at the same criterion.
    2. The points on one ROC curve correspond to different criteria, at the same d' value.
    3. The points on one ROC curve correspond to various d' values, at the same hit rate.
  4. We have conducted two "Yes/No" perception tests, in Experiments I and II. Two stimuli, N and N+S, were both perceived according to a standard normal distribution. The following proportions of "hits" and "false alarms" were observed in these experiments.
                    Exp.I   Exp.II
    P(hit)          .6915   .9713
    P(false alarm)  .3085   .4602
    Construct one or more ROC curves for these results. Which experiment had the largest bias (shift in criterion)?
  5. In order to investigate the perception of two types of stimuli, two methods are available: by means of a so-called "Yes/No" judgement about each stimulus, or by means of pairwise comparison for all possible pairs of stimuli. What are the advantages and disadvantages of these two methods?
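The calculations in these assignments all rest on converting proportions to Z units. A minimal sketch, assuming equal-variance normal distributions for N and N+S: d' is the difference between the Z-transformed hit rate and false-alarm rate. The function names are hypothetical, the criterion measure c is one common choice and not necessarily the one used in the reading, and the proportions below are illustrative rather than taken from the assignments.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, sd 1


def d_prime(p_hit, p_fa):
    """Sensitivity: z(hit rate) minus z(false-alarm rate)."""
    return STD_NORMAL.inv_cdf(p_hit) - STD_NORMAL.inv_cdf(p_fa)


def criterion(p_hit, p_fa):
    """Criterion location c; 0 means no bias towards "Yes" or "No"."""
    return -0.5 * (STD_NORMAL.inv_cdf(p_hit) + STD_NORMAL.inv_cdf(p_fa))


def false_alarm_rate(d, p_hit):
    """Expected P(false alarm) for a given d' and hit rate."""
    return STD_NORMAL.cdf(STD_NORMAL.inv_cdf(p_hit) - d)


# Illustrative proportions: z(.8413) is about +1 and z(.1587) is about -1.
print(round(d_prime(0.8413, 0.1587), 2))
print(round(criterion(0.8413, 0.1587), 2))
print(round(false_alarm_rate(2.0, 0.8413), 4))
```

All points with the same d' but different criteria lie on one ROC curve, so plotting P(hit) against P(false alarm) for a range of criteria traces that curve.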

(5) Friday 18 March: session 5

regression, error of measurement, reliability.

Reading:
Assignments:
Again, your answers and solutions to the questions below have to be handed in by means of WebCT Vista.
  1. Answer the following questions: Ferguson & Takane, Chapter 24: Exercises 1, 2.
  2. After you've done this week's reading, make a trip through the Reliability Test Maze, consisting of 10 tricky questions. Make notes of your thoughts and answers, and whether your answer was right or wrong. For the questions you've answered WRONG, you have to say why you came to that answer, and why it was wrong. Discuss these thoughts in a paragraph, one paragraph for each wrong answer. Also indicate the number of questions you've answered correctly.

Friday 25 March: no class

Good Friday

(6) Friday 1st April: session 6

Multiple regression, multivariate analyses.

Readings:
Assignments:
Again, your answers and solutions to the questions below have to be handed in by means of WebCT Vista.
  1. Answer the following questions: Moore & McCabe, Chapter 11: Exercises 2, 3, 16, 33.
    Data for the last two questions are available here in plain text format (the first line of this file contains variable names).

Forward or Backward?

For questions 16 and 33 the FORWARD method is most appropriate. This means that you start with an empty model (only intercept b0) to which predictors are added step by step. After each addition of a predictor, you check whether the model performs significantly better than before (e.g. by checking whether the increase in R^2 is significant).
The questions are about the *increment* in R^2 by *adding* a predictor. The relevant information is easier to find in the SPSS output if you specify the FORWARD method.
As a bonus, you could check what happens if you exclude case #51 from the data set, e.g. by marking it as a missing value. This is quite easy if you keep the regression command in a Syntax window for repeated use.
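Whether an increment in R^2 is significant can be checked by hand with the usual F-test for nested models. The sketch below assumes ordinary least-squares regression with an intercept; the function name and the numbers are hypothetical and are not taken from the Moore & McCabe data.

```python
def r2_increment_f(r2_reduced, r2_full, n, k_reduced, k_full):
    """F-test for the increase in R^2 when predictors are added.

    n is the number of cases; k_reduced and k_full are the numbers of
    predictors (not counting the intercept) in the smaller and larger model.
    Returns (F, df1, df2).
    """
    df1 = k_full - k_reduced
    df2 = n - k_full - 1
    f = ((r2_full - r2_reduced) / df1) / ((1.0 - r2_full) / df2)
    return f, df1, df2


# Hypothetical example: adding a third predictor raises R^2 from .20 to .25 (n=100).
f, df1, df2 = r2_increment_f(0.20, 0.25, n=100, k_reduced=2, k_full=3)
print(f"F({df1},{df2}) = {f:.2f}")
```

Compare the resulting F with the critical value for (df1, df2) degrees of freedom. R^2 itself can never decrease when a predictor is added, so only the size of the increment is informative.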

HSS, SAT, GPA??

The chapter by Moore & McCabe draws heavily on typically American concepts. In the USA, your achievements are all that counts, in life as well as in study. The US grading system ranges from A+ (excellent) to F (fail).
For admission to a university, two things are taken into account: (a) your average grades in the final years of high school, in mathematics, science, and English (HSM, HSS, HSE), and (b) your score on a national admissions exam, the Scholastic Aptitude Test (SAT), which is comparable to the Dutch CITO test. Top-class universities, like Harvard, Yale, Stanford, etc., use both parameters in selection. You have to be the best in your class (but your classmates are strongly competing for this honor), plus you need a minimal score on your SAT.
During your academic study, all your grades and results contribute to your Grade Point Average (GPA), a weighted average grade. This GPA is generally used as an indication of academic achievement and success. The authors attempt to predict the GPA from the previously obtained indicators (a) and (b).

regression

Why is it "regression"? This has to do with heredity, the field of biology where regression was first developed by Francis Galton (cousin of Charles Darwin) in the late 19th century.
Take a sample of fathers, and note their body length (X). Wait for one full generation, and measure the body length of each father's oldest adult son (Y). Make a scattergram of X and Y. The best-fitting line through the observations has a slope of less than 1 (typically about .65). This is because the sons' length Y tends to "regress to the mean": outlier fathers tend to produce average sons, and average fathers also tend to produce average sons. Galton called this phenomenon "regression towards mediocrity". Thus the best-fitting line is a "regression" line because it shows the degree of regression to the mean, from one generation to the next. (Note that any slope larger than 0 suggests a hereditary component in the sons' body length, Y.)
Questions: Which variable has the larger variance, X or Y? Does the variation in body length increase or decrease (regress) over generations? Why?
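The father-son example can be imitated with simulated data. The sketch below is only an illustration under stated assumptions: body lengths are normal with a hypothetical mean of 175 cm and sd of 7 cm, the father-son correlation is set to .65, and the spread of the sons is constructed to equal that of the fathers. None of these numbers come from Galton's data.

```python
import random
from statistics import mean


def simulate_slope(n=5000, mu=175.0, sd=7.0, r=0.65, seed=1):
    """Least-squares slope of sons' length (Y) on fathers' length (X)."""
    rng = random.Random(seed)
    fathers = [rng.gauss(mu, sd) for _ in range(n)]
    # Residual sd chosen so that sons have the same overall sd as fathers.
    noise_sd = sd * (1.0 - r ** 2) ** 0.5
    sons = [mu + r * (x - mu) + rng.gauss(0.0, noise_sd) for x in fathers]
    mx, my = mean(fathers), mean(sons)
    sxy = sum((x - mx) * (y - my) for x, y in zip(fathers, sons))
    sxx = sum((x - mx) ** 2 for x in fathers)
    return sxy / sxx


slope = simulate_slope()
print(f"slope of the regression line: {slope:.2f}")
```

Under these assumptions the slope equals the correlation, so it stays below 1 whenever the correlation is less than perfect.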

(7) Friday 8th April: session 7

Non-normal data, transformations, nonparametric testing.

Readings:
Links:
Assignments:
Again, your answers and solutions to the questions below have to be handed in by means of WebCT Vista.
  1. Answer the following questions: Ferguson & Takane, Chapter 22: Exercises 1, 2, 4, 5, 9, 10, 11.
  2. Memorine is a fictitious drug that is supposed to increase verbal memory. The effect of memorine on verbal memory was investigated in a fictitious experiment. Listeners (n=100) were presented with a spoken text of 1000 words, and afterwards they had to repeat the words they could remember from that spoken text. The same participants were observed in a control condition (first column, "control") and after swallowing 100 mg memorine ("test"). The numbers of remembered words per listener are available in plain text format (with comma between observations).
    (a) What are H0 and Ha for this study?
    To continue, use α=.05 for all statistical tests in this assignment.
    (b) Assume that the numbers of remembered words are normally distributed. Use a pairwise t-test to see whether there is a difference between the two conditions. Draw your conclusions clearly.
    (c) As a bonus, and to check on your analysis, use a repeated-measures ANOVA to see whether there is a difference between the two conditions. The t-test and F-test are equivalent, and should produce the same result here, with F = t^2. Check.
    (d) Inspect whether the dependent variables are indeed distributed normally. Include relevant diagnostics and figures in your report. Discuss the validity of your previous conclusions.
    Note: In real life, these checks on assumptions are done before statistical testing, and not afterwards.
    As explained in class, there are two options if your data are not normal: transformation, or using nonparametric tests.
    (e) transformation. Construct (COMPUTE) two new variables, containing the square root of the raw observations. Inspect whether the new variables are distributed normally. Use a pairwise t-test on the transformed data. Draw your conclusions clearly.
    (f) non-parametric testing. Use a nonparametric, pairwise test to investigate your hypotheses. Again, draw your conclusions clearly.
    (g) Discuss the similarities and differences in conclusions at parts (b-c), (e), and (f). Your text should mention key concepts: validity, testing, power, significance, assumption.
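The equivalence mentioned in (c) can be verified by hand. The sketch below computes the pairwise t statistic and the repeated-measures F for two conditions from first principles. The function names and the six pairs of scores are hypothetical and are not the memorine data.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(x, y):
    """t statistic for paired observations, with df = n-1."""
    diffs = [b - a for a, b in zip(x, y)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


def rm_anova_f(x, y):
    """F statistic of a repeated-measures ANOVA with two conditions."""
    n = len(x)
    grand = mean(x + y)
    ss_total = sum((v - grand) ** 2 for v in x + y)
    ss_cond = n * ((mean(x) - grand) ** 2 + (mean(y) - grand) ** 2)
    ss_subj = 2 * sum(((a + b) / 2 - grand) ** 2 for a, b in zip(x, y))
    ss_error = ss_total - ss_cond - ss_subj
    df_cond, df_error = 1, n - 1
    return (ss_cond / df_cond) / (ss_error / df_error)


# Hypothetical numbers of remembered words for six listeners.
control = [12, 15, 9, 20, 14, 11]
test = [14, 18, 9, 25, 15, 15]

t = paired_t(control, test)
f = rm_anova_f(control, test)
print(f"t = {t:.3f}, t^2 = {t ** 2:.3f}, F = {f:.3f}")
```

The same functions can be applied to square-root transformed scores for part (e), since the transformation only changes the input values.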

QQ plot of normally distributed data

A Normal Probability plot (NP plot or QQ plot) is a great tool to verify whether a variable has a normal distribution. Such a plot can be made in SPSS by choosing Analyze > Descriptive Statistics > Explore, or by means of the Examine command in a Syntax window. But what is "normal"? Here is an SPSS script to generate a QQ Plot of random data from a normal distribution. To run the script, you also need this dummy data file. Repeat the marked commands in the script, at least 5 times, to train your eyes to QQ plots of normally distributed data. Also notice the reported Kolmogorov-Smirnov Test of Normality and its significance.
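For readers without SPSS, the same training exercise can be sketched in Python. The code below is not the SPSS script mentioned above: it is a stand-alone illustration that pairs sorted observations with theoretical normal quantiles, using the plotting position (i-0.5)/n, which is one of several common choices and an assumption here. The function names are hypothetical. The correlation between the two sets of quantiles is a rough numerical indication of how straight the QQ plot is.

```python
import random
from statistics import NormalDist, mean


def qq_points(sample):
    """Pairs (theoretical normal quantile, observed value) for a QQ plot."""
    observed = sorted(sample)
    n = len(observed)
    std_normal = NormalDist()
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, observed))


def straightness(points):
    """Correlation between theoretical and observed quantiles."""
    tx = [p[0] for p in points]
    ox = [p[1] for p in points]
    mt, mo = mean(tx), mean(ox)
    sxy = sum((t - mt) * (o - mo) for t, o in zip(tx, ox))
    sxx = sum((t - mt) ** 2 for t in tx)
    syy = sum((o - mo) ** 2 for o in ox)
    return sxy / (sxx * syy) ** 0.5


# Repeat a few times to see how much truly normal samples vary.
rng = random.Random(2005)
for run in range(5):
    sample = [rng.gauss(0.0, 1.0) for _ in range(100)]
    print(f"run {run + 1}: r = {straightness(qq_points(sample)):.3f}")
```

Plotting the pairs from qq_points with any graphics tool gives the QQ plot itself; points close to a straight line suggest a normal distribution.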

(8) Friday 15th April: session 8

logistic regression, GLM, modelling.

Readings:
Links:
Assignments:
Again, your answers and solutions to the questions below have to be handed in by means of WebCT Vista, by Tuesday 18:00.
  1. Answer the following questions: Moore & McCabe, Chapter 15: Exercises 8, 10, 12, 25.
    In order to speed up your work on exercise 15.25 in SPSS, I've put the data on the web, in a plain text data file. The first line contains the names of the variables. Data (N=2900) start on line 2, and are coded as follows:
    hospital:  0=hosp.A, 1=hosp.B;
    outcome:   0=died, 1=survived;
    condition: 0=poor, 1=good.
    
    Variables are separated by commas.
    In your logistic regression, the variables hospital and condition must be treated as categorical variables. For easier interpretation of the results, I prefer to use the zero codes as references or baselines (in SPSS choose Reference: First).
    SPSS does not provide you with 95% confidence intervals; you need to calculate these by hand. The Wald statistic in the SPSS output is the same as the test statistic for β as defined on p.46 in the reading material.
  2. For this week, there will be no review to write, since we're pressed for time in the final week of this teaching period. The answers provided below may help you in spotting errors in your submission.
As promised, here are my annotated answers for session 8 [corrected 2005.04.20].
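The hand calculation of a 95% confidence interval mentioned in assignment 1 can be sketched as follows, assuming the usual large-sample (Wald) interval: coefficient plus or minus 1.96 standard errors, exponentiated to obtain an interval for the odds ratio. The function name and the coefficient and standard error below are hypothetical and are not results from exercise 15.25.

```python
from math import exp
from statistics import NormalDist


def wald_interval(b, se, level=0.95):
    """Confidence interval for a logistic regression coefficient b
    and for the corresponding odds ratio exp(b)."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for 95%
    lower, upper = b - z * se, b + z * se
    return (lower, upper), (exp(lower), exp(upper))


# Hypothetical coefficient and standard error, as read from regression output.
(ci_b, ci_or) = wald_interval(b=0.5, se=0.2)
print(f"b:          {ci_b[0]:.3f} to {ci_b[1]:.3f}")
print(f"odds ratio: {ci_or[0]:.3f} to {ci_or[1]:.3f}")
```

If the interval for b excludes 0, or equivalently the interval for the odds ratio excludes 1, the effect is significant at the corresponding level.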

final assignment

Your final assignment is to submit a revised or improved version of one previous assignment of this course. You're free to choose which one you want to revise.
As always, the revised paper should be (as much as possible) a running text, not a collection of incomplete sentences and statistical output.
In the revised version you have to accommodate the comments of your reviewer — if you agree of course. Also use the reading materials and hyperlinks provided.
You may discuss the reviewer's comments in the text of your revised version. But perhaps you find it easier to write a coherent (revised) text on your own, plus a second document in which you discuss the reviewer's comments explicitly, stating which comments you have taken into account, which comments you have ignored, and why. Such a separate document is called a "cover letter to the Editor".

Deadline is Thursday 21st April 2005, 23:55 h.


Further Reading

Check the webpage of my Statistics course [in Dutch].


© 2003-2005 HQ 2005.04.20
