CHAPTER 6: THE BASICS OF EXPERIMENTATION I: VARIABLES AND CONTROL

VARIABLES: Types and definitions

Variable: anything that can take on more than one value
E.g. age, sex, weight, eye color

IV: the variable you are testing the effect of. The IV must have at least 2 levels or categories (something to compare).

DV: the variable affected by the IV. It is your data. The DV (if interval or ratio) must be capable of taking on many values... the more the better... we'll discuss why later.

Extraneous Variables (EVs): any variable other than the IV or DV.
EVs can become a problem if they can take on more than one value AND the values of the EV are systematically different across the levels of the IV. In this case, the EV has become a confound.
Ideally, in experiments, EVs will only have one value. As such, they cannot become confounds. If an EV can take on different values and we suspect that those values could become systematically different across the levels, we will have to find some other way of controlling it.

OPERATIONALLY DEFINING VARIABLES - Review

A way to define IVs, DVs, EVs, and manipulations: each is defined in terms of how it is measured or the operations used to produce it.
Attention = number of consecutive minutes spent looking at the teacher (good)
Gesture = a discernable movement of a limb or head executed when the individual is communicating verbally or non-verbally with another person (good)
Type A personality = a score of 100 or more on Jenkins Activity Survey (good)
Normal = typical or usual behaviors (bad)

Without operational definitions, others could not replicate your work, because your definitions would be too vague and wishy-washy. Your results would also be open to criticism... others could say that the vague definitions of your IV and DV led to misinterpretations by the participants.

INDEPENDENT VARIABLES: The whole story

Treatment IV
A "true" IV because random assignment to the different levels is possible (i.e. the participants, in theory, could be assigned to any of the different levels)
e.g. paper color, method of instruction, room temp., dose of drug

Subject/classification IV
A special kind of IV where random assignment to groups is not possible because the groups are "predetermined" based on some inherent characteristic the participant already possesses.
e.g. age, sex, SES, major, personality etc...

HOW MANY LEVELS SHOULD THE IV HAVE?

1. Depends on type of relationship expected (linear, U-shaped)
You need at least 3 levels to tell a linear relationship from a U-shaped one; with only 2 levels, all you can see is whether the two means differ.

2. Depends on how many control groups you will need
Purpose of control groups:
First, to rule out alternative explanations
e.g. to determine if hippocampal lesions impair memory, you should have 2 control groups (unoperated rats + sham lesioned rats) + the lesion group
Second, control groups provide a baseline or "no treatment" condition to which the treatment will be compared

3. Too many levels make it harder to detect a significant effect, so try not to have too many levels, just what's necessary

Note: Publication considerations determine how many IVs (not levels) you should have

SHOULD THE LEVELS OF THE IV DIFFER IN KIND (QUALITATIVELY) OR AMOUNT (QUANTITATIVELY)?

This will depend on what you are interested in.
This will also depend on the type of relationship you are looking for.
Strong manipulations increase the chance of finding a significant effect, so make sure you vary the levels of the IV such that you have a strong manipulation.

CHARACTERISTICS OF A GOOD IV (FACTOR)

Reliable IV
Each subject in a given level receives the same amount or type of treatment designated for that level -> error free
Participants within each treatment "actually and only" receive the treatment they are supposed to

Valid IV
Any differences found between your groups are actually and only due to the manipulation of the IV = high internal validity
The effect on the DV is not due to a confound (solely or in combo)

If a variable is not reliable, then it cannot be valid
That is, if the IV is unreliable, it means you have error. The IV cannot be valid at this point, because if you see a difference between the groups, it could be due to the IV and/or error. Remember, for an IV to be valid, the difference between the groups is only due to the IV.

Just because a variable is reliable does not mean it must be valid
That is, it is possible for the participants in each level to get actually and only what they are supposed to (i.e. IV is reliable), but this does not mean that there are no confounds in your study (i.e. the IV may not be valid)

EXTRANEOUS VARIABLES (POTENTIAL CONFOUNDERS)

Extraneous variables are those other than the IV or DV which could realistically affect the outcome of our experiment if not controlled for.
Confounds: extraneous variables which systematically differ between the levels of the IV.
Extraneous variables can
a) move the groups closer together (less likely to get a significant difference)
b) move the groups further apart (more likely to see a difference)
c) have no effect on the difference between the groups
e.g. testing effectiveness of diet pill vs placebo: what if people in the placebo group had slower metabolisms to begin with? What if people in the pill group had slower metabolisms to begin with?

When serious confounds are detected, you won't know if the difference (or lack of one) is due to the IV or to the confound... can you still salvage something?? Yes! I disagree with the book... you've learned something

DEPENDENT VARIABLES: The whole story

SELECTING THE DV
See what's been used in the past. That is, look for a precedent. Others will have established the pros and cons of using a particular DV. You can also more directly compare the results of your study with theirs if you use the same DV.
Often, you can look at the effect of an IV on many different DVs (and you can choose more than one)
e.g. training rats on a maze, you could look at time to criterion, number of errors, time to complete the maze etc...

SHOULD YOU RECORD MORE THAN 1 DV?

In theory, there is no limit on the number of DVs you can have. If the IV affects each DV in the same predictable way, then your hypothesis would receive very strong support. You would have converging lines of evidence within the same study!
But keep in mind that:
Recording a DV takes time (and it takes more time to analyze the data)
If you run inferential tests on the data, the more DVs you have, the more likely it is that one or more of these tests will yield significant results by chance alone (if you run 100 t-tests on 100 DVs and set alpha at .05, we would expect about 5 t-tests to be significant just because of chance, even if the IV had no effect at all).
To control for this (i.e. to control the risk of a type 1 error), you would conduct "multivariate analyses". This type of analysis compensates for the number of DVs recorded so that your overall risk of a type 1 error is not inflated.
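The "100 t-tests" arithmetic above can be sketched in a few lines. This is only an illustration under two assumptions that are not in the notes: every null hypothesis is true, and the tests are independent of each other. The function names are made up for this example.

```python
def expected_false_positives(num_tests, alpha):
    """Expected number of tests that come out significant by chance alone
    (assumes every null hypothesis is true)."""
    return num_tests * alpha


def chance_of_at_least_one_false_positive(num_tests, alpha):
    """Probability that one or more tests is significant by chance alone
    (assumes every null is true AND the tests are independent)."""
    return 1 - (1 - alpha) ** num_tests


if __name__ == "__main__":
    for num_dvs in (1, 5, 20, 100):
        print(
            num_dvs,
            expected_false_positives(num_dvs, 0.05),
            round(chance_of_at_least_one_false_positive(num_dvs, 0.05), 3),
        )
```

With 1 DV the overall risk is just alpha; with 100 DVs you are almost guaranteed at least one "significant" result that means nothing.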

CHARACTERISTICS OF A GOOD DV

1. DV must be valid
The DV "actually and only" measures what it is supposed to measure. If DV = score on IQ test, but the test scores are influenced by something other than intelligence (e.g. culture), then that IQ test is not a valid DV to measure intelligence

2. DV must be reliable
Scores must be "consistent and error free"

3. DV must be sensitive and have no restriction of range problem
The DV should be sensitive to changes in what's being measured.

Example: what if a given drug has an effect on latency, but only on the order of seconds... if you measure the DV in minutes, the DV is not sensitive enough to pick up on the influence of the drug...
The DV should not have a restricted range... that is, it should be possible to get a wide range of scores from low to high.
Ceiling effects (most scores are high and cannot go higher) or floor effects (most scores are low and cannot go any lower) create a restriction of range problem.
Restriction of range decreases the chance of finding a significant effect.
E.g. compare DV a) scores can range from 0-3 to DV b) scores can range from 0-100. If you have 2 groups in your experiment and compare their mean scores, the difference between the means has the potential to be greater for DV b).

What can affect the reliability and validity of the DV

1. Scoring Criteria for the DV must be good
Well defined scoring criteria are essential to the DV's reliability. It is often the case that the scoring is subjective - what's right to one person is wrong to another. This is bad because the scoring would be inconsistent and thus not error free. Even the same person can score participants differently if well defined criteria have not been laid out. This will introduce error in measurement and thus decrease reliability.

2. Automation & Instrumentation Effects must be Minimized
No matter how careful we are, humans are fallible. That is, we make mistakes. Having a machine keep score for us reduces the number of mistakes made and thus increases the DV's reliability. Scores are more likely to be error free.
The problem with using machines is that over time, they can break down while measuring the DV, introducing error and decreasing reliability. The same problem may occur with human researchers. They may get tired or ambivalent. However, humans may also improve with time, as they become more familiar with the experiment's procedures. This is still a problem, because scores are likely to differ from each other. That is, scores at the beginning of the experiment (while you are still fumbling around) will be different from scores at the end (when you are now the expert). Inconsistent scores = unreliability.

Inter-rater reliability
This is a technique used to determine if the DV is reliable. Two or more people score the same observations. Their scores are then compared. If the correlation (r) between them is 0.90 or more, then the DV is considered reliable.
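A minimal sketch of the inter-rater check, assuming "correlation (r)" means the Pearson correlation and that there are exactly two raters. The scores below are made-up illustration data, not results from any study.

```python
import math


def pearson_r(scores_x, scores_y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(scores_x)
    mean_x = sum(scores_x) / n
    mean_y = sum(scores_y) / n
    covariation = sum(
        (x - mean_x) * (y - mean_y) for x, y in zip(scores_x, scores_y)
    )
    variation_x = sum((x - mean_x) ** 2 for x in scores_x)
    variation_y = sum((y - mean_y) ** 2 for y in scores_y)
    return covariation / math.sqrt(variation_x * variation_y)


# Hypothetical scores: two raters scoring the same six observations.
rater_a = [4, 7, 6, 9, 3, 8]
rater_b = [5, 7, 6, 9, 2, 8]

r = pearson_r(rater_a, rater_b)
dv_is_reliable = r >= 0.90  # the 0.90 cutoff comes from the notes above
print(round(r, 3), dv_is_reliable)
```

The two raters here disagree on only two observations, so r clears the 0.90 cutoff.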

3. Practice trials and multiple trials per condition
These are techniques used to increase reliability.
Often, a participant will perform poorly at the beginning of an experiment because they lack familiarity with the experiment's procedures. Her scores at the beginning will be different from those at the end. Remember: inconsistency = unreliability. To prevent this, we can give her practice trials. This is to familiarize her with what she needs to do, but they are only really necessary if the procedures are complicated.
Giving multiple trials (e.g. 2, 3, 4 etc...) to each subject, calculating the average of those trials, and then using the mean as the person's score is another way to increase reliability. This is because on any given trial, a participant could "screw up". If you only used the data from that one trial as his/her score, how can you be sure that the trial wasn't a "screwed up" trial for that person? You can't. But if you give him/her say... 10 trials, and use the mean of those 10 trials as his/her score, then if she/he did screw up on 1 or 2 trials it'll get "smoothed over" when you calculate the mean. That is, the other 8 or 9 "good trials" will override the 1 or 2 "bad trials". Using this technique means you have to use it for every participant.
Using the mean of several trials as each person's score is probably a better or truer representation of each participant's performance. These "true scores" are likely to vary less than "1 trial scores" since they are less affected by trial-specific factors. Remember, trial-specific factors are "transient" events that could affect performance on a given trial.
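To see the "smoothing over" in numbers, here is a small sketch with made-up data: one participant, 10 trials, one of which is a "screwed up" trial. The trial values and the "typical" score of 20 are assumptions chosen purely for illustration.

```python
def participant_score(trial_scores):
    """Use the mean of all trials as the participant's score."""
    return sum(trial_scores) / len(trial_scores)


# Hypothetical reaction times: nine ordinary trials near 20, one bad trial (45).
trials = [20, 21, 19, 20, 45, 20, 22, 19, 21, 20]
typical_performance = 20
bad_trial = 45

mean_score = participant_score(trials)

# How far off would we be if the bad trial were the only trial?
error_if_single_bad_trial = abs(bad_trial - typical_performance)
# How far off are we when the bad trial is averaged with the good ones?
error_with_mean = abs(mean_score - typical_performance)

print(mean_score, error_if_single_bad_trial, round(error_with_mean, 1))
```

The bad trial still pulls the mean up a little, but far less than if it had been the participant's only score.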

NUISANCE VARIABLES

These are extraneous variables which do not systematically differ between the levels of the IV (i.e. they are not confounds), but they increase the variability of the scores within each and every level.
Nuisance variables could be subject variables (age, ethnicity, level of motivation), or variables related to the testing environment (temperature, lighting, noise levels).
Regardless, when present, they increase within group variability (scores are more variable), making a treatment effect more difficult to detect.
As within group variability increases, power decreases! Recall that as within group variability increases, so does the overlap between the Ho and Ha distributions. With alpha held fixed, that means beta (the type 2 error rate) increases and power (1 - beta) decreases.
To decrease within group variability, make your groups as homogeneous as possible.
E.g. Test 18-20 year old LDS women under fluorescent lights after midnight
Instead of 18-80 year old men and women of all denominations under all kinds of lighting conditions at all times of the day and night
In some ways, "nuisance" variables are good. If you detect a significant effect despite their presence, then you can claim that your results will likely generalize to a very "variable" population. That is, you will have great external validity.
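The link between within group variability and power can be sketched with a simplified case. The assumptions here are mine, not the notes': two equal-sized groups, a one-tailed z-test, and a within group standard deviation that is known in advance. The function name and the numbers are hypothetical.

```python
from statistics import NormalDist


def power_two_group_z(mean_difference, within_sd, n_per_group, alpha=0.05):
    """Approximate power of a one-tailed, two-group z-test (known SD)."""
    standard_normal = NormalDist()
    # Standard error of the difference between two group means.
    standard_error = within_sd * (2 / n_per_group) ** 0.5
    # Cutoff the test statistic must exceed to reject Ho.
    critical_z = standard_normal.inv_cdf(1 - alpha)
    # Power = chance the statistic lands past the cutoff when Ha is true.
    return 1 - standard_normal.cdf(critical_z - mean_difference / standard_error)


# Same treatment effect (5 points), same group size (20), different amounts
# of within group variability.
homogeneous_power = power_two_group_z(mean_difference=5, within_sd=5, n_per_group=20)
heterogeneous_power = power_two_group_z(mean_difference=5, within_sd=15, n_per_group=20)
print(round(homogeneous_power, 2), round(heterogeneous_power, 2))
```

Nothing about the treatment changed between the two calls; only the spread of scores within the groups did, and that alone is enough to cut power substantially.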

CONTROLLING EXTRANEOUS VARIABLES

These techniques are used to decrease the chance that an extraneous variable will become a confound and/or a nuisance variable

1. Random Assignment to Groups
Makes it less likely that an EV will become a confound... it is less likely for a variable to be systematically different across groups
Use this technique if you do not know what the EVs are that you need to control... the technique will "control" them all... theoretically.
It is difficult to know whether randomization actually "worked"... you are leaving things up to chance
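One possible way to do random assignment in practice is to shuffle the participant list and then deal people out to the levels of the IV in turn. This is a sketch, not a prescribed procedure: the participant IDs, level names, and the seed are all hypothetical, and dealing in turn is just one way to keep group sizes equal.

```python
import random


def randomly_assign(participants, level_names, seed=None):
    """Shuffle participants, then deal them out to the IV levels in turn."""
    rng = random.Random(seed)  # a seed makes the assignment reproducible
    shuffled = list(participants)
    rng.shuffle(shuffled)
    groups = {level: [] for level in level_names}
    for position, person in enumerate(shuffled):
        level = level_names[position % len(level_names)]
        groups[level].append(person)
    return groups


participants = [f"P{number:02d}" for number in range(1, 13)]
groups = randomly_assign(participants, ["a1", "a2", "a3"], seed=6)
for level, members in groups.items():
    print(level, members)
```

Every participant had the same chance of landing in any level, which is what makes a systematic difference between groups unlikely (though never impossible).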

2. Eliminate
Get rid of the unwanted variable - the EV cannot become a confound or a nuisance variable if it is not there
E.g. Are facial expressions universal? If so, then people from all cultures/regions should recognize a sad face, happy face etc... But body stance could also be used to identify an emotion. So, eliminate this EV (body stance) by only showing pictures of the face.
Some EVs are next to impossible to eliminate (e.g. many subject variables - how do you eliminate age?) In such cases, you can only eliminate certain aspects or categories of the variable... but then what you are really doing is "holding the variable constant" as described below

3. Holding the EV Constant
Keep the level of the extraneous variable the same for all participants
This works well for EVs that are subject variables, or any EV that cannot be completely eliminated
The technique can be used to ensure that the EV does not become a confound or a nuisance variable
E.g.
Hold religion constant: test people who are all LDS
Hold age constant: test people who are all 18-20 yrs old
Hold gender constant: test only men (or test only women)
Hold time of day constant: test all participants in the a.m.

Advantage of the keep constant technique: groups are more homogeneous, within group variability decreases, so power increases
Disadvantage of the keep constant technique: external validity is low, so who/what you can generalize to is limited

4. Balance
Each group (level of the IV) experiences all EVs or levels of the EVs to the same extent. But first, you have to know which EV(s) you want to control
e.g. Time of day = extraneous variable (3 levels: morning, afternoon, night)

Group A1   M M M M   A A A A   N N N N
Group A2   M M M M   A A A A   N N N N
Group A3   M M M M   A A A A   N N N N

In each group, 4 people are tested at each of the three levels of the EV
That is, levels of the EV appear equally in each group
Time of day is not systematically different across levels of the IV, so it cannot become a confound
Note that this technique does NOT prevent the EV from becoming a nuisance variable

Advantage of the Balance Technique: increases external validity, i.e. results will generalize to more people (compared to if you used the keep constant technique)

Disadvantage of the Balance Technique: makes groups more heterogeneous > increases within group variability (error variance) > decreases power

Remember: test statistic = between group variability / within group variability
Size of test statistic depends on how big the treatment effect is (reflected by between group variability) relative to how individuals differ by chance alone (reflected by within group variability or error variance)
The larger the test statistic, the more likely you will get significance
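As one concrete instance of "between group variability / within group variability", here is a sketch of the F ratio from a one-way ANOVA computed by hand. Using F specifically is my assumption (the notes only say "test statistic"), and both data sets are made up so that the group means are identical and only the within group spread differs.

```python
def f_ratio(groups):
    """One-way ANOVA F: mean square between groups / mean square within groups."""
    all_scores = [score for group in groups for score in group]
    grand_mean = sum(all_scores) / len(all_scores)
    group_means = [sum(group) / len(group) for group in groups]

    # Between group variability: how far each group mean is from the grand mean.
    ss_between = sum(
        len(group) * (mean - grand_mean) ** 2
        for group, mean in zip(groups, group_means)
    )
    # Within group variability: how far each score is from its own group mean.
    ss_within = sum(
        (score - mean) ** 2
        for group, mean in zip(groups, group_means)
        for score in group
    )
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Both data sets have group means of 10 and 15; only the spread differs.
homogeneous_groups = [[9, 10, 11], [14, 15, 16]]
heterogeneous_groups = [[5, 10, 15], [10, 15, 20]]

print(f_ratio(homogeneous_groups))    # 37.5
print(f_ratio(heterogeneous_groups))  # 1.5
```

The treatment effect (a 5 point difference in means) is the same in both cases, but the test statistic shrinks from 37.5 to 1.5 once the groups become more variable.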

5. Counterbalancing
This is done to control order (or sequence) effects and carryover effects
Can be used in repeated (within subject) designs - when all subjects are tested in each level of the IV
Order effects: occur when the only thing affecting the DV is when a level is experienced, regardless of which level it is
E.g. Does the color of the test paper affect test scores? Everyone takes test 1 on white, test 2 on blue, and test 3 on pink. Say pink scores are significantly higher. How do we know it's because of the pink paper, and not because everybody took it last? To find out, you could counterbalance the order of the tests (i.e. for some, pink was first, others took blue first etc...). If the last test is still the best, regardless of the color, then you know you have an effect of order and not paper color.

Carryover effects: the effects of one treatment persist to affect responses in the next treatment
E.g. Say you have 3 methods of psychotherapy... it could be that the improvements brought about by the first method "carry over" and add to the improvements brought about by the second method etc...

Note: counterbalancing cannot control for differential carryover effects
This effect occurs when one sequence of treatments produces a unique effect on the DV that the other sequences do not.
e.g. the effect of A followed by B is NOT cancelled out by B followed by A

Within group counterbalancing

Different orders (sequences) of the levels of the IV are presented to different subjects
Possible order effects include: practice, fatigue etc...
If everyone receives level a1 first > a2 second > a3 third etc... and they get worse in each level, maybe it has to do with the levels, or maybe it's just fatigue? We have a confound.
Control this by counterbalancing the order of the levels

E.g.
Subject 1 is tested in   a1 > a2 > a3
Subject 2                a2 > a3 > a1
Subject 3                a3 > a1 > a2
Subject 4                a2 > a1 > a3
Subject 5                a1 > a3 > a2
Subject 6                a3 > a2 > a1

All possible orders have been tested with no repeats. If there is a difference between a1 vs a2 vs a3, it cannot be because everyone got a1 first, followed by a2, followed by a3
Note: if we wanted more participants, they would have to be in multiples of 6 (for this example - it depends on how many levels the IV has)
To calculate how many different sequences you'd need, use n!, where n is the number of levels of the IV
E.g. if you have 4 levels, n! = 4x3x2x1 = 24 orders! See a problem? The number of participants required escalates factorially!!!
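Listing every possible order by hand gets tedious quickly, so here is a short sketch that generates them and shows how fast n! grows. The level names are hypothetical labels; the helper name is made up for this example.

```python
import math
from itertools import permutations


def all_orders(levels):
    """Every possible sequence of the IV levels (complete counterbalancing)."""
    return [" > ".join(order) for order in permutations(levels)]


three_level_orders = all_orders(["a1", "a2", "a3"])
for subject_number, order in enumerate(three_level_orders, start=1):
    print(f"Subject {subject_number}: {order}")

# Number of sequences needed as the number of levels grows (n!).
for number_of_levels in range(2, 7):
    print(number_of_levels, "levels ->", math.factorial(number_of_levels), "orders")
```

With 3 levels you need 6 sequences, with 4 levels 24, and with 6 levels already 720, which is why complete counterbalancing is rarely practical beyond a few levels.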

Within subject counterbalancing

Each subject receives all possible orders of the levels of the IV
Only feasible if IV has two levels
Each subject is tested A1 > A2 > A2 > A1

Problem: should really also have:
A2 > A1 > A1 > A2
A2 > A1 > A2 > A1
A1 > A2 > A1 > A2

To do that, you would be using a combination of within subject and within group counterbalancing!!!
Also, it may not be possible/desirable for each subject to experience each condition more than once
I would just stick to within group counterbalancing

6. Incomplete counterbalancing
This is when you test only some of the orders (sequences) that are possible

Which sequences do you choose?
A. If you want to test 3 sequences, for example, choose the sequences at random... not the best option
B. Randomly determine the first sequence, then systematically rotate the sequence until each level appears once in each row and column
E.g.
A3 > A1 > A2
A1 > A2 > A3
A2 > A3 > A1
But then not all sequences have been tested, so any order or carryover effect has not been perfectly controlled for
C. Latin Square Technique (you don't want to know)
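Option B (pick a first sequence, then rotate it) can be sketched as below. The first sequence is passed in rather than drawn at random so that the output matches the example above; in a real study you would randomize it first. The function name is hypothetical.

```python
def rotated_sequences(first_sequence):
    """Rotate the first sequence one step at a time until every level has
    appeared once in every position."""
    number_of_levels = len(first_sequence)
    return [
        first_sequence[shift:] + first_sequence[:shift]
        for shift in range(number_of_levels)
    ]


sequences = rotated_sequences(["A3", "A1", "A2"])
for sequence in sequences:
    print(" > ".join(sequence))

# Check: each level shows up exactly once in each position (column).
for position in range(3):
    levels_in_position = sorted(sequence[position] for sequence in sequences)
    print("position", position + 1, levels_in_position)
```

This gives 3 sequences instead of the 6 that complete counterbalancing would need, at the cost noted above: sequences such as A3 > A2 > A1 are never tested.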

Have I worn you out yet?
