by j.w.gibson 2005
Abstract
A summary of the important issues associated with psychological tests and measurements is offered. Topics such as reliability, validity, bias, and errors are examined. Intellectual, personality, neuropsychological, and disability and workplace assessment are briefly discussed. After careful review, the author then offers a critical analysis of relevant testing and measurement practices and theory, culminating in a synthesis of ideas into a modern psychological perspective on the nature, use, and purpose of psychological assessment.
A Summary, Critical Analysis, and Synthesis of Issues and Aspects in Psychological Tests and Measurements
Human curiosity is a natural fact. We see images, we manipulate objects, we communicate ideas and emotions, and we try desperately to organize our world into a neat little sphere that we can hold in our hands. Humans organize to understand, categorizing everything possible in order to find our place in this fragile world and the immensity of the universe.
As we investigate outward, we are drawn ever back into ourselves and find that an organization of our minds and emotions is inevitable. The field of psychology is challenged with the awesome task of unveiling mysteries deep within the human condition. Psychological assessment, then, is the light by which this discipline must find its path. However, the tools of assessment that psychologists use are never perfect; they sometimes paint pictures that are misleading. Constant discourse, scrutiny, and investigation will move future knowledge toward a more complete interpretation of the variance of the human condition. The road is dark and changing. We must be clever in the use of our light, patient with what it shows us, and ever mindful to challenge the truths we might see.
Summary of Psychometrics
The myriad of issues regarding psychological tests and measurements make finding a starting point painfully difficult. To be sure, several definitions are necessary before one can begin. According to Cohen & Swerdlik (2002) psychological assessment refers to the gathering and integration of data for the purpose of making a psychological evaluation, by using tools such as tests, interviews, case studies, observations, and special measurement apparatuses and procedures. In other words, assessment is a more holistic venture, relying on different tools used together. In contrast, psychological testing can be understood as “the process of measuring psychology-related variables by means of devices or procedures designed to obtain a sample of behavior” (Cohen & Swerdlik, 2002, p. 4). While it may be difficult to clearly identify a line between testing and assessment, generally, testing is a singular activity aimed at a specific purpose; whereas assessment is more comprehensive in scope and accomplishment. More than one type and style of psychological test may be used in an assessment plan.
Psychological tests can differ in a number of ways, such as content, format, administration procedures, scoring and interpretation procedures, and psychometric or technical quality (Cohen & Swerdlik, 2002). Regardless, any test or assessment tool that hopes to be taken seriously must be able to prove that it is reliable, valid, normed, standardized, and free from unreasonable bias.
The reliability of a test speaks to its consistency in measurement (Cohen & Swerdlik, 2002, p. 128). A test for depression must measure depression consistently across different individuals without an excessive amount of error. Unfortunately, small amounts of measurement error are unavoidable for any test. The test takers may try harder, guess luckier, be more alert and awake, feel less anxious, or be healthier on a given occasion (Joint Committee on Standards for Educational and Psychological Testing, 1999, p. 25). During test construction, questions may be worded differently, or the content may be slightly altered from question to question. These small, and sometimes not so small, sources of variance erode the reliability of tests.
Reliability estimates are generally derived for tests using several methods. Test-retest estimates rely on a test taker scoring similarly on two different occasions. A test's reliability may also be estimated by using alternate forms or parallel forms. Cohen and Swerdlik (2002) discuss how coefficients of equivalence can be derived using these techniques. Split-half reliability estimates are especially useful when “it is impractical or undesirable to assess reliability with two tests or to have two test administrations” (Cohen & Swerdlik, 2002, p. 133). Split-half reliability is the practice of dividing a test into two equivalent halves, scoring each half separately, correlating the two sets of half-test scores, and then adjusting that correlation with the Spearman-Brown formula to estimate the reliability of the full-length test (Cohen & Swerdlik, 2002). Psychologists may also utilize inter-scorer reliability or a combination of several estimates.
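As a rough illustration of the split-half procedure, the following Python sketch splits a test into odd- and even-numbered items, correlates the two half-test totals, and applies the Spearman-Brown correction. The function names, the odd-even split, and the sample item scores are assumptions made for this example; they are not drawn from Cohen and Swerdlik (2002).

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # Undefined (division by zero) if either half has no variance.
    return cov / math.sqrt(var_x * var_y)


def spearman_brown(r_half):
    """Estimate full-length reliability from a half-test correlation."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(item_scores):
    """item_scores: one list of item scores per test taker.

    Uses an odd-even split (an assumption; other splits are possible).
    """
    odd_totals = [sum(row[0::2]) for row in item_scores]
    even_totals = [sum(row[1::2]) for row in item_scores]
    return spearman_brown(pearson_r(odd_totals, even_totals))


# Hypothetical data: four test takers, six items scored 0 or 1.
scores = [
    [1, 1, 1, 0, 1, 1],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 0, 0],
]
print(round(split_half_reliability(scores), 3))
```

The correction step matters because each half is only half as long as the full test, and shorter tests tend to be less reliable; a half-test correlation of .50, for example, corresponds to an estimated full-test reliability of about .67.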
Tests must also validly measure what they purport to measure. A self-report test of eating disorders should be consistently able to identify the majority of test takers who have a diagnosable eating disorder. Otherwise, such tests are worthless to clients and psychologists. Validity also “refers to the degree to which evidence and theory support the interpretations of test scores entailed by proposed uses of tests” (Joint Committee on Standards for Educational and Psychological Testing, 1999, p. 9). New psychological assessments may need to undergo revisions, as the validation of test scores relies in part on the conceptual framework of the actual test (Cohen & Swerdlik, 2002). As a clearer picture of a certain trait begins to emerge, the test can be focused. As this process continues, the validity of a test becomes stronger.
Beyond concerns of validity and reliability, any professional who uses educational, psychological, or other types of tests must be aware of any sources of error and bias that might exist. Since “bias is a factor inherent within a test that systematically prevents accurate, impartial measurement” (Cohen & Swerdlik, 2002, p. 179), it is essential to eliminate as much potential and actual bias as possible before the results of the test can be utilized effectively. The Joint Committee on Standards for Educational and Psychological Testing (1999) has written standards of practice with reference to eliminating possible error and bias in testing. The committee identified several different areas that need to be carefully examined by administrators. These areas cover bias both inside and outside of a test and test situation:
- Construct-irrelevant components: those items that may lower or raise scores for different groups of examinees.
- Content related: those issues, especially in educational testing, that concern how well a given test covers the domain and whether that domain is appropriate, how clearly the questions and instructions are written, and the type of response required from the test taker (e.g., essay, short answer, bubble).
- Testwiseness: those issues relating to familiarity with test-taking skills, such as answering questions in a timely manner and guessing well on questions the test taker does not know.
- Equitable treatment of test takers: those issues that are concerned with fair treatment of all test takers by the administrator.
- Testing environment: those aspects related to the physical test environment, such as temperature, comfort level, and noise.
- Perceived relationship between test taker and administrator: aspects related to how the test administrator and test taker relate during the evaluation period.
- State of test taker: the emotional, physical, and mental condition of the test taker.
Even if all test biases can be reduced to a minimum, the psychologist must still consider the possible sources of error in the interpretation of scores or responses on any given evaluation. Raters may be too lenient in their scoring (leniency error); may take an extremely negative attitude (severity error); may fail to give any extreme scores, causing the test taker’s scores to cluster in the middle of the continuum (central tendency error); or may “see” the ratee in an excessively positive manner, causing the scores to be unnaturally skewed (halo effect) (Cohen & Swerdlik, 2002, pp. 181-182).
In addition to the many sources of error in construction, development, implementation, scoring, interpretation, and application, psychologists must utilize a number of different psychometric tools in order to assess clients in a wide spectrum of areas. Different aspects psychologists may need to assess are intelligence (including achievement tests and aptitude tests), personality (including behavioral aspects), and neuropsychological mental status. The assessment of these areas differs greatly with regard to the specific design of tools, approach to assessment, testing environment, data collection, and interpretation of client information. Tests and assessments are as varied in type and style as the human condition. The choice of which test to use in any given situation depends upon a number of criteria including the purpose of assessment, time available, monetary cost, or level of diagnosis.
Tests designed to measure intelligence take varied forms dependent upon the test creator’s theory of intelligence (Cohen & Swerdlik, 2002). While there are literally hundreds of intelligence tests available, several have taken the spotlight in recent times. The Stanford-Binet Intelligence Scale (SB:FE) is considered a sound, reliable, and valid measure of overall general ability (Cohen & Swerdlik, 2002). Recently, however, its use among practitioners has declined significantly due to the ease and comprehensiveness of the Wechsler tests.
The Wechsler tests are perhaps the most widely used intelligence assessments. This may be due to the ease of administration, exceptionally good reliability and validity coefficients, and the fact that because so much research has been done on these tests, psychologists consider them extremely reliable. Many other types of intelligence assessments are available. “The Kaufman Adolescent and Adult Intelligence test focuses on fluid and crystallized intelligence while also assessing immediate and intermediate-term memory” (Daniel, 1997). A popular intelligence test in school settings is the Woodcock-Johnson Tests of Cognitive Ability-Revised. This test has been called a thorough implementation of a multifactor model, assessing seven dimensions of ability (Daniel, 1997).
Measures of personality can be divided into two distinct categories: Objective and Projective. Objective measures contain short answer items where the assessee selects one response from two or more answers (Cohen & Swerdlik, 2002). The respondent’s pattern of responses is measured and interpreted in order to measure the strength or absence of a given personality trait or state. Objective measures may be administered by computer or paper-and-pencil, and scored easily by either a computer or template. Generally objective measures are quick, cheap, and beneficial for preliminary identification of personality factors. However, objective measures have distinct disadvantages such as the test taker “faking” good or bad in order to impart a certain picture of themselves to the assessor for various reasons.
Personality tests can be projective as well as objective. Projective measures require the assessee to make a judgment on a particular piece of unstructured stimulus material; the assessor then uses the information given to discover personality aspects (Cohen & Swerdlik, 2002). The advantage of projective measures is their flexibility. Masling (1997) has written specifically on the ability of projective measures to predict long-term behavior more efficiently than objective tests. The Rorschach Inkblot Test may be the most widely recognized icon of psychology. This controversial personality test uses inkblots smashed on white cards as the unstructured stimuli. The Thematic Apperception Test (TAT) is a collection of 31 cards, one blank, illustrating human situations. Both tests require the assessee to project responses, which are written down verbatim by the assessor. The responses as well as body language are then analyzed to determine personality aspects.
Neuropsychological assessment entails a very different form of test batteries. Cohen & Swerdlik (2002) take their definition of neuropsychological evaluation from Benton (1994), who states that the objective of this evaluation “is to draw inferences about the structural and functional characteristics of a person’s brain by evaluating an individual’s behaviour in defined stimulus-response situations” (Benton, 1994, p.1).
Under this definition, evaluation of the biological basis of behavior caused by deficiencies of the brain requires a more comprehensive examination, consisting of an interview, case history or case study, and a battery of psychological, behavioral, intellectual, functional, memory, and perceptual-motor tests.
Psychologists may also be called on to assess persons with disabilities for educational, workplace, or legal reasons. Persons with disabilities are protected under federal law and granted certain accommodations in the work and school environment. The types of tests administered to assess disabilities may include visual, hearing, motor, or cognitive functioning batteries. Because persons with disabilities are protected under the Americans with Disabilities Act, proper assessment and sensitivity are crucial to both employers and schools.
Workplace assessment is a growing trend among large companies. Selection of employees who are best suited to and capable of performing certain duties is important to companies that wish to maintain a productive and safe working environment. To many companies, assessment of aptitude, physical ability, motivation, personality, and organizational and leadership skills makes business sense because of its value in predicting future efficiency, growth, productivity, motivation, and satisfaction (Joint Committee on Standards for Educational and Psychological Testing, 1999).
Clearly, psychological evaluation is a complex matter. The multitude of tests, the differing purposes of assessment, the complexity of reliability and validity issues, the variety of situational testing factors, and the growing capability of technology create an environment so vast that psychologists must be constantly vigilant with regard to evaluation processes, or run the risk of falling by the wayside.
Critical Analysis
The American Psychological Association has estimated that upward of 20,000 new psychological tests are developed every year (Cohen & Swerdlik, 2002). The sheer volume and diversity of psychological assessments are both a blessing and a curse to psychological measurement. With so many tools available, the modern psychologist must have a crystal clear understanding of the theory and purpose of testing in any given situation. Testing theory seems to be as varied as psychological theories of personality and intelligence. According to Cohen and Swerdlik (2002), a psychologist necessarily makes twelve assumptions in any testing process. These assumptions serve to drive the creation of psychological tests, the theoretical framework from which they come, the situations in which they will be applied, and how interpreted results will be utilized in a given setting. The assumptions also serve as an excellent springboard for an analysis of the complex issues in testing and measurement.
Assumption # 1
The first assumption is that psychological traits and states exist (Cohen & Swerdlik, 2002). Traits and states differ in that a psychological trait is seen as a relatively enduring aspect of a person, whereas a state is less enduring. Arguably, these are not stringent definitions. Traits and states then should be distinguishable from each other. “The word distinguishable conveys the idea that behavior labeled with one trait term can be differentiated from behavior that is labeled with another trait term” (Cohen & Swerdlik, 2002, p. 13). Inextricable from test construction theories is the fact that in order to measure some facet of human behavior, we must be able to identify the domain to be measured. It has been stated that for testing and evaluation purposes, psychological traits do not exist except as constructs—an informed scientific idea developed or constructed to describe or explain behavior (Cohen & Swerdlik, 2002).
Construct development is not an easy task. The pinpointing of “personality” or “intelligence” depends upon a mountain of choices. What theory of personality is the assessor going to use (e.g., psychoanalytic, behavioral)? What items make up personality? Which specific aspect of the defined collection of aspects of personality will the assessor be trying to isolate? Which observed behaviors would serve to suggest a specific personality aspect is present? What other factors, intrinsic and extrinsic, might also produce similar behavior to the observed behavior? Without an anchor, any professional attempting to delineate a useful construct needs some guidance.
The Joint Committee on Standards for Educational and Psychological Testing (1999) has written standards to address this and other issues concerning test development. Standard 3.2 specifically states:
The purpose(s) of the test, definition of the domain, and the test specifications should be stated clearly so that judgments can be made about the appropriateness of the defined domain for the stated purpose(s) of the test and about the relation of items to the dimensions of the domain they are intended to represent (p. 43).
The importance of choosing a specific enough domain (construct) to measure cannot be overstated, as it is the beginning point of any test construction.
Assumption # 2
Closely related to the first assumption, the second assumption is that the defined traits and states can be reliably quantified and measured. Deciding how to measure some aspect is nearly as difficult as deciding what to measure. Cohen & Swerdlik (2002) use the term “aggressive” to illustrate this point. An “aggressive salesperson,” “aggressive killer,” “aggressive dancer,” and “aggressive football player” all utilize “aggressiveness” in a different manner (Cohen & Swerdlik, 2002, p. 14). So how does one go about measuring such a phenomenon? Very carefully.
We create data from the measurement device in the hopes that it will provide us with a pattern or relationship of the behavior. It is assumed that this illumination is possible because behavior aspects can be quantified into a pattern of numbers and symbols. Since these numbers and symbols have no place in the natural world outside of humanity, it is critical to remember that facts are created by those who look for them, that facts, scientific and otherwise, are forever constructions by us, and do not tangibly exist.
Assumption # 3
Just as an inch and a centimeter can both measure length, so too different types of assessments are believed to be useful in measuring various aspects. Psychologists come from a wide array of theoretical frameworks, and each theory has its own methods of measuring constructs dependent upon their definition and source. A collection of different methods seems, logically, more sound than does one type or style of test. However, not all test styles and types provide equivalent data on a given subject. For example, although projective personality measures are widely seen as less reliable than objective measures because of the ambiguity in their administration and interpretation, some authors (e.g., Masling, 1997; Exner, 1986) have noted that long-term behavior can more reliably be predicted with projective tests.
Tests may vary in the way they are linked to a particular theory; in the way the test items are selected, whether rationally or empirically; and in the way they are presented, administered, scored, interpreted, and applied (Cohen & Swerdlik, 2002). The range of test options provides ample opportunity for a psychologist to assess some aspect in a number of different ways. Because more than one psychological test can measure a given construct, or characteristic (Joint Committee on Standards for Educational and Psychological Testing, 1999), the exact purpose of evaluation and theoretical background of the psychologist will greatly influence which type of test will be used in a certain situation. Indeed, a diligent psychologist is obligated to utilize a number of different tests and methods.
Assumption # 4
In order to justify testing, it must be assumed that assessment is able to provide answers to momentous questions. Cohen & Swerdlik (2002) argue that users of tests and assessments must believe that the process of assessment is capable of providing useful answers. Forensic psychologists are often required to give “expert” testimony on the mental status of individuals. It would be contrary to the judicial system if the assessments used to determine the mental state of such individuals were not up to the task. Confidence that a test reliably measures what it is supposed to measure is built up by research.
Assumption # 5
To be useful, assessments must pinpoint certain phenomena. The entire field of clinical psychology could be said to rest upon this assumption. If psychologists are to be useful to their clients then they must be able to diagnose psychological functions or behaviors. A diagnostic test may be administered to help guide a clinician towards future more specific avenues of assessment (Cohen & Swerdlik, 2002). Correctly diagnosing a patient should be a direct outcome of good assessment.
The modern psychologist has many well-researched assessments available. For instance, Daniel (1997) has noted that since the mid-1980s, at least half a dozen new or fundamentally restructured intelligence batteries have been published, and the trend does not seem to be slowing. With so many well-researched tools at their disposal, psychologists have a wide range of choices in both how and what will be measured (Daniel, 1997). For the purposes of educational assessment, educators use a range of diagnostic tools geared to measure visual, auditory, motor, and cognitive functioning. Legal issues ensuring the right of all students to the least restrictive class environment possible work to assure that proper testing is accomplished. However, given the dire state of funding in education, access to qualified licensed school psychologists is insufficient to make substantial change in the quality of mental health of many students.
While more violence has been reported in public schools, the number of children struggling with delinquency has also increased. According to a study done by Wilson, Lipsey, and Derzon (2003), programs aimed at decreasing aggressive behavior in schools have the greatest effect when they consist of behavioral and counseling strategies. In contrast, those programs that utilized peer mediation and multimodal strategies showed the smallest effects (Wilson et al., 2003). The proper diagnosis of students with aggressive behavior is better left to qualified school psychologists. Unfortunately, teachers who do not have such training are often the only source of intervention for aggressive students, and tend to utilize mediation strategies that are less effective.
Assumption # 6
Because assessment is a multifaceted approach to evaluation, it is assumed that many sources of data are a part of this process. “Testing and assessment professionals understand that decisions that are likely to significantly influence the course of an examinee’s life are ideally made not on the basis of a single test score but, rather, from data from many different sources” (Cohen & Swerdlik, 2002, p. 18). Psychologists wishing for the best possible picture of a client’s condition should utilize interviews, personal history, and other types of information. It would be unfair to assess an alleged murderer solely with a test designed to identify aggressive behavior. Interview material and past medical records might serve to build a stronger picture of the ability of such a person to commit the alleged act.
Assumption # 7
Error is simply a part of the assessment process. Cohen and Swerdlik (2002) write “potential sources of error are legion” (p. 18). Humans are not perfect and cannot act so. Anything produced and evaluated by a human is imperfect. The overwhelming realization that every diagnosis is based only on an approximation of an evaluation, not on a “true” evaluation, is enough to make one cringe. Factors other than what a test attempts to measure will, to some extent, influence an individual’s performance on a test (Cohen & Swerdlik, 2002). In addition, test administrators and developers are also sources of error. Because there are many sources of bias and error in psychological testing, the practicing psychologist must be well versed in the statistical analysis of individual tests, open to collaboration with other knowledgeable professionals, reflective and consistent in testing practices, and equitable in the treatment of all test takers. Analysis of results should always be critically scrutinized so that conclusions based upon interpretations from test results reflect a high level of validity and usefulness.
The systematic reduction of possible sources of error is necessary for any given assessment tool. Fortunately for psychologists, standardization of error management has been discussed widely, and standards from the Joint Committee on Standards for Educational and Psychological Testing (1999) provide a structured blueprint for error analysis. Possibly the best source of error feedback comes from within the discipline of psychology. As stated earlier, the APA estimates more than 20,000 new psychological assessments are published every year (APA, 1993, as cited in Cohen & Swerdlik, 2002). Of these, many will not be used widely by the discipline; some, however, will be researched and debated by professionals in the field, and it is this scholarly debate that will drive the error analysis in a productive direction.
Assumption # 8
All tests and measurement devices have strengths and weaknesses. The ability of tests to measure constructs effectively must be researched and discussed before such tests are useful to psychology. An argument for collaboration with other test professionals is that they share different expertise with different psychological assessment tools. Understanding the limitations of a test is “emphasized repeatedly in the codes of ethics of associations of assessment professionals” (Cohen & Swerdlik, 2002, p. 19).
Assumption # 9 and # 10
Psychologists must assume that test-related behavior is capable of predicting non-test behavior. In general, both testing and assessment are performed under the presumption that meaningful generalizations can be made from the test data to behavior that lies outside of the specific testing situation (Cohen & Swerdlik, 2002). Often, psychological tests require a response that has nothing to do with the actual measured behavior. For example, a student may be asked to write T for true or F for false on a number of questions. The behavior of writing T’s and F’s has no relevance to the test’s ability to measure, say, “depression” or “self-confidence.” Computerized assessments frequently require a test taker to press a key signifying a response. It might seem rather odd for a psychologist to assess the individual’s ability to press a computer key, unless the test was a measure of motor functioning.
Testing theory also must assume that individuals will duplicate or indicate real-life behavior on a test. Since a test is given on a particular day, the behavior might not appear in the sample taken that day even though it is clearly evident on other days. An evaluator attempting to assess manic-depressive disorder might only be able to test the individual during depressive states because, during manic episodes, the client is not likely to think problems exist and may fail to seek treatment.
Training and experience with individual assessments greatly increase the likelihood that psychologists will be able to recognize how well an instrument delineates a certain behavior at the time of testing. Other tools of assessment, such as case history, personal or family interview, or observation, can provide a more accurate picture of behavior (Cohen & Swerdlik, 2002).
Assumption # 11
Tests must be conducted in a fair and unbiased manner. This premise is so important that Cohen and Swerdlik (2002) remark, “If we had to pick the one of these 12 assumptions that is more controversial than the remaining 11, this one is it” (p. 20). Tests not administered or interpreted in as “fair” and “unbiased” a manner as possible are worthless to the assessor, psychologist, and client. The fact that psychological measuring tools are relied upon to measure important aspects of human psychology demands that the utmost care and attention be given to their fairness.
Many sources of possible bias exist, both within the test and outside of the actual test. Culture, language, age, sex, social and economic status, physical condition, test situation, test purpose, and a host of other factors not taken into consideration during test development serve to diminish the usefulness of test data in making sound interpretations of a client’s mental condition. “Some potential problems related to test fairness are more political than psychometric in nature” (Cohen & Swerdlik, 2002, p. 20). Many assessments in social programs that may require ethnic or cultural information are surrounded with stigma. With a realization of the different backgrounds test takers may come from and the different purposes for the assessment, one can reflect on the possible sources of error and, in good faith, try to limit, to the extent possible, deviation in scores due to these influences.
A person’s true score is a hypothetical error-free value that characterizes an examinee at the time of testing (Joint Committee on Standards for Educational and Psychological Testing, 1999). Because this true score cannot be observed directly, the deviation of the examinee’s observed score on a given assessment from the true score is treated as measurement error and must be estimated. Identification of measurement errors serves to strengthen the interpretation of scores by the psychologist.
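One common way to express this idea in classical test theory is the standard error of measurement (SEM), which estimates how far observed scores are likely to fall from the hypothetical true score. The sketch below is illustrative only. The formula SEM = SD × sqrt(1 − reliability) is standard, but the function names and the example values (a scale with a standard deviation of 15 and a reliability coefficient of .91) are assumptions chosen for demonstration and do not come from the sources cited here.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability), with reliability in [0, 1]."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


def score_band(observed, sd, reliability, z=1.96):
    """Approximate band around an observed score.

    z=1.96 corresponds to a roughly 95% band under a normal-error
    assumption (an assumption of this sketch).
    """
    sem = standard_error_of_measurement(sd, reliability)
    return observed - z * sem, observed + z * sem


# Hypothetical example: observed score 100, SD 15, reliability .91.
sem = standard_error_of_measurement(15, 0.91)
low, high = score_band(100, 15, 0.91)
print(round(sem, 2), round(low, 2), round(high, 2))
```

With these assumed values the SEM is 4.5 points, so an observed score of 100 is better read as a band from roughly 91 to 109 than as a single exact value. The lower the reliability coefficient, the wider that band becomes.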
“Errors of measurement are generally viewed as random and unpredictable” (Joint Committee on Standards for Educational and Psychological Testing, 1999, p. 26). The Standards for Educational and Psychological Testing call for systematic procedures to be established for evaluations so that error management can effectively take place. When tests are normed on a small cross section of society, they run the risk of being unfairly biased against different cultural and other groups. Modern psychology benefits from the action of courts, which have issued rulings supporting the careful management of bias in tests, especially when used in political arenas.
Assumption # 12
Testing is useful. If the testing and assessment process were not beneficial to a variety of agencies of society, it would not be a multi-billion dollar industry. It serves to categorize, prioritize, criminalize, organize, and perform many other “-izes.” Tests are useful to measure progress and change in the professional world as well as in education. Evaluations are necessary for society to function on a highly complex level. Requirements for medical doctors and other licensed health professionals stem from the premise that these people are “highly qualified and competent” in their knowledge and abilities. Without tests and measurements there would be no accountability for qualifications of individuals in positions that demand it.
Standardized tests and measurement tools in education have been at the forefront of controversy over the last few decades. As politicians try to implement educational reform, they seek a cheap measure of the success of new programs. Consequently, standardized achievement tests have become the norm across the country. The test scores determine a school’s future funding and, in some cases, whether the administration will be taken over by state agencies if scores fail to make some defined climb. But can testing go too far?
According to an Internet article by Stephen Horowitz (2001), too much focus on testing results in a decline of creative thinking skills. Since the purpose of this type of assessment is to evaluate students’ progress in learned material, a cautionary note should be issued to the proponents and practitioners of “teaching to the test.” Educators forced to spend an abundant amount of time preparing students to take standardized achievement tests run the risk of sacrificing valuable learning, leading to poorer critical thinking abilities and, in the long run, lower test scores. While education, at the moment, seems to be overburdened with standardized testing, alternative assessments are gaining widespread support among teacher organizations.
Closing Remarks: A Synthesis
Attempting to create an individualized perspective on the field of psychometrics is like trying to stuff an iceberg into a paper cup. The magnitude of information and relevant issues demands that any professional in the field be highly trained and aware of the statistical data supporting testing and score interpretation. Computer assessment tools have given psychologists a precious advantage in the processing of data and the formulation of models that hitherto did not exist or were impossible. Given that the computer technology available to psychology will only expand in the future, the development and use of computer assessments will have a profound effect on what psychologists are able to measure and how they measure it.
The author has had a difficult time coming to terms with the immensity of testing theory. One theme that seems to surface again and again is the purpose of testing. Beyond the critical matters of reliability and validity, psychologists must ask, in depth: What am I trying to assess? What tools are available? These questions pervade everything in testing. One question that seems to be less prevalent is this: What good will assessment do the client? An article by Brown and Dean (2002) suggests that in forensic settings, where assessment typically brings little direct benefit to the person assessed, therapy that originates in the evaluation process can result in very significant growth for families and children. What good is testing if it does not directly affect the one being assessed?
In practice, assessment is often done in less than perfect environments. Managed health care organizations driven by profit try desperately to cut costs wherever possible, resulting in disabling decreases in the number and type of assessments psychologists have available. Chronically underfunded mental health facilities cut back where necessary, eliminating properly trained personnel and leaving the complex job of assessment to untrained assessors. In school settings, psychologists are so few that it takes weeks, if not months, before some students can be properly assessed for learning disabilities, social disorders, or emotional problems. Economics is perhaps the strongest driving factor in psychological measurement. Without the proper resources or trained professionals, measurement error increases dramatically, the relevance of assessment to the client is diminished, and future costs of treatment rise due to improper diagnosis.
The usefulness of evaluation cannot be overstated. However, the efficiency and effectiveness of assessment need to be analyzed, and assessment adapted where it suffers waste. Brown and Dean (2002) make an excellent observation:
“the current economics of both public and private mental health sectors in both developed and developing countries, and in adult as well as in child and adolescent areas of work, are experienced as demanding increasingly brief clinical contact with consumers. It can be anticipated, of course, that longitudinal cost-benefit studies will eventually demonstrate the brevity of clinical contact does not necessarily mean less expense to the community in the long-term” (paragraph 4, “Implications for Clinical Psychological Practice”).
Though Brown and Dean (2002) are speaking specifically about the economics of clinical visits, it is easy to see how this passage extends to, if not includes, proper clinical assessment.
A thorough clinician must not only comprehend the psychometric aspects of the evaluation tools used; he/she must also persevere in using the correct tools in the proper situations whenever called upon to assess an individual. Economic factors will probably continue to limit the number of “recommended” tools. What seems to supersede the economic problem in importance is the value for the individual being assessed. What good will proper assessment do for the client? This question drives valuable, scientific, productive measurement. A clinician who remembers that he/she serves an individual’s interest will consistently be more effective in positively effecting change in the larger society.
References
Benton, A. (1994). Neuropsychological assessment. Annual Review of Psychology, 45(1), 1.
Retrieved on March 2, 2003, from Academic Search Premier.
Brown, P., & Dean, S. (2002). Assessment as an intervention in the child and family forensic
setting. Professional Psychology: Research and Practice, 33(3), 289-293. Retrieved on February 23,
2003, from PsycARTICLES.
Cohen, R. J., & Swerdlik, M. E. (2002). Psychological testing and assessment: An introduction to tests and
measurement (5th ed.). McGraw-Hill Companies, Inc.
Daniel, M. (1997). Intelligence testing: Status & trends. American Psychologist, 52(10). Retrieved on
January 20, 2003, from PsycARTICLES.
Exner, J. E. (1986). The Rorschach: A comprehensive system: Vol. 1. Basic foundations (2nd ed.). New York:
Wiley.
Horowitz, S. (2001). When good tests go bad. Retrieved on March 17, 2003,
from http://www.buzzrantrave.com/culture/countrylife/.
Joint Committee on Standards for Educational and Psychological Testing. (1999). Standards for educational
and psychological testing. Washington, DC: American Educational Research Association, American
Psychological Association, National Council on Measurement in Education.
Masling, J. (1997). On the nature and utility of projective tests and objective tests. Journal of Personality
Assessment, 69(2), 257-270. Retrieved on February 11, 2003, from EBSCOhost.
Wilson, S., Lipsey, M., & Derzon, J. (2003). The effects of school-based intervention programs on aggressive
behavior: A meta-analysis. Journal of Consulting and Clinical Psychology, 71(1), 136-149. Retrieved