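In review, Section 1 of this chapter focused on the shortcomings of classical test theory (CTT) and the advantages of item response theory (IRT), a modern measurement theory. It also introduced the basic concepts, assumptions, and characteristics of IRT. IRT starts from examinees' actual responses to items and uses a mathematical model to jointly estimate item parameters and examinees' latent trait parameters. This overcomes difficulties that CTT faces in testing, such as imprecise error estimation (CTT applies a single standard error of measurement to every examinee) and item statistics that depend on the particular sample of examinees.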
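The idea of estimating a latent trait from a response pattern can be illustrated with a small sketch. It assumes the item parameters are already known, which is the simpler half of the joint estimation described above. It computes an expected a posteriori (EAP) estimate of θ under the two-parameter logistic model, using a standard normal prior on a rectangular quadrature grid. The scaling constant D = 1.7, the grid settings, and the item parameters are illustrative assumptions. They are not values from this chapter or from eirt or Mplus.

```python
import math

def p_2pl(theta, a, b, D=1.7):
    """2PL probability of a correct (1) response; D = 1.7 is an assumed scaling constant."""
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b)))

def eap_theta(responses, items, n_points=61, lo=-4.0, hi=4.0):
    """EAP estimate of theta for one examinee.

    responses: list of 0/1 scores; items: list of (a, b) with KNOWN parameters.
    Uses an (unnormalised) N(0, 1) prior evaluated on an evenly spaced grid.
    """
    step = (hi - lo) / (n_points - 1)
    num = den = 0.0
    for k in range(n_points):
        theta = lo + k * step
        prior = math.exp(-0.5 * theta * theta)
        like = 1.0
        for u, (a, b) in zip(responses, items):  # local independence: multiply
            p = p_2pl(theta, a, b)
            like *= p if u == 1 else 1.0 - p
        weight = prior * like  # unnormalised posterior at this grid point
        num += theta * weight
        den += weight
    return num / den  # posterior mean of theta

# Hypothetical item parameters (a, b), for illustration only.
items = [(1.0, -1.0), (1.2, 0.0), (0.8, 1.0), (1.5, 0.5)]
print(round(eap_theta([1, 1, 1, 1], items), 3))
print(round(eap_theta([1, 1, 0, 0], items), 3))
```

Because every correct answer multiplies the likelihood by a factor that increases with θ, a pattern with more correct answers always yields a higher EAP estimate. Operational software estimates the item parameters as well, for example by marginal maximum likelihood with the EM algorithm (Bock & Aitkin, 1981).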
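Section 2 discussed the various IRT models and their assumptions. For dichotomously scored items, early psychometricians proposed the normal ogive model and the logistic model to describe the relationship between examinees' latent traits and item parameters. In practice the logistic model is more widely used. It involves up to three item parameters (difficulty, discrimination, and guessing), and the practical meaning of each parameter was explained in detail. As testing technology developed, polytomously scored tests came into wide use as a way to obtain more information from fewer items. They are commonly used to measure latent psychological traits such as personality and intelligence. To better evaluate examinees' latent traits and item performance in such tests, more complex models came into use, including the graded response model (GRM), the partial credit model (PCM), the generalized partial credit model (GPCM), and the nominal response model (NRM).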
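The response functions behind these models can be written compactly. The following is a minimal sketch of the three-parameter logistic model, Samejima's graded response model, and Muraki's generalized partial credit model. It assumes the common scaling constant D = 1.7. The parameter values are hypothetical, and the nominal response model is not covered. Fixing a at one common value across items turns the GPCM into the PCM form, up to the scale of θ.

```python
import math

D = 1.7  # assumed scaling constant; some texts use 1.702, others omit it (D = 1)

def p_3pl(theta, a, b, c=0.0):
    """3PL: a = discrimination, b = difficulty, c = pseudo-guessing (lower asymptote).
    c = 0 gives the 2PL; c = 0 with a common a for all items gives the 1PL."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def grm_probs(theta, a, thresholds):
    """Graded response model for ordered categories 0..m.
    Each increasing threshold b_k defines a cumulative 2PL curve P*(X >= k);
    a category's probability is the difference of adjacent cumulative curves."""
    cum = [1.0]
    cum += [1.0 / (1.0 + math.exp(-D * a * (theta - b))) for b in thresholds]
    cum.append(0.0)
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]

def gpcm_probs(theta, a, steps):
    """Generalized partial credit model. Category k has the numerator
    exp(sum_{v=1..k} D*a*(theta - b_v)), with an empty sum (0) for k = 0."""
    z = [0.0]
    for b in steps:
        z.append(z[-1] + D * a * (theta - b))
    top = max(z)
    ex = [math.exp(v - top) for v in z]  # shift by the max to avoid overflow
    total = sum(ex)
    return [e / total for e in ex]

# Hypothetical parameters for a dichotomous item and two four-category items.
print(round(p_3pl(0.0, a=1.0, b=0.0, c=0.2), 3))  # at theta = b this equals (1 + c) / 2
print([round(p, 3) for p in grm_probs(0.5, 1.3, [-1.0, 0.0, 1.2])])
print([round(p, 3) for p in gpcm_probs(0.5, 0.9, [-0.8, 0.3, 1.0])])
```

The GRM builds category probabilities from cumulative "at or above" curves, so its thresholds must be ordered. The GPCM instead builds them from adjacent-category steps, so its step parameters need not be ordered.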
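Section 3 used real data to show how dichotomous and polytomous IRT models are fitted with the Excel add-in eirt and with Mplus. It also showed how to interpret the output and how to judge item quality from item characteristic curves, category response curves, and information function curves.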
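Judging items from information curves comes down to asking where, and how strongly, each item is informative. The sketch below evaluates Birnbaum's item information function for the 3PL model on a θ grid, which stands in for reading the plotted curve. It then sums item information into test information. The three items and D = 1.7 are hypothetical assumptions, and the code does not reproduce eirt or Mplus output.

```python
import math

D = 1.7  # assumed logistic scaling constant

def p_3pl(theta, a, b, c=0.0):
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def item_info(theta, a, b, c=0.0):
    """3PL item information (Birnbaum):
    I(theta) = (D a)^2 * (Q / P) * ((P - c) / (1 - c))^2.
    With c = 0 this reduces to the 2PL form (D a)^2 * P * Q."""
    p = p_3pl(theta, a, b, c)
    q = 1.0 - p
    return (D * a) ** 2 * (q / p) * ((p - c) / (1.0 - c)) ** 2

def total_info(theta, items):
    """Under local independence, test information is the sum of item information."""
    return sum(item_info(theta, a, b, c) for (a, b, c) in items)

# Hypothetical three-item bank: (a, b, c).
bank = {"high_a": (1.8, 0.0, 0.0), "low_a": (0.5, 0.0, 0.0), "guessing": (1.2, 1.0, 0.2)}
grid = [x / 10.0 for x in range(-30, 31)]  # theta from -3.0 to 3.0

for name, (a, b, c) in bank.items():
    peak = max(grid, key=lambda t: item_info(t, a, b, c))
    print(name, "peaks near theta =", peak, "with I =", round(item_info(peak, a, b, c), 3))

# The conditional standard error of theta is 1 / sqrt(test information).
se0 = 1.0 / math.sqrt(total_info(0.0, list(bank.values())))
print("SE(theta = 0) =", round(se0, 3))
```

Reading the output the way one would read the curves: a high-discrimination item gives tall, narrow information near its difficulty. A low-discrimination item gives little information anywhere. A nonzero guessing parameter lowers the peak and shifts it slightly above b.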
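Like other measurement theories, IRT has been continually examined and tested by researchers since its inception, and it is still developing. It is now widely accepted and broadly applied. Its applications include detecting differential item functioning (DIF), test equating, computerized adaptive testing (CAT), and diagnosing and identifying aberrant responses. Future research on IRT can be expected to continue advancing psychological and educational measurement.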
曹亦薇.(2001). 异常反应模式的识别和分类. 心理学报,33(6),558-563.
曹亦薇.(2003). 项目功能差异在跨文化人格问卷分析中的应用. 心理学报,35(1),120-126.
陈婧,康春花,钟晓玲.(2013). 非参数项目反应理论回顾与展望. 中国考试,6,18-25.
陈平,丁树良,林海菁,周婕.(2006). 等级反应模型下计算机化自适应测验选题策略. 心理学报,38(3),461-467.
戴海琦,张锋,陈雪枫.(2018). 心理与教育测量(第四版). 广州: 暨南大学出版社.
康春花,辛涛.(2010). 测验理论的新发展:多维项目反应理论. 心理科学进展,18(3),530-536.
刘红云,骆方.(2008). 多水平项目反应模型在测验发展中的应用. 心理学报,40(1),92-100.
刘拓,曹亦薇,戴晓阳.(2011a). 个人拟合指标在艾森克人格测验中的应用. 中国临床心理学杂志,19(3),323-326.
刘拓,曹亦薇,戴晓阳.(2011b). 个人不拟合对IRT项目参数估计的影响及净化对策. 中国临床心理学杂志,19(5),323-326.
刘拓,戴晓阳.(2011). 不拟合被试对测验信、效度的影响. 中国临床心理学杂志,19(6),743-745+762.
罗照盛.(2012). 项目反应理论基础. 北京: 北京师范大学出版社.
罗照盛,欧阳雪莲,漆书青,戴海琦,丁树良.(2008). 项目反应理论等级反应模型项目信息量. 心理学报,40(11),1212-1220.
毛秀珍,辛涛.(2011). 计算机化自适应测验选题策略述评. 心理科学进展,19(10),1552-1562.
漆书青,戴海琦.(1992). 项目反应理论及其应用研究. 南昌: 江西高校出版社.
漆书青,戴海琦,丁树良.(2002). 现代教育与心理测量学原理. 北京: 高等教育出版社.
任世秀,古丽给娜,刘拓.(2020). 中文版无手机恐惧量表的修订. 心理学探新,40(3),247-253.
涂冬波,蔡艳,戴海琦,漆书青.(2008). 现代测量理论下四大认知诊断模型述评. 心理学探新,28(2),64-68.
涂冬波,漆书青,戴海琦,蔡艳,丁树良.(2008). 教育考试中的认知诊断评估. 考试研究,4(4),4-15.
王烨晖,边玉芳,辛涛.(2011). 垂直等值的应用及最新发展述评. 心理学探新,31(5),472-476.
王昭,郭庆科,岳艳.(2007). 心理测验中个人拟合研究的回顾与展望. 心理科学进展,15(3),559-566.
辛涛,乐美玲,张佳慧.(2012). 教育测量理论新进展及发展趋势. 中国考试,5,3-11.
辛涛,刘拓.(2013). 认知诊断计算机化自适应测验中选题策略的新进展. 南京师大学报(社会科学版),6,81-87.
叶萌,辛涛.(2015). 测验链接中的锚题代表性研究. 心理科学,38(1),209-215.
余娜,辛涛.(2009). 认知诊断理论的新进展. 考试研究,5(3),22-34.
曾秀芹,孟庆茂.(1999). 项目功能差异及其检测方法. 心理学动态,7(2),41-47+57.
郑日昌.(1987). 心理测量. 长沙:湖南教育出版社.
Andrich,D.(1978). Application of a psychometric rating model to ordered categories which are scored with successive integers. Applied Psychological Measurement,2 (4),581-594.
Baker,F. B.,& Kim,S. H.(2004). Item Response Theory: Parameter Estimation Techniques. CRC Press,Taylor & Francis Group.
Binet,A.,& Simon,T. H.(1916). The development of intelligence in children. Vineland,NJ: The Training School.
Birnbaum,A.(1957). Efficient design and use of tests of a mental ability for various decision-making problems. Series Report No. 58-16, Texas.
Birnbaum,A.(1958). Further considerations of efficiency in tests of a mental ability. Technical Report No. 17, Texas.
Birnbaum,A.(1958). On the estimation of mental ability. Series Report No. 15, Texas.
Birnbaum,A.(1968). Some latent trait models and their use in inferring an examinee’s ability. In F. M. Lord & M. R. Novick (Eds.),Statistical theories of mental test scores (pp. 395-479). Reading,MA: Addison-Wesley.
Bock,R. D.(1972). Estimating item parameters and latent ability when responses are scored in two or more nominal categories. Psychometrika,37 (1),29-51.
Bock,R. D.,& Aitkin,M.(1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika,46 (4),443-459.
Brown,F. G.(1985). Psychology Testing and Assessment. Allyn & Bacon.
De Ayala,R. J.(2009). The theory and practice of item response theory. Guilford Press.
DeVellis,R. F.(2006). Classical test theory. Medical Care,44 (Suppl 3),S50-S59.
Eisinga,R.,Grotenhuis,M. T.,& Pelzer,B.(2013). The reliability of a two-item scale: Pearson,Cronbach or Spearman-Brown? International Journal of Public Health,58 (4),637-642.
Embretson,S. E.,& Reise,S. P.(2000). Item Response Theory for Psychologists. Mahwah,NJ: Lawrence Erlbaum Associates.
Fraley,R. C.,Waller,N. G.,& Brennan,K. A.(2000). An item response theory analysis of self-report measures of adult attachment. Journal of Personality and Social Psychology,78 (2),350-365.
Hambleton,R.K.,& Swaminathan,H.(1985). Item Response Theory: Principles and applications. Springer Science & Business Media.
Hansen,J.,Sadler,P.,& Sonnert,G.(2019). Estimating High School GPA Weighting Parameters With a Graded Response Model. Educational Measurement: Issues and Practice,38 (1),16-24.
Holland,P. W.,& Dorans,N. J.(2006). Linking and equating. In R. L. Brennan (Ed.),Educational measurement (4th ed.). Westport,CT: American Council on Education/Praeger.
Jackson,P. H.(1973). The Estimation of True Score Variance and Error Variance in The Classical Test Theory Model. Psychometrika,38 (2),183-201.
Karabatsos,G.(2003). Comparing the Aberrant Response Detection Performance of Thirty-Six Person-Fit Statistics. Applied Measurement in Education,16 (4),277-298.
Koch,W. R.(1983). Likert Scaling Using the Graded Response Latent Trait Model. Applied Psychological Measurement,7 (1),15-32.
Lawley,D. N.(1943). On problems connected with item selection and test construction. Proceedings of the Royal Society of Edinburgh: Section A Mathematics,61 (3),273-287.
Lawley,D. N.(1944). The factorial analysis of multiple item tests. Proceedings of the Royal Society of Edinburgh: Section A Mathematics,62 (1),74-82.
Linacre,J. M.(1989). Many-facet Rasch measurement. Chicago: MESA Press.
Linacre,J. M.,& Wright,B. D.(2002). Construction of measures from many-facet data. Journal of Applied Measurement,3 (4),486-512.
Liu,T.,Lan,T.,& Xin,T.(2019). Detecting Random Responses in a Personality Scale Using IRT-Based Person-Fit Indices. European Journal of Psychological Assessment,35 (1),126-136.
Liu,T.,Sun,Y. C.,Li,Z.,& Xin,T.(2019). The Impact of Aberrant Response on Reliability and Validity. Measurement: Interdisciplinary Research and Perspectives,17 (3),133-142.
Lord,F. M.(1952). A theory of test scores (Psychometric Monograph No. 7). Richmond,VA: Psychometric Corporation.
Lord,F.M.(1980). Applications of item response theory to practical testing problems. Hillsdale,NJ:Erlbaum.
Magno,C.(2009). Demonstrating the Difference between Classical Test Theory and Item Response Theory Using Derived Test Data. The International Journal of Educational and Psychological Assessment, 1(1),1-11.
Masters,G. N.(1982). A Rasch model for partial credit scoring. Psychometrika,47 (2),149-174.
Masters,G. N.,& Wright,B. D.(1997). The Partial Credit Model. In: W. J. van der Linden & R.K. Hambleton(Eds.). Handbook of Modern Item Response Theory. Springer,New York,NY.
Matlock,K. L.,& Turner,R. C.(2016). Unidimensional IRT Item Parameter Estimates across Equivalent Test Forms with Confounding Specifications within Dimensions. Educational and Psychological Measurement,76 (2),258-279.
Meijer,R. R.(1996). Person-fit research: An introduction. Applied Measurement in Education,9 (1),3-8.
Meijer,R. R.,& Sijtsma,K.(2001). Methodology Review: Evaluating Person Fit. Applied Psychological Measurement,25 (2),107-135.
Meijer,R. R.,Tendeiro,J. N.,& Wanders,R. B. K.(2014). The use of nonparametric item response theory to explore data quality. In S. P. Reise & D. A. Revicki (Eds.),Handbook of item response theory modeling: Applications to typical performance assessment. New York: Routledge.
Muraki,E.(1992). A Generalized Partial Credit Model: Application of an EM Algorithm. Applied Psychological Measurement,16 (2),159-176.
Nering,M. L.,& Meijer,R. R.(1998). A Comparison of the Person Response Function and the lz Person-Fit Statistic. Applied Psychological Measurement,22 (1),53-69.
Parkin,J. R.,Beaujean,A. A.,Firmin,M. W.,Qiu,X.,& Firmin,R. L.(2018). Validity and Reliability Evidence for the Comprehensive Test of Nonverbal Intelligence - Second Edition Scores from an Independent Sample. Journal of Psychoeducational Assessment,36 (5),423-435.
Raju,N. S.(1990). Determining the significance of estimated signed and unsigned areas between two item response functions. Applied Psychological Measurement,14 (2),197-207.
Rasch,G.(1960). Probabilistic models for some intelligence and attainment tests. Chicago: University of Chicago Press.
Rasch,G.(1960). Studies in mathematical psychology: I. Probabilistic models for some intelligence and attainment tests.
Reise,S. P.,& Flannery,P.(1996). Assessing Person-Fit on Measures of Typical Performance. Applied Measurement in Education,9 (1),9-26.
Richardson,M. W.(1936). The relation between difficulty and the differential validity of a test. Psychometrika, 1(2),33-49.
Samejima,F.(1969). Estimation of latent ability using a response pattern of graded scores (Psychometric Monograph No. 17). Richmond,VA: Psychometric Society.
Thomas,M. L.(2011). The Value of Item Response Theory in Clinical Assessment: A Review. Assessment,18 (3),291-307.
Traub,R. E.(1983). A priori considerations in choosing an item response model. In R. K. Hambleton (Ed.),Applications of item response theory (pp. 57-70). Vancouver,B.C.: Educational Research Institute of British Columbia.
Tucker,L. R.(1946). Maximum validity of a test with equivalent items. Psychometrika,11 (1),1-13.
van den Berg,S. M.,Glas,C. A.,& Boomsma,D. I.(2007). Variance Decomposition Using an IRT Measurement Model. Behavior Genetics,37 (4),604-616.
van der Linden,W. J.,& Glas,C. A. W.(2000). Computerized Adaptive Testing: Theory and Practice. Springer.
van der Linden,W. J.,& Hambleton,R. K.(1997). Handbook of Modern Item Response Theory. New York: Springer.
Wainer,H.,& Lewis,C.(1990). Toward a Psychometrics for Testlets. Journal of Educational Measurement,27 (1),1-14.
Weiss,D. J.,& Kingsbury,G. G.(1984). Application of computerized adaptive testing to educational problems. Journal of Educational Measurement,21 (4),361-375.
Wright,B. D.(1968). Sample-free test calibration and person measurement. In Proceedings of the 1967 Invitational Conference on Testing Problems. Princeton,NJ: Educational Testing Service.