
Comparing Two Language Version Of Science Achievement Tests Using Differential Item Functioning

Ong, Saw Lan (2007) Comparing Two Language Version Of Science Achievement Tests Using Differential Item Functioning. Malaysian Journal of Educators and Education, 22. pp. 45-59.


Official URL: http://www.usm.my/education/publication/JPP%20ONG%20SAW%20LAN%20ART%203%20(45-59).pdf

Affiliations

Universiti Sains Malaysia, School of Educational Studies.

Abstract

At the national level, the Ministry of Education in Malaysia assesses the achievement of primary school students in reading and writing, mathematics and science. The results of the assessments are used for selection decisions as well as for grading students. Since the implementation of the new language policy of teaching science and mathematics in English, both Malay and English have been used as the languages of assessment. The validity of interpreting test results across different language versions is an important issue that needs to be investigated. Translating a test from a source language to a target language does not necessarily produce two psychometrically equivalent tests. The purpose of this study is to identify items in translated achievement tests that may function differently across languages. Differential Item Functioning (DIF) analysis is useful for revealing items whose psychometric characteristics have been altered by translation. Two statistical analyses were conducted to identify and evaluate DIF items. The first used the simultaneous item bias test (SIBTEST), a nonparametric statistical method for assessing DIF in an item. Its results were then compared with those from the one-parameter logistic model, analyzed using BILOG-MG V3.0. Both analyses identified approximately 50% of the science items as displaying DIF. This result suggests that substantial psychometric differences exist between the two language versions of the science test at the item level.


The Malaysian Ministry of Education conducts national-level examinations to assess the achievement of primary school pupils in reading and writing, mathematics and science. The assessment results are used for selection and grading. Since the implementation of the policy of teaching science and mathematics in English, tests have been administered in both Malay and English to help students understand what the questions require. Validity is an important issue in interpreting the results of tests that use different language versions. Translating a test from one language to another does not necessarily produce two psychometrically equivalent tests. The purpose of this study is to identify achievement test items that may function differently in different languages. Differential item functioning is useful for revealing items whose characteristics have been altered by translation. Two statistical analyses were carried out to identify and evaluate DIF. The nonparametric method SIBTEST and the one-parameter logistic model, run in BILOG-MG V3.0, were used to evaluate DIF in the translated items. Both statistical analyses identified nearly 50% of the science items as DIF. This result suggests that psychometric differences exist between the science tests in the different languages.
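
The idea behind the DIF analyses named in the abstract can be illustrated with a small sketch. This is not the author's procedure and does not reproduce SIBTEST or BILOG-MG. It assumes a simplified, uncorrected statistic: examinees from the two language groups are matched on their score on the remaining items, and the difference in proportion correct on the studied item is averaged across score levels. SIBTEST itself adds a regression correction to the matching score, which is omitted here. The function names, the weighting by combined group counts, and the toy data are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from math import exp


def rasch_probability(theta: float, b: float) -> float:
    """One-parameter logistic probability of a correct response.

    theta is examinee ability and b is item difficulty. Under this model,
    DIF appears as a different b for the same item in the two groups.
    """
    return 1.0 / (1.0 + exp(-(theta - b)))


def matched_group_difference(reference, focal):
    """Weighted difference in proportion correct on one studied item.

    Each argument is a list of (matching_score, item_correct) pairs, where
    matching_score is the score on the remaining items and item_correct is
    0 or 1. Only score levels present in both groups are compared. Weights
    are the combined counts at each level (an assumption of this sketch).
    A positive value means the reference group did better after matching.
    """
    ref_cells = defaultdict(list)
    foc_cells = defaultdict(list)
    for score, correct in reference:
        ref_cells[score].append(correct)
    for score, correct in focal:
        foc_cells[score].append(correct)

    shared = sorted(set(ref_cells) & set(foc_cells))
    total = sum(len(ref_cells[k]) + len(foc_cells[k]) for k in shared)
    if total == 0:
        raise ValueError("no score level contains examinees from both groups")

    diff = 0.0
    for k in shared:
        weight = (len(ref_cells[k]) + len(foc_cells[k])) / total
        p_ref = sum(ref_cells[k]) / len(ref_cells[k])
        p_foc = sum(foc_cells[k]) / len(foc_cells[k])
        diff += weight * (p_ref - p_foc)
    return diff


# Hypothetical toy data: (score on remaining items, studied item correct?)
malay_version = [(1, 1), (1, 0), (2, 1), (2, 1)]
english_version = [(1, 0), (1, 0), (2, 1), (2, 0)]
beta = matched_group_difference(malay_version, english_version)
```

A value of `beta` near zero would indicate that matched examinees perform similarly on the item in both language versions. A value far from zero would flag the item for review, though a real analysis would also need a standard error and a significance test.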

Item Type: Journal
Keywords: Educational assessment, Schools, Malaysia, English language, Medium of instruction
Subjects: L Education
ID Code: 6544
