Mapping Teacher Produced Tests to a Usefulness Model





ELT, Test Usefulness, Measurement and Evaluation, Teacher Produced Tests


Tests are designed as an integral part of the teaching process and necessarily involve stakeholders at every stage, from the onset of preparations through the administration of the test and grade allocation to the interpretation of the results. The process commences with selecting the content to evaluate and deciding upon the skills to be tested so as to meet course objectives (Giraldo & Murcia Quintero, 2019; O’Loughlin, 2013; Vogt & Tsagari, 2014). Several questions arise about how to standardize the test development process and how to evaluate the usefulness of the resulting tests, typically: What is the best test for our context? What does this test actually test? What relevant information does the test provide? How does this test affect teaching and learning behavior? In what ways is the test useful? Although each language program’s particular needs may differ, the answers to these questions provide a basis for institutional decisions. None of these answers is set in stone, and at their root is the critical role testing plays in facilitating what gets learned. The current study developed and analyzed an achievement test specifically designed for a compulsory A1 level English course delivered to all freshman students enrolled in Turkish-medium departments at state universities across Türkiye. A total of 150 students enrolled in several undergraduate programs at the Faculty of Education of a state university constituted the universe of the study. The researcher analyzed the test after administration and mapped its qualities onto a test usefulness model, aiming to address the research gap regarding the quality of teacher-produced tests.



Abraham, R. G., Chapelle, C. A. (1992). The meaning of cloze test scores: An item difficulty perspective. Modern Language Journal, 76(4), 468–479.

Adams, R. J., Griffin, P. E., Martin, L. (1987). A latent trait method for measuring a dimension in second language proficiency. Language Testing. 4(1), 9–28.

Ahmad, S., Rao. (2012). A Review of the Pedagogical Implications of Examination Washback. Research on Humanities and Social Sciences, 2(7), 11-20.

Alan, B. (2003). Novice teachers’ perceptions of an in-service teacher training course at Anadolu University. Unpublished master’s thesis. Bilkent University, Ankara.

Alderson, J. C., Hamp-Lyons, L. (1996). TOEFL preparation courses: a study of washback. Language Testing. 13(3), 280–97.

Aschbacher, P. R. (1991). Performance assessment: state of activity, interest, and concerns. Applied Measurement in Education. 4, 275–88.

Atay, D. (2008). Teacher research for professional development. ELT Journal, 62(2), 139-147.

Bachman, L. F. (1991). ‘What Does Language Testing Have to Offer?’ TESOL Quarterly. 25: 671-704.

Bachman, L. F., Lynch, B. K. & Mason, M. (1995). Investigating variability in tasks and rater judgments in a performance test of foreign language speaking. Language Testing 12, 238-257.

Bachman, L. F. & Palmer, A.S. (1996). Language testing in practice. Oxford: Oxford University Press.

Bachman, L.F. & Eignor, D.R. (1997). Recent advances in quantitative test analysis. In Clapham, C. and Corson, D., editors, Encyclopedia of language and education. Volume 7: Language testing and assessment. Dordrecht: Kluwer Academic, 227–42.

Bachman, L.F. (1997). Generalizability theory. In Clapham, C. and Corson, D., editors, Encyclopedia of language and education. Volume 7: Language testing and assessment. Dordrecht: Kluwer Academic Publishers, 255–62.

Bachman, L. F. (2000). ‘Modern language testing at the turn of the century: Assuring that what we count counts.’ Language Testing. 17:1-42.

Bailey, K.M. (1996). Working for washback: a review of the washback concept in language testing. Language Testing. 13(3), 257–79.

Balbay, S., Pamuk, I., Temir, T., & Doğan, C. (2018). Issues in pre-service and in-service teacher training programs for university English instructors in Turkey. Journal of Language and Linguistic Studies, 14(2), 48-60.

Ballıdağ, S. (2020). Exploring the Language Assessment Literacy of Turkish In-service EFL Teachers.

Ballıdağ, S., & Inan Karagül, B. (2021). Exploring The Language Assessment Literacy of Turkish In-service EFL Teachers. Balıkesir Üniversitesi Sosyal Bilimler Enstitüsü Dergisi.

Banerjee, J., Luoma, S. (1997). Qualitative approaches to test validation. In Clapham, C. and Corson, D., editors, Encyclopedia of language and education. Volume 7: Language testing and assessment. Dordrecht: Kluwer Academic, 275–87.

Bell, R. T. (1981). An introduction to applied linguistics. London: Batsford Academic Ltd.

Bolt, R.F. (1992). Cross Validation of item response curve models using TOEFL data. Language Testing. 9, 79–95.

Bonner, S. M., Torres Rivera, C., & Chen, P. P. (2018). Standards and assessment: Coherence from the teacher’s perspective. Educational Assessment, Evaluation and Accountability, 30, 71-92.

Brown, A. (1995). The effect of rater variables in the development of an occupation-specific language performance test. Language Testing. 12(1), 1–15.

Brown, H. D. (2004). Language assessment: Principles and classroom practices. New York: Longman.

Brown, J. D. (2013). My twenty-five years of cloze testing research: So, what? International Journal of Language Studies, 7(1), 1-32.

Brown, H. D., Abeywickrama, P. (2010). Language Assessment. Principles and Classroom Practices [2003]. White Plains, NY.

Bull, M. and Yoneda, M. (2012). Designing assessment tools: The principles of language assessment. Humanities and Social Sciences, 60, 41-49.

Canale, M. & Swain, M. (1980). Theoretical bases of communicative approaches to second language teaching and testing. Applied Linguistics, 8, 67–84.

Cheng, L. (1999). Changing assessment: washback on teacher perceptions and actions. Teaching and Teacher Education. 15, 253–71.

Cizek, G. J. (2000). Pockets of resistance in the assessment revolution. Educational Measurement: Issues and Practice, 19(2), 16-23.

Clapham, C. (1996). The development of IELTS: a study of the effect of background knowledge on reading comprehension. Cambridge: University of Cambridge Local Examinations Syndicate/Cambridge University Press.

Cochran, J. L., McCallum, R. S., Bell, S. M. (2010). Three A’s: How Do Attributions, Attitudes, and Aptitude Contribute to Foreign Language Learning? Foreign Language Annals, 43(4), 566–582.

Coniam, D. (2009). Investigating the quality of teacher-produced tests for EFL students and the effects of training in test development principles and practices on improving test quality. System. 37: 226–242.

Coombs, A., DeLuca, C., LaPointe-McEwan, D., & Chalas, A. (2018). Changing approaches to classroom assessment: An empirical study across teacher career stages. Teaching and Teacher Education, 71, 134-144.

Darwesh, A. J. A. (2010). Cloze tests: An integrative approach. Journal of the College of Basic Education, 15(64), 105-116.

Davies, A. (1997). The construction of language tests. London: Oxford University Press.

Douglas, D. (2000). Assessing language for specific purposes: theory and practice. Cambridge: Cambridge University Press.

Ekşi, G. (2010). An assessment of the professional development needs of English language instructors working at a state university. (Unpublished master’s thesis). Middle East Technical University, Ankara.

Fanrong, W., & Bin, S. (2022). Language Assessment Literacy of Teachers. In Frontiers in Psychology (Vol. 13). Frontiers Media S.A.

Fulcher, G. (2012). Assessment literacy for the language classroom. Language Assessment Quarterly, 9(2), 113-132.

Galluzzo, G. R. (2005). Performance assessment and renewing teacher education: The possibilities of the NBPTS standards. The Clearing House: A Journal of Educational Strategies, Issues, and Ideas, 78(4), 142-145.

Ginther, A., Stevens, J. (1998). Language background and ethnicity, and the internal construct validity of the Advanced Placement Spanish Language Examination. In Kunnan, A.J., editor, Validation in language assessment. Mahwah, NJ: Lawrence Erlbaum, 169–94.

Giraldo, F., & Murcia Quintero, D. (2019). Language Assessment Literacy and the Professional Development of Pre-Service Language Teachers. Colombian Applied Linguistics Journal, (2), 243–259.

Gronlund, N. E., Linn, R. L. (1990). Measurement and Evaluation in Teaching. New York: Macmillan.

Gruba, P. and Corbel, C. (1997). Computer-based testing. In Clapham, C., Corson, D., editors, Language testing and assessment. Dordrecht: Kluwer Academic, 141–49.

Gültekin, İ. (2007). The analysis of the perceptions of English language instructors at TOBB University of Economics and Technology regarding inset content. Unpublished master’s thesis. Middle East Technical University, Ankara.

Halliday, M. A. K. (1973). Explorations in the functions of language. London: Edward Arnold.

Heaton, J. B. (1990). Writing English language tests. New York: Longman.

Hoekje, B. & Linnell, K. (1996). Authenticity in language testing: Evaluating language tests for international teaching assistants. TESOL Quarterly, 28(1), 103-126.

Horwitz, E. (2001). Language anxiety and achievement. Annual review of applied linguistics. 21(1), 112-126.

Höl, D. (2023). Standardized testing in Turkey: EFL teachers’ perceptions and experiences on Cambridge Young Learner Exams (YLE). International Journal of Curriculum and Instruction. 15(2), 984-1007.

Hughes, A. (2010). Testing for language teachers. Cambridge: CUP.

Huot, B. (1990). The literature of direct writing assessment: Major concerns and prevailing trends. Review of Educational Research. 60, 237–63.

Hymes, D. H. (1972). On communicative competence. In J. B. Pride & J. Holmes (Eds.), Sociolinguistics. Harmondsworth, UK: Penguin Books.

Kirschner, M., Wexler, C., Specto, E. (1992). Avoiding obstacles to student comprehension of test questions. TESOL Quarterly. 26: 537-556.

Kirschner, M., Wexler, C., Specto, E. (1996). A Teacher Education Workshop on the Construction of EFL Tests and Materials. TESOL Quarterly. 30: 85-111.

Kirkgöz, Y. (2008). Globalization and English Language Policy in Turkey. Educational Policy, 23(5), 663-684. doi: 10.1177/0895904808316319.

Köksal, D., Erten, İ. H., Zehir Topkaya, E., Yavuz, A., Yüksel, G., Aksu, İ. E. A., Şirin, E. (2006). English Course for Young Adults: Campus Life, Nobel Yayın Dağıtım, Ankara.

Krashen, S. D. (1982). Principles and practice in second language acquisition. New York: Pergamon Press.

Lado, R. (1964). Language testing: The construction and use of foreign language tests. New York: McGraw-Hill.

Lan, C., & Fan, S. (2019). Developing classroom-based language assessment literacy for in-service EFL teachers: The gaps. Studies in Educational Evaluation, 61, 112–122.

Lantolf, J., Frawley, W. (1985). Oral proficiency testing: a critical analysis. Modern Language Journal 69, 337–45.

Latif, M. W., & Wasim, A. (2022). Teacher beliefs, personal theories and conceptions of assessment literacy—a tertiary EFL perspective. Language Testing in Asia, 12(1).

Lewkowicz, J. (2000). Authenticity in language testing: Some outstanding questions. Language Testing. 17(1), 43‐64.

Llosa, L. (2011). Standards-based classroom assessments of English proficiency: A review of issues, current developments, and future directions for research. Language Testing. 28(3), 367-382.

Lumley, T., McNamara, T. F. (1995). Rater characteristics and rater bias: implications for training. Language Testing. 12(1), 54–71.

Luoma, S. (2001). Test review. Language Testing, 18(2), 225–23.

Lynch, B. K., Davidson, F., Henning, G. (1988). Person dimensionality in language test validation. Language Testing. 5(2), 206–19.

Madsen, H. (1983). Techniques in testing. Oxford: Oxford Pub.

Marzano, R. J. (2000). Transforming classroom grading. Alexandria, VA: Association for Supervision and Curriculum Development.

McNamara, T.F. (1991). Test dimensionality: IRT analysis of an ESP listening test. Language Testing. 8(2), 139–59.

Mertler, C. A. (2004). Secondary teachers’ assessment literacy: Does classroom experience make a difference? American Secondary Education, 33(1), 49-64.

Miles, M. B. and Huberman, M. A. (1994). An expanded sourcebook: Qualitative data analysis. Newbury Park, CA: Sage.

O’Loughlin, K. (2013). Developing the assessment literacy of university proficiency test users. Language Testing, 30(3), 363-380.

Oller, J. W., Jr. (1976). Evidence for a general language proficiency factor. Die Neueren Sprachen, 76, 165–174.

Ölmezer-Öztürk, E., & Aydin, B. (2019). Investigating language assessment knowledge of EFL teachers. Hacettepe Egitim Dergisi, 34(3), 602–620.

Pill, J. (2016). Drawing on indigenous criteria for more authentic assessment in a specific‐purpose language test: Health professionals interacting with patients. Language Testing, 33(2), 175–193.

Pollitt, A. (1997). Rasch measurement in latent trait models. In Clapham, C. and Corson, D., editors, Encyclopedia of language and education. Volume 7: Language testing and assessment. Dordrecht: Kluwer Academic, 243–54.

Read, J., Chapelle, C. A. (2001). A framework for second language vocabulary assessment. Language Testing, 18(1), 1-32.

Purpura, J. E. (1997). An analysis of the relationships between test takers’ cognitive and metacognitive strategy use and second language test performance. Language Learning. 47, 289–325.

Richards, J. (2002). 30 Years of TEFL/TESL experience: A personal reflection. RELC Journal, 33:1-35.

Sahlberg, P. (2006). Education reform for raising economic competitiveness. Journal of Educational Change, 7(4), 259-287.

Sariyildiz, G. (2018). Department of Foreign Language Education English Language Teaching A Study Into Language Assessment Literacy Of Preservice English As A Foreign Language Teachers In Turkish Context.

Sasaki, M. (1996). Second language proficiency, foreign language aptitude, and intelligence: quantitative and qualitative analyses. Peter Lang.

Shohamy, E. (2001). The power of tests. London: Longman.

Shohamy, E. (2020). The Power of Tests: A critical perspective on the uses of language tests (1st ed.). Routledge.

Shohamy, E., Donitsa-Schmidt, S., Ferman, I. (1997). Test impact revisited: washback effect over time. Language Testing 13(3), 298–317.

Spolsky, B. (1978). Introduction: Linguists and language testers. In Spolsky, B. (ed.) Advances in language testing research: Approaches to language testing. Vol. 2. Washington, DC: Center for Applied Linguistics.

Spolsky, B. (2002). Prospects for the survival of the Navajo language. Anthropology and Education Quarterly, 33(2), 139-162.

Stansfield, C. W. (2008). Where we have been and where we should go. Language Testing. 25(3), 311-326. doi:10.1177/0265532208090155

Şentuna, E. (2002). The interests of EFL instructors in Turkey regarding inset content. Unpublished master’s thesis. Bilkent University, Ankara.

Terwilliger, J. (1998). Semantics, psychometrics and assessment reform: a close look at ‘authentic’ assessments. Educational Researcher. 26(8), 24–27.

Tomak, B., Karaman, A. C. (2013). Mentoring in a professional development program for novice teachers at a state university in Turkey: a qualitative inquiry. The International Journal of Research in Teacher Education, 4 (2), 1-13.

Ur, P. (1996). A course in language teaching: Practice and theory. Cambridge: CUP.

Vogt, K., Tsagari, D., & Spanoudis, G. (2020). What Do Teachers Think They Want? A Comparative Study of In-Service Language Teachers’ Beliefs on LAL Training Needs. Language Assessment Quarterly, 386–409.

Wall, D. (1996). Introducing new tests into traditional systems: insights from general education and from innovation theory. Language Testing. 13(3), 334–54.

Wall, D., Alderson, J. C. (1993). Examining washback: the Sri Lankan impact study. Language Testing. 10, 41–69.

Weigle, S. C. (1998). Using FACETS to model rater training effects. Language Testing. 15(2), 263–67.

Willis, D. (2003). Rules, patterns and words. Cambridge: Cambridge University Press.

Wu, W. M., Stansfield, C. W. (2001). Towards authenticity of task in test development. Language Testing, 18(2), 187-206.

American Psychological Association. (2010). Publication manual of the American Psychological Association (6th ed.). Washington, DC: American Psychological Association.

Yamtim, V., & Wongwanich, S. (2014). A Study of Classroom Assessment Literacy of Primary School Teachers. Procedia - Social and Behavioral Sciences, 116, 2998–3004.




How to Cite

Dogan, C. (2023). Mapping Teacher Produced Tests to a Usefulness Model. International Journal of Contemporary Educational Research, 10(3), 635–648.