Abstract: This study investigates the effectiveness of ChatGPT-generated multiple-choice exams in evaluating cognitive learning outcomes in civil engineering education, specifically in the course Road Construction I at Universidad Técnica Particular de Loja, Ecuador. Using the revised Bloom's Taxonomy as a framework, a 32-question exam was developed covering the first four cognitive levels: remember, understand, apply, and analyze. The test was administered to 101 students divided into two groups, and the results were analyzed using difficulty and discrimination indices, as well as internal reliability measured by the KR-20 coefficient. Findings indicate that while the ChatGPT-generated questions demonstrated acceptable internal reliability (KR-20 > 0.7) and discrimination indices, 40–50% of the questions fell outside the optimal difficulty range. Unexpectedly, questions targeting higher-order cognitive levels yielded better scores. These results underscore the potential of ChatGPT as a tool for generating assessment instruments while also revealing limitations, particularly in producing balanced difficulty distributions and well-calibrated higher-order cognitive questions.
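As context for the reliability and difficulty figures quoted above, a minimal sketch of how the KR-20 coefficient and the item difficulty index are computed for dichotomously scored (0/1) items; the function names and data are illustrative, not taken from the study:

```python
import numpy as np

def item_difficulty(responses):
    """Difficulty index per item: proportion of students answering correctly."""
    return np.asarray(responses, dtype=float).mean(axis=0)

def kr20(responses):
    """Kuder-Richardson 20 internal-reliability coefficient.

    responses: 2-D array of shape (n_students, n_items), entries 0 or 1.
    """
    X = np.asarray(responses, dtype=float)
    k = X.shape[1]                          # number of items
    p = X.mean(axis=0)                      # proportion correct per item
    q = 1.0 - p                             # proportion incorrect per item
    total_var = X.sum(axis=1).var(ddof=1)   # variance of students' total scores
    return (k / (k - 1)) * (1.0 - (p * q).sum() / total_var)

# Toy data: 4 students, 3 items (1 = correct, 0 = incorrect)
scores = [[1, 1, 1],
          [1, 1, 0],
          [0, 0, 0],
          [1, 0, 0]]
print(kr20(scores))  # 0.9375 for this toy matrix
```

Values of KR-20 above roughly 0.7 are conventionally read as acceptable internal consistency, which is the threshold the abstract cites.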
Keywords: Bloom's Taxonomy, Road Design Learning, Cognitive Assessment
DOI: 10.24874/PES08.01B.024
Received: 23.04.2025 Revised: 17.07.2025 Accepted: 05.08.2025