Assessment helps us shine a light on whether our education systems are preparing students with the knowledge and skills they need to succeed, both in future jobs and as members of civil society.
Assessments can also help school district leaders, teachers, parents, and students themselves understand what students have learned and what they still need to work on. Unfortunately, many assessments have fallen short of this goal, and both teachers and parents have raised legitimate concerns about their use.
We could take these objections as a reason to abandon the entire enterprise of assessment, but that would ignore the benefit of seeing and understanding where there are gaps in student learning. Another path is to look at new technologies and their potential to improve how we assess students.
Generative AI is a type of artificial intelligence designed to produce output, especially text or images, that would normally require human intelligence to create. When thoughtfully adapted for education, it offers new possibilities to assess skills differently, and to assess skills that have previously been difficult to measure.
Historical limits of multiple choice
We cannot peer into people's brains and see what they know and can do, so we give them activities to complete and use their performance as evidence to infer those things. When we decided we wanted to assess many people at the same time, we turned to multiple-choice items.
They could be given to large groups simultaneously and, importantly, scored reliably and quickly. Those who build large-scale assessments found that these multiple-choice questions were able to predict future success in college and in a number of jobs.
Unfortunately, answering multiple-choice questions looks little like most real-world applications of knowledge and skills. We are making a big inferential leap to say that a student who selects the correct choice from among four options will know when and how to apply that skill outside the classroom.
This is particularly true for skills like communication and collaboration, which have mostly gone unassessed at large scale. In addition, it is difficult for teachers to learn much from simply knowing that a student selected an incorrect answer.
Some authors of multiple-choice questions have gotten clever, aligning different answer options with particular misconceptions or errors, but most teachers will say that a quick conversation with a student reveals more. The challenge is that one teacher cannot have a conversation with every student in the class, and there is no way to capture all of those conversations for future reference.
How conversational AI reveals hidden understanding
Generative AI now offers the potential to address some of these previous limitations. AI is very good at generating language and having conversations.
If you knew nothing about assessment and were asked how we might assess someone's skill at, say, persuasion (a specific communication skill), you would likely want to see them engage in a conversation in which they tried to persuade someone of something.
It is now possible to set up a scenario where someone could have that conversation with an AI tool that could be prompted to respond in particular ways. The conversation could then be saved and scored.
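As a concrete illustration, here is a minimal sketch of such a scenario in Python, assuming access to a chat-completion API such as OpenAI's. The role prompt, model name, and transcript handling are illustrative assumptions, not details of any particular product.

```python
# A minimal sketch of a persuasion-assessment scenario, assuming the
# OpenAI chat-completions API. The role prompt and model choice are
# illustrative assumptions, not the design of any specific tool.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# The AI plays a skeptical conversation partner the student must persuade.
ROLE_PROMPT = (
    "You are a city council member who is skeptical about adding bike lanes. "
    "A student will try to persuade you. Raise reasonable objections, concede "
    "points only when the student supports them with evidence, and never "
    "argue the student's side for them."
)

def run_turn(transcript: list[dict], student_message: str) -> str:
    """Append the student's message, get the AI's reply, and record both."""
    transcript.append({"role": "user", "content": student_message})
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model choice
        messages=[{"role": "system", "content": ROLE_PROMPT}] + transcript,
    )
    reply = response.choices[0].message.content
    transcript.append({"role": "assistant", "content": reply})
    return reply

transcript: list[dict] = []
print(run_turn(transcript, "Bike lanes would reduce traffic injuries downtown."))

# The saved transcript is the evidence a rubric-based scorer, human or
# AI, can evaluate after the fact.
with open("persuasion_transcript.json", "w") as f:
    json.dump(transcript, f, indent=2)
```

The key design point is that the transcript, not a single selected answer, becomes the scorable artifact: it can be stored, revisited, and rated against a rubric.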
Even for the more foundational academic skills we regularly assess, rather than guessing what students are thinking when they select an option on a multiple-choice test, we can ask them to explain their thinking in conversation.
Our team has been testing an AI feature this year aimed at designing assessments that better reveal what students know. In a feature called Explain Your Thinking, students answer a traditional math question and then engage in a conversation with an AI tool about that question.
The interaction is meant to mimic the conversation a student would have with a teacher who sits down next to them and says, "Tell me what this answer means," or "Tell me why you did this step here." This isn't just an open-ended interaction with an AI tool.
Under the hood, we actually have three AI agents: one conversing, one judging the student's response, and one ensuring the conversation does not give too much help to students taking a test. The AI agents are designed by an assessment team: former teachers, assessment experts, psychometricians, content experts, designers, and engineers.
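To make the division of labor concrete, here is a minimal sketch of how such a three-agent arrangement could be wired together, again assuming the OpenAI chat-completions API. The prompts, agent names, and veto logic are assumptions for illustration, not the production design.

```python
# A minimal sketch of a three-agent arrangement like the one described:
# a tutor agent converses, a judge agent evaluates the student's
# explanation, and a guard agent vetoes replies that would give away the
# answer. All prompts and thresholds are illustrative assumptions.
from openai import OpenAI

client = OpenAI()

def ask(system_prompt: str, user_content: str) -> str:
    """One-shot call to a chat model with a fixed system role."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model choice
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_content},
        ],
    )
    return response.choices[0].message.content

CONVERSE = ("You are a teacher asking a student to explain their reasoning "
            "on a math item. Ask one short follow-up question at a time.")
JUDGE = ("Rate how well the student's explanation demonstrates understanding "
         "of the concept, from 0 (none) to 3 (full), and justify briefly.")
GUARD = ("Answer only YES or NO: would the following tutor reply reveal the "
         "answer or perform a solution step for the student?")

def explain_your_thinking_turn(item: str, student_explanation: str) -> dict:
    """Run one turn: judge the explanation, draft a reply, screen the reply."""
    judgment = ask(JUDGE, f"Item: {item}\nExplanation: {student_explanation}")
    reply = ask(CONVERSE, f"Item: {item}\nStudent said: {student_explanation}")
    # The guard agent blocks replies that give too much help mid-test.
    if ask(GUARD, reply).strip().upper().startswith("YES"):
        reply = "Interesting. Can you say more about why you chose that step?"
    return {"judgment": judgment, "reply": reply}

print(explain_your_thinking_turn(
    item="Solve 2x + 6 = 14.",
    student_explanation="I subtracted 6 from both sides, then divided by 2.",
))
```

Separating the judging and guarding roles from the conversing role means the tutor can stay warm and encouraging while independent checks keep the exchange valid as test evidence.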
When students use AI to explain their thinking
Did this AI conversation tell us anything the original question did not? Yes. Among students who engaged, 20% of algebra students and more than 30% of geometry students revealed understanding they had not shown by answering the question alone.
Information from the conversation allows more detailed summaries of student understanding for teachers, students, and parents, including guidance on what to work on next. When paired with our practice platform, the results can translate directly into instruction and practice.
We do not need to keep relying on the same technologies in our assessments. We can gather better information about what students know and can do, which will help us understand how to help more students learn more.

