Navigator Episode 3: CPSx Game Study

For Ep. 3 of the ACTNext Navigator podcast, we go in-depth on the CPSx game study that measures collaborative problem-solving. In this episode, Saad Khan and Dave Edwards discuss "Crisis in Space," the game they use to gather data from players. The data comes from natural language processing, eye-tracking, and other records.

Saad introduced Dave’s presentation at an ACT Power Hour lunch.

This podcast includes an audio clip from KCRG TV’s news story about the CPSx study in Iowa City and Cedar Rapids schools. The clip is used with their permission. Thank you!

Podcast transcript:

You are listening to episode three of the ACTNext Navigator podcast. I’m Adam Burke. We’re talking again about a video game.

[Pac-Man sound effects]

No, not that one. This is our follow-up podcast about the collaborative problem-solving game ACTNext is developing called CPSx. The game was featured in a recent TV news story by KCRG reporter Aaron Scheinblum. He interviewed ACTNext data scientist Dave Edwards.

(KCRG news clip) You start out not really knowing what to do, and the whole point of it is to try to figure out what to do: there's a crisis in space. Chase and Athena are the two hoping to solve it. [unintelligible game chatter] It's a problem-solving game students are volunteering to play. They're placed in separate rooms, with a headset as their only way to work together. "Kind of struggling a little bit, but after a little bit it was good." But it's actually a study that ACT is monitoring: "We're just trying to figure out ways of improving our ability to measure collaborative problem-solving specifically, really in order to help students learn how to collaborate better." They're tracking everything from communication skills to where someone focuses their eyes. "We'll watch the player's face and see whether or not they're showing signs of frustration or happiness, especially in the face of failure and success." Educators are hoping this game could make a difference to help students learn.

Not just Fortnite or some of those other fun games that they play in their free time, but it can also be used to solve real-world problems and prepare them for life after high school.

If they find it fun and it's helpful, then we've really kind of hit the sweet spot: something that they're wanting to do that also helps. For those students trying to beat the clock, here's some advice: "Just talk a lot." "Make sure you know your partner." "Yeah, communicate."

In that clip you also heard Northwest Junior High principal Elizabeth Breuning and some Northwest Junior High students who played the game.

Now let’s hear from Saad Khan. He’s the AI and machine learning director at ACTNext and he’ll give an overview of the game and talk about the chain of evidence researchers will use to measure collaboration.

Saad Khan: We want to talk to you about a project that we're super excited about. It's called CPSx. CPS stands for collaborative problem-solving, and the initiative is about how we build ecologically valid measures of a very complex skill, collaborative problem-solving, and not just in the form of a summative test but really as a formative learning experience as well. This is a project that's been ongoing for about a year or so, and it's a large team effort within ACTNext. Here are the project people you see who are in the thick of it, but there are others who have contributed to this project in one way or another. We've got Pravin Chopade, who's on the Zoom link today; he couldn't be here, unfortunately, but we'll cover for him. We've got David Edwards, who's going to be talking about most of what you'll hear today and has been leading a lot of the work on our pilot data collection study, which you'll hear a lot about. We have Alejandro Andrade on Zoom as well; he's based out of the Denver office. Scott Pu, who is local, is right here. And then there's myself and Bryan Maddox, who's our collaborator from the University of East Anglia. I'm just going to give you a brief overview and then hand it over to Dave to talk about the details of the pilot data collection study that we just concluded, as well as some of the analytics and the analysis that we've done.

There is a broad acknowledgment that we want to go beyond traditional cognitive skills assessments. What's necessary to be successful in higher education and the workforce goes beyond skills in math, writing, science, or what have you. Those are really essential, but you need more than that, and the kinds of skills necessary to be successful in the modern workforce include the so-called 21st-century competencies: things like communication ability, teamwork, creative thinking, problem-solving, and so forth. While we all acknowledge that is the case, we also know it's very hard to actually come up with valid measures or assessments for skills like those, and how you actually remediate those skills is yet another question from there. Part of the reason for that is that it's not just the end goal that matters, not just having a score you can report on collaborative problem-solving.

The challenge really is that the kind of data we want to use to come up with a measure and provide feedback is inclusive of the processes that led to the final goal. To really do that, you want to expand the kind of data you can capture. It's going to be beyond just responses to items; it's data that's captured in the wild. It could be a sample of people's communications with each other, so it could be video data, it could be speech data, it could be log-file data captured on a simulation platform that is designed to elicit the skills we're interested in. It's not as if we don't have access to enough data to measure these validly; somewhat paradoxically, there's a data deluge, and the challenge is how you actually extract meaningful, actionable evidence from that deluge of data.

To that end, a lot of the work that my team, the AI and machine learning team at ACTNext, does is on innovations and advances in the state of the art to address that challenge. One of them is what we call multimodal analytics. The idea there is: can we combine multi-sensory data, which could be video, audio, speech, or what have you, and extract from it a multitude of behavioral markers that can then be fused into a holistic assessment, a holistic picture of the state of the trainee? What we know for sure is that it's almost impossible to get to a summative score on a very complex competency by taking into account just the raw, low-level data you can capture. Is it possible to capture the complex data I just talked about, which could be in the form of video or 3D or speech, and maybe build a regression model that would give you a score on how well you do as far as your communication skills are concerned? We know that's simply not possible; the data is too complex, too noisy, too interconnected, and so forth.

An alternative approach, which I feel is a lot more valid and powerful in addressing this particular challenge, is to create a deep hierarchical inference model. It's essentially a progressive accumulation of evidence at higher levels of abstraction as you go from the low level all the way to the top level. What we're trying to do is marry data-driven and theory-driven approaches. You might capture the raw data in a multitude of modalities, but instead of trying to make a direct connection between the low-level data and the high-level construct, we can break up the high-level construct into intermediate, mid-level representations. In this case we're talking about collaborative problem-solving; perhaps it's a combination of things like knowledge assimilation and positive communication. How does knowledge assimilation actually look in the real world? We worked with SMEs to break that further apart and say perhaps knowledge assimilation is a combination of a number of elements like turn-taking and engagement with the content, which in turn can be broken down further into elements of speech, facial expressions, gestures, and so forth. Now we have really a chain of evidence, a bridge connecting the low-level data at the bottom and the high-level models at the top. In a nutshell, those are two of the major pieces of innovation the team has been bringing to bear on the problem of how we measure collaborative problem-solving.
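
To make the chain-of-evidence idea concrete, here is a minimal sketch of hierarchical evidence aggregation. The feature names, indicator structure, and weights are all illustrative assumptions, not the team's actual model:

```python
# A minimal sketch of hierarchical evidence aggregation (illustrative only;
# the feature names, indicator structure, and weights are hypothetical).

# Low-level features extracted from raw multimodal streams
low_level = {
    "speech_turn_count": 14,      # from ASR-transcribed audio
    "gaze_on_content_pct": 0.62,  # from eye-tracking areas of interest
    "smile_events": 3,            # from facial-expression analysis
}

# Mid-level indicators, each a weighted combination of low-level features
indicator_weights = {
    "turn_taking":        {"speech_turn_count": 0.07},
    "content_engagement": {"gaze_on_content_pct": 1.0},
    "positive_affect":    {"smile_events": 0.2},
}

def score_indicator(name):
    """Aggregate the low-level features that feed one mid-level indicator."""
    return sum(w * low_level[f] for f, w in indicator_weights[name].items())

# Construct level: knowledge assimilation as a blend of mid-level indicators
knowledge_assimilation = (0.5 * score_indicator("turn_taking")
                          + 0.5 * score_indicator("content_engagement"))
print(f"knowledge assimilation estimate: {knowledge_assimilation:.2f}")
```
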
We've taken the tack that instead of building a traditional test for it, or even a questionnaire, and there are some examples of those out there, we wanted to tap into the power of online multiplayer video games. Kids and even adults love to play games, and if you've ever played World of Warcraft or Fortnite, you know they can be really engaging for participants and can elicit cooperative play and collaboration. We feel that's a great vehicle not only to elicit that kind of behavior but also to make it ecologically valid. Perhaps it's not going to be the exact rendition of the kinds of collaborations that might take place in the wild, in the workforce or what have you, but it's a pretty close connection to it. The game genre we've chosen is called jigsaw puzzle games. The idea is that two or more people get together to solve a problem-solving task; each player or participant has a different part of the puzzle, and they have to bring the parts together to solve the whole game. So by design there's a cooperative or collaborative element to what the game is.

Saad will give a demo of the game at the 2019 Education Technology and Computational Psychometrics Symposium or ETCPS19.

Now we have some of Dave Edwards' presentation on CPSx at the ACT campus in Iowa City. Dave is a video gamer, and you can really hear his love of gaming here as he discusses what collaborative problem-solving is and where this project will go next.

Dave Edwards: So the game itself is a two-player cooperative game. You can actually play with more than two if you want, but the way we've designed it is two players, and it was built by a game company called GlassLab. The game itself is basically: you have a puzzle that you try to solve, and one person has access to the puzzle. Then there's a bunch of instructions about how you might go about solving the puzzle, and the other person has access to those instructions. So there are really a lot of places where the people have to work together to share information and navigate through the instructions in order to get to the end.

This game is a lot like a commercial game called Keep Talking and Nobody Explodes. I don't know if you've ever played it, but the idea of that game is you're trying to defuse a bomb. The premise is exactly the end of the movie Speed, which I know is kind of a dated reference at this point, but Keanu Reeves climbs down underneath the bus to try to defuse the bomb. He's just a regular guy, but he's on the phone with the bomb squad, who tells him to cut the green wire or something, and then he does that, and then they win. Well, they don't die, at least, which is good.

The game itself is structured in this way: it's divided up into six segments, and each of the segments has some similarities and some differences that are kind of interesting. The very first segment is getting into the game: matching with your partner, talking to your partner in the lobby before the game starts to get a feeling of, what's your name, who are you, what are you going to do, I guess it looks like I'm the astronaut this time, you'll be the engineer, all that kind of stuff. So there's some conversation that goes on about getting ready to start the game, and we're collecting specific kinds of information out of that segment which is different from other segments.

Then there are five mission segments where you actually play the game. A mission is a timed experience, in our case between four and eight minutes, and in that mission you have a number of subtasks you're trying to solve. The subtasks come from a collection of six, which you could consider tech-enhanced item types, if you will. Each of the six tasks is designed in a slightly different way, and we've been looking at the different types of evidence we can collect from each one.

The other interesting thing is that in between each mission segment there is an intermission segment where the players can talk with each other about what went well, what didn't go well, and what they learned that time: oh, I found there was a button at this part of the screen which was really useful, so you might want to use that button on the next mission. So there's definitely some conversation about what happened, and we're collecting different kinds of information in that intermission period than in the actual task period. There are a lot of parts of the game where we're finding different kinds of things. The missions here are alternating in colors because, the way we've designed it right now, the players switch roles in between missions, roughly as the sketch below lays out.
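
As a rough illustration of the session structure Dave describes, here is a small sketch that builds the lobby/mission/intermission timeline with roles alternating each mission. The Segment class and the names are hypothetical, not the game's real code:

```python
# A sketch of the session structure described above (the dataclass and
# segment names are my own framing, not the game's actual code): one lobby
# segment, then five timed missions with intermissions in between, with
# the two players swapping roles from one mission to the next.
from dataclasses import dataclass

@dataclass
class Segment:
    kind: str    # "lobby", "mission", or "intermission"
    roles: dict  # which player holds which role during this segment

def build_session(player_a, player_b, n_missions=5):
    segments = [Segment("lobby", {})]
    for i in range(n_missions):
        # Roles alternate each mission so both players learn both interfaces
        operator, engineer = (player_a, player_b) if i % 2 == 0 else (player_b, player_a)
        segments.append(Segment("mission", {"operator": operator, "engineer": engineer}))
        if i < n_missions - 1:
            segments.append(Segment("intermission", {}))
    return segments

for seg in build_session("Chase", "Athena"):
    print(seg.kind, seg.roles)
```
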
I don't know if you've ever had this thing where you call up the help desk and you say, hey, I'm trying to do something, and they say, why don't you share your screen? The whole idea here is that I don't know what you're seeing, so it's a lot harder for me to work with you on what it is you're supposed to do. We've actually done that on purpose, so that you can't see what your partner sees, but you do have an opportunity to take on their role for a certain period of time, so you understand what they're supposed to do when they're playing that role. They go back and forth, and over time they learn what each person is supposed to do; they learn how to take on each role.

One thing which is kind of important about this is that this game is novel to every player, and what's interesting about that is that both students and adults have a very similar learning experience, because you start from scratch. We think this fact will allow it to potentially be scalable across age groups.

This is an example of the two interfaces that the two players see, and I apologize for those in the room, it's kind of small, or rather it's not small, it's just dark. Basically, on this screen is the person who's the operator. They're also kind of an astronaut: they're on the International Space Station, and they have different issues popping up. They have alerts where maybe they're receiving a signal or there's an incoming asteroid, and those are the individual tech-enhanced items, the tasks they have to solve. And then this is the engineer's interface, which is pretty much a book. Since we're tracking people's eyes, which I'll mention in a bit, we can actually see when people are reading different sections of the book, and we can understand at what point they would otherwise know, or we think they should know, what they're supposed to do next. Because these two people have very different interfaces, they participate in really very different ways.

Here's one example of how one task works, so you can kind of understand. For those of you who are interns, you will be subjected to this game, so this is not really cheating, but you'll get a little bit of a leg up.

The operator sees this task over here: it's a circuit board with a number of wires. I'm not sure if the audience can see these wires at all. The first step is they have to identify that they're in the wires task; they have to let their partner know that they're in the wires task; and they have to navigate to the section of the booklet regarding wires. The first piece of information that usually gets discussed is how many wires there are and what colors they are, because the instructions are structured in different paragraphs based on the number of wires. Early on, students don't know that the number of wires is the most important first piece of information to get, but later on they do, so we can actually watch how long it takes them to learn that that's the first piece of information to share. There are also cases where, even if they don't, the engineer can compensate for that fact. So we're looking at a lot of different elements of their conversation around this information exchange. You can think of it as if I have a deck of cards and you have a deck of cards and you're asking me to give you some, and we're modeling this card passing as part of what collaborative problem-solving is.
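
Here is a toy version of that card-passing analogy: time-stamped utterance events mark when each piece of needed information (say, wire count and wire colors) is requested and shared. The event schema and the information units are invented for illustration, not the study's actual data format:

```python
# A toy model of the "card passing" analogy for information exchange
# (entirely illustrative; the information units and event format are
# assumptions, not the study's actual schema).

# Information the engineer needs before the instructions can be applied
needed = {"wire_count", "wire_colors"}

# Time-stamped utterance events, as dialogue-act tags might label them
events = [
    {"t": 3.1, "speaker": "operator", "shares": "wire_count"},
    {"t": 7.8, "speaker": "engineer", "requests": "wire_colors"},
    {"t": 9.2, "speaker": "operator", "shares": "wire_colors"},
]

shared = set()
for e in events:
    if "shares" in e:
        shared.add(e["shares"])
    if needed <= shared:  # all needed "cards" have been passed
        print(f"all needed information exchanged by t={e['t']}s")
        break
```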

That's a really big question, and one big problem in the field really is that there are a lot of different frameworks for what collaborative problem-solving might be. One of the frameworks that we've settled on, through Alejandro's research, is that collaborative problem-solving has four key components. It has to be a novel process; otherwise it's not really problem solving, it's just working together. The solution has to be something which is visible to the team members, so it's not really like learning; you have to have an outcome which you can see, like we won or we lost. If you don't have a visible solution, it's more of a learning activity than a problem-solving activity; if you can't find a solution, you're not really solving a problem. The roles have to be differentiated, which means that people have to be doing different things; otherwise it's just parallel work. For example, on the performance scoring team, each one of you reads an essay independently; that's not a differentiation of roles, at that moment you're working in parallel. But when you come back together and fuse the scores, now you have different roles; you have the table leader, who's got a different role, so in that sense it is collaborative. And it also has to involve interdependence between the two players. So it's sort of an asymmetric situation, which, as I've just described, we clearly have here.

The problem-solving itself also breaks down into two major components, which are cognitive and socio-emotional. The cognitive is reading through the instructions, making sure you recognize what's going on. The socio-emotional is monitoring your partner's state: making sure they're following along with you, making sure they're engaged, that they understand what you're talking about, all those kinds of things. We're trying to address both of these simultaneously in our measurement.

What we're doing is collecting information from watching them play the game. This is what our data collection setup looks like: we have a laptop; we have an eye-tracker bar here, courtesy of the ACT research department, thank you; we have a webcam, so we're recording their face; and we're using some other software to record their screen and some of their mouse movements. They're also having an audio conference over Skype, so we're recording that conversation. Then we can take all of that and run it through a series of processing steps, a lot of the things Saad was talking about: computer vision models, natural language processing models, automatic speech recognition models, these kinds of things, in order to get the very low-level data that helps us make the hierarchical inference that was described.

We have something called Noldus FaceReader; that's out-of-the-box commercial software. You feed in a profile video, and it analyzes that video for a number of different things. What's going on down here is it's locating different portions of the face: whether or not they're blinking their eyes, or whether or not their cheekbones are moving, from which we can infer things like smiles or frowns, things of that nature. We're also transcribing their audio. We take their conversation and we push it through Amazon automatic speech recognition. Here's an example of what they're saying, and the idea is that we're going to extract certain features of their conversation which indicate particular actions that they're taking.
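
As a hedged sketch of that transcription step, this is roughly what submitting a session recording to Amazon's ASR service (Amazon Transcribe) can look like; the bucket path and job name are placeholders, and the team's actual pipeline may differ:

```python
# A sketch of submitting a session recording to Amazon Transcribe.
# The S3 path and job name are placeholders; this may differ from the
# team's actual pipeline.
import time
import boto3

transcribe = boto3.client("transcribe", region_name="us-east-1")

transcribe.start_transcription_job(
    TranscriptionJobName="cpsx-session-001",          # placeholder name
    Media={"MediaFileUri": "s3://my-bucket/session-001.wav"},  # placeholder
    MediaFormat="wav",
    LanguageCode="en-US",
)

# Poll until the transcript is ready (or the job fails)
while True:
    job = transcribe.get_transcription_job(TranscriptionJobName="cpsx-session-001")
    status = job["TranscriptionJob"]["TranscriptionJobStatus"]
    if status in ("COMPLETED", "FAILED"):
        print("job finished with status:", status)
        break
    time.sleep(10)
```
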

Am I sharing information with you? Am I acknowledging information that you just shared with me? Am I asking a clarifying question? Am I just making an interjection, like saying the word "okay," which is kind of an active-listening behavior? There's a slide a little bit later, but we're going to be building natural language processing models to extract these kinds of features out of their transcripts. Some of the features we're looking for are things like how many times a person uses "you" versus "I": are they more myopic, or more focused on their partner?
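
A minimal sketch of the kinds of transcript features Dave mentions, such as "you" versus "I" counts and question detection; the heuristics here are deliberately simple stand-ins, not the team's NLP models:

```python
# Simplistic transcript features (illustrative stand-ins for the study's
# actual NLP models): pronoun counts, question detection, interjections.
import re

def utterance_features(text):
    tokens = re.findall(r"[a-z']+", text.lower())
    return {
        "you_count": tokens.count("you"),
        "i_count": tokens.count("i"),
        "is_question": text.strip().endswith("?"),
        # One-word acknowledgments like "okay" as active-listening markers
        "is_interjection": tokens in (["okay"], ["ok"], ["yeah"]),
    }

print(utterance_features("Okay"))
print(utterance_features("How many wires do you see?"))
```
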

Are they asking clarifying questions? Some of their conversation is actually just reading the booklet out loud so that they have a shared understanding of what the rule is. I find those very interesting, and who does or doesn't read it out loud first is interesting too. There's also something else we're doing with the eye-tracking data. With eye tracking there's something called areas of interest: you define certain regions on the screen which are of interest, and you annotate them as being something. In this particular frame, there's a user camera in the top left; this is basically a picture of myself, and over here is my collaborator, who I may or may not be looking at at any given point in time. When you look at the heat maps for these plots, there's a lot of density around the collaborator's video; what's interesting there is a kind of differential behavior based on the video conferencing. Then there's obviously a giant frame in the center of the screen, which is the game, and inside the game frame there are a lot of sub-frames: here's a paragraph, here's a button, here's a wire, all that kind of stuff. What we can do is annotate their gaze points, so we know where they're looking at what points in time, and we can look at certain sequences: they saw the wire, then they said "the wire," then their partner looked at the wire.
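
Conceptually, mapping gaze samples to areas of interest can be as simple as testing points against named rectangles; the AOI names and coordinates below are made up for illustration:

```python
# A sketch of mapping gaze samples to areas of interest (AOIs); the AOI
# rectangles and the gaze coordinates are invented for illustration.

# AOIs as named rectangles: (x_min, y_min, x_max, y_max) in screen pixels
aois = {
    "partner_video": (0, 0, 320, 240),
    "game_frame":    (400, 100, 1600, 900),
    "wires_panel":   (900, 300, 1200, 600),  # a sub-frame inside the game
}

def label_gaze(x, y):
    """Return every AOI containing this gaze point (sub-frames can nest)."""
    return [name for name, (x0, y0, x1, y1) in aois.items()
            if x0 <= x <= x1 and y0 <= y <= y1]

# A gaze sample inside the wires sub-frame is also inside the game frame
print(label_gaze(1000, 400))  # ['game_frame', 'wires_panel']
```
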

We're really looking at these complex processes of the collaboration. What we're doing right now is reviewing all of the video by hand in order to develop a training set for the kind of evidence we're looking for. We have a couple of research assistants, some of them in the room right now, thank you very much, who are watching all of this video and annotating it along all of these different dimensions of what is going on at that point in the video: are they speaking, are they making a gesture, those kinds of things. Here you can see this person is laughing at this point in time, or they're smiling for this period of time. The idea is that we're annotating what's going on at different points in the video in order to create a training set, so that we can then train a predictive model to annotate these videos automatically for us.

That's what we have here; this is the idea of the predictive model. These are all of the different kinds of tags we're looking for, and you can see that we're looking for these tags at different portions of the game. We might be looking at them doing the keypad tasks or the circuit tasks; there are slightly different kinds of evidence we're looking for: sample utterances they might make, or other modalities in the data, as Saad was mentioning. It might be an utterance, a gesture, or eye-gaze data that we're using to predict the tag. This is what we're calling the predictive model, where we can sort of let the machine watch the video and tell us what happens inside of it. This is really important for scalability if we want to deliver this kind of experience to a lot of people.

Then we've got all those tags down here at the bottom; this is what we call our evidence, and we infer, using the hierarchical model Saad described earlier, our way up towards the top, towards collaborative problem-solving. One example here is "suggest a course of action." Suggesting a course of action is related to decision making and execution. I'm not sure where the arrow is up here, but the point is that it's related to them saying something like stating a wire to be cut; that's a way of suggesting a course of action, and we can then infer up towards their decision making and execution.
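
The annotate-then-train loop might look something like this sketch, where hand-labeled segments become training data for a classifier that proposes tags on new video. The features, labels, and model choice are assumptions, not the team's actual setup:

```python
# A sketch of the annotate-then-train loop: hand-labeled video segments
# become a training set for a classifier that tags new segments
# automatically. Features, labels, and model choice are illustrative.
from sklearn.ensemble import RandomForestClassifier

# Each row: features from one segment, e.g.
# [gaze_on_booklet_pct, you_count, question_flag]
X_train = [
    [0.8, 2, 1],
    [0.1, 0, 0],
    [0.5, 1, 1],
    [0.0, 3, 0],
]
y_train = ["reads_instructions", "interjection",
           "asks_clarifying_question", "suggests_course_of_action"]

model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X_train, y_train)

# The trained model "watches" a new segment and proposes a tag
print(model.predict([[0.7, 2, 1]]))
```
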

We just finished our second pilot study. We did the first pilot study last summer with the other ACT interns, and this is some information about the data we're collecting. Not only are we having them play the game, but we're also having them respond to surveys, because we understand a lot more about what's going on with a survey; there's been tons of psychometric research on surveys up till now. We have some high-level demographic survey data, where we're asking things like their gender and how much they've played games in the past, different sorts of stuff we think might be related to how people play games differently from one another. Obviously not everybody plays games the same way, and we want to be able to control for that as much as we can. We have them take a specific collaborative problem-solving survey, designed by our learning solutions director Yigal Rosen, which is a self-report survey of collaborative problem-solving skills. Through collaboration with ACT research, we were also able to have them take the Tessera assessment, which was really nice. The Tessera reports track social-emotional skills on five scales, which we'll talk about in a minute. These are both going to be a validity measure for us for some of the inferences, but we may also be able to learn other properties in order to predict certain behaviors people might exhibit: they have a high leadership score, so it's more likely they'll take the lead, or maybe vice versa, we saw them take the lead, and that correlates with a high Tessera score. So there are a couple of different ways we can use Tessera, and we're very excited about that. And then there's a self-reported survey after they played the game: did you have fun? Did you know your partner? Do you think your partner did well? Because sometimes when you work in groups, you know how it goes: "my partner didn't really do their job." We put that question in there just so we can control for that.

As they play the game, we're collecting all this other data, as I mentioned before. We've got their audio transcripts, we've got video of them, we have all this telemetry out of the game log, which we were able to develop in concert with the game developers, and we also have all this eye-tracking data and the areas of interest that we're annotating. And like I just said, we just finished a pilot study in local schools, which was really fun. It was a humongous amount of work, but it was also super cool. We went there, they were willing to partner with us, and we got a couple of kids out of class at a couple of points in time; they came, they played our game, and they participated in our research. We got a lot of positive feedback because they really enjoyed it.
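
One of the validity checks Dave describes, correlating a survey scale with an observed in-game behavior, reduces to a simple computation; the numbers below are fabricated placeholders purely to show the idea:

```python
# A sketch of one validity check: correlate a survey scale (e.g., a
# Tessera leadership score) with an observed in-game behavior (e.g., how
# often a player took the lead). All numbers are fabricated placeholders.
from statistics import correlation  # available in Python 3.10+

leadership_scores = [3.2, 4.5, 2.8, 4.9, 3.7]       # self-report survey
lead_taking_rate  = [0.30, 0.55, 0.20, 0.60, 0.45]  # observed in game

print(f"r = {correlation(leadership_scores, lead_taking_rate):.2f}")
```
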

Someone thought it should be harder, so I was excited to see that. Looking forward, what will this become? What's really interesting to think about is that we think of it a little bit like a formative learning experience, and so we want to start envisioning how we would provide feedback to people who participated or who took this. We've already written, I want to say, three papers and given at least two presentations at conferences. I'm flying out next week to go give one with our partner in the UK, so we're definitely already starting to produce some research out of these data.

Well, thanks for listening. Thank you, Dave and Saad, and thank you, KCRG TV, for permission to play a clip from their CPSx news segment, which aired in May 2019. Thank you to the Iowa City and Cedar Rapids schools, students, teachers, and administrators for helping with the CPSx study, and thank you to ACT audio-visual expert Eric Dickerson, who recorded Saad and Dave's Power Hour presentation at ACT.