Learning from Research: October 2010

Sunday, October 3, 2010

On assessment of students in mixed graduate/undergraduate classes

Ten percent of the grade in my class "Introduction to Robotics" is based on reading exercises and filling out a 3-4 question online questionnaire before each lecture.  My goals in posing questions before the lecture are to a) stimulate students to actively participate in class based on questions that arose during reading, b) provide a positive experience for those students whose understanding is enabled by the lecture, and c) provide an incentive to actively study for the exam throughout the entire course. Designing questionnaires that meet these goals is not easy, however.  If the questions are too hard, students are frustrated. If the questions are too easy, students are tempted not to pay attention during lectures. Finally, it is hard to pitch the questions at the right level for everybody. In fact, some students perceive the questions as "ambiguous", "taking too much time", and, interestingly, as "poorly correlated" with the textbook content. I believe this criticism stems from the fact that the questions are explicitly designed to 1) differentiate among the students and 2) be difficult to answer just from skimming the text. Yet what is the right level of difficulty that reaches these goals while still providing positive feedback to those students who struggle with the questions?

The questions usually follow one of the following schemes:

1) In order to do X, you have to
  a) Do A, B, C and then D
  b) Do A, C, D and then B
  c) Do A, B, and then D

2) When it rains outside, you need to
  a) take an umbrella
  b) the probability of rain is correlated to the probability of thunder
  c) wear a bathing suit

While the first scheme is geared towards understanding of algorithms and systems and requires careful reading of the options and matching them to language in the textbook, the second scheme tests understanding of concepts and is probably mostly responsible for the confusion. Indeed, answer (2c) is not really wrong, it just makes a lot less (common) sense than answer (2a). Also, option (2b) is a favorite choice of some students, as its content is correct and it looks like the "smartest" answer in the pool. I find the second scheme particularly attractive as it requires the students to think not only about the question itself but also about its broader context. Yet the risk is that such a question differentiates too strongly for language skills and attentive reading rather than for technical knowledge.

In order to understand whether the questions are indeed too hard, I decided to look at the distribution of performance after the first 4 weeks of class. The following rules of thumb come to mind:
  1. If the distribution is skewed toward the lower end, i.e. everybody does badly, I probably did something wrong.
  2. If the distribution is slightly skewed toward the upper end, the questions are just right.
  3. If the distribution is strongly skewed toward the upper end, the questions are too easy, do not allow differentiation among the students, and frustrate the top students.
Indeed, if (1) is the case, the questions are simply too hard for everyone and will not help in differentiating the students; the same holds for (3) with questions that are too easy.
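
As a concrete illustration of these rules of thumb, here is a minimal sketch of how one could classify a score distribution by its sample skewness. The score list and the thresholds are hypothetical, not the class data, and scipy is assumed to be available; positive skewness corresponds to what I call "skewed toward the lower end" above (mass at low scores, tail toward high scores), negative skewness to "skewed toward the upper end".

  import numpy as np
  from scipy.stats import skew

  # Hypothetical per-student totals (not the actual class data).
  scores = np.array([3, 8, 14, 20, 25, 31, 36, 40, 44, 47], dtype=float)

  # Sample skewness: positive = mass bunched at low scores, tail toward high
  # scores; negative = mass bunched at high scores, tail toward low scores.
  g = skew(scores)

  # The cut-off values are arbitrary illustrations of the three cases above.
  if g > 0.5:
      verdict = "questions probably too hard (case 1)"
  elif g < -1.0:
      verdict = "questions probably too easy (case 3)"
  else:
      verdict = "questions in a reasonable range (case 2)"

  print(f"sample skewness = {g:.2f}: {verdict}")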

Notice that the students have to answer the questions solely based on the reading, before the material is actually taught in the lecture.  In fact, I consider it desirable if the distribution is skewed toward the upper end after the lecture and the accompanying exercise, as this suggests that a broad student population has actually been reached. The data for all 27 students for the first four weeks of exercises is shown below. Each exercise is worth 12 points (48 points max).
The data is organized into 9 bins (43-48 points, 37-42, etc.). Assignments that were not submitted are counted as 0. Two students have not submitted any assignments and are therefore not considered in the analysis that follows.
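
For reference, the binning described above can be reproduced with a few lines of numpy. The per-student totals below are hypothetical placeholders, and the exact bin edges (a bin for 0 plus eight 6-point bins) are my reading of the description, not taken from the original data.

  import numpy as np

  # Hypothetical totals over the four exercises (12 points each, 48 max);
  # students who did not submit anything are recorded as 0.
  totals = np.array([0, 0, 4, 7, 9, 12, 15, 16, 18, 21, 22, 24, 26,
                     27, 29, 30, 33, 35, 36, 38, 40, 41, 42, 44, 45, 46, 48])

  # One reading of the 9 bins: a bin for 0 and eight 6-point bins (1-6, ..., 43-48).
  edges = [0, 1, 7, 13, 19, 25, 31, 37, 43, 49]
  counts, _ = np.histogram(totals, bins=edges)
  for lo, hi, c in zip(edges[:-1], edges[1:], counts):
      print(f"{lo:2d}-{hi - 1:2d}: {'#' * int(c)}")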

At first glance, the data has a mean of 26.3 and a standard deviation of 13.3. Thus the mean is slightly above half of the possible score (24) and the data strongly differentiates the students.  If the data were normally distributed, however, roughly half of the students would score less than half of the points, which would be highly undesirable. Indeed, the distribution is not actually Gaussian, but appears to be bimodal. Around half of the students in the class are graduate students (MS or PhD program). Their previous training and the fact that MS and PhD students are usually strongly selected upon admission might make them score systematically better than undergraduates. Looking at the overall performance of students in the course (here from the previous iteration of the class), classified into graduate (N=9) and undergraduate (N=13) students, leads to the following plot:


Both distributions are clearly "long-tailed" and skewed slightly toward the full score. They suggest that graduate students perform on average better in the class. If this is the case, however, different assessments should be chosen for the two groups in order to reach the same satisfaction with the class among undergraduates and graduate students.
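
As a rough check of the earlier remark that a normal model with these parameters would put roughly half of the students below the 24-point midpoint, here is a short sketch using scipy; the mean and standard deviation are the values quoted above.

  from scipy.stats import norm

  # Fraction of students below the 24-point midpoint under a normal model
  # with the observed mean (26.3) and standard deviation (13.3).
  p_below = norm.cdf(24.0, loc=26.3, scale=13.3)
  print(f"P(score < 24) under a normal model: {p_below:.2f}")  # roughly 0.43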

In conclusion, the preliminary data from the first four weeks of class confirms that the testing methodology currently used in the class is sufficient to differentiate among a pool of students with strongly varying backgrounds, as is the case in a class taken by both graduate and undergraduate students. However, as graduate students systematically perform better than undergraduates in this particular course, the assessment methodology currently in use might frustrate the undergraduate population, which has to think about questions that are geared towards differentiating among graduate students. This frustration is counter-productive as it generates unnecessary pressure and anxiety.

There are three possible solutions: (1) making the questions easier, (2) better communicating the actual goals of this particular assessment, i.e. preparation for class participation as opposed to evaluation of learning goals, and (3) splitting the class into different offerings for graduate and undergraduate students.

Making the questions easier while maintaining the ability to use the test for student differentiation could easily be achieved by adding a set of questions that can be answered by everybody. The drawback of this approach is that the better students might also skip the hard questions once they think they have already scored sufficient points, and the lecture might be perceived as pointless as it becomes unclear what remains to be learned. Also, communicating the particular goals of a testing methodology is hard once students are already sufficiently frustrated. Thus, the option to split the class into two separate offerings for graduates and undergraduates emerges as the best solution. This insight is corroborated by the fact that some of the undergraduates who dropped the class articulated the concern that "there are too many graduate students in this class". Splitting the classes would, however, compromise possible benefits from student-to-student learning, e.g. during mixed undergraduate-graduate student project work. For these reasons, I will organize future iterations of this class as separate offerings for the 3rd and the 5th year (the latter being "grad-level" classes), with different assessments but with a set of common lectures, laboratory exercises and projects.

Peer-to-Peer Learning

Results on "peer-to-peer learning" that I posted below have been published in the following paper:


N. Correll and D. Rus. Peer-to-Peer Learning in Robotics Education: Lessons from a Challenge Project Class. ASEE Computers in Education Journal. Special Issue on Novel Approaches in Robotics Education. 1(3):60-66, 2010. [preprint]