# -*- coding: utf-8 -*-
import matplotlib.pyplot as plt
from wordcloud import WordCloud, STOPWORDS

#%%
f = open('output_file.txt', 'r', encoding="utf8")
text = f.read()
f.close()

# Stopwords
stop = []
# f = open('stop-words-spanish-snowball-mod.txt', 'r', encoding="utf8")
# stop += f.read().split()
# f.close()
# f = open('spanish_stopwords.txt', 'r', encoding="utf8")
# stop += f.read().split()
# f.close()

wordcloud = WordCloud(stopwords=STOPWORDS.union(set(stop)),
                      background_color='#3c3c3c',
                      width=1800,
                      height=1400,
                      max_words=100,
                      colormap="magma",
                      #font_path='./CabinSketch-Bold.ttf'
                      )
wordcloud.generate(text)

plt.figure(figsize=(9, 7))
plt.imshow(wordcloud)
plt.axis('off')
plt.tight_layout()
plt.savefig('word_cloud-SLR.png', dpi=300, transparent=True)
plt.show()
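If the Spanish stop-word lists referenced in the commented-out block are available, they can be loaded a little more defensively. This is a minimal sketch only, assuming the two file names above refer to plain whitespace-separated word lists:

from pathlib import Path

def load_stopwords(*paths):
    """Merge stopwords from whitespace-separated text files, skipping missing ones."""
    words = set()
    for path in map(Path, paths):
        if path.exists():
            words |= set(path.read_text(encoding="utf8").split())
    return words

# Hypothetical usage with the file names from the commented-out lines above:
# stop = load_stopwords('stop-words-spanish-snowball-mod.txt', 'spanish_stopwords.txt')
# wordcloud = WordCloud(stopwords=STOPWORDS.union(stop), colormap="magma")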
Refocusing on the Traditional and Effective Teaching Evaluation:
Rational Thoughts About SETEs in Higher Education

Guo Cui, Panzhihua University
Zhong Ni, Huaihua University
Camilla Hong Wang (Corresponding Author), Shantou University

Some higher educational institutions use a student evaluation of teaching effectiveness (SETE) as the only way to evaluate teaching. Unfortunately, this instrument often fails to serve as a tool for improving instruction and frequently acts as a disincentive to introducing rigor. Studies have found that student feedback alone is not a sufficient basis for evaluating teaching. This paper reviews the literature on using student evaluations to measure teaching effectiveness, highlights the problems involved, and offers suggestions for improving SETEs and refocusing teaching effectiveness on outcome-based academic standards.

Keywords: SETE, teaching interaction, teaching evaluation, performance assessment
INTRODUCTION

Student evaluation of teaching effectiveness (SETE) originated in the United States (Zhou, 2009). Experts who support SETE believe that students' evaluation of teachers' teaching is objective (Zhang, Ma, and Jiang, 2017): from the students' perspective, the teaching effect can reflect classroom quality and can be used as the primary method to evaluate teaching quality in universities and vocational colleges (Wang and Yu, 2016). However, some scholars argue that if SETE is used alone, without being combined with other evaluation bases, students effectively become the decision-makers in teachers' appointment, evaluation, promotion, and salary increases (Uttl, White, & Gonzalez, 2017). Others argue that if teachers are evaluated by student satisfaction, students are directly empowered to assess teaching effectiveness, which would significantly lower teaching quality (Emery, Kramer, & Tian, 2003).

Many universities and higher vocational schools regard students as consumers rather than products (Emery & Tian, 2002). As a result, SETE tends to reflect the popularity of teachers rather than the actual quality of teaching. SETE results are subject to many factors and do not depend entirely on teachers' teaching levels and effectiveness. A study by Chang et al. found that students' "attitude toward teaching evaluation," "attitude toward learning," and "attitude toward the course" significantly affected the measurement error of SETE (Dong, 2014). The authors argue that the existing SETE-based evaluation method can hardly improve the level of teaching, so it is necessary to examine the advantages and disadvantages of the current SETE method through literature analysis and cases.
LITERATURE REVIEW

SETE was embraced by U.S. college and higher vocational education administrators as early as the 1960s and has been prevalent in U.S. higher education for more than 50 years because of its practicality, sophistication, and accessibility. However, SETE is neither the only nor the best way to assess the quality of teaching and learning. Below, research cases on the reliability and validity of SETE are analyzed along several dimensions.
Personal Traits and Popularity

Most educational researchers believe that SETE essentially has little to do with teaching itself. In some courses, the same materials and assessment methods are used but different instructors teach them, and the assessed teaching effectiveness differs across instructors. Several Chinese and foreign scholars have reached conclusions supporting these ideas (Dooris, 1997; Xie & Zhang, 2019; Guan, 2012; Wu, 2013; Zhong, 2012; Aleamoni, 1987). Research findings indicate that teachers' performance significantly affects SETE results but not student achievement (Feldman, 1978). When completing SETEs, students often base their evaluations on teachers' personal attributes (Abrami, Leventhal & Perry, 1982). Feldman noted a positive correlation between teacher personality and assessment results when evaluations are based on what students or colleagues know about the teachers (Feldman, 1978). Abrami et al. have suggested that schools should not decide teacher promotion and tenure based solely on SETE, because teachers who are popular with students receive good SETE scores regardless of teaching ability. Relying on SETE alone to assess teaching quality is therefore academically problematic (Abrami, Leventhal & Perry, 1982).
Student Achievement

Numerous studies have shown that student achievement bears little relation to actual evaluation results of teaching effectiveness. Cohen noted that the variation in overall SETE results attributable to differences in student achievement was only 14.4% (Cohen, 1983). Dowell and Neal suggested that the correlation between student achievement and SETE results was only 3.9% (Dowell & Neal, 1982). In a broader study, Damron noted that SETE scores were not related to teachers' ability to improve student achievement. If the weight of classroom satisfaction in SETE results were increased, teachers would receive lower evaluation scores, potentially depriving them of opportunities for promotion, salary increases, or even reappointment (Damron, 1996).
Situational Factors and Effectiveness

Some researchers have proposed that situational factors can interfere with SETE (Damron, 1996), making the results unrepresentative (Cohen, 1983). Cashin noted that there is a sizeable disciplinary bias in SETE: some surveys suggest that teachers in the arts and humanities consistently score higher on SETE, while teachers in business, mathematics, and engineering consistently score lower. In addition, differences between compulsory and optional courses, and between senior and junior students, may affect the evaluation results (Aleamoni, 1989). The amount and intensity of coursework can also influence students' evaluations of a teacher. Consider a faculty member at a university who teaches an introductory course. Because the course uses a collectively developed syllabus, there is no coursework, only three multiple-choice exams. As a result, students give the teacher high evaluations every year, with scores above the college average. Two other courses taught by the same teacher receive low evaluations from students because the teacher developed the syllabi individually and assigns more coursework.

It should be noted that this teacher is the leading scholar for these two courses and authored the textbook used, so the teacher is thoroughly familiar with the content, yet receives poor evaluations simply because of the large amount of coursework. In one of these courses, the average student evaluation score was 73, but the standard error was as high as 35, which makes one wonder what the validity of such a teaching evaluation is.
Assessors

The issue of assessors in SETE also deserves attention. Assessors who are not familiar with the assessment system may be misled by useless data and draw conclusions that deviate from the facts. The evaluation of teaching effectiveness should rest on sound statistics: any sample of fewer than 30 respondents is a small sample and requires appropriate statistical methods. An unscientific statistical approach can lead to three types of errors: first, the data processing itself is unsound; second, assessors confuse critical and non-critical difference factors; and third, assessors cannot reasonably explain the differences among respondents or identify the sources of those differences. Therefore, college administrators should master sound statistical analysis theories and methods (Zhong, 2012).
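To make the small-sample point concrete, a common approach is to report a t-based confidence interval for a class's mean rating rather than the raw average alone. The following is a minimal sketch, not taken from the paper, using hypothetical ratings from a class of 12 respondents:

import numpy as np
from scipy import stats

def mean_ci(scores, confidence=0.95):
    """t-based confidence interval for the mean of a small sample (n < 30)."""
    scores = np.asarray(scores, dtype=float)
    n = scores.size
    mean = scores.mean()
    sem = stats.sem(scores)  # standard error of the mean (ddof=1)
    margin = sem * stats.t.ppf((1 + confidence) / 2, df=n - 1)
    return mean, mean - margin, mean + margin

# Hypothetical SETE ratings on a 1-5 scale from a small class.
ratings = [5, 4, 5, 3, 4, 2, 5, 4, 3, 5, 4, 1]
print(mean_ci(ratings))  # a wide interval shows how little the raw mean says on its own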
Qualifications

Many researchers argue that students who lack critical thinking skills cannot properly assess teachers. Accordingly, most researchers hold that SETE can serve as a teaching evaluation only to the extent that the students are qualified to evaluate (Wu, 2004). It has also been proposed that assessors receive appropriate training before evaluating (Aleamoni, 1989). Conversations between assessors are generally protected from defamation suits, a protection grounded in fundamental civil rights (Cascio and Bernardin, 1981); however, if unqualified assessors nevertheless assess others, those being assessed may sue the assessors for defamation (Chen, 2012).
CASE ANALYSIS

The literature review revealed that administrators' practice of using SETE as the sole basis for decisions about faculty promotions and salary increases has been widely resented and opposed by faculty. The following cases, analyzed from teachers' and students' perspectives, illustrate why this approach needs to be rationalized.
Case 1. What Is Excellent Teaching?

A professor at a university in the United States had a SETE average of 4.25 (out of 5) in the first semester, 4.23 in the second semester, and 4.21 in the third semester. The professor constantly reflected on his teaching and made improvements over those three semesters, but his SETE scores were always below average. The professor was recognized as an outstanding faculty member, with excellent results on all aspects of the performance evaluation. However, based on his SETE score, he was not given the Excellence in Teaching Award; the award went instead to another professor who had a high SETE score but performed poorly on the performance evaluation. This phenomenon was brought to the president's attention, who became aware that the SETE system was flawed (Emery, Kramer & Tian, 2003).

It is also worth noting that the professor's scores were all above 4.0. In this regard, the authors question what counts as "good" if a score higher than 4.0 out of 5 is considered not good. If other factors are not considered, how should SETE scores be interpreted? And if these so-called "other factors" are more influential than SETE, why is the SETE method used to assess teaching and learning at all?
Case 2. Differences in Scores of Different Classes Taught by the Same Professor

A professor at Anhui University of Finance and Economics taught four classes in one semester. His SETE score in one class was 94.33 (out of 100), which ranked 6th in the university, while his score in another class was 62.5, the lowest in the university. In other words, the same professor was considered by one class to be one of the best teachers in the university, while students in another class considered him one of the worst. If SETE were an indicator of the actual situation, the scores of the same professor should be very close. Such a significant contrast calls into question the objectivity and validity of SETE (Dong, 2014).
Case 3. Differences From the Control Group

A professor at a U.S. university who was not yet tenured received scores of 4.10 and 4.24 in the two classes he taught in the fall semester. In the following spring semester, he taught the same course at the same university and scored 4.04 and 4.33 in his two classes. The average score for the entire university was 3.99 in the fall semester and 4.31 in the spring semester. Compared longitudinally, the professor's scores differed little between the two semesters; compared to the school average, however, his teaching performance appeared worse in the spring semester than in the fall. Could this be attributed to an improvement in teaching quality throughout the university during the spring semester? The answer is no. To a large extent, the difference depends on the composition of the faculty participating in SETE: in the fall semester, all faculty members were required to take part in SETE, whereas in the spring semester only non-tenured professors and teaching assistants were required to do so (Emery, Kramer & Tian, 2003).

Many researchers believe that teaching assistants are often more "likely" to meet student expectations and, therefore, more likely to receive high scores. In addition, because SETE has a significant impact on their careers, non-tenured professors tend to make more effort to gain favor with students and thus earn higher scores. Both factors push up SETE scores for the university as a whole. Since SETE scores have little impact on their careers, tenured professors have no need to please their students to obtain higher evaluations; therefore, the overall average score decreases when tenured professors are also involved in the SETE process. This phenomenon is quite common in U.S. colleges and universities. Does this mean that tenured and experienced professors should be considered inferior teachers (Feldman, 1986)?
Case 4. Score Differences and Teachers' Teaching Styles

A researcher from Nanjing Communications Institute of Technology analyzed the correlation between interviewed teachers' personality traits and their SETE results, based on surveys and interviews with full-time teachers in several higher vocational colleges and universities, and developed a comparison table of teaching style indicators. SETE scores are relatively low for teachers who are demanding about student attendance and classroom discipline, and high for teachers who are not. SETE scores are also lower for teachers whose classroom style or appearance is rigorous and formal, and higher for teachers whose style is not, as shown in Table 1 (Schmelkin, Spencer & Gellman, 1997).
TABLE 1
COMPARISON TABLE OF TEACHING STYLE INDICATORS FOR TEACHERS WITH SIGNIFICANT DIFFERENCES IN SETE SCORES

Attendance and classroom discipline
- Faculty group with lower SETE scores: Be strict in attendance. Teacher and student are like father and son, and the teacher should criticize the student if they make mistakes deserving criticism.
- Faculty group with higher SETE scores: Teachers are not necessarily rigorous; teachers and students are like friends, and teachers should be tolerant of students when tolerance is called for.

Classroom style/teaching manner and appearance
- Faculty group with lower SETE scores: The teachers are strict and severe and dress traditionally or with slight variation.
- Faculty group with higher SETE scores: The teachers are relaxed and lively (female) or humorous (male), and dress in fashionable and neat styles.

Classroom communication and break-time interaction
- Faculty group with lower SETE scores: Teachers maintain the dignity of the teacher, keep a psychological distance between teacher and student, and hold "orthodox" values.
- Faculty group with higher SETE scores: Teachers and students are friends; teachers can comment on fashion or criticize current affairs and communicate with students without distance.

Extracurricular communication and life interactions
- Faculty group with lower SETE scores: Teachers rarely communicate with students outside of class and do not communicate with them on matters other than academic work.
- Faculty group with higher SETE scores: Teachers want students to talk to them, even if it is not related to their studies.

Examination standards and requirements
- Faculty group with lower SETE scores: Teachers should not leave students unattended and should not lower their standards to cater to them, or else the quality of graduates is bound to decline.
- Faculty group with higher SETE scores: Teachers should "teach students according to their abilities" so that students' performance can be reasonably distributed and as many "good students" as possible can emerge.

Teaching and research/teaching preferences
- Faculty group with lower SETE scores: The teachers prefer academic research, are willing to teach cutting-edge educational theories, and are meticulous in deriving formulas.
- Faculty group with higher SETE scores: The teachers are skilled in case-study or scenario-based teaching and enjoy writing school-based textbooks, reference books, or teaching casebooks.
Case 5. Students' Use of the Right to Evaluate Teaching at a University

A random sample of 350 students at a university was surveyed on how students evaluate their teachers. The results showed that 68% of the students said they evaluated their teachers based on how much they liked them; in other words, 68% of the students valued the teacher's personality more than basic teaching skills or effectiveness. At the same time, 47% of the students surveyed admitted to a disciplinary bias when evaluating their teachers: a student who prefers music to physical education is likely to give a higher rating to the music teacher and a lower rating to the physical education teacher (see Table 2).
TABLE 2
QUESTIONNAIRE FOR STUDENTS' EVALUATION OF TEACHERS

Question: I do not attach much importance to the final course evaluation, and I do not think it has much influence on the teachers.
- I agree / I strongly agree: 181 respondents (51.7%)
- I don't know: 83 respondents (23.7%)
- I can't entirely agree / I strongly disagree: 82 respondents (23.4%)

Question: The mechanism of student evaluation of teachers weakens the authority of teachers.
- I agree / I strongly agree: 179 respondents (51.1%)
- I don't know: 104 respondents (29.7%)
- I can't entirely agree / I strongly disagree: 62 respondents (17.7%)

*Only valid data were selected.
To ensure the rigor and accuracy of the study, a questionnaire on the credit system and teacher evaluation was distributed to students to explore the relationship between course evaluation, teachers, and students in a quantitative way, complemented by in-depth interviews. We found that teacher evaluation did not seem to have the desired effect. As shown in Table 2, more than half (51.7%) of the students thought that course evaluation had little impact on teachers, while only 23.4% disagreed with this statement. Since most students do not think that course evaluation has much impact on teachers, they can hardly take course evaluation seriously. Students may therefore give teachers positive or negative comments casually, discouraging teachers' motivation and weakening the teacher-student relationship.

In addition, more than half (51.1%) of the students agreed with the statement that "the mechanism of student evaluation of teachers weakens the authority of teachers," and only 17.7% disagreed. This result is highly consistent with our interviews with some teachers and indicates that most students believe student assessment of teachers' courses can undermine teachers' sense of authority. It can be inferred from both teachers and students that teachers' authority has been weakened by the SETE mechanism, which is far removed from the value of "a teacher for a day is a father for a lifetime" in traditional Chinese culture, and this has a significant negative impact on the teacher-student relationship in colleges and universities.

At the same time, the in-depth interviews showed that 74% of students would change their opinion of a teacher, and thus their evaluation score, if they received some special benefit from the teacher outside of teaching. A teacher who treats students to chocolate, for example, increases student favorability and receives higher scores on student evaluations, which is highly consistent with Professor Emery's findings (Emery & Tian, 2002). It is also interesting to note that 52% of the students did not evaluate teaching based on the teacher's actual performance but simply gave the teacher a full 5 out of 5. This group of students gave two reasons for their scoring: some think the teachers work very hard and should be recognized and appreciated; others believe it is simply convenient to give all 5s and complete the SETE task quickly.
DISCUSSION

Many scholars believe that the SETE method has more disadvantages than advantages: (1) SETE tends to reward mediocrity and discourages risk-taking. (2) It focuses on short-term performance, lacks a long-term perspective, and ignores critical factors that are not easily measured. (3) It focuses on individuals and is not conducive to teamwork. (4) It is based on detection rather than prevention. (5) It is unfair, and the assessment is highly subjective. (6) It does not distinguish between endogenous factors reflecting individual differences and exogenous factors that are not under human control (Huang and Qi, 2014; Trout, 2000; McGregor, 1972; Meyer, Kay, and French, 1965).

American scholars Milliman and McFadden found in one study that 90% of GM employees considered themselves to be among the top 10% of employees in the company. The two scholars then asked these employees whether their motivation would be seriously undermined if managers did not evaluate their performance highly; the implication is that unfavorable ratings do undermine motivation, so the scientific evaluation of employee performance has a significant impact on a company's labor productivity. Likewise, if employees are allowed to evaluate their supervisors in the reverse direction, supervisors' managerial motivation can be seriously affected and, as a result, the company's labor productivity suffers (Milliman & McFadden, 1997). For these reasons, Deming strongly condemned such performance evaluation procedures (Deming, 1986). The expectancy model of motivation developed by the human resource management scholars Porter and Lawler explains why this matters: if employees do not believe that "the harder they work, the greater the reward," they will not work as hard as they should and will lose their way (Porter & Lawler, 1968).
In our opinion, the evaluation of teaching has two primary purposes: to serve as a basis for reward and punishment, and to serve as a reference for development. In evaluation for reward and punishment, the results are used as the basis for teachers' promotion and salary increases; in evaluation for development, the results are used as reference and suggestions for teachers to improve their teaching and enhance their teaching skills. However, from our observation and research, in China's universities reward and punishment overwhelm development in practice, and teaching evaluation functions more as a convenient means of administrative control. As a result, teachers who genuinely want feedback from students to improve their teaching seek alternative approaches.

We also believe that the greatest value of evaluating teaching is to provide a platform for teachers and students to communicate with each other. In implementing the evaluation system, school administrators must make clear that evaluation scores should be used only as a reference for teachers to improve their teaching; they should not be used as the basis for appraisal and promotion, or at least not as the only or primary basis. By analogy, business managers may provide employees with feedback on their work through performance appraisal so that employees are aware of their strengths and weaknesses, and to a certain extent performance appraisals help companies make decisions related to employee management. We believe that the primary purpose of SETE for educational administrators should likewise be to provide information and feedback, not to serve as the basis for decisions about teachers' promotion. Refocusing on the essence of teaching in higher education and attaching importance to the practical effectiveness of education is the key to the sustainable development of teaching evaluation (Tan, 2014).
CONCLUSIONS AND RECOMMENDATIONS

The SETE approach as widely used today in effect rewards teachers for earning high SETE scores by catering to students, thereby lowering expectations for students and diminishing the quality of teaching (Emery, Kramer, and Tian, 2003; Zhong, 2012; Feldman, 1986; Tan, 2014). The purpose of teaching evaluation is to help teachers improve their performance, yet in practice administrators use it to make decisions about the fate of teachers (Abrami, d'Apollonia & Cohen, 1990). Worse still, many colleges and universities have adopted various means and regulations to push students into teaching evaluation: some universities require students to evaluate their teachers before they can check their final grades, others require students to evaluate their teachers before they can enroll in a course, and others let a failure to evaluate teachers affect students' final grades. We believe that performance evaluation is necessary for making decisions about individual teachers, but SETE results should serve only as a reference factor, not as a determinant. In this regard, some recommendations for management are proposed:
(1) The SETE method should be oriented toward teaching performance rather than student satisfaction; at the same time, the sources of evaluation data should be broadened, and SETE results should not be used as the sole basis for measuring teaching quality.
(2) Teachers should be evaluated against explicit criteria, not just through cross-sectional comparisons between universities, and comparisons of course evaluations should be made between similar courses.
(3) The measures should be feasible and the data statistically sound. If a student gives a grade below satisfactory, the student should be asked to write a comment to add credibility to the negative assessment.
(4) Assessors and third-party monitors should be trained to ensure that the evaluation system is legitimate, adaptable, and diverse.
(5) Graduates can be invited to evaluate their former teachers. When there is no longer a stake between teachers and students, and graduates are more mature because of their social experience, the evaluation will be more objective, fair, and rational.
In short, we should all hold to the principle that teachers are responsible for teaching and students are accountable for their own success. Likewise, we should encourage evaluation procedures that judge professors on their teaching performance. Teaching is essentially an interpersonal interaction and cannot be separated from students' perceptions of the teacher's characteristics; nevertheless, teaching evaluation must be based on teaching performance, with all other factors treated as secondary and supplementary.
REFERENCES

Abrami, P.C., d'Apollonia, S., & Cohen, P.A. (1990). Validity of student ratings of instruction: What we know and what we do not. Journal of Educational Psychology, 82(2), 219–231.
Abrami, P.C., Leventhal, L., & Perry, R.P. (1982). Educational seduction. Review of Educational Research, 32, 446–464.
Aleamoni, L. (1987). Student rating: Myths versus research facts. Journal of Personnel Evaluation in Education, 1, 111–119.
Aleamoni, L. (1989). Typical faculty concerns about evaluation of teaching. In L.M. Aleamoni (Ed.), Techniques for Evaluating and Improving Instruction. San Francisco, CA: Jossey-Bass.
Cascio, W.F., & Bernardin, H.J. (1981). Implications of performance appraisal litigation for personnel decisions. Personnel Psychology, 34, 211–226.
Cashin, W.E. (1989). Defining and evaluating college teaching. IDEA Paper No. 21, Center for Faculty Evaluation and Development, Kansas State University, Manhattan, KS.
Cashin, W.E. (1990). Students do rate different academic fields differently. In M. Theall & J. Franklin (Eds.), Student Ratings of Instruction: Issues for Improving Practice. San Francisco, CA: Jossey-Bass.
Cashin, W.E. (1996). Developing an effective faculty evaluation system. IDEA Paper No. 33, Center for Faculty Evaluation and Development, Kansas State University, Manhattan, KS.
Chen, Q. (2012). On the development path of civil rights protection in the United States. The Journal of Shandong Agricultural Administrators' College, 6, 71–73.
Cohen, P.A. (1983). Comment on a selective review of the validity of student ratings of teaching. Journal of Higher Education, 54, 448–458.
Damron, J.C. (1996). Instructor personality and the politics of the classroom. Douglas College, New Westminster, British Columbia, Canada.
Deming, W.E. (1986). Out of the Crisis. Cambridge, MA: MIT Center for Advanced Engineering Study.
Dong, G.C. (2014). A study of non-classroom factors in SETE. Higher Education Exploration, 2, 104–106.
Dooris, M.J. (1997). An analysis of the Penn State student rating of teaching effectiveness. A report presented to the University Faculty Senate of the Pennsylvania State University.
Dowell, D.A., & Neal, J.A. (1982). A selective review of the validity of student ratings of teaching. Journal of Higher Education, 53, 51–62.
Dowell, D.A., & Neal, J.A. (1983). The validity and accuracy of student ratings of instruction: A reply to Peter A. Cohen. Journal of Higher Education, 54, 459–463.
Emery, C., & Tian, R. (2002). Schoolwork as products, professors as customers: A practical teaching approach in business education. Journal for Business Education, 78(2), 97–102.
Emery, C.R., Kramer, T.R., & Tian, R.G. (2003). Return to academic standards: A critique of student evaluations of teaching effectiveness. Quality Assurance in Education, 11(1), 37–46.
Feldman, K.A. (1978). Course characteristics and college students' ratings of their teachers: What we know and what we don't. Research in Higher Education, 9, 199–242.
Feldman, K.A. (1986). The perceived instructional effectiveness of college teachers as related to their personality and attitudinal characteristics: A review and synthesis. Research in Higher Education, 24, 139–213.
Guan, H.H. (2012). An empirical study on the effectiveness of SETE in Ningde Normal University. Journal of Ningde Normal University, 3, 103–109.
Huang, T.Y., & Qi, H.X. (2014). An analysis of the factors influencing SETE based on individual teachers' perspectives. Education and Vocation, 3, 103–105.
McGregor, D. (1972). An uneasy look at performance appraisal. Harvard Business Review, pp. 19–27.
Meyer, H.H., Kay, E., & French, J.R. (1965). Split roles in performance appraisal. Harvard Business Review, pp. 28–37.
Milliman, J.F., & McFadden, F.R. (1997). Toward changing performance appraisal to address TQM concerns: The 360-degree feedback process. Quality Management Journal, 4(3), 44–64.
Mohrman, A.M. (1989). Deming versus performance appraisal: Is there a resolution? Center for Effective Organisations. Los Angeles, CA: University of Southern California.
Porter, L.W., & Lawler, E.E. (1968). Managerial Attitudes and Performance. Burr Ridge, IL: Irwin Publishing.
Schmelkin, L.P., Spencer, K.J., & Gellman, E.S. (1997). Faculty perspectives on course and teacher evaluations. Research in Higher Education, pp. 575–592.
Tan, Y.E. (2014). Reflection and trend of teaching evaluation in universities. Chongqing Higher Education Research, 2(5), 83–87.
Trout, P.A. (2000). Flunking the test: The dismal record of student evaluations. The Touchstone, 10(4), 11–15.
Uttl, B., White, C.A., & Gonzalez, D.W. (2017). Meta-analysis of faculty's teaching effectiveness: Student evaluation of teaching ratings and student learning are not related. Studies in Educational Evaluation, 54, 22–42.
Wang, J., & Yu, J.J. (2016). Teaching-centered or learning-centered teacher ratings by students: An analysis based on indexes of 30 institutions of higher education. Journal of Soochow University (Educational Science Edition), 2, 104–112.
Wu, S. (2013). Study on the Factors Affecting SETE in China's Universities. Dalian: Dalian University of Technology.
Wu, Y.Q. (2004). The actual malice rule as applied under American defamation law. National Chung Cheng University Law Journal, 15, 1–97.
Xie, J.L., & Zhang, C. (2019). A study on the influence of non-instructional factors on the effectiveness of SETE in higher education: Based on the perspective of student subjects. Heilongjiang Education (Higher Education Research & Appraisal), 7, 25–28.
Zhang, G.J., Ma, X.P., & Jiang, T.K. (2017). On the feedback of SETE outcomes. University Education, 7, 194–195.
Zhong, G.Z. (2012). Validity of college students' evaluation of teaching and its optimization strategies. Journal of Jimei University, 13(1), 74–77.
Zhou, W. (2009). SETE system in U.S. colleges and universities and its inspirations. Journal of Hulunbeier College, 4, 107–110.
Teaching in Higher Education: Critical Perspectives
ISSN: 1356-2517 (Print) 1470-1294 (Online)
Journal homepage: https://www.tandfonline.com/loi/cthe20

Course evaluation scores: valid measures for teaching effectiveness or rewards for lenient grading?

Guannan Wang, Accounting Department, Suffolk University, Boston, MA, USA
Aimee Williamson, Institute for Public Service, Suffolk University, Boston, MA, USA

To cite this article: Guannan Wang & Aimee Williamson (2020). Course evaluation scores: valid measures for teaching effectiveness or rewards for lenient grading? Teaching in Higher Education. DOI: 10.1080/13562517.2020.1722992
Published online: 05 Feb 2020.
ABSTRACT
Course Evaluation Instruments (CEIs) are critical aspects of faculty assessment and evaluation across most higher education institutions, but heated debates surround the value and validity of such instruments. While some argue that CEI scores are valid measures of course and instructor quality, others argue that faculty members can game the system, most notably with lenient grading practices to achieve higher student ratings. This article synthesizes the literature on course evaluation instruments as they relate to student grades to assess the evidence supporting and refuting the major theoretical frameworks (i.e. the leniency hypothesis and the validity hypothesis), explores the implications of research design and methods, and proposes practical recommendations for colleges and universities. This paper also goes beyond the CEI-grade relationship and provides a framework that illustrates the relationships between teaching quality and CEI scores, and the potential confounding factors and omitted variables which may significantly deteriorate the informativeness of the CEI score.

ARTICLE HISTORY: Received 25 July 2019; Accepted 23 January 2020
KEYWORDS: Course evaluation instrument; expected grade; teaching quality; student learning
JEL Classification: I20; I21
1. Introduction

Course evaluation processes are critical and influential components of teaching, with significant weight in review, tenure, and promotion decisions across most universities. The course evaluation instrument (CEI) is widely used by institutions of higher education to evaluate and improve teaching quality. Student evaluations of courses are common among colleges and universities, and virtually all business schools use some form of student evaluations (Clayson 2009; Brockx, Spooren, and Mortelmans 2011). The first student rating forms were completed at the University of Washington in the 1920s, and the first research on student ratings followed soon after (Kulik 2001). Despite closing in on a century of use, there is still much debate as to the validity and appropriate use of student evaluations of courses. Given the important role these evaluations play in faculty tenure and promotion processes, it is not surprising that they continue to generate significant debate and attention in the literature.
Student rating programs were originally designed, and continue to be used, for two main reasons: (1) to help instructors improve their teaching and (2) to help administrators oversee teaching quality across the institution and make related decisions (Kulik 2001; Brockx, Spooren, and Mortelmans 2011). These broad goals have evolved into many significant and influential uses of student evaluations, including the use of course evaluation scores in hiring new full-time and adjunct faculty, annual review processes, promotion and tenure decisions, teaching awards, assignment of faculty to courses, accreditation reviews, development of professional development programs, merit pay, and student selection of courses (Kulik 2001; Barth 2008; Benton 2011; Brockx, Spooren, and Mortelmans 2011; Catano and Harvey 2011; Chulkov and Alstine 2011). Some schools have thresholds for student course evaluation scores below which a faculty member is ineligible for tenure, and one or two bad CEI scores may mean that an adjunct faculty member will not be given another opportunity to teach at a school. It is therefore critical that we do our best to fully understand the course evaluation process, create valid and informative course evaluation forms, and use them in the most appropriate manner.

Student evaluations are the most widely used source for evaluating teaching effectiveness, even serving as the only source in many colleges (Benton 2011). The use and influence of such evaluations have increased in recent years, in part due to broader trends in accountability and marketization of higher education (Brockx, Spooren, and Mortelmans 2011). Accreditation requirements may also drive the use of student evaluations (Brockx, Spooren, and Mortelmans 2011). Even with the availability of other forms of evaluation, student evaluations typically have the most impact and receive the most attention (Dodeen 2013).
Research on CEIs has identified relationships between CEI scores and a variety of factors, such as course grades (Krautmann and Sander 1999; McPherson 2006; Weinberg, Hashimoto, and Fleisher 2009; Brockx, Spooren, and Mortelmans 2011; among others), class attendance (Arnold 2009; Brockx, Spooren, and Mortelmans 2011; Braga, Paccagnella, and Pellizzari 2014), discipline (McPherson 2006; Nowell 2007; Driscoll and Cadden 2010; Matos-Díaz and Ragan 2010), class type (Krautmann and Sander 1999; Centra 2003; Driscoll and Cadden 2010), class level (Nelson and Lynch 1984; Nowell 2007; Driscoll and Cadden 2010; Ewing 2012), and many other factors.

The interpretation of these relationships has generated even further debate, and it is critical that we develop a good understanding of this process and its impact. As others have suggested, if evaluation scores can be 'bought', the instrument most used for measuring teaching effectiveness is flawed and may contribute to grade inflation at a more systemic level (Krautmann and Sander 1999).

At its very core is the debate over whether course evaluation instruments are valid measures of teaching. As Kulik (2001) succinctly states, '[t]o say that student ratings are valid is to say that they reflect teaching effectiveness' (p. 10). While some faculty members see CEI scores as valid measures that inform their teaching and bring needed accountability to higher education, others view CEI scores as invalid measures that are more likely to reflect student bias and retaliation than instructor performance. Some point out that student evaluation ratings are more appropriately measures of 'satisfaction' than of outcomes or teaching value (Benton 2011). Given that other measures of teaching performance, such as exam scores and peer evaluations, carry similar or even stronger concerns about validity and reliability, there is no holy grail by which to measure teaching effectiveness and compare it to CEI scores (Kulik 2001).
2. Research questions & method

As explained above, one of the most controversial topics in the CEI literature is the association between students' expected grades and CEI scores. We identified two critical questions surrounding this debate: (1) Is there a relationship between grades (actual, expected, etc.) and CEI scores? (2) If so, what is the nature of or explanation for that relationship? While previous studies have offered answers to these questions, the findings are mixed, demonstrating a strong need for a more comprehensive analysis.

To answer these questions and inform the debate surrounding the validity and leniency hypotheses, we conducted a comprehensive survey of the CEI literature, identifying and analyzing pedagogical studies that shed light on the relationship between grades and CEI scores, particularly students' expected grades. First, we searched for educational articles related to course evaluation in the major databases, including ABI/INFORM, Business Source Complete, ScienceDirect, and Google Scholar, using a list of keywords.1 The initial search found 72 published articles related to student evaluations. Second, given that our focus is the impact of grades on student evaluations, we further limited the sample to studies incorporating grade (actual grade or expected grade) in their analysis. That narrowed the sample down to the 28 studies listed in Tables 1 and 2.

Tables 1 and 2 summarize the research type, research question, and data source of the related literature. Table 3 presents a summary of the choice of research method, dependent variables, independent variables, statistical results, and control variables. Our analysis includes an evaluation of the arguments in the existing literature, implications of research designs and methods, confounding factors, and practical implications. Among the 28 studies reviewed in Tables 1 and 2, 24 are empirical analyses and thus discussed in Table 3.
3. Prior discussion on the CEI-grade relationship

3.1. Leniency hypothesis vs. validity hypothesis

As noted above, there has been widespread debate around the association between students' grades and CEI scores. Many studies show consistent evidence that course grades, both expected and relative among peers, have a positive relationship with the CEI score (Marsh and Roche 2000; Isely and Singh 2005; Driscoll and Cadden 2010; Brockx, Spooren, and Mortelmans 2011). However, several researchers have cast doubt on that contention and find no significant association between course grades and CEI scores, or find that the impact of expected grades on CEI scores is subtle and can be explained by other factors (Centra 2003; Arnold 2009). Among the 28 studies we surveyed, 24 performed statistical analyses of the relationship between grades and CEI scores: 19 of these studies demonstrate a positive association, to some degree, between the (average) CEI score and (average) grade expectation, 4 do not find any significant association, and 1 finds a negative association.
Table 1. Pedagogical research on the impact of course grade on student evaluation: publication outlet and research question.

Author (Year) | Article Name | Journal | Research question
Arnold, I. J. M. (2009) | Do examinations influence student evaluations? | International Journal of Educational Research | Measures the impact of timing on student evaluations
Bausell, R. B. and J. Magoon (1972) | Expected grade in a course, grade point average, and student ratings of the course and the instructor | Educational and Psychological Measurement | Examines the relation between expected grade and the course rating
Beleche, T., D. Fairris, and M. Marks (2012) | Do course evaluations truly reflect student learning? Evidence from an objectively graded post-test | Economics of Education Review | The relationship between student course evaluations and an objective measure of student learning
Braga, M., M. Paccagnella, and M. Pellizzari (2014) | Evaluating students' evaluations of professors | Economics of Education Review | Contrasts measures of teacher effectiveness
Brockx, B., P. Spooren, and D. Mortelmans (2011) | Taking the grading leniency story to the edge. The influence of student, teacher, and course characteristics on student evaluations of teaching in higher education | Educational Assessment, Evaluation and Accountability | Examines the influence of course grades and other characteristics of students on student evaluations
Butcher, K. F., P. J. McEwan, and A. Weerapana (2014) | The effects of an anti-grade-inflation policy at Wellesley College | Journal of Economic Perspectives | Evaluates the consequences of the mandatory grade ceiling on student evaluations
Centra, J. A. (2003) | Will teachers receive higher student evaluations by giving higher grades and less course work? | Research in Higher Education | Examines the relationship between the expected grades, the level of difficulty, workload in courses, and course rating
Clayson, D. E. (2009) | Student evaluations of teaching: Are they related to what students learn? A meta-analysis and review of the literature | Journal of Marketing Education | The relationship between the evaluations and learning
Driscoll, J. and D. Cadden (2010) | Student evaluation instruments: the interactive impact of course requirements, student level, department and anticipated grade | American Journal of Business Education | Examines the relationship between measures of teaching effectiveness and several factors, including the students' anticipated grade
Ewing, A. M. (2012) | Estimating the impact of relative expected grade on student evaluation of teachers | Economics of Education Review | Investigates instructors' incentives to 'buy' higher evaluation scores by inflating grades
Gorry, D. (2017) | The impact of grade ceilings on student grades and course evaluations: Evidence from a policy change | Economics of Education Review | The effects of a grade ceiling policy on grade distributions and course evaluations
Greenwald, A. G. and G. M. Gillmore (1997a) | Grading leniency is a removable contaminant of student ratings | American Psychologist | Examines the relation between grading leniency and student evaluations
Greenwald, A. G. and G. M. Gillmore (1997b) | No pain, no gain? The importance of measuring course workload in student ratings of instruction | Journal of Educational Psychology | Examines the relation between course grade and student evaluations
Hoefer, P., J. Yurkiewicz, and J. C. Byrne (2012) | The association between students' evaluation of teaching and grades | Decision Sciences Journal of Innovative Education | Examines the relation between course grade and course rating, and the moderating role of gender, academic level, and field
Isely, P. and H. Singh (2005) | Do higher grades lead to favorable student evaluations? | The Journal of Economic Education | Examines the relation between the expected grade in other classes of the same course and student evaluations
Krautmann, A. C. and W. Sander (1999) | Grades and student evaluations of teachers | Economics of Education Review | Examines the relation between grading practices and student evaluations
Love, D. A. and M. J. Kotchen (2010) | Grades, course evaluations, and academic incentives | Eastern Economic Journal | Investigates how the incentives created by academic institutions affect students' evaluation of faculty and grade inflation
Marsh, H. W. and L. A. Roche (2000) | Effects of grading leniency and low workload on students' evaluations of teaching | Journal of Educational Psychology | Examines the relation between grading leniency and student evaluations
Matos-Díaz, H. and J. R. Ragan Jr (2010) | Do student evaluations of teaching depend on the distribution of expected grade? | Education Economics | Examines the relation between the distribution of expected grades and student evaluations
McPherson, M. A. (2006) | Determinants of how students evaluate teachers | The Journal of Economic Education | Grade expectations and student evaluation of teaching
Millea, M. and P. W. Grimes (2002) | Grade expectations and student evaluation of teaching | College Student Journal | Examines the links between course rigor and grades to evaluation scores
Nelson, J. P. and K. Lynch (1984) | Grade inflation, real income, simultaneity, and teaching evaluations | The Journal of Economic Education | Examines the relation between student evaluation and grade inflation and the moderating role of faculty real income
Nowell, C. (2007) | The impact of relative grade expectations on student evaluation of teaching | International Review of Economics Education | Examines the relation between student evaluations and relative grades among peers
Remedios, R. and D. A. Lieberman (2008) | I like your course because you taught me well: The influence of grades, workload, expectations and goals on students' evaluations of teaching | British Educational Research Journal | Investigates how factors such as students' pre-course expectations, achievement goals, grades, workload, and perceptions of course difficulty affect how they rate their courses
Stumpf, S. A. and R. D. Freedman (1979) | Expected grade covariation with student ratings of instruction: Individual versus class effects | Journal of Educational Psychology | Compares individual and class effects and their role in student ratings of instruction
Uttl, B., C. A. White, and D. W. Gonzales (2017) | Meta-analysis of faculty's teaching effectiveness: Student evaluation of teaching ratings and student learning are not related | Studies in Educational Evaluation | Re-estimates previously published meta-analyses and examines the relationship between CEI scores and student learning
VanMaaren, V. G., C. M. Jaquett, and R. L. Williams (2016) | Factors most likely to contribute to positive course evaluations | Innovative Higher Education | Determines the extent to which students differentially rated ten factors likely to affect their ratings on overall course evaluations
Weinberg, B. A., M. Hashimoto, and B. M. Fleisher (2009) | Evaluating teaching in higher education | Journal of Economic Education | Examines the relation between grading practices and student evaluations and the role of learning
Table 2. Pedagogical research on the impact of course grade on student evaluation: research type, data source, and sample size.

Author (Year) | Method | Target Sample (Survey/experimental) | Sample Size
Arnold, I. J. M. (2009) | Archival | Erasmus School of Economics | Around 3,000 students
Bausell, R. B. and J. Magoon (1972) | Archival | University of Delaware | Over 17,000 students
Beleche, T., D. Fairris, and M. Marks (2012) | Archival | Unidentified four-year public university | 4,293 students
Braga, M., M. Paccagnella, and M. Pellizzari (2014) | Archival | Bocconi University | 1,206 students
Brockx, B., P. Spooren, and D. Mortelmans (2011) | Archival | University of Antwerp | 1,244 students
Butcher, K. F., P. J. McEwan, and A. Weerapana (2014) | Archival | Wellesley College | 104,454 students
Centra, J. A. (2003) | Archival | Student Instructional Report II by Educational Testing Service | 55,000 classes
Clayson, D. E. (2009) | Meta-analysis | More than 17 prior archival studies | N/A
Driscoll, J. and D. Cadden (2010) | Archival | Quinnipiac University | 29,596 students
Ewing, A. M. (2012) | Archival | University of Washington | 53,658 classes
Gorry, D. (2017) | Archival | Unidentified state university | 281 classes
Greenwald, A. G. and G. M. Gillmore (1997a) | Theory | N/A | N/A
Greenwald, A. G. and G. M. Gillmore (1997b) | Archival | University of Washington | 200 classes
Hoefer, P., J. Yurkiewicz, and J. C. Byrne (2012) | Archival | Pace University | 381 classes
Isely, P. and H. Singh (2005) | Archival | Grand Valley State University | 260 classes
Krautmann, A. C. and W. Sander (1999) | Archival | DePaul University in Chicago | 258 classes
Love, D. A. and M. J. Kotchen (2010) | Theory | N/A | N/A
Marsh, H. W. and L. A. Roche (2000) | Archival | American University | 5,433 classes
Matos-Díaz, H. and J. R. Ragan Jr (2010) | Archival | University of Puerto Rico at Bayamón | 1,232 classes
McPherson, M. A. (2006) | Archival | University of North Texas | 607 classes
Millea, M. and P. W. Grimes (2002) | Archival | Mississippi State University | 149 students
Nelson, J. P. and K. Lynch (1984) | Archival | Penn State University | 146 classes
Nowell, C. (2007) | Archival | A large public university in the US | 716 students
Remedios, R. and D. A. Lieberman (2008) | Archival | Scottish university | 610 students
Stumpf, S. A. and R. D. Freedman (1979) | Archival | New York University | 5,894 students and 197 classes
Uttl, B., C. A. White, and D. W. Gonzales (2017) | Meta-analysis | More than 58 prior studies | N/A
VanMaaren, V. G., C. M. Jaquett, and R. L. Williams (2016) | Archival | A large state university in the southeastern US | 148 students
Weinberg, B. A., M. Hashimoto, and B. M. Fleisher (2009) | Archival | Ohio State University | 26,666 students
individual expected grade divided by GPA, individual expected grade relative to the | |
section average, individual expected grade relative to the actual grade, actual course | |
grade, overall GPA, grade in the subsequent course, and high school grades. The most | |
used measures of CEI scores include overall course rating, overall instructor rating, and | |
rating on instructor’s teaching ability. We present a summary of the choices of research | |
methods, dependent variables, independent variables, statistical results, and control variables in Table 3. | |
While many studies have provided empirical evidence supporting the relationship | |
between grades and CEI scores, the interpretation of such a relationship is under | |
debate. Greenwald and Gillmore (1997) suggest that the grade–rating correlation primarily results from instructors’ grading leniency. This study established the fundamental | |
theory of the relationship between course grades and CEI scores and represents the | |
leniency hypothesis.

Table 3. Sample selection and variable definition. For each of the surveyed studies, the table reports the author(s), year, sample level (student and/or class), dependent variables, independent variables, statistical results, and control variables. The dependent variables are chiefly individual or class-average course and instructor evaluation scores, including overall ratings, ratings of the instructor’s teaching ability, and whether the student would recommend the instructor. The independent variables include the expected grade; the average, absolute, or relative expected grade (for example, relative to the section or course average, the student’s cumulative GPA, or the instructor’s average grade across classes); the actual, average, or normalized course grade; the grade in the current or subsequent course; the current earned grade and attitudes toward remaining graded work; average high school grade; overall teaching quality and clarity of lectures; and a mandatory grade cap. The reported estimates are predominantly positive and statistically significant, with several non-significant results and a few negative ones (for example, for overall teaching quality, clarity of lectures, and the mandatory grade cap). Control variables differ across studies and span course characteristics (discipline, type, level, class size, workload, difficulty, class time and frequency, evaluation response rate), instructor characteristics (gender, age, rank, tenure status, degree, experience, real income), and student characteristics (gender, age, race, ability, effort, attendance, prior grades and GPA).

Another interpretation is the validity hypothesis which posits that
more effective teaching leads to greater student learning, which translates into higher grades and, in turn, higher CEI scores. In the following sections, we provide a detailed discussion of the two competing hypotheses proposed by prior studies.
3.1.1. Leniency hypothesis | |
The leniency hypothesis posits that students give higher CEI scores to instructors from | |
whom they receive higher grades. Supporters of the leniency hypothesis generally argue that instructors can, in effect, ‘buy’ higher evaluation scores by grading more leniently (Krautmann and
Sander 1999; McPherson 2006; Weinberg, Hashimoto, and Fleisher 2009; among others). | |
In an early study, Greenwald and Gillmore (1997) find that courses that receive higher | |
CEI scores are those in which students expect to receive higher grades or a lighter workload, not necessarily those with higher teaching quality. Many studies interpret the | |
relationship between course grades and CEI scores to support the leniency hypothesis. | |
For example, Krautmann and Sander (1999) show that a one-point increase in the | |
expected classroom grade point average (GPA) leads to an improvement of between | |
0.34 and 0.56 in the CEI score. Similarly, McPherson (2006) finds that an increase of | |
one point on a four-point expected grade scale results in an improvement in the CEI | |
score of around 0.34 for foundational courses and 0.30 for upper-level courses. Brockx, | |
Spooren, and Mortelmans (2011) find that when a student’s course grade increases by | |
one point, the CEI score increases by 0.33 (grand-mean centered) and 1.56 (group-mean centered). Millea and Grimes (2002) report similar findings: both the current
grade and expected grade have a positive relationship with the CEI score. | |
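To make the magnitudes above concrete, the following minimal sketch (in Python, using synthetic, invented data) mirrors the kind of class-level specification used in this strand of literature: the average CEI score is regressed on the average expected grade with a simple class-size control. The variable names and the 0.4-point effect built into the simulation are assumptions for illustration, not estimates from any cited study.

    # Illustrative class-level regression of CEI scores on expected grades.
    # All data below are synthetic; the 0.4 slope is assumed for demonstration.
    import numpy as np

    rng = np.random.default_rng(0)
    n_classes = 500
    expected_gpa = rng.uniform(2.0, 4.0, n_classes)      # class-average expected grade
    class_size = rng.integers(10, 200, n_classes)        # simple control variable
    cei = 2.5 + 0.4 * expected_gpa - 0.001 * class_size + rng.normal(0, 0.3, n_classes)

    # Ordinary least squares via numpy's least-squares solver.
    X = np.column_stack([np.ones(n_classes), expected_gpa, class_size])
    coef, *_ = np.linalg.lstsq(X, cei, rcond=None)
    print("intercept, expected-grade slope, class-size slope:", np.round(coef, 3))

The expected-grade coefficient recovered here plays the same role as the 0.30–0.56 improvements reported by Krautmann and Sander (1999) and McPherson (2006), although those studies include far richer sets of controls.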
Some studies dig deeper to provide clearer evidence of the leniency hypothesis. According to Handelsman et al. (2005), most college students can be classified as performance-oriented rather than mastery-oriented, indicating that their satisfaction with a course is largely based on their grade in that course. Braga, Paccagnella, and Pellizzari (2014) perform a similar analysis of teaching effectiveness and find that teaching quality is negatively correlated with students’ CEI scores.
In addition to empirical evidence, both Gorry (2017) and Butcher, McEwan, and Weerapana (2014) provide anecdotal evidence regarding the impact of a change in grading policy on CEI scores. Butcher, McEwan, and Weerapana (2014)
examine the policy change at Wellesley College by comparing the CEI scores between | |
departments that were obligated to lower their grades with the outcomes in departments | |
that were not. The study finds that students in the ‘grading-decreasing’ courses lowered | |
their evaluations of the instructors accordingly. Similarly, Gorry (2017) analyzes the | |
effects of a grade ceiling policy implemented by a large state university on grade distributions and CEI scores; such research shows that lowering the grade ceiling significantly | |
decreases CEI scores across a variety of measures. | |
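The grade-cap comparisons described above are essentially before/after contrasts between treated and untreated departments. The sketch below (Python, invented numbers) shows the difference-in-differences arithmetic behind such a comparison; it is a simplified illustration, not a reproduction of Butcher, McEwan, and Weerapana’s (2014) design.

    # Difference-in-differences on synthetic department-level CEI averages.
    import numpy as np

    rng = np.random.default_rng(1)
    def mean_cei(mu, n=200):
        # Average rating over n synthetic course sections with true mean mu.
        return rng.normal(mu, 0.5, n).mean()

    capped_before, capped_after = mean_cei(4.2), mean_cei(4.0)   # departments under the grade cap
    exempt_before, exempt_after = mean_cei(4.1), mean_cei(4.1)   # comparison departments

    did = (capped_after - capped_before) - (exempt_after - exempt_before)
    print(f"difference-in-differences estimate: {did:+.2f} rating points")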
3.1.2. Validity hypothesis | |
The main difference between the leniency and validity hypotheses is whether student | |
evaluations reflect the quality of teaching or simply capture the grading-satisfaction | |
game between the instructors and students. Supporters of the validity hypothesis argue | |
that instructors who teach more effectively receive better evaluation scores because their | |
students learn more, thereby earning higher grades. In other words, CEI is a valid instrument (Centra 2003; Barth 2008; Remedios and Lieberman 2008; Arnold 2009; Clayson | |
2009). Essentially, the validity hypothesis suggests that even if there is a strong correlation | |
between student grades and CEI scores, we cannot be sure that there is causality. | |
Using more than 50,000 CEI scores, Centra (2003) investigates the previously examined | |
relationship between grades and student evaluations. Unlike previous researchers, Centra | |
(2003) controls for a series of variables in regression analyses, including factors such as | |
subject area, class size, teaching method, and student-perceived learning outcomes. Contrary to many other analyses, Centra (2003) does not find convincing evidence that students’ course ratings are influenced by the grades they receive from their instructors | |
when controlling for other factors. Rather, the findings suggest a curvilinear relationship between the difficulty/workload level of courses and the CEI score, a pattern that is more indicative of students’ learning experiences.
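A simple way to see what a curvilinear specification adds is to fit both a linear and a quadratic model and inspect the squared term, as in the short sketch below (Python, synthetic data with an inverted-U pattern deliberately built in). This is only an illustration of the modeling idea, not a replication of Centra’s (2003) analysis.

    # Compare linear and quadratic fits of ratings on perceived workload/difficulty.
    import numpy as np

    rng = np.random.default_rng(2)
    workload = rng.uniform(1, 10, 400)                    # synthetic perceived workload
    cei = 3.0 + 0.5 * workload - 0.05 * workload**2 + rng.normal(0, 0.2, 400)

    linear = np.polyfit(workload, cei, deg=1)
    quadratic = np.polyfit(workload, cei, deg=2)
    print("linear slope:", round(linear[0], 3))
    print("quadratic term:", round(quadratic[0], 3), "(negative => inverted-U pattern)")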
Centra’s (2003) arguments are further confirmed by a few other studies. Remedios and | |
Lieberman (2008) find that grades only have a small impact on student ratings compared | |
with other influential factors. By controlling for students’ achievement goals and expectations at the beginning of the semester, Remedios and Lieberman (2008) show that students’ course ratings are largely determined by the extent to which the students find their | |
courses stimulating, interesting, and useful. The impact of grades and course difficulty | |
appears to be small. Marsh and Roche (2000) find similar results that many CEI scores | |
are not related to grading leniency; rather, they are more related to the learning experience | |
and teaching efforts. Clayson (2009) conducts a meta-analysis of more than thirty studies and shows that a small average relationship exists between learning and the CEI score.
However, the author highlights that such a relationship is situational and may vary | |
across teachers, disciplines, or class levels. Barth (2008) shows that the overall instructor | |
rating is primarily driven by the quality of instruction. Beleche, Fairris, and Marks (2012) | |
examine the learning–CEI association by using a centrally graded exam as a proxy for | |
actual student learning. This exam was not related to any specific course, so the sample | |
was independent of course type, faculty grading policy, and students’ grade expectations. | |
The literature also suggests inconsistencies and a lack of linearity. For example, Arnold (2009) finds that successful students do not raise the CEI score in response to their successful performance, whereas unsuccessful students externalize failure by lowering the CEI score. Such results are inconsistent with the common criticism of CEIs, which is that students use them as a tool to reward or penalize teachers.
3.2. Other factors that impact CEI scores
As suggested above, it is well documented that the CEI–grade relationship varies considerably across different subgroups of observations, and that other factors are as impactful as, if not more impactful than, the grade itself. This paper focuses on the relationship between student grades and CEI scores, but it is important to remember that this is just one piece of a complex picture. Figure 1 proposes a diagram representing the various factors that impact CEI scores and their relationships, including student grades. Going into depth on all of these factors is beyond the scope of this paper, but they are important to keep in mind, particularly where confounding factors intersect strongly with the student grade–CEI score relationship. The most commonly documented confounding factors include workload, course discipline, course level, class size, class attendance, percentage of non-local students, percentage of late enrollees, student effort, class time, class location, class frequency, instructor’s ranking, instructor’s gender, course evaluation response rate, etc. (Krautmann and Sander 1999; Millea and Grimes 2002; Centra 2003; among others). We will highlight a few factors found to have a strong impact on the CEI score.
Figure 1. A Framework for Understanding the Relationship Between Teaching Quality and CEI Scores.
3.2.1. Workload | |
A number of studies find that there is a negative relationship between workload and CEI | |
score, as students typically rate courses higher if they are more manageable (Feldman | |
1978; Marsh 1987; Paswan and Young 2002; Centra 2003; Clayson 2009; Driscoll and | |
Cadden 2010). The results from Marsh and Roche (2000) and Centra (2003) indicate | |
that courses with lighter workloads, such as lower ‘hours per week required outside of | |
class’, receive higher student ratings. | |
3.2.2. Course characteristics | |
Course type, course level, and discipline all have a significant impact on CEI scores. | |
Brockx, Spooren, and Mortelmans (2011) conclude that instructors teaching elective | |
courses receive higher scores than instructors teaching required ones. Benton and | |
Cashin (2012) conclude that higher-level courses tend to receive higher evaluation | |
ratings in comparison to lower-level courses. Similarly, Ewing (2012) also finds that graduate courses tend to receive better evaluations than undergraduate courses. Such factors can | |
be so strong that they mitigate or exacerbate the CEI-grade relationship. For example, | |
Hoefer, Yurkiewicz, and Byrne (2012) extend the discussion and find the correlation | |
between grade and CEI score is stronger for undergraduate courses and for
those in some specific disciplines, such as management and marketing. Their results also | |
indicate that CEI scores vary considerably across disciplines. These studies suggest that the highest SET scores are received in arts and humanities, followed by biological and social sciences, business, computer science, math, engineering, and physical science (Matos-Díaz and Ragan 2010; Brockx, Spooren, and Mortelmans 2011). Nowell (2007) finds that courses receive higher CEI scores if students exert more effort in the course or the class meets at least twice per week. Such variation creates endogeneity issues when CEI scores are used to assess an instructor’s performance, because courses
with different characteristics may not be truly comparable. Driscoll and Cadden (2010) | |
suggest that, given that CEI scores vary significantly across courses, instructors should | |
be evaluated within their respective departments by a department average rather than | |
by an overall university measure. | |
3.2.3. Instructor characteristics | |
In addition, the literature documents that full-time faculty members generally receive
higher scores than part-time faculty (Nowell 2007; Driscoll and Cadden 2010). Ewing | |
(2012) further documents that pre-tenure professors tend to receive lower evaluation | |
scores than tenured professors. An instructor’s age may also have an impact on the CEI | |
score. Interestingly, this is not in the direction that would be predicted based on an expectation that experience improves teaching. Rather, Brockx, Spooren, and Mortelmans | |
(2011) find that younger professors tend to receive better evaluations. Driscoll and | |
Cadden’s (2010) literature review reports that other studies have found perceptions of | |
an instructor’s personality and/or enthusiasm to be strong factors in course evaluation | |
instruments (Clayson and Sheffet 2006; Clayson 2009; Driscoll and Cadden 2010). | |
Again, some factors have been found to strengthen the CEI-grade relationship, with | |
Hoefer, Yurkiewicz, and Byrne (2012) finding the correlation between grade and CEI | |
score to be stronger for courses taught by female faculty.
4. The caveats of CEI score as a measure of teaching quality | |
To examine the relationship between grades and CEI scores, prior literature builds | |
different empirical models and uses various proxies for grades and CEI scores. | |
Beyond the CEI-grade relationship documented by prior literature (see 3.1 and 3.2 for | |
detailed review), there are a number of caveats which concern the validity of CEI as a | |
measure of teaching quality. In this section, we will discuss the possible biases introduced | |
by CEI: (1) relative performance and peer effect, (2) selection biases, and (3) grade | |
inflation. | |
4.1. Relative performance and peer effect | |
While most of the variables included in these studies capture an individual’s absolute | |
grade or CEI score, the relative student standing is also shown to have a significant | |
impact on the student’s decision making regarding CEI scores. Economists and sociologists have found that individuals’ satisfaction depends not only on their own performance but also on their circumstances relative to a reference group (Becker 1974). | |
Therefore, it is possible that although students’ satisfaction with a course – as captured | |
by CEI scores – may be influenced by individual performance, it may also be influenced by | |
their relative performance among their peers. Knowing the impact of peer effect is important, as suggested by Nowell (2007): | |
If students reward teachers for high relative grades as opposed to simply high absolute grades, | |
there may be limits to an instructor’s ability to ‘purchase’ better teaching evaluations by | |
increasing the grades of all students. Conversely, if individual students reward teachers for | |
their own high grades as well as the high grades of their peers, it becomes expensive to | |
give low grades to anyone in the class and increases the incentive to ‘buy’ higher SET | |
ratings. (p. 44) | |
Stumpf and Freedman (1979) provide early evidence of the relationship between grades | |
and student ratings at both the individual and class levels. Their results suggest that both | |
the individual’s expected grade and the instructor’s overall expected grading policy contribute to the grade–rating relationship, and that the latter tends to have a stronger | |
impact. As an extension of Stumpf and Freedman (1979), several studies further | |
explore the relationship between relative performance and CEI score. Common measures | |
for relative performance include: (1) the difference between the expected grade for the | |
current course and the students’ historical GPA, (2) the average grade earned by all students who take the same course, (3) the expected grades in other classes in which the | |
student is enrolled, and (4) the distribution of expected grades. | |
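For concreteness, the sketch below computes rough versions of these four measures from a toy gradebook (Python with pandas). The column names and the handful of records are invented purely for illustration.

    # Toy computation of the four relative-performance measures listed above.
    import pandas as pd

    df = pd.DataFrame({
        "student":        ["s1", "s2", "s3", "s4", "s1", "s2"],
        "course":         ["ECON101"] * 4 + ["MATH201"] * 2,
        "expected_grade": [3.7, 3.0, 2.3, 3.3, 3.0, 3.7],
        "historical_gpa": [3.5, 3.2, 2.8, 3.0, 3.5, 3.2],
    })

    # (1) expected grade relative to the student's own historical GPA
    df["rel_to_gpa"] = df["expected_grade"] - df["historical_gpa"]
    # (2) average expected grade of all students taking the same course
    df["course_avg"] = df.groupby("course")["expected_grade"].transform("mean")
    # (3) average expected grade in the student's other enrolled courses
    total = df.groupby("student")["expected_grade"].transform("sum")
    count = df.groupby("student")["expected_grade"].transform("count")
    df["other_courses_avg"] = (total - df["expected_grade"]) / (count - 1).where(count > 1)
    # (4) dispersion of expected grades within each course
    df["course_grade_sd"] = df.groupby("course")["expected_grade"].transform("std")

    print(df)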
Isely and Singh (2005) measure peer performance with two variables: expected grades | |
in other classes taught by the same instructor and the gap between the expected grade in | |
the current course and the students’ cumulative GPA. Their findings indicate that if an | |
instructor has other classes in which students expect higher grades, then the average | |
CEI score tends to be higher. | |
Analogous to the findings in Isely and Singh (2005), Nowell (2007) adopts three | |
measurements for peer performance: the difference between the expected grade for the | |
current course and the student’s historical GPA, the average grade earned by all students | |
who take the same course, and the expected grades in other classes in which the student is | |
enrolled. The study reveals that the grade students care most about has a considerable | |
impact on the CEI score. If students use their own grades as the benchmark, then the grade–rating relationship is stronger. In contrast, if students use their peers’ grades as the benchmark, then the grade–rating relationship is weakened.
Matos-Díaz and Ragan (2010) explore the impact of the expected grade on the CEI | |
score from another perspective. They draw inferences from economics theories about | |
risk and uncertainty and argue that the variance of expected grades signals the teacher’s | |
reward structure. The narrow distribution of expected grades indicates that the penalty | |
for lower study time or unfavorable performance (e.g. poor performance on an examination or assignment) is relatively low and is, therefore, more likely to lead to favorable | |
student ratings. As expected, Matos-Díaz and Ragan (2010) report a negative relationship | |
between the variance of the expected grade and CEI score, showing that instructors can | |
strategically obtain favorable ratings by narrowing the grade distribution. This finding | |
also weakens the argument that students care more about their relative performance in | |
a class. | |
Overall, the literature on peer effect suggests that instructors can significantly increase | |
CEI scores not only by increasing grades for individual students but also by lowering the
grading standards for the entire class. In this scenario, the incentives and costs of ‘buying’ | |
high CEI scores may be greater than has been suggested by the literature documented in | |
sections 3.1 and 3.2. | |
4.2. Self-selection bias | |
A favorable CEI score may also reflect factors that increase students’ satisfaction, but are | |
unrelated to teaching quality, such as students’ initial ability, course type, and instructor | |
grading leniency. To better isolate the link between the CEI score and teaching quality, it | |
is necessary to introduce objective measures of student characteristics at the individual | |
level to control for the impact of learning ability on the students’ evaluation of the | |
instructors. However, due to the anonymous nature of CEI processes, it is challenging | |
to incorporate individual-level variables, and self-selection bias may occur. CEI scores are mostly calculated as course means and represent only the subset of students who choose to fill out the evaluations (Beleche, Fairris, and Marks 2012). This introduces
crucial measurement errors, especially when the pool of students who complete the | |
CEI differs from the total student population (Clayson 2009; Isely and Singh 2005; | |
Kherfi 2011). | |
Moreover, the students who participate in a CEI administration cannot fully represent the total student population. The course evaluation response rate is normally less than 100 percent, and it is questionable to simply assume that the students who do not complete the survey are well represented by the students who do. Even assuming a random sample of students, as the number of students incorporated into CEI scores decreases, the effect of individual variation and bias grows stronger (Isely and Singh 2005). Average CEI scores are also more strongly influenced by such bias when the class size is small.
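The sampling problem can be illustrated with a short simulation (Python, assumed parameters): holding the underlying opinion fixed, smaller classes and lower response rates make the reported class-average rating a noticeably noisier estimate.

    # How class size and response rate affect the spread of the reported average CEI.
    # The true mean rating and the 0.8 standard deviation are assumptions.
    import numpy as np

    rng = np.random.default_rng(3)
    true_mean = 4.0

    def spread_of_reported_average(class_size, response_rate, trials=10_000):
        n_resp = max(1, int(class_size * response_rate))
        samples = rng.normal(true_mean, 0.8, size=(trials, n_resp))
        return samples.mean(axis=1).std()     # variability of the class-average rating

    for size in (15, 60, 240):
        for rate in (0.3, 0.9):
            print(f"class={size:3d}, response={rate:.0%}: "
                  f"spread of reported average = {spread_of_reported_average(size, rate):.3f}")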
4.3. Grade inflation | |
CEI processes may exacerbate the problem of grade inflation and can even decrease a professor’s teaching effort (Krautmann and Sander 1999; Love and Kotchen 2010; Butcher, | |
McEwan, and Weerapana 2014). Love and Kotchen (2010) examine the effects of CEI | |
use on faculty behavior and show that excessive institutional emphasis on teaching,
research, or both can exacerbate the problems of grade inflation and result in diminished | |
faculty teaching effort. To better align instructors’ incentives with the institution’s objectives on teaching and research, the authors suggest that universities should ensure uniform | |
grade distributions for individual classes and restrain grade inflation. | |
Reaching similar conclusions, Nelson and Lynch (1984) find that the evaluation process itself produces grade inflation. They also determine that faculty members’ grading policies
are related to their real incomes because faculty members are more willing to adopt easier | |
grading policies when the real income from teaching is falling. | |
Given the pressures on faculty to maintain favorable CEI scores and the impact of | |
expected grades on instructors’ evaluations, enforcing lower expected grades may inevitably have adverse consequences for an instructor’s evaluation. Institutions should carefully
evaluate such impacts, especially when CEI scores are used in tenure and promotion | |
decisions. To ensure fairness across faculty, it would be important to ensure even | |
application of uniform grade distributions across faculty and programs, and to account for | |
any overall reduction in CEI scores. | |
5. Discussion & recommendations | |
Overall, the literature suggests that course grades are positively correlated with CEI scores, | |
but there is considerably less evidence as to whether that relationship is properly attributed | |
to the leniency hypothesis or the validity hypothesis. Given the evidence of correlation | |
between grades and CEI scores and the lack of clear indication that the validity hypothesis | |
is more accurate, colleges and universities should consider potential actions to mitigate the | |
potential for various forms of bias in CEIs. We propose the following to continue efforts to | |
assess this relationship and mitigate its potential impact: (1) ensuring quality design of the | |
instrument, (2) attention to qualitative items on CEIs, (3) university level internal analyses | |
to identify (and address) potential biases and validity issues, (4) consideration of a portfolio approach to instructor evaluation, and (5) increased efforts to tease out the nature of | |
the relationship in future research. | |
In this section, we aim to provide examples and practical techniques that schools can use
to improve the objectivity and informativeness of teaching evaluations. Particularly, we | |
tailor the recommendation section for schools and institutions that are going through a | |
CEI adoption or revision process. | |
5.1. Quality instrument design | |
First and foremost, it is imperative that colleges and universities review CEIs and design or | |
adopt a quality instrument. While instrument design alone cannot alleviate student biases, | |
a poorly designed instrument can exacerbate such biases. In particular, we advocate for | |
clarity in the wording of the items on CEIs and a clear separation of instructor versus | |
course questions to help avoid the exacerbation of biases. Item clarity is important to | |
reduce misinterpretation of items. While items should be broad enough to refer to all | |
types of courses and instructors, clear and directed questions will give the respondent | |
something specific to reflect upon. | |
Given that many instructors do not have complete control over course characteristics, | |
we also advocate for a clear separation of items focused on the instructor versus those | |
focused on the course. Based on this analysis of prior studies, it is clear that student | |
course ratings are determined by multiple variables beyond the instructor’s teaching performance, such as course characteristics, course grade, student qualities, and student | |
biases. Among these factors, course characteristics, which are frequently not under an | |
individual instructor’s control, have considerable impact on the student’s perception of | |
the course. This problem is particularly common when multiple sections of the same | |
course are taught by different instructors while the textbook, course syllabus, exams, | |
and other materials are all designed by one faculty member or a small group. In such | |
cases, instructors tend to have limited freedom in choosing course content or structure, | |
but these factors still count toward the instructor’s evaluation.
To separate the uncontrollable factors from instructor effectiveness, universities can | |
design the course evaluation questionnaire to improve item clarity and reduce response | |
bias. For example, we recommend presenting questions related to ‘Evaluation of | |
Instructor’ and questions related to ‘Evaluation of Course’ separately to students. In cases | |
where a faculty member has little to no control over course content and design, the ‘Evaluation of Instructor’ items provide a more objective assessment of the instructor’s teaching
quality for hiring, tenure and promotion purposes. The ‘Evaluation of Course’ provides | |
insights on both course-level pedagogy and program-level curriculum, and can be used | |
by faculty members to improve and enhance their teaching skills. Below is a sample CEI from a business school located in Boston, MA; a brief scoring sketch follows the item list.
Evaluation of Instructor | |
The instructor was well prepared and organized for class. | |
The instructor communicated information effectively. | |
The instructor promoted useful classroom discussions, as appropriate for the course. | |
The instructor demonstrated the importance of the subject matter. | |
The instructor provided timely and useful feedback. | |
The instructor was responsive to students outside the classroom. | |
Overall rating of this instructor. | |
Evaluation of Course | |
The syllabus clearly described the goals, content, and requirements of the course. | |
The course materials, assigned text(s), and/or other resources helped me understand concepts and ideas related to the | |
course. | |
The workload for this course (reading, assignments, papers, homework, etc.) was manageable given the subject matter | |
and course level. | |
Assignments (exams, quizzes, papers, etc.) adequately reflected course concepts. | |
Overall rating of this course. | |
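To make the separation operational, the sketch below (Python, with invented responses on an assumed 1–5 agreement scale) scores the two blocks above independently, so that an instructor composite can be reported and used separately from the course composite. The item keys are shorthand, not the official wording.

    # Score the 'Evaluation of Instructor' and 'Evaluation of Course' blocks separately.
    # Responses are invented; a 1-5 agreement scale is assumed.
    instructor_items = {
        "prepared_and_organized":   [5, 4, 5, 4],
        "communicated_effectively": [4, 4, 5, 3],
        "timely_useful_feedback":   [5, 5, 4, 4],
    }
    course_items = {
        "clear_syllabus":               [4, 4, 4, 5],
        "manageable_workload":          [3, 4, 3, 4],
        "assignments_reflect_concepts": [4, 5, 4, 4],
    }

    def block_mean(items):
        # Average each item across respondents, then average the item means.
        item_means = [sum(v) / len(v) for v in items.values()]
        return sum(item_means) / len(item_means)

    print(f"instructor composite: {block_mean(instructor_items):.2f}")
    print(f"course composite:     {block_mean(course_items):.2f}")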
5.2. Attention to qualitative items on CEIs | |
One disadvantage of quantitative CEI (scaled) questions is that the questions are specifically pre-designed, and the dimensions covered might be somewhat narrow. Qualitative | |
evaluation questions give students opportunities to provide in-depth feedback on
broader dimensions, resulting in an extensive examination of the student experience | |
(Steyn, Davies, and Sambo 2019). Consistent with this argument, Sherry, Fulford, and | |
Zhang (1998) examine the accuracy, utility, and feasibility of both quantitative and qualitative evaluation approaches and find that both efficiently capture aspects of the instructional climate. Grebennikov and Shah (2013) also focus on the use of qualitative evaluation feedback from students and find that efficient use of such feedback, together with timely responses to it, helps increase student satisfaction and retention.
5.3. University-level analyses | |
The complexity of the literature, variety of findings, and heterogeneity of CEIs themselves | |
suggest that colleges and universities may wish to examine these questions internally to | |
evaluate the validity of their own instruments as a measure of teaching effectiveness, | |
assess the impact of grading policies, and identify potential biases. | |
While the literature provides mixed findings on the validity/leniency approach, universities often have thousands of data points they could use to conduct internal analyses of | |
CEI scores. With the variation in CEIs, grading scales, and other confounding factors | |
across universities, internal analyses could provide clearer evidence of the state of the | |
student grade–CEI score relationship as it exists in a particular university. In particular, analyzing the grade–CEI score relationship for specific faculty members’ courses over time would control for teaching quality to some extent, especially if there is sufficient data to analyze by specific course or type of course, given that teaching quality could easily vary
based on a faculty member’s expertise in a particular course topic. | |
As an extension of this, universities could also consider adoption of a relative performance approach to mitigate the effects of teaching courses or disciplines that typically result | |
in lower course averages, such as more quantitatively focused courses or particularly challenging first year courses. A relative performance approach that compares the ratings of | |
the instructors with others who teach the same or similar courses, or at least within the | |
same discipline, can help reduce student grade effects on CEI scores. | |
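One way to operationalize such a relative performance approach is to standardize each course’s CEI score within its own department before making comparisons, as in the hedged sketch below (Python; the department names, instructors, and scores are invented).

    # Express each course's CEI score as a z-score within its department.
    import pandas as pd

    scores = pd.DataFrame({
        "department": ["MATH", "MATH", "MATH", "ART", "ART", "ART"],
        "instructor": ["a", "b", "c", "d", "e", "f"],
        "cei":        [3.6, 3.9, 3.3, 4.5, 4.7, 4.3],
    })

    grp = scores.groupby("department")["cei"]
    scores["dept_z"] = (scores["cei"] - grp.transform("mean")) / grp.transform("std")
    print(scores.sort_values("dept_z", ascending=False))

Under this scoring, an instructor in a low-rated discipline is compared against departmental peers rather than against the university-wide average.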
5.4. Consideration of a portfolio approach to expand the measures of teaching | |
quality | |
The prior recommendations focus on the CEI itself and ways it can be designed or analyzed to mitigate the potential for student grades to drive CEI scores inappropriately. | |
The extent to which the instruments themselves, both quantitative and qualitative | |
items, can do this, however, is limited. Thus, we also recommend that schools consider
a portfolio approach to expand measures of teaching quality, particularly in cases where | |
the internal analyses recommended above suggest significant student biases or the | |
ability for instructors to effectively ‘buy’ grades. | |
A portfolio approach is based on a combination of measures, such as student evaluations, peer evaluations, chair evaluations, and self-evaluations. Portfolio approaches are | |
well discussed in the current literature (Mullens et al. 1999; Laverie 2002; Berk 2005; | |
Chism and Banta 2007) and the details of such an approach are beyond the scope of | |
this article, so we will limit our discussion. As examples, Berk (2005) discusses some | |
potential sources of evidence of teaching effectiveness including student ratings, peer | |
ratings, self-evaluation, student interviews, alumni ratings, teaching scholarship, learning
outcome measures, etc. While portfolio approaches cannot alleviate any student grade | |
biases in CEI scores, they allow for alternative measures of teaching effectiveness to | |
provide a more holistic evaluation. | |
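As a purely illustrative example of how such a portfolio could be aggregated, the sketch below combines several normalized evidence sources with explicit weights. Both the sources and the weights are assumptions chosen for illustration; institutions would need to set and justify their own.

    # Weighted composite of multiple, separately normalized evidence sources (0-1 scale).
    # Sources and weights are hypothetical, for illustration only.
    evidence = {"student_ratings": 0.72, "peer_review": 0.80,
                "self_evaluation": 0.65, "chair_review": 0.75}
    weights  = {"student_ratings": 0.40, "peer_review": 0.30,
                "self_evaluation": 0.10, "chair_review": 0.20}

    composite = sum(evidence[k] * weights[k] for k in evidence)
    print(f"weighted teaching-effectiveness composite: {composite:.2f}")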
While some of this information is more difficult to collect than others, these different | |
sources of information focus on different aspects of teaching effectiveness. The instructor’s | |
self-evaluation may provide informal evidence of teaching performance. Information provided by the department chair or course coordinator may highlight the instructor’s compliance with the internal policies and procedures. Colleagues who have expertise in the | |
discipline can provide important feedback through classroom visits or course material | |
reviews. Schools and departments can randomly select courses and solicit evaluations | |
from external professors in the same field. This approach allows the school and department to evaluate the teacher’s teaching skills from an educator’s point of view in addition | |
to the recipients’ (students’) perspectives, but there are recognizably more resource allocation costs involved with such an approach. | |
It is also important to recognize and balance the benefits and caveats for different types of | |
peer review. For instance, internal reviewers have a good understanding of schools’ institutional backgrounds, but may feel social pressure to overpraise the reviewees or understate | |
the concerns. Institutions need to balance the benefits and costs associated with these | |
different approaches, and such debates might be further explored in future research.
5.5. Further research | |
Finally, given the literature’s mixed findings and continued debate over the relationship
between student grades and CEI scores, most notably whether or not such relationships | |
are causal, there is a strong need for continued research in this area, specifically targeted | |
to teasing out the nature of the relationship. As the discussion above demonstrates, it is not | |
sufficient to argue that there is a relationship between student grades and CEI scores, if the | |
argument can also be made that effective teaching leads to increased student grades. What | |
we really want to know is the extent to which student grades or expected grades bias | |
student evaluations of instructors. | |
Given that universities across the nation are already collecting troves of CEI data, the | |
real need is for strong methodologists to design studies that can better determine or refute | |
claims to causality. This is not to suggest that there are no challenges involved. Anonymity is a critical feature of CEIs, so disaggregating data to the individual student level is problematic, but with the increase in online CEI distribution, it may be increasingly possible
to do so. On a related note, further refinement of effective quantitative measurements and | |
analyses of CEI scores is advised. While the challenges of finding adequate proxies for | |
student learning are clear, additional efforts on this front are worthwhile, as the critical
nature of CEIs in higher education should not be underestimated. | |
Our study also suggests the need for more qualitative research in this area, as most prior | |
research in this stream of literature uses quantitative research designs such as correlation | |
tests or multivariate regression. Qualitative research, such as focus group interviews or | |
quasi-experiments, will provide valuable insights on how these course evaluation questions are truly perceived by students. Future studies can also investigate students’ judgments and decision making with regard to their responses to the quantitative CEI | |
questions. Such a study would help CEI designers to better align CEI questions with students’ perceptions of their own learning. | |
6. Conclusions and remarks | |
As described at length above, prior research has provided ample evidence on the relationship between CEI scores and various instructor and course factors, including grades and | |
many other characteristics. However, most previous literature either focuses on a single | |
study, or takes a broad look at the extensive factors that play a role, without fully unpacking particular relationships. In addition, there has been no conceptual framework that
synthesizes these existing theories and research findings. This article seeks to fill that gap, | |
focusing on the relationship between student grades and CEI scores, to synthesize the | |
findings to date, assess the leniency and validity hypotheses, identify closely related | |
factors, discuss potential biases, and make practical recommendations for schools and | |
universities. | |
Overall, the literature suggests that course grades are positively correlated with CEI | |
scores, but there is considerably less evidence as to whether that relationship is properly | |
attributed to the leniency hypothesis or the validity hypothesis. In this paper, we survey | |
28 prior studies and discuss the impact of course grades on course evaluation scores. | |
We specifically explore the leniency hypothesis, which posits that students give higher | |
CEI scores to instructors from whom they receive higher grades, and the validity | |
hypothesis, which posits that instructors who teach more effectively receive better evaluation scores because their students learn more and therefore earn higher grades. Our | |
review reveals that existing research focuses more on the extent of the relationship than | |
the nature of that relationship. The empirical studies that do assess this, however, tend
to be more consistent with the leniency hypothesis. | |
One of the major implications of these findings is that colleges and universities should | |
be thoughtful about their reliance on CEI scores in the broader faculty evaluation process | |
and consider a variety of approaches to meet their needs. To address these serious limitations of the CEI and provide a more objective evaluation of the instructor’s teaching quality,
we propose five recommendations: quality design of the instrument, attention to qualitative items, university level internal analyses, a portfolio approach to instructor evaluation, | |
and increased efforts to tease out the nature of the relationship in future research. | |
In addition, as shown in Figure 1, this study proposes a conceptual framework that | |
illustrates the relationships between actual teaching quality and CEI scores, and suggests | |
where confounding factors may play a role. While we are trying to focus on one specific | |
relationship between CEI scores and grades, we believe that a broad overview of the evaluation-teaching quality relationships is informative to the readers of this study. The | |
proposed framework lays the groundwork for future research regarding the potential confounding factors and omitted variables that may significantly diminish the informativeness of the CEI score.
Note | |
1. The keywords include teaching evaluation, course evaluation, student evaluation, student | |
feedback, student perception, and student rating. | |
Disclosure statement | |
No potential conflict of interest was reported by the author(s). | |
References | |
Arnold, I. J. M. 2009. “Do Examinations Influence Student Evaluations?” International Journal of
Educational Research 48 (4): 215–224. | |
Barth, M. M. 2008. “Deciphering Student Evaluations of Teaching: A Factor Analysis Approach.” | |
Journal of Education for Business 84 (1): 40–46. | |
Bausell, R. B., and J. Magoon. 1972. “Expected Grade in a Course, Grade Point Average, and Student | |
Ratings of the Course and the Instructor.” Educational and Psychological Measurement 32 (4): | |
1013–1023. | |
Becker, G. S. 1974. “A Theory of Social Interactions.” Journal of Political Economy 82 (6): 1063– | |
1093. | |
Beleche, T., D. Fairris, and M. Marks. 2012. “Do Course Evaluations Truly Reflect Student | |
Learning? Evidence From an Objectively Graded Post-Test.” Economics of Education Review | |
31 (5): 709–719. | |
Benton, S. 2011. “Using Student Course Evaluations to Design Faculty Development Workshops.” | |
Academy of Educational Leadership Journal 15 (2): 41–53. | |
Benton, S., and W. E. Cashin. 2012. Student Ratings of Teaching: A Summary of Research and | |
Literature. IDEA Paper No. 50. | |
Berk, R. A. 2005. “Survey of 12 Strategies to Measure Teaching Effectiveness.” International Journal | |
of Teaching and Learning in Higher Education 17 (1): 48–62. | |
Braga, M., M. Paccagnella, and M. Pellizzari. 2014. “Evaluating Students’ Evaluations of Professors.” | |
Economics of Education Review 41 (August): 71–88. | |
Brockx, B., P. Spooren, and D. Mortelmans. 2011. “Taking the Grading Leniency Story to the Edge. | |
The Influence of Student, Teacher, and Course Characteristics on Student Evaluations of | |
Teaching in Higher Education.” Educational Assessment, Evaluation and Accountability 23 | |
(4): 289–306. | |
Butcher, K. F., P. J. McEwan, and A. Weerapana. 2014. “The Effects of an Anti-Grade Inflation | |
Policy at Wellesley College.” Journal of Economic Perspectives 28 (3): 189–204. | |
Catano, V., and S. Harvey. 2011. Student Perception of Teaching Effectiveness: Development and | |
Validation of the Evaluation of Teaching Competencies Scale (ETCS). Halifax, Nova Scotia, Canada: Routledge.
Centra, J. A. 2003. “Will Teachers Receive Higher Student Evaluations by Giving Higher Grades | |
and Less Course Work?” Research in Higher Education 44 (5): 495–518. | |
Chism, N. V. N., and T. W. Banta. 2007. “Enhancing Institutional Assessment Efforts Through | |
Qualitative Methods.” New Directions for Institutional Research 136 (winter): 15–28. | |
Chulkov, D. V., and J. V. Alstine. 2011. “Challenges in Designing Student Teaching Evaluations in a | |
Business.” International Journal of Educational Management 26 (2): 162–174. | |
Clayson, D. E. 2009. “Student Evaluations of Teaching: Are They Related to What Students Learn: A | |
Meta-Analysis and Review of the Literature.” Journal of Marketing Education 31 (1): 16–30. | |
Clayson, D. E., and M. J. Sheffet. 2006. “Personality and the Student Evaluation of Teaching.” | |
Journal of Marketing Education 28 (2): 149–160. | |
Dodeen, H. 2013. “Validity, Reliability, and Potential Bias of Short Forms of Students’ Evaluation of | |
Teaching: The Case of UAE University.” Educational Assessment 18 (4): 235–250. | |
Driscoll, J., and D. Cadden. 2010. “Student Evaluation Instruments: The Interactive Impact of | |
Course Requirements, Student Level, Department and Anticipated Grade.” American Journal | |
of Business Education 3 (5): 21–30. | |
Ewing, A. M. 2012. “Estimating the Impact of Relative Expected Grade on Student Evaluations of | |
Teachers.” Economics of Education Review 31: 141–154. | |
Feldman, K. A. 1978. “Course Characteristics and Variability Among College Students in | |
Rating Their Teachers and Courses: A Review and Analysis.” Research in Higher Education 9: | |
199–242. | |
Gorry, D. 2017. “The Impact of Grade Ceilings on Student Grades and Course Evaluations: | |
Evidence from a Policy Change.” Economics of Education Review 56 (February): 133–140. | |
Grebennikov, L., and M. Shah. 2013. “Student Voice: Using Qualitative Feedback from Students to | |
Enhance Their University Experience.” Teaching in Higher Education 18 (6): 606–618. | |
Greenwald, A. G., and G. M. Gillmore. 1997a. “Grading Leniency is a Removable Contaminant of | |
Student Ratings.” American Psychologist 52 (11): 1209–1217. | |
Greenwald, A. G., and G. M. Gillmore. 1997b. “No Pain, no Gain? The Importance of Measuring | |
Course Workload in Student Ratings of Instruction.” Journal of Educational Psychology 89 (4): | |
743–751. | |
Handelsman, M. M., W. L. Briggs, N. Sullivan, and A. Towler. 2005. “A Measure of College Student | |
Course Engagement.” The Journal of Educational Research 98 (3): 184–192. | |
Hoefer, P., J. Yurkiewicz, and J. C. Byrne. 2012. “The Association between Students’ Evaluation of | |
Teaching and Grades.” Decision Sciences Journal of Innovative Education 10 (3): 447–459. | |
Isely, P., and H. Singh. 2005. “Do Higher Grades Lead to Favorable Student Evaluations?” The
Journal of Economic Education 36 (1): 29–42. | |
Kherfi, S. 2011. “Whose Opinion is it Anyway? Determinants of Participation in Student Evaluation | |
of Teaching.” The Journal of Economic Education 42 (2): 19–30. | |
Krautmann, A. C., and W. Sander. 1999. “Grades and Student Evaluations of Teachers.” Economics | |
of Education Review 18 (1): 59–63. | |
Kulik, J. A. 2001. “Student Ratings: Validity, Utility, and Controversy.” In The Student Ratings Debate: Are They Valid? How Can We Best Use Them? Vol. 2001, edited by Michael Theall, Philip C. Abrami, and Lisa A. Mets, 9–25.
Laverie, D. A. 2002. “Improving Teaching Through Improving Evaluation: A Guide to Course | |
Portfolios.” Journal of Marketing Education 24 (2): 104–113. | |
Love, D. A., and M. J. Kotchen. 2010. “Grades, Course Evaluations, and Academic Incentives.” | |
Eastern Economic Journal 36 (2): 151–163. | |
Marsh, H. W. 1987. “Students’ Evaluations of University Teaching: Research Findings, | |
Methodological Issues, and Directions for Future Research.” International Journal of | |
Educational Research 11 (3): 253–388. | |
Marsh, H. W., and L. A. Roche. 2000. “Effects of Grading Leniency and Low Workload on Students’ | |
Evaluations of Teaching: Popular Myth, Bias, Validity, or Innocent Bystanders?” Journal of | |
Educational Psychology 92 (1): 202–228. | |
Matos-Díaz, H., and J. R. Ragan Jr. 2010. “Do Student Evaluations of Teaching Depend on the | |
Distribution of Expected Grade?” Education Economics 18 (3): 317–330. | |
McPherson, M. A. 2006. “Determinants of how Students Evaluate Teachers.” The Journal of | |
Economic Education 37 (1): 3–20. | |
Millea, M., and P. W. Grimes. 2002. “Grade Expectations and Student Evaluation of Teaching.” | |
College Student Journal 36 (4): 582–590. | |
Mullens, J., M. S. Leighton, K. G. Laguarda, and E. O’Brian. 1999. Student Learning, Teaching | |
Quality, and Professional Development: Theoretical Linkages, Current Measurement, and | |
Recommendations for Future Data Collection. Working paper. | |
Nelson, J. P., and K. Lynch. 1984. “Grade Inflation, Real Income, Simultaneity, and Teaching | |
Evaluations.” The Journal of Economic Education 15 (1): 21–37. | |
Nowell, C. 2007. “The Impact of Relative Grade Expectations on Student Evaluation of Teaching.” | |
International Review of Economics Education 6 (2): 42–56. | |
Paswan, A. K., and J. A. Young. 2002. “Student Evaluation of Instructor: A Nomological | |
Investigation Using Structural Equation Modeling.” Journal of Marketing Education 24 (3): | |
193–202. | |
Remedios, R., and D. A. Lieberman. 2008. “I Liked Your Course Because you Taught me Well: The | |
Influence of Grades, Workload, Expectations and Goals on Students’ Evaluations of Teaching.” | |
British Educational Research Journal 34 (1): 91–115. | |
Sherry, A. C., C. Fulford, and S. Zhang. 1998. “Assessing Distance Learners’ Satisfaction with | |
Instruction: A Quantitative and a Qualitative Measure.” American Journal of Distance | |
Education 12 (3): 4–28. | |
Steyn, C., D. Davies, and A. Sambo. 2019. “Eliciting Student Feedback for Course Development: the | |
Application of a Qualitative Course Evaluation Tool among Business Research Students.” | |
Assessment and Evaluation in Higher Education 44 (1): 11–24. | |
Stumpf, S. A., and R. D. Freedman. 1979. “Expected Grade Covariation with Student Ratings of | |
Instruction: Individual Versus Class Effects.” Journal of Educational Psychology 71 (3): 293–302. | |
Uttl, B., C. A. White, and D. W. Gonzales. 2017. “Meta-analysis of Faculty’s Teaching Effectiveness: | |
Student Evaluation of Teaching Ratings and Student Learning are not Related.” Studies in | |
Educational Evaluation 54 (1): 22–42. | |
VanMaaren, V. G., C. M. Jaquett, and R. L. Williams. 2016. “Factors Most Likely to Contribute to | |
Positive Course Evaluations.” Innovative Higher Education 41 (5): 425–440. | |
Weinberg, B. A., M. Hashimoto, and B. M. Fleisher. 2009. “Evaluating Teaching in Higher | |
Education.” Journal of Economic Education 40 (3): 227–261. | |
[ResearchGate record] David M. McCord (Western Carolina University), Appropriate and inappropriate uses of students' assessment of instruction, article, January 2013. https://www.researchgate.net/publication/259503622
Medical Teacher 2013; 35: 15–26. DOI: 10.3109/0142159X.2012.732247
Top five flashpoints in the assessment of | |
teaching effectiveness | |
RONALD A. BERK | |
Johns Hopkins University, USA | |
Abstract | |
Background: Despite thousands of publications over the past 90 years on the assessment of teaching effectiveness, there is still | |
confusion, misunderstanding, and hand-to-hand combat on several topics that seem to pop up over and over again on listservs, | |
blogs, articles, books, and medical education/teaching conference programs. If you are measuring teaching performance in | |
face-to-face, blended/hybrid, or online courses, then you are probably struggling with one or more of these topics or flashpoints. | |
Aim: To decrease the popping and struggling by providing a state-of-the-art update of research and practices and a ‘‘consumer’s | |
guide to trouble-shooting these flashpoints.’’ | |
Methods: Five flashpoints are defined, the salient issues and research described, and, finally, specific, concrete recommendations | |
for moving forward are proffered. Those flashpoints are: (1) student ratings vs. multiple sources of evidence; (2) sources | |
of evidence vs. decisions: which come first?; (3) quality of ‘‘home-grown’’ rating scales vs. commercially-developed scales;
(4) paper-and-pencil vs. online scale administration; and (5) standardized vs. unstandardized online scale administrations. The first | |
three relate to the sources of evidence chosen and the last two pertain to online administration issues. | |
Results: Many medical schools/colleges and higher education in general fall far short of their potential and the available | |
technology to comprehensively assess teaching effectiveness. Specific recommendations were given to improve the quality and | |
variety of the sources of evidence used for formative and summative decisions and their administration procedures. | |
Conclusions: Multiple sources of evidence collected through online administration, when possible, can furnish a solid foundation | |
from which to infer teaching effectiveness and contribute to fair and equitable decisions about faculty contract renewal, merit pay, | |
and promotion and tenure. | |
Introduction | |
FLASHPOINT: a critical stage in a process, trouble | |
spot, discordant topic, or lowest temperature at | |
which a flammable liquid will give off enough | |
vapor to ignite. | |
If you have read any of my previous articles, you know I have | |
given off buckets of vapor. For you language scholars, | |
‘‘flashpoint’’ is derived from two Latin words, ‘‘flashus,’’ | |
meaning ‘‘your shorts,’’ and ‘‘pointum,’’ meaning, ‘‘are on fire.’’ | |
Why flashpoints? | |
This article is not another review of the research on student | |
ratings. It is a state-of-the-art update of research and practices, | |
primarily since 2006 (Berk 2006; Seldin & Associates 2006; | |
Arreola 2007), with specific TARGETS: the flashpoints that have | |
emerged, which are critical issues, conflicts, contentious | |
problems, and volatile hot buttons in the assessment of | |
teaching effectiveness. They are the most prickly, thorny, | |
vexing, and knotty topics that every medical school/college | |
and institution in higher education must confront. | |
These flashpoints cause confusion, misunderstanding, dissension, hand-to-hand combat, and, ultimately, inaccurate and unfair decisions about faculty. Although there are many more than five in this percolating cauldron of controversy, the ones tackled here seem to pop up over and over again on listservs, blogs, articles, books, and medical education/teaching conference programs, plus they generate a firestorm of debate by faculty and administrators more than others. This contribution is an attempt to decrease some of that percolating and popping.
Practice points
. Polish your student rating scale, but start building multiple sources of evidence to assess teaching effectiveness.
. Match your highest quality sources to the specific formative and summative decisions using the 360° MSF model.
. Review current measures of teaching effectiveness with your faculty and plan specifically how you can improve their psychometric quality.
. Design an online administration system in-house or out-house with a vendor to conduct the administration and score reporting.
. Standardize directions, administration procedures, and a narrow window for completion of your student rating scale and other measures of teaching effectiveness.
Correspondence: R.A. Berk, Johns Hopkins University, 10971 Swansfield Road, Columbia, MD 21044, USA. Tel: +1 410 9407118; fax: +1 206 3091618; email: [email protected]
Trouble-shooting flashpoints | |
If you are currently using any instrument to measure teaching | |
performance in face-to-face, blended/hybrid, or online | |
courses, then you are probably struggling with one or more | |
flashpoints. This article is a ‘‘consumer’s guide to troubleshooting these flashpoints.’’ The motto of this article is: ‘‘Get to | |
the flashpoint and the solution.’’ | |
This is the inauguration of my new PBW series on problem-based writing. Your problems are the foci of my writing. The
structure of each section will be governed by the PBW | |
perspective: | |
(1) Definition: Each flashpoint will be succinctly defined.
(2) Options: The options available based on research and practice will be described.
(3) Recommended Solution: Specific, concrete recommendations for faculty and administrators will be proffered to move them to action.
There does not seem to be any short-cut, quick fix, or multilevel marketing scheme to improve the quality of teaching. | |
Tackling these flashpoints head-on will hopefully be one | |
positive step toward that improvement. | |
The top five flashpoints are: (1) student ratings vs. multiple | |
sources of evidence; (2) sources of evidence vs. decisions: | |
which come first?; (3) quality of ‘‘home-grown’’ rating scales | |
vs. commercially-developed scales; (4) paper-and-pencil vs. | |
online scale administration; and (5) standardized vs. unstandardized online scale administration. The first three relate to | |
critical decisions about the sources of evidence chosen and the | |
last two pertain to online scale administration issues. | |
Top five flashpoints | |
Student ratings vs. multiple sources of evidence | |
FLASHPOINT 1: Student rating scales have dominated as the primary or, usually, the only measure of | |
teaching effectiveness in medical schools/colleges | |
and universities worldwide and in a few remote | |
planets. This state of practice is contrary to the advice | |
of a cadre of experts and the limitations of student | |
input to comprehensively evaluate teaching effectiveness. Several other measures should be used in | |
conjunction with student ratings. | |
Student ratings. Historically, student rating scales have been | |
the primary measure of teaching effectiveness for the past 50 | |
years. Students have had a critical role in the teaching–learning | |
feedback system. The input from their ratings in summative | |
decision making has been recommended on an international | |
level (Strategy Group 2011; Surgenor 2011). | |
There are nearly 2000 references on the topic (Benton & | |
Cashin 2012) with the first journal article published 90 years | |
ago (Freyd 1923). There is more research and experience in | |
higher education with student ratings than with all of the other | |
measures of teaching effectiveness combined (Berk 2005, | |
2006). If you need to be brought up to speed quickly with the | |
research on student ratings, check out these up-to-date | |
reviews (Gravestock & Gregor-Greenleaf 2008; Benton & | |
Cashin 2012; Kite 2012). | |
Unfortunately, in medical/healthcare education, student | |
ratings have not received the same level of research attention. | |
There is only a sprinkling of studies over the last 20 years | |
(e.g., Hoeks & van Rossum 1988; Jones & Froom 1994; Mazor | |
et al. 1999; Elzubeir & Rizk 2002; Barnett et al. 2003; Kidd & | |
Latif 2004; Pierre et al. 2004; Turhan et al. 2005; Maker et al. | |
2006; Ahmady et al. 2009; Barnett & Matthews 2009; Berk | |
2009a; Chenot et al. 2009; Donnon et al. 2010; Boerboom et al. | |
2012; Stalmeijer et al. 2010). There is far more research on peer | |
observation (e.g., Berk et al. 2004; Siddiqui et al. 2007; Wellein | |
et al. 2009; DiVall et al. 2012; Pattison et al. 2012; Sullivan et al. | |
2012). There are also a few qualitative studies that are | |
peripherally related (Stark 2003; Steinert 2004; Martens et al. | |
2009; Schiekirka et al. 2012). | |
With this volume of scholarly productivity and practice in | |
academia worldwide, student ratings seem like the solution to | |
assessing teaching effectiveness in medical/healthcare education and higher education in general. So, what is the problem? | |
Limitations of student ratings. As informative as student | |
ratings can be about teaching, there are numerous behaviors | |
and skills defining teaching effectiveness which students are | |
NOT qualified to rate, such as a professor’s knowledge and | |
content expertise, teaching methods, use of technology, | |
course materials, assessment instruments, and grading practices (Cohen & McKeachie 1980; Calderon et al. 1996; | |
d’Apollonia & Abrami 1997a; Ali & Sell 1998; Green et al. | |
1998; Hoyt & Pallett 1999; Coren 2001; Ory & Ryan 2001; | |
Theall & Franklin 2001; Marsh 2007; Svinicki & McKeachie | |
2011). Students can provide feedback at a certain level in each | |
of those areas, but it will take peers and other qualified | |
professionals to rate those skills in depth. BOTTOM LINE: | |
Student ratings from well-constructed scales are a necessary, | |
but not sufficient, source of evidence to comprehensively assess | |
teaching effectiveness. | |
Student ratings provide only one portion of the information | |
needed to infer teaching effectiveness. Yet, that is pretty much | |
all that is available at most institutions. When those ratings | |
alone are used for decision making, they will be incomplete | |
and biased. Without additional evidence of teaching effectiveness, student ratings can lead to incorrect and unfair career | |
decisions about faculty that can affect their contract renewal, | |
annual salary increase, and promotion and tenure. | |
It is the process of evaluation or assessment that permits | |
several sources of appropriate evidence to be collected | |
for the purpose of decision making. Assessment is a | |
‘‘systematic method of obtaining information from [scales] | |
and other sources, used to draw inferences about characteristics of people, objects, or programs,’’ according to | |
the US Standards for Educational and Psychological | |
Testing (AERA, APA, & NCME Joint Committee on | |
Standards 1999, p. 272). Student ratings represent one | |
measure and just one source of information in that process. | |
Multiple sources of evidence. Over the past decade, there has | |
been a trend toward augmenting student ratings with other | |
data sources of teaching effectiveness. Such sources can serve | |
to broaden and deepen the evidence base used to assess | |
courses and the quality of teaching (Theall & Franklin 1990; | |
Braskamp & Ory 1994; Hoyt & Pallett 1999; Knapper & | |
Cranton 2001; Ory 2001; Cashin 2003; Berk 2005, 2006; Seldin | |
2006; Arreola 2007; Theall & Feldman 2007; Gravestock & | |
Gregor-Greenleaf 2008; Benton & Cashin 2012). In fact, several | |
comprehensive models of ‘‘faculty evaluation’’ have been | |
proposed (Centra 1993; Braskamp & Ory 1994; Berk 2006, | |
2009a; Arreola 2007; Gravestock & Gregor-Greenleaf 2008), | |
which include multiple sources of evidence with some models | |
attaching greater weight to student and peer ratings and less | |
weight to self-, administrator, and alumni ratings, and other | |
sources. All of these models are used to arrive at formative and | |
summative decisions. | |
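To make the idea of differential weighting concrete, here is a minimal sketch, in Python, of a compensatory composite across several sources; the source names, weights, and 1-5 scaling are hypothetical illustrations, not values prescribed by Berk or the models cited above.
# Minimal sketch of a weighted multi-source composite (hypothetical weights).
# Assumes each source has already been rescaled to a common 1-5 rating metric.
WEIGHTS = {
    "student_ratings": 0.40,      # heavier weight for student input
    "peer_ratings": 0.30,         # heavier weight for peer input
    "self_ratings": 0.10,
    "administrator_ratings": 0.10,
    "alumni_ratings": 0.10,
}

def composite_score(scores):
    """Weighted mean over whichever sources are actually available."""
    used = {name: w for name, w in WEIGHTS.items() if name in scores}
    return sum(w * scores[name] for name, w in used.items()) / sum(used.values())

print(composite_score({"student_ratings": 4.2, "peer_ratings": 4.5, "self_ratings": 4.8}))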
15 Sources. There are 15 potential sources of evidence of | |
teaching effectiveness: (1) student ratings; (2) peer observations; (3) peer review of course materials; (4) external expert | |
ratings; (5) self-ratings; (6) videos; (7) student interviews; | |
(8) exit and alumni ratings; (9) employer ratings; (10) mentor’s | |
advice; (11) administrator ratings; (12) teaching scholarship; | |
(13) teaching awards; (14) learning outcome measures; and | |
(15) teaching (course) portfolio. Berk (2006) described several | |
major characteristics of each source, including type of measure | |
needed to gather the evidence, the person(s) responsible for | |
providing the evidence (students, peers, external experts, | |
mentors, instructors, or administrators), the person or committee who uses the evidence, and the decision(s) typically | |
rendered based on that data (formative, summative, or | |
program). He also critically examined the value and contribution of these sources for teaching effectiveness based on the | |
current state of research and practice. His latest recommendations will be presented in Flashpoint 2. | |
Triangulation. Much has been written about the merits and | |
shortcomings of these various sources of evidence (Berk 2005, | |
2006). Put simply: There is no perfect source or combination of | |
sources. Each source can supply unique information, but also | |
is fallible, usually in a way different from the other sources. For | |
example, the unreliability and biases of peer ratings are not the | |
same as those of student ratings; student ratings have other | |
weaknesses. By drawing on three or more different sources of | |
evidence, you can leverage the strengths of each source to | |
compensate for weaknesses of the other sources, thereby | |
converging on a decision about teaching effectiveness that is | |
more accurate and reliable than one based on any single | |
source (Appling et al. 2001). This notion of triangulation is | |
derived from a compensatory model of decision making. | |
Given the complexity of measuring the act of teaching in a | |
real-time classroom environment or online course, it is | |
reasonable to expect that multiple sources can provide a | |
more accurate, reliable, and comprehensive picture of teaching effectiveness than just one source. However, the decision | |
maker should integrate the information from only those | |
sources for which validity evidence is available (see Standard | |
14.13). The quality of the sources chosen should be beyond | |
reproach, according to the Standards (AERA, APA, & NCME | |
Joint Committee on Standards 1999). | |
Since there is not enough experience with multiple sources, | |
there is a scarcity of empirical evidence to support the use of | |
any particular combination of sources (e.g., Barnett et al. 2003; | |
Stalmeijer et al. 2010; Stehle et al. 2012). There are a few | |
surveys of the frequency of use of individual sources (Seldin | |
1999; Barnett & Matthews 2009). Research is needed on | |
various combinations of measures for different decisions to | |
determine ‘‘best practices.’’ | |
Recommendations. All experts on faculty evaluation recommend multiple sources of evidence to assess teaching effectiveness. Beyond student ratings, is it worth the extra effort, | |
time, and cost to develop the additional measures suggested in | |
this section? Just what new information do you have to gain? | |
As those instruments are being built, it should become clear | |
that they are intended to measure different teaching behaviors | |
that contribute to teaching effectiveness. Each measure should | |
bite off a separate chunk of behaviors. They should be | |
designed to be complementary, not redundant, although there | |
may be justification for some overlap for corroboration. | |
There is even research evidence on the relationships | |
between student ratings and several other measures to support | |
their complementarity. Benton and Cashin’s (2012) research | |
review reported the following validity coefficients with student | |
ratings: trained observers (0.50 with global ratings), self (0.30– | |
0.45), alumni (0.54–0.80), and administrators (0.47–0.62; 0.39 | |
with global ratings). Since 0.50 is only 25% explained variance | |
and 75% unexplained or new information, these coefficients | |
suggest a lot of insight can be gained using observers’, self, and | |
administrators’ ratings as sources of evidence. | |
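The arithmetic behind that claim is simply the squared correlation: r-squared is the proportion of variance a source shares with student ratings, and the remainder is the new information it adds. A short Python check, using only the coefficients quoted above:
# Shared vs. unexplained variance for the validity coefficients reported above
# (Benton & Cashin 2012): r**2 is the proportion of variance shared with student ratings.
validity_coefficients = {
    "trained observers (global)": 0.50,
    "self (low)": 0.30,
    "self (high)": 0.45,
    "alumni (low)": 0.54,
    "alumni (high)": 0.80,
    "administrators (low)": 0.47,
    "administrators (high)": 0.62,
}
for source, r in validity_coefficients.items():
    shared = r ** 2
    print(f"{source}: r = {r:.2f}, shared variance = {shared:.0%}, new information = {1 - shared:.0%}")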
Sources of evidence vs. decisions: Which come | |
first? | |
FLASHPOINT 2: Rating scales are typically administered and then confusion occurs over what to do | |
with the results and how to interpret them for specific | |
decisions. A better strategy would be to do exactly the | |
opposite of that practice. Spin your head around | |
180°, exorcist style. The decision should drive the
selection of the appropriate sources of evidence, the | |
types of data needed for the decision, and the design | |
of the report form. Custom tailor the sources, data, | |
and form to fit the decision. The information and | |
format of the evidence a professor needs to improve his | |
or her teaching are very different from that required | |
by a department chair or associate dean for annual | |
review (contract renewal or merit pay) or by a faculty | |
committee for promotion and tenure review. The | |
sources of evidence and formats of the reports can | |
either hinder or facilitate the decision process. | |
Types of decisions. According to Seldin (1999), teaching is | |
the major criterion (98%) in assessing overall faculty performance in liberal arts colleges compared to student advising | |
(64%), committee work (59%), research (41%), publications | |
(31%), and public service (24%). Although these figures may | |
not hold up in research universities and, specifically, in | |
medical schools/colleges, teaching didactic, and/or clinical | |
courses is still a critical job requirement and criterion on which | |
most faculty members are assessed. | |
There are two types of individual decisions in faculty | |
assessment with which you may already be familiar in the | |
context of student assessment, plus one decision about | |
programs: | |
(1) Formative decisions. These are decisions faculty make
to improve and shape the quality of their teaching. It is | |
based on evidence of teaching effectiveness they gather | |
to plan and revise their teaching semester after semester. This evidence and the subsequent adjustments in | |
teaching can occur anytime during the course, so the | |
students can benefit from those changes, or after the | |
course in preparation for the next course. | |
(2) Summative decisions. These decisions are rendered by
the administrative-type person who controls a professor’s destiny and future in higher education. This | |
individual is usually the dean, associate dean, program | |
director, or department head or chair. This administrator uses evidence of a professor’s teaching effectiveness | |
along with other evidence of research, publications, | |
clinical practice, and service to ‘‘sum up’’ his or her | |
overall performance or status to decide about contract | |
renewal or dismissal, annual merit pay, teaching | |
awards, and promotion and tenure. | |
Although promotion and tenure decisions are often | |
made by a faculty committee, a letter of recommendation by the dean is typically required to reach the | |
committee for review. These summative decisions are | |
high-stakes, final employment decisions reached at | |
different points in time to determine a professor’s | |
progression through the ranks and success as an | |
academician. | |
(3) Program decisions. Several sources of evidence can also
be used for program decisions, as defined in the | |
Program Evaluation Standards by the US Joint | |
Committee on Standards for Educational Evaluation | |
(Yarbrough et al. 2011). They relate to the curriculum, | |
admissions and graduation requirements, and program effectiveness. They are NOT individual decisions; | |
instead, they focus on processes and products. The | |
evidence usually is derived from various types of faculty | |
and student input and employers’ performance appraisal of students. It is also collected to provide documentation to satisfy the criteria for accreditation review. | |
Matching sources of evidence to decisions. The challenge is | |
to pick the most appropriate and highest quality sources of | |
evidence for the specific decision to be made; that is, match the | |
sources to the decision. The decision drives your choices of | |
evidence. Among the aforementioned 15 sources of evidence of | |
teaching effectiveness, here are my best picks based on the | |
literature for formative, summative, and program decisions: | |
Formative decisions
. student ratings,
. peer observations,
. peer review of course materials,
. external expert ratings,
. self-ratings,
. videos,
. student interviews, and
. mentor’s advice.
Summative decisions (annual review for contract renewal and merit pay)
. student ratings,
. self-ratings,
. teaching scholarship,
. administrator ratings,
. teaching portfolio (for several courses over the year),
. peer observation (report written expressly for summative decision),
. peer review of course materials (report written expressly for summative decision), and
. mentor’s review (progress report written expressly for summative decision).
Summative decisions (promotion and tenure)
. student ratings,
. self-ratings,
. teaching scholarship,
. administrator ratings,
. teaching portfolio (across several years’ courses),
. peer review (written expressly for summative decision), and
. mentor’s review (progress report written expressly for summative decision).
Program decisions | |
. Student ratings | |
. Exit and alumni ratings | |
. Employer ratings | |
The multiple sources identified for each decision can be | |
configured into the 360° multisource feedback (MSF) model of
assessment (Berk 2009a, 2009b) or other model for accreditation documentation of teaching assessment. The sources for | |
each decision may be added gradually to the model. This is an | |
on-going process for your institution. | |
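For institutions building such a model incrementally, one simple way to keep the decision-to-source pairings explicit is a lookup table like the sketch below; it only transcribes the picks listed above, and the key names and helper function are illustrative rather than part of the 360° MSF model itself.
# Illustrative lookup of evidence sources per decision type, transcribed from the lists above.
SOURCES_BY_DECISION = {
    "formative": [
        "student ratings", "peer observations", "peer review of course materials",
        "external expert ratings", "self-ratings", "videos",
        "student interviews", "mentor's advice",
    ],
    "summative (annual review)": [
        "student ratings", "self-ratings", "teaching scholarship", "administrator ratings",
        "teaching portfolio (courses over the year)", "peer observation report",
        "peer review of course materials report", "mentor's progress report",
    ],
    "summative (promotion and tenure)": [
        "student ratings", "self-ratings", "teaching scholarship", "administrator ratings",
        "teaching portfolio (several years' courses)", "peer review report",
        "mentor's progress report",
    ],
    "program": ["student ratings", "exit and alumni ratings", "employer ratings"],
}

def sources_for(decision):
    """Return the evidence sources matched to a given decision type (empty list if unknown)."""
    return SOURCES_BY_DECISION.get(decision, [])

print(sources_for("program"))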
Recommendations. So now that you have seen my picks, | |
which sources are you going to choose? So many sources, so | |
little time! Which sources are already available in your | |
department? What is the quality of the measures used to | |
provide evidence of teaching effectiveness? Are the faculty | |
stakeholders involved in the current process? | |
You have some decisions to make. Where do you begin? | |
Here are a few suggestions: | |
(1) Start with student ratings. Consider the content and quality of your current scale and determine whether it needs a minor or major tune-up for the decisions being made.
(2) Review the other sources of evidence with your faculty to decide the next steps. Which sources that reflect best practices in teaching will your faculty embrace? Weigh the pluses and minuses of the different sources.
(3) Decide which combination of sources is best for your faculty. Identify which sources should be used for both formative and summative decisions, such as self- and peer ratings, and which sources should be used for one type of decision but not the other, such as administrator ratings and teaching portfolio.
(4) Map out a plan to build those sources, one at a time, to create an assessment model for each decision (see Berk 2009a).
Whatever combination of sources you choose to use, take | |
the time and make the effort to design the scales, administer | |
the scales, and report the results appropriately. The accuracy | |
of faculty assessment decisions depends on the integrity of the | |
process and the validity and reliability of the multiple sources | |
of evidence you collect. This endeavor may seem rather | |
formidable, but, keep in mind, you are not alone in this | |
process. Your colleagues at other institutions are probably | |
struggling with the same issues. Maybe you could pool | |
resources. | |
Quality of ‘‘home-grown’’ rating scales vs. | |
commercially-developed scales | |
FLASHPOINT 3: Many of the rating scales developed by faculty committees in medical schools/ | |
colleges and universities do not meet even the most | |
basic criteria for psychometric quality required by | |
professional and legal standards. Most of the scales | |
are flawed internally, administered incorrectly, and | |
rarely is there any evidence of score reliability and | |
validity. The serious concern is that decisions about | |
the careers of faculty are being made with these | |
instruments. | |
Quality control. Researchers have reviewed the quality of | |
student rating scales used by colleges and universities | |
throughout the US and Canada (Berk 1979, 2006; Franklin & | |
Theall 1990; d’Apollonia & Abrami 1997b, 1997c; Seldin 1999; | |
Theall & Franklin 2000; Abrami 2001; Franklin 2001; Ory & | |
Ryan 2001; Arreola 2007; Gravestock & Gregor-Greenleaf | |
2008). The instruments are either commercially developed | |
scales with pre-designed reporting forms or ‘‘home-grown,’’ | |
locally constructed measures built usually by faculty committees. The former exhibit the quality control of the company | |
that developed the scales and reports, such as Educational | |
Testing Service and The IDEA Center (see Flashpoint 4); the | |
latter have no consistency in the development process and | |
rarely any formal procedures for controlling psychometric | |
quality. | |
Quality of ‘‘home-grown’’ scales. That lack of quality control | |
may very well extend to institutions worldwide. It could be | |
due to a lack of commitment, importance, accountability, or | |
interest; inappropriate personnel without the essential skills; or | |
limited resources. No one knows for sure. Regardless of the | |
reason, the picture is ugly. | |
Reviewers of practices at institutions in North America have | |
found the following problems with ‘‘home-grown’’ scales: | |
. poor or no specifications of teaching behaviors,
. faulty items (statements and anchors),
. ambiguous or confusing directions,
. unstandardized administration procedures,
. inappropriate data collection, analysis, and reporting,
. no adjustments in ratings for extraneous factors,
. no psychometric studies of score reliability and validity, and
. no guidelines or training for faculty and administrators to use the results correctly for appropriate decisions.
Does the term psychometrically putrid summarize current | |
practices? How does your scale stack up against those | |
problems? Fertilizer-wise, ‘‘home-grown’’ scales are not growing. Their development is arrested. They are more like ‘‘Peter | |
Pan scales.’’ | |
The potential negative consequences of using faulty | |
measures to make biased and unfair decisions to guide | |
teaching improvement and faculty careers can be devastating. | |
Moreover, this assessment only addresses the quality of | |
student rating scales. What would be the quality of peer | |
observations, self-ratings, and administrator ratings and their | |
interpretations? Serious attention needs to be devoted to the | |
quality control of all ‘‘home-grown’’ scales. | |
From a broader perspective, poor-quality scales violate US
testing/scaling standards according to the Standards for | |
Educational and Psychological Testing (AERA, APA, & | |
NCME Joint Committee on Standards 1999), Personnel | |
Evaluation Standards (Joint Committee on Educational | |
Evaluation Standards 2009), and the US Equal Employment | |
Opportunity Commission’s (EEOC) Uniform Guidelines on | |
Employee Selection Procedures (US Equal Employment | |
Opportunity Commission 2010). The psychometric requirements for instruments used for summative ‘‘employment’’ | |
decisions about faculty are rigorous and appropriate for their | |
purposes. | |
Recommendations. This issue reduces to the leadership and | |
the composition of the faculty committee that accepts the | |
responsibility to develop the scales and reports and/or the | |
external consultant or vendor hired to guide the development | |
process. The psychometric standards for the construction, | |
administration, analysis, and interpretation of scales must be | |
articulated and guided by professionals trained in those | |
standards (AERA, APA, & NCME Joint Committee on Standards | |
1999). As Flashpoint 2 emphasized, if the committee does not | |
contain one or more professors with expertise in psychometrics, then it should be ashamed of itself. That is a prescription | |
for putridity and the previous problem list. Reviewers rarely | |
found any one with these skills on the committees of the | |
institutions surveyed. | |
It is also recommended that all faculty members be given | |
workshops on item writing and scale structure. In the | |
development process, they will be reviewing, selecting, | |
critiquing, adapting, and writing items. Even if faculty are | |
excellent test item writers, that does not mean they can write | |
scale items. | |
The structure and criteria for writing scale items are very | |
different from test items (Berk 2006), not difficult, just different. | |
Even with commercially developed instruments, professors are | |
usually given the option to add up to 10 course-specific items; | |
in other words, they will need to write items. Rules for writing | |
scale items are available in references on scale construction | |
(Netemeyer et al. 2003; Dunn-Rankin et al. 2004; Streiner & | |
Norman 2008; Berk 2006; deVellis 2012). | |
Paper-and-pencil vs. online scale administration
FLASHPOINT 4: The battle between paper-and-pencil versus online administration of student rating
scales is still being fought in medical schools and on | |
many campuses worldwide. Despite an international trend and numerous advantages and | |
improvements in online systems over the past | |
decade, there are faculty who still dig their heels in | |
and institutions that have resisted the conversion. | |
Much has been learned about how to increase | |
response rates, which is a flashpoint by itself, and | |
how to overcome many of the deterrents to adopting | |
an online system. Online administration, analysis, | |
and reporting can be executed in-house or by an | |
out-house vendor specializing in that processing. | |
Comparison of paper-and-pencil and online administration. A detailed examination of the advantages and disadvantages of the two modes of administration according to | |
15 key factors has been presented by Berk (2006). There are | |
major differences between them. Although it was concluded | |
that both are far from perfect, the benefits of the online mode | |
and the improvements in the delivery system with the research | |
and experiences over the past few years exceed the pluses of the | |
paper-based mode. Furthermore, most Net Geners do not | |
know what a pencil is. Unless it is an iPencil, it is not on their | |
radar or part of their mode. | |
The benefits of the online mode include ease of administration, administration flextime, low cost, rapid turnaround | |
time for results, ease of scale revision, and higher quality and | |
greater quantity of unstructured responses (Sorenson & | |
Johnson 2003; Anderson et al. 2005; Berk 2006; Liu 2006; | |
Heath et al. 2007). Students’ concerns with lack of anonymity, | |
confidentiality of ratings, inaccessibility, inconvenience, and | |
technical problems have been eliminated at many institutions. | |
Faculty resistance issues of low response rates and negative | |
bias and lower ratings than paper-based version have been | |
addressed (Berk 2006). Two major topics that still need | |
attention are lack of standardization (Flashpoint 5) and | |
response bias, which tends to be the same for both paper | |
and online. | |
Three online delivery options. Online administration, scoring, | |
analysis, and reporting of student ratings can be handled in | |
three ways: (1) in-house by the department of computer | |
services, IT, or equivalent unit; (2) out-house by a vendor that | |
provides all delivery services for the institution’s ‘‘home-grown’’ scale; or (3) out-house by a vendor that provides all services, plus their own scale or a scale you create from their catalog of items. These options are listed in order of increasing cost. Depending on in-house resources, it is possible to execute the entire processing in a very cost-effective manner. Alternatively, estimates from a variety of vendors should be obtained for the out-house options.
(1) In-house administration. If you have developed or
plan to develop your own scale, you should consider | |
this option. Convene the key players who can make | |
this happen, including administrators and staff from IT | |
or computer services, faculty development, and a | |
testing center, plus at least one measurement expert. | |
A discussion of scale design, scoring, analysis, report | |
design, and distribution can determine initially | |
whether the resources are available to execute the | |
system. Once a preliminary assessment of the resources | |
required has been completed, costs should be estimated for each phase. A couple of meetings can | |
provide enough information to consider the possibility. | |
Your in-house system components, products, and | |
personnel can then be compared to the two options | |
described next. As you go shopping for an online | |
system, at least you will have done your homework and | |
be able to identify what the commercial vendors offer, | |
including qualitative differences, that you cannot execute yourself. Although the cost could be the dealbreaker, you will know all the options available to | |
make an informed final decision. Further, you can | |
always change your system if your stocks plummet, the | |
in-house operation has too many bumps that cannot be | |
squished and ends up in Neverland, or the commercial | |
services do not deliver as promised. | |
(2) Vendor administration with ‘‘home-grown’’ scale.
If outsourcing to a vendor is your preference or you | |
just want to explore this option, but you want to | |
maintain control over your own scale content and | |
structure, there are certain vendors that can online your | |
scale. For some strange reason, they are all located in | |
Madagascar. Kidding. They include CollegeNET (What | |
Do You Think?), ConnectEDU (courseval), and IOTA | |
Solutions (MyClassEvaluation). They will administer | |
your scale online, perform all analyses, and generate | |
reports for different decision makers. Thoroughly | |
compare all of their components with yours. Evaluate | |
the pluses and minuses of each package. | |
Make sure to investigate the compatibility of the | |
packages with your course management system. The | |
choice of the system is crucial to provide the anonymity | |
for students to respond, which can boost response rates | |
(Oliver & Sautter 2005). Most of the vendors’ packages | |
are compatible with Blackboard, WebCT, Moodle, | |
Sakai, and other campus portal systems. | |
There are even free online survey providers, such as | |
Zoomerang (MarketTools 2006), which can be used | |
easily by any instructor without a course management | |
system (Hong 2008). Other online survey software, both | |
free and pay, has been reviewed by Wright (2005). | |
There are specific advantages and disadvantages of the | |
different packages, especially with regard to rating | |
scale structure and reporting score results (Hong 2008). This is a viable online option worth investigating for formative feedback.
(3) Vendor administration and rating scale. If you want a
vendor to supply the rating scale and all of the delivery | |
services, there are several commercial student rating | |
systems you should consider. Examples include Online | |
Course Evaluation, Student Instructional Report II, | |
Course/Instructor Evaluation Questionnaire, IDEA | |
Student Ratings of Instruction, Student Evaluation of | |
Educational Quality, Instructional Assessment System, | |
and Purdue Instructor Course Evaluation Service. | |
Sample forms and lists of services with prices are | |
given on the websites for these scales. | |
This is the simplest solution to the student rating | |
scale online system: Just go buy one. The seven | |
packages are designed for you, Professor Consumer. | |
The items are professionally developed; the scale has | |
usually undergone extensive psychometric analyses to | |
provide evidence of reliability and validity; and there | |
are a variety of services provided, including the scale, | |
online administration, scanning, scoring, and reporting | |
of results in a variety of formats with national norms. | |
For some, you can access a specimen set of rating | |
scales and report forms online. All of the vendors | |
provide a list of services and prices on their websites. | |
Carefully shop around to find the best fit for your | |
faculty and administrator needs and institutional | |
culture. The packages vary considerably in scale | |
design, administration options, report forms, norms, | |
and, of course, cost. | |
Comparability of paper-and-pencil and online ratings. | |
Despite all of the differences between paper-based and | |
online administrations and the contaminating biases that afflict | |
the ratings they produce, researchers have found consistently | |
that online students and their in-class counterparts rate | |
courses and instructors similarly (Layne et al. 1999; Spooner | |
et al. 1999; Waschull 2001; Carini et al. 2003; Hardy 2003; | |
McGee & Lowell 2003; Dommeyer et al. 2004; Avery et al. | |
2006; Benton et al. 2010b; Venette et al. 2010; Perrett 2011; | |
Stowell et al. 2012). The ratings on the structured items are not | |
systematically higher or lower for online administrations. The | |
correlations between online and paper-based global item | |
ratings were 0.84 (overall instructor) and 0.86 (overall course) | |
(Johnson 2003). | |
Although the ratings for online and paper are not identical, | |
with more than 70% of the variance in common, any | |
differences in ratings that have been found are small. | |
Further, interrater reliabilities of ratings of individual items | |
and item clusters for both modalities were comparable (McGee | |
& Lowell 2003), and so were the underlying factor structures | |
(Layne et al. 1999; Leung & Kember 2005). All of these | |
similarities were also found in comparisons between face-to-face and online courses, although response rates were slightly
lower in the online courses (Benton et al. 2010a). | |
Alpha total scale (18 items) reliabilities were similar for | |
paper-based (0.90) and online (0.88) modes when all items | |
appeared on the screen (Peer & Gamliel 2011). Slightly lower | |
coefficients (0.74–0.83) for online displays of one, two, or | |
four items only on the screen were attributable to response | |
bias (Gamliel & Davidovitz 2005; Berk 2010; Peer & | |
Gamliel 2011). | |
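For readers unfamiliar with the coefficient being compared here, Cronbach's alpha is an internal-consistency estimate computed from the item variances and the variance of the total score. The sketch below computes it for simulated 18-item, 5-point ratings; the data are invented for illustration and have no connection to the cited studies.
import numpy as np

def cronbach_alpha(item_scores):
    """Cronbach's alpha for an (n_respondents, n_items) array of ratings."""
    item_scores = np.asarray(item_scores, dtype=float)
    k = item_scores.shape[1]
    item_variances = item_scores.var(axis=0, ddof=1)
    total_variance = item_scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

rng = np.random.default_rng(0)
# Simulated ratings: 200 respondents, 18 items on a 1-5 scale, driven by one latent factor.
latent = rng.normal(size=(200, 1))
ratings = np.clip(np.round(3 + latent + rng.normal(scale=0.8, size=(200, 18))), 1, 5)
print(f"alpha = {cronbach_alpha(ratings):.2f}")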
The one exception to the above similarities is the unstructured items, or open-ended comment section. The research | |
has indicated that the flexible time permitted to the onliners | |
usually, but not always, yields longer, more frequent and | |
thoughtful comments than those of in-class respondents | |
(Layne et al. 1999; Ravelli 2000; Johnson 2001, 2003; Hardy | |
2002, 2003; Anderson et al. 2005; Donovan et al. 2006; Venette | |
et al. 2010; Morrison 2011). Typing the responses is reported | |
by students to be easier and faster than writing them, plus it | |
preserves their anonymity (Layne et al. 1999; Johnson 2003). | |
Recommendations. Weighing all of the pluses and minuses | |
in this section strongly suggests that the conversion from a | |
paper-based to online administration system seems worthy of | |
serious consideration by medical schools/colleges and | |
every other institution of higher education using student | |
ratings. When the concerns of the online approach are | |
addressed, its benefits for face-to-face, blended/hybrid, and | |
online/distance courses far outweigh those of the traditional paper-based approach. (NOTE: Online administration should also be
employed for alumni ratings and employer ratings. The costs | |
for these ratings will be a small fraction of the cost of the | |
student rating system.) | |
Standardized vs. unstandardized online scale | |
administration | |
FLASHPOINT 5: Standardized administration | |
procedures for any measure of human or rodent | |
behavior are absolutely essential to be able to | |
interpret the ratings with the same meaning for all | |
individuals who completed the measure. Student | |
rating scales are typically administered online at the | |
end of the semester without regard for any standardization or controls. There do not seem to be any sound psychometric reasons why the administrations are scheduled the way they are. This is,
perhaps, the most neglected issue in the literature | |
and in practice. | |
Importance of standardization. A significant amount of | |
attention has been devoted to establishing standardized | |
times, conditions, locations, and procedures for administering | |
in-class tests and clinical measures, such as the OSCE, as well | |
as out-of-class admissions, licensing, and certification tests. | |
National standards for testing practices require this standardization to assure that students take tests under identical | |
conditions so their scores can be interpreted in the same way, | |
they are comparable from one student or group to another, | |
and they can be compared to norms (AERA, APA, & NCME | |
Joint Committee on Standards 1999). | |
Unfortunately, standardization has been completely | |
neglected in the faculty evaluation literature for the administration of online student rating scales (Berk 2006). This topic | |
was only briefly mentioned in a recent review of the student | |
ratings research (Addison & Stowell 2012). Although the | |
inferences drawn from the scale scores and other measures of | |
teaching effectiveness require the same administration precision as tests, procedures to assure scores will have the same | |
meaning from students completing the scales at the end of the | |
semester have not been addressed in research and practice. | |
Typically, students are given notice that they have 1 or 2 | |
weeks to complete the student ratings form with the deadline | |
before or after the final exam/project. | |
Confounding uncontrolled factors. Since students can complete online rating scales during their discretionary time, there | |
is no control over the time, place, conditions, or any | |
situational factors under which the self-administrations occur | |
(Stowell et al. 2012). Most of these factors were controlled with | |
the paper-and-pencil, in-class administration by the instructor | |
or a student appointed to handle the administration. | |
In fact, in the online mode, there is no way to insure that | |
the real student filled out the form or did not discuss it with | |
someone who already did. It could be a roommate, partner, | |
avatar, alien, student who has never been to class doing a | |
favor in exchange for a pizza, alcohol, or drugs, or all of the | |
preceding. Any of those substitutes would result in fraudulent | |
ratings (Standard 5.6). Bad, bad ratings! Although there is no | |
standardization of the actual administration, at least the written | |
directions given to all students can be the same. Therefore, the | |
procedures that the students follow should be similar if they | |
read the directions. | |
Timing of administration. The timing of the administration | |
can also markedly affect the ratings. For example, if some | |
students complete the scale before the final review and final | |
exam, on the day of the final, or after the exam, their feelings | |
about the instructor/course can be very different. Exposure to | |
the final exam alone can significantly affect ratings, | |
particularly if there are specific items on the scale measuring | |
testing and evaluation methods. It could be argued that the | |
final should be completed in order to provide a true rating of | |
all evaluation methods. | |
Despite a couple of ‘‘no difference’’ studies of paper-and-pencil administrations almost 40 years ago (Carrier et al. 1974;
Frey 1976) and one study examining final exam day administration (Ory 2001), which produced lower ratings, there does | |
not seem to be any agreement among the experts on the best | |
time to administer online scales or on any specific standardization procedures other than directions. | |
What is clear is that whatever time is decided must be the | |
same for all students in all courses; otherwise, the ratings of | |
these different groups of students will not have the same | |
meaning. For example, faculty within a department should | |
agree that all online administrations must be completed before | |
the final or after, but not both. Faculty must decide on the best | |
time to get the most accurate ratings. That decision will also | |
affect the legitimacy of any comparison of the ratings to | |
different norm groups. | |
Standards for standardization. So what is the problem with | |
the lack of standardization? The ratings by students are | |
assumed to be collected under identical conditions according | |
to the same rules and directions. Standardization of the | |
administration and environment provides a snapshot of how
students feel at one point in time. Although their individual | |
ratings will vary, they will have the same meaning. Rigorous | |
procedures for standardization are required by the US | |
Standards for Educational and Psychological Testing (AERA, | |
APA, & NCME Joint Committee on Standards 1999). | |
Groups of students must be given identical instructions, | |
which is possible, administered the scale under identical | |
conditions, which is nearly impossible, to assure the comparability of their ratings (Standards 3.15, 3.19, and 3.20). Only | |
then would the interpretation of the ratings and, in this case, | |
the inferences about teaching effectiveness from the ratings be | |
valid and reliable (Standard 3.19). In other words, without | |
standardization, such as when every student fills out the scale | |
willy-nilly at different times of the day and semester, in | |
different places, under different conditions, using different | |
procedures, the ratings from student to student and professor | |
to professor will not be comparable. | |
Recommendations. Given the limitations of online administration, what can be done to approximate some semblance of | |
standardized conditions or, at least, minimize the extent to | |
which the bad conditions contaminate the ratings? Here are a | |
few options extended from Berk’s (2006) previous suggestions, listed from highest level of standardization and control to | |
lowest level: | |
(1) In-class administration before final for maximum control: Set a certain time slot in class, just like the paper-and-pencil version, for students to complete the forms on their own PC/Mac, iPad, iPhone, iPencil, or other device. The professor should leave the room and have a student execute and monitor the process. Adequate time should be given for students to type comments for the unstructured section of the scale. (NOTE: Not recommended if there are several items or a subscale that measures course evaluation methods, since the final is part of those methods.)
(2) Computer lab time slots before or after final: Set certain time slots in the computer lab or an equivalent location during which students can complete the forms. The controls exercised in the previous option should be followed in the lab. If available, techie-type students should proctor the slots to eliminate distractions and provide technical support for any problems that arise.
(3) One or two days before or after final at students’ discretion: This is the most loosy-goosy option with the least control, albeit, the most popular. Specify a narrow window within which the ratings must be completed, such as one or two days after the final class and before the final exam, or one or two days after the exam before grades are submitted and posted. This gives new meaning to ‘‘storm window.’’
Any of these three options will improve the standardization | |
of current online administration practices beyond the typical | |
1- or 2-week bay window. Experience and research on these | |
procedures will hopefully identify the confounding variables | |
that can affect the online ratings. Ultimately, concrete | |
guidelines to assist faculty in deciding on the most appropriate | |
administration protocol will result. | |
Declaration of interest: The author reports no conflicts of
interest. The author alone is responsible for the content and | |
writing of the article. | |
Top five recommendations | |
After ruminating over these flashpoints, it can be concluded | |
that there are a variety of options within the reach of every | |
medical school/college and institution of higher education to | |
improve its current practices with its source(s) of evidence and | |
administration procedures. Everyone is wrestling with these | |
issues and, although more research is needed to test the | |
options, there are tentative solutions to these problems. As | |
experience and research continue to accumulate, even better | |
solutions will result. | |
There is a lot of activity and discourse on these flashpoints | |
because we know that all of the summative decisions about | |
faculty will be made with or without the best information | |
available. Further, professors who are passionate about | |
teaching will also seek out sources of evidence to guide | |
their improvement. | |
The contribution of this PBW article rests on the value and | |
usefulness of the recommendations that you can convert into | |
action. Without action, the recommendations are just dead | |
words on a page. Your TAKE-AWAYS are the concrete action | |
steps you choose to implement to improve the current state of | |
your teaching assessment system. | |
Here are the top five recommendations framed in terms of | |
action steps: | |
(1) polish your student rating scale, but also start building additional sources of evidence, such as self, peer, and mentor scales, to assess teaching effectiveness;
(2) match your highest quality sources to the specific formative and summative decisions using the 360° MSF model;
(3) review current measures of teaching effectiveness with your faculty and plan specifically how you can improve their psychometric quality;
(4) design an online administration system in-house or out-house with a vendor to conduct the administration and score reporting for your own student rating scale or the one it provides; and
(5) standardize directions, administration procedures, and a narrow window for completion of your student rating scale and other measures of teaching effectiveness.
Taking action on these five can yield major strides in | |
improving the practice of assessing teaching effectiveness and | |
the fairness and equity of the formative and summative | |
decisions made with the results. Just how important is | |
teaching in your institution? Your answer will be expressed | |
in your actions. What can you contribute to make it better than it has ever been? That is my challenge to you.
Notes on contributor | |
RONALD A. BERK, PhD, is Professor Emeritus, Biostatistics and | |
Measurement, and former Assistant Dean for Teaching at the Johns | |
Hopkins University, where he taught for 30 years. He has presented 400 | |
keynotes/workshops and published 14 books, 165 journal articles, and 300 | |
blogs. His professional motto is: ‘‘Go for the Bronze!’’ | |
References | |
Medical/healthcare education | |
Ahmady S, Changiz T, Brommels M, Gaffney FA, Thor J, Masiello I. 2009. | |
Contextual adaptation of the Personnel Evaluation Standards for | |
assessing faculty evaluation systems in developing countries: The | |
case of Iran. BMC Med Educ 9(18), DOI: 10.1186/1472-6920-9-18. | |
Anderson HM, Cain J, Bird E. 2005. Online student course evaluations: | |
Review of literature and a pilot study. Am J Pharm Educ 69(1):34–43. | |
Available from http://web.njit.edu/bieber/pub/Shen-AMCIS2004.pdf. | |
Appling SE, Naumann PL, Berk RA. 2001. Using a faculty evaluation triad to | |
achieve evidenced-based teaching. Nurs Health Care Perspect | |
22:247–251. | |
Barnett CW, Matthews HW. 2009. Teaching evaluation practices in colleges | |
and schools of pharmacy. Am J Pharm Educ 73(6). | |
Barnett CW, Matthews HW, Jackson RA. 2003. A comparison between | |
student ratings and faculty self-ratings of instructional effectiveness. | |
J Pharm Educ 67(4). | |
Berk RA. 2009a. Using the 360 multisource feedback model to evaluate | |
teaching and professionalism. Med Teach 31(12):1073–1080. | |
Berk RA, Naumann PL, Appling SE. 2004. Beyond student ratings: Peer | |
observation of classroom and clinical teaching. Int J Nurs Educ | |
Scholarsh 1(1):1–26. | |
Boerboom TBB, Mainhard T, Dolmans DHJM, Scherpbier AJJA, van | |
Beukelen P, Jaarsma ADC. 2012. Evaluating clinical teachers with the | |
Maastricht clinical teaching questionnaire: How much ‘teacher’ is in | |
student ratings? Med Teach 34(4):320–326. | |
Chenot J-F, Kochen MM, Himmel W. 2009. Student evaluation of a primary | |
care clerkship: Quality assurance and identification of potential for | |
improvement. BMC Med Educ 9(17), DOI: 10.1186/1472-6920-9-17. | |
DiVall M, Barr J, Gonyeau M, Matthews SJ, van Amburgh J, Qualters D, | |
Trujillo J. 2012. Follow-up assessment of a faculty peer observation and | |
evaluation program. Am J Pharm Educ 76(4). | |
Donnon T, Delver H, Beran T. 2010. Student and teaching characteristics | |
related to ratings of instruction in medical sciences graduate programs. | |
Med Teach 32(4):327–332. | |
Elzubeir M, Rizk D. 2002. Evaluating the quality of teaching in medical | |
education: Are we using the evidence for both formative and | |
summative purposes? Med Teach 24:313–319. | |
Hoeks TW, van Rossum HJ. 1988. The impact of student ratings on a new | |
course: The general clerkship (ALCO). Med Educ 22(4):308–313. | |
Jones RF, Froom JD. 1994. Faculty and administration views of problems in | |
faculty evaluation. Acad Med 69(6):476–483. | |
Kidd RS, Latif DA. 2004. Student evaluations: Are they valid measures of | |
course effectiveness? J Pharm Educ 68(3). | |
Maker VK, Lewis MJ, Donnelly MB. 2006. Ongoing faculty evaluations: | |
Developmental gain or just more pain? Curr Surg 63(1):80–84. | |
Martens MJ, Duvivier RJ, van Dalen J, Verwijnen GM, Scherpbier AJ, van der | |
Vleuten. 2009. Student views on the effective teaching of physical | |
examination skills: A qualitative study. Med Educ 43(2):184–191. | |
Mazor K, Clauser B, Cohen A, Alper E, Pugnaire M. 1999. The dependability | |
of students’ rating of preceptors. Acad Med 74:19–21. | |
Pattison AT, Sherwood M, Lumsden CJ, Gale A, Markides M. 2012. | |
Foundation observation of teaching project – A developmental model | |
of peer observation of teaching. Med Teach 34(2):e36–e142. | |
Pierre RB, Wierenga A, Barton M, Branday JM, Christie CD. 2004. | |
Student evaluation of an OSCE in paediatrics at the University of | |
the West Indies, Jamaica. BMC Med Educ 4(22), DOI: 10.1186/1472-6920-4-22.
Schiekirka S, Reinhardt D, Heim S, Fabry G, Pukrop T, Anders S, | |
Raupach T. 2012. Student perceptions of evaluation in undergraduate | |
medical education: A qualitative study from one medical school. | |
BMC Med Educ 12(45), DOI: 10.1186/1472-6920-12-45. | |
Siddiqui ZS, Jonas-Dwyer D, Carr SE. 2007. Twelve tips for peer | |
observation of teaching. Med Teach 29(4):297–300. | |
Stalmeijer RE, Dolmans DH, Wolfhagen IH, Peters WG, van Coppenolle L, | |
Scherpbier AJ. 2010. Combined student ratings and self-assessment | |
provide useful feedback for clinical teachers. Adv Health Sci Educ | |
Theory Pract 15(3):315–328. | |
Stark P. 2003. Teaching and learning in the clinical setting: A qualitative | |
study of the perceptions of students and teachers. Med Educ | |
37(11):975–982. | |
Steinert Y. 2004. Student perceptions of effective small group teaching. Med | |
Educ 38(3):286–293. | |
Sullivan PB, Buckle A, Nicky G, Atkinson SH. 2012. Peer observation of | |
teaching as a faculty development tool. BMC Med Educ 12(26), DOI: | |
10.1186/1472-6920-12-26. | |
Turhan K, Yaris F, Nural E. 2005. Does instructor evaluation by students | |
using a web-based questionnaire impact instructor performance? Adv | |
Health Sci Educ Theory Pract 10(1):5–13. | |
Wellein MG, Ragucci KR, Lapointe M. 2009. A peer review process for | |
classroom teaching. Am J Pharm Educ 73(5). | |
General higher education | |
Abrami PC. 2001. Improving judgments about teaching effectiveness using | |
rating forms. In: Theall M, Abrami PC, Mets LA, editors. The student | |
ratings debate: Are they valid? How can we best use them? (New | |
Directions for Institutional Research, No. 109). San Francisco, CA: | |
Jossey-Bass. pp 59–87. | |
Addison WE, Stowell JR. 2012. Conducting research on student evaluations | |
of teaching. In: Kite ME, editor. Effective evaluation of teaching: A guide | |
for faculty and administrators. pp 1–12. E-book [Accessed 6 June 2012] | |
Available from the Society for the Teaching of Psychology website | |
http://teachpsych.org/ebooks/evals2012/index.php. | |
AERA (American Educational Research Association), APA (American Psychological Association), NCME (National Council on Measurement in Education) Joint Committee on Standards. 1999. Standards for educational and psychological testing. Washington, DC: AERA.
Ali DL, Sell Y. 1998. Issues regarding the reliability, validity and utility of | |
student ratings of instruction: A survey of research findings. Calgary, | |
AB: University of Calgary APC Implementation Task Force on Student | |
Ratings of Instruction. | |
Arreola RA. 2007. Developing a comprehensive faculty evaluation system: | |
A handbook for college faculty and administrators on designing and | |
operating a comprehensive faculty evaluation system. 3rd ed. Bolton, | |
MA: Anker. | |
Avery RJ, Bryan WK, Mathios A, Kang H, Bell D. 2006. Electronic course | |
evaluations: Does an online delivery system influence student | |
evaluations? J Econ Educ 37(1):21–37. | |
Benton SL, Cashin WE. 2012. Student ratings of teaching: A summary | |
of research and literature (IDEA Paper no. 50). Manhattan, KS: | |
The IDEA Center. [Accessed 8 April 2012] Available from http:// | |
www.theideacenter.org/sites/default/files/idea-paper_50.pdf. | |
Benton SL, Webster R, Gross A, Pallett W. 2010a. An analysis of IDEA | |
Student Ratings of Instruction in traditional versus online courses (IDEA | |
Technical Report no. 15). Manhattan, KS: The IDEA Center. | |
Benton SL, Webster R, Gross A, Pallett W. 2010b. An analysis of IDEA | |
Student Ratings of Instruction using paper versus online survey | |
methods (IDEA Technical Report no. 16). Manhattan, KS: The IDEA | |
Center. | |
Berk RA. 1979. The construction of rating instruments for faculty | |
evaluation: A review of methodological issues. J Higher Educ | |
50:650–669. | |
Berk RA. 2005. Survey of 12 strategies to measure teaching effectiveness. | |
Int J Teach Learn Higher Educ 17(1):48–62. Available from http://www.isetl.org/ijtlhe.
Berk RA. 2006. Thirteen strategies to measure college teaching: | |
A consumer’s guide to rating scale construction, assessment, and | |
decision making for faculty, administrators, and clinicians. Sterling, VA: | |
Stylus. | |
Berk RA. 2009b. Beyond student ratings: ‘‘A whole new world, a new | |
fantastic point of view.’’ Essays Teach Excellence 20(1). Available from | |
http://podnetwork.org/publications/teachingexcellence.htm. | |
Berk RA. 2010. The secret to the ‘‘best’’ ratings from any evaluation scale. | |
J Faculty Dev 24(1):37–39. | |
Braskamp LA, Ory JC. 1994. Assessing faculty work: Enhancing individual | |
and institutional performance. San Francisco, CA: Jossey-Bass. | |
Calderon TG, Gabbin AL, Green BP. 1996. Report of the committee on | |
promoting evaluating effective teaching. Harrisonburg, VA: James | |
Madison University. | |
Carini RM, Hayek JC, Kuh GD, Ouimet JA. 2003. College student responses | |
to web and paper surveys: Does mode matter? Res Higher Educ | |
44(1):1–19. | |
Carrier NA, Howard GS, Miller WG. 1974. Course evaluations: When? | |
J Educ Psychol 66:609–613. | |
Cashin WE. 2003. Evaluating college and university teaching: Reflections of | |
a practitioner. In: Smart JC, editor. Higher education: Handbook of | |
theory and research. Dordrecht, the Netherlands: Kluwer Academic | |
Publishers. pp 531–593. | |
Centra JA. 1993. Reflective faculty evaluation: Enhancing teaching and | |
determining faculty effectiveness. San Francisco: Jossey-Bass. | |
Cohen PA, McKeachie WJ. 1980. The role of colleagues in the evaluation of | |
teaching. Improving College Univ Teach 28(4):147–154. | |
Coren S. 2001. Are course evaluations a threat to academic freedom? | |
In: Kahn SE, Pavlich D, editors. Academic freedom and the inclusive | |
university. Vancouver, BC: University of British Columbia Press. | |
pp 104–117. | |
d’Apollonia S, Abrami PC. 1997a. Navigating student ratings of instruction. | |
Am Psychol 52:1198–1208. | |
d’Apollonia S, Abrami PC. 1997b. Scaling the ivory tower, part 1: | |
Collecting evidence of instructor effectiveness. Psychol Teach Rev | |
6:46–59. | |
d’Apollonia S, Abrami PC. 1997c. Scaling the ivory tower, part 2: | |
Student ratings of instruction in North America. Psychol Teach | |
Rev 6:60–76. | |
deVellis RF. 2012. Scale development: Theory and applications. 3rd ed. | |
Thousand Oaks, CA: Sage. | |
Dommeyer CJ, Baum P, Hanna RW, Chapman KS. 2004. Gathering | |
faculty teaching evaluations by in-class and online surveys: Their | |
effects on response rates and evaluations. Assess Eval Higher | |
Educ 29(5):611–623. | |
Donovan J, Mader CE, Shinsky J. 2006. Constructive student feedback: | |
Online vs. traditional course evaluations. J Interact Online Learn | |
5:283–296. | |
Dunn-Rankin P, Knezek GA, Wallace S, Zhang S. 2004. Scaling methods. | |
Mahwah, NJ: Erlbaum. | |
Franklin J. 2001. Interpreting the numbers: Using a narrative to help others | |
read student evaluations of your teaching accurately. In: Lewis KG, | |
editor. Techniques and strategies for interpreting student evaluations | |
(Special issue) (New Directions for Teaching and Learning, No. 87). | |
San Francisco, CA: Jossey-Bass. pp 85–100. | |
Franklin J, Theall M. 1990. Communicating student ratings to decision | |
makers: Design for good practice. In: Theall M, Franklin J, editors. | |
Student ratings of instruction: Issues for improving practice (Special | |
issue) (New Directions for Teaching and Learning, No. 43). San | |
Francisco, CA: Jossey-Bass. pp 75–93. | |
Frey PW. 1976. Validity of student instructional ratings as a function of their | |
timing. J Higher Educ 47:327–336. | |
Freyd M. 1923. A graphic rating scale for teachers. J Educ Res | |
8(5):433–439. | |
Gamliel E, Davidovitz L. 2005. Online versus traditional teaching | |
evaluation: Mode can matter. Assess Eval Higher Educ 30(6): | |
581–592. | |
Gravestock P, Gregor-Greenleaf E. 2008. Student course evaluations: | |
Research, models and trends. Toronto, ON: Higher Education | |
Quality Council of Ontario. E-book [Accessed 6 May 2012] Available | |
from http://www.heqco.ca/en-CA/Research/Research%20Publications/ | |
Pages/Home.aspx. | |
Green BP, Calderon TG, Reider BP. 1998. A content analysis of teaching | |
evaluation instruments used in accounting departments. Issues Account | |
Educ 13(1):15–30. | |
Hardy N. 2002. Perceptions of online evaluations: Fact and fiction. Paper | |
presented at the annual meeting of the American Educational Research | |
Association, April 1–5 2002, New Orleans, LA. | |
Hardy N. 2003. Online ratings: Fact and fiction. In: Sorenson DL, Johnson | |
TD, editors. Online student ratings of instruction (New Directions for | |
Teaching and Learning, No. 96). San Francisco, CA: Jossey-Bass. | |
pp 31–38. | |
Heath N, Lawyer S, Rasmussen E. 2007. Web-based versus paper and | |
pencil course evaluations. Teach Psychol 34(4):259–261. | |
Hong PC. 2008. Evaluating teaching and learning from students’ | |
perspectives in their classroom through easy-to-use online surveys. | |
Int J Cyber Soc Educ 1(1):33–48. | |
Hoyt DP, Pallett WH. 1999. Appraising teaching effectiveness: Beyond | |
student ratings (IDEA Paper no. 36). Manhattan, KS: Kansas State | |
University Center for Faculty Evaluation and Development. | |
Johnson TD. 2001. Online student ratings: Research and possibilities. | |
Invited plenary presented at the Online Assessment Conference, | |
September, Champaign, IL. | |
Johnson TD. 2003. Online student ratings: Will students respond?. | |
In: Sorenson DL, Johnson TD, editors. Online student ratings of | |
instruction (New Directions for Teaching and Learning, no. 96). | |
San Francisco, CA: Jossey-Bass. pp 49–60. | |
Joint Committee on Standards for Educational Evaluation. 2009. The | |
personnel evaluation standards: How to assess systems for evaluating | |
educators. 2nd ed. Thousand Oaks, CA: Corwin Press. | |
Kite ME, editor. 2012. Effective evaluation of teaching: A guide for faculty | |
and administrators. E-book [Accessed 6 June 2012] Available from the | |
Society for the Teaching of Psychology website http://teachpsych.org/ | |
ebooks/evals2012/index.php. | |
Knapper C, Cranton P, editors. 2001. Fresh approaches to the evaluation | |
of teaching (New Directions for Teaching and Learning, no. 88). | |
San Francisco, CA: Jossey-Bass. pp 19–29. | |
Layne BH, DeCristoforo JR, McGinty D. 1999. Electronic versus | |
traditional student ratings of instruction. Res Higher Educ | |
40(2):221–232. | |
Leung DYP, Kember D. 2005. Comparability of data gathered from | |
evaluation questionnaires on paper through the Internet. Res Higher | |
Educ 46:571–591. | |
Liu Y. 2006. A comparison of online versus traditional student evaluation of | |
instruction. Int J Instr Technol Distance Learn 3(3):15–30. | |
MarketTools. 2006. Zoomerang: Easiest way to ask, fastest way to know. | |
[Accessed 17 July 2012] Available from http://info.zoomerang.com. | |
Marsh HW. 2007. Students’ evaluations of university teaching: | |
Dimensionality, reliability, validity, potential biases and usefulness. | |
In: Perry RP, Smart JC, editors. The scholarship of teaching and learning | |
in higher education: An evidence-based perspective. Dordrecht, the | |
Netherlands: Springer. pp 319–383. | |
McGee DE, Lowell N. 2003. Psychometric properties of student ratings of | |
instruction in online and on-campus courses. In: Sorenson DL, Johnson | |
TD, editors. Online student ratings of instruction (New Directions for | |
Teaching and Learning, no. 96). San Francisco, CA: Jossey-Bass. | |
pp 39–48. | |
Morrison R. 2011. A comparison of online versus traditional student end-of-course critiques in resident courses. Assess Eval Higher Educ
36(6):627–641. | |
Netemeyer RG, Bearden WO, Sharma S. 2003. Scaling procedures. | |
Thousand Oaks, CA: Sage. | |
Oliver RL, Sautter EP. 2005. Using course management systems to | |
enhance the value of student evaluations of teaching. J Educ Bus | |
80(4):231–234. | |
Ory JC. 2001. Faculty thoughts and concerns about student ratings. | |
In: Lewis KG, editor. Techniques and strategies for interpreting student | |
evaluations (Special issue) (New Directions for Teaching and Learning, | |
No. 87). San Francisco, CA: Jossey-Bass. pp 3–15. | |
Ory JC, Ryan K. 2001. How do student ratings measure up to a new validity | |
framework?. In: Theall M, Abrami PC, Mets LA, editors. The student | |
ratings debate: Are they valid? How can we best use them? (Special | |
issue) (New Directions for Institutional Research, 109). San Francisco, | |
CA: Jossey-Bass. pp 27–44. | |
Peer E, Gamliel E. 2011. Too reliable to be true? Response bias as a | |
potential source of inflation in paper and pencil questionnaire | |
reliability. Practical Assess Res Eval 16(9):1–8. Available from http:// | |
pareonline.net/getvn.asp?v=16&n=9.
Perrett JJ. 2011. Exploring graduate and undergraduate course evaluations | |
administered on paper and online: A case study. Assess Eval Higher | |
Educ 1–9, DOI: 10.1080/02602938.2011.604123. | |
Ravelli B. 2000. Anonymous online teaching assessments: Preliminary | |
findings. [Accessed 12 June 2012] Available from http://www.edrs.com/ | |
DocLibrary/0201/ED445069.pdf. | |
Seldin P. 1999. Current practices – good and bad – nationally. In: Seldin P & | |
Associates Changing practices in evaluating teaching: A practical guide | |
to improved faculty performance and promotion/tenure decisions. | |
Bolton, MA: Anker. 1–24. | |
Seldin P. 2006. Building a successful evaluation program. In: Seldin P & | |
Associates Evaluating faculty performance: A practical guide to | |
assessing teaching, research, and service. Bolton, MA: Anker 1–19. | |
Seldin P, Associates, editors. 2006. Evaluating faculty performance: A | |
practical guide to assessing teaching, research, and service. Bolton, MA: | |
Anker. pp 201–216. | |
Sorenson DL, Johnson TD, editors. 2003. Online student ratings of | |
instruction (New Directions for Teaching and Learning, no. 96). | |
San Francisco, CA: Jossey-Bass. | |
Spooner F, Jordan L, Algozzine B, Spooner M. 1999. Student ratings of | |
instruction in distance learning and on-campus classes. J Educ Res | |
92:132–140. | |
Stehle S, Spinath B, Kadmon M. 2012. Measuring teaching effectiveness: | |
Correspondence between students’ evaluations of teaching and | |
different measures of student learning. Res Higher Educ. DOI: | |
10.1007/s11162-012-9260-9. | |
Stowell JR, Addison WE, Smith JL. 2012. Comparison of online and | |
classroom-based student evaluations of instruction. Assess Eval Higher | |
Educ 37(4):465–473. | |
Strategy Group. 2011. National strategy for higher education to 2030 (Report of the Strategy Group). Dublin, Ireland: Department of Education and Skills, Government Publications Office. [Accessed 17 July 2012] Available from http://www.hea.ie/files/files/DES_Higher_Ed_Main_Report.pdf.
Streiner DL, Norman GR. 2008. Health measurement scales: A practical | |
guide to their development and use. 4th ed. New York: Oxford | |
University Press. | |
Surgenor PWG. 2011. Obstacles and opportunities: Addressing the | |
growing pains of summative student evaluation of teaching. Assess | |
Eval Higher Educ 1–14, iFirst Article. DOI: 10.1080/ | |
02602938.2011.635247. | |
Svinicki M, McKeachie WJ. 2011. McKeachie’s teaching tips: Strategies, | |
research, and theory for college and university teachers. 13th ed. | |
Belmont, CA: Wadsworth. | |
Theall M, Feldman KA. 2007. Commentary and update on Feldman’s (1997) | |
‘‘Identifying exemplary teachers and teaching: Evidence from student | |
ratings’’. In: Perry RP, Smart JC, editors. The teaching and learning in | |
higher education: An evidence-based perspective. Dordrecht, the | |
Netherlands: Springer. pp 130–143. | |
Theall M, Franklin JL. 1990. Student ratings in the context of | |
complex evaluation systems. In: Theall M, Franklin JL, editors. | |
Student ratings of instruction: Issues for improving practice (New | |
Directions for Teaching and Learning, no. 43). San Francisco, CA: | |
Jossey-Bass. pp 17–34. | |
Theall M, Franklin JL. 2000. Creating responsive student ratings systems to | |
improve evaluation practice. In: Ryan KE, editor. Evaluating teaching in | |
higher education: A vision for the future (Special issue) (New Directions | |
for Teaching and Learning, no. 83). San Francisco, CA: Jossey-Bass. | |
pp 95–107. | |
Theall M, Franklin JL. 2001. Looking for bias in all the wrong places: | |
A search for truth or a witch hunt in student ratings of instruction?. | |
In: Theall M, Abrami PC, Mets LA, editors. The student ratings | |
debate: Are they valid? How can we best use them? (New Directions | |
for Institutional Research, no. 109). San Francisco, CA: Jossey-Bass. | |
pp 45–56. | |
US Equal Employment Opportunity Commission (EEOC). 2010. Employment | |
tests and selection procedures. [Accessed 20 August 2012] Available from | |
http://www.eeoc.gov/policy/docs/factemployment_procedures.html. | |
Venette S, Sellnow D, McIntire K. 2010. Charting new territory: Assessing | |
the online frontier of student ratings of instruction. Assess Eval Higher | |
Educ 35:101–115. | |
Waschull SB. 2001. The online delivery of psychology courses: Attrition, | |
performance, and evaluation. Comput Teach 28:143–147. | |
Wright KB. 2005. Researching internet-based populations: Advantages and disadvantages of online survey research, online questionnaire authoring software packages, and web survey services. J Comput Mediated Commun 10(3). Available from http://jcmc.indiana.edu/vol10/issue3/wright.html.
Yarbrough DB, Shulha LM, Hopson RK, Caruthers FA. 2011. The | |
program evaluation standards: A guide for evaluators and evaluation | |
users. 3rd ed. Thousand Oaks, CA: Sage. | |
Views from below: Students’ perceptions of teaching practice evaluations and stakeholder roles
Lungi Sosibo
Cape Peninsula University of Technology
Article in Perspectives in Education · December 2013
COLLEGE TEACHING, 60: 48–55, 2012
Copyright © Taylor & Francis Group, LLC
ISSN: 8756-7555 print / 1930-8299 online
DOI: 10.1080/87567555.2011.627896
Predicting Student Achievement in University-Level | |
Business and Economics Classes: Peer Observation | |
of Classroom Instruction and Student Ratings of | |
Teaching Effectiveness | |
Craig S. Galbraith | |
University of North Carolina Wilmington | |
Gregory B. Merrill | |
St Mary’s College of California | |
We examine the validity of peer observation of classroom instruction for purposes of faculty | |
evaluation. Using both a multi-section course sample and a sample of different courses across | |
a university’s School of Business and Economics we find that the results of annual classroom | |
observations of faculty teaching are significantly and positively correlated with student learning | |
outcome assessment measures. This finding supports the validity of classroom observation as | |
an assessment of teaching effectiveness. The research also indicates that student ratings of | |
teaching effectiveness (SETEs) were less effective at measuring student learning than annual | |
classroom observations by peers. | |
There is no question that teaching effectiveness is a very | |
personal, highly complex, and ever changing process involving a multitude of different skills and techniques. Teaching | |
is also part of the mission of every institution of higher learning, although certainly the weightings between teaching and | |
other components, such as scholarship, service, and regional | |
engagement, may vary between individual campuses. As the | |
primary institutional service providers for the core mission of | |
teaching, the faculty’s teaching effectiveness must be evaluated for various personnel decisions, such as promotion, | |
tenure, and retention. Today, most universities systematically | |
use a combination of peer evaluations and student ratings. | |
The notion of peer evaluations has evolved significantly | |
since the 1980s and now represents a relatively broad definition that includes both direct classroom observation and | |
review of a faculty member’s teaching portfolio of syllabi, | |
exam samples, and possibly other data points, such as statements of teaching philosophy and reflective reactions to student feedback. As Yon, Burnap, and Kohut (2002) observe, | |
“the expanding use of peers in the evaluation of teaching | |
is part of a larger trend in postsecondary education toward | |
a more systematic assessment of classroom performance” | |
(104). In fact, there now exists a broad normative literature describing the underlying theories, proposed protocols, | |
and content breadth of comprehensive peer evaluations (e.g., | |
Centra 1993; Cavanagh 1996; Malik 1996; Hutchings 1996,
1998; Bernstein and Edwards 2001; Bernstein et al. 2006;
Arreola 2007; Chism 2007; Bernstein 2008). | |
Of all the elements in a typical university peer evaluation | |
process, direct classroom observation continues to be one | |
of the more controversial. Not only can issues of peer bias, | |
observer training, and classroom intrusion be raised (e.g., | |
Cohen and McKeachie 1980; DeZure 1999; Yon et al. | |
2002; Costello et al. 2001; Arreola 2007; Courneya, Pratt, | |
and Collins 2007), but there remains a fundamental debate | |
whether classroom observation is most valid for formative | |
purposes in assisting faculty to improve their teaching effectiveness or for evaluative purposes in providing university | |
administrators and faculty colleagues useful data for personnel decisions (e.g., Cohen and McKeachie 1980; Centra
1993; Shortland 2004; Peel 2005; Chism 2007). In practice, | |
most universities that use classroom observation for evaluative purposes generally use observations from a single class | |
“visit,” and that assessment is then assumed to reflect an evaluation of the faculty member’s overall teaching ability during | |
that particular time period, or at least until another “visit” is | |
conducted. Despite the intermittent nature of some aspects of | |
peer evaluations, surveys tend to support the argument that | |
at least the faculty themselves believe that peer evaluations | |
can be an effective measure of teaching effectiveness (e.g., | |
Peterson 2000; Yon et al. 2002; Kohut, Burnap, and Yon | |
2007). | |
While faculty may believe that peer evaluations and classroom observations, if done properly, are valid measures of | |
teaching effectiveness, it is difficult to draw any conclusion | |
at all from empirical validity studies of peer evaluations and | |
classroom observation. As with any instructional related metric used for faculty personnel decisions, the argument for, or | |
against, the validity of peer evaluations should be based upon | |
convincing evidence that they indeed measure teaching effectiveness or student learning. As Cohen and McKeachie | |
(1980) succinctly noted early in this debate, “clearly what | |
is needed are studies demonstrating the validity of colleague | |
ratings against other criteria of teaching effectiveness. One | |
possibility would be to relate colleague ratings to student | |
achievement” (149). | |
In spite of Cohen and McKeachie’s (1980) call for more | |
empirical research tied to student achievement, this type of | |
validity testing for peer evaluations or classroom observation | |
has simply not yet occurred. To date, almost all the empirical arguments for, or against, the validity of peer evaluations | |
and classroom observation are based upon their correlations | |
with student evaluations of teaching effectiveness (SETEs) | |
or some other purported measure of teaching excellence, | |
such as teaching awards. Feldman’s (1989b) meta-analysis | |
of these types of studies, for example, reports a mean correlation between peer evaluations and student ratings of 0.55, | |
with correlations ranging from 0.19 to 0.84. Empirical results since Feldman’s meta-analysis report similar correlations (e.g., Kremer 1990; Centra 1994). In general, higher | |
correlations with SETEs are found when peers examined a | |
complete teaching portfolio, and therefore may have been | |
influenced by student evaluations included in the portfolio, | |
while the lower correlations were from studies involving primarily classroom observations (Burns 1998). | |
While this line of research is interesting, given the fact that | |
SETEs themselves are often challenged as valid measures of | |
teaching effectiveness, studies that correlate peer evaluations | |
with SETEs simply provide little or no insight regarding | |
the validly of peer evaluations and classroom observations | |
for purposes of assessing faculty teaching effectiveness. The | |
validity of SETEs as a measure of teaching effectiveness has | |
been challenged for a number of reasons. | |
First, early research indicates only a moderate amount | |
of statistical variation in independent and objective measures of teaching effectiveness is explained by SETE
scores—depending on the meta-analysis study, between
about 4% and 20% for the typical "global" item on SETE instruments (Cohen, P. 1981, 1982, 1983; Costin 1987; Dowel
and Neal 1982, 1983; McCallum 1984; Feldman 1989a, | |
2007)—with many of the studies finding validity within the | |
“weak” category of scale criterion validity suggested by Cohen, J. (1969, 1981)1. | |
Second, it has been noted that the vast majority of this | |
early SETE research relied upon data from introductory undergraduate college courses at research institutions taught by | |
teaching assistants (TAs) following a textbook or departmentally created lesson plan. These types of TA-taught introductory classes, however, only account for a small percentage
of a university’s total course offerings, and may not be at | |
all representative for non-doctorate granting colleges. In addition, as Taylor (2007) notes, it is in the more advanced | |
core, elective, and graduate courses where faculty members | |
have the greatest flexibility over pedagogical style, course | |
content, and assessment criteria—the factors most likely to | |
drive classroom learning. In fact, recent empirical research | |
has indicated a possible negative, or negatively bi-modal, relationship between SETEs and student achievement in more | |
advanced university courses (Carrell and West, 2010; Galbraith, Merrill, and Kline, 2011). | |
Third, in the past two decades a number of articles have | |
appeared that specifically challenge various validity related | |
aspects of SETEs (e.g., Balam and Shannon, 2010; Campbell | |
and Bozeman 2008; Davies et al. 2007; Emery, Kramer, and | |
Tian 2003; Pounder 2007; Langbein 2008; Carrell and West | |
2010). These include arguments that student perceptions of | |
teaching are notoriously subject to various types of manipulation, such as the often debated “grading leniency” hypothesis, or even giving treats such as “chocolate candy” prior | |
to the evaluation (e.g., Blackhart et al. 2006; Bowling 2008; | |
Boysen 2008; Felton, Mitchell, and Stinson 2004; Youmans | |
and Jee 2007). Other research has demonstrated that student | |
ratings are influenced by race, gender, and cultural biases | |
as well as various “likability and popularity” attributes of | |
the instructor, such as physical looks and “sexiness” (e.g., | |
Abrami, Levanthal, and Perry 1982; Ambady and Rosenthal | |
1993; Anderson and Smith 2005; Atamian and Ganguli 1993; | |
Buck and Tiene 1989; Davies et al. 2007; Felton, Mitchell, | |
and Stinson 2004; McNatt, 2010; Naftulin, Ware, and Donnelly 1973; Riniolo et al. 2006; Smith 2007; Steward and | |
Phelps 2000). | |
The lack of empirical studies linking classroom observations by peers to student achievement combined with the continuing questions surrounding the overall validity of SETEs | |
as an indicator of teaching effectiveness clearly underlines | |
the need for continued research as to how faculty members | |
1Cohen, J. (1969, 1981) refers to r = 0.10 (1.0% variance explained) as a small effect, r = 0.30 (9.0% variance explained) as a medium effect, and r = 0.50 (25.0% variance explained) as a large effect. Many researchers have used an r < 0.30 (less than 9% variance explained) to signify a "small" effect for purposes of testing scale validity (e.g., Barrett et al. 2009; Hon et al., 2010; Varni et al. 2001; Whitfield et al. 2006).
are evaluated. In this study, we directly examine issues surrounding the validity of peer classroom observations in relationship to student learning. Our analysis differs from previous empirical efforts in several respects. First, we investigate | |
the validity of classroom peer observations by using standardized learning outcome measures set by an institutional | |
process rather than simply correlating peer evaluations with | |
SETEs. Second, our sample of advanced but required core | |
undergraduate and graduate courses represents a mid-range | |
of content control by individual instructors. Third, we compare the explanatory power of peer evaluation ratings with | |
SETEs, and fourth, we have both part-time instructors and | |
full-time faculty members in our sample. This allows for a | |
test regarding the possible impact of independence in selecting which “peers” observe a faculty member’s classroom | |
instruction. | |
Data | |
The data come from courses taught by thirty-four different | |
faculty at a “School of Business and Economics” for a private | |
university located in a large urban region. Classes are offered at both the undergraduate and graduate (masters) level. | |
Similar to many urban universities, a number of adjunct or | |
part-time instructors are used to teach courses. Some of the | |
adjunct instructors hold terminal degrees, however, and are | |
associated with other colleges in the region. Those part-time | |
instructors not holding terminal degrees would be considered | |
“professionally qualified” under standards set by the Association to Advance Collegiate Schools of Business (AACSB). | |
The university would be classified as a non-research intensive | |
institution offering masters degrees in the Carnegie Foundation classification, with a mission that is clearly “teaching” | |
in orientation. | |
Courses in the sample include the disciplines of marketing, management, leadership, finance, accounting, statistics, | |
and economics, with 48% of the sample being graduate | |
courses. Sixty percent of the sample courses were taught | |
by full-time instructors. Average class size is 16 students. | |
Measures | |
Teaching effectiveness—Achievement of student | |
learning outcomes (ACHIEVE) | |
Encouraged by the guidelines of various accrediting agencies, the School has used course learning outcomes for several years. Learning outcomes are established by a faculty | |
committee for each core and concentration required course | |
within the School. There is an average of six to ten learning outcomes per course, and these learning outcomes are | |
specifically identified in the syllabus of each course. | |
Recently the School has invested substantial time and | |
resources in revising and quantifying its learning outcome | |
assessment process. Quantified assessment of course learning outcome attainment by students is measured by a stan- | |
dardized student learning outcome test. The School’s student | |
learning outcome exams are developed individually for each | |
course in each program by a committee of content experts in | |
the subject area, with four questions designed to assess each | |
of the stated learning outcomes. Student outcome exams go | |
beyond simple final exam questions in that they are institutionally agreed upon and formally tied to programmatic objectives. Approximately one-third of the School’s core and | |
required courses are given student learning outcome exams | |
at the present time. | |
Student outcome exams are administered to every student | |
in every section of the course being assessed, regardless of | |
instructor and delivery mode. The same student outcome | |
assessment is used for all sections of the same course, and | |
instructors are not allowed to alter the questions. Student | |
learning outcome exams are given at the end of the course | |
period. Since there are four questions per learning outcome, | |
the exams are all scored on a basis of zero (0) to four (4) | |
points per learning outcome. In the present study, for the | |
ACHIEVE score we use the mean score of all the learning | |
outcome questions on that particular course exam. | |
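As a rough illustration of the scoring arithmetic described above, the short sketch below computes a section-level ACHIEVE value as the mean of the 0 to 4 learning-outcome scores. This is a minimal sketch only; the nested-list data layout and the variable names are hypothetical and are not taken from the study.
# A minimal sketch, assuming each student's exam has been reduced to one 0-4
# score per learning outcome; the layout and names below are hypothetical.
from statistics import mean

section_scores = [
    [3, 4, 2, 3, 4, 3],   # student 1: scores on six learning outcomes
    [2, 3, 3, 2, 4, 2],   # student 2
    [4, 4, 3, 3, 3, 4],   # student 3
]

# ACHIEVE for the section: mean of all learning-outcome scores on the exam.
achieve = mean(score for student in section_scores for score in student)
print(f"ACHIEVE = {achieve:.2f} on the 0-4 scale")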
Depending upon the course and learning outcome, student learning outcome exams consist of multiple choice, or | |
occasionally, short essay questions. For the short essay questions a grading rubric is created so that there is consistency | |
in scoring across all sections of a particular course. Less than | |
10% of all student learning outcome exam content is essay, | |
with over 90% multiple choice. Exam design and rigor is | |
specifically modeled after professional certification exams | |
such as the Certified Public Accounting (CPA) exam. This | |
type of assessment data directly related to carefully articulated “course learning outcomes” is exactly what McKeachie | |
(1979) referred to when he noted, “we take teaching effectiveness to be the degree to which one has facilitated student | |
achievement of education goals” (McKeachie 1979, 385). | |
Although the student learning outcome exam questions | |
are designed to be the same format, the same level of difficulty, and all scored on a 0 to 4 scale, for the full cross-sectional sample the ACHIEVE measure does come from
assessments for different courses using different questions. | |
For our full sample analysis we therefore dichotomize the | |
student learning data based upon the median (low student | |
achievement v. high student achievement). Dichotomizing | |
the outcome variable using the median is common in these | |
situations when outcome data come from a relatively small | |
cross-sectional sample, and there is not sufficient sample | |
size to accurately calculate multiple means to normalize the | |
outcome data across the different categories in the sample (e.g., Bolotin 2006; Baarveld, Kollen, Groenier 2007; | |
Mazumdar and Glassman 2000; Muennig, Sohler, and Mahato 2007; Muthén and Speckart 1983). In addition, the implied binary benchmarking of teaching effectiveness for faculty across different departments is common in practice. Most | |
obvious are tenure and promotion decisions (yes or no) for | |
full-time faculty at smaller teaching-driven colleges, annual | |
contract renewals for non-tenure track full-time and part-time | |
teaching lecturers, faculty teaching award nominations, and | |
the formal use of binary assessment metrics of faculty teaching effectiveness by some institutions (e.g., Glazerman et al., | |
2010). Not surprisingly, faculty themselves often tend to informally categorize colleagues (or themselves) as effective or | |
“good” teachers versus being less effective in the classroom | |
(e.g., Fetterley 2005; Sutkin et al. 2008). However, when examining multiple sections of a single course using exactly | |
the same set of learning outcome questions, we use the raw | |
ACHIEVE score in our analysis. | |
Classroom observation (PEER) | |
In our sample, faculty members are required to undergo | |
one classroom observation per year. The classroom observation procedure is typical to many universities—the peer | |
observer reviews the syllabus and arranges the time to visit | |
the class. Although faculty training in peer evaluation processes is often recommended (e.g., Cohen and McKeachie | |
1980; Bernstein et al. 2006, Chism 2007), in our sample faculty peer classroom observers were not provided any specific | |
training in observation techniques. While some universities | |
suggest multiple observers, our sample university required | |
only one observer per instructor. | |
An “evaluation” form is used where the classroom observer checks/scores various questions related to ten different | |
categories of teaching: class meets stated outcomes, level of | |
student understanding, enthusiasm for teaching, sensitivity to | |
student needs, giving clear explanations, use of instructional | |
material, teaching methods and pacing, knowledge of subject, clarity of syllabus, and the course’s assessment process. | |
The last two categories are from review of the syllabus. Comments can also be added. After reading the submitted written | |
observation form, the senior administrator gave a “class observation” score between “1” (low) and “7” (high) based upon | |
the scoring and information in the form. For this study we | |
used this numerical score. In our sample, the numerical classroom peer observation score ranged between “3” and “7”. | |
There was an important difference in the classroom observation process for part-time faculty versus full-time faculty. | |
Full-time faculty could generally request which colleague | |
observed his or her class, with an obvious possible bias toward requesting friends or colleagues who might provide | |
more favorable comments. In contrast, for part-time faculty | |
the classroom observer was appointed by the department | |
chairperson rather than requested by the instructor. | |
It should be noted that there are certainly other possible | |
differences between full-time faculty and part-time faculty, | |
such as tenure status, types of courses taught and terminal | |
degree education. However, in our sample we feel that the | |
most likely explanation for any differences between the ability of the peer evaluation ratings of part-time versus full-time | |
faculty to explain student achievement would come from differences in the “peer” selection process; that is, controlling | |
for the “peer selection bias” commonly mentioned in the | |
literature. In fact, no apparent bias in the peer evaluation | |
score was noted for the part-time faculty across a number | |
of variables. For example, there was no significant difference in the mean peer evaluation score between part-time | |
faculty with terminal degrees versus those without terminal | |
degrees. Similarly, although full-time faculty taught a greater | |
percentage of graduate classes versus part-time faculty there | |
was no significant difference in the peer evaluation score for | |
part-time faculty that taught graduate classes versus undergraduate classes. | |
Student perception of teaching effectiveness | |
(SETE) | |
As with most universities, student course evaluations are | |
based upon multiple item forms that gather student perceptions, with several questions directly related to perceptions of | |
the instructor’s skill. We used the comprehensively worded | |
“global” item (SETE Global Instructor) asking students to | |
rate the instructor with the wording, “overall, I rate the | |
instructor of this course an excellent teacher.” Most SETE | |
scales use such a final “global” question, and from the authors’ experience it is this question that tends to hold the | |
most weight in faculty performance evaluations. | |
Control variables | |
As control variables we used class size, whether the course | |
was a graduate course, and delivery method (on-site versus | |
distance). Class size appears to be a particularly important | |
control variable. Zietz and Cochran (1997) found a negative | |
relationship between class size and test results, while Lopus | |
and Maxwell (1995) found a positive relationship in business | |
related classes. Pascarella and Terenzini (2005) argue that the | |
connection is still unknown. A more recent large-scale study | |
of science classes by Johnson (2010) indicates that while | |
class size negatively impacts student learning (as measured | |
by grades), the impact diminishes as class size increases. | |
ANALYSIS | |
We model the analysis close to the actual practice in universities. In our sample we have ACHIEVE, SETE, and the | |
control variables for forty-six classes taught by thirty-four | |
faculty within a one-year period. We use the faculty member’s annual classroom observation scores (PEER) from a | |
single “face-to-face” classroom visit that is closest to the | |
one-year period of our class-specific data2. The other important component of assessing teaching effectiveness in practice would be the collection of student ratings for the various | |
classes during the time period. | |
2We only have the numerical score for a faculty’s peer evaluation, not the
specific class it came from. | |
Within our sample, the bivariate correlation between | |
PEER and SETE is 0.43. This directly compares with | |
Feldman’s (1989b) meta-analysis report of a mean correlation between peer reviews and student evaluations of 0.55. | |
Since research indicates that ratings from direct class observation have somewhat lower correlations with SETEs than | |
for broad peer evaluations (e.g, Burns 1998), the 0.43 correlation between PEER and SETE suggests our sample is | |
probably representative. | |
Ideally, the best test of validity would involve multiple | |
sections of the same course, taught by different instructors, | |
using a common measurement of student performance. This | |
has been noted by several authors. For example, in their | |
discussion of the need to establish peer evaluation validity, | |
Cohen and McKeachie (1980) write, “this would require a | |
multi-section course with a standard post-term achievement | |
measures, such an endeavor would prove valuable for assessing the validity of colleague ratings” (149). In our sample, | |
one course (a graduate finance class) had a sufficient number of different instructors (N = 5) to calculate a correlation | |
between the faculty member’s annual classroom observation | |
score (PEER) and student learning outcomes (ACHIEVE)3. | |
All five of the instructors were full-time faculty. Since the | |
student learning outcome exam for this particular finance | |
course was the same across all sections, we could use the | |
raw scores in this analysis. The bivariate correlation between | |
PEER and ACHIEVE for this particular multi-section course | |
was 0.675 (p < 0.10, one-tailed), statistically significant and | |
in the expected direction in spite of the very small sample | |
size. On the other hand, for this one multi-section course | |
analysis, the bivariate correlation between student evaluation of teaching (SETE) and ACHIEVE was only 0.289; a | |
positive relationship but not statistically significant. In fact, | |
the amount of variance (8.26%) in student achievement explained by SETE in our analysis is very similar to many | |
of the SETE validity studies reviewed by Feldman (1989a, | |
2007) and falls within the “weak” category of scale criterion | |
validity suggested by Cohen, J. (1969, 1981). On the other | |
hand, the faculty member’s annual course observation score | |
(PEER) explains 45.6% of the variance in student achievement in this sample, and therefore falls within the “strong” | |
category of scale criterion validity. Thus, within this well controlled, albeit small, multi-section case, the faculty member’s | |
annual classroom observation score explained a much higher | |
percentage of student achievement than their SETEs. Given | |
the small sample size, these results should certainly be interpreted cautiously; however, it should be noted that Feldman’s
(1989a, 2007) often cited meta-analyses of SETE validity
also include research with only five instructors/sections in
their multi-section samples. | |
3The next largest multi-section course in our sample had only three | |
different instructors, and they were a combination of part-time and full-time | |
faculty. | |
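For readers who want to check the variance-explained figures quoted above, the snippet below simply squares the reported bivariate correlations, since r squared is the share of variance explained (the same convention behind Cohen's effect-size benchmarks cited earlier). This is illustrative arithmetic only, not the authors' analysis; the small gap from the reported 8.26% presumably reflects rounding of the published correlation.
# Variance explained by a bivariate correlation is r squared (illustration only).
correlations = {
    "PEER vs. ACHIEVE": 0.675,   # reported for the multi-section finance course
    "SETE vs. ACHIEVE": 0.289,
}

for label, r in correlations.items():
    print(f"{label}: r = {r:.3f}, variance explained = {r ** 2:.1%}")

# 0.675**2 is about 45.6% ("strong" by Cohen's benchmarks); 0.289**2 is about 8.4%,
# close to the paper's 8.26%, presumably computed from an unrounded correlation.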
TABLE 1
Binary Logistic Regression Analysis—Explaining Student Achievement (ACHIEVE)

Variables            Pooled-Sample    Full-Time Faculty    Part-Time Faculty
                     Regression       Regression           Regression
Constant             −7.738           −8.493               −13.847
Class Size           −0.117∗∗         −0.122∗              −0.077
Online Class          0.874            1.771∗              −0.120
Graduate Class        2.610∗∗∗         3.428∗∗              2.214∗
SETE                  1.178∗           1.351                1.139
PEER                  0.598∗           0.435                1.810∗∗
Nagelkerke R2         0.348            0.487                0.374
Cox and Snell R2      0.258            0.363                0.276
N                    46               28                   18

Note: ∗∗∗ p < 0.01; ∗∗ p < 0.05; ∗ p < 0.10
We are also interested in comparing the relationship between the two measures commonly used to evaluate a faculty | |
member’s teaching effectiveness (SETEs and PEER) and our | |
independent measure of student achievement (ACHIEVE) | |
across the full range of courses. This is important since most | |
universities compare, either directly or indirectly, a faculty | |
member’s teaching evaluation assessments with other faculty members across departments and schools during annual | |
review, tenure, and promotion decision discussions. | |
For this analysis, ACHIEVE was the dependent variable, while PEER, SETE, and the control variables were | |
independent variables. As previously discussed, for this | |
cross-sectional analysis we used the bivariate measure of | |
ACHIEVE, “high student achievement” and “low student | |
achievement”—the appropriate regression technique is therefore logistic regression. We estimate binary logistic regression models for the full pooled sample, and both the full-time | |
and part-time faculty sub-samples. Table 1 reports the results | |
of this analysis. | |
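The sketch below mirrors the analysis design just described, not the authors' actual code or data: ACHIEVE is split at its median into high and low achievement, and a binary logistic regression is fit with PEER, SETE, and the control variables as predictors. The synthetic data, the column names, and the choice of statsmodels are all assumptions made for illustration.
# A minimal sketch of the modeling step (synthetic data; hypothetical column names).
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 46  # pooled sample size reported in Table 1

df = pd.DataFrame({
    "achieve_raw": rng.uniform(1.5, 3.5, n),   # mean learning-outcome score (0-4 scale)
    "class_size": rng.integers(8, 30, n),
    "online": rng.integers(0, 2, n),           # 1 = online/distance delivery
    "graduate": rng.integers(0, 2, n),         # 1 = graduate course
    "sete": rng.uniform(2.5, 5.0, n),          # global student-rating item
    "peer": rng.integers(3, 8, n),             # classroom observation score (3-7)
})

# Dichotomize student achievement at the median: 1 = high, 0 = low.
df["achieve_high"] = (df["achieve_raw"] > df["achieve_raw"].median()).astype(int)

# Binary logistic regression of high/low achievement on the two evaluation
# metrics plus the control variables.
X = sm.add_constant(df[["class_size", "online", "graduate", "sete", "peer"]])
result = sm.Logit(df["achieve_high"], X).fit(disp=False)
print(result.summary())
Note that statsmodels reports McFadden's pseudo-R2 rather than the Nagelkerke and Cox and Snell values shown in Table 1; those would have to be computed separately.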
With respect to the control variables, graduate classes and | |
smaller classes clearly tend to have higher levels of student | |
achievement. The graduate class variable was positive, and | |
statistically significant in all three models—the pooled sample, and both the full-time and part-time faculty sub-samples. | |
Class size, while negative in all three regressions, was statistically significant in both the pooled sample and the full-time
faculty sample. The on-line class variable had opposite signs | |
between the estimated regression models, and was statistically significant only in the full-time faculty sample. Overall, | |
all the regression models were statistically significant, and | |
had reasonably high R2s. | |
Of interest to our research are the two “teaching effectiveness” evaluation metrics: student evaluations of teaching | |
(SETE) and the faculty member’s annual classroom peer | |
observations (PEER). Both metrics were positive in all three | |
equations. The SETE variable, however, was statistically significant only for the pooled sample. The PEER variable was | |
also statistically significant in the pooled sample regression. | |
Most interesting are the results for the two sub-samples | |
of full-time and part-time faculty. As previously discussed, | |
there was a significant difference in the way “peers” were | |
selected between these two groups, with the selection of | |
“peers” for part-time faculty more of an independent, “armslength” process. Given this important difference, the pooled | |
sample may be too heterogeneous across the PEER variable | |
and the model estimates therefore misleading. Examining
the two sub-samples should provide additional insight. In | |
the full-time faculty sub-sample, the PEER variable, while | |
indicating a positive relationship, was not statistically significant. However, in the part-time faculty model, which has a | |
much stronger peer selection control process, the classroom | |
observation variable (PEER) was both positive and statistically significant. The SETE metric, while positive in both | |
equations, was not statistically significant in either. | |
DISCUSSION AND CONCLUSION | |
As the debate continues about which measures of teaching | |
effectiveness should be used to evaluate faculty for personnel | |
decisions, there is an increasing need for continued investigation into the validity of these different metrics. With respect | |
to student evaluations of teaching, McKeachie (1996) succinctly summarized the problem, “If student ratings are part | |
of the data used in personnel decisions, one must have convincing evidence that they add valid evidence of teaching | |
effectiveness” (McKeachie 1996, 3). The same can certainly | |
be said for faculty peer evaluations. | |
While there is a large body of empirical literature examining the validity of SETEs, the results of these studies are | |
open to vast differences in interpretation. To date, however, | |
the empirical basis for arguing for the validity of peer evaluations or classroom observations of teaching is based primarily | |
on studies that correlate peer evaluations with SETES. Unfortunately, there are few, if any, studies that examine the | |
relationship between peer evaluations and an actual, independent measure of student achievement, and then compare | |
the strength of this relationship with student ratings. | |
Our research represents an attempt to start filling this gap | |
in our knowledge. That major institutions of higher
learning around the world regularly employ both student
ratings and peer evaluations of teaching for faculty personnel decisions without knowing more about the true validity
of these two metrics in assessing teaching effectiveness is
somewhat surprising.
In our study we were able to examine the validity of one | |
important component of peer evaluations, the classroom observation, from two perspectives. Using a multi-section class | |
taught by different instructors, we compared the annual classroom observation ratings for faculty members against the results of an independent student learning outcome assessment | |
measure in courses those faculty members taught. Not only | |
did we find that the annual classroom observation metric was | |
significantly and positively correlated with student achievement, but that it was also a much better predictor of student | |
achievement than student ratings of teaching (SETEs) from | |
the classes. This is exactly the type of validity testing called | |
for by Cohen and McKeachie (1980). | |
We also wanted to examine the validity of both classroom | |
observation and SETEs in a manner which somewhat paralleled the way that most universities actually use such measures for personnel decisions, that is, across different instructors, courses and departments. Again, using the standardized | |
learning assessment, our analysis again offered two conclusions. First, a faculty member’s annual classroom observation | |
rating was positively related to student achievement, particularly when the process reflected a somewhat “arms-length” | |
selection of the actual observer. Second, under these conditions of stricter peer-selection control, a faculty member’s | |
annual classroom observation rating was more significantly | |
related to student achievement than the course SETEs. In addition, although not directly the focus of our study, we also | |
found evidence that class size was negatively related to student achievement, with smaller classes outperforming larger | |
classes on the average. | |
Our analysis supports the validity of university-level classroom observation by peers, particularly if done under relatively strict peer-selection controls. And it should be noted | |
that our peer evaluation process followed few of the complex | |
observation, training, feedback, and reporting protocols suggested by the rapidly expanding normative peer evaluation | |
literature—our reviewers were simply colleagues asked to | |
observe another’s class with a simple check-list. | |
Obviously there are limitations to our research. First, our | |
sample size was relatively small, particularly in our multisection analysis. While this should suggest caution in interpretation, this type of research will always struggle with sample size issues. Second, it would have been ideal if we could | |
have obtained the actual post-observation forms so that multiple, independent scorers could have provided the quantitative | |
ratings. This would have allowed for a test of inter-rater reliability. And finally, our data came from only one university, | |
albeit across different departments and disciplines. | |
Given these limitations, however, we feel that our results | |
are noteworthy, particularly since almost no published research has appeared that directly correlates classroom peer | |
observation results to an independent measure of student | |
achievement designed around agreed upon student learning outcomes. Although university accrediting bodies are | |
encouraging more assurance of learning outcome measurement, few universities at the present time are taking a standardized and quantifiable approach to assessing learning | |
outcomes across all disciplines that lend themselves to cross-institutional analysis. We hope that as more and more outcome data become standardized, quantified, and available from different institutions, additional empirical analysis will continue to examine these fascinating and highly charged debates.
REFERENCES | |
Abrami, P., L. Leventhal, & R. Perry. 1982. Educational seduction. Review
of Educational Research 52: 446–464. | |
Ambady, N., & Rosenthal, R. 1993. Half a minute: Predicting teacher evaluations from thin slices of nonverbal behavior and physical attractiveness. | |
Journal of Personality and Social Psychology 64: 431–441. | |
Anderson, K., & Smith, G. 2005. Students’ preconceptions of professors:
Benefits and barriers according to ethnicity and gender. Hispanic Journal | |
of Behavioral Sciences 27(2): 184–201. | |
Arreola, R. 2007. Developing a comprehensive faculty evaluation system. | |
3rd ed. Bolton, MA: Anker Publishing. | |
Atamian, R., & G. Ganguli. 1993. Teacher popularity and teaching effectiveness: Viewpoint of accounting students. Journal of Education for Business | |
68(3): 163–169. | |
Baarveld, F., B. Kollen, & K. Groenier. 2007. Expertise in sports medicine
among family physicians: What are the benefits? The Open Sports | |
Medicine Journal 1: 1–4. | |
Balam, E., & D. Shannon. 2010. Student ratings of college teaching: A | |
comparison of faculty and their students. Assessment and Evaluation in | |
Higher Education 35(2): 209–221. | |
Barrett, J., K. Hart, J. Schmerier, K. Willmartch, J. Carey, & S. Mohammed. | |
2009. Criterion validity of the financial skills subscale of the direct assessment of functional status scale. Psychiatry Research 166(2/3): 148– | |
157. | |
Bernstein, D. 2008. Peer review and the evaluation of the intellectual work | |
of teaching. Change, March/April: 48–51. | |
Bernstein, D., A. Burnett, A. Goodburn, & P. Savory. 2006. Making teaching | |
and learning visible: Course portfolios and the peer review of teaching. | |
Bolton, MA: Anker Publishing. | |
Bernstein, D., & R. Edwards. 2001. We need objective, rigourous peer review | |
of teaching. Chronicle of Higher Education 47(17): B24. | |
Blackhart, G., B. Peruche, C. DeWall, & T. Joiner. 2006. Factors influencing | |
teaching evaluations in higher education. Teaching of Psychology 33: | |
37–39. | |
Bolotin, A. 2006. Fuzzy logic approach to robust regression of uncertain | |
medical categories. World Academy of Science, Engineering and Technology 22: 106–111. | |
Bowling, N. 2008. Does the relationship between student ratings of course | |
easiness and course quality vary across schools? The role of school academic rankings. Assessment and Evaluation in Higher Education 33(4): | |
455–464. | |
Boysen, G. 2008. Revenge and student evaluations of teaching. Teaching of | |
Psychology 35(3): 218–222. | |
Buck, S., & D. Tiene. 1989. The impact of physical attractiveness, gender, | |
and teaching philosophy on teacher evaluations. Journal of Educational | |
Research 82: 172–177. | |
Burns, C. 1998. Peer evaluation of teaching: Claims vs. research. University of Arkansas, Little Rock, AR. http://eric.ed.gov/ERICWebPortal/search/detailmini.jsp?_nfpb=true&_&ERICExtSearch_SearchValue_0=ED421470&ERICExtSearch_SearchType_0=no&accno=ED421470
Campbell, J. and W. Bozeman. 2008. The value of student ratings: Perceptions of students, teachers, and administrators. Community College | |
Journal of Research and Practice 32(1): 13–24. | |
Carrell, S., & J. West. 2010. Does professor quality matter? Evidence from | |
random assignments of students to professors. Journal of Political Economy 118(3): 409–432. | |
Cavanagh, R. 1996. Formative and summative evaluation in the faculty | |
peer review of teaching. Innovative Higher Education 20(4): 235– | |
240. | |
Centra, J. 1993. Reflective faculty evaluation. San Francisco: Jossey-Bass. | |
Centra, J. 1994. The use of teaching portfolios and student evaluations for | |
summative evaluation. Journal of Higher Education 65: 555–570. | |
Chism, N. 2007. Peer review of teaching: A sourcebook. 2nd ed. Bolton, | |
MA: Anker Publishing. | |
Cohen, J. 1969. Statistical power analysis for the behavioural sciences, San | |
Diego, CA: Academic Press. | |
Cohen, J. 1981. Statistical power analysis for the behavioural sciences. 2nd | |
ed. Hillsdale, NJ: Lawrence Erlbaum Associates. | |
Cohen, P. 1981. Student ratings of instruction and student achievement. | |
Review of Educational Research 51(3): 281–309. | |
Cohen, P. 1982. Validity of student ratings in psychology courses: A research | |
synthesis. Teaching of Psychology 9(2): 78–82. | |
Cohen, P. 1983. Comment on a selective review of the validity of student | |
ratings of teaching. Journal of Higher Education 54(4): 448–458. | |
Cohen, P., & W. McKeachie. 1980. The role of colleagues in the evaluation | |
of college teaching. Improving college and university teaching 28(4): | |
147–154. | |
Costello, J., B. Pateman, H. Pusey, & K. Longshaw. 2001. Peer review | |
of classroom teaching: An interim report. Nurse Education Today 21: | |
444–454. | |
Costin, F. 1978. Do student ratings of college teachers predict student
achievement? Teaching of Psychology 5(2): 86–88. | |
Courneya, C., D. Pratt, & J. Collins. 2007. Through what perspective do | |
we judge the teaching of peers? Teaching and Teacher Education 24: 69– | |
79. | |
Davies, M., J. Hirschberg, J. Lye, & C. Johnston. 2007. Systematic influences | |
on teaching evaluations: The case for caution. Australian Economic
Papers 46(1): 18–38. | |
DeZure, D. 1999. Evaluating teaching through peer classroom observation. | |
In Changing practices in evaluating teaching, ed. P. Seldin. Bolton, MA: | |
Anker Publishing. | |
Dowell, D., & J. Neal. 1982. A selective review of the validity of student ratings of teaching. Journal of Higher Education 53(1): 51–62.
Dowell, D., & J. Neal. 1983. The validity and accuracy of student ratings | |
of instruction: A reply to Peter A. Cohen. Journal of Higher Education | |
54(4): 459–463. | |
Emery, C., T. Kramer, & R. Tian. 2003. Return to academic standards: A critique of student evaluations of teaching effectiveness. Quality Assurance | |
in Education 11(1): 37–46. | |
Feldman, K. 1989a. The association between student ratings of specific | |
instructional dimensions and student achievement: Refining and extending the synthesis of data from multisection validity studies. Research in | |
Higher Education 30(6): 583–645. | |
Feldman, K. 1989b. Instructional effectiveness of college teachers as judged | |
by teachers themselves, current and former students, colleagues, administrators, and external (neutral) observers. Research in Higher Education | |
30(2): 137–194. | |
Feldman, K. 2007. Identifying exemplary teachers and teaching: Evidence
from student ratings. In The scholarship of teaching and learning in | |
higher education: An evidence-based perspective, eds. R. Perry and J. | |
Smart, 93–129. Dordrecht, The Netherlands: Springer. | |
Felton, J., J. Mitchell, & J. Stinson. 2004. Web-based student evaluations | |
of professors: The relations between perceived quality, easiness and sexiness. Assessment and Evaluation in Higher Education, 29(1): 91–108. | |
Fetterley, J. 2005. Teaching and “my work”. American Literary History | |
17(4): 741–752. | |
Galbraith, C., G. Merrill, & D. Kline. 2011. Are student evaluations of | |
teaching effectiveness valid for measuring student learning outcomes | |
in business related classes? A neural network and Bayesian analysis. Research in Higher Education: 1–22. http://www.springerlink.com/ | |
content/2058756205016652. | |
Glazerman, S., Loeb, S., Goldhaber, D., Raudenbush, S., Staiger, D., & Whitehurst, G. 2010. Evaluating teachers: The important role of value-added. Palo Alto, CA: Center for Educational Policy Analysis, Stanford University.
Hon, J., K. Lagden, A. McLaren, D. O’Sullivan, L. Orr, P. Houghton, & M. | |
Woodbury. 2010. A prospective multicenter study to validate use of the | |
PUSH© in patients with diabetic, venous, and pressure ulcers. Ostomy | |
Wound Management 56(2): 26–36. | |
Hutchings, P. 1996. The peer review of teaching: Progress, issues and | |
prospects. Innovative Higher Education 20(4): 221–234. | |
Hutchings, P., ed. 1998. The course portfolio. Sterling, VA: Stylus. | |
Johnson, I. 2010. Class size and student performance at a public research | |
university: A cross-classified model. Research in Higher Education. | |
http://www.springerlink.com/content/0l35t1821172j857/fulltext.pdf | |
Kremer, J. 1990. Construct validity of multiple measures in teaching, research, and service and reliability of peer ratings. Journal of Educational
Psychology 82: 213–218. | |
Kohut, G., C. Burnap, & M. Yon. 2007. Peer observation of teaching: Perceptions of the observer and the observed. College Teaching 55(1): 19–25. | |
Langbein, L. 2008. Management by results: Student evaluation of faculty | |
teaching and the mis-measurement of performance. Economics of Education Review 27(4): 417–428. | |
Lopus, J., & N. Maxwell. 1995. Should we teach microeconomic principles | |
before macroeconomic principles? Economic Inquiry 33(2): 336–350. | |
Malik, D. 1996. Peer review of teaching: External review of course content. | |
Innovative Higher Education. 20(4): 277–286. | |
Mazumdar, M., & R. Glassman. 2000. Categorizing a prognostic variable: Review of methods, code for easy implementation and applications to decision-making about cancer treatments. Statistics in Medicine 19:
113–132 | |
McCallum, L. 1984. A meta-analysis of course evaluation data and its use | |
in the tenure decision. Research in Higher Education 21: 150–158. | |
McKeachie, W. 1979. Student ratings of faculty: A reprise. Academe 65(6): | |
384–397. | |
McKeachie, W. 1996. Student ratings of teaching. Occasional Paper No. | |
33. American Council of Learned Societies, University of Michigan. | |
http://archives.acls.org/op/33 Professonal Evaluation of Teaching.htm | |
McNatt, B. 2010. Negative reputation and biased student evaluations of
teaching: Longitudinal results from a naturally occurring experiment. | |
Academy of Management Learning and Education 9(2): 225–242. | |
Muennig, P., N. Sohler, & B. Mahato. 2007. Socioeconomic status as an independent predictor of physiological biomarkers of cardiovascular disease: Evidence from NHANES. Preventive Medicine.
http://www.sciencedirect.com. | |
Muthén, B., & G. Speckart. 1983. Categorizing skewed, limited dependent | |
variables. Evaluation Review 7(2): 257–269. | |
Naftulin, D., J. Ware, & F. Donnelly. 1973. The Doctor Fox lecture: A | |
paradigm of educational seduction. Journal of Medical Education 48: | |
630–635. | |
Pascarella, E., & P. Terenzini. 2005. How college affects students: A third | |
decade of research. San Francisco: Jossey-Bass | |
Peel, D. 2005. Peer observation as a transformatory tool? Teaching in Higher | |
Education 10(4): 489–504. | |
Peterson, K. 2000. Teacher evaluation: A comprehensive guide to new directions and practices. 2nd ed. Thousand Oaks, CA: Corwin Press.
Pounder, J. 2007. Is student evaluation of teaching worthwhile? An analytical framework for answering the question. Quality Assurance in Education 15(2): 178–191. | |
Riniolo, T., K. Johnson, T. Sherman, & J. Misso. 2006. Hot or not: Do professors perceived as physically attractive receive higher student evaluations? | |
The Journal of General Psychology 133(1): 19–35. | |
Shortland, S. 2004. Peer observation: A tool for staff development or compliance? Journal of Further and Higher Education 28: 219–227.
Smith, B. 2007. Student ratings of teaching effectiveness: An analysis of end-of-course faculty evaluations. College Student Journal 41(4): 788–800.
Steward, R., & R. Phelps. 2000. Faculty of color and university students: | |
Rethinking the evaluation of faculty teaching. Journal of the Research | |
Association of Minority Professors 4(2): 49–56. | |
Sutkin, G., E. Wagner, I. Harris, & R. Schiffer. 2008. What makes a good clinical teacher in medicine? A review of the literature. Academic Medicine
83(5): 452–466. | |
Taylor, J. 2007. The teaching/research nexus: A model for institutional | |
management. Higher Education 54(6): 867–884. | |
Varni, J., M. Seid, & P. Kurtin. 2001. PedsQL™ 4.0: Reliability and validity of the Pediatric Quality of Life Inventory™ Version 4.0 Generic Core Scales in healthy and patient populations. Medical Care 39(8): 800–812.
Whitfield, K., R. Buchbinder, L. Segal, & R. Osborne. 2006. Parsimonious and efficient assessment of health-related quality of life in | |
osteoarthritis research, validation of the Assessment of Quality of | |
Life (AQoL) instrument. Health and Quality of Life Outcomes 4(19). | |
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1538577/#B19 | |
Yon, M., C. Burnap, & G. Kohut. 2002. Evidence of effective teaching: | |
Perceptions of peer reviewers. College Teaching 50(3): 104–110. | |
Youmans, R., & B. Jee. 2007. Fudging the numbers: Distributing chocolate | |
influences student evaluations of an undergraduate course. Teaching of | |
Psychology 34(4): 245–247. | |
Zietz, J., & H. Cochran. 1997. Containing cost without sacrificing achievement: Some evidence from college-level economics classes. Journal of | |
Education Finance 23: 177–192. | |
Reliability and Construct Validity of the edTPA for Music Education
Journal of Music Teacher Education, 1–15
© National Association for Music Education 2021
DOI: 10.1177/10570837211007859
Phillip M. Hash1 | |
Abstract | |
The purpose of this study was to examine the psychometric quality of Educative | |
Teacher Performance Assessment (edTPA) scores for 136 preservice music teachers | |
at a Midwest university. I addressed the factor structure of the edTPA for music | |
education, the extent to which the edTPA fits the one- and three-factor a priori | |
models proposed by the test authors, and the reliability of edTPA scores awarded | |
to music education students. Factor analysis did not support the a priori one-factor | |
model around teacher readiness, or the three-factor model based on the edTPA | |
tasks of Planning, Instruction, and Assessment. Internal consistency was acceptable | |
for all rubrics together and for the Instruction task. However, estimates of interrater | |
reliability fell substantially below those reported by test administrators. These findings | |
indicate the need for revision of the edTPA for music education and call into question | |
its continued use among music teacher candidates in its current form. | |
Keywords | |
assessment, edTPA, music teacher preparation, student teaching, teacher readiness | |
The Educative Teacher Performance Assessment (edTPA) is a portfolio-based, subject-specific project completed by preservice candidates during their clinical semester.
Educator preparation programs in 41 states and the District of Columbia currently | |
administer the edTPA and at least 19 states and the District of Columbia use the assessment for initial licensure. The Stanford Center for Assessment, Learning, and Equity | |
(SCALE) is the sole developer of the edTPA and Stanford University is the exclusive | |
owner. The university has licensed the Evaluation Systems group of Pearson to provide | |
1 Illinois State University, Normal, USA
Corresponding Author: | |
Phillip M. Hash, School of Music, Illinois State University, Campus Box 5660, Normal, IL 61790-5660, USA. | |
Email: [email protected] | |
operational support for national administration of the assessment (Powell & Parkes, | |
2020; SCALE, 2019a). | |
Candidates completing the edTPA engage in three tasks: Planning, Instruction, and | |
Assessment. The complete portfolio consists of several artifacts including lesson | |
plans, instructional materials, assessments, written commentaries, teaching videos, | |
and student work samples, as dictated by separate handbooks for 28 content areas. | |
Music candidates follow the K–12 Performing Arts Assessment Handbook (SCALE, | |
2018a), which also includes theater and dance. SCALE (2013) states that the theoretical framework for the edTPA evolved from a three-step process that included the | |
following: | |
1. Subject-specific expert design teams who provided content validity evidence of the specific job-related competencies assessed within each subject area.
2. A job analysis study to confirm the degree to which the job requirements of a teacher align to the edTPA.
3. A content validation committee to rate the importance, alignment, and representativeness of the knowledge and skills required for each edTPA rubric in relation to national pedagogical and content-specific standards.
Among other requirements, candidates must attend to academic language demands in | |
all three tasks, which include teaching subject-specific vocabulary, engaging in a language function (e.g., analyze, describe, identify, create), and demonstrating the use of | |
syntax and/or discourse (e.g., speaking or writing) within the discipline (SCALE, | |
2018a). | |
According to SCALE (2019a), handbooks in all content areas share approximately | |
80% of their design. The other 20% contains key subject-specific components of | |
teaching and learning drawn from the content standards authored by national organizations. However, it is unclear how the edTPA relates to standards of the National | |
Association of Schools of Music (2020), the National Association for Music Education | |
(2014), or any other arts organization. | |
Candidates submit their portfolios to Pearson, who employs independent evaluators | |
to score the materials. Scoring for most portfolios involves 15 rubrics, five per task, | |
graded on a scale of one (novice not ready to teach) to five (highly accomplished | |
beginner). This process results in a possible maximum total score of 75. Evaluators | |
review specified artifacts and written commentary separately for each rubric rather | |
than considering all parts of the assessment together. Candidates not achieving the | |
minimum benchmark set by their state or institution can revise and resubmit one, two, | |
or all three tasks (Parkes & Powell, 2015; SCALE, 2018b). | |
The pool of edTPA scorers includes P–12 teachers and college faculty with pedagogical content knowledge and experience preparing novice teachers. They possess | |
discipline-specific expertise and score only those portfolios for which they are qualified. Although the performing arts include music, theater, and dance, only scorers with | |
knowledge and experience in music evaluate candidates in this discipline (Pearson, | |
personal communication, April 21, 2020). | |
Evaluators complete an extensive training program and must demonstrate their | |
ability to determine scores consistently and accurately (SCALE, 2019a). SCALE randomly selects 10% of portfolios for double scoring to maintain reliability. In addition, | |
portfolios scored within a defined range above and below the state-specific (currently | |
35–41) or SCALE-recommended (currently 42) cut score undergo a second and sometimes third review. In these cases, a scoring supervisor resolves instances where Scorer | |
1 and Scorer 2 (a) are more than 1 point apart on any rubric or (b) determine total | |
scores on opposite sides of the cut score. The supervisor also resolves cases where | |
both scorers fall above or below the cut score but have five or more adjacent rubric | |
scores (SCALE, 2019b). | |
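Read literally, these resolution rules form a small decision procedure. The Python sketch below is one hypothetical reading of them, not SCALE's or Pearson's actual implementation; the function name, the list-based score representation, and the default cut score of 42 (the SCALE-recommended value mentioned above) are illustrative assumptions.

# Hypothetical sketch of the double-scoring resolution rules described above;
# not SCALE's or Pearson's actual procedure.

def needs_supervisor_resolution(scores_1, scores_2, cut_score=42):
    """Return True if a scoring supervisor would need to resolve the ratings.

    scores_1, scores_2: lists of 15 rubric scores (1-5) from Scorer 1 and Scorer 2.
    cut_score: state-specific or SCALE-recommended total cut score (assumed 42).
    """
    total_1, total_2 = sum(scores_1), sum(scores_2)
    diffs = [abs(a - b) for a, b in zip(scores_1, scores_2)]

    # (a) the two scorers are more than 1 point apart on any rubric
    if any(d > 1 for d in diffs):
        return True
    # (b) the two total scores fall on opposite sides of the cut score
    if (total_1 >= cut_score) != (total_2 >= cut_score):
        return True
    # (c) same side of the cut score, but five or more adjacent rubric scores
    if sum(1 for d in diffs if d == 1) >= 5:
        return True
    return False

# Example: no rubric differs by more than one point and both totals clear the
# cut score, but six rubrics are adjacent, so a supervisor still resolves it.
s1 = [3, 3, 4, 3, 3, 4, 3, 3, 3, 3, 4, 3, 3, 3, 3]
s2 = [3, 4, 3, 3, 3, 3, 3, 4, 3, 4, 4, 3, 4, 3, 3]
print(needs_supervisor_resolution(s1, s2))  # True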
Proponents claim that the edTPA provides an authentic means of assessing teacher | |
readiness by measuring candidates’ ability to create lesson plans, implement instruction, and assess student learning in an actual classroom environment. Supporters also | |
emphasize the assessment’s uniformity across disciplines and seemingly impartial | |
evaluation, as well as the potential for the edTPA to shape teacher education programs | |
and curricula. Some college faculty believe that the edTPA has fostered their professional growth, while cooperating teachers in K–12 school districts report that the | |
assessment provides guidance for them in mentoring candidates during the student | |
teaching semester (Darling-Hammond & Hyer, 2013; Pecheone & Whittaker, 2016; | |
Sato, 2014). | |
Critics cite concerns with ecological validity of the edTPA and state that candidates | |
might make instructional decisions to meet the requirements of the rubrics rather than | |
long-term student needs (Parkes & Powell, 2015). In addition, the two required video | |
excerpts (maximum 10 minutes each) could alter the teaching environment, create | |
privacy concerns, foster anxiety among candidates, and fail to capture nuanced student | |
interactions and other aspects of teaching stipulated in the rubrics (Bernard & McBride, | |
2020; Choppin & Meuwissen, 2017). | |
Behizadeh and Neely (2018) questioned the consequential validity of the edTPA in | |
relation to positive and negative social outcomes, especially in an urban teacher preparation program focused on social justice. Participants (N = 16) in this study, who were | |
mostly candidates of color and first-generation college students, stated that the edTPA | |
increased their mental and financial stress and lacked a social justice orientation in the | |
scoring procedures. They also felt pressure to select the highest achieving classes for | |
their lesson segment and to teach content that fulfilled scoring criteria, regardless of | |
students’ needs. Authors have also criticized the corporate control of the scoring process, the high cost for teacher candidates ($300 for initial submission), and the effect | |
of the edTPA on preparation program autonomy (e.g., Dover et al., 2015; Heil & Berg, | |
2017; Parkes, 2020). | |
The content and evaluation standards of the edTPA can present problems specific to | |
the music classroom. For example, the timeline requiring candidates to teach their | |
entire unit in three to five consecutive lessons might not allow K–12 students to engage | |
in creative artistic processes authentically (Heil & Berg, 2017). The assessment can | |
also force candidates in secondary ensemble programs to teach edTPA lessons unrelated to the goals of the classroom and within a tight rehearsal schedule dictated by | |
public performances (Powell & Parkes, 2020). | |
SCALE (e.g., 2015, 2018c, 2019a) annually reports the reliability and validity of | |
the edTPA using data from statistical tests conducted on aggregated scores of all content areas with at least 10 portfolio submissions from the previous calendar year. In | |
2018, internal consistency as measured by Cronbach’s α equaled .89 for the performing arts, and for all subjects combined. Interrater reliability estimates using the kappan (kn) statistic averaged .91 among the 15 evaluation rubrics. Factor analysis | |
supported both the one-factor and three-factor models, with all loadings exceeding | |
.50. According to SCALE (2015), these results “confirm [ ] that the tasks are measuring a common unifying teaching construct and that there are three common latent | |
constructs . . . which [comprise] each of the three tasks” (p. 22). Factor correlations | |
in the three-factor model ranged from .71 to .78, which SCALE (2018c) claims “supports the edTPA structure consisting of three correlated abilities: Planning, Instruction, | |
and Assessment” (p. 25). | |
Gitomer et al. (2021) questioned the psychometric validity and reliability of the | |
edTPA due to (a) the use of aggregated data across content areas, (b) the supposed | |
existence of both a one- and a three-factor model, (c) measures of internal consistency | |
involving scores of all evaluators combined, and (d) the utilization of exact + adjacent | |
agreements rather than only exact agreements to calculate interrater reliability through | |
the kn statistic. The authors illustrated the difference in interrater agreement indices | |
attained using exact agreements only versus exact + adjacent agreements as used by | |
SCALE. The simulation involved rubric scores from 184 students from one of the | |
author’s institutions and interrater agreement coefficients for all handbooks combined | |
from the 2017 edTPA Administrative Report (SCALE, 2018c). Results indicated that | |
interrater reliability for individual rubrics ranged from kappa indices of .06 to .32 (M | |
= .23) using only exact agreements, compared with .85 to .97 (M = .91) as reported | |
by SCALE. The authors acknowledged the need for analysis of individual content | |
areas and called for SCALE to make these data publicly available. | |
Musselwhite and Wesolowski (2019) used the Rasch Measurement Model to analyze edTPA scores of music students (N = 100) from three universities in the United | |
States. They examined (a) the validity and reliability of the 15 rubrics, (b) the extent | |
to which the rubric criteria fit the measurement model and vary in difficulty, and (c) | |
if category response structures for each criterion empirically cooperate to provide | |
meaningful measures. Reliability of separation, similar in interpretation to | |
Cronbach’s alpha, fell within the upper range of acceptability for students (Rel. = | |
.89) and rubric criteria (Rel. = .95), meaning edTPA scores could be used to separate | |
high- and low-achieving students and the most and least difficult rubric criteria. | |
Rubrics within each of the three tasks demonstrated adequate data-model fit. | |
However, based on underuse of the lowest (1) and highest (5) ratings, the authors | |
suggested that response categories were not capturing the full range of candidate | |
performance or the results may not reflect the expected and intended meaning of the | |
rubrics (e.g., “novice not ready to teach,” “highly accomplished beginner”). In addition, violations of monotonicity (i.e., the assumption that variables move consistently in the same or opposite directions) raised concerns with the overall rating | |
scale structure. | |
Austin and Berg (2020) analyzed the reliability, validity, and utility of edTPA scores | |
for music teacher candidates (N = 60) over a 3-year period from 2013 to 2015. Scores | |
for all three tasks (α = .76-.81) and the 15 rubrics combined (α = .84) demonstrated | |
adequate internal consistency. Factor analysis supported the construct validity of the | |
assessment and produced a clear structure that corresponded to the three edTPA tasks. | |
Criterion-related validity evidence was mixed, however, with most correlations | |
between edTPA scores and the 16 variables examined being of modest magnitude | |
(<.25). | |
Purpose and Need for the Study | |
Annual edTPA Administrative Reports (e.g., SCALE, 2015, 2018c, 2019a) provide | |
factor analysis and interrater agreement data for all content areas combined, as well as | |
Cronbach’s alpha for each handbook with at least 10 submissions. The reports provide | |
no data related to internal consistency (α) of each task or to factor analysis for specific | |
disciplines. The 2018 Administrative Report states that “factor analyses models of | |
latent structure are reviewed for each field [handbook] with appropriate sample size” | |
(SCALE, 2019a, p. 15). However, only state-level technical advisory committees have | |
access to these data (Pearson, personal communication, March 18, 2020). | |
Detailed reliability and validity data for individual subject areas assessed by the | |
edTPA are necessary for policymakers and teacher educators to evaluate the efficacy | |
of this instrument. However, SCALE does not make these data available to the public | |
(Gitomer et al., 2021). Therefore, the purpose of this study was to examine the psychometric quality of edTPA scores for portfolios completed by 136 preservice music | |
teachers. Research questions were as follows: | |
Research Question 1: What factor structure emerges from edTPA scores for music | |
education? | |
Research Question 2: To what extent do edTPA scores for music education fit the | |
one- and three-factor a priori models proposed by SCALE (2013)? | |
Research Question 3: What is the internal consistency and interrater reliability of | |
the edTPA for music education students? | |
This research will help estimate the reliability and construct validity of the edTPA | |
specifically among preservice music educators and provide discipline-specific data to | |
compare against that available publicly (e.g., SCALE, 2019a). | |
Method | |
Data | |
Data for this study consisted of all edTPA rubric scores (N = 2,040) attained between | |
fall 2015 and spring 2020 for preservice music educators (N = 136) at one large university in the Midwestern United States. The sample involved 61 males and 75 females. | |
All participants were pursuing a Bachelor of Music Education degree and following | |
either an instrumental (n = 93, 68.4%) or a vocal (n = 43, 31.6%) track. With one | |
exception, all students passed the edTPA on their initial attempt. The institution piloted | |
the edTPA beginning in 2013, two years before the state implemented the assessment | |
as a requirement for teacher licensure (Adkins et al., 2015). | |
A comparison of music scores from this study with national data for the K–12 | |
Performing Arts Handbook indicated higher than average final (Music: M = 51.62; | |
Performing Arts: M = 46.36) and rubric scores (Music: M = 3.44; Performing Arts: | |
M = 3.18). Individual rubric means all exceeded national averages and the 3.0 benchmark associated with candidates being “competent and ready to teach.” In addition, | |
frequency counts and skewness indices indicated a normal distribution (see Table S1 | |
in the online supplement). Consistent with national data for the K–12 Performing Arts | |
(SCALE, 2018c, 2019a, 2019c), 6% of rubric scores in this study consisted of a 1 or a | |
5 with about 95% of scores falling within the 2 to 4 range. | |
Construct Validity | |
Preliminary examination of construct validity involved a series of factor analyses | |
using various methods and rotations to determine the best model fit based on criteria | |
for simple structure (Asmus, 1989; J. D. Brown, 2009): | |
1. Each variable produces at least one zero loading (−.10 to +.10) on some factor.
2. Each factor has at least as many zero loadings as there are factors.
3. Each pair of factors contains variables with significant loadings (≥.30) on one and zero loadings on the other.
4. Each pair of factors contains only a few complex variables (loading ≥.30 on more than one factor).
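These criteria can also be checked mechanically against a rotated loading matrix. The short Python sketch below is a minimal illustration under the thresholds stated in the list (zero loadings between −.10 and +.10, significant loadings of .30 or higher); the function name and the toy matrix are mine, not part of Asmus (1989) or J. D. Brown (2009).

import numpy as np

# Minimal sketch: check the simple-structure criteria listed above against a
# (variables x factors) rotated loading matrix. Thresholds follow the text.
def simple_structure_report(loadings, zero=0.10, sig=0.30):
    L = np.asarray(loadings, dtype=float)
    n_factors = L.shape[1]
    is_zero = np.abs(L) <= zero
    is_sig = np.abs(L) >= sig

    report = {
        # 1. every variable has at least one zero loading on some factor
        "every_variable_has_zero_loading": bool(is_zero.any(axis=1).all()),
        # 2. every factor has at least as many zero loadings as there are factors
        "each_factor_has_enough_zero_loadings": bool((is_zero.sum(axis=0) >= n_factors).all()),
        # 4. number of "complex" variables loading >= .30 on more than one factor
        "n_complex_variables": int((is_sig.sum(axis=1) > 1).sum()),
    }

    # 3. each pair of factors should have variables significant on one factor
    #    and zero on the other
    pairs = {}
    for i in range(n_factors):
        for j in range(i + 1, n_factors):
            contrast = (is_sig[:, i] & is_zero[:, j]) | (is_sig[:, j] & is_zero[:, i])
            pairs[(i + 1, j + 1)] = bool(contrast.any())
    report["pairs_with_contrasting_loadings"] = pairs
    return report

# Toy 6-variable, 2-factor pattern matrix (illustrative values only)
toy = [[0.62, 0.05], [0.55, 0.08], [0.48, 0.21],
       [0.04, 0.58], [0.09, 0.66], [0.31, 0.35]]
print(simple_structure_report(toy))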
Final analysis for this study involved principal axis factoring using Kaiser normalization and promax rotation with kappa set at the default value of 4. The first analysis | |
used an eigenvalue of one criterion to determine if a factor structure other than that | |
determined by SCALE (2013) might emerge from the edTPA for music education. | |
Subsequent analysis tested the existence of the a priori models, which include a single-factor solution around teacher readiness and a three-factor model aligned with edTPA
tasks: Planning, Instruction, and Assessment. | |
I considered the effectiveness of the models based on communalities (proportion of | |
each variable’s total variance explained by all factors) and the extent to which items | |
achieved a high loading on their intended factor. Generally, researchers consider loadings of .30 to .40 meaningfully large (Miksza & Elpus, 2018). The pattern matrix | |
(unique contribution of each factor to a variable’s variance) served as the primary | |
determinant used to identify which items clustered into factors. I also examined the | |
structure matrix (correlation of each variable and factor) to verify the interpretation. | |
Bartlett’s test of sphericity indicated if there were adequate correlations for data | |
reduction, and the Kaiser-Meyer-Olkin measure determined sampling adequacy | |
(Asmus, 1989; J. D. Brown, 2009). Maximum interfactor correlations of .80 served as | |
the standard for adequate discriminant validity (T. A. Brown, 2015). | |
SCALE analyzes internal structure of the edTPA for all content areas combined | |
through a confirmatory factor analysis using maximum likelihood estimation, which | |
assumes a normal distribution and is most appropriate for large sample sizes (Costello | |
& Osborne, 2005; Miksza & Elpus, 2018). Principal axis factoring used in this study | |
better fit the data and proved more effective in achieving a simple solution (e.g., J. D. | |
Brown, 2009). | |
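For readers who want to run a comparable analysis, the sketch below shows one way to perform the adequacy checks and a principal axis factoring with promax rotation in Python using the open-source factor_analyzer package. It is a sketch under assumptions rather than the author's actual code: the file name and the candidates-by-rubrics data frame are hypothetical, and the package's API should be verified against its current documentation.

# Sketch only (not the author's code): adequacy checks and principal axis
# factoring with promax rotation via the factor_analyzer package.
# "edtpa_rubric_scores.csv" is a hypothetical candidates x 15-rubric file.
import pandas as pd
from factor_analyzer import FactorAnalyzer
from factor_analyzer.factor_analyzer import calculate_bartlett_sphericity, calculate_kmo

rubric_scores = pd.read_csv("edtpa_rubric_scores.csv")

# Adequacy of the correlation matrix for data reduction
chi_square, p_value = calculate_bartlett_sphericity(rubric_scores)
kmo_per_item, kmo_overall = calculate_kmo(rubric_scores)
print(f"Bartlett chi-square = {chi_square:.1f}, p = {p_value:.4f}")
print(f"KMO overall = {kmo_overall:.2f}")

# Principal axis factoring with promax (oblique) rotation; the number of
# factors would first be chosen with the eigenvalue-greater-than-one rule.
fa = FactorAnalyzer(n_factors=3, method="principal", rotation="promax")
fa.fit(rubric_scores)

eigenvalues, _ = fa.get_eigenvalues()
print("Eigenvalues:", eigenvalues.round(2))
print("Pattern matrix (rotated loadings):")
print(pd.DataFrame(fa.loadings_, index=rubric_scores.columns).round(2))
print("Communalities:", fa.get_communalities().round(2))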
Reliability | |
Cronbach’s alpha provided a measure of internal consistency for the complete edTPA, | |
individual tasks determined by SCALE (2019a), and factors identified in this study. A | |
coefficient of α ≥ .80 served as the minimum acceptable benchmark as per general | |
practice in the social sciences (e.g., Carmines & Zeller, 1979; Krippendorff, 2013). | |
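Cronbach's alpha is straightforward to compute directly from an observations-by-items score matrix using the standard formula alpha = k/(k − 1) × (1 − sum of item variances / variance of total scores). The Python sketch below applies it to simulated rubric scores; the data are illustrative, not the study's.

import numpy as np

def cronbach_alpha(item_scores):
    """Cronbach's alpha for an (observations x items) score matrix."""
    X = np.asarray(item_scores, dtype=float)
    k = X.shape[1]                         # number of items (rubrics)
    item_vars = X.var(axis=0, ddof=1)      # variance of each item
    total_var = X.sum(axis=1).var(ddof=1)  # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Simulated example: 136 candidates x 15 rubrics on a 1-5 scale, with a shared
# "readiness" component so the items are positively correlated.
rng = np.random.default_rng(0)
readiness = rng.normal(3.4, 0.5, size=(136, 1))
scores = np.clip(np.rint(readiness + rng.normal(0, 0.7, size=(136, 15))), 1, 5)
print(f"alpha = {cronbach_alpha(scores):.2f}")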
SCALE (2019a) analyzed interrater reliability for each rubric using the kappan | |
statistic: | |
kn = (AO − 1/n) / (1 − 1/n)
where AO represents observed agreement and n equals the number of possible adjudication categories/classifications.1 Due to the lack of agreement indices for data in this | |
study, I replicated the procedure of Gitomer et al. (2021) and calculated Cohen’s kappa | |
formula instead: | |
k = (AO − AC) / (1 − AC)
This estimate of interrater reliability used the proportions of exact agreement (AO) | |
reported in the 2018 edTPA Administrative Report for all content areas combined and | |
chance agreement (AC) coefficients from the music scores analyzed here. Chance | |
agreement indices equaled the sum of the cross-multiplied proportions of rubric scores | |
in each category (1–5) from portfolios that did not contain fractional numbers (e.g., | |
2.5) due to double scoring (n = 128).2 Thus, kappa is higher to the extent that observed | |
agreement exceeds the expected level of chance agreement (Brennan & Prediger, | |
1981). Due to the unavailability of data from two independent evaluators, calculations | |
of chance agreement involved multiplying duplicate proportions of one scorer | |
(Gitomer et al., 2021). | |
Like Gitomer et al. (2021), I only considered exact agreements when estimating | |
kappa to provide a more precise estimate of interrater reliability. Kappa coefficients | |
reported by SCALE (2019a) are likely inflated because calculations involved exact + | |
adjacent agreements on a 5-point scale, where about 95% of scores fell between 2 and | |
4 (Stemler & Tsai, 2008). Kappa can range from −1 to +1 with interpretations of poor | |
(below 0.00), slight (0.00-0.20), fair (0.21-0.40), moderate (0.41-0.60), substantial | |
(0.61-0.80), and almost perfect (0.81-1.00; Landis & Koch, 1977). | |
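The two coefficients differ only in how baseline agreement is handled, and the replication described above can be expressed in a few lines. The Python sketch below is a minimal illustration under stated assumptions: the category proportions and the exact-agreement rate are invented for the example, and the chance-agreement term simply squares one scorer's proportions, mirroring the procedure described in the text rather than reproducing SCALE's or Gitomer et al.'s actual code.

# Minimal sketch of the agreement statistics discussed above; all numbers
# below are illustrative, not edTPA results.

def kappa_n(observed_agreement, n_categories):
    """Brennan & Prediger's kappa-n: baseline agreement fixed at 1/n."""
    return (observed_agreement - 1 / n_categories) / (1 - 1 / n_categories)

def cohen_kappa_from_margins(observed_agreement, proportions):
    """Cohen's kappa with chance agreement estimated by duplicating one
    scorer's category proportions, as described in the text."""
    chance = sum(p * p for p in proportions)
    return (observed_agreement - chance) / (1 - chance)

def landis_koch(k):
    """Landis & Koch (1977) verbal interpretation of a kappa value."""
    if k < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
                         (0.80, "substantial"), (1.00, "almost perfect")]:
        if k <= upper:
            return label
    return "almost perfect"

# Hypothetical rubric: scores concentrated in categories 2-4, with an exact
# agreement rate of .55 between two scorers.
proportions = [0.01, 0.25, 0.45, 0.24, 0.05]
ao = 0.55
k = cohen_kappa_from_margins(ao, proportions)
print(f"Cohen's kappa = {k:.2f} ({landis_koch(k)})")  # ~0.33, "fair"
print(f"kappa-n (n = 5) = {kappa_n(ao, 5):.2f}")      # ~0.44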
The purpose of estimating k was to demonstrate the difference in readings based on | |
exact + adjacent agreements versus those attained with exact agreements only. The | |
use of exact + adjacent agreements is problematic on a 5-point scale, especially when | |
scorers rarely use the highest and lowest categories. This approach is less problematic | |
when estimating reliability for longer scales because of the underlying possible score | |
range and the precision required to attain perfect agreement (Stemler & Tsai, 2008). | |
Results | |
Factor Structure | |
I conducted an exploratory factor analysis (EFA) using principal axis factoring with | |
promax rotation and an eigenvalue of one criterion for extraction. Based on Bartlett’s | |
test output (χ2 = 726.9, p < .001) and Kaiser-Meyer-Olkin measure (.89), I concluded | |
the underlying data were adequately correlated and the sample size was appropriate | |
for conducting an EFA. Eigenvalues equaled 5.88 (Factor 1), 1.34 (Factor 2), and 1.06 | |
(Factor 3), with Factor 1 accounting for 35.7% of the variance in edTPA ratings, followed by Factors 2 (5.8%) and 3 (3.4%) for a cumulative explained variance of 44.9%. | |
The resulting three-factor model met all criteria for simple structure with the exception | |
of three rubrics failing to achieve a 0 loading on any factor, a criterion which may be | |
difficult to meet with smaller sample sizes and fewer extracted factors. | |
The three-factor model resulting from edTPA scores for music education (see Table | |
S2 in the online supplement) did not support the a priori structure proposed by SCALE | |
(2019a) around the three tasks. R1-R5, R11, R12, and R15 from Tasks 1 and 3 clustered into Factor 1. Factor 2 consisted of R6-R9 from Task 2, and Factor 3 consisted of | |
R10 from Task 2, and R13 and R14 from Task 3. The eight rubrics comprising Factor | |
1 suggest an interpretation of “Planning and Assessment.” Factor 2, consisting of | |
R6-R9, resembled Task 2 (Instruction). Factor 3, containing R10, R13, and R14, defied | |
a clear interpretation. | |
With interfactor correlations ranging from .51 (Factors 2 and 3) to .67 (Factors 1 | |
and 3), I concluded that the three-factor model provided adequate discriminant validity (T. A. Brown, 2015), despite lack of support for construct validity. An additional | |
analysis examined the one-factor solution around teacher readiness, which resulted in | |
factor loadings of .46 to .74 (M = .59; SD = .09) and explained just 35.1% of the
variance (see Table S2 in the online supplement). | |
Reliability | |
(In subsequent discussions, tasks refer to the a priori groupings of rubrics around | |
Planning, Instruction, and Assessment [e.g., SCALE, 2019a] and factors denote groupings that emerged from the analysis described here.) I report two forms of reliability | |
estimation for preservice music teachers’ edTPA scores—internal consistency (how | |
consistent ratings are within a priori edTPA tasks or factors extracted through EFA) | |
and interrater reliability (how consistent edTPA rubric scores are across evaluators). | |
Estimates of internal consistency (α) for the three a priori tasks ranged from .73 for | |
Task 1 (Planning) and .74 for Task 3 (Assessment) to .81 for Task 2 (Instruction). | |
Alpha coefficients for factors produced by the EFA ranged from .66 for Factor 3 | |
(rubrics 10, 13, 14) to .81 for Factor 1 (rubrics 1-5, 11, 12, 15) and .82 for Factor 2 | |
(rubrics 6-9). Regardless of whether tasks or factors served to frame the grouping of | |
rubrics, scores for rubrics thought to represent Instruction yielded the highest level of | |
internal consistency. When all 15 rubrics were considered together as a single measure | |
of teacher readiness, the resulting alpha was .88. | |
Estimated interrater reliability using only exact agreements for Cohen’s kappa | |
(Gitomer et al., 2021) ranged from .07 to .51 for individual rubrics and averaged .25 | |
(SD = .12) overall. These findings are similar to estimated k for the Performing Arts | |
(Range = −.01-.32; M = .24; SD = .09) calculated from rubric scores reported in the | |
spring 2019 edTPA National Performance Summary (SCALE, 2019c) and exact agreement indices for all content areas combined from the 2018 Administrative Report. | |
Estimated k from both analyses differed greatly from kappan statistics reported by | |
SCALE (Range = .85-.98, M = .91, SD = .04) for all handbooks together (SCALE, | |
2019a; see Table S3 in the online supplement). | |
Discussion | |
In this study, I examined the reliability and construct validity of edTPA scores for | |
preservice music teachers. Readers should interpret results with caution due to limitations of the study. In addition to a relatively small nonrandom sample, all data came | |
from one institution and may not reflect broader trends. It is also important to note | |
differences between statistical procedures used in this study and those involved in | |
analyses published in the Administrative Reports (SCALE, 2013, 2015, 2018c, 2019a) | |
when making comparisons. | |
Construct Validity | |
The factor structure that I obtained through EFA raises important questions about the | |
construct validity of the edTPA for music education. According to SCALE (2019a), all | |
28 content areas share approximately 80% of their design around Planning, Instruction, | |
and Assessment. However, this design results in standardization that might fail to capture the uniqueness of teaching and learning in some disciplines (e.g., Powell & | |
Parkes, 2020). The percent of variance explained by the one- and three-factor solutions in this study indicates that the 15 rubrics do not represent the totality of what | |
occurs in the music classroom. Individual factor loadings also suggest that some | |
rubrics might measure elements of instruction connected less to music than other content areas. Variables related to academic language demands, for example, loaded .39 to | |
.55 on either model. | |
It is unclear why three of the Assessment (Task 3) rubrics (R11, R12, & R15) loaded | |
with the Planning (Task 1) rubrics (R1-R5) onto Factor 1. The titles of these tasks, | |
“Planning for Instruction and Assessment;” and “Assessing Student Learning,” imply | |
a relationship. Maybe these tasks are more closely related in music than in other subjects. However, the factor structure that emerged in Austin and Berg (2020) clearly | |
aligned with the Planning, Instruction, and Assessment tasks, and does not support this | |
assertion. Perhaps the teacher preparation program involved in this study taught | |
assessment in such a way that caused students to view planning and assessment as | |
being so closely associated that the scores they received for the a priori assessment | |
rubrics did not coalesce in a meaningful way and, instead, loaded onto two different | |
factors. | |
The failure of the theoretical model (e.g., SCALE, 2019a) to emerge in this study is | |
problematic when scorers evaluate individual tasks. Rubrics in Task 1 (R1-R5) are not | |
inclusive of those that represented a single construct (i.e., Factor 1: R1-R5, R11, R12, | |
R15). Likewise, R10 from Task 2 did not load with other instructional rubrics (R6-R9), and rubrics associated with Task 3 (R11-R15) loaded onto two different factors.
SCALE could mitigate this concern by allowing graders to consult all materials as | |
evidence for any task. However, scoring rules treat the three tasks as separate entities | |
by prohibiting evaluators from considering evidence from one task when scoring | |
another. For example, a scorer cannot use lesson plans from Task 1 as evidence for | |
achievement on Task 3 (Parkes & Powell, 2015). | |
Reliability | |
Measures of internal consistency (α) in this study for all tasks and total scores exceeded | |
.70 and were similar to those attained by Austin and Berg (2020). However, only Task | |
2 (Instruction, R6-R10) met the .80 benchmark for acceptable reliability while Tasks 1 | |
(Planning, R1-R5) and 3 (Assessment, R11-R15) did not. Lower alpha coefficients for | |
individual tasks could be a function of the number of items in each (Carmines & Zeller, | |
1979). Alpha readings in this study, like those by SCALE (2019a), might also be inaccurate due to combining (a) different observations by multiple evaluators and (b) nonindependent rubric scores assigned by single raters. This procedure ignores the effects | |
of individual scores on internal consistency of the edTPA evaluation form, which might | |
result in inflated alpha coefficients (Gitomer et al., 2021; Miksza & Elpus, 2018). | |
Interrater reliability estimates (k) in this study might be imprecise because of the | |
statistical procedures used in the absence of two sets of evaluator ratings for preservice | |
music teachers in this study. However, the wide disparity between these and kn indices | |
listed for all content areas combined (SCALE, 2019a) were likely due to SCALE’s use | |
of exact + adjacent agreements in the calculations rather than differences in k and kn | |
formulas (Gitomer et al., 2021). Although coefficients based on adjacent + exact | |
agreements appear in the literature, their use depends on raters assigning scores across | |
all possible categories for discrete 5-point rubrics. Underuse of the highest and lowest | |
scoring options results in a scale where nearly all points will be adjacent and in agreement indices usually above 90% (Stemler & Tsai, 2008). About 95% of edTPA music | |
scores in this study and for content areas nationally (SCALE, 2019c) fell within a scale | |
of 2 to 4. Consequently, agreement indices for individual rubrics listed in the 2018 | |
Administrative Report (2019a) ranged from .94 to .99. | |
Summary and Recommendations | |
Factor analysis indicated that while the three-factor model for the edTPA accounted | |
for almost one-half of the variance in music teacher readiness, the single-factor model | |
accounted for just over one-third. Although scores for all rubrics sufficiently loaded on | |
the single-factor model, the three-factor model lacked clarity and interpretability in | |
relation to a priori tasks proposed by the test authors (e.g., SCALE, 2019a). In addition, measures of internal consistency for two of the three tasks did not meet the .80 | |
benchmark for acceptability (Carmines & Zeller, 1979), and estimated interrater | |
agreement ranged from only slight to moderate (Landis & Koch, 1977). These findings support the need for analysis by content area (e.g., Gitomer et al., 2021) and challenge the aggregated data published by SCALE. | |
Policymakers, teacher educators, and other stakeholders should consider findings | |
from this study when making decisions about implementation and continuation of the | |
edTPA. Although this research focused solely on psychometric qualities, decision-makers must also weigh ethical and philosophical concerns such as consequential and
ecological validity, socioeconomic factors, racial bias, and potential effects on K–12 | |
student learning (e.g., Powell & Parkes, 2020). If the edTPA is to continue to serve as | |
a high-stakes assessment for preservice music teachers, it should act as only one component among multiple measures of readiness. Perhaps policymakers should allow | |
candidates scoring below their benchmark to make up the deficiency through grade | |
point averages, student teaching evaluations, content exams, or other criteria (e.g., | |
Parkes, 2020). | |
SCALE should consider revising the edTPA for specific content areas, especially | |
when data do not support reliability and validity. For the performing arts, test authors | |
should divide music, theater, and dance into separate handbooks, and then work with | |
educators to develop scoring procedures and criteria to better reflect the specific types | |
of teaching and learning that occur in these classrooms. Changes might include altering | |
the number of rubrics and their descriptors to focus more on creating, performing, and | |
responding, and less on learning about the subject through writing and discussion. | |
These changes are not unprecedented. The world languages and classical languages | |
handbooks each contain 13 rubrics. In addition, one version of the elementary education handbook consists of four tasks with 18 rubrics total (SCALE, 2019a). Regardless, | |
scoring rules should allow evaluators to consult all materials throughout the grading | |
process to account for the holistic nature of teaching (e.g., Powell & Parkes, 2020) and | |
to compensate for different factor structures that might exist in various subject areas. | |
Public data published in the Administrative Reports for all handbooks combined | |
hold little meaning, since the assessment is designed, administered, and scored within | |
separate disciplines. Instead, these analyses should reflect a higher level of transparency and contain complete data for each area. Results from factor analysis, for example, should include information not currently available such as the percentage of | |
variance explained by each factor, communalities, and the type of matrix (e.g., pattern, | |
structure) used in the interpretation. | |
Internal consistency coefficients (α) should account for measurement error caused | |
by raters. One method might be to calculate α for all portfolios graded by an individual | |
scorer, and then report an average for each task and all 15 rubrics combined within a | |
content area. Test administrators should also consider a different procedure for calculating interrater reliability. The current method of combining exact + adjacent agreements for use in kn is too liberal, especially with underuse of the lowest and highest | |
ratings (Stemler & Tsai, 2008). Likewise, using only exact agreements in the measurement might be too conservative concerning the practical application of edTPA scores | |
in readiness-for-licensure decisions, which flow from total scores rather than individual rubric scores. Instead, SCALE should consider use of a weighted kappa to provide | |
a more accurate representation of interrater reliability. This procedure penalizes disagreements in terms of their severity, whereas unweighted kappa treats all disagreements equally (Sim & Wright, 2005). Regardless, agreement indices and proportions | |
of scores for all rubrics in each content area should appear with other public data so | |
that scholars outside of SCALE and Pearson can verify statistical analysis and conduct | |
further research. | |
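A weighted kappa of the kind recommended here is available in common statistical libraries. The Python sketch below uses scikit-learn's cohen_kappa_score with linear and quadratic weights on two hypothetical rater vectors, purely to illustrate how weighting changes the penalty for larger disagreements; it is not an analysis of edTPA data.

# Illustration of unweighted vs. weighted kappa on hypothetical rater data;
# not an analysis of edTPA scores.
from sklearn.metrics import cohen_kappa_score

# Two raters scoring the same 12 rubrics on the 1-5 scale; most disagreements
# are adjacent (one point apart), one is two points apart.
rater_1 = [3, 3, 4, 2, 3, 4, 3, 3, 2, 4, 3, 3]
rater_2 = [3, 4, 4, 2, 2, 4, 3, 4, 2, 3, 3, 5]

unweighted = cohen_kappa_score(rater_1, rater_2)
linear = cohen_kappa_score(rater_1, rater_2, weights="linear")
quadratic = cohen_kappa_score(rater_1, rater_2, weights="quadratic")

# Unweighted kappa treats every disagreement equally; the weighted versions
# penalize the two-point disagreement more heavily than the one-point ones,
# which is the behavior argued for above (Sim & Wright, 2005).
print(f"unweighted kappa:   {unweighted:.2f}")
print(f"linear-weighted:    {linear:.2f}")
print(f"quadratic-weighted: {quadratic:.2f}")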
The high-stakes nature of the edTPA for preservice teachers requires valid and reliable results in all disciplines. Continued research is needed to monitor the psychometric qualities of this assessment and identify weaknesses. In the absence of publicly
available data, researchers could replicate this study and others (Austin & Berg, 2020; | |
Musselwhite & Wesolowski, 2019) by combining scores from multiple institutions to | |
create analyses that are more robust. Future studies should involve multiple statistical | |
procedures, given the limitations and advantages of the various methods. For example, the
Rasch model can compensate for differences in rater severity or sample characteristics | |
(Musselwhite & Wesolowski, 2019; Stemler & Tsai, 2008). Educator preparation programs considering the edTPA for internal use or states adopting the assessment as a | |
licensure requirement should not do so without evidence of validity and reliability for | |
each content area. | |
Declaration of Conflicting Interests | |
The author declared no potential conflicts of interest with respect to the research, authorship, | |
and/or publication of this article. | |
Funding | |
The author received no financial support for the research, authorship, and/or publication of this | |
article. | |
ORCID iD | |
Phillip M. Hash | |
https://orcid.org/0000-0002-3384-4715 | |
Supplemental Material | |
Supplemental material for this article is available online. | |
Notes | |
1. SCALE (2019b) uses categories of agreement (n = 2) rather than rubric categories (n = 5) as the unit of n in their calculations, stating that, “given the three possible classifications of agreement (perfect, adjacent, and nonagreement), . . . perfect and adjacent were combined as the agreement statistic” (p. 6). SCALE does not provide details about these calculations beyond stating the use of kappan. However, calculating this statistic using the exact + adjacent agreements for each rubric provided by SCALE (2019) and 2 as the value for n resulted in the same kn coefficients provided in the 2018 Administrative Report.
2. Rubrics that undergo double scoring are averaged when Scorer 1 and Scorer 2 reach adjacent agreement. Rubric scores more than one number apart are resolved by a scoring supervisor.
References | |
Adkins, A., Klass, P., & Palmer, E. (2015, January). Identifying demographic and preservice | |
teacher performance predictors of success on the edTPA [Conference presentation]. 2015 | |
Hawaii International Conference on Education. Honolulu, Hawaii. http://hiceducation.org/ | |
wp-content/uploads/proceedings-library/EDU2015.pdf | |
Asmus, E. P. (1989). Factor analysis: A look at the technique through the data of Rainbow. | |
Bulletin of the Council for Research in Music Education, 101, 1–29. www.jstor.org/stable/40318371 | |
Austin, J. R., & Berg, M. H. (2020). A within-program analysis of edTPA score reliability, | |
validity, and utility. Bulletin of the Council for Research in Music Education, 226, 46–65. | |
https://doi.org/10.5406/bulcouresmusedu.226.0046 | |
Behizadeh, N., & Neely, A. (2018). Testing injustice: Examining the consequential validity of | |
edTPA. Equity & Excellence in Education, 51(3–4), 242–264. http://doi.org/10.1080/1066 | |
5684.2019.1568927 | |
Bernard, C., & McBride, N. (2020). “Ready for primetime:” edTPA, preservice music educators, | |
and the hyperreality of teaching. Visions of Research in Music Education, 35, 1–26. wwwusr.rider.edu/%7Evrme/v35n1/visions/Bernard%20and%20McBride_Hyperreality%20 | |
Manuscript.pdf | |
Brennan, R. L., & Prediger, D. J. (1981). Coefficient kappa: Some uses, misuses, and alternatives. Educational and Psychological Measurement, 41(3), 687–699. https://doi. | |
org/10.1177/001316448104100307 | |
Brown, J. D. (2009). Choosing the right type of rotation in PCA and EFA. Shiken: JALT Testing | |
& Evaluation Newsletter, 13(3), 20–25. http://hosted.jalt.org/test/PDF/Brown31.pdf | |
Brown, T. A. (2015). Confirmatory factor analysis for applied research (2nd ed.). Guilford | |
Press. | |
Carmines, E. G., & Zeller, R. A. (1979). Reliability and validity assessment. Sage. | |
Choppin, J., & Meuwissen, K. (2017). Threats to validity in the edTPA video component. Action | |
in Teacher Education, 39(1), 39–53, https://doi.org/10.1080/01626620.2016.1245638 | |
Costello, A. B., & Osborne, J. (2005). Best practices in exploratory factor analysis: Four recommendations for getting the most from your analysis. Practical Assessment, Research, and
Evaluation, 10, Article 7. https://doi.org/10.7275/jyj1-4868 | |
Darling-Hammond, L., & Hyer, M. E. (2013). The role of performance assessment in developing teaching as a profession. Rethinking Schools, 27(4). www.rethinkingschools.org/ | |
articles/the-role-of-performance-assessment-in-developing-teaching-as-a-profession | |
Dover, A., Schultz, B., Smith, K., & Duggan, T. (2015). Embracing the controversy: edTPA, | |
corporate influence, and the cooptation of teacher education. Teachers College Record, | |
Article 18109. www.tcrecord.org/books/Content.asp?ContentID=18109 | |
Gitomer, D. H., Martinez, J. F., Battey, D., & Hyland, N. E. (2021). Assessing the assessment: | |
Evidence of reliability and validity in the edTPA. American Educational Research Journal, | |
58(1), 3–31. https://doi.org/10.3102%2F0002831219890608 | |
Heil, L., & Berg, M. H. (2017). Something happened on the way to completing the edTPA: | |
A case study of teacher candidates’ perceptions of the edTPA. Contributions to Music | |
Education, 42, 181–200. www.jstor.org/stable/26367442 | |
Krippendorff, K. (2013). Content analysis: An introduction to its methodology (3rd ed.). Sage. | |
Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33(1), 159–174. http://dx.doi.org/10.2307/2529310
Miksza, P., & Elpus, K. (2018). Design and analysis for quantitative research in music education. Oxford University Press. | |
Musselwhite, D. J., & Wesolowski, B. C. (2019). Evaluating the psychometric qualities of the | |
edTPA in the context of pre-service music teachers. Research Studies in Music Education. | |
Advance online publication. https://doi.org/10.1177/1321103X19872232 | |
National Association for Music Education. (2014). 2014 Music standards. https://nafme.org/ | |
my-classroom/standards/core-music-standards/ | |
National Association of Schools of Music. (2020). Handbook 2019-20. https://bit.ly/3jTVKQi | |
Parkes, K. A. (2020). Student teaching and certification assessments. In C. Conway, K. | |
Pellegrino, A. M. Stanley, & C. West (Eds.), Oxford handbook of preservice music teacher | |
education in the United States (pp. 231–252). Oxford University Press. | |
Parkes, K. A., & Powell, S. R. (2015). Is the edTPA the right choice for evaluating teacher | |
readiness? Arts Education Policy Review, 116(2), 103–113. https://doi.org/10.1080/1063 | |
2913.2014.944964 | |
Pecheone, R. L., & Whittaker, A. (2016). Well-prepared teachers inspire student learning. Phi | |
Delta Kappan, 97(7), 8–13. https://doi.org/10.1177/0031721716641641 | |
Powell, S. R., & Parkes, K. A. (2020). Teacher evaluation and performativity: The edTPA as a | |
fabrication. Arts Education Policy Review, 121(4), 131–140. https://doi.org/10.1080/1063 | |
2913.2019.1656126 | |
Sato, M. (2014). What is the underlying conception of teaching of the edTPA? Journal of | |
Teacher Education, 65(5), 421–434. http://doi.org/10.1177/0022487114542518 | |
Sim, J., & Wright, C. C. (2005). The kappa statistic in reliability studies: Use, interpretation, | |
and sample size requirements. Physical Therapy, 85(3), 257–268. https://doi.org/10.1093/ | |
ptj/85.3.257 | |
Stanford Center for Assessment, Learning, and Equity. (2013). 2013 edTPA field test: Summary
report. https://secure.aacte.org/apps/rl/res_get.php?fid=827 | |
Stanford Center for Assessment, Learning, and Equity. (2015). Educative assessment and | |
meaningful support: 2014 EdTPA administrative report. https://secure.aacte.org/apps/rl/ | |
res_get.php?fid=2188&ref=edtpa | |
Stanford Center for Assessment, Learning, and Equity. (2018a). edTPA K-12 Performing arts | |
assessment handbook (Version 06). http://ceit.liu.edu/Certification/EdTPA/2018/edtpapfa-handbook%202018.pdf | |
Stanford Center for Assessment, Learning, and Equity. (2018b). Understanding rubric level | |
progressions: K–12 performing arts (Version 01). https://concordia.csp.edu/teachered/wpcontent/uploads/sites/3/K-12-Performing-Arts-Rubric-Progressions.pdf | |
Stanford Center for Assessment, Learning, and Equity. (2018c). Educative assessment and | |
meaningful support: 2017 EdTPA administrative report. https://secure.aacte.org/apps/rl/ | |
res_get.php?fid=4271&ref=edtpa | |
Stanford Center for Assessment, Learning, and Equity. (2019a). Educative assessment and | |
meaningful support: 2018 EdTPA administrative report. https://secure.aacte.org/apps/rl/ | |
res_get.php?fid=4769&ref=edtpa | |
Stanford Center for Assessment, Learning, and Equity. (2019b). Affirming the validity and | |
reliability of edTPA [White paper]. http://edtpa.aacte.org/wp-content/uploads/2019/12/ | |
Affirming-Validity-and-Reliability-of-edTPA.pdf | |
Stanford Center for Assessment, Learning, and Equity. (2019c). edTPA EPP performance summary: January 2019 - June 2019. https://sasn.rutgers.edu/sites/default/files/sites/default/ | |
files/inline-files/Jan%20to%20June%202019%20edTPA.pdf | |
Stemler, S. E., & Tsai, J. (2008). Best practices in interrater reliability: Three common | |
approaches. In J. Osborne (Ed.), Best practices in quantitative methods (pp. 29–49). Sage.
Revista Educación | |
ISSN: 0379-7082 | |
ISSN: 2215-2644 | |
Universidad de Costa Rica | |
Costa Rica | |
Actualización de la evaluación docente | |
de posgrados en una universidad | |
multicampus: experiencia desde la | |
Universidad Santo Tomás (Colombia)[1] | |
Patiño-Montero, Freddy; Godoy-Acosta, Diana Carolina; Arias Meza, Deyssy Catherine | |
Actualización de la evaluación docente de posgrados en una universidad multicampus: experiencia desde la | |
Universidad Santo Tomás (Colombia)[1] | |
Revista Educación, vol. 46, núm. 2, 2022 | |
Universidad de Costa Rica, Costa Rica | |
Disponible en: https://www.redalyc.org/articulo.oa?id=44070055006 | |
DOI: https://doi.org/10.15517/revedu.v46i2.47955 | |
Esta obra está bajo una Licencia Creative Commons Atribución-NoComercial-CompartirIgual 3.0 Internacional. | |
Artículos científicos | |
Actualización de la evaluación docente de posgrados en una universidad | |
multicampus: experiencia desde la Universidad Santo Tomás (Colombia)[1] | |
Update of the Postgraduate Teaching Evaluation in a Multicampus University: Experience from the Santo Tomás | |
University (Colombia) | |
Freddy Patiño-Montero
Universidad Santo Tomás, Bogotá, Colombia
https://orcid.org/0000-0001-5795-4911
Diana Carolina Godoy-Acosta
Universidad Santo Tomás, Bogotá, Colombia
https://orcid.org/0000-0002-1903-0854
Deyssy Catherine Arias Meza
Universidad Santo Tomás, Bogotá, Colombia
https://orcid.org/0000-0001-6689-5706
Recepción: 20 Agosto 2021 | |
Aprobación: 20 Septiembre 2021 | |
Resumen: | |
Este artículo presenta los resultados de una investigación evaluativa cuyo objetivo estuvo orientado a realizar un ejercicio de | |
metaevaluación de la evaluación docente de posgrados de la Universidad Santo Tomás, durante el período 2017-2020. Los | |
referentes teóricos se ubican en orden a las categorías: evaluación educativa, evaluación del profesorado e investigación evaluativa,
así como los referentes institucionales que se tuvieron en cuenta dentro del proceso. La investigación se ubica en el paradigma | |
cualitativo y corresponde a una metodología de investigación evaluativa que permitió el diseño de ocho pasos que orientaron la | |
realización del estudio; esto posibilitó la metaevaluación de la evaluación docente de los posgrados de la Universidad Santo Tomás, | |
donde participaron personas estudiantes, docentes, directoras de programa y decanas de facultad en el diagnóstico, mesas de trabajo, | |
aplicación de pilotaje, evaluación del instrumento final e implementación de la evaluación. Los resultados alcanzados se presentan | |
en coherencia con la metodología, en consideración de que no se hacían efectivas políticas y procedimientos institucionales, unido | |
a que el derecho a réplica del profesorado era casi nulo. Por otro lado, lo más relevante es la definición de una evaluación docente | |
personalizada de acuerdo con su plan de trabajo y la evaluación del desempeño del personal docente contratado por orden de | |
prestación de servicios. Todo esto conllevó a la consolidación y parametrización de un aplicativo institucional. Finalmente, se | |
esbozan algunas conclusiones del proceso y recomendaciones de carácter metodológico para adelantar este tipo de trabajos en | |
instituciones de educación superior multicampus. | |
Palabras clave: Evaluación educativa, Evaluación docente, Educación superior, Investigación evaluativa. | |
Abstract: | |
This article presents the results of an evaluative research project whose objective was oriented to carrying out a meta-evaluation exercise of the postgraduate teacher evaluation of the Santo Tomás University during the period 2017-2020. The theoretical referents are placed in order of the categories: educational evaluation, the evaluation of the teaching staff, and evaluative research, as well as the institutional referents that were considered within the process. The research is located in the qualitative paradigm and corresponds to an evaluative research methodology that permitted the eight-step design that oriented the accomplishment of the study. This made possible the meta-evaluation of the teaching evaluation of the postgraduate programs of the Santo Tomás University, where students, teachers, program directors, and deans of faculty participated in the diagnosis, working groups, application of piloting, evaluation of the final instrument, and implementation of the evaluation. The results achieved are presented in coherence with the methodology, considering that institutional policies and procedures were not implemented, together with the fact that the teachers' right to reply was almost nil. On the other hand, the most relevant result is the definition of a personalized teacher evaluation according to each teacher's work plan, and the evaluation of the performance of the teaching staff hired by order of service provision. All this led to the consolidation and parameterization of an institutional application. Finally, some conclusions of the process and recommendations of a methodological nature are outlined to carry out this type of work in multi-campus higher education institutions.
Keywords: Educational Evaluation, Teacher Evaluation, Higher Education, Evaluative Research. | |
Epígrafe | |
“O la evaluación es útil o no tiene sentido realizarla; tiene que ser un instrumento para la acción y no un mero | |
mecanismo de justificación o para tranquilizar conciencias. Los evaluadores debemos ser beligerantes en este | |
sentido” (Escudero-Escorza, 2000, p. 406). | |
Introducción | |
La Universidad Santo Tomás [2] (USTA) es una Institución de Educación Superior de carácter privado, | |
con presencia nacional a través de la sede principal Bogotá, seccionales en Bucaramanga y Tunja, y sedes | |
en Medellín y Villavicencio. Adicionalmente, cuenta con Centros de Atención Universitaria (CAU) en 23 | |
ciudades y municipios del país. La oferta académica comprende 76 programas de pregrado y 129 de posgrado, | |
en los cuales están matriculadas cerca de 32,000 personas estudiantes, divididas en 29,000 en pregrado y 3000 | |
en posgrado. Para cumplir con su misión, la USTA cuenta con 2,350 docentes con dedicación de tiempo | |
completo, medio tiempo y hora cátedra (Mesa-Angulo, 2020). | |
Desde este contexto, el proceso investigativo inició con un ejercicio de diagnóstico, realizado en 2017, | |
sobre el estado de la evaluación docente en posgrado, donde se pudo constatar que esta no obedecía a las | |
mismas dinámicas que en pregrado, al punto que cada uno de los programas tenía sus propios instrumentos | |
y metodologías. Además de lo anterior, se identificaron algunos problemas como: | |
• Poca significatividad del instrumento que diligencia el estudiantado, puesto que la redacción de los descriptores en su mayoría está referida únicamente a la modalidad presencial.
• La escasa motivación y participación por parte del estudiantado.
• La participación intermitente por parte del profesorado.
• La poca implementación de planes de mejoramiento por parte del cuerpo docente.
• La escasa información para la toma de decisiones desde la gestión de los programas respecto a la continuidad del cuerpo docente.
En virtud de lo anterior, el objetivo principal de esta investigación fue realizar un ejercicio de | |
metaevaluación de la evaluación docente de posgrados de la USTA. Sus objetivos específicos fueron: a) | |
analizar los referentes conceptuales y metodológicos que soportan los procesos de investigación evaluativa | |
y evaluación educativa, b) evaluar el nivel de implementación de las políticas y procedimientos para la | |
evaluación docente de posgrados de la USTA, y c) proponer una nueva batería de instrumentos que atienda | |
a las necesidades de los programas de posgrado. | |
Estado de la cuestión | |
Respecto a la evaluación educativa, como se evidencia en los estudios que se refieren a lo largo de esta | |
publicación, su evolución se ubica en la misma historia de la educación, en cuanto al concepto, alcance y | |
metodologías. Asimismo, indican que el siglo XX fue especialmente significativo en tanto que se alcanza la | |
profesionalización de la evaluación educativa y se supera la perspectiva objetivante centrada en la medición. | |
Así, con base en el enfoque constructivista, el propósito pasa a ser el aprendizaje del estudiantado, lo que | |
implica un cambio de perspectiva respecto a los fines de la educación y la misma evaluación, como se percibe | |
en los análisis de Santos-Guerra (2010), Casanova (2007), Escudero-Escorza (2003), Rossett y Sheldon | |
(2001), House (1995), Stufflebeam y Shinkfield (1987), Guba y Lincoln (1989), entre otros. | |
En cuanto a la evaluación del profesorado, se encuentra que los diversos trabajos de revisión teórica | |
identifican algunas tendencias, entre las cuales se destacan: la complejidad de la labor docente, la falta de | |
consenso frente a lo que significa ser docente de calidad en la universidad; la diversidad de criterios con | |
relación a la selección y evaluación, asociadas a las nociones subyacentes sobre la buena enseñanza; la tendencia | |
a reducir las funciones del profesorado universitario únicamente a la docencia; la influencia de la docencia | |
en la calidad educativa; la diversidad de funciones, agentes y metodologías de evaluación, hasta los estímulos | |
salariales y la carrera académica, entre otras, de acuerdo con los trabajos de Rueda (2014), Ramírez-Garzón | |
y Montoya-Vargas (2014), Montoya y Largacha (2013), Fernández y Coppola (2012), Escudero-Muñoz | |
(2010), Murillo-Torrecilla (2008), y Tejedor-Tejedor y Jornet-Meliá (2008). | |
De forma complementaria, la revisión permitió constatar que la investigación evaluativa se ha venido | |
fortaleciendo desde las últimas décadas como una de las metodologías de investigación, cuya finalidad se | |
centra en el mejoramiento de la calidad, especialmente del servicio educativo, con un alto énfasis a generar | |
participación de las partes involucradas y con una amplia flexibilidad metodológica, como indican Belando-Montoro y Alanís-Jiménez (2019), Escudero-Escorza (2006, 2019), Tejedor-Tejedor y Jornet-Meliá (2008),
Tejedor-Tejedor (2009), Litwin (2010) y Saravia-Gallardo (2004). | |
Referentes conceptuales | |
Evaluación educativa | |
Respecto a la primera categoría, evaluación educativa, se identifica como un proceso formativo que se | |
realiza sobre las acciones desarrolladas en el marco de las instituciones educativas, con la intención de detectar | |
dificultades e implementar planes de mejora que permitan solucionarlas de manera satisfactoria y pertinente. | |
Por ende, implica aspectos tales como los resultados de aprendizaje y la evaluación institucional. | |
Con base en el concepto propuesto, resulta relevante lo expuesto por Casanova (2007) en el Manual de | |
Evaluación Educativa, cuando afirma que este “consiste en un proceso sistemático y riguroso de recogida de | |
datos” (p. 60), cuyo propósito es disponer de información continua y significativa, que permita formar juicios | |
de valor para tomar decisiones que mejoren la actividad educativa. | |
Los elementos planteados por Casanova adquieren relevancia puesto que se entiende como el resultado de | |
un conjunto de actividades, claramente relacionadas entre sí, que se dan en el marco del transcurrir cotidiano | |
de las instituciones educativas, desde su fase inicial (diagnóstico) hasta la entrega de resultados, pero sin | |
terminar con estos, pues una vez se obtienen, se da inicio al ciclo de mejoramiento, que implica ir a cada | |
una de las instancias, factores y actores evaluados para establecer las rutas más adecuadas para asegurar que | |
efectivamente el proceso en sí mismo se vaya cualificando. | |
En línea con ello, Scriven (1967) afirma que “la evaluación es en sí misma una actividad metodológica que | |
es esencialmente similar si estamos tratando de evaluar máquinas de café o máquinas de enseñanza, los planes | |
para una casa o los planes para un programa de estudios” (p. 40). Scriven enmarca la evaluación como un | |
procedimiento, que lleva implícita la idea de secuencia, progresión en la ejecución de una serie de pasos. De | |
hecho, al revisar la propuesta de Scriven es posible afirmar que su objetivo consiste en desplazar la evaluación | |
desde los objetivos hacia las necesidades, en tanto que toda ella está orientada hacia la persona consumidora | |
(usuario). | |
Por su parte, para Rossett y Sheldon (2001), la evaluación es el proceso de examen de un programa o | |
proceso para determinar qué funciona, qué no y por qué. La evaluación determina el valor de los programas y | |
actúa como modelo para el juicio y la mejora. Respecto a este punto, las personas autoras nuevamente toman | |
el término proceso para referirse a la evaluación, al tiempo que incluyen el elemento valorativo que se debe | |
dar dentro de este, así como la intención de utilizarlos en perspectiva de mejoramiento, en este caso de los | |
programas. | |
Entonces, conviene preguntarse: ¿cuál evaluación y al servicio de quién? La evaluación educativa es un | |
tema que ha logrado mantener especial relevancia en el ámbito académico, en cuanto aspecto neurálgico | |
en los procesos educativos. Es decir, se ha evidenciado que la evaluación es un tema que trasciende los | |
espacios convencionales de debate, puesto que en ella convergen múltiples factores y relaciones humanas, | |
tales como: la dimensión social, en cuanto que es de alguna manera una forma de establecer relaciones entre | |
el estudiantado, entre el estudiantado y el profesorado, y entre el cuerpo docente, ya que se pregunta por
los valores, el respeto por las personas y el sentido de la justicia [por mencionar algunas] (Santos-Guerra, | |
2010); una dimensión política (House, 1995), por tanto, no debe ser solo veraz, sino justa, en la medida que | |
es tomada la mayoría de la veces como un instrumento de poder que, según su uso, llega a determinar la vida | |
de las personas; una dimensión filosófica, en cuanto que se debe preguntar por el fundamento, la razón de ser | |
de la acción evaluativa para que no quede reducida a una mera actividad desarticulada, es decir, al plano del | |
activismo sin ninguna reflexión; de igual manera una dimensión teleológica, es decir, que exista una mirada, | |
un horizonte claro hacia el cual se quiere llegar, un para qué, que ayude a darle sentido al proceso. | |
Ahora bien, se asume la línea teórica definida por personas autoras como Stake (2006), según la cual la evaluación educativa es el proceso de emitir un juicio de valor con base en evidencias objetivas sobre el mérito y las deficiencias de algo. Del mismo modo, Cordero y Luna (2010) argumentan que “la evaluación comprende dos
componentes: el estudio empírico, determinar los hechos y recolectar la información de manera sistemática; | |
y la delimitación de los valores relevantes para los resultados del estudio” (p. 193). Justamente esa es la postura | |
que se asume al inicio de este apartado. | |
En síntesis, como afirma Cabra-Torres (2014): | |
la evaluación ha servido de motor para gran parte de los cambios de orientación de los sistemas educativos, en razón de | |
la información que produce y de los interrogantes que despiertan la gestión y el análisis de los resultados que entrega a la | |
sociedad (p. 178). | |
Evaluación del profesorado | |
En cuanto a la evaluación del profesorado, se concibe como una herramienta de gestión que posibilita | |
el desarrollo de la carrera docente, en el marco de una institución educativa (en este caso, de educación | |
superior). En este sentido, implica la recolección de información por parte de los agentes e instancias en las | |
que se desempeña, en el marco de las funciones universitarias, con el fin de establecer estrategias y actividades | |
que le sirvan al personal docente para identificar sus fallas y mejorarlas con apoyo de la institución. Al | |
mismo tiempo, posibilita a las Instituciones de educación superior la implementación de planes de formación | |
docente que redunden en beneficios para quienes obtienen bajas calificaciones en este proceso, lo cual ha | |
permitido caracterizar cada vez más ideas acerca de los atributos del buen profesor (Belando-Montoro y Alanís-Jiménez, 2019). Como última instancia, provee de herramientas a las Instituciones de Educación Superior
(IES) para la toma de decisiones informadas sobre la continuidad o no de un profesor o profesora. | |
Lo afirmado hasta el momento se encuentra en plena consonancia con los planteamientos de Montoya | |
y Largacha (2013), Vásquez-Rizo y Gabalán-Coello (2012), Fernández y Coppola (2010), y Luna-Serrano | |
(2008), quienes destacan que la evaluación de la docencia universitaria implica una amplia diversidad de | |
agentes evaluadores, en tanto que la profesión académica no se limita únicamente a la función docente, | |
sino que son amplias y diversas las funciones y roles del profesorado en las universidades. De allí que aún | |
hoy no haya consenso respecto a cómo evaluarla ni cuál es el mejor método. En consecuencia, como indica | |
Rueda (2014), “es necesario reconocer la relevancia del rol que puede cumplir la evaluación sistemática del | |
desempeño docente en la profesionalización y perfeccionamiento permanente del profesorado” (p. 99). Es | |
decir, se enmarcan en una categoría más amplia como es la profesión académica. | |
De allí que un error bastante común es considerar la profesión académica desde un ideal de institución | |
de educación superior, desde el desempeño de la función docente propiamente dicha e incluso desde una | |
modalidad tradicional, puesto que “la actividad docente no se restringe a la interacción en el aula, existen | |
otros modelos de enseñanza como la formación en servicio o la educación a distancia” (Rueda, Luna, García | |
y Loredo, 2011; citado en Rueda, 2014, p. 100) | |
Así, por ejemplo, al revisar textos clásicos sobre evaluación del maestro y la maestra, se encuentra con que | |
ya en ellos se enunciaban retos a los que se enfrenta como el incremento de conocimientos, los cambios del | |
estudiantado, como se indicaba en su momento, “la creciente investigación en la psicología, la sociología y | |
campos afines, que es pertinente a la enseñanza y aprendizaje” (Simpson, 1967, p. 12), toda vez que tensionan | |
sus propias prácticas pedagógicas. | |
Lo planteado hasta el momento evidencia la necesidad de realizar un análisis multidimensional del trabajo | |
del profesorado, que atiende a diferentes perspectivas de quienes reciben su servicio, e incluso que propios | |
miembros del cuerpo docente se puedan autoevaluar a partir de los mismos criterios con que son evaluados | |
externamente, de manera que este ejercicio realmente se haga a partir de aspectos conmensurables. En este | |
sentido, se debe tener en cuenta la finalidad del proceso evaluativo y autoevaluativo. Es decir, “para que | |
el maestro adquiera una preparación excelente y su enseñanza alcance un nivel superior, se requiere que | |
preste continua atención al problema de la autoevaluación y su meta reconocida: el automejoramiento del | |
maestro” (Simpson, 1967, p. 11). | |
Ahora bien, en cuanto a estos aspectos metodológicos, se encuentra que la estrategia y el instrumento más | |
común para realizar la evaluación es el uso de cuestionarios de opinión que responde el estudiantado, los | |
cuales remiten especialmente a aspectos didácticos y evaluativos, tal como se encuentra en las investigaciones | |
de Rueda, Luna, García y Loredo (2011; citado en Rueda, 2014) y Litwin (2010), realizadas en México y | |
Argentina, respectivamente. | |
Sobre los últimos rasgos mencionados, es pertinente enunciar que, efectivamente, también son utilizados | |
en la evaluación realizada en la USTA, como se aprecia en algunas referencias a documentos institucionales: | |
Referentes institucionales | |
TABLA 1 | |
Referentes institucionales | |
Fuente: elaboración propia con base en documentos institucionales | |
Procedimientos Metodológicos | |
De acuerdo con diversas personas autoras, especialmente al revisar la obra de Escudero-Escorza (2000, 2006, | |
2016 y 2019), la investigación evaluativa es concebida como una metodología del ámbito de las ciencias | |
sociales que se ha fortalecido en las últimas décadas, en tanto que brinda las herramientas suficientes para | |
la implementación de un ejercicio de evaluación riguroso, con la participación de los directos involucrados. | |
Ello, en perspectiva de que sea posible la definición de los aspectos que requieren ajustes, modificaciones o | |
supresiones en el marco del mejoramiento de los procesos o las prácticas donde se requieran implementar | |
cambios a través de un ejercicio evaluativo más participativo, consciente y pertinente, que contribuya a la | |
calidad de la educación. | |
Delimitada de esta forma, se puede afirmar que se ubica en el campo de la investigación cualitativa, en tanto
que “incluye formulaciones paradigmáticas múltiples y, también, complejas críticas, epistemológicas y éticas, | |
a la metodología de investigación tradicional en las ciencias sociales” (Denzin y Lincoln, 2012, p. 24). Es | |
decir, al centrarse en la evaluación respecto a diferentes campos de conocimiento, advierte en sí misma una | |
intención de valorar y mejorar los procesos que se dan en su interior. Dicha perspectiva transformadora | |
enfatiza, como afirma Escudero-Escorza (2016), “la función de esta al servicio del cambio social y, en | |
concreto, al servicio de la mejora social” (p.14). En ese sentido, la investigación evaluativa en educación | |
beneficia de forma significativa a todos los agentes e instancias educativas y, por tanto, a las instituciones y | |
a la sociedad en general. | |
Dado que en parte su sello diferenciador radicó en la evaluación de programas sociales, en algún punto de su | |
desarrollo llegó a confundirse con esta actividad. Sin embargo, dada su amplitud y fundamentación terminó | |
por imponerse la tradición de la investigación evaluativa; “mientras que la evaluación de programas se definió | |
como la investigación evaluativa directamente aplicada a programas sociales” (Owen y Rogers, 1999, citado | |
en Escudero-Escorza, 2006, p. 181). | |
En el mismo texto, Escudero-Escorza (2006) presenta una serie de elementos que permiten identificar la | |
investigación evaluativa. A continuación, se retoman algunos de estos: | |
• La solución de problemas concretos como finalidad.
• Se investiga sobre todo en situaciones naturales.
• La observación es la principal fuente de conocimiento.
• Se emplean tanto métodos cuantitativos como cualitativos.
• Se busca la mejora de programas sociales.
• Se informa a los responsables de tomar decisiones sobre programas y prácticas. (pp. 180-181)
Asimismo, no se puede soslayar la intencionalidad o los propósitos con los cuales se realizan las evaluaciones | |
del profesorado, entre los cuales, según un estudio reciente, se cuentan cerca de 15 tipos distintos de | |
propósitos (Escudero-Escorza, 2019). A pesar de ello, en la actualidad sigue sin haber suficiente consenso | |
respecto a lo que es “un buen profesor” (Tejedor-Tejedor, 2009, p. 79). Por tanto, cobra sentido el hecho | |
que sean las propias IES quienes, a la luz de su filosofía institucional y horizonte estratégico, puedan | |
“determinar el modelo de profesor que se quiere, estableciendo los comportamientos que se consideran | |
deseables para después analizar en qué medida la conducta del profesor satisface el referente de calidad | |
establecido” (Tejedor-Tejedor, 2009, p. 93). Lo cual, para el caso de la USTA está claramente identificado | |
en los elementos destacados de sus documentos institucionales, tal como se evidenció en la Tabla 1. | |
En función de lo anterior y en atención al proyecto definido se estableció una serie de etapas que hicieron | |
posible el ejercicio definido desde el alcance de sus objetivos. Como se percibe en la Figura 1, las personas | |
investigadoras trazaron una ruta metodológica propia para la investigación, compuesta por ocho (8) fases, | |
la cual recoge, en buena medida, las principales recomendaciones de los expertos en este campo, en el | |
entendido que “todas las aproximaciones metodológicas son útiles en algún momento y para alguna faceta | |
evaluativa y que todas tienen sus limitaciones y que en la práctica se requieren generalmente aproximaciones | |
metodológicas diversas y complementarias” (Escudero-Escorza, 2019, p. 24). Asimismo, dada la naturaleza | |
del estudio, se establece un muestreo cualitativo, en atención a las fases establecidas y a los agentes e instancias | |
intervinientes en el proceso, es decir, un tipo de muestreo no probabilístico, como se recomienda en estos casos
(Hernández-Sampieri y Mendoza-Torres, 2018). Por tanto, se fijó la muestra por criterios y por conveniencia | |
(Otzen y Manterola, 2017). Por criterios, porque debido a la estructura orgánica de la universidad fue | |
necesario garantizar representación de diversos organismos colegiados, y por conveniencia, en cuanto a que | |
se realizaron invitaciones al grupo de representantes referido más adelante, el cual participó de manera | |
voluntaria en diversos ejercicios que contribuyeron en las diferentes fases del proceso. | |
A continuación, se representan los momentos, los cuales son detallados posteriormente en el apartado | |
sobre resultados y discusión. | |
FIGURA 1. | |
Metaevaluación de la evaluación docente de posgrados | |
Fuente: elaboración propia | |
Análisis y discusión de resultados | |
Desarrollo de las fases: | |
1. Análisis contextual. Se realiza la revisión de referentes documentales que constate la necesidad de la
consolidación de un sistema de evaluación, en el cual se encontró que en 2013 la Unidad de Posgrados | |
realizó un primer ejercicio diagnóstico y según el Informe de Gestión de la Vicerrectoría Académica General | |
(VAG) - Plan de Acción 2011-2013, se recomendó consolidar el sistema institucional de evaluación docente | |
a nivel de Posgrados. Por otro lado, se evidencia que entre los años 2014 y 2016 los programas de Posgrados | |
aplicaron instrumentos de evaluación de manera espontánea, no unificados ni sistemáticos, estos ejercicios | |
evaluativos no fueron obligatorios y, en general, no contemplaron lo definido en la Dimensión de la Política | |
Docente ([USTA], 2015), sino criterios establecidos al interior de cada programa. Se percibe una serie de dificultades asociadas a la baja participación de docentes y de estudiantes; esta última no llegaba al 30 %, lo que evidenció que buena parte de quienes participaban eran estudiantes que perdieron asignaturas.
A la luz de los resultados obtenidos y tal como se muestran en el acápite anterior, cabe resaltar que es | |
imposible generar una propuesta nueva si se desconocen los esfuerzos previos que ha realizado la institución, | |
dado que es allí donde se obtienen experiencias e insumos sobre los cuales replantearse dinámicas y alcances | |
para que, según un contexto con sus particularidades, se logre cumplir con los objetivos académicos y | |
administrativos de los programas. | |
2. Revisión de referentes conceptuales e institucionales. Esta actividad fue realizada de manera independiente
y posteriormente se unificaron y discutieron los hallazgos por parte de los equipos de trabajo de Currículo | |
de la DUAD [3], en su momento VUAD [4], junto con los profesionales de la Unidad de Posgrados de la
sede Principal. A partir de allí, se consolida el proyecto de evaluación docente, que responde a los procesos | |
planeados en la articulación con la VAG, donde se realizó un primer diagnóstico acerca de las metodologías | |
e instrumentos aplicados en los diferentes programas de posgrados, en la Sede Principal y de la DUAD. En | |
este punto se pudo constatar que, incluso, en términos de políticas y procedimientos institucionales, muchos | |
aspectos que estaban definidos en los documentos, o bien no se conocían o no se hacían efectivos en la DUAD. | |
3. Análisis de la implementación de la evaluación. Unido a los hallazgos del punto anterior, aquí se encontró | |
que la socialización de los resultados de su evaluación directamente con el profesorado y el derecho a réplica a | |
partir de estos, antes de consignarlos de manera definitiva, eran casi nulos. Asimismo, que un alto porcentaje de
docentes no consultaba los resultados de su evaluación, o lo hacía únicamente como requisito para participar | |
en las convocatorias de ascenso en el escalafón docente. Finalmente, y quizás, lo más relevante aquí fue | |
la definición de una evaluación docente personalizada en función del plan de trabajo elaborado por cada | |
docente al inicio de semestre, lo cual no ocurría y llamó la atención como la mayor oportunidad de mejora | |
en el nuevo procedimiento a implementar. También se identificó que los procesos de autoevaluación con fines de renovación de Registro Calificado y de Acreditación de Alta Calidad de los programas de posgrado exigían información documental sobre los procesos de evaluación docente, de cara a la definición de planes de mejora y de formación propios de la carrera docente; esto demandaba un ejercicio más riguroso respecto a la trazabilidad del ejercicio docente.
4. Evaluación de la batería de instrumentos utilizada. A partir del trabajo articulado entre la Unidad
de Posgrados y el Equipo de Currículo de la DUAD, se constata la existencia de múltiples instrumentos | |
utilizados por los programas, diferentes al oficial. Ahora bien, respecto al instrumento oficial definido por | |
la Unidad de Desarrollo Curricular y Formación Docente [UDCFD], se constató que buena parte de | |
los descriptores estaban determinados para la modalidad presencial y que incluso no correspondían a las | |
dinámicas propias de los posgrados. A continuación, se refieren algunos a modo de ejemplo: | |
• Manual del deportista fue divulgado a tiempo y es de total conocimiento.
• El docente utiliza el gimnasio de la USTA como soporte de la preparación física integral.
Asimismo, se evidenció que los factores relacionados con la integración de las funciones sustantivas desde | |
el currículo, y que son evidentes para el estudiantado, no se evalúan de manera directa. Es decir, no se evalúan | |
acciones relacionadas con investigación y proyección social, que son trabajadas desde la docencia. | |
En este mismo sentido es importante resaltar en implementaciones similares que existen instituciones | |
que, tal como la Universidad Santo Tomás, cuentan con distintas modalidades de enseñanza dentro de sus | |
Facultades, esto hace aún más desafiante el reto de construir un único modelo de evaluación, pues el ejercicio | |
académico y administrativo requiere de especificidades acorde a las necesidades de cada modalidad. Esto | |
requiere de tiempo y negociaciones entre los diferentes actores participantes del instrumento de evaluación | |
para lograr que en consenso se acojan las particularidades de cada uno. | |
5. Formulación del nuevo instrumento de evaluación. Teniendo en cuenta el marco de referencia que ofrece | |
el documento Dimensión de la Política Docente [DPD], en el cual se definen todos los aspectos de orden | |
conceptual y metodológico de la evaluación docente, el equipo de trabajo decide acatarlas en su gran mayoría, | |
especialmente aquellas de orden conceptual. Se contemplan adicionalmente las particularidades del personal | |
docente que está vinculado por orden de prestación de servicios (OPS), dado que suman un gran número en | |
los programas de posgrado. En el aspecto metodológico, específicamente en lo referido a los instrumentos y | |
a la escala de ponderación se proponen los cambios más significativos. A continuación, se presentan algunos | |
de ellos: | |
• Luego de diversos análisis, el equipo define asumir la escala de valoración de la DPD, que contempla seis niveles de ponderación que van del 0 al 5, correspondientes con los siguientes criterios: 0 No se cumple, 1 Se cumple insuficientemente, 2 Se cumple con bajo grado, 3 Se cumple medianamente, 4 Se cumple en alto grado y 5 Se cumple plenamente.
• La DPD define tres instrumentos: uno para estudiantes, otro para decanos y uno para docentes. Los tres son diferentes en cuanto a la redacción de los descriptores, el número de descriptores por aspecto evaluado, etc. Lo anterior se considera una oportunidad de mejora que se asume en la nueva propuesta. Así, la nueva propuesta consta de los siguientes instrumentos:
I. Instrumento de evaluación de estudiantes sobre el desempeño docente de posgrado.
II. Instrumento de evaluación del director o líder del Programa Académico de Posgrado al docente.
III. Instrumento de evaluación del decano al docente, que se utilizaría en caso de que el programa de posgrado no cuente con la figura de director o líder del Programa Académico de Posgrado.
IV. Instrumento de Autoevaluación Docente de Posgrado.
Estos instrumentos están basados directamente en la evaluación que realiza el estudiantado sobre el desempeño docente, ya que es este quien tiene la asignación más alta en la escala de ponderación: la evaluación de estudiantes llega al 50 %, la autoevaluación docente a un 25 % y la evaluación de la persona decana o directora de programa al restante 25 %, para un total de 100 %.
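As an illustration of the weighting just described (student questionnaire 50 %, teacher self-evaluation 25 %, dean or program director 25 %, all on the institutional 0-5 scale), here is a minimal Python sketch; the function name and the sample scores are hypothetical and are not part of the USTA application.

# Weights defined for the proposal: students 50 %, self-evaluation 25 %,
# dean or program director 25 %; every instrument uses the 0-5 scale.
WEIGHTS = {"estudiantes": 0.50, "autoevaluacion": 0.25, "decano_o_director": 0.25}

def overall_teacher_score(scores):
    """Weighted average of the three instruments on the 0-5 scale."""
    return sum(WEIGHTS[source] * scores[source] for source in WEIGHTS)

# Hypothetical example: 4.2 from students, 4.8 self-evaluation, 4.5 from the director
# -> 0.50 * 4.2 + 0.25 * 4.8 + 0.25 * 4.5 = 4.425
print(overall_teacher_score({"estudiantes": 4.2, "autoevaluacion": 4.8, "decano_o_director": 4.5}))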
En este punto, se remitió al Departamento de Talento Humano el nuevo instrumento propuesto para | |
evaluar la viabilidad jurídica relacionada con la contratación de personal docente OPS. Además, para efectos | |
de garantizar el cumplimiento de aspectos señalados en la DPD, desde la UDCFD se asigna a una persona | |
docente que haga parte del equipo de trabajo para el diseño e implementación de la propuesta de la Evaluación | |
Docente de Posgrados. | |
Es importante resaltar que para los programas de posgrado presenciales se logró, en articulación con el | |
departamento de TIC de la USTA, que la batería de preguntas se filtrara según el plan de trabajo de cada | |
docente para que solo aparecieran aquellas preguntas que estuvieran relacionadas con las actividades para las | |
cuales fueron asignadas horas desde la nómina de cada programa (Plan de Trabajo Docente). Para el caso de | |
la DUAD, debido a una incompatibilidad de sistema, se realizó en un formulario de Google que permitía | |
tener toda la batería de preguntas con la posibilidad de decir no aplica a la actividad que no hacía parte del | |
plan de trabajo. | |
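The filtering of the question battery by each teacher's work plan, described above, can be pictured with a short sketch; the activity labels, question texts, and data structures are assumptions for illustration and do not reproduce the implementation built with the TIC department.

# Hypothetical question bank: each item is tagged with the work-plan activity it evaluates.
QUESTION_BANK = [
    {"actividad": "docencia", "texto": "El docente presenta el syllabus y los criterios de evaluación."},
    {"actividad": "investigacion", "texto": "El docente articula resultados de investigación con el curso."},
    {"actividad": "proyeccion_social", "texto": "El docente vincula el curso con proyectos de proyección social."},
]

def questions_for(work_plan_activities):
    """Keep only the items tied to activities with hours assigned in the teacher's work plan."""
    return [q for q in QUESTION_BANK if q["actividad"] in work_plan_activities]

# A teacher whose plan only assigns hours to teaching and research would not see the
# 'proyección social' item; in the Google Forms variant it would instead be marked 'no aplica'.
for question in questions_for({"docencia", "investigacion"}):
    print(question["texto"])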
6. Socialización y ajustes con los agentes involucrados. Se realizó la socialización con grupos de representantes | |
de cada uno de los agentes involucrados, tales como: estudiantes, docentes, comités de currículo de | |
Facultad (DUAD), personas coordinadoras de Programa, decanas, decanos, vicerrectoras y vicerrectores de | |
la modalidad distancia (2017-2018). En todos los casos se recibieron las sugerencias y recomendaciones en | |
términos de forma, redacción y número de descriptores propuestos, las cuales fueron asumidas casi en su
totalidad. | |
7. Validación de métricas de los instrumentos. Habida cuenta del proceso anterior, se definió la versión | |
preliminar de los instrumentos para la Evaluación Docente de Posgrados, la cual se validó por parte de pares | |
académicos y de un experto en psicometría de la Facultad de Psicología. Conforme a la retroalimentación, | |
se realizaron los respectivos ajustes. | |
8. Implementación. La implementación comienza con un pilotaje con algunos grupos de estudiantes de los | |
programas de posgrado de modalidad distancia, que mostraban una continuidad en matrículas, tales como: | |
Maestría en Didáctica, Maestría en Educación, Especialización en Pedagogía para la Educación Superior, | |
Especialización en Patología de la Construcción, Maestría en Gestión de Cuencas Hidrográficas. | |
Además de lo anterior, se realizó una segunda validación del instrumento que se hizo a través de un segundo | |
ejercicio piloto para la parametrización del Aplicativo Institucional con los posgrados de Maestría en Calidad | |
y Gestión Integral, Especialización en Administración y Gerencia de Sistemas de la Calidad, Especialización | |
en Finanzas y Especializaciones en Finanzas y Gerencia Empresarial, todos ellos de modalidad presencial. | |
En este orden de ideas, parte de la innovación se concentró en la actualización y mejoramiento de la | |
herramienta de sistematización de los procesos de evaluación docente de posgrados en la Universidad Santo | |
Tomás, mediante la parametrización del instrumento, el cual consta de dos interfaces: la primera para el | |
administrador del aplicativo institucional a través de un micrositio, y la segunda para el usuario, en este caso | |
los actores involucrados (personal directivo, docentes y estudiantes). | |
Este instrumento genera información confiable y clasificada en 9 tipos de reportes que permiten evidenciar | |
oportunamente los resultados del proceso de Evaluación Docente en los Posgrados; también se puede acceder | |
fácilmente a esta información en línea, por parte de los decanos, decanas, directoras, directores y docentes, lo | |
anterior como suministro para la toma de decisiones en las diferentes instancias institucionales (Dirección | |
de Investigación e Innovación, 2021). | |
Después del desarrollo de la actualización de los instrumentos, la validación y la parametrización en | |
el aplicativo institucional, se continuó con la implementación de esta herramienta únicamente en los | |
posgrados de la sede principal en Bogotá, dadas las diferencias en los sistemas académicos entre los programas | |
presenciales y los programas a distancia, así como en las sedes y seccionales. | |
Así las cosas, el proceso de implementación se ha llevado a cabo desde el primer semestre de 2019 hasta el | |
segundo semestre de 2020. Sin embargo, dados los ajustes realizados a nivel institucional con ocasión de la | |
pandemia mundial por la COVID-19, durante el período 2020-1, en los programas presenciales no se realizó | |
la evaluación docente debido a una decisión de la alta dirección de la Universidad. En virtud de ello, en la | |
Figura 2, se presenta la información del proceso de aplicación en los diferentes períodos hasta el 2020-2: | |
FIGURA 2 | |
Relación de participantes en la evaluación docente, período 2019-1. | |
Fuente: elaboración propia | |
Como se evidencia en la figura, en contraste con lo afirmado en los apartados anteriores, es notable la | |
participación en el primer ejercicio de evaluación docente correspondiente al nuevo instrumento, aplicativo | |
y procedimiento, en todos los casos con cifras superiores al 70 %, cuando anteriormente estas llegaban al 30 | |
%. De esta manera, el grupo poblacional con mayor participación fue el personal directivo del programa con | |
un 96,43 %, seguido por el docente con un 81,27 % y, finalmente, el de estudiantes con un 72,87 %.
Del mismo modo, en la Figura 3 se puede apreciar que en el 2019-2 el porcentaje de estudiantes que | |
participaron en el proceso de evaluación docente fue del 69,47%, lo cual da cuenta de una leve disminución | |
respecto al período anterior, no así en los casos relacionados con el personal directivo del programa y docentes, | |
donde es notoria la disminución en la participación. En el primer caso, tal contracción está en el orden del | |
14 % de diferencia, mientras en el segundo es cercano al 6 %, con respecto al período 2019-1. Lo anterior | |
puede ser evidencia de la importancia de trabajar de forma sistemática en la cultura de la evaluación entre los | |
miembros de la comunidad académica, además, es posible que la mayor participación del profesorado en el | |
primer semestre obedezca a contar con un insumo indispensable para la convocatoria al ascenso en el escalafón
docente que se realiza en el segundo semestre. | |
FIGURA 3. | |
Relación participantes evaluación docente período 2019-2 | |
Fuente: elaboración propia | |
Tal como se observa en la Figura 4, en el 2020-2 del total de estudiantes matriculados, el 68,58 % llevó | |
a cabo el proceso de evaluación docente. Por su parte, la autoevaluación contó con una participación del | |
81,86 % del total de docentes y un 78,13 % de las personas directoras de posgrados. Estos porcentajes | |
evidencian que se mantiene la disminución en la participación del estudiantado, mientras que se presenta la | |
mayor participación del cuerpo docente desde la implementación del nuevo instrumento y procedimiento. | |
Asimismo, se recupera la tasa de participación de las personas directoras de programa. | |
FIGURA 4. | |
Relación de participantes en la evaluación docente, período 2020-2. | |
Fuente: elaboración propia | |
En las Figuras 2, 3 y 4 se observa que, dada la naturaleza voluntaria de la participación por parte del estudiantado, se ha contado con una participación cercana al 70 %, lo que evidencia la responsabilidad de las personas estudiantes en su proceso de aprendizaje y una conciencia respecto a las implicaciones que tiene su voz en el mejoramiento de los programas académicos. Para el cuerpo docente, el porcentaje de participación en su autoevaluación es cercano al 80 %, participación que se considera lejana del ideal del 100 %, dadas las directrices institucionales que incentivan al personal docente a participar en su proceso de calificación. Algo parecido ocurre con las personas directoras de los programas que, aunque participan en su gran mayoría, aún no alcanzan la totalidad en su compromiso de evaluar al estamento docente de los programas que dirigen.
Finalmente, los resultados obtenidos durante estos 3 periodos académicos dan cuenta de la necesidad de | |
seguir cultivando la participación de todos los actores con el fin de llegar al 100 % en la participación de todos | |
los integrantes. | |
En este mismo orden de ideas, y con el fin de complementar los resultados, en la Figura 5 se observa el | |
porcentaje promedio de participantes para el corte de los tres períodos de implementación. | |
FIGURA 5. | |
Porcentaje promedio de participación en la evaluación de docentes de posgrado | |
Fuente: elaboración propia | |
Por otra parte, en la Figura 6 se presenta el promedio obtenido en la evaluación global de docentes | |
de posgrado en la modalidad presencial en los períodos 2019-1 y 2019-2, lo cual da cuenta de una alta | |
ponderación, en consideración de que la máxima escala posible es 5,0. | |
FIGURA 6. | |
Promedio general - Evaluación docente para los periodos 2019-1 y 2019-2 | |
Fuente: elaboración propia | |
Ahora bien, para el caso de los posgrados de la DUAD, se llevó a cabo la implementación de los nuevos | |
instrumentos de evaluación docente y la compilación de los datos a través de formularios inteligentes, en este | |
caso se utilizó la plataforma Google Forms, de la cual se obtuvo la información de la Figura 7: | |
FIGURA 7. | |
Participación en la evaluación docente en los posgrados de la DUAD periodos 2019-1 a 2020-2 | |
Fuente: elaboración propia | |
El estudiantado, como principal fuente de información, realiza de forma anónima el proceso de evaluación docente; por lo anterior, los datos extraídos de los formularios se calculan con base en el total de docentes asignados a espacios académicos en los diferentes periodos, es decir, que la participación presentada en la Figura 7 se determinó así:
Para el 2019-1, del total de 115 docentes, el 57 % fue evaluado mínimo por una persona estudiante, el 23 % llevó a cabo su autoevaluación y el 73 % de las personas directoras de posgrado evaluaron al estamento docente; para el 2019-2, del total de 102 docentes, el 53 % fue evaluado mínimo por una persona estudiante, el 55 % llevó a cabo su autoevaluación y el 64 % de las personas directoras de posgrados la realizaron; para el 2020-1, del total de 96 docentes, el 74 % fue evaluado mínimo por una persona estudiante, el 60 % llevó a cabo su autoevaluación y todas las personas directoras de posgrados evaluaron al equipo docente; para el 2020-2, del total de 96 docentes, el 69 % fue evaluado mínimo por una persona estudiante, el 70 % llevó a cabo su autoevaluación y el 72 % de las personas directoras de posgrados evaluaron al profesorado.
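A small sketch of how the participation rates reported for the DUAD (Figure 7) can be computed from raw counts; the count of 66 teachers below is only an illustrative reconstruction consistent with the 57 % reported for 2019-1, not a figure taken from the institutional records.

# Participation is reported against the total number of teachers assigned in each period.
def participation_rate(teachers_reached, total_teachers):
    """Percentage of teachers covered by a given instrument in one period."""
    return round(100 * teachers_reached / total_teachers, 1)

# Illustrative check against the 2019-1 figures above (115 teachers in total):
# about 66 of them evaluated by at least one student corresponds to roughly 57 %.
print(participation_rate(66, 115))   # 57.4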
TABLA 2. | |
Disminución de matrícula entre los períodos 2019-1 y 2020-2 | |
Fuente: elaboración propia | |
En los datos de la Tabla 2 se aprecia la disminución de la matrícula entre los períodos relacionados, lo cual | |
da cuenta de un fenómeno nacional, marcado por una reducción significativa en la educación superior, la cual | |
aún hoy es objeto de estudio por parte de las IES, personas investigadoras y el mismo Ministerio de Educación | |
Nacional. Además, es evidente que, en período de lanzamiento del nuevo instrumento de evaluación, se | |
presentó la tasa más alta de participación en los tres agentes establecidos. Sin embargo, llama la atención | |
que la participación de estudiantes y directivos ha venido decreciendo, lo cual implica la implementación de | |
nuevas estrategias de motivación y divulgación por parte de la Unidad de Posgrados, para poder retomar los | |
índices iniciales e incluso llegar a mejorarlos. Este fenómeno no ocurre con el cuerpo docente, que muestra
una participación sostenida durante los períodos, lo cual puede ser atribuible al interés que representan estos | |
resultados en aspectos relacionados con el escalafón docente. | |
Conclusiones | |
El objetivo principal de la investigación fue realizar un ejercicio de metaevaluación de la evaluación docente | |
de posgrado de la USTA, de manera que, a partir de allí, se pudiera proponer un nuevo procedimiento y | |
batería de instrumentos de evaluación unificado para todos los programas de posgrado. De allí que en 2019 | |
se implementó el aplicativo institucional de Evaluación Docente de Posgrados en todos los programas de posgrado de la sede principal en Bogotá. Esto permitió contar con información confiable, oportuna y clasificada
para tomar decisiones y formular los distintos planes de acción según las directrices institucionales vigentes. | |
Por los planteamientos realizados a lo largo del texto, se afirma que se cumplió el objetivo propuesto, al | |
tiempo que se espera que la evaluación docente adquiera mayor relevancia en el mejoramiento de las prácticas | |
pedagógicas en relación con su plan de trabajo; mejorar los resultados de aprendizaje en el estudiantado; | |
caracterizar mejor al personal docente para replantear la asignación de funciones en perspectiva de potenciar | |
las capacidades, y establecer planes de formación que permitan incidir directamente en los aspectos a mejorar. | |
Es decir, se espera que, a partir de las buenas prácticas en la implementación de la nueva propuesta de | |
evaluación docente, se impacte directamente en la toma de decisiones, de forma que antes de determinar | |
la salida de docentes de la Universidad, se logre aprovechar realmente los resultados de la evaluación en | |
la adecuada ubicación de ellos en las funciones que realizan mejor; al tiempo que se capaciten en aquellos | |
aspectos para los que su formación previa, su experiencia o simplemente la ausencia de ella, hayan llevado a | |
bajos resultados en los procesos de evaluación. Lo anterior, enmarcado plenamente en la cualificación de la | |
profesión académica al interior de la Universidad Santo Tomás. | |
An important aspect of the process was to show that in multicampus universities, as is the case in this study, although significant efforts are made at the executive management level to establish institution-wide policies and procedures, the inclusion of the three modalities of USTA's educational offering in these matters is at times an element that still requires improvement. This is because USTA currently has more than 34,000 students nationwide, which calls for synergies among campuses, branches, and the DUAD in order to fully carry out the institutional planning that, in this case, responds to the Integral Multicampus Plan, which projects the University through 2027.
The existence of a new procedure and battery of instruments, and their implementation, allowed USTA's graduate programs to have information and to access new scenarios of participation for decision-making in the different institutional bodies. Likewise, following up on the periodic application of these instruments helped to narrow the complexity gap in the university's evaluation culture, which had struggled to bring the actors and the evaluation process into tune.
Finally, it is important to recommend that university leadership support this type of research, which involves meta-evaluation of the practices, processes, and procedures of university life, since such actions require goodwill, resources, and decision-making regarding the conclusions or innovations that derive from them.
Lastly, anyone approaching an exercise similar to this one is advised to foster an institutional culture that recognizes the importance of evaluation at all levels and for all participants, administrative as well as academic, since this process permeates every area and makes it possible to reach levels of quality that strengthen institutions over time; evaluation is not reduced to a one-off exercise, but encompasses a process of constant change that entails a periodic review of its instruments and procedures.
Notes
[1] This article is a result of the meta-evaluation process carried out on the Teaching Evaluation (Procedures and Instruments) by the Graduate Unit and the Curriculum Team of the Faculty of Sciences and Technologies, Universidad Santo Tomás, Bogotá, 2017-2020.
[2] The first private university in the country to obtain High Quality Institutional Accreditation in the Multicampus modality (Resolution No. 01456 of January 29, 2016, MEN).
[3] DUAD: División de Educación Abierta y a Distancia (Division of Open and Distance Education).
[4] VUAD: Vicerrectoría de Educación Abierta y a Distancia (Vice-Rectory of Open and Distance Education).
Additional information
How to cite: Patiño-Montero, F., Godoy-Acosta, D. C. y Arias-Meza, D. C. (2022). Actualización de la evaluación docente de posgrados en una universidad multicampus. Experiencia desde la Universidad Santo Tomás (Colombia). Revista Educación, 46(2). http://doi.org/10.15517/revedu.v46i2.47955
Çukurova Üniversitesi Eğitim Fakültesi Dergisi
Vol. 48, No. 2, pp. 1299-1339
https://dergipark.org.tr/tr/pub/cuefd
Analyzing Academic Members’ Expectations from a Performance | |
Evaluation System and Their Perceptions of Obstacles to Such an | |
Evaluation System: Education Faculties Sample | |
Gürol YOKUŞ a*, Tuğba YANPAR YELKEN b
a Sinop Üniversitesi, Eğitim Fakültesi, Sinop/Türkiye
b Mersin Üniversitesi, Eğitim Fakültesi, Mersin/Türkiye
Article Info
DOI: 10.14812/cufej.467359
Article history: Received 04.10.2018; Revised 25.03.2019; Accepted 18.10.2019
Keywords: Performance evaluation, Quality in higher education, Accountability.
Abstract
The systematic assessment and evaluation of academic members in faculties is a crucial issue, because higher education institutions place a large emphasis on transparent, efficient and successful management. This study aims to conduct mixed (quantitative and qualitative) research on the expectations of Education Faculties' academic members regarding a performance evaluation approach and on the obstacles to such an evaluation system. A convergent parallel mixed-methods design was preferred as the research model. The "Expectations from performance assessment" subscale and the "Barriers to performance assessment" subscale, both developed by Tonbul (2008), were used as data collection tools. The independent samples t-test and ANOVA were used for the analysis of the quantitative data, and content analysis was used for the analysis of the qualitative data. As a result of this study, it was found that academic members have a moderate level of expectations from a performance evaluation approach. The highest expectations belong to assistant professors, while the lowest belong to professors. The most widely agreed expectations of academic members from a performance evaluation approach were found to be "developing a consensus about the criteria of an effective academician, affecting the professional development of academic members positively and increasing the workload of academic members". The most frequent obstacles to a performance evaluation approach emerged as the "current organizational mechanism of higher education institutions" and the "workload of faculty academic members". The scores for both expectations and obstacles differ significantly depending on "receiving the academic incentive, work experience in higher education, academic title and academicians' level of satisfaction with their institutions". As a result of the qualitative analysis, many themes and codes related to a performance evaluation system emerged. In the "Attitude Towards the Performance Approach" theme, the most frequent codes were "adopters, doubters". In the "Academicians' Priorities" theme, the codes emerged as "research and publications, evaluation of the quality of instruction, advising for undergraduates and postgraduates"; in the "Positive Effects" theme, as "motivation, financial support, search for quality"; in the "Negative Effects" theme, as "intra-institutional rivalry, academic dishonesty"; in the "Obstacles" theme, as "intense workload, lack of intrinsic motivation"; and finally, in the "Suggestions" theme, as "employment of more administrative staff, institutional support for academic efforts and research publications".
An Examination of Education Faculty Academic Members' Expectations from the Performance Evaluation Approach and Their Views on the Obstacles to Performance: A Mixed Methods Study
* Author: [email protected]
Öz (Abstract)
The systematic measurement and evaluation of academic members' performance is important for the quality of higher education institutions. The aim of this study is to examine, quantitatively and qualitatively, the expectations of academic members working in the Education Faculties of several state universities regarding the performance evaluation approach and their views on the obstacles to performance evaluation. Within the scope of this research, the convergent parallel mixed design, one of the mixed research methods, was preferred. The "Expectations from the Performance Evaluation Approach" subscale and the "Obstacles to the Performance Evaluation System" subscale developed by Tonbul (2008) were used as data collection tools. The independent samples t-test and ANOVA were used for the quantitative data, and content analysis for the qualitative data. As a result of the study, it emerged that academic members' expectations from the performance evaluation approach are at a moderate level, that assistant professors have the highest expectations and professors the lowest. The two most important obstacles to performance evaluation were found to be the current organizational functioning of higher education institutions and the workload of faculty members. The scores for Expectations from Performance Evaluation and for Obstacles differ significantly according to "receiving the academic incentive, work experience, academic title and level of satisfaction with the institution". As a result of the qualitative analysis, the most frequently recurring codes were "adopters, doubters" in the Attitude Towards Evaluation theme; "academic publications", "evaluation of the quality of instruction", "undergraduate and graduate advising" in the Academicians' Priorities theme; "motivation", "financial support", "search for quality" in the Positive Effects theme; "intra-institutional rivalry, academic dishonesty" in the Negative Effects theme; "intense workload, lack of intrinsic motivation" in the Obstacles theme; and "employment of support staff, institutional support for publications and studies" in the Suggestions theme.
Keywords: Performance evaluation, Quality in higher education, Accountability.
Introduction | |
Nowadays, many organizations focus on making a systematic performance evaluation of their members in order to achieve transparent, efficient and successful management. In higher education, public and private universities strive to produce a reliable evaluation system. Higher education institutions feel the need to identify performance indicators and to announce how well they achieve their mission and strategies, for reasons such as global competitiveness and societal pressure for transparency (Hamid, Leen, Pei & Ijab, 2008). Especially in the competitive environment of the 21st century, a better performance evaluation system creates advantages for universities and offers them opportunities to evaluate their own processes and members more effectively.
When the literature is reviewed, it is noticed that there are discussions about the accountability of higher education institutions. These discussions focus on evaluating institutions' performance and publicly announcing the results, including stakeholders' views. Universities are also criticized because their academic members behave like a closed society in an ivory tower (Glaser, Halliday, & Eliot, 2003). The criticisms are summarized by Esen and Esen (2015):
• The research conducted by academic members does not focus on societal problems.
• Their studies are too theoretical.
• Societal resources are wasted in vain (Etzkowitz, Webster, Gebhardt, & Terra, 2000).
• Research is not turned to communal benefit and is conducted esoterically.
• Academicians' identities shrink into individuals with constricted autonomy who worry about disturbing the university or its administrative structure (Elton, 1999).
Although they function autonomously, higher education institutions should not be viewed as organizations that answer to no one. They have the power to influence the society, the economic structure and the social life to which they belong. Therefore, instead of being ivory towers, universities should consider science, society and nation together, perform at international quality standards, and feel a conscientious responsibility to prioritize social benefit over career development. Vidovich and Slee (2001) claim that performance evaluations are necessary in universities for the following reasons:
• accountability to customers (continuous improvement activities for scientific research),
• accountability to government (efficient and productive use of resources),
• accountability to students and society (providing comprehensive educational experiences, providing vocational training to improve the quality of life, meeting the labor force needs of the society).
Since the beginning of the 21st century, higher education has gone through significant changes. UNESCO (2004) lists the global developments that carry new implications for higher education institutions: i) the emergence of new education providers such as multi-national companies, corporate universities, and media companies; ii) new forms of delivering education, including distance, virtual and new face-to-face modes, such as private companies; iii) greater diversification of qualifications and certificates; iv) increasing mobility of students, programmes, providers and projects across national borders; v) more emphasis on lifelong learning, which in turn increases the demand for postsecondary education; and vi) the increasing amount of private investment in the provision of higher education. Considering all these developments, higher education institutions have the capacity to affect the society, the economic structure and social life. Therefore, they are expected to perform at international quality standards, considering science, community and nation together instead of being ivory towers, and to prioritize societal benefit as well as career development. Vidovich and Slee (2001) emphasize that performance evaluation in universities is necessary in terms of accountability to members (sustainable enhancement efforts for scientific research), accountability to government (efficient and creative use of resources) and accountability to students and society (providing extensive educational experience, providing professional education for increasing the quality of life, meeting the needs of society's workforce).
Performance evaluation in higher education involves a variety of products and processes. In essence, performance evaluation indicates the minimum acceptable level in terms of quality and provides an opportunity to identify the strengths and weaknesses of individuals and institutions. In this way, individuals and institutions not only become aware of their weaknesses but also recognize what they are good at. Batool, Qureshi and Raouf (2010) state that performance evaluation might not include all dimensions of this concept and that the performance evaluation of an institution does not mean the same thing as assessing academic programs, courses or the quality of graduates. They point out that the performance evaluation of an institution means assessing the current situation in terms of the quality and effectiveness of the institution.
Within the context of this study, performance evaluation in higher education is defined as "assessing the professional qualifications of academic members related to their instructional roles and their level of contribution to accomplishing institutional goals". Therefore, a performance evaluation system is necessary for three purposes: assessing the variety of academic members' work such as research, academic service, instruction and publications; offering them comprehensive feedback that supports their self-development; and valuing their current performance. Vincent (2010) points out the advantages of a performance evaluation approach in higher education:
• Development and progression of individuals stand on realistic goals. | |
• It creates conformity between individuals’ goals and institution’s goals. | |
• It helps to identify the strengths and weakness of individuals within an organization. | |
• It works as a feedback mechanism for purpose of enhancement. | |
• It helps to identify which courses and instruction are needed. | |
• It helps the institution to take a major role and responsibility in terms of education, society, | |
economics and politics. | |
Tonbul (2008) claims that performance evaluation practices increase the level of accomplishment of institutional goals, help to identify failing issues in organizational processes and provide specific data about the effect of organizational climate and culture on members, which in turn leads to an increase in institutional performance. Organizations that make effective and functional use of feedback mechanisms in workflow and organizational processes are seen to become more successful and more lasting (Latham & Pinder, 2005). Kalaycı (2009) draws attention to the fact that it is very unlikely to predict success or failure in higher education without a proper evaluation; however, once the educational performance of academic members is evaluated, it becomes open to criticism by other stakeholders, and this situation is challenging. This issue might result in negative circumstances. For instance, Kim et al. (2016) claim that a large number of professors put a low emphasis on their role as educators while putting a greater emphasis on their researcher identity, because faculty evaluation systems are mainly based on research. In order not to cause negative consequences, performance evaluations should not be done merely to fulfil a formality or an obligation. This threat is especially valid for public universities funded by the government. Kalaycı and Çimen (2012) note that public universities now need quality studies and that it is a necessity for them to carry out institutional quality processes not just as a formality but in order to increase quality and stand out in this competitive environment.
The major reasons that encourage universities to make performance evaluations in the 21st century are institutional image and reputation, internationalization and global university rankings. There are many factors affecting institutional reputation and image. In a report published by the Higher Education Authority (2013), it appears that academic members are closely interested in their field of expertise, which indicates that they continually follow the studies conducted in the literature. When it comes to internationalization, an institution's including both national and international students and academic members indicates that it has a global identity and is ready for competitiveness in the global market (O'Connor et al., 2013). However, the number of students and academic members is not a sufficient indicator of quality. The quality of academic members and the quality of their teaching performance should also be assessed, because they affect the quality of education and are regarded as an assurance of quality control (Açan and Saydan, 2009).
When the literature is reviewed, it is noticed that the most frequently used performance assessment and evaluation techniques in higher education are Self-Assessment, Key Performance Indicators (KPI), Relative Evaluation, Appraisal, Six Sigma and Total Quality Management (Çalışkan, 2006; Kalaycı, 2009; Paige, 2005). Not all of these techniques may be appropriate for assessing the individual performance of academic members. For instance, the performance comparison technique involves evaluating the current performance of an individual against that of another who is accepted as a leader within the same context. This might not be appropriate for evaluating academic members' performance, because it is strictly dependent upon excellence of quality, whereas individuals differ from each other in terms of working style and self-development. Among these techniques, Key Performance Indicators stand out as a convenient evaluation method in higher education: in KPIs, performance indicators are operationally defined and it is specified which operations constitute a concept, as the sketch below illustrates.
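To make the idea of operational definition concrete, here is a minimal Python sketch of how a KPI-style score could be computed; the indicator names, weights and normalization are purely hypothetical and are not taken from the studies cited above.

# A hypothetical weighting of operationally defined indicators; names and
# weights are illustrative, not drawn from the article or the KPI literature.
kpi_weights = {
    "publications": 0.4,          # peer-reviewed outputs in the period, rescaled to 0-1
    "teaching_evaluation": 0.3,   # mean student rating, rescaled to 0-1
    "advising": 0.2,              # supervised theses, rescaled to 0-1
    "service": 0.1,               # committee and jury duties, rescaled to 0-1
}

def kpi_score(indicators, weights):
    """Weighted sum of normalized indicator values (each expected to lie in 0-1)."""
    return sum(weights[name] * indicators.get(name, 0.0) for name in weights)

example = {"publications": 0.6, "teaching_evaluation": 0.8, "advising": 0.5, "service": 1.0}
print(kpi_score(example, kpi_weights))  # 0.68

The point of the sketch is only that each indicator must be defined as a measurable operation before any weighting or comparison is meaningful.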
When current practices related to performance evaluation in Turkish higher education are reviewed, it is criticized that only a quantitative assessment of academic members' research and publications is made and that the evaluation is based on subjective judgements (Esen and Esen, 2015). In this regard, the Council of Higher Education launched the academic incentive system in 2015 to increase academic members' motivation in Turkey and to support their academic activities financially (Academic Incentive Grant
Regulation, 2015). Within this academic incentive regulation, the performance of the academic staff is | |
evaluated by the Council of Higher Education based on their national and international projects, research, | |
publications, exhibitions, patents received, references to their studies, and academic awards received. As | |
a result, faculty members who perform sufficient work are financially supported. Apart from academic | |
incentive, there is a variety of performance evaluations of academic members in Turkish higher education | |
system such as: | |
a) Registry system | |
b) Academic promotion and appointment criteria | |
c) Questionnaires of Academic Member Evaluation | |
d) Annual reports | |
e) Surveys of Student Views | |
(Esen & Esen, 2015) | |
Performance evaluation in higher education is very important for increasing the effectiveness of the services provided; however, the criteria and the reliability of this process are just as important. In this regard, Çakıroğlu, Aydın and Uzuntiryaki (2009) state that there are very promising studies indicating the reliability of evaluations made by experienced faculty members, and they emphasize that the following criteria should be taken into consideration during evaluation:
• collecting data from various sources related to teaching performance (such as colleagues, students, advisors, master students, graduates) and in different formats (student assessment surveys, student interviews, observation results, course materials, student products, etc.),
• clearly identifying evaluation criteria,
• informing about the evaluation process,
• informing the assessors on how to make an assessment,
• the candidates not playing an evaluative role,
• random selection of the assessors among those who meet the criteria,
• a minimum of 3 and a maximum of 5 members taking part in the jury.
The underlying aim of evaluating faculty members' performance is to increase the effectiveness of universities. There is increasing pressure on universities, nationally and globally, to carry out performance evaluations systematically, driven by concepts such as quality, efficiency, effectiveness and accountability. The reason why education faculties are preferred in this study is that the Higher Education Council of Turkey emphasizes accreditation studies especially in education faculties within the scope of the "Bologna Process". Higher education institutions in Turkey aim to increase their accountability as a quality indicator and to inform internal and external stakeholders about the current situation. In order to prove that they have accomplished their mission and vision within this scope, universities carry out performance evaluation studies of their instructors and present the results to the public, students, families, government and the private sector. In the accreditation process carried out in education faculties, it is important to identify academic staff's expectations of performance assessment and the barriers to it. Therefore, while performance evaluation is so important for higher education institutions, research is needed to determine the expectations of the instructors whose performance is evaluated.
Within the context of this study, a quantitative and qualitative analysis is made of Education Faculty academic members' expectations from a performance evaluation system and of the obstacles to such an evaluation system. The following research questions are addressed:
1. What are the expectations of academic members in Education Faculties from a performance | |
evaluation system? | |
1.1. Do the expectations of academic members from a performance evaluation system differ | |
depending on following as variables: academic title, academic experience, academic incentive status and | |
satisfaction from institutions? | |
2. What are the perceptions of academic members in Education Faculties related to obstacles to a | |
performance evaluation system? | |
2.1. Do the perceptions of academic members related to obstacles to a performance evaluation | |
system differ depending on following variables: academic title, academic experience, academic incentive | |
status and satisfaction from institutions? | |
3. What are general views of academic members in Education Faculties related to performance | |
evaluation system? | |
Method | |
A convergent parallel mixed design was preferred as the research model in this study. Quantitative and qualitative data were collected simultaneously, analyzed independently and then converged in the discussion. In a convergent mixed design, equal emphasis is placed on the quantitative and qualitative parts, the analyses are conducted independently, and interpretations are eventually made using both sets of data (Creswell and Plano Clark, 2014). Figure 1 shows the mixed design used in this research:
Figure 1. A model for a convergent parallel design in mixed research studies: quantitative data collection and analysis (descriptive statistics, t-test and ANOVA) and qualitative data collection and analysis (content analysis) are carried out in parallel, followed by interpretation of both sets of results.
Participants | |
The data of this study were collected in 2018 from academic members in Education Faculties in Turkey, including research assistants with a doctorate, assistant professors, associate professors and professors. Participants are from different regions of Turkey, including Marmara, the Black Sea, the Aegean, the Mediterranean and Eastern Anatolia. Instructors with too heavy a course load were not included in the study group, and data were collected only from faculty members who had completed their doctoral education. Within the context of this study, the convenience sampling technique was used to select the quantitative sample, and data were obtained from 104 academic members in six universities who agreed to participate in this research. For the qualitative data, participants were selected with the maximum diversity sampling technique, one of the purposeful sampling techniques, in order to collect all kinds of different views about the current situation. Qualitative data were obtained from 50 academic members in Education Faculties. The quantitative phase includes 25 research assistants with a doctorate, 35 assistant professors, 31 associate professors and 13 professors. Since convenience sampling was used, sampling was not made according to the department criterion; ultimately, 22 percent of participants teach in the Science Education Department, 11 percent in the Pre-School Education Department, 28 percent in the Educational Sciences Department and 31 percent in the Primary School Teaching Department. In the qualitative phase, the sample includes 13 research assistants, 17 assistant professors, 15 associate professors and 5 professors. Maximum diversity was achieved according to the academic title and department variables. 20 percent of participants teach in the Science Education Department, 10 percent in the Pre-School Education Department, 40 percent in the Educational Sciences Department and 30 percent in the Primary School Teaching Department.
Data Collection Tool | |
In this study, a personal information form, the "Expectations from the Performance Evaluation Approach" subscale (16 items on a 4-point Likert scale) and the "Obstacles to the Performance Evaluation Approach" subscale (10 items), both developed by Tonbul (2008), were used for data collection. Exploratory factor analysis with varimax rotation was applied during scale development. The internal consistency reliability of the "Expectations from the Performance Evaluation Approach" subscale was originally reported as .92, and that of the "Obstacles to the Performance Evaluation Approach" subscale as .87. The internal consistency of these subscales was recalculated in this study; the reliability of the first subscale appeared as .84 and of the second subscale as .78. If the Cronbach alpha coefficient, an indicator of homogeneity between scale items, is between .60 and .80, it is evidence of high reliability (Tonbul, 2008). The items in these subscales load on a single factor, and this one factor explains fifty-six percent of the total variance.
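As an illustration of how such an internal consistency coefficient is obtained, the following minimal sketch computes Cronbach's alpha from a raw respondent-by-item score matrix; the response values are made up and are not the study's data.

import numpy as np

def cronbach_alpha(item_scores):
    """Cronbach's alpha for an (n_respondents x n_items) matrix of item scores."""
    x = np.asarray(item_scores, dtype=float)
    k = x.shape[1]                                # number of items
    item_variances = x.var(axis=0, ddof=1)        # variance of each item
    total_variance = x.sum(axis=1).var(ddof=1)    # variance of the summed scale scores
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Made-up responses: 5 respondents answering 4 Likert-type items (1-4)
responses = [[3, 4, 3, 4],
             [2, 2, 3, 2],
             [4, 4, 4, 3],
             [1, 2, 2, 2],
             [3, 3, 4, 4]]
print(round(cronbach_alpha(responses), 2))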
Also, a questionnaire with open-ended questions was developed for the purpose of supporting the quantitative data and enabling a deeper analysis. A professor from the Educational Sciences Department, an associate professor from the Assessment and Evaluation Department and a professor who works as an expert in higher education quality studies analyzed the questions and made some suggestions. The questions were revised in light of these suggestions. The final form of the questions includes:
2.1. What do you think about making a periodic and data-based assessment of academic members? | |
2.2. What criteria should be assessed within performance evaluation? Could you order these criteria | |
according to significance level for you? | |
2.3. What are the positive and negative consequences of making a performance evaluation of | |
academic members? | |
2.4. What are the obstacles to performance of academic members in higher education and what do | |
you suggest for overcoming these obstacles? | |
Data Analysis | |
The equality of variances and the normality of the data were checked in order to identify the analysis method for the quantitative data. The skewness and kurtosis values ranged from -1 to +1, which indicated that the data were normally distributed. Also, since the sample size was larger than 50 (N=104), the Kolmogorov-Smirnov test was used to check normality and it was found not to be significant (p>.05), which was an indicator of normality. As a result, parametric tests were used in the study. An independent samples t-test was conducted to check whether there was a significant difference between participants in terms of the academic incentive variable. One-way analysis of variance (ANOVA) was conducted to check whether there was a significant difference between participants in terms of the work experience, academic title and satisfaction-with-institution variables.
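A minimal sketch of this kind of analysis pipeline (normality check, independent samples t-test and one-way ANOVA) is shown below using SciPy on simulated scores; the group sizes and means loosely echo figures reported later, but the data themselves are synthetic and purely illustrative.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulated expectation scores for the two academic-incentive groups (synthetic data)
incentive_yes = rng.normal(loc=2.43, scale=0.38, size=52)
incentive_no = rng.normal(loc=2.16, scale=0.45, size=52)

# Normality check: Kolmogorov-Smirnov test against a normal with the sample's mean and SD
scores = np.concatenate([incentive_yes, incentive_no])
ks_stat, ks_p = stats.kstest(scores, "norm", args=(scores.mean(), scores.std(ddof=1)))

# Independent samples t-test for the academic incentive variable
t_stat, t_p = stats.ttest_ind(incentive_yes, incentive_no)

# One-way ANOVA for a four-level variable such as work experience
g1 = rng.normal(2.43, 0.51, 17)
g2 = rng.normal(2.43, 0.28, 38)
g3 = rng.normal(2.51, 0.44, 14)
g4 = rng.normal(2.00, 0.39, 35)
f_stat, anova_p = stats.f_oneway(g1, g2, g3, g4)

print(ks_p, t_p, anova_p)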
Inductive content analysis was used to analyze the qualitative data. Rater agreement percentages were identified by examining the academic members' views collected through the open-ended questions. The views collected by the questionnaire were coded by the researcher and one independent expert. Miles and Huberman's (1994) reliability formula was used to calculate agreement percentages:
Reliability = Agreement / (Agreement + Disagreement)
The interrater reliability across all codes identified by the two raters was found to be 0.89. It is possible to assert that reliability is met for the data analysis, because an agreement percentage of 80% and above is accepted as sufficient (Mokkink et al., 2010). In this study, a variety of validity strategies listed by Creswell (2003) and frequently used in qualitative research methods were employed, such as "Members' Check", "External Audits", "Rich, Thick Description" and "Chain of Evidence". The participants were asked whether the findings of the study reflected their own ideas correctly, an independent expert who had little contact with the study participants and who knew the method of the study was consulted, and the study remained as loyal to the nature of the data as possible by using direct quotations.
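The Miles and Huberman agreement formula is straightforward to compute; the sketch below applies it to a pair of hypothetical coding lists (the code labels are invented for illustration and are not the study's codings).

def miles_huberman_reliability(codes_rater_a, codes_rater_b):
    """Agreement / (Agreement + Disagreement) over paired coding decisions."""
    agreements = sum(1 for a, b in zip(codes_rater_a, codes_rater_b) if a == b)
    disagreements = len(codes_rater_a) - agreements
    return agreements / (agreements + disagreements)

# Invented coding decisions by the researcher and the independent expert
researcher = ["adopter", "doubter", "adopter", "resistant", "adopter"]
expert = ["adopter", "doubter", "adopter", "adopter", "adopter"]
print(miles_huberman_reliability(researcher, expert))  # 0.8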
Findings | |
3.1 Findings Related to Expectations From a Performance Evaluation System | |
The first research question of this study “What are the perceptions of academic members related to | |
their expectations from a performance evaluation system?” was attempted to be answered. Table 1 | |
presents the general score mean of participants and Table 2 presents the score means depending on | |
academic titles. | |
Table 1.
The General Score Mean Related to Academic Members' Expectations from a Performance Evaluation System

Expectation Subscale: N = 104; Minimum = 1,50; Maximum = 3,31; Mean = 2,3023; Standard Deviation = ,43859
In Table 1, when the score means of academic members are reviewed, it is seen that their expectations from a performance evaluation system are not at a high level (x̄ = 2,30) but at a moderate level (which means "partially agree"). Table 2 presents the ANOVA test results, which indicate whether academic members' expectations differ significantly depending on academic title:
Table 2.
The ANOVA Results Related to Whether Expectations from a Performance Evaluation System Differ Depending on Academic Title

Group                 N     Mean     Standard Deviation
Research Assistant    25    2,4525   ,50688
Assistant Professor   35    2,4875   ,25174
Associate Professor   31    2,1754   ,44177
Professor             13    1,8173   ,16230
Total                 104   2,3023   ,43859

ANOVA: Between Groups: Sum of Squares = 5,321; df = 3; Mean Square = 1,774; F = 12,24; p = ,000
       Within Groups: Sum of Squares = 14,492; df = 100; Mean Square = ,145
Source of difference: Research Assistant > Associate Prof.; Assistant Prof. > Associate Prof.; Associate Prof. > Prof.
When the arithmetic means and standard deviations according to academic title are analyzed, it is observed that assistant professors have the highest and professors the lowest expectations from a performance evaluation system. As a significant difference appears between groups in Table 2, post hoc tests were used to identify between which groups the significant difference lies. As the variances were found not to be equal with Levene's F test, the Games-Howell method, which works well with unequal groups, was preferred. As a result of the analysis, it was found that research assistants and assistant professors have a higher level of expectations than associate professors and professors.
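A sketch of this Levene check followed by Games-Howell pairwise comparisons is given below; it assumes the third-party pingouin package is available and uses a small, fabricated long-format dataset, so the group labels and scores are illustrative only.

import pandas as pd
from scipy import stats
import pingouin as pg  # third-party package, assumed to be installed

# Fabricated long-format data: one expectation score per academic member
df = pd.DataFrame({
    "title": ["Research Assistant"] * 5 + ["Assistant Prof."] * 5
             + ["Associate Prof."] * 5 + ["Professor"] * 5,
    "score": [2.5, 2.4, 2.6, 2.3, 2.7,
              2.5, 2.4, 2.5, 2.6, 2.4,
              2.2, 2.1, 2.3, 2.0, 2.2,
              1.8, 1.9, 1.7, 1.8, 1.9],
})

# Levene's test for equality of variances across the four title groups
groups = [g["score"].to_numpy() for _, g in df.groupby("title")]
print(stats.levene(*groups))

# Games-Howell pairwise comparisons, which do not assume equal variances
print(pg.pairwise_gameshowell(data=df, dv="score", between="title"))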
When the subscale is analyzed item by item, it appears that the highest expectations from a performance evaluation system are:
• It creates a consensus on the criteria of being an effective academic member (x̄ = 3,42)
• It positively affects academic members' professional development (x̄ = 3,27)
• It increases the workload of academic members (x̄ = 2,40)
• It causes tension within the institution (x̄ = 2,39)
Academic members' lowest expectations from a performance evaluation system appear as:
• It increases academic members' motivation (x̄ = 1,90)
• It contributes to the development of a qualified institutional culture (values, attitude towards work, understanding of responsibility, relationships etc.) (x̄ = 1,76)
• It helps academic members to get better prepared for their courses (x̄ = 1,70)
Table 3 presents the analysis results related to whether academic members' expectations from a performance evaluation system differ depending on academic incentive status.
Table 3.
T-test Results Related to Whether Expectations from a Performance Evaluation System Differ Depending on Academic Incentive Status

Academic Incentive   N    Mean   Standard Deviation
Yes, I take it       52   2,43   ,38
No, I don't          52   2,16   ,45
t = 3,22; p = ,002
First, the equality of variances was checked with Levene's test and the significance value in the appropriate t row was used. As a result of the analysis, expectations from a performance evaluation system differ significantly depending on academic incentive status [t(102) = 3,22, p < .05]. Academic members who receive the academic incentive (a financial aid given to academic members who produce a certain number of research studies and projects) have a significantly higher level of expectations than those who do not. Table 4 presents the ANOVA results related to whether academic members' expectations from a performance evaluation system differ depending on work experience.
Table 4.
The ANOVA Results Related to Whether Expectations from a Performance Evaluation System Differ Depending on Work Experience

Work experience      N     Mean   Standard Deviation
0-5 years            17    2,43   ,51
6-10 years           38    2,43   ,28
11-15 years          14    2,51   ,44
More than 15 years   35    2,00   ,39
Total                104   2,30   ,43

ANOVA: Between Groups: Sum of Squares = 4,67; df = 3; Mean Square = 1,55; F = 10,28; p = ,000
       Within Groups: Sum of Squares = 15,1; df = 100
Source of difference: 0-5 years > more than 15 years; 6-10 years > more than 15 years; 11-15 years > more than 15 years
When Table 4 is reviewed, it is clearly seen that the lowest expectation scores belong to academic members who have more than 15 years of working experience. The mean scores of the other three groups are significantly higher than the mean score of this group, but there is no significant difference among the mean scores of these three groups. Table 5 presents the ANOVA results related to whether academic members' expectations from a performance evaluation system differ depending on their level of satisfaction with their institutions.
Table 5.
The ANOVA Results Related to Whether Expectations from a Performance Evaluation System Differ Depending on Satisfaction Level with the Institution

Satisfaction level   N     Mean   Standard Deviation
Low                  10    2,70   ,31
Moderate             35    2,39   ,32
High                 42    2,00   ,47
Very high            17    1,80   ,11
Total                104   2,30   ,43859

ANOVA: Between Groups: Sum of Squares = 5,97; df = 3; Mean Square = 1,991; F = 14,383; p = ,000
       Within Groups: Sum of Squares = 13,08; df = 100; Mean Square = ,138
Source of variation: Low, Moderate > High, Very High
In Table 5, a significant difference is observed between the mean scores of the groups (p<.05); therefore, the Games-Howell post hoc test was conducted, due to the inequality of variances, to identify the source of variation. As a result of the post hoc test, it is observed that academic members who have a low or moderate level of satisfaction with their institutions have a significantly higher level of expectations from a performance evaluation system than those who have a high or very high level of satisfaction with their institution.
3.2 The Obstacles to a Performance Evaluation System | |
The second research question of this study, "What are the perceptions of academic members related to the obstacles to a performance evaluation system?", is addressed here. Table 6 presents the mean and standard deviation of the academic members' scores.
Table 6.
The General Mean Score of Academic Members Related to the Obstacles to a Performance Evaluation System

Obstacles Subscale: N = 104; Minimum = 2,20; Maximum = 3,80; Mean = 3,02; Standard Deviation = ,57517
When Table 6 is reviewed, it is seen that the mean score of academic members is high (x̄ = 3,02), which means that academic members agree with the items in this subscale as obstacles to a performance evaluation system. When the subscale is analyzed item by item, the most frequently agreed obstacles are:
• Higher education institutions' current organizational structure (hierarchical organization, distribution of authority and responsibilities, autonomy limits of units) (x̄ = 3,80)
• Academic members' workload (x̄ = 3,68)
Academic members least agree on the following obstacle to a performance evaluation system: "cultural structure (ignoring the problems, personal conflicts, extreme tolerance, discomfort with criticism, lack of confidence, lack of competitive understanding at European standards)" (x̄ = 1,91)
Table 7 presents the analysis results related to whether academic members' perceptions of obstacles to a performance evaluation system differ depending on academic incentive status.
Table 7.
The T-Test Results Related to Whether Academic Members' Perceptions of Obstacles to a Performance Evaluation System Differ Depending on Academic Incentive Status

Academic Incentive   N    Mean   Standard Deviation
Yes, I take it       52   2,14   ,54
No, I don't          52   2,74   ,51
t = 5,77; p = ,000
When Table 7 is reviewed, it is seen that academic members' perceptions of obstacles differ significantly depending on academic incentive status [t(102) = 5,77, p < .05]. Academic members who receive the academic incentive have significantly lower perceptions of obstacles to a performance evaluation system. Table 8 presents the ANOVA results related to whether academic members' perceptions of obstacles to a performance evaluation system differ depending on academic title.
Table 8.
The ANOVA Results Related to Whether Academic Members' Perceptions of Obstacles to a Performance Evaluation System Differ Depending on Academic Title

Group                 N     Mean   Standard Deviation
Research Assistant    25    2,98   ,30181
Assistant Professor   35    3,42   ,36202
Associate Professor   31    3,38   ,63314
Professor             13    2,96   ,83254
Total                 104   3,02   ,61101

ANOVA: Between Groups: Sum of Squares = 11,089; df = 3; Mean Square = 3,696; F = 13,508; p = ,000
       Within Groups: Sum of Squares = 27,365; df = 100; Mean Square = ,274
Source of variation: Assistant Prof. > Research Assistant, Professor; Associate Prof. > Research Assistant, Professor
When Table 8 is reviewed, it is seen that there is a statistically significant difference between groups (p<.05); therefore, the Games-Howell post hoc test (used in cases of unequal variances) was applied to identify the source of variation. As a result of the analysis, it is observed that the highest obstacle scores belong to assistant professors and associate professors, and the lowest scores belong to research assistants and professors. Table 9 presents the ANOVA results related to whether academic members' perceptions of obstacles to a performance evaluation system differ depending on working experience.
Table 9.
The ANOVA Results Related to Whether Perceptions of Obstacles to a Performance Evaluation System Differ Depending on Working Experience

Working experience   N     Mean   Standard Deviation
0-5 years            17    2,72   ,51
6-10 years           38    3,26   ,28
11-15 years          14    3,78   ,44
More than 15 years   35    2,88   ,39
Total                104   3,02   ,54

ANOVA: Between Groups: Sum of Squares = 21,938; df = 3; F = 44,27; p = ,000
       Within Groups: Sum of Squares = 16,51; df = 100
Source of variation: 11-15 years, 6-10 years > 0-5 years, more than 15 years; 11-15 years > 6-10 years
When Table 9 is reviewed, it is seen that there is a statistically significant difference; therefore, the Games-Howell post hoc test was conducted. As a result of the test, it appears that academic members with 0-5 years of working experience have the lowest scores on the obstacles subscale, and those with 11-15 years of working experience have the highest scores. Academic members who have worked more than 10 and fewer than 15 years think that almost all items in the subscale really pose an obstacle to a performance evaluation.
Table 10 presents the ANOVA results related to whether academic members’ perceptions of obstacles | |
to a performance evaluation system differ depending on satisfaction level from institution. | |
Table 10.
The ANOVA Results Related to Whether Academic Members' Perceptions of Obstacles to a Performance Evaluation System Differ Depending on Satisfaction Level with the Institution

Satisfaction level   N     Mean   Standard Deviation
Low                  10    3,36   ,31
Moderate             35    3,58   ,32
High                 42    2,62   ,47
Very high            17    2,58   ,11
Total                104   3,02   ,43859

ANOVA: Between Groups: Sum of Squares = 5,97; df = 3; Mean Square = 1,991; F = 14,38; p = ,00
       Within Groups: Sum of Squares = 13,08; df = 100; Mean Square = ,138
Source of variation: Low, Moderate > High, Very high
When Table 10 is reviewed, it is seen that there is a statistically significant difference; therefore | |
Games-Howell post hoc test is done. As a result of test, it appears that academic members who have high | |
and very high level of satisfaction from their institutions have significantly lower scores of obstacles to a | |
performance evaluation system. | |
3.3 Qualitative Analysis of Academic Members' General Views Related to the Performance Evaluation System
Within the context of this study, qualitative data were obtained from academic members about their views on a performance evaluation system. The collected data were analyzed with content analysis. As a result of the content analysis, the following six themes emerged: "attitude towards performance evaluation, priorities of academic members, positive effects of performance evaluation, negative effects of performance evaluation, obstacles to performance evaluation and suggestions for overcoming the obstacles".
1. What do you think about making a periodic and data-based assessment of academic members?
There is a difference of opinion among academic members in Education Faculties. Although it appears that most of the academic members support a periodic and data-based assessment, some others have negative attitudes towards, and criticisms of, such a system, asserting that it is wide open to abuse. Table 11 presents the analysis of the qualitative data about this theme.
Table 11.
The Analysis of Data Related to Making a Periodic Performance Evaluation

Theme: Attitude Towards Performance Evaluation
Description: Having positive, negative or recessive attitudes about performance evaluation
Codes (frequency): Adopters (28), Doubters (12), Resistants (10)
When Table 11 is reviewed, it is seen that most of the academic members adopt such a system. They | |
claim that performance evaluation would support their development in many aspects. The codes related | |
to views of academic members are given below: | |
Adopters: "I believe that performance evaluation will bring about good results in assuring quality in higher education." (K6)
Doubters: "It is very nice to be supported by the system. But is it all about publishing? It is a matter of question for me how this evaluation will be done, and by whom." (K5)
Resistants: "Performance cannot be assessed. It is ridiculous to compare individuals. It has been tried many times before, but it has turned out to be useless." (K13)
2. What criteria should be assessed within performance evaluation? Could you order these criteria according to their significance for you?
Academic members in Education Faculties express a variety of views on which criteria should be included in the evaluation. They also indicate the significance level of these criteria, which provides valuable qualitative data. Table 12 presents the analysis of qualitative data about which criteria should be included within performance evaluation.
Table 12.
The Codes Related to Academic Members' Preferences for Performance Evaluation Criteria

Theme: Priorities of Academic Members
Description: The criteria which should be included within performance evaluation, ordered by significance level

Codes                                                                 Frequency
Research and publications                                             17
The quality of instruction                                            10
Undergraduate and postgraduate advisory                               8
Workload (course hours etc.)                                          6
Jury memberships (jury of thesis, jury of associate professor etc.)   5
Personal interest and career                                          4
When Table 12 is reviewed, it is seen that academic members first want their research and publications to be assessed, and then their teaching quality in the classroom. According to them, teaching quality includes the methods they use, the quality of the presentation of content, the use of materials and every piece of effort which makes learning permanent. Academic members also want their personal interests and careers to be assessed by the system. The codes related to views of academic members about performance evaluation criteria are given below:
Research and Publications: "The most important criterion in a performance evaluation system must be academic members' publications in terms of quantity and quality." (K6)
The quality of instruction: "Instruction is as important as academic studies. Classroom work, especially activities and teaching methods, can be assessed."
Undergraduate and postgraduate advisory: "We are not just researchers but also advisors, and this issue is ignored by the system. For instance, thesis advisory is a tedious job and should be included within the evaluation." (K22)
Workload: "There is no time left for anything other than teaching courses. An academic member should be assessed by his/her courses and the efforts he/she makes for students and administration. Academic members who teach more courses are the best academicians." (K30)
3. What are the positive and negative consequences of making a performance evaluation of academic | |
members? | |
Academic members in Education Faculties put emphasis on both the positive and negative impacts of a performance evaluation system. In the "Positive Impacts" theme, the codes appear as "motivation", "financial support", "search of quality", "support of development via self-criticism" and "continuity of dynamism"; in the "Negative Impacts" theme, the codes appear as "intra-institutional rivalry", "academic dishonesty", "cause of stress" and "domination of quantity over quality". Table 13 presents the analysis of qualitative data about the positive and negative impacts of performance evaluation.
Table 13.
The Codes Related to Academic Members' Views about Positive and Negative Impacts of Performance Evaluation

Theme: Positive Impacts (positive consequences of the performance evaluation system)
Codes                                        Frequency
Motivation                                   12
Financial support                            8
Search of quality                            8
Supporting development via self-criticism    4
The continuity of dynamism                   4

Theme: Negative Impacts (negative consequences of performance evaluation)
Codes                                        Frequency
Intra-institutional rivalry                  7
Academic dishonesty                          6
Cause of stress                              6
Domination of quantity over quality          8
When Table 13 is reviewed, it is seen that academic members express 36 views under 5 codes related | |
to positive impacts theme and 27 views under 4 codes related to negative impacts theme. The codes | |
related to views of academic members about positive and negative impacts of performance evaluation | |
are given below: | |
Motivation: "This system motivates academic members to conduct new studies." (K24)
Search of quality: "Academic members who are subject to an evaluation feel the need to pursue quality. No one wants to be called a bad teacher." (K9)
The continuity of dynamism: "In public universities, especially older academic members are resistant to renewing themselves. This situation leads to fossilization in higher education, because there is no evaluation and no sanction. Evaluation results in dynamism." (K29)
Intra-institutional rivalry: "It prevents cooperation and breeds jealousy; a competitive environment increases egoistic behaviors rather than productivity." (K36)
Academic dishonesty: "Conducting research with fake data, asking others to add his/her name to studies to which he/she contributed no effort."
Domination of quantity over quality: "Publish, publish, publish; that is enough. There are lots of academic members who do research, but what about the quality? No one asks this question. No one talks about quality now."
4. What are the obstacles to performance of academic members in higher education and what do you | |
suggest for overcoming these obstacles? | |
Academic members in Education Faculties list a number of obstacles to a performance evaluation system and then offer some suggestions for overcoming these obstacles. In the "Obstacles" theme, the codes appear as "intensive workload (courses, advisory, administrative duties)", "efforts are not appreciated", "cumbersome organizational process", "lack of internal motivation" and "too crowded classrooms"; in the "Suggestions" theme, the codes appear as "reducing course loads of academic members", "institutional support for academic efforts and research publications", "evaluation criteria determined by universities", "periodical budget allocation to academic members from the Council of Higher Education" and lastly "employing more officers". Table 14 presents the analysis of qualitative data related to academic members' views about obstacles and suggestions.
Table 14.
The Codes Related to Academic Members' Views about Obstacles to a Performance Evaluation and Their Suggestions

Theme: Obstacles (obstacles to a performance evaluation system)
Codes                                                           Frequency
Intensive workload (courses, advisory, administrative duties)   18
Efforts are not appreciated                                     12
Cumbersome organizational process                               10
Too crowded classrooms                                          8
Lack of internal motivation                                     6

Theme: Suggestions (suggestions for overcoming obstacles)
Codes                                                                   Frequency
Reducing the course load of academic members                            11
Institutional support for academic efforts and research publications    11
Evaluation criteria should be determined by universities                10
Periodical budget allocation to academic members from YÖK               8
Employing more officers                                                 4
When Table 14 is reviewed, it is seen that academic members express 48 views under 5 codes in | |
“Obstacles” theme and 44 views under 5 codes in “Suggestions” theme. The codes related to views of | |
academic members about obstacles to a performance evaluation and their suggestions are given below: | |
Intensive workload: "It takes time to produce something of high quality. There is no time left for academic members. They teach courses, take care of students or are busy with administrative duties." (K25)
Lack of internal motivation: "There are lots of things in academic life which decrease motivation. If an individual starts this profession for some other reason, he/she has a low level of motivation for self-development." (K19)
Cumbersome organizational process: "Bureaucracy and very slow-running processes put an obstacle to performance while carrying out projects or other studies." (K8)
Institutional support for academic efforts and research publications: "The most important suggestion for increasing performance is that academic members should be supported by the institution. This might include research, publication, congress participation or training for self-development." (K7)
Employing more officers: "If the institution employs more officers, academic members will be freed from paperwork." (K21)
Periodic budget allocation to academic members from YÖK: "The Council of Higher Education should allocate a certain budget to academic members, ask them to plan their budget use and make a budget-product comparison at the end of the period." (K10)
Discussion & Conclusion | |
In accordance with the findings of this study, it is observed that there is a difference of opinion among academic members regarding the performance evaluation system. It is seen that academic members in education faculties who have more than 15 years of working experience and who are highly satisfied with their institutions have lower expectations about performance evaluation than others. When academic members' views are reviewed by academic title, it is seen that research assistants and assistant professors have a positive attitude towards performance evaluation, while associate professors and professors show a lower level of positive attitude. Accordingly, Stonebraker and Stone (2015)
emphasize that there is an increase in the average age of academic members with the elimination of | |
mandatory retirement and this raises some concerns about the impact of this aging on productivity in | |
class. They claim that age has a negative impact on student ratings of faculty members that is strong | |
across genders and groups of academic disciplines. However, this negative effect begins after faculty | |
members reach their mid-forties. This explains the reason for negative attitudes of professors towards | |
the performance evaluation system. This finding also parallels the findings of Esen and Esen (2015), who find in their study that academic members' positive perceptions of the positive impacts of performance evaluation decrease as they progress through academic titles. Bianchini, Lissoni, and Pezzoni (2013) emphasize that students tend to evaluate professors' performance more negatively than assistant professors' performance. From a general point of view, it appears in this study that there is hesitation and a lack of confidence in the academic community about the efficiency of a performance evaluation system.
This study indicates that academic members expect a performance evaluation system to develop a consensus about the criteria of an effective academician and to positively affect the professional development of academic members, but also to increase their workload and lead to intra-institutional tension. Qualitative analysis also shows that nearly half of the academic members support performance evaluation, while a certain number of them hesitate about how it will be applied and by whom. The academic members in the faculties of education claim that performance evaluation increases motivation and the search for quality, but that it may also lead to competition within the institution and academic fraud. Traditionally, performance evaluation in faculties tends to focus on research indicators (Bogt and Scapens, 2012); therefore, higher education institutions plan their evaluations considering governmental funding, research awards and high rankings, which all lead to an evaluation that only favours academic members with top publications (Douglas, 2013; Hopwood, 2008).
These findings differ to a certain extent from the studies of Tonbul (2008), Esen and Esen (2015) and Başbuğ and Ünsal (2010). Tonbul (2008) asserts that academicians have higher expectations of the performance evaluation approach because they think it helps to identify the obstacles to effective performance and to recognize one's own deficiencies. Accordingly, Esen and Esen (2015) emphasize that academic members expect a performance evaluation system to develop a qualified organizational culture, provide continuity of organizational innovation, positively affect the professional development of academic members and help them recognize their own deficiencies. This study also indicates that the most important obstacles to performance evaluation appear to be the organizational processes of higher education institutions, intensive workload and lack of intrinsic motivation. Within the scope of the proposals, academic members request the employment of more officers and institutional support for their publications and academic studies. As a result of his study, Tonbul (2008) lists the obstacles to performance evaluation as the inadequacy of organizational opportunities, the organizational culture and uncertainty in evaluation criteria. In Esen and Esen's (2015) study, it is found that the most important factors that pose an obstacle to performance evaluation are the inadequacy of organizational opportunities, the current organizational processes of higher education institutions and academic promotion criteria. Also, Başbuğ and Ünsal (2010) claim that the lack of physical conditions for scientific research is the most significant factor that hinders academic performance.
Academic members in this study emphasize that they prefer to be evaluated according to the following criteria: first, their academic publications and research; second, their quality of instruction; and third, their counseling service to postgraduates. This finding is supported by Braunstein and Benston (1973), who find that research and visibility are highly related in the evaluation of academic members' performance, whereas effective teaching is only moderately related to these performance criteria. In practice, the evaluation of academic members' teaching performance is mostly carried out by students. Arnăutu and Panc
(2015) criticize this situation by claiming that research and scientific productivity, administrative capacity | |
and reputation are not presented in the evaluation made by students, therefore they do not have | |
information necessary to evaluate academic members’ role within faculty. Ünver (2012) conducts | |
research about evaluation of academic members by students and it comes out that most of the academic | |
members think that students fail to make an objective evaluation of academic members; therefore, they | |
prefer making academic studies rather than focusing on students’ views about their teaching | |
performance. Turpen, Henderson, and Dancy (2012) state that the faculties focus on the students' test | |
performance and academic success as quality criteria while higher education institutions focus on | |
quantitative scoring of students when evaluating the quality of teaching. Within this respect, the quality | |
of the measurement tools is very important for assessment of teaching performance. Kalaycı and Çimen | |
(2012) examine the assessment tools used in the process of evaluating the instructional performance of | |
academicians in higher education institutions and find out that quality of instruction and course | |
evaluation surveys are developed without any particular approach and twenty percent of items are | |
inappropriate according to item construction and writing rules; therefore, these assessment tools fail to evaluate academic members' performance. It is shown in some studies that the assessment of the
performance of the instructors by the students may be related to the quality of the teaching as well as | |
the qualities of physical attraction and comfort of the course which are not related to the teaching | |
(Hornstein, 2017; Tan et al., 2019). Shao, Anderson, and Newsome (2007) claim that academic members | |
request peers/colleagues’ considerations for performance assessment and other criteria such as class | |
visits, preparation for class, follow-up of current developments in the field. There are other factors | |
affecting performance evaluations of academic members. Özgüngör and Duru (2014) find that perceptions of instructors deteriorate as course load, instructors' experience, and the total number of students taking the instructor's courses increase. It also emerges that the students of
the Faculty of Education tend to give higher scores to the faculty members than the students of all other | |
faculties, whereas the students of the Faculty of Technical Education and Engineering give lower scores | |
to the faculty members. It is also revealed that faculty members with a course load of 45 hours or more | |
are evaluated more negatively than other faculty members with a lower course load. In the Faculty of Education, the faculty members with 60-100 students receive the worst performance evaluations. Arnăutu and Panc
(2015) refer to students and academic members’ different expectations from each other; claiming that | |
students focus on communicative issues and expect from professors a good relationship and personalized | |
feedback, while professors believe that the attention should be focused on the quality of the education | |
process (such as information update). | |
In this study, it is found out that the performance evaluation of the academic members creates a | |
consensus on the criteria of the effective academic member and positively affects the professional | |
development of the academic members. These qualifications enhance the professional quality of | |
academic members working in the faculties of education and provide a sustainable professional | |
development process. Filipe, Silva, Stulting and Golnik (2014) emphasize that sustainable professional | |
development improved through performance evaluation is not only limited to educational activities, but | |
also develops qualities such as management, teamwork, professionalism, interpersonal communication | |
and accountability. Açan and Saydan (2009) attempt to determine the academic quality characteristics of academic members and arrive at the following criteria: "the teaching ability of the instructor, the
assessment and evaluation skills of the instructor, the empathy of the instructor, the professional | |
responsibility of the instructor, the instructor's interest in the course and the gentleness of the instructor”. | |
Esen and Esen (2015) state that the performances of faculty members in the United States are generally | |
based on four factors which include instruction, research (professional development), community service | |
and administrative service. Among them, they emphasize that the most important ones are the instruction | |
and research dimension. Performance evaluation results are used for making decisions about whether | |
they are appropriate in their current position, promoting them or extending working periods of academic | |
members. | |
In this study, it is seen that academic members who do not receive the academic incentive have lower expectations than those who qualify for such a payment. Kalaycı (2008) claims that the performance evaluation system in Turkey is not even at the preparation stage compared to global practices. However, there have been a number of promising developments in this area in Turkish higher education. Focusing on this problem, the Council of Higher Education in Turkey decided in 2015 to create the Higher Education Quality Council to provide assurance that "a higher education institution or program fully fulfills the quality and performance processes in line with internal and external quality standards". In parallel, the Academic Incentive Award Regulation has been put into practice in order to evaluate the performance of academic staff working in higher education according to standard and objective principles, to increase the effectiveness of scientific and academic studies and to support academic members. It seems to achieve its aim, because in this study academic members who receive the incentive are highly motivated, and they reach a consensus on the criteria of the effective faculty member, which are in compliance with the academic incentive award.
It is important to conduct performance evaluation in higher education in terms of increasing the efficiency of services; however, it is equally important to determine which criteria will be used and to assure the reliability of assessment. In this respect, Çakıroğlu, Aydın and Uzuntiryaki (2009) claim that there is very promising research on the reliability of experienced academic members' evaluations, and they emphasize that the following criteria should be considered within the context of evaluations (see the sketch after this list):
• Data about instructional performance should be collected from a variety of sources (colleagues, students, advisors, postgraduate students, graduates etc.) and in a variety of formats (student evaluation surveys, student interviews, observation results, course materials, student products etc.),
• clearly identifying evaluation criteria,
• informing evaluators about how to carry out the evaluation process,
• selecting evaluators randomly from candidates who meet the criteria for being an evaluator,
• the jury should include at least 3 and at most 5 members.
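As a rough illustration of these recommendations (multiple data sources and a randomly drawn jury of three to five eligible evaluators), the following Python sketch is hypothetical: the source names, weights and evaluator IDs are invented for the example and are not prescribed by the authors.

```python
# Hypothetical sketch of a multi-source, jury-based evaluation as outlined above.
# Source names, weights and evaluator IDs are illustrative assumptions.
import random
from statistics import mean

# Scores for one instructor, gathered from several sources (1-4 Likert scale)
evidence = {
    "student_surveys":   [3.1, 3.4, 3.0],
    "peer_observations": [3.5, 3.2],
    "course_materials":  [3.8],
    "alumni_feedback":   [3.3, 3.6],
}

def aggregate(evidence, weights):
    """Weighted mean across sources; each source is first averaged internally."""
    total_weight = sum(weights[src] for src in evidence)
    return sum(weights[src] * mean(scores) for src, scores in evidence.items()) / total_weight

weights = {"student_surveys": 0.4, "peer_observations": 0.3,
           "course_materials": 0.2, "alumni_feedback": 0.1}
print(f"Aggregated score: {aggregate(evidence, weights):.2f}")

# Randomly draw a jury of 3-5 members from evaluators who meet the criteria
eligible = ["E1", "E2", "E3", "E4", "E5", "E6", "E7"]  # placeholder IDs
jury = random.sample(eligible, random.randint(3, 5))
print("Jury:", jury)
```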
To sum up, academic members’ views about performance evaluation are analyzed and it is recognized | |
that there is no consensus among academic members about performance evaluation. Academic members are aware of the positive impacts of such a system; however, they also have concerns about the reliability of assessment, the evaluation criteria, the evaluation process and the evaluators. This study indicates that the most important criteria which, according to academic members, should be included in the evaluation are research and publication, quality of instruction, and undergraduate and postgraduate advisory. Among the positive impacts of a performance evaluation system, it stands out that performance evaluation motivates academic members, provides financial support and leads to a search for quality; however, academic members also emphasize the negative impacts of such a system, which include intra-institutional competition and academic fraud. Academic members make some suggestions for overcoming the obstacles, which include reducing course loads, providing more institutional support for academic efforts, allocating a certain amount of budget to each member from the Council of Higher Education and employing more officers. There is a variety of requests regarding performance evaluation criteria; however, in terms of improving the quality of higher education and making systematic improvements, it is important to establish an effective evaluation system based on monitoring performance through multiple types of data.
As a result of this research, it is recommended that higher education institutions increase the objectivity and efficiency of performance evaluations and create human resources services within faculties. They should also design sustainable, strong performance plans, use a holistic evaluation cycle, provide consultancy services to academic members, students and internal stakeholders on how to improve performance, prepare understandable and objective guidelines for performance evaluators, and develop an institutional culture in which feedback is regarded as valuable rather than judgmental.
Turkish Version

Introduction

The systematic measurement and evaluation of performance is a dimension that every kind of organization treats with care for the sake of transparent, effective and successful functioning. Most private and public higher education institutions carry out various studies for a systematic evaluation. Owing to reasons such as increasing competition on a global scale and social pressure for transparency, all higher education institutions feel the need not only to define standard performance indicators but also to demonstrate the extent to which they achieve their vision, mission and strategies (Hamid, Leen, Pei & Ijab, 2008). Especially in today's competitive environment, a better evaluation system offers universities guiding advantages and gives them the opportunity to evaluate their own staff and operations. The fact that state universities are publicly funded provides a suitable environment for conducting an efficient performance evaluation without being under pressure.
When the literature is reviewed, various debates about the accountability of higher education institutions can be seen. These debates are fundamentally about evaluating the performance of institutions and publishing the results publicly in a way that allows other stakeholders to participate. Another criticism directed at higher education is the claim that faculty members, who are among the most important determinants of university performance, live in an "ivory tower" as a "closed society" detached from the world (Glaser, Halliday, & Eliot, 2003). Esen and Esen (2015) summarize these criticisms as follows:
• the criticism that the work of faculty members is not oriented towards societal problems,
• that it is overly theoretical,
• that societal resources are being wasted (Etzkowitz, Webster, Gebhardt, & Terra, 2000),
• that research is carried out unilaterally and confined to its own field instead of being translated into societal benefit,
• that the academic identity has turned into an identity with narrowed autonomy, anxious about unsettling the university or administrative structure to which it belongs (Elton, 1999).
Although higher education institutions operate autonomously, they should not be treated like individual organizations and enterprises. Higher education institutions have the power to influence the society, economic structure and social life to which they belong. Therefore, instead of being ivory towers, universities should consider science, society and the nation together, perform at international quality standards, and feel it a matter of conscience to prioritize societal benefit over career advancement. Conducting performance evaluation in universities is necessary in terms of accountability to employees (continuous improvement activities for scientific research), accountability to the state (efficient and productive use of resources), and accountability to students and society (offering comprehensive educational experiences, providing vocational training that will improve quality of life, and meeting society's workforce needs) (Vidovich & Slee, 2001). In addition, UNESCO (2004) lists the global developments that make performance evaluation in higher education necessary as "new institutions such as entrepreneurial universities and corporate universities; new types of educational service delivery such as distance, virtual and private providers; greater diversification of qualifications and certificates; increasing mobility of students, programs, providers and projects across borders; and increasing private investment in the provision of higher education". These developments have important implications for higher education in terms of quality, access, diversity and funding (as cited in Tezsürücü & Bursalıoğlu, 2013). Performance evaluation in higher education covers both various processes and products. Fundamentally, performance evaluation indicates the minimum acceptable level in terms of quality and enables individuals and institutions to recognize the aspects in which they are open to improvement. Individuals or institutions not only become aware of the aspects open to improvement; they also identify the aspects in which they are currently strong. Batool, Qureshi & Raouf (2010) state that performance evaluation may not cover all dimensions of this concept, and that institutional performance evaluation does not mean the same thing as measuring the quality of academic programs, courses or graduates. They underline that institutional performance evaluation rather means evaluating the current state of the institution in terms of its quality and effectiveness.
Within the scope of this study, performance evaluation in higher education can be defined as "measuring academic staff members' professional competence in their instructional roles and, at the same time, the level of their contribution to the fulfillment of institutional goals". Evaluating academic staff members' various activities such as research, academic service, teaching and publication, supporting their development through feedback, and appreciating their work make the existence of a performance evaluation system indispensable. Vincent and Nithila (2010) mention the following among the advantages of a performance evaluation approach to be implemented in higher education:
• It ensures that individuals' development and progress are based on realistic goals.
• It aligns the goals of the individual with the goals of the institution.
• It diagnoses the strengths and weaknesses of individuals within the organization.
• It functions as a feedback mechanism for improvement.
• It helps identify the training and courses that are needed.
• It enables the institution to take on greater educational, social, economic and political roles and responsibilities.
Tonbul (2008), in turn, states that performance evaluation practices will increase the degree to which organizational goals are achieved, make it easier to identify the aspects of institutional functioning that falter, and provide specific data on the effect of organizational climate and institutional culture on employees, thereby increasing organizational performance. Organizations that employ the feedback mechanism effectively and functionally in workflow and organizational processes appear to be more successful and more enduring (Latham & Pinder, 2005). Kalaycı (2009) states that the probability of predicting success or failure in higher education without evaluation is low, but that with the evaluation of academics' teaching performance, learning-teaching environments will become open to questioning by everyone, which is quite challenging. In this regard, Kim et al. (2016) emphasize that many professors attach less importance to their role as educators and give greater priority to their role as researchers, because the faculty evaluation system is based on research. Performance evaluation should not be carried out merely out of obligation and as a formality. This danger is especially possible for state universities. Kalaycı and Çimen (2012) state that "state universities, too, should now carry out quality process practices not to complete a formality but with the aim of genuinely raising quality and standing out in competition, and that state universities also need quality work".
Among the reasons forcing universities to carry out performance evaluation in the 21st century are institutional reputation, internationalization and world university rankings. Many factors determine institutional reputation. In the reputation survey on research and teaching carried out by the Higher Education Authority (2013), it emerged that academics are closely engaged with, and knowledgeable about, the departments in their fields of expertise. It has been stated that an institution's hosting both international and national academic staff and students gives the impression that the institution has a global identity and is ready to compete in the global market (O'Connor et al., 2013). Having international academic staff and students is not sufficient on its own. One of the most important indicators of a university's quality is the performance of its academic staff and, directly related to this, the quality level of the courses offered. The quality of academic staff is among the foremost factors that directly affect the quality of education, and the evaluation of academic staff performance is seen as the assurance of quality control (Açan and Saydan, 2009).
When the most frequently used institutional performance measurement and evaluation techniques, including those used in higher education institutions, are examined, they appear to be "Self-Evaluation, Key Performance Indicators (KPI), Relative Evaluation, Recognition, Six Sigma and Total Quality Management" (Çalışkan, 2006; Kalaycı, 2009; Paige, 2005). Not all of these techniques may be appropriate or applicable for the individual evaluation of faculty members. For example, the performance comparison technique involves evaluating an individual's current performance by comparing him or her with someone else who is accepted as a pioneer/exemplar/leader in the same context. As can be understood, this technique may be suitable for an organization that wants to use the guiding power of best examples in the pursuit of excellence; however, it is not suitable for evaluating all personnel, because individuals differ from one another in their working styles and methods of self-development. Among these techniques, the KPI technique, for instance, is suitable for evaluating the performance of those in teaching positions in higher education. In the KPI technique, operational definitions are made of the performance indicators to be evaluated. What matters in an operational definition is specifying through which operations a concept is defined. Performance measurements and evaluation techniques implemented at the global level may not be applied in exactly the same way in every country's higher education institutions. When the current performance evaluation practices in Turkey are examined, it is seen that "faculty members' performance is measured quantitatively only with regard to their research and publication activities, or evaluations are made on the basis of subjective judgments" (Esen and Esen, 2015). In this regard, the Council of Higher Education launched the academic incentive scheme in 2015 in order to support the academic activities of academics in Turkey and to increase their motivation. This regulation contains detailed evaluation criteria concerning "the detailed characteristics of the activities to be taken as the basis for calculating academic incentive points according to the characteristics of scientific fields and the titles of academic staff, the point equivalents of these activities, and the formation of the commission that will make these calculations, with regard to the implementation of the academic incentive allowance to be paid to academic staff employed in state higher education institutions" (Academic Incentive Allowance Regulation, 2015). With the academic incentive system, the performance of academic staff is evaluated by the Council of Higher Education on the basis of the national and/or international projects they carry out, their research, publications, exhibitions, patents, citations to their work and the academic awards they have received. As a result, academic staff who produce sufficient work are supported financially. When the literature on how academic staff performance evaluations are carried out is examined, various methods can be seen. The various independent methods that can be used to evaluate faculty members' performance in Turkey are as follows:
a. The personnel registry (sicil) system
b. Academic promotion and appointment criteria
c. Faculty member evaluation surveys
d. Annual activity reports
e. The academic incentive scheme
f. Student surveys
(Esen and Esen, 2015)
Conducting performance evaluation in higher education is very important in terms of increasing the effectiveness of the services provided; however, the criteria according to which the performance evaluation will be made, and its reliability, are at least as important. On this issue, Çakıroğlu, Aydın and Uzuntiryaki (2009) state that "research on the reliability of evaluations made by experienced faculty members is quite promising" and emphasize that the following criteria should be taken into consideration:
• data on teaching performance should be collected from a variety of sources (colleagues, students, advisors, postgraduate students, graduates, etc.) and in a variety of formats (student evaluation surveys, student interviews, observation results, course materials, student products, etc.),
• evaluation criteria should be clearly defined,
• those who will be evaluated should be informed about how they will be evaluated,
• evaluators should be informed about how to carry out the evaluation,
• those who are in the position of candidate should not take on the role of evaluator,
• evaluators should be selected randomly from among those who meet the criteria,
• the jury should consist of at least 3 and at most 5 members.
The aim of increasing the effectiveness of universities lies at the basis of evaluating academic staff performance. The reason why education faculties were preferred in this study is that, particularly within the scope of the "Bologna Process", the Council of Higher Education (YÖK) places great emphasis on accreditation work in education faculties. In the accreditation work carried out in education faculties at universities, determining academic staff members' expectations of performance evaluation and the obstacles to it is important for achieving this aim. Higher education institutions in Turkey aim to increase their accountability as an indicator of quality and to report their current situation to their internal and external stakeholders. Within this scope, universities carry out performance evaluation studies of their academic staff in order to prove that they fulfill their missions and visions, and they present these as reports to the public, students, families, the government and the private sector. On a national and global scale, there is increasing pressure on universities to carry out systematic performance evaluations because of concepts such as quality, efficiency, effectiveness and accountability. Therefore, while performance evaluation is so important for higher education institutions, there is a need for research to determine the expectations of the academic staff whose performance is evaluated. Within the scope of this study, which aims to examine quantitatively and qualitatively academic staff members' expectations of the performance evaluation approach in higher education and their views on the obstacles to performance evaluation, the following questions were investigated:
1. What are the expectations of academic staff in the Faculty of Education regarding the performance evaluation approach?
1.1. Do academic staff members' expectations regarding the performance evaluation approach differ significantly according to various variables (academic title, academic experience, receipt of the academic incentive, level of satisfaction with their institution)?
2. What are the views of academic staff in the Faculty of Education on the obstacles to a performance evaluation system?
2.1. Do academic staff members' perceptions of the obstacles to a performance evaluation system differ significantly according to various variables (academic title, academic experience, receipt of the academic incentive, level of satisfaction with their institution)?
3. What are the general views of academic staff in the Faculty of Education on the performance evaluation approach?
Method
Within the scope of this research, the convergent parallel design, one of the mixed-methods research designs, was preferred. Quantitative and qualitative data were collected simultaneously, analyzed separately, and their findings were compared. In the convergent parallel design, the qualitative and quantitative strands are given equal priority, separate analyses are carried out during the analysis stage, and joint interpretation takes place at the end (Creswell and Plano Clark, 2014). The mixed design used in this research is shown in Figure 1:
[Figure 1. Convergent Parallel Design in Mixed-Methods Research: quantitative data collection and analysis (descriptive statistics, t-test and ANOVA) in parallel with qualitative data collection and analysis (content analysis), followed by joint interpretation of the quantitative and qualitative analyses.]
Participants
The data of this study were obtained in 2018 from academic staff working in the Faculties of Education of state universities in the positions of research assistant with a doctorate, assistant professor, associate professor and professor. The study group consists of participants working in the Faculties of Education of state universities located in the Marmara, Black Sea, Aegean, Mediterranean and Eastern Anatolia regions. Instructors (öğretim görevlisi) were not included in the study group because of their heavy course loads and because data were collected only from academic staff who had completed their doctoral education. While collecting the quantitative data, convenience sampling was used, and data were collected from 104 academic staff members at six universities that agreed to participate in the study. In the qualitative strand, the sample was selected through maximum variation sampling, and data were collected from 50 participants representing different views on the situation under investigation. The quantitative phase included 25 research assistants with a doctorate, 35 assistant professors, 31 associate professors and 13 professors. Since convenience sampling was used, no sampling was carried out according to department; in the end, however, 22 percent of the participants work in Science Education, 11 percent in Preschool Education, 28 percent in Educational Sciences and 31 percent in Primary School Education departments. The qualitative phase included 13 research assistants with a doctorate, 17 assistant professors, 15 associate professors and 5 professors. While determining the participants in the qualitative phase, maximum variation was ensured according to the variables of academic title and department. Twenty percent of the participants work in Science Education, 10 percent in Preschool Education, 40 percent in Educational Sciences and 30 percent in Primary School Education departments.
Data Collection Instruments
In this study, a personal information form and two subscales developed by Tonbul (2008) were used to collect data: the 16-item, 4-point Likert-type "Expectations Regarding the Performance Evaluation Approach" subscale and the 10-item "Obstacles to a Performance Evaluation System" subscale. While developing the scale, exploratory factor analysis and the varimax orthogonal rotation technique were applied. The Cronbach's alpha reliability values of the measurement instrument were 0.92 for the "Expectations Regarding the Performance Evaluation Approach" subscale and .87 for the "Obstacles to a Performance Evaluation System" subscale. The reliability analysis was repeated with the data of this study, and the Cronbach's alpha value was found to be .84 for the first subscale and .78 for the second subscale. A Cronbach's alpha value, which measures the homogeneity among scale items, between .60 and .80 is considered evidence that the scale has a high level of reliability (Tonbul, 2008). In the scale used, the items loaded on a single factor, and this single factor explains 55.8% of the total variance. In addition, open-ended questions about the performance evaluation approach were asked in order to support the quantitative data with qualitative data and to allow for rich analysis. The opinions of a professor from the field of Educational Sciences, an associate professor from the field of Measurement and Evaluation, and an expert working in the field of higher education studies were obtained, and the necessary revisions were made. The final versions of the open-ended questions are as follows:
2.1. What do you think about measuring and evaluating academics' performance periodically and on the basis of data?
2.2. Which dimensions would you like to be included in a performance-based evaluation? List these dimensions in order of importance.
2.3. What are the positive and negative aspects of performance evaluation that affect academics' performance?
2.4. What are the obstacles to increasing academics' performance in higher education, and what are your suggestions for removing these obstacles?
Data Analysis
To determine which method would be used to analyze the quantitative data, the equality of variances and the normality of the data distribution were examined. For this purpose, the skewness and kurtosis coefficients were checked and found to be within the (-1, +1) range. In addition, since the sample size was larger than 50 (N=104), the Kolmogorov-Smirnov test was performed, and the significance value was found to be p>.05. Since the normality assumption was met, an independent samples t-test was conducted to test whether there was a significant difference between participants' responses with respect to the variable of receiving the academic incentive. One-way analysis of variance (ANOVA) was conducted to test whether there was a significant difference between participants' responses to the scale items with respect to the variables of work experience, academic title and level of satisfaction with their institution.
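To make this analysis pipeline concrete, the sketch below mirrors the steps just described (skewness/kurtosis check, a Kolmogorov-Smirnov test, an independent-samples t-test and a one-way ANOVA) using scipy on randomly generated placeholder data; it stands in for, and is not identical to, the statistical software procedure the authors used.

```python
# Sketch of the quantitative checks and tests described above, on placeholder data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
scores = rng.normal(loc=2.3, scale=0.44, size=104)   # stand-in for scale means

# Normality: skewness and (excess) kurtosis should fall roughly within (-1, +1)
print("skewness:", round(stats.skew(scores), 2))
print("kurtosis:", round(stats.kurtosis(scores), 2))

# Kolmogorov-Smirnov test against a normal distribution with the sample's parameters
ks_stat, ks_p = stats.kstest(scores, "norm", args=(scores.mean(), scores.std(ddof=1)))
print(f"K-S: D = {ks_stat:.3f}, p = {ks_p:.3f}")     # p > .05 -> normality not rejected

# Independent-samples t-test (e.g. incentive received vs. not received)
received, not_received = scores[:52], scores[52:]
t_stat, t_p = stats.ttest_ind(received, not_received)
print(f"t-test: t = {t_stat:.2f}, p = {t_p:.3f}")

# One-way ANOVA (e.g. four experience groups)
g1, g2, g3, g4 = np.array_split(scores, 4)
f_stat, f_p = stats.f_oneway(g1, g2, g3, g4)
print(f"ANOVA: F = {f_stat:.2f}, p = {f_p:.3f}")
```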
In the analysis of the qualitative data, inductive content analysis was used. Intercoder reliability (agreement percentages) was determined over the academics' views collected with the open-ended questionnaire. While determining these values, the academics' views in the open-ended questionnaire were coded by one researcher and one expert. This procedure was repeated for each item in the questionnaire. Agreement percentages were calculated using Miles and Huberman's (1994) reliability formula:
Reliability = Number of Agreements / (Number of Agreements + Number of Disagreements)
As a result of the calculation, the reliability of the views on the performance evaluation approach was found to be 0.89. Since an agreement percentage of 80% or above is considered sufficient, it can be said that reliability was ensured in terms of data analysis (Mokkink et al., 2010). In this research, the validity strategies of "Member Checking", "Expert Opinion", "Thick Description" and "Chain of Evidence" listed by Creswell (2003) for qualitative research methods were used. Participants were asked whether the study findings accurately reflected their own views, an independent expert who had little contact with the study participants and who knew the study method was consulted, and direct quotations were used to remain as faithful as possible to the nature of the data.
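The Miles and Huberman (1994) agreement ratio used above is simple enough to verify directly. The sketch below compares two coders' labels for a handful of hypothetical responses and applies Reliability = Agreements / (Agreements + Disagreements); the labels are invented for illustration.

```python
# Minimal sketch of the Miles & Huberman (1994) intercoder agreement ratio.
# The two coders' labels below are hypothetical.
coder_1 = ["adopter", "doubter", "adopter", "resistant", "adopter", "doubter"]
coder_2 = ["adopter", "doubter", "adopter", "adopter",   "adopter", "doubter"]

agreements = sum(a == b for a, b in zip(coder_1, coder_2))
disagreements = len(coder_1) - agreements
reliability = agreements / (agreements + disagreements)
print(f"Agreement: {reliability:.2f}")  # values of .80 or above are taken as sufficient
```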
Findings
3.1 Expectations Regarding the Performance Evaluation Approach
In the study, an answer was first sought to the question "What are academic staff members' expectations regarding the performance evaluation approach?", and the participants' overall mean score on the scale is presented in Table 1.
Table 1.
The Overall Mean of Academic Staff's Expectations Regarding the Performance Evaluation Approach

                           N     Minimum   Maximum   Mean     Standard deviation
Overall expectation mean   104   1.50      3.31      2.3023   .43859
When the mean score obtained by the academic staff from the scale in Table 1 is examined (x̄ = 2.30), it is noteworthy that their expectations regarding the performance evaluation approach are not high but at a moderate level ("partially agree"). The ANOVA results on whether academic staff members' expectations regarding the performance evaluation approach differ significantly according to the variable of academic title are presented in Table 2.
Table 2.
ANOVA Results for Expectations Regarding the Performance Evaluation Approach by Academic Title

Title                      N     Mean     Standard deviation
Research assistant (PhD)   25    2.4525   .506
Assistant professor        35    2.4875   .251
Associate professor        31    2.1754   .441
Professor                  13    1.8173   .162
Total                      104   2.3023   .438

Source of variation   Sum of squares   df    Mean squares   F       p      Significant difference
Between groups        5.321            3     1.774          12.24   .000   Res. Asst. > Assoc. Prof., Prof.; Asst. Prof. > Assoc. Prof., Prof.; Assoc. Prof. > Prof.
Within groups         14.492           100   .145
Total                                  103
When the arithmetic means and standard deviations of the scale scores are examined by academic title, those with the highest expectations regarding the performance evaluation approach are the assistant professors, while those with the lowest expectations are the professors. Since Table 2 reveals a significant difference between groups, post hoc tests were examined to determine between which groups the significant difference lies. Since the significance (Sig.) value of the Levene F test was p<.05, the variances are not equal; therefore, the Games-Howell statistic, one of the post hoc tests preferred for between-group comparisons in such cases, was used. As a result of the analysis, the mean scores of research assistants and assistant professors are significantly higher than those of associate professors and professors. There is no significant difference in expectation scores between research assistants and assistant professors.
When the items in the scale are examined, the highest expectations regarding performance are related to the following items:
A consensus is reached on the criteria of an effective faculty member. (x̄ = 3.42)
The professional development of the faculty member is positively affected. (x̄ = 3.27)
The faculty member's workload increases. (x̄ = 2.40)
It causes tension within the institution. (x̄ = 2.39)
The lowest expectations of academic staff regarding the performance evaluation approach are the following:
The motivation of faculty members increases. (x̄ = 1.90)
It contributes to the development of a qualified institutional culture (values, attitudes towards work and sense of responsibility, relationships, etc.). (x̄ = 1.76)
It ensures that faculty members come to classes better prepared. (x̄ = 1.70)
The results of the analysis of whether academic staff members' expectations regarding the performance evaluation approach differ significantly according to the variable of receiving the academic incentive are presented in Table 3.
Table 3.
T-Test Results for Expectations Regarding the Performance Evaluation Approach by Receipt of the Academic Incentive

Academic incentive   N    Mean   SD    t      p
Received             52   2.43   .38   3.22   .002
Not received         52   2.16   .45
The analysis in Table 3 shows that expectations regarding the performance evaluation approach differ significantly according to whether the academic incentive has been received [t(102) = 3.22, p < .05]. The expectations of academic staff who have received the academic incentive are significantly higher than those of staff who have not. The ANOVA results on whether academic staff members' expectations regarding the performance evaluation approach differ significantly according to the variable of work experience are presented in Table 4.
Table 4.
ANOVA Results for Expectations Regarding the Performance Evaluation Approach by Work Experience

Work experience      N     Mean   SD
0-5 years            17    2.43   .51
6-10 years           38    2.43   .28
11-15 years          14    2.51   .39
More than 15 years   35    2.00   .43
Total                104   2.30   .44

Source of variation   Sum of squares   df    Mean squares   F       p     Significant difference
Between groups        4.67             3     1.55           10.28   .00   0-5 years > 15+ years; 6-10 years > 15+ years; 11-15 years > 15+ years
Within groups         15.1             100   .151
Total                                  103
The analysis in Table 4 shows that those with the lowest scores on expectations regarding performance evaluation are academic staff with more than 15 years of work experience. The mean scores of all other groups are significantly higher than the mean score of this group. There is no significant difference among the mean scores of the first three groups. The ANOVA results on whether academic staff members' expectations regarding the performance evaluation approach differ significantly according to the variable of level of satisfaction with their institution are presented in Table 5.
Table 5.
ANOVA Results for Expectations Regarding the Performance Evaluation Approach by Level of Satisfaction with the Institution

Satisfaction level   N     Mean   SD
Low                  10    2.70   .31
Moderate             35    2.39   .32
High                 42    2.00   .47
Complete             17    1.80   .11
Total                104   2.30   .438

Source of variation   Sum of squares   df    Mean squares   F       p     Significant difference
Between groups        5.97             3     1.991          14.38   .00   Low, Moderate > High, Complete
Within groups         13.08            100   .138
Total                                  103
Since the ANOVA test in Table 5 revealed a significant difference between groups (p<.05), it was examined between which groups the significant difference lies. Since the significance (Sig.) value of the Levene F test was p<.05, the variances are not equal; therefore, the Games-Howell statistic, one of the post hoc tests preferred for between-group comparisons in such cases, was used. According to the post hoc test, academic staff who are satisfied with their institution at a low or moderate level have significantly higher expectations regarding the performance evaluation approach than those who are highly or completely satisfied.
3.2 Obstacles to the Performance Evaluation Approach
Secondly, the study sought an answer to the question "What are academic staff members' views on the obstacles to the performance evaluation approach?", and the participants' mean scale score and the standard deviation of the distribution are presented in Table 6.
Table 6.
Academic Staff's Overall Mean Scores on the Obstacles to the Performance Evaluation Approach

                     N     Minimum   Maximum   Mean   Standard deviation
Obstacles subscale   104   2.20      3.80      3.02   .57517
Given the mean score faculty members obtained from the scale in Table 6 (x̄ = 3.02), they agree with the items on the scale concerning barriers to the performance evaluation approach. Looking item by item, the statements about barriers to performance evaluation with which faculty members agree most strongly are the following:
The current organizational functioning of higher education institutions (hierarchical structure, distribution of authority and responsibility, limits on the autonomy of units) (x̄ = 3.80).
The faculty member's workload (x̄ = 3.68).
The statement they agree with least regarding the performance evaluation approach is "Cultural structure (ignoring problems, personal rivalries, excessive tolerance, discomfort with being criticized, distrust, lack of a competitive mindset of Western standard, etc.)" (x̄ = 1.91).
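Subscale and item-level means of the kind reported above are straightforward to compute with pandas. The sketch below is a minimal illustration with placeholder item names and a hypothetical data file, not the authors' code.

import pandas as pd

# Placeholder item names for the barriers subscale
items = ["barrier_item_1", "barrier_item_2", "barrier_item_3"]
df = pd.read_csv("faculty_survey.csv")                    # hypothetical data file

subscale = df[items].mean(axis=1)                         # each respondent's subscale score
print(subscale.mean(), subscale.std())                    # overall mean and SD (cf. 3.02 and .58 above)

# Rank items by mean agreement to find the most and least endorsed statements
print(df[items].mean().sort_values(ascending=False))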
The results of the analysis of whether faculty members' views on the barriers to the performance evaluation approach differ significantly by whether they have received the academic incentive are presented in Table 7.
Table 7.
t-Test Results for Barriers to the Performance Evaluation Approach by Academic Incentive

Academic incentive   N    Mean   SD    t      p
Received             52   2.14   .54   5.77   .000
Not received         52   2.74   .51
Table 7 shows that views of the performance evaluation approach differ significantly by academic incentive status [t(102) = 5.77, p < .05]. Faculty members who have received the academic incentive score significantly lower on the barriers-to-performance-evaluation subscale.
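A comparison like the one in Table 7 amounts to an independent-samples t-test. The sketch below is illustrative only; the data file and the column names "incentive" and "barriers" are assumptions rather than the study's actual data set.

import pandas as pd
from scipy import stats

df = pd.read_csv("faculty_survey.csv")                    # hypothetical data file
received = df.loc[df["incentive"] == "received", "barriers"]
not_received = df.loc[df["incentive"] == "not received", "barriers"]

# Classic independent-samples t-test; pass equal_var=False for Welch's t-test
t_stat, p_val = stats.ttest_ind(received, not_received)
deg_f = len(received) + len(not_received) - 2
print(f"t({deg_f}) = {t_stat:.2f}, p = {p_val:.3f}")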
The ANOVA results showing whether faculty members' views on the barriers to the performance evaluation approach differ significantly by academic title are presented in Table 8.
Table 8.
ANOVA Test of Barriers to the Performance Evaluation Approach by Academic Title

Academic title                           N     Mean   SD
Arş. Gör. Dr. (research assistant)       25    2.98   .30181
Dr. Öğr. Üyesi (assistant professor)     35    3.42   .36202
Doç. Dr. (associate professor)           31    3.38   .63314
Prof. Dr. (professor)                    13    2.96   .83254
Total                                    104   3.02   .61101

Source           Sum of Squares   df    Mean Square   F       p      Source of difference
Between groups   11.089           3     3.696         13.50   .000   Doç. Dr. > Arş. Gör. Dr., Prof. Dr.; Dr. Öğr. Üyesi > Arş. Gör. Dr., Prof. Dr.
Within groups    27.365           100   .274
Since Table 8 revealed a significant between-group difference (p < .05), post hoc tests were examined to determine between which groups the difference lies. Because the (Sig.) value of the Levene F test is p < .05, the variances are not equal; therefore the Games-Howell statistic, one of the post hoc tests preferred for between-group comparisons in such cases, was used. The analysis showed that the highest scores concerning barriers to performance evaluation belong to assistant professors and associate professors, and the lowest to research assistants and full professors. There is no statistically significant difference between the barrier scores of research assistants and full professors.
The ANOVA results showing whether faculty members' scores on the barriers-to-performance-evaluation subscale differ significantly by work experience are presented in Table 9.
Table 9.
ANOVA Test of Barriers to the Performance Evaluation Approach by Work Experience

Work experience        N     Mean   SD
0-5 years              17    2.72   .51
6-10 years             38    3.26   .28
11-15 years            14    3.78   .44
More than 15 years     35    2.88   .39
Total                  104   3.02   .54

Source           Sum of Squares   df    Mean Square   F        p      Source of difference
Between groups   21.938           3     4.67          44.276   .000   11-15 years, 6-10 years > 0-5 years, more than 15 years; 11-15 years > 6-10 years
Within groups    16.516           100   1.51
Total                             103
According to the post hoc analysis in Table 9, those scoring lowest on the barriers-to-performance-evaluation subscale are faculty members with 0-5 years of work experience, and those scoring highest are those with 11-15 years of experience. The barrier scores of faculty members with 11-15 years of experience are significantly higher than those of all other groups. The 11-15 year group emerged as one that considers most things to obstruct performance evaluation and labels almost every item a barrier.
The ANOVA results showing whether faculty members' scores on the barriers-to-performance-evaluation subscale differ significantly by their level of satisfaction with their institution (low, moderate, quite, completely) are presented in Table 10.
Table 10.
ANOVA Test of Barriers to Performance Evaluation by Satisfaction with the Institution

Satisfaction level     N     Mean   SD
Low                    10    2.58   .31
Moderate               35    2.62   .32
Quite satisfied        42    3.48   .47
Completely satisfied   17    3.36   .11
Total                  104   3.02   .43859

Source           Sum of Squares   df    Mean Square   F        p      Source of difference
Between groups   5.97             3     1.991         14.383   .000   Low, Moderate > Quite, Completely
Within groups    13.08            100   .138
Total                             103
Since the ANOVA in Table 10 revealed a significant difference between groups (p < .05), a post hoc test was carried out to determine between which groups the difference lies. Because the (Sig.) value of the Levene F test is p < .05, the variances are not equal; therefore the Games-Howell statistic, one of the post hoc tests preferred for between-group comparisons in such cases, was used. The post hoc test showed that faculty members who are only slightly or moderately satisfied with their institution agree more strongly with the stated barrier items than those who are quite or completely satisfied.
3.3 Qualitative Analysis of the General Approach to Performance Evaluation
Within the scope of the study, qualitative data were collected on the general approach of Faculty of Education instructors to performance evaluation. The qualitative data were analyzed using content analysis. Four open-ended questions on the performance evaluation approach in higher education were asked, and the qualitative data obtained from the responses were examined through content analysis. The content analysis yielded the themes "attitude dimension, academics' priorities, positive effects of performance evaluation, negative effects of performance evaluation, barriers to performance evaluation, and suggestions regarding the factors hindering performance evaluation".
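Once open-ended responses have been hand-coded, the code frequencies reported in Tables 11-14 amount to simple tallies. The sketch below illustrates such a tally with collections.Counter; the coded responses shown are hypothetical and the code labels are used only as examples.

from collections import Counter

# Hypothetical hand-coded responses; in the study each response was assigned
# a code (e.g. "adopters", "skeptics", "resisters") by the researchers
coded_responses = ["adopters", "adopters", "skeptics", "resisters", "adopters"]

frequencies = Counter(coded_responses)
for code, freq in frequencies.most_common():
    print(f"{code}: {freq}")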
1. What do you think about measuring and evaluating academics' performance periodically and on the basis of data?
There are differences of opinion on this issue among instructors in Faculties of Education in Turkey. Although the majority of participants favor a data-based, periodic evaluation, there are individuals who hold negative attitudes towards this approach or who suspect that the evaluation approach is open to abuse. The analysis of the qualitative data on this point is presented in Table 11.
Table 11.
Content-Analysis Coding of the Data on Conducting Performance Evaluation Periodically

Theme: Attitude dimension
Definition: Holding a positive, negative, or reserved attitude towards the performance evaluation approach
Codes (frequency): Adopters (28); Skeptics (12); Resisters (10)
An examination of Table 11 shows that most faculty members stated that performance evaluation would be positive in many respects and that they would support such an evaluation. Their responses were examined under the codes "adopters", "skeptics", and "resisters" within the theme "Attitude Dimension". With the chain of evidence in mind, some of the views relating to these codes are given below:
Adopters: "I believe performance evaluation will bring good results in ensuring quality in higher education" (K6)
Skeptics: "Supporting the work is nice. But is everything about publications? Who will do the evaluation, and how, is a question mark for me" (K5)
Resisters: "Performance cannot be measured. Comparing individuals is meaningless. It has been tried throughout history and produced no benefit; there is no point in trying again" (K13)
2. When performance-based evaluation is carried out, which dimensions would you like it to include? List these dimensions in order of importance.
Instructors in Faculties of Education expressed various views on which dimensions should be included in a performance evaluation approach. Their statements about how much importance they attach to which dimensions provided valuable qualitative data. The analysis of the qualitative data on this point is presented in Table 12.
Table 12.
Content-Analysis Coding of the Data on the Dimensions That Should Be Included in Performance Evaluation

Theme: Academics' priorities
Definition: The elements that should be included in the performance evaluation approach, ranked by importance
Codes (frequency): Academic publications (17); Evaluation of teaching quality (10); Undergraduate and graduate advising (8); Workloads (teaching hours, etc.) (6); Jury memberships (theses, associate professorship, etc.) (5); Personal interests and pursuits (4)
An examination of Table 12 shows that faculty members stated that within a performance evaluation, the number and quality of academic publications should be measured first, followed by an evaluation of how the instructor conducts lessons in the classroom, the methods used, the quality of content delivery, the use of materials, and everything done to make learning lasting. Their responses about the importance of the evaluation elements were examined under the codes "academic publications", "evaluation of teaching quality", "undergraduate and graduate advising", "workloads", "jury memberships", and "personal interests and pursuits" within the theme "Academics' Priorities". With the chain of evidence in mind, some of the views relating to these codes are given below:
Academic publications: "The foremost dimension a performance evaluation of faculty members should focus on is their publishing, and measuring the quality of those publications." (K6)
Evaluation of teaching quality: "Another dimension as important as academic work is teaching. In-class work, especially activities and teaching methods, could be examined."
Undergraduate and graduate advising: "Advising given to students is overlooked. Thesis supervision, for instance, is quite a demanding job. This performance also needs to be included in the evaluation" (K22)
Workloads: "Teaching leaves no time for anything else. An academic could be measured by the courses they teach rather than by publications. Lecturers who teach many courses are lecturers who work hard." (K30)
3. What are the positive and negative aspects of performance evaluation that affect academics' performance?
Instructors in Faculties of Education stated that the performance evaluation approach could affect performance in higher education positively or negatively. Under the theme "Positive Effects", the codes "motivation", "financial support", "pursuit of quality", "self-criticism encouraging development", and "ensuring continuity of dynamism" emerged; under the theme "Negative Effects", the codes "intra-institutional competition", "academic fraud", "source of stress", and "quantity overshadowing quality" emerged. The analysis of the qualitative data on this point is presented in Table 13.
Table 13.
Content-Analysis Coding of the Data on the Positive and Negative Effects of the Performance Evaluation Approach

Theme: Positive effects
Definition: Positive situations the performance evaluation approach would lead to
Codes (frequency): Motivation (12); Financial support (8); Pursuit of quality (8); Self-criticism encouraging development (4); Ensuring continuity of dynamism (4)

Theme: Negative effects
Definition: Negative situations the performance evaluation approach would lead to
Codes (frequency): Intra-institutional competition (7); Academic fraud (6); Source of stress (6); Quantity overshadowing quality (8)
An examination of Table 13 shows that faculty members expressed views on both the positive and the negative situations the performance evaluation approach would lead to. They expressed 36 views under 5 codes regarding its positive effects and 27 views under 4 codes regarding its negative effects. With the chain of evidence in mind, some of the views relating to these codes are given below:
Motivation: "It directs the faculty member to undertake new work" (K24)
Pursuit of quality: "Academics subject to evaluation enter into a pursuit of quality. No one wants to be remembered as a bad lecturer" (K9)
Continuity of dynamism: "At state universities, the older lecturers in particular are reluctant to renew themselves. This leads to the stagnation of higher education, because there is no evaluation or sanction. Evaluation also means dynamism" (K29)
Intra-institutional competition: "It hinders cooperation, jealousies may arise; if a climate of competition emerges, it increases egoistic behavior instead of productivity" (K36)
Academic fraud: "Things like publishing with fabricated data or having one's name added as last author could happen"
Quantity overshadowing quality: "Publications, publications, publications; how far can it go? Everyone now produces a lot of publications, but how many are of good quality? This is not normal. Someone may produce many high-quality publications, but how many are like that?"
4. What are the barriers to increasing academics' performance in higher education, and what are your suggestions for removing these barriers?
Instructors in Faculties of Education identified barriers to performance evaluation and various suggestions for addressing them. Under the theme "Barriers", the codes "heavy workload (teaching, advising, administrative duties)", "lack of intrinsic motivation", "crowded classes", "efforts going unappreciated", and "cumbersome organizational functioning" emerged; under the theme "Suggestions", the codes "employing administrative staff", "institutional support for publications and research", "keeping the teaching load low", and "allocation of a semester budget to the individual by YÖK" emerged. The analysis of the qualitative data on this point is presented in Table 14.
Table 14.
Content-Analysis Coding of the Data on Barriers to Increasing Performance and Suggestions for Addressing Them

Theme: Barriers
Definition: Barriers to faculty members increasing their performance
Codes (frequency): Heavy workload (teaching, advising, administrative duties) (18); Efforts going unappreciated (10); Cumbersome organizational functioning (8); Crowded classes (6); Lack of intrinsic motivation (4)

Theme: Suggestions
Definition: Suggestions for removing the barriers to increasing performance
Codes (frequency): Keeping the teaching load low (9); Institutional support for publications and research (8); Criteria being set by the universities (5); Employing administrative staff (4); Allocation of a semester budget to the individual by YÖK (4)
An examination of Table 14 shows that faculty members expressed 44 views under 5 codes regarding barriers and 26 views under 4 codes regarding suggestions. With the chain of evidence in mind, some of the views relating to these codes are given below:
Heavy workload: "Time is needed to produce something of quality. Faculty members have no time. They are either teaching, dealing with a student, or they hold an administrative post and have to deal with its work" (K25)
Lack of intrinsic motivation: "In academia there are more demotivating things than motivating ones. If a person entered academia not willingly but for other reasons, they will not have the desire needed to improve their performance" (K19)
Cumbersome organizational functioning: "In projects and similar work, the very slow official process, bureaucracy, and paperwork are barriers to increasing performance" (K8)
Institutional support for publications and research: "My biggest suggestion for increasing performance is that employees' efforts be supported by the institution. This could be publications, conferences, or training for personal development" (K7)
Employing administrative staff: "If more administrative staff were hired for the departments, faculty members would at least be freed from the paperwork that occupies them and takes up so much of their time." (K21)
Allocation of a semester budget by YÖK: "YÖK should allocate a budget to every faculty member at the start of the semester, ask them to plan how they will use it, and compare budget and output at the end of the semester" (K10)
Discussion and Conclusion
In line with the findings of this study, the views of faculty members in faculties of education on the performance evaluation approach in higher education differ considerably. Faculty members with more than 15 years of work experience and those who are highly satisfied with their institution have lower expectations of performance evaluation than the others. By academic title, research assistants with a doctorate and assistant professors view the performance evaluation approach positively, whereas associate professors and full professors view it only slightly positively. Similarly, Stonebraker and Stone (2015) note that the average age of faculty has risen with the abolition of mandatory retirement and that there are concerns about the negative effects this ageing may have on productivity in the classroom. A negative effect of age on students' evaluations of faculty performance has been observed, and this effect also appears across gender and academic discipline; however, it does not appear until faculty members reach their mid-forties. This finding parallels the study by Esen and Esen (2015), which found that as academic titles rise, positive perceptions of the consequences performance evaluation would have for both faculty members and institutions decrease. Bianchini, Lissoni and Pezzoni (2013), in their study on performance evaluation, reported that students evaluated professors more negatively than assistant professors. When faculty members' qualitative views are considered as a whole, a degree of distrust and hesitation about the performance evaluation approach is evident in the academic community regardless of academic title.
Regarding faculty members' expectations of the performance evaluation approach, they hold high expectations with respect to reaching a consensus on the criteria for an effective faculty member, the positive effect on faculty members' professional development, the increase in faculty workload, and the tension it may cause within the institution. The qualitative findings show a divide among faculty members between those who embrace performance evaluation and those who are skeptical of it. Faculty members in faculties of education stated that performance evaluation increases motivation and the pursuit of quality, but that it may also lead to intra-institutional competition and academic fraud. Traditionally, performance evaluations in faculties focus on research indicators (Bogt & Scapens, 2012); accordingly, when higher education institutions conduct evaluations, they tend to support only those faculty members with the best publications, through state funding, research awards, and high research rankings (Douglas, 2013; Hopwood, 2008). In this study, faculty members saw heavy workload and lack of intrinsic motivation as the most important barriers to performance evaluation, while their suggestions included employing more administrative staff and institutional support for publications and research. These findings differ from those of Tonbul (2008), Esen and Esen (2015), and Başbuğ and Ünsal (2010), who also studied performance evaluation. Tonbul (2008) reported that faculty members generally viewed a performance evaluation approach to be implemented positively, and that their expectations were highest with respect to identifying the barriers to effective performance and enabling faculty members to see their own shortcomings. Esen and Esen (2015) note that faculty members perceive that evaluating performance would make a positive contribution to institutions and to faculty members. Regarding expectations, they emphasized that academics expect the development of a sound institutional culture around performance evaluation, the continuity of institutional renewal, a positive effect on faculty members' professional development, and faculty members seeing their own shortcomings more clearly.
This study found that the most important barriers to performance are the current organizational functioning of higher education institutions and faculty members' workload; in Tonbul's (2008) study they were the inadequacy of organizational resources, the prevailing institutional culture, and uncertainty about evaluation criteria; and in Esen and Esen's (2015) study the most important barriers were, in order, the lack of institutional resources, the current organizational functioning of higher education institutions, and academic promotion criteria. Başbuğ and Ünsal (2010) reported that the majority of academic staff view performance evaluation positively and that the most important hindering factor is being deprived of the physical conditions scientific research requires (laboratory, office, equipment, etc.). Özgüngör and Duru (2014) found that perceptions of instructors become more negative as teaching load, experience, and the instructor's total number of students increase. Their study showed that Faculty of Education students gave their instructors higher scores than the students of all other faculties, while Technical Education and Engineering Faculty students gave their instructors lower scores than the students of all other faculties. Analyses of teaching load revealed that instructors with a teaching load of 45 hours or more were evaluated more negatively than all instructors with a lower load. In the Faculty of Education, instructors with between 60 and 100 students received the worst evaluations. Arnăutu and Panc (2015) state that students and instructors have different expectations: students focus more on communication, expecting professors to build a good relationship and give personal feedback, whereas professors emphasize the quality of the educational process (such as the currency of knowledge).
This study shows that, within performance evaluation, faculty members want research and academic publications to be evaluated first, followed by teaching services and graduate advising. This finding is supported by Braunstein and Benston's (1973) study, which found that research and prestige are highly related to performance evaluation, while effective teaching is only moderately related to it. The quality of faculty members' teaching is evaluated by students; however, Arnăutu and Panc (2015) criticize this practice, emphasizing that such evaluations do not take into account research and publication productivity, management competencies, or academic recognition, and that students therefore lack sufficient knowledge of faculty members' roles within the faculty. Ünver (2012), who studied students' evaluation of faculty performance, reported that most faculty members do not believe students will evaluate teaching objectively, and that they prefer to carry out academic work rather than reflect on students' views of their teaching skills. Turpen, Henderson and Dancy (2012) note that higher education institutions focus on quantitative ratings from students when evaluating teaching quality, whereas faculties take students' test performance and academic achievement as the criterion. In this respect, the quality of the measurement instruments used in evaluating teaching performance becomes highly important. Kalaycı and Çimen (2012) examined the questionnaires used in evaluating academics' teaching performance at higher education institutions and showed that the questionnaires were prepared without any systematic basis, that one fifth of the items did not comply with item-writing rules, and that they were therefore inadequate for measuring faculty performance. Some studies have also shown that students' evaluations of faculty performance may relate not only to teaching quality but also to qualities unrelated to teaching, such as physical attractiveness and how easy the course is (Hornstein, 2017; Tan et al., 2019). Shao, Anderson and Newsome (2007) state that, with respect to evaluating the quality of teaching, academics expect greater weight to be given to classroom visits, lesson preparation, keeping up with current developments in the field, and peer evaluations.
This study found that, according to faculty members, performance evaluation builds consensus on the criteria for an effective faculty member and positively affects faculty members' professional development. These qualities raise the professional quality of faculty members working in faculties of education and provide a sustainable process of professional development. Filipe, Silva, Stulting and Golnik (2014)
emphasize that the sustainable professional development fostered by performance evaluation is not limited to educational activities but also develops qualities such as management, teamwork, professionalism, interpersonal communication, and accountability. Açan and Saydan (2009) sought to identify academic quality expectations of faculty members and found that an instructor's academic quality consists of the dimensions "the instructor's teaching ability, measurement and assessment skills, ability to empathize, professional responsibility, ability to encourage interest in the course, the importance the instructor attaches to the course, and the instructor's courtesy". Esen and Esen (2015) state that in the United States faculty performance is generally evaluated on four dimensions: teaching, research (professional development), service to the community, and service to administration. Among these four, they emphasize that the most important are the teaching and research dimensions. The results of performance evaluations based on these dimensions are reported to be used in extending faculty members' terms of appointment, deciding their suitability for their current position, and in promotion.
This study shows that faculty members who have not received the academic incentive have lower expectations of performance evaluation than the others. Kalaycı (2008) notes that, compared with practices around the world, efforts and studies on performance evaluation in Turkey are not yet even at the stage of preparing the ingredients, let alone at the fermentation stage. Focusing on this problem, the Council of Higher Education established the Higher Education Quality Board in 2015 "to provide assurance that a higher education institution or programme fully carries out quality and performance processes compatible with internal and external quality standards". In parallel, the Academic Incentive Allowance Regulation was put into effect in order to evaluate the performance of academic staff in higher education according to standard and objective criteria, to increase the effectiveness of scientific research and academic work, and to support academics. Among the positive effects of a performance evaluation system identified in this study is the motivation of academic staff; the academic incentive regulation is consistent with faculty members' expectation that consensus will form on the criteria for an effective faculty member, and faculty members who have qualified for the academic incentive hold high expectations of performance evaluation.
In summary, there is no consensus among faculty members in the faculty of education regarding performance evaluation. Faculty members are aware of its positive effects, but they have concerns about the reliability of measurement, the evaluation criteria, the evaluation process, and the evaluators. Within this study, the most important criteria that faculty members believe should be included in the evaluation are, in order, research and publication, the quality of teaching, and undergraduate and graduate advising. The positive effects of a performance evaluation system were stated to include motivating academic staff, providing financial support, and driving the pursuit of quality. On the other hand, according to faculty members, the negative effects of an evaluation system include intra-institutional competition and academic fraud. To resolve the problems associated with performance evaluation, faculty members put forward suggestions such as reducing teaching loads, providing institutional support for academic efforts, allocating a research budget to faculty members through YÖK, and employing more administrative staff. Although there are differing demands about the criteria that should be included in performance evaluation, establishing performance monitoring and an effective evaluation system based on multiple types of data is considered highly important for raising the quality of higher education and making systematic improvements.
Based on the results of this research, it is recommended that higher education institutions increase objectivity and effectiveness in the performance evaluation process and establish human resources services within faculties. It is further recommended that these institutions design sustainable, robust performance plans, use a holistic evaluation cycle, offer faculty members, students, and internal stakeholders advisory services on how performance can be improved, prepare clear and objective guidelines for performance evaluators, and develop an institutional culture that presents feedback as valuable rather than judgemental.
References | |
Açan, B., & Saydan, R. (2009). Öğretim elemanlarının akademik kalite özelliklerinin değerlendirilmesi: | |
Kafkas Üniversitesi İİBF örneği. Atatürk Üniversitesi Sosyal Bilimler Enstitüsü Dergisi, 13 (2), 226-227. | |
Arnăutu, E., & Panc, I. (2015). Evaluation criteria for performance appraisal of faculty members. Procedia - Social and Behavioral Sciences, 203, 386-392.
Başbuğ, G., & Ünsal, P. (2010). Kurulacak bir performans değerlendirme sistemi hakkında akademik | |
personelin görüşleri: Bir kamu üniversitesinde yürütülen anket çalışması. İstanbul Üniversitesi Psikoloji | |
Çalışmaları Dergisi, 29(1), 1-24. | |
Batool, Z., Qureshi, R. H., & Raouf, A. (2010). Performance evaluation standards for the HEIs. Higher | |
Education Commission Islamabad, Pakistan. Retrieved October 12, 2019 from | |
https://au.edu.pk/Pages/QEC/Manual_Doc/Performance_Evaluation_Standards_for_HEIs.pdf | |
Bianchini, S., Lissoni, F., & Pezzoni, M. (2013) Instructor characteristics and students’ evaluation of | |
teaching effectiveness: Evidence from an Italian engineering school. European Journal of Engineering | |
Education, 38 (1),38-57. | |
Bogt, H. J., & R. W. Scapens. (2012). Performance management in universities: Effects of the transition to | |
more quantitative measurement systems. European Accounting Review, 21 (3), 451–97 | |
Braunstein, D. N., & Benston, G. J. (1973). Student and department chairman views of the performance | |
of university professors. Journal of Applied Psychology, 58(2), 244. | |
Creswell, J.W., & Plano Clark, V.L. (2014). Designing and conducting mixed methods research. Thousand | |
Oakes, CA, Sage Publications. | |
Çakıroğlu, J., Aydın, Y., & Uzuntiryaki, E. (2009). Üniversitelerde öğretim performansının değerlendirilmesi. | |
Orta Doğu Teknik Üniversitesi Eğitim Fakültesi Raporu. | |
Çalışkan, G. (2006). Altı sigma ve toplam kalite yönetimi. Elektronik Sosyal Bilimler Dergisi, 5(17), 60-75. | |
Douglas, A. S. (2013). Advice from the professors in a university social sciences department on the | |
teaching-research nexus. Teaching in Higher Education, 18 (4), 377–88. | |
Elton, L. (1999). New ways of learning in higher education: managing the change. Tertiary Education and | |
Management, 5(3), 207-225. | |
Esen, M., & Esen, D. (2015). Öğretim üyelerinin performans değerlendirme sistemine yönelik tutumlarının | |
araştırılması. Yükseköğretim ve Bilim Dergisi, 5(1), 52-67.
Etzkowitz, H., Webster, A., Gebhardt C., & Terra., B.R.C. (2000). The future of the university and the | |
university of the future: evolution of ivory tower to entrepreneurial paradigm. Research Policy, 29(2), | |
313-330. | |
Filipe, H. P., Silva, E. D., Stulting, A. A., & Golnik, K. C. (2014). Continuing professional development: Best | |
practices. Middle East African journal of ophthalmology, 21(2), 134. | |
Glaser, S., Halliday, M. I., & Eliot, G. R. (2003). Üniversite mi? Çeşitlilik mi? Bilgideki önemli ilerlemeler | |
üniversitenin içinde mi, yoksa dışında mı gerçekleşiyor?. N. Babüroğlu (Ed.), Eğitimin Geleceği | |
Üniversitelerin ve Eğitimin Değişen Paradigması (ss. 167-178). İstanbul: Sabancı Üniversitesi Yayını. | |
Hamid, S., Leen, Y. M., Pei, S. H., & Ijab, M. T. (2008). Using e-balanced scorecard in managing the | |
performance and excellence of academicians. PACIS 2008 Proceedings, 256. | |
Higher Education Authority (2013). Towards a performance evaluation framework: Profiling Irish Higher | |
Education. Dublin: HEA | |
Hornstein, H. A. (2017). Student evaluations of teaching are an inadequate assessment tool for evaluating | |
faculty performance. Cogent Education, 4(1), 1304016. | |
Hopwood, A. G. (2008). Changing pressures on the research process: on trying to research in an age when | |
curiosity is not enough. European Accounting Review, 17 (1), 87–96. | |
Kalaycı, N. (2009). Yüksek öğretim kurumlarında akademisyenlerin öğretim performansını değerlendirme | |
sürecinde kullanılan yöntemler. Kuram ve Uygulamada Egitim Yönetimi Dergisi, 15(4), 625-656. | |
Kalaycı N., & Çimen O. (2012). Yükseköğretim kurumlarında akademisyenlerin öğretim performansını | |
değerlendirme sürecinde kullanılan anketlerin incelenmesi. Kuram ve Uygulamada Eğitim Bilimleri, | |
12(2), 1-22 | |
Kim, H. B., Myung, S. J., Yu, H. G., Chang, J. Y., & Shin, C. S. (2016). Influences of faculty evaluating system | |
on educational performance of medical school faculty. Korean Journal Of Medical Education, 28(3), | |
289-294. | |
Latham, G. P., & Pinder, C. C. (2005). Work motivation theory and research at the dawn of the twenty-first | |
century. Annu. Rev. Psychol., 56, 485-516. | |
Miles, M. B., & Huberman, A. M. (1994). Qualitative data analysis: An expanded sourcebook (2nd ed.).
California: SAGE Publications. | |
Mokkink, L. B., Terwee, C. B., Gibbons, E., Stratford, P. W., Alonso, J., Patrick, D. L., & de Vet, H. C. (2010). | |
Inter-rater agreement and reliability of the COSMIN Checklist. BMC Medical Research Methodology, | |
10, 82. | |
O'Connor, M., Patterson, V., Chantler, A., & Backert, J. (2013). Towards a performance evaluation | |
framework: profiling Irish higher education. NCVER's free international Tertiary Education Research. | |
Retrieved September 8, 2019 from http://hea.ie/assets/uploads/2017/06/Towards-a-PerformanceEvaluation-Framework-Profiling-Irish-Higher-Education.pdf. | |
Özgüngör, S., & Duru, E. (2014). Öğretim elemanları ve ders özelliklerinin öğretim elemanlarının | |
performanslarına ilişkin değerlendirmelerle ilişkileri. Hacettepe Üniversitesi Eğitim Fakültesi Dergisi, | |
29 (29-2), 175-188. | |
Paige, R. M. (2005). Internationalization of higher education: Performance assessment and indicators. | |
Nagoya Journal of Higher Education, 5(8), 99-122. | |
Shao, L. P., Anderson, L. P., & Newsome, M. (2007). Evaluating teaching effectiveness: Where we are and | |
where we should be. Assessment & Evaluation in Higher Education, 32(3), 355-371. | |
Stonebraker, R. J., & Stone, G. S. (2015). Too old to teach? The effect of age on college and university | |
professors. Research in Higher Education, 56(8), 793-812. | |
T. C. Resmi Gazete. (2015). Akademik teşvik ödeneği yönetmeliği. Karar Sayısı: 2015/8305. Kabul tarihi: | |
14/12/2015. Yayımlandığı tarih: 18 Aralık 2015. Sayı: 29566. | |
Tan, S., Lau, E., Ting, H., Cheah, J. H., Simonetti, B., & Lip, T. H. (2019). How do students evaluate | |
instructors’ performance? Implication of teaching abilities, physical attractiveness and psychological | |
factors. Social Indicators Research, 1-16. | |
Tezsürücü, D., & Bursalıoğlu, S. A. (2013). Yükseköğretimde değişim: kalite arayışları. Kahramanmaraş | |
Sütçü İmam Üniversitesi Sosyal Bilimler Dergisi, 10 (2), 97-108. | |
Tonbul, Y. (2008). Öğretim üyelerinin performansının değerlendirilmesine ilişkin öğretim üyesi ve öğrenci | |
görüşleri. Kuram ve Uygulamada Eğitim Yönetimi, 56 (56), 633-662. | |
Turpen, C., Henderson, C., & Dancy, M. (2012, January). Faculty perspectives about instructor and
institutional assessments of teaching effectiveness. In AIP conference proceedings, 1413 (1), 371-374. | |
UNESCO (2004), Higher Education in a Globalized Society. UNESCO Education Position Paper, France | |
Ünver, G. (2012). Öğretim elemanlarının öğretimin öğrencilerce değerlendirilmesine önem verme | |
düzeyleri. Hacettepe Üniversitesi Eğitim Fakültesi Dergisi, 43, 472-484. | |
Vidovich, L., & Slee, R. (2001). Bringing universities to account? Exploring some global and local policy
tensions. Journal of Education Policy, 16(5), 431-453. | |
Vincent, T. N. (2010). A constructive model for performance evaluation in higher education institutions. | |
Retrieved from https://ssrn.com/abstract=1877598
Journal of University Teaching & Learning Practice, Volume 18, Issue 8 (Standard Issue 4), Article 14, 2021
Preservice teachers’ perceptions of feedback: The importance of timing, | |
purpose, and delivery | |
Christina L. Wilcoxen | |
University of Nebraska, United States of America
Jennifer Lemke | |
University of Nebraska, United States of America
Recommended Citation | |
Wilcoxen, C. L., & Lemke, J. (2021). Preservice teachers’ perceptions of feedback: The importance of | |
timing, purpose, and delivery. Journal of University Teaching & Learning Practice, 18(8). https://doi.org/10.53761/1.18.8.14
Preservice teachers’ perceptions of feedback: The importance of timing, purpose, | |
and delivery | |
Abstract | |
If the purpose of feedback is to reduce the discrepancy between the established goal and what is | |
recognized, then how can this discrepancy be minimized through support and guidance? Feedback is | |
instrumental to a preservice teacher's development during their teacher preparation program. This
qualitative study examines 31 first year teachers’ previous experiences with feedback during their | |
undergraduate practicums. The two research questions addressed: What can be learned from PSTs’ | |
perceptions of feedback practices utilized in teacher preparation programs? and What modifications or | |
adaptations can be made to current feedback practices and structures in teacher preparation programs to | |
enhance teacher efficacy and classroom readiness? Semi-structured interviews provided a comparison of qualitative data and an opportunity for open-ended questioning. Using descriptive analysis, researchers
discovered that current feedback loops and structures can inhibit pre-service teachers’ ability to make | |
meaning from the information and move their learning and instruction forward. As teacher preparation | |
programs work to establish more dialogic approaches to feedback that provide pre-service teachers with | |
multiple opportunities to reflect individually and collaboratively with university faculty, timing, purpose, | |
and delivery are important components to consider. Although this article is written based on preservice | |
teacher perceptions, the implications pertain to multiple fields, and the authors share a universal framework
for feedback. | |
Practitioner Notes | |
1. The goal of teacher preparation is simple: create teachers who are well equipped with the | |
knowledge and skills to positively impact PK-12 students. Field experiences are | |
embedded throughout teacher preparation programs to provide pre-service teachers | |
(PSTs) with meaningful opportunities to develop their ability and knowledge of effective | |
instructional practices. | |
2. As teacher preparation programs work to establish more dialogic approaches to feedback | |
that provide pre-service teachers with multiple opportunities to reflect individually and | |
collaboratively with university faculty, timing, purpose, and delivery are necessary | |
considerations. | |
3. What is the timing of the delivery? The timing of the delivery of feedback must be | |
considered. Frequency plays a large role in how PSTs view and utilize feedback. | |
4. Do receivers of the feedback understand the purpose? Ties to evaluation and the need for | |
directive solutions impact preservice teachers' understanding of the purpose behind the feedback. One way to support this need is to strengthen PSTs' assessment feedback
literacy. | |
5. Does the delivery clarify the content and support reflection? As university faculty continue | |
to explore how to provide explicit feedback, delivery methods that support reflection and | |
pre-service teacher’s growth are important to consider. With the purpose of feedback | |
being to help reduce the discrepancy between the intended goal and outcome, pre-service | |
teachers must have easy access and retrieval of feedback. | |
Keywords | |
Preservice teaching, feedback literacy, assessment, teacher preparation | |
Preservice Teachers’ Perceptions of Feedback: The Importance of | |
Timing, Purpose, and Delivery | |
The goal of teacher preparation is simple: create teachers who are well equipped with the | |
knowledge and skills to positively impact preschool through high school students. Field | |
experiences are embedded throughout teacher preparation programs to provide pre-service | |
teachers (PSTs) with meaningful opportunities to develop their ability and knowledge of effective | |
instructional practices. Practicum experiences in classrooms give PSTs opportunities to practice | |
specific pedagogies with students and refine their abilities in real time (Cheng, et al., 2012). It is | |
critically important for PSTs to experience the teaching process to develop pedagogical and | |
reflective skills as well as teacher efficacy (Darling-Hammond, 2012; Liakopoulou, 2012; | |
McGlamery & Harrington, 2007). These structured experiences can bridge understanding of how to apply feedback and make connections in the context of a school setting (Flushman et al., 2019).
This practice builds confidence in effectively delivering instruction and managing challenges that | |
occur in the learning environment. | |
If the purpose of feedback is to reduce the discrepancy between the established goal and what is | |
recognized (Hattie and Timperley, 2007), then how can this discrepancy be minimized through | |
support and guidance? Feedback is instrumental to a PSTs development during their teacher | |
preparation program and learning is optimized “when they receive systematic instruction, have | |
multiple practice opportunities and receive feedback that is immediate, positive, corrective and | |
specific (Scheeler et al., 2004, p. 405). It is important to guide PSTs to interpret their experiences | |
in authentic settings (Schwartz et al., 2018) and to support the development of effective teaching | |
practices (Hammerness et al., 2005). Constructive feedback coupled with reflective opportunities | |
allow the PST to distinguish effective classroom practices from those that are not (Hudson, 2014; | |
Pena & Almaguer, 2007). “Good quality external feedback is information that helps students | |
troubleshoot their own performance and self-correct: that is, it helps students take action to reduce | |
the discrepancy between their intentions and the resulting effects” (Nicol & Macfarlane-Dick, | |
2006, p. 208). For feedback to be integrated effectively, it needs to be timely, specific, and | |
accessible to encourage the individual to apply what they learned in future teaching opportunities | |
(Van Rooij et al., 2019). This correlates with self-efficacy.
Feedback can also be a significant source of self-efficacy in pre-service teachers (Mulholland & | |
Wallace, 2001; Mahmood et al., 2021; Schunk & Pajares, 2009). Though feedback can come in a | |
variety of formats, Rots et al. (2007) found that quality feedback and supervision provided by | |
university faculty correlated to higher levels of self-efficacy in pre-service teachers. Efficacy | |
increases when university faculty use prompts to encourage PSTs to focus on what went well and | |
build upon the strengths of the lesson (Nicol & Macfarlane-Dick, 2006). Timing, purpose, and | |
delivery play an important role in how faculty use feedback practices with pre-service teachers. | |
In many current teacher preparation program models, PSTs spend more time working in the field | |
than they do in coursework (National Council for Accreditation of Teacher Education [NCATE], | |
2010). With such an emphasis placed on practicum experiences (American Association of | |
Colleges of Teacher Education [AACTE], 2018; Lester & Lucero, 2017) and the critical role these | |
play in the development of pre-service teachers, one must consider if current feedback practices | |
and structures positively contribute to higher levels of teacher efficacy and classroom readiness. | |
The role of university faculty is to acknowledge and clearly articulate the strengths and | |
weaknesses of the lesson to promote productive behaviors that will positively contribute to student | |
learning (Fletcher, 2000). The existing research, however, leaves a gap around preservice teacher perceptions. Therefore, it is imperative to consider the perceptions of pre-service teachers regarding
their experiences with feedback, how these experiences align with high quality feedback practices, | |
and how they are designed for students who experience them (Smith and Lowe, 2021). | |
This qualitative study examines first year teachers’ previous experiences with feedback during | |
their undergraduate practicums. The study is expected to contribute to a deeper understanding of | |
what feedback practices pre-service teachers determine as beneficial and their interpretation of the | |
context, in addition to what action steps or modifications teacher preparation programs can take to | |
maximize feedback practices within practicum experiences. | |
The Purpose of Feedback | |
Feedback has often functioned as a punisher or reinforcer, a guide or rule, or served as a | |
discriminating or motivating stimulus for individuals (Mangiapanello & Hemmes, 2015). | |
Historically feedback has been a one-way transmission of information (Ajjawi & Boud, 2017), but | |
contemporary views on feedback recognize it as a reciprocal exchange between individuals | |
focused on knowledge building versus the arbitrary delivery of information (Archer 2010). | |
Daniels & Bailey (2014) defined performance feedback as, “information about performance that | |
allows a person to change his/her behavior” (p. 157). Studies show organizations that establish | |
strong feedback environments exhibit better outcomes in terms of employee performance | |
(Steelman et al., 2004). Constructive feedback, in the presence of a well-built feedback hierarchy, builds employees' intrinsic motivation (Cusella, 2017; The Employers Edge, 2018). With that explanation, appropriate and meaningful feedback is essential in ensuring that good practices are
rewarded, ineffective practices corrected and pathways to improvement and success identified | |
(Cleary & Walter, 2010). | |
A key purpose of feedback in teacher preparation programs is to enhance pre-service teachers’ | |
knowledge and skills (AACTE, 2018). Feedback serves as one component within complex | |
structures and interactions to support PSTs’ development (Evans, 2013). Through feedback, PSTs | |
realize their strengths and weaknesses, gain understanding of instructional methods, and develop a | |
repertoire of strategies to enhance their performance and student learning (Nicol & Macfarlane-Dick, 2006). With this knowledge and understanding, PSTs have opportunities to act upon the
received feedback to improve their performance and enhance student learning (Carless et al., | |
2011). Feedback allows PSTs to define effective teaching practices and determine what | |
instructional methods are valued in specific learning environments. | |
Feedback is also meant to stimulate PST’s self-reflection. Feedback allows the pre-service teacher | |
to deconstruct and reconstruct instructional methods and practices with guidance from university | |
faculty. Specific feedback and reflective dialogue contribute to the pre-service teacher’s ability to | |
critically reflect on their performance individually and use this understanding and knowledge to | |
regulate future teaching experiences (Tulgar, 2019). These reflective opportunities to identify | |
strengths and weaknesses create pathways to improvement. | |
Feedback can also serve as a way for university faculty to monitor, evaluate and track pre-service | |
teacher’s progress and performance (Price et al., 2010). Many teacher preparation programs use | |
feedback as a measure in evaluating PST performance during practicums or other field-based | |
components. This feedback, often documented through rubrics or other assessment criteria, is | |
useful in helping establish measurable goals and effective teaching practices across a teacher | |
preparation program. When the feedback or assessment tools reflect the objectives and goals of the | |
program, they can strengthen the connection between theory and practice, thereby increasing PST | |
learning (Ericsson, 2002; Grossman, et al., 2008; Vasquez, 2004). PSTs rely on experienced | |
individuals such as university faculty to articulate, model and provide high quality feedback | |
through practicums (Darling-Hammond & MacDonald, 2000). This guidance increases | |
connections between coursework and the classroom. | |
With research suggesting that pre-service teachers welcome constructive feedback and the | |
opportunity to learn (Chaffin & Manfredo, 2009; Chesley & Jordan, 2012), university faculty must | |
seek collaborative opportunities to provide effective feedback that positively contributes to the | |
development of PSTs. A major role of university faculty is to guide the PST in setting goals for | |
practicum that foster their development and growth as an educator. When university faculty | |
clearly articulate the strengths and weaknesses of the lesson and assist the PST in identifying their | |
next actions, outcomes can be achieved faster. | |
Components of Effective Feedback | |
Effective feedback provides the learner with a clear understanding of how the task is being | |
accomplished or performed and offers support and direction in increasing their efforts to achieve | |
the desired outcome (Hattie and Timperley, 2007). This model reinforces the need for feedback to | |
be timely, content-specific, and delivered to meet the needs of the individual receiving it.
Timing | |
The timing of feedback plays an essential role in shaping PSTs' understanding of effective teaching
practices and effective instructional methods. Feedback can be provided to PSTs in a variety of | |
structures and formats. Deferred feedback refers to notes or qualitative data collected while observing and shared with the teacher upon completion of the lesson (Scheeler et al., 2009). Deferred
feedback is less intrusive because it allows the teacher to deliver the lesson without disruption. | |
Immediate feedback refers to when university faculty stop the lesson or instructional activity being | |
observed to provide corrective feedback and/or modeling when a problem is noted (Scheeler et al., | |
2009). Scheeler et al. (2004) found “targeted teaching behaviors were acquired faster and more | |
efficiently when feedback was immediate” (p. 403). Immediate feedback also reduced the | |
likelihood of teachers continuing ineffective teaching practices. | |
Explicit, Quality Feedback | |
Corrective feedback that identifies errors and ineffective teaching methods with targeted ways to | |
correct them is one of the most influential means of feedback (Chan et al., 2014; Van Houten, | |
1980). Studies found that desired teacher behaviors resulted from feedback that was both positive | |
and corrective, focused on specific teaching behaviors and practices, and provided concise | |
suggestions for change (Scheeler et al., 2004; Woolfolk, 1993). Feedback that is individualized | |
and centered on the needs of the individual yields more effective outcomes for learning (Cimen & Cakmak, 2020; Pinger et al., 2018). When this aligns with the goals and objectives of the specific
lesson, it provides valuable insight as to where the PST is in relation to the goal (Bloomberg & | |
Pitchford, 2017). This type of feedback increases self-efficacy as it allows the PST to see growth | |
over time. | |
Delivery | |
The delivery of observational feedback may vary depending on the development and readiness of | |
the PST. Although the goal is for teachers to engage in self-directed reflection, some teachers may | |
need more support and guidance as they maneuver through the dimensions and complexities of | |
teaching. A variety of differentiated coaching strategies have been researched over the years | |
regarding instructional practice and student learning (Aguilar, 2013; Costa & Garmston, 2002; | |
Knight, 2016; Sweeney, 2010). These include both conversational and written feedback between | |
the PST and university faculty. | |
The New Teacher Center (2017) outlines three differentiated dialogic coaching approaches:
instructive, collaborative, and facilitative. Instructive coaching is directive and guided by the | |
university faculty who analyze performance and lead conversations. Collaborative coaching is less | |
directive and both the PST and university faculty have an equal voice in the conversation. | |
Facilitative coaching allows the teacher to lead the reflective conversation, while university | |
faculty provides feedback with probing questions to facilitate critical thinking and problem | |
solving. These conversations contain minimal feedback from university faculty and topics for | |
discussion are often directed by the teacher. | |
While oral feedback is a powerful tool in constructing relationships between the PST and | |
university faculty, written feedback is just as important as it provides pre-service teachers with | |
formal documentation of clearly articulated strengths and weaknesses. Written comments are far | |
more effective than a grade or evaluation (Black & Wiliam, 1998; Crooks, 1988) and provide both | |
the university faculty and the PST with a record of performance in response to learning needs | |
(Flushman et al., 2019). Conversation and dialogue include the thoughts and beliefs of the PST | |
and provide faculty an opportunity to gauge their depth of understanding. Written support | |
provides documentation and a reference for PSTs. | |
Methodology | |
This study seeks to uncover how university faculty can effectively integrate high-quality feedback
practices into practicum experiences. Specifically, what can be learned from PSTs’ perceptions of | |
feedback practices utilized in teacher preparation programs? What modifications or adaptations | |
can be made to current feedback practices and structures in teacher preparation programs to | |
enhance teacher efficacy and classroom readiness? In the context of this study, not only were | |
PSTs’ experiences with feedback considered, but also how these experiences and perceptions align | |
with high quality feedback practices. | |
Design and Participants | |
Researchers used semi-structured interviews to provide a comparison of qualitative data and an | |
opportunity for open-ended questioning (Yin, 2016). The 30-minute interviews were recorded and
transcribed for analysis in Fall 2020. Participation was voluntary and researchers used purposeful | |
sampling (Yin, 2016) from a pool of participants in their first year of teaching. Researchers | |
selected beginning teachers because, as recent graduates, they are closest to their practicum experiences. Additionally, all participants experienced the same interruptions in teaching
during March 2020. Researchers sought a range of participant perspectives; therefore, the study | |
consisted of 31 beginning teachers who spanned seven school districts and 24 schools within a | |
midwestern metropolitan environment. All teachers held a bachelor’s degree and teaching | |
certification from a 4-year university or college. Representation included two private institutions | |
and three public institutions. All participants except one were female. Grade levels spanned preschool through eighth grade, with five special education perspectives spanning grades preschool through sixth grade. The school districts are in one state and serve approximately one-third of their state's total student population (over 100,000 students). Demographic information is
presented in Table 1. | |
Table 1
Characteristics of Participants

Teaching Endorsement (Teachers, N = 31)
    PreK-K               5
    First - Third        10
    Fourth - Sixth       8
    Middle School        3
    Special Education    5

Teaching Environment: District Representation (N = 7 districts, 24 schools)
    Suburban             51%
    Rural                6%
    Urban                42%
Data Collection & Analysis | |
Questions asked during the interviews addressed previous experiences with feedback during | |
practicums. Application was also addressed in reference to how it influenced teaching behaviors | |
and actions. More than one researcher took part in the collection, analysis, and interpretation of the | |
data. Both researchers were involved in the preparation of the questions and in the data analysis. | |
Using descriptive analysis to interpret the data obtained from the semi-structured interviews, researchers identified themes using the following process to construct theory: 1) review of the transcribed interviews, 2) open coding, 3) identification of categories and/or themes, and 4) data abstraction (Lawrence & Tar, 2013). Since researcher one conducted the interviews, researcher two reviewed all the transcripts to familiarize themselves with the content. Next, open coding
determined themes in participant answers. Patterns in the data showed consistency in ideas | |
(Eisenhardt, 1989; Orlikowski, 1993) and researchers identified overall themes amongst the | |
answers. Once established, researchers coded the remaining transcripts independently. Since coding semi-structured interviews involves determining the intent or meaning behind the questions answered, researchers also addressed intercoder reliability and agreement (Campbell et al., 2013). Both noted the same themes with only 20% discrepancy, or 80% agreement. Researchers then adjudicated the coding disagreements through negotiated agreement to reach concordance. After reconciling the initial disagreements, researchers coded the transcripts using the identified themes. Inter-rater reliability was 97%.
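As a minimal illustration of the agreement figures reported above (a sketch only: the segment codes and theme labels below are hypothetical and are not the study's transcripts), percent agreement between two coders can be computed by comparing their theme codes segment by segment in Python:

# Minimal sketch of intercoder percent agreement; all data below are hypothetical.
def percent_agreement(codes_a, codes_b):
    """Share of coded segments on which two coders assigned the same theme."""
    assert len(codes_a) == len(codes_b)
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return matches / len(codes_a)

# Ten illustrative segments coded with three themes:
# T = timing, E = explicit/quality feedback, D = delivery.
coder_one = ["T", "T", "E", "D", "T", "E", "D", "T", "E", "D"]
coder_two = ["T", "T", "E", "D", "E", "E", "D", "T", "E", "T"]

print(f"Initial agreement: {percent_agreement(coder_one, coder_two):.0%}")  # 80%
# Disagreements would then be resolved through discussion (negotiated agreement)
# before the reconciled set of themes is applied to the remaining transcripts.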
Results | |
Results indicated three themes. All stemmed from participant perspectives of beneficial practices | |
and what they found valuable or wanted more of during their PST experiences. Out of 31
participants, 29 were coded with at least one of the three themes. Participants who mentioned | |
more than one theme were counted as part of each theme mentioned; 11 of the 31 mentioned more | |
than one identified theme. See Table 2. | |
Table 2
Themes Found in the Feedback

Beneficial Practice: Frequency and structure of the feedback
Percent (n = 44): 40%
Example Comment: This respondent reflected on the difference between a few visits and multiple. "Let me come observe you and give you tips here and there" as compared to someone providing feedback multiple times a week.

Beneficial Practice: The need for explicit and quality feedback
Percent (n = 44): 30%
Example Comment: This respondent reflected on how grace and time are not always the most beneficial. My institution "just gave a lot of grace and comfort and even during student teaching … I really enjoy getting told what I can improve on because there's always room for improvement and I like the different ideas."

Beneficial Practice: The need for conversation linked to feedback
Percent (n = 44): 30%
Example Comment: The respondent believed that "conversations more focused on do you think the students understood the concept? How do you feel that it went?" would help PSTs engage in daily reflective practice and goal setting.
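As a small arithmetic aside (an illustrative sketch, not part of the study), the respondent shares cited in the subsections below can be reproduced from the counts given there (17 and 13 of the 31 participants); the theme percentages in Table 2 are computed the same way over the n = 44 coded mentions:

# Sketch of the respondent-share arithmetic; counts come from the Results text below.
respondents_per_theme = {
    "Frequency and structure of the feedback": 17,
    "The need for explicit and quality feedback": 13,
    "The need for conversation linked to feedback": 13,
}
total_participants = 31

for theme, count in respondents_per_theme.items():
    share = count / total_participants
    print(f"{theme}: {count}/{total_participants} = {share:.0%}")
# Prints 55%, 42%, and 42%, matching the figures reported in the Results.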
Timing | |
Frequency was the most cited need at 40% and noted by 55% (n = 17) of respondents. | |
Overwhelmingly, participants referred to the feedback received as pre-service teachers as | |
“minimal”. Other phrases included “too spaced out”, “lumped together at the end” and “few”. | |
Multiple participants mentioned having been provided feedback following an observation only once or twice. Even when the feedback provided the next steps towards improvement,
participants still felt it was too late. “It’s like … now I can’t implement that until next semester” or | |
“Here’s the feedback. Remember when you get a job.” Participants felt the timing of the feedback | |
negatively affected the implementation. They wanted more consistency with small tips in real time | |
throughout the experience. | |
Explicit, Quality Feedback | |
A need for explicit and quality feedback was cited next at 30% and noted by 42% (n = 13) of | |
respondents. “I always like it straight forward. I want all of the feedback that I can get because I | |
feel like that's going to help me grow”. Another noted that they wanted specific feedback on areas | |
to improve instead of "a lot of grace and comfort." They additionally noted that building confidence without the skills to back it does not lead to improvement. Another commented that university
faculty was “really really nice but the feedback was all positive like she was kind of scared to give | |
constructive feedback.” One commented how she thought the feedback would provide her things | |
to work on, but instead the feedback was “you’re doing what you’re supposed to be doing.” | |
Participants wanted feedback to provide more direction and insight to enhance instructional | |
performance. Feedback only highlighting the positive aspects or acknowledging “no room for | |
growth" was not useful or beneficial. One respondent noted, "I hardly ever sat down to discuss how I was doing. It was more in passing that the feedback took place." This led to the third theme.
Delivery | |
A need for conversation linked to feedback was cited next at 30% and noted by 42% (n = 13) of | |
responders. Tied to this conversation was the need for explicit feedback mentioned above. | |
Participants struggled with the broad categories on rubrics which highlight multiple behaviors. “I | |
feel like not all rubric feedback is accurate”. This led some to request more specific targets. They | |
felt this could be reached through reflective conversations. One noted the importance of the | |
conversation when helping PSTs reflect on practice and setting goals. The respondent believed | |
that “conversations more focused on do you think the students understood the concept? How do | |
you feel that it went?” would help PSTs engage in daily reflective practice and goal setting. Others | |
noted how conversations allowed for “collaboration and brainstorming” and how conversations | |
better support the reflection process. Dialogue can be beneficial in the moment and authentic, | |
although it was noted that written conversation and feedback can be just as powerful when open-ended and used as a communication tool.
Participants noted the importance of written feedback as it provided opportunities to reflect and | |
respond. Also, it gave participants insight and context as to what was happening while they were | |
teaching. “I don't realize everything good that I'm doing or what I need to improve on. So, when | |
university faculty take notes, it really helps me see what I'm actually doing.” Another talked about | |
university faculty keeping a notebook. The two used it as a communication tool for written | |
conversations which the participant “thought was really helpful because … I can look back and | |
see what she wrote, and I feel like it was a little more immediate.” | |
The results indicated that PSTs believe that timely and explicit feedback is beneficial in both
goal setting and enhancing their instructional performance. Results also indicated that PSTs find | |
both dialogue and written feedback to be useful reflective tools. As teacher preparation programs | |
consider feedback structures and the levels of support, these are important implications to consider | |
when creating meaningful practicum experiences. | |
Discussion | |
Reflection is an expectation in teacher preparation (Brookfield, 1995; Darling-Hammond, 2006; | |
Liu, 2013). The link between reflection and learning is not new (Dewey, 1933; Schön, 1983; Zeichner, 1996), as studies highlight that reflection involves emotions and is a context-dependent
process impacted by social constructs. PSTs are expected to recognize when adjustments are | |
needed and make them to effectively meet the needs of the students they serve. A cycle of | |
observation, action, and reflection can help PSTs adjust their teaching. This is most effective when | |
the cycle is individualized, collaborative, and embeds frequent opportunities to make meaning of | |
the information for future use (Vartuli, et al., 2014). Current feedback loops and structures can | |
inhibit PSTs' ability to make meaning from the information and move their learning and | |
instruction forward. As teacher preparation programs work to establish more dialogic approaches | |
to feedback that provide PSTs with multiple opportunities to reflect individually and | |
collaboratively with university faculty, timing, purpose, and delivery are necessary components to | |
consider. See Figure 1. | |
Figure 1 | |
Feedback Structure for Pre-service Teachers | |
What is the timing of the delivery? | |
When considering the results, frequency plays a large role in how PSTs view and utilize feedback. | |
It was clear that PSTs desire more frequent, immediate feedback to enhance their instructional | |
performance. Immediate feedback results in quicker acquisition of effective teacher behaviors and | |
greater overall accuracy in the implementation of those behaviors than when delayed feedback is | |
provided (Coulter & Grossen, 1997; O'Reilly et al., 1992, 1994). Though some question whether immediate feedback might interfere with the learning environment and reduce instructional
momentum, advancements in technology make the ability to provide immediate feedback both | |
manageable and efficient for both university faculty and pre-service teachers. Devices such as the | |
“bug in the ear” (BIE) have been used to provide immediate feedback in a variety of situations. | |
Results from various studies show these technologies effectively supported university faculty in | |
providing concise, immediate feedback to pre-service teachers to increase their ability to respond | |
to the various needs of students and alter or stop ineffective practices in the moment (Coulter & | |
Grossen, 1997; Scheeler et al., 2009). As teacher preparation programs consider how to increase | |
efforts for university faculty to provide specific, immediate feedback, technical devices have great | |
potential to increase desired teaching behaviors and students’ academic performance. | |
Do receivers of the feedback understand the purpose? | |
Pre-service teachers request explicit, quality feedback, but there is a clear disconnect between this | |
concept and the PSTs' perceptions of the purpose of the feedback provided. The ties to evaluation
and the need for directive solutions will not change, so how can mindsets shift to better understand | |
the purpose? One way to do this is through strengthening PSTs’ assessment feedback literacy. | |
PSTs need opportunities and a repertoire of skills to engage with feedback in authentic ways, | |
make sense of the information provided, and determine how the information can be productively | |
implemented in future lessons (Carless & Boud, 2018; Price et al., 2010; Smith and Lowe, 2021). | |
Feedback literacy can strengthen reflective capacity as students have more opportunities to | |
engage, interact with, and make judgments about their own practice (Carless & Boud, 2018; | |
Sambell, 2011; Smith and Lowe, 2021). To close the feedback loop, PSTs must acquire the ability | |
to process the comments and information received and then act upon the feedback for future | |
instruction. Students must learn to appreciate feedback and their role in the process, develop and | |
refine their ability to make judgements, and develop habits that strive for continuous improvement | |
(Boud & Molloy, 2013). Designing a program curriculum that emphasizes the importance of the | |
feedback process and creates opportunities for pre-service teachers to self-evaluate their practice is | |
crucial in building capacity for them to make sound judgments. Equally as important is creating | |
space for pre-service teachers to co-construct meaning of the feedback and demonstrate how they | |
use the information to inform or enhance future instruction (Carless & Boud, 2018; O’Donovan et | |
al., 2016). Building programs grounded in feedback literacy provides opportunities to critically
reflect on choices and draw clear connections between feedback and its purpose. | |
Does the delivery clarify the content to support reflection? | |
Another consideration worth noting is the need for feedback that prompts both reflection and | |
growth of pre-service teachers. Participants in this study indicated that feedback from university | |
faculty was not always useful because it could not be applied immediately. They also noted the | |
feedback provided did not always prompt reflection that resulted in changes or modifications to | |
their future instructional practices or teaching methods. While this discrepancy could be attributed | |
to the readiness level of the pre-service teacher, it could also be that the feedback loops and | |
structures designed do not create informative pathways that move students' learning forward.
As university faculty continue to explore how to provide explicit feedback, delivery methods that | |
support reflection and pre-service teachers' growth are important to consider. With the purpose of feedback being to help reduce the discrepancy between the intended goal and outcome, pre-service teachers must have easy access to and retrieval of feedback. While we know that reflective coaching
conversations are beneficial in helping pre-service teachers reflect on their teaching practices and | |
to determine alternate methods of instruction that may be more effective, time and availability of | |
university faculty may limit these meaningful interactions from taking place. To overcome this | |
barrier, teacher preparation programs should consider how they might couple traditional forms of | |
written feedback and reflective conversations with digital tools that facilitate collaborative | |
discussion and grant easier access to feedback, allowing pre-service teachers space and opportunity to engage in both collaborative and independent reflection and problem solving. Providing pre-service teachers with multiple sources of feedback can be a way to increase the visibility of
feedback for pre-service teachers and encourage them to consistently revisit the information to | |
make future instructional decisions and professional judgments. | |
Implications | |
Current literature highlights the gap between providing feedback and the receiver’s interpretation | |
(O’Connor & McCurtin, 2021). This gap creates growth limitations when the learner is not | |
gaining what is needed from the feedback. This is especially important in higher education, as institutions develop students for professional careers, such as education, which require lifelong learning, critical thinking and problem solving. Therefore, we propose the following framework
and action steps to support the understanding of and implementation of feedback for PSTs. We | |
also assert that this framework could span multiple disciplines and professional contexts. | |
Figure 2 | |
Framework to Support Pre-service Teacher Capacity Building for Feedback | |
Limitations and Implications for Future Research | |
Although the results of this study provide insight into PSTs' feedback experiences, they must be interpreted within the limitations of the study. The first limitation is that the participants in this study represent only five universities across three states. We recognize that this limitation in our sample
does not represent the scope of teacher preparation programs across the country but believe that | |
the results provide worthwhile insights into PSTs' experiences with feedback in practicum
experiences. Future studies including participants across numerous states and teacher preparation | |
programs would allow for more diverse experiences and perspectives to be represented. | |
Another limitation in this study is that all participants experienced disruptions in their | |
undergraduate practicum experiences. These disruptions likely resulted in condensed or altered | |
experiences which could have impacted the opportunities and quality of feedback provided by | |
university faculty. Future studies that include participants whose experiences consist of traditional | |
structures and timelines of practicum experiences may better reflect PSTs' experiences with feedback and the practices used by university faculty.
Conclusion | |
Teacher preparation institutions need to reevaluate current feedback practices with PSTs. | |
Participants indicated that more frequent conversations would make guidance more explicit and | |
support development of practice and reflection. Although this is based on a limited number of participants in one country, the findings are generalizable to most countries. The concept of feedback literacy needs to be taught, modeled, and practiced by PSTs throughout their course of study for them to better understand the connection between feedback and practice. By
focusing on timing, delivery, and purpose, teacher preparation institutions can take one step closer | |
to developing reflective practitioners who embody the knowledge and skills to positively impact | |
learning for every student. | |
References | |
Aguilar, E. (2013). The art of coaching: Effective strategies for school transformation. Wiley. | |
Ajjawi, R., & Boud, D. J. (2017). Researching feedback dialogue: an interactional analysis | |
approach. Assessment and Evaluation in Higher Education, 42(2), 252–265. | |
https://doi.org/10.1080/02602938.2015.1102863 | |
American Association of Colleges of Teacher Education [AACTE] Clinical Practice Commission | |
(2018). A pivot toward clinical practice, its lexicon, and the renewal of teacher | |
preparation. Retrieved from https://aacte.org/resources/clinical-practice-commission#related-resources
Archer, J. C. (2010). State of the science in health professional education: Effective feedback. | |
Medical Education, 44(1), 101–108. https://doi.org/10.1111/j.1365-2923.2009.03546.x. | |
Black, P., & Wiliam, D. (1998). Assessment and classroom learning. Assessment in Education: | |
Principles, Policy & Practice, 5(1), 7-74. https://doi.org/10.1080/0969595980050102 | |
Bloomberg, P., & Pitchford, B. (2017). Leading impact teams: Building a culture of efficacy. | |
Corwin. | |
Boud, D., & Molloy, E. (Eds.). (2013). Feedback in higher and professional education: | |
understanding it and doing it well. Routledge. | |
Brookfield, S. D. (1995). Becoming a critical reflective teacher. Jossey-Bass Publishers. | |
Campbell, J. L, Quincy, C., Osserman, J., & Pedersen, O. K. (2013). Coding in-depth semi | |
structured interviews: Problems of unitization and intercoder reliability and agreement. | |
Sociological Methods & Research, 42(3), 294-320. | |
https://doi.org/10.1177/0049124113500475
Carless, D., Salter, D., Yang, M., & Lam, J. (2011). Developing sustainable feedback practices.
Studies in Higher Education, 36(4), 395-407. | |
https://doi.org/10.1080/03075071003642449 | |
Carless, D., & Boud, D. (2018). The development of student feedback literacy: enabling uptake of | |
feedback. Assessment & Evaluation in Higher Education, 43(8), 1315-1325. | |
https://doi.org/10.1080/02602938.2018.1463354 | |
Chaffin C., & Manfredo J. (2009). Perceptions of preservice teachers regarding feedback and | |
guided reflection in an instrumental early field experience. Journal of Music Teacher | |
Education, 19(2), 57-72. https://doi.org/10.1177/1057083709354161 | |
Chan, P. E., Konrad, M., Gonzalez, V., Peters, M. T., & Ressa, V. A. (2014). The critical role of | |
feedback in formative instructional practices. Intervention in School and Clinic, 50(2), | |
96-104. https://doi.org/10.1177/1053451214536044 | |
Chesley, G. M., & Jordan, J. (2012). What’s missing from teacher prep. Educational Leadership, | |
69(8), 41-45. | |
Cheng, M. M., Tang, S. Y., & Cheng, A. Y. (2012). Practicalising theoretical knowledge in | |
student teachers' professional learning in initial teacher education. Teaching and Teacher | |
Education, 28(6), 781-790. https://doi.org/10.1016/j.tate.2012.02.008 | |
Cimen, O., & Cakmak, M. (2020). The effect of feedback on preservice teachers’ motivation and | |
reflective thinking. Elementary Education Online, 19(2), 932-943. https://doi.org/10.17051/ilkonline.2020.695828
Cleary, M. L., & Walter, G. (2010). Giving feedback to learners in clinical and academic settings: | |
Practical considerations. The Journal of Continuing Education in Nursing, 41(4), 153-154. https://doi.org/10.3928/00220124-20100326-10
Costa, A. L., & Garmston, R. (2002). Cognitive coaching: A foundation for renaissance schools. | |
Christopher-Gordon Publishers. | |
Coulter, G. A., & Grossen, B. (1997). The effectiveness of in-class instructive feedback versus | |
after-class instructive feedback for teachers learning direct instruction teaching behaviors. | |
Effective School Practices, 16(4), 21–35. | |
Crooks, T. J. (1988). The impact of classroom evaluation practices on students. Review of | |
Educational Research, 58(4), 438-481. https://doi.org/10.3102/00346543058004438 | |
Cusella, L., (2017). The effects of feedback on intrinsic motivation: A propositional | |
extension of cognitive evaluation theory from an organizational communication | |
perspective. Annals of the International Communication Association, 4(1), 367-387. | |
https://doi.org/10.1080/23808985.1980.11923812 | |
Daniels, A. C., & Bailey, J. S. (2014). Performance management: Changing behavior that drives | |
organizational effectiveness (5th ed.). Atlanta, GA: Performance Management | |
Publications. | |
Darling-Hammond, L. (2012). Powerful teacher education: Lessons from exemplary programs. | |
John Wiley & Sons. | |
Darling-Hammond, L. (2006). Powerful teacher education. San Francisco: Jossey-Bass. | |
Darling-Hammond, L., & MacDonald, M. (2000). Where there is learning there is hope: The | |
preparation of teachers at the Bank Street College of Education. In L. Darling-Hammond | |
(Ed.), Studies of excellence in teacher education: Preparation at the graduate level (1-95). | |
American Association of Colleges for Teacher Education. | |
Dewey, J. (1933). How we think: A restatement of the relation of reflective thinking to | |
the educative process. Henry Regnery. | |
Eisenhardt, K. M. (1989). Building theories from case study research. Academy of Management
Review, 14(4), 532-550. www.jstor.org/stable/258557 | |
Ericsson, K. A. (2002). Attaining excellence through deliberate practice: Insights from the study | |
of expert performance. In M. Ferrari (Ed.), The pursuit of excellence in education (pp. | |
21-55). Erlbaum. | |
Evans, C. (2013). Making sense of assessment feedback in higher education. Review of | |
Educational Research, 83(1), 70-120. https://doi.org/10.3102/0034654312474350 | |
Fletcher, S. (2000). Mentoring in schools: A handbook of good practice. Kogan Page. | |
Flushman, T., Guise, M., & Hegg, S. (2019). Improving supervisor written feedback: Exploring | |
the what and why of feedback provided to pre-service teachers. Issues in Teacher | |
Education, 28(2), 46–66. | |
Grossman, P., Hammerness, K., & McDonald, M. (2008). Redefining teaching, re-imagining | |
teacher education. Teachers and Teaching: Theory and Practice, 15(2), 273-289. | |
https://doi.org/10.1080/13540600902875340 | |
Hammerness, K., Darling-Hammond, L., Bransford, J., Berliner, D., Cochran-Smith, M., | |
McDonald, M., & Zeichner, K. (2005). How teachers learn and develop. In L. Darling-Hammond & J. Bransford (Eds.), Preparing teachers for a changing world: What teachers should learn and be able to do (pp. 358-389). Jossey-Bass.
Hattie, J., & Timperley, H. (2007). The power of feedback. Review of Educational Research, 77 | |
(1), 81-112. https://doi.org/10.3102/003465430298487 | |
Hudson, P. (2014). Feedback consistencies and inconsistencies: Eight mentors’ observations on | |
one preservice teacher’s lesson. European Journal of Teacher Education, 37(1), 63–73. | |
https://doi.org/10.1080/02619768.2013.801075 | |
Killion, J. (2015). Attributes of an effective feedback process. In: The feedback process: | |
Transforming feedback for professional learning. Oxford, Ohio: Learning Forward. | |
Knight, J. (2016). Better conversations: Coaching ourselves and each other to be more credible, | |
caring, and connected. Corwin. | |
Lawrence, J., & Tar, U. (2013). The use of grounded theory technique as a practical tool for | |
qualitative data collection and analysis. The Electronic Journal of Business Research | |
Methods, 11(1), 29-40. | |
Liakopoulou, M. (2012). The role of field experience in the preparation of reflective teachers. | |
Australian Journal of Teacher Education, 37(6), 42-54. | |
https://doi.org/10.14221/ajte.2012v37n6.4 | |
Liu, K. (2013). Critical reflection as a framework for transformative learning in teacher | |
education. Educational Review, 67(2), 135-157. | |
https://doi.org/10.1080/00131911.2013.839546 | |
Lester, A., & Lucero, R. (2017). Clinical practice commission shares proclamations, tenets at | |
AACTE forum. Ed Prep Matters. http://edprepmatters.net/2017/04/clinical-practice-commission-shares-proclamations-tenets-at-aacte-forum/
Mahmood, S., Mohamed, O., Mustafa, S. M. B. S., & Noor, Z. M. (2021). The influence of | |
demographic factors on teacher-written feedback self-efficacy in Malaysian secondary | |
school teachers. Journal of Language and Linguistic Studies, 17(4). | |
Mangiapanello, K., & Hemmes, N. (2015). An analysis of feedback from a behavior analytic | |
perspective. The Behavior Analyst, 38(1), 51–75. doi:10.1007/s40614-014-0026-x. | |
McGlamery, S., & Harrington, J. (2007). Developing reflective practitioners: The importance of | |
field experience. The Delta Kappa Gamma Bulletin, 73(3), 33-45. | |
Mulholland, J., & Wallace, J. (2001). Teacher induction and elementary science teaching: | |
Enhancing self-efficacy. Teaching and Teacher Education, 17(2), 243–261. | |
https://doi.org/10.1016/s0742-051x(00)00054-8 | |
National Council for Accreditation of Teacher Education. (2010). Transforming | |
teacher education through clinical practice: A national strategy to prepare effective | |
teachers. Retrieved from | |
http://www.ncate.org/LinkClick.aspx?fileticket=zzeiB1OoqPk%3d&tabid=715 | |
New Teacher Center (2017). Instructional Mentoring. Retrieved from | |
https://newteachercenter.org/. | |
Nicol, D. J., & Macfarlane-Dick, D. (2006). Formative assessment and self-regulated learning: A
model and seven principles of good feedback practice. Studies in Higher Education, | |
31(2), 199-218. https://doi.org/10.1080/03075070600572090 | |
O'Connor, A., & McCurtin, A. (2021). A feedback journey: Employing a constructivist approach to the development of feedback literacy among health professional learners. BMC Medical Education, 21, 486. https://doi.org/10.1186/s12909-021-02914-2
O’Donovan, B., Rust, C., & Price, M. (2016). A scholarly approach to solving the feedback | |
dilemma in practice. Assessment & Evaluation in Higher Education, 41(6), 938-949. | |
https://doi.org/10.1080/02602938.2015.1052774 | |
O'Reilly, M. F., Renzaglia, A., & Lee, S. (1994). An analysis of acquisition, generalization and | |
maintenance of systematic instruction competencies by preservice teachers using | |
behavioral supervision techniques. Education and Training in Mental Retardation and Developmental Disabilities, 29(1), 22-33. https://www.jstor.org/stable/23879183
O'Reilly, M. F., Renzaglia, A., Hutchins, M., Koterba-Buss, L., Clayton, M., Halle, J. W., & Izen, | |
C. (1992). Teaching systematic instruction competencies to special education student | |
teachers: An applied behavioral supervision model. Journal of the Association for | |
Persons with Severe Handicaps, 17(2), 104-111. | |
https://doi.org/10.1177/154079699201700205 | |
Orlikowski, W. J. (1993). CASE tools as organizational change: Investigating incremental and | |
radical changes in systems development. MIS Quarterly, 17(3), 309-340. | |
https://doi.org/10.2307/249774 | |
Rots, I., Aelterman, A., Vlerick, P., & Vermeulen, K. (2007). Teacher education, graduates’ | |
teaching commitment and entrance into the teaching profession. Teaching and Teacher | |
Education, 23(5), 543–556. https://doi.org/10.1016/j.tate.2007.01.012 | |
Pena, C., & Almaguer, I. (2007). Asking the right questions: online mentoring of student teachers. | |
International Journal of Instructions Media, 34(1), 105-113. | |
Pinger, P., Rakoczy, K., Besser, M., & Klieme, E. (2018). Implementation of formative assessment
– effects of quality of programme delivery on students’ mathematics achievement and | |
interest. Assessment in Education: Principles, Policy & Practice, 25(2), 160-182. | |
https://doi.org/10.1080/0969594x.2016.1170665 | |
Price, M., Handley, K., Millar, J. & O’Donovan, B. (2010). Feedback: all that effort, but what is | |
the effect? Assessment & Evaluation in Higher Education, 35(3), 277-289. | |
https://doi.org/10.1080/02602930903541007 | |
Sambell, K. (2011). Rethinking feedback in higher education. ESCalate. | |
Scheeler, M. C., Ruhl, K. L., & McAfee, J. K. (2004). Providing performance feedback to | |
teachers: A review. Teacher Education and Special Education: The Journal of the | |
Teacher Education Division of the Council for Exceptional Children, 27(4), 396-407. | |
Scheeler, M. C., Bruno, K., Grubb, E., & Seavey, T. L. (2009). Generalizing teaching techniques | |
from university to K-12 classrooms: Teaching preservice teachers to use what they learn. | |
Journal of Behavioral Education, 18(3), 189-210. https://doi.org/10.1007/s10864-009-9088-3
Schön, D. A. (1983). The reflective practitioner. Basic Books. | |
Schwartz, C., Walkowiak, T. A., Poling, L., Richardson, K., & Polly, D. (2018). The nature of | |
feedback given to elementary student teachers from university supervisors after | |
observations of mathematics lessons. Mathematics Teacher Education & Development, | |
20(1), 62–85. | |
Schunk, D., & Pajares, F. (2009). Self-efficacy theory. In Handbook of Motivation at School (pp.
35–54). New York: Routledge. | |
Smith, M., & Lowe, C. (2021). DIY assessment feedback: Building engagement, trust and | |
transparency in the feedback process. Journal of University Teaching and Learning | |
Practice, 18(3), 9-14. https://doi.org/10.53761/1.18.3.9 | |
Steelman, L., Levy, P., & Snell, A. (2004). The feedback environment scale: Construct definition, measurement and validation. Educational and Psychological Measurement, 64(1), 165-184.
Sweeney, D. R. (2010). Student-centered coaching: A guide for K-8 coaches and principals. | |
SAGE Publications. | |
The Employers Edge. (2018). Feedback to boost motivation. Retrieved from http://www.theemployersedge.com/providing-feedback/
Tulgar, A. (2019). Four Shades of Feedback: The Effects of Feedback in Practice Teaching on | |
Self-Reflection and Self-Regulation. Alberta Journal of Educational Research, 65(3),
258-277. | |
Van Houten, R. (1980). Learning through feedback. Human Sciences Press. | |
Van Rooij, E. C. M., Fokkens-Bruinsma, M., & Goedhart, M. (2019). Preparing science
undergraduates for a teaching career: Sources of their teacher self-efficacy. The Teacher | |
Educator, 54(3), 270-294. https://doi.org/10.1080/08878730.2019.1606374 | |
Vartuli, S., Bolz, C., & Wilson, C. (2014). A learning combination: Coaching with CLASS and | |
the project approach. Early Childhood Research & Practice, 16(1), 1. | |
Vasquez, C. (2004). “Very carefully managed”: Advice and suggestions in post observation | |
meetings. Linguistics and Education, 15(1-2), 33-58. | |
https://doi.org/10.1016/j.linged.2004.10.004 | |
Woolfolk, A. (1993). Educational psychology. Allyn & Bacon. | |
Yang, M., & Carless, D. (2013). The feedback triangle and the enhancement of dialogic feedback | |
processes. Teaching in Higher Education, 18(3), 285–297. | |
Yin, R. K. (2016). Qualitative research from start to finish (2nd ed.). The Guilford Press.
Zeichner, K. (1996). Teachers as reflective practitioners and the democratization of school reform. | |
In K. Zeichner, S. Melnick, & M. L. Gomez (Eds.), Currents of reform in preservice | |
teacher education (pp. 199-214). Teachers College Press. | |
University faculty's perceptions and practices of student-centered learning in Qatar: Alignment or gap?
Saed Sabah
Department of Educational Sciences, College of Education, Qatar University, Doha, Qatar, and Hashemite University, Zarqa, Jordan
Xiangyun Du
Department of Educational Sciences, College of Education, Qatar University, Doha, Qatar, and UNESCO Center for PBL, Aalborg University, Aalborg, Denmark
Received 10 November 2017; Revised 27 December 2017; Accepted 11 February 2018
Abstract | |
Purpose – Although student-centered learning (SCL) has been encouraged for decades in higher education, to | |
what extent instructors are practicing SCL strategies remains in question. The purpose of this paper is to investigate
a university faculty’s understanding and perceptions of SCL, along with current instructional practices in Qatar. | |
Design/methodology/approach – A mixed-method research design was employed including quantitative | |
data from a survey of faculty reporting their current instructional practices and qualitative data on how these | |
instructors define SCL and perceive their current practices via interviews with 12 instructors. Participants of | |
the study are mainly from the science, technology, engineering and mathematics (STEM) fields.
Findings – Study results show that these instructors have rather inclusive definitions of SCL, which range from | |
lectures to student interactions via problem-based teamwork. However, a gap between the instructors’ perceptions | |
and their actual practices was identified. Although student activities are generally perceived as effective teaching | |
strategies, the interactions observed were mainly in the form of student–content or student-teacher, while | |
student–student interactions were limited. Prevailing assessment methods are summative, while formative | |
assessment is rarely practiced. Faculty attributed this lack of alignment between how SCL could and should | |
be practiced and the reality to external factors, including students’ lack of maturity and motivation due to the | |
Middle Eastern culture, and institutional constraints such as class time and size. | |
Research limitations/implications – The study is limited in a few ways. First, regarding methodological justification, the data collection methods chosen in this study were mainly focused on the faculty's self-reporting. Second,
the limited number of participants restricts this study’s generalizability because the survey was administered | |
in a volunteer-based manner and the limited number of interview participants makes it difficult to establish | |
clear patterns. Third, researching faculty members raises concerns in the given context wherein extensive | |
faculty assessments are regularly conducted. | |
Practical implications – A list of recommendations is provided here as inspiration for institutional support | |
and faculty development activities. First, faculty need deep understanding of SCL through experiences as learners | |
so that they can become true believers and implementers. Second, autonomy is needed for faculty to adopt | |
appropriate assessment methods that are aligned with their pedagogical objectives and delivery methods. Input | |
on how faculty can adapt instructional innovation to tailor it to the local context is very important for its long-term effectiveness (Hora and Ferrare, 2014). Third, an inclusive approach to faculty evaluation by encouraging
faculty from STEM backgrounds to be engaged in research on their instructional practice will not only sustain | |
the practice of innovative pedagogy but will also enrich the research profiles of STEM faculty and their institutes. | |
Journal of Applied Research in Higher Education, Vol. 10 No. 4, 2018, pp. 514-533. Emerald Publishing Limited, ISSN 2050-7003. DOI 10.1108/JARHE-11-2017-0144
© Saed Sabah and Xiangyun Du. Published by Emerald Publishing Limited. This article is published | |
under the Creative Commons Attribution (CC BY 4.0) licence. Anyone may reproduce, distribute, | |
translate and create derivative works of this article (for both commercial and non-commercial
purposes), subject to full attribution to the original publication and authors. The full terms of this | |
licence may be seen at http://creativecommons.org/licences/by/4.0/legalcode | |
The authors would like to thank the participants in the study: the authors’ colleagues, who | |
supported this study. | |
Social implications – The faculty’s understanding and perceptions of implementing student-centered | |
approaches were closely linked to their prior experiences—experiencing SCL as a learner may better shape | |
the understanding and guide the practice of SCL as an instructor. | |
Originality/value – SCL is not a new topic; however, the reality of its practice is constrained to certain social | |
and cultural contexts. This study contributes with original and valuable insights into the gap between | |
ideology and reality in implementation of SCL in a Middle Eastern context. | |
Keywords Qatar, Assessment, Student-centered learning, Instructional practices, STEM faculty | |
Paper type Research paper | |
1. Introduction | |
In general, higher education (HE) faces challenges in providing students with reasoning and | |
critical thinking skills, problem formulation and solving skills, collaborative skills and the | |
competencies required to cope with the complexity and uncertainty of modern professions | |
(Henderson et al., 2010; Seymour and Hewitt, 1997; Martin et al., 2007; Smith et al., 2009). HE | |
research often reports that traditional lecture-centered education does not provide | |
satisfactory solutions to these challenges (Du et al., 2013; Smith et al., 2009), thereby failing | |
to facilitate students’ meaningful learning of their subjects (Henderson et al., 2010). In some | |
cases, it has resulted in a deficit of university graduates from certain fields, in particular, | |
science, technology, engineering and mathematics (STEM) fields (Graham et al., 2013; | |
Seymour and Hewitt, 1997; Watkins and Mazur, 2013). A change in instructional practices is | |
believed to be necessary to provide students with the requisite skills and competencies, and | |
could potentially serve as a retention strategy in these particular fields such as STEM | |
(Graham et al., 2013; Seymour and Hewitt, 1997; Watkins and Mazur, 2013). Therefore, it is | |
essential to innovate the pedagogical methods and practices used in these fields (American | |
Association for the Advancement of Science (AAAS), 2013; Henderson et al., 2010). | |
Instructional change has resulted in a variety of pedagogical reform initiatives that have | |
been encouraged in STEM classroom practices, including active learning, inquiry-based | |
learning, collaborative learning in teams, interactive learning, technology-enhanced learning, | |
and peer instruction. A substantial body of literature has reported research results regarding | |
how these innovative instructional strategies affect student learning (Graham et al., 2013; | |
Henderson et al., 2010; Watkins and Mazur, 2013). Despite a worldwide trend in instructional | |
change toward student-centered learning (SCL), to what extent university instructors are | |
implementing or practicing these strategies and how they perceive this change is still in | |
question. The international literature has reported that lecture remains the prevailing | |
instructional practice in STEM classrooms despite the waves of pedagogical innovation | |
encouraged at an institutional level (Hora and Ferrare, 2014; Froyd et al., 2013; Prince and | |
Felder, 2006; Walczyk and Ramsey, 2003). In addition, STEM faculty may discontinue their practice of certain types of instructional innovation at certain stages of innovation diffusion for various reasons, including institutional challenges such as a heavy workload and large class sizes, and a lack of individual interest (Henderson and Dancy, 2009). Furthermore, the
fidelity of the implementation of SCL approaches is also in question (Borrego et al., 2013). | |
Therefore, this study aims to investigate how faculty who work as instructors in STEM | |
undergraduate programs report their instructional practices and how they perceive the | |
implementation of SCL instructional strategies in their situated contexts. | |
2. Literature review | |
Over the past few decades, a global movement has emerged calling for a new model of learning for the twenty-first century, highlighting several key elements including solving complex problems, communication, collaboration, critical thinking, creativity, responsibility, empathy, and management, among others (NEA, 2010; Scott, 2015). Following this trend, university teaching and learning has transformed from being lecture-based and teacher-centered to focusing more on engaging and enhancing student learning (Barr and Tagg, 1995;
Kolmos et al., 2008; Slavich and Zimbardo, 2012). In the process of this transformation, SCL | |
has become a well-used concept. Defined as an approach that “allows students to shape their | |
own learning paths and places upon them the responsibility to actively participate in making | |
their educational process a meaningful one” (Attard et al., 2010, p. 9), SCL is focused on | |
providing an active-learning environment in flexible curricula with the use of learning | |
outcomes to understand student achievements (pp. 10-12). Rooted in a constructivist approach | |
that moves beyond mere knowledge transmission, such learning is conceived as a process | |
whereby learners search for meaning and generate meaningful knowledge based on prior | |
experiences (Biggs and Tang, 2011; Dewey, 1938). | |
In the STEM fields, instructional practices of instructors are changing from teacher-directed approaches to student-centered approaches to improve the quality of undergraduate education (Justice et al., 2009). A substantial number of studies have
reported the positive effects of a variety of approaches to student-centered pedagogy in | |
STEM HE, such as active learning (Felder et al., 2000; Freeman et al., 2014), small-group | |
learning (Felder et al., 2000; Freeman et al., 2014; Springer et al., 1999; Steinemann, 2003), | |
and inquiry-based pedagogy (Anderson, 2002; Curtis and Ventura-Medina, 2008; Duran | |
and Dökme, 2016; Ketpichainarong et al., 2010; Martin et al., 2007; Simsek and Kabapinar, | |
2010). Furthermore, problem- and project-based pedagogy has been well documented as | |
an effective way to help students not only construct subject knowledge meaningfully, but | |
also develop the skills necessary for many professions, including critical thinking, | |
problem solving, communication, management and collaboration (Bilgin et al., 2015; | |
Du et al., 2013; He et al., 2017; Kolmos et al., 2008; Lehmann et al., 2008; Steinemann, 2003; | |
Zhao et al., 2017). | |
Definitions of these terminologies vary and the term SCL in particular is not always used | |
with consistent meaning. However, a few points of agreement can be summarized (Rogers, | |
2002): who the learners are, the context, the learning activities and the processes. Weimer | |
(2002) identifies five key areas for change in the process of transformation from teacher-centered to learner-centered classrooms: the balance of power, the function of content, the
role of the teacher, the responsibility for learning, and the purpose and process of evaluation. | |
In relation to the practice and implementation of a student-centered approach, Brook (1999) | |
provides a list of guiding principles for the development of constructivist teachers who | |
prioritize SCL strategies in HE. These are: using problems that are relevant to students, | |
constructing learning around principal concepts, eliciting and appreciating students’ | |
perspectives, adjusting the curriculum and syllabus to address students’ prior experience, | |
and linking assessment to learning goals and student learning. | |
A wide range of perspectives has been addressed in previous studies on SCL in HE. Brook | |
(1999), Rogers (2002) and Weimer (2002) provide a synthesis of guiding principles suggesting | |
three dimensions of focus: instructors (how they understand and perceive the instructional | |
innovation they are expected to adopt), student activity and interaction, and assessment. | |
The instructor represents an important and challenging aspect of instructional change, | |
particularly, regarding innovative pedagogy and SCL (Ejiwale, 2012; Kolmos et al., 2008; | |
Weimer, 2002). In a teacher-centered environment, instructors play the dominant role in | |
defining objectives, content, student activities and assessment. Whereas in an SCL | |
environment, instructors facilitate learning via providing opportunities for students to be | |
involved in decision-making regarding goals, content, activities and assessment. | |
Nevertheless, in the reality of instructional practice, instructors face the dilemma of, on | |
the one hand, giving students the freedom to make decisions on their own, and on the other | |
hand, retaining control of classroom activities (Du and Kirkebæk, 2012). In addition, how | |
instructors handle the changes in their relationships with students is a determining factor in | |
the extent to which SCL can be established. In a meta-analysis of student-teacher relationships in a student-centered environment, Cornelius-White (2007) suggests that positive teaching relationship variables, such as empathy, warmth, encouragement and motivation, are positively associated with learner participation, critical thinking, satisfaction,
drop-out prevention, positive motivation and social connection. In their proposal for | |
developing pedagogical change strategies in STEM, Henderson et al. (2010) emphasize that | |
the beliefs and behaviors of individual instructors should be targeted because they are | |
essential to any strategy for changing the classroom practices and environment. In general, | |
the existing literature agrees that for pedagogical change strategy development, it is | |
essential to work with the instructors and to understand their current instructional practices | |
as well as their perceptions of the change. | |
A student-centered approach emphasizes providing students with opportunities to | |
participate and engage in activities while interacting with the subject matter, the teacher | |
and each other. Student responsibility and ownership of their own learning is regarded as | |
essential in facilitating classroom interactions. Self-governance of the interactions can be | |
enhanced through collaborative group work when students are expected to negotiate and | |
reach consensus on how to work and learn together. Instead of meeting an objective set by | |
the instructors, students should take responsibility for organizing learning activities in | |
order to reach goals they themselves set (Du et al., 2013; Weimer, 2002). The function of | |
teaching content lies in aiding students in learning how to learn, rather than in the | |
transmission of factual knowledge (Du and Kirkebæk, 2012). | |
Student-centered instructional strategies and practices require a change of assessment | |
methods. Formative assessment, which refers to assessment methods that are intended to | |
generate feedback on learner performance to improve learning, is often used to facilitate self-regulated learning (Nicol and Macfarlane-Dick, 2006). In their review of formative
assessment, Black and William (1998) summarize the effectiveness of this method in relation | |
to different types of outcomes, educational levels and disciplines. As they emphasize, the | |
essential aspect that defines the success of formative assessment is the quality of the | |
feedback provided to learners, both formally and informally. Furthermore, in formative
assessment, the process of learning through feedback and dialogue between teachers and | |
students and among students is highly accentuated. Various formative assessment methods | |
have been reported as additional or alternative methods to the prevailing summative | |
assessment methods in STEM in order to align assessment constructively with the | |
implementation of SCL (Downey et al., 2006; Prince and Felder, 2006). | |
To plan and implement meaningful initiatives for improving undergraduate instruction, | |
it is important to collect data on the instructors’ instructional practices (Williams et al., | |
2015). Nevertheless, the existing literature has mainly focused on students’ attitudes, | |
performance and feedback on SCL. A limited number of studies have examined the | |
outcomes of faculty development activities that encourage research-based instructional | |
strategies for SCL. These studies report a good level of faculty knowledge and awareness of | |
various alternative instructional strategies in the fields of physics education (Dancy and | |
Henderson, 2010; Henderson et al., 2012) and engineering and science education (Brawner | |
et al., 2002; Borrego et al., 2013; Froyd et al., 2013). However, instructors’ adoption of | |
teaching strategies varies according to individual preferences and beliefs, the contexts of | |
disciplines, and institutional policy (Borrego et al., 2013; Froyd et al., 2013), and their | |
persistence in the adoption and current use of these strategies (Hora and Ferrare, 2014; | |
Henderson and Dancy, 2009; Walter et al., 2016) and their fidelity (how closely the | |
implementation follows its original plan) (Borrego et al., 2013) are still in question. | |
Therefore, there is a need for additional studies addressing instructors’ understanding, | |
beliefs and perceptions about practicing SCL that impact their instructional design for | |
classroom interactions, and how they construct assessment methods to align with their | |
adoption of instructional strategies. Further research should examine how instructors | |
perceive their roles and experiences in the process of instructional change. | |
3. Present study | |
The state of Qatar has the vision of transforming itself into a knowledge-producing economy | |
(General Secretariat for Development Planning, 2008; Rubin, 2012). Accordingly, advancement in | |
the fields of science and technology is a critical goal, as is promoting pedagogical practices that | |
support engagement in science and technology education (Dagher and BouJaoude, 2011). Qatar | |
University (QU) is the country’s foremost institution of HE and aims to become a leader in | |
economic and social development in Qatar. In its strategic plan for 2013–2016 (Qatar University | |
(QU), 2012), the leadership of QU has called for instructional innovation toward SCL by developing | |
“the skills necessary in the 21st century such as leadership, teamwork, communications, | |
problem-solving, and promoting a healthy lifestyle” (QU, 2012, p. 13). It is expected that these | |
initiatives will be implemented at the university level, particularly in the STEM fields. | |
Research on general university instructional practices in Qatar remains sparse, with little | |
information available on current instructional practices and to what extent student-centered | |
teaching and learning strategies are being implemented. In a recent study, the first on | |
university instructional practices in Qatar, Al-Thani et al. (2016) reported that across | |
disciplines, instructors prioritized lecture-based and teacher-centered instructional practices.
For example, most participants stressed lecture and content clarity as the most important and | |
effective practices. In contrast, student–student interaction, the integration of technology and instructional variety received less attention, according to the perceptions of the participants. However, little is known about either actual classroom
practices or the instructors’ perception of SCL, in particular in STEM fields. | |
To develop feasible change strategies that could be applied in the Qatar context with the | |
aim of facilitating innovation in HE in general and STEM education in particular, it is | |
essential to understand current instructional practices and how instructors perceive SCL, as | |
well as what strategies are being implemented (Henderson et al., 2010). Therefore, this study | |
aims to investigate STEM faculty's perceptions and instructional practices of SCL in
Qatar. The purpose is to generate knowledge on the research-based evaluation of STEM | |
faculty’s instructional practices. The study formulates the following research questions: | |
RQ1. What are the instructional practices of STEM faculty in Qatar? | |
RQ2. To what extent are instructors’ current practices student-centered? | |
RQ3. How do STEM faculty perceive SCL, possibilities for implementation and | |
challenges in classroom practice? | |
4. Research methods | |
4.1 Research design | |
Ideally, the study of STEM instructional practices involves the use of multiple techniques. The | |
methods commonly used to investigate university teaching practices include interviews with | |
instructors and students, portfolios written by instructors, surveys of instructors and | |
students, and observations in educational settings (AAAS, 2013). However, in reality, research
conditions limit the choice of data collection methods (Creswell, 2013). Although classroom | |
observation and portfolios are widely practiced in schools and can be a potential method for | |
improving university teaching and learning, these rarely occur in practice except in cases of | |
faculty promotion, evaluation or professional development requests (AAAS, 2013). In addition, | |
peer and protocol-based observations demand significant resources of human labor, materials, | |
equipment and physical conditions, which makes them challenging to implement on a larger | |
scale (Walter et al., 2016). Therefore, a mixed-methods research design combining the | |
strengths of quantitative and qualitative data – surveys and interviews – was employed as | |
the major data generation method in this study (Creswell, 2002). | |
4.2 Participants | |
An open invitation was sent to the entire faculty in the science, engineering, mathematics and | |
health sciences fields, asking them to consider participating on a voluntary basis. A sample of | |
65 faculty members (23.4 percent female and 76.4 percent male) completed the questionnaire. | |
4.3 Data generation methods | |
Survey and instruments. A self-reported questionnaire survey is one of the most efficient | |
ways to gain information due to its accessibility, convenience to administer and relative
time efficiency (AAAS, 2013, p. 7). Despite the common concern that the faculty may | |
inaccurately self-report their teaching practices, recent literature reports that some | |
aspects of instruction can be accurately reported by instructors (Smith et al., 2014); this | |
approach helps to identify instructional practices that are otherwise difficult to observe | |
(Walter et al., 2016). | |
The Postsecondary Instructional Practices Survey (PIPS) (Walter et al., 2016) is a newly | |
developed instrument aimed at investigating university teaching practices cross-disciplinarily | |
from the perspective of instructors. The PIPS was developed on the basis of a conceptual | |
framework constructed from a critical analysis of existing survey instruments (Walter et al., | |
2015), the observation codes of the Teaching Dimensions Observational Protocol (Hora et al., | |
2012), and the Reformed Teaching Observation Protocol (Piburn et al., 2000). The PIPS has | |
been proven to be valid and reliable while providing measurable variables, and results from | |
initial studies have shown that PIPS self-reported data are compatible with the results of | |
several Teaching Dimensions Observational Protocol codes (Walter et al., 2016). | |
The PIPS includes 24 statements about instructional practice, together with demographic questions on items such as gender, rank and academic title. An intuitive,
proportion-based scoring convention is used to calculate the scores. Two models are used | |
for the supporting analysis – a two-factor or five-factor solution. Factors in the five-factor | |
model include: six items for student–student interactions, four items for content delivery, | |
four items for formative assessment, five items for student–content engagement and four | |
items for summative assessment. Factors in the two-factor model include: nine items for | |
instructor-centered practices and 13 items for student-centered practices. The responses | |
from participants were coded as (0) not at all descriptive of my teaching, (1) minimally | |
descriptive of my teaching, (2) somewhat descriptive of my teaching, (3) mostly descriptive | |
of my teaching and (4) very descriptive of my teaching. | |
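As a rough illustration of how such coded responses can be turned into factor scores, the following Python sketch assumes simple item averaging over the 0–4 codes and uses only a hypothetical subset of the item-to-factor mapping; the actual PIPS proportion-based scoring convention is specified by Walter et al. (2016) and may differ in detail.
# Minimal illustrative sketch (not the official PIPS scoring routine): it assumes
# simple averaging of the 0-4 coded responses per factor, using only the subset of
# items listed in Table II; the hypothetical response values are made up.
import pandas as pd

# Rows are respondents; columns are PIPS items; values are the 0 ("not at all
# descriptive of my teaching") to 4 ("very descriptive of my teaching") codes.
responses = pd.DataFrame(
    {"P10": [2, 3], "P12": [3, 2], "P13": [1, 2], "P14": [2, 1],   # student-student interaction
     "P01": [3, 4], "P03": [4, 3], "P05": [3, 3], "P11": [3, 4]},  # content delivery
    index=["instructor_1", "instructor_2"],
)

factors = {
    "student_student_interaction": ["P10", "P12", "P13", "P14"],
    "content_delivery": ["P01", "P03", "P05", "P11"],
}

# Per-respondent factor score = mean of that factor's items; averaging these scores
# across respondents gives a factor "grand mean" of the kind reported in Table II.
factor_scores = pd.DataFrame(
    {name: responses[items].mean(axis=1) for name, items in factors.items()}
)
print(factor_scores)
print(factor_scores.mean())       # grand mean per factor
print(factor_scores.std(ddof=1))  # standard deviation per factor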
In-depth interviews. An interview can provide opportunities to explore teaching practices | |
through interactions with the participants. It can also provide the space for in-depth | |
questions on specific teaching practices as well as perceptions, beliefs, opinions and | |
potentially unexpected findings (Creswell, 2013). During the interviews (interview | |
guidelines see Appendix), participants were asked questions about their understanding of | |
and past experiences with SCL, their perceptions of the effectiveness of practicing SCL in | |
general and in their current environments in particular, what challenges and barriers they | |
had experienced, and what institutional support is needed. | |
4.4 Procedure | |
The questionnaire was sent to all participants in early spring 2017 and was administered by | |
Qualtrics. An explanation of the goals of the survey, namely, to understand their current | |
practices without intention of assessment, was provided to the participants. A pilot test was | |
conducted with several colleagues who were not participants to ensure that the questions
were unambiguous and addressed the goals. | |
A sample of 65 faculty members (23.4 percent female and 76.4 percent male) completed | |
the questionnaire. These were from the schools of sciences, health sciences, pharmacy | |
and engineering. The average HE teaching experience of the participants was 14.5 years. | |
About 15.6 percent of participants were full professors, 39.1 percent were associate | |
professors, 31.3 percent were assistant professors and 14 percent were instructors or | |
lecturers. About 58.6 percent of the participants did not have a leadership role (e.g. head of | |
department, chair of curriculum committee). | |
In total, 12 (4 female and 8 male) of the 65 faculty members who completed the | |
questionnaire responded positively to the individual interview request. The interview | |
participants included a representative range of STEM faculty members by academic title (three professors, three associate professors and six assistant professors) and gender (four female and eight male). Table I shows details of the interview participants' background information.
5. Analyses and findings | |
5.1 Quantitative data analysis and results | |
To answer the first research question, the mean and standard deviation of each item were | |
calculated to identify the practices that best describe STEM faculty teaching in the given context. | |
Table I. Interview participants' background information (note: all names are anonymous), listing each interviewee's gender, academic rank and previous pedagogical experiences. All interviewees had been students in lecture-based learning environments (a few also in problem-based settings); their teaching experience spanned lecture-based, active-learning, inquiry-based, project-based and problem-based environments, in some cases across several countries.
The grand mean for each factor was also calculated. The descriptive statistics for participants’ | |
responses to the PIPS are presented in Table II. | |
The participants reported that the items of factor 2 (F2), content-delivery practices, were mostly descriptive of their teaching (x̄ = 3.14). That is, the items stating that their syllabus contains the specific topics that will be covered in every class session (x̄ = 3.58), they structure the class session to give students good notes (x̄ = 3.18), and they guide students as they listen and take notes (x̄ = 2.89) were mostly descriptive of their content delivery.
The grand mean of student–content engagement (F4) was relatively high (x̄ = 3.07). This means that, for example, instructors frequently ask students to respond to questions during class time (x̄ = 3.49) and frequently structure problems so that students are able to consider multiple approaches to finding a solution.
Table II. The descriptive statistics for participants' responses to the PIPS survey – five-factor model analysis (the table reports the mean and SD for each item; factor grand means are noted below).
Factor 1: student–student interaction (grand mean x̄ = 2.18):
P10. I structure class so that students explore or discuss their understanding of new concepts before formal instruction
P12. I structure class so that students regularly talk with one another about course concepts
P13. I structure class so that students constructively criticize one another's ideas
P14. I structure class so that students discuss the difficulties they have with this subject with other students
Factor 2: content delivery practices (grand mean x̄ = 3.14):
P01. I guide students through major topics as they listen and take notes
P03. My syllabus contains the specific topics that will be covered in every class session
P05. I structure my course with the assumption that most of the students have little useful knowledge of the topics
P11. My class sessions are structured to give students a good set of notes
Factor 3: formative assessment (grand mean x̄ = 2.62):
P06. I use student assessment results to guide the direction of my instruction during the semester
P08. I use student questions and comments to determine the focus and direction of classroom discussion
P18. I give students frequent assignments worth a small portion of their grade
P20. I provide feedback on student assignments without assigning a formal grade
Factor 4: student–content engagement (grand mean x̄ = 3.07):
P02. I design activities that connect course content to my students' lives and future work
P07. I frequently ask students to respond to questions during class time
P09. I have students use a variety of means (models, drawings, graphs, symbols, simulations, etc.) to represent phenomena
P16. I structure problems so that students consider multiple approaches to finding a solution
P17. I provide time for students to reflect about the processes they use to solve problems
Factor 5: summative assessment (grand mean x̄ = 2.35; excluding P24, x̄ = 2.84):
P21. My test questions focus on important facts and definitions from the course
P22. My test questions require students to apply course concepts to unfamiliar situations
P23. My test questions contain well-defined problems with one correct solution
P24. I adjust student scores (e.g. curve) when necessary to reflect a proper distribution of grades
As to the student–student interaction factor (F1), the grand mean (x̄ = 2.18) was relatively low compared to the other factors. The item means ranged from 1.9 to 2.51, with
the maximum possible value being 4. Compared with the other items of this factor, item P13 ("I structure class so that students constructively criticize one another's ideas") had the lowest mean (x̄ = 1.9), which indicates that this practice is somewhat, but not mostly or very much, descriptive of instructors' practices. The item concerning structuring the class so that students discuss the difficulties they have with the subject matter with other students also had a low mean (x̄ = 2.06).
The formative assessment factor (F3) also had a relatively low grand mean (x̄ = 2.62). The mean of item P20 was 1.82, indicating that providing feedback on student assignments without assigning a formal grade was not very descriptive of QU instructors' practices. The means for the rest of the items ranged from 2.7 to 2.98. Using student comments and questions to determine the direction of classroom discussions (x̄ = 2.95) and using student assessment results to guide the direction of their instruction (x̄ = 2.98) were mostly descriptive of QU instructors' practices, as reported by participants.
The summative assessment factor (F5) had a low grand mean (x̄ = 2.35). This relatively
low mean was greatly impacted by item P24 (“I adjust student scores [e.g. curve] when | |
necessary to reflect a proper distribution of the grades”). In the given context, instructors are | |
not allowed to adjust student scores, so the result of this item reflects university policy | |
rather than individual instructor’s preference. An analysis excluding item P24 shows a | |
different picture: the mean of the summative assessment factor without item P24 becomes | |
2.84. Thus, the student–student interaction factor and the formative assessment factor | |
represent the lowest means in this study. | |
To answer the second research question, a paired samples t-test was conducted to compare the mean of the student-centered items (P02, P04, P06-10, P12-16, P18-20) with the mean of the instructor-centered items (P01, P03, P05, P11, P17, P21-24). The mean of the student-centered factors is 2.69 and the mean of the instructor-centered factors is 2.76. The results of the paired samples t-test found no statistically significant difference (α = 0.05) between the student-centered mean and the instructor-centered mean (t = −1.00, df = 64). However, when item 24 is excluded, the mean of the instructor-centered items becomes 2.99. A significant difference (α = 0.05) was found between the student-centered mean and the new (excluding item 24) instructor-centered mean (t = −4.15, df = 64).
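For illustration only, a paired-samples comparison of this kind can be sketched in Python with SciPy; the per-instructor scores below are simulated around the reported means and are not the study's data.
# Illustrative sketch of the paired-samples t-test (hypothetical scores, not the
# study's data): each instructor contributes a student-centered mean and an
# instructor-centered mean, so the comparison is within-subjects (df = n - 1).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 65  # number of survey respondents reported in the study

# Simulated per-instructor means on the 0-4 PIPS response scale, centered on the
# reported group means (2.69 student-centered vs 2.99 instructor-centered,
# excluding item P24).
student_centered = rng.normal(loc=2.69, scale=0.5, size=n).clip(0, 4)
instructor_centered = rng.normal(loc=2.99, scale=0.5, size=n).clip(0, 4)

t_stat, p_value = stats.ttest_rel(student_centered, instructor_centered)
print(f"t = {t_stat:.2f}, df = {n - 1}, p = {p_value:.4f}")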
An alignment was identified between the results of the five-factor model analysis and the | |
two-factor model analysis. Quantitative analysis results did not show a correlation between | |
instructional practices and demographic factors such as academic rank or years of teaching. | |
However, the results identified significant differences in using student-centered | |
instructional practices according to the gender of the participant. Based on the data | |
reported by participants, the mean of using student-centered instructional practices was | |
2.81 for male participants and 2.37 for female participants. A one-way ANOVA found a | |
statistically significant difference (α = 0.05) between the student-centered mean of male participants and that of female participants (F = 7.64, p = 0.008).
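Similarly, a minimal sketch of the gender comparison with simulated scores (group sizes approximated from the reported percentages) is shown below; it is not the authors' analysis script.
# Illustrative sketch of the one-way ANOVA by gender (simulated scores, not the
# study's data); with two groups, the F statistic equals the square of the
# corresponding independent-samples t statistic.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
male = rng.normal(loc=2.81, scale=0.5, size=50).clip(0, 4)    # ~76 percent of 65
female = rng.normal(loc=2.37, scale=0.5, size=15).clip(0, 4)  # ~23 percent of 65

f_stat, p_value = stats.f_oneway(male, female)
print(f"F = {f_stat:.2f}, p = {p_value:.3f}")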
5.2 Qualitative data analysis and results | |
The qualitative analysis provides answers to the third research question. All interviews | |
were transcribed before being coded and analyzed. The analysis used an integrated | |
approach combining guiding principles on SCL by Brook (1999), Rogers (2002) and | |
Weimer (2002), and Kvale and Brinkmann’s (2009) meaning condensation method. The | |
analysis identified emerging themes from instructors' accounts of their opinions,
experiences and reflections. | |
Instructors’ definitions and perceptions of their roles in SCL. Although all interviewed | |
instructors believed they were using SCL strategies in their classrooms, they defined the | |
term SCL in various ways. Three categories of definitions were identified; these are | |
explained below. Interview data also found a consistency between instructors’ definitions | |
and their perceptions of their roles in an SCL environment: | |
• Category 1: there were three instructors, one professor and two assistant professors, all male, who believed lecturing to be the best way of teaching and learning.
According to them, a good lecturer is keen to motivate and encourage students to be | |
free thinkers. When students choose to enter a university, they should be sufficiently | |
mature and willing to work hard enough to progress through their education. | |
Therefore, the university “should be student-centric by definition” (Burhan). This | |
definition was supported by the following remark: | |
I believe that in our university every instructor is doing SCL in their own way […] but instead of | |
standing there reading slides, I think it makes it more student-centered by providing an interesting | |
lecture so that when they leave the room you will hear them say, “Wow, this is inspiring and | |
interesting.” (Mohammad) | |
All three of the instructors interviewed conceived of their role as to “inspire and attract | |
students.” As Abdullah commented: | |
It is the responsibility of the instructors to find a way to bring in highly interesting lectures to make | |
students interested […] to do that, we should prioritize research, so we have something really | |
interesting to bring to the class. | |
• Category 2: instructors in this category included one female associate professor, one male assistant professor, two male associate professors and one male professor. They
believed that in an SCL environment, the instructor should provide activities for | |
students to learn hands-on skills and relate theories to certain practices, and that | |
students should acquire deep knowledge in the field by working together actively on | |
classroom activities. As Ihab commented, “[I]t is so boring to just fill the class with | |
me talking and lecturing. It is fun to plan some activities so students can work in a | |
team so that they can practice the theories; students like these [activities].” In such an | |
environment, the instructor should play the role of “providing” activities and | |
“guiding” students to learn the requested, relevant knowledge through these | |
activities, as most of them suggested. | |
• Category 3: this category included two female assistant professors, one female professor and one male assistant professor. They believed students should work in
small groups, with no more than ten people per team, on certain targets, such as | |
solving a problem. Students should be responsible for organizing study activities and | |
should make decisions on their own to prepare for the requirements of their future | |
professions. They should also be allowed to make mistakes and should receive help | |
with reflecting on these mistakes in order to improve. As Faris commented: | |
I did not like my own student experiences which were filled with lectures and lab work, | |
I appreciated my past experienced of working in a more student-centered learning environment, | |
which offered me tools to provide what I think as better learning environment now to my students. | |
These four instructors used a few different metaphors to describe their roles: “leaders” – “leading | |
students to work towards their targets” (Sara and Iman), “observers” – “observing students from | |
a distance and only interfering when they got off-track” (Faris), and “facilitators” – “having | |
patience when students made mistakes” (Faris), “providing rich resources to students in need of | |
help and redirecting students when they were in trouble” (Sara and Iman), “assisting students to | |
be able to make their own decisions on learning goals, what to learn and how to learn it, and | |
critically evaluate and reflect on their own learning” (Duaa). | |
The interview data did not reveal any patterns in teachers’ definitions and perceptions | |
according to their academic ranking or gender. However, past experiences with SCL seemed | |
to make a difference in their understanding and choice of strategies. For example, | |
participants from category 1 mainly experienced lectures as the major source of learning | |
and form of teaching in their past student and teaching experiences. Those from category 2 | |
experienced different types of SCL environments due to their previous work experiences but | |
not during their student experiences. Two participants from category 3 experienced SCL in | |
the form of problem-based learning (PBL) in their past student experiences, and the other | |
two participants had experiences with SCL both as learners and as instructors prior to their | |
current jobs. A participant’s past experiences, particularly as a learner, seem to have a close | |
link to their current instructional practice. As Sara remarked: | |
Having experienced the Problem-Based Learning in my college time, I truly it is the best way to | |
learn. Working in team offered us great opportunities to help each other and support each other. | |
This means a lot in particular for us female in Arabic culture. We never went out to talk with others | |
before and in such a environment we learned how to interact with others and how to behave | |
professionally […] we increased our self-confidence and it was very empowering. | |
Although all three groups mentioned that students should take responsibility for their own | |
learning, when asked to what level students should be involved in deciding what to learn | |
and how to learn it, and even how to assess what they have learned, only one instructor (Ali) | |
said it would be ideal to involve students in these decisions. However, he had neither | |
experienced this himself nor had he observed any such practice in his immediate | |
environment. Out of the 12 interviewees, 10 believed that instructors should decide which | |
activities to provide, what materials to use and how to structure student activity time and | |
form, and should also ensure students reach “the correct” answers. | |
While the data are too broad to draw any strong conclusions, the majority of the | |
classroom activities that the interviewed instructors exemplified focused on students | |
working in groups to fulfill an assignment designed by the instructor or students answering | |
questions from the instructor in a teacher-student one-to-one form. The roles described by | |
all the instructors involved offering directions and structures. As most of the instructors
mentioned, given the time pressure to deliver all the required content for their courses, they | |
had to ensure students progressed through the mandated learning checkpoints. | |
Assessment. The interviewed instructors agreed that assessment played an essential role | |
in evaluating student learning. One instructor said, an exam “is the best way to engage them | |
to learn because they work so hard just before it” (Ibrahim). With the exception of one | |
instructor, the respondents gave multiple-choice questions plus short-answer questions as the | |
major forms of assessment they used. However, their opinions on what should be included in | |
and what should be the focus of the assessment diverged. The instructors provided examples | |
that included: "To prepare [students] for their future profession, exams in universities should
focus on lots of hand-on skills” (Alia); “More writing skills are needed for the exam” (Amin); | |
and “Students need to be posed exams that can question their thinking skills” (Faris). | |
Two major reasons for the choice of assessment were provided. First, the assessment | |
committee within a college or across colleges defined the assessments as exams for some | |
undergraduate courses, particularly general courses. This limited the options for instructors | |
to design exams different from the common exams used in these classes. Second, when instructors did have the freedom to design exams for their courses, it was most convenient to
use assessment forms that can “examine the knowledge students have mastered” and are | |
the “least time-consuming” for grading purposes, as 8 out of the 12 interview participants | |
expressed. As one participant said, “It takes a few hours to grade multiple-choice question | |
exams. With the busy schedule we have, you don’t want to spend several days to grade and | |
provide feedback for a few hundred essays” (Ihab). | |
Two of the interviewed instructors (Faris and Duaa) expressed their views on how | |
formative assessment should be further enhanced in order to better facilitate SCL, although only one of them had enhanced their assessments in daily practice. As Duaa commented:
Real SCL should involve students not only in deciding on what activities they take in the classroom, | |
but also in defining assessment methods, but I can see the students are shocked when I invite them | |
to give opinions on how they should be assessed […] it will take more time before more people | |
understand that involving students in defining assessment is to motivate them to be more | |
responsible instead of cheating. | |
Given this challenge, this instructor mainly relied in practice on asking students to identify | |
and structure their own projects and problems. | |
Challenges. The majority of the instructors believed that students are the most | |
challenging factor in implementing ideal SCL in the given context. A major reason cited | |
for this is the Arabic culture. Out of the 12 interviewed instructors, 11 believed that most | |
students were raised in Arabic families deeply rooted in Middle Eastern culture, where | |
family plays an important role in one’s daily life, meaning that most teenagers do not have | |
opportunities to live alone and make decisions independently. In addition, their high | |
school experiences did not help them become independent learners, as in that setting they | |
are used to lectures, completing assigned tasks without asking any "why" questions, and sitting exams that are mostly in the form of multiple-choice questions testing their memories. Students are familiar with being provided with information and instruction and having their time arranged, and they even prefer it that way. As an
instructor said, “This is how the students grow up; they are used to it and they cannot take | |
responsibilities on their own. They are not motivated to do things independently, no | |
matter how the instructor works hard to push them, they are not really ready for a true | |
SCL” (Alia). | |
Large classroom sizes were identified as another major challenge for implementing | |
student–student activities because the students easily slip into a chaotic and “out of control” | |
mode, according to some teachers. Interestingly, this was used as an argument for “offering a | |
really interesting lecture as an effective approach to provide SCL,” as Abdullah commented. | |
Finally, the busy schedule of university faculty remains a factor limiting what they do:
“if we don’t have so much teaching load, we may have more time to do what could have | |
been more student-centered strategies such as letting students identify problems and | |
learning needs on their own” (Ali). Although teaching plays an important role in the | |
appraisal system at QU, research products, such as publications, remain the major tool to | |
evaluate faculty performance. Ibrahim mentioned “when we apply for promotion, which is | |
particularly crucial for assistant professors, all what is to be evaluated is the publication | |
in one’s own field, as long as we can prove we are able to teach, it is not highly critical how | |
we teach.” | |
Support needed. Three participants expressed their desire for an institutionalized | |
approach to changing the assessment system, allowing for more faculty autonomy to design | |
assessment methods that are appropriate for their courses. Most of the suggestions for | |
support referred to actions focusing on faculty and students. In total, 11 participants | |
suggested more workshops and training sessions for faculty to gain the necessary skills to | |
facilitate SCL. Five participants suggested student tutoring programs to help first-year | |
undergraduate students learn personal responsibility and to “grow up by following | |
suggestions from experienced students” (Faris). One participant even suggested that | |
attention to pedagogy should be reduced for now because “We give too much attention to | |
the students, nearly like spoon-feeding, worrying too much about whether they are happy or | |
not in studying […] students should stand on their own feet, and sometimes they learn by | |
being thrown into the deep sea” (Burhan). | |
6. Discussion | |
In this section, we compare the qualitative data findings and the quantitative study results | |
and discuss them in relation to the three dimensions of focus in SCL previously summarized in | |
this paper: instructors’ perceptions and roles, student activity and interaction and assessment. | |
This is followed by a discussion of STEM instructors’ views on challenges to implementation. | |
6.1 STEM instructors' understanding and perceptions of SCL
Improving the quality of teaching and learning in the STEM fields necessitates exploring | |
the conceptions that faculty instructors hold regarding the learning environment and the | |
context of teaching since teaching approaches are strongly influenced by the underlying | |
beliefs of the teacher (Kember, 1997). The participants in this study hold different beliefs | |
about and attitudes toward SCL strategies. Connections can be identified between the | |
participants’ understandings and perceptions of SCL and their prior experiences with it. | |
Those who had experienced SCL as learners tended to make more of an effort to implement | |
the strategies effectively in their own teaching practice. This finding echoes previous | |
studies suggesting that in order to maximize their capability of facilitating PBL, faculty
should be provided with opportunities to experience PBL as learners (Kolmos et al., 2008). | |
Comparing results from the quantitative and qualitative data, this study identifies gaps | |
between what the instructors consider to be SCL and what they actually practice. | |
As suggested by Paris and Combs (2006), the broad and wide-ranging definitions of SCL | |
legitimize the instructors’ actual practices. This gap can serve as an alert when a large-scale | |
change initiative is being implemented in the given context. As Henderson et al. (2011) | |
note, awareness and knowledge of SCL strategies cannot guarantee their actual practice. | |
6.2 Student activity and interaction | |
This study reported that instructors have a general awareness of using student-centered | |
strategies. Student activities are regarded as essential in instructional practices. | |
Nevertheless, this study also shows that, in the given context, most classroom | |
interactions are in the form of student–content and student-teacher interactions whereas | |
student–student interactions remain limited. In practice, a generally low level of SCL can be | |
concluded, according to the PIPS instrument (Walter et al., 2016) and the definition of SCL in | |
previous studies (Brook, 1999; Rogers, 2002; Weimer, 2002). Student interaction with the | |
content and instructor may be directly related to the common concept of instruction and | |
may reflect a lecture-centered pedagogic approach. This finding is in line with the report | |
from a previous study showing that instructors in Qatar tend to focus on content delivered | |
through lectures as an efficient way of teaching (Al-Thani et al., 2016). Previous studies | |
(Borrego et al., 2013; Henderson and Dancy, 2009; Walter et al., 2016) also report that the | |
levels of implementing instructional practices vary according to different aspects; for | |
example, STEM faculties reported limited use of certain strategies such as group work and | |
solving problems collaboratively in daily practice despite their high level of knowledge and | |
awareness. Instructors' lack of professional vision regarding collaborative group work can lead to a lack of practice (Modell and Modell, 2017). An often-reported reason is that instructors
give priority to content delivery due to limited class time (Hora and Ferrare, 2014; Walter | |
et al., 2016). Another explanation may be instructors’ lack of confidence in letting students | |
take full responsibility for organizing their own learning activities outside of instructors’ | |
control (Du and Kirkebæk, 2012). | |
Student–student interaction received relatively less attention and consideration from the | |
participants in this study. Previous studies have found that the length of classes and class | |
size were often the most important barrier for the implementation of student-centered | |
instructional practices (Froyd et al., 2013). In the context of this study, this may be one of | |
the factors limiting the possibility of using student interaction in the classroom. In the | |
undergraduate programs, the length of classes is 1 h and 15 minutes, which is counted as a | |
two-study-hour class. This limits instructors’ confidence in their ability to deliver heavy | |
curriculum content while also providing opportunities to engage students with interactive | |
activities. Another possible reason is bias in the instructors' knowledge regarding SCL
strategies; some instructors believe it is sufficient to deliver SCL by simply asking students | |
to do something that is different from lecturing (Paris and Combs, 2006; Shu-Hui and Smith, | |
2008). Linking the instructors' definitions of SCL to their perceived teaching roles, as described in the interviews, the instructors also lack the
belief that interactive student activities can lead to actual learning. Participants consider it | |
important that instructors maintain control of classroom activities. For example, Borrego | |
et al. (2013) found a strong correlation between instructors’ beliefs regarding problem | |
solving and the time students spent on collaborative activities, such as discussing problems. | |
6.3 Unaligned assessment | |
Although the participants demonstrated an awareness of SCL in general and willingness to | |
implement certain SCL strategies, they reported limited critical reflection on assessment | |
systems in the given context. Their limited understanding and practice of formative | |
assessment is an impediment to aligning instruction with assessment and, thus, to practicing SCL effectively. Instructional innovation demands changes not only in classroom practices
but also, more importantly, in assessment methods. Williams et al. (2015) noted that | |
formative assessment is a factor that is often ignored or forgotten, even by many of the | |
researchers who have developed instruments to describe instructional practices. This study | |
similarly found that the summative-oriented prevailing assessment methods at the | |
university level remain unchallenged by the instructors. This may be due to their lack of | |
knowledge and experience of formative assessment, or due at least in part to the | |
convenience of using what they are asked to use as well as what they are accustomed to. Changing
teaching methods without a constructive alignment with assessment methods will limit the | |
effectiveness of any instructional innovation (Biggs and Tang, 2011). | |
6.4 Factors that make a difference | |
Previous studies (Dancy and Henderson, 2007, 2010; Froyd et al., 2013; Henderson | |
and Dancy, 2009; Henderson et al., 2012) have reported that a faculty member’s use of | |
student-centered strategies is often related to demographic factors such as gender, academic | |
rank and years of teaching. The results of this study only identified a correlation between | |
instructional practices and gender. In contrast to the findings of previous studies, namely, | |
that female instructors tend to use student-centered methods more often than male | |
instructors and that younger instructors tend to show more interest in adopting new | |
pedagogical initiatives, the quantitative data of this study showed that male participants reported
higher levels of employing student-centered approaches than female participants, but found | |
no patterns regarding academic rank and years of teaching. A major reason may be the | |
small number of participants in this study. A possible reason for the gender difference may | |
be the imbalanced gender ratio among the overall participants in this study (the proportion | |
of female participants was 23.4 percent). Nevertheless, qualitative data did not identify any | |
patterns due to gender and academic rank, but rather, identified a connection between the | |
instructor’s prior experience with SCL and their understanding, perception and practices, as | |
previously discussed. | |
6.5 Challenges | |
Two categories of instructor concerns and barriers to their sustainable use of instructional | |
innovation were identified. Students’ lack of maturity, motivation and responsibility was | |
considered the major challenge by most of the interviewed participants, except for those | |
who had experienced SCL as a student. Regarding students as the source of the problem and | |
blaming students for their own poor performance can be seen as another symptom associated with a lecturer-centered approach.
Another major challenge is institutional constraints such as the insufficiency of | |
classroom time. Instructors tend to have different opinions regarding the amount of time it | |
takes to include interactive student–student activities. Large class size is often a barrier for | |
instructors hoping to use interactive student–student activities. Female faculty members | |
and younger faculty members have been found to have a higher rate of innovative instruction use
and continuation. | |
6.6 Recommendations | |
As previous studies (Froyd et al., 2013) have suggested, when an instructional strategy is | |
adopted at a low level, it means that it is either not mature or will never achieve full | |
adoption. Institutionalized faculty development and support are essential for the further | |
implementation of innovative instructional strategies and the persistence and continuation | |
of the implementation, as Dancy and Henderson (2007) pointed out, while institutional | |
barriers can limit instructional innovations when structures have been set up to function | |
well with traditional instruction. The following list of recommendations is provided as | |
inspiration for institutional support and faculty development activities: | |
• First, faculty members need to develop a deep understanding of SCL through experiences as learners so that they can become true believers and implementers.
• Second, autonomy is needed for faculty to adopt appropriate assessment methods that are aligned with their pedagogical objectives and delivery methods. Input on how faculty can adapt instructional innovation to tailor it to the local context is very important for its long-term effectiveness (Hora and Ferrare, 2014).
• Third, an inclusive approach to faculty evaluation that encourages faculty from STEM backgrounds to engage in research on their instructional practice will not only sustain the practice of innovative pedagogy but will also enrich the research profiles of STEM faculty and their institutes.
7. Conclusion | |
This study examined university STEM instructors’ understanding and perceptions of SCL | |
as well as their self-reported current practices. Results of the study provide insights on how | |
institutional strategies of instructional change are continually practiced. The study | |
identified a lack of alignment between instructors’ perceptions and their actual practices of | |
SCL. Despite agreement on perceiving SCL as an effective teaching strategy, the instructors’ | |
actual practices prioritize content delivery, the teachers’ role in classroom control, and | |
defining student learning activities as well as summative assessment. Student–student | |
interactions and formative assessment are limited. The participants tended to blame the | |
limited use of SCL on the lack of motivation and readiness among students and on | |
institutional constraints. Another perspective to explain this gap may be the diverse yet | |
inclusive definitions of SCL espoused by faculty, which tend to legitimate their practices, | |
reflecting a rather low level of implementation compared to the literature. This study also | |
suggests that faculty’s understanding and perceptions of implementing student-centered | |
approaches were closely linked to their prior experiences – experiencing SCL as a learner | |
may better shape the understanding and guide the practice of SCL as an instructor. | |
Thereafter, recommendations are provided for faculty development activities at an | |
institutional level for sustainable instructional innovation. | |
The study has a few limitations. First, regarding methodological justification, the data collection methods chosen in this study relied mainly on the faculty's self-reporting. Although such methods are frequently employed for studying faculty beliefs, perceptions and instructional practices (Borrego et al., 2013), data from other sources, such as
observation, may offer information from new perspectives for instructional development | |
(Henry et al., 2007). Second, the limited number of participants restricts this study’s | |
generalizability because the survey was administered on a voluntary basis and
the limited number of interview participants makes it difficult to establish clear patterns. | |
Third, researching faculty members raises concerns in the given context, wherein | |
extensive faculty assessments are regularly conducted. Although special considerations | |
regarding ethical concerns were taken in this study – for example, participants were | |
provided with a clear explanation of the goals and consequences of the study and | |
were shown that it had no relation to the university’s annual faculty performance | |
assessment – the potential sensitivity may have caused a certain amount of reservation | |
among participants regarding sharing further information; this may have limited the | |
results of the study. | |
In conclusion, the results reported in this paper provide a first impression of the present | |
instructional practices in the STEM field in the context of Qatar. Findings of the study, | |
although limited to the given context, may have implications for other countries in the Gulf | |
Region and Arabic-speaking contexts, and potentially even broader contexts, since
instructional change toward SCL in STEM classrooms remains a general challenge | |
worldwide (Hora and Ferrare, 2014; Froyd et al., 2013). The results imply that more attention | |
should be given to faculty development programs to enhance instructor awareness, | |
knowledge and skills related to student–student interaction and formative assessment. This
study contributes to further instructional change implementation by introducing a roadmap | |
toward change on broader levels, such as strategies of institutional change for instructional | |
innovation, as well as toward the establishment of a research-based and evidence-based | |
approach to faculty development and institutional change. | |
References | |
Al-Thani, A.M., Al-Meghaissib, L.A.A.A. and Nosair, M.R.A.A. (2016), “Faculty members’ views of | |
effective teaching: a case study of Qatar University”, European Journal of Education Studies, | |
Vol. 2 No. 8, pp. 109-139. | |
American Association for the Advancement of Science (AAAS) (2013), “Describing and measuring | |
STEM teaching practices: a report from a national meeting on the measurement of | |
undergraduate science, technology, engineering, and mathematics (STEM) teaching”, American | |
Association for the Advancement of Science, Washington, DC, available at: http://ccliconference. | |
org/files/2013/11/Measuring-STEM-Teaching-Practices.pdf (accessed November 15, 2016).
Anderson, R.D. (2002), “Reforming science teaching: what research says about inquiry”, Journal of | |
Science Teacher Education, Vol. 13 No. 1, pp. 1-12. | |
Attard, A., Di Loio, E., Geven, K. and Santa, R. (2010), Student Centered Learning: An Insight into | |
Theory and Practice, Partos Timisoara, Bucharest. | |
Barr, R.B. and Tagg, J. (1995), “From teaching to learning: a new paradigm for undergraduate | |
education”, Change: The Magazine of Higher Learning, Vol. 27 No. 6, pp. 12-26. | |
Biggs, J.B. and Tang, C. (2011), Teaching for Quality Learning at University: What the Student Does, | |
McGraw-Hill Education, Berkshire. | |
Bilgin, I., Karakuyu, Y. and Ay, Y. (2015), “The effects of project-based learning on undergraduate | |
students’ achievement and self-efficacy beliefs towards science teaching”, Eurasia Journal of | |
Mathematics, Science & Technology Education, Vol. 11 No. 3, pp. 469-477. | |
Black, P. and William, D. (1998), “Assessment and classroom learning”, Assessment in Education: | |
Principles, Policy & Practice, Vol. 5 No. 1, pp. 7-74. | |
Brawner, C.E., Felder, R.M., Allen, R. and Brent, R. (2002), “A survey of faculty teaching practices and | |
involvement in faculty development activities”, Journal of Engineering Education, Vol. 91 No. 4, | |
p. 393. | |
Borrego, M., Froyd, J.E., Henderson, C., Cutler, S. and Prince, M. (2013), “Influence of engineering | |
instructors’ teaching and learning beliefs on pedagogies in engineering science courses”, | |
International Journal of Engineering Education, Vol. 29 No. 6, pp. 1456-1471. | |
Brook, J.G. (1999), In Search of Understanding: The Case for Constructivist Classrooms, Association for | |
Supervision & Curriculum Development, Alexandria. | |
Cornelius-White, J. (2007), "Learner-centered teacher-student relationships are effective: a meta-analysis", Review of Educational Research, Vol. 77 No. 1, pp. 113-143.
Creswell, J.W. (2002), Educational Research: Planning, Conducting, and Evaluating Quantitative and | |
Qualitative Research, Pearson Education, Upper Saddle River, NJ. | |
Creswell, J.W. (2013), Qualitative Inquiry and Research Design: Choosing among Five Approaches, Sage. | |
Curtis, R. and Ventura-Medina, E. (2008), An Enquiry-Based Chemical Engineering Design Project for | |
First-Year Students, University of Manchester, Centre for Excellence in Enquiry-Based | |
Learning, Manchester. | |
Dagher, Z. and BouJaoude, S. (2011), “Science education in Arab states: bright future or status quo?”, | |
Studies in Science Education, Vol. 47, pp. 73-101. | |
Dancy, M. and Henderson, C. (2007), “Framework for articulating instructional practices and | |
conceptions”, Physical Review Special Topics: Physics Education Research, Vol. 3 No. 1, pp. 1-12. | |
Dancy, M. and Henderson, C. (2010), “Pedagogical practices and instructional change of physics | |
faculty”, American Journal of Physics, Physics, Vol. 78 No. 10, pp. 1056-1063. | |
Dewey, J. (1938), Experience and Education, Collier and Kappa Delta Phi, New York, NY. | |
Downey, G.L., Lucena, J.C., Moskal, B.M., Parkhurst, R., Bigley, T., Hays, C. and Lehr, J.L. (2006), “The | |
globally competent engineer: working effectively with people who define problems differently”, | |
Journal of Engineering Education, Vol. 95 No. 2, pp. 107-122. | |
Du, X.Y. and Kirkebæk, M.J. (2012), “Contextualizing task-based PBL”, Exploring Task-Based PBL in | |
Chinese Teaching and Learning, pp. 172-185. | |
Du, X.Y., Su, L. and Liu, J. (2013), “Developing sustainability curricula using the PBL method in a | |
Chinese context”, Journal of Cleaner Production, Vol. 61 No. 15, pp. 80-88. | |
Duran, M. and Dökme, İ. (2016), “The effect of the inquiry-based learning approach on students’ critical-thinking skills”, Eurasia Journal of Mathematics, Science & Technology Education, Vol. 12 No. 12.
Ejiwale, J.A. (2012), “Facilitating teaching and learning across STEM fields”, Journal of STEM | |
Education: Innovations and Research, Vol. 13 No. 3, pp. 87-94. | |
Felder, R.M., Woods, D.R., Stice, J.E. and Rugarcia, A. (2000), “The future of engineering education II. | |
Teaching methods that work”, Chemical Engineering Education, Vol. 34 No. 1, pp. 26-39. | |
Freeman, S., Eddy, S.L., McDonough, M., Smith, M.K., Okoroafor, N., Jordt, H. and Wenderoth, M.P. | |
(2014), “Active learning increases student performance in science, engineering, and | |
mathematics”, Proceedings of the National Academy of Sciences, Vol. 111 No. 23, pp. 8410-8415. | |
Froyd, J., Borrego, M., Cutler, S., Henderson, C. and Prince, M. (2013), “Estimates of use of | |
research-based instructional strategies in core electrical or computer engineering courses”, IEEE | |
Transactions on Education, Vol. 56 No. 4, pp. 393-399. | |
General Secretariat for Development Planning (2008), Qatar National Vision 2030, General Secretariat | |
for Development Planning, Doha, available at: http://qatarus.com/documents/qatar-nationalvision-2030/ (accessed November 15, 2016). | |
Graham, M.J., Frederick, J., Byars-Winston, A., Hunter, A.B. and Handelsman, J. (2013), “Increasing | |
persistence of college students in STEM”, Science, Vol. 341 No. 6153, pp. 1455-1456. | |
He, Y., Du, X., Toft, E., Zhang, X., Qu, B., Shi, J. and Zhang, H. (2017), “A comparison between the | |
effectiveness of PBL and LBL on improving problem-solving abilities of medical students using | |
questioning”, Innovations in Education and Teaching International, Vol. 55 No. 1, pp. 44-54, | |
available at: https://doi.org/10.1080/14703297.2017.1290539 | |
Henderson, C. and Dancy, M. (2009), “The impact of physics education research on the teaching of | |
introductory quantitative physics in the United States”, Physical Review Special Topics: Physics | |
Education Research, Vol. 5 No. 2, pp. 1-15. | |
Henderson, C., Beach, A. and Finkelstein, N. (2011), “Facilitating change in undergraduate STEM | |
instructional practices: an analytic review of the literature”, Journal of Research in Science | |
Teaching, Vol. 48 No. 8, pp. 952-984. | |
Henderson, C., Dancy, M. and Niewiadomska-Bugaj, M. (2012), “The use of research-based instructional | |
strategies in introductory physics: where do faculty leave the innovation-decision process?”, | |
Physical Review Special Topics – Physics Education Research, Vol. 8 No. 2, pp. 1-9. | |
Henderson, C., Finkelstein, N. and Beach, A. (2010), “Beyond dissemination in college science teaching: | |
an introduction to four core change strategies”, Journal of College Science Teaching, Vol. 39 No. 5, | |
pp. 18-25. | |
Henry, M.A., Murray, K.S. and Phillips, K.A. (2007), Meeting the Challenge of STEM Classroom | |
Observation in Evaluating Teacher Development Projects: A Comparison of Two Widely Used | |
Instruments, Henry Consulting, St Louis, MA. | |
Hora, M.T. and Ferrare, J.J. (2014), “Remeasuring postsecondary teaching: how singular categories of | |
instruction obscure the multiple dimensions of classroom practice”, Journal of College Science | |
Teaching, Vol. 43 No. 3, pp. 36-41. | |
Hora, M.T., Oleson, A. and Ferrare, J.J. (2012), Teaching Dimensions Observation Protocol (TDOP) | |
User’s Manual, Wisconsin Center for Education Research, University of Wisconsin–Madison, | |
Madison, WI. | |
Justice, C., Rice, J., Roy, D., Hudspith, B. and Jenkins, H. (2009), “Inquiry-based learning in higher | |
education: administrators’ perspectives on integrating inquiry pedagogy into the curriculum”, | |
Higher Education, Vol. 58 No. 6, pp. 841-855. | |
Kember, D. (1997), “A reconceptualisation of the research into university academics’ conceptions of | |
teaching”, Learning and Instruction, Vol. 7 No. 3, pp. 255-275. | |
Ketpichainarong, W., Panijpan, B. and Ruenwongsa, P. (2010), “Enhanced learning of biotechnology | |
students by an inquiry-based cellulose laboratory”, International Journal of Environmental & | |
Science Education, Vol. 5 No. 2, pp. 169-187. | |
Kolmos, A., Du, X.Y., Dahms, M. and Qvist, P. (2008), “Staff development for change to problem-based | |
learning”, International Journal of Engineering Education, Vol. 24 No. 4, pp. 772-782. | |
Kvale, S. and Brinkmann, S. (2009), Interviews: Learning the Craft of Qualitative Research, SAGE, | |
Thousand Oaks, CA. | |
Lehmann, M., Christensen, P., Du, X. and Thrane, M. (2008), “Problem-oriented and project-based | |
learning (POPBL) as an innovative learning strategy for sustainable development in engineering | |
education”, European Journal of Engineering Education, Vol. 33 No. 3, pp. 283-295. | |
Martin, T., Rivale, S.D. and Diller, K.R. (2007), “Comparison of student learning in challenge-based and | |
traditional instruction in biomedical engineering”, Annals of Biomedical Engineering, Vol. 35 | |
No. 8, pp. 1312-1323. | |
Modell, M.G. and Modell, M.G. (2017), “Instructors’ professional vision for collaborative learning | |
groups”, Journal of Applied Research in Higher Education, Vol. 9 No. 3, pp. 346-362. | |
NEA (2010), “Preparing 21st Century students for a global society: an educator’s guide to ‘the four Cs’ ”, | |
National Education Association, Washington, DC, available at: www.nea.org/tools/52217 | |
(accessed December 20, 2017). | |
Nicol, D.J. and Macfarlane-Dick, D. (2006), “Formative assessment and self-regulated learning: a model | |
and seven principles of good feedback practice”, Studies in Higher Education, Vol. 31 No. 2, | |
pp. 199-218. | |
Paris, C. and Combs, B. (2006), “Lived meanings: what teachers mean when they say they are learner-centered”, Teachers & Teaching: Theory and Practice, Vol. 12 No. 5, pp. 571-592.
Piburn, M., Sawada, D., Falconer, K., Turley, J., Benford, R. and Bloom, I. (2000), Reformed Teaching | |
Observation Protocol (RTOP), Arizona Collaborative for Excellence in the Preparation of | |
Teachers, Tempe. | |
Prince, M.J. and Felder, R.M. (2006), “Inductive teaching and learning methods: definitions, | |
comparisons, and research bases”, Journal of Engineering Education, Vol. 95 No. 2, pp. 123-138. | |
Qatar University (QU) (2012), “Qatar university strategic plan 2013–2016”, available at: www.qu.edu. | |
qa/static_file/qu/About/documents/qu-strategic-plan-2013-2016-en.pdf (accessed June 10, 2017). | |
Rogers, A. (2002), Teaching Adults, 3rd ed., Open University Press, Philadelphia, PA. | |
Rubin, A. (2012), “Higher education reform in the Arab world: the model of Qatar”, available at: www. | |
mei.edu/content/higher-education-reform-arab-world-model-qatar (accessed December 15, 2016). | |
Scott, L.C. (2015), “The futures of learning 2: what kind of learning for the 21st century?”, UNESCO | |
Educational Research and Foresight Working Papers, available at: http://unesdoc.unesco.org/ | |
images/0024/002429/242996E.pdf (accessed December 22, 2017). | |
Seymour, E. and Hewitt, N.M. (1997), Talking About Leaving: Why Undergraduates Leave the Sciences, | |
Westview, Boulder, CO. | |
Shu-Hui, H.C. and Smith, R.A. (2008), “Effectiveness of interaction in a learner-centered paradigm | |
distance education class based on student satisfaction”, Journal of Research on Technology in | |
Education, Vol. 40 No. 4, pp. 407-426. | |
Simsek, P. and Kabapinar, F. (2010), “The effects of inquiry-based learning on elementary students’ | |
conceptual understanding of matter, scientific process skills and science attitudes”, Procedia - Social and Behavioral Sciences, Vol. 2 No. 2, pp. 1190-1194.
Slavich, G.M. and Zimbardo, P.G. (2012), “Transformational teaching: theoretical underpinnings, basic | |
principles, and core methods”, Educational Psychology Review, Vol. 24 No. 4, pp. 569-608. | |
Smith, K.A., Douglas, T.C. and Cox, M. (2009), “Supportive teaching and learning strategies in STEM | |
education”, in Baldwin, R. (Ed.), Improving the Climate for Undergraduate Teaching in STEM | |
Fields. New Directions for Teaching and Learning, Vol. 117, Jossey-Bass, San Francisco, CA, | |
pp. 19-32. | |
Smith, M.K., Vinson, E.L., Smith, J.A., Lewin, J.D. and Stetzer, K.R. (2014), “A campus-wide study of | |
STEM courses: new perspectives on teaching practices and perceptions”, CBE Life Sciences | |
Education, Vol. 13, pp. 624-635. | |
Springer, L., Stanne, M.E. and Donovan, S.S. (1999), “Effects of small-group learning on | |
undergraduates in science, mathematics, engineering, and technology: a meta-analysis”, | |
Review of Educational Research, Vol. 69 No. 1, pp. 21-51. | |
Steinemann, A. (2003), “Implementing sustainable development through problem-based learning: | |
pedagogy and practice”, Journal of Professional Issues in Engineering Education and Practice, | |
Vol. 129 No. 4, pp. 216-224. | |
Walczyk, J.J. and Ramsey, L.L. (2003), “Use of learner-centered instruction in college science and | |
mathematics classrooms”, Journal of Research in Science Teaching, Vol. 40 No. 6, pp. 566-584. | |
Walter, E.M., Beach, A.L., Henderson, C. and Williams, C.T. (2015), “Measuring postsecondary teaching | |
practices and departmental climate: the development of two new surveys”, in Weaver, G.C., Burgess, D.,
Childress, A.L. and Slakey, L. (Eds), Transforming Institutions: Undergraduate STEM in the 21st
Century, Purdue University Press, West Lafayette, IN, pp. 411-428.
Walter, E.M., Henderson, C.R., Beach, A.L. and Williams, C.T. (2016), “Introducing the Postsecondary | |
Instructional Practices Survey (PIPS): a concise, interdisciplinary, and easy-to-score survey”, | |
CBE – Life Sciences Education, Vol. 15 No. 4, pp. 1-11. | |
Watkins, J. and Mazur, E. (2013), “Retaining students in science, technology, engineering, and | |
mathematics (STEM) majors”, Journal of College Science Teaching, Vol. 42 No. 5, pp. 36-41. | |
Weimer, M. (2002), Learner-Centered Teaching: Five Key Changes to Practice, Jossey-Bass, | |
San Francisco, CA. | |
Williams, C.T., Walter, E.M., Henderson, C. and Beach, A.L. (2015), “Describing undergraduate STEM | |
teaching practices: a comparison of instructor self-report instruments”, International Journal of | |
STEM Education, Vol. 2 No. 18, pp. 1-14, doi: 10.1186/s40594-015-0031-y. | |
Zhao, K., Zhang, J. and Du, X. (2017), “Chinese business students’ changes in beliefs and strategy use in | |
a constructively aligned PBL course”, Teaching in Higher Education, Vol. 22 No. 7, pp. 785-804, | |
doi: 10.1080/13562517.2017.1301908. | |
Appendix | |
Interview guidelines | |
(1) How do you understand/define SCL? What are important characteristics of SCL in your | |
opinion? | |
(2) What are your past experiences of using SCL? | |
(3) How do you see the role of instructor in an SCL environment, and in which ways is this role | |
descriptive of your current practice? | |
(4) What are your preferred assessment methods within your current teaching practices and | |
why? | |
(5) What should be the ideal assessment methods in an SCL environment? | |
(6) What are the challenges of practicing SCL in your current environment? | |
(7) In your opinion, what institutional supports are needed to implement SCL in Qatar? | |
Corresponding author | |
Saed Sabah can be contacted at: [email protected] | |
Enhancing Quality of Teaching in the Built Environment | |
Higher Education, UK | |
Muhandiramge Kasun Samadhi Gomis | |
School of Architecture and Built Environment, University of Wolverhampton, | |
Wolverhampton, UK, | |
Mandeep Saini | |
School of Architecture and Built Environment, University of Wolverhampton, | |
Wolverhampton, UK, | |
Chaminda Pathirage | |
School of Architecture and Built Environment, University of Wolverhampton, | |
Wolverhampton, UK, | |
Mohammed Arif | |
Architecture, Technology and Engineering, University of Brighton, Brighton, UK | |
Abstract | |
Purpose – Issues in the current Built Environment Higher Education (BEHE) curricula point to a critical need to enhance the quality of teaching. This paper aims to identify the need for best practice in teaching within BEHE curricula and to recommend a set of drivers to enhance current teaching practices in Built Environment (BE) education. The study focused on section one of the National Student Survey (NSS), “Teaching on my course”, with a core focus on improving student satisfaction, making the subject interesting, creating an intellectually stimulating environment, and challenging learners.
Methodology – The study used a mixed method: (1) a document analysis of feedback from undergraduate students, and (2) a closed-ended questionnaire administered to academics in the BEHE context. More than 375 items of student feedback were analysed to understand teaching practices in BE, and the findings fed forward into the development of the closed-ended questionnaire for 23 academics, including a Head of School, a Principal Lecturer, Subject Leads and Lecturers. The data were collected from the Architecture, Construction Management, Civil Engineering, Quantity Surveying and Building Surveying disciplines, representing the BE context. The data obtained from both instruments were analysed with content analysis to develop 24 drivers for enhancing the quality of teaching. These drivers were then modelled using the Interpretive Structural Modelling (ISM) method to identify their correlation and criticality to the NSS section one themes.
Findings – The study revealed 10 independent, 11 dependent and 3 autonomous drivers that facilitate best teaching practice in BEHE. The study further recommends that the drivers be implemented as illustrated in the level partitioning diagrams under each NSS section one theme to enhance the quality of teaching in BEHE.
Practical implications – The recommended set of drivers and the level partitioning can serve as a guideline for academics and academic institutions seeking to enhance the quality of teaching. They could further be used to improve student satisfaction and overall NSS results, and thereby the rankings of academic institutions.
Originality/Value – The ISM analysis and level partitioning diagrams of the recommended drivers offer new knowledge to assist academics and academic institutions in developing the quality of teaching.
Keywords – Enhancing Teaching Quality, Built Environment Higher Education, Learning in | |
post-COVID, National Student Survey (NSS), Teaching on my course. | |
Introduction | |
The United Kingdom’s Higher Education (HE) sector is focused on improving the | |
quality of teaching (Santos et al., 2020; Tsiligiris and Hill, 2019; Matthews and Kotzee, 2019). | |
HE providers continuously attempt to enhance learning standards by assuring teaching | |
developments within courses. Hence, knowledge providers make considerable efforts to | |
develop pedagogy within BE academia (Van Schaik et al., 2019). However, developing teaching within a specific discipline is challenging (Ovbiagbonhia et al., 2020; McKnight et al., 2016). Moreover, Tsiligiris and Hill (2019) and Welzant (2015) identified a notable knowledge gap in enhancing quality within current HE curricula. The global COVID
pandemic has exacerbated the challenges related to teaching and learning within higher | |
education (Allen et al., 2020). Both the learners and academics face challenges in maintaining | |
quality in HE, especially within the current focus on digitised and Virtual Learning | |
Environment (VLE) teaching (Arora and Srinivasan, 2020; Bao, 2020). This study explores | |
best practices to improve the quality of teaching across the Built Environment Higher | |
Education (BEHE). Thus, the study investigates section one of the NSS questionnaire, namely “The Teaching on my Course”. The main emphasis is on its four central themes, which ask whether “the staff is good at explaining things”, whether staff have “made the subject interesting”, whether “the course is intellectually stimulating”, and whether “the course has challenged students to achieve the best work”. Many contemporary learning and
teaching strategies are present in curriculum development (Tsiligiris and Hill, 2019). However, | |
a significant knowledge gap is present in identifying the best use of each theme under NSS | |
Section one and developing a best practice to enhance quality of teaching. The data obtained | |
by section one of NSS in 2019, 2020 and 2021 highlights the need to enhance teaching in the | |
BE curricula. The NSS records that satisfaction with “teaching on my course” fell by 6% against the average minimum scoring criteria in 2021 (Office for Students, 2020). It further shows that the 2021 average for NSS section one was 84% across all subjects, whereas BE scored only 79%. This score provides insight into how BE performs
compared to other subjects within the UK's HE context. Issues in teaching and the COVID | |
pandemic may have influenced the significant reduction in NSS score (Arora and Srinivasan, | |
2020; Allen et al., 2020). Therefore, this study aims to identify best practices and enhance the | |
quality of teaching in BEHE. | |
1.0 Literature Review | |
1.1 Explaining the subject | |
Increasing understanding in an area of expertise is vital to pedagogical education. The literature (Ferguson, 2012) suggests that teaching helps learners recognise cognition within human behaviour and gain insight into relevant information by relating it to exposure and experience of the subject area across various levels of learning. These levels of learning, in turn, lead students to consider their own academic development. Findings from Gollub
(2002) suggest that a better understanding of learning is facilitated around the concepts and | |
principles of the subject matter. Moreover, Andersson et al. (2013) highlight that students tend to generate more knowledge by acquiring prerequisite knowledge and utilising it to increase their understanding of the subject. In addition, Andersson et al. (2013) suggest that learners draw on prior learning to engage better with interactive learning. However, in a
classroom context, the multi-disciplinary orientation of BE makes it challenging to address | |
prerequisite knowledge and provide in-depth understanding to learners (Waheed et al., 2020; | |
Dieh et al., 2015). Thus, BE academics need to devise module delivery that aligns with the subject area while building on learners' previous knowledge to enhance it.
Moreover, Lai (2011) and McKnight et al. (2016) stressed the importance of interactive | |
learning within pedagogical education. These studies highlight that learners find knowledge more constructive when it is peer-reviewed; providing an environment that supports this is therefore essential to enhancing BE understanding. Moreover, Guo and Shi (2014) explain how collaboration increases understanding through active strategies. However, Guo and Shi (2014) overlook the way innovation embedded in learning effectively brings collaboration and utilises modern approaches within the classroom context.
Furthermore, the current pandemic has encouraged active strategies such as blended learning | |
and digitised technologies (Allen et al., 2020) within a VLE. However, challenges were | |
identified in the definitive use of VLE, which did not advocate sub-teaching concepts such as | |
interactive learning and context-based knowledge (Waheed et al., 2020). The "silent" | |
classrooms are not appropriate for the transfer and sharing of technical knowledge in BEHE. | |
Ultimately, the prospect of better teaching rests on innovative approaches and on the extent to which VLEs are used to make a subject interesting, fostering interactive learning through the co-creation of knowledge and promoting a clear explanation of a BE subject.
1.2 Making the Subject Interesting | |
Students do not engage in situations where they no longer see value or interest in the content taught (Fraser, 2019). Lozano et al. (2012) state that analytical competency is achieved when the theory taught is relevant to industrial capacity, creating a platform for students to participate in learner engagement. Both Fraser (2019) and Lozano et al. (2012) suggest that collaboration between academics and learners is significant to active learner engagement and to developing interest in the subjects learnt. Therefore, engagement and collaboration are considered the most critical challenges in an active learning environment (Hue and Li, 2008; Scott, 2020). However, a knowledge gap exists in measuring collaboration that shows competitive
learning and the cooperation of learners with the academic. The social, psychological, and | |
academic characteristics build learners’ perception of collaborative work (Uchiyama and | |
Radin, 2008). Of these, Hmelo-Silver et al. (2008) established the importance of the social dimension of collaboration, suggesting that the benefits of social support come from establishing a positive atmosphere within collaborative learning. Implementing a collaborative approach to learning also enhances diversity within BEHE.
Furthermore, engagement benefits learners' psychological well-being, which is reflected in academic performance and mental health (Clough and Strycharczyk, 2012). It signifies student-centric education that develops students' self-esteem, thus increasing interest in the subject. Secondary elements in BE teaching, such as site visits, guest lectures and other innovative concepts, can serve as examples (Van Schaik, 2019). Although Clough and Strycharczyk (2012) consider psychological characteristics, their study does not recognise the prominence of the critical thinking gained from collaboration. Bye et al. (2007) imply that critical thinking is needed to make content more meaningful and collaborative. However, collaborative teaching methods received limited consideration in teaching during the COVID pandemic (Blundell et al., 2020); thus, more research is needed to identify means of developing learner engagement in VLEs. In addition, this study identifies that learner engagement and fostering collaboration demand stimulating learners and making the subject interesting. However, a significant knowledge gap exists in applying these findings to make a subject interesting in the current BEHE context.
1.3 Intellectual stimulation of learners | |
Studies identify that learners become stimulated when the subject is interesting and are motivated to overcome the challenging nature of the course structure (Bolkan and Goodboy,
2010). Moreover, student motivation and intellectual stimulation increase when subject matter | |
reflects learner interests (Baeten et al., 2010). Furthermore, intellectual stimulation improves | |
when academics provide authentic, current, industry-related practices relevant to learners’ | |
academic learning. Bolkan et al. (2011) suggested implementing active learning to enhance | |
learners’ intellectual effort. Thus, intellectual stimulation needs to be integrated through | |
problem-solving teaching methods, context-based learning, realistic case studies and setting | |
clear expectations and motivation for student excellence. | |
Chickering and Gamson (1999) suggested that summarising ideas, reviewing problems, | |
assessing the level of understanding and concluding on learning outcomes at the end of a | |
learning session stimulates learners. Furthermore, Tirrell and Quick (2012) outlined | |
opportunities to direct learners by contrasting fundamental theories and applying theory to real | |
life. However, the researchers overlook the fact that stimulation could be provided outside the | |
learning environment. Current practice within BE academia involves guest lectures and site visits to extend learning beyond the classroom and stimulate learners (Chen and Yang, 2019). Furthermore, the Educational Development Association (2013) highlights the influence of Professional Standards and Regulatory Bodies (PSRBs) within BE learning. The involvement of PSRBs helps ensure that the knowledge delivered is industry-appropriate. In addition to making the subject interesting, PSRBs further stimulate the learner to develop academic skills and competencies. Thus, learners come to see the industry standard reflected in the theories taught, which supports intellectual stimulation. Nonetheless, these strategies have been disrupted by the COVID pandemic's measures for virtual module delivery (Allen et al., 2020), so they have had to be integrated into digitised platforms and VLE teaching methods. A measure of best practice is therefore needed when considering how VLE strategies can address the COVID situation and further development of the BEHE curriculum.
The stimulation provided at the elementary level in BE learning is vital for interaction | |
between the learners (Jabar and Albion 2016). The collaboration between learners and | |
knowledge providers is vital for intellectual stimulation, and the use of concepts such as VLE | |
further promotes stimulation (Block, 2018; Marshalsey and Madeleine, 2018). However, | |
identifying fundamental digitisation approaches and innovative teaching methods such as | |
blended learning or flipped classroom will signify the commitment toward stimulated learning. | |
Stimulation through quizzes and experimental studies will improve the clarity of knowledge | |
provided through VLE. In addition, stimulation in a VLE through various digital learning | |
strategies for students can promote challenging learners. However, some views on the current | |
teaching practices in the COVID era denote that VLE is not the perfect solution for academic | |
development (Bao, 2020). Academics need to know to what extent VLE should be integrated | |
and how the best practice in BE teaching should be developed. | |
1.4 Challenging Learners | |
Knowledge providers who promote intellectual stimulation create a challenging | |
learning environment that empowers the learners and promotes cognitive and affective learning | |
(Bolkan and Goodboy, 2010). Kohn Rådberg et al. (2018) argue that intellectual stimulation depends on the intrinsic motivation to be challenged in critical learning contexts. Thus, learners need encouragement to recognise the intellectual stimulus in the knowledge gained through HE curricula. Altomonte et al. (2016) explain how learners persist in their
learning process much longer in a challenging environment than in a traditional learning | |
environment. A plethora of more contemporary literature (Avargil et al., 2011; Chen and Yang, | |
2019) addresses specific learning strategies such as project-based and context-based learning, | |
which acts as a stimulus in developing challenging environments in the current BE learning | |
context. | |
A study carried out by Han and Ellis (2019) offers a detailed account of deep approaches to learning and 'higher learning outcomes'. However, it fails to identify the relationship between challenging learners and the impact of such challenge on academic and cognitive learning
strategies. Learners often respond more to challenges made via competitive elements such as | |
quizzes, polls, and other simpler assessments in module delivery (Chen and Yang, 2019). It is | |
vital to understand that a challenging learning environment is not a mere self-testing method | |
for assessment in curricula but rather an instrument for continuous academic improvement | |
(Darling-Hammond et al., 2019). Further, learners will benefit from self-preparing concerning | |
the knowledge content discussed in the classroom. It further influences advanced knowledge | |
gained through research rather than knowledge transmission provided in the classroom. | |
Challenging learners creates more opportunities to collaborate and increases intellectual stimulation (Boud et al., 2018; Gomis et al., 2021). However, Boud et al. overlook the counter-motivation that challenge can create in learners, which can itself result in innovation. Furthermore, challenging students helps them to recognise stimulation and to make informed judgments about their academic experience. By challenging the learner, the academic can evaluate aptitude and growth (Hamari et al., 2016). The current practice in academia during
the COVID pandemic deemed the use of VLE in setting out quizzes and other evaluation | |
methods to stimulate and challenge learners (Block, 2018; Bao, 2020). Hence, using digitised | |
platforms in an active learning environment is paramount in advancing teaching in BE. | |
However, these VLE instruments could be further integrated with the module delivery plan to | |
optimise challenging learners and enhance academic development. | |
2.0 Methodology | |
2.1 Participants & Materials | |
‘Teaching on my course’ of the NSS questionnaire emphasises four questions related | |
to ‘explain things, make the subject interesting, create an intellectually stimulating | |
environment, and challenge the learners’. Documental analysis and questionnaire surveys with | |
separate samples were identified as the research tools best suited to the study. Document analysis was adopted to analyse a sample of 375 Mid-Module Reviews (MMRs) from students at levels three to six, in light of the findings from the literature on the four questions in NSS section one. The documental data were categorised into themes in which students identified how the teaching helped them and the key elements that were positive about the module. The analysis uses 375 samples, assuming a confidence level of 95% and a margin of error of 5%.
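The stated confidence level and margin of error can be sanity-checked with the standard Cochran sample-size formula. The following is a minimal Python sketch; the cohort size passed to the function is purely hypothetical, since the paper does not report the size of the underlying student population.

import math

def cochran_sample_size(z=1.96, margin=0.05, p=0.5, population=None):
    # Infinite-population estimate: n0 = z^2 * p * (1 - p) / e^2 (about 385 at 95% / 5%).
    n0 = (z ** 2) * p * (1 - p) / margin ** 2
    if population is None:
        return math.ceil(n0)
    # Finite-population correction for a cohort of known size.
    return math.ceil(n0 / (1 + (n0 - 1) / population))

print(cochran_sample_size())                  # 385 with no population cap
print(cochran_sample_size(population=15000))  # 375 for an assumed cohort of 15,000 students

Under these assumptions, a 375-review sample is consistent with a 95% confidence level and a 5% margin of error once a finite student population is taken into account.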
The themes identified from the documental analysis were used to identify and develop | |
the survey framework and questionnaire conducted for the academics. The closed-ended | |
questionnaire survey refined the documental data findings and established the gap between the | |
existing and best practices. Departments of Architecture, Construction Management, Civil | |
Engineering, Quantity Surveying, and Building Surveying were selected to represent the BE discipline and to obtain valid and reliable data, giving a survey sample of 20 academics. Four academics were selected from each discipline based on their title: a Professor/Reader, two Senior Lecturers and a Lecturer. This approach yielded four participants from each BE discipline. Additionally, three participants,
a Head of the school, a Principal lecturer, and a Subject lead, were included, bringing the | |
sample size to 23 participants. A critical focus of the latter three participants was to eliminate | |
unconscious bias in feedback received from students and endorse validity, reliability and | |
transferability of the data collected and modelled through ISM analysis. The data obtained from | |
the questionnaire assisted in developing the drivers in enhancing the best practice of teaching | |
in the BEHE context. | |
2.2 Research Procedure | |
A systematic approach to data collection incorporating the literature review, document | |
analysis, and questionnaire survey has allowed an in-depth understanding of current BEHE | |
teaching and learning. The substantial data collected from documental analysis and | |
questionnaire survey needed to be correlated with the NSS theme establishing relationships on | |
improving BEHE teaching and learning. Thus, the data was modelled using the Interpretive | |
Structural Modelling (ISM) tool to find critical drivers and correlation of each driver to the | |
theme of NSS section one. The drivers identified through the data analysis were used in the | |
ISM analysis. Afterwards, a reachability matrix was developed from modelling the drivers | |
through a “Structural Self-Interaction Matrix” (SSIM). A “Matrice d’Impacts Croisés Multiplication Appliquée à Classement” (MICMAC) analysis was then developed to identify which factors need to be emphasised in enhancing teaching strategies, ascertaining the degree of the
relationships between the drivers found through SSIM. The MICMAC enabled categorising | |
data obtained into independent, dependent and autonomous clusters to establish a best practice | |
framework for teaching enhancement in BEHE. The data derived from each analysis was | |
factored in when developing the level partitioning of each driver. Moreover, the ISM level | |
partitioning illustrated a critical correlation of each driver under NSS themes and emphasised | |
implications in the BEHE context. Finally, this study's general conclusions are drawn from the | |
level partitioning and presented as the recommended strategies for developing teaching | |
enhancement in BEHE. | |
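The ISM steps described above can be illustrated with a short Python sketch. The four-driver matrix below is invented purely for illustration (the study itself modelled 24 drivers), and splitting the MICMAC axes at half the driver count is one common convention rather than necessarily the authors' exact rule.

import numpy as np

# Invented initial reachability matrix for four hypothetical drivers
# (entry [i, j] = 1 means "driver i influences driver j"); the study used 24 drivers.
initial = np.array([
    [1, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 1, 0],
    [0, 0, 1, 1],
])

# Final reachability matrix: keep adding transitive links (i -> k and k -> j
# implies i -> j) until nothing changes.
reachability = initial.copy()
while True:
    updated = (((reachability @ reachability) > 0).astype(int)) | reachability
    if np.array_equal(updated, reachability):
        break
    reachability = updated

driving_power = reachability.sum(axis=1)  # how many drivers each driver reaches
dependence = reachability.sum(axis=0)     # how many drivers reach each driver

# MICMAC clustering; both axes are split at half the driver count here.
midpoint = reachability.shape[0] / 2
for idx, (drv, dep) in enumerate(zip(driving_power, dependence), start=1):
    if drv > midpoint and dep <= midpoint:
        cluster = "independent"  # strong driver, weakly driven by others
    elif drv <= midpoint and dep > midpoint:
        cluster = "dependent"    # mostly driven by other drivers
    elif drv <= midpoint and dep <= midpoint:
        cluster = "autonomous"   # weakly connected to the system
    else:
        cluster = "linkage"      # both driving and driven
    print(f"D{idx}: driving power {drv}, dependence {dep} -> {cluster}")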
3.0 Analysis | |
Three hundred and seventy-five (375) MMRs (Mid-Module Reviews) were examined. Students were asked three questions: how the module is progressing, what is good or bad, and suggestions to improve module delivery. Academics made a subjective evaluation of the reviews provided, and themes were identified in the students' suggestions. This
evaluation identifies 24 drivers directly influencing the teaching practices highlighted by the | |
four NSS questions. The identified drivers were collated and categorised into the specific NSS | |
questions/themes, and an ISM analysis was carried out. A pair-wise relationship is mapped to | |
the Structural self-interaction matrix (SSIM) using a binary matrix based on the above data | |
gathered through the closed-ended questionnaire survey from the teaching staff. The binary | |
matrix was used to create the MICMAC graph in recognising the influential drivers that | |
enhance HE teaching. Furthermore, a level partitioning was carried out to find the interrelationship of each driver and recognise the sequential order of implications within the BEHE | |
context. Drivers with the characteristics of the independent cluster are considered fundamental to the system and extremely important for enhancing teaching in BEHE. Drivers with the characteristics of the dependent cluster are considered necessary for accommodating the independent drivers; thus, dependent drivers directly influence planning and module development rather than being fundamental to teaching. Drivers with the characteristics of the autonomous cluster are considered fundamentally unimportant to the system.
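The level partitioning mentioned above is the standard ISM step of repeatedly extracting drivers whose reachability set, restricted to the drivers still unassigned, is contained in their antecedent set. A minimal sketch, again using an invented reachability relation rather than the study's 24-driver data:

# Driver -> set of drivers it reaches (including itself); invented for illustration only.
reachability_sets = {
    "D1": {"D1", "D2", "D3", "D4"},
    "D2": {"D2", "D3", "D4"},
    "D3": {"D3"},
    "D4": {"D3", "D4"},
}

levels = []
remaining = set(reachability_sets)
while remaining:
    # Antecedent set: every remaining driver that reaches this one.
    antecedent = {d: {a for a in remaining if d in reachability_sets[a]} for d in remaining}
    # A driver is assigned to the current level when its (remaining) reachability
    # set is contained in its antecedent set.
    current = {d for d in remaining
               if (reachability_sets[d] & remaining) <= antecedent[d]}
    levels.append(sorted(current))
    remaining -= current

for number, drivers in enumerate(levels, start=1):
    print(f"Level {number}: {', '.join(drivers)}")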
The study reveals that critical emphasis needs to be given to promote active learning | |
and provide in-depth understanding when the academic explains module content. Promoting | |
collaboration, student engagement and focussing on student-centric approaches occurred in the | |
independent cluster to make the subject interesting. Promoting intellectual stimulation by | |
enhancing interaction between the learner and the academic was considered fundamental in | |
enhancing active learner stimulation. Challenging the learner by providing motivation, | |
promoting self-assessment for continuous improvement, challenging learning culture through | |
learner motivation and helping the learner develop an action plan for career progression were illustrated in the independent cluster, so these drivers are deemed fundamental. Thus,
implementing these drivers would facilitate the best practice in HE teaching. | |
Furthermore, dependent drivers identified through the study will be beneficial in | |
facilitating the independent drivers mentioned above. An interim assessment opportunity and | |
guidance given through a formative feedback session were recognised as dependent drivers in | |
explaining the module content. Use of various media in explaining the subject content, | |
executing cognitive approaches, arranging site visits (where applicable) or site walk-throughs, | |
guest lecturers, augmentation in lecture material, and presenting real-world examples in | |
lectures were identified as dependent drivers in making the subject interesting. Intellectual | |
stimulation by challenging learners in problem-based learning and assessment guidance | |
through assessment rubrics and question-based learning were identified under the dependent | |
cluster. Contrary to widespread belief, revisiting previous knowledge and reflecting on module | |
content with the pathway provided by PSRB in explaining module content and reflecting more | |
on the industry-led practices in intellectually stimulating students were in the autonomous | |
cluster. However, this is not because these drivers have little influence on the system, but because they are facilitated by other (both dependent and independent) drivers.
To generalise the critical findings from the MICMAC analysis, the following Table 1 | |
illustrates the fundamental drivers (independent), facilitating drivers (dependent), and non-influential/already accommodated drivers (autonomous) in enhancing teaching in HE. The
drivers are categorised into the four performance indicators depicted by Section 1 of the NSS | |
to clarify and ease interpretation. Thus, academics and academic institutions can implement | |
these drivers to promote teaching practices within BEHE. | |
Table 1: Categorisation of Drivers
Section 1: The teaching on my course
(MICMAC categorisation is shown in parentheses after each driver.)

Q1 – Staff is good at explaining things
D1 - Promoting active learning (Independent)
D2 - Providing an in-depth understanding (Independent)
D3 - Revisiting previous knowledge (Autonomous)
D4 - Interim assessment opportunity (Dependent)
D5 - Guidance given through formative feedback session (Dependent)
D6 - Reflecting module content with the pathway provided by PSRB (Autonomous)

Q2 – Staff have made the subject interesting
D7 - Promoting collaboration (Independent)
D8 - Focussing on student-centric approaches (Independent)
D9 - Promoting student engagement (Independent)
D10 - Use of a variety of media in explaining the subject content (Dependent)
D11 - Executing cognitive approaches (Dependent)
D12 - Arranging site visits (where applicable) or site walk-throughs (Dependent)
D13 - Guest lecturers (Dependent)
D14 - Augmentation in lecture material (Dependent)
D15 - Presenting real-world examples in lectures (Dependent)

Q3 – The course is intellectually stimulating
D16 - Promoting intellectual stimulation (Independent)
D17 - Enhance interaction between the learner and the academic (Independent)
D18 - Reflecting more on industry-led practices (Autonomous)
D19 - Challenging learners in problem-based learning (Dependent)

Q4 – My course has challenged me to achieve my best work
D20 - Promoting self-assessment for continuous improvement (Independent)
D21 - Challenging learning culture through learner motivation (Independent)
D22 - Assessment guidance through assessment rubrics (Dependent)
D23 - Question-based learning (Dependent)
D24 - Having an action plan on career progression (Independent)

SSIM coordinates (i, j): 10, 17, 2, 11, 13, 24, 6, 6, 13, 10, 10, 7, 10, 13, 6, 10, 21, 14, 15, 15, 5, 11, 18, 17, 4, 4, 18, 2, 19, 4, 9, 19, 8, 15, 11, 9, 13, 11, 9, 13, 7, 19, 12, 16, 6, 11, 5, 20
4.0 Discussion and Recommendations | |
This study recognises the significant need to enhance quality of teaching in BEHE. | |
Both the literature and primary data collection recognised a substantial number of suggestions | |
for enhancing teaching practices. The strategies/drivers obtained from primary and secondary | |
data are categorised into themes and analysed according to their influence/driver capability | |
with questions put forth by NSS section 1. The outcome of the discussion will be the level | |
partitioning of the identified drivers, which illustrates how they can be implemented to increase the quality of HE teaching. The section below discusses the identified drivers and their correspondence with the NSS themes under section one.
4.1 Explaining the subject
Explaining the subject ultimately depends on how the learner clarifies the knowledge criteria. Gollub (2002), Ferguson (2012), and McKnight et al. (2016) show that active learning
is highly dependent on the levels of understanding. Providing a higher understanding of the | |
subject matter, the context of knowledge transferred, revisiting the experience learnt and | |
promoting interactive learning are critical academic performance enhancers (McKnight et al., | |
2016; Guo and Shi, 2014; Eames and Birdsall, 2019). The level partitioning developed from | |
the research findings shown in figure 1 below identifies that revisiting knowledge (D3) and | |
reflecting on the PSRB pathway (D6) were the lowest priority, at Level III. Even though they are at Level III, they will aid other drivers with in-depth understanding (D2) to better explain
module content. Both literature (Lozano et al., 2012; Ovbiagbonhia et al., 2020) and data state | |
that the module leader needs to identify how to merge academic and professional competency | |
gaps in providing an in-depth understanding of BE curricula. However, the research findings | |
highlight the importance of the availability of interim assessment guidance. The use of interim | |
assessment opportunities (D4) and guidance given through formative feedback (D5) should be | |
considered significant in developing the module. The emphasis is on module leaders and academics, who need to develop and deliver module content that facilitates formative
assessment/feedback. The study identifies that promoting active learning and in-depth | |
understanding are fundamental, sitting at Level I in enhancing knowledge delivery. Current studies (Allen et al., 2020) regard pedagogic theories and platforms such as the VLE, which promote active learning by using quizzes and other media to engage students, as the best strategies for enhancing active learning.
Figure 1: Level partitioning of Drivers on NSS Q1 - Staff is good at explaining the subject | |
4.2 Making the subject interesting
The literature establishes that the learning culture of the modern-day classroom has | |
evolved. Hue and Li (2008) and Hmelo-Silver et al. (2008) identified the core context of | |
collaboration and its effect on subject engagement. The widespread view that the current pedagogical paradigm of digitised practices promotes collaboration (Siew, 2018; Hamari et al., 2016) shapes authentic, industry-related content, especially within the BE curricula.
Moreover, the literature review identifies that BE knowledge providers promote digitised | |
learning concepts in HE. Findings from primary data also recognise approaches in | |
accommodating augmented concepts and focusing on digitised learning environments | |
facilitating such learning. The level partitioning developed from the research findings shown | |
in figure 2 below illustrates both facilitating drivers and fundamental drivers. The facilitating | |
drivers are: execute cognitive approaches (D11), arrange site visits or site walk-throughs (D12), | |
guest lecturers (D13), augmentation/digitisation in lecture material (D14), and presenting real-world examples in lectures (D15). Since these drivers are positioned at Level II, they (D10 to D15) are considered to facilitate the fundamental drivers of module delivery. However, it
is identified that D13, D10 and D14 facilitate each other and help facilitate D11 and D15, which | |
facilitate D7 and D9, respectively. The study further strengthens the argument that promoting | |
student collaborations (D7), engagement (D9) and focussing on student-centric approaches | |
(D8) are fundamental in making the subject content interesting. It further revealed that both D7 | |
and D9 facilitated D8 in making the subject interesting. The ISM level partitioning positioned | |
them at Level I due to their fundamental influence in making the subject interesting. | |
A critical finding from the study is that using a variety of media (D10) to explain the | |
subject brings innovation to the classroom. The research findings signify that digitisation must | |
be considered a key facilitator but not a fundamental element in pedagogic development. | |
Further to the evidence of earlier studies, blended learning and flipped classroom techniques | |
are considered paramount in carrying out collaborative knowledge in group learning (Allen et | |
al., 2020). Documental analysis insists on combining traditional and digitised media to deliver | |
module content. Findings from documental analysis reveal that students prefer traditional | |
module delivery aligning with digitised recordings for revisiting knowledge. Thus, digitisation | |
needs to be a facilitator rather than being promoted to a fundamental driver in teaching HE. It | |
is further applicable to the current COVID learning context, where online learning has | |
dominated pedagogical implementation (Bao, 2020). This study presents critical evidence that digitisation is not in itself the answer to enhancing teaching practices but rather an opportunity to facilitate the independent drivers in enhancing HE learning.
Figure 2: Level partitioning of Drivers on NSS Q2 - Staff have made the subject interesting | |
4.3 Intellectual stimulation of learners
Baeten et al. (2010), Bolkan et al. (2011), and Jabar and Albion (2016) identify that intellectual stimulation is critical to HE student progression. Both the literature (Baeten et al., 2010; Bolkan et al., 2011) and the research findings reveal that straightforward 'lecturing', where knowledge is pushed to the learner with little reflection or context, is considered adverse to academic progress and performance. The data and the literature (Van Schaik, 2019) do not support adopting industry-led practices (D18) as the way to deliver module content, thus positioning this driver at Level III. The findings reveal that this is because drivers such as site visits, guest lectures, and a focus on real-world context were already adopted to make the subject interesting. However, these drivers are prominent in challenging learners through problem-based (D19) and industry-led contexts in learning. Tirrell and Quick (2012) and Jabar and
Albion (2016) further emphasised innovative teaching and effective teaching methods, such as | |
problem-based learning (D19). However, the research findings emphasise that such practice is | |
not fundamental but crucial in increasing intellectual stimulation since it is positioned at Level | |
II in the ISM level partitioning. Nevertheless, the analysis recognises the influence of D19 in facilitating both
D16 and D17. The study emphasises intellectual stimulation (D16) in module development and | |
that enhancing learner-academic interaction (D17) is fundamental and is self-facilitating to | |
make the course intellectually stimulating. The ISM level partitioning has positioned them in | |
Level I, which denotes fundamental influence over intellectual stimulation. The findings | |
further show the benefits of utilising digitised tools or in-class activities to promote intellectual | |
stimulus, especially within the COVID pandemic (Arora and Srinivasan 2020) and for | |
disciplines such as BE, where a vast knowledge content (e.g. architectural, engineering, | |
surveying and management) needs to be reflected. | |
Figure 3: Level partitioning of Drivers on NSS Q3 - The course is intellectually stimulating | |
4.4 Challenging Learners
The literature review (Darling-Hammond et al., 2019; Boud et al., 2018) identifies that challenging students could increase the probability of academic progression. However,
Kohn Rådberg et al. (2018) stressed the deficiencies in academic progression regarding the | |
lack of motivation and drivers, which does not aid intellectual stimulation. The literature | |
provides many strategies for promoting a challenging culture within the learning environment; | |
however, the surplus of theories makes the implementation complicated and time-consuming | |
(Boud et al., 2018; Bolkan, 2010). Assessment guidance through assessment rubrics (D22) and | |
question-based learning (D23) sit at Level II in the ISM level partitioning. Contradicting the
literature (Ellis and Hogard, 2018), the research findings illustrate that D22 and D23 were not | |
fundamental to challenging students but influential in facilitating D21 in enabling students to | |
achieve their best work. Also, this could be due to digitalisation being a prominent aspect in | |
enabling these drivers within the HE curriculum. This study identifies the fundamental drivers as promoting self-assessment opportunities (D20), motivating the student through a challenging culture of knowledge provision (D21), and developing an action plan for career progression/continuous improvement (D24), all positioned at Level I in the ISM analysis. It further
highlights that D21 and D24 facilitate D20, promoting continuous student improvement. Thus, | |
the analysis deems that the module leader/lead academic needs to consider the self-assessment | |
techniques, challenging learning culture, and action plan for career development in developing | |
the module and enhancing teaching in HE. | |
Figure 4: Level partitioning of Drivers on NSS Q4 - Course has challenged to achieve the best work | |
6.0 Conclusions | |
This study establishes drivers to enhance the quality of teaching in BEHE across the range of students, reflecting on the results of section 1 of the NSS. The findings are novel as the
study discusses drivers and illustrates implementation to improve quality of teaching within | |
the four NSS themes. The main findings from the literature review reveal significant room for improvement in teaching and pedagogy to enhance student performance in BEHE. The
practical implications of this study are that the identified drivers could help academics and | |
students increase understanding in conjunction with the lectures that deliver in-depth | |
knowledge through practical sessions. As illustrated in the figures, the level partitioning will | |
enable academics to focus on significant pedagogical themes and enforce strategies. As the | |
theme refers to the NSS guidelines, the drivers developed could assist HE institutions in | |
obtaining better results for the NSS survey. Finally, the combined set of figures could form a | |
framework for enhancing quality of teaching within HE curricula. | |
The suggestions for student engagement, developing a stimulating learning | |
environment, and challenging students need various collaborative online and face-to-face | |
teaching approaches. The literature played another critical part in providing context on module
background and content. Drivers further reinforced that promoting active learning and in-depth | |
understanding was fundamental in improving teaching in the BEHE context. Moreover, the | |
study's primary data showed that teaching and learning, resources, standards, and assessments could provide a better understanding to students and could be further facilitated by the above-mentioned independent drivers.
In contrast, this interpretation shows that implementing innovative practices in knowledge transfer, such as blended learning, the flipped classroom and group learning, is vital for stimulating
learners. Promoting collaboration, student engagement, and focusing on student-centric | |
approaches were considered independent, but these drivers facilitate other drivers in making | |
the subject interesting. Moreover, promoting intellectual stimulation, enhancing interaction | |
between the learner and the academic, promoting self-assessment for continuous improvement, | |
challenging learning culture through learner motivation, and having an action plan on career | |
progression are recognised as independent drivers in advancing teaching in BEHE. | |
The study identified several dependent factors, such as aligning the module content | |
with the PSRB requirements and emphasising personal and career development benefits. | |
However, the current learning practices need to be integrated with the online delivery platforms | |
to provide knowledge and challenge learners for better learning practice. Enforcing quizzes | |
and real-world examples through a digital platform proves vital in helping independent drivers | |
for intellectual stimulation and challenging the learner for an active learning atmosphere. | |
Finally, a unique finding is that online delivery in the current situation (COVID-19) brings more challenges since lectures are either blended or delivered online. All the independent and dependent drivers for engaging students, increasing understanding, inspiring
and challenging learners remain unchanged. The current situation also demands training for | |
the lecturers on various tools that can help engage, challenge, stimulate, and increase the | |
learners' understanding. However, the lecturers may now need to use multimedia tools to | |
accommodate the suggestions from this study and facilitate the independent drivers to enhance | |
quality of teaching in BEHE. Further research could be carried out by involving a higher | |
sample from different HE institutes around the globe to develop a global framework. Also, | |
further research is needed to reflect on how quality of teaching influences student learning | |
opportunities, assessment and feedback, academic support, and learning resources. | |
Acknowledgement | |
The data obtained for this paper were based on a project guided by a steering
committee within the University of Wolverhampton, chaired by Professor Mohammed Arif. | |
Among the committee members, credit needs to be given to Dr David Searle, Dr Alaa Hamood | |
and Dr Louise Gyoh for their significant input on the data collection. Furthermore, the student | |
and academic participants at the University of Wolverhampton need recognition for their | |
insightful comments. | |
7.0 Reference List | |
Allen, J., Rowan, L. and Singh, P., 2020. Teaching and teacher education in the time of COVID-19. | |
Asia-Pacific Journal of Teacher Education, 48(3), pp.233-236. | |
Altomonte, S., Logan, B., Feisst, M., Rutherford, P. and Wilson, R. (2016). Interactive and situated | |
learning in education for sustainability. International Journal of Sustainability in Higher | |
Education, 17(3), pp.417-443. | |
Andersson, P., Fejes, A. and Sandberg, F., 2013. Introducing research on recognition of prior learning. | |
International Journal of Lifelong Education, 32(4), pp.405-411. | |
Arora, A. and Srinivasan, R., 2020. Impact of Pandemic COVID-19 on the Teaching – Learning | |
Process: A Study of Higher Education Teachers. Prabandhan: Indian Journal of Management, | |
13(4), p.43. | |
Avargil, S., Herscovitz, O. and Dori, Y., 2011. Teaching Thinking Skills in Context-Based Learning: | |
Teachers’ Challenges and Assessment Knowledge. Journal of Science Education and | |
Technology, 21(2), pp.207-225. | |
Baeten, M., Kyndt, E., Struyven, K. and Dochy, F., 2010. Using student-centred learning environments | |
to stimulate deep approaches to learning: Factors encouraging or discouraging their | |
effectiveness. Educational Research Review, 5(3), pp.243-260. | |
Bao, W., 2020. COVID ‐19 and online teaching in higher education: A case study of Peking University. | |
Human Behavior and Emerging Technologies, 2(2), pp.113-115. | |
Block, B., 2018. Digitalization in engineering education research and practice. 2018 IEEE Global | |
Engineering Education Conference (EDUCON). | |
Blundell, C., Lee, K. and Nykvist, S., 2020. Moving beyond enhancing pedagogies with digital | |
technologies: Frames of reference, habits of mind and transformative learning. Journal of | |
Research on Technology in Education, 52(2), pp.178-196. | |
Bolkan, S. and Goodboy, A. (2010). Transformational Leadership in the Classroom: The Development | |
and Validation of the Student Intellectual Stimulation Scale. Communication Reports, 23(2), | |
pp.91-105. | |
Bolkan, S., Goodboy, A., and Griffin, D. (2011). Teacher Leadership and Intellectual Stimulation: | |
Improving Students' Approaches to Studying through Intrinsic Motivation. Communication | |
Research Reports, 28(4), 337-346. doi: 10.1080/08824096.2011.615958 | |
Boud, D., Ajjawi, R., Dawson, P. and Tai, J. (2018). Developing Evaluative Judgement in Higher | |
Education. 1st ed. London: Routledge. | |
Bowen, T. (2017). Assessing visual literacy: a case study of developing a rubric for identifying and | |
applying criteria to undergraduate student learning. Teaching in Higher Education, 22(6), | |
pp.705-719. | |
Bye, D., Pushkar, D., and Conway, M. (2007). Motivation, Interest, and Positive Affect in Traditional | |
and Nontraditional Undergraduate Students. Adult Education Quarterly, 57(2), 141-158. doi: | |
10.1177/0741713606294235 | |
Chen, C. and Yang, Y., 2019. Revisiting the effects of project-based learning on students’ academic | |
achievement: A meta-analysis investigating moderators. Educational Research Review, 26, | |
pp.71-81. | |
15 | |
Chickering, A. W., & Gamson, Z. F. (1999). Development and adaptations of the seven principles for | |
good practice in undergraduate education. New Directions for Teaching and Learning, 80, 75– | |
81. | |
Clough, P., and Strycharczyk, D. (2012). Developing mental toughness (1st ed.). London: KoganPage. | |
Darling-Hammond, L., Flook, L., Cook-Harvey, C., Barron, B. and Osher, D., 2019. Implications for | |
educational practice of the science of learning and development. Applied Developmental | |
Science, 24(2), pp.97-140. | |
Dieh, M., Lindgren, J. and Leffler, E., 2015. The Impact of Classification and Framing in | |
Entrepreneurial Education: Field Observations in Two Lower Secondary Schools. Universal | |
Journal of Educational Research, 3(8), pp.489-501. | |
Eames, C., and Birdsall, S. (2019). Teachers’ perceptions of a co-constructed tool to enhance their | |
pedagogical content knowledge in environmental education. Environmental Education | |
Research, 1-16. doi: 10.1080/13504622.2019.1645445 | |
Ellis, R. and Hogard, E., 2018. Handbook of Quality Assurance for University Teaching, Routledge, | |
London. | |
Ferguson, R. (2012). Learning analytics: drivers, developments and challenges. International Journal of | |
Technology Enhanced Learning, 4(5/6), 304. doi: 10.1504/ijtel.2012.051816 | |
Fram, S., and Margolis, E. (2011). Architectural and built environment discourses in an educational | |
context: the Gottscho and Schleisner Collection. Visual Studies, 26(3), 229-243. doi: | |
10.1080/1472586x.2011.610946 | |
Fraser, S., 2019. Understanding innovative teaching practice in higher education: a framework for | |
reflection. Higher Education Research & Development, 38(7), pp.1371-1385. | |
French, A. and O'Leary, M. (2017). Teaching Excellence in Higher Education:|b Challenges, Changes | |
and the Teaching Excellence Framework. Bingley: Emerald Publishing Limited. | |
Gollub, J. (2002). Learning and understanding. Washington, DC: National Academy Press. | |
Gomis, K., Saini, M., Pathirage, C. and Arif, M., 2021. Enhancing learning opportunities in higher | |
education: best practices that reflect on the themes of the national student survey, UK. Quality | |
Assurance in Education, 29(2/3), pp.277-292. | |
Guo, F. and Shi, J. (2014). The relationship between classroom assessment and undergraduates' learning | |
within Chinese higher education system. Studies in Higher Education, 41(4), pp.642-663. | |
Hamari, J., Shernoff, D., Rowe, E., Coller, B., Asbell-Clarke, J. and Edwards, T., 2016. Challenging | |
games help students learn: An empirical study on engagement, flow and immersion in gamebased learning. Computers in Human Behavior, 54, pp.170-179. | |
Han, F. and Ellis, R. (2019). Identifying consistent patterns of quality learning discussions in blended | |
learning. The Internet and Higher Education, 40, pp.12-19. | |
Hmelo-Silver, C., Chernobilsky, E. and Jordan, R., 2008. Understanding collaborative learning | |
processes in new learning environments. Instructional Science, 36(5-6), pp.409-430. | |
Hue, M., and Li, W. (2008). Classroom Management: Creating a Positive Learning Environment (Hong | |
Kong teacher education). Hong Kong: Hong Kong University Press, HKU. | |
16 | |
Jabar, S. and Albion, P., 2016. Assessing the Reliability of Merging Chickering & Gamson’s Seven | |
Principles for Good Practice with Merrill’s Different Levels of Instructional Strategy | |
(DLISt7). ERIC Online Learning, 20(2). | |
Kohn Rådberg, K., Lundqvist, U., Malmqvist, J. and Hagvall Svensson, O. (2018). From CDIO to | |
challenge-based learning experiences – expanding student learning as well as societal impact?. | |
European Journal of Engineering Education, 45(1), pp.22-37. | |
Lai, K. (2011). Digital technology and the culture of teaching and learning in higher education. | |
Australasian Journal of Educational Technology, 27(8). doi: 10.14742/ajet.892 | |
Lozano, J., Boni, A., Peris, J. and Hueso, A., 2012. Competencies in Higher Education: A Critical | |
Analysis from the Capabilities Approach. Journal of Philosophy of Education, 46(1), pp.132147. | |
Marshalsey, L, and Madeleine S. (2018). “Critical Perspectives of Technology-Enhanced Learning in | |
Relation to Specialist Communication Design Studio Education Within the UK and Australia.” | |
Research in Comparative and International Education 13 (1): 92–116. doi: | |
10.1177/1745499918761706 | |
Matthews, A. and Kotzee, B., 2019. The rhetoric of the UK higher education Teaching Excellence | |
Framework: a corpus-assisted discourse analysis of TEF2 provider statements. Educational | |
Review, pp.1-21. | |
McKnight, K., O'Malley, K., Ruzic, R., Horsley, M., Franey, J. and Bassett, K. (2016). Teaching in a | |
Digital Age: How Educators Use Technology to Improve Student Learning. Journal of Research | |
on Technology in Education, 48(3), pp.194-211. | |
Moore, D. and Fisher, T., 2017. Challenges of Motivating Postgraduate Built Environment Online | |
Teaching and Learning Practice Workgroups to Adopt Innovation. International Journal of | |
Construction Education and Research, 13(3), pp.225-247. | |
Office for Students, 2020. National Student Survey Results 2020. London, UK. | |
Ovbiagbonhia, A., Kollöffel, B. and Den Brok, P., 2020. Teaching for innovation competence in higher | |
education Built Environment engineering classrooms: teachers’ beliefs and perceptions of the | |
learning environment. European Journal of Engineering Education, 45(6), pp.917-936. | |
Santos, G., Marques, C., Justino, E. and Mendes, L., 2020. Understanding social responsibility’s | |
influence on service quality and student satisfaction in higher education. Journal of Cleaner | |
Production, 256, p.120597. | |
Scott, L. (2020). Engaging Students' Learning in the Built Environment Through Active Learning. | |
Claiming Identity Through Redefined Teaching in Construction Programs, pp.1-25. | |
Staff and Educational Development Association, (2013). Measuring The Impact Of The UK | |
Professional Standards Framework For Teaching And Supporting Learning (UKPSF). Higher | |
Education Academy. | |
Tirrell, T., and Quick, D. (2012). Chickering's Seven Principles of Good Practice: Student Attrition in | |
Community College Online Courses. Community College Journal of Research and Practice, | |
36(8), 580-590. doi: 10.1080/10668920903054907 | |
Tsiligiris, V. and Hill, C., 2019. A prospective model for aligning educational quality and student | |
experience in international higher education. Studies in Higher Education, 46(2), pp.228-244. | |
Uchiyama, K. and Radin, J., 2008. Curriculum Mapping in Higher Education: A Vehicle for | |
Collaboration. Innovative Higher Education, 33(4), pp.271-280. | |
17 | |
Van Schaik, P., Volman, M., Admiraal, W., and Schenke, W. (2019). Approaches to co-construction of | |
knowledge in teacher learning groups. Teaching And Teacher Education, 84, 30-43. doi: | |
10.1016/j.tate.2019.04.019 | |
Waheed, H., Hassan, S., Aljohani, N., Hardman, J., Alelyani, S. and Nawaz, R., 2020. Predicting | |
academic performance of students from VLE big data using deep learning models. Computers | |
in Human Behavior, 104, p.106189. | |
Welzant, H., Schindler, L., Puls-Elvidge, S., & Crawford, L. (2015). Definitions of quality in higher | |
education: A synthesis of the literature. Higher Learning Research Communications, 5 (3). | |
doi:10.18870/hlrc.v5i3.244 | |
18 | |
English Language Teaching; Vol. 11, No. 1; 2018
ISSN 1916-4742  E-ISSN 1916-4750
Published by Canadian Center of Science and Education
The Affection of Student Ratings of Instruction toward EFL Instructors
Yingling Chen
Center for General Education, Oriental Institute of Technology, New Taipei City, Taiwan
Correspondence: Yingling Chen, Center for General Education, Oriental Institute of Technology, New Taipei City, Taiwan. Tel: 886-909-301-288. E-mail: [email protected]
Received: October 27, 2017    Accepted: December 3, 2017    Online Published: December 5, 2017
doi: 10.5539/elt.v11n1p52    URL: http://doi.org/10.5539/elt.v11n1p52
Abstract | |
Student ratings of instruction can be a valuable indicator of teaching because the quality measurement of | |
instruction identifies areas where improvement is needed. Student ratings of instruction are expected to evaluate | |
and enhance the teaching strategies. Evaluation of teaching effectiveness has been officially implemented in | |
Taiwanese higher education since 2005. Therefore, this research investigated Taiwanese EFL university | |
instructors’ perceptions toward student ratings of instruction and the impact of student ratings of instruction on | |
EFL instructors' classroom teaching. The data for this quantitative study were collected through a 21-item questionnaire, and 32
qualified participants were selected from ten universities in the northern part of Taiwan. The results indicate
that EFL instructors' perceptions and experiences toward student ratings of instruction affect their approach to
teaching, but EFL instructors do not prepare lessons based on the results of student ratings of instruction.
Keywords: student ratings of instruction, EFL, instruction | |
1. Introduction | |
The Ministry of Education (MOE) authorizes universities and colleges to determine whom to hire in the college | |
system according to the Taiwanese College Regulation 21. Moreover, the MOE (2005) concluded that | |
developing a system for teacher evaluation is necessary in each college and university. As a result, schools have | |
more power in deciding the qualification of educators. Wolfer and Johnson (2003) emphasized that one must be | |
clear about the purpose of a course evaluation feedback since it may determine the kind of data required. | |
Moreover, teacher evaluation should include the key element for not only promotion, tenure, and reward, but | |
also performance review and teaching improvement. In addition, student ratings of instruction have become an
essential element in evaluating teachers' success and ensuring the quality of teaching. Students' opinions are
fundamental sources for informing the quality of instruction in higher education. Murray (2005) stated that more
than 90% of U.S. colleges and universities pay attention to student evaluation of teachers in order to assess
teaching. In addition, about 70% of college instructors recognize the need for student input in assessing their
classroom instruction (Obenchain, Abernathy, & Wiest, 2001). Teacher decision making toward curriculum
design and teacher expectancy of student achievement have a significant influence on the results of curricular
and instructional decisions. However, most of the research focuses on how to assist and improve students' learning
through SRI, how to improve teaching effectiveness through SRI, issues with SRI, or student achievement in relation to
SRI; few studies address how instructors use the feedback from SRI or how instructors improve teaching
through the results of SRI (Beran, Violato, Kline, & Frideres, 2005). Accordingly, instructors' perceptions of
student ratings offer valuable insight for improving teacher performance, because
understanding how instructors are affected by SRI is essential.
1.1 Literature | |
1.1.1 The Use of Student Ratings of Instruction | |
The implementation of SRI at colleges and universities has not only been employed for the purpose of improving
teaching effectiveness but has also been used for personnel decisions such as tenure. SRI is widely practiced in
colleges and universities across Canada and the United States (Greenwald, 2002). In fact, student ratings are not a
new topic in higher education. Researchers Remmers and Brandenburg published their first research studies on
student ratings at Purdue University in 1927. Also, Guthrie (1954) stated that students at the University of
Washington filled out the first student rating forms seventy-five years ago. Nevertheless, SRI is a pertinent topic | |
for researchers to study because students still fill out the evaluation forms which produce vital information on | |
teaching quality. Administrators take SRI into consideration to determine the effectiveness of instruction and | |
personnel promotions as well. Sixty-eight percent of American colleges reported using student ratings in Seldin's
1983 survey, while 86 percent of American colleges reported using student rating surveys in 1993 (Seldin, 1993a).
Seldin's (1993b) surveys reflected the growing use of student ratings as an instrument for teaching evaluation in
higher education.
1.1.2 Student Rating of Instruction in Higher Education in Taiwan | |
“During the 1990s, most education systems in the English-speaking world moved towards some notion of | |
performance management” (West-Burnham, O’Neill, & Bradbury, 2001, p. 6). The widespread use of the | |
performance management concept contributes to the education system, which focuses on specific measurement | |
of classroom instruction delivery. The quality of teaching influences students not only academically, but also | |
psychologically. With regard to the value of teacher evaluation, the Taiwanese Ministry of Education has | |
mandated that colleges and universities monitor the quality of teaching because the quality of teachers and | |
instructions impact students’ academic achievement and the reputation of the school. Chang (2002) declared that | |
approximately 76 percent of public universities and 85 percent of private universities have implemented SRI in | |
Taiwan. As a result, teacher evaluation has become an instrument for examining instructors’ classroom | |
presentation. Liu (2011) stated that teachers’ classroom presentation is equivalent to teacher appraisal and | |
teacher performance. Furthermore, Liu (2011) found the following: | |
Since 28th December 1995, the 21st Regulation of the University Act stated that a college should formulate a | |
teacher evaluation system that decides on teacher promotion, and continues or terminates employment based on | |
college teachers' achievement in teaching, research and so forth. (p. 4) SRI has been widely accepted by
universities and colleges in Taiwan and has become a practical tool for enhancing teaching performance and
an effective trigger for examining factors that relate to educational improvement.
SRI stimulates organizational-level effects by providing information from evaluation practice, such as diagnosing
organizational problems. SRI also raises environmental-level effects such as hiring, retention, and dismissal, which are
highly public acts justified through the evaluation process (Cross, Dooris, & Weinstein, 2004).
1.2 State Hypotheses and Their Correspondence to Research Design | |
1.2.1 Null Hypotheses | |
The independent variable in this study was SRI. The dependent variables were northern Taiwanese EFL
university instructors' perceptions and the influence of SRI on their classroom instruction. The
null hypotheses were designed to test the association between EFL instructors' perceptions and SRI and the
association between SRI and classroom instruction. A Chi-Square test was used to test the
associations of the null hypotheses, and a Chi-Square probability of .05 or less was used to reject them.
The following hypotheses addressed the research questions:
1.2.2 Research Questions | |
1). What are Taiwanese EFL university instructors’ perceptions toward SRI? | |
H10: No association exists between EFL university instructors’ perceptions and SRI | |
(at the .05 level of significance). | |
2). What impact does SRI have on EFL university instructors’ classroom instructions? | |
H20: No association exists between the impact of SRI and classroom instruction | |
(at the .05 level of significance). | |
2. Method | |
2.1 Participant | |
All participating EFL instructors held master's or doctoral degrees from foreign universities or local
Taiwanese universities. The subjects' ages ranged from thirty-five to seventy years old, and each participating
experienced instructor had received at least three years of SRI results.
2.2 Sampling Procedures | |
The researcher used a random sampling strategy to recruit participants from 10 universities in the northern part of
Taiwan for the quantitative data. The key to random sampling is that each university in the population has an
equal probability of being selected in the sample (Teddlie & Yu, 2007). Using a random sampling strategy helped
the researcher prevent biases from being introduced in the sampling process by drawing names or numbers.
Thirty-two Taiwanese university EFL instructors were recruited from ten universities in the northern part of Taiwan.
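As an illustration only (not part of the original study's procedure), the equal-probability draw described above can be sketched in a few lines of Python; the sampling frame and the seed below are hypothetical placeholders.

import random

# Hypothetical sampling frame: the 28 northern universities offering an
# English or applied foreign language major (placeholder names).
sampling_frame = [f"University_{i:02d}" for i in range(1, 29)]

# Draw 10 universities without replacement; each university in the frame
# has the same probability of being selected (cf. Teddlie & Yu, 2007).
random.seed(2017)  # arbitrary seed so the draw can be reproduced
selected_universities = random.sample(sampling_frame, k=10)
print(selected_universities)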
2.3 Sample | |
The target participants for the quantitative phase were thirty-two Chinese speaking English instructors from 10 | |
northern universities. All participating EFL instructors held master's or doctoral degrees from foreign
universities or local Taiwanese universities, and each participating experienced instructor had received at least
three years of SRI results.
2.4 Measurement
The quantitative data were collected through a demographic survey and a questionnaire on EFL instructors'
perceptions of SRI. The perception questionnaire and the demographic questionnaire together were used to
explain the results of the quantitative data.
2.5 Research Design | |
The researcher randomly selected, by drawing from twenty-eight schools, ten northern universities that offer an
English or applied foreign language major.
2.6 Data Analysis
The first step of data analysis was analyzing the quantitative data. The researcher assigned codes to all
questionnaires so that the confidentiality of participants' information was ensured. Then, the information was
transferred into the Statistical Package for the Social Sciences (SPSS 21.0). The researcher entered the
quantitative data into SPSS and ran a Cronbach's alpha test to create internally consistent, reliable, and valid
tests and questionnaires, enhancing the accuracy of the survey. Furthermore, a Chi-Square test, a non-parametric
test, was implemented for testing the hypotheses. Cooper and Schindler (2006) stated that non-parametric tests
are used to test the significance of ordinal and nominal data. A Chi-Square test was used to compare SRI to the
dependent variables and to determine whether an association exists between SRI and EFL instructors' perceptions.
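For readers who do not use SPSS, a minimal sketch of the two computations described above (a Cronbach's alpha estimate and a Chi-Square test of association) is given below in Python. The response matrix and the contingency table are invented placeholders, not the study's data, and numpy and scipy are assumed to be available.

import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical Likert-scale responses: rows = participants, columns = items.
responses = np.array([
    [2, 3, 3, 4, 3, 2],
    [1, 2, 3, 3, 4, 2],
    [2, 2, 4, 3, 3, 1],
    [3, 3, 3, 4, 4, 2],
])

# Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of total scores).
k = responses.shape[1]
item_variances = responses.var(axis=0, ddof=1)
total_variance = responses.sum(axis=1).var(ddof=1)
alpha = (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Chi-Square test of association on a hypothetical contingency table
# (e.g., perception category by response category); reject H0 when p <= .05.
contingency = np.array([[10, 15, 7],
                        [5, 20, 7]])
chi2, p, dof, expected = chi2_contingency(contingency)
print(f"alpha = {alpha:.2f}, chi2 = {chi2:.2f}, p = {p:.3f}, reject H0: {p <= 0.05}")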
3. Results | |
The results are reported in two main parts: (1) background information on the quantitative survey participants,
and (2) a Chi-Square test comparing SRI to the dependent variables.
3.1 Gender and Age | |
Table 1 showed the distribution of gender and age for participants who taught in the department of English and | |
Applied Foreign Language in the universities. Among the 32 EFL university instructor participants, 57% (n= 19) | |
of the participants were female and 43% (n=13) of the participants were male. In addition, 3% (n=1) of
participants were between 25-29 years old, 29% of the participants (n=9) were between 30-39 years old, 37% of | |
the participants (n=12) were between 40-49 years old, 25% of the participants (n=8) were between 50-59 years | |
old, and 6% of the participants (n=2) were between 60-69 years old. | |
Table 1. Frequency distribution of gender and age

Gender    Frequency    Percentage
Female    19           57%
Male      13           43%
Total     32           100%

Age       Frequency    Percentage
25-29     1            3%
30-39     9            29%
40-49     12           37%
50-59     8            25%
60-69     2            6%
70+       0            0
Total     32           100%

Note. n=32.
3.2 Years of Teaching | |
Table 2 reported the distribution of years of teaching for the participants who taught in the department of English | |
and applied foreign language in the universities under EFL settings. The years of teaching varied from
participant to participant. The distribution was as follows: 1 participant with 1-3 years of experience, 2 with 4-6
years, 7 with 7-10 years, 5 with 11-15 years, 6 with 16-20 years, 4 with 21-25 years, 6 with 26-30 years, and 1
with more than 30 years of teaching experience.
Table 2. Frequency distribution of years of teaching

Years of Teaching     Frequency    Percentage
Less than 1 year      0            0
1-3 years             1            3.1%
4-6 years             2            6.2%
7-10 years            7            21.9%
11-15 years           5            16%
16-20 years           6            19%
21-25 years           4            12%
26-30 years           6            19%
More than 30 years    1            3.1%
Total                 32           100%

Note. n=32.
3.3 EFL Instructors’ Highest Level of Education | |
Table 3 showed the distribution of EFL instructors' highest level of education. Among the 32 participants, 24
held doctoral degrees and 8 held master's degrees. Furthermore, 26 participants earned their highest level of
formal education in a foreign country and 6 earned theirs in Taiwan.
Table 3. Frequency distribution of the educational background

Highest Degree     Frequency    Percentage
Master Degree      8            25%
Doctoral Degree    24           75%
Total              32           100%

Foreign Degree     26           81.25%
Domestic Degree    6            18.75%
Total              32           100%

Note. n=32.
3.4 Employment Status | |
Table 4 showed the employment status of the 32 participants. Twelve participants (38%) held permanent
employment with on-going contracts without fixed end-points before the age of retirement, 10 (31%) held
fixed-term contracts for a period of more than one school year, and 10 (31%) held fixed-term contracts for a
period of one school year or less. In the meantime, 12% of the participants (n=4) were part-time instructors and
88% of the participants (n=28) were full-time instructors.
Table 4. Descriptive statistics for participants' employment status

Employment status (1)                                Participants/Count    Percentage
Permanent employment                                 12                    37.5%
Fixed term contract of more than one school year     10                    31.25%
Fixed term contract of one school year or less       10                    31.25%
Total                                                32                    100%

Employment status (2)                                Participants/Count    Percentage
Part-time employment                                 8                     25%
Full-time employment                                 24                    75%
Total                                                32                    100%

Note. n=32.
3.5 Personal Development | |
Table 5 showed the personal development status of the 32 participants. Seven participants (22%) who held
master's degrees were pursuing doctoral degrees related to their professional fields in Taiwan at the time of the
study, while 25 participants (78%) held their original degrees without pursuing further degrees.
Table 5. Descriptive statistics for personal development status

Personal development status              Participants/Count                                         Percentage
Pursuing a doctoral degree at present    7 (in Education, TESL, Linguistics, and English fields)    22%
Holding the original degree              25                                                         78%
Total                                    32                                                         100%

Note. n=32.
3.6 Internal Reliability | |
The first section contained six Likert-scale items (items 1-6). The researcher assessed internal reliability with a
pilot test of item analysis to obtain the Cronbach's alpha coefficient. Cronbach's alpha was used to determine the
reliability of the 21 items measuring Taiwanese EFL university instructors' perceptions toward student ratings of
instruction. The subscales were (1) EFL instructors' perceptions toward SRI (six items, Cronbach's alpha .71)
and (2) the influence of SRI on EFL instructors' classroom instruction (fifteen items, Cronbach's alpha .74)
(see Table 6). During data collection, participants were verified as part-time or full-time EFL university
instructors. The survey packet was distributed at the office, and after each participant had completed the
questionnaires, the researcher reviewed the packet for completeness. Fraenkel and Wallen (2003) defined
validity as the degree to which data support any inferences that a researcher makes based on the evidence
collected with a specific instrument. Content validity is defined as the level to which an instrument can be
duplicated under the same conditions with the same participants (Sproull, 2002).
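For reference, the Cronbach's alpha coefficients reported here follow the standard textbook formula (this is the general definition, not a formula stated by the author): for k items with item variances \sigma^2_{Y_i} and total-score variance \sigma^2_X,

\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} \sigma^2_{Y_i}}{\sigma^2_X}\right)

so the reported values of .71 and .74 summarise how consistently the six-item and fifteen-item subscales behave as single scales.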
Table 6. Reliability statistics of pilot SRI

Variables                                                         N of Items    Cronbach's Alpha
EFL instructors' perceptions toward SRI                           6             .71
The influence of SRI on EFL instructors' classroom instruction    15            .74
3.7 Rating of Instructions | |
A preliminary analysis was executed to determine Taiwanese EFL university instructors’ perceptions toward | |
student rating of instruction. Based on primary analysis in Table 7, item 1 reported that 25% of the participants | |
strongly disagreed and 69% of the participants disagreed with the positive attitude toward SRI; 6% of the | |
participants were neutral. In item 2, 59% of the participants disagreed with holding enthusiastic and confident | |
perceptions about the results of SRI. Twenty-two percent of the participants were neutral; 16% of the participants | |
agreed and 3% of the participants strongly agreed with having enthusiasm and confidence toward the result of | |
SRI. In item 3, 41% of the participants disagreed that they spend more time preparing their classes according to | |
SRI results. Fifty-three percent of the participants were neutral and 6% of the participants agreed that they spent | |
more time preparing courses based on SRI results. Additionally, in item 4, 31% of the participants disagreed that | |
being open to students’ opinions would help receive more positive results of SRI. Forty-four percent of the | |
participants were neutral; 22% of the participants agreed and 3% of the participants strongly agreed that being | |
open to students’ opinions would help receive more positive result of SRI. In item 5, 6% of the participants | |
disagreed that they care about the quality of SRI; 41% of the participants were neutral; 53% of the participants
agreed and 16% of the participants strongly agreed that they cared about the quality of
SRI. In item 6, 6% of the participants strongly disagreed and 47% of the participants disagreed that they were | |
always satisfied with the results of SRI. Forty-one percent of the participants were neutral and 6% of the | |
participants agreed that they were always satisfied with the result of SRI. | |
Table 7. Mean, standard deviation, and percentage of Taiwanese EFL university instructors' perceptions toward SRI
1. I have positive attitude toward SRI: Strongly Disagree 25%, Disagree 68.8%, Neutral 6.3%, Agree 0%, Strongly Agree 0%; M=1.81, SD=.535
2. I am enthusiastic and confident about the result of SRI: Strongly Disagree 0%, Disagree 59.4%, Neutral 21.9%, Agree 15.6%, Strongly Agree 3.1%; M=2.68, SD=.871
3. I spend more time preparing my class according to SRI results: Strongly Disagree 0%, Disagree 40.6%, Neutral 53.1%, Agree 6.3%, Strongly Agree 0%; M=2.66, SD=.602
4. I think if I am more open to students' opinions, the result will be more positive: Strongly Disagree 0%, Disagree 31.3%, Neutral 43.8%, Agree 21.9%, Strongly Agree 3.1%; M=2.97, SD=.822
5. I care about the quality of SRI: Strongly Disagree 0%, Disagree 6.3%, Neutral 40.6%, Agree 53.1%, Strongly Agree 0%; M=3.47, SD=.621
6. I am always satisfied with the result of SRI: Strongly Disagree 6.3%, Disagree 46.9%, Neutral 40.6%, Agree 6.3%, Strongly Agree 0%; M=2.47, SD=.718
Note. M=Mean; SD=Standard Deviation.
3.8 Descriptive Analyses of the Influence of SRI on Taiwanese University EFL Instructors | |
According to the analysis in Table 8, for item 7, 6% of the instructors strongly disagreed and 43.8% disagreed
that SRI was an effective instrument for improving English instructional delivery; 41% were neutral, and 9%
agreed that SRI was an effective instrument for improving English instructional delivery. In item 8, 16% of the
participants strongly disagreed and the majority (56%) disagreed that SRI provides authentic information in
developing effective English lessons, while 28% were neutral.
Furthermore, in item 9, 56% of the instructors strongly disagreed and 34% disagreed that they became more
supportive in assisting students' learning after receiving the result of EFL SRI; 9% were neutral. In item 10, 13%
of the instructors strongly disagreed and 41% disagreed that the result of SRI provided positive encouragement
for their classes; 31% were neutral and 6% agreed that the results of SRI provided positive encouragement for
their classes. Moreover, item 11 was worded in reverse: 9% of the participants strongly disagreed and 50%
disagreed that criticism from the SRI did not influence their English teaching performance; 25% were neutral
and 9% agreed that criticism from the SRI did not influence their English teaching performance.
In item 12, 6% of the participants strongly disagreed and 59% disagreed that EFL SRI was an efficient
communicative bridge between their students and them; 25% were neutral, and only 9% agreed that EFL SRI is
an efficient communicative bridge between their students and them. In item 13, 6% of the participants disagreed
that students' feedback gave them ideas for teaching students with special needs, 56% were neutral, and 37%
agreed.
In item 14, 34% of the participants disagreed that students' feedback improves their English classroom
management, 37.5% were neutral, and 28% agreed. Moreover, item 15 was worded in reverse: 3% of the
participants strongly disagreed and 34% disagreed that they would not change their knowledge and
understanding of English instructional practices after receiving the results of EFL SRI; 46% were neutral and
16% agreed.
In item 16, 13% of the participants strongly disagreed and the majority (63%) disagreed that students provided
trustworthy information when evaluating the effectiveness of English classroom instruction; 22% were neutral,
and only 3% agreed. In item 17, 41% of the participants disagreed that students' academic achievements
influenced the result of SRI; 31% were neutral, 25% agreed, and 3% strongly agreed.
In item 18, 28% of the instructors disagreed that if they improved the quality of their English teaching, they
received higher ratings from students; 56% were neutral and 16% agreed. In item 19, 13% of the instructors
disagreed that if they received unpleasant rating scores in the past, they changed their English teaching
strategies; 56% were neutral, 25% agreed, and 6% strongly agreed.
In item 20, 9% of the participants disagreed that after they changed their English teaching strategies, they
received better scores of EFL SRI; 81% were neutral and 9% agreed. In addition, item 21 was worded in
reverse: 6% of the participants strongly disagreed and 25% disagreed that unpleasant scores of EFL SRI would
not decrease their passion toward teaching; 13% were neutral, while 25% agreed and 6% strongly agreed.
Table 8. Mean, standard deviation and percentage of the influence of SRI on Taiwanese university EFL instructors' classroom instruction
7. EFL SRI is an effective instrument for improving English instructional delivery: Strongly Disagree 6.3%, Disagree 43.8%, Neutral 40.6%, Agree 9.4%, Strongly Agree 0%; M=2.53, SD=.761
8. Overall, EFL SRI provides me authentic information in developing effective English lessons: Strongly Disagree 15.6%, Disagree 56.3%, Neutral 28.1%, Agree 0%, Strongly Agree 0%; M=2.13, SD=.660
9. I become more supportive in assisting student learning after receiving the result of EFL SRI: Strongly Disagree 56.3%, Disagree 34.4%, Neutral 9.4%, Agree 0%, Strongly Agree 0%; M=2.53, SD=.671
10. The result of EFL SRI provides positive encouragement for my class: Strongly Disagree 12.5%, Disagree 40.6%, Neutral 31.3%, Agree 15.6%, Strongly Agree 0%; M=2.50, SD=.916
11. Criticism from the SRI does not influence my English teaching performance: Strongly Disagree 9.4%, Disagree 50.4%, Neutral 25.0%, Agree 9.4%, Strongly Agree 0%; M=2.44, SD=.840
12. EFL SRI is an efficient communicative bridge between my students and me: Strongly Disagree 6.3%, Disagree 59.4%, Neutral 25.0%, Agree 9.4%, Strongly Agree 0%; M=2.38, SD=.751
13. Students' feedback gives me ideas for teaching students with special needs: Strongly Disagree 0%, Disagree 6.3%, Neutral 56.3%, Agree 37.5%, Strongly Agree 0%; M=3.31, SD=.592
14. Students' feedback improves my English classroom management: Strongly Disagree 0%, Disagree 34.4%, Neutral 37.5%, Agree 28.1%, Strongly Agree 0%; M=2.94, SD=.801
15. I will not change the knowledge and understanding of English instructional practices after receiving the result of EFL SRI: Strongly Disagree 3.1%, Disagree 34.4%, Neutral 45.9%, Agree 15.6%, Strongly Agree 0%; M=2.75, SD=.762
16. Students provide trustworthy information when evaluating the effectiveness of English classroom instruction: Strongly Disagree 50%, Disagree 40.6%, Neutral 9.4%, Agree 0%, Strongly Agree 0%; M=1.59, SD=.665
17. Students' academic achievements influence the result of SRI: Strongly Disagree 0%, Disagree 28.1%, Neutral 56.3%, Agree 15.6%, Strongly Agree 3.1%; M=2.91, SD=.893
18. If I improve the quality of my English teaching, I will receive higher ratings from students: Strongly Disagree 0%, Disagree 28.1%, Neutral 46.3%, Agree 21.9%, Strongly Agree 3.1%; M=2.88, SD=.660
19. I received an unpleasant rating score in the past, so I changed my English teaching strategies: Strongly Disagree 0%, Disagree 12.5%, Neutral 56.3%, Agree 25%, Strongly Agree 6.3%; M=3.00, SD=.803
20. After I changed my English teaching strategies, I received better scores of EFL SRI: Strongly Disagree 0%, Disagree 63.4%, Neutral 50.3%, Agree 3.4%, Strongly Agree 9.4%; M=3.47, SD=.761
21. Unpleasant scores of EFL SRI will not decrease my passion toward teaching: Strongly Disagree 18.8%, Disagree 56.3%, Neutral 18.8%, Agree 3.1%, Strongly Agree 3.1%; M=2.16, SD=.884
Note. M=Mean; SD=Standard Deviation.
3.9 The Frequency Distribution of Years of Teaching Experience in Four Groups
Table 9 presents the frequency distribution of years of teaching experience in four groups. The researcher
divided the participants into four groups based on their years of teaching experience. Group 1 represented
participants who had been teaching English for 1-6 years (n=3). Group 2 comprised participants who had been
teaching English for 7-15 years (n=12). Group 3 comprised participants who had been teaching English for
16-25 years (n=10). Group 4 comprised participants who had been teaching English for more than 26 years
(n=7).
Table 9. Frequency distribution of years of teaching experience in four groups

Group                    Frequency    Percentage    Valid Percentage    Cumulative Percent
1 (1-6 years)            3            9.4           9.4                 9.4
2 (7-15 years)           12           37.5          37.5                46.9
3 (16-25 years)          10           31.3          31.3                78.1
4 (26 years and more)    7            21.9          21.9                100.0
Total                    32           100.0         100.0

Note. n=32.
3.10 The Means of the Influences of SRI on Taiwanese EFL University Instructors Based on Their Years of | |
Teaching Experience | |
Four open-ended interview questions (Q1, Q3, Q7, and Q8) reflected the first part of the six quantitative survey
items, which were designed to investigate EFL instructors' perceptions toward SRI. The survey items were: (1)
in general, I have a positive attitude toward SRI; (2) I am enthusiastic and confident about the result of SRI; (3)
I spend more time preparing my class according to SRI results; (4) I think if I am more open to students'
opinions, the results will be more positive; (5) I care about the quality of SRI; and (6) I am always satisfied with
the result of SRI. Based on the analysis of participants' interview transcripts, two themes, four subthemes, and
four issues emerged to answer the first research question. The findings for the first research question are
structured in Table 10.
Table 10. Structure of the qualitative findings: Research Question 1

Themes:
Theme 1: The university EFL instructors' perceptions of SRI
Theme 2: The role of SRI

Subthemes and issues:
Experiences of receiving the results of SRI (issue: negative)
Implementation of SRI in EFL classroom (issue: objective)
Opinions after receiving the result of EFL SRI (issue: the purpose of SRI)
Suggestions after receiving the result of EFL SRI (issue: the real situation of SRI in universities in Taiwan)
3.11 Quantitative Findings: Null Hypothesis 1
H10: No association exists between EFL university instructors’ perceptions and SRI. | |
Table 11 reported that the researcher failed to reject the first null hypothesis which stated that there was not an | |
association between EFL university instructors’ perceptions and student rating of instructions based on a | |
significance level of .149 in item 4 (EFL instructors who become more open to SRI receive better ratings). The
significance level of .804 in item 5 (EFL instructors care about the quality of SRI) accepted the first null | |
hypothesis. Besides, the first null hypothesis, which stated that there was not an association between EFL | |
university instructors’ perceptions and student rating of instructions was rejected based on a significance level | |
of .000 in item 1 (EFL instructors have positive attitude toward SRI). The significance level of .000 in item 2 | |
(EFL instructors are confident in the results of SRI) rejected the first null hypothesis. Also, the first null | |
hypothesis was rejected based on the significance level of .003 in item 3 (EFL instructors prepare lessons based | |
on the results of SRI). The significance level of .000 in item 6 (EFL instructors are satisfied with the results of SRI)
rejected the null hypothesis. As hypothesized, Cillessen and Lafontana (2002) stated that teachers’ perceptions | |
affect their behavior and classroom practices. The more teachers learn about their students, the more they are | |
able to design effective experiences that elicit real learning. Borg (2006) noted that understanding teacher | |
perception is central to the process of understanding teaching. Research also indicated that teachers who are | |
willing to develop their teaching skills were open-minded in listening to feedback from their students (Chang, | |
Wang, & Yong, 2003). | |
Table 11. The summary of chi-square testing for Null Hypothesis 1
1. SRI is an effective instrument for EFL instructors to improve instructional delivery. Sig = .000; Reject.
2. The results of SRI provide EFL instructors authentic information in developing lessons. Sig = .000; Reject.
3. EFL instructors become more supportive in students' learning after receiving the results of SRI. Sig = .003; Reject.
4. The results of SRI provide positive encouragement for EFL instructors. Sig = .149; Accept.
5. Criticism from SRI does not influence EFL instructors' teaching performance. Sig = .804; Accept.
6. SRI is an effective communicative bridge between EFL instructors and students. Sig = .000; Reject.
Note. A P-value of .05 or less was used to reject the null hypotheses. | |
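The accept/reject column in Tables 11 and 12 follows mechanically from the decision rule stated in the note (reject when p is .05 or less). A minimal Python illustration of that mapping, using the six significance levels reported in Table 11, is:

# Significance levels reported in Table 11 (items 1-6, in order).
p_values = [.000, .000, .003, .149, .804, .000]
decisions = ["Reject" if p <= 0.05 else "Accept" for p in p_values]
print(decisions)  # ['Reject', 'Reject', 'Reject', 'Accept', 'Accept', 'Reject']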
3.12 Quantitative Findings: Null Hypothesis 2 | |
H20: No association exists between the impact of SRI and classroom instruction. | |
Table 12 reported the summary of the Chi-Square test of second null hypothesis, which stated that there was no | |
association between the impact of SRI and classroom instruction. The researcher failed to reject the second null | |
hypothesis which stated that there was not an association between SRI and classroom instruction based on a | |
significance level of .080 in item 10 (The results of SRI provide positive encouragement for EFL instructors) and | |
a significance level of .102 in item 14 (SRI improves EFL instructors’ classroom management). The second null | |
hypothesis, which stated that there was not an association between SRI and classroom instruction was rejected | |
based on a significance level of .002 in item 7 (EFL instructors have positive attitude toward SRI), a significance | |
level of .016 in item 8, a significance level of .005 in item 9, a significance level of .004 in item 11, a | |
significance level of .000 in item 12, a significance level of .002 in item 13 (SRI gives EFL instructors ideas for | |
teaching students with special needs), a significance level of .002 in item 15 (EFL instructors will not change the
knowledge and understanding of instructional practices after receiving the results of SRI), a significance level | |
of .001 in item 16 (The results of SRI provide trustworthy information for EFL instructors), a significance level | |
of .021 in item 17 (Students’ achievements influence the results of SRI), a significance level of .016 in item 18 | |
(If I improve the quality of the English instruction, I will receive higher ratings from students), a significance | |
level of .006 in item 19 (I received an unpleasant rating score in the past, so I changed my English teaching | |
strategies), a significance level of .001 in item 20 (After I changed English teaching strategies, I received better | |
results of SRI), and a significance level of .000 in item 21 (Unpleasant scores of SRI will not decrease my | |
passion toward English teaching). | |
The current findings concurred with the hypothesis that an association existed between the influence of SRI and | |
classroom instruction. Teacher evaluation provided information to faculty about teaching effectiveness (Biggs, | |
2003; Ramsdem, 2003; Yorke, 2003) and to students about how they can improve their learning and how well | |
they are doing in the course (Carless et al., 2007; Gibbs 2006). Liu (2011) stated that teachers’ classroom | |
presentation is equivalent to teacher appraisal and teacher performance. Furthermore, “since 28th December | |
1995, the 21st Regulation of the University Act states that a college should formulate a teacher evaluation system | |
that decides on teacher promotion, and continues or terminates employment based on college teachers’ | |
achievement in teaching, research and so forth” (Liu, 2011, p. 4). “Universities started to formulate school | |
regulations based on the University Act and began executing teacher education. According to the official | |
documentation, 60% of the colleges stipulate that teachers have to pass the evaluation before receiving a | |
promotion” (Liu, 2011, p. 4). EFL instructors’ perceptions and experiences toward SRI will affect their approach | |
to teaching. In other words, assessment attitudes and experiences by EFL students will also influence their way | |
of learning. | |
Table 12. The results of chi-square testing for Null Hypothesis 2
7. SRI is an effective instrument for EFL instructors to improve instructional delivery. Sig = .002; Reject.
8. The results of SRI provide EFL instructors authentic information in developing lessons. Sig = .016; Reject.
9. EFL instructors become more supportive in students' learning after receiving the results of SRI. Sig = .005; Reject.
10. The results of SRI provide positive encouragement for EFL instructors. Sig = .080; Accept.
11. Criticism from SRI does not influence EFL instructors' teaching performance. Sig = .004; Reject.
12. SRI is an effective communicative bridge between EFL instructors and students. Sig = .000; Reject.
13. SRI gives instructors ideas for teaching students with special needs. Sig = .002; Reject.
14. SRI improves EFL instructors' classroom management. Sig = .102; Accept.
15. EFL instructors will not change the knowledge and understanding of instructional practices after receiving the results of SRI. Sig = .002; Reject.
16. SRI provides trustworthy information for EFL instructors. Sig = .001; Reject.
17. Students' achievements influence the results of SRI. Sig = .021; Reject.
18. If I improve the quality of the English instruction, I will receive higher ratings from students. Sig = .016; Reject.
19. I received an unpleasant rating score in the past, so I changed my English teaching strategies. Sig = .006; Reject.
20. After I changed English teaching strategies, I received better results of SRI. Sig = .001; Reject.
21. Unpleasant scores of SRI will not decrease my passion toward English teaching. Sig = .000; Reject.
Note. A P-value of .05 or less was used to reject the null hypotheses. | |
4. Discussions | |
The results uncovered that EFL instructors' teaching attitudes and motivation were diminished simply because
teachers overwhelmingly expressed that SRI did not provide them with useful feedback on their classroom
performance. EFL instructors were not willing to take risks in assigning work, carrying out tests, or addressing
needs in supporting students' learning. EFL instructors hardly used the results of SRI to make important
decisions for improving the quality of instruction. In fact, SRI was considered an indicator of instructors'
performance when it came time to dismiss them. The findings highlighted the northern Taiwanese EFL
instructors' perceptions toward SRI and the influence of SRI on EFL instructors' classroom
instruction. Faculty members were more likely to disagree on the effectiveness of SRI and pointed out its
growing issues. Broadly negative feedback, accompanied by only a small amount of objective feedback, may
provide indicators of the different value perceptions and influences adopted by northern Taiwanese EFL
university instructors. As the quantitative results showed, 87% of the items from the second part of the survey,
on the influence of SRI on EFL university instructors, had associations between SRI and classroom instruction.
It was interesting to note that EFL instructors seemed to distrust the results of SRI. A possible explanation of the
negative perceptions is that EFL instructors were sensitive to the fact that the results of SRI were considered for
tenure, promotion, and employment status, which reflects Cross, Dooris and Weinstein's (2004) theory that SRI
raises environmental-level effects such as hiring, retention, and dismissal, which are highly public acts justified
through the evaluation process. Students' perceptions of SRI may differ from those of faculty members because
students may not realize how the results of teacher evaluation may be used by administrators; as a result,
students may not know the consequences their ratings have for teachers. Administrators and educators need to
understand the factors that influence EFL instructors' classroom instruction so that they will be able to develop
a reasonable environment for merit raises, promotion, and tenure decisions.
References | |
American Psychological Association. (1972). Ethical standards of psychologists. Washington, DC: American | |
Psychological Association. | |
Anderson, C. A., Gentile, D. A., & Buckley, K. E. (2007). Violent video game effects on children and adolescents: | |
Theory, research and public policy. | |
Beran, T., Violato, C., Kline, D., & Frideres, J. (2005). The utility of student ratings of instruction for students, | |
faculty, and administrators: a consequential validity study. The Canadian Journal of Higher Education, 2, | |
49-70. | |
Biggs, J. (2003). Teaching for quality learning at university (2nd ed.). Buckingham: Society for Research into | |
Higher Education/Open University Press. | |
Borg, S. (2006). Teacher cognition and language education: Research and practice. London: Continuum. | |
Carless, D., Joughin, G., & Mok, M. M. C. (2007). Learning-oriented assessment: Principles and practice.
Assessment & Evaluation in Higher Education, 31, 395-398. | |
Chang, J. L, Wang, W. Z., & Yong, H. (2003). Measurement of Fracture Toughness of Plasma-Sprayed Al2O3 | |
Coatings Using a Tapered Double Cantilever Beam Method. Journal of the American Ceramic Society, | |
86(8), 1437-1439. https://doi.org/10.1111/j.1151-2916.2003.tb03491.x | |
Chang, T-S. (2002). Student ratings of instruction. Taipei, Taiwan: Yung Zhi. | |
Cillessen, A. H. N., & Lafontana, K.M. (2002). Children’s perceptions of popular and unpopular peers: A | |
multimethod | |
assessment. | |
Developmental | |
Psychology, | |
38(5), | |
635-647. | |
https://doi.org/10.1037/0012-1649.38.5.635 | |
Fraenkel, J. R., & Wallen, N. E. (2003). How to design and evaluate research in education (5th ed.). Boston: | |
McGraw-Hill. | |
Gibbs, G. (2006). How assessment frames student learning. In C. Bryan, & K. Clegg (Eds.), Innovative | |
Assessment in Higher Education (pp. 23-36). London: Routledge. | |
Cooper, D., & Schindler, P. S. (2006). Business research methods (9th ed.). New York: McGraw-Hill Companies, | |
Inc. | |
Greenwald, A. G. (2002). Constructs in student ratings of instructors. In H. I. Braun, D. N. Jackson, & D. E. | |
Wiley (Eds.), The role of constructs in psychological and educational measurement, 24(3), 193-202, New | |
York: Erlbaum. | |
Guthrie, E. R. (1954). The evaluation of teaching: A progress report. Seattle: University of Washington. | |
Liu, C-W. (2011). The implementation of teacher evaluation for professional development in primary education | |
in Taiwan. (Doctoral dissertation). Retrieved from Dissertation.com, Boca Raton, Florida. | |
Ministry of Education (Taiwan) (MOE). (2005). Ministry of Education News: College law. Retrieved October
31, 2016, from http://tece.heeact.edu.tw/main.php
Murray, H. G. (2005). Student evaluation of teaching: has it made a difference? In the Annual meeting of the | |
society for teaching and learning in higher education, June 2005 (pp.1-15). Charlottetown, Prince Edward | |
63 | |
elt.ccsenet.org | |
English Language Teaching | |
Vol. 11, No. 1; 2018 | |
Island, Canada. | |
Obenchain, K. M., Abernathy, T. V., & Wiest, L. R. (2001). The reliability of students’ ratings of faculty teaching | |
effectiveness. College Teaching, 49(3), 100-104. https://doi.org/10.1080/87567550109595859 | |
Ramsden, P. (2003). Learning to teach in higher education (2nd ed.). London: Routledge. | |
Seldin, P. (1993a). How colleges evaluate professors: 1983 versus 1993. AAHE Bulletin, 12, 6-8.
Seldin, P. (1993b). The use and abuse of student ratings of professors. Bolton, MA: Anker.
Sproull, J. (2002). Personal communication with authors, University of Edinburgh. | |
Teddlie, C., & Yu, F. (2007). Mixed methods sampling: a typology with examples. Journal of Mixed Methods | |
Research, 1(1), 77-100. https://doi.org/10.1177/2345678906292430 | |
West-Burnham, J., O’Neill, J., & Bradbury, I. (Eds.) (2001). Performance management in schools: How to lead | |
and manage staff for school improvement. London, UK: Person Education. | |
Wolfer, T., & Johnson, M. (2003). Re-evaluating student evaluation of teaching: The Teaching Evaluation Form. | |
Journal of Social Work Education, 39, 111-121. | |
Yorke, M. (2003). Formative assessment in higher education: Moves towards theory and enhancement of | |
pedagogic practice. Higher Education, 45, 477-501. https://doi.org/10.1023/A:1023967026413 | |
Advances in Engineering Education | |
FALL 2020 VOLUME 8 NUMBER 4 | |
Supportive Classroom Assessment for Remote Instruction | |
RENEE M. CLARK | |
MARY BESTERFIELD-SACRE | |
AND | |
APRIL DUKES | |
University of Pittsburgh | |
Pittsburgh, PA | |
ABSTRACT | |
During the summer 2020, when remote instruction became the norm for universities due to | |
COVID-19, expectations were set at our school of engineering for interactivity and activity within | |
synchronous sessions and for using technology for engaging asynchronous learning opportunities. | |
Instructors were asked to participate in voluntary assessment of their instructional techniques, and | |
this “supportive” assessment was intended to enable growth in remote teaching as well as demonstrate excellence in the School’s instruction. Preliminary results demonstrated what is possible when voluntary assessment has a “support” focus – namely, instructor willingness to participate and encouragement in the use of desirable teaching practices.
Key words: Assessment, COVID-19, remote learning | |
INTRODUCTION AND BACKGROUND | |
For many faculty, the last five weeks of the spring 2020 semester represented a time of “persisting through” to the end of the semester after a largely unforeseen, rapid change from ordinary campus life and learning to remote education. At the University of Pittsburgh’s Swanson School of
Engineering, there were different expectations, however, for the summer 2020 semester, as the | |
Associate Dean for Academic Affairs established a “new norm” for remote instruction by setting | |
expectations regarding interactivity and activity in synchronous classroom sessions as well as the | |
use of technology for creating engaging, high-quality asynchronous learning resources. These expectations were supported by multiple synchronous training sessions for faculty prior to the start | |
of the summer semester. In addition, instructors were asked to participate in voluntary assessment | |
of their summer instruction via interviews with and classroom observation by the School’s Assessment Director. This voluntary activity had a two-fold purpose, namely 1) to perform “supportive,” | |
as opposed to summative, assessment, to enable growth and development in remote online teaching, and 2) to demonstrate to others excellence in the School’s instruction. The authors believe this | |
voluntary program was particularly noteworthy because it was considered an assessment program; | |
however, a very supportive aspect was also involved, namely upfront planning assistance (via an | |
instructional checklist developed via faculty discussions), in-class coaching and observation, and | |
follow-up formative verbal and written feedback. Thus, this voluntary “assessment” program had | |
concomitant supportive aspects. | |
This supportive assessment program consisted of both 1) one-on-one instructional planning and | |
coaching intended to encourage participation, and 2) formative assessment and feedback. This | |
program was rooted in previous work by the Assessment Director (AD), in which she had used an | |
individualized, social-based approach involving instructional coaching to propagate active learning | |
within the engineering school [1]. Her previous work was based on the writings of Charles Henderson, | |
Dancy, and colleagues, which advanced the idea that educational change may best occur through | |
socially-driven and personalized practices, such as informal communication, interpersonal networks, | |
collegial conversations, faculty communities, and support provided during change and implementation [2–4]. The AD’s previous work was also grounded in the professional development literature | |
indicating that adult professional learning must be personalized, including support with upfront | |
planning, during classroom implementation, and via evaluation [5–7]. Classroom observation is one | |
such form of support during classroom implementation [6–11]. | |
METHODS | |
In the two weeks prior to the start of the summer semester, synchronous training and information sessions via Zoom video conferencing were held for instructors to promote desired teaching | |
techniques and approaches in the remote online environment. The training and information sessions, which were one hour in length and conducted during the lunch hour, covered the following | |
topics: 1) Online Classroom Organization and Communication, 2) Using Zoom for Active Learning, | |
3) Active Learning with Classroom Assessment Techniques (CATs), 4) Inclusive Online Teaching, | |
and 5) Voluntary Supportive Assessment. | |
During the information session on voluntary assessment, the Assessment Director described the | |
plan shown in Table 1, which was based on the framework discussed in Introduction & Background. | |
Table 1. Voluntary Assessment Program. | |
1. Individual interview with instructor (e.g., Zoom, phone, email) | |
a. Review Planning and Observational Checklist | |
b. Discuss plans for classroom observation (if applicable and desired) | |
c. Discuss plans for other support or review (e.g., review of course materials) if desired | |
2. Observe class session if applicable | |
a. Provide written feedback to instructor | |
3. Provide other review or support as desired | |
a. Provide written feedback to instructor | |
4. Provide acknowledgment of instructor participation to Associate Dean | |
5. Future discussion, interview, or email communications with instructor (as follow-up) | |
6. Create concise written summary (e.g., table/template) whereby excellence in teaching can be demonstrated | |
Thus, the assessment program was socially-based and involved one-on-one discussions with each | |
instructor about his/her instructional plans, classroom observation using the COPUS observational | |
protocol [12], determination of additional types of review or support desired, provision of written | |
feedback to the instructor, and future follow-up communications with the instructor. The initial | |
interview/discussion with the instructor was guided by a customized checklist created by a faculty | |
team to assist the instructor with his/her planning as well as enable the Assessment Director to | |
document actual practices observed or otherwise determined. The various sections of the checklist | |
are as follows: 1) Synchronous instruction and methods for interactivity, activity, and “changing up” | |
of lecture, 2) Asynchronous instruction, including flipped instruction, and methods such as videos, | |
readings, accountability quizzes, and in-class exercises, 3) Learning Management System (LMS) use | |
and organization, 4) Communication methods with students, 5) Assessment of learning approaches, | |
submission methods, and student feedback plans, and 6) Academic integrity promotion. | |
Given that the program was voluntary, each instructor’s participation was acknowledged to the | |
Associate Dean in a weekly bulk email. This email described desirable practices witnessed during | |
assessment activity with the instructor that week (e.g., via classroom observation). Each instructor discussed in the email was cc’d to drive community among the participants, with the hope of | |
potentially creating small learning communities. | |
PRELIMINARY RESULTS | |
Of the 31 summer instructors, 16 (52%) volunteered to participate in the assessment following the | |
information session. We believe this participation metric was noteworthy given the program was | |
one of voluntary assessment. This “supportive” assessment began immediately at
the start of the summer semester. At approximately five weeks into the summer semester, an initial | |
interview, classroom observation, and/or “other review” had occurred with 15 instructors, so the assessment was formative and supportive rather than summative. A plan was made to observe the
remaining instructor later in the summer given the schedule of the course. The following examples of | |
desirable instructional practices, which were communicated to the Associate Dean, were observed | |
by the Assessment Director: | |
• Not only did Instructor 1 create a classroom in which the expectation was activity and | |
engagement, but his flipped classroom was notable for the positive environment in which he | |
thanked students for their responses, randomly asked students if they would mind answering | |
questions, and always provided positive feedback on the responses. The classroom execution | |
was flawless, including circulation among 11 breakout rooms for group work. | |
• Instructor 2 made use of the Top Hat software and simple classroom assessment techniques | |
(CATs), such as the Minute Paper, to drive interactivity and engagement. He also desired to | |
use Zoom for this purpose (i.e., Polling or Chat window). | |
• Instructor 3 created an asynchronous class design using Panopto videos with embedded | |
accountability quizzes and reflective questions, all exceptionally laid out for students in | |
Canvas. She held a live Zoom Q&A session to highlight the week’s material, pose questions, and answer questions. The students responded to questions and asked their own | |
questions. | |
• Instructor 4 ran a blended classroom, in which he conducted both synchronous Zoom lecture | |
sessions and provided content videos via Panopto. Students took a quiz in Canvas to drive | |
accountability with the videos during class. There was interactive lecture, in which students | |
were highly responsive by asking and answering questions via chat and verbally. | |
These sample results demonstrate what is possible with a voluntary assessment program with | |
a “support” focus given strong leadership that provides learning and training opportunities for | |
instructors – namely instructor willingness to participate as well as support for desirable teaching | |
practices. An anonymous survey distributed to the instructors near the end of the semester indicated an average rating of 3.88 on a 5-point scale regarding the helpfulness and usefulness of the | |
classroom observation and other formative feedback offered (57% response rate). In the words of | |
one participant, “I got a professional review of my strategy for remote teaching, and a check on my | |
early implementation. Assessment provided me with a positive reinforcement that gave me assurance | |
and encouraged me to move forward. I was offered a broad range of helpful support that reassured | |
me that I could rely on opportune help when needed. I do appreciate it very much!” In the words of | |
another, “…Also, just the act of being evaluated makes me reflect more on my teaching methods.” | |
NEXT STEPS AND FUTURE PLANS | |
Given the relatively larger number of courses in the fall semester, this assessment program will | |
be continued on an “as requested” basis for instructors. It is worth noting that there was a time | |
commitment by the Assessment Director and that, in general, individualized coaching is costly in terms of time [13]. However, evidence suggests that the effectiveness of professional development
for instructors, including coaching, is positively associated with the intensity of the support [14]. | |
Thus, seeing what was possible with this supportive voluntary assessment program in the summer | |
suggests that committing the right resources (i.e., both in number and supportiveness) may be an | |
avenue to propelling remote instruction to higher levels. | |
REFERENCES | |
1. Clark, R., Dickerson, S., Bedewy, M., Chen, K., Dallal, A., Gomez, A., Hu, J., Kerestes, R., & Luangkesorn, L. (2020). Social-Driven Propagation of Active Learning and Associated Scholarship Activity in Engineering: A Case Study. International
Journal of Engineering Education, 36(5), 1–14. | |
2. Dancy, M., Henderson, C., & Turpen, C. (2016). How faculty learn about and implement research-based instructional | |
strategies: The case of peer instruction. Physical Review Physics Education Research, 12(1), 010110.
3. Dancy, M., & Henderson, C. (2010). Pedagogical practices and instructional change of physics faculty. American | |
Journal of Physics, 78(10), 1056–1063. | |
4. Foote, K., Neumeyer, X., Henderson, C., Dancy, M., & Beichner, R. (2014). Diffusion of research-based instructional | |
strategies: the case of SCALE-UP. International Journal of STEM Education, 1(1), 1–18. | |
5. Rodman, A. (2019). Personalized Professional Learning: A Job-Embedded Pathway for Elevating Teacher Voice. | |
Alexandria, VA: ASCD, pp. 1–9. | |
6. Desimone, L. M., & Pak, K. (2017). Instructional coaching as high-quality professional development. Theory Into | |
Practice, 56(1), 3–12. | |
7. Rhodes, C., Stokes, M., & Hampton, G. (2004). A practical guide to mentoring, coaching and peer-networking: Teacher | |
professional development in schools and colleges. London: Routledge, pp. 25, 29–30. | |
8. Braskamp, L., & Ory, J. (1994). Assessing Faculty Work. San Francisco: Jossey-Bass Inc., 202. | |
9. Keig, L., & Waggoner, M. (1994). Collaborative peer review: The role of faculty in improving college teaching. | |
ASHE-ERIC Higher Education Report No. 2. Washington, DC: The George Washington University, School of Education | |
and Human Development, 41–42. | |
10. Reddy, L. A., Dudek, C. M., & Lekwa, A. (2017). Classroom strategies coaching model: Integration of formative | |
assessment and instructional coaching. Theory Into Practice, 56(1), 46–55. | |
11. Gallucci, C., Van Lare, M., Yoon, I., & Boatright, B. (2010). Instructional coaching: Building theory about the role | |
and organizational support for professional learning. American Educational Research Journal, 47(4), 919–963. | |
12. Smith, M., Jones, F., Gilbert, S., & Wieman, C. (2013). The classroom observation protocol for undergraduate | |
STEM (COPUS): A new instrument to characterize university STEM classroom practices. CBE-Life Sci. Educ., 12(4), 618–627. | |
FALL 2020 VOLUME 8 NUMBER 4 | |
5 | |
ADVANCES IN ENGINEERING EDUCATION | |
Supportive Classroom Assessment for Remote Instruction | |
13. Connor, C. (2017). Commentary on the special issue on instructional coaching models: Common elements of | |
effective coaching models. Theory into Practice, 56(1), 78–83. | |
14. Devine, M., Houssemand, C., & Meyers, R. (2013). Instructional coaching for teachers: A strategy to implement new | |
practices in the classrooms. Procedia-Social and Behavioral Sciences, 93, 1126–1130. | |
AUTHORS | |
Renee M. Clark is Research Assistant Professor of Industrial Engineering and | |
Director of Assessment for the Swanson School of Engineering at the University | |
of Pittsburgh. Dr. Clark’s research focuses on assessment of active learning and | |
engineering professional development initiatives. Her research has been funded | |
by the NSF and the University of Pittsburgh’s Office of the Provost. | |
Mary Besterfield-Sacre is Nickolas A. DeCecco Professor, Associate Dean for | |
Academic Affairs, and Director of the Engineering Education Research Center in | |
the Swanson School of Engineering at the University of Pittsburgh. Dr. Sacre’s | |
principal research is in engineering education assessment, which has been | |
funded by the NSF, Department of Education, Sloan Foundation, Engineering | |
Information Foundation, and VentureWell. | |
April Dukes is the Faculty and Future Faculty Program Director for the | |
Engineering Education Research Center in the Swanson School of Engineering | |
at the University of Pittsburgh. Dr. Dukes facilitates professional development | |
on instructional best practices for current and future STEM faculty for both | |
synchronous online and in-person environments. | |
http://wje.sciedupress.com | |
World Journal of Education | |
Vol. 11, No. 3; 2021 | |
Timeless Principles for Effective Teaching and Learning: A Modern | |
Application of Historical Principles and Guidelines | |
R. Mark Kelley 1,*, Kim Humerickhouse 2, Deborah J. Gibson 3 & Lori A. Gray 1
1 School of Interdisciplinary Health Programs, Western Michigan University, Kalamazoo, MI, USA
2 Department of Teacher Education, MidAmerica Nazarene University, Olathe, KS, USA
3 Department of Health and Human Performance, University of Tennessee at Martin, Martin, TN, USA
*Correspondence: School of Interdisciplinary Health Programs, Western Michigan University, 1903 W. Michigan | |
Ave., Kalamazoo, MI, 49008, USA. Tel: 1-269-387-1097. E-mail: [email protected] | |
Received: February 13, 2021    Accepted: May 23, 2021    Online Published: June 2, 2021
doi:10.5430/wje.v11n3p1    URL: https://doi.org/10.5430/wje.v11n3p1
Abstract | |
The purpose of this study is twofold: (a) to assess the perceived relevance of the Seven Timeless Principles and | |
guidelines posited by Gregory (1886) for current educators and educators-in-training and (b) to develop and pilot test | |
the instrument needed to accomplish the former. The “Rules for Teachers” Gregory attributes to each of these laws | |
were used as guidelines to develop an assessment instrument. Eighty-four educators and future educators across three | |
universities participated in an online survey using a 4-point Likert scale to evaluate the consistency of Gregory’s | |
guidelines with modern best-teaching practices. Responses were framed within the Timeless Principles, providing a | |
measure of pedagogical universality. Total mean scores for all principles and guidelines were greater than 3.0, | |
suggesting that Gregory had indeed identified foundational principles of teaching and learning that maintain | |
relevance across academic disciplines and in a variety of settings in which learning occurs. | |
Keywords: teaching and learning, principles of teaching, historical pedagogy, educational principles | |
1. Introduction | |
In 1886, John Milton Gregory published a book entitled The Seven Laws of Teaching that offered a set of principles | |
to support and strengthen teachers’ capabilities systemically and comprehensively. The primary purpose of this study | |
was to explore whether Gregory’s principles are consistent with faculty and student perceptions of 21st century best | |
teaching practices. To accomplish the primary purpose, a secondary goal of the study was to pilot and provide | |
evidence of reliability and validity of an instrument based on Gregory’s principles and guidelines. The study | |
evaluated the value and relevance of these 19th century principles to modern teachers via a researcher-developed | |
instrument using the guidelines established within each of Gregory’s principles and then presented results to validate | |
concept transferability. After examining the basic structure of The Seven Laws of Teaching in the context of modern | |
approaches, we suggest that these seven laws represent Timeless Principles of the science and art of teaching. | |
1.1 Background | |
Discussions of foundational principles that frame effective teaching are not unique to Gregory, and the educational | |
literature contains an abundance of suggested principles, strategies, and guidelines. Thorndike (1906) identified three | |
essential principles that included readiness, exercise, and effect. The Law of Readiness suggested that a child must be | |
ready to learn in order to learn most efficiently. It is the responsibility of the teacher to develop the readiness to learn | |
in the student. The Law of Exercise is further divided into the Law of Use and Law of Disuse. Repetition strengthens | |
understanding, and practice makes perfect. Conversely, if one does not “use it,” they tend to “lose it.” It is the | |
responsibility of the teacher to ensure practice is interesting and meaningful in order to enhance learning. | |
Thorndike’s Law of Effect suggests that: (a) actions that elicit feelings of pleasure and satisfaction enhance effective | |
learning, (b) any action met with frustration and annoyance will likely be avoided, and (c) success breeds success | |
and failure leads to further failure. | |
Rosenshine and Furst (1971) conducted what is considered the first literature review of the research addressing | |
principles for effective teaching. They outlined five “most important” teacher-effectiveness variables, which include: | |
clarity, variability, enthusiasm, task-oriented behavior, and student opportunity to learn criterion material. Almost 30 | |
years later, Walls (1999) posited four similar criteria, including outcomes, clarity, engagement, and enthusiasm. Walls | |
stressed that it is important for students to understand the direction in which the teacher is guiding their | |
learning—and the teacher’s intentions for going there—by providing clear goals and related learning outcomes. It is | |
vital to build upon what students already know while making material as clear as possible. | |
In 1987, Chickering and Gamson posited seven principles that they argued are representative of good practice in | |
undergraduate education: (a) encourage contacts between students and faculty, (b) develop reciprocity and | |
cooperation among students, (c) use active learning techniques, (d) give prompt feedback, (e) emphasize time on task, | |
(f) communicate high expectations, and (g) respect diverse talents and ways of learning. These seven principles are | |
“intended as guidelines for faculty members, students, and administrators to improve teaching” (Chickering & | |
Gamson, 1987, p. 3). | |
Walls (1999) agreed with Thorndike (1906) that students must be engaged to learn, stressing the importance of active | |
learning, which encompasses aspects of Thorndike’s laws. Students must be engaged to learn, as people learn what | |
they practice (Law of Exercise). Both the student and the teacher should be enthusiastic about the learning (Law of | |
Effect); if the teacher does not enjoy the teaching, how can students be expected to enjoy the learning? | |
More recently, distinct approaches have offered an element of novelty but ultimately integrated pre-existing | |
principles. Perkins (2008) used baseball as a metaphor to depict his principles of teaching. The principles set the | |
stage for what Perkins further referred to as conditions and principles of transfer. The principles include: (a) play the | |
whole game (develop capability by utilizing holistic work); (b) make the game worth playing (engage students | |
through meaningful content); (c) work on the hard parts (develop durable skills through practice, feedback, and | |
reflection); (d) play out of town (increase transfer of knowledge with diverse application of experiences); (e) play the | |
hidden game (sustain active inquiry); (f) learn from the team (encourage collaborative learning); and (g) learn the | |
game of learning (students taking an active role in their learning). | |
Tomlinson’s (2017) differentiation emphasized the need for teachers to respond dynamically within a given | |
classroom by varying (“differentiating”) instruction to meet student needs. Conceptually, Tomlinson identified | |
respectful tasks, ongoing assessment and adjustment, and flexible grouping as general principles driving | |
differentiation while identifying the primary domains of the teacher (content, process, and product) and the student | |
(readiness, interests, and learning profile). | |
Beyond the contributions of individual approaches, the past 20 years have also seen an increase in collaborative, | |
research-based recommendations for educational principles that draw upon the experiences of educators, researchers, | |
and policymakers. Workforce entry and academic preparation for college have been the primary aspects of these | |
recommendations. The InTASC Model Core Teaching Standards delineated competencies based on key principles | |
that are intended to be mastered by the teacher (Council of Chief State School Officers, 2011). It is anticipated that | |
proficiency in these standards supports sufficient preparation for K-12 students to succeed in college and to obtain | |
the skill sets needed for a future workplace. Preparing 21st Century Students for a Global Society set forth four skills | |
found to be most important, including critical thinking, communication, collaboration, and creativity, and stated, | |
“What was considered a good education 50 years ago, however, is no longer enough for success in college, career, | |
and citizenship in the 21st century” (National Education Association, 2012, p. 3). | |
In specific academic disciplines, similar discussions and statements have been made. For example, in the field of | |
health education and promotion, Auld and Bishop (2015) stated that “given today’s rapid pace of change and health | |
challenges, we are called to identify, adapt and improve key elements that make teaching and learning about health | |
and health promotion successful” (p. 5). Pruitt and Epping-Jordan (2005) discussed the need to develop a new | |
approach to training for the 21st century global healthcare workforce. Regardless of approach or discipline, there is a | |
clear desire among educators to identify a universal set of principles to guide effective teaching. | |
1.2 Overview of the Seven Laws of Teaching | |
Gregory (1886) drew upon the metaphor of examining natural laws or phenomena to define the foundational | |
principles that govern effective teaching. In step with what is now recognized as a positivist paradigm, Gregory | |
believed that in order to understand such laws, one must subject the phenomenon to scientific analysis and identify | |
its individual components. Gregory (1886) posited that the essential elements of “any complete act of teaching” are | |
composed of: | |
Seven distinct elements or factors: (1) two personal factors—a teacher and a learner; (2) two mental factors—a | |
common language or medium of communication, and a lesson or truth or art to be communicated; and (3) three | |
functional acts or processes—that of the teacher, that of the learner, and a final or finishing process to test and | |
fix the result. (p. 3) | |
Further, he argued that regardless of whether that which is to be learned is a single fact requiring a few minutes or a
complex concept requiring a lesson of many hours, all seven of these factors must be present if learning is to occur; | |
none can be missing. For the purposes of this article, the concept of a “law” of teaching as expressed by Gregory | |
(1886) has been re-termed to be a “principle.” We also embraced Gregory’s general grouping of these elements as | |
key dimensions of the Seven Principles (i.e., actors, mental factors, functional processes, and finishing acts). | |
1.2.1 The Seven Principles Stated | |
There are a variety of ways that these seven principles can be expressed. Gregory (1886) first stated the overarching | |
principles, then expressed them as direct statements for teachers to follow in their pursuits. Below are the principles | |
exactly as Gregory wrote them (emphasis his own): | |
1) The Principle of the Teacher: A teacher must be one who KNOWS the lesson or truth or art to be taught... [As | |
expressed to teachers:] Know thoroughly and familiarly the lesson you wish to teach,—teach from a full mind and a | |
clear understanding. | |
2) The Principle of the Learner: A learner is one who ATTENDS with interest to the lesson given.… [As expressed | |
to teachers:] Gain and keep the attention and interest of the pupils upon the lesson. Do not try to teach without | |
attention. | |
3) The Principle of the Language: The language used as a MEDIUM between teacher and learner must be | |
COMMON to both... [As expressed to teachers:] Use words understood in the same way by the pupils and | |
yourself—language clear and vivid to both. | |
4) The Principle of the Lesson: The lesson to be mastered must be explicable in terms of truth already known by the | |
learner—the UNKNOWN must be explained by means of the KNOWN… [As expressed to teachers:] Begin with | |
what is already well known to the pupil upon the subject and with what [they themselves] experienced,—and | |
proceed to the new material by single, easy, and natural steps, letting the known explain the unknown. | |
5) The Principle of the Teaching Process: Teaching is AROUSING and USING the pupil’s mind to grasp the | |
desired thought... [As expressed to teachers:] Stimulate the pupil’s own mind to action. Keep [their] thoughts as | |
much as possible ahead of your expression, placing [them] in the attitude of a discoverer, an anticipator.
6) The Principle of the Learning Process: Learning is THINKING into one’s own UNDERSTANDING a new idea | |
or truth… [As expressed to teachers:] Require the pupil to reproduce in thought the lesson [they are] | |
learning—thinking it out in its parts, proofs, connections and applications till [they] can express it in [their] own | |
language. | |
7) The Principle of Review: The test and proof of teaching done—the finishing and fastening process—must be a | |
REVIEWING, RETHINKING, RE-KNOWING, REPRODUCING, and APPLYING of the material that has been | |
taught… [As expressed to teachers:] Review, review, REVIEW, reproducing correctly the old, deepening its | |
impression with new thought, linking it with added meanings, finding new applications, correcting any false views, | |
and completing the true. (Gregory, 1886, pp. 5-7) | |
1.2.2 Essentials of Successful Teaching Using the Seven Principles | |
There are a variety of understandings that are essential for applying these Seven Principles to effective teaching. The | |
first understanding is that the Seven Principles are both necessary and sufficient for effective teaching. Gregory | |
(1886) stated that “these rules, and the laws which they outline and presuppose, underlie and govern all successful | |
teaching. If taken in their broadest meaning, nothing need be added to them; nothing can be safely taken away” (p. 7). | |
He posited that when these principles are used in conjunction with “good order,” no teacher need be concerned about | |
failing as a teacher, provided each principle is paired with effective behavior management. Thus, Gregory indicated | |
that profound understanding and consistent application of these principles forms the foundation for all successful | |
teaching and learning experiences. | |
Another understanding essential for successful teaching with the principles is the deceptiveness of their simplicity. At | |
first review, it is easy for the reader to conclude that these principles “seem at first simple facts, so obvious as | |
scarcely to require such formal statement, and so plain that no explanation can make clearer their meaning” (Gregory, | |
1886, p. 8). As one begins to examine the applications and effects of these principles, it becomes apparent that while | |
there is constancy, there is also opportunity for variation as each teacher finds their personal expression of each | |
principle. | |
The functionality of the principles is not temporally constrained; the principles are as applicable for the 21st century | |
teacher as they were for teachers of the 19th century. For example, while the language of the learners of the 1800s | |
was likely to have been substantially different from the language of the learners of the 2000s, teachers must prepare | |
their lesson with the language of their learners in mind regardless of the century in which they taught or are teaching. | |
Gregory’s (1886) principles offer a basis for modern strategies and theories of teaching and learning that is consistent | |
with broader philosophies of education. For this reason, we will refer to them as the Seven Timeless Principles. | |
The ubiquitous nature of these Seven Timeless Principles needs to be understood in order for the principles to be | |
applied in effective teaching. Gregory (1886) stated that the laws “cover all teaching of all subjects and in all grades, | |
since they are the fundamental conditions on which ideas may be made to pass from one mind to another, or on | |
which the unknown can become known” (p. 8). In this way, he suggested that the principles are just as applicable to | |
the elementary school teacher as they are to the college professor, equally important to the music teacher as to the | |
health teacher. | |
Associated with each principle were what Gregory (1886) described as “Rules for Teachers” (p. 31). These rules | |
will subsequently be referred to herein as guidelines. These guidelines detail the core components that shape each
principle. For example, a guideline under the Teacher Principle would be: “Prepare each lesson by fresh study. Last | |
year’s knowledge has necessarily faded somewhat” (Gregory, 1886, p. 20). A guideline posited for the Learner | |
Principle: “Adapt the length of the class exercise to the ages of the pupils: the younger the pupils the briefer the | |
lesson” (Gregory, 1886, p. 30). | |
1.3 Significance and Study Objective | |
Gregory’s (1886) original work has been recognized as making valuable contributions to the teaching and learning | |
process in some circles (Stephenson, 2014; Wilson, 2014). In a recent reprint of Gregory’s first edition text, | |
Stephenson (2014) provided supplemental materials that included study questions, self-assessment, and a sample | |
teacher observation form. In the same book, Wilson (2014) argued that one of the essential elements of effective | |
teaching is that teachers understand the distinction between the methods of teaching and the principles of teaching. | |
Wilson (2014) stated, “Methods change. They come and go. In the ancient world, students would use wax tablets to | |
take notes, and now they use another kind of tablet, one with microchips inside” (p. 4). Wilson suggested that a | |
teacher using the methods of wax or stone needed to know what was going to be said and why just as much as a | |
teacher using the methods of a smart board or computer in today’s classroom. The purpose of this study is twofold: (a) | |
to assess the perceived relevance of the Seven Timeless Principles and guidelines posited by Gregory (1886) for | |
current educators and educators-in-training and (b) to develop and pilot test the instrument needed to accomplish the | |
former. The research hypothesis of this study is that the principles and guidelines posited by Gregory are affirmed as | |
relevant by current and future educators. The approach is to translate Gregory’s guidelines into a survey instrument | |
capable of providing evidence of the value of the overarching principles. | |
2. Method | |
2.1 Research Design | |
This research was an exploratory study with a cross-sectional design that used a convenience sample. Research sites | |
were chosen because of their accessibility to the researchers. The research protocol was approved by the institutional | |
review boards (IRBs) of all of the institutions with which the authors are affiliated. | |
2.2 Sample and Participant Selection | |
The participants for this study consisted of current educators and educators-in-training. The current educators were | |
higher education professors from three universities ranging in size from small- to mid-sized: one in the South, one in | |
the Midwest, and one in the North. The educators-in-training were students enrolled in the undergraduate
teacher education programs at two of the universities. Recruitment for all participants was conducted via an email or | |
in-class invitation to participate in the research project by completing the survey. | |
Student participants were recruited from two classes: an introduction to teacher education course and a senior-level | |
course. Surveys were taken by students prior to participating in their student teaching experience, and bonus points | |
were offered for participation. Faculty participants were recruited through the faculty development process, though | |
participation in the process was not required to participate in the survey. All participants voluntarily completed the | |
survey after reading and acknowledging the informed consent form. | |
2.3 Data Collection and Analysis | |
Participant invitations and all surveys were administered in the 2018 spring and fall academic semesters using | |
Google Forms, from which aggregated data were downloaded. Statistics for descriptive and reliability analyses were | |
generated using SPSS Version 26 software. Means and standard deviations were calculated for all 43 guidelines, | |
including all aggregate groupings for principles and dimensions. To affirm reliability of the instrument and the | |
subscales, Cronbach’s alphas were computed on the total scale and on each of the principle subscales. | |
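For reference, the internal-consistency statistic used here is standard and not specific to this study: for a scale or subscale made up of k items, Cronbach's alpha is
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} s_i^{2}}{s_X^{2}}\right)
where s_i^2 is the variance of responses to item i and s_X^2 is the variance of the summed total score over the k items; values close to 1 indicate that the items vary together, i.e., that the scale is internally consistent.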
2.4 Institutional Approvals and Ethical Considerations | |
The protocol of this project was approved by the IRBs of Western Michigan University, Mid-America Nazarene | |
University, and University of Tennessee at Martin. Prior to completing the electronic survey, each potential | |
participant reviewed an IRB-approved informed consent form online. Potential participants who agreed to participate | |
clicked on the “proceed to survey” button, which led them to the initial questions of the survey. The informed | |
consent notified participants that they could discontinue participation at any time. | |
Participant confidentiality and anonymity were protected through the security of the Google survey management | |
system and the encrypted, password-protected security of the investigators’ university computers. There is limited | |
psychometric risk to participation in an online survey. No prior psychometric data were available for the instrument, | |
as one purpose of this study was to pilot its use. | |
2.5 Instrument Development: Assessment and Measures | |
The instrument used in this research was developed by the authors and is based upon Gregory’s (1886) Seven | |
Timeless Principles. The instrument contains two basic components. The first component of the survey was basic | |
demographic information, including: age, binary gender, race, level of involvement in teaching, and primary | |
academic discipline. No identifying information beyond the above-mentioned variables was collected. | |
The second component of the instrument was developed directly from the guidelines for teachers described by | |
Gregory (1886) to measure teacher perception of the guidelines’ modern relevance. Each guideline was used as an | |
item on the instrument. Evidence of face validity was obtained by a panel of education professionals who reviewed | |
each of the guidelines for its relevance to the principle with which it was associated. In some instances, minor | |
changes were made to the language of Gregory’s guidelines in order to present the content in more modern language.
Care was taken to ensure that each statement accurately reflected its original meaning. | |
The final instrument consisted of five demographic items and 43 items related to the guidelines for effective teaching, | |
creating a Timeless Principles Scale. The items (guidelines) associated with each of the seven principles were | |
combined into subscales comprised of n items (i.e., Principle of the Teacher [n = 6], Principle of the Learner [n = 6], | |
Principle of the Language [n = 6], Principle of the Lesson [n = 6], Principle of the Teaching Process [n = 9], Principle | |
of the Learning Process [n = 4], and Principle of Review and Application [n = 6]). | |
Using a 4-point Likert scale, participants affirmed or rejected the perceived relevance of each item (guideline) as it | |
relates to teacher best practices in 21st century educational settings (1 = strongly disagree to 4 = strongly agree). | |
Means and standard deviations were computed for the total scale, for each of the subscales, and for each of the 43 | |
items. Responses and mean scores of 3.0 or greater were considered affirming of the relevance of the principle and/or | |
guideline for current teaching and learning. | |
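The authors report generating these statistics in SPSS Version 26. Purely as an illustrative sketch of the same aggregation (the file name, the q1-q43 column names, and the item-to-subscale mapping below are assumptions inferred from the subscale sizes listed above, not part of the original study), the subscale means, standard deviations, and Cronbach's alphas could be reproduced in Python along the following lines:

import pandas as pd

# Hypothetical wide-format data: one row per respondent, columns q1..q43
# holding responses coded 1 (strongly disagree) to 4 (strongly agree).
responses = pd.read_csv("timeless_principles_responses.csv")

# Item-to-subscale mapping assumed from the subscale sizes reported above
# (6, 6, 6, 6, 9, 4, and 6 items, in instrument order).
subscales = {
    "Teacher": [f"q{i}" for i in range(1, 7)],
    "Learner": [f"q{i}" for i in range(7, 13)],
    "Language": [f"q{i}" for i in range(13, 19)],
    "Lesson": [f"q{i}" for i in range(19, 25)],
    "Teaching Process": [f"q{i}" for i in range(25, 34)],
    "Learning Process": [f"q{i}" for i in range(34, 38)],
    "Review and Application": [f"q{i}" for i in range(38, 44)],
}

def cronbach_alpha(items):
    # alpha = (k / (k - 1)) * (1 - sum of item variances / variance of summed score)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

for name, cols in subscales.items():
    sub = responses[cols]
    person_means = sub.mean(axis=1)  # each respondent's subscale mean on the 1-4 scale
    print(f"{name}: M = {person_means.mean():.2f}, "
          f"SD = {person_means.std(ddof=1):.2f}, alpha = {cronbach_alpha(sub):.3f}")

# Total scale across all 43 guidelines; a mean of 3.0 or higher is read as
# affirming relevance under the scoring rule described above.
all_items = responses[[c for cols in subscales.values() for c in cols]]
print(f"Total scale: M = {all_items.mean(axis=1).mean():.2f}, "
      f"alpha = {cronbach_alpha(all_items):.3f}")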
3. Results | |
3.1 Demographics | |
Of the 84 educators and education students who participated in the study, 86.9% identified as White and 9.6% | |
identified as African American/Black, Hispanic, Asian, or Native American; 3.6% did not identify race. The majority | |
of participants were female (57.1%), with 39.3% identifying as male and 3.6% not identifying
gender. With regard to primary discipline, health sciences was most common (25.0%), followed by physical sciences | |
(15.5%), behavioral sciences (13.1%), and social sciences (11.9%). Humanities, language arts, music or fine arts, and | |
physical education represented 8.3%, 9.5%, 4.8%, and 2.4% of disciplines, respectively. | |
The majority of respondents were educators in higher education settings (70.2%). Education students represented | |
28.6% of participant responses, and other workforce professionals represented 1.2% of the sample. Most participants | |
in higher education reported employment at a full-time level (44% of total), with 4.8% reporting a part-time teaching | |
position. Of the 59 total teachers/professors, 50% stated that their education included training and course work in | |
effective teaching practices. | |
3.2 Total Scale | |
Mean and standard deviation scores for the total scale, the subscales, and for each item are presented in Table 1. The | |
mean total score for the Timeless Principles Scale (consisting of all 43 items on the instrument) was 3.37 with a | |
standard deviation of 0.348 (see Table 1). This result indicates that participants agreed overall, and were inclined to | |
strongly agree, with the guidelines and principles identified by Gregory’s (1886) laws. Cronbach’s alpha calculated | |
for the total scale was 0.954, indicating a high level of internal consistency and that the total scale is reliable. | |
Table 1. Mean, Standard Deviation, and Cronbach’s Alpha Scores for the Timeless Principles Scale
Timeless Principles: Total Scale (M = 3.37, SD = .348, α = .954)
Principle of the Teacher (M = 3.30, SD = .385, α = .667) - An effective teacher should:
1) Prepare each lesson by fresh study. Last year’s knowledge has necessarily faded somewhat. (M = 3.02, SD = .640)
2) Find the connection of the lesson to the lives and duties of the learners. Its practical value lies in these connections. (M = 3.48, SD = .611)
3) Keep in mind that complete mastery of a few things is better than an ineffective smattering of many. (M = 3.23, SD = .704)
4) Have a plan of study, but do not hesitate, when necessary, to study beyond the plan. (M = 3.39, SD = .560)
5) Make use of all good books and resources available to you on the subject of the lesson. (M = 3.33, SD = .627)
6) Get the help of the best scholars and thinkers on the topic at hand to solidify your own thoughts. (M = 3.30, SD = .561)
Principle of the Learner (M = 3.52, SD = .376, α = .754) - To enhance student engagement, an effective teacher should:
7) Never exhaust wholly the learner's power of attention. Stop or change activities when signs of attention fatigue appear. (M = 3.39, SD = .602)
8) Adapt the length of the class exercise to the ages of the pupils: The younger the pupils the briefer the lesson. (M = 3.43, SD = .556)
9) Appeal whenever possible to the interests of your learners. (M = 3.62, SD = .513)
10) Prepare beforehand thought-provoking questions. Be sure that these are not beyond the ages and attainments of your learners. (M = 3.64, SD = .530)
11) Make your lesson as attractive as possible, using illustrations and all legitimate devices and technologies. Do not, however, let these devices or technologies be so prominent as to become sources of distraction. (M = 3.60, SD = .518)
12) Maintain in yourself enthusiastic attention to and the most genuine interest in the lesson at hand. True enthusiasm is contagious. (M = 3.45, SD = .629)
Principle of the Language (M = 3.31, SD = .414, α = .800) - In order to ensure a common language, an effective teacher should:
13) Secure from the learners as full a statement as possible of their knowledge of the subject, to learn both their ideas and their mode of expressing them, and to help them correct their knowledge. (M = 3.27, SD = .588)
14) Rephrase the thought in more simple language if the learner fails to understand the meaning. (M = 3.54, SD = .525)
15) Help the students understand the meanings of the words by using illustrations. (M = 3.37, SD = .533)
16) Give the idea before the word, when it is necessary to teach a new word. (M = 3.11, SD = .581)
17) Test frequently the learner's sense of the words she/he uses to make sure they attach no incorrect meaning and that they understand the true meaning. (M = 3.30, SD = .636)
18) Should not be content to have the learners listen in silence very long at a time since the acquisition of language is one of the most important objects of education. Encourage them to talk freely. (M = 3.25, SD = .641)
Principle of the Lesson (M = 3.51, SD = .415, α = .828) - In order to create an effective lesson, an effective teacher should:
19) Find out what your students know of the subject you wish to teach to them; this is your starting point. This refers not only to textbook knowledge but to all information they may possess, however acquired. (M = 3.37, SD = .655)
20) Relate each lesson as much as possible with prior lessons, and with the learner's knowledge and experience. (M = 3.65, SD = .503)
21) Arrange your lesson so that each step will lead naturally and easily to the next; the known leading to the unknown. (M = 3.61, SD = .515)
22) Find illustrations in the most common and familiar objects suitable for the purpose. (M = 3.46, SD = .525)
23) Lead the students to find fresh illustrations from their own experience. (M = 3.48, SD = .571)
24) Urge the learners to use their own knowledge to find or explain other knowledge. Teach them that knowledge is power by showing how knowledge really helps solve problems. (M = 3.51, SD = .570)
Principle of the Teaching Process (M = 3.37, SD = .407, α = .858) - To create an effective teaching process, the effective teacher should:
25) Select and/or develop lessons and problems that relate to the environment and needs of the learner. (M = 3.49, SD = .549)
26) Excite the learner's interest in the lesson when starting the lesson, by some question or statement that will awaken inquiry. Develop a hook to awaken their interest. (M = 3.57, SD = .521)
27) Place yourself frequently in the position of a learner among learners, and join in the search for some fact or principle. (M = 3.42, SD = .587)
28) Repress the impatience which cannot wait for the student to explain themselves, and which takes the words out of their mouth. They will resent it, and feel that they could have answered had you given them sufficient time. (M = 3.29, SD = .654)
29) Count it your chief duty to awaken the minds of the learners and do not rest until each learner shows their mental activity by asking questions. (M = 3.06, SD = .766)
30) Repress the desire to tell all you know or think upon the lesson or subject; and if you tell something to illustrate or explain, let it start a fresh question. (M = 3.30, SD = .555)
31) Give the learner time to think, after you are sure their mind is actively at work, and encourage them to ask questions when puzzled. (M = 3.48, SD = .548)
32) Do not answer the questions asked too promptly, but restate them, to give them greater force and breadth, and often answer with new questions to secure deeper thought. (M = 3.34, SD = .590)
33) Teach learners to ask What? Why? and How? in order to better learn the nature, cause, and method of every fact, idea, or principle observed or taught them: also, Where? When? By whom? and What of it? - the place, time, actors, and consequences. (M = 3.37, SD = .599)
The Principle of the Learning Process (M = 3.40, SD = .462, α = .684) - In order to facilitate an effective learning process, the effective teacher should:
34) Ask the learner to express, in their own words, the meaning as they understand it, and to persist until they have the whole thought. (M = 3.35, SD = .674)
35) Let the reason why be perpetually asked until the learner is brought to feel that they are expected to give a reason for their opinion. (M = 3.29, SD = .721)
36) Aim to make the learner an independent investigator - a student of nature, a seeker of truth. Cultivate in them a fixed and constant habit of seeking accurate information. (M = 3.58, SD = .542)
37) Seek constantly to develop a profound regard for truth as something noble and enduring. (M = 3.37, SD = .638)
The Principle of Review and Application (M = 3.37, SD = .348, α = .852) - To affirm the learning that has occurred and apply it, the effective teacher should:
38) Have a set time for reviews. At the beginning of each lesson take a brief review of the preceding lesson. (M = 3.25, SD = .618)
39) Glance backward, at the close of each lesson, to review the material that has been covered. Almost every good lesson closes with a summary. It is good to have the learners know that any one of them may be called upon to summarize the lesson at the end of the class. (M = 3.30, SD = .619)
40) Create all new lessons to bring into review and application, the material of former lessons. (M = 3.12, SD = .722)
41) The final review, which should never be omitted, should be searching, comprehensive, and masterful, grouping all parts of the subject learned as on a map, and giving the learner the feeling of a familiar mastery of it all. (M = 3.28, SD = .668)
42) Seek as many applications as possible for the subject studied. Every thoughtful application involves a useful and effective review. (M = 3.33, SD = .627)
43) An interesting form of review is to allow members of the class to ask questions on previous lessons. (M = 3.23, SD = .533)
Note. N = 84. Survey questions (“guidelines”) were aggregated by subscales representing Gregory’s (1886) Seven | |
Laws (“Principles”) of Teaching. Values were calculated from 4-point Likert scale responses (1 = strongly disagree, | |
2 = disagree, 3 = agree, 4 = strongly agree). | |
3.3 Principles and Guidelines | |
The mean and standard deviation for each of the principle subscales were computed as follows: Principle of the | |
Teacher (M = 3.30, SD = 0.385), Principle of the Learner (M = 3.52, SD = 0.376), Principle of the Language (M = | |
3.31, SD = 0.414), Principle of the Lesson (M = 3.51, SD = 0.415), Principle of the Teaching Process (M = 3.37, SD | |
= 0.407), Principle of the Learning Process (M = 3.40, SD = 0.462), and the Principle of Review and Application (M | |
= 3.37, SD = 0.348). This represents affirmation of each of the seven principles as relevant to current educational | |
settings. | |
Cronbach’s alpha for each of the subscales were as follows: Principle of the Teacher (α = 0.667), Principle of the | |
Learner (α = 0.754), Principle of the Language (α = 0.800), Principle of the Lesson (α = 0.828), Principle of the | |
Teaching Process (α = 0.858), Principle of the Learning Process (α = 0.684), and Principle of Review and | |
Application (α = 0.852). These values affirm the internal consistency of each of the subscales. | |
The mean scores of each of the 43 items (guidelines) were above 3.0. The item mean scores ranged from 3.02 to 3.65 | |
with standard deviations ranging from 0.503 to 0.766. These results reflect that each individual guideline was | |
affirmed as being relevant to current educational settings. | |
4. Discussion | |
4.1 Implications | |
In this paper, we examined the relevance of the principles (laws) presented in 1886 by John Milton Gregory in The | |
Seven Laws of Teaching. We presented evidence that these principles may indeed represent enduring Timeless | |
Principles of effective teaching that, while their application in the 21st century may look different than it did in the | |
19th century, encapsulate the necessary elements to facilitate effective learning. The results of this exploratory study | |
indicate that educators and educators-in-training affirm the current relevance of these principles.
The results of the study also affirm the perception of applicability of the guidelines—or as Gregory (1886) described | |
them, rules for teachers—for faculty members of institutions of higher education as well as prospective K-12 | |
teachers. However, neither we nor Gregory posit that the guidelines presented in the study represent a comprehensive, | |
exhaustive list of appropriate guidelines. For example, one could envision a guideline such as “Learn students’ names | |
to help them feel connected to the learning community” as an element of effective teaching. However, this statement | |
could easily be considered as a fit for the Principle of the Learner, as the feeling of being connected to the learning | |
community certainly contributes to learner engagement. It is reasonable and should be expected that other guidelines | |
for teachers would be consistent with one of the seven principles. | |
The mean score of respondents to each guideline statement was above 3.0 on a 4-point Likert scale in which a 3 | |
represented agree and a 4 represented strongly agree (lowest M = 3.02, highest M = 3.65). In addition, the mean scores for the subscales representing each principle ranged from 3.30 to 3.52, reflecting strong
affirmation of the current relevance of each of the Seven Timeless Principles. | |
The enduring nature of these Seven Principles may be a result of their consistency with research-based practices | |
whose impact has been shown since Gregory (1886) described his Laws for Teachers. For example, the concept of | |
cognitive load theory (Atkinson & Shiffrin, 1968) is consistent with both the Principle of the Learner and the
Principle of the Lesson. In addition, elements of self-determination theory (Ryan & Deci, 2000) are clearly consistent | |
with the guidelines in the Principle of the Teaching Process, and spaced-retrieval practice (Karpicke & Roediger, | |
2007) easily fits within the Principle of Review, the reviewing, rethinking, re-knowing, and reproducing of the | |
learning. Eyler’s (2018) description of curiosity as one the fundamental elements of how humans learn contains | |
many elements that overlap with and are similar to the language used by Gregory to describe the Principle of the | |
Learner. In order for learning to occur, the learner must actively engage in the learning process and must demonstrate | |
curiosity toward that which is to be learned. | |
As Wilson (2014) indicated, “highly effective teachers will understand the profound differences between methods of | |
teaching and principles of teaching” (p. 3). For example, lesson plan development is a common method used in | |
teacher preparation programs to emphasize the importance of comprehensive understanding of the lesson to be taught. | |
The lesson plan includes objectives, a review of previous lessons, a summary of the content, and identification of | |
activities that will be used to facilitate the learning. These activities represent methods that are consistent with the | |
Principle of the Lesson. The teacher must have a clear understanding of what is to be learned in this class and how | |
the content to be learned builds upon previous lessons or classes. | |
Additionally, in the higher education arena, institutions and accreditation bodies have a variety of methods designed | |
to be consistent with the Principle of the Teacher. A teacher must be one who knows the lesson or truth to be taught. | |
Potential faculty are evaluated on the relevance of their degrees, research, and experiences to the classes to be taught, | |
all of which is done in an attempt to demonstrate that the instructor knows the lesson or truth to teach. | |
There is danger in too great a focus on methods rather than on the principles. For example, the actions of some
accrediting bodies in higher education imply that the only way an instructor can learn about a particular content area | |
is to take courses at a university or college. However, it is easy to elicit examples of respected experts who developed | |
their expertise outside the traditional classroom. Another example can be easily observed in the developing role of | |
the digital classroom. While the methods of developing and maintaining the engagement of students are likely to be | |
quite distinct from a face-to-face classroom versus an online or hybrid classroom, the Principle of the Learner is | |
equally relevant in both settings. | |
4.2 Limitations and Future Work | |
While embracing convenience sampling and incentivizing student participation increases reliability and power | |
associated with sample size, it also influences who accepts the invitation to participate. This increases the potential | |
non-response bias of the study. Similarly, while adhering to Gregory’s (1886) language closely was a primary | |
component of identifying transferability, the structure of the instrument may increase desirability and acquiescence | |
biases. Such response biases are possible when evaluating a series of statements without embedded item controls. | |
While a highly controlled instrument was outside the scope of this work, future studies can leverage an in-depth | |
analysis of specific principles and guidelines using survey techniques designed to mitigate bias. | |
The sample size, while sufficient for the statistical purposes of the study, is not necessarily sufficient to make an | |
argument that it is representative of a national population of educators or future educators. However, we believe the | |
sample is strengthened by the diversity of academic disciplines that are represented in it. Additional replications of | |
the study with larger, more representative samples will be necessary to extrapolate the results to a larger population; | |
this will be a focus of continued research. | |
Additional efforts are needed to examine each of the Seven Timeless Principles in-depth and to provide insights into | |
the application in 21st century education. This includes more detailed research involving a larger and more diverse | |
sample, as well as the addition of mixed methods for a more comprehensive portrayal of data. Further, future efforts | |
will attempt to demonstrate that current-day teaching theories and methods, as well as modern policies and | |
regulations that are considered innovative, are founded in these Timeless Principles. In addition, there is potential to | |
create a framework for the teaching and learning process that assists teachers at all levels of education to clearly | |
associate their strategies and methods of teaching with the Timeless Principles. | |
References | |
Atkinson, R. C., & Shiffrin, R. M. (1968). Human memory: A proposed system and its control processes. Psychology
of Learning and Motivation, 2, 89-195. https://doi.org/10.1016/S0079-7421(08)60422-3 | |
Auld, M. E., & Bishop, K. (2015). Striving for excellence in health promotion pedagogy. Pedagogy in Health | |
Promotion, 1(1), 5-7. https://doi.org/10.1177/2373379915568976 | |
Chickering, A. W., & Gamson, Z. F. (1987). Seven principles for good practice in undergraduate education. AAHE | |
Bulletin, 39(7), 3-6. Retrieved from https://aahea.org/articles/sevenprinciples1987.htm | |
Council of Chief State School Officers. (2011). InTASC model core teaching standards: A resource for state dialogue. | |
Washington, DC: Author. Retrieved from https://ccsso.org/resource-library/intasc-model-core-teaching-standards
Eyler, J. R. (2018). How humans learn: The science and stories behind effective college teaching. Morgantown, WV:
West Virginia University Press.
Gregory, J. M. (1886). The seven laws of teaching. Boston, MA: Congregational Sunday-School and Publishing | |
Society. | |
Karpicke, J. D., & Roediger, H. L. III. (2007). Expanding retrieval practice promotes short-term retention, but | |
equally spaced retrieval enhances long-term retention. Journal of Experimental Psychology: Learning, Memory, | |
and Cognition, 33(4), 704-719. https://doi.org/10.1037/0278-7393.33.4.704
National Education Association. (2012). Preparing 21st century students for a global society: An educator’s guide to | |
the “four Cs.” Alexandria, VA: Author. | |
Perkins, D. (2008). Making learning whole: How seven principles of teaching can transform education. San | |
Francisco, CA: Jossey-Bass. | |
Pruitt, S. D., & Epping-Jordan, J. E. (2005). Preparing the 21st century global healthcare workforce. BMJ, 330, 637. | |
https://doi.org/10.1136/bmj.330.7492.637 | |
Rosenshine, B., & Furst, N. (1971). Research on teacher performance criteria. In B. O. Smith (Ed.), Research in | |
teacher education (pp. 37-72). Englewood Cliffs, NJ: Prentice Hall.
Ryan, R. M., & Deci, E. L. (2000). Self-determination theory and the facilitation of intrinsic motivation, social | |
development, and well-being. American Psychologist, 55(1), 68-78. https://doi.org/10.1037/0003-066X.55.1.68 | |
Stephenson, L. (2014). Appendices. In J. M. Gregory, The seven laws of teaching (1st ed. reprint; pp. 129-144). | |
Moscow, ID: Canon Press. | |
Thorndike, E. L. (1906). The principles of teaching: Based on psychology. London, England: Routledge. | |
Tomlinson, C. A. (2017). How to differentiate instruction in academically diverse classrooms (3rd ed.). Alexandria, | |
VA: ASCD. | |
Walls, R. T. (1999). Psychological foundations of learning. Morgantown, WV: West Virginia University International | |
Center for Disability Information. | |
Wilson, D. (2014). Foreword: The seven disciplines of highly effective teachers. In J. M. Gregory, The seven laws of | |
teaching (1st ed. reprint; pp. 1-9). Moscow, ID: Canon Press. | |
Copyrights | |
Copyright for this article is retained by the author(s), with first publication rights granted to the journal. | |
This is an open-access article distributed under the terms and conditions of the Creative Commons Attribution | |
license (http://creativecommons.org/licenses/by/4.0/). | |
Developing Peer Review of Instruction | |
in an Online Master Course Model | |
John Haubrick | |
Deena Levy | |
Laura Cruz | |
The Pennsylvania State University | |
Abstract | |
In this study we looked at how participation in a peer-review process for online Statistics courses | |
utilizing a master course model at a major research university affects instructor innovation and | |
instructor presence. We used online, anonymous surveys to collect data from instructors who | |
participated in the peer-review process, and we used descriptive statistics and qualitative analysis | |
to analyze the data. Our findings indicate that space for personal pedagogical agency and | |
innovation is perceived as limited because of the master course model. However, responses | |
indicate that participating in the process was overall appreciated for the sense of community it | |
helped to build. Results of the study highlight the blurred line between formative and summative | |
assessment when using peer review of instruction, and they also suggest that innovation and | |
presence are difficult to assess through short term observation and through a modified version of | |
a tool (i.e., the Quality Matters rubric) intended for the evaluation of an online course rather than | |
the instruction of that course. The findings also suggest that we may be on the cusp of a second | |
stage for peer review in an online master course model, whether in-person or online. Our findings | |
also affirm the need for creating a sense of community online for the online teaching faculty. The | |
experiences of our faculty suggest that peer review can serve as an integral part of fostering a | |
departmental culture that leads to a host of intangible benefits including trust, reciprocity, | |
belonging, and, indeed, respect. | |
Keywords: Peer review, online teaching, teaching evaluation, master course model, statistics | |
education, instructor presence | |
Haubrick, J., Levy, D., & Cruz, L. (2021). Developing peer review of instruction in an online
master course model. Online Learning, 25(3), 313-328. doi:10.24059/olj.v25i3.2428 | |
Peer review has a long history in academia, originating in the professional societies of the | |
early Enlightenment. The practice first arose to address the need for an evaluative metric of the
quality of research in an era replete with amateur scientists. In this same context,
peer review also functioned as a foundation for establishing collective expertise that was not | |
dependent on the approval of an external body, whether political fiat or divine consecration. The | |
present study examines one way in which this long-standing practice of peer review has evolved | |
to embrace new professional modes (i.e., teaching), new modalities of instruction (i.e., online), | |
and new roles for instructors within the current context of higher education. | |
Literature Review | |
Peer review had long been the gold standard for academic research, but it was not until | |
the learning-centered revolution, begun in the 1970s, that the practice found application in | |
education. At first, peer review was confined largely to volunteers who were experimenting with | |
pedagogical changes stemming from recent developments in learning science research. As one | |
leading scholar writes, there was “a general sense…that teaching would benefit from the kinds of | |
collegial exchange and collaboration that faculty seek out as researchers” (Hutchings, 1996). | |
Further, contrary to the conservative bias often attributed to the peer review of research (Roy & | |
Ashburn, 2001), peer review of teaching (PRT) has increasingly proven to foster both personal | |
empowerment and teaching transformation (Chism, 2005; Hutchings, 1996; Lomas & Nicholls, | |
2005; Smith, 2014; Trautman, 2009). As one set of scholars state, “the value of formative peer | |
assessment is promoted in the exhortative literature…justified in the theoretical literature…and | |
supported by reports of experimental and qualitative research” (Kell & Annetts, 2009; Hyland et
al., 2018; Thomas et al., 2014). | |
Those early experiments led to dramatic breakthroughs in evidence-based practice in | |
teaching and learning and, by extension, changes in how these activities are evaluated. Since the | |
early 2000s, universities have responded to a growing imperative to assess teaching | |
effectiveness, both as a means of evaluating work performance and as a way of demonstrating | |
collective accountability for the student learning experience. An increasing number of studies | |
have linked effective instruction to desired institutional outcomes, including recruitment, | |
persistence, and graduation rates, upon the latter of which many funding models rest. Because | |
the drive towards accountability is fueled by student interests, it is perhaps not surprising that the | |
most common strategy for evaluating teaching is the use of student evaluations of instruction (SETs). At
a typical U.S. university today, students are asked to complete an electronic survey at the end of
each semester, consisting of a series of scaled survey items along with a handful of open-ended
questions. | |
Over the years, the use of SETs as a measure of teaching effectiveness has been both | |
affirmed and disputed (Seldin, 1993). The reliability of the practice has been strengthened | |
through increasing sophistication of both the design of the questions and the analysis of the | |
results. At the same time, however, it has also been questioned as the basis of personnel | |
decisions (Nilson, 2012; Nilson, 2013). | |
Although not definitively proven, there is a persistent perception that SETs are biased, | |
particularly in the case of faculty members from under-represented populations, including those | |
for whom English is a second language and, in some disciplines, women (Calsamiglia & | |
Loviglio, 2019; Zipser & Mincieli, 2018). Other scholars have called the validity of the results
into question, suggesting that students are not always capable of assessing their own learning | |
accurately or appropriately, leading to claims that SETs are more likely to measure popularity | |
rather than effectiveness (Schneider, 2013; Uttl et al., 2017). Perhaps the only safe and definitive | |
conclusion to draw is that the implications of the practice are complex and contested. | |
Higher education institutions have navigated these stormy waters in multiple ways, most | |
by encouraging the use of multiple forms of measurement for teaching effectiveness, often in the | |
form of a portfolio, or similar collection tool (Chism, 1999; Seldin et al., 2010). This practice is | |
supported by the research literature, which aligns the practice with the multi-faceted nature of | |
teaching as well as the importance of direct (e.g., not self-reported) measures of student learning. | |
To potentially counterbalance the limitations of SETs, practitioners have suggested the use of | |
PRT, which places disciplinary experts, rather than amateur students, in the driver’s seat. In this | |
evaluative mode, PRT typically takes the form of either peer review of instructional materials | |
and/or peer observation of teaching. | |
While PRT may appear to be a neat solution to a pervasive issue, the practice had | |
previously been used largely for formative purposes on a voluntary basis. The transition to | |
compulsory (or strongly encouraged) evaluative practice has proven to be fraught with dangers, | |
both philosophical and practical (Blackmore, 2005; Edström, 2019; Keig, 2006; McManus,
2001). Practically speaking, the PRT process requires a considerable investment of time, energy,
and attention, not only to conducting the reviews but also to developing shared standards and | |
practices. Philosophically, several scholars have predicted that several of the primary benefits of | |
PRT as a developmental tool might suffer when transposed into a summative context (Cavanagh, | |
1996; Gosling, 2002; Kell & Annetts, 2009; Morley, 2003; Peel, 2005). It has proven to be
difficult to substantiate these fears, however, as one of the downsides of utilizing summative | |
assessment is the challenges it presents to research. | |
The PRT problem is confounded by the rise of new modes of instruction, especially | |
online and hybrid modalities (Bennett & Barp, 2008; Jones & Gallen, 2016). Since its inception, | |
online education has carried with it a burden of accountability that traditional in-person | |
instruction has not, and the onus rests with online instructors to prove that the virtual learning | |
experience is of comparable quality to other modalities (Esfijani, 2018; Shelton, 2011). This has,
in turn, led to the development and refinement of shared quality standards for online courses | |
(notably, the Quality Matters (QM) rubric), the application and evaluation of which often rely on | |
the collective expertise of other online instructors, i.e., pedagogical (rather than disciplinary) | |
peers (Shattuck et al., 2014). The QM peer-review process, for example, designates two reviewer | |
roles, a subject matter expert and online pedagogy practitioner, the latter of whom undergoes a | |
QM-administered certification process. | |
The proliferation of online courses, however, has been accompanied by design and | |
implementation changes. Because it takes time and sustained engagement to master the | |
techniques and approaches needed to meet the quality standards for online courses, the role of | |
the instructional designer (ID) as expert in these areas has become increasingly commonplace. A | |
typical role for an ID might be to collaborate closely with faculty members to design and develop | |
online courses that effectively deliver content in a manner that meets (or exceeds) quality | |
standards. Once created, it is certainly possible for the same course to be taught by multiple | |
faculty members. | |
In a typical ID-faculty scenario, the faculty member often has considerable input on the | |
design as it evolves and provides primary instruction, but peer review of instruction is | |
complicated both by the medium and the role of the third party (the ID) (Drysdale, 2019). For | |
example, the observation protocols developed for the classroom may not apply to a virtual space, | |
at least not to the same degree, and a review of instructional strategies, as reflected in artifacts | |
such as the syllabus, may be the product of the ID, the faculty member, or both. It is perhaps
for these reasons that peer review of online instruction has tended to focus on the course rather | |
than the instructor. The Quality Matters rubric, for example, emphasizes attributes of course | |
design rather than teaching effectiveness. Yet, the need for evaluative measures of instruction | |
and instructor persists, perhaps even more so as trends point to a growing number of adjunct | |
faculty teaching online courses for whom such measures can provide both accountability and | |
professional development (Barnett, 2019; Taylor, 2017).
The challenge is further compounded by the emergence of instructional standards and/or | |
competencies for online (or hybrid) courses that are distinctive to the virtual environment, both | |
in form and context (Baran et al., 2011). The popular community of inquiry model, for example, | |
differentiates between cognitive presence (content and layout), social presence (engagement), | |
and teaching presence in online courses; all are facets of instruction that are less emphasized in | |
in-person instruction. These insights have led to the development of several exemplary protocols | |
specifically intended for reviewing online instruction (McGahan et al., 2015; Tobin et al., 2015). | |
Each of these tools is firmly grounded in an extensive body of evidence-based practice for
online teaching, but still, the handful of studies that have been conducted on the PRT process | |
itself have tended to be limited to case studies and/or action research (Barnard et al., 2015; | |
Swinglehurst et al., 2008; Sharma & Ling, 2018; Wood & Friedel, 2009). As one researcher put
it, it is simply “difficult to find quantitative evidence due to its nature and context” (Bell, 2002; | |
Peel, 2002). | |
The challenge of peer review of teaching is even further complicated by the increasing | |
use of the master course model (Hanbing & Mingzhuo, 2012; Knowles & Kalata, 2007). For | |
courses in which stakes are higher and student populations larger, such as gateway or barrier | |
courses, an institution may choose to adopt a master course model in which an already designed | |
course is provided to all instructors, thereby ensuring a consistent experience for all students | |
(Parscal & Riemer, 2010). In this scenario, instructors have little to no control over the content, | |
design, and, in many cases, delivery of the course, all of which serve as major components of | |
most peer review of instruction models, whether for online or in-person courses. However, even | |
within a master course model, instruction varies and opportunities remain to provide both | |
formative (for individual improvement) and summative (for performance evaluation) feedback. | |
Yet, the question of how to evaluate teaching within these boundaries is a subject that has | |
received less attention in both research and practice. Our study explores the implementation of a | |
peer review of teaching process for an online statistics program that uses master courses at a | |
large, public, research-intensive university. | |
Methods | |
Context | |
The Pennsylvania State University is a public research university located in the | |
northeastern part of the United States. The statistics program offers 24 online courses, with | |
approximately 1500 enrollments per semester, including those for its online graduate program | |
and two undergraduate service courses. Statistics courses have been identified as barrier courses | |
at many institutions, including this one. Therefore, the program at The Pennsylvania State | |
University bears the responsibility for high standards of instruction that contribute to student
success, especially persistence. | |
Each of the program’s 24 courses is based on a master template of objectives, content, | |
and assessments. The courses are delivered through two primary systems, the learning | |
management system (LMS) and the content management system (CMS). Each section has its | |
own unique LMS space for each iteration of the course. Students and instructors use the LMS for | |
announcements, communication/email, assessments, grading, discussion and any other | |
assignments or interactions. The lesson content for each course is delivered through a CMS, | |
which in this case has a public website whose content is classified as open educational resources | |
under a creative commons license. The CMS is unique to the course and is not personalized or | |
changed from semester to semester. Similarly, the lesson content, developed and written by | |
program faculty members, does not change from semester to semester, aside from minor fixes | |
and/or planned revisions. | |
Instructor agency in the LMS context varies depending on the course taught, how long | |
the instructor has taught it, and how many sections are offered in that semester. Instructors who | |
are teaching a course that has only one section have more agency to change appearance and | |
interactions within the LMS than instructors who are teaching a course with multiple sections. In | |
this statistics department, only one section of most of the online graduate courses is offered per | |
semester, while more than one section of undergraduate courses is typically offered. The largest | |
of these undergraduate courses is a high enrollment, general education requirement course that | |
runs 10-12 sections per semester. Courses with multiple sections use the same CMS as well as | |
the same master template in the LMS to maintain consistency in the student experience. | |
Therefore, in a single section course the instructor could modify the design of their course space | |
within the LMS by choosing their home page, setting the navigation, and organizing the modules | |
while still delivering the content and objectives as defined by the department for that course. | |
Such modifications are less likely to occur in multi-section courses. The following table | |
highlights the level of agency possessed by the instructor in both the CMS and LMS according to | |
the varied teaching contexts in this department. | |
Table 1
Levels of Instructor Agency in Various Course Types Offered

If the instructor teaches...         Content Management System (CMS)   Learning Management System (LMS)
Undergraduate, single section        Low                                High
Graduate, single section             Low                                High
Undergraduate, multiple sections     Low                                Low
Graduate, multiple sections          Low                                Low
During the fall 2019 semester, the faculty members in the department who teach online courses | |
consisted of full-time teaching professors (n=13), tenure-track professors (n=6), and
adjuncts (n=10). Peer review of instruction has been practiced since the onset of the program. In | |
its current iteration, the process takes place annually over an approximately three-week period in | |
the fall semester. The primary purpose of the peer-review process is to offer formative feedback | |
to the instructors, but the results are shared with the assistant program director and faculty | |
members are permitted (though not required) to submit the results as part of their reappointment, | |
promotion, and tenure dossiers. For the fall 2019 semester, 27 of the 29 (93%) faculty members | |
participated in the peer-review process. | |
Peer Review of Instruction Model | |
In the fall of 2018, the instructional designer for these statistics courses piloted a new | |
peer-review rubric, which is a modification of the well-known Quality Matters Higher Ed rubric. | |
In this modification, 21 out of 42 review standards were determined to be applicable to the | |
instructors in the master course context. The rubric serves as the centerpiece of a two-part | |
process, in keeping with identified best practices (Eskey & Roehrich, 2013). First, the faculty
member completes a pre-observation survey and the reviewer, who is added to the course as an | |
instructor, evaluates the course according to each of the twenty-one standards in the rubric. The | |
observation is followed by a virtual, synchronous meeting with the peer-review partner. Faculty | |
members are paired across various teaching ranks and course levels, and the pairings are rotated | |
from year to year. Both the observation and the peer meeting are guided by materials created by | |
the instructional designer, who provides both the instructor intake form and two guiding | |
questions for discussion. | |
In keeping with evidence-based practice for online instruction, the first discussion prompt | |
addresses how the faculty establish social, cognitive, and teaching presence within their course. | |
Along with the prompt, definitions and examples of each type of presence are provided to the | |
instructor. | |
Discussion prompt 1 in the online statistics program peer-review guide: | |
Prompt #1: Share with your peer how you establish these three types of presence in your | |
course. | |
Notes: How does your peer establish these three types of presence in their course? | |
The second prompt provides an opportunity for the instructors to share changes or innovations | |
they have implemented within the past year. | |
Discussion prompt 2 in the online statistics program peer-review guide: | |
Prompt #2: Share with your peer if you are trying anything new this semester (or year)?
If yes, share your innovation or change you’ve made this semester (or year). | |
• Has the innovation or change been successful? | |
• What challenges have you had to work through? | |
• How could others benefit from what you’ve learned? | |
• What advice would you share with a colleague who is interested in trying | |
this or something similar? | |
Notes: What has your peer done this semester (or year) that is innovative or new for | |
them? | |
The process seeks to evaluate and promote not only quality standards through the rubric, but also | |
collegial discussion around innovation, risk-taking, and instructor presence. | |
Study Design | |
The IRB-approved study was originally intended to be a mixed methods study, in which | |
input from participating instructors, collected in the form of a survey, would be supplemented | |
with an analysis of the peer-review artifacts, especially the instructor intake form and the peer-review rubric (which includes the 2 discussion prompts). The instructors provided mixed
responses to the requests for use of their identifiable artifacts, which limits their inclusion in the | |
study, but the majority did choose to participate in the anonymous survey (14 out of 27, 54%) | |
which was administered in the Fall semester of 2019. The online survey, sent to instructors by a | |
member of the research team not associated with the statistics department, consisted of 11 | |
questions, comprised of 1 check all that apply, 8 five-point Likert scale, 1 yes/no, and 3 open-ended questions.
Results | |
Quantitative Results | |
With the small sample size (n=13) we are limited to basic descriptive statistics to analyze | |
the results of the Likert questions. The least frequently chosen category on the Likert scale of
this survey was “neither agree nor disagree” (n=10), while “somewhat agree” (n=37) was the | |
most frequently chosen. In looking at the responses to specific prompts, we note that the | |
statement with the highest score was The steps of the peer-review process were clear. For this | |
statement, 13/13 responded with somewhat agree or strongly agree (mode = “strongly agree”). | |
Consistent with our qualitative findings, the next highest scoring statement was The peer-review | |
process was collegial, where 12/13 responded with somewhat agree or strongly agree and one | |
responded as neither agree nor disagree (mode = “strongly agree”). The statement The peer-review process was beneficial to my teaching received the third highest rating with 10/13
respondents saying that they somewhat agree (n=7) or strongly agree (n=3) (mode = “somewhat | |
agree”). | |
We do want to note that consistent with best survey design practice, one of the statements | |
was purposely designed as a negative statement: The peer-review process was not worth the time | |
spent on doing it. For this prompt, 8/13 responded with strongly disagree or somewhat disagree, | |
while 3/13 somewhat agreed with that statement and 2 chose neither agree nor disagree (mode = | |
“strongly disagree”). | |
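As an illustration of the descriptive tabulation reported above, the following minimal sketch (not the study's code) counts the responses per Likert category and reports the modal category for each statement; the statement names and response counts are hypothetical.

# Illustrative sketch only: hypothetical responses, not the study's data.
import pandas as pd

LEVELS = ["Strongly disagree", "Somewhat disagree",
          "Neither agree nor disagree", "Somewhat agree", "Strongly agree"]

# One row per respondent, one column per survey statement (column names are invented).
responses = pd.DataFrame({
    "steps_were_clear": ["Strongly agree"] * 9 + ["Somewhat agree"] * 4,
    "process_was_collegial": ["Strongly agree"] * 8 + ["Somewhat agree"] * 4
                             + ["Neither agree nor disagree"],
})

for statement in responses.columns:
    counts = responses[statement].value_counts().reindex(LEVELS, fill_value=0)
    mode = responses[statement].mode().iloc[0]      # most frequently chosen category
    print(statement, dict(counts), "mode:", mode)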
Qualitative Results | |
The findings suggest that the participants operated under several constraints. When asked | |
how they assess student learning in the intake form, for example, the majority indicated that the | |
assessments are part of the master class and largely outside of their control, e.g. All… sections | |
have weekly graded discussion forums (might not be the same question), same HWs and same | |
exams. All instructors contribute for exams and HWs. Assessment of learning outcomes mainly | |
occur through these. This was evident both in the content and tone of their responses, with | |
passive voice predominating, e.g., quiz and exam questions are linked to lesson learning | |
objectives. The presence of constraint also came to the fore in the survey questions about | |
changes; for those who did make changes (6/11), these largely took the form of micro-innovations (e.g., so far just little things, small modifications), tweaks primarily focused on course policies (e.g., new late policy), enhancing instructor presence (e.g., try new introductions; I am using announcements more proactively), or fostering community (e.g., increasing discussion board posts, add netiquette statement).
Space for personal pedagogical agency and innovation is perceived as limited because of | |
the master course model employed in this context. This sentiment is evidenced by the tone of the
survey responses related to assessments, as discussed above. On the other hand, the instructor
intake form shows that instructors can innovate and experiment with those course components | |
that can be characterized broadly as relating to instructor presence, particularly regarding | |
communication in the course. There is a marked shift in the tone of response when asked, for | |
example, Please describe the nature and purpose of the communications between students and | |
instructors in this course. Responses to this question show agency and active involvement on the | |
part of the instructor in this aspect of the course: | |
I post announcements regularly and am in constant communication with the class. The | |
discussion forums have a fair bit of chatter and I have replied with video and images as | |
well there with positive feedback. | |
I respond very quickly to student correspondence. I use the course announcements | |
feature very often and check Canvas multiple times a day. | |
I would like to promote the use of the Discussion Boards more, but students still do not | |
use those as much as I would like them to. | |
In this last example, we see that the instructor is forward-looking and discusses changes that he
or she would like to make in the future. The data suggest that instructors are trying to make
space for their own unique contribution to the course and for more personalized choices in their | |
interactions with students. They are also eager to get feedback from their peers on practices that | |
fall into this space of agency: | |
I would appreciate any feedback on my use of course announcements. Do you feel that | |
they are appropriate in both content, frequency, and timing? | |
Our findings indicate that many of these instructors are operating within the constraints of a | |
master course model, as discussed earlier, and they are most enthusiastic in their responses and | |
innovative in their teaching when they can identify areas over which they can exert some degree | |
of control in the course design and delivery process. | |
As evidenced in the quantitative findings previously discussed, these qualitative findings | |
also tell us that instructors who participated in the survey appreciate the collegiality of the | |
process. Their open-ended responses indicate an appreciation of the collegiality and connection, | |
the informal learning, that the peer-review process afforded them. For example, one instructor | |
comments, “I have enjoyed the opportunity to discuss teaching ideas and strategies with other | |
online faculty. As a remote faculty member, I particularly value that interaction.” Responses | |
primarily indicate that participating in the process was overall appreciated for the sense of | |
community it helped to build. What we see emerge is another space—a space where instructors | |
can negotiate together the limitations for innovation that exist in this sequence of Statistics | |
courses, and where they can also share experiences. As one participant comments, The direct | |
communication with the peer is great for sharing positive and negative experiences with different | |
courses. As we see in our findings, faculty members clearly find value in the process, regardless | |
of the product. This insight suggests the presence of a lesser known third model, distinct from | |
either formative or evaluative formats, called collaborative PRT (Gosling, 2002; Keig & | |
Waggoner, 1995). In collaborative PRT, the end goal is to capture the benefits of turning | |
teaching from a private into a more public, collaborative activity (Hutchings, 1996).
Discussion | |
Our findings should not be overstated. This study was conducted for a single program at a | |
single university over the course of one semester; as such, the results may or may not be | |
replicable elsewhere. Replication may also be hindered by the challenges inherent in studying | |
peer review as a process. Because the results of peer review in this case may be used for | |
summative or evaluative purposes, any evidence generated is considered part of a personnel file | |
and, as such, subject to higher degrees of oversight in the ethical review process. The ethical | |
review board at The Pennsylvania State University, for example, did not classify this study as | |
exempt research, but rather put the proposal through full (rather than expedited or exempt) board | |
review, and required additional accountability measures. The evaluative nature of those
documents also contributed to low faculty participation (n=3) in the first stage of our study, | |
where we asked to include copies of their peer-review documents (an intake form, review rubric, | |
and meeting notes). There is a reason why there are comparatively few studies on peer review as | |
a process. | |
In the case of the statistics program, the primary rationale for establishing a peer review | |
of teaching process was intended to be formative assessment, i.e., providing feedback to | |
instructors so that they might improve the teaching and learning in online statistics courses. In | |
practice, however, the boundaries between formative and summative assessment blurred. While | |
instructors were not required or compelled to disclose the results of their peer review, many did | |
choose to include comments and/or ratings in their formal appointment portfolios, especially | |
when the only other evidence of teaching effectiveness (a primary criterion) available is student
evaluations of instruction (SETs). At The Pennsylvania State University, SETs are structured so | |
that students provide feedback on both the instructor and the course, at times separately and, at | |
other times, together. In a master course model, however, instructors have limited control over | |
many components of the course, making the results of student evaluations challenging to parse | |
out and potentially misleading if treated nominally or comparatively. | |
The distinction between formative and evaluative assessment is not the only blurred line | |
that arose from this study. In this case, peer review of instruction was accomplished with a | |
modified version of a tool (the QM rubric) intended to be used for the evaluation of an online | |
course. The modification of the QM rubric took the form of removing questions or sections | |
pertaining to course components deemed to be outside the control of the master course | |
instructors. In addition to the modified QM rubric, two supplemental items—open-ended | |
questions—were added to the review process. These items focused on presence and innovation, | |
which are difficult to assess through short-term observation. Our results suggest that this strategy | |
has led to partial success, i.e., the majority (10/13) of faculty members who responded to our | |
survey strongly or somewhat agreed that the process was beneficial, but its impact on teaching | |
practice has been limited. This may be partially a result of the limited scope of the study (one | |
academic year) which may or may not be an appropriate time frame for capturing changes to | |
teaching practice, but it may also stem from limitations in the current iteration of the peer-review | |
process itself. | |
If we look back over the history of peer review of instruction for online courses, a pattern | |
emerges in which first, an existing tool, developed for a different purpose or context, is | |
imported and adapted into a new environment. This occurred, for example, when peer
evaluation tools designed for in-person courses were adapted to suit online courses. In the next | |
stage, the adaption process reveals limitations of the existing tool which, in turn, spur the | |
development of new instruments or processes that are specifically designed for the context in | |
which they are being used. The creation of the QM Rubric is a clear example of this latter step. | |
The findings of our study suggest that we may be on the cusp of this second stage for | |
peer review of teaching in online master courses, which constitutes a quite different teaching | |
environment than other types of courses, whether in-person or online. In the case of master | |
courses, there is a distinctive division of labor where, primarily, instructional designers work | |
with authors to develop courses, course leads manage content, and instructors serve as the | |
primary point of contact with students. It may be time to develop a new rubric (or similar tool) | |
that takes this increasingly popular configuration more into consideration. | |
Adoption of the master course model is fueled by the need for both efficiency and | |
consistency in the student learning experience, and both experience and research suggest that it | |
has been effective in serving these goals. That being said, like all models, it also has its | |
limitations. Our study suggests that one of those tradeoffs may be that the model constricts both | |
the space for and the drivers of change. Without being able to make changes to the master course | |
itself, the faculty in our study tried to find ways to make small changes, i.e., micro-improvements | |
in those areas over which they held agency. Larger or more long-term changes, on the other | |
hand, would need to come from instructional designers and program managers, who may be one | |
or even two steps removed from the direct student experience. Although instructors frequently | |
make suggestions for course improvements, large changes to courses are not frequently | |
implemented. In other words, the division of labor needed to support the master course model | |
also divides agency, and the challenge remains to find systematic ways to re-integrate that | |
agency in the service of continuous improvement. | |
The limitations on faculty agency inherent in the master course model have led some | |
institutions to further devalue the role, replacing faculty-led courses with lower-paid, less
recognized, and more easily interchangeable instructor roles (Barnett, 2019). Such a path would
be at odds with the culture of The Pennsylvania State University, but it does suggest the need for | |
faculty development, i.e., for finding ways to support and treat even part-time instructors as | |
valued and recognized members of the community of teaching and learning, even in conditions | |
where they may not be able to meet in person. It could be said that our findings affirm the
need for creating a sense of community online, both inside and outside of the courses, for the faculty
members who teach them. The experiences of our faculty members suggest that peer review can | |
be an integral part of departmental culture that supports faculty peer to peer engagement, leading | |
to a host of intangible benefits including trust, reciprocity, belonging, and, indeed, respect. | |
References | |
Baran, E., Correia, A. P., & Thompson, A. (2011). Transforming online teaching practice: | |
Critical analysis of the literature on the roles and competencies of online teachers. Distance | |
Education, 32(3), 421-439. | |
Barnard, A., Nash, R., McEvoy, K., Shannon, S., Waters, C., Rochester, S., & Bolt, S. (2015). | |
LeaD-in: a cultural change model for peer review of teaching in higher education. Higher | |
Education Research & Development, 34(1), 30-44. | |
Barnett, D. E. (2019). Full-range leadership as a predictor of extra effort in online higher | |
education: The mediating effect of job satisfaction. Journal of Leadership Education, 18(1). | |
Bennett, S., & Barp, D. (2008). Peer observation–a case for doing it online. Teaching in Higher | |
Education, 13(5), 559-570. | |
Blackmore, J. A. (2005). A critical evaluation of peer review via teaching observation within | |
higher education. International Journal of Educational Management, 19(3), 218-232. | |
Calsamiglia, C., & Loviglio, A. (2019). Grading on a curve: When having good peers is not | |
good. Economics of Education Review, 73(C). | |
Cavanagh, R. R. (1996). Formative and summative evaluation in the faculty peer review of | |
teaching. Innovative higher education, 20(4), 235-240. | |
Chism, N. V. N. (1999). Peer review of teaching. A sourcebook. Bolton, MA: Anker. | |
Drysdale, J. (2019). The collaborative mapping model: Relationship-centered instructional | |
design for higher education. Online Learning, 23(3), 56-71. | |
Eskey, M. T., & Roehrich, H. (2013). A faculty observation model for online instructors:
Observing faculty members in the online classroom. Online Journal of Distance Learning | |
Administration, 16 (2). | |
http://www.westga.edu/~distance/ojdla/summer162/eskey_roehrich162.html | |
Edström, K., Levander, S., Engström, J., & Geschwind, L. (2019). Peer review of teaching merits | |
in academic career systems: A comparative study. In Research in Engineering Education | |
Symposium. | |
Esfijani, A. (2018). Measuring quality in online education: A meta-synthesis. American Journal | |
of Distance Education, 32(1), 57-73. | |
Gosling, D. (2002). Models of peer observation of teaching. Report. LTSN Generic Center. | |
https://www.researchgate.net/profile/David_Gosling/publication/267687499_Models_of_Peer_Observation_of_Teaching/links/545b64810cf249070a7955d3.pdf
Graham, C., Cagiltay, K., Lim, B. R., Craner, J., & Duffy, T. M. (2001). Seven principles of | |
effective teaching: A practical lens for evaluating online courses. The Technology Source, 30(5), | |
50. | |
Hanbing, Y., & Mingzhuo, L. (2012). Research on master-teachers’ management model in online | |
course by integrating learning support. Journal of Distance Education, 5(10), 63-67. | |
Hutchings, P. (1996). Making teaching community property: A menu for peer collaboration and | |
peer review. AAHE Teaching Initiative. | |
Hutchings, P. (1996). The peer review of teaching: Progress, issues and prospects. Innovative | |
Higher Education, 20(4), 221-234. | |
Hyland, K. M., Dhaliwal, G., Goldberg, A. N., Chen, L. M., Land, K., & Wamsley, M. (2018). | |
Peer review of teaching: Insights from a 10-year experience. Medical Science Educator, 28(4), | |
675-681. | |
Johnson, G., Rosenberger, J., & Chow, M. (October 2014) The importance of setting the stage: | |
Maximizing the benefits of peer review of teaching. eLearn, 2014 (10). | |
https://doi.org/10.1145/2675056.2673801 | |
Jones, M. H., & Gallen, A. M. (2016). Peer observation, feedback and reflection for development | |
of practice in synchronous online teaching. Innovations in Education and Teaching | |
International, 53(6), 616-626. | |
Keig, L. (2000). Formative peer review of teaching: Attitudes of faculty at liberal arts colleges | |
toward colleague assessment. Journal of Personnel Evaluation in Education, 14(1), 67-87. | |
Keig, L. W., & Waggoner, M. D. (1995). Peer review of teaching: Improving college instruction | |
through formative assessment. Journal on Excellence in College Teaching, 6(3), 51-83. | |
Kell, C., & Annetts, S. (2009). Peer review of teaching embedded practice or policy‐holding | |
complacency? Innovations in Education and Teaching International, 46(1), 61-70.
Knowles, E., & Kalata, K. (2007). A model for enhancing online course development. Innovate: | |
Journal of Online Education, 4(2). | |
Lomas, L., & Nicholls, G. (2005). Enhancing teaching quality through peer review of teaching. | |
Quality in Higher Education, 11(2), 137-149. | |
Mayes, R. (2011, March). Themes and strategies for transformative online instruction: A review | |
of literature. In Global Learn (pp. 2121-2130). Association for the Advancement of Computing | |
in Education (AACE). | |
McGahan, S. J., Jackson, C. M., & Premer, K. (2015). Online course quality assurance: | |
Development of a quality checklist. InSight: A Journal of Scholarly Teaching, 10, 126-140. | |
McManus, D. A. (2001). The two paradigms of education and the peer review of teaching. | |
Journal of Geoscience Education, 49(5), 423-434. | |
Nilson, L. B. (2012). 14: Time to raise questions about student ratings. To improve the academy, | |
31(1), 213-227. | |
Nilson, L. B. (2013). 17: Measuring student learning to document faculty teaching effectiveness. | |
To Improve the Academy, 32(1), 287-300. | |
Nogueira, I. C., Gonçalves, D., & Silva, C. V. (2016). Inducing supervision practices among | |
peers in a community of practice. Journal for Educators, Teachers and Trainers, 7, 108-119. | |
Parscal, T., & Riemer, D. (2010). Assuring quality in large-scale online course development. | |
Online Journal of Distance Learning Administration, 13(2). | |
Peel, D. (2005). Peer observation as a transformatory tool? Teaching in Higher Education, 10(4), | |
489 - 504. | |
Roy, R., & Ashburn, J. R. (2001). The perils of peer review. Nature, 414(6862), 393-394. | |
Schneider, G. (2013, March). Student evaluations, grade inflation and pluralistic teaching: | |
Moving from customer satisfaction to student learning and critical thinking. Forum for Social
Economics, 42(1), 122-135.
Seldin, P. (1993). The use and abuse of student ratings of professors. Chronicle of Higher | |
Education, 39(46), A40-A40. | |
Seldin, P., Miller, J. E., & Seldin, C. A. (2010). The teaching portfolio: A practical guide to | |
improved performance and promotion/tenure decisions. John Wiley & Sons. | |
Sharma, M., & Ling, A. (2018). Peer review of teaching: What features matter? A case study | |
within STEM faculties. Innovations in Education and Teaching International, 55(2), 190-200.
Shattuck, K., Zimmerman, W. A., & Adair, D. (2014). Continuous improvement of the QM | |
Rubric and review processes: Scholarship of integration and application. Internet Learning | |
Journal, 3(1). | |
Shelton, K. (2011). A review of paradigms for evaluating the quality of online education | |
programs. Online Journal of Distance Learning Administration, 4(1), 1-11.
Smith, S. L. (2014). Peer collaboration: Improving teaching through comprehensive peer review. | |
To Improve the Academy, 33(1), 94-112. | |
Swinglehurst, D., Russell, J., & Greenhalgh, T. (2008). Peer observation of teaching in the online | |
environment: an action research approach. Journal of Computer Assisted Learning, 24, 383-393. | |
Taylor, A. H. (2017). Intrinsic and extrinsic motivators that attract and retain part-time online | |
teaching faculty at Penn State (Doctoral dissertation, The Pennsylvania State University). | |
Thomas, S., Chie, Q. T., Abraham, M., Jalarajan Raj, S., & Beh, L. S. (2014). A qualitative | |
review of literature on peer review of teaching in higher education: An application of the SWOT | |
framework. Review of Educational Research, 84(1), 112-159. | |
Tobin, T. J., Mandernach, B. J., & Taylor, A. H. (2015). Evaluating online teaching: | |
Implementing best practices. San Francisco, CA: John Wiley & Sons. | |
Trautmann, N. M. (2009). Designing peer review for pedagogical success. Journal of College | |
Science Teaching, 38(4). | |
Uttl, B., White, C. A., & Gonzalez, D. W. (2017). Meta-analysis of faculty’s teaching | |
effectiveness: Student evaluation of teaching ratings and student learning are not related. Studies | |
in Educational Evaluation, 54, 22-42. | |
Wood, D., & Friedel, M. (2009). Peer review of online learning and teaching: Harnessing | |
collective intelligence to address emerging challenges. Australasian Journal of Educational | |
Technology, 25(1). | |
Zipser, N., & Mincieli, L. (2018). Administrative and structural changes in student evaluations of | |
teaching and their effects on overall instructor scores. Assessment & Evaluation in Higher | |
Education, 43(6), 995-1008. | |
Appendix A | |
Anonymous Survey Questions | |
Likert Questions [1-8] | |
Answer Options | |
Strongly disagree | |
Somewhat disagree | |
Neither agree nor disagree | |
Somewhat agree | |
Strongly agree | |
1. The peer-review process was beneficial to my teaching.
2. The peer-review process was beneficial to my career development.
3. The peer-review process was not worth the time spent on doing it.
4. The peer-review process was collegial.
5. The peer-review process provided me with new insight into my teaching practice.
6. The peer-review process inspired me to try new things related to my teaching.
7. The steps of the peer-review process were clear.
8. I have little to no prior experience with peer review of online teaching.
Open-ended Questions [9-11] | |
9. Did you make (or do you plan to make) changes to your instruction based on your participation in this peer-review process (e.g., feedback you received, conversations with your peers, rubrics, etc.)? Y/N
a. If Y, please describe the change(s) you plan to make to your instruction based on the feedback you received through the peer-review process.
10. Please describe at least two insights gained from your participation in the peer-review process. | |
11. What changes, if any, would you suggest should be made to enhance the benefits of the peer-review | |
process? | |
Appendix B | |
Instructor Intake Form Questions | |
Your information | |
1. What is your name? | |
2. What is your e-mail address? | |
3. Who is your assigned peer reviewer? | |
Your Online Course | |
4. What is your course name, number & section (e.g., STAT 500 001)? | |
5. What is the title of your course (e.g., Applied Statistics)? | |
6. What is the Canvas link to your course? | |
7. What is the link to the online notes in your course? | |
Context | |
8. How many semesters have you taught this course? Choose: (0-3) (4-6) (6 or more) | |
9. Does your course have multiple sections? | |
10. If yes, are all sections based on a single master (or another instructor’s) course? | |
11. If yes, roughly what percentage of the course do you change or personalize from the master? | |
12. How do you know if students are meeting the learning outcomes of your course? | |
13. Is there a specific part of the course content or design for which you would like the reviewer to | |
provide feedback? | |
14. Please describe the nature and purpose of the communications between students and instructors in | |
this course. | |
15. Are you trying anything new this semester based on prior student or peer feedback, professional | |
development, or your own experiences? | |
16. If yes, please explain. | |
Canvas Communication | |
17. Please identify other communications among students and instructors about which the reviewer should be aware, but which are not available for review at the sites listed above.
18. Does the course require any synchronous activities (same time, same place)? | |
___Yes | |
___No | |
19. If yes, please describe: | |
20. Is there any other information you would like to share with your peer before they review your | |
course? | |
Does peer feedback for teaching GPs improve student evaluation of general practice attachments? A pre-post analysis
Abstract | |
Objectives: The extent of university teaching in general practice is increasing and is in part realised with attachments in resident general | |
practices. The selection and quality management of these teaching | |
practices pose challenges for general practice institutes; appropriate | |
instruments are required. The question of the present study is whether | |
the student evaluation of an attachment in previously poorly evaluated | |
practices improves after teaching physicians have received feedback | |
from a colleague. | |
Methods: Students in study years 1, 2, 3 and 5 evaluated their experiences in general practice attachments with two 4-point items (professional competence and recommendation for other students). Particularly | |
poorly evaluated teaching practices were identified. A practising physician with experience in teaching and research gave these practices personal feedback on their evaluation results (peer feedback), mainly in the form of individual discussions in the practice (peer visit). After
this intervention, further attachments took place in these practices. The | |
influence of the intervention (pre/post) on student evaluations was | |
calculated in generalised estimating equations (cluster variable practice). | |
Results: Of 264 teaching practices, 83 had a suboptimal rating. Of | |
these, 27 practices with particularly negative ratings were selected for | |
the intervention, of which 24 have received it so far. There were no post-evaluations for 5 of these practices, so data from 19 practices
(n=9 male teaching physicians, n=10 female teaching physicians) were | |
included in the present evaluation. The evaluations of these practices | |
were significantly more positive after the intervention (by n=78 students) | |
than before (by n=82 students): odds ratio 1.20 (95% confidence interval 1.10-1.31; p<.001). | |
Conclusion: The results suggest that university institutes of general | |
practice can improve student evaluation of their teaching practices via | |
individual collegial feedback. | |
Michael Pentzek1, Stefan Wilm1, Elisabeth Gummersbach1
1 Heinrich Heine University Düsseldorf, Medical Faculty, Centre for Health and Society (chs), Institute of General Practice (ifam), Düsseldorf, Germany
Keywords: general practice, teacher training, feedback, medical | |
students, undergraduate medical education, evaluation | |
Introduction | |
The German “Master Plan Medical Studies 2020” | |
provides for a strengthening of the role of general practice | |
in the curriculum [1]. One form of implementation desired | |
by students and teachers is attachments in practices | |
early and continuously in the course of studies [2]. Beyond | |
pure learning effects, the experiences that students have in
these attachments can help shape a professional orientation. Good experiences in attachments can increase interest in general practice as a discipline and profession | |
[3], [4]. | |
In accordance with the Medical Licensing Regulations | |
[https://www.gesetze-im-internet.de/_appro_2002/ | |
BJNR240500002.html], students in the Düsseldorf | |
medical curriculum complete an attachment in general | |
practices lasting a total of six weeks in the academic | |
years 1, 2, 3 and 5 [https://www.medizinstudium.hhu.de]. | |
The requirements of the attachments build on each other | |
in terms of content; initially the focus is on anamnesis | |
and physical examination, later more complex medical | |
contexts and considerations for further diagnostics and | |
therapy are added. Under the supervision of the resident | |
teaching general practitioners (GPs), the students can gain
experience in doctor-patient interaction. An important | |
and therefore repeatedly emphasised factor for a positive | |
student perception of the attachments is the fact that | |
the students are given the opportunity to work independently with patients during the attachment in order to be | |
able to directly experience themselves in the provider | |
role [2], [5]. The attitude and qualifications of the teaching | |
physicians continue to play an important role in the didactic success of the attachments [3]. About 2/3 of the | |
teaching practices are positively evaluated by the students, but about 1/3 are not. Due to the increasing demand for attachments in general practices since the installation of the new curriculum, many teaching practices | |
have been newly recruited; a feedback culture is now | |
being established. A first step was the possibility for | |
teaching practices to actively request their written evaluation results, but this was almost never taken up. The | |
next step of establishing a feedback strategy is reported | |
here: One way to improve teaching performance is to receive feedback from an experienced colleague (peer | |
feedback) [6]. This can generate insights that student | |
evaluations alone cannot provide and is increasingly recognised as a complement to student feedback. In personal peer feedback, ideas can be exchanged, problems
discussed, strategies identified and concrete approaches | |
to improvement found [7]. Potential effects include increased awareness and focus of the teaching physician | |
on the teaching situation in practice, more information | |
about what constitutes good teaching, motivation to be | |
more interactive and student-centred, and inspiration to | |
use new teaching methods [8]. Pedram et al. found positive effects on teacher behaviour after peer feedback, | |
especially in terms of shaping the learning atmosphere | |
and interest in student understanding [9]. The application | |
of peer feedback to the setting described here has not | |
yet been investigated. The research question of the | |
present study is whether the student attachment evaluation of previously poorly rated GPs improves after peer | |
feedback has been conducted. | |
Methods | |
Teaching practices | |
The data were collected during the 4 attachments in | |
GP practices [https://www.uniklinik-duesseldorf.de/ | |
patienten-besucher/klinikeninstitutezentren/institut-fuerallgemeinmedizin/lehre], all of which take place in | |
teaching practices coordinated by the Institute of General | |
Practice. Before starting their teaching practice, all | |
teaching GPs are informed verbally and in writing about | |
the collection of student evaluations and a personal interview with an institute staff member in case of poor evaluation results. | |
Interested doctors take part in a 2-3 hour information | |
session led by the institute director (SW) before taking | |
up a teaching GP position, in which they are first informed | |
about the prerequisites for teaching students in their | |
practices. These include, among other things, the planning | |
of time resources for supervising students in the attachments, enthusiasm for working as a GP, acceptance of | |
the university’s teaching objectives in general practice | |
(in particular that interns are allowed to work independently with patients) and participation in at least two of | |
the eight didactic trainings offered annually by the institute (with the commencement of the teaching activity, | |
the institute assumes the acceptance of these prerequisites on the part of the teaching physician, but does not | |
formally check that they are met). This is followed by detailed information on the structure of the curriculum, the | |
position of the attachments, the contents and requirements of the individual attachments and basic didactic | |
aspects of 1:1 teaching. Information about the student | |
evaluation of the attachment is provided verbally and in | |
writing, combined with the offer to actively request both | |
an overall evaluation and the individual evaluation by email. There is no unsolicited feedback of the evaluation | |
results to the practices. After the information event, a | |
folder with corresponding written information is handed | |
out. | |
Before each attachment, the teaching physicians are sent | |
detailed material so that they can orient themselves once | |
again. This contains information on the exact course of | |
the attachment, on the current learning status of the | |
students, including copies of or references to the underlying
didactic materials, on the tasks to be worked on during | |
the attachment and the associated learning objectives, | |
on the relevance of practising on patients as well as a | |
note on the attitude of wanting to convey a positive image | |
of the GP profession to the students. | |
In addition, each student receives a cover letter to the | |
teaching physician in which the most important points | |
mentioned above are summarised once again. | |
Evaluation | |
Student evaluation as a regular element of teaching | |
evaluation [https://www.medizin.hhu.de/studium-undlehre/lehre] was carried out by independent student | |
groups before and after the intervention. It consisted, | |
among other things, of the opportunity for free-text comments, an indication of the number of patients personally | |
examined and the items “How satisfied were you with the | |
professional supervision by your teaching physician?” | |
and “Would you recommend this teaching practice to | |
other fellow students?” (both with a positively ascending | |
4-point scale). | |
Selection of practices for the intervention | |
Since most practices received a very good evaluation | |
(skewed distribution), three groups were identified as | |
follows. From all the institute’s teaching practices involved | |
in the attachments, those were first selected that had a | |
lower than very good evaluation (=“suboptimal”): rated | |
<2 at least once on at least one of the two above-mentioned items or repeatedly received negative free text | |
comments. From this group of suboptimal (=less than | |
very good) practices, those with more than two available | |
student evaluations, continued teaching and particularly | |
negative evaluations were selected: at least twice with | |
<2 on at least one of the two items or repeated negative | |
free text comments. Of the 27 practices, 24 practices | |
GMS Journal for Medical Education 2021, Vol. 38(7), ISSN 2366-5017 | |
2/14 | |
Pentzek et al.: Does peer feedback for teaching GPs improve student ... | |
(88.9%) have so far received an intervention to improve | |
their teaching from a peer (n=3 not yet due to the pandemic), and 19 practices (70.4%) provided evaluation | |
results from post-intervention attachments (n=5 had no | |
attachments after the intervention). To characterise the | |
three groups of very well, suboptimal and poorly evaluated | |
(=selected) practices, an analysis of variance including | |
post-hoc Scheffé tests was calculated with the factor | |
group and the dependent variable evaluation result. | |
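Read as a procedure, this two-stage selection is a simple filter over per-practice evaluation summaries. A rough pandas sketch follows; the column names and toy data are assumptions for illustration, and "repeatedly" is interpreted here as at least twice, which the paper does not state explicitly.

import pandas as pd

# Hypothetical per-evaluation data: one row per student evaluation of a practice.
ev = pd.DataFrame({
    "practice_id":        [1, 1, 1, 2, 2, 3, 3, 3, 3],
    "item_supervision":   [4, 1, 2, 4, 4, 1, 1, 3, 2],
    "item_recommend":     [3, 1, 1, 4, 4, 2, 1, 3, 1],
    "negative_free_text": [0, 1, 0, 0, 0, 1, 1, 0, 1],
    "still_teaching":     [1, 1, 1, 1, 1, 1, 1, 1, 1],
})

# An evaluation counts as "low" if at least one of the two items is rated below 2.
ev["low"] = (ev["item_supervision"] < 2) | (ev["item_recommend"] < 2)

per_practice = ev.groupby("practice_id").agg(
    n_evals=("low", "size"),
    n_low=("low", "sum"),
    n_negative=("negative_free_text", "sum"),
    still_teaching=("still_teaching", "last"),
)

# Stage 1: "suboptimal" = at least one low rating or repeated negative comments.
suboptimal = per_practice[(per_practice.n_low >= 1) | (per_practice.n_negative >= 2)]

# Stage 2: selected for the intervention = more than two evaluations, continued
# teaching, and particularly negative ratings or repeated negative comments.
selected = suboptimal[
    (suboptimal.n_evals > 2)
    & (suboptimal.still_teaching == 1)
    & ((suboptimal.n_low >= 2) | (suboptimal.n_negative >= 2))
]
print(selected)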
Intervention
Peer feedback was implemented as part of the | |
didactic concept in particularly negatively evaluated | |
teaching practices [https://www.uniklinik-duesseldorf.de/ | |
patienten-besucher/klinikeninstitutezentren/institut-fuerallgemeinmedizin/didaktik-fortbildungen]: A GP staff | |
member of the Institute of General Practice (EG) known | |
to the teaching physicians and experienced in practice | |
and teaching reported back to the teaching physicians | |
their student evaluations. The primary mode was a personal visit to the practice (peer visit) [10]. For organisational reasons, group discussions with several teaching | |
physicians and written feedback occasionally had to be | |
offered as alternative solutions. Peer visits and group | |
discussions were both aimed at reflecting on one's own | |
teaching motivation and problems. This was followed by | |
a discussion of the personal evaluation in order to enter | |
into a constructive exchange between teaching GP and | |
university with regard to teaching and dealing with students in the practice. Peer visits and group discussions | |
were recorded. The opening question was “Why are you | |
a teaching doctor?”, followed by questions about personal | |
experiences: “Can you tell me about your experiences? | |
What motivates you to be a teaching physician? Are there | |
any problems from your point of view?” Then the (bad) | |
feedback was addressed and discussed, followed by the | |
question “What can we do to support you?”. The written | |
feedback consisted of an uncommented feedback of the | |
student evaluation results (scores and free texts). | |
Analyses | |
Due to a strong correlation of the two evaluation items | |
(Spearman's rho=0.79), these were averaged into an | |
overall evaluation for the present analyses. In order to | |
determine multivariable influences on this student evaluation, a generalised estimating equation (GEE) was | |
calculated with the cluster variable “practice”, due to the | |
lack of a normal distribution (Kolmogorov-Smirnov test | |
p<.001) with a gamma distribution and log link. The
following were included as potential influence variables: | |
Intervention effect (pre/post), intervention mode (peer | |
visit vs. group/written), time of attachment (study year), | |
number of patients seen in person per week. In parallel | |
to this analysis, the intervention effect on the number of | |
personally supervised patients was examined in a second | |
GEE. | |
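The model is described only in prose here. Purely as an orientation, a comparable specification in Python with a recent version of statsmodels might look like the sketch below; the data frame, column names and simulated values are illustrative assumptions, not the study's data.

import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
from scipy.stats import spearmanr

# Hypothetical long-format data: one row per student evaluation of an attachment.
rng = np.random.default_rng(1)
n = 160
df = pd.DataFrame({
    "practice_id": rng.integers(0, 19, n),
    "post": rng.integers(0, 2, n),                       # 0 = before, 1 = after intervention
    "mode": rng.choice(["visit", "group_or_written"], n),
    "study_year": rng.choice([1, 2, 3, 5], n),
    "patients_per_week": rng.poisson(8, n),
    "item_supervision": rng.integers(1, 5, n),           # 4-point items
    "item_recommend": rng.integers(1, 5, n),
})

# As in the paper, the two strongly correlated items are averaged into one rating.
rho, _ = spearmanr(df["item_supervision"], df["item_recommend"])
df["overall_rating"] = df[["item_supervision", "item_recommend"]].mean(axis=1)

# GEE with practices as clusters, gamma distribution and log link.
model = smf.gee(
    "overall_rating ~ post + C(mode) + C(study_year) + patients_per_week",
    groups="practice_id",
    data=df,
    family=sm.families.Gamma(link=sm.families.links.Log()),
    cov_struct=sm.cov_struct.Exchangeable(),
)
result = model.fit()
print(result.summary())
print(np.exp(result.params))  # exponentiated coefficients (ratio-scale effects)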
The free texts in the student evaluations as well as the teacher comments in the peer visits and group discussions were processed qualitatively using content analysis in order to outline the underlying problems and the teacher reactions to the feedback in addition to the pure numbers. For this purpose, inductive category development was carried out on the material [11]. The numbers of negative student comments before and after the intervention were also compared quantitatively.
Results
Teaching practices and pre-evaluations
264 teaching practices with a total of 1648 attachments | |
were involved. Of these, 181 practices (68.6%) with 1036 | |
attachments were rated very good (student evaluation | |
mean 3.8±standard deviation 0.2), 56 practices (21.2%) | |
with 453 attachments were rated suboptimal (3.3±0.4) | |
and 27 practices (10.2%) with 159 attachments were | |
rated very poor (2.8±0.4). The overall comparison of the three groups shows significant differences (F(df=2)=205.1; p<.001), with significant differences in all post-hoc comparisons (all p<.001): very good vs. suboptimal (mean difference 0.51; standard error 0.04); very good vs. poor (1.09; 0.06); suboptimal vs. poor (0.58; 0.07).
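For orientation, the small Python sketch below runs the same kind of comparison, a one-way ANOVA followed by pairwise Scheffé tests; the group values are simulated from the means and standard deviations reported above and are not the study's raw data.

import numpy as np
from scipy import stats

# Simulated ratings drawn from the reported group summaries
# (very good 3.8±0.2, suboptimal 3.3±0.4, poor 2.8±0.4).
rng = np.random.default_rng(0)
groups = {
    "very good": rng.normal(3.8, 0.2, 1036),
    "suboptimal": rng.normal(3.3, 0.4, 453),
    "poor": rng.normal(2.8, 0.4, 159),
}

# One-way ANOVA across the three groups.
F, p = stats.f_oneway(*groups.values())
print(f"ANOVA: F={F:.1f}, p={p:.3g}")

# Scheffé post-hoc test: reject a pairwise difference if
# (mean_i - mean_j)^2 / (MS_within * (1/n_i + 1/n_j)) > (k-1) * F_crit.
k = len(groups)
n = {g: len(v) for g, v in groups.items()}
N = sum(n.values())
ms_within = sum(((v - v.mean()) ** 2).sum() for v in groups.values()) / (N - k)
f_crit = stats.f.ppf(0.95, k - 1, N - k)

pairs = [("very good", "suboptimal"), ("very good", "poor"), ("suboptimal", "poor")]
for a, b in pairs:
    diff = groups[a].mean() - groups[b].mean()
    scheffe = diff**2 / (ms_within * (1 / n[a] + 1 / n[b]))
    print(f"{a} vs {b}: mean diff {diff:.2f}, significant={scheffe > (k - 1) * f_crit}")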
Table 1 describes the analysis sample of n=19 out of the | |
27 poorly rated practices in more detail. | |
Reasons for a poor evaluation according to free texts of | |
the student evaluation can be presented in five categories. For example, the lack of opportunity to practise on | |
patients was criticised. | |
“Unfortunately, I did not have the opportunity to examine many patients myself during my last patient attachment, although I requested this on several occasions.” | |
(about practice ID 1) | |
There were also comments about lack of appreciation | |
and difficult communication: | |
“The teaching doctor has little patience especially | |
with foreign patients who cannot understand anatomical or medical terms. She makes insulting and ironic | |
statements. With some patients I was left alone for | |
30 minutes while with others only 2 and afterwards | |
she got annoyed when I was not done with the examination/anamnesis.” (about practice ID 14) | |
Some teaching physicians were commented on with regard to their didactic competence: | |
“[…] as a teaching doctor, I experienced him as little | |
to not at all competent and also very disinterested. | |
He had no idea of what PA1 [Patient Attachment 1] | |
was supposed to teach us and even after several approaches to him on my part, he understood little of | |
what I was about or what I was supposed to learn | |
there.” (about practice ID 22) | |
Practice procedures and structures were mentioned | |
which, according to the students, made it difficult to carry | |
out the attachment efficiently: | |
Table 1: Characteristics of the analysis sample | |
“From 8-11 am only patients come for blood collection, fixed appointments are not scheduled during | |
that time. As I was not allowed to take blood or vaccinate, there was nothing for me to do during that | |
time.” (about practice ID 10) | |
In some practices with primarily non-German-speaking | |
patients and also staff (incl. teaching physician), the | |
language barrier turned out to be a problem in the evaluations. | |
“As the teaching doctor is [nationality XY], about 70% | |
of the consultations were in [language XY].” (about | |
practice ID 2) | |
Intervention | |
In the protocols of the peer visits and group discussions | |
with the teaching physicians, four categories of problems | |
emerge, which partly mirror the student comments mentioned above: for example, the teaching physicians reported concerns about letting students work alone with patients. (The following are quotes from the protocols of the intervening peer doctor.)
“He finds it difficult to leave students alone. [...] He | |
thinks the patients don’t like it that way, although his | |
experience is actually different. Also has many patients from management. “Students are also too short | |
in practice.”” (reg. ID 17) | |
A sceptical attitude towards lower semester students in | |
particular was also expressed. | |
“Can’t do anything with the 2nd semesters, “they can’t | |
do anything, there's no point in letting them listen to | |
the heart if they don't know the clinical pictures”. [...] | |
“The problem is also that they are always very young | |
girls now.”” (reg. ID 24) | |
Some teaching physicians were not familiar with the didactic concepts and materials of the practical courses. | |
“He has no knowledge of teaching, doesn't read | |
through anything. Doesn’t know he is being evaluated | |
either.” (reg. ID 6) | |
In some cases, a self-image as a teaching general practitioner leads to the definition of one’s own attachment | |
content, neglecting or devaluing the learning objectives | |
set by the university. | |
““I’ve made a commitment to general practice and I | |
want to pass that on”. Explains a lot to students, but | |
doesn’t let them do much. “I show young people the | |
right way. Nobody else does it (the university certainly | |
doesn’t), so I do it.”” (reg. ID 4) | |
“However, clearly wants to show the students | |
everything, repeatedly mentions ultrasound, blood | |
sampling, does not know teaching content, makes | |
his own teaching content: “I show them everything of | |
interest””. (reg. ID 22) | |
At several points, the teaching physicians expressed intentions to change their behaviour, e.g. according to the | |
minutes, “wants to guide students more to examination” | |
or “says he wants to read through the handouts in future”. | |
The majority of the teaching physicians showed a basic | |
interest in and commitment to supervising the students.
Most were able to reflect on the points of criticism. | |
Pre-post analysis | |
The intervention effect on the student evaluation is significant and independent of the (also significant) influence | |
of the number of patients (see table 2). | |
The intervention effect on the number of patients personally cared for by students also persisted in a GEE (odds | |
ratio 1.41; 95% confidence interval 1.21-1.64; p<.001), | |
regardless of the type of intervention and study year | |
(analysis not shown). | |
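As a side note on the reporting, the ratios and confidence intervals quoted here are exponentiated coefficients from the log-link GEE. Working backwards from the reported 1.41 (95% CI 1.21-1.64), the underlying log-scale estimate is roughly ln(1.41) ≈ 0.34 with a standard error of about 0.08; the short check below illustrates the conversion.

import math

# Reported ratio-scale effect for personally supervised patients: 1.41 (95% CI 1.21-1.64).
ratio, lo, hi = 1.41, 1.21, 1.64

# Back out the log-scale coefficient and standard error implied by the interval.
b = math.log(ratio)
se = (math.log(hi) - math.log(lo)) / (2 * 1.96)
print(f"log-scale coefficient ~ {b:.3f}, SE ~ {se:.3f}")

# Forward check: exponentiating b +/- 1.96*SE reproduces the reported interval.
print(f"ratio {math.exp(b):.2f}, "
      f"95% CI {math.exp(b - 1.96 * se):.2f}-{math.exp(b + 1.96 * se):.2f}")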
Table 2: Multivariable influences on the dependent variable “student evaluation of GP attachment” (generalised estimating | |
equation (GEE) with cluster variable practice) | |
Table 3: Number of students’ comments on attachments in 19 poorly evaluated GP teaching practices | |
The proportion of critical comments in the student free-text comments decreases overall and in four of the five
categories mentioned (see table 3). | |
Discussion | |
In a pre-post comparison of poorly evaluated teaching | |
physicians who supervised students in the context of GP | |
attachments, peer feedback by a general practitioner had | |
a positive effect on student evaluation and on the number | |
of patients personally examined by students during the | |
attachment. This is reflected in the evaluation scores and | |
also in the fact that corresponding negative free-text | |
comments by the students were less frequent after the | |
intervention. | |
In line with the literature, it was crucial for student evaluation that students were given the opportunity to work | |
independently with patients in order to experience | |
themselves directly in the provider role [2], [5]. Also independent of the number of patients, student evaluation | |
improved after the intervention: The qualitative results | |
provide evidence that the teaching physicians may have | |
been more closely engaged with the meaning of the attachments, the learning objectives and didactic materials
after the intervention. This in turn also seemed to have | |
had positive effects on the exchange and relationship | |
between the teaching physician and the student (possibly | |
in the sense of an alignment of mutual expectations), which are also important elements of a positive attachment experience [3], [12]. The qualitative results on didactic competence and attitude indicate that, at least for the small
group of previously poorly evaluated teaching physicians | |
studied here, a more intensive consideration of their | |
teaching assignment and repeated interaction between | |
the university and the teaching practice is required in | |
order to internalise contents and concepts and to implement them in the attachments for students in a recognisable and consistent manner. The fact that it is precisely | |
the poorly evaluated teaching physicians who tend to | |
rarely attend the meetings at the university (offered eight | |
times a year in Düsseldorf) is an experience also reported | |
by many other locations. The formal review of the prerequisites and criteria for an appropriate teaching GP | |
position would involve an enormous amount of effort | |
given the high number of teaching practices required – | |
especially in a curriculum constructed along the lines of | |
longitudinal general practice. However, it must be weighed | |
up whether more resources should be invested in the | |
selection and qualification of practices interested in | |
teaching or in quality control and training of practices | |
already teaching. | |
A strength of this study is the evaluations by independent | |
student groups pre-post, so that biases due to repeated | |
exposure of students to a practice (e.g. response shift | |
bias, habituation, observer drift) are excluded. The | |
weakness associated with the pre-post design without a | |
control group and the focus on poorly evaluated practices | |
is, among other things, the phenomenon of regression | |
to the mean, which presumably accounts for part of the | |
positive intervention effect. The primary research question | |
of this study is formulated and answered quantitatively; | |
we report only limited qualitative results. These allow only | |
partial hypothesis-generating insights into the exact | |
mechanisms of peer feedback [13]. In the present study, | |
several modes of delivering peer feedback were used. Since the analyses do not indicate different effects of the personnel- and time-intensive peer visit on
the one hand and the more efficient methods of group | |
discussion and written feedback on the other, further | |
studies are necessary to differentiate before a broader | |
implementation. For example, Rüsseler et al. [14] found | |
that written peer feedback – albeit in relation to lecturers | |
– had positive effects on the design of the course. | |
Conclusions
It makes sense to further consider the effects of teaching physician feedback in both research and teaching. The comprehensive GMA recommendations provide a robust framework for teaching [15] and the didactic qualification of teaching physicians [16]. Embedded in this, collegial peer feedback for poorly rated teaching physicians represents a possible tool for quality management of general practice teaching.
Competing interests
The authors declare that they have no competing interests.
References
1. Bundesministerium für Bildung und Forschung. Masterplan Medizinstudium 2020. Berlin: Bundesministerium für Bildung und Forschung; 2017. Zugänglich unter/available from: https://www.bmbf.de/files/2017-03-31_Masterplan%20Beschlusstext.pdf
2. Wiesemann A, Engeser P, Barlet J, Müller-Bühl U, Szecsenyi J. Was denken Heidelberger Studierende und Lehrärzte über frühzeitige Patientenkontakte und Aufgaben in der Hausarztpraxis? Gesundheitswesen. 2003;65(10):572-578. DOI: 10.1055/s-2003-42999
3. Grunewald D, Pilic L, Bödecker AW, Robertz J, Althaus A. Die praktische Ausbildung des medizinischen Nachwuchses - Identifizierung von Lehrpraxen-Charakteristika in der Allgemeinmedizin. Gesundheitswesen. 2020;82(07):601-606. DOI: 10.1055/a-0894-4556
4. Böhme K, Sachs P, Niebling W, Kotterer A, Maun A. Macht das Blockpraktikum Allgemeinmedizin Lust auf den Hausarztberuf? Z Allg Med. 2016;92(5):220-225. DOI: 10.3238/zfa.2016.0220–0225
5. Gündling PW. Lernziele im Blockpraktikum Allgemeinmedizin - Vergleich der Präferenzen von Studierenden und Lehrärzten. Z Allg Med. 2008;84:218-222. DOI: 10.1055/s-2008-1073148
6. Steinert Y, Mann K, Centeno A, Dolmans D, Spencer J, Gelula M, Prideaux D. A systematic review of faculty development initiatives designed to improve teaching effectiveness in medical education: BEME Guide No. 8. Med Teach. 2006;28(6):497-526. DOI: 10.1080/01421590600902976
7. Garcia I, James RW, Bischof P, Baroffio A. Self-Observation and Peer Feedback as a Faculty Development Approach for Problem-Based Learning Tutors: A Program Evaluation. Teach Learn Med. 2017;29(3):313-325. DOI: 10.1080/10401334.2017.1279056
8. Gusic M, Hageman H, Zenni E. Peer review: a tool to enhance clinical teaching. Clin Teach. 2013;10(5):287-290. DOI: 10.1111/tct.12039
9. Pedram K, Brooks MN, Marcelo C, Kurbanova N, Paletta-Hobbs L, Garber AM, Wong A, Qayyum R. Peer Observations: Enhancing Bedside Clinical Teaching Behaviors. Cureus. 2020;12(2):e7076. DOI: 10.7759/cureus.7076
10. O'Brien MA, Rogers S, Jamtvedt G, Oxman AD, Odgaard-Jensen J, Kristoffersen DT, Forsetlund L, Bainbridge D, Freemantle N, Davis DA, Haynes RB, Harvey EL. Educational outreach visits: effects on professional practice and health care outcomes. Cochrane Database Syst Rev. 2007;2007(4):CD000409. DOI: 10.1002/14651858.CD000409.pub2
11. Kruse J. Qualitative Interviewforschung. 2. Aufl. Weinheim: Beltz Juventa; 2015.
12. Koné I, Paulitsch MA, Ravens-Taeuber G. Blockpraktikum Allgemeinmedizin: Welche Erfahrungen sind für Studierende relevant? Z Allg Med. 2016;92(9):357-362. DOI: 10.3238/zfa.2016.0357-0362
13. Raski B, Böhm M, Schneider M, Rotthoff T. Influence of the personality factors rigidity and uncertainty tolerance on peer feedback. In: 5th International Conference for Research in Medical Education (RIME 2017), 15.-17. March 2017, Düsseldorf, Germany. Düsseldorf: German Medical Science GMS Publishing House; 2017. P15. DOI: 10.3205/17rime46
14. Ruesseler M, Kalozoumi-Paizi F, Schill A, Knobe M, Byhahn C, Müller MP, Marzi I, Walcher F. Impact of peer feedback on the performance of lecturers in emergency medicine: a prospective observational study. Scand J Trauma Resusc Emerg Med. 2014;22:71. DOI: 10.1186/s13049-014-0071-1
15. Huenges B, Gulich M, Böhme K, Fehr F, Streitlein-Böhme I, Rüttermann V, Baum E, Niebling WB, Rusche H. Recommendations for Undergraduate Training in the Primary Care Sector - Position Paper of the GMA-Primary Care Committee. GMS Z Med Ausbild. 2014;31(4):Doc35. DOI: 10.3205/zma000927
16. Böhme K, Streitlein-Böhme I, Baum E, Vollmar HC, Gulich M, Ehrhardt M, Fehr F, Huenges B, Woestmann B, Jendyk R. Didactic qualification of teaching staff in primary care medicine - a position paper of the Primary Care Committee of the Society for Medical Education. GMS J Med Educ. 2020;37(5):Doc53. DOI: 10.3205/zma001346
Corresponding author: | |
PD Dr. rer. nat. Michael Pentzek
Heinrich Heine University Düsseldorf, Medical Faculty, Centre for Health and Society (chs), Institute of General Practice (ifam), Moorenstr. 5, Building 17.11, D-40225 Düsseldorf, Germany, Phone: +49 (0)211/81-16818
[email protected] | |
Please cite as | |
Pentzek M, Wilm S, Gummersbach E. Does peer feedback for teaching | |
GPs improve student evaluation of general practice attachments? A | |
pre-post analysis. GMS J Med Educ. 2021;38(7):Doc122. | |
DOI: 10.3205/zma001518, URN: urn:nbn:de:0183-zma0015182 | |
This article is freely available from | |
https://www.egms.de/en/journals/zma/2021-38/zma001518.shtml | |
Received: 2021-03-03 | |
Revised: 2021-08-12 | |
Accepted: 2021-08-17 | |
Published: 2021-11-15 | |
Copyright | |
©2021 Pentzek et al. This is an Open Access article distributed under | |
the terms of the Creative Commons Attribution 4.0 License. See license | |
information at http://creativecommons.org/licenses/by/4.0/. | |
Feedback | |
OPEN ACCESS | |
This is the German version. | |
The English version starts at p. 1. | |
Artikel | |
Verbessert Peer-Feedback für Lehrärzte die studentische | |
Bewertung von Hausarztpraktika? Ein Prä-Post-Vergleich | |
Zusammenfassung | |
Zielsetzung: Die allgemeinmedizinische Lehre an den Universitäten | |
nimmt zu und wird u.a. mit Praktika bei niedergelassenen Hausärzten | |
realisiert. Auswahl und Qualitätsmanagement dieser Lehrpraxen stellen | |
die allgemeinmedizinischen Institute vor Herausforderungen; entsprechende Instrumente sind gefragt. Die Fragestellung der vorliegenden | |
Studie lautet, ob sich die studentische Bewertung eines Praktikums in | |
bislang schlecht evaluierten Hausarztpraxen verbessert, nachdem die | |
hausärztlichen Lehrärzte eine Rückmeldung durch eine Kollegin erhalten | |
haben. | |
Methodik: Studierende der Studienjahre 1, 2, 3 und 5 bewerteten ihre | |
Erfahrungen in hausärztlichen Praktika mit zwei 4-stufigen Items | |
(fachliche Betreuung und Empfehlung für andere Kommilitonen). Besonders schlecht evaluierte Lehrpraxen wurden identifiziert. Eine | |
praktisch tätige und lehr-erfahrene Hausärztin und wissenschaftliche | |
Mitarbeiterin führte mit diesen eine persönliche Rückmeldung der | |
Evaluationsergebnisse durch (Peer-Feedback), überwiegend in Form | |
von Einzelgesprächen in der Praxis (peer visit). Nach dieser Intervention | |
wurden in diesen Praxen weiter Praktika durchgeführt. Der Einfluss der | |
Intervention (prä/post) auf die studentischen Evaluationen wurde in | |
verallgemeinerten Schätzungsgleichungen (Clustervariable Praxis) berechnet. | |
Ergebnisse: Von insgesamt 264 Lehrpraxen hatten 83 eine suboptimale | |
Bewertung. Davon wurden 27 besonders negativ bewertete Praxen für | |
die Intervention ausgewählt, von denen in bislang 24 die Intervention | |
umgesetzt werden konnte. Für 5 dieser Praxen gab es keine post-Evaluationen, so dass in die vorliegende Auswertung die Daten von 19 | |
Praxen (n=9 männliche Lehrärzte, n=10 weibliche Lehrärztinnen) eingingen. Die Evaluationen dieser Praxen waren nach der Intervention | |
(durch n=78 Studierende) signifikant positiver als vorher (durch n=82 | |
Studierende): Odds Ratio 1.20 (95% Konfidenzintervall 1.10-1.31; | |
p<.001). | |
Schlussfolgerung: Die Ergebnisse deuten darauf hin, dass allgemeinmedizinische Universitätsinstitute die studentische Bewertung ihrer | |
Lehrpraxen über individuelle kollegiale Rückmeldungen verbessern | |
können. | |
Michael Pentzek1 | |
Stefan Wilm1 | |
Elisabeth | |
Gummersbach1 | |
1 Heinrich-Heine-Universität | |
Düsseldorf, Medizinische | |
Fakultät, Centre for Health | |
and Society (chs), Institut für | |
Allgemeinmedizin (ifam), | |
Düsseldorf, Deutschland | |
Schlüsselwörter: Allgemeinmedizin, Ausbildung von Lehrkräften, | |
Feedback, Medizinstudenten, medizinische Ausbildung im | |
Grundstudium, Evaluation | |
Einleitung | |
Der „Masterplan Medizinstudium 2020“ sieht eine Stärkung der Rolle der Allgemeinmedizin im Curriculum vor | |
[1]. Eine von Studierenden und Lehrenden gewünschte | |
Form der Umsetzung besteht in Praktika in Hausarztpraxen bereits früh und kontinuierlich im Studienverlauf [2]. | |
Die Erfahrungen, die Studierende in diesen Praktika machen, können –über reine Lerneffekte hinaus– eine be- | |
rufliche Orientierung mitformen; gute Erfahrungen in | |
Praktika können das Interesse an Allgemeinmedizin und | |
am Hausarztberuf steigern [3], [4]. | |
Im Einklang mit der ärztlichen Approbationsordnung | |
[https://www.gesetze-im-internet.de/_appro_2002/ | |
BJNR240500002.html] absolvieren die Studierenden im | |
Düsseldorfer Modellstudiengang in den Studienjahren 1, | |
2, 3 und 5 jeweils ein Praktikum in Hausarztpraxen mit | |
insgesamt | |
sechs | |
Wochen | |
Dauer | |
[https:// | |
www.medizinstudium.hhu.de]. Die Anforderungen der | |
Praktika bauen inhaltlich aufeinander auf; zunächst liegt | |
GMS Journal for Medical Education 2021, Vol. 38(7), ISSN 2366-5017 | |
8/14 | |
Pentzek et al.: Verbessert Peer-Feedback für Lehrärzte die studentische ... | |
der Schwerpunkt auf Anamnese und körperlicher Untersuchung, später kommen komplexere medizinische Zusammenhänge und Überlegungen zu weiterführender | |
Diagnostik und Therapie hinzu. Unter Supervision der | |
niedergelassenen Lehrärzte können die Studierenden | |
hier Erfahrungen in der Arzt-Patienten-Interaktion sammeln. Ein wichtiger und deshalb immer wieder betonter | |
Faktor für eine positive studentische Wahrnehmung der | |
Praktika ist die Tatsache, dass den Studierenden im | |
Praktikum die Möglichkeit gegeben wird, selbstständig | |
mit Patienten zu arbeiten, um sich unmittelbar selbst in | |
der ärztlichen Rolle erleben zu können [2], [5]. Für den | |
didaktischen Erfolg der Praktika spielen weiterhin die | |
Haltung und Qualifikation der Lehrärzte eine wichtige | |
Rolle [3]. Ungefähr 2/3 der Lehrpraxen werden von den | |
Studierenden sehr gut bewertet, ca. 1/3 jedoch nicht. | |
Aufgrund des seit Installation des Modellstudiengangs | |
steigenden Bedarfs an Praktikumsplätzen in Hausarztpraxen wurden viele Lehrpraxen neu gewonnen; eine Feedback-Kultur wird nun aufgebaut. Ein erster Schritt bestand | |
in der Möglichkeit für Lehrpraxen, aktiv ihre schriftlichen | |
Evaluationsergebnisse einzufordern, was aber fast nie in | |
Anspruch genommen wurde. Über den nächsten Schritt | |
der Etablierung einer Feedback-Strategie wird hier berichtet: Eine Möglichkeit zur Verbesserung der Lehrperformanz ist die Rückmeldung durch einen erfahrenen Kollegen „auf Augenhöhe“ (Peer-Feedback) [6]. Dies kann | |
Einsichten generieren, die studentische Evaluationen allein nicht erreichen und wird zunehmend als Ergänzung | |
zur Studierendenrückmeldung anerkannt. Insbesondere | |
in persönlichen Peer-Feedbacks können Ideen ausgetauscht, Probleme diskutiert, Strategien aufgezeigt und | |
konkrete Verbesserungsansätze gefunden werden [7]. | |
Zu den möglichen Effekten gehören ein größeres Bewusstsein und eine stärkere Fokussierung des Lehrarztes auf | |
die Lehrsituation in der Praxis, mehr Information über | |
das, was gutes Lehren ausmacht, die Motivation zu verstärkter Interaktivität und Studierendenzentriertheit sowie | |
eine Inspiration zur Anwendung neuer Lehrmethoden [8]. | |
Pedram et al. fanden nach einem Peer-Feedback positive | |
Effekte auf das Verhalten der Lehrenden, insbesondere | |
hinsichtlich der Gestaltung der Lernatmosphäre und des | |
Interesses am Studierendenverständnis [9]. Die Anwendung von Peer-Feedback auf das hier beschriebene Setting wurde bislang nicht untersucht. Die Fragestellung | |
der vorliegenden Studie lautet, ob sich die studentische | |
Praktikumsevaluation bislang schlecht bewerteter Hausarztpraxen nach Durchführung eines Peer-Feedback | |
verbessert. | |
Methoden | |
Lehrpraxen | |
Die Daten wurden im Rahmen der 4 Praktika in | |
Hausarztpraxen [https://www.uniklinik-duesseldorf.de/ | |
patienten-besucher/klinikeninstitutezentren/institut-fuerallgemeinmedizin/lehre] erhoben, die alle in vom Institut | |
für Allgemeinmedizin koordinierten hausärztlichen Lehrpraxen stattfinden. Vor Aufnahme der Lehrarzttätigkeit | |
werden alle Lehrärzte mündlich und schriftlich über die | |
Erhebung studentischer Evaluationen und ein persönliches Gespräch mit einem oder einer Institutsmitarbeiter/in im Falle schlechter Evaluationsergebnisse informiert. | |
Interessierte Ärzte nehmen vor Aufnahme einer Lehrarzttätigkeit an einer 2-3-stündigen Informationsveranstaltung | |
unter Leitung des Institutsdirektors (SW) teil, in der sie | |
zunächst über die Voraussetzungen für die Lehrarzttätigkeit informiert werden; dazu gehören u.a. die Planung | |
zeitlicher Ressourcen für die Betreuung der Studierenden | |
in den Praktika, Begeisterung für die Arbeit als Hausarzt, | |
die Akzeptanz des universitären allgemeinmedizinischen | |
Lehrzielkataloges (insbesondere dass Praktikanten | |
selbstständig mit Patienten arbeiten dürfen) und die | |
Teilnahme an mindestens zwei der acht jährlich angebotenen allgemeinmedizinisch-didaktischen Fortbildungen | |
des Instituts. (Mit Aufnahme der Lehrtätigkeit geht das | |
Institut von der Akzeptanz dieser Voraussetzungen seitens | |
des Lehrarztes aus, überprüft das Vorliegen jedoch nicht | |
formal.) Es folgen ausführliche Informationen über den | |
Aufbau des Curriculums, die Verortung der Praktika, die | |
Inhalte und Anforderungen der einzelnen Praktika und | |
grundlegende didaktische Aspekte des 1:1-Unterrichts. | |
Über die Studierendenevaluation des Praktikums wird | |
mündlich und schriftlich aufgeklärt, verbunden mit dem | |
Angebot, sowohl eine Gesamtauswertung als auch die | |
individuelle Evaluation per E-Mail aktiv anfordern zu | |
können. Eine unaufgeforderte Rückmeldung der Evaluationsergebnisse an die Praxen gibt es nicht. Nach der Informationsveranstaltung wird eine Mappe mit entsprechenden schriftlichen Informationen ausgehändigt. | |
Vor jedem Praktikum wird den Lehrärzten ausführliches | |
Material zugeschickt, damit sie sich noch einmal orientieren können. Dieses enthält Hinweise zum genauen Ablauf | |
des Praktikums, zum aktuellen Lernstand der Studierenden inkl. Beilage der bzw. Verweis auf die zugrundeliegenden didaktischen Materialien, zu den im Praktikum zu | |
bearbeitenden Aufgaben und den damit verbundenen | |
Lernzielen, zur Relevanz des Übens am Patienten sowie | |
einen Hinweis zur Haltung, den Studierenden ein positives | |
Bild des Hausarztberufs vermitteln zu wollen. | |
Außerdem erhält jeder Studierende ein Anschreiben an | |
den Lehrarzt, in dem die wichtigsten o.g. Punkte noch | |
einmal zusammengefasst sind. | |
Evaluation | |
Die studentische Praktikumsevaluation als reguläres Element | |
der Lehrevaluation [https://www.medizin.hhu.de/studiumund-lehre/lehre.html] wurde in den untersuchten Praxen | |
vor und nach der Intervention durch unabhängige Studierendengruppen durchgeführt und bestand u.a. aus der | |
Möglichkeit für Freitext-Kommentare, einer Angabe der | |
Anzahl persönlich betreuter Patienten und den Items „Wie | |
zufrieden waren Sie mit der fachlichen Betreuung durch | |
Ihre Lehrärztin/Ihren Lehrarzt?“ und „Würden Sie anderen | |
GMS Journal for Medical Education 2021, Vol. 38(7), ISSN 2366-5017 | |
9/14 | |
Pentzek et al.: Verbessert Peer-Feedback für Lehrärzte die studentische ... | |
KommilitonInnen diese Lehrpraxis empfehlen?“, beide | |
aufsteigend positiv 4-stufig skaliert. | |
Auswahl der Praxen für die Intervention | |
Da die meisten Praxen eine sehr gute Bewertung erhielten | |
(schiefe Verteilung), wurden wie folgt drei Gruppen identifiziert: Aus allen an den Praktika beteiligten Lehrpraxen | |
des Instituts wurden zunächst diejenigen ausgewählt, die | |
eine geringere als sehr gute Evaluation aufwiesen | |
(=„suboptimal“): mindestens einmal mit <2 auf mind. | |
einem der beiden o.g. Items bewertet oder wiederholt | |
negative Freitextkommentare. Aus dieser Gruppe der | |
suboptimal (=geringer als sehr gut) bewerteten Praxen | |
wurden nun die mit mehr als zwei vorliegenden Studierendenbewertungen, weiterhin bestehender Lehrarzttätigkeit und besonders negativen Bewertungen ausgewählt: mindestens zweimal mit <2 auf mind. einem der | |
beiden Items bewertet oder wiederholt negative Freitextkommentare. Von den 27 Praxen erhielten bislang 24 | |
Praxen (88.9%) eine Intervention zur Verbesserung ihrer | |
Lehre von Seiten einer hausärztlich tätigen Allgemeinmedizinerin (n=3 pandemiebedingt noch nicht), und 19 | |
Praxen (70.4%) lieferten Evaluationsergebnisse aus | |
Praktika nach der Intervention (n=5 hatten nach der Intervention keine Praktikanten mehr). Zur Charakterisierung der drei Gruppen der sehr gut, suboptimal und | |
schlecht evaluierten (=ausgewählten) Praxen wurde eine | |
Varianzanalyse inkl. post-hoc Scheffé-Tests mit dem | |
Faktor Gruppe und der abhängigen Variable Evaluationsergebnis gerechnet. | |
thematisiert und besprochen, gefolgt von der Frage „Was | |
können wir tun, um Sie zu unterstützen?“. Das schriftliche | |
Feedback bestand aus einer unkommentierten Rückmeldung der studentischen Evaluationsergebnisse (Scores | |
und Freitexte). | |
Analysen | |
Aufgrund einer starken Korrelation der beiden Evaluationsitems (Spearman’s rho=0.79) wurden diese für die | |
vorliegenden Analysen zu einer Gesamtbewertung gemittelt. Um multivariable Einflüsse auf diese studentische | |
Bewertung zu ermitteln, wurde eine verallgemeinerte | |
Schätzungsgleichung (GEE) mit der Clustervariable „Praxis“ gerechnet, aufgrund fehlender Normalverteilung | |
(Kolmogorow-Smirnow-Test p<.001) mit Gamma-Verteilung und Log-Verknüpfung. Als potenzielle Einflussvariablen flossen ein: Interventionseffekt (prä/post), Interventionsmodus (peer visit vs. Gruppe/schriftlich), Praktikumszeitpunkt (Studienjahr), Anzahl der persönlich betreuten | |
Patienten pro Woche. Parallel zu dieser Analyse wurde | |
in einer zweiten GEE der Interventionseffekt auf die Anzahl der persönlich betreuten Patienten untersucht. | |
Die Freitexte in den Studierendenevaluationen sowie die | |
Lehrarztkommentare in den peer visits und Gruppendiskussionen wurden qualitativ inhaltsanalytisch aufgearbeitet, um neben den reinen Zahlen auch die dahinterliegenden Probleme und die Lehrarztreaktionen auf das Feedback zu skizzieren. Dazu wurde eine induktive Kategorienbildung am Material vorgenommen [11]. Die Anzahlen | |
negativer Studierendenkommentare vor und nach der | |
Intervention wurden zudem quantitativ gegenübergestellt. | |
Intervention | |
Das Peer-Feedback wurde als Teil des didaktischen | |
Konzepts bei besonders negativ evaluierten Lehrpraxen | |
realisiert | |
[https://www.uniklinik-duesseldorf.de/ | |
patienten-besucher/klinikeninstitutezentren/institut-fuerallgemeinmedizin/didaktik-fortbildungen]: Eine den | |
Lehrärzten bekannte und in Praxis und Lehre erfahrene | |
hausärztliche Mitarbeiterin des Instituts für Allgemeinmedizin (EG) meldete den Lehrärzten deren studentischen | |
Evaluationen zurück. Der vorrangige Modus war ein persönlicher Besuch in der Praxis (peer visit) [10]. Aus organisatorischen Gründen mussten gelegentlich Gruppendiskussionen mit mehreren Lehrärzten sowie ein schriftliches | |
Feedback als Ausweichlösungen angeboten werden. Peer | |
visit und Gruppendiskussion hatten beide eine Reflexion | |
der eigenen Lehrarztmotivation, der Probleme sowie eine | |
Diskussion der persönlichen Evaluation zum Ziel, um | |
darüber in einen konstruktiven Austausch zwischen | |
Lehrarzt und Universität in Bezug auf die Lehre und den | |
Umgang mit Studierenden in der Praxis zu gelangen. Peer | |
visits und Gruppendiskussionen wurden protokolliert. Die | |
Eingangsfrage lautete „Warum sind Sie Lehrarzt/Lehrärztin?“, gefolgt von Fragen zu persönlichen Erfahrungen: | |
„Können Sie mir über Ihre Erfahrungen berichten? Was | |
motiviert Sie zu der Lehrarzttätigkeit? Gibt es aus Ihrer | |
Sicht Probleme?“. Dann wurde das (schlechte) Feedback | |
Ergebnisse | |
Lehrpraxen und Präevaluationen | |
264 Lehrpraxen mit insgesamt 1648 Praktika waren beteiligt. Davon wurden 181 Praxen (68.6%) mit 1036 | |
Praktika sehr gut bewertet (Mittelwert der Studierendenevaluation 3.8 ± Standardabweichung 0.2), 56 Praxen | |
(21.2%) mit 453 Praktika suboptimal (3.3±0.4) und 27 | |
Praxen (10.2%) mit 159 Praktika sehr schlecht (2.8±0.4). | |
Der übergeordnete Vergleich der drei Gruppen ergibt signifikante Unterschiede (F(df=2)=205.1; p<.001), mit | |
jeweils signifikanten Unterschieden in allen post-hocVergleichen (alle p<.001): sehr gut vs. suboptimal (mittlere Differenz 0.51; Standardfehler 0.04); sehr gut vs. | |
schlecht (1,09; 0.06); suboptimal vs. schlecht (0.58; | |
0.07). | |
In Tabelle 1 ist die Analysestichprobe der n=19 aus den | |
27 schlecht bewerteten Praxen näher beschrieben. | |
Gründe für eine schlechte Bewertung laut Freitexten der | |
Studierendenevaluation lassen sich in fünf Kategorien | |
darstellen. So wurde die mangelnde Gelegenheit zum | |
Einüben praktischer Fertigkeiten am Patienten kritisiert. | |
„Leider hatte ich während meines letzten Patientenpraktikums nicht die Möglichkeit, viele Patienten ei- | |
GMS Journal for Medical Education 2021, Vol. 38(7), ISSN 2366-5017 | |
10/14 | |
Pentzek et al.: Verbessert Peer-Feedback für Lehrärzte die studentische ... | |
Tabelle 1: Merkmale der Analysestichprobe | |
genständig zu untersuchen, obwohl ich dies zu mehreren Gelegenheiten eingefordert habe.“ (über Praxis | |
ID 1) | |
Weiterhin gab es Kommentare über mangelnde Wertschätzung und schwierige Kommunikation: | |
„Die Lehrärztin hat wenig Geduld insbesondere mit | |
ausländischen Patienten, die anatomische oder medizinische Begriffe nicht verstehen können. Sie macht | |
beleidigende und ironische Aussagen. Mit einigen | |
Patienten wurde ich 30 Minuten lang alleine gelassen, | |
während mit anderen nur 2 und danach hat sie sich | |
darüber geärgert, wenn ich mit der Untersuchung/Anamnese noch nicht fertig war.“ (über Praxis ID 14) | |
Einige Lehrärzte wurden hinsichtlich ihrer didaktischen | |
Kompetenz kommentiert: | |
„[…] als Lehrarzt hab ich ihn als wenig bis gar nicht | |
kompetent erlebt und auch sehr desinteressiert. Er | |
hatte keine Ahnung von dem, das PP1 [Patientenpraktikum 1] uns lehren soll und hat auch nach mehrmaligem Herantreten an ihn meinerseits wenig verstanden, worum es mir ging bzw. was ich dort lernen sollte.“ (über Praxis ID 22) | |
Genannt wurden Praxisabläufe und –strukturen, die laut | |
Studierenden eine effiziente Praktikumsdurchführung | |
erschwerten: | |
„Von 8-11 Uhr kommen nur Patienten zur Blutentnahme, feste Termine sind in der Zeit nicht geplant. Da | |
ich weder Blut abnehmen noch impfen durfte, war in | |
der Zeit nichts für mich zu tun.“ (über Praxis ID 10) | |
In einigen Praxen mit primär nicht-deutschsprachigem | |
Patientenklientel und auch Personal (inkl. Lehrarzt) | |
stellte sich in den Evaluationen die Sprachbarriere als | |
Problem heraus. | |
„Da die Lehrärztin [Nationalität XY] ist, verliefen ca. | |
70% der Konsultationen auf [Sprache XY].“ (über | |
Praxis ID 2) | |
Intervention | |
In den Protokollen der peer visits und Gruppendiskussionen mit den Lehrärzten zeigen sich vier Kategorien von | |
Problemen, die teilweise die genannten Studierendenkommentare spiegeln: So berichteten die Lehrärzte von | |
Bedenken, Studierende allein mit Patienten arbeiten zu | |
lassen. (Im folgenden Zitate aus den Protokollen der intervenierenden Peer-Ärztin.) | |
„Es fällt ihm schwer, Studierende allein zu lassen. […] | |
Er meint, die Patienten mögen das nicht so, obwohl | |
seine Erfahrungen eigentlich anders sind. Hat auch | |
viele Patienten aus dem Management. „Die Studierenden sind auch zu kurz in der Praxis.““ (zu ID 17) | |
Auch eine skeptische Haltung vor allem Studierenden | |
niedriger Semester gegenüber wurde geäußert. | |
„Kann mit den 2. Semestern nichts anfangen, „die | |
können nichts, es hat keinen Sinn, sie das Herz abhören zu lassen, wenn sie die Krankheitsbilder nicht | |
kennen.“ […] „Das Problem ist auch, dass es jetzt | |
immer ganz junge Mädchen sind.““ (zu ID 24) | |
Einige Lehrärzte waren nicht vertraut mit den didaktischen | |
Konzepten und Materialien der Praktika. | |
„Er hat keine Kenntnis von der Lehre, liest sich nichts | |
durch. Weiß auch nicht, dass er evaluiert wird.“ (zu | |
ID 6) | |
Teils führt ein Selbstverständnis als allgemeinmedizinischer Lehrarzt zur Definition eigener Praktikumsinhalte | |
unter Vernachlässigung oder Abwertung der universitär | |
vorgegebenen Lernziele. | |
„„Ich habe mich zur Allgemeinmedizin bekannt und | |
will das weiterreichen.“ Erklärt den Studierenden viel, | |
lässt aber nicht viel machen. „Ich zeige jungen Menschen den rechten Weg. Sonst macht es ja keiner (die | |
Uni schon gar nicht), also mach ich es.““ (zu ID 4) | |
GMS Journal for Medical Education 2021, Vol. 38(7), ISSN 2366-5017 | |
11/14 | |
Pentzek et al.: Verbessert Peer-Feedback für Lehrärzte die studentische ... | |
Tabelle 2: Multivariable Einflüsse auf die abhängige Variable ‚studentische Bewertung des Praktikums‘ (verallgemeinerte | |
Schätzungsgleichung (GEE) mit Clustervariable Praxis) | |
Tabelle 3: Anzahl der Kommentare von Studierenden zu Praktika in 19 schlecht evaluierten hausärztlichen Lehrpraxen | |
„Möchte allerdings eindeutig den Studierenden alles | |
zeigen, erwähnt wiederholt Ultraschall, Blutabnahmen, kennt Lehrinhalte nicht, macht sich eigene | |
Lehrinhalte: „Ich zeig denen alles Interessante““. (zu | |
ID 22) | |
An mehreren Stellen äußerten die Lehrärzte Intentionen | |
zur Verhaltensänderung, laut Protokollen z.B. „will Studierende mehr zum Selbst-Untersuchen anleiten“ oder „sagt, | |
er wolle sich zukünftig die Handouts durchlesen“. Die | |
Mehrzahl der besuchten Lehrärzte zeigte sich im Gespräch grundsätzlich interessiert und engagiert in der | |
Betreuung der Studierenden. Die meisten waren in der | |
Lage, die Kritikpunkte zu reflektieren. | |
Prä-post-Analyse: Der Interventionseffekt auf die studentische Bewertung ist deutlich und unabhängig vom | |
(ebenfalls signifikanten) Einfluss der Patientenanzahl | |
(siehe Tabelle 2). | |
Auch der Interventionseffekt auf die Anzahl persönlich | |
durch die Studierenden betreuter Patienten bleibt in einer | |
GEE bestehen (Odds Ratio 1.41; 95% Konfidenzintervall | |
1.21-1.64; p<.001), unabhängig von der Art der Intervention und dem Studienjahr (Analyse nicht gezeigt). | |
Der Anteil kritischer Anmerkungen in den studentischen | |
Freitextkommentaren nimmt insgesamt und in vier der | |
fünf genannten Kategorien deutlich ab (siehe Tabelle 3). | |
Diskussion | |
Ein Peer-Feedback durch eine hausärztlich tätige Allgemeinmedizinerin wirkte sich in einer Stichprobe schlecht | |
evaluierter Lehrärzte, die im Rahmen der hausärztlichen | |
Praktika Studierende betreuten, im prä-post-Vergleich | |
positiv auf die studentische Evaluation und auf die Anzahl | |
der im Praktikum von Studierenden persönlich betreuten | |
Patienten aus. Dies zeigt sich in den Evaluationsscores | |
und auch darin, dass entsprechend negative Freitextkommentare der Studierenden nach der Intervention seltener | |
waren. | |
In line with the literature, it was decisive for the student rating that students were given the opportunity to work with patients on their own, so that they could directly experience themselves in the physician's role [2], [5]. But the student evaluation also improved after the intervention independently of the number of patients: the qualitative results suggest that, after the intervention, the teaching GPs may have engaged more closely with the purpose of the attachments, the learning objectives, and the didactic materials. This, in turn, also appeared to have had positive effects on the exchange and the relationship between teaching GP and student (possibly in the sense of aligning mutual expectations), likewise important elements of a positive attachment experience [3], [12]. The qualitative results on didactic competence and attitude indicate that, at least for the small group of previously poorly evaluated teaching GPs examined here, a more intensive engagement with their teaching role and repeated interaction between the university institute of general practice and the teaching practice are needed before content and concepts are internalized and implemented in the attachments in a way that is recognizable and consistent for students. That it is precisely the poorly evaluated teaching GPs who rarely attend the meetings at the university (offered eight times per year in Düsseldorf) is an experience also reported by many other sites. A formal review of the prerequisites and criteria for appropriate teaching activity would involve enormous effort, given the high number of teaching practices required, especially in a curriculum constructed to be longitudinal, general practice based, and close to everyday practice. It must be weighed, however, whether more resources should be invested in the selection and qualification of practices interested in teaching, or rather in quality control and training for practices that already teach.
A strength of this study is that the pre and post ratings came from independent groups of students, so that distortions caused by students' repeated exposure to a practice (e.g., response shift bias, habituation, observer drift) are ruled out. The weakness associated with the pre-post design without a control group and with the focus on poorly evaluated practices lies, among other things, in the phenomenon of regression to the mean, which presumably accounts for part of the positive intervention effect. The primary research question of this study is framed and answered quantitatively; we report qualitative results only to a limited extent. Here they allow only partial, hypothesis-generating insights into the precise mechanisms by which peer feedback works [13]. In the present study, several modes of delivering peer feedback were implemented. Since the analyses do not point to differential effects of the personnel- and time-intensive peer visit on the one hand and the more efficient methods of group discussion and written feedback on the other, further studies differentiating between them are needed before broader implementation. Rüsseler et al. [14], for example, found that written peer feedback, in their case directed at lecturers, had positive effects on how the teaching sessions were designed.
Conclusions
It makes sense to continue examining the effects of feedback for teaching GPs both in research and in teaching. The comprehensive GMA recommendations provide a robust framework for teaching [15] and for the didactic qualification of teaching GPs [16]. Embedded within that framework, collegial peer feedback for poorly rated teaching GPs is a possible tool for the quality management of undergraduate general practice teaching.
Competing interests
The authors declare that they have no competing interests in connection with this article.
References
1. Bundesministerium für Bildung und Forschung. Masterplan Medizinstudium 2020. Berlin: Bundesministerium für Bildung und Forschung; 2017. Available from: https://www.bmbf.de/files/2017-03-31_Masterplan%20Beschlusstext.pdf
2. Wiesemann A, Engeser P, Barlet J, Müller-Bühl U, Szecsenyi J. Was denken Heidelberger Studierende und Lehrärzte über frühzeitige Patientenkontakte und Aufgaben in der Hausarztpraxis? Gesundheitswesen. 2003;65(10):572-578. DOI: 10.1055/s-2003-42999
3. Grunewald D, Pilic L, Bödecker AW, Robertz J, Althaus A. Die praktische Ausbildung des medizinischen Nachwuchses - Identifizierung von Lehrpraxen-Charakteristika in der Allgemeinmedizin. Gesundheitswesen. 2020;82(07):601-606. DOI: 10.1055/a-0894-4556
4. Böhme K, Sachs P, Niebling W, Kotterer A, Maun A. Macht das Blockpraktikum Allgemeinmedizin Lust auf den Hausarztberuf? Z Allg Med. 2016;92(5):220-225. DOI: 10.3238/zfa.2016.0220-0225
5. Gündling PW. Lernziele im Blockpraktikum Allgemeinmedizin - Vergleich der Präferenzen von Studierenden und Lehrärzten. Z Allg Med. 2008;84:218-222. DOI: 10.1055/s-2008-1073148
6. Steinert Y, Mann K, Centeno A, Dolmans D, Spencer J, Gelula M, Prideaux D. A systematic review of faculty development initiatives designed to improve teaching effectiveness in medical education: BEME Guide No. 8. Med Teach. 2006;28(6):497-526. DOI: 10.1080/01421590600902976
7. Garcia I, James RW, Bischof P, Baroffio A. Self-Observation and Peer Feedback as a Faculty Development Approach for Problem-Based Learning Tutors: A Program Evaluation. Teach Learn Med. 2017;29(3):313-325. DOI: 10.1080/10401334.2017.1279056
8. Gusic M, Hageman H, Zenni E. Peer review: a tool to enhance clinical teaching. Clin Teach. 2013;10(5):287-290. DOI: 10.1111/tct.12039
9. Pedram K, Brooks MN, Marcelo C, Kurbanova N, Paletta-Hobbs L, Garber AM, Wong A, Qayyum R. Peer Observations: Enhancing Bedside Clinical Teaching Behaviors. Cureus. 2020;12(2):e7076. DOI: 10.7759/cureus.7076
10. O'Brien MA, Rogers S, Jamtvedt G, Oxman AD, Odgaard-Jensen J, Kristoffersen DT, Forsetlund L, Bainbridge D, Freemantle N, Davis DA, Haynes RB, Harvey EL. Educational outreach visits: effects on professional practice and health care outcomes. Cochrane Database Syst Rev. 2007;2007(4):CD000409. DOI: 10.1002/14651858.CD000409.pub2
11. Kruse J. Qualitative Interviewforschung. 2nd ed. Weinheim: Beltz Juventa; 2015.
12. Koné I, Paulitsch MA, Ravens-Taeuber G. Blockpraktikum Allgemeinmedizin: Welche Erfahrungen sind für Studierende relevant? Z Allg Med. 2016;92(9):357-362. DOI: 10.3238/zfa.2016.0357-0362
13. Raski B, Böhm M, Schneider M, Rotthoff T. Influence of the personality factors rigidity and uncertainty tolerance on peer-feedback. In: 5th International Conference for Research in Medical Education (RIME 2017), 15.-17. March 2017, Düsseldorf, Germany. Düsseldorf: German Medical Science GMS Publishing House; 2017. P15. DOI: 10.3205/17rime46
14. Ruesseler M, Kalozoumi-Paizi F, Schill A, Knobe M, Byhahn C, Müller MP, Marzi I, Walcher F. Impact of peer feedback on the performance of lecturers in emergency medicine: a prospective observational study. Scand J Trauma Resusc Emerg Med. 2014;22:71. DOI: 10.1186/s13049-014-0071-1
15. Huenges B, Gulich M, Böhme K, Fehr F, Streitlein-Böhme I, Rüttermann V, Baum E, Niebling WB, Rusche H. Recommendations for Undergraduate Training in the Primary Care Sector - Position Paper of the GMA-Primary Care Committee. GMS Z Med Ausbild. 2014;31(4):Doc35. DOI: 10.3205/zma000927
16. Böhme K, Streitlein-Böhme I, Baum E, Vollmar HC, Gulich M, Ehrhardt M, Fehr F, Huenges B, Woestmann B, Jendyk R. Didactic qualification of teaching staff in primary care medicine - a position paper of the Primary Care Committee of the Society for Medical Education. GMS J Med Educ. 2020;37(5):Doc53. DOI: 10.3205/zma001346
Correspondence address:
PD Dr. rer. nat. Michael Pentzek
Heinrich-Heine-Universität Düsseldorf, Medizinische Fakultät, Centre for Health and Society (chs), Institut für Allgemeinmedizin (ifam), Moorenstr. 5, Gebäude 17.11, 40225 Düsseldorf, Germany, Phone: +49 (0)211/81-16818
[email protected]
Please cite as:
Pentzek M, Wilm S, Gummersbach E. Does peer feedback for teaching GPs improve student evaluation of general practice attachments? A pre-post analysis. GMS J Med Educ. 2021;38(7):Doc122. DOI: 10.3205/zma001518, URN: urn:nbn:de:0183-zma0015182
The article is freely available online at https://www.egms.de/en/journals/zma/2021-38/zma001518.shtml
Submitted: 03.03.2021
Revised: 12.08.2021
Accepted: 17.08.2021
Published: 15.11.2021
Copyright
©2021 Pentzek et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 License. For license details see http://creativecommons.org/licenses/by/4.0/.
Higher Learning Research Communications | |
2021, Volume 11, Issue 2, Pages 22–39. DOI: 10.18870/hlrc.v11i2.1244 | |
Original Research | |
© The Author(s) | |
Students’ and Teachers’ Perceptions and Experiences | |
of Classroom Assessment: A Case Study of a Public | |
University in Afghanistan | |
Sayed Ahmad Javid Mussawy, PhD Candidate | |
University of Massachusetts Amherst, Amherst, Massachusetts, United States | |
https://orcid.org/0000-0001-9991-6681 | |
Gretchen Rossman, PhD | |
University of Massachusetts Amherst, Amherst, Massachusetts, United States | |
https://orcid.org/0000-0003-1224-4494 | |
Sayed Abdul Qahar Haqiqat, MEd | |
Baghlan University, Pule-khumri, Baghlan, Afghanistan | |
Contact: [email protected] | |
Abstract | |
Objective: The primary goal of the study was to examine students’ perceptions of classroom assessment at a | |
public university in Afghanistan. Exploring current assessment practices, focusing on students' and faculty members' lived experiences, was a secondary goal. The study also sought to collect evidence on whether the new assessment policy was effective in improving student achievement.
Method: The authors used an explanatory sequential mixed-methods design to conduct the study. Initially, we
applied the Students Perceptions of Assessment Questionnaire (SPAQ), translated into Dari/Farsi and | |
validated, to collect data from a random sample of 400 students from three colleges: Agriculture, Education, | |
and Humanities. Response rate was 88.25% (N = 353). Semi-structured interviews were used to collect data | |
from a purposeful sample of 18 students and 7 faculty members. Descriptive statistics, one-way ANOVA, and | |
t-tests were used to analyze quantitative data, and NVivo 12 was used to conduct thematic analysis on | |
qualitative data. | |
Results: The quantitative results suggest that students have positive perceptions of the current assessment | |
practices. However, both students and faculty members were dissatisfied with the grading policy, reinforcing | |
summative over formative assessment. Results support that the policy change regarding assessment has | |
resulted in more students passing the courses compared to in the past. The findings also suggest | |
improvements in faculty professional skills such as assessment and teaching and ways that they engage | |
students in assessment processes. | |
Implication for Policy and Practice: Recommendations include revisiting the grading policy at the national level to allow faculty members to balance the formative and summative assessment and utilizing assessment benchmarks and rubrics to guide formative and summative assessment implementation in practice.
Author note: We would like to thank the students and teachers who participated and assisted with this study.
Keywords: assessment, classroom assessment, higher education, Afghanistan | |
Submitted: March 14, 2021 | Accepted: July 23, 2021 | Published: October 13, 2021 | |
Recommended Citation | |
Mussawy, S. A. J., Rossman, G., & Haqiqat, S. A. Q. (2021). Students’ and teachers’ perceptions and experiences of | |
classroom assessment: A case study of a public university in Afghanistan. Higher Learning Research | |
Communications, 11(2), 22–39. DOI: 10.18870/hlrc.v11i2.1244
Introduction | |
Classroom assessment, an instrumental aspect of teaching and learning, refers to a systematic process of | |
obtaining information about learner progress, understanding, skills, and abilities towards the learning goals | |
(Dhindsa et al., 2007; Goodrum et al., 2001; Klenowski & Wyatt-Smith, 2012; Linn & Miller, 2005). According | |
to Scriven (1967) and Poskitt (2014), educational assessment surfaced in the 20th century to serve two purposes. | |
The first was to improve learning (formative assessment), and the second was to make judgments about student | |
learning (summative assessment). The current literature on assessment emphasizes establishing alignment | |
between educational expectations versus student learning needs (Black et al., 2003; Gulikers et al., 2006; | |
Mussawy, 2009). Therefore, teachers use various forms of assessment to determine where students are and | |
create diverse activities to help them achieve the expected outcomes (Mansell et al., 2020). | |
As most countries have expanded their higher education systems by embracing broader access to higher | |
education, the student population has also become diverse (Altbach, 2007; Salmi, 2015). The more diverse | |
student population suggests that conventional assessment approaches may no longer work. Therefore, | |
alternative assessment approaches need to put students in the center to avoid wasting “learning for drilling | |
students in the things that they [teachers] will be held accountable [for]” (Dhindsa et al., 2007, p. 1262). | |
The concept of classroom assessment has been loosely defined in the higher education sector of Afghanistan. | |
While students and teachers are aware of different assessment approaches, current assessment practices rely | |
heavily on conventional summative assessment (Mussawy, 2009; Noori et al., 2017). Previously, final exams | |
were the only mechanisms to assess student learning (UNESCO-IIEP, 2004). However, higher education | |
reform in Afghanistan in the early 2000s paved the way for introducing mid-term exams and the credit | |
system that replaced the conventional course structure based on the number of subjects (Babury & Hayward, | |
2013; Hayward, 2017). More specifically, in the traditional system, the value of final grades for each subject | |
was the same, irrespective of the number of hours the subject was taught per week. However, in the credit | |
system, the value of grades varies depending on credit hours per week. Further, due to the absence of specific | |
regulations on assessment approaches, faculty members enjoyed immense autonomy in assessing student | |
learning. Since most of the faculty members had not received any training on pedagogy and assessment, they | |
primarily relied on conventional open-ended summative assessment (Darmal, 2009). | |
In 2014, the Ministry of Higher Education (MoHE) in Afghanistan introduced a new assessment policy that | |
centers on (a) transparency through the establishment of assessment committees at the institution and faculty | |
levels and (b) the type and the number of question items in an exam (Ministry of Higher Education (MoHE), | |
2018). The second component, which is the focus of this study, indicates that assessment includes “evaluation | |
of quizzes, mid-term exams, assignments, laboratory projects, class seminars and projects, final exams, and | |
thesis and dissertations” (MoHE, 2018, p. 5). While mid-term and final exams constitute 20% and 60% of | |
students’ grades, respectively, the policy emphasizes “30—40 question items on final exams” and “a minimum | |
of 10 items on mid-terms” (MoHE, 2018, p. 7). The policy also recommends a combination of closed-ended | |
and open-ended questions with a value of “3–5 points” for descriptive and analytic items and “1 point for | |
multiple-choice questions” (MoHE, 2018, p. 5). | |
Although the assessment policy recognizes various approaches, such as quizzes, assignments, student | |
projects, seminars, and mid-term and final exams, formative assessment and class attendance account for | |
only 20% of a student’s grade. Mid-term and final exams, on the contrary, constitute 80% of students’ grades; | |
this indirectly projects more value for summative over formative assessment. Therefore, perceptions of | |
students and faculty members can shed light on current practices and participants’ experiences of classroom | |
assessment. | |
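To make the weighting concrete, here is a small illustrative Python sketch of how a final course grade combines under the 20/60/20 split described above; the function name and the sample scores are hypothetical and are not taken from the MoHE policy document.

def course_grade(midterm, final_exam, formative_and_attendance):
    """Weighted grade: 20% mid-term, 60% final exam, 20% formative work and attendance."""
    return 0.20 * midterm + 0.60 * final_exam + 0.20 * formative_and_attendance

# Example with scores on a 0-100 scale.
print(course_grade(midterm=75, final_exam=68, formative_and_attendance=90))  # 73.8

Because the exam components carry 80% of the weight, even a large difference in the formative component shifts the final grade by at most 20 points, which is consistent with the imbalance that participants describe later in the article.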
Review of Literature | |
The focus of classroom assessment has gradually shifted from assessment of learning—“testing learning” | |
(Birenbaum & Feldman, 1998, p. 92) to assessment for learning—creating diverse opportunities for learners to | |
prosper (Brown, 2005; Wiliam, 2011). This is because research shows that classroom assessment significantly | |
affects the approach students take to learning (Pellegrino & Goldman, 2008). More specifically, new | |
assessment approaches encourage an increase in correspondence between student learning needs and | |
expectations to prosper in a changing environment (Gulikers et al., 2006). Goodrum et al. (2001) argued that, | |
ideally, assessment “enhances learning, provides feedback about student progress, builds self-confidence and | |
self-esteem, and develops skills in evaluation” (p. 2). Nonetheless, Dhindsa et al. (2007) stated that [primary | |
and secondary school] teachers “sacrifice learning for drilling students in the things that they will be held | |
accountable for” (p. 1262). This suggests that teachers use “a very narrow range of assessment strategies” to | |
help students prepare for high-stakes tests, while limited evidence exists to support that “teachers actually use | |
formative assessment to inform planning and teaching” (Goodrum et al., 2001, p. 2). Most importantly, recent | |
research on classroom assessment emphasizes the quality and relevance of assessment activities to help | |
students learn (Ibarra-Saiz et al., 2020). | |
Inquiring into students’ perception of assessment has been an important aspect of the literature on classroom | |
assessment (Koul et al., 2006; Segers et al., 2006; Struyven et al., 2005; Waldrip et al., 2009). Examining | |
their perceptions confirms the assumption that assessment “rewards genuine effort and in-depth learning | |
rather than measuring luck” (Dhindsa et al., 2007, p. 1262). For this reason, recent studies on classroom | |
assessment advocate for student involvement in developing assessment tools (Falchikove, 2004; Waldrip et | |
al., 2014) to make the learning process more valuable to students. With this in mind, Fisher et al. (2005) | |
developed Students Perceptions of Assessment Questionnaire (SPAQ) and confirmed its validity by applying it | |
to a sample consisting of 1,000 participants from 40 science classes in grades 8–10. Following that, Cavanagh | |
et al. (2005) modified and adapted the SPAQ as an analytic tool to study student perceptions of classroom | |
assessment in five specific areas: Congruence with planned learning (CPL), assessment of applied learning | |
(AAL), students’ consultation (SC) types, transparency in assessment (TA), and accommodation of students’ | |
diversity (ASD) in assessment procedures. Cavanagh et al. (2005) used SPAQ to study 8th through 10th grade | |
student perceptions of assessment in Australian science classrooms. Their study showed that student | |
perceptions of assessment in science subjects varied depending on their abilities. | |
Other studies examining students’ perceptions of assessment reveal diverse responses. For instance, Koul et | |
al. (2006) modified, validated, and applied SPAQ on a 4-point Likert scale to study secondary student | |
perceptions of assessment in Australia. Their study shows that the difference between males’ and females’ | |
perceptions of assessment was not statistically significant. However, they reported statistically significant | |
differences in student perceptions of assessment by grade level. Similarly, Dhindsa et al. (2007) used SPAQ to | |
examine high school student perceptions of assessment in Brunei Darussalam and learned that the Student | |
Consultation was rated the lowest of the scales. Their findings suggest that students perceived assessment as | |
transparent and as aligned with learning goals. However, they did find that teachers hardly consulted with | |
students regarding assessment forms. | |
Kwok (2008) also studied student assumptions of peer assessment and reported that, while students | |
perceived peer assessment as substantially important in enhancing self-efficacy, they considered themselves | |
unprepared relative to their teachers who brought years of experience. In another study, Segers et al. (2006) | |
examined college students’ understanding of assignment-based learning versus problem-based learning. Their | |
study showed that students in the assignment-based learning course embraced “more deep learning strategies | |
and less surface-learning strategies than the students in the PBL [problem-based learning] course” (Segers et | |
al., 2006, p. 234). They reported that students in the PBL course showed surface-level learning strategies | |
(Segers et al., 2006, p. 236). Although the context varied, their findings are partly consistent with those of | |
Birenbaum and Feldman (1998), who examined 8th through 10th-grade student attitudes towards open-ended versus closed-ended response assessment. They reported that gender and learning strategies were significantly correlated in that female students leaned towards essay questions while male students favored closed responses. In other words, students who demonstrated the "surface study approach" preferred close-ended question items, as opposed to those with a "deep study approach," who favored open-ended questions
(Birenbaum & Feldman, 1998). | |
However, Beller and Gafni's (2000) study shows that although boys favored multiple-choice question items in mathematics assessment, the difference in performance based on gender was not profound. Their study focused on question format, examining whether multiple-choice versus open-ended questions accounted for gender differences. Their "results challenge the simplistic assertion that girls perform relatively better on OE [open-ended] test items" (Beller & Gafni, 2000, p. 16). On a similar note,
Van de Watering et al. (2008) found no “relationship between students’ perceptions of assessment and their | |
assessment results” (p. 657). They reported that students prefer close-ended question formatting when | |
attending to a “New Learning Environment” (Van de Watering et al., 2008, p. 245). | |
Meanwhile, Struyven et al. (2005) studied the relationship between student perceptions of assessment and | |
their learning approaches. In general, students preferred close-ended questions; however, students with | |
advanced learning abilities and with low test anxiety favored essay exams. Lastly, Ounis (2017) investigated | |
perceptions of classroom assessment among secondary school teachers in Tunisia. The author reported that | |
the teachers “have highly favorable perceptions of assessment and they hold highly the motivational function | |
of assessment” (p. 123). According to Ounis (2017), the teachers emphasized oral assessment as a useful | |
approach to increase learning even though they reported some challenges to implementing the oral | |
assessment. | |
Although assessment in higher education is loosely defined relative to assessment at the primary and | |
secondary education levels, recent literature sheds light on introducing alternative/formative assessment | |
tasks such as portfolios, applied research projects, and others (Bess, 1977; Ibarra-Sáiz et al., 2020; Nicol & | |
Macfarlane-Dick, 2006; Struyven et al., 2005). Further, to date, research on perceptions and experiences of | |
undergraduate students and faculty members in Afghanistan is scarce. For instance, Noori et al. (2017) and | |
Darmal (2009) studied assessment practices of university lecturers in Afghanistan. However, the scope of | |
these research studies is limited. For instance, Darmal’s (2009) study focuses on the experiences of six faculty | |
members involved in the Department of Geography, and Noori et al.’s (2017) research included three lecturers | |
who taught English as a Foreign Language. Since the government has introduced new regulations on | |
assessment with a focus on types and number of questions in mid-term and final exams, exploring the | |
experiences of students and faculty members can shed light on the meaningfulness of classroom assessment | |
and create insight into the policy. | |
Although the existing literature provides mixed findings regarding the student perceptions of assessment | |
based on gender, gender equity has been underscored as a key challenge in the higher education sector of | |
Afghanistan (Babury & Hayward, 2014; Mussawy & Rossman, 2018). According to Babury and Hayward | |
(2014), female students constitute less than 20% of the student population in universities. Since females are | |
underrepresented in the higher education sector, examining students’ perceptions of assessment based on | |
gender will inform whether assessment practices serve male and female students evenly. | |
Study Purpose | |
The primary purpose of the study was to examine student perceptions of classroom assessment at a university | |
in Afghanistan. Exploring current assessment practices focused on student and faculty lived experiences was a | |
secondary purpose. The study also sought to collect evidence on whether the new Afghanistan assessment | |
policy was effective in improving student learning. Cavanagh et al. (2005) suggested two strategies to | |
understand the advantages and disadvantages of classroom assessment on student learning: (a) examining | |
the research on assessment forms that teachers use; and (b) inquiring into students’ perceptions of classroom | |
assessment. This study used both strategies. More specifically, the research questions and hypotheses guiding | |
the study are below. | |
1. What are the perceptions of students about classroom assessment? As part of this research question, gender and academic discipline differences were explored.
• Hypothesis 1: There is no significant difference in student perceptions of classroom assessment based on gender.
• Hypothesis 2: There is no significant difference in student perceptions of classroom assessment based on academic discipline.
2. What are the experiences of students and faculty members concerning classroom assessment? | |
Significance of the Study | |
This study contributes to the literature on classroom assessment. First, the study’s findings provide new | |
insights into how students perceive classroom assessment and whether the assessment outcomes affect their | |
learning. Second, the research explored student and faculty lived experiences with classroom assessment. | |
Specific attention was given to faculty pedagogical skills and assessment literacy. Third, teachers’ challenges | |
concerning the national assessment policy with a focus on grading practices are highlighted. The study also | |
informs the conversation regarding student involvement in assessment processes and the challenges | |
associated with the lack of student preparedness to pursue undergraduate degree programs. | |
Theoretical Framework | |
The study uses formative and summative assessment as an analytic lens to explore perceptions and experiences | |
of classroom assessment among undergraduate students and faculty. Formative and summative assessment | |
approaches are well explored in the literature (Scriven, 1967; Wiliam & Black, 1996; Wiliam & Thompson, | |
2008). Formative assessment in the United States refers to “assessments that are used to provide information on | |
the likely performance of students on state-mandated test—a usage that might better be described as 'early-warning summative'" (Wiliam & Thompson, 2008, p. 60). Other places use formative assessment to provide feedback to students—informing them "which items they got correct and incorrect" (Wiliam & Thompson, 2008, p. 60). Providing feedback to improve learning is a key component of formative assessment that helps students in higher education settings achieve desirable outcomes (Black & Wiliam, 1998; Nicol & Macfarlane-Dick, 2006; Sadler, 1998). In other words, formative assessment allows instructors to help students
engage in their own learning by exhibiting what they know and identifying their needs to move forward (Black & | |
Wiliam, 1998; Mansell et al., 2020; Wiliam, 2011). Formative assessment occurs in formal and informal forms | |
such as quizzes, oral questioning, self-reflection, peer feedback, and think-aloud (Mansell et al., 2020; Wiggins & | |
McTighe, 2007). Formative assessment also influences the quality of teaching and learning while engaging | |
students in self-directed learning (Stiggins & Chappuis, 2005). | |
On the other hand, summative assessment is bound to administrative decisions (Wiliam, 2008). It occurs at the | |
“end of a qualification, unit, module or learning target to evaluate the learning which has taken place towards the | |
required outcomes” (Mansell et al., 2020, p. xxi). Summative assessment, known as assessment of learning, is | |
primarily used “in deciding, collecting and making judgments about evidence relating to the goals of the learning | |
being assessed” (Harlen, 2006, p. 103). Herrera et al. (2007, p. 13) argued that “assessment of achievement has | |
become increasingly standardized, norm-referenced and institutionalized,” which thus negatively affects the | |
quality of teaching (Firestone & Mayrowetz, 2000). For scholars like Stiggins and Chappuis (2005), student | |
roles vary depending on assessment forms, suggesting that summative assessment enforces a passive role while | |
formative assessment engages students in the process as active members. | |
While some studies promote formative assessment over summative assessment (Firestone & Mayrowetz, | |
2000; Harlen, 2006), other studies emphasize the purpose and outcome of assessment activities with a focus | |
on ways to utilize the information to improve the teaching and learning experience (Taras, 2008; Ussher & | |
Earl, 2010). Bloom (1969) also asserted that when assessment is aligned with the process of teaching and | |
learning, it will have “a positive effect on student learning and motivation” (cited in Wiliam, 2008, p. 58). | |
Assessment in general accounts for “supporting learning (formative), certifying the achievement or potential | |
of individuals (summative), and evaluating the quality of educational institutions or programs (evaluative)” | |
(Wiliam, 2008, p. 59). Black and Wiliam (2004) emphasized ways to use the outcomes of formative and | |
summative assessment approaches to improve student learning. Taras (2008) argued that “all assessment | |
begins with summative assessment (which is a judgment) and that formative assessment is, in fact, | |
summative assessment plus feedback which the learner uses” (p. 466). According to Taras (2008), both | |
formative and summative assessments require “making judgments,” which might be implicit or explicit | |
depending on the context (p. 468). In other words, Taras (2008) argued that assessment could not “be | |
uniquely formative without the summative judgment having preceded it” (p. 468). Similarly, Wiggins and | |
McTighe (2007) explained that formative assessment occurs during instruction rather than as a separate | |
activity at the end of a class or unit. The literature on assessment underscores the importance of formative | |
and summative assessment and ways that “assessment… feed into actions in the classroom in order to affect | |
learning” (Wiliam & Thompson, 2008, p. 63). | |
Methods | |
Research Site | |
The study was conducted at a public university in Northern Afghanistan. The university, established in 1993 | |
and re-established in 2003, has seven colleges and 27 departments. The university has approximately 155 full-time faculty members who serve approximately 5,000 students, 20% of whom are female. The faculty–student ratio at the university is 1:35, and the staff–student ratio is 1:70. The university offers only
undergraduate degrees. | |
Procedure and Participants | |
The authors used an explanatory sequential mixed-methods design to collect data from senior, junior, and | |
some sophomore students. We administered the 24-item SPAQ to a random sample of 400 students from the | |
Agriculture, Education, and Humanities colleges and received responses from 353 students (a response rate of 88.25%).
Following the administration of the SPAQ, the authors conducted document analysis (mainly policy | |
documents on assessment) as well as semi-structured interviews with a purposeful sample of 25 individuals, | |
seven faculty members, and 18 undergraduate students to explore their lived experiences concerning current | |
assessment practices. The in-person interviews ranged from 30 to 70 minutes. The notation for this study can | |
be written as QUAN → QUAL (Creswell & Clark, 2017). The authors obtained approval of the Institutional | |
Review Board prior to conducting the study. | |
Instrument | |
We adapted the SPAQ (Cavanagh et al., 2005) to examine students’ perceptions of assessment. As a | |
conceptual model, SPAQ assesses students’ perceptions of assessment in the following five dimensions: | |
1. Congruence with planned learning (CPL)—Students affirm that assessment tasks align with the goals, objectives, and activities of the learning program;
2. [Assessment] Authenticity (AA)—Students affirm that assessment tasks feature real-life situations that are relevant to themselves as learners;
3. Student consultation (SC)—Students affirm that they are consulted and informed about the forms of assessment tasks being employed;
4. [Assessment] Transparency (AT)—The purposes and forms of assessment tasks are affirmed by the students as well-defined and are made clear; and
5. Accommodation to student diversity (ASD)—Students affirm they all have an equal chance of completing assessment tasks (Cavanagh et al., 2005, p. 3).
Since the original instrument was only used to measure science assessment, we adapted and translated it to | |
correspond to other disciplines such as social science, agriculture, and humanities. The Dari/Farsi translation | |
of SPAQ is located in Appendix A. Students’ responses to the SPAQ were recorded on a 4-point Likert scale (4 | |
= Strongly Agree to 1 = Strongly Disagree). | |
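As a rough illustration of how 4-point Likert responses to the 24 SPAQ items are typically aggregated into the five subscale scores listed above, the following Python sketch is offered; the item-to-subscale index ranges and the simulated responses are hypothetical and do not reflect the actual SPAQ scoring key.

import numpy as np

# Hypothetical mapping of the 24 items onto the five SPAQ dimensions.
SUBSCALES = {
    "CPL": range(0, 5),    # Congruence with planned learning
    "AA": range(5, 10),    # Assessment authenticity
    "SC": range(10, 15),   # Student consultation
    "AT": range(15, 20),   # Assessment transparency
    "ASD": range(20, 24),  # Accommodation to student diversity
}

def subscale_means(responses):
    """Mean score per subscale from a (respondents x 24) matrix of 1-4 Likert codes."""
    responses = np.asarray(responses)
    return {name: float(responses[:, list(items)].mean()) for name, items in SUBSCALES.items()}

# Demo with randomly generated responses for 353 respondents.
rng = np.random.default_rng(1)
print(subscale_means(rng.integers(1, 5, size=(353, 24))))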
For the qualitative section of the study, we used a phenomenological approach to explore student and faculty | |
experiences of classroom assessment (Rossman & Rallis, 2016). Using a phenomenological approach in a | |
qualitative study is important in “understanding meaning, for participants in the study, of the events, | |
situations, and actions they are involved with, and of the accounts that they give of their lives and | |
experiences” (Maxwell, 2012, p. 8). The authors used two semi-structured interview protocols (one for | |
students and one for faculty) containing 19 open-ended questions to corroborate the results of the quantitative | |
data. Appendices B and C contain the interview protocols for faculty and students, respectively. These | |
protocols centered on four important themes of classroom assessment—methods, authenticity, transparency, | |
and the use of assessment outcomes to improve learning—that emerged from the literature on perceptions of | |
assessment. | |
Since the SPAQ and interview protocols were developed in English, one of the authors, fluent in English and | |
Dari, used a forward translation approach to translate the instruments into Dari/Farsi. The English and Dari | |
versions were shared with three experts who were fluent in both languages, and the translated versions were | |
revised based on their comments and suggestions. Then, the instruments were pilot tested among senior and | |
junior students and faculty members. The investigators conducted the survey and interviews once the | |
research participants confirmed that the questionnaire and interview protocols were understandable in the | |
local language. | |
Validity and Reliability | |
Previous research confirmed the validity and reliability of SPAQ. For instance, Fisher et al. (2005) developed | |
SPAQ and confirmed its validity by applying it to a sample consisting of 1,000 participants from 40 science | |
classes in grades 8–10. Cavanagh et al. (2006) replicated the study and revised the instrument from 30 to 24 | |
items. Dhindsa et al. (2007) administered the revised SPAQ with 1,028 Bruneian upper-secondary students. | |
They reported Cronbach’s alpha reliability (Cronbach, 1951) as “0.86” for 24 items, while it ranged from “0.64 | |
to 0.77” for subscales (p. 1269). Similarly, Koul et al. (2006) applied the original 30-item instrument and | |
reported that Cronbach’s alpha reliability coefficient for SPAQ subscales ranged from 0.63 to 0.83. Lastly, | |
Mussawy (2009) administered the revised SPAQ at Baghlan Higher Education Institution in Afghanistan and | |
confirmed that the SPAQ was suitable for understanding student perceptions of assessment. Cronbach’s alpha | |
reliability coefficient in that study was 0.89 for all items (24), and it ranged from 0.61 to 0.76 for subscales. | |
Thus, validity and reliability of SPAQ have been confirmed in secondary and tertiary education settings. The | |
investigators used the triangulation technique to increase the study’s validity by collecting data from different | |
sources including the SPAQ, semi-structured interviews, and document analysis. Research methodologists, | |
including Maxwell (2012) and Rossman and Rallis (2016), support that by using triangulation, researchers | |
can reduce the risk of chance associations in the data, or of capturing only one aspect of the phenomenon, that can result when using one particular method. Further, the Cronbach's alpha reliability coefficient was
calculated to determine the extent to which items in each subscale measure the same dimension of students’ | |
perceptions of assessment. | |
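Since Cronbach's alpha is the reliability index reported throughout this section, a brief sketch of the standard computation, alpha = k/(k-1) x (1 - sum of item variances / variance of the summed scale), may help readers check subscale reliabilities; the simulated response matrix below is illustrative only.

import numpy as np

def cronbach_alpha(item_scores):
    """Cronbach's alpha for a (respondents x items) matrix of item scores."""
    x = np.asarray(item_scores, dtype=float)
    k = x.shape[1]                              # number of items in the scale
    item_variances = x.var(axis=0, ddof=1)      # variance of each item
    total_variance = x.sum(axis=1).var(ddof=1)  # variance of the summed scale
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Demo: alpha for a hypothetical 5-item subscale answered by 353 students.
rng = np.random.default_rng(2)
latent = rng.normal(size=(353, 1))                      # shared trait
items = latent + rng.normal(scale=1.0, size=(353, 5))   # correlated items
print(round(cronbach_alpha(items), 2))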
Analysis | |
Descriptive analyses address the first research question about students’ overall perceptions about assessment | |
at the university. Two separate statistical analyses were performed to answer the research hypotheses testing | |
whether there are statistical differences in student perceptions of assessment by academic discipline and | |
gender. The investigators performed one-way, between-groups ANOVA to examine whether the difference | |
between students’ perceptions of assessment was statistically significant based on colleges/disciplines. Next, | |
we conducted a t-test to analyze the difference in students’ perceptions of assessment based on gender. | |
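A hedged sketch of how the quantitative analyses described above (Levene's test, a one-way between-groups ANOVA across colleges, and an independent-samples t-test by gender) could be run in Python with scipy is shown below; the data frame, column names, and simulated values are placeholders, not the study's actual data.

import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical data: one overall SPAQ score per student, plus college and gender.
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "college": rng.choice(["Education", "Humanities", "Agriculture"], size=353, p=[0.40, 0.48, 0.12]),
    "gender": rng.choice(["male", "female"], size=353, p=[0.73, 0.27]),
    "spaq_overall": rng.normal(3.16, 0.48, size=353).clip(1, 4),
})

groups = [g["spaq_overall"].to_numpy() for _, g in df.groupby("college")]

# Homogeneity of variances, then the omnibus one-way ANOVA across colleges.
lev_stat, lev_p = stats.levene(*groups)
f_stat, f_p = stats.f_oneway(*groups)

# Independent-samples t-test for gender differences on the overall score.
male = df.loc[df["gender"] == "male", "spaq_overall"]
female = df.loc[df["gender"] == "female", "spaq_overall"]
t_stat, t_p = stats.ttest_ind(male, female)

print(f"Levene W = {lev_stat:.2f}, p = {lev_p:.3f}")
print(f"ANOVA F = {f_stat:.2f}, p = {f_p:.3f}")
print(f"t-test t = {t_stat:.2f}, p = {t_p:.3f}")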
To analyze the qualitative data, initially, the interviews were transcribed and translated into English. Next, the | |
authors organized the data, reviewed it for accuracy, and cross-checked the original translation to ensure the | |
meanings were consistent (Marshall & Rossman, 2016). Then, the authors applied accepted analysis practices | |
such as “immersion in the data, generating case summaries and possible categories and themes, coding the | |
data, offering interpretations through analytic memos, search for alternative understanding, and writing the | |
report” to analyze the data inductively (Marshall & Rossman, 2016, p. 217). We used NVivo 12 to code the | |
data, run queries, and observe overlaps/connections among themes. The process, overall, was very interactive | |
as the authors exchanged perspectives by writing analytic memos and reflections to draw connections between | |
the qualitative themes and to corroborate the quantitative results (Marshall & Rossman, 2016). In short, the | |
qualitative analysis focused on the meaningfulness of classroom assessment based on lived experiences | |
(Rossman & Rallis, 2016). | |
Results | |
Quantitative | |
The Cronbach alpha reliability coefficient for all items in SPAQ was α = 0.89, suggesting strong internal | |
consistency. Among the subscales within SPAQ, Transparency had the highest alpha reliability score of α =
0.75, and Congruence with Planned Learning had the lowest α = 0.64. The instrument reliability for subscales is consistent with previous research (see Dhindsa et al., 2007; Koul et al., 2006; Mussawy, 2009). Given | |
that the alpha reliability results for the subscales of SPAQ were consistently above 0.63, according to Cortina | |
(1993), the use of SPAQ was considered reliable (See Table 1). | |
The descriptive statistics show mean scores ranging from M = 2.99 for the Accommodation to Student Diversity subscale to M = 3.30 for Congruence with Planned Learning on a 4-point Likert scale (4 = strongly
agree—1 = strongly disagree). The high mean scores suggest that students have a very positive perception of | |
classroom assessment. Table 1 provides an illustration of sub-scales mean scores, standard deviations, and | |
Cronbach alpha reliability. | |
Table 1. Sub-Scale Mean, Standard Deviation, and Cronbach Alpha Reliability Coefficient for the SPAQ and its Subscales

SPAQ Scales                            Mean    St. Dev   Alpha Reliability
Congruence with planned learning       3.30    .506      .644
Assessment authenticity                3.19    .540      .694
Student consultation                   3.09    .690      .732
Assessment transparency                3.18    .652      .749
Accommodation to student diversity     2.99    .710      .698
Overall                                3.16    .484      .898
The descriptive statistics associated with students’ perceptions of classroom assessment across three colleges | |
are reported in Table 2. The results show that participants from the College of Humanities were associated | |
with the smallest mean value (M = 3.05, SD =.467); participants from the College of Education were | |
associated with the highest mean value (M = 3.28, SD = .499); and participants from the College of | |
Agriculture were in between (M = 3.19, SD = .397). A one-way, between-groups ANOVA was performed to test | |
the hypothesis that college was associated with perceptions of classroom assessment. The assumption of | |
homogeneity of variance was tested and satisfied based on Levene’s test, F(2, 350) = .59, p = .55. | |
Table 2. Average Scale-Item Mean, Average Item Standard Deviation, and Standard Error Results for College-Level Differences in SPAQ Overall Scores

                                                 95% Confidence Interval for Mean
Colleges       N     M      SD     Std. Error    Lower Bound    Upper Bound
Education      142   3.28   .499   .041          3.20           3.37
Humanities     171   3.05   .467   .035          2.98           3.12
Agriculture     40   3.19   .397   .062          3.06           3.31
Total          353   3.16   .484   .025          3.11           3.21
The independent between-groups ANOVA was statistically significant, F(2, 350) = 9.45, p < .001, η² = .058.
Thus, the null hypothesis of no difference between the mean scores was rejected, and 5.8% of variance was | |
accounted for in the college group. To analyze the differences between the mean scores of the three colleges, | |
we used Fisher’s LSD post-hoc tests. The difference between students’ perceptions from the College of | |
Education and the College of Humanities was statistically significant across Congruence with Planned | |
Learning, Assessment Authenticity, Student Consultation, and Accommodation to Student Diversity | |
subscales. The difference between student perceptions from the Colleges of Education and Agriculture was | |
only statistically significant for the Accommodations to Student Diversity subscale. Finally, the difference | |
between students’ perceptions of assessment from the Colleges of Agriculture and Humanities was not | |
statistically significant across all scales. See Table 3 for further information on means and probability values. | |
Table 3. Average Scale-Item Mean, Average Item Standard Deviation, and ANOVA Results for College Differences in SPAQ Scale Scores

         Education       Humanities      Agriculture     p values
Scale    M      SD       M      SD       M      SD       Edu vs. Hum   Edu vs. Agr   Agr vs. Hum
CLP      3.38   .461     3.15   .559     3.41   .516     .003          .800          .030
AA       3.26   .602     3.04   .540     3.20   .571     .005          .494          .260
SC       3.32   .604     2.97   .707     3.22   .457     .000          .220          .108
AT       3.25   .686     3.11   .639     3.19   .558     .059          .816          .325
ASD      3.21   .621     2.85   .706     2.94   .572     .000          .009          .467

Note. The p-value columns refer to the Education versus Humanities, Education versus Agriculture, and Agriculture versus Humanities comparisons, respectively.
Lastly, an independent sample t-test was performed to determine if the mean scores between male (N = 258) | |
and female (N = 95) students were statistically different. The assumption of homogeneity of variances was | |
tested and satisfied via Levene’s test, F(351) = .551, p = .458. The independent samples t-test was not | |
associated with a statistically significant effect, t(351) = -1.34, p = .17. This suggests that the difference | |
between students’ perceptions of assessment based on gender was not statistically significant, and the null | |
hypothesis was retained. | |
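The effect size reported for the college ANOVA above (η² = .058, about 5.8% of the variance in overall SPAQ scores explained by college) is the ratio of the between-group to the total sum of squares. A minimal Python sketch of that computation, using hypothetical grouped data of the same shape as the study's, is given below.

import numpy as np

def eta_squared(groups):
    """Eta squared (SS_between / SS_total) for a one-way design."""
    all_values = np.concatenate([np.asarray(g, dtype=float) for g in groups])
    grand_mean = all_values.mean()
    ss_between = sum(len(g) * (np.mean(g) - grand_mean) ** 2 for g in groups)
    ss_total = ((all_values - grand_mean) ** 2).sum()
    return ss_between / ss_total

# Demo with three hypothetical college groups (means and sizes mimic Table 2).
rng = np.random.default_rng(3)
demo_groups = [rng.normal(m, 0.48, size=n) for m, n in [(3.28, 142), (3.05, 171), (3.19, 40)]]
print(round(eta_squared(demo_groups), 3))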
Qualitative Section | |
The qualitative results generated insights about important aspects of classroom assessment. Both students | |
and faculty commented that the existing classroom assessment policies and practices favor exams, which | |
center on summative assessment approaches. However, most faculty members reported that they implement | |
both formative and summative assessment. Three themes emerged from the interviews with faculty and | |
students: Improvement in pedagogy and assessment; student involvement in assessment processes; and | |
assessment forms versus the grading policy. Findings suggest that awareness about different forms of | |
assessment is high among the faculty. In addition, both students and faculty reported student involvement in | |
assessment processes at some level. Further, all participants highlighted the restrictions of the grading policy as an important challenge, both for faculty members seeking to institutionalize alternative assessment approaches alongside the existing high-stakes assessment and for students asked to buy into assessment activities that are not
tied to their grades. | |
Improvements in Pedagogy and Assessment Skills | |
Most faculty indicated substantial growth in teaching and assessment competencies due to exposure to | |
modern pedagogies provided at the national and institutional levels. A faculty member explained that | |
universities in Afghanistan follow a cascade model of professional development for faculty. She added that the | |
university has a team of experts facilitating training sessions on "outcome-based learning" and "student-centered instruction." Another faculty member confirmed that the training sessions covered different
assessment approaches. He explained, “I feel confident facilitating student-driven lessons and developing | |
different assessment forms to assess my students." Similarly, a junior faculty member reported that she had learned
ways to create “individualized and collaborative assessment tasks.” For these participants, professional | |
development programs facilitated by the quality assurance office have increased their assessment literacy. | |
While the faculty participants noted improvements in their assessment skills, many students criticized them for | |
failing to design assessment tasks that matched individual student capabilities. “My classmates come from | |
different geographies where access to schools is limited. They have different learning abilities, but assignments | |
and exams are the same for everyone,” said a senior student from the College of Education. He added that not | |
everyone has the same learning style, suggesting that faculty members should pay attention to the individualized | |
needs of students. Nonetheless, students acknowledged assessment transparency and the recurrence of daily | |
assessment during instruction. A senior student described that their “exams consist of simple, medium, and | |
difficult questions.” Nevertheless, a few students were skeptical about merit-based assessment, noting that final | |
exams are sometimes politicized to promote one student group over another. While participants avoided | |
providing specific details, this example flags concerns about assessment ethics centered on "fairness and equity" as
teachers make judgments about student learning (Klenowski & Wyatt-Smith, 2014, p. 7). | |
Student Involvement in Assessment Processes | |
Most of the faculty members who participated in the interviews expressed reluctance to involve students in
assessment tasks, particularly when grading a student’s work. Nevertheless, they were open to the idea of | |
having students review their peers’ work and provide constructive feedback. One faculty member stated that | |
he often encouraged students to make oral comments when their peers presented their projects, but he never | |
asked them to provide written feedback. Other faculty members also recalled instances when they worked | |
with students to solve a problem or discuss applying concepts and theories in practice. For these faculty, | |
assessment and teaching are “inseparable.” For instance, according to one faculty who was teaching writing | |
courses, providing opportunities for students to ask questions and reflect on the lesson was central to her | |
teaching philosophy. She went on to explain, “I usually provide lengthy feedback on students’ papers by | |
explaining the strengths, weaknesses, and ways to improve them.” The faculty member, nonetheless, | |
acknowledged that she had never shared her assessment rubric with students. | |
Student engagement in assessment tasks only occurred in informal settings. Students explained that the | |
faculty usually involve students in assessment when the subject requires them to conduct fieldwork and share | |
their findings with the class. More precisely, a junior student said, “When we present the findings of our | |
fieldwork, our classmates can ask questions or make comments about the presentation.” He went on to say | |
that a few faculty members had specific policies, for example, choosing referees among students to make | |
judgments about student presentations. Another junior stated, “I felt much empowered when it was my turn | |
to evaluate other students’ presentations one day.” He added, “I was a little nervous but so excited to serve as | |
a referee.” However, a few students complained about the purpose of peer assessment when there is no | |
guideline from the instructor. According to a sophomore, “The faculty members should establish the | |
grounding rules when they let students ask questions and assess the presentations. Some students ask difficult | |
questions to challenge their classmates.” While many of the students highlighted the importance of student | |
involvement in assessment processes, the last example illustrates the role of faculty members in managing
assessment. | |
Given that classroom assessment occurs at different intervals, several faculty members complained about the | |
lack of student preparedness for post-secondary education. They criticized secondary schools for failing to | |
prepare students with adequate knowledge and skills to pursue undergraduate programs. For instance, a | |
faculty member who facilitated a freshman course on academic writing described her experience: “Students | |
barely know how to write. I had to revisit my course syllabus to meet their needs.” For this participant and | |
s