# Free Investigation of the Statistical Significance Essay Sample

Guinn (2006) asks how teachers can assess students in a way that reflects their ability and learning growth. Guinn also assumes that all a teacher has to do is administer several assessment tests on various topics and sub-topics over a period of time. Closely tied to this question is the choice of scale used in grading.

There are two common scales used in grading, the 4-point and 100-point scales (Guinn, 2006). The 4-point technique groups marks into a small number of grade bands, with variants that use maximums such as 3.0 and 5.0, so that students’ performance is tracked within narrower units. The 100-point technique is based on percentages: each student’s raw score is converted into a percentage, that is, the score weighted out of 100.
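The percentage conversion described above can be sketched in a few lines of Python. This is a minimal illustration, not a method taken from Guinn (2006): the function names `to_percentage` and `rescale` are hypothetical, and the sketch assumes every scale starts at zero and that scores map linearly from one scale to another.

```python
def to_percentage(score: float, scale_max: float) -> float:
    """Express a score as a percentage of the scale maximum."""
    if scale_max <= 0:
        raise ValueError("scale_max must be positive")
    if not 0 <= score <= scale_max:
        raise ValueError("score must lie between 0 and scale_max")
    return 100.0 * score / scale_max


def rescale(score: float, from_max: float, to_max: float) -> float:
    """Map a score from one zero-based scale to another (assumed linear)."""
    if to_max <= 0:
        raise ValueError("to_max must be positive")
    return to_percentage(score, from_max) / 100.0 * to_max
```

For instance, `to_percentage(3, 4)` expresses a 3 on a 4-point scale as a percentage, and `rescale(3.0, 5.0, 10.0)` maps a 3.0 out of 5.0 onto a 10-point scale. A linear mapping is only one possible choice; whether teachers actually treat scale points this way is the question the study raises.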

Within the 4-point approach, the maximum value varies from 2.0 up to 10. This raises the question of whether the different scaling levels make a significant difference to the grades students receive. Are there statistically significant differences in grades given to student work depending on the scale used?

When teachers use different scales, the reliability of the grades comes into question. According to Borg (2004), the smaller the scale teachers use, the higher the inter-rater reliability. But what if, for example, Mr. Green uses a maximum of 5.0 while Mr. Blue uses a maximum of 3.0? Would there be significant differences? In Moscovici’s (2004) view, it is important for teachers to agree on a consistent meaning for each point value in the grading system.

Intuitively, the 4-point scale is appealing because it simplifies the difficult task of dealing with the wide range of grades on the 100-point scale. However, it fails to reflect students’ performance on specific measurement topics (Moscovici, 2004). For example, a student who scores 3.0 and then 4.0 out of 5 on two different tests has not necessarily improved his or her understanding of the topic being tested. In fact, studies have shown that the scores students are awarded on various tests depend more on who scores the tests and how they score them than on what the student knows (2009). Teachers rely on rubrics to apply the 4-point scale and reach informed decisions, which suggests that grades on this scale are more subjective.

According to Yates (2009), teachers are essentially interpreting students’ work rather than analyzing how the numbers add up to a specific grade. Studies have suggested that students’ grades and achievements depend on the scale a teacher uses (Clark & Peterson, 2006). Boekartes’ (2006) concern is that grades can be lumped together and converted into numbers, as happens in many education systems around the globe.

This study will rely on quantitative research methods. With this approach, the researcher analyzes the data and then supports or rejects the hypothesis. One main advantage of quantitative methods is that they are standardized and their results are comparatively easy to interpret accurately.

Purpose of the Study

The purpose of this study is to compare teachers’ grading using two different scales.

In many cases, education experts have criticized the practice of lumping students’ performance into a narrow set of scale points. Adding to this gap, many studies address grading and scale-related issues without examining whether the differences between scales are significant. This research will involve English teachers, who will be asked to grade a piece of student work on a 7-point grading scale and on a 10-point grading scale and to record the results.

This study is expected to give a clearer view of the differences and similarities between the two grading scales. Earlier studies suggest that the two scales are interrelated and that the relationship is one of cause and effect. For example, the grade a teacher assigns is affected by the type of scale used.

The expected outcomes of this study relate to several aspects of scale-based grading. First, it will help teachers develop clear and explicit knowledge of students’ performance at the individual level. Second, students will learn to identify their key strengths and weaknesses. Third, the use of continuous assessment is expected to justify an important aspect of grading criteria, since many students pay close attention to the grades they receive from their teachers. The significance of this correlation is expected to be examined further to explain the relationship between the scale used and the grade scored. Fourth, the study adds to the existing evidence that students’ grades correlate with the scales used.

Finally, comparing the two grading scales will help gauge students’ progress and inform the planning of the educational syllabus. This is particularly true for ongoing assessment, where planning is paramount for both students and teachers. Overall, the study uses basic concepts to understand theories about the nature of tests, grading criteria and the scales applied. The final outcome is meant only to support or disprove a relationship between students’ grades and the scales used for grading.

There are different ways of measuring students’ achievements. The most common method is to assign letter grades after the marks scored have been calculated and grouped into ranges. The theory of grading rests on a student’s ability to score a certain level of marks in various tests. In many instances, a student’s grades in different subjects are averaged to create a grade point average (GPA), which is measured on a continuous interval scale ranging from a possible low of 0 to a possible high of 4.33, in case the school gives grades of A+.
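As a rough illustration of the averaging step, the sketch below computes an unweighted GPA from letter grades. The `GRADE_POINTS` table and the function name `grade_point_average` are assumptions made for this example: the table follows one common convention in which an A+ is worth 4.33 points and an F is worth 0, but schools publish their own tables, and many also weight subjects by credit hours, which this sketch does not do.

```python
# Hypothetical letter-to-point table (one common convention, not universal).
GRADE_POINTS = {
    "A+": 4.33, "A": 4.00, "A-": 3.67,
    "B+": 3.33, "B": 3.00, "B-": 2.67,
    "C+": 2.33, "C": 2.00, "C-": 1.67,
    "D": 1.00, "F": 0.00,
}


def grade_point_average(letter_grades):
    """Return the unweighted mean of the grade points for the given letters."""
    if not letter_grades:
        raise ValueError("at least one grade is required")
    # A KeyError here means the letter is missing from the assumed table.
    points = [GRADE_POINTS[grade] for grade in letter_grades]
    return sum(points) / len(points)
```

Calling `grade_point_average(["A", "B+", "C"])` averages the three point values with equal weight. Because the letter bands have already collapsed the underlying marks, two students with quite different raw scores can end up with the same GPA.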

Boekartes (2006) believes that because grades are assigned by teachers, there is a high possibility that their objectivity is overrated. In other words, different teachers are likely to assign different grades to the same work. Similarly, different teachers may assign different grades to work by two different students that is of similar quality. To overcome this subjectivity, some researchers note, a different form of assessment is normally introduced: the standardized test, in which the same questions are given to all students and the scores are recorded on a scale for overall grading (Mustapha, Idris & Abdullah, 2005; Liu, 2008). Some researchers have argued that standardized tests of students’ ability and teacher-assigned grades measure more or less the same thing, since bright students end up scoring higher on both the tests and the grades than their average counterparts.

Other researchers, such as Pepi, Faria & Alesi (2006), argue that standardized tests of ability and teacher-assigned grades normally measure different things. The difference arises because, whichever standardized tests teachers use, the final grade is based on more than the marks scored. For example, many teachers take into account factors such as a student’s effort, creativity and motivation when assigning grades (Pepi, Faria & Alesi, 2006).

Measuring correlation is the first step towards gauging which method is more accurate (Liu, 2008). If test scores and grades are strongly correlated, they are likely to be measuring more or less the same thing. A weak correlation, by contrast, suggests that the two measure separate constructs. In a basic sense, a strong correlation between grades and marks scored may reflect two underlying factors: the level of intelligence and the academic ability of individual students. While this correlation may be of major significance, some of the research on grading and on student evaluation of teaching is noted to depend heavily on the individual student’s expectations, with particular emphasis on the student’s race, as observed by Guinn (2006).
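The strength of such a relationship is commonly summarized with Pearson’s correlation coefficient, which ranges from -1 to 1. The sketch below is an illustration only, since the essay does not say which coefficient Liu (2008) has in mind; the function name `pearson_r` is hypothetical, and the inputs are assumed to be paired lists with one test score and one teacher-assigned grade per student.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and of cross-products around the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return sxy / sqrt(sxx * syy)
```

A value near 1 would be consistent with the view that tests and grades measure much the same thing, while a value near 0 would point towards separate constructs. What counts as "strong" or "weak" is a judgment call that the coefficient alone does not settle.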

Some researchers have concentrated on the analysis of ongoing assessment, which is associated with feedback from teachers on successive tests (Delceva, Adamcevska & Damovska, 2006). According to IEA (2005), feedback is one of the most important aspects of information acquisition, taking the students’ ability into account. The problem, however, is that students receive this feedback only at the end of the session or term, when there is no room to make corrections or sort out misunderstandings. There is therefore no room for revising the grades given, even if the scores are corrected. In essence, the outcome is bound to be biased and unrepresentative of actual performance. Furthermore, as Mustapha, Idris & Abdullah (2005) observe, the complexity is aggravated by the increased use of automated marking and grading.

In many instances, researchers have confirmed that continuously conducted tests represent only the students’ understanding of the topic or, on the negative side, their failure to grasp the concept. Yates (2009), on the other hand, suggests that evaluation is a process that begins with setting an assignment and concludes with a grade. This form of evaluation can use different schemes, such as analytical scales, primary trait scores, holistic scores, rubrics and checklists. However, the correctness of these schemes is known to depend on the evaluation criteria and the teacher’s judgment.

Livne & Wight (2007), on the other hand, take a different perspective on grading, bringing technology into their argument. In their study, they established that consistent grading is best achieved with the help of automated grading software, which can prove more consistent and valid than human grading.

In some instances, grading has been regarded as the end of the learning process. That is, assigning students grades is like giving them license to abandon the paper and the course content without a second thought (Pepi, Faria & Alesi, 2006). In their study, grading is associated with awarding students a certificate of knowledge acquired, which unfortunately signals the end of learning.

The authors recommend that teachers adopt portfolio testing as an alternative to large-scale writing tests. As if to discredit scale methodologies for the lack of consistent standards among teachers, it is suggested that grading in practice amounts to selective grading, random grading and blanket grading.

It is believed that grading mechanisms are never equivalent across the multicultural groups in our learning institutions and that scales vary (Liu, 2008). It is further suggested that grading based on marks received is important for consistency and universality. In addition, many authors and scholars have insisted that grading is not appropriate for assessing the skills of young school-going children, especially in literary studies for children below the age of five. The many complex issues associated with these factors call for sustained attention to the critical issues of scaling (Liu, 2008). For instance, computer-assisted grading is considered a faster means of giving students the grades they deserve. However, computer-assisted grading does not reveal how the grades are calculated or which scales are used, which makes it difficult to know the scaling and the finer details of the grading. This disadvantage waters down the time-saving benefit of this form of grading. Others have questioned the theoretical soundness of computer grading. For instance, most computer programs rely heavily on error analysis to compute scores for writing, which also delivers individualized grading outcomes and exposes each student’s weaknesses as well as strengths.

Hypothesis

The use of point scales in grading learners has added more elements of learning and more areas to be studied. At present, data on the differences between grading scales are available to interested parties. Various studies have investigated aspects of scale grading and its significance (Liu, 2008). However, little has been done to investigate how significant the differences introduced by the scales themselves are, even though teachers are known to apply scale grading in a non-uniform manner. For example, teacher A may use a 5.0 scale while teacher B uses a 10.0 scale. It is hypothesized that the scale used has a statistically significant effect on the grades teachers assign.
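One way such a hypothesis could be tested is with a paired t-test on normalized scores. The essay does not name a specific test, so the sketch below is an assumption rather than the study’s stated method: it supposes that each teacher grades the same piece of work once on each scale, divides every grade by its scale maximum so the two scales are comparable, and computes the t statistic for the mean paired difference. The function name `paired_t_statistic` is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(scores_a, max_a, scores_b, max_b):
    """Return (t, degrees_of_freedom) for paired grades on two scales.

    Each grade is first divided by its scale maximum, so a 7 on a
    7-point scale and a 10 on a 10-point scale both become 1.0.
    """
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need at least 2 paired grades")
    if max_a <= 0 or max_b <= 0:
        raise ValueError("scale maximums must be positive")
    diffs = [a / max_a - b / max_b for a, b in zip(scores_a, scores_b)]
    spread = stdev(diffs)  # sample standard deviation of the differences
    if spread == 0:
        raise ValueError("t is undefined when all differences are identical")
    n = len(diffs)
    return mean(diffs) / (spread / sqrt(n)), n - 1
```

The resulting statistic would then be compared with the critical value of the t distribution for the returned degrees of freedom at the chosen significance level. A paired design fits the situation in which the same teachers grade the same work twice; if different teachers used each scale, an independent-samples test would be the more natural choice.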