Standardization of the Stanford-Binet Intelligence Scale

Children differ qualitatively from their peers with respect to their intellectual abilities. These differences may influence a child’s subsequent independence in life as well as his or her family and society. Parents’ expectations and hopes can be shattered by the birth of a child at high risk or with developmental delays. Psychologists and educators systematically use scientific methods to measure individual differences among people. Bangladesh urgently needs to improve and update its existing assessment practice, which is an integral part of instruction because it determines whether the goals of education are being met. This study aims to standardize the Stanford-Binet Intelligence Scale, Fifth Edition (SB5) in Bangla for use in urban Bangladesh in order to fill this gap in the psychometric sector. The research was therefore designed to meet the criteria for standardizing a psychological ability test and was conducted in four steps (item analysis, norm development, reliability and validity). For the calculation of norms, the study drew students from six divisional metropolitan cities to represent Bangladesh. After the original SB5 was translated into Bangla, item analysis, the first step of the standardization process, was carried out with the SB5 tool kit among 330 students at 11 age levels (6-16 years) to scrutinize the strengths and weaknesses of the test items. To retain the original theme, items were replaced with native content, symbols or objects, made culture friendly, and in many cases retranslated so that students could understand the questions better. The overall reliability coefficient (α = 0.84) suggests high internal consistency among the items.
Based on the raw scores obtained from the ten subtests, age norms were calculated separately for the 11 age groups. The norms were developed from the raw scores of 3300 students as entered in the record form. The raw scores were ranked into 19 score groups, and the Full Scale Intelligence Quotient (FSIQ) was then constructed from the ranks. The IQ range of the SB5-BD age norms for children aged 6 to 16 years is 86 to 152. Test-retest reliability was estimated from scores obtained on two administrations of the same instrument to the same 330 students, one week apart. The test statistics suggest no significant difference between the IQ obtained at the first administration and the IQ obtained a week later. As a measure of reliability, the correlation coefficients between the first and second administrations were 0.72, 0.76 and 0.75 for Nonverbal IQ, Verbal IQ and Full Scale IQ respectively. To examine criterion-related validity, the SB5-BD and the WISC-R (Bangla Version) were administered to the same participants, 90 students from three age groups. The descriptive statistics reveal considerable similarity between the IQ scores obtained from the two tests. To examine differences in IQ as further evidence of validity, the SB5-BD was administered to normal students and to students with special needs. The results indicate a low mean and standard deviation among students with special needs, and the p value suggests a statistically significant difference between the IQ obtained by normal students and that obtained by students with special needs. The study thus completed all four steps of the standardization process.
As a result of this study, the standardized SB5-BD can be regarded as an updated and contemporary assessment scale in the field of psychometric testing for children aged 6 to 16 years in urban Bangladesh. It is hoped that the scale will help move forward long-stalled work for the benefit of humankind, above all for children with special needs.
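
The two reliability statistics reported above, an internal-consistency coefficient (α) and test-retest correlations, can be illustrated with a minimal Python sketch. The scores below are invented for illustration and are not the study's data; the function names `cronbach_alpha` and `pearson_r` are hypothetical, and the source does not state which software or exact formulas were used, so the standard textbook definitions are assumed.

```python
from statistics import mean, pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a score matrix.

    scores: list of rows, one row per examinee, one column per item.
    """
    k = len(scores[0])                      # number of items
    items = list(zip(*scores))              # transpose to per-item columns
    item_var_sum = sum(pvariance(col) for col in items)
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


def pearson_r(x, y):
    """Pearson correlation between two equally long score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


# Illustrative (made-up) data: 5 examinees, 3 items
item_scores = [
    [2, 3, 3],
    [1, 1, 2],
    [3, 3, 3],
    [0, 1, 1],
    [2, 2, 3],
]
# Illustrative (made-up) IQ scores from two administrations one week apart
first_week = [95, 102, 110, 88, 120]
second_week = [97, 100, 113, 90, 118]

print(round(cronbach_alpha(item_scores), 2))
print(round(pearson_r(first_week, second_week), 2))
```

A value of α near 1 indicates that the items vary together across examinees, and a test-retest correlation near 1 indicates that examinees keep roughly the same relative standing across the two administrations.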

The supreme adaptive resource of the human being is intelligence: a superior intellectual capacity for wisdom, interpretation and prediction. By virtue of this resource, the human species dominates many facets of its environment and has established its superiority over other members of the living kingdom. However, this advantage may be diminished, and an individual’s spontaneous autonomy undermined, if the individual is born with or acquires developmental delays. The pursuit of an efficient and accurate way to identify and compare this ability in individuals is ongoing, and its consequences for education and development are apparent and undeniable. Scholars of earlier periods explored intelligence in order to categorize individual differences and abilities, but experts differed in their attempts to define intelligence as a single concept.

Against this background, the present study may be considered a landmark reform in the field of educational development in Bangladesh. It is an effort to establish a yardstick for the benefit of humankind, above all for children with special needs. This chapter briefly presents five issues. Firstly, it sets out the concepts of intelligence and individual differences and describes psychometric tests. Secondly, it explains the necessity of assessing intelligence. Thirdly, it highlights the psychometric and contemporary devices for intelligence testing. Fourthly, it portrays the present testing and disability scenario from international and Bangladesh perspectives. Finally, it presents the rationale and objectives of the study.

Understanding the Concepts – Intelligence and Individual Differences

Intelligence is a versatile and frequently redefined notion, variously referred to as Intelligence Quotient (IQ), cognitive functioning, intellectual ability, aptitude, thinking skills, general ability and intellectual development (Logsdon, 2011). These terms are used throughout the study to capture the distinct aspects of intelligence. Everyone assumes that he or she recognizes intelligent performance when it is observed, but the ambiguity of the trait becomes apparent as soon as one tries to define it (Daniels, Devlin & Roeder, 1997). With varying degrees of consensus among researchers, numerous definitions of intelligence have been proposed before the twentieth century. Various approaches to human intelligence have also been adopted, a few of which are explained here to support the present research.

The unitary concept of general ability or intelligence emerged from the definitions of Binet and Spearman. Spearman developed a statistical technique called factor analysis to explore this approach and was able to report that about half of the variance in tests of mental ability was due to a general factor (Kaplan & Saccuzzo, 2001). This general or global intelligence is commonly referred to by the single italicized letter g (Spearman, 1927). An alternative conception is that cognitive capacities within individuals are a manifestation of a general component, or general intelligence factor, together with cognitive capacities specific to a given domain such as reading, mathematics and writing (Miller, 1991). Although intelligence is at present also viewed as a multidimensional concept (emotional, multiple, social, artificial intelligence, etc.), in this study the author uses general intelligence as a global perspective for describing an individual’s intellectual capabilities, which influence his or her overall development, particularly academic and social performance.

The concept of individual differences was gaining popularity around the world at the same time as Binet’s work, spurred by the movement towards universal compulsory education in many countries. At the time, many psychologists were addressing the problem of how to identify children who would succeed in education (Thorndike, 1990). Reflecting on the same aspect of intelligence, the pioneer of intelligence testing, Binet (1905), expressed the opinion that “In intelligence there is a fundamental faculty, the alteration or the lack of which, is of the utmost importance for practical life. This faculty is judgment, otherwise called good sense, practical sense, initiative, the faculty of adapting one’s self to circumstances.” A heightened focus on defining and assessing intelligence began in the 1800s as part of attempts to distinguish various levels of mental retardation and mental illness using psychological tests (Anastasi & Urbina, 1997). Viewed broadly, the American Psychological Association, a scientific and professional organization, defines intelligence in terms of the observation that “Individuals differ from one another in their ability to understand complex ideas, to adapt effectively to the environment, to learn from experience, to engage in various forms of reasoning, to overcome obstacles by taking thought” (APA, 1996). These definitions are oriented towards academic learning and performance and emphasize abilities that are valued by one’s culture. As cultural differences play a vital role in shaping an individual’s lifestyle, it is essential to assess how different cultures make sense of the world in terms of the meanings that represent the mind and within which the concept of intelligence is defined (Bouchard & Segal, 1985).

Therefore, at present the most widely accepted definition of this concept is “Intelligence is not a single, unitary ability, but rather a composite of several functions. The term denotes that combination of abilities required for survival and advancement within a particular culture” (Anastasi, 1992; 1997). More recent definitions have thus moved toward practical formulations that consider how a person functions in the real world as well as in traditional academic settings (Wagner, 2000). Aspects of the definition that have wide appeal include learning speed, adaptability and the ability to perform successfully in society.

Research on intelligence is therefore both active and robust, and this study investigates the role of intelligence in the educational and social learning and performance of both normal children and children with special needs. A great deal of research continues to be conducted in various ways, from many theoretical viewpoints and with a variety of results, to define and measure intelligence. Despite the variety of terms for intelligence, the most influential approach to understanding it is based on psychometric testing. The technical term for the science behind psychological testing is psychometrics (Neisser, Boodoo, Bouchard, Boykin, Brody & Ceci, 1996).

Psychometric Tests

Psychometrics is the field of study concerned with the theory and technique of psychological measurement, which includes the measurement of knowledge, abilities, attitudes and personality traits. The field is primarily concerned with the study of differences between individuals. It involves two major research tasks: (i) the construction of instruments and procedures for measurement; and (ii) the development and refinement of theoretical approaches to measurement (Kline, 1999).

The first psychometric instruments were designed to measure the concept of intelligence. The best-known historical approach involves the Stanford-Binet Intelligence Scale, developed originally by the French psychologist Alfred Binet. Contrary to a fairly widespread misconception, there is no compelling evidence that such instruments can measure innate intelligence, in the sense of an innate learning capacity unaffected by experience, nor was this the original intention when they were developed.

Similarly, psychological testing is a field characterized by the use of samples of behavior to assess psychological constructs, such as cognitive and emotional functioning, in a given individual. A pressing issue in psychology at present is the assessment (also referred to as a test, evaluation, measurement, scale or battery) of an individual’s behavioral characteristics (e.g. intelligence, emotional functioning, interests or attitudes, aptitude, normal and abnormal personality, and achievement) through psychological tests. Psychological assessment is also referred to as psychological testing, or as performing a psychological battery on a person. It is a process that uses a combination of techniques to help arrive at hypotheses about a person and his or her behavior, intelligence, personality and capabilities (Framingham, 2011). Assessments range from formal standardized tests to informal teacher-made assessments. Standardized tests are usually considered formal tests: they are developed by testing organizations, administered in clinics and classroom settings, and scored in a consistent manner. The test scores are interpreted with regard to a norm or a criterion, or occasionally both. The norm is established independently, or by statistical analysis of a large number of participants (Mellenbergh, 2008). There are several categories of psychological test, such as achievement tests, aptitude tests, intelligence tests, neuropsychological tests, occupational tests and personality tests (Charles, 1996).

Table 1

Several Categories of Psychological Tests (At a Glance)

Test name | Setting / Used in | What it measures | Example
Achievement test | Educational | Achieved knowledge | General Certificate of Secondary Education (GCSE); Test of English as a Foreign Language (TOEFL)
Aptitude test | Employment | Aptitude | Scholastic Aptitude Test (SAT)
Intelligence test | Clinic / School | Potential / Intelligence | WISC-R, SB5
Neuropsychological test | Clinic | Deficits in cognitive functioning | Cambridge Neuropsychological Test Automated Battery (CANTAB)
Occupational test | School / Office | Interest in career | Occupational Interest Profile
Personality test | Forensic | Personality | Minnesota Multiphasic Personality Inventory (MMPI)

These psychological tests are often discussed in terms of the dimensions they measure. They are called dimensions because they are broader than a single attribute or trait level; often a test measures several personal attributes or traits (Hersen, 2003). Professionals refer to these tests in various ways: as tests of maximal performance, behavior observation tests or self-report tests; as standardized or non-standardized, objective or projective; or according to what the tests measure (Rasch, 1960/1980). Among these various psychological tests, the present study focuses only on a standardized, norm-referenced intelligence test for assessing the intellectual ability of an individual. The need for educational programs that identify and classify children with limited intellectual abilities, as well as gifted learners, has been an important force in the development of psychological tests. These tests also play an especially important role in special education. They can be useful for identifying an expected level of academic performance and for helping school professionals design an Individual Education Plan (IEP) for students with special needs (Sattler, 2001). The testing movement is thus the consequence of a need to determine the intellectual, sensory and behavioral (personality) characteristics of individuals, and intelligence can be established as a significant factor only once a person’s ability has been assessed.

The Necessity of Assessing Intelligence

Assessing intelligence is a complex process, but it has become an established practice in psychological testing because of its potential effects on individuals’ lives. Measures of a child’s intellectual abilities are considered one part of what is referred to as the ‘Four Pillars of Assessment’. Along with behavioral observation, interview and informal assessment, intelligence testing gives an assessor information about a child’s overall level of functioning as well as specific abilities (Sattler, 1992). However, intelligence tests provide information about a child’s abilities in two main ways that the other methods do not. Firstly, they provide a standardized or norm-referenced framework. Secondly, scores on such tests have been found to correlate with performance in both school and work environments (Sattler, 1992; Anastasi & Urbina, 1997).
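
The idea of a norm-referenced framework mentioned above can be sketched in a few lines of Python: a raw score is interpreted relative to the scores of a norm group rather than in absolute terms. This is a simplified linear illustration under stated assumptions, not the procedure used to build any published norm table. The function name `standard_score` and the norm-group data are hypothetical, and the mean of 100 and standard deviation of 15 are the conventional metric for composite IQ scores, assumed here for illustration.

```python
from statistics import mean, stdev


def standard_score(raw, norm_raw_scores, target_mean=100.0, target_sd=15.0):
    """Convert a raw score to a standard score relative to a norm group.

    raw: the examinee's raw score.
    norm_raw_scores: raw scores of the same-age norm group (hypothetical data).
    target_mean, target_sd: metric of the reported score (assumed 100 and 15).
    """
    m = mean(norm_raw_scores)
    s = stdev(norm_raw_scores)
    z = (raw - m) / s            # distance from the norm mean in SD units
    return target_mean + target_sd * z


# Illustrative (made-up) raw scores for one age group
norm_group = [38, 42, 45, 47, 50, 50, 53, 55, 58, 62]

print(round(standard_score(50, norm_group)))   # a raw score at the norm mean
print(round(standard_score(60, norm_group)))   # a raw score above the mean
```

Because the conversion depends on the norm group, the same raw score yields a different standard score in a different age group, which is why norms are developed separately for each age level.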

Children differ qualitatively from their peers with respect to their intellectual abilities, and these differences may influence a child’s subsequent independence in life as well as his or her family and community. Parents’ expectations and hopes can be shattered by the birth of a child who is at risk or has developmental delays. It is no secret that the number of children with special needs has increased dramatically worldwide in the past decade (Reschly, Tilly & Grimes, 1999). Comparisons between individuals, as well as of intra-individual performance, can therefore be made with these tests for the purpose of placement or of identifying special education needs. According to the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition, Text Revision, published by the American Psychiatric Association (APA), the aim of assessment is to gain insight into an individual that will aid the decision-making process with regard to screening, problem solving, diagnosis, therapy, rehabilitation and progress evaluation, and to gauge the necessity for a complete battery (DSM-IV-TR & APA, 2000). The measurement of intelligence rests on the fact that children become more capable mentally as they advance in age, with the upper limit reached in adolescence. Intelligence tests show that intellectual growth is rapid in infancy, moderate in childhood, and slows down in youth (Cahan & Cohen, 1989).

A prerequisite for placing such children in either a mainstream or a special school is therefore to quantify their intellectual level, which requires measuring intelligence with an intelligence scale in accordance with their age and sex (Neisser, 1998). Such a comprehensive assessment helps a professional identify a child’s strengths and weaknesses so that the child’s delays can be addressed. Accordingly, the goal of this research was not to categorize children with a single score but to pinpoint a child’s intellectual level along with other factors such as age, sex and culture. Significantly, Binet had a similar aim: to identify children in schools who had special educational needs. His intention was not to use IQ scores as a general device for ranking all children according to intellectual ability (Binet & Simon, 1905). Binet’s scale had a profound impact on educational development throughout the world, and despite its constraints, educators and psychologists worldwide have made use of the scale.

These practical demands show that the assessment of intelligence, among other individual traits, provides a valuable picture of a person’s general level of intellectual capability, which is significant for human life. Moreover, the success of education systems in advanced countries owes much to the development and use of standardized psychological tests of students’ abilities. Over the last century, psychologists and educators have systematically updated and standardized various psychometric and contemporary tests to measure individual differences among people.

The Psychometric and Contemporary Device for Assessing Intelligence

Ever since Alfred Binet’s great success in devising a test to distinguish intellectually challenged children (earlier terms included idiot, moron, imbecile, mentally retarded, mentally handicapped, intellectually disabled and intellectually impaired) from those with behavioral problems, psychometric instruments have played an important part in European and American life. Standardized tests are commonly used for historical, regulatory and practical reasons; a variety of historical trends, actual strengths, educational policies and commonly offered arguments justify their use. Tests are used for many purposes, such as selection, diagnosis and evaluation. Many of the most widely used tests are not intended to measure intelligence itself but closely related constructs such as scholastic aptitude, school achievement and specific abilities. Such tests are especially important for selection, decision and placement purposes (Flanagan, Genshaft & Harrison, 1997). Standardized tests have also been historically promoted as “objective” in the sense that the examiner’s biases would not influence the results (Domino, 2000). Moreover, psychologists and clinicians are routinely and traditionally trained in administering standardized tests because of the historic belief that standardized assessments are better, being more formal and objective than other kinds of assessment, which are often labelled “informal”, implying “less objective” (Anastasi & Urbina, 1997). Even so, selecting the most appropriate test for a given child or situation can be a challenging task.

A review of the last 10 years of the Mental Measurements Yearbooks (MMY) indicates an increase in the number of intelligence tests that can be used with young children. Well-known individually administered intelligence tests include the Stanford-Binet Intelligence Scales, Fifth Edition (SB5) (Roid, 2003), the Wechsler Intelligence Scale for Children, Fourth Edition (WISC-IV) (Wechsler, 2004), the Slosson Full-Range Intelligence Test (S-FRIT) (Algozzine, Eaves, Mann & Vance, 1993), the Kaufman Brief Intelligence Test (K-BIT) (Kaufman & Kaufman, 1993), the Woodcock-Johnson III Tests of Cognitive Abilities (WJ III COG) (Woodcock, McGrew & Mather, 2001) and the Reynolds Intellectual Assessment Scales (RIAS) (Reynolds, 2003). These tests are used to evaluate intelligence and/or cognitive abilities in schools and assessment centres for identification purposes. In addition, the tests are normed on large samples and indicate age-appropriate intellectual ability (Chang, 2008).

Researchers hold different opinions on the use of these tests for assessment purposes. Despite these differences, experts broadly agree that the Stanford-Binet Intelligence Scale, Fifth Edition, is a leading contemporary device with a rich tradition running from its inception in 1905 to the present. Through its various editions, this assessment scale has been used throughout the world. Other strengths of the SB5 include its appealing materials and cognitively appropriate tasks. The psychometric properties of the test at school age and its comprehensive subtests are further strengths for assessing children’s intellectual development in both verbal and nonverbal domains (Ford & Dahinten, 2005). Bracken and Nagle (2007) also suggested using the SB5 to assess the cognitive abilities of children as young as school age because of its superior psychometric and qualitative characteristics. Based on its popularity, usability and standing as a standard for intelligence measurement, the SB5 is considered the most suitable instrument for the purpose of the present research. It should be noted that the SB5 is described as meeting the Standards for Educational and Psychological Testing published by the American Educational Research Association [AERA], American Psychological Association [APA] and National Council on Measurement in Education [NCME] (1999). Though several psychological tests have gained prominence, many current innovations derive from the Binet-Simon scale. With regard to the current standards for educational and psychological testing, the SB5 has earned a leading position in the field of intellectual assessment. The scale is an individually administered assessment of intelligence and cognitive abilities. A direct descendant of Terman’s adaptation of the Binet test developed more than 100 years ago, the SB5 is used in educational settings. It comprises five factors, each measured in two domains (nonverbal and verbal), so that each domain has five subtests and the scale has ten subtests in total (Roid, 2003); these are reviewed and discussed in Chapters Two and Three.

Present Testing and Disability Scenario: International and Bangladesh Perspective 

The formal and systematic measurement of intelligence, begun by the French psychologists Binet and Simon at the beginning of the 20th century, heralded the modern era of psychological testing. In subsequent years, tests to measure aptitude, personality and educational achievement were developed. The need to assess the abilities of large numbers of army recruits when the United States entered World War I in 1917 gave a significant boost to psychological testing (Gregory, 2007). In the 21st century, psychological testing is a major industry in developed countries, especially in America. There are thousands of commercially available standardized psychological tests as well as thousands of unpublished tests. Approximately 20 million Americans per year were taking psychological tests (Goldman & Saunders, 1995). Today psychological testing is part of American culture, and psychological tests are in use everywhere. Such tests are regularly used in the school system as tools for making placement decisions, and current research supports the relationship between achievement and intelligence tests. One of the most significant and controversial uses of psychological testing in the 21st century has resulted from the ‘No Child Left Behind Act’ of 2001 (NCLB Act). The NCLB Act contains strategies for improving the performance of schools, strategies that were intended to change the culture of America’s schools by defining a school’s success in terms of the achievement of its students (U.S. Department of Education, 2004). While tests have always played a critical role in the assessment of student achievement, the NCLB Act requires that students be tested more often and relies on test scores to make more important decisions than in the past. More broadly, education reform in the United States since the late 1980s has been largely driven by the setting of academic standards for what students should learn and be able to do. These standards can then be used to guide all other system components. The standards-based reform movement calls for clear, measurable standards for all school students. Expectations are raised for all students’ performance: along with norm-referenced rankings, the performance of all students is expected to rise. Curriculum, assessments and professional development are aligned to the standards.

Standards-based school reform has become a predominant issue facing public schools (Popham, 1999). Test scores have also changed over time. The largest Flynn effects, that is, the largest gains in intelligence test scores from one generation to the next, appear on highly g-loaded tests such as Raven’s Progressive Matrices. This test is very popular in Europe, and it plays a central role in recent analyses of the worldwide rise in test scores (Flynn, 2007). However, the Flynn effect appears to be coming to an end, at least in Western Europe. Recent studies in Scandinavia show intelligence test scores plateauing and arithmetic scores dropping. Far from being surprised, Flynn has been expecting as much. Since social conditions vary from country to country, it is important to take the diverse contexts of the world into account (Flynn, 2007; Collingwood, 2008).

Compared with the present worldwide scenario of psychological and other testing, Bangladesh still lags behind. Until recently, the most commonly cited disability prevalence rate has been that of the World Health Organization (WHO), which estimates that approximately 10% of the world’s population has a disability. In the Bangladesh context, that estimate would translate into approximately 15 million people with disabilities, based on the census of 15 March 2011. Action Aid Bangladesh, based on 5 locations in 4 districts, reported that approximately 12 million people (14% of the total population) require some form of immediate service because of disability-related issues (Action Aid Bangladesh, 1996). However, the lack of quality data about people with disabilities makes it difficult to address their needs. According to ICDDR,B and its core donor AusAID, “Unless international development programmes are inclusive of and accessible to persons with disabilities, achieving the UN Millennium Development Goal (MDG) is not possible”. In collaboration with the University of Melbourne, ICDDR,B developed a Rapid Assessment of Disability (RAD) toolkit in 2009 for use by governments, NGOs and other organizations. The toolkit is an easy-to-use, comprehensive way to measure disability prevalence, quality of life, social participation, and access to and effectiveness of related development programs. It contains a four-part questionnaire developed in collaboration with Australian and Bangladeshi disability organizations and service providers (Keeffe, Baker, Booth, Goujon, Edmonds, Huq & Quaiyum, 2011). The WHO, for its part, has designed a set of Disability Assessment Schedules (known as the WHO-DAS), which contain a long series of activity- and participation-based questions.
Moreover, since the formal or mainstream schools run by the government have no overall disability programmes or activities at all, only a few NGOs have been set up to provide programmes of identification, assessment, placement and decision making for determining the degree and type of disability (Choudhuri, Alam, Hasan & Rashida, 2005). As the above discussion shows, assessment remains a central element in the overall quality of teaching and learning in education. It also serves the purposes of occupational prognosis, clinical diagnosis, and psychological research and theorizing (Devlin, Feinberg, Resnick, & Roeder, 1997; Herrnstein & Murray, 1994). At the end of the 19th century a few psychologists and educators took the initiative to standardize existing scales and to develop non-standardized, need-based assessment scales, which are now outdated. Consequently, the absence of disability prevalence data, and of reliable and consistent data on the number and educational status of children with disabilities, makes it difficult for educators, policy-makers and programmers to understand the nature of the problem and identify possible solutions.

Rationale of the Study

Appropriate stimulation in childhood is one of the most important influences on normal development. Children use different modes of making sense of their experience and the world around them. They also acquire the set of norms, knowledge, skills and attitudes that society demands of them. In this context, education (also called learning, teaching or schooling) in the universal sense is any act or experience that has a formative effect on the intelligence, character or physical ability of an individual. In its practical sense, education is the process by which society deliberately transmits its abilities, knowledge, skills and values from one generation to another.

Globally, the enactment of laws on compulsory and quality education is intended to ensure positive and desirable change in all aspects of an individual’s development. Based on the philosophy of Public Law 107-110 (2001), No Child Left Behind (NCLB) is a comprehensive plan in the USA to reform schools, change school culture, empower parents and improve education for all children, as well as to improve instruction in high-poverty schools. The law further ensures that poor and minority children have the same opportunity as other children to meet challenging academic standards. This law has brought sweeping changes to education across the world. Moreover, with the implementation of the No Child Left Behind Act (NCLBA, 2002), the government of the United States mandated that all school-age children be tested for educational progress. To carry out this mandate, along with its assessment provisions, tests need to be translated and adapted so that students from multicultural and multilingual contexts can be assessed (Allalouf, 2003; Chang, 1999; Mathews, 2003).

Similarly, with the growing interest in cross-cultural research and evaluation, interest in testing is not limited to education but extends to other fields, such as psychological and vocational assessment, career planning, selection and international comparative studies. The result of this interest is a boom in psychometrically equivalent, multilingual versions of assessment instruments. With the increasing demand for psychological tests in various cultures and countries, the translation and adaptation of tests has become a main concern, and test adaptation is both appropriate and significant.

Re-affirming the vision of Education For All (EFA), the World Declaration made at Jomtien (1990) states: “All children, young people and adults have the human right to benefit from an education that will meet their basic learning needs in the best and fullest sense of the term”. With a view to ensuring quality education as a human right, assessment should be considered an important prerequisite for determining a student’s ability. It will enable teachers to tap each individual’s talents and potential, so that students can benefit from education, improve their lives and transform their societies. In accordance with international commitments and legal acts, the Bangladesh government has taken a positive initiative through the National Education Policy 2010 of the Ministry of Education. This policy highlights the improvement of the education system by including students with special needs in mainstream schools. Unfortunately, the policy places no emphasis on screening and assessment of students’ intellectual ability as a means of maintaining the standard of the education system. It should be mentioned that, according to the United Nations Development Programme (UNDP) report of 2011, Bangladesh ranks 163rd for literacy, with a literacy rate of 55.9%.

In advanced countries, the decision to place children in regular classrooms or special classes is made through a standardized, comprehensive individual assessment of each child’s needs. The use of such psychometric tests also helps teachers in educational planning by providing a way to determine possible teaching-learning strategies, which is regarded as a major initiative towards achieving the goal of education for all. As in many other low-income countries, in Bangladesh the national statistical agency, the Bangladesh Bureau of Statistics (BBS), has so far made no attempt to conduct a regular national disability prevalence survey. Education for children with special needs in low-income countries like Bangladesh began long ago with the introduction of special education. Over time, disability has come to be understood as a social issue rather than a medical one, and education systems have accordingly developed towards an integrated system and, more recently, an inclusive system, in accordance with local socio-economic and cultural conditions (Choudhuri et al., 2005).

The study Educating Children in Difficult Circumstances states that 8% of children with disabilities in Bangladesh were enrolled in various educational institutions (ESTEEM, 2002). Of these, 55% had physical disabilities, 13% were visually impaired, 12% were hearing and speech impaired and 10% had intellectual disabilities. About 68% of enrolled children with disabilities were in government and private primary schools and 15% were in pre-primary educational settings. About 48% were seeking formal education, 23% were in integrated schools, 15% in special education and 5% in inclusive education. Among enrolled children with mild and moderate disabilities, 79% were enrolled in formal educational settings. Of those with severe and profound disabilities, 83% were enrolled in special education. Nearly 74% of those not enrolled in any form of education expressed a keen interest in receiving education (ESTEEM, 2002).

These educational systems currently serve only a small number of children with special needs. The government’s Department of Social Services (DSS) operates 5 special schools for children with visual impairment, 7 for children with hearing impairment and 1 for children with intellectual disability. The DSS also operates a total of 64 integrated schools for blind children in 64 districts. NGOs operate many special and inclusive education centers, but no reliable data are available on the number of schools they operate (Choudhuri et al., 2005). Although overall school enrolment (80%) is increasing at a fast rate, the enrolment of children with disabilities is extremely low. Children with disabilities are often marginalized in mainstream schools as a result of negative attitudes towards them. A lack of child-centered approaches in education and the physical inaccessibility of schools are other reasons for low enrolment. In addition, some children with special needs are enrolled in the mainstream education system by default. Some of them transfer from integrated and/or special education systems (primarily visually impaired students), while a few make their way into the mainstream education system directly through their own initiative and interest. Moreover, more than a million primary school-age children with assorted disabilities and disadvantages have no access to basic education. The major shortcomings are due to the lack of educational reform, improper implementation of the existing education policy and ignorance among parents. Other causes of failure in school and low standards of achievement are the lack of proper assessment and the fact that counseling and guidance are not offered to students and parents before and during their education. Similarly, the high dropout rate after enrolment is due to improper use of teaching-learning strategies and other educational provisions.
Even the examination or evaluation system is not suitable for these students. The lack of support systems such as the IEP (Individual Education Plan) or extra sessions to help students cope with the mainstream curriculum is notable (Choudhuri et al., 2005). The lack of proper assessment of a student’s intellectual capability also significantly affects classroom performance and the retention of students through to school completion certificates. As a result, standardizing an appropriate and up-to-date assessment scale has become essential to address the problem of disability prevalence and the present state of quality education for students with special needs and other marginalized populations.

The Objectives of the Study

The study aimed to standardize the Stanford-Binet Intelligence Scale (Fifth Edition, 2003) in Bangla for use in Bangladesh. The specific objectives, as part of the standardization process, are as follows:

  • To translate and adapt the ten subtests of Stanford-Binet Intelligence Scale for children aged 6 to 16 years.
  • To determine the reliability and validity of the adapted versions.
  • To develop the norm for Bangladeshi children aged 6 to 16 years.

Following the description of the importance of assessing an individual’s intellectual ability in this chapter, chapter two reviews the literature, compiling historical studies on intelligence and its assessment along with international and national perspectives on the standardization of the SB5. Chapter three describes the methods and methodology involved in standardizing the test. Chapter four analyzes the results of the study in Bangladesh. Chapter five explains the rationale and justification of the research. Finally, chapter six discusses the conclusion, implications and further recommendations of the study, followed by the limitations of the study in the field.

Literature Review

This chapter reviews psychometric tests, historical studies on intelligence and its assessment, historical perspectives on intelligence test development, and the history of the Stanford-Binet and its various editions. It also gives an overview of international and national perspectives on the standardization of the Stanford-Binet Intelligence Scale, and describes the standardization process and cross-cultural assessment. The literature review serves as an entry point into the related field of knowledge and offers an opportunity to deepen the understanding needed to carry out a quality study.

Prior to the contributions of many theoretical and practicing psychologists in the early nineteen hundreds, the concept of intelligence as it is understood worldwide today was unknown. Thus, the change in focus began unfolding. From its initial pre-scientific and philosophical roots, the study of intelligence changed drastically (Meloff, 1987).

Review of Psychometric Tests

By the end of the 19th century, people attending scientific or industrial expositions were taking various tests that assessed their sensory and motor skills, the results of which were compared against norms (Anastasi & Urbina, 1997). One active area of scientific research is the testing of psychological characteristics, most commonly intelligence. Intelligence, and the ability to assess it, is considered an important concept in academic settings. Although many claim that intelligence is defined by what intelligence tests measure, many other theorists and researchers argue that this definition is too circular and narrow. Moreover, scores on intelligence tests are designed to reflect the definitions of intelligence rather than serve as an exact and unqualified representation of intellectual ability (Gardner, Kornhaber & Wake, 1996). Nevertheless, IQ tests are useful tools for various purposes. Psychometrics is also applied widely in educational assessment to measure abilities in domains such as reading, writing, and mathematics. The main approaches to applying tests in these domains have been Classical Test Theory and the more modern Item Response Theory (IRT) and Rasch measurement models (Kline, 1999). Such approaches provide powerful information regarding the nature of developmental growth within various domains.

In addition, college entrance exams, classroom tests, structured interviews, assessment centers, and driving tests are also psychological tests. Many popular psychological testing reference books also classify tests by subject. For example, the Seventeenth Mental Measurements Yearbook (Geisinger, Spies, Carlson, & Plake, 2007) classifies thousands of tests into 19 major subject categories, such as Achievement, Behavior assessment, Developmental, Education, English, Fine arts, Foreign languages, Intelligence, Mathematics, Miscellaneous (for example, courtship and marriage, driving and safety education, etiquette), Multiaptitude batteries, Personality, Neuropsychological, Reading, Science and Sensory-motor (Thompson, 2003). Although some are more typical than others, all meet the definition of a psychological test. Together, they convey the very different purposes of psychological tests. The following figure shows a continuum of some of the most and least commonly recognized types of psychological tests (Chun, Cobb, & French, 1975).

Historical Studies on Intelligence and Its Assessment

During the era of psychometrics, intelligence was thought to be a single, inherited entity. The human mind was believed by some to be a “blank slate” that could be educated and trained to learn anything if taught in the appropriate manner (Sternberg, 2000). Contrary to this notion, however, an increasing number of researchers and psychologists now believe that individuals are born with and possess different levels of ability. The development and use of intelligence tests has been one way in which researchers and psychologists have attempted to support this argument. While intelligence is one of the most talked about subjects within psychology, there is no standard definition of what exactly constitutes “intelligence.” Some researchers have suggested that intelligence is a single, general ability, while others believe that intelligence encompasses a range of aptitudes, skills and talents (Horn & Noll, 1994). The following are some of the major theories of intelligence that have emerged during the last 100 years.

Charles Spearman – General Intelligence.

British psychologist Spearman (1904) described a concept he referred to as general intelligence, or the g factor. After using a technique known as factor analysis to examine a number of mental aptitude tests, Spearman found that scores on these tests were remarkably similar. People who performed well on one cognitive test tended to perform well on other tests, while those who scored badly on one test tended to score badly on others. He concluded that intelligence is a general cognitive ability that can be measured and numerically expressed.

Louis L. Thurstone – Primary Mental Abilities.

Psychologist Thurstone (1938) offered a differing theory of intelligence. Instead of viewing intelligence as a single, general ability, Thurstone’s theory focused on seven different “primary mental abilities.” The abilities he described were verbal comprehension, reasoning, perceptual speed, numerical ability, word fluency, associative memory and spatial visualization.

Howard Gardner – Multiple Intelligences.

One of the more recent ideas to emerge is Howard Gardner’s theory of multiple intelligences. Instead of focusing on the analysis of test scores, Gardner proposed that numerical expressions of human intelligence are not a full and accurate depiction of people’s abilities. His theory describes eight distinct intelligences that are based on skills and abilities valued within different cultures. The eight intelligences Gardner described are Visual-spatial Intelligence, Verbal-linguistic Intelligence, Bodily-kinesthetic Intelligence, Logical-mathematical Intelligence, Interpersonal Intelligence, Musical Intelligence, Intrapersonal Intelligence and Naturalistic Intelligence (Gardner, 1983).

Robert Sternberg – Triarchic Theory of Intelligence.

Psychologist Robert Sternberg defined intelligence as “mental activity directed toward purposive adaptation to, selection and shaping of, real-world environments relevant to one’s life.” While he agreed with Gardner that intelligence is much broader than a single, general ability, he instead suggested some of Gardner’s intelligences are better viewed as individual talents. Sternberg proposed what he refers to as ‘successful intelligence,’ which is comprised of three different factors (Sternberg, 1985).

Human beings have long been fascinated by the noticeable differences in mental capacity among individuals in society. Ideas relating to intelligence remained a philosophical issue until the late nineteenth century, when psychologists began the systematic investigation of intelligence (Thompson, 1984). In 1996, Williams reviewed definitions of intelligence and concluded that most experts would accept a construct of goal-directed behaviors that are adaptable across environments. Drawing on expert opinion, he identified two themes common to the definitions. The first focused on the individual learning from experience and the second on the individual’s ability to adapt to the environment. Several similar studies (Chen, 2007; Hale & Jansen, 1994; Myerson, 2003) viewed processing speed and working memory capacity as the currently predominant integrative constructs for explaining g. Much of the difficulty in developing an adequate intelligence assessment tool lies in the lack of a consensus definition of what the concept actually represents. Before cognitive abilities can be assessed, those abilities must be operationally defined. François (1995) stated that in order to make use of what intelligence tests tell us, we must first understand what intelligence is. Over the years, views on the types of abilities believed to represent intelligence have taken numerous routes. Even the term intelligence itself has recently taken a back seat to a broader viewpoint involving various cognitive abilities.

Spearman, in 1904, put forth the concept of a ‘g’ factor, or an overall general intelligence, based on the positive correlations between cognitive tests (Duncan, Seitz, Kolodny, Bor, Herzog, Ahmed, Newell, & Emslie, 2000). He used a factor analysis of many cognitive measures in order to suggest that the main underlying component of these measures was an overall intelligence, or ‘g’ (Spearman, 1904; Duncan et al., 2000).
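
The idea that positively correlated tests share one dominant underlying component can be illustrated with a small numerical sketch. It is not Spearman’s original procedure: it approximates a single general factor by the first principal component of a correlation matrix, computed by power iteration. The correlation values and the function name are hypothetical and chosen only for illustration.

```python
import math

def first_component(corr, iterations=200):
    """Return (eigenvalue, loadings) of the dominant component via power iteration."""
    n = len(corr)
    v = [1.0 / math.sqrt(n)] * n  # start from a uniform unit vector
    eigenvalue = 0.0
    for _ in range(iterations):
        # Multiply the correlation matrix by the current vector.
        w = [sum(corr[i][j] * v[j] for j in range(n)) for i in range(n)]
        eigenvalue = math.sqrt(sum(x * x for x in w))  # norm converges to the top eigenvalue
        v = [x / eigenvalue for x in w]
    # Principal-component loadings: eigenvector scaled by sqrt(eigenvalue).
    loadings = [math.sqrt(eigenvalue) * x for x in v]
    return eigenvalue, loadings

# Hypothetical correlations among four cognitive tests (all positive).
CORR = [
    [1.0, 0.6, 0.5, 0.4],
    [0.6, 1.0, 0.5, 0.4],
    [0.5, 0.5, 1.0, 0.3],
    [0.4, 0.4, 0.3, 1.0],
]

eigenvalue, loadings = first_component(CORR)
share = eigenvalue / len(CORR)  # proportion of total variance on the first component
print("variance share:", round(share, 2))
print("loadings:", [round(x, 2) for x in loadings])
```

With an all-positive correlation matrix such as this one, every test loads positively on the first component, which is the pattern a general factor is meant to capture.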

In 2002, a study by Ken Richardson, “What IQ Tests Test”, discussed how human intelligence should be described, whether IQ tests actually measure it and, if they do not, what they actually measure. The study suggests that IQ scores can be described in terms of sociocognitive-affective factors that differentially prepare individuals for the cognitive, affective and performance demands of the test. The paper shows how such factors can explain the correlational evidence usually thought to validate IQ tests, including associations with educational attainment, occupational performance and elementary cognitive tasks, as well as the inter-correlations among the tests themselves.

Studies on Intelligence Test Development

The study of intelligence and its measurement traces its roots to physicians, educators and psychologists who were deeply involved with populations at the extremes of the intellectual continuum. Esquirol (1938) and Seguin (1907) were committed to the study of intellectually disabled individuals, and Galton (1884) was fascinated by the mental abilities of geniuses. The separate contributions of these pioneers have been profoundly felt in the field of intelligence testing. However, it was the innovative research of Binet (1903), who focused on the mental abilities of typical or average children at each age, that has had the longest-lasting and most direct effect on individual intelligence testing as we know it today (Anastasi, 1992).

Esquirol made several important contributions, most notably by distinguishing “between the idiots, whose intelligence does not develop beyond a very low level and the demented person” (Peterson, 1925). This distinction between intellectual disability and emotional disturbance was a vital breakthrough for assessment and indicates the primitive state of the art in the early nineteenth century. Esquirol also described a hierarchy of retardation (or feeble mindedness, as it was known in earlier times) with “idiots” occupying the bottom rung, followed by “imbeciles” and peaking with “morons” (Peterson, 1925). He was well ahead of his time in concluding that the use of language was the most dependable criterion for inferring a retarded individual’s intelligence level. Esquirol (1938) was also credited with developing a precursor of the mental age concept by pointing out that an idiot is incapable of acquiring the knowledge common to other persons of his own age (Anastasi, 1976). Seguin was heavily influenced in his work with mentally retarded individuals by Itard, of Wild Boy of Aveyron fame. Like Esquirol, Seguin (1907) tried to establish criteria for distinguishing between different levels of retardation, although he focused on sensory discrimination and motor control. Optimism regarding the treatment of retarded individuals characterized Seguin’s approach, and he instituted a comprehensive programme of sense training and muscle training techniques, many of which live on in present-day institutions for the mentally retarded (Anastasi, 1976, 1992).

Francis Galton (1884) channelled his enthusiasm for gifted men and genius, and for the study of the genetics of intelligence, into the development of what was apparently the first comprehensive individual intelligence test. Believing that intelligence must be intimately related to sensory abilities because environmental knowledge comes to us via the senses, he developed a series of tests such as weight discrimination, reaction time, visual discrimination, steadiness of hand, keenness of sight and strength of squeeze. His empirical justification for this test battery came from comparisons between gifted and retarded individuals that, not surprisingly, showed obvious superiority in favour of the gifted (Peterson, 1925). Galton’s influence spread far beyond his laboratory as “Galton type tests” were developed throughout Europe and the United States. Cattell (1890) coined the term “mental tests”; Galton’s influence was clearly evident in Cattell’s 40-60 minute individual examination, which covered after-images, colour vision, sensitivity to pain and the like (Peterson, 1925). Cattell elaborated on and improved his mentor’s methodology by emphasizing the vital notion that administration procedures must be standardized to obtain results that are strictly comparable from person to person and from time to time (Huq, 1992).

Later, the Galton view of sensory and motor intelligence was challenged by Alfred Binet of France. In collaboration with Simon and Henri (1895), Binet conducted numerous investigations of complex mental tasks, rejecting the Galton notion that performance on simple, elementary sensory discrimination and motor co-ordination tasks equates to intelligent behavior. According to Cattell (1976) and Horn & Noll (1997), Stella Sharp (1899) directly compared sensory-discrimination tests with tests of complex mental functions and concluded that the simplest mental processes yielded comparatively unimportant information, whereas the tests of Binet and Henri showed much value in assessing “individual psychical differences.” Even though the initial reaction to the two studies was predominantly anti-testing, causing a lack of enthusiasm for the Galton-Cattell as well as the Binet-Henri approach in the United States, the methodology of Binet eventually triumphed, first throughout Europe and finally in America (Peterson, 1925).

Interestingly, research by Jensen (1979) and his student Vernon (1981) has revived the early work of Galton to some extent. Although they confirmed that simple reaction time measures contribute little to variation in intellectual function, these researchers found a substantial relationship between intelligence and complex reaction time over repeated trials of the same task. Thus adaptations of Galton’s work might yet have an impact on objective intellectual assessment in future (Huq, 1992).

History of the Stanford-Binet and Its Various Editions

The most revolutionary contribution of all the theorists of their time was that of Alfred Binet and his young associate Theodore Simon. In 1905, they developed a useful tool to assess general intelligence, which is widely cited as the first major breakthrough in intelligence testing (Roid, 2003).

Early Work of Binet.

As a member of a French governmental commission working on mental retardation, Binet developed a practical test, sensitive to different levels of cognitive development, which could be given during a clinical interview. Binet’s early work on intelligence testing began when he collaborated with Victor Henri to outline a project for the development of a series of mental tasks to measure individual differences (Binet & Henri, 1895). The tasks were designed to differentiate a number of complex mental faculties, including memory, imagery, imagination, attention, comprehension, aesthetic sentiment, moral sentiment, muscular strength, motor ability and hand-eye coordination.

The 1905 Binet – Simon Scale in France.

Binet took the leading role in devising a useful and reliable diagnostic system for identifying children with mental retardation. Binet’s project culminated in the publication of the first practical intelligence test with physician Theodore Simon (Binet & Simon, 1905). Binet sought to make the 1905 scale efficient and practical: “We have aimed to make all our tests simple, rapid, convenient, precise, heterogeneous, holding the subject in continued contact with the experimenter, and bearing principally upon the faculty of judgment” (Binet & Simon, 1916). The scale consisted of 30 items, which were scored on a pass-fail basis. The items presented various word problems, paper-cutting tasks, repeating sentences and digits, and comparing blocks to put them in order by weight (Wolf, 1973).

The 1905 scale included several important innovations that would be used in subsequent measures of intelligence. Items were ranked in order of difficulty and accompanied by careful instructions for administration. Binet and Simon also utilized the concept of age-graded norms (Wolf, 1973). The use of age-graded items allowed the scale to estimate mental age by the pattern of correct answers. The 1905 Binet-Simon Scale was revised in 1908 (Binet & Simon, 1908) and again in 1911. By the completion of the 1911 edition, Binet had extended the scales through adulthood and balanced them with five items at each age level. The scales included procedures for assessing language, auditory and visual processing, learning, memory, judgment and problem solving (Roid, 2003).

Figure 1. History of the Stanford-Binet.

Terman’s 1916 Stanford Revision in America.

Realizing the theoretical and practical value of Binet’s work, Terman (1911) of Stanford University began to adapt the test to American culture. Within a few years, the improved scale was published as the Stanford Revision and Extension of the Binet-Simon Scale. Terman’s 1916 revision retained Binet’s concept of intelligence as a complex mixture of abilities and is the only revision that has remained in publication to the present day. The standardization that Terman accomplished was quite rigorous for the early 1900s and increased the scale’s technical quality (Roid, 2003).

Revisions of the Terman Scales in 1937, 1960 and 1972.

Within 20 years of its release in 1916, the Stanford revision emerged as the most widely used test of intellectual ability in America. The scale was translated into several languages and used internationally. In subsequent years, Terman continued to experiment with easier and more difficult items to extend the measurement scale downward and upward and to increase the age range by including more standardization samples. The new edition was called the New Revised Stanford-Binet Tests of Intelligence (Terman & Merrill, 1937).

The 1937 revision was standardized on 3,200 examinees aged 1 year 6 months to 18 years. Terman made efforts to include a broader representation of geographic regions and socioeconomic levels in the normative sample. Two alternative forms, Form L and Form M, were included. Improvements over the 1916 edition included greater coverage of nonverbal abilities, less emphasis on recall memory, an extended range of the scale at the lower and upper ends, and more objective scoring methods (Terman & Merrill, 1937). As happens with any widely used test of ability or achievement, some items became obsolete, and Terman and Merrill considered them for further revision on the basis of the information and data accumulated since 1937. Thus, the Stanford-Binet Intelligence Scale, Third Revision, was published in 1960. Several new features were included in the third revision, such as the use of the deviation IQ (a standardized normative mean of 100 and SD of 16) and the combination of the two forms into Form L-M, which kept the 142 most discriminating items from the 1937 revision.
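
The deviation IQ introduced in the 1960 revision can be illustrated with a short sketch. It assumes the conventional definition, in which a raw score is first expressed as a z-score relative to the examinee’s age group and then rescaled to a mean of 100 and an SD of 16. The function name and the age-group figures are hypothetical and are not values from the Stanford-Binet norms.

```python
def deviation_iq(raw_score, group_mean, group_sd, iq_mean=100.0, iq_sd=16.0):
    """Rescale a raw score to a deviation IQ (mean 100, SD 16, as in the 1960 revision)."""
    if group_sd <= 0:
        raise ValueError("group_sd must be positive")
    # Distance of the raw score from the age-group mean, in SD units.
    z = (raw_score - group_mean) / group_sd
    return iq_mean + iq_sd * z

# Hypothetical age-group norms: mean raw score 40, SD 8.
print(deviation_iq(48, 40, 8))  # one SD above the group mean -> 116.0
print(deviation_iq(40, 40, 8))  # exactly at the group mean -> 100.0
```

Because the score is defined relative to the age group, the same IQ value has the same meaning at every age, which is the main advantage of the deviation IQ over the earlier ratio of mental age to chronological age.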

After Maud Merrill retired, Robert L. Thorndike of Columbia University was asked to lead a project to collect new norms for the third edition. Thus, the same edition was reprinted with new normative tables, an update of Form L-M (Terman & Merrill, 1973). Because the Cognitive Abilities Test (CogAT; Thorndike & Hagen, 1994) was being standardized at the same time as the 1972 renorming of the Stanford-Binet, Thorndike selected subjects, and some siblings of subjects, tested on the CogAT to compose the new norm sample. The stratification variables used for the sample (e.g., age, geographic region, ethnicity, and community size) were similar to those used today, as were the levels of ability on the verbal portion of the CogAT. The items in the test remained essentially the same as in the 1960 revision, with two minor exceptions.

The 1986 Edition by Thorndike, Hagen and Sattler.

In 1986, Thorndike and his associates gave the test a new appearance and structure. The Stanford-Binet Intelligence Scale, Fourth Edition (SB4) was based on a four-factor, hierarchical model with general ability (g) on a bell curve score (Thorndike, Hagen, & Sattler, 1986). The four cognitive factors were Verbal Reasoning, Abstract/Visual Reasoning, Quantitative Reasoning and Short-Term Memory. The most significant change from previous editions, however, was the use of point scales for all subtests rather than the developmental age levels used in previous forms. Vocabulary was retained as a routing test, allowing the test to be tailored to the examinee’s verbal ability. Many classic Stanford-Binet tasks were also retained, including absurdities, vocabulary, matrices, quantitative reasoning and memory for sentences, tasks that are also included in the SB5. Composite and profile scores for each subtest permit a comprehensive examination of strengths and weaknesses among abilities within general intelligence (Roid, 2003).

Stanford Binet Intelligence Scale Fifth Edition (SB5) by Gale H. Roid.

Development of the SB5 is heavily based on the Cattell-Horn-Carroll (CHC) theory of intellectual abilities. In continuity with past editions such as the SB4, five key factors of CHC theory were selected for the development of the SB5. In 1995, Gale H. Roid, the author of the SB5, began work on a new revision, which was published as the fifth edition in 2003. It was normed on a sample of 4,800 subjects whose ages ranged from 2 to 85 years. The Fifth Edition includes extensive high-end items designed to measure the highest level of gifted performance. It also includes improved low-end items for better measurement of young children and of low-functioning individuals with intellectual disability. Furthermore, the inclusion of age-graded norms in the SB5 provides a unique criterion for the estimation of mental age (Roid, 2003a; Thorndike, Hagen, & Sattler, 1986).

Composition of the SB5.

The SB5 design crosses the five factors with the two domains resulting in ten (5×2) subtests. Based on the literature such as manuals of SB5 by Roid (2003), the factors, domains and subtests are reviewed below.
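
The 5×2 crossing can be sketched in a few lines, using the factor and domain names given in this section. The combined labels are generated only to illustrate the design and should not be read as the official subtest titles of the published test.

```python
from itertools import product

# Factor and domain names as listed in this section.
FACTORS = [
    "Fluid Reasoning",
    "Knowledge",
    "Quantitative Reasoning",
    "Visual-Spatial Processing",
    "Working Memory",
]
DOMAINS = ["Nonverbal", "Verbal"]

def subtest_grid(factors=FACTORS, domains=DOMAINS):
    """Cross every factor with every domain to label the cells of the design."""
    return [f"{domain} {factor}" for factor, domain in product(factors, domains)]

subtests = subtest_grid()
print(len(subtests))  # 5 factors x 2 domains -> 10
for label in subtests:
    print(label)
```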


Factors are the important dimensions of cognitive ability that are measured by the items and subtests of SB5. The factors measured in the SB5 are:  Fluid Reasoning (FR), Knowledge (KN), Quantitative Reasoning (QR), Visual-Spatial Processing (VSP) and Working Memory (WM). These factors, the central components of SB5, are discussed below.

Fluid Reasoning (FR)

Fluid Reasoning, as defined by Roid (2003b), is “the ability to solve verbal and nonverbal problems using inductive or deductive reasoning.” The inductive reasoning component requires the individual to derive the general whole from its specific parts. Likewise, the deductive reasoning component requires the individual to draw a conclusion, implication, or specific example from a general piece of information about the topic.

Knowledge (KN)

According to Roid (2003b), knowledge “is a person’s accumulated fund of general information acquired at home, school, or work.” This construct is often referred to as crystallized intelligence, as it involves learned material that has been stored in long-term memory. It also draws on perception of detail, attention, concentration, knowledge of geography and science, and inference skills.

Quantitative Reasoning (QR)

Roid (2003b) defines Quantitative Reasoning as “an individual’s facility with numbers and numerical problem solving, whether with word problems or with pictured relationships” (p. 136). The Quantitative Reasoning items on the SB5 target problem-solving abilities as opposed to rote mathematical knowledge. As the subtests progress, items become more complex.

Visual-Spatial Processing (VSP)

Visual-Spatial Processing, as defined by Roid (2003b), “measures an individual’s ability to see patterns, relationships, spatial orientations, or the gestalt whole among diverse pieces of a visual display” (p. 137). The items of this factor assess the individual’s ability to move pieces and shapes to form a proper whole. All levels within this area address visual construction abilities (Roid, 2003b).

Working Memory (WM)

Roid (2003b) defines Working Memory as “a class of memory processes in which diverse information stored in short-term memory is inspected, sorted, or transformed” (p. 137). The individual must filter out the irrelevant information and maintain focus on the pertinent. Furthermore, the information must be manipulated, which places memory, organizational, and visual-spatial demands on the individual.


A domain represents the degree to which a class of items requires the use of language skills, particularly in generating a response to an item. The SB5 contains two domain composites: the Nonverbal and Verbal domains. Assessors should bear in mind that “nonverbal” and “verbal” are relative and comparative terms in the SB5. The two domains are discussed below.

Nonverbal Domain

This domain requires little or no vocal response or speech and thus has lower language demands. The nonverbal tasks involve only a small degree of examiner-spoken directions.

Verbal Domain

This domain requires some degree of expressive language, often as simple as a word or phrase or a degree of reading for the average and high functioning students.

Consideration of the difference between the nonverbal and verbal domains has become increasingly important as society has become more culturally and linguistically diverse.
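The crossing of the five factors with the two domains can be illustrated with a short sketch. The Python snippet below is purely illustrative: the constant and function names are assumptions introduced here and are not part of the SB5 materials. It enumerates the 5×2 design and confirms that it yields the ten subtests described in the following paragraphs.

```python
from itertools import product

# Domain and factor names as given in the text above.
DOMAINS = ["Nonverbal", "Verbal"]
FACTORS = [
    "Fluid Reasoning",
    "Knowledge",
    "Quantitative Reasoning",
    "Visual-Spatial Processing",
    "Working Memory",
]


def subtest_names():
    """Cross every domain with every factor (hypothetical helper)."""
    return [f"{domain} {factor}" for domain, factor in product(DOMAINS, FACTORS)]


subtests = subtest_names()
for name in subtests:
    print(name)
```

Each printed name corresponds to one subtest, beginning with Nonverbal Fluid Reasoning and ending with Verbal Working Memory.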

Nonverbal Fluid Reasoning (NFR)

The Fluid Reasoning subtests within the nonverbal domain are Object-Series and Matrices. Initially, the individual is required to match objects. These objects are then placed into a series, either repetitive or not, that the individual must complete. The last phase is similar to the classic matrix-reasoning measures that are common in intelligence testing (Roid, 2003b).

Nonverbal Knowledge (NK)

The Knowledge subtests within the nonverbal domain include procedural knowledge and picture absurdities. At the lowest end of the spectrum, the subject is required to communicate basic human needs using gesture. As the task demands increase, the subject is presented with impossible pictures in which he is required to point out what is odd or impossible about the scene. The Nonverbal Knowledge tasks tax an individual’s basic level of common knowledge about natural phenomena (Roid, 2003b).

Nonverbal Quantitative Reasoning (NQR)

The Quantitative Reasoning subtests within the nonverbal domain have been carried over from the SB4. However, the focus of the subtests from the SB5 is on the reasoning behind the mathematical concepts, as opposed to the rote solving of mathematical items. In order to succeed on the higher level tasks, the subject must use problem solving strategies, persistence, and cognitive flexibility (Roid, 2003b).

Nonverbal Visual Spatial Processing (NVSP)

The Visual-Spatial Processing subtests within the nonverbal domain incorporate the form board activity from the SB4. However, tasks have been added in order to expand the evaluation of Nonverbal Visual-Spatial Processing activities. Initially, shapes are matched and then inserted into forms. As the individual progresses, accurate duplication of patterns using the provided shapes is targeted (Roid, 2003b).

Nonverbal Working Memory (NWM)

The Working Memory subtests within the nonverbal domain begin by assessing the individual’s ability to hold fundamental, observable objects in short term memory and progress into a rote memory block tapping task. However, towards the higher end of the subtests, the information presented becomes less concrete and more complex (Roid, 2003b).

Verbal Fluid Reasoning (VFR)

The Fluid Reasoning subtests within the verbal domain measure reasoning, absurdities and analogies. The individual is required to sort, to identify what is absurd or impossible about verbally presented sentences and pictures, and to make generalizations about the information provided (Roid, 2003b).

Verbal Knowledge (VK)

The Knowledge subtest within the verbal domain is Vocabulary. At the lower levels, the subject is required to identify several objects and respond to picture vocabulary items. As the difficulty level increases, the subject must clearly define vocabulary words. At the upper levels, performance on this subtest is influenced by schooling (Roid, 2003b).

Verbal Quantitative Reasoning (VQR)

The Quantitative Reasoning subtests within the verbal domain measure an individual’s ability to use a variety of mathematical skills. The subtests assess the individual’s basic addition and subtraction skills, geometric and measurement skills, and ability to complete word problems involving multiplication at the higher difficulty levels (Roid, 2003b).

Verbal Visual Spatial Processing (VVSP)

The Visual-Spatial Processing subtests within the verbal domain assess the individual’s ability to understand spatial concepts and relationships. The lower levels of the test include terms such as “ahead” and “behind,” and do not rely heavily upon expressive vocabulary. However, as the task demands increase, expressive vocabulary is needed to explain the complex relationships between geographic information (Roid, 2003b).

Figure 2. Organization of the SB5.

Verbal Working Memory (VWM)

The Working Memory testlets within the verbal domain begin with Memory for Sentences, which has long been a component of the Binet scales. As the subtests increase in difficulty, the individual is required not only to retain bits of information in working memory, but to manipulate these bits as well. Oftentimes, individuals are able to complete the rote memory sections but encounter difficulty when information manipulation is required (Roid, 2003b).

As the foregoing discussion shows, the ten subtests described above are the basic keys for judging an individual’s overall intellectual ability with an intelligence scale such as the SB5. The author has therefore treated these subtests as the major variables in her study on the standardization of the SB5 for use in Bangladesh.

Changes from the Previous Editions.

The Stanford-Binet has a long tradition, beginning with Terman’s 1916 American revision, called the Stanford Revision and Extension of the Binet-Simon Intelligence Scale (Binet & Simon, 1908). Through various editions in 1937, 1960, and 1986, the Stanford-Binet has become widely known as a standard measure of intellectual abilities. The SB5 blends the use of routing subtests in the point-scale format of the 1986 edition with the functional-level design of the 1916 to 1960 editions (Roid, 2003). Moreover, modern Item Response Theory (IRT) provides a strong psychometric foundation for the routing subtest and functional-level design (Rasch, 1980; Wright & Linacre, 1999). Test design for the SB5 employed many of the “new rules of measurement,” based on IRT, recognized by psychometric experts (Embretson, 1996; Embretson & Hershberger, 1999; Reckase, 1996). These new measurement rules include methods such as calibrating items in an extensive item pool and adaptive testing through the use of routing subtests. By adapting the test, the routing procedure of the SB5 increases the precision of measurement by tailoring the level of item difficulty to the examinee’s level of cognitive functioning. Traditionally, routing has been a unique feature of the Stanford-Binet scales. Many of the familiar subtests of previous editions remain in the SB5. Examples include Picture Absurdities, Matrices, Vocabulary, Memory for Sentences, Quantitative Reasoning, and Verbal Absurdities. The use of a hierarchical model of intelligence (with a global g factor and multiple factors at a second level, as in Figure 3), established in the Stanford-Binet Intelligence Scale: Fourth Edition (SB4) (Thorndike, Hagen, & Sattler, 1986), is repeated in the SB5. A few classic items, such as those in Picture Absurdities, have been included in the new edition to provide consistency across editions.

Changes from the Fourth Edition include a general modernization of artwork and item content as well as the following enhancements (Roid, 2003).

Additional factor.

The SB5 includes five factors (Fluid Reasoning, Knowledge, Quantitative Reasoning, Visual-Spatial Processing, and Working Memory) instead of the four factors in the SB4.

Child-friendly materials.

Responding to many user requests, the SB5 brings back many of the toys and colorful manipulatives that are engaging for small children and helpful for early-childhood assessment.

Enhanced nonverbal content.

One half of the subtests in the new edition employ a nonverbal mode of testing, requiring no, or minimal, verbal responses from the examinee. Unique to the SB5, compared to other intelligence batteries, is that the Nonverbal IQ covers all five major cognitive factors.

Increased Breadth of the Scale.

New items to measure very low functioning and very high giftedness have extended the scales upward and downward to provide a wider range of assessment. For example, Object Series items were added to the lower end of Matrices to provide an exceptional floor for the routing tests.

Enhanced usefulness of the test.

The types of items, scores, and factors for the SB5 have been designed to facilitate clinical use of the SB5. The contrasts between verbal and nonverbal facets of each of the five factors, the Abbreviated and Nonverbal forms of the test, and the Working Memory subtests enhance the interpretations and applications of the test in clinical, school, and occupational settings. Based on the description of changes from earlier editions, the unique features of the SB5 are as follows (Maddox, 2003):

  • Wide variety of items requiring nonverbal performance by examinee – ideal for assessing subjects with deafness or communication disorders.
  • Ability to compare verbal and nonverbal performance – useful in evaluating learning disabilities.
  • Greater diagnostic and clinical relevance of tasks, such as verbal and nonverbal assessment of working memory.
  • Extensive high-end items, many adapted from previous Stanford-Binet editions and designed to measure the highest level of gifted performance.
  • Improved low-end items for better measurement of young children, low functioning older children or adults with intellectual disability.
  • Co-normed with measures of visual-motor perception and test-taking behavior.
  • Enhanced artwork and manipulatives that are both colorful and child-friendly.

The Standardization of (Original) 2003 Edition (SB5)

The ten subtests, five nonverbal and five verbal, provide measures of the five CHC factors in the SB5: Fluid Reasoning, Knowledge, Quantitative Reasoning, Visual-Spatial Processing and Working Memory. Out of nearly 1000 items from the pilot and tryout phases of the project, approximately 375 items were employed in the Standardization Edition of the SB5. The final published version separated the nonverbal and verbal subtests into separate easel books, whereas the longer Standardization Edition had a mixture of nonverbal and verbal subtests in each functional level of the test. Very close statistical equivalence for the two versions (longer Standardization Edition and shorter final version) was demonstrated, and no significant context or order effects were observed between the two versions (Roid, 2003).

Psychometric Properties of (Original) SB5 for Standardization

Extensive studies of reliability, validity, and fairness were conducted as part of the SB5 standardization.

Item Analysis.

The items from all Stanford-Binet editions were rated by experts in the Cattell-Horn-Carroll (CHC) theory of intellectual abilities during the first year of the development of the SB5 (Carroll, 1993; Cattell, 1963; Evans, Floyd, McGrew, & Leforgee, 2001; Horn, 1994). The experts noted the CHC factor or factors being measured by each item, and all items were classified into comprehensive lists for each factor. These lists proved valuable in creating early versions of new items and new subtests. Factor analyses of Forms L and M of the Stanford-Binet Intelligence Scale (Terman & Merrill, 1937) and the Stanford-Binet Intelligence Scale: Fourth Edition (Thorndike et al., 1986) further verified the items and subtests most central to each of the factors. Extensive item analyses, including classical and item response theory methods, were conducted on SB5 items. Item analyses, subtest scaling analyses, reliability studies, and item factor analyses were conducted using pilot, tryout edition, and standardization edition studies. The final selection of items for the standardization edition involved many sources of information, item analyses, and the comparative merit of items.


The sample was nationally representative and matched to percentages of the stratification variables identified in U.S. Census Bureau (2001) publications. The stratification variables were age, sex, race/ethnicity, geographic region and socioeconomic level; several of these are defined below.


Age.

For stratified sampling purposes, 30 age groups were defined. Age was defined by subtracting the birth date from the testing date, with months of age treated as 30 days.
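The age definition above can be sketched in code. The Python function below is a minimal illustration under stated assumptions: it subtracts the birth date from the testing date field by field and borrows 30 days from the month column (and 12 months from the year column) when needed, which is a convention commonly used for hand computation on test record forms. The exact procedure in the SB5 manual is not reproduced in the text, so the borrowing rule and the function name are assumptions.

```python
from datetime import date
from typing import Tuple


def chronological_age(birth: date, test: date) -> Tuple[int, int, int]:
    """Return age as (years, months, days), treating every month as 30 days.

    Assumption: field-by-field subtraction with borrowing, as in hand scoring.
    """
    if test < birth:
        raise ValueError("testing date precedes birth date")
    years = test.year - birth.year
    months = test.month - birth.month
    days = test.day - birth.day
    if days < 0:        # borrow one month, counted as 30 days
        days += 30
        months -= 1
    if months < 0:      # borrow one year, counted as 12 months
        months += 12
        years -= 1
    return years, months, days


# Hypothetical examinee born 25 August 2010 and tested 10 March 2020.
age = chronological_age(date(2010, 8, 25), date(2020, 3, 10))
print(age)  # (9, 6, 15)
```

An examinee's age group would then be looked up from the years and months components.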


Sex.

Either examinees or their parents or guardians identified the sex of the examinee on the required consent form. Examiners verified sex by interview (and by markings on the SB5 Record Form) if this information was missing or unclear. An approximate 50% split between female and male examinees was targeted at all age levels except the elderly, where census studies clearly show a larger percentage of females.

Geographic region.

The four U.S. geographic regions in the census (Northeast, Midwest, South, and West) were employed in stratifying the normative sample. The home or usual residence of the examinee defined the region, not the school or agency where testing was conducted.

Socioeconomic level.

As with numerous other published instruments in psychology and education, educational attainment was employed as the indicator of socioeconomic level. The other popular indicators of socioeconomic standing, occupation and income, were judged to be problematic. Although occupational information was collected for the SB5, it is by nature a complex description of the jobs of the parents or guardians that would then have to be categorized by various scales of occupational level, a time-consuming and fairly subjective process (U.S. Bureau of the Census, 2000).


Reliability.

Scores obtained from tests of intellectual ability such as the SB5 must be as precise as possible, given that they are used for life-changing decisions of treatment, placement, or classification. However, the concept that all test scores have some degree of measurement error is critically important to the ethical use of tests (Turner et al., 2001). Measurement error is evaluated by examining the reliability of each test score. The reliability of a test score refers to its precision in measuring the true attributes of a person and its consistency across sets of items, multiple testing occasions and other conditions that affect score stability. Reliability for SB5 scores includes internal consistency, test-retest stability and errors of measurement. Internal-consistency reliability ranged from 0.95 to 0.98 for IQ scores and from 0.90 to 0.92 for the five factor index scores. For the 10 subtests, average reliabilities (across age groups) ranged from 0.84 to 0.89, providing a strong basis for profile interpretation. Test-retest reliability studies were also conducted and showed the stability and consistency of SB5 scoring (Roid, 2003).
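The link between a reliability coefficient and the error of measurement can be made concrete with the classical test theory formula SEM = SD × √(1 − r). The sketch below is illustrative only: it assumes an IQ scale with a standard deviation of 15, uses the reliability range quoted above, and the function names are introduced here for illustration; it does not reproduce the SEM tables of the SB5 manual.

```python
import math


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Classical test theory: SEM = SD * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


def confidence_band(score: float, sd: float, reliability: float, z: float = 1.96):
    """Approximate 95% band around an observed score (z = 1.96 assumed)."""
    sem = standard_error_of_measurement(sd, reliability)
    return score - z * sem, score + z * sem


IQ_SD = 15.0  # assumed standard deviation of the IQ scale
for r in (0.95, 0.98):
    sem = standard_error_of_measurement(IQ_SD, r)
    low, high = confidence_band(100.0, IQ_SD, r)
    print(f"r = {r:.2f}: SEM = {sem:.2f}, band for IQ 100 = {low:.1f} to {high:.1f}")
```

Under these assumptions, a reliability of 0.98 corresponds to an SEM of about 2.1 IQ points and a reliability of 0.95 to about 3.4 points, which shows why high reliability matters when scores inform placement decisions.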


Validity.

Validity has numerous features and is established by the presentation of content-related, criterion-related and construct-related evidence. Criterion-related validity is assessed by correlating measures with a criterion measure known to be valid. Evidence for the content and criterion-related validity of the SB5 was collected. Evidence of validity, including correlations with other assessment batteries, was computed for the standardization of the original SB5. The correlations reported are quite substantial and similar in magnitude to the concurrent correlations observed for other major intelligence batteries (Roid, 2003). The research related to the foundation of the five key factors of the SB5 is reviewed below.

Research Related to the Stanford-Binet Intelligence Scale (SB5)

The landmark research of Carroll (1993), based on 461 factor studies of intelligence, resulted in an integrated theory of intellectual ability which was regarded as the leading research-based model of intelligence. The integration of Carroll’s work with previous research has led to the new Cattell-Horn-Carroll (CHC) theory of intellectual abilities (Flanagan, 2000; Evans et al., 2001). As a result, the selection of the CHC model allows the SB5 and its users to benefit from more than 60 years of accumulated research and clinical experience in the assessment and interpretation of intellectual abilities. Studies on the early Stanford-Binet Forms L and M showed that the CHC factors were clearly recognizable in the early editions of the Binet scales (Woodcock & McGrew, 1997), adding an even greater degree of historical and clinical meaningfulness to the CHC model. Figure 3 shows the CHC model with the five factors of the SB5 displayed in the middle row below general ability (g).

Five Factor Model.

The SB5 includes five factors from the CHC model (as shown in Figure 3). The importance of these five factors emerged from an extensive review of the literature on intellectual assessment and extensive discussions with experts in giftedness, special education, pre-school assessment and adult-clinical disorders.

The Structure of Cognitive Abilities.

Higher Order Factors in Studies of Intelligence and Cognition.

Hierarchical studies of intelligence and cognition originally grew out of Spearman’s (1927) model of general intelligence, which he labeled g. More recent descriptions of hierarchical models appear in Carroll (1993), who proposed a three-stratum model with numerous specific factors in stratum one, eight factors in stratum two and ‘g’ in stratum three. Carroll’s eight factors include those listed in the CHC model (Figure 3), except that he placed quantitative reasoning as part of fluid reasoning and reading and writing ability as part of crystallized knowledge. He also stated that he accepted the basic features of Spearman’s concept of g and the enhancements developed by Spearman’s colleague Holzinger (1936). This later Spearman-Holzinger model was similar to Carroll’s hierarchical three-stratum theory, except that it included only the top two strata: the specific group factors and the g factor (Roid, 2003).

Thurstone (1938) identified seven primary mental abilities: verbal, word fluency, number facility, spatial visualization, reasoning, memory and perceptual speed. Carroll (1993) indicated that the modern three-stratum model was a direct outgrowth of Thurstone’s (1947) method of successive factorization of correlation matrices at higher orders. In a multidimensional scaling reanalysis of Thurstone’s data conducted by Snow, Kyllonen, and Marshalek (1984), three superordinate clusters of tests, namely verbal, spatial, and quantitative, were identified (Roid, 2003). In another hierarchical model, Vernon (1961) defined a superordinate g factor and two lower order factors called v:ed (verbal-educational ability) and k:m (mechanical-spatial ability). The v:ed factor subdivides into verbal and numerical ability, while k:m subdivides into space ability, manual ability and mechanical information. Carroll (1993) noted that Vernon’s model was valuable in confirming a hierarchical g factor, but was oversimplified in claiming only two lower order factors (Roid, 2003).

Cattell (1943) developed the initial fluid and crystallized model of intelligence. Cattell considered fluid intelligence to consist of deductive and inductive reasoning and the ability to solve novel problems. Crystallized intelligence involved the processing of accumulated knowledge due to acculturation, schooling, language development and general ability to reason with stored information and methods. Horn (1965) confirmed the fluid-crystallized distinction, but added other factors now identified as visual-spatial ability, short-term memory, processing speed, and long-term retrieval (Horn & Cattell, 1966). Quantitative reasoning was identified by Horn (1989) and in the cross-battery factor analyses of Woodcock (1990).

Independent of other investigators, Gustafsson (1984) proposed a three-level model of intelligence. At the highest level is ‘g’ (general intelligence) and at the next level are three broad factors. These factors are labeled crystallized intelligence (dealing with verbal information), fluid intelligence (ability to solve novel problems), and general visualization (dealing with figural information). In Gustafsson’s data, fluid intelligence showed an extremely high relationship to the higher order ‘g’ factor, suggesting that fluid reasoning is at the core of general intelligence. At the third level are the primary factors: verbal and numerical achievement fall within crystallized intelligence; speed of closure, figural relations, induction and memory span fall within fluid intelligence; and visualization, spatial orientation and flexibility of closure fall within general visualization. Gustafsson and Undheim (1992) replicated these findings with 12- and 15-year-olds. They reported a general intelligence factor with residual factors representing crystallized intelligence (read as verbal) and general visualization (read as figural or nonverbal). The consistency between Gustafsson’s model and the major factors in CHC theory and the SB5 is striking (Roid, 2003). The varied independent investigations of Spearman-Holzinger, Carroll, Thurstone, Cattell-Horn and Gustafsson converged on the types of factors found in the CHC model. In all of these studies, prominent verbal and visual factors emerge as important along with quantitative, memory, and reasoning factors, providing indirect support for the SB5 verbal-nonverbal dichotomy and the five-factor model (Roid, 2003).

Historical Antecedents of the Nonverbal-Verbal Domain of the SB5.

Alfred Binet was clearly aware that intellectual behavior could occur without language-based determinants. As his development of intelligence measures evolved, Binet (1903) became aware of the fact that intelligence may be formulated through thinking without images and thinking without words:

The images, the interior language, and the acts are the conscious forms of the thought; they are its light; they render the thought visible to us, they reveal its details to us…. But they come only after the thought, they are its results; before the images, before the words, the thought is understood, it is performed. . . . We believe that we have established beyond any doubt, by precise observations, that there is thought without images, that there is thought without words, and that thought is formed by an intellectual feeling (Binet & Simon, 1908, pp. 338-339).

Terman and Merrill (1937) sought to increase the number of nonverbal procedures at the lower levels of the Stanford-Binet, motivated by concerns with the verbal nature of the lower items. Several subsequent attempts have been made to create nonverbal scales for the Stanford-Binet. In the first attempt, McNemar (1942) created two 20-item parallel nonverbal intelligence scales from items in the Stanford-Binet Forms L and M, although he noted some limitations: “Since the directions for these items are mainly verbal rather than pantomime, it follows that some understanding of language is involved and consequently that the items are not to be regarded as purely non-verbal” (cited in Roid, 2003). In a more recent attempt to develop a nonverbal short form, Glaub and Kamphaus (1991) selected subtests from the Fourth Edition for use with children who are hard of hearing, speech and language impaired, or limited in English proficiency. This nonverbal short form consisted of Bead Memory, Pattern Analysis, Copying, Memory for Objects, and Matrices.

More recent advances in test development suggest that multidimensional tests may be developed that are largely nonverbal (Roid & Miller, 1997; Roid & Haladyna, 1982). These tests usually involve pantomime administration, content and structural studies of the degree to which item performance is mediated through verbal or nonverbal means, and varied item response modes (e.g., pointing to a stimulus book, placing cards in an appropriate arrangement, building three-dimensional constructions). The SB5 development team (Roid, 2003) studied these nonverbal innovations and other previous methods of verbal and nonverbal assessment. Accordingly, it was possible to construct verbal and nonverbal Stanford-Binet scales for each of the five factors assessed in the SB5.

Cross-Cultural Assessment and Standardization Process – An Overview

During the past several decades, the unique challenges of cross-cultural assessment and counseling have attracted considerable attention. Cross-cultural assessment has become a sensitive issue due to specific concerns regarding the use of standardized tests across cultures (Chang, 2008). Before selecting an assessment instrument for use in counseling or research, counselors and researchers are trained to verify that the test is appropriate for use with their population. In order to assess overall performance, most psychological tests employ a standardization process. It allows the test developer to create a normal distribution which can be used for comparison of any specific future test score. The term standardization refers to the process of determining established norms and procedures for a test to act as a standard reference point for future test results. The criteria of standardization for any psychological test are item analysis, norm development, reliability and validity (Anastasi & Urbina, 1997). Standardization therefore involves investigating validity, reliability and the appropriate norm groups to which the population is to be compared.

Hence, the researcher followed an established standardization process, based on a literature review of the SB5, for her study. The standardization process includes the following steps.

Item Analysis.

Item analysis is the process of collecting, summarizing and using information from students’ responses to assess the quality of test items. The analysis shows how effectively the items in a given test discriminate between students with higher and lower scores on the ability measured. The presence or absence of faults in an item logically affects its discrimination value. Items that discriminate poorly indicate a need for adaptation and modification. The Difficulty Index (P) and the Discrimination Index (D) are two parameters that help to evaluate the standard of test items used in an assessment (Mitra, Nagaraja, Ponnudurai & Judson, 2009).
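A brief sketch may clarify how the two indices are obtained from dichotomously scored responses. The Python code below uses one common textbook formulation and is not necessarily the exact procedure of the cited authors: P is the proportion of all examinees answering the item correctly, and D is the difference in proportion correct between the highest- and lowest-scoring groups. The 27% group size, the function name and the response data are all assumptions made for illustration.

```python
def item_indices(responses, item, group_fraction=0.27):
    """Return (P, D) for one item of a dichotomously scored test.

    responses: one list of 0/1 item scores per examinee.
    P: proportion of all examinees who answered the item correctly.
    D: proportion correct in the upper group minus that in the lower group.
    """
    if not responses:
        raise ValueError("responses must not be empty")
    # Rank examinees by total score, highest first.
    ranked = sorted(responses, key=sum, reverse=True)
    n = max(1, round(len(ranked) * group_fraction))
    upper, lower = ranked[:n], ranked[-n:]
    p = sum(r[item] for r in responses) / len(responses)
    d = (sum(r[item] for r in upper) - sum(r[item] for r in lower)) / n
    return p, d


# Hypothetical data: ten examinees, four items (1 = correct, 0 = incorrect).
responses = [
    [1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 0], [1, 1, 0, 0], [1, 0, 1, 0],
    [1, 1, 0, 0], [1, 0, 1, 0], [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0],
]
for i in range(4):
    p, d = item_indices(responses, i)
    print(f"item {i + 1}: P = {p:.2f}, D = {d:.2f}")
```

In this made-up data set the third item has P = 0.50 and D = 1.00, meaning that it is of moderate difficulty and separates the upper and lower groups completely. An item with D near zero would be a candidate for revision.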


A psychological test can be effectively standardized for use in another language and culture only when its items are well adapted. For this reason, the standardization and adaptation of psychological tests for cross-cultural assessment is becoming prominent worldwide at an increasing pace. For years, psychological and educational tests have been translated for use in different languages (Geisinger, 2003, cited in Matthews, 2003). What adaptation adds to this process is the flexibility to make the altered instrument not only linguistically appropriate but also culturally suited to the intended target population (Hambleton & Bollwark, 1991, cited in Matthews, 2003). Moreover, adaptation and translation of tests are not limited to academic testing. Researchers in psychology also use adapted tests to assess intelligence, aptitude, personality and other attributes (Chang & Myers, 2003, cited in Matthews, 2003). Adapted tests are usually a mixture of three types of items: newly developed items, translated items and adapted items (Church, 2001). The variation in adaptation depends on the function and purpose of the test being used (Hambleton & Bollwark, 1991, cited in Matthews, 2003). Verbal tests, for example, may require more new item development and item adaptation, whereas a test of mathematical reasoning may consist primarily of item translation and corresponding adjustments to instructions (Matthews, 2003). A safe rule of thumb when translating or adapting an item is to check whether the proposed target item reflects the spirit of the original item (Allalouf & Chang, 1999; Sireci, 1998, cited in Matthews, 2003). Moreover, adapting an existing instrument instead of developing a new one has a notable benefit: with an adapted test, the researcher is able to compare cross-cultural studies at both the national and international level.
For test developers and users, adaptation also conserves time and expense (Hambleton, 1994). Test adaptation can lead to increased fairness in assessment by allowing individuals to be assessed in the language of their choice (Hambleton, Merenda & Spielberger, 2005). Thus, alongside newly developed tests, existing tests are also being translated and adapted for different cultures. Test adaptation is necessary specifically because of language and cultural differences (Reckase, 1989). The most significant point of reference for test adaptation in any country is the set of guidelines provided by the International Test Commission (ITC). One ITC guideline for test translation and adaptation states: “Test developers/publishers should ensure that the adaptation process takes full account of linguistic and cultural differences in the intended populations” (Guideline D1, ITC, 2001; Hambleton, 2005). This guideline can act as a benchmark for any country in translating and adapting psychological tests. Examples of adapted psychological instruments available in various languages and cultures, including intelligence and general ability tests, are the Stanford-Binet Intelligence Scale Fifth Edition (SB5), the Wechsler Intelligence Scale for Children-Revised (WISC-R), the Peabody Picture Vocabulary Test, the Bateria Woodcock-Munoz, the Children’s Hope Scale, the Sixteen Personality Factor (16PF) Questionnaire, the Miller Analogies Test (MAT) and the Wonderlic Personnel Inventory (Matthews, 2003). These adapted tests are widely used around the world, especially in developing countries, to increase fairness and usability in assessment.

Norm.

Norms are not standards of performance, but serve as a frame of reference for test score interpretation. Norm groups can range in size from a few hundred to a hundred thousand people. The more people there are in the norm group, the closer the approximation to a normal distribution. The standardization sample is also referred to as the norm group. Generally, for standardization, the samples are representative and matched to percentages of the stratification variables such as age, sex, race/ethnicity, geographic region and socioeconomic level (described in chapter two) (Overton, 1992).
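How a norm group serves as a frame of reference can be shown with a small sketch. The Python function below converts a raw score into a deviation-type standard score by locating it relative to the mean and standard deviation of a norm group. This is a general illustration and not necessarily the norming procedure of any particular test; the reporting scale (mean 100, standard deviation 15), the function name and the norm data are assumptions.

```python
import statistics


def standard_score(raw, norm_scores, mean=100.0, sd=15.0):
    """Express a raw score relative to a norm group on an assumed 100/15 scale."""
    norm_mean = statistics.mean(norm_scores)
    norm_sd = statistics.stdev(norm_scores)  # sample standard deviation
    if norm_sd == 0:
        raise ValueError("norm group has no variability")
    z = (raw - norm_mean) / norm_sd  # distance from the norm mean in SD units
    return mean + sd * z


# Hypothetical raw scores of a small norm group for one age level.
norm_group = [10, 12, 14, 16, 18]
print(standard_score(14, norm_group))  # 100.0, since 14 is the norm-group mean
```

A raw score equal to the norm-group mean maps to 100, and a raw score one standard deviation above the mean maps to 115. With a real norm group of several hundred examinees per age level, the same raw score can map to different standard scores at different ages.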


Reliability in assessment refers to the confidence that can be placed in an instrument to yield the same score for the same student if the test is administered more than once. Besides, it considers the degree to which a skill or trait is measured consistently across items of a test. Since educators use assessment as a basis for educational intervention and placement decisions, understanding of reliability aids educators in determining the accuracy and dependability of an instrument (Overton, 1992). The reliability of a test score refers to its precision in measuring the true attributes of a person and its consistency across sets of items, multiple testing occasions and other conditions that affect score stability (Rousson, Gasser, & Seifer, 2002).
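The idea of obtaining the same score for the same student on repeated administration is commonly quantified with a correlation coefficient between the two sets of scores. The following is a minimal sketch of that calculation, assuming a Pearson correlation; the function name `pearson_r` and the six pairs of IQ scores are hypothetical illustrations and are not data from this study.

```python
from math import sqrt


def pearson_r(first, second):
    """Pearson correlation between two equal-length lists of scores."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    n = len(first)
    mean_a = sum(first) / n
    mean_b = sum(second) / n
    # Sum of cross-products and sums of squares of the deviations
    cross = sum((a - mean_a) * (b - mean_b) for a, b in zip(first, second))
    ss_a = sum((a - mean_a) ** 2 for a in first)
    ss_b = sum((b - mean_b) ** 2 for b in second)
    if ss_a == 0 or ss_b == 0:
        raise ValueError("scores in each administration must vary")
    return cross / sqrt(ss_a * ss_b)


# Hypothetical IQ scores for six students tested twice, one week apart
first_administration = [98, 105, 112, 91, 120, 101]
second_administration = [100, 103, 115, 94, 118, 99]

r = pearson_r(first_administration, second_administration)
print(f"test-retest correlation: {r:.2f}")
```

A coefficient close to 1 indicates that students keep nearly the same relative standing across the two administrations.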


The accumulation of evidence for the validity of test scores and their interpretation is a complex effort. Validity of test scores depends on the proper administration of the test by an experienced examiner and proper recognition of the unique characteristics of the individual examinee (Matarazzo, 1990). Technically, a test is neither valid nor invalid by itself, but instead, the uses and interpretations of test scores are valid or invalid based on accumulated evidence (Turner, DeMers, Fox, & Reed, 2001).

Thus, validity and reliability take on an additional dimension in cross-cultural testing, as does the question of the appropriate norm group. The instrument must be validly adapted, the test items must have conceptual and linguistic equivalence, and the test items must be free of bias (Domino, 2000). As stated earlier, the International Test Commission (ITC) has provided guidelines for translating and adapting tests since 1992. The guidelines further address the administration and interpretation of tests in order to improve accuracy and to compile evidence on the equivalence between the different language versions (Guideline D1, ITC, 2001; Hambleton, 2005). There is considerable evidence that the need for multi-language versions of intelligence, achievement, aptitude and personality tests is growing. Such adapted tests are appropriate where further research and cross-cultural comparative studies are to be carried out (Hambleton, 1994).

Interpretation of Test Scores: Now and Then

In the early decades of intelligence testing, intelligence test scores were expressed as a true quotient, hence the term IQ or intelligence quotient. An IQ was defined as the ratio of the examinee’s mental age to the examinee’s chronological age, multiplied by 100 to avoid dealing with fractional scores [(MA/CA) × 100]. This form of calculation has serious psychometric and related measurement problems and has been abandoned for decades, although its presentation continues to be common in many introductory psychology and education textbooks. As of the early 2000s, IQs are calculated as age-corrected deviation scaled scores. These are formal transformations of raw scores (i.e., the number of points obtained or items answered correctly) into a standard score format that uses the mean and the standard deviation of the raw scores at predetermined age intervals, so that the IQ given by the test has the same percentile ranking at each age level, which is not true of the old ratio-style IQ. Further, an IQ score is necessarily an incomplete reflection of intelligence. It is far from perfect as an index of a person’s total intellectual ability and is not useful in identifying specific talents. Scores from intelligence tests are interpreted properly only when the standardized instructions for administering and scoring the test have been followed rigidly. Deviations from standardized administration and scoring cause an individual examinee’s scores to move up or down inappropriately and unpredictably, rendering the scores uninterpretable (Lee, Reynolds, & Willson, 2003). Intelligence test scores are often viewed by test interpreters as reflecting innate potential, but clearly that is not the case. While innate ability contributes to intelligence test performance, many other variables contribute to performance on ability measures as well. Intelligence as measured on tests such as those described here is a summative construct at any given point, reflecting not only a person’s innate potential but also the interaction of this potential with the entire life experience of the individual (Reynolds, Livingston, & Willson, 2006).
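The contrast between the ratio IQ and the deviation IQ can be made concrete with a short sketch. The function names, the raw-score means and standard deviations, and the default scale of mean 100 and standard deviation 15 are illustrative assumptions, not values taken from the SB5 norm tables of this study.

```python
def ratio_iq(mental_age, chronological_age):
    """Historical ratio IQ: (MA / CA) x 100."""
    return mental_age / chronological_age * 100


def deviation_iq(raw_score, age_mean, age_sd, scale_mean=100.0, scale_sd=15.0):
    """Deviation IQ: the raw score is standardized against the mean and SD
    of the examinee's own age group, then rescaled (assumed scale: 100, 15)."""
    z = (raw_score - age_mean) / age_sd
    return scale_mean + scale_sd * z


# A child with a mental age of 10 and a chronological age of 8
print(ratio_iq(10, 8))            # 125.0

# Two hypothetical age groups with different raw-score norms: a child who is
# one standard deviation above his or her own age-group mean receives the
# same deviation IQ in both groups.
print(deviation_iq(60, 50, 10))   # 115.0
print(deviation_iq(82, 70, 12))   # 115.0
```

Because the deviation IQ is tied to the examinee's position within the age group, equal scores correspond to equal percentile ranks at every age, which is the property the ratio IQ lacks.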

UN Convention on the Rights of the Child (CRC) – 1989

Apart from two countries, this convention has been ratified by all the member states of the United Nations. The four principles of the CRC (Non-discrimination: Article # 2, Best Interest of the Child: Article # 3, Survival & Development: Article # 6 and Participation: Article # 12) apply to children with disabilities as well. Article # 28 of the CRC insists that all children have the right to education on the basis of equal opportunity, and Article # 29 emphasizes that the education of children shall be directed to: the development of a child’s personality, talents and mental and physical abilities to their fullest potential; the development of respect for human rights and fundamental freedom…; the development of respect for the child’s parents, own cultural identity, language and values including national values…; and the preparation of the child for a responsible life in a free society….

Education for All (EFA): Jomtien (1990)

The basic idea of inclusion can also be found in the Jomtien Declaration. Here, Education for All (EFA) emphasizes the inherent right of every child to a full cycle of primary education, and commitment to a child-centered pedagogy, where individual differences are accepted as a challenge, and not as a problem. The Jomtien Declaration also emphasizes the need for improvement in the quality of primary education and teacher education, recognizing and respecting the wide diversity of needs and patterns of development among primary school children.

Salamanca Declaration (1994): World Conference on Special Needs Education

This international declaration states “Schools should accommodate all children’s conditions”. Inclusive education was adopted at the World Conference on Special Needs Education (SNE) as a principle in addressing the learning needs of various disadvantaged, marginalized and excluded groups. This includes children with disabilities and gifted children, street and working children, children from ethnic minorities, refugee children and other marginalized or disadvantaged children. In this context “special education needs” refers to all children that experience barriers in equal access and equal participation in education. SNE, since the Salamanca Declaration, is viewed as an integral part of all Education for All (EFA) discussions.

Standard Rules on the Equalization of Opportunities for Persons with Disabilities (1993)

The UN “Standard Rules on the Equalization of Opportunities for Persons with Disabilities” comprise 22 Rules. Rule 6 (Education) states: ‘States should recognize the principle of equal primary, secondary and tertiary educational opportunities for children, youth, and adults with disabilities, in integrated settings.’

Dakar Framework (2000)

The need for inclusive education is repeated in the notes on the Dakar Framework for Action, which mention “…In order to attract and retain children from marginalized and excluded groups, education systems should respond flexibly. …Education systems must be inclusive, actively seeking out children who are not enrolled and responding in a flexible way to the circumstances and needs of all learners”. The achievements of the 10 years since the EFA declaration were assessed and analyzed. The Jomtien goals had not been reached, and some of them were taken on board again in Dakar, extending the time for achieving them.

E-9 Declaration (2000)

The declaration on EFA was agreed upon during the fourth summit of the nine high-population countries (which include Bangladesh) in February 2000. It highlights as one of its main goals that “all children with special needs will be integrated in mainstream schools.”

Children with Disabilities in No Child Left Behind

The No Child Left Behind Act of 2001 (NCLB) is a United States Act of Congress concerning the education of children in public schools. Several key pieces of legislation over the past three decades have contributed to the evolution of the assessment process for young children with special needs. Specifically, the Education for All Handicapped Children Act amendments (P.L. 99–457, 1986), later renamed the Individuals with Disabilities Education Act (IDEA, P.L. 102–119, 1998), the 1997 version of IDEA (P.L. 105–17, 1997–1998), the 2001 Elementary and Secondary Education Act (No Child Left Behind, P.L. 107–110), and the most recently authorized 2004 version of IDEA (Individuals with Disabilities Education Improvement Act, P.L. 108–446) have all provided critical guidelines for the identification, assessment, and treatment of young children with special needs. While the initial focus of legislation was merely to identify children in need of early intervention services, the most recent legislation (IDEA 2004; NCLB) places increased emphasis on looking ahead to school-based services. No Child Left Behind requires all government-run schools receiving federal funding to administer a state-wide standardized test annually to all students. This means that all students take the same test under the same conditions. The students’ scores are used to determine whether the school has taught the students well. NCLB includes incentives to reward schools showing progress for students with disabilities, as well as measures to fix schools that are not meeting the needs of the disabled population or to provide their students with alternative options. The law is written so that the scores of students with Individualized Education Programs (IEPs) and 504 plans are counted just as other students’ scores are counted. Schools have argued against having disabled populations included in their Adequate Yearly Progress (AYP) measurements, claiming that too many variables are involved.

DPI Position Paper on Inclusive Education

‘Disabled People International (DPI) believes that education should be accessible to all who desire to be educated, no matter their ability; disabled people should have the option to be integrated with the general school population, rather than being socially and educationally isolated from the mainstream without any choice in the matter. Students who are deaf, blind or deaf-blind may be educated in their own groups to facilitate their learning, but must be integrated into all aspects of society’.

The National Policy on Education in Bangladesh

Disability Welfare Act- 2001

The National Disability Welfare Act-2001 of Bangladesh emphasized: establishing specialized education institutions to cater for the special needs of different types of disabled children; designing and developing specialized curricula and producing text books; creating opportunities for free education for all children with disabilities below 18 years of age and providing them with books and equipment free of cost or at low cost; endeavoring to create opportunities for the integration of students with disabilities in the usual classroom setting of regular schools wherever possible; arranging training for teachers and other employees working with the disabled; and arranging easy transport facilities for attending school.

The National Literacy Goal of Bangladesh

The National Literacy Goal of Bangladesh is to ensure 100% literacy rate by the year 2015. If this target is to be achieved, the education needs of children with disabilities cannot be ignored. But there is no specific mention about inclusive education or any specific intervention to address the issues of educating children with disabilities.

National Education Policy (2000)

Chapter 18: Special Education, Health and Physical Education, Scout and Girls Guide

Special Education

Children who are unable to fulfill the requirements of their daily life due to physical and mental problems need special education, competent remedial measures, special care and nursing. The deaf, blind, physically handicapped, mentally handicapped and epileptic fall within the purview of special children. In accordance with the degree of disability, they are termed mildly, moderately or severely disabled. The principal aim of special education is to help disabled persons establish themselves in society through different special education programs depending on their degree of disability. The policy describes the special education strategy as: conducting national surveys on the prevalence of disability by type and degree of disability; improving the quality of existing special and integrated educational institutions and increasing the number of special and integrated schools for different types of disabled children; initiating an integrated education system in district and sub-district level primary schools; establishing teacher training colleges/institutions for teachers of special schools; including disability issues in the mainstream teacher training curriculum; making provision for the free supply of education materials to disabled pupils; and following an alternative curriculum for children unable to cope with the mainstream curriculum. The National Education Policy (2000) does not include any specific policy guideline or action plan to either address or facilitate inclusive education. Rather, the emphasis is on special and integrated education. The strategies mentioned in the policy for special education remain on paper and have not been implemented yet.


Following recommendations made in a 2002 study carried out by CSID in association with Cambridge Education Consultants Limited, UK (commissioned by the Department of Primary Education, Government of Bangladesh), the Ministry of Primary Education included a component of inclusive education for children with disabilities in its Primary Education Development Project (PEDP)-II from 2004. However, it has not been implemented yet.

National Education Policy 2010

According to the National Education Policy 2010, the drop-out rate up to or before the completion of Class V is at present about 50%, and of the rest, about 40% leave school before completing Class X. It is extremely urgent to bring down this drop-out rate. Necessary measures will therefore be implemented so that all students are enabled to complete Class VIII, and this is to be ensured by 2018.


Process of Standardization

The present research was conducted in four steps as part of the standardization process of the Stanford Binet Intelligence Scale. First, the strengths and weaknesses of the items were identified through item analysis. Second, the norm was calculated and developed; third, reliability was tested; and finally, validity was tested. Different samples were considered for the different estimations. For the calculation of the norm, the study considered students from six divisional metropolitan cities (Barisal, Chittagong, Dhaka, Khulna, Rajshahi and Sylhet) to represent Bangladesh. The SB5 as the instrument for standardization (discussed in this chapter and briefly in chapter two) and the standardized procedure for test administration (also outlined in this chapter) were used for all participants in the different steps.

Research Design

The research was designed to fulfill the criteria for standardizing a cognitive ability test, following the four steps involved in the standardization process.

Research design for the standardization of SB-5 in Bangladesh.

The following table summarizes the different process involved in standardizing the SB-5 on 11 age groups (06 – 16 years).

General Procedure for Field Study for Norm.

Prior to testing, the following procedure was followed. A list of the existing schools in the six metropolitan divisions of Bangladesh, representing the metropolitan thanas, was obtained from the Bangladesh Bureau of Educational Information and Statistics (BANBEIS) (Appendix 7). The researcher was then given a letter of permission (Appendix 1) by the supervisor, which explained the purpose and significance of the research. With this letter the researcher obtained formal consent (Appendix 2) from the District Education Officer (DEO) for the school authority to allow the test to be administered. Further, the heads of the institutions were assured of the confidentiality of the information gathered by the researcher. The researcher then gave them a clear general description of the test and explained the implications of the study. Usually, the test was administered to two boys and two girls from each age group in a school. Thus, 20 students were taken from a primary school, whereas 44 students were selected from a school having both primary and secondary units. Khulna and Sylhet metropolitan cities had fewer schools than the other divisions, so the number of students there was approximately 50 to 55 per school. The study considered average and above-average students based on their academic performance at school.
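The per-school quotas quoted above follow from simple arithmetic, sketched below. The assumption that a primary unit covers five age levels (6 to 10 years) is inferred from the reported figure of 20 students and is not stated in the text; the function name `school_quota` is hypothetical.

```python
BOYS_PER_AGE = 2
GIRLS_PER_AGE = 2


def school_quota(age_levels):
    """Number of students tested in a school covering the given age levels."""
    return len(age_levels) * (BOYS_PER_AGE + GIRLS_PER_AGE)


# Assumption: primary units cover ages 6-10 (five age levels), inferred from
# the reported quota of 20; combined units cover all 11 levels (6-16 years).
primary_ages = list(range(6, 11))
combined_ages = list(range(6, 17))

print(school_quota(primary_ages))    # 20
print(school_quota(combined_ages))   # 44
```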

A standardized test is one with clearly defined procedures for administration, and standardization involves testing a group of people under those procedures. Many standardized tests are also norm referenced; that is, test scores are interpreted with reference to the scores from a normative sample. With standardization, the norm group must reflect the population for which the test was designed. The group’s performance is the basis for the test’s norms. Standardized testing involves using testing instruments that are administered and scored in a pre-established standard or consistent manner. For the present study, the age norm was computed from the raw scores obtained by the subjects after administering the nonverbal and verbal subtests (five subtests from each domain) of the Stanford-Binet Intelligence Scale (Fifth Edition).


Standardized Testing Procedures for SB5.

A standardized test assesses a student’s functional abilities under controlled conditions. Accordingly, the test was administered in a separate, quiet and well-lit room to avoid extraneous variables that would affect the test scores. Before test administration, a good rapport was established between the test administrator and the participants. The test administrator (the researcher) was sensitive to the pace at which the participants worked most comfortably. She presented the tasks rapidly enough to maintain the examinee’s interest, but not so quickly that the examinee felt rushed. She also established a relaxed and pleasant environment and made the testing session a positive experience for the examinee. The record form of the SB5 also includes a series of checklists for observing a student during the testing session. The test administrator noted in the form any unusual examinee responses, reactions, or distractions, such as extreme distractibility, anger or opposition, poor communication skills, or highly emotional responses, and included this information in the report of the results. To be scientifically accurate, the researcher also followed the standard instructions given in the administrative manual.

The Standard Order of Administration for the ten subtests, a systematic layout proposed by the original scale, was also followed in the testing sessions (Figure 8).

Figure 8. Standard administration order for the Stanford-Binet Intelligence Scales, Fifth Edition.

Routing Subtests

Two special subtests, the routing subtests, were administered at the beginning of the SB5. The routing subtests identify an individual’s developmental starting point for all the remaining subtests.

The Nonverbal Routing or Fluid Reasoning subtest (Object Series/Matrices) provides an indicator of an individual’s nonverbal ability and serves as the basis for determining his/her starting point for the remaining four subtests in the nonverbal domain. The other routing subtest, Verbal Knowledge (Vocabulary), provides an indicator of an individual’s verbal ability and is the basis for determining his/her starting point for the remaining verbal subtests. Both routing subtests are included in SB5 Item Book 1; Item Book 2 contains the remaining nonverbal subtests, and Item Book 3 contains the remaining verbal subtests. Figure 8 shows this organization and the proper order for standard test administration.


Items for all subtests except the two routing subtests are grouped into testlets. These testlets are then arranged into levels of difficulty, with six levels for the Nonverbal domain and five levels for the Verbal domain. In both domains, Level 6 is the most difficult, but the Verbal domain only contains five levels, labeled Levels 2 through 6. Nonverbal Level 1 consists of two testlets at the lowest level of difficulty and has no direct counterpart in the Verbal domain. Within each domain, Levels 2 through 6 each consist of four testlets, one for each remaining factor.


Many of the SB5 subtests contain more than one type of item. This is necessary because of the wide range of ages and abilities that each subtest spans. An activity that works well to assess a particular factor for young children may not be the most appropriate way to assess that factor for adolescents or adults. For example, the Nonverbal Visual-Spatial Processing subtest uses simple Form Board activities for the initial tasks and Form Patterns activities for all tasks at subsequent levels.

Wechsler Intelligence Scale for Children-Revised (WISC-R).

The Wechsler Intelligence Scale for Children-Revised (1974) was used to validate the SB5 by computing the correlations among the three (Verbal, Nonverbal and Full Scale) IQ scores. The scale was translated and adapted into Bangla in 1980 by Sharmin Huq. It covers ages 6 to 15 years. The WISC-R comprises a verbal scale and a performance scale, each consisting of five subtests. The ten subtests are shown in Table 4.

Table 4

Showing the ten subtests of WISC-R (Bangla version)

Verbal Subtests

Performance Subtests


Picture Completion


Picture Arrangement


Block Design


Object Assembly



Ethical Consideration

A prerequisite of any standardized intelligence test is to follow the standard procedure stated in the examiner’s and technical manuals. Given the long history of the Stanford Binet Intelligence Scale and the importance of accurate assessment of intellectual abilities, the researcher followed the instructions proposed by the author of the original SB5 (Roid, 2003). The professional and ethical issues related to the overall assessment of IQ, namely the researcher’s expertise and training, the data collection process and its confidentiality, the technical qualifications of the scoring procedures and the data analysis, were taken into account from the very beginning until the writing of the report. In this context, scores obtained from tests of intellectual ability such as the SB5 had to be as precise as possible, given that they are used for life-changing decisions about treatment, placement or classification. However, the concept that all test scores have some degree of measurement error is critically important to the ethical use of tests. A shorter retesting interval would allow the SB5 to be highly useful in the assessment of treatment interventions in clinical and neuropsychological settings as well as in re-evaluations for special education. The stability of the SB5 is even more impressive in light of the relatively shorter test-retest interval on the SB5 (5 to 8 days) compared to that on the Wechsler scales (23 to 35 days on average). In addition, the researcher adhered to the basic ethical norms required by the American Psychological Association (APA, 1992, 2000) for research studies. Furthermore, the researcher observed her ethical obligation to prevent physical and mental harm to her subjects. The researcher also allowed participants to withdraw from the study at any time if they wished to stop participating. Finally, the researcher treated it as an obligation to guard against any ambiguity in the participants’ understanding of the overall test administration.


The results section describes the findings of the different segments involved in completing the standardization process. Accordingly, this section is organized into four segments: item analysis, standardization of the norm and IQ of the SB5, reliability, and validity of the test.

Item Analysis 

When norm-referenced tests are standardized for instructional purposes, to assess the effects of educational programs, or for educational research purposes, it becomes very important to conduct item analyses. Item analysis, as the first step in the standardization of the intelligence scale, was carried out with the SB5 test kit among 330 students at 11 age levels (6-16 years) to scrutinize the strengths and weaknesses of the test items. Each item was examined in terms of (i) the Difficulty Index and (ii) the Discrimination Index. Item analysis was computed on the scores obtained by the participants on the ten subtests of the SB5. As discussed earlier, the item difficulty and discrimination indices were calculated for 165 students of the total sample group. On the basis of the total raw scores obtained, students were divided into an upper group (the first 25% of students) and a lower group (the last 25% of students). In accordance with the testing procedure, the students responded to items based on their age-appropriate basal ability level. The item analysis results are presented in Table 7 and Table 8, which show the re-arranged items of the ten subtests of the Stanford-Binet Intelligence Scale (Fifth Edition). The items modified and adapted for both the verbal and nonverbal subtests on the basis of their P and D values are also presented. In addition, pictorial presentations of those modified and adapted items are shown in Appendix 10.

Difficulty Index.

The level of difficulty of an item is the proportion of students who answer the item correctly. The more correct responses from both groups, the easier the item. Conversely, as an item becomes more difficult, the proportion of students answering it correctly falls (Ahmann and Glock, 1981). An item difficulty of 1.0 indicates that everyone answered correctly, while 0.0 means no one answered correctly. The item analysis considered 30 students from each age group (6 to 16 years). The analyses were done in two ways. In the first stage, participants from all age groups (6 to 16) were considered in calculating the discrimination index and difficulty index. In the second stage, age-specific difficulty and discrimination indices were calculated. In calculating the all-age difficulty and discrimination indices, 165 students from all ages were considered, of whom 82 were from the lower score group (lowest quartile) and 82 from the upper score group (highest quartile).
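The difficulty index described above can be sketched as follows. This is a minimal illustration under the assumption that P is the pooled proportion of correct answers across the upper and lower groups; the function names and the example counts are hypothetical, and the cut-offs follow the P value bands reported in Table 5 (values falling strictly between 0.30 and 0.80 are treated as moderate).

```python
def difficulty_index(upper_correct, lower_correct, upper_n, lower_n):
    """Proportion of students in both groups who answered the item correctly."""
    return (upper_correct + lower_correct) / (upper_n + lower_n)


def difficulty_level(p):
    """Classify an item by its P value using the bands reported in Table 5."""
    if p >= 0.80:
        return "low"       # easy item: most students answer correctly
    if p <= 0.30:
        return "high"      # hard item: few students answer correctly
    return "moderate"


# Hypothetical item: 70 of 82 upper-group and 40 of 82 lower-group students
# answered correctly.
p = difficulty_index(70, 40, 82, 82)
print(round(p, 2), difficulty_level(p))
```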

Table 5

Difficulty Index (P) for each item of SB5 in Bangladesh (all ages)

Difficulty Level   Difficulty Index (P) Value      Results from the Study
                                                   Non Verbal    Verbal
Low                Greater than or equal to 0.80   55 items      50 items
Moderate           From 0.31 to 0.79               43 items      54 items
High               Less than or equal to 0.30      32 items      15 items
                   Did not answer                  22 items      22 items
Total                                              152 items     141 items

Table 5 above gives the number of items at each difficulty level (low, moderate and high) across all age groups. The findings reveal that in the non verbal domain 55 items were of low difficulty, 43 of moderate difficulty and 32 of high difficulty. In the verbal domain, 50, 54 and 15 items were of low, moderate and high difficulty respectively. The difficult items were usually from the upper levels of the test domain (e.g. level 5; level 6). Item-wise difficulty index values are presented in Appendix 9.

For the age-specific difficulty index, the scores and items of the 30 students at each age level were analyzed separately. Participants were ranked by total raw score and then grouped into upper and lower categories, so that 15 of the 30 students fell in the lower score group and 15 in the upper score group.

Discrimination Index (D).

The item discrimination index (D) can vary from -1.00 to +1.00. A negative discrimination index (between -1.00 and zero) results when more students in the lower group than in the upper group answer correctly. A discrimination index of zero means that equal numbers of upper and lower group students answered correctly, so the item did not discriminate between the groups. A positive index occurs when more students in the upper group than in the lower group answer correctly (Jean-Marc, 2008). The following table shows which items discriminate and which do not. Like the difficulty index, the discrimination index was analyzed in two sections, namely all ages and age specific.

Table 6

Discrimination Index (D) for each item of SB5 in Bangladesh (all ages)

Discrimination Level   Discrimination Index (D) Value   Findings from the Study
                                                        Non Verbal    Verbal
Very good              Greater than or equal to 0.6     51 items      16 items
Good                   From 0.31 to 0.60                15 items      31 items
Acceptable             From 0.01 to 0.30                14 items      41 items
Bad                    Less than or equal to 0.20       50 items      31 items
                       Did not answer                   22 items      22 items
Total                                                   152 items     141 items

Table 6 above gives the number of items by discrimination level (very good, good, acceptable and bad) across all age groups. The findings reveal that 51 items from the non verbal domain and 16 items from the verbal domain were found bad in terms of discrimination. From the non verbal domain 15, 14 and 50 items, and from the verbal domain 31, 41 and 31 items, were found very good, good and acceptable respectively at the discrimination level. The discriminating items were usually from the middle levels of the test domain (e.g. level 3; level 4). Item-wise discrimination index values are presented in Appendix 9.
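The upper and lower group comparison behind the discrimination index can be sketched as follows. This is a minimal illustration, assuming D is the proportion correct in the upper group minus the proportion correct in the lower group, with each group taken as 25% of the ranked sample; the function names and the eight example records are hypothetical and are not data from this study.

```python
def split_upper_lower(records, fraction=0.25):
    """Rank (total_raw_score, item_correct) records by total raw score and
    return the top and bottom `fraction` of students."""
    ranked = sorted(records, key=lambda rec: rec[0], reverse=True)
    k = int(len(ranked) * fraction)
    if k == 0:
        raise ValueError("sample too small for the requested fraction")
    return ranked[:k], ranked[-k:]


def discrimination_index(upper, lower):
    """D = proportion correct in the upper group minus that in the lower group."""
    p_upper = sum(correct for _, correct in upper) / len(upper)
    p_lower = sum(correct for _, correct in lower) / len(lower)
    return p_upper - p_lower


# Hypothetical records: (total raw score, 1 if the item was answered correctly)
records = [(80, 1), (75, 1), (70, 1), (60, 0), (50, 1), (40, 0), (30, 0), (20, 0)]
upper, lower = split_upper_lower(records)
print(discrimination_index(upper, lower))   # 1.0
```

With a sample of 330 students, the same split yields 82 students in each group, which matches the group sizes reported for the all-age analysis.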

Similarly, for the age-specific discrimination index, the scores and items of 30 students at each age were analyzed. Figure 11 shows that in the non verbal domain the number of unanswered items gradually falls as age increases, while the proportion of acceptable items rises.


As discussed in chapter two, test adaptation is a process by which a test (or assessment instrument) is transformed from a source language and/or culture into a target language and/or culture. The driving force behind test adaptation is test validity (Geisinger, 1994, cited in Matthews, 2003). Since the purpose of any testing is to produce meaningful and interpretable assessment outcomes, the aim of any test adaptation is the same: to provide a fair, equivalent, applicable and interpretable assessment instrument (Misra, Sahoo & Puhan, 1997, cited in Matthews, 2003). In accordance with this view, this research includes test adaptation as part of the standardization of the Stanford Binet Intelligence Scale Fifth Edition for use in Bangladesh. The standard guideline recommended by the International Test Commission (ITC) was followed in the process of adaptation (Guideline D1, ITC, 2001; Hambleton, 2005).

The adaptation and modification of the items were based on the difficulty and discrimination indices. The items of the original SB5 are thematically related so as to identify one’s intellectual ability. In order to retain the original theme, items were replaced with native content, symbols or objects, made culture friendly, and in many cases retranslated so that students could understand the questions better. Because the test was developed for ages 2-85 years and only a specific age group (6-16 years) was considered in the present study, no items were eliminated. Instead, when the difficulty index showed an item to be consistently difficult across several age groups, the item was adapted or modified with respect to its color, object or language.

Determination of Validity

In psychology, validity has two distinct fields of application. The first involves test validity, a concept that has evolved with the field of psychometrics; as Anastasi & Urbina (1997) describe, “Validity refers to the degree to which evidence and theory support the interpretations of test scores entailed by proposed uses of tests.” This study examined content-related validity and, under criterion-related validity, contrasted group validity (studies of special groups) as well as the relationship between the SB5 (adapted in Bangla through this study) and the WISC-R (Bangla version).

Content Validity.

As discussed in chapter three, professionals gave a concrete and comprehensive opinion on the SB5 test adaptation, covering conceptual and methodological issues, the items of the adapted instrument, translation, and questions of ethics in cross-cultural contexts (Appendix 4). Decisions on item acceptance and other issues concerning the adapted instrument were made in line with the guidelines recommended by the International Test Commission (Guideline D1., ITC, 2001). Their discussion also specifically addressed timing issues in the context of the adapted test.

Criterion Validity.

To establish criterion-related validity, the study explored the correlation between the domain scores of the SB5 (adapted in Bangla through this study) and the WISC-R (Bangla version), and also examined contrasted group validity.

SB5 (adapted from this study) vs. Wechsler Scales (WISC-R, Bangla Version).

To examine criterion-related validity, the SB5 and the WISC-R were administered to the same participants. The study considered 90 students from three age groups (ages 7, 11, and 14). The mean IQ was found to vary by age for both the SB5-BD (NVIQ, VIQ, FSIQ) and the WISC-R (Verbal, Performance, Full Scale). The descriptive statistics reveal significant similarities between the IQ scores obtained from the two tests.

Table 22

Mean and SD of IQ Scores of SB5 (Standardized from this study) and WISC-R (Bangla Version)

[Table 22: cell values not preserved. Surviving labels: rows Performance (WISC-R), Verbal (WISC-R), Full Scale (WISC-R); column P value.]

Table 22 indicates a low correlation between the verbal (SB5) and verbal (WISC-R) IQ scores for the three age groups. A similar trend was found between the nonverbal (SB5, adapted through this study) and performance (WISC-R) IQ scores, and between the FSIQ (SB5-BD) and Full Scale (WISC-R) IQ scores. Studies suggest that the WISC-R is comparatively difficult as well as obsolete for the students, whereas the SB5 is generally user-friendly and is the latest edition. The low correlation might also arise because the adapted version of the WISC-R (1974), which was standardized in 1980, may need revision. The English version of the WISC, by contrast, was updated in 2001.

Contrasted Group Validity / Studies of Special Group.

To find out the differences in IQ, the Stanford Binet Intelligence Scale Fifth Edition was administered to normal students and to students with different types of special needs (intellectual disability, autism). For each of the three IQs [Nonverbal IQ (NVIQ), Verbal IQ (VIQ) and Full Scale IQ (FSIQ)], separate means and standard deviations were calculated for the two groups. Figure 13 shows the mean and SD of the NVIQ, VIQ and FSIQ of the normal students and the students with special needs (following the norm developed by this study).

The results show a similar trend across all the means and standard deviations, which range between 101 and 105 and between 4.16 and 5.32 respectively. The students with special needs show a low mean and standard deviation.
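
A contrasted-group comparison of this kind can be sketched as a two-sample test on the IQ scores of the two groups. The source does not state which test statistic was used, so the Welch t statistic below is an assumption; the function name `welch_t` and the IQ values are hypothetical illustrations, not data from the study.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Return Welch's t statistic and its approximate degrees of freedom.

    Assumption: Welch's unequal-variance t test is used here only as an
    illustration; the study does not name its test statistic.
    """
    # Squared standard error of each group mean (sample variance / n).
    se_a = variance(group_a) / len(group_a)
    se_b = variance(group_b) / len(group_b)
    t = (mean(group_a) - mean(group_b)) / sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se_a + se_b) ** 2 / (
        se_a ** 2 / (len(group_a) - 1) + se_b ** 2 / (len(group_b) - 1)
    )
    return t, df


# Hypothetical FSIQ values for the two contrasted groups.
normal_fsiq = [101, 103, 105, 102, 104]
special_needs_fsiq = [60, 62, 64, 61, 63]
t_stat, dof = welch_t(normal_fsiq, special_needs_fsiq)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

A large absolute t value relative to its degrees of freedom corresponds to a small p value, which is the pattern a test needs to show in order to separate the two groups.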


Bangladesh has a dire need to improve and update the standard of its existing assessment techniques in intelligence testing. Assessment is a continuous process and an integral part of educational instruction and development, as it determines whether or not the goals of education are being met. Assessment affects decisions about grades, placement, instructional needs and curriculum. To accomplish this goal, assessing an individual’s intellectual abilities is one of the prime requirements for ascertaining his or her potential. In a broader perspective, the purpose of the present study was to improve and strengthen the existing standard of intelligence testing by standardizing a contemporary instrument, in order to ensure the provision of identification, educational placement decisions and intervention services for children with special needs in Bangladesh. In this context, standardization of a current intelligence test such as the Stanford Binet Intelligence Scale Fifth Edition (Roid, 2003) was required.

At present, standardized testing is the most useful evaluation method available for understanding human intellectual abilities and human knowledge. When validly and reliably used, standardized tests provide decision-makers with useful information that no other evaluation method can provide. Such tests have therefore been developed and administered to large populations in advanced, high-income countries (Phelps, 2008). There is also a need for a comprehensive approach to comparing performance between individuals as well as within an individual. Because intelligence tests have long been regarded as acceptable ways of predicting future outcomes, they occupy an important place in the educational and psychological landscape (Anastasi & Urbina, 1997). The Stanford-Binet maintains a hybrid structure, combining point-scale and age-scale formats. The fifth edition also improved the psychometric characteristics of the test by introducing a parallel form and norms more representative than those of earlier versions. The test provides an estimate of the level at which an individual is functioning, based on a combination of many different subtests or measures of skills (Becker, 2003).

This chapter provides a comprehensive description of the outcome of the study. It presents the findings of the standardization procedure and reflects on their significant features from national and international perspectives. In light of these findings, the core concepts of the standardization process (item analysis, construction of norms, reliability and validity) are discussed in this chapter. Because the adaptation and standardization process was completed through this study, the term SB5-BD is used in this chapter to distinguish the adapted test from the original SB5 and to keep the terminology reader friendly.

Standardization of SB5 in Bangladesh (SB5-BD)

Item Analysis.

In psychometrics, item analysis is the procedure used in test construction to evaluate each test item qualitatively in terms of its content and form (Cortina, 1993). Item analysis is an important step in the development of any psychological test. In this step, statistical methods are used to identify test items that are too easy or too difficult. Item analysis is especially valuable for improving items that will be used again in later tests, but it can also be used to eliminate ambiguous or misleading items in a single test administration (Anastasi, 1997). In this study, students’ responses were likewise used to determine the difficulty and discrimination index of each test item. The higher the difficulty index, the easier the item. The higher the discrimination index, the better the item discriminates between students with high test scores and those with low ones. Items whose difficulty indices tended to centre below 50% were accepted for rearrangement on the basis of their discriminating power. The item analysis conducted by Huq in 1989 showed a similar result, with items having higher discriminating power and difficulty indices at the 50% acceptance level.
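
The two indices described above can be illustrated with a short sketch. It assumes dichotomous scoring (1 = correct, 0 = incorrect) and the common upper/lower 27% group convention for the discrimination index; the study does not state its exact formula, so the 27% split, the function names and the sample data are all hypothetical.

```python
def difficulty_index(item_responses):
    """Proportion of students who answered the item correctly.

    A higher value means an easier item, matching the convention in the text.
    """
    return sum(item_responses) / len(item_responses)


def discrimination_index(item_responses, total_scores, fraction=0.27):
    """Upper-lower discrimination index: p(upper group) - p(lower group).

    Assumption: groups are the top and bottom `fraction` of students
    ranked by total test score (27% is a common convention).
    """
    n = len(total_scores)
    group_size = max(1, round(n * fraction))
    ranked = sorted(range(n), key=lambda i: total_scores[i])
    lower = ranked[:group_size]
    upper = ranked[-group_size:]
    p_upper = sum(item_responses[i] for i in upper) / group_size
    p_lower = sum(item_responses[i] for i in lower) / group_size
    return p_upper - p_lower


# Hypothetical responses of ten students to one item, with their total scores.
item = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]
totals = [40, 38, 35, 20, 33, 18, 15, 30, 12, 10]
print(difficulty_index(item))
print(discrimination_index(item, totals))
```

In this illustration half of the students answer the item correctly, and all of the highest scorers but none of the lowest scorers do so, which is the profile of an item that discriminates well.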

There is an increasing demand and need for psychological tests for different cultures and countries. Consequently, much greater attention is being paid to the development or adaptation of test items. The items in a test must be culturally equivalent: their meanings need to be correctly translated and adapted so as to maintain the validity of the test in the new cultural context (Vijer & Hambleton, 1996). Accordingly, in this study several items were adapted, modified or rearranged to maintain cultural fairness and a consistent sequence from easy to difficult in relation to age. In addition, the test was translated and adapted for Bangladeshi culture while retaining the cultural uniqueness of the original SB5. Further, Rodriguez and her associates (2011) also emphasized in their study that there is a high correlation among intelligence, culture and age.

Adaptation of the items in Non verbal domain

The nonverbal domain reflects a student’s intellectual ability to analyze and solve problems without relying upon or being limited by language abilities. It is intended to depict an individual’s understanding of the reorganization of visual sequences, analogies and causal relationships in illustrative and symbolic modes (Robinson, 2002).

Based on the findings for the difficulty index (Table 5, Figures 9 and 10) and the discrimination index (Table 6, Figures 11 and 12), the study adapted a few items from the original SB5. Table 7, Table 8 and Appendix 10 give the number of items modified, changed and retranslated during the adaptation process for both domains. The SB5 routing procedure adapts the test to an examinee’s functional level and thereby increases measurement precision by tailoring the difficulty of the items to the examinee’s level of cognitive functioning. Vincent & Kamphaus (2011) conducted a study in the USA on the construction and adaptation of the nonverbal subtests of the Stanford-Binet Fourth Edition. That study provided a rationale for, and recommended, the use of the adapted items in clinical and academic assessment practices.

In the Routing-Nonverbal (Fluid Reasoning) subtest, a number of items that were found relatively difficult to understand were rearranged. Items 7, 26 and 28 (logical and mathematical problem solving) were replaced with the original items 8, 27 and 32 respectively. In the Knowledge subtest, pictures that appeared unfamiliar and uncommon to the students were modified. Items from level 2 (feeding a child), level 3 (drinks with straw from glass, eat with spoon from bowl, sweeps with mop) and level 4 (stamp in wrong place, balanced scales, rooster on nest) were replaced with culturally appropriate pictures, keeping the theme unchanged (Appendix 10). As item 4 (South America and North America) was found difficult, it was exchanged with item 6 (wind in two directions) of the same level. For Quantitative Reasoning, the item from level 2 (5 birds) was replaced with a picture of native birds (Appendix 10). Item 1 from level 4 (mathematical sequential order) was exchanged with item 3 from the same level, because item 3 was found to be easier. In Visual Spatial Processing, level 2 was exchanged with level 3 in the nonverbal domain, as level 3 was answered correctly by the majority whereas level 2 was more difficult. No items were changed or rearranged in the working memory subtests. The Cronbach’s alpha coefficient for the nonverbal items (152 items) is 0.82, suggesting relatively high internal consistency and a similar performance trend across the nonverbal domain.

The items of the original SB5 were first edited and standardized for US culture. A distinctive feature of the original SB5 is that it is largely culture free, so the items only needed to be rearranged and replaced to suit the Bangladeshi context. Consequently, the themes of the items (people, common objects) were kept consistent and unchanged, and no items were eliminated. The items of the original SB5 are thematically linked in order to identify one’s intellectual ability. Thus, to retain the original theme, items were replaced with native content, symbols or objects, made culture friendly, and often retranslated so that students could understand the questions better. The original SB5 is a test for people aged 2 to 85 years, whereas in this study the Bangla version was standardized on students aged 6-16 years. Hence, this study did not attempt to modify the higher-level items (especially levels 5 and 6), even though these items were found too difficult. Moreover, when the difficulty index showed an item to be consistently difficult across several age groups, the item was adapted or modified with respect to its colour, object or language. The study by Huq (1989), discussed previously, likewise adapted the nonverbal subtests of the Stanford Binet Intelligence Scale Fourth Edition (SB4) for use in Bangladesh for similar reasons, chiefly by rearranging items and substituting local items. Because adaptation is essential for standardizing a test, several studies have recommended adapting the Binet scale for use in their respective cultures. In India, the Binet Scale has also been considered a standard criterion for the assessment of intelligence. Among many adaptations, Kulshrestha (1971) adapted the Stanford Binet Intelligence Scale Form L-M in Hindi.
The Binet Kamat Scale of Intelligence (BKT) is another Indian adaptation of the Stanford-Binet Scale of Intelligence. In that adaptation, some of the test items and materials were replaced to suit Indian conditions, such as Indian coins, typically Indian pictorial scenes, vocabulary and Indian concepts. The scale assesses the child’s skills in the following areas: memory, language, conceptual thinking, reasoning, numerical reasoning, visual-motor coordination and social intelligence. A similar study on the item fairness of the nonverbal subtests of the SB5 by Hurlow (2011) suggests that there is little evidence of item bias between children and adolescents from a Latin country and Caucasian/White non-Hispanic children of comparable age, gender and socioeconomic status. The present study, however, compared age-specific items, and its students were mostly representative of the urban middle class.

Adaptation of the items in Verbal domain.

The verbal domain refers to the extent to which a student can handle words, sentences, written texts, verbs and adjectives, as well as the extent to which he/she can comprehend meanings, produce synonyms and antonyms, know the meaning and use of words, complete sentences with omitted words on the basis of context, and take a critical view of written language. Verbal skills may involve concepts as concrete or abstract ideas. They include the ability to analyze information and solve problems using language-based reasoning (Munoz-Sandoval, Cummins, Alvarado & Ruef, 1998). Verbal reasoning is important in most aspects of school work. Even more abstract courses such as math and physics require verbal reasoning skills, as most concepts are introduced orally by the teacher. Verbal reasoning also reflects children’s ability to explain verbal concepts clearly, provide a rationale for their choices and explain conceptual information. Verbal ability, measured by the verbal IQ, is one of the most accurate predictors of academic success in formal school programs (Munoz-Sandoval et al., 1998).

As in the nonverbal domain, adaptation was also carried out in the verbal domain, where 14 items were adapted from the original SB5. In the Routing/Knowledge subtest, a number of items that were found relatively difficult to understand were modified and adapted with culturally appropriate pictures, keeping the sequence and theme unchanged for items 10-14. In the Fluid Reasoning subtest, pictures that appeared unfamiliar and uncommon to the students were modified. Items from level 2 (cat and boy playing, laundry, puzzle) were replaced with culturally appropriate pictures, keeping the theme unchanged. In level 3, the baseball and chopsticks (utensil) were replaced with a cricket ball and a fork respectively (Appendix 10). In level 4, item 3 (melted icebergs in a Caribbean country) was replaced with a culturally appropriate theme (melted icebergs in the Bay of Bengal). The verbal analogy subtest is usually a complex concept for secondary-age students, and only a few exceptional students responded correctly. Finally, the items in levels 5 and 6 had to be retained because of the higher-order age sequence. Since the age range of the scale starts from 2+ years, the items of level 2 in the Quantitative Reasoning subtest are pitched at very young children. Therefore, these items were not considered for modification, pending further research on lower age levels; only the picture of item 4 in level 2 was adapted to be culture friendly, keeping the theme unchanged. In the Visual Spatial Processing subtest, items 1 (in front of the girl) and 2 (behind the girl) of level 3 were found difficult and were rearranged, with the easier items 3 and 4 placed in their positions. As with the nonverbal working memory subtests, no items were changed or rearranged in the verbal working memory subtests.

The findings reveal a somewhat lower alpha value for the verbal section than for the nonverbal section. The Cronbach’s alpha coefficient for the verbal items (141 items) is 0.75, indicating acceptable internal consistency. This might be a consequence of the educational system in Bangladesh, where verbal communication is not given priority in school grading; verbal items were therefore difficult to respond to. Another study, which evaluated the applicability of the Australian adaptation of the SB4, found considerably higher mean IQ scores than the normative US means (Rodriguez, Treacy, & Sowerby, 1998). While retaining the theme of the original SB5, the items were substantially adapted, modified and rearranged through item analysis. The overall reliability coefficient (α = 0.84) suggests a high correlation among the items. The study concluded that, using the tools of item analysis, the items were adapted for Bangladesh in a culture-friendly way by changing their order and language. Since item analysis was one part of the standardization of the SB5 for use in Bangladesh, completing this criterion paved the way for norm development.
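
For readers unfamiliar with the coefficient reported above, the following is a minimal sketch of the standard Cronbach’s alpha formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores). The function name `cronbach_alpha` and the score matrix are hypothetical; they are not the study’s data or software.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a score matrix.

    `scores` is a list of rows, one per student, each row holding that
    student's score on every item (all rows the same length).
    """
    k = len(scores[0])  # number of items
    # Variance of each item across students.
    item_variances = [pvariance([row[j] for row in scores]) for j in range(k)]
    # Variance of the students' total scores.
    total_variance = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical scores of five students on four items.
sample = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
print(round(cronbach_alpha(sample), 2))
```

Values closer to 1 indicate that the items vary together, which is how the 0.82 (nonverbal), 0.75 (verbal) and 0.84 (overall) coefficients are interpreted in the text.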

Construction of Norm of SB5-BD

Construction of age norms, as a part of the standardization process, involves administering a test under uniform and standardized conditions to a large number of individuals at various ages. The two types of test for identifying one’s intellectual ability are norm-referenced and criterion-referenced tests. The literature suggests that many standardized tests are norm referenced; i.e., test scores are interpreted with reference to the scores obtained from the sample. Norm-referenced tests are designed to examine individual performance in relation to the performance of a representative group. Criterion-referenced testing, unlike norm-referenced testing, uses an objective standard or achievement level. An individual is required to demonstrate ability at a particular level by performing tasks at that degree of difficulty. Scores on criterion-referenced tests indicate what individuals can do, not how they have scored in relation to particular groups of persons, as in norm-referenced tests. For the present study, age norms were computed from the raw scores obtained by the subjects on the nonverbal and verbal subtests (five subtests from each domain) of the Stanford-Binet Intelligence Scale (Fifth Edition). The Binet Scale (SB5) was administered to 3300 students from 11 age groups (300 from each age group) for the construction of norms. The original Stanford Binet Scale (Fifth Edition) was normed on a nationally representative sample of 4800 individuals aged 2 to 85+ years. The sample for the original SB5 was matched on several variables (age, gender, ethnicity, geographic region and socio-economic level) according to the national US Census, 2001 (Roid, 2003). The present study, in contrast, followed a standard technique for selecting an adequate sample size and focused on age, gender, metropolitan region and middle-class status.
However, the samples were purposively selected from 6 divisions, because the seventh division was officially declared only after data collection (Appendix 5).

In constructing the age-specific norms, the study followed the standard calculation procedure. It estimated the mean and SD of the raw scores obtained from administering the test, in order to assess the IQ for the two domains (nonverbal and verbal) as well as the FSIQ. The FSIQ is computed as the sum of all the activities in the SB5, i.e., all subtests covering both the verbal and nonverbal domains of cognitive ability. The FSIQ is thus a global summary of the current general level of intellectual functioning as measured by the SB5. Researchers such as Carroll (1993), Gustafson (1984) and Roid (2003) have described the FSIQ as a measure of the hierarchical factor that exists among the scores of an intelligence test. The FSIQ score of the SB5 is particularly strong in its predictive promise because it covers more factors than widely used batteries and includes balanced coverage of both the nonverbal and verbal aspects of each factor. The FSIQ is not, however, intended to measure all possible aspects of intelligence that could occur across all cultures or settings. For example, some dimensions not represented in the FSIQ include long-term memory, auditory and kinesthetic abilities. Similarly, a study by Ken Richardson (2002) suggests that IQ scores can be described in terms of sociocognitive-affective factors that differentially prepare individuals for the cognitive, affective and performance demands of the test. That study supports the findings of the present study on the construction of norms. The final outcome of this study is the qualitative description and categorization of intellectual ability based on IQ scores.
The qualitative descriptions and IQ ranges for the age norms of children aged 6 to 16 years are: Significantly Below Average (≤ 86), Moderately Below Average (87-94), Below Average (95-104), Average (105-115), Above Average (116-123), Moderately Above Average (124-127), and Significantly Above Average (128-152 and above). The ability levels based on the FSIQ scores of the SB5-BD (Table 18) were developed following the study by Carson & Roid (2004) on the application of SB5 results to learning in the classroom.
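
The category boundaries listed above can be expressed as a small lookup, sketched below. The cut-off values are taken from the ranges in the text; the function name `classify_fsiq` is hypothetical, and treating every score of 128 or more as the top category is an assumption based on the phrase "and above".

```python
from bisect import bisect_left

# Upper bound of each category except the last, taken from the ranges above.
UPPER_BOUNDS = [86, 94, 104, 115, 123, 127]
LABELS = [
    "Significantly Below Average",
    "Moderately Below Average",
    "Below Average",
    "Average",
    "Above Average",
    "Moderately Above Average",
    "Significantly Above Average",
]


def classify_fsiq(iq):
    """Map an integer FSIQ score to its qualitative description."""
    # bisect_left counts how many upper bounds lie strictly below the score,
    # which is exactly the index of the category the score falls into.
    return LABELS[bisect_left(UPPER_BOUNDS, iq)]


for score in (86, 87, 105, 115, 116, 130):
    print(score, classify_fsiq(score))
```

Keeping the boundaries in one list makes it straightforward to update the lookup if the norm table is revised.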

Reliability of SB5-BD

The reliability of a test score refers to its precision in measuring the true attributes of a person and its consistency across sets of items, multiple testing occasions and other conditions that affect score stability (Roid, 2003). The present study investigated the quantitative index of reliability for SB5-BD scores, including test-retest stability. The most common method of determining the reliability of test scores is to repeat the identical test on a second occasion; here the interval was one week. The reliability coefficient in this case is simply the correlation between the scores obtained by the same persons on the two administrations of the test.
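
The test-retest coefficient described above is a correlation between paired scores. The sketch below computes the Pearson product-moment correlation from first principles; the function name `pearson_r` and the paired IQ values are hypothetical illustrations, not the study’s data.

```python
from math import sqrt


def pearson_r(first, second):
    """Pearson product-moment correlation between two paired score lists."""
    n = len(first)
    mean_x = sum(first) / n
    mean_y = sum(second) / n
    # Sum of cross-products and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(first, second))
    sxx = sum((x - mean_x) ** 2 for x in first)
    syy = sum((y - mean_y) ** 2 for y in second)
    return sxy / sqrt(sxx * syy)


# Hypothetical FSIQ scores of six students, tested twice one week apart.
week_1 = [98, 105, 110, 101, 120, 93]
week_2 = [101, 104, 113, 99, 118, 97]
print(round(pearson_r(week_1, week_2), 2))
```

A coefficient near 1 means students keep roughly the same rank order on both occasions, which is what test-retest stability is meant to capture.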

The reliability coefficients for the IQ scores (Nonverbal, Verbal and Full Scale) were calculated with several formulas (the Pearson product-moment, Spearman’s and Kendall’s formulas) (Nunnaly, 1967, in Roid, 2003). With the Pearson product-moment formula, the coefficients for the Nonverbal, Verbal and Full Scale IQ scores of the SB5-BD were 0.71, 0.76 and 0.75 respectively. This indicates consistency between the scores of the two test administrations. The subtest-wise reliability ranges from 0.88 to 0.98. By comparison, the original SB5 had a higher reliability of 0.90 to 0.93 (Roid, 2003). In a similar study, Madsen (1934) reported that the reliability coefficients of Stanford Binet IQs ranged from 0.65 to 0.94. In Bangladesh, Huq (1992) determined the test-retest reliability of the nonverbal subtests of the SB4; the correlation was found to be 0.97 for the urban sample. Measurement error is evaluated by examining the reliability of each test score. Reliability for SB5 scores includes internal consistency, test-retest stability, and errors of measurement. Internal-consistency reliability ranged from 0.95 to 0.98 for IQ scores and from 0.90 to 0.92 for the five factor index scores. For the 10 subtests, average reliabilities (across age groups) ranged from 0.84 to 0.89, providing a strong basis for profile interpretation. Test-retest reliability studies were also conducted and showed the stability and consistency of SB5 scoring (Roid, 2003).

In another study, Lincoln (2010) determined the reliability of the Stanford-Binet scale and found that the correlation between the first and second examinations was 0.95. Test-retest reliability shows the extent to which scores on a test can be generalized over different occasions; the higher the reliability, the less susceptible the scores are to random daily changes in the condition of the subject or of the testing environment (Anastasi & Urbina, 1997). In this study, the means and standard deviations of the three IQ scores were consistent across the test and retest administrations, illustrating the stability of students’ scores over time. Because there are many sources of random error across testing occasions, such as noise, distractions and the moods of the students, the correlations are not expected to be as high as the internal consistency estimates reported for the original SB5. Owing to the effects of practice and familiarity with testing procedures, the mean scores of the test-retest sample may show some degree of improvement across administrations. Studies of test-retest effects normally show that practice effects dwindle across intervals of several days or weeks (McArdle & Woodcock, 1997). Based on the SB5 studies and on comparisons with other IQ scales, the SB5 IQ scores appear to be quite stable and little affected by practice effects (Gregory, 1996, cited in Roid, 2003).

Validity of SB5-BD

The most acceptable way of assessing an instrument’s legitimate usefulness is through validity studies (Anastasi & Urbina, 1997; Sattler, 1992). Validity, as outlined in the Standards for Educational and Psychological Testing, is regarded as the “most important consideration in test evaluation” (American Educational Research Association [AERA], 1999). In the current study (SB5-BD), validity is used to determine whether the assessments in question are in fact appropriate means of assessing intellectual abilities. The validity of test scores depends on proper administration of the test by an experienced examiner and proper recognition of the unique characteristics of the individual examinee (Matarazzo, 1990, cited in Roid, 2003). Validity has several facets and is established by presenting content-related, criterion-related and construct-related evidence. Thus there is no single indicator of validity (Roid, 2003).

Expert opinions were taken into consideration while standardizing the test instrument. According to the experts, the items and activities contained in the SB5 were valid for use in Bangladeshi culture, and the content of the adapted items was highly correlated and consistent with the earlier Bangla version of the SB4. Separately, Mark Pomplun and Michael Custer (2006) conducted a study on the validity of the SB5 measures of verbal and nonverbal working memory. The item mapping clearly demonstrated a parallel between increasing item difficulty and a progression of item characteristics that placed increasing demands on verbal and nonverbal working memory. Their findings reveal that the higher correlations between SB5 verbal working memory and reading skills, and between SB5 nonverbal memory and mathematics skills, are consistent with past research.

Along with content validity, criterion-related validity and contrasted group validity were also examined. There was a significant difference (p<0.001) between the mean scores of the special needs and normal students. In addition, there was a lower correlation (r = 47.31) between the two groups. This suggests that the test is able to discriminate between normal children and children with special needs. In 1991, Kline, Graham & Lachar investigated the contrast validity of the nonverbal subtests of the SB4 between students with verbal ability and students with reading problems. Their findings showed a significant difference and a lower correlation between the scores of the two groups, which supports the outcome of the present study. Laurent et al. (1992) reviewed the Stanford-Binet Intelligence Scale Fourth Edition and found the scale to be a valid measure of general mental ability. The review also suggested that the SB4 could distinguish between groups of young students with differing intellectual abilities (e.g., mentally handicapped, gifted, neurologically impaired) and that the test correlated highly with scores on achievement tests. On the basis of this validity information, recommendations for the use of the SB4 were made. Tucker (1991) conducted a validation study of the Stanford-Binet Fourth Edition for use in the re-evaluation of learning-disabled students. The study found that the SB4 is an appropriate and effective assessment tool for evaluating the strengths and weaknesses of children with learning disabilities. In 2007, Abbott conducted a comparative study of the working memory scales of the WISC-IV and the SB5 in referred students, as both tests are used, in part, to develop academic interventions for students. There was a moderate correlation (0.6) between the two tests, with 33% of shared variance.
The findings indicate that the two tests do not measure a similar ability and that scores obtained on them should not be interpreted in the same manner. More research is needed to investigate the specific constructs measured and to determine which test is most appropriate for assessing working memory problems.

In another similar study, Askarian, Ali, Kambiz & Hassan (2011) computed the diagnostic validity of the new edition of the Tehran-Stanford-Binet Intelligence Scale for identifying children with learning disabilities. The results showed that the scale had good diagnostic validity and desirable potential for identifying students with learning disabilities. According to the authors, the scale can therefore be used as a valid tool for identifying students with learning disabilities.

Apart from this, the study explored the correlation between the domain scores of the SB5 and the WISC-R (Bengali version) in a homogeneous group of 90 nonexceptional school-aged students from three age groups (ages 7, 11 and 14). The SB5-BD and the WISC-R were administered consecutively to the same students. The study found low correlations of 0.22, 0.23 and 0.13 between the IQ scores of the two intelligence scales (Table 23). Studies suggest that the WISC-R (adapted in 1984), being outdated, is comparatively difficult for the students, whereas the original SB5 is a standardized, student-friendly and culture-free test. Moreover, this low correlation indicates that an individual’s intellectual ability is never stagnant or permanent and should not be assessed through an obsolete intelligence test like the WISC-R (adapted in 1984). Despite the limited nature of the sample, the findings suggest that the SB5 shows similar positive trends to, and a relationship with, the WISC-R. The tests displayed a moderate level of common variance. The intercorrelations partially supported the SB5 predictions of a relationship between the two instruments.

Prior to the present standardization, the five nonverbal subtests of the Stanford Binet Intelligence Scale Fourth Edition (SB4) were standardized in Bangladesh by Huq in 1991. That study validated the subtests against three well-known tests that had previously been standardized for use in Bangladesh, namely the Independent Behavior Scale (IBS), the Denver Developmental Screening Test (DDST) and the WISC-R. The Pearson product-moment correlation coefficient was computed between the SB4 and each of the three tests. The correlation between the SB4 and the IBS was 0.71, and the correlation between the SB4 and the WISC-R was 0.73; both correlations appear reasonably high. Finally, the correlation coefficient between the SB4 and the DDST was 0.57.
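For readers unfamiliar with the statistic, the following is a minimal sketch of how a Pearson product-moment correlation between two sets of test scores can be computed. The function name and the score lists are hypothetical and purely illustrative; they are not data from Huq's study or from the present one.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("both lists must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from the means.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable.
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cross / math.sqrt(ss_x * ss_y)


# Hypothetical scores of six students on two tests (illustration only, not study data).
scores_test_a = [98, 105, 110, 92, 120, 101]
scores_test_b = [95, 108, 104, 90, 118, 99]

r = pearson_r(scores_test_a, scores_test_b)
print(f"Pearson r = {r:.2f}")
```

Values close to 1 indicate that students who score high on one test tend to score high on the other, which is the sense in which coefficients such as 0.71 and 0.73 are described as reasonably high.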

In another study, the adapted Bangla version of the WISC-R was standardized in Dhaka city by Huq (1980). Validity was examined by computing separate correlations. The correlation between the verbal subtests and the school final examination was between 0.31 and 0.78. The correlation between the performance subtests and the annual examination ranged from 0.44 to 0.75. For the full scale, the correlation was between 0.06 and 0.69. Although the participants’ school final examination records were not used in the present study to determine such correlations, all participants were purposively selected on the basis of their academic performance at school, that is, as average and above-average students. Kush (2004) reviewed comparisons of the Stanford Binet Intelligence Scale Fifth Edition with the Wechsler Intelligence Scale for Children-Third Edition (WISC-III) (r = 0.84) and with the Woodcock-Johnson III Tests of Cognitive Abilities (r = 0.78). The study found supplementary criterion-related validity between the two scales. In 1990, Hollinger & Baldwin examined the performance of 19 exceptional children on the Stanford-Binet Intelligence Scale, Fourth Edition (SB4) and the WISC-R. The results obtained for this naturally occurring sample of exceptional children indicate nonsignificant differences in performance between the SB4 and WISC-R Full Scale IQ.

Similarly, the validity of the Stanford-Binet Intelligence Scale-Fourth Edition (SB4) and that of the Kaufman Assessment Battery for Children (K-ABC) were investigated by Emily and Robert (1987). In this study, the SB-LM was used as a criterion measure with which to compare SB4 and K-ABC scores. The study found significant correlations among the test scores. Bivens (1994) prepared a paper on the adaptation of the Stanford Binet Fourth Edition in Australia. The criterion validity correlations in that study ranged between 0.67 and 0.83, which is statistically significant. Further, a longitudinal study of the Stanford-Binet and the WISC-R with special education students was conducted by Covin, Theron & Sattler in 1985. Correlations between Stanford-Binet and WISC-R Full Scale IQs were significant in both ethnic groups, with r = 0.60 for the total group.

Thus, the standardization process has been discussed on the basis of the study findings together with other significant literature. The process was completed through various effective criterion measures, and it can be concluded that the adapted SB5-BD intelligence scale completed its standardization through item analysis (modification and adaptation) and the construction of norms and IQ ranges for children aged 6 to 16 years. Besides, the SB5-BD has been established as a reliable tool through its consistency and accuracy in measuring an individual’s intelligence. Moreover, the standardization process confirmed that the SB5-BD is a norm-referenced standardized test, established by collecting relevant evidence on individuals’ intellectual ability so that educators can draw appropriate interpretations from assessment results, and it can therefore be regarded as a valid tool.


The author of the present study thinks that the research results on intelligence testing obtained through the standardization of the Stanford-Binet Scale Fifth Edition have substantial outcomes. A reliable and valid psychometric test can be useful in educational and clinical evaluations. Its use has been and will continue to be influential in shaping educational policy and practice. In the past, scores from intelligence tests led to widespread ability grouping.

The most recently authorized 2004 version of IDEA (Individuals with Disabilities Education Improvement Act, P.L. 108–446) has emphasized critical guidelines that underline the significance of this study for the identification, assessment and treatment of young children with special needs. With the contribution of legislation, the assessment process has evolved from the mere identification of children in need of early intervention programmes to an increased emphasis on school-based services.

The results of these intelligence measures have at least three major areas of educational application.

Educational Provision

Since general intelligence plays an important role in many valued life outcomes, this research also suggests that IQ correlates with academic success eventually leading to future job performance and socioeconomic advancement (e.g., level of education, occupation and income).

The implications of the SB5-BD provide a basis for ability-performance relationships across major life arenas, including learning, work and daily life. The findings of this research suggest making practical use of the SB5-BD for classroom learning and classroom instruction. The IQ scores can play a vital role in identifying the strengths and weaknesses of low- and high-functioning students, along with their special needs, in classroom settings. The relatively stronger of the nonverbal and verbal domains would be preferred for learning activities. For example, if the verbal domain is relatively stronger, the student is more likely to engage in learning through verbal means, such as reading, oral communication and practice activities that emphasize the roles of speech and language. On the other hand, if the nonverbal domain is relatively stronger, the student may be more likely to engage in learning activities that permit practice through nonverbal means.

The focal variables of this study were five factors (Knowledge, Fluid Reasoning, Quantitative Reasoning, Visual Spatial Processing and Working Memory) across two domains (Nonverbal and Verbal), resulting in ten subtests (discussed in chapter two). The results point to practical recommendations for using these subtests to enhance teaching-learning strategies in light of students’ learning styles. The implications of each subtest in educational settings are stated below.

Nonverbal Fluid Reasoning (NFR).

A teacher can teach and evaluate a student’s sequential and inductive reasoning ability through novel figural problems, sequences of pictured objects and geometric patterns in the classroom while planning and preparing his/her regular lessons. At primary level, students can learn to match simple objects that are then placed in series (e.g. counting objects of decreasing size). Besides, students can learn to identify and extend series of sequential objects. At secondary level, the teacher can teach students to continue a series of pictures that form repeating patterns (e.g. ball, bat, ball, bat, etc.). Moreover, students can be taught using the theme of this subtest by being shown logical patterns of figural objects with one part missing. Research indicates that it is a good measure of ‘g’ that assesses students’ ability in correlational interpretation, attention span, perception of part versus whole, concentration and some degree of spatial analysis (Sattler, 1988).

Nonverbal Knowledge (NK).

To use the theme of this subtest in an educational programme, a teacher can evaluate his/her students’ knowledge of common signals, actions and objects, and their ability to identify absurd or missing details in pictorial material. At primary level, activities based on this subtest in a teacher’s lesson plan can measure a student’s understanding of basic human activities (e.g. feeding a child, combing hair, clapping hands, etc.) demonstrated in gestures. At secondary level, the activities will be more complex. Students can study pictures showing people in odd or inappropriate situations (e.g. a girl whose hair blows in one direction while the wind blows the nearby trees in another) and point out the absurdity. The task requires students to have a basic level of common knowledge about people, nature and the physical laws of the universe (Sattler, 1988). It also requires perception of detail, attention and concentration, inference, and knowledge of science (e.g. how a balance works) and geography (e.g. missing nations on a world map). Teachers or educators can also plan activities in which students explain the absurdity vocally, with the visual illustration present to assist them. Students can point to the location and use gestures in addition to speech to explain the silliness.

Nonverbal Quantitative Reasoning (NQR).

Based on this subtest, educators and teachers can apply their knowledge in educational planning. They can judge students’ ability to solve increasingly difficult pre-mathematic, arithmetic, algebraic and functional concepts and relationships depicted in illustrations. At primary level, activities designed on this subtest can measure basic concepts (e.g. bigger/smaller), counting, addition using objects and pictures, and recognition of numbers. At secondary level, students can be taught and assessed with increasingly complex activities, with illustrations depicting figural series, functional relationships, linear transformations and logical or algebraic relationships. Research shows that the VQR and VK subtests have the greatest relevance for school-based learning and possible academic interventions (McGrew, Keith, Flanagan & Vanderwood, 1997).

Nonverbal Visual Spatial Processing (NVSP).

By using this subtest in the lesson, teachers can plan to teach and assess students’ ability to visualize and solve spatial and figural problems. At elementary level, this subtest can be used in assembling puzzle-like pieces and in visual matching activities. At secondary level, this subtest can be considered a unique, interesting and challenging new task for above-average and higher-functioning students. In this context, students can reproduce familiar patterns such as animals, objects (e.g. houses, boats) and people in motion by properly arranging the object pieces.

Nonverbal Working Memory (NWM).

A teacher will be able to measure the fundamental short-term memory of his/her students with observable objects and to use this skill in tapping sequential activities. The teacher can plan teaching activities based on this subtest that involve recalling a sequence of block taps. According to Reid, Hresko & Swanson (1996), students can learn to memorize and sort activities as well as chunks of numbers that are stored in short-term memory.

Verbal Fluid Reasoning (VFR).

The activities in this subtest include early reasoning, verbal absurdities and analogies. Based on this, a teacher can judge the ability to analyze and explain, using deductive and inductive reasoning, problems involving cause-effect connections in pictures, classification of objects, absurd statements and interrelationships among words. In this context, activities at primary level will require students to describe verbally the implied connections in pictured events and to sort and classify pictured objects. Activities at secondary level are to be designed to assess verbal reasoning and the completion of analogies such as “_ is to B as C is to _.” Moreover, students at this level can learn by using their verbal abilities, such as verbal fluency and knowledge of vocabulary meanings and variations, and by solving problems through strategies such as guessing and checking (Carroll, 1993).

Verbal Knowledge (VK).

This subtest is termed Vocabulary; it measures general and crystallized ability and is applied in psycho-educational settings. To judge this ability, the teacher can emphasize students’ knowledge and memorization of concepts and language, and their ability to identify and define increasingly difficult words. At elementary level, this subtest can be used in the identification of body parts (on the student’s own body and on a picture of a child), toy objects and picture vocabulary. At secondary level, students are presumably influenced more by the effects of schooling and extensive reading. They can become more competent through upper-level vocabulary activities that lead to a higher literacy level and exposure to higher levels of spoken and printed Bangla and English. This subtest does not emphasize articulation, but it requires the ability to understand and explain the meaning of words.

Verbal Quantitative Reasoning (VQR).

The lower-level activities of this subtest are to be designed to measure counting of toys and basic addition and subtraction using pictured objects and/or word problems. In secondary-level activities, students are to be taught measurement, geometry and word problems with multiple methods of solution. As stated earlier, since this subtest is based on academic learning, a teacher can motivate students to use their reading ability to solve increasingly difficult mathematical tasks involving the activities outlined above.

Verbal Visual Spatial Processing (VVSP).

Using this subtest, a teacher can determine a student’s ability to identify common objects and pictures using common visual-spatial directions, indicating direction and position in relation to a reference point. For example, activities at elementary level are to be designed with pictorial tasks requiring an understanding of basic spatial concepts such as “behind” or “away from”. At secondary level, more expressive language will be required to explain spatial orientations and directions in increasingly complex tasks.

Verbal Working Memory (VWM).

This subtest uses the activities of memory for sentences and last word. In this subtest, a student needs to demonstrate short-term and working memory for words and sentences and to store, sort and recall verbal information in short-term memory. To teach students, the teacher can read short phrases and sentences aloud, which the students then recall accurately. The teacher can also ask sets of questions, after which the students recall the last word in each question. The students are required to answer each question ‘yes’ or ‘no’. There are many situations in the classroom where a student must selectively attend to portions of a teacher’s messages. According to Roid (2003), such efforts entail filtering out noise from other students to hear the important messages from the teacher. A low level of ability in selective listening would seem to be predictive of an individual’s underachievement in the group-instruction methods of modern education. This subtest will provide teachers and parents with information about students’ cognitive deficits.

Besides, a teacher will acquire self-confidence in handling diverse students. This approach will promote peer interaction and create a student-friendly teaching-learning environment in the classroom. According to the findings, the five factors of the SB5-BD have proved to be precursors for the measurement of literacy skills (reading and mathematics). Educators will get a clearer picture of potential academic difficulties and can determine which educational interventions may be helpful at the school level (Coleman, Buysse, & Nietzel, 2006). The implications of the SB5-BD can also be considered in planning curriculum modification for children with special needs. The study reveals that IQ scores will guide a teacher towards in-depth knowledge of a student’s potential. Further, based on the findings and in relation to the qualitative categories and FSIQ scores (Table 18), the author recommends the following instructional strategies, which can be used while teaching children with special needs along with other students in mainstream or inclusive settings (Table 25).

Relations between FSIQ Scores of SB5-BD and Recommended Methods of Instruction

Ability Level and FSIQ | Optimal Method of Instruction
Significantly Below Average (≤ 86) | Ensure that learning is at an appropriate slow speed, simple and supervised.
Moderately Below Average (87-94) | Provide very direct, hands-on instruction.
Below Average (95-104) | At lower range, may benefit from plenty of direct supervision.
Average (105-115) | Students can thrive in learning in a traditional classroom format, with mixed
Above Average (116-123) | Can more readily acquire skills in collecting and gathering their own information.
Moderately Above Average (124-127) | Create opportunities for these individuals to seek and find their own information and provide information as needed, particularly in these information search skills.
Significantly Above Average (128-152 and above) | These individuals may enjoy reasoning things through on their own. Use more direct methods as needed, but remember that traditional classroom teaching methods may become boring for these students.

The above recommendations for optimal methods of classroom instruction based on the FSIQ scores of the SB5-BD are similar to, and follow, the study by Carson & Roid (2004) on applying SB5 results to learning in the classroom.
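The banding in the table can be expressed as a simple lookup. The sketch below is illustrative only: the names `FSIQ_BANDS` and `ability_level` are hypothetical, the cut-offs are copied from the table above, and whole-number FSIQ scores are assumed.

```python
# Upper bound of each band (inclusive) and its label, taken from the table above.
FSIQ_BANDS = [
    (86, "Significantly Below Average"),
    (94, "Moderately Below Average"),
    (104, "Below Average"),
    (115, "Average"),
    (123, "Above Average"),
    (127, "Moderately Above Average"),
]


def ability_level(fsiq: int) -> str:
    """Return the ability level for a whole-number SB5-BD FSIQ score."""
    for upper_bound, label in FSIQ_BANDS:
        if fsiq <= upper_bound:
            return label
    # Scores of 128 and above fall in the highest band.
    return "Significantly Above Average"


for score in (86, 100, 110, 130):
    print(score, "->", ability_level(score))
```

A teacher or assessor could pair each returned label with the corresponding method of instruction in the table; the lookup itself adds no information beyond the published cut-offs.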

In addition, the SB5-BD factors may show differences between students with high and low abilities. Students with high ability tend to be creative and rely on reasoning skills through which they can reach decisions with confidence. They can deal with several tasks at a time. High-functioning students require enriched environments to stimulate their creativity and potential. On the other hand, students with low abilities tend to avoid unstructured problems, become frustrated when too much is demanded of them, avoid mathematical problem solving and find it difficult to visualize problems through imagination; others view them as distracted, forgetful and inattentive. Moreover, low-functioning students need an appropriate and simplified curriculum and teaching method, with sufficient hands-on instruction and plenty of direct supervision. The characteristics identified for this group therefore have to be taken into consideration in classroom teaching and learning, along with the SB5-BD. The researchers think that incorporating the verbal domain in the national curriculum would help students to improve their verbal communication skills. Based on the rationale of the study (sketched out in chapter one), the application and implications of all ten subtests of the SB5 are discussed in this chapter. It should be mentioned that individuals’ potential and abilities should be judged by using these subtests and should not be biased by the subjective opinions of teachers and parents.

Screening, Diagnosis and Remedial Planning

Early detection and diagnosis are the single most important key to reducing the risk of developing secondary problems and to taking reasonable preventive measures.

However, intelligence testing is the estimation of a student’s current intellectual functioning through the performance of various tasks designed to assess different types of abilities. The test scores provide important information on how children’s abilities can be properly interpreted, helping educators to develop appropriate educational strategies for remedial planning, intervention programmes and placement decisions. Besides, information from tests is more scientifically consistent than information from a clinical interview. In legal matters, when decisions have to be made on disability issues, the standardized information from test scores will help to overcome the personal judgment of the authority.

Since it is well established that intelligence tests can reasonably predict levels of achievement, the SB5-BD can be considered a tool for identifying low, average and high intellectual functioning in students, along with its relevant assessment techniques.

Accountability, Research and Evaluation

IQ measures are often included among outcome measures related to programme effectiveness. In addition, they are among the measures used in research to account for pupil characteristics. Besides, the Stanford-Binet intelligence test is one of the important tools of psychological assessment used by professionals and psychologists in a variety of settings, such as private offices, public and private schools, private and public mental health clinics and institutions, hospitals, the personnel offices of industrial companies and the counseling centers of colleges or universities, among others. This research states that IQ scores can play a major role in academic settings in determining developmental disability, the prevalence of intellectual disability and other exceptionalities in children with high or low capabilities. Additionally, there are other educational indications, such as eligibility criteria for service delivery and school accommodations.

It has become apparent that the implications of the SB5-BD for future intelligence testing and for education are numerous. Assessment of intellectual qualities should go well beyond present standard intelligence tests, which seriously neglect important abilities that contribute to problem-solving and creative performance in general. Educational philosophy, curriculum-building, teaching procedures and examination methods should all be improved by giving attention to the structure of intellect as the basic frame of reference. The standardization of the latest intelligence scale and its application in Bangladesh is not a new trend; rather, it is a useful, dynamic and continuing practice in the field of assessment. From the detailed discussions throughout the study, it has become clear that the Stanford-Binet is a psychometric and contemporary intelligence scale that, among its counterparts, has been standardized in various editions and adapted in several cultures to meet the need for assessing an individual’s intellectual ability and for providing intervention and remedial services for children with special needs. It can be concluded that the special features of the SB5-BD also suggest that various intellectual abilities in individuals can be improved by using this standard assessment scale, which holds a leading position among intelligence tests and goes beyond the debate over traditional testing.


  • Classroom teaching-learning strategies can be modified by using the themes of the ten subtests of the SB5-BD.
  • This study highly recommends using the test in clinical settings for the identification and educational placement of children with special needs.
  • Further, the outcome of the study can be applied in developing a functional and simplified curriculum for children with special needs.
  • Educators and professionals in educational institutions can use the SB5-BD to evaluate their students’ intellectual performance.
  • The Government should take the necessary initiative to use the ten subtests of the SB5-BD in the upcoming NCTB curriculum modification.

Recommendations for Further Research

1. Given the need for a holistic concept of a student’s intellectual ability and for considering other age levels, further research with an extended age range is recommended.

2. Building on the present study, further research is needed that takes into consideration students from rural and other diverse regions, to obtain a more comprehensive knowledge and understanding of students’ ability.

3. In order to further validate these results, this study or similar studies need to be replicated with a larger sample size to indicate a more significant difference. In addition, a more representative study is needed that would include more diverse subjects so that the results can be generalized to the population.

4. More research needs to be conducted on IQ testing with persons who have Intellectual Disability (ID). Further data are needed to determine whether the Stanford-Binet produces similar or different results. Furthermore, other IQ scales also need to be compared with the Stanford-Binet for persons who have ID. This is an important issue that needs careful review and study, as the public policy implications are huge.

5. The Government may provide funds for the establishment of an Assessment and Counseling Unit in the Department of Special Education within IER, DU, and may also plan to set up a test administrator training center in this unit to increase human resources in the country, so that the access of students with special needs to mainstream schools can be ensured.


Since the standardization was limited to the age range of 6 to 16 years and was intended for assessing children, the ten subtests (nonverbal and verbal) of the scale were taken into consideration for adaptation and standardization. The first limitation of the study was that the sample was taken from a relatively small geographic region. Participants were drawn from six urban, metropolitan city areas in Bangladesh. This means that the results of the study might not generalize to children from other geographic regions, such as rural areas. Although the study attempted to include demographic variables representative of a national sample, participants might be limited in other variables such as economic status and ethnicity. Since the study comprised only students between the ages of 6 and 16 years, the results might not generalize to the other age groups of the original SB5. Finally, with a sample size of 4400 participants, the applicability of the results to more global populations is limited. Caution should be used when making assumptions about large populations on the basis of limited sample sizes.

In addition, the study faced other limitations related to the desired time frame for test administration, as given below:

  • The authorities of a few schools were less enthusiastic about test administration, which made the work lengthy.
  • Because of the schools’ academic formalities (e.g. class tests, half-yearly and final examinations), test administration had to be deferred in many situations, so the pre-set schedule could not be achieved.
  • Besides, another major limitation at the field level was the school holidays (following the school calendar), which acted as a serious setback to the smooth progress of the overall research work.
  • Further, as test administration follows a systematic and standard procedure given in the manual, the researcher had to obtain a thorough understanding of the test procedure. Gaining proficiency in administering an individual test required ample time.
  • Since it was an individual test, the researcher had to take each student’s pace of response into account.
  • Finally, owing to the lack of expertise in test administration, the researcher conducted the tests individually, along with a research assistant.


The research relevant to theory and practice in intelligence shows that the field of intelligence testing is active and dynamic. It should also be evident that intelligence researchers of the 21st century are addressing a broader, more complete concept of intelligence than was evident in the previous century. As related research in the biology of the mind, emotion, neuropsychology, family dynamics and cognitive processing yields new findings, these results will be incorporated into increasingly useful models and theories of the workings of intelligence and of how to assess individual intelligence. After completion of this study, the SB5-BD appears to be an effective measure of general intelligence across a specific age range (6 to 16 years), and the standardization sample appears to be a close match to the population on key demographic variables. IQ scores also cover a wide range of ability, from the lower levels of moderate intellectual disability to the higher levels of intellectual giftedness (Table 18). As such, they will be helpful in assessing students with intellectual disability, learning disabilities and intellectual giftedness, and interpretation of the global full scale IQ appears to have strong empirical support. Therefore, the present standardized test, the SB5-BD, could be considered a unique achievement in the field of testing for a country like Bangladesh. The Stanford-Binet has been standardized, translated and adapted in many languages and used in many developed and developing countries. A standard assessment scale was needed to diagnose children with special needs for appropriate intervention, and the outcome was the standardization of the Stanford-Binet Intelligence Scale (Fifth Edition) in Bangladesh, the SB5-BD.

Although there are constitutional, legislative and policy obligations, and Bangladesh has ratified the CRC and signed all the international and regional declarations on education (reviewed in chapter two), the Government of Bangladesh has not yet undertaken significant steps to ensure education for children with special needs. Bangladesh is far behind in developing an effective education system for children with disabilities, although the Government of Bangladesh has established a special and integrated education system and NGOs are implementing special and inclusive education systems as well. The educational programme for children with special needs remains under the Ministry of Social Welfare, which indicates that the education of these children is being treated as a welfare concern rather than a developmental subject. So, there is a big gap in incorporating children with special needs into mainstream education. In Bangladesh, the Education Policy makes provision for ‘Education for All’, and primary education is compulsory and free. Children with special needs are left out of this programme, as their educational provision is seen as a welfare and charity issue. Under PEDP-II (Primary Education Development Project-II), it has been specified that children with special needs who have mild delays would be enrolled in primary schools, but unfortunately this does not happen in practice (Choudhuri et al., 2005).

From the overall discussion and observation, it is clear that professionals such as school psychologists and educators need to take care when selecting an effective assessment scale to measure a student’s intellectual and achievement ability. Based on their practical and realistic judgment, and through common consent, it has been recognized worldwide that the Binet scales, with their rich tradition and popularity, occupy a prominent place in the field of assessment. In line with this view, the present study standardized the latest revision of the Stanford Binet Intelligence Scale for use in Bangladesh. In addition, using the test scores can bring about considerable impact and change in our education system. On the other hand, it should be added that intelligence testing should not be conducted in a vacuum. Along with the norm-referenced tests mentioned previously, an assessment should include a variety of other data from a multitude of informants. Sattler (2001) suggests that norm-referenced testing should be accompanied by interviews with a parent, teacher and student; observations of the student both during formal testing and in the natural environment (e.g. classroom, lunchroom, playground); and informal assessment procedures (e.g., district-wide criterion-referenced tests, school records). Such an assessment will provide the most accurate information by which educators can most effectively serve the student.

Although the Government of Bangladesh has made several remarkable attempts to close the loopholes in educational development, its efforts will be in vain if the foundation of educational development, such as intelligence testing, is not considered and included as a main concern. Thus, through this study the researcher would like to draw the attention of the Government and policy makers to the importance and impact of assessment, and to the need to include assessment as a prerequisite criterion for judging a student’s academic progress on the basis of his/her potential. In this respect, the Government can make assessment compulsory in the education policy in order to provide effective guidance and counseling programmes in all educational institutions. The Government should mobilize funds for test development and address the issue of curriculum modification to ensure the access of students with special needs to mainstream schools and to fulfill the commitment to Education for All.
