Development of HOTS Problem-Based Test Instruments to Measure Level 4 Numeracy Capabilities Using the Rasch Model

Numeracy is one of the abilities measured in the Minimum Capability Assessment (AKM). Improving numeracy matters because numeracy is not only the ability to perform mathematical calculations but also a knowledge base that builds the confidence to apply mathematics in practice. This research combines qualitative and quantitative approaches within a research and development (R&D) design. The product developed is a HOTS-based level 4 numeracy test instrument consisting of 40 multiple-choice and complex multiple-choice questions. The instrument was evaluated for validity, difficulty level, and reliability, and the analysis was carried out with Winsteps software. The Winsteps output shows that the instrument fits the Rasch model, with mean outfit MNSQ values of 0.89 for persons and 0.91 for items and outfit ZSTD values of 0 for persons and -0.01 for items. The reliability of the instrument, expressed as Cronbach's alpha, is 0.87.


INTRODUCTION
Twenty-first century education should develop students' numeracy skills so that they can meet the demands of the times. The Indonesian government will conduct a Minimum Capability Assessment (AKM) in 2021 to prepare students to face the challenges of the 21st century (Kemdikbud, 2020). The AKM assesses reasoning skills using language (reading literacy) and mathematics (numeracy). It is conducted at two levels, nationally and in the classroom. The national AKM does not report results for individual students, but teachers can use the classroom AKM to design learning that suits the needs of their students.
Numeracy is the ability of individuals to formulate, apply, and interpret mathematics in various contexts of daily life (Maulidina & Hartatik, 2019; Tyas & Pangesti, 2018). Ojose (2011) explained that numeracy is not just doing mathematical calculations but also a knowledge base that builds the confidence to apply it. A person who has mastered numeracy skills adjusts more easily to changing times. However, the numeracy ability of students in Indonesia is still relatively low. This can be seen in the 2018 PISA results (OECD, 2019), which showed that 71% of students were below the minimum competence level for mathematics.
Numeracy skills can be developed through HOTS problem-based mathematics learning (Hera & Sari, 2015; Tyas & Pangesti, 2018). HOTS problems not only measure conceptual and procedural mathematical ability but also require students to connect several concepts, work through problem-solving processes, and interpret results up to the point of decision making. HOTS questions can be posed by providing contextual and current problems (Tyas & Pangesti, 2018; Kamsurya & Saputri, 2020). Contextual problems motivate students to solve them because they arouse curiosity.
Improving the quality of classroom learning cannot be separated from the assessment carried out by the teacher. According to Chan et al. (in Uli Sihombing, S. Naga, & Rahayu, 2019), a good assessment process must measure learning outcomes accurately and produce a sound analysis of the efforts to improve the ability being measured. Therefore, the instrument used must be of good quality. Good instruments can be developed using the Rasch Measurement Model (RMM) approach. RMM aims to produce objective measurement (Safihin & Hamdani, 2019). According to Supriyati, Raihanati and Nilawati (2020), RMM describes not only the quality of the test items but also the quality of the respondents' answers, so the data obtained are more precise and have a smaller error rate.
Numeracy skills give students the capital they need to face the future. The AKM will also be implemented in Indonesia in 2021 as a replacement for the National Examination. However, there are not yet any instruments for measuring numeracy that follow the development design of AKM problems. The national AKM is currently under development, while the classroom AKM can be developed by teachers in every school. This research therefore develops a numeracy test instrument for level 4 students (grades 7-8) using the Rasch Measurement Model (RMM) approach. The research is expected to be a source of information for developing numeracy capabilities, and its results can also be used in the level 4 classroom AKM. The research was carried out under the research scheme for beginner lecturers, so its expected output is the publication of the results in a scientific journal.

METHOD
This research uses qualitative and quantitative approaches within a research and development (R&D) design. The development model is a modification of the Borg & Gall model and consists of seven phases:
• Phase 1: A literature study on the development of the AKM and an analysis of the needs of teachers and students in dealing with the AKM in school.
• Phase 2: Planning of the instrument development by preparing a schedule and describing the core activities to be carried out.
• Phase 3: Development of the initial product in accordance with conditions in the field and the development design of AKM problems. The product is then validated by several mathematics education experts.
• Phase 4: An initial pilot of the validated or revised initial product on a limited sample.
• Phase 5: Use of the trial results as the basis for improvements that yield suitable and reliable problems.
• Phase 6: A field trial of the revised initial product (the final product) to determine the feasibility of the instrument.
• Phase 7: Revision of the final product based on the analysis of the trial results. If the instrument is appropriate in terms of both item quality and student ability, the study proceeds to its conclusions.
The instruments used in this study are problem validation sheets, HOTS-based numeracy tests, and scoring rubrics. The problems developed are HOTS-based and follow the forms used in the national assessment: multiple choice, complex multiple choice, matching, short answer, and essay. The initial product contains 50 questions.
The population of this study is class VIII junior high school students in East Jakarta. The sample size was set according to the requirements of Rasch modeling: a sample of 64-144 respondents gives item calibrations that are stable within ±0.5 logit at a 95% confidence level (Sumintono & Widhiarso, 2014).
This study uses Rasch modeling, one of the Item Response Theory (IRT) models, which explains the interaction between students' abilities and the test items. Data analysis is carried out with Winsteps software. Data produced by Rasch modeling meet the criteria for objective measurement. The results of the analysis describe the quality of the students, the quality of the instrument, and the interaction between the two. The criteria used to judge the fit of test items follow Sumintono & Widhiarso (2014).
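To make the interaction between ability and item difficulty concrete, the following is a minimal sketch of the dichotomous Rasch model and of an item's outfit mean-square (MNSQ), assuming abilities and difficulties are already expressed in logits. It is not the Winsteps implementation; the function names and the default fit bounds (0.5 to 1.5 for MNSQ and -2.0 to +2.0 for ZSTD, a commonly cited rule of thumb) are illustrative assumptions.

```python
import math


def rasch_probability(theta: float, b: float) -> float:
    """Probability of a correct answer for ability theta and item difficulty b (logits)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def item_outfit_mnsq(responses, thetas, b):
    """Outfit MNSQ for one item: the mean of squared standardized residuals.

    responses: 0/1 answers to the item, one per student
    thetas:    ability estimates of the same students (logits)
    b:         difficulty estimate of the item (logits)
    """
    total = 0.0
    for x, theta in zip(responses, thetas):
        p = rasch_probability(theta, b)
        variance = p * (1.0 - p)           # model variance of a 0/1 response
        total += (x - p) ** 2 / variance   # squared standardized residual
    return total / len(responses)


def within_fit_range(mnsq, zstd, mnsq_bounds=(0.5, 1.5), zstd_bounds=(-2.0, 2.0)):
    """True if both statistics lie strictly inside the assumed acceptance bounds."""
    return (mnsq_bounds[0] < mnsq < mnsq_bounds[1]
            and zstd_bounds[0] < zstd < zstd_bounds[1])


# Hypothetical example: a student whose ability equals the item difficulty
# has an even chance of answering correctly.
print(rasch_probability(0.0, 0.0))  # 0.5
```

An MNSQ near 1 means the observed responses vary about as much as the model predicts; values far above 1 point to unexpected responses, such as able students missing an easy item.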

RESULT AND DISCUSSION
The numeracy test instrument developed consists of HOTS-based problems that follow the development reference of the Minimum Capability Assessment (AKM). The numeracy domains covered include geometry, algebra, and data and uncertainty.
A total of 50 problems were developed, covering the cognitive domains of knowledge (C1), understanding (C2), and application (C3). The problems were tested on junior high school students in two trials. The first trial was conducted on a small group of 32 students, and the second on a large group of 78 students. The trials aimed to determine the fit of the instrument to the model, which includes the validity, difficulty level, and reliability of the problems.
Based on data analysis using Winsteps software, 40 questions fit the Rasch model and 10 questions misfit it. The full results are presented in Table 1.
Table 1 shows that the outfit mean-square (MNSQ) value is 0.89 for persons and 0.91 for items. Both values lie in the range 0.5 < MNSQ < 1.5, so it can be concluded that the test instrument fits the model for measuring level 4 numeracy capabilities. In addition, the outfit Z-standard (ZSTD) value is 0 for persons and -0.01 for items. Both values lie in the range -2.0 < ZSTD < +2.0. This indicates that the data have plausible values, that the items developed fit the model, and that they can be used as a test instrument for level 4 numeracy capabilities.
Table 1 also shows an item reliability of 0.71, a person reliability of 0.76, and a Cronbach's alpha of 0.87. These values show that the consistency of students in answering the questions is quite high and that the quality of the items is quite good. In addition, the high Cronbach's alpha indicates a strong overall interaction between persons and items. The distribution of the items that misfit the model is as follows.
Table 2. Misfit Distribution Data with Rasch Models.
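As an illustration of the reliability coefficient discussed above, the following minimal sketch computes Cronbach's alpha from a matrix of item scores. The function name and the four-student, three-item data set are hypothetical and are not the data of this study; the study's own value was taken from the analysis software output.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a score matrix.

    scores: list of rows, one per student, each row a list of item scores.
    Assumes at least two items and a non-zero variance of the total scores.
    """
    n_items = len(scores[0])
    # Variance of each item across students
    item_variances = [pvariance([row[j] for row in scores]) for j in range(n_items)]
    # Variance of the students' total scores
    total_variance = pvariance([sum(row) for row in scores])
    return (n_items / (n_items - 1)) * (1.0 - sum(item_variances) / total_variance)


# Hypothetical 0/1 responses: 4 students x 3 items
toy_scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(cronbach_alpha(toy_scores))  # 0.75
```

Alpha approaches 1 when the items vary together across students, which is why a high value is read as evidence of consistent measurement.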

CONCLUSION
The test instrument developed to measure level 4 numeracy capabilities fits the Rasch model. This is indicated by an item reliability of 0.71, a person reliability of 0.76, and a Cronbach's alpha of 0.87. In addition, the outfit mean-square (MNSQ) value is 0.89 for persons and 0.91 for items, and the outfit Z-standard (ZSTD) value is 0 for persons and -0.01 for items. It can therefore be concluded that 40 items fit the Rasch model.

SUGGESTION
The author recognizes that this research still has shortcomings that need to be corrected, owing to the author's limited knowledge and the limited time available to carry out the research. Constructive criticism and advice from readers are therefore very welcome as input for future work.

ACKNOWLEDGMENTS
The author thanks KemenristekBrin for its material support. Thanks are also extended to STKIP Media Nusantara Citra for supporting the development of this research.