What is Data SGP?

For those who are not familiar with it, Data SGP is a set of classes and functions for analyzing large-scale longitudinal education assessment datasets. The package provides several methods for calculating student growth percentiles and percentile growth projections/trajectories from these data using derived coefficient matrices.

It also takes an approach to data preparation that reports information in terms of student characteristics and performance levels rather than simply aggregating raw scores and results. This contrasts with full community databases such as GenBank and EarthChem, which primarily aggregate raw data and make all of it available. The distinction matters because it highlights a fundamental difference between research consortia and full community databases: research consortia are built to address specific research questions, while community databases have as their primary goal the archiving of, and public access to, essentially all of the data.

Data SGP is being assembled at a larger scale than earlier efforts of its kind but, compared with a database such as GenBank, it is still relatively small. As a result, the data are being assembled on a relatively modest budget and with relatively little technical expertise. The working groups assembling the data are also using it to answer research questions of interest to the researchers themselves, and these are not always the questions that large community databases were designed to answer.

The lower-level SGP functions (studentGrowthPercentiles and studentGrowthProjections) require WIDE-formatted data, in which each student occupies one row and each year of testing has its own columns. The higher-level functions that wrap them (such as abcSGP and analyzeSGP) are meant to be used with LONG data, in which each row holds one student's result for one year. The difference arises because the lower-level functions perform the calculations, while the higher-level functions mainly prepare the data and pass it to them. Users should consult the vignette on SGP data analysis for more detailed instructions on using the package with WIDE and LONG formats.
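To make the WIDE versus LONG distinction concrete, here is a minimal sketch in plain Python (not the SGP package itself, which is written in R). The records, the field names ID, YEAR, and SCALE_SCORE, and the helper long_to_wide are illustrative assumptions; the sketch only shows the reshaping idea, not how the package does it internally.

```python
from collections import defaultdict

# Hypothetical LONG records: one row per student per year of testing.
long_rows = [
    {"ID": "s1", "YEAR": 2021, "SCALE_SCORE": 410},
    {"ID": "s1", "YEAR": 2022, "SCALE_SCORE": 432},
    {"ID": "s2", "YEAR": 2021, "SCALE_SCORE": 395},
    {"ID": "s2", "YEAR": 2022, "SCALE_SCORE": 401},
    {"ID": "s3", "YEAR": 2022, "SCALE_SCORE": 450},
]


def long_to_wide(rows):
    """Reshape LONG rows into WIDE rows: one row per student,
    one SS_<year> column per year (None where a score is missing)."""
    years = sorted({r["YEAR"] for r in rows})
    by_student = defaultdict(dict)
    for r in rows:
        by_student[r["ID"]][r["YEAR"]] = r["SCALE_SCORE"]

    wide = []
    for student_id in sorted(by_student):
        record = {"ID": student_id}
        for year in years:
            record[f"SS_{year}"] = by_student[student_id].get(year)
        wide.append(record)
    return wide


wide_rows = long_to_wide(long_rows)
for row in wide_rows:
    print(row)
```

Note that a student with no score in a given year (s3 in 2021 here) still gets a column for that year in the WIDE layout; it is simply left empty.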

In the sgpData example dataset, which is in WIDE format, the first column, ID, is the student identifier. The next five columns give the grade level at which the student was tested in each of five years, and the last five columns give the corresponding scale score for each year. A student growth percentile describes a student's growth based on the most recent scale score compared to all students with comparable prior test scores (their academic peers). In this way, a student with a growth percentile of 85 can be said to have grown more than 85 percent of their academic peers.
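The idea of ranking a student against academic peers can be illustrated with a deliberately simplified sketch. This is not the method the SGP package uses (the package estimates growth percentiles with quantile regression and coefficient matrices); it is a plain percentile rank among students whose prior score falls within a fixed band. The function growth_percentile, the band parameter, and the cohort data are all hypothetical.

```python
def growth_percentile(prior, current, cohort, band=10):
    """Percent of academic peers whose current score is below `current`.

    Peers are cohort members whose prior score is within `band` points
    of `prior`. Returns None when no peers are found.
    Simplified stand-in for illustration, not the SGP package's method.
    """
    peers = [cur for (pri, cur) in cohort if abs(pri - prior) <= band]
    if not peers:
        return None
    below = sum(1 for cur in peers if cur < current)
    return round(100 * below / len(peers))


# Hypothetical cohort of (prior score, current score) pairs.
cohort = [
    (400, 405), (402, 410), (398, 420), (405, 430), (401, 415),
    (450, 470), (455, 480),
]

# A student who scored 400 last year and 425 this year.
sgp = growth_percentile(400, 425, cohort)
print(sgp)
```

Only the five cohort members with prior scores near 400 count as peers here; the two students who started around 450 are excluded, which is what makes the comparison one of growth rather than of raw achievement.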

This information is valuable to teachers and parents because it gives them a meaningful measure of a student's progress toward meeting grade-level learning goals. This contrasts with the grading system of most schools, which relies on a single high or low score that has no connection to any individual learning objective. The distinction is critical because it allows teachers to focus on the areas where the student most needs help. It also allows for more accurate prediction of future achievement given a change in the learning environment.