The data sgp package is designed to make it as simple as possible for users to run analyses and interpret results. There are two basic steps to running an SGP analysis: data preparation and the analyses themselves. Data preparation takes the bulk of the time and effort.
Once the data has been properly prepared, the analyses can be done relatively quickly and easily. The vignettes provided on the wiki and the functions in the data sgp package help to guide users through this process.
Data sgp is software that allows users to run analyses of student growth percentiles and student growth projections using large scale, longitudinal education assessment data. The analyses rely on quantile regression to estimate the conditional density of each student's current score given that student's prior scores. The coefficient matrices produced by those regressions are then used to calculate projected achievement level trajectories.
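The idea of a growth percentile can be illustrated with a deliberately simplified sketch. It does not use quantile regression: it ranks a student's current score among peers who had exactly the same prior score, which only approximates what the package estimates. The function name, the sample scores, and the clamping to the 1-99 range are assumptions made for illustration.

```python
from bisect import bisect_left


def growth_percentile(prior, current, peers):
    """Rank `current` among peers who share the same prior score.

    `peers` is a list of (prior_score, current_score) pairs.
    This is an empirical stand-in for quantile regression, not the
    package's method.
    """
    same_start = sorted(c for p, c in peers if p == prior)
    if not same_start:
        raise ValueError("no peers share this prior score")
    below = bisect_left(same_start, current)  # peers scoring strictly lower
    pct = round(100 * below / len(same_start))
    return min(max(pct, 1), 99)  # assumption: report on a 1-99 scale


# Hypothetical (prior, current) scale scores
peers = [(300, 310), (300, 320), (300, 330), (300, 340), (310, 350)]
print(growth_percentile(300, 335, peers))  # prints 75
```

A student who started at 300 and now scores 335 outperformed three of the four peers with the same starting point, hence the 75th percentile. Quantile regression generalizes this by smoothing across nearby prior scores instead of requiring exact matches.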
A number of different algorithms are available to perform these calculations. The most commonly used is a Gaussian Process Regression model, which inverts the N x N covariance matrix K to compute the predictive distribution. However, this approach is computationally expensive (O(N^3) in time and O(N^2) in memory for N samples), which makes it impractical for large datasets.
Instead, approximation methods such as sparse GP and variational inference have been developed to improve the efficiency of these models. These approaches summarize the data with a much smaller set of M inducing points (M far smaller than N), so the model never has to factorize the full N x N matrix, which increases speed and reduces memory cost.
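To make the cost concrete, here is a minimal sketch of the exact computation that these approximations avoid: Gaussian process regression with a squared-exponential kernel, solved through a Cholesky factorization of K. The kernel choice, the noise term, and all function names are assumptions for illustration; none of this is taken from the package.

```python
import math


def rbf(x1, x2, length=1.0):
    """Squared-exponential kernel between two scalar inputs."""
    return math.exp(-0.5 * ((x1 - x2) / length) ** 2)


def cholesky(A):
    """Lower-triangular L with L L^T = A. This is the O(N^3) step."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def solve_cholesky(L, b):
    """Solve (L L^T) x = b by forward, then backward substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_predict_mean(xs, ys, x_new, noise=1e-6):
    """Posterior mean at x_new given training inputs xs and targets ys."""
    # Full N x N covariance matrix: the O(N^2) memory cost
    K = [[rbf(a, b) + (noise if i == j else 0.0)
          for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    alpha = solve_cholesky(cholesky(K), ys)  # alpha = K^{-1} y
    return sum(rbf(x_new, a) * w for a, w in zip(xs, alpha))


xs = [0.0, 1.0, 2.0]
ys = [0.0, 1.0, 4.0]
print(gp_predict_mean(xs, ys, 1.0))  # close to the training target 1.0
```

The triple-nested work inside `cholesky` is what grows cubically with the number of samples, and the matrix `K` is what grows quadratically. Sparse methods shrink both by working with M inducing points in place of all N samples.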
Another important factor in the performance of SGP analyses is the availability of data to train the models. Research consortia and full community databases aggregate data from many different sources, but they are not designed to accommodate all of the data that could be available in an operational setting. In addition, models built to answer specific scientific questions will not always be fully compatible with larger community databases.
As a result, it is important for school districts to ensure that they have the necessary hardware and software to run SGP analyses. The SGP package is designed to work with R, an open source statistical computing environment. R is available for Windows, macOS and Linux, and it can be downloaded from CRAN. SGP requires a recent version of R; the package's CRAN page lists the minimum supported version.
In most cases, a district will want to use the SGP package with the sgptData_LONG data set. This data set contains 8 windows (3 windows per year) of assessment data in long format for 3 content areas. It contains the variables VALID_CASE, CONTENT_AREA, YEAR and ID (ID is required when creating individual level student aggregates with the summarizeSGP function), along with the score and categorization variables SCALE_SCORE, GRADE and CLASS.
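Because data preparation dominates the work, it can help to check a long-format extract for the expected columns before handing it to the package. The sketch below is a hypothetical pre-flight check written in Python, not part of the SGP package. The column list simply mirrors the variables named above, and the sample rows and their values are invented.

```python
# Columns named in the text above; adjust to match the actual extract.
REQUIRED = ("VALID_CASE", "CONTENT_AREA", "YEAR", "ID", "SCALE_SCORE", "GRADE")


def missing_columns(rows, required=REQUIRED):
    """Return the required column names absent from at least one row."""
    return [name for name in required
            if any(name not in row for row in rows)]


# Hypothetical long-format records: one row per student, subject and year
rows = [
    {"VALID_CASE": "VALID_CASE", "CONTENT_AREA": "MATHEMATICS",
     "YEAR": "2023", "ID": "1001", "SCALE_SCORE": 412, "GRADE": "4"},
    {"VALID_CASE": "VALID_CASE", "CONTENT_AREA": "MATHEMATICS",
     "YEAR": "2024", "ID": "1001", "SCALE_SCORE": 455, "GRADE": "5"},
]

print(missing_columns(rows))           # prints []
print(missing_columns([{"ID": "1"}]))  # every required name except ID
```

Running a check like this first tends to surface renamed or dropped columns early, before a long analysis run fails partway through.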
The sgptData_WIDE data set is recommended for most other analyses. It contains a subset of the variables in the sgptData_LONG file and omits the additional demographic and student categorization variables. Those omitted variables are required if the user wants to create Student Growth Plots, which rely on student aggregates calculated by the summarizeSGP function.
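The difference between the two layouts can be shown with a small reshaping sketch: long format holds one row per student per year, while wide format holds one row per student with a column for each year. The function name and the `GRADE.<year>` and `SCALE_SCORE.<year>` column naming are assumptions for illustration and may not match the layout of the shipped data set.

```python
def long_to_wide(rows, content_area):
    """Collapse long-format rows into one record per student.

    Only rows for `content_area` are kept; other columns are dropped,
    mirroring the reduced variable set described above.
    """
    wide = {}
    for row in rows:
        if row["CONTENT_AREA"] != content_area:
            continue
        student = wide.setdefault(row["ID"], {"ID": row["ID"]})
        student[f"GRADE.{row['YEAR']}"] = row["GRADE"]
        student[f"SCALE_SCORE.{row['YEAR']}"] = row["SCALE_SCORE"]
    return list(wide.values())


# Hypothetical long-format records
long_rows = [
    {"ID": "1001", "CONTENT_AREA": "READING", "YEAR": "2023",
     "GRADE": "4", "SCALE_SCORE": 412},
    {"ID": "1001", "CONTENT_AREA": "READING", "YEAR": "2024",
     "GRADE": "5", "SCALE_SCORE": 455},
    {"ID": "1002", "CONTENT_AREA": "READING", "YEAR": "2024",
     "GRADE": "5", "SCALE_SCORE": 430},
]

for record in long_to_wide(long_rows, "READING"):
    print(record)
```

Student 1001 ends up with scores for both years in a single record, while student 1002 has only the 2024 columns. Missing years simply produce no column for that student in this sketch.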