The Data SGP Project - Neratepayers.org

The Data SGP project is an effort to compile and analyze multi-proxy sedimentary geochemical data (iron, carbon, sulfur, and major and trace metal isotopes) from various regions of the world for every Paleozoic epoch and Neoproterozoic time slice. This will allow researchers to address a wide variety of Earth history questions that require the analysis of multi-proxy data. The project is aimed at addressing specific research questions, rather than creating a full community database like GenBank or EarthChem. However, the research consortia that result from this project will likely be integrated with such larger community databases.

While large community databases aggregate and make accessible essentially all data, research consortia are designed to assemble or generate specialized data to address unique research questions. As such, they are less suited to the collection of metadata and legacy data that are typically archived by large community databases. Nevertheless, research consortia are often the most efficient way to acquire data and metadata for new and innovative research questions that have not yet been addressed by the larger community.

In the SGP package for R, where SGP stands for student growth percentiles, an analysis involves several steps. The first step is to prepare the data, which can be done with a number of different methods and software programs. The SGP package has wrapper functions (abcSGP and updateSGP) that simplify the source code associated with these steps.

Next, the data must be formatted to be compatible with the SGP analyses. This can be done by defining an SGP data set, which is a file that contains all of the necessary information to perform an SGP analysis. The SGP package provides a number of examples for the creation of these data sets.

Finally, the data must be loaded into the SGP system and analyzed to determine student growth percentiles and projections. This can be done in one of two ways: 1) by using the lower level functions studentGrowthPercentiles and studentGrowthProjections, or 2) by using the higher level wrapper functions, such as abcSGP, which runs prepareSGP and the subsequent analysis steps in sequence. The latter is preferred for operational analyses, as it reduces the amount of human interaction required to perform an analysis.
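To give a feel for what a growth percentile expresses, here is a deliberately simplified Python sketch. It is not how the SGP package works: the package estimates percentiles with quantile regression on prior scores, while this toy version simply ranks each student's current score among peers who had exactly the same prior score. The function name and the record layout are assumptions made for illustration.

```python
from collections import defaultdict


def naive_growth_percentiles(records):
    """Toy growth percentile: rank current score among same-prior-score peers.

    records: list of (student_id, prior_score, current_score) tuples.
    Returns a dict mapping student_id to an integer percentile in 1..99.
    """
    # Group current scores by prior score so each student is compared
    # only with peers who started from the same place.
    peers_by_prior = defaultdict(list)
    for _, prior, current in records:
        peers_by_prior[prior].append(current)

    percentiles = {}
    for student_id, prior, current in records:
        peers = peers_by_prior[prior]
        below = sum(1 for score in peers if score < current)
        equal = sum(1 for score in peers if score == current)
        # Mid-rank percentile: ties count as half below.
        pct = 100.0 * (below + 0.5 * equal) / len(peers)
        # Clamp to the 1..99 range in which growth percentiles are reported.
        percentiles[student_id] = min(99, max(1, round(pct)))
    return percentiles


toy_records = [
    ("a", 400, 420),
    ("b", 400, 450),
    ("c", 400, 480),
    ("d", 500, 480),
]
toy_result = naive_growth_percentiles(toy_records)
```

Students "c" and "d" have the same current score but different percentiles, because each is compared only with peers who share the same prior score.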

The exemplar SGP data set, sgpData, is an anonymized panel data set consisting of 5 years of annual, vertically scaled assessment data in WIDE format, with one row per student. This data set models the format of data used with the lower level functions. The higher level functions instead work with LONG format data, as in sgpData_LONG, which carries the standard SGP variables VALID_CASE, CONTENT_AREA, YEAR and ID. The sgpData_INSTRUCTOR_NUMBER lookup table accompanies these data and links each student identifier to one or more instructors.
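The difference between WIDE and LONG layouts can be sketched in a few lines of Python. This is an illustration under stated assumptions, not code from the SGP package: the column names GRADE_<year> and SS_<year>, the output fields GRADE and SCALE_SCORE, and the function name are all hypothetical, chosen only to show the reshaping.

```python
def wide_to_long(wide_rows, years, content_area):
    """Reshape one-row-per-student records into one-row-per-student-year records.

    wide_rows: list of dicts with an "ID" key plus hypothetical
               "GRADE_<year>" and "SS_<year>" keys for each year.
    """
    long_rows = []
    for row in wide_rows:
        for year in years:
            grade = row.get(f"GRADE_{year}")
            score = row.get(f"SS_{year}")
            # Skip years in which the student has no test record.
            if grade is None or score is None:
                continue
            long_rows.append({
                "VALID_CASE": "VALID_CASE",
                "CONTENT_AREA": content_area,
                "YEAR": year,
                "ID": row["ID"],
                "GRADE": grade,
                "SCALE_SCORE": score,
            })
    return long_rows


toy_wide = [
    {"ID": 1, "GRADE_2020": 3, "SS_2020": 410, "GRADE_2021": 4, "SS_2021": 455},
    {"ID": 2, "GRADE_2020": 3, "SS_2020": 390, "GRADE_2021": None, "SS_2021": None},
]
toy_long = wide_to_long(toy_wide, [2020, 2021], "MATHEMATICS")
```

Each student-year with a score becomes its own record, so the second student, who has no 2021 record, contributes a single row.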

The sgpData_INSTRUCTOR_NUMBER table is an anonymized lookup table that associates instructor names and numbers with students' test records. Each record links a student to an instructor in a single content area for a given year. Because a student can appear in more than one record, the table allows multiple teachers to be assigned to the same student in the same content area and year. In this way, sgpData_INSTRUCTOR_NUMBER can be used to assign weights to each student's results and to apply those weights when student growth percentiles are summarized by instructor.
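As a rough illustration of how instructor weights might enter a summary, the Python sketch below computes a weighted mean growth percentile per instructor. It is not the SGP package's implementation: the field names INSTRUCTOR_NUMBER and INSTRUCTOR_WEIGHT in the toy records, the function name, and the choice of a weighted mean rather than another summary statistic are all assumptions.

```python
from collections import defaultdict


def weighted_mean_sgp(links, sgp_by_student):
    """Summarize student growth percentiles per instructor using weights.

    links: list of dicts with "ID", "INSTRUCTOR_NUMBER" and
           "INSTRUCTOR_WEIGHT" keys (hypothetical layout).
    sgp_by_student: dict mapping student ID to a growth percentile.
    """
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for link in links:
        sgp = sgp_by_student.get(link["ID"])
        if sgp is None:
            continue  # no growth percentile for this student
        instructor = link["INSTRUCTOR_NUMBER"]
        weight = link["INSTRUCTOR_WEIGHT"]
        weighted_sum[instructor] += weight * sgp
        weight_total[instructor] += weight
    # Instructors whose weights sum to zero are left out of the summary.
    return {
        instructor: weighted_sum[instructor] / weight_total[instructor]
        for instructor in weighted_sum
        if weight_total[instructor] > 0
    }


toy_links = [
    {"ID": 1, "INSTRUCTOR_NUMBER": "T1", "INSTRUCTOR_WEIGHT": 1.0},
    {"ID": 2, "INSTRUCTOR_NUMBER": "T1", "INSTRUCTOR_WEIGHT": 0.5},
    {"ID": 2, "INSTRUCTOR_NUMBER": "T2", "INSTRUCTOR_WEIGHT": 0.5},
]
toy_summary = weighted_mean_sgp(toy_links, {1: 40, 2: 70})
```

Student 2 is shared between two instructors, so that student's result counts at half weight in each instructor's summary.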
