|
The environmental data sets used in DRINET are managed by the Purdue TeraGrid data management system.
As part of NSF's TeraGrid initiative, we have developed and deployed a flexible, multidisciplinary data management framework at Purdue University. This framework can be used to manage data from different sources and provide multiple access points for users from different communities with different levels of IT expertise.
The architecture of the framework consists of five layers: data capture layer, iRODS (Integrated Rule-Oriented Data System) layer, application layer, Web Services layer, and presentation layer. The base component is iRODS, a client-server middleware developed at the Data Intensive Cyber Environments Center (DICE Center) at the University of North Carolina at Chapel Hill that provides a uniform interface to heterogeneous resources. It also allows users to discover data based on logical attributes instead of physical file names and path names.
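The idea of discovering data by logical attributes rather than by physical file names can be illustrated with a minimal sketch. The `Catalog` class, the attribute names (`source`, `year`), and the file paths below are all hypothetical and exist only for illustration; this is a toy in-memory stand-in, not the iRODS API or the catalog used in the framework.

```python
from dataclasses import dataclass, field


@dataclass
class DataObject:
    """One stored file plus the logical attributes that describe it."""
    physical_path: str
    attributes: dict = field(default_factory=dict)


class Catalog:
    """Toy in-memory catalog (hypothetical; not the iRODS API)."""

    def __init__(self):
        self._objects = []

    def register(self, physical_path, **attributes):
        """Record a file together with its descriptive attributes."""
        self._objects.append(DataObject(physical_path, dict(attributes)))

    def find(self, **criteria):
        """Return the paths of objects whose attributes match every criterion."""
        return [
            obj.physical_path
            for obj in self._objects
            if all(obj.attributes.get(key) == value for key, value in criteria.items())
        ]


# Made-up paths and attribute values, for illustration only.
catalog = Catalog()
catalog.register("/store/a/0001.dat", source="NEXRAD", year=2009)
catalog.register("/store/b/0002.dat", source="LARS", year=2009)
catalog.register("/store/c/0003.dat", source="NEXRAD", year=2010)

# The caller asks by attribute and never needs to know where the file lives.
nexrad_2009 = catalog.find(source="NEXRAD", year=2009)
```

In this sketch the physical location is only a return value: a query such as `catalog.find(year=2009)` stays valid even if the files are later moved to a different storage resource and re-registered.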
All resources are connected to the 40 Gbps high-speed TeraGrid backbone network via a 10 Gbps optical lambda. Our framework has been successfully applied to the management of several data collections from different application domains with different data formats, including LARS remote sensing image data, PTO satellite real-time data, NWS streaming NEXRAD radar data, and scientific datasets from climate modeling.
Users may access the data through various interfaces, including a user-friendly Gridsphere-based data portal; a set of iRODS client tools including command line utilities and web/desktop interfaces; a set of web service interfaces; and application-specific tools, including clients enabled by OPeNDAP/THREDDS.
Metadata
One important component of the project centers on developing the metadata standards and descriptions for the collections of data that will reside within the DRINET portal. Over the past year, the DRINET project team has been engaged in ongoing discussions with our advisory committee and stakeholder communities to gather input and solicit recommendations for the development of DRINET. These discussions have included the identification of relevant datasets for ingest into the DRINET portal, possible use cases for these datasets, and some of the functional requirements for making use of the data. As datasets are identified for inclusion in the DRINET portal, we are developing an inventory of the data, including how the data are currently described and documented. As the use cases and functional requirements for the data come into focus, we are exploring how and to what extent these requirements could be addressed through metadata. This exploration includes an environmental scan of relevant descriptive metadata standards and how these standards could potentially be applied to the datasets within the DRINET portal. A particular challenge in selecting standards is the need to accommodate disparate data sets from multiple sources that span more than a single discipline or area of research. Beyond the ability of the metadata standard to address the stated needs of stakeholders, the additional criteria for selecting a standard for the DRINET portal include acceptance and support from a community similar to DRINET, flexibility to accommodate local needs, usability by multiple audiences, and the investment required from the data producer to generate the metadata.
After a review of many potential standards, the Directory Interchange Format (DIF) is emerging as the strongest candidate for the foundational metadata standard within DRINET. The DIF is a well-established standard used primarily for NASA’s Global Change Master Directory (http://gcmd.nasa.gov/) to describe data sets related to the earth sciences. The DIF standard addresses many of the needs identified thus far for the metadata and appears to provide the flexibility that will be needed for the DRINET environment. Crosswalks have been developed between the DIF and other metadata standards relevant to DRINET (notably FGDC and Dublin Core) to aid interoperability for DRINET metadata.
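To make the role of a crosswalk concrete, the following is a minimal sketch of how a record described with DIF-style fields might be translated into Dublin Core-style elements. The mapping shown is a small illustrative subset chosen for this example and is not the published DIF crosswalk; the sample record, its values, and the `Local_Use_Case` field are hypothetical.

```python
# Illustrative subset of a DIF-to-Dublin-Core mapping (an assumption for this
# sketch, not the official crosswalk).
DIF_TO_DUBLIN_CORE = {
    "Entry_ID": "identifier",
    "Entry_Title": "title",
    "Summary": "description",
    "Data_Center": "publisher",
}


def crosswalk(dif_record, mapping=DIF_TO_DUBLIN_CORE):
    """Translate a DIF-style record.

    Returns a tuple: (translated record, list of source fields with no mapping).
    """
    translated = {}
    unmapped = []
    for field_name, value in dif_record.items():
        target = mapping.get(field_name)
        if target is None:
            unmapped.append(field_name)  # candidate gap to document
        else:
            translated[target] = value
    return translated, unmapped


# Made-up sample record; "Local_Use_Case" stands in for a hypothetical local extension.
sample = {
    "Entry_ID": "EXAMPLE-001",
    "Entry_Title": "Example streamflow data set",
    "Summary": "Placeholder description of the data set.",
    "Local_Use_Case": "drought monitoring",
}

dc_record, unmapped_fields = crosswalk(sample)
```

Returning the unmapped fields alongside the translated record is a deliberate choice in this sketch: fields that fall outside the mapping are the ones that would need either a local customization or an explicit note that they do not carry over to the other standard.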
The DIF standard is being tested through the development of sample metadata records for the data sets currently in DRINET. Gaps in addressing functional requirements are being documented, and customizations to address them will be developed. Once the records are complete, the application of the DIF standard to the DRINET environment will be reviewed with the advisory committee and community stakeholders. Remaining gaps between the functional requirements and the DIF standard, along with suggestions for modifications to address these gaps, will also be covered. The discussion and feedback will inform subsequent work on describing the data sets and developing metadata in DRINET.
|