ClimDB/HydroDB/EcoTrends Network Database Integration Path

The LTER Network supports a number of existing synthesis database applications, including EcoTrends and ClimDB/HydroDB (ClimHy), which are maintained as isolated projects without direct coordination between them. As part of the LTER NIS development process, these separate database applications will be integrated into the PASTA Framework.

These existing synthesis database applications all follow similar processing steps: 1) collect site data and organize it into a standard format, 2) store the restructured data in a common relational database schema, and 3) provide discovery and access tools via a web-based interface, including basic analytical applications. Each of these applications, however, requires separate management and maintenance resources (this is especially true of EcoTrends and ClimHy because of their very different architectures). In some cases, applications require additional resources at the site level to reformat data into a standard structure. Integrating these independent database applications into the LTER NIS will therefore provide scalability and ensure data integrity by utilizing a common application framework (PASTA) that supports automated data harvesting. New features of the PASTA Framework will become directly available to the integrated database applications without additional effort at the sites.
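The three processing steps shared by these applications can be sketched roughly as follows. This is only an illustration, not the ClimHy or EcoTrends implementation: the site labels, field names, table layout, and sample values are all hypothetical, and an in-memory SQLite database stands in for the common relational schema.

```python
import sqlite3

# Hypothetical site submissions in two different local formats
# (labels, field names, and values are invented for illustration).
SITE_A_ROWS = [{"date": "2010-01-01", "airtemp_c": 1.5}]
SITE_B_ROWS = [{"day": "2010-01-01", "temp_f": 41.0}]


def standardize(site, record):
    """Step 1: map a site-specific record onto one standard layout."""
    if "airtemp_c" in record:
        return (site, record["date"], "air_temp_c", record["airtemp_c"])
    # Assumed second format: Fahrenheit readings keyed by "day".
    celsius = round((record["temp_f"] - 32) * 5 / 9, 2)
    return (site, record["day"], "air_temp_c", celsius)


def store(conn, rows):
    """Step 2: store standardized rows in a common relational table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS observation "
        "(site TEXT, obs_date TEXT, variable TEXT, value REAL)"
    )
    conn.executemany("INSERT INTO observation VALUES (?, ?, ?, ?)", rows)


def discover(conn, variable):
    """Step 3: a minimal access query across all sites."""
    cursor = conn.execute(
        "SELECT site, obs_date, value FROM observation "
        "WHERE variable = ? ORDER BY site",
        (variable,),
    )
    return cursor.fetchall()


conn = sqlite3.connect(":memory:")
rows = [standardize("SITE_A", r) for r in SITE_A_ROWS]
rows += [standardize("SITE_B", r) for r in SITE_B_ROWS]
store(conn, rows)
print(discover(conn, "air_temp_c"))
```

In this sketch the per-site reformatting lives in `standardize`, which is the part that currently costs effort at the sites and that a shared framework with automated harvesting would aim to centralize.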

The following describes integration activities along a timeline for including the ClimHy and EcoTrends databases in the NIS via the PASTA Framework. Please provide your comments publicly by using the "Add new comment" button at the bottom of this page.

2011 - The ClimHy/EcoTrends databases will remain status quo for this period. ClimHy is now located at the LNO and will be managed by LNO staff with assistance from IMC; the EcoTrends database continues to be hosted at LNO and will continue to be managed by JRN with help from LNO. Discussions will be held with the IMC to develop an approach for EML replacement of the encoded metadata that is currently in ClimHy. An EML document standard that meets all of the ClimHy requirements will be developed by the IMC and evaluated for PASTA compliance. In addition, the current web interface for ClimHy, which runs on a deprecated version of Microsoft’s web server, will be migrated to a modern and secure web server infrastructure by LNO.

2012 - All sites will create EML for ClimHy harvest files based on the standard to be developed above. These EML documents will be harvested into the NIS, along with other site metadata. The current ClimHy system will run concurrently with the initial harvests of ClimHy metadata and data into the NIS to ensure integrity between the two systems. This process will involve: 1) EML changes that trigger data harvests into the NIS and 2) a workflow script that will initiate harvests into the ClimHy database using a slightly modified version of the ClimHy web service that is currently used for harvesting.
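As one illustration of how "EML changes that trigger data harvests" might be detected, the sketch below compares the revision of each EML package identifier against the last revision harvested. The plan does not specify the mechanism; the three-part `scope.identifier.revision` form is assumed here, and the function names and sample identifiers are hypothetical.

```python
def parse_package_id(package_id):
    """Split an assumed 'scope.identifier.revision' ID into its parts."""
    scope, identifier, revision = package_id.rsplit(".", 2)
    return scope, int(identifier), int(revision)


def packages_to_harvest(last_harvested, current_ids):
    """Return the IDs whose revision is newer than the one last harvested.

    last_harvested maps (scope, identifier) to the last harvested revision;
    packages that were never harvested are always due.
    """
    due = []
    for package_id in current_ids:
        scope, identifier, revision = parse_package_id(package_id)
        if revision > last_harvested.get((scope, identifier), -1):
            due.append(package_id)
    return due


# Hypothetical state: revision 2 of package 1 was already harvested.
already_harvested = {("example-site", 1): 2}
published = ["example-site.1.2", "example-site.1.3", "example-site.7.1"]
print(packages_to_harvest(already_harvested, published))
```

A workflow script of the kind described in item 2 could then loop over the returned identifiers and call the harvesting service for each one.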

2013 - IMC will assist LNO in two tasks, both following the standards established by PASTA: 1) developing a RESTful web service to load data into the existing EcoTrends time-series data model and 2) refactoring the existing ClimHy ingestion engine into a RESTful web service. EcoTrends data products will be prioritized by the Network for inclusion in the NIS. With help from the IMC, EcoTrends data products with high priority will be documented with EML for immediate harvest into the NIS.

2014 -  Tools that highlight prioritized ClimHy/EcoTrends data products will be prototyped in the NIS data portal, which will provide full access to metadata and exploratory tools for reviewing data and their trends.  These products will be evaluated by Network committees to establish new or modified requirements from the original ClimHy and EcoTrends applications.

2015 - New Network climate data and time-series data products will continue to be added to the NIS using the PASTA Framework tools. Advanced technological approaches will be added to the NIS data portal to strengthen the discovery and evaluation of these data products.

Comments regarding ClimHydroDB migration

Ultimately, I think EcoTrends and ClimDB/HydroDB will each need specific migration plans. I can visualize a vtc in April to consider the migration of ClimHy and possibly begin a working group to address these issues. A production workshop to address ClimHy migration may be necessary in the fall, as well as additional vtcs or some time devoted to this topic at the annual meeting.

I pose the following questions to help clarify what is intended, and also as questions for general discussion.

 “Discussions will be held with the IMC to develop an approach for EML replacement of the encoded metadata that is currently in ClimHy. An EML document standard that meets all of the ClimHy requirements will be developed by the IMC and evaluated for being PASTA compliant.”

What resource is this EML document describing? Are we referring to the EML document that will describe any site’s data to be harvested? Or is this EML document describing the derived and integrated set of tables that currently comprise the ClimHy relational database?  Is the “encoded metadata” the specialized ClimDB/HydroDB metadata entered through web forms in the current implementation?

Some follow up thoughts for discussion…

Will the harvested data simply be our climate and streamflow data that will be modified in PASTA using workflow scripts to place in the PASTA ClimHydro database? Or, will we continue to harvest our data as always with an EML document used to describe the current exchange format?  Will the additional ClimDB metadata elements (now captured through web forms) be packaged into existing EML elements based on “best practice”, or will an extension to EML be developed for purposes of this specialized ClimDB/HydroDB metadata?  To what extent can SiteDB and/or personnelDB be employed to capture key coordinate information about stations, station and measurement histories, etc., or personnel?

integration plan comments

I think the general implementation is well laid out, but the details are missing.  The exact steps of how each phase of the integration will be accomplished are missing.  Will there be a beta testing group of a few sites or will all sites get through this together?  I also feel that the timeline is a bit presumptuous and unrealistic since it requires input/work from IMC. 

In 2011, how and when will the IMC develop an EML document for ClimHy requirements? It's not clear exactly what this document is, and its development will require a working group and volunteer time of IMC members. It may be more realistic to hold the discussion to develop an approach with IMC in 2011, and maybe we can put some time into our annual meeting for work on this. However, with the EIMC this year, the annual meeting time is limited. It may be necessary to hold a specific workshop to accomplish this, which likely wouldn’t happen until late 2011 or early 2012. It also seems like the whole ‘requirements building’ effort would be an iterative process and may actually take several months to build. I don’t have enough experience in this to say, but these are my thoughts about 2011 work.

In 2012, I could see a training session for creating EML for ClimHy harvests.  I think that this will be necessary to get all sites to create EML.  Will you start with a test group of sites? 

For the 2013 work, where IMC will assist LNO in the development of a RESTful web service, it seems that there are only a few people within the IMC who have the capability to do this. Will this be IM buyout time and/or a small workshop (like for the current Database Redesign group)? Maybe we’ll be experts by then! It’s also not clear what the sites will need to do on their end to take advantage of the RESTful web services to harvest data into the common PASTA framework. Maybe there’s nothing that the sites need to do since the data delivery information will be described in the EML, right?

The work for 2014 and 2015 sounds like reasonable next steps and doesn’t necessarily require any IMC commitments. However, if you talk about tools, it seems like scientists/researchers may need to be involved to ensure that the tools they want/need are being developed. When and by what mechanism will this type of interaction happen?

Synthesis database applications and Database Integration

Synthesis database applications: It is not clear what the term "synthesis database applications" means. Is it a web service that allows you to manage data files with the same structure for comparison, distribution, etc?

Database Integration: it would be nice to say how integrating these databases into PASTA will achieve coordination between these projects, and what the ultimate goal is. I thought it was synthesis, but given the term "synthesis database applications", each individual project (EcoTrends, ClimHy) already achieves a certain level of synthesis. What does PASTA provide to achieve whatever goal it has? Does it allow the user to join data with different structures?