Environmental Scenario Search Engine (ESSE)
M. Zhizhin, Geophysical Center, Russian Acad. Sci.
Problem statement
Environmental informatics (Hilty 1995) is a rapidly expanding area of computer and natural science. Data volumes from today's collection systems keep increasing, and the scientific community needs an integrated and authoritative representation of the natural environment in its analyses. Together these demand a new approach to data management and access. The natural environment includes elements from multiple domains such as space, terrestrial weather, oceans and terrain. Systems such as the Global Change Master Directory (GCMD) from NASA, the Master Environmental Library (MEL) from the DOD, and others provide the ability to search by "keywords" for archived environmental data sets distributed across the network, but the ability to search for specific "scenarios" (sets of conditions within the environmental data) does not yet exist.
At the same time, the environmental modeling community has begun to develop several archives of continuous environmental representations. These archives take observational data and, through modeling, create a regular, parameterized view of the Earth system. The numerical models use all available observational data as initial conditions, so the resulting data sets may jointly be considered an authoritative high-resolution representation of terrestrial weather and near-Earth space during the last 50 years.
When interacting with these enormous resources, imagine for example that the end user does not need all the weather data covering Florida for the last 50 years, but rather needs an example of a typical Florida spring storm. Further imagine that the user needs to know how often such storms occur, or whether they have been increasing in the last 10 years. The Environmental Scenario Search Engine (ESSE) will address such problems.
The prime requirement of the ESSE system design will be to allow the user to query environmental data archives in human linguistic terms. Natural language is not easily translated into the absolute terms of 0 and 1 which make up the digital world. The mapping between human language and computer systems will involve fuzzy logic. Fuzzy logic is a superset of conventional (Boolean) logic that has been extended to handle the concept of partial truth, that is, truth values between "completely true" and "completely false". It was introduced by Dr. Lotfi Zadeh (Zadeh 1965) of UC Berkeley in the 1960s as a means to model the uncertainty of natural language. The ESSE will act as a bridge between the questions the user needs to ask of the environment and the data which describe it.
Architecture
The ESSE architecture will rely heavily on an object-oriented fuzzy logic engine to perform searching and statistical analysis of the distribution of the identified events for the user. It will allow parallel mining of several distributed data sources, possibly from different subject areas, and will not be limited to space physics or terrestrial weather. Both the fuzzy logic engine and the data sources will be implemented as web services (Figure 1), so that third-party applications written in different languages (Java, C++, Perl, C#) can select from different data sources and search for events with the fuzzy logic engine, using interfaces and data structures derived from the web service definitions (WSDL). We will use web services as mediators to the environmental archives. They help to abstract the archives into a generic ESSE data model and to bypass the security limitations that firewalls impose on most other connection protocols.
To illustrate possible use, the ESSE system will include a prototype user interface implemented as a web application. In the web application it will be possible:
Web services for the ESSE authoritative data sources and the fuzzy search engine, as well as the prototype user interface, will be installed on two mirror servers, one in the U.S. and one in Russia.
Authoritative data sources
The real connection between the ESSE system and a given user community is the data sources that support it. It will be relatively easy to add a new data source to the ESSE through the web services interface, so the list below should not be taken as limiting but rather as a starting point that demonstrates the ESSE functionality.
The first thing to notice is the relatively large size of the archives. The distributed database concept allows us to perform interactive mining on these substantial data sources. The second thing to notice is the long temporal ranges. The ESSE will be most useful when the size of the archive makes searching by hand prohibitive or impractical.
To describe each resource briefly, the NCEP/NCAR reanalysis data archive was derived from numerical weather prediction model runs. It represents gridded output on a regular time step (6 hours) and a fixed grid step (2.5 deg). The model uses data ingest procedures to assimilate observational data into model results and so produce a consistent picture of the terrestrial weather during the last 50 years. The Space Physics Interactive Data Resource (SPIDR) is an observational data source which includes the output of numerical models. The SPIDR system currently handles the following: Defense Meteorological Satellite Program (DMSP) visible, infrared and microwave browse imagery, ionospheric parameters, geomagnetic variations, geophysical and solar indices, GOES satellite x-ray, plasma, and magnetometer data, cosmic rays, and solar radio telescope data sets. We also plan to add gridded space weather, ocean and terrain data to the ESSE in the near future, making the ESSE mining technology available across a wide representation of the "Digital Earth" environment.
Adding a new data source to the ESSE will require: a) writing a web service that implements the standard interface with two methods, getMetadata and getData, and subsets the data source into a simple data model; b) creating a metadata document that follows a predefined XML schema and describes the parameters and the temporal and spatial coverage of the data source; c) adding the web service address and credentials to the ESSE configuration.
Fuzzy search algorithm
People often use qualitative notions to describe such variables as temperature, pressure, or pulse rate.
In reality, it is difficult to put a single threshold between what is called "warm" and "hot". Fuzzy set theory serves as a translator from vague linguistic terms into strict mathematical objects. This is exactly what is needed to bridge the gap between current environmental archives and the policy makers, users and scientists who need to access them.
Intelligent environmental scenario searching across the distributed resources will be performed within the ESSE fuzzy search engine. The scenario editor from the ESSE user interface will be used to formulate a set of conditions to be satisfied by the candidate events. The search conditions may be specified in a number of ways depending on the user's familiarity with the region or data of interest. An expert user can specify exact thresholds and/or limitations that must be maintained on certain parameters. Conditions can also be specified via abstract natural language definitions for each parameter. For instance, temperature limitations can be specified as "hot", "cold", or "typical". The query can also be specified in terms of predefined rules which collect conditions into a named set. Thus, a user can specify the following space weather search request: (VERY LARGE "Kp index") AND (VERY LOW "Dst index").
The result of such a request reported by the fuzzy search engine will be a list of the "most likely" dates for the event, ranked by the sorted values of the aggregated multidimensional fuzzy membership function (MF). The aggregation will be done using a fuzzy analog of the logical AND operator. The ESSE client application will search for events in the environment where the input variables depend on time, so the one-dimensional MFs and the fuzzy AND aggregation of the desired conditions are also functions of time.
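The request above can be illustrated with a minimal Python sketch. It is not the ESSE engine: the piecewise-linear membership shapes, the modeling of VERY as squaring the membership degree, the choice of the minimum as the fuzzy AND, the function names, and the sample Kp and Dst values are all assumptions made for illustration only.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Membership = Callable[[float], float]


def ramp_up(lo: float, hi: float) -> Membership:
    """Membership for "large": 0 at or below lo, 1 at or above hi, linear between."""
    def mf(x: float) -> float:
        if x <= lo:
            return 0.0
        if x >= hi:
            return 1.0
        return (x - lo) / (hi - lo)
    return mf


def ramp_down(lo: float, hi: float) -> Membership:
    """Membership for "low": the mirror image of ramp_up."""
    up = ramp_up(lo, hi)
    return lambda x: 1.0 - up(x)


def very(mf: Membership) -> Membership:
    """Hedge VERY, modeled here as squaring the degree (an assumption)."""
    return lambda x: mf(x) ** 2


def fuzzy_and(degrees: Sequence[float]) -> float:
    """Fuzzy AND, taken here as the minimum of the degrees (an assumption)."""
    return min(degrees)


def search(times: Sequence[str],
           series: Dict[str, List[float]],
           conditions: Dict[str, Membership],
           top_n: int = 3) -> List[Tuple[str, float]]:
    """Score every time step and return the top_n most likely ones."""
    scored = []
    for i, t in enumerate(times):
        degrees = [mf(series[name][i]) for name, mf in conditions.items()]
        scored.append((t, fuzzy_and(degrees)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]


# Invented sample values, for illustration only.
times = ["day1", "day2", "day3", "day4"]
series = {
    "Kp index": [2.0, 7.0, 9.0, 5.0],
    "Dst index": [-10.0, -120.0, -250.0, -60.0],
}
# (VERY LARGE "Kp index") AND (VERY LOW "Dst index"), with hypothetical ramps.
conditions = {
    "Kp index": very(ramp_up(4.0, 8.0)),
    "Dst index": very(ramp_down(-200.0, -50.0)),
}
results = search(times, series, conditions)
for t, degree in results:
    print(t, round(degree, 3))
```

Each time step receives one aggregated degree between 0 and 1, and sorting those degrees gives the ranked list of candidate dates. A different AND operator (for example, the product) could be substituted in `fuzzy_and` without changing the rest of the sketch.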
We consider the values of the resulting time series as the "likeliness" that the environmental event occurred at time t. We search for the highest values of the aggregated MF and consider these to be the most likely candidates for the environmental events. To be able to search for events like "the hottest day" or "the hottest week" we introduce the concept of event duration. For example, the time step of a parameter from the NCEP/NCAR reanalysis database is 6 hours, so the minimum event duration is also 6 hours, but the event duration could also be 1 day, 1 week, etc. We will compute a moving average of the input parameters, with a time window equal to the event duration, before calculating the one-dimensional MFs and the fuzzy AND aggregation.
Possible use
The applications of the ESSE system are broad. As more and more data archives become available through projects like CLASS (NOAA), EOSDIS (NASA), DODS (Univ. RI) and other network-accessible data systems, the tools to extract information from them become more valuable. As Nature declared in a 1999 article (Reichhardt 1999), "It's sink or swim as a tidal wave of data approaches". ESSE can help users prepare by providing tools which sift through the vast quantities of data available online and point at the interesting bits. This means that even with the volume of data increasing so rapidly and the number of researchers remaining relatively level, we can hope to extract the most valuable information from the observations and carry it back to the relevant scientific communities.
The application of fuzzy-logic-based data tools goes far beyond simple event selection. For example, an ever-present issue when dealing with these large data sets is quality control. There is simply too large a volume to reasonably screen by hand. Searching capabilities can be used, for example, to analyze climate trends.
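As a rough illustration of the event-duration averaging described earlier and of the trend analysis just mentioned, the following Python sketch smooths a 6-hourly parameter over a one-day window and counts high-likeliness time steps per year. The function names, the threshold, and all sample numbers are hypothetical. Counting individual time steps rather than merged events is a simplification.

```python
from collections import Counter
from typing import Dict, List, Sequence


def window_steps(duration_hours: int, step_hours: int) -> int:
    """Number of time steps that make up one event duration."""
    if duration_hours < step_hours or duration_hours % step_hours != 0:
        raise ValueError("duration must be a positive multiple of the time step")
    return duration_hours // step_hours


def moving_average(values: Sequence[float], window: int) -> List[float]:
    """Mean of each run of `window` consecutive values (no padding at the ends)."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def events_per_year(years: Sequence[int],
                    scores: Sequence[float],
                    threshold: float) -> Dict[int, int]:
    """Count time steps per year whose aggregated MF value reaches the threshold."""
    counts = Counter(y for y, s in zip(years, scores) if s >= threshold)
    return dict(sorted(counts.items()))


# A 1-day event on a 6-hour time step spans 4 consecutive values.
steps = window_steps(24, 6)

# Invented 6-hourly temperatures, smoothed before any MF is applied.
temps = [10.0, 12.0, 14.0, 12.0, 20.0, 22.0, 24.0, 22.0]
smoothed = moving_average(temps, steps)
print(smoothed)

# Invented aggregated MF values with the year of each time step.
years = [1990, 1990, 1991, 1991, 1991]
scores = [0.9, 0.2, 0.8, 0.95, 0.1]
counts = events_per_year(years, scores, threshold=0.7)
print(counts)
```

Comparing such per-year counts across decades is one simple way to ask whether a given kind of event has become more frequent.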
Using techniques such as peer-matching and expert systems, we can extend the ESSE to monitor data and alert data managers to changes and anomalies. As the available computational power expands, we can extend the system into areas such as data classification, whereby we can identify modes of the environment and perhaps identify new, unknown relations in specific regions.
Finally, the emergence of a network infrastructure for data access is providing new opportunities for the scientific researcher. It is now fairly trivial to reach out across discipline boundaries and access data in an immediately usable format. The terrestrial weather community, for example, can make use of the space data made available through the SPIDR to study the influence of space weather on the Earth's climate. With these opportunities come challenges. As researchers expand into domains in which they may not be expert, they will come to rely on intelligent tools to support them.
The mission of the ESSE is fundamentally to help a user distill the vast amount of available data down to a manageable amount of information. Beyond this, however, the ESSE has applications in the areas of data quality control, data classification and even forecasting. The increasing data volumes available in the future demand different techniques to handle them, and the ESSE framework offers one such technique to the user. The events found can also be used as a source of real scenarios for computer games and simulators.
Project deliverables
In order to enable true platform independence, the team is working in both Open Source and Microsoft ASP.NET web services infrastructures, developing scientific software with the same external and internal interfaces. The same portal user interface can consume either of the implementations. The Microsoft .NET Framework includes a comprehensive set of classes that supersedes many of the commonly used open source libraries. This is especially true for XML processing libraries, and it enabled easy creation of wrapper classes that ensure portability of the source code created in this project.
References
Available from: http://ideas.ngdc.noaa.gov/ideas/papers/websim99/WebSim.htm
Available from: http://ideas.ngdc.noaa.gov/ideas/papers/DS259.pdf
Available from: http://www.saa.noaa.gov/cocoon/nsaa/products/welcome
Available from: http://edhs1.gsfc.nasa.gov/
Available from: http://hdf.ncsa.uiuc.edu/
Available from: http://java.sun.com/xml/jaxrpc/
Available from: http://www.unidata.ucar.edu/packages/netcdf/
Available from: http://www.unidata.ucar.edu/packages/dods/
Available from: http://www.w3.org/2002/ws/
Available from: http://clust1.wdcb.ru/papers/openMap/index.html
ESSE Enabling Technologies