Technical report on the project

Environmental Scenario Search Engine (ESSE) – distributed, optimized, visible

 

June 1, 2006 – May 31, 2007 development phase

 

Mikhail Zhizhin (1), Alexei Poyda (2), Dmitry Mishin (1), Dmitry Medvedev (1), Eric Kihn (3), Vassily Lutsarev (4)

(1) Geophysical Center, Russian Academy of Sciences, Moscow, Russia

(2) Moscow State University, Russia

(3) National Geophysical Data Center, NOAA, Boulder, CO, USA

(4) Microsoft Research, Cambridge, UK

 

Table of contents

1      Introduction

2      Weather reanalysis data

2.1       NCEP/NCAR Reanalysis database

2.2       Optimization for data mining applications

2.3       Optimization for diverse data queries

2.4       Caching of data requests in the OGSA-DAI container

2.4.1        Service layer

2.4.2        DAI Resource layer

2.4.3        Database API layer

2.4.4        API source layer

3      Fuzzy scenario search for environmental events

3.1       Methods

3.1.1        Fuzzy logic expressions

3.1.2        Fuzzy event scenario

3.2       Architecture

3.3       Search time optimization

3.3.1        Quantization of fuzzy membership values

3.3.2        Minimax norms and trapezoid membership functions

4      Data processor

4.1       Data model

4.2       Methods

4.3       Architecture

4.4       Use cases

4.5       Components interplay

4.5.1        Computation plan

4.5.2        Distributed queries

5      Visualization engine

5.1       Supported data sources

5.1.1        Metadata

5.1.2        Gridded data

5.1.3        Geolocated images

5.2       Architecture

5.3       Visualization clients

5.3.1        NASA World Wind plugin

5.3.2        OpenGIS server and thin client

6      Applications

6.1       Fuzzy search and statistics of extreme weather events

6.1.1        Global air temperature trends

6.1.2        Extreme weather events statistics

6.2       NOAA CLASS API

7      Conclusion

8      Bibliography

 

1         Introduction

In this project we develop algorithms and a software toolbox for parallel mining of sets of conditions in very large distributed databases from multiple environmental domains. The toolbox is called the Environmental Scenario Search Engine (ESSE). The prime design requirement for ESSE is that the user can query environmental data archives in human linguistic terms. The mapping between human language and computer systems relies on fuzzy logic. We use a Data Resource Grid-service abstraction layer to virtualize the “sequential databases” that provide time series to our search engine. The results of the feasibility study were published in 2006 as MSR Technical Report No. 1116 [MSR TR].
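
To make the idea of a query in linguistic terms concrete, the following is a minimal sketch and not the ESSE implementation. It assumes trapezoid membership functions and the min/max norms named in the heading of Section 3.3.2. The terms "very cold" and "strong wind", their thresholds, the units (degrees Celsius, m/s) and all function names are hypothetical, chosen only for illustration.

```python
from math import inf
from typing import List, Sequence, Tuple


def trapezoid(x: float, a: float, b: float, c: float, d: float) -> float:
    """Trapezoid membership: 1 on [b, c], 0 outside (a, d), linear ramps between.

    For an open shoulder, set a = b = -inf (left) or c = d = inf (right).
    """
    if b <= x <= c:
        return 1.0
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)  # rising ramp on (a, b)
    return (d - x) / (d - c)      # falling ramp on (c, d)


# Hypothetical linguistic terms; the thresholds are illustrative assumptions.
def very_cold(temp_c: float) -> float:
    return trapezoid(temp_c, -inf, -inf, -20.0, -10.0)


def strong_wind(speed_ms: float) -> float:
    return trapezoid(speed_ms, 10.0, 15.0, inf, inf)


def fuzzy_and(*degrees: float) -> float:
    """Minimum norm: a conjunction is as true as its weakest term."""
    return min(degrees)


def fuzzy_or(*degrees: float) -> float:
    """Maximum norm: a disjunction is as true as its strongest term."""
    return max(degrees)


def rank_times(series: Sequence[Tuple[float, float]]) -> List[Tuple[float, int]]:
    """Score each (temperature, wind speed) sample against
    'very cold AND strong wind' and return (score, index) pairs, best first."""
    scored = [
        (fuzzy_and(very_cold(t), strong_wind(w)), i)
        for i, (t, w) in enumerate(series)
    ]
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    samples = [(-25.0, 20.0), (-15.0, 12.5), (0.0, 20.0)]
    for score, index in rank_times(samples):
        print(index, score)
```

Each sample receives a degree of match between 0 and 1 instead of a yes/no answer, so candidate events in a time series can be ranked by how well they fit the verbal description.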

 

Between June 1, 2006 and May 31, 2007, algorithm and software development in the ESSE project focused on optimizing the NCEP Reanalysis database schema (Section 2) and speeding up the data mining algorithms (Section 3), as well as on developing an alpha version of the distributed data processor (Section 4) and visualization modules (Section 5). All server-side components of the ESSE search engine are platform-independent and can be built for either .NET or Linux. The first applications of the ESSE technology are presented in Section 6.
