User Tools

Site Tools


1. Introduction

1.1 What is an ETL tool

An ETL tool is a type of software used to populate databases or data warehouses from heterogeneous data sources. ETL stands for:

  • Extract – Extract data from data sources
  • Transform – Transformation of data in order to correct errors, make some data cleansing, change the data structure, make them compliant to defined standards, etc.
  • Load – Load transformed data into a target DBMS, service, file format …

An ETL tool should manage the insertion of new data and the updating of existing data. It should also be able to perform transformations from an OLTP system to another OLTP system and from an OLTP system to analytical data warehouse.

1.2 Why use an ETL tool

There are many reasons to use an ETL tool. Among them are the automation of complex and repetitive data processing without producing any specific code, the conversion between various data formats, the migration of data from a DBMS to another, the data feeding into various DBMS, the population of analytical data warehouses for decision support purposes and so on.

1.3 What is Spatialytics ETL

Spatialytics ETL is a powerful, metadata-driven spatial ETL (Extract, Transform and Load) tool dedicated to the integration of different data sources for building and updating geospatial databases, data warehouses and web services with direct execution of transformations.

Spatialytics ETL enables the Extraction of data from data sources, the Transformation of data in order to correct errors, clean data, change data structure, make data compliant with standards, and Loading of transformed data into a target Database Management System (DBMS), GIS file, or geospatial web service. Spatialytics ETL is particularly useful for automating complex and repetitive data processing without producing specific code, converting between data formats, migrating data between databases, feeding data into databases, etc.

Spatialytics ETL is stable, fast, standards compliant, with hundreds of functions and read/write support for many file formats, services and DBMS. Spatialytics ETL is used by diverse organisations from around the world, including governmental agencies, banks, insurance and geospatial system integrators.

Spatialytics ETL supports several DBMS and file formats:

  • DBMS support: MySQL, PostgreSQL, Oracle, DB2, MS SQL Server, … (total of 37);
  • Read/write support of various data file formats: text, Excel, Access, DBF, XML, …
  • Various services/systems: LDAP, CRM, …

as well as numerous transformation steps.

Spatialytics ETL provides a true and consistent integration of the spatial component. All steps provided by Spatialytics ETL are able to deal with geospatial data types. Some geospatial dedicated steps have been added (SRS, SOS, CSW, Spatial Analysis …) which allow a powerful integration of corporate and spatial data.

1.4 Spatialytics ETL Features

Extract data from:

  • 35+ database types: MySQL, PostgreSQL/PostGIS, Oracle, etc.
  • XML files
  • XLS files
  • Xbase files (dBase, Foxpro, etc.)
  • File systems information
  • Generated data
  • MS Access files
  • LDAP
  • OLAP & SOLAP (Spatial OLAP) systems: Mondrian, GeoMondrian
  • Geospatial data formats: Shapefile, GML 3.1.1, KML 2.2, all OGR supported formats
  • OGC Web services: Web Feature Service (WFS), Sensor Observation Service (SOS), Catalogue Web Service (CSW)

Transformation of data:

  • Engine based data transfer (no code generator)
  • Looking up data in databases, files or memory
  • Calculating
  • Scripting: Javascript, SQL, RegExp
  • Splitting
  • Mapping
  • Selecting
  • Partitioning
  • Filtering
  • Merging
  • Joining
  • Duplicating
  • Clustering (MPP)
  • Pivoting
  • Geospatial data analysis: buffer, centroid, distance, intersection, union, etc.
  • Advanced geoprocessing: clipping, delaunay, simplify/smooth geometry, split features, …
  • Spatial aggregation
  • Cartographic preview

Load data into a target format:

  • Database loads: MySQL, PostgreSQL/PostGIS, Oracle, etc.
  • Data warehouse population
  • Geospatial data formats: Shapefile, GML 3.1.1, KML 2.2, all OGR supported formats
  • OGC Web services: Catalogue Web Service (CSW), etc.
  • Partitioned loading
  • Bulk loading
  • Parallel loading
  • Clustering

Environment:

  • Full GUI named “Spoon” to edit every transformation options
  • Command line tools: execute jobs and transformations
  • Web server: remote execution and clustering perfect in cloud computing environment for very large datasets processing
  • Programming API for Java
  • Module eco-system

1.5 What's next

In the coming chapters, we will see how to install Spatialytics ETL on different platforms, the Getting Started chapter will take us through the main features of Spatialytics ETL, we will see in details the Transformations and Jobs concepts, the Modules chapter will take us through the different modules offered to extend Spatialytics ETL capabilities, the Tools in Details will explain the tools available for an optimal use of Spatialytics ETL and the Variables chapter will show us how to fine tune the software.

en/spatialytics_etl/001_introduction.txt · Last modified: 2013/02/11 14:48 by edinel

Page Tools