Repository

A repository in GeoKettle provides a common workspace in which jobs and transformations can be shared between users across your organization. It is a practical way for several users to work collaboratively on a data integration process and to keep track of changes to jobs and transformations over time. Because jobs and transformations are centralised in a shared repository, it avoids having different versions of the same transformation on various computers, a situation that often leaves users unsure which version of the transformation is the correct, up-to-date one.

A repository in GeoKettle is based on the concept of users and privileges, and can be stored in any DBMS reachable through a JDBC connection. Users must be granted access (read/write) to the repository in order to read or write transformations and jobs. You can then define repository administrators with write rights, and more basic users who can only read and execute transformations. Setting up a repository is also a good idea when you want to deploy transformations through the web via the Carte tool provided with GeoKettle.

Repository creation

In this section, we briefly explain how to create a job/transformation repository.

1. Create the repository database

It is a good idea to create a dedicated database to hold your repository. This keeps a clear separation between all the applications you use or develop.

Suppose you want to store your repository in a PostgreSQL DBMS. As user postgres (or any other user with the privilege to create databases), just type in a terminal:

createdb -E utf-8 -O postgres geokettle_repository
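The same step can be scripted. The following is a minimal sketch, not a GeoKettle tool: the function name create_repository_db and the DRY_RUN switch are hypothetical, and it assumes the PostgreSQL client programs createdb and psql are on your PATH and that you run it as postgres (or another role allowed to create databases). After creating the database, it asks psql for the database encoding, which should then read UTF8.

```shell
#!/bin/sh
# Hypothetical helper, not shipped with GeoKettle.
# Creates the repository database with createdb, then prints its encoding.
# Set DRY_RUN=echo to print the commands instead of running them.
create_repository_db() {
  db="${1:-geokettle_repository}"   # database name used on this page
  owner="${2:-postgres}"            # owning role, as in the command above
  ${DRY_RUN} createdb -E utf-8 -O "$owner" "$db" || return 1
  # SHOW server_encoding reports the encoding of the database psql connects to
  ${DRY_RUN} psql -d "$db" -Atc "SHOW server_encoding;"
}
```

For example, DRY_RUN=echo create_repository_db only prints the two commands, while create_repository_db geokettle_repository postgres actually runs them.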

2. Create a connection to your repository

Start GeoKettle or, if it is already running, go to the “Repository” menu and click the “Connect to repository” menu item. The “Select a repository” dialog box should appear.

In the dialog box, click the “New” button to define a new repository. A second dialog box, “Repository information”, should appear.

In this second dialog box, click the “New” button to set up the connection to your repository database.

Fill in the form with the requested information (e.g. connection type, host name, database name, port, user name and password).

Click the “Test” button to check that the connection is established correctly. On success, click “OK”.
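If the “Test” button reports an error, trying the same parameters from a terminal can help to tell a network or authentication problem apart from a typo in the form. The sketch below is hypothetical (the function name and its defaults are assumptions; 5432 is the PostgreSQL default port): it runs a trivial query with psql. Passing -h makes psql connect over TCP/IP, as the JDBC driver does, rather than through a local Unix-domain socket.

```shell
#!/bin/sh
# Hypothetical helper, not part of GeoKettle: runs a trivial query with the
# same host, port, database and user you typed in the connection form.
# psql prompts for the password unless PGPASSWORD or ~/.pgpass provides it.
# Set DRY_RUN=echo to print the command instead of running it.
check_repository_connection() {
  host="${1:-localhost}"
  port="${2:-5432}"                 # PostgreSQL default port
  db="${3:-geokettle_repository}"
  user="${4:-postgres}"
  ${DRY_RUN} psql -h "$host" -p "$port" -d "$db" -U "$user" -Atc "SELECT 1;"
}
```

For example, check_repository_connection dbhost 5432 geokettle_repository postgres should print 1 when the server accepts the connection (dbhost is a placeholder for your database server).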

3. Create the repository

Once the database connection has been created, you are sent back to the “Repository information” dialog box. Provide a “Name” and a “Description” for your new repository.

Click the “Create or upgrade” button, then click “Yes” in the two confirmation dialog boxes that appear. A “Simple SQL editor” window should then open. It contains all the SQL statements to be executed in order to create the repository.

Click the “Execute” button to start the repository creation process. GeoKettle will create 42 tables in your repository database to store and manage all the information about your transformations and jobs.
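To double-check the result from a terminal, you can count the tables in the repository database. This sketch assumes the repository tables were created in the default public schema of the dedicated database created in step 1; the function name is hypothetical. On a database that holds nothing but the repository, the count should match the number of tables given above, although it may differ between GeoKettle versions.

```shell
#!/bin/sh
# Hypothetical helper, not part of GeoKettle: counts the ordinary tables in
# the public schema of the repository database. Assumes the repository was
# created in the default public schema of a database dedicated to it.
# Set DRY_RUN=echo to print the command instead of running it.
count_repository_tables() {
  db="${1:-geokettle_repository}"
  ${DRY_RUN} psql -d "$db" -Atc \
    "SELECT count(*) FROM information_schema.tables WHERE table_schema = 'public' AND table_type = 'BASE TABLE';"
}
```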

Check that all statements have been executed successfully and click “Close”.

Close the “Simple SQL editor” window and click the “OK” button of the “Repository information” window.

At this stage, your transformation/job repository is created and ready to be used. The next section shows how to populate and use it.

Explore and populate/use your GeoKettle repository

To start a GeoKettle session that uses your newly created repository, you just need to connect to it by providing a valid username and password in the “Select a repository” window. As a reminder, this window is displayed when you start GeoKettle or, if GeoKettle is already running, when you go to the “Repository” menu and click the “Connect to repository” menu item.

Please note that your repository account is not a database account. When connecting for the first time, log in to your repository as an administrator with username admin and password admin.

Explore the repository content

Once GeoKettle is launched, you can explore the repository content by going to the “Repository” menu and clicking the “Explore repository” menu item. The “Repository explorer” window should then appear.

As you can see, the repository contains several sections:

  • Database connections: all the database connections defined in the jobs and transformations stored in the repository.
  • Partition schemas
  • Slaves and clusters: all the groups (clusters) of computers you have defined in order to distribute the processing of the jobs and transformations stored in the repository.
  • Transformations & Jobs: all the transformations and jobs you design. This section can be organised into subdirectories.
  • Users & Profiles: all the groups (profiles) and users that have access to the repository. Profiles define the privileges (use job, use transformation, …) granted to the users who belong to them. Each user has a login name, a full name, a password and a description, and is a member of a profile.

You can view or edit the details of an item in the repository by double-clicking it, or by right-clicking it and selecting the appropriate entry in the popup menu that appears. As an example, here is the properties window that is displayed when double-clicking the admin user account.

In a newly created repository, two users are already defined: admin and guest. By default, the guest user is not granted access to Spoon. The admin user is the account you are currently using to explore the repository. With this account, you have all (administrative) privileges and can therefore make any kind of modification to the repository. For security reasons, you are encouraged to change the admin password by modifying the appropriate field in the user properties dialog box mentioned above. To make your changes persistent, you have to click the “Commit changes” button of the “Repository explorer” window.

Three profiles are defined by default: Administrator, Read-only and User. Members of the Administrator group are granted all privileges: they can add, delete and modify jobs, transformations, profiles and users. A User (i.e. a member of this group) can use and modify all the transformations and jobs stored in the repository, but cannot add or modify profiles or users (except their own user account). Members of the Read-only group cannot use Spoon to edit a job or transformation, but can run transformations and jobs with the Pan or Kitchen command-line tools.

For example, as the guest user (who has Read-only privileges), you can list the transformations stored in the root directory of your repository by issuing the following command in a terminal (the -rep value is the name you gave your repository when creating it):

./pan.sh -rep:"GeoKettle jobs/transformations repository" -dir:"/" -user:"guest" -pass:"guest" -level=Basic -listtrans=Y

To execute (as guest) a transformation named “intersection” stored in the “samples” directory of your repository, type:

./pan.sh -rep:"GeoKettle jobs/transformations repository" -dir:"/samples" -user:"guest" -pass:"guest" -level=Basic -trans:"intersection"
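Kitchen is the command-line tool for jobs and accepts the same repository options as Pan, with -job instead of -trans. The sketch below wraps both commands so that batch scripts do not repeat the repository name and credentials on every line. REPO, REPO_USER, REPO_PASS, run_trans and run_job are hypothetical names, and the script assumes it is run from the GeoKettle directory that contains pan.sh and kitchen.sh.

```shell
#!/bin/sh
# Hypothetical wrapper, not part of GeoKettle. Run it from the directory that
# contains pan.sh and kitchen.sh. Replace REPO with the name you gave your
# repository and REPO_USER/REPO_PASS with a Read-only account.
# Set DRY_RUN=echo to print the command lines instead of running them.
REPO="${REPO:-GeoKettle jobs/transformations repository}"
REPO_USER="${REPO_USER:-guest}"
REPO_PASS="${REPO_PASS:-guest}"

# $1 = repository directory, $2 = transformation name
run_trans() {
  ${DRY_RUN} ./pan.sh -rep:"$REPO" -dir:"$1" -user:"$REPO_USER" \
    -pass:"$REPO_PASS" -level=Basic -trans:"$2"
}

# $1 = repository directory, $2 = job name
run_job() {
  ${DRY_RUN} ./kitchen.sh -rep:"$REPO" -dir:"$1" -user:"$REPO_USER" \
    -pass:"$REPO_PASS" -level=Basic -job:"$2"
}
```

For example, once the functions are loaded, run_trans /samples intersection runs the same transformation as the command above, and run_job / my_job (my_job being a hypothetical job name) runs a job stored in the root directory.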

Consequently, job/transformation editors should belong to the User profile, while it is useful and safer to define a user with a Read-only profile whose credentials are used only to run jobs or transformations stored in the repository: in batch mode via the Kitchen/Pan tools, as a scheduled task (e.g. via a crontab entry on Linux), or even as a web service via the Carte tool provided with GeoKettle. For security reasons, we also encourage you to replace the guest account: create a new user that belongs to the Read-only profile, then delete the guest account.

You can also define new profiles that fit your requirements (e.g. separate job and transformation editors) by selecting the appropriate permissions in the “Profile information” dialog box of your new profile, as illustrated in the figure just below.

Once the profile is created, do not forget to assign users to it in order to grant them the corresponding permissions.
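As an illustration of the scheduled-task case, here is a hypothetical crontab entry (a configuration fragment, not a script) that runs the “intersection” transformation every night at 02:30 with a Read-only account. The installation path /opt/geokettle, the batch_user/batch_password credentials and the log file path are placeholders. Note that cron starts commands with a minimal environment, so you may have to set JAVA_HOME or PATH in the crontab, and that the password appears in clear text in the crontab and in the process list.

```
# Hypothetical crontab entry (edit with: crontab -e).
# Runs the "intersection" transformation every night at 02:30 with a
# Read-only account; path, credentials and log file are placeholders.
30 2 * * * cd /opt/geokettle && ./pan.sh -rep:"GeoKettle jobs/transformations repository" -dir:"/samples" -user:"batch_user" -pass:"batch_password" -level=Basic -trans:"intersection" >> /tmp/geokettle_intersection.log 2>&1
```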

Populate the repository with a new or existing transformation

When you are connected to a repository, you can no longer load a job or transformation file directly, i.e. the XML files with .kjb or .ktr extensions. You can, however, import them into the repository via the “Import from an XML file” item of GeoKettle's “File” menu.

Select a transformation file (i.e. a file with the .ktr extension) on your local drive. You can find sample GeoKettle transformations in the samples/transformations/geokettle subdirectory of your GeoKettle installation.

The selected transformation opens in the Spoon workbench. Save it (Ctrl-S or File → Save) and the transformation is automatically transferred into the repository. By default, all new transformations are stored in the root (“/”) directory of the repository. You can easily drag and drop a transformation into another repository folder, as illustrated hereafter with the “intersection” transformation stored in the “samples” directory.

As you can see, the repository explorer provides some details about the transformation, such as which user committed its last version, at what date and time, etc.
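If you are not sure which sample to pick, a small hypothetical helper like the one below lists the .ktr files of a directory, by default the samples subdirectory mentioned above (the path is assumed to be relative to the GeoKettle installation directory).

```shell
#!/bin/sh
# Hypothetical helper, not part of GeoKettle: prints the names of the .ktr
# (transformation) files found in a directory, one per line.
list_transformations() {
  dir="${1:-samples/transformations/geokettle}"
  for f in "$dir"/*.ktr; do
    [ -e "$f" ] || continue   # no match: the pattern stays unexpanded
    basename "$f"
  done
}
```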

Load a job/transformation from the repository

Once connected to a repository, opening a job or a transformation is very easy. Just select the “Open” item in GeoKettle's “File” menu. A “Select repository object” window opens and lets you choose a job or a transformation stored in the repository. Double-click an item to open it in the Spoon interface.

Disconnect from the repository

When you have finished your work session, you must disconnect from the repository. To do so, just click the “Disconnect repository” item in the “Repository” menu of the Spoon interface and it is done!

en/spatialytics_etl/007_spatialytics_in_details/repository.txt · Last modified: 2013/02/06 16:29 by sbedard
