Apache Hop - Use Hop GUI with Remote Pipeline Engine Setup
Here I explain how you can use the Hop GUI to run pipelines or workflows remotely using the Hop Server. For that I will add the Hop Server and a Remote Pipeline Run Configuration within the client as metadata.
Apache Hop is an open source data integration platform and a fork of Pentaho Data Integration. In recent years I have enjoyed using Pentaho both at home and professionally and I’m very excited about Hop.
For those who know Pentaho, much will be familiar. You can read about the differences (and similarities) here.
Finally, don’t be fooled by the slightly dated user interface. Give it a chance, because under the hood Hop is a very powerful and efficient data integration platform. Although I have tested all kinds of innovative BI software, I always ended up using Pentaho, and now Apache Hop, to build (more complex) ETL processes.
Dependencies
Make sure the Hop Server is running and you can access the Hop Server status via the following URL:
http://<IP DOCKER HOST>:8182/hop/status
Log in with your username and password. This will give you an overview of the pipelines and workflows once they have been executed through the server.
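The same check can be scripted. The following is a minimal sketch, not part of the original setup: it assumes the server accepts HTTP Basic authentication on the status URL shown above, and the function names, the example IP address and the admin/admin credentials are placeholders for illustration.

```python
import base64
import urllib.request


def build_status_request(host: str, port: int, username: str, password: str) -> urllib.request.Request:
    """Build a GET request for the Hop Server status page.

    Assumption: the server uses HTTP Basic authentication.
    """
    url = f"http://{host}:{port}/hop/status"
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


def server_is_reachable(request: urllib.request.Request, timeout: float = 5.0) -> bool:
    """Return True if the status page answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError and HTTPError (wrong password, server down, ...) derive from OSError
        return False


if __name__ == "__main__":
    # Placeholder values: replace with the IP of your Docker host and your own credentials
    request = build_status_request("192.0.2.10", 8182, "admin", "admin")
    print("Hop Server reachable:", server_is_reachable(request))
```

If this prints False, check the IP address, the port mapping of the container and the credentials before continuing with the Hop GUI.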
Hop GUI
To build pipelines and workflows you need to download the Hop GUI, or use Hop Web (in development). I will use the Hop GUI on Windows and start hop-gui.bat.
- Create a new project.
- (Optional) Create lifecycle environments if necessary. For each project I create a development environment and a production environment by default. You can configure variables for each environment. For example, if I want to connect to a sandbox API I use the development environment.
- Click the metadata button on the far left, then right-click Hop Server in the list and choose New. Enter the following:
  - Server name: for example Ubuntu VM Hop Server
  - Hostname or IP address: the IP of your Docker host
  - Port: the Docker port, for example 8182
  - Username: your username, for example admin
  - Password: your password, for example admin

  Close the New Hop Server tab and confirm you want to save the new Hop Server. The newly created server is now listed under Hop Server.
- In the same metadata list we are now going to add a new Pipeline Run Configuration. Right-click Pipeline Run Configuration and choose New. Enter the following:
  - Name: remote hop server
  - Engine type: Hop remote pipeline engine
  - Hop server: choose the server you just created, for example Ubuntu VM Hop Server
  - Run configuration: local

  Close the New Pipeline Run Configuration tab and confirm you want to save the new configuration. The metadata list now contains both the new Hop Server and the new Pipeline Run Configuration.
- Click the Data Orchestration button on the far left (above the metadata button) and click the Open... button.
- Navigate to the transforms folder of the samples project. In my case the full path is: C:\hop\config\projects\samples\transforms
- Open (double-click) fake-data-generate-person-record.hpl
- Now click the blue Start execution of the pipeline play button under the tab fake-data-generate-person-record.
- You will see the Run Options window. Select remote hop server as Pipeline run configuration and click the Launch button.
- The Results Pane will now open at the bottom of the screen. Under the Logging tab you can see the output, in this case the random person records.
tab you can see the output, in this case the random person records - Now look in the browser at the server status (example url:
http://<IP DOCKER HOST>:8182/hop/status
). You will see that the pipeline is now added with statusFinished
:
It is also nice to know that by selecting the pipeline and then clicking the View pipeline details (eye symbol) button you can view the result in XML or JSON format and also the Pipeline log. Furthermore, the metrics such as numbers and speed are also displayed and even a preview of the canvas with the transforms.
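As a recap of the two metadata entries created in the steps above, the sketch below models them as plain Python records. This is only an illustration of how the entries refer to each other, not Hop's own storage format or API: the class names, the helper function and the example IP address are hypothetical, and I am assuming that the Run configuration field names the run configuration the server itself uses to execute the pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HopServer:
    """Illustrative record of the Hop Server metadata fields (not Hop's file format)."""
    name: str
    hostname: str
    port: int
    username: str
    password: str


@dataclass(frozen=True)
class RemotePipelineRunConfig:
    """Illustrative record of the Pipeline Run Configuration fields."""
    name: str
    engine_type: str
    hop_server: str          # refers to a HopServer entry by its name
    run_configuration: str   # assumed: run configuration used on the server side


def check_references(config: RemotePipelineRunConfig, servers: list[HopServer]) -> list[str]:
    """Return a list of problems; an empty list means the references resolve."""
    problems = []
    if config.hop_server not in {server.name for server in servers}:
        problems.append(f"unknown Hop server: {config.hop_server}")
    if config.run_configuration == config.name:
        problems.append("run configuration refers to itself")
    return problems


# Example values taken from the steps above; the IP address is a placeholder
server = HopServer("Ubuntu VM Hop Server", "192.0.2.10", 8182, "admin", "admin")
config = RemotePipelineRunConfig(
    "remote hop server", "Hop remote pipeline engine", "Ubuntu VM Hop Server", "local"
)
print(check_references(config, [server]))
```

The point of the sketch is the link by name: renaming the Hop Server entry without updating the run configuration would leave the remote run configuration pointing at nothing.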
You can use the remote workflow engine in a similar way. I am also very curious about other ways to use the Hop Server. My next experiment will be running a Hop Web Service.
Tip: If you want to use version control, don’t forget the option to use Git!