Apache Hop - Use Hop GUI with Remote Pipeline Engine Setup


In this note I explain how to use the Hop GUI to run pipelines or workflows remotely on a Hop Server. To do that, I add the Hop Server and a remote Pipeline Run Configuration as metadata in the client.

Apache Hop is an open source data integration platform and a fork of Pentaho Data Integration. In recent years I have enjoyed using Pentaho both at home and professionally and I’m very excited about Hop.

For those who know Pentaho, much will be familiar. You can read about the differences (and similarities) here.

Finally, don’t be fooled by the slightly dated user interface. Give it a chance, because under the hood Hop is a very powerful and efficient data integration platform. Although I have tested all kinds of innovative BI software, I always ended up using Pentaho, and now Apache Hop, to build (more complex) ETL processes.

Dependencies

Make sure the Hop Server is running and you can access the Hop Server status via the following URL:

http://<IP DOCKER HOST>:8182/hop/status

Log in with your username and password. The status page gives you an overview of the pipelines and workflows once they have been executed through the server.
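If you prefer to check the server from a script instead of the browser, the following is a minimal sketch. It assumes the Hop Server protects the status page with HTTP Basic authentication and that the port is 8182 as in the example URL above. The function names are my own and are not part of Hop.

```python
import base64
import urllib.request


def status_url(host: str, port: int = 8182) -> str:
    """Build the status URL from this note (port 8182 is the example port)."""
    return f"http://{host}:{port}/hop/status"


def basic_auth_header(username: str, password: str) -> str:
    """Encode the credentials as an HTTP Basic Authorization header value."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def server_is_up(host: str, port: int, username: str, password: str,
                 timeout: float = 5.0) -> bool:
    """Return True if the status page answers with HTTP 200.

    Assumption: the server accepts HTTP Basic authentication.
    """
    request = urllib.request.Request(
        status_url(host, port),
        headers={"Authorization": basic_auth_header(username, password)},
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers refused connections, timeouts and HTTP errors such as 401.
        return False
```

Call `server_is_up("<IP DOCKER HOST>", 8182, "admin", "admin")` with your own host and credentials. A result of `False` means the server is unreachable or rejected the login.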

Hop GUI

To build pipelines and workflows you need to download the Hop GUI, or use Hop Web (in development). I will use the Hop GUI on Windows and start it with hop-gui.bat.

  1. Create a new project

  2. (optional) Create lifecycle environments if necessary. For each project I create a development environment and production environment by default. You can configure variables for each environment. For example, if I want to connect to a sandbox API I use the development environment.

  3. Click the Metadata button on the far left, then right-click Hop Server in the list and choose New. Enter the following:
    • Server name: for example Ubuntu VM Hop Server
    • Hostname or IP address: the IP of your Docker host
    • Port: the Docker port, for example 8182
    • Username: your username, for example admin
    • Password: your password, for example admin
      Close the New Hop Server tab and confirm you want to save the new Hop Server
      Now the newly created server is listed under Hop Server
  4. In the same metadata list, add a new Pipeline Run Configuration. Right-click Pipeline Run Configuration and choose New. Enter the following:
    • Name: remote hop server
    • Engine type: Hop remote pipeline engine
    • Hop server: choose the server you just created, for example Ubuntu VM Hop Server
    • Run configuration: local
      Close the New Pipeline Run Configuration tab and confirm you want to save the new configuration
      Your metadata now lists both the new Hop Server and the remote hop server run configuration
  5. Click the Data Orchestration button on the far left (above the Metadata button) and click the Open... button
  6. Navigate to the transforms folder of the samples project. In my case the full path is: C:\hop\config\projects\samples\transforms
  7. Open (double-click) fake-data-generate-person-record.hpl
  8. Now click the blue play button (Start execution of the pipeline) under the fake-data-generate-person-record tab
  9. You will see the Run Options window. Select remote hop server as Pipeline run configuration and click the Launch button

  10. The Results Pane will now open at the bottom of the screen. Under the Logging tab you can see the output, in this case the random person records
  11. Now look in the browser at the server status (example URL: http://<IP DOCKER HOST>:8182/hop/status). You will see that the pipeline is now listed with status Finished
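The same run can probably also be started without the GUI through Hop's hop-run command line tool. The sketch below only builds the command. It assumes Hop is installed in C:\hop (derived from the samples path above), that the project is registered under the name samples, and that the run configuration is called remote hop server as in step 4. The helper name `hop_run_command` is hypothetical.

```python
from pathlib import PureWindowsPath


def hop_run_command(hop_home: str, project: str, run_config: str,
                    pipeline_file: str) -> list[str]:
    """Build a hop-run command line for one pipeline.

    Assumption: hop-run.bat sits directly in the Hop installation folder.
    """
    launcher = str(PureWindowsPath(hop_home) / "hop-run.bat")
    return [
        launcher,
        "--project", project,       # project as registered in Hop
        "--runconfig", run_config,  # name of the Pipeline Run Configuration
        "--file", pipeline_file,    # full path to the .hpl file
    ]


command = hop_run_command(
    r"C:\hop",
    "samples",
    "remote hop server",
    r"C:\hop\config\projects\samples\transforms"
    r"\fake-data-generate-person-record.hpl",
)
```

On the Windows machine you could pass `command` to `subprocess.run(command, check=True)`. The pipeline should then show up on the server status page just like a run launched from the GUI.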

It is also nice to know that by selecting the pipeline and clicking the View pipeline details button (eye symbol) you can view the result in XML or JSON format, as well as the pipeline log. The page also shows metrics such as row counts and speed, and even a preview of the canvas with the transforms.

You can use the remote workflow engine in a similar way. I am also very curious about other ways to use the Hop Server. My next experiment will be running a Hop Web Service.

Tip: If you want to use version control, don’t forget the option to use Git!

