
Apache Hop GUI remote pipeline engine setup

Introduction

The Hop GUI enables you to run pipelines or workflows remotely using the Hop Server. By configuring and managing your projects, environments, and pipelines within the GUI, you can seamlessly execute and monitor them through the Hop Server.

Don’t let the slightly dated user interface deceive you. Give Hop a chance because, beneath the surface, it’s a powerful and efficient data integration platform. Even though I’ve experimented with various innovative BI tools, I consistently return to Pentaho, and now Apache Hop, for building complex ETL processes.

Requirements

Make sure the Hop Server is running and that you can access the Hop Server status page. Log in with your username and password. The status page gives you an overview of the pipelines and workflows once they have been executed through the server.
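If you prefer to check this from a script instead of the browser, here is a minimal sketch in Python. It assumes the server protects the status page with HTTP Basic authentication and that the page lives at /hop/status (the same path used in the example URL further down). The function names and the IP address in the usage note are my own placeholders, not part of Hop.

```python
import base64
import urllib.request


def build_status_request(host: str, port: int, username: str, password: str) -> urllib.request.Request:
    """Build a GET request for the Hop Server status page.

    Assumption: the server accepts HTTP Basic authentication.
    """
    url = f"http://{host}:{port}/hop/status"
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


def server_is_reachable(request: urllib.request.Request, timeout: float = 5.0) -> bool:
    """Return True if the status page answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError and socket timeouts are all subclasses of OSError.
        return False
```

Calling `server_is_reachable(build_status_request("192.0.2.10", 8182, "admin", "admin"))` should return True when the server is up and the credentials are accepted. Replace the host, port and credentials with your own.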

Setup

To build pipelines and workflows, you need to download the Hop GUI. Alternatively, you can use Hop Web (currently in development). I’ll be using the Hop GUI on Windows and starting it with hop-gui.bat.

  1. Create a new project.

  2. (optional) Create lifecycle environments if necessary. For each project I create a development environment and a production environment by default. You can configure variables for each environment. For example, when I want to connect to a sandbox API I use the development environment.

  3. Click the metadata button on the far left, right-click Hop Server in the list, and choose New. Enter the following:

    • Server name: for example Ubuntu VM Hop Server
    • Hostname or IP address: the IP of your Docker host
    • Port: the Docker port, for example 8182
    • Username: your username, for example admin
    • Password: your password, for example admin

     Close the New Hop Server tab and confirm that you want to save the new Hop Server. The newly created server is now listed under Hop Server.

  4. In the same metadata list, add a new Pipeline Run Configuration. Right-click Pipeline Run Configuration and choose New. Enter the following:

    • Name: remote hop server
    • Engine type: Hop remote pipeline engine
    • Hop server: choose the server you just created, for example Ubuntu VM Hop Server
    • Run configuration: local

     Close the New Pipeline Run Configuration tab and confirm that you want to save the new configuration. Both the new server and the new run configuration are now listed in the metadata.

  5. Click the Data Orchestration button on the far left (above the metadata button) and click the Open... button.

  6. Navigate to the transforms folder of the samples project. In my case the full path is: C:\hop\config\projects\samples\transforms

  7. Open (double-click) fake-data-generate-person-record.hpl

  8. Now click the blue play button (Start execution of the pipeline) under the fake-data-generate-person-record tab.

  9. You will see the Run Options window. Select remote hop server as the Pipeline run configuration and click the Launch button.

  10. The Results Pane will now open at the bottom of the screen. Under the Logging tab you can see the output, in this case the random person records

  11. Now open the server status page in the browser (example URL: http://<Hop server IP>:8182/hop/status). You will see that the pipeline is now listed with status Finished.
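The same run can probably be started without the GUI through the hop-run command-line tool that ships with Hop. The sketch below only builds the command. It assumes Hop is installed in C:\hop, that hop-run accepts the --project, --file and --runconfig options as I understand its documentation, and that ${PROJECT_HOME} is resolved by Hop to the project folder. The helper name is hypothetical.

```python
from pathlib import PureWindowsPath
from typing import List


def build_hop_run_command(hop_home: str, project: str, pipeline_file: str, run_config: str) -> List[str]:
    """Build the argument list for hop-run.bat (assumed option names, see the text above)."""
    launcher = str(PureWindowsPath(hop_home) / "hop-run.bat")
    return [
        launcher,
        "--project", project,        # the Hop project, here the samples project
        "--file", pipeline_file,     # the pipeline (.hpl) to execute
        "--runconfig", run_config,   # the Pipeline Run Configuration created in step 4
    ]


command = build_hop_run_command(
    r"C:\hop",  # assumption: Hop installation folder
    "samples",
    "${PROJECT_HOME}/transforms/fake-data-generate-person-record.hpl",
    "remote hop server",
)
```

On a Windows machine you could then pass the list to `subprocess.run(command, check=True)`. Because the arguments are passed as a list and not through a shell, ${PROJECT_HOME} reaches Hop unchanged.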

You can use the remote workflow engine in a similar way. I am also very curious about other possibilities to use the Hop Server. My next experiment will be running a Hop Web Service.


    Copyright 2021- Fiction Becomes Fact