Automate and Orchestrate your workflow using Kestra


In this blog, I am going to introduce you to an open-source platform, Kestra, which lets you manage your workflows using a simple YAML file.


Before diving into the world of Kestra, I want to thank Kunal Kushwaha for organizing the HackFrost hackathon and introducing us to such an amazing tool. I loved using Kestra ❤️ throughout the hackathon and will definitely use it for my future projects as well. So let’s get started.

Introduction

Kestra is an open-source orchestration platform for building and managing your workflows. To build a workflow in Kestra, you just need to write YAML. Nothing fancy here. Isn’t it cool?

Not only that, in Kestra you can schedule your workflows using cron expressions (cron is used to schedule tasks to run at specific times or intervals). You can also trigger a flow when something happens on another platform, like AWS or GitHub.

Kestra also has hundreds of plugins for different use cases, such as connecting to services like MongoDB, DynamoDB, and Apache Kafka, which you can integrate into your workflow.

Here is the official site of Kestra: kestra.io


Running Kestra

Kestra has a cloud-hosted platform, but it is in a private alpha stage (eagerly waiting for the public launch). However, you can run Kestra on your local machine using Docker. There are other ways to run Kestra as well, which you can check out here.

Prerequisites

  • Docker installed on your machine

  • Smile on your face 😄

Steps

  1. Spin up a Docker container for Kestra by running the command below in your terminal:
docker run --pull=always --rm -it -p 8080:8080 --user=root -v /var/run/docker.sock:/var/run/docker.sock -v /tmp:/tmp kestra/kestra:latest server local

Note: With this method, your work in Kestra will not be persistent. So make sure to store your YAML files in a separate folder.

  2. Now open your browser, enter localhost:8080 in the address bar, and hit Enter.

Now you are good to get started with Kestra.

Let’s now understand some key terms in Kestra.


Flows

In Kestra, a flow defines what your workflow looks like, written in YAML. After writing the flow, you can also see a visual representation of it in the Topology section. Defining the flow in YAML keeps it language agnostic.

A simple example of a Kestra flow is given below:

id: first_flow_example
namespace: ronak.blog
description: "Flow is to just log a message"

tasks:
  - id: logging_task
    type: io.kestra.plugin.core.log.Log
    message: "Successfully executed a kestra flow with log"

The above Kestra flow uses certain keywords, which I explain in detail below.

In each flow, there are three required fields: id, namespace, and tasks.

  • id: The id field is used to give a name to a flow. Each flow’s id must be unique within its namespace, but flows in different namespaces can share the same id.

  • namespace: You can think of namespaces as a folder structure, where you can group different flows according to their namespace. You can’t change a flow’s namespace once the flow is created or saved.

  • tasks: The tasks field is used to define one or more tasks for the flow. The tasks execute in the order they are listed, i.e. top to bottom. Each task has its own id, type, and other fields associated with that type. In the type field you choose what kind of work the task does, like fetching documents from MongoDB or DynamoDB. Each task type is predefined. You can check the documentation for all the task types, although I will explain some of the popular and frequently used ones in this blog.

  • description: The description field is an optional field, which is used to describe the flow for more clarity. We can give a description to each task as well.
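To make the ordering rule concrete, here is a small sketch of a flow with two tasks, each carrying its own description. The flow id, task ids, and messages are made up for illustration; only the Log task type comes from the example above.

```yaml
id: two_step_flow            # hypothetical id, must be unique within the namespace
namespace: ronak.blog
description: "Flow with two tasks that run one after another"

tasks:
  - id: first_step
    type: io.kestra.plugin.core.log.Log
    description: "Runs first because it is listed first"
    message: "Step 1 done"

  - id: second_step
    type: io.kestra.plugin.core.log.Log
    description: "Runs only after first_step has finished"
    message: "Step 2 done"
```

If you run it, the logs should show "Step 1 done" before "Step 2 done".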


Task Types

Kestra provides multiple categories of task types, like Core, Scripts, Internal Storage, KV Store, and Plugins. Some of the popular and frequently used task types are given below:

  • io.kestra.plugin.core.log.Log: This task type is used to log messages in the Kestra log section while executing a flow. Check out the doc.

  • io.kestra.plugin.core.http.Request: This task type is used to make an HTTP request to the given URI. Check out the doc.

  • io.kestra.plugin.scripts.python.Script: This task type is used to run Python code in a task. Check out the doc.

  • io.kestra.plugin.mongodb.Find: This task type is used to fetch documents from a MongoDB database at a given URI. Check out the doc.

  • io.kestra.plugin.git.Clone: This task type is used to clone a Git repository, for example from GitHub. Check out the doc.

  • io.kestra.plugin.scripts.node.Script: This task type is used to run JavaScript on the Node.js runtime in a task. Check out the doc.

For all the plugins and task types, check out here.
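As a quick taste of one of these task types, here is a minimal sketch that runs an inline Python script and then logs a message. The flow id and task ids are hypothetical. As far as I understand, script tasks usually run inside a Docker container by default, so the Docker socket mount from the run command above matters here; check the plugin doc for your Kestra version.

```yaml
id: python_script_example    # hypothetical id
namespace: ronak.blog

tasks:
  - id: say_hello
    type: io.kestra.plugin.scripts.python.Script
    script: |
      # Plain Python; anything printed should show up in the execution logs
      print("Hello from Python inside Kestra")

  - id: confirm
    type: io.kestra.plugin.core.log.Log
    message: "Python task finished"
```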

Note: I will be writing a separate blog on various cool task types


Inputs

Just as we take inputs in programming languages to make our programs more dynamic, we can also take inputs in a Kestra workflow to make it more dynamic. The concept is simple: you take input from the user and, depending on that, execute your flow.

In Kestra, inputs is a list in which each input has an id, a type, and an optional defaults field in YAML.

Below is a simple flow with inputs:

id: flow-with-inputs
namespace: ronak.blog

inputs:
  - id: name
    type: STRING
    defaults: "Ronak"

  - id: age
    type: INT
    defaults: 20

tasks:
  - id: logging_task
    type: io.kestra.plugin.core.log.Log
    message: "My name is: {{inputs.name}} and age is {{inputs.age}}"

In the above YAML file, the inputs field contains two inputs: name (STRING) and age (INT). We access the inputs in our tasks with the syntax inputs.[input_id], wrapped in {{ }}.

We can use more complex input types as well, like SELECT and DATE. I will discuss those in depth in a separate blog.

But if you want to read more right now, you can check out here.
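Until that blog is out, here is a rough sketch of what SELECT and DATE inputs could look like. The flow id, input ids, and the list of values are invented for illustration. Note that run_date has no defaults, so you would have to provide it when starting the execution.

```yaml
id: flow_with_select_input   # hypothetical id
namespace: ronak.blog

inputs:
  - id: environment
    type: SELECT             # user picks one of the listed values
    values:
      - dev
      - staging
      - prod
    defaults: dev

  - id: run_date
    type: DATE               # no defaults, so it must be supplied at execution time

tasks:
  - id: log_choice
    type: io.kestra.plugin.core.log.Log
    message: "Running for {{ inputs.environment }} on {{ inputs.run_date }}"
```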


Outputs

As the name suggests, outputs in Kestra are used to pass values from one task or flow to another. There are various ways to pass outputs, depending on the situation or use case.

For example, say you want to fetch some data from a database and process it in Python. In this scenario you will make two tasks: one for fetching the data from the database and another for processing it in Python. But before processing, you have to pass the fetched data to the processing task, and for that you will use outputs in Kestra.

Below is a very simple example of outputs:

id: flow-with-outputs
namespace: ronak.blog

inputs:
  - id: api_url
    type: STRING
    defaults: https://example.com/

tasks:
  - id: fetch-data
    type: io.kestra.plugin.core.http.Request
    uri: "{{ inputs.api_url }}"

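  - id: log-fetched-data
    type: io.kestra.plugin.core.log.Log
    message: "Fetch data body: {{ outputs['fetch-data'].body }}"

In the above YAML file, there are two tasks: fetch-data and log-fetched-data. We access the outputs of fetch-data with the syntax below. Because the task id fetch-data contains a hyphen, it has to be written in bracket notation, outputs['fetch-data'].body, so that the hyphen is not read as a minus sign. For ids without hyphens, the dot form works:

outputs.[task_id].[output_value]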

Note: Other task types may expose different output values, but the syntax for accessing them is the same.
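To show the same access pattern with a different task type, here is a small sketch using the core Return task, which (as I understand it) renders its format string and exposes the result as an output named value. The flow id and task ids are hypothetical, and they use underscores so the plain dot syntax works.

```yaml
id: flow_with_return_output  # hypothetical id
namespace: ronak.blog

tasks:
  - id: build_greeting
    type: io.kestra.plugin.core.debug.Return
    format: "Hello from {{ flow.id }}"   # rendered text becomes the task's output

  - id: log_greeting
    type: io.kestra.plugin.core.log.Log
    message: "{{ outputs.build_greeting.value }}"
```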


Triggers

Triggers are one of the great features of Kestra (my favorite ❤️). Triggers are used to start a flow in Kestra based on some event. Suppose you want to start a flow when an image is uploaded to your AWS S3 bucket, so you can process it and save it back to S3. In this case you can use a trigger to start your flow.

There are various kinds of triggers available in Kestra:

  • Schedule Trigger: This kind of trigger is used to schedule your flow for execution at certain intervals of time using cron expressions.

  • Webhook Trigger: This kind of trigger is used to execute a flow based on API requests from other sites or platforms. (For example, running a flow when a user clicks a button on your site’s frontend.)

  • Polling Trigger: This kind of trigger polls an external system for the presence of data and executes a flow based on that.

  • Plugin-based Triggers: Some plugins also have their own special kinds of triggers, like the S3 trigger and others.

  • Flow Trigger: This kind of trigger is used to execute a flow when another flow finishes its execution.

  • Realtime Trigger: This kind of trigger can be used when we need real-time behaviour in flow execution.

Overall, each trigger has its own strengths and use cases.

Below is an example of a simple Schedule Trigger:

id: flow_with_triggers
namespace: ronak.blog
tasks:
  - id: log_task
    type: io.kestra.plugin.core.log.Log
    message: This is my first trigger flow

triggers:
  - id: schedule_trigger
    type: io.kestra.plugin.core.trigger.Schedule
    cron: "0 8 * * *" # Every day at 8 AM

In the above YAML file, the flow will execute every day at 8 AM.
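For comparison, here is a rough sketch of a Webhook trigger. The flow id, task id, and the key are placeholders I made up; the key acts like a secret that becomes part of the webhook URL, so pick something hard to guess.

```yaml
id: flow_with_webhook        # hypothetical id
namespace: ronak.blog

tasks:
  - id: log_payload
    type: io.kestra.plugin.core.log.Log
    message: "Webhook received: {{ trigger.body }}"   # body sent by the caller

triggers:
  - id: webhook_trigger
    type: io.kestra.plugin.core.trigger.Webhook
    key: replace_with_a_secret_key                    # placeholder, choose your own
```

As far as I understand from the docs, the flow would then start on a request to a URL of the form /api/v1/executions/webhook/{namespace}/{flow_id}/{key} on your Kestra host. The exact path can differ between versions, so please confirm it in the Webhook trigger doc.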


Errors and Retries

Errors and retries are also an important feature in Kestra. We know that errors are bound to happen in any system. So Kestra provides a way to run some task when an error occurs in a flow or namespace, like sending an email or a Slack message.

You can also retry the execution of a task when it runs into an error.

Below is a simple example of an error-handling flow:

id: flow_with_error
namespace: ronak.blog

tasks:
  - id: fail
    type: io.kestra.plugin.core.execution.Fail # for forced failure

errors:
  - id: alert_on_failure
    type: io.kestra.plugin.notifications.slack.SlackIncomingWebhook
    url: "{{ secret('SLACK_WEBHOOK') }}"
    payload: |
      {
        "channel": "#alerts",
        "text": "Failure alert for flow {{ flow.namespace }}.{{ flow.id }} with ID {{ execution.id }}"
      }

In the above YAML file, the flow will fail, and after that a Slack message will be sent to the given SLACK_WEBHOOK with the respective message.

And here is a simple example of a retry flow:

id: flow_with_retry
namespace: ronak.blog

tasks:
  - id: api_request
    type: io.kestra.plugin.core.http.Request
    uri: https://example.com/products
    retry:
      type: constant # type: string
      interval: PT20S # type: Duration
      maxDuration: PT1H # type: Duration
      maxAttempt: 10 # type: int
      warningOnRetry: true # type: boolean, default is false

In the above YAML file, the flow sends a request to the given HTTP endpoint and, if it fails, retries every 20 seconds, for up to 1 hour or a maximum of 10 attempts, until the request succeeds.
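The two ideas can also be combined. Below is a sketch, under my own assumptions, where the request is retried a few times and, if it still fails, an errors task logs the failure instead of sending a Slack message. The flow id and the log_failure task are hypothetical, and maxAttempt is copied as written in the example above, so check the retry doc for the property names in your Kestra version.

```yaml
id: flow_with_retry_and_alert   # hypothetical id
namespace: ronak.blog

tasks:
  - id: api_request
    type: io.kestra.plugin.core.http.Request
    uri: https://example.com/products
    retry:
      type: constant
      interval: PT20S           # wait 20 seconds between attempts
      maxAttempt: 3             # give up after 3 attempts

errors:
  - id: log_failure             # runs only if the flow ends in failure
    type: io.kestra.plugin.core.log.Log
    level: ERROR
    message: "Flow {{ flow.id }} failed, execution {{ execution.id }}"
```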


Conclusion

So far I have loved Kestra, and the fact that it is open source excites me even more. In my future blogs I will be writing more about this tool, from concept deep dives to actual project implementations. Till then, bye 👋

I hope you liked the blog. Let me know what you think about Kestra.