Contents

ArgoWorkflow教程(八)---基于 LifecycleHook 实现流水线通知提醒

https://img.lixueduan.com/devops/argo-workflow/cover/argoworkflow-8-workflow-notify.png

本篇介绍如何使用 ArgoWorkflow 中的 ExitHandler 和 LifecycleHook 功能,可以根据流水线每一步的不同状态,执行不同操作,一般用于发送通知。

1. 概述

ArgoWorkflow 提供了 ExitHandler 和 LifecycleHook 功能,可以根据流水线每一步的不同状态,执行不同操作,一般用于发送通知。

比如当某个步骤,或者某个 Workflow 执行失败时,发送邮件通知。

在 ArgoWorkflow 不同版本中中有两种实现方式:

  • 1)v2.7 版本开始提供了 exit handler 功能,可以指定一个在流水线运行完成后执行的模板。同时这个模板中还可以使用 when 字段来做条件配置,以实现比根据当前流水线运行结果来执行不同流程。
    • 已废弃,v3.3 版本后不推荐使用
  • 2)v.3.3 版本新增 LifecycleHook,exit handler 功能则不推荐使用了,LifecycleHook 提供了更细粒度以及更多功能,exit handler 可以看做是一个简单的 LifecycleHook。

2. ExitHandler

虽然官方已经不推荐使用该功能了,但是还是简单介绍一下。

ArgoWorkflow 提供了 spec.onExit 字段,可以指定一个 template,当 workflow 执行后(不论成功或者失败)就会运行 onExit 指定的 template。

类似于 Tekton 中的 finally 字段

同时这个 template 中可以使用 when 字段来做条件配置。比如根据当前流水线运行结果来执行不同流程。

比如下面这个 Demo,完整 Workflow 内容如下:

# An exit handler is a template reference that executes at the end of the workflow
# irrespective of the success, failure, or error of the primary workflow. To specify
# an exit handler, reference the name of a template in 'spec.onExit'.
# Some common use cases of exit handlers are:
# - sending notifications of workflow status (e.g. e-mail/slack)
# - posting the pass/fail status to a webhook result (e.g. github build result)
# - cleaning up workflow artifacts
# - resubmitting or submitting another workflow
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: exit-handlers-
spec:
  entrypoint: intentional-fail
  onExit: exit-handler
  templates:
    # primary workflow template
    - name: intentional-fail
      container:
        image: alpine:latest
        command: [sh, -c]
        args: ["echo intentional failure; exit 1"]

    # exit handler related templates
    # After the completion of the entrypoint template, the status of the
    # workflow is made available in the global variable {{workflow.status}}.
    # {{workflow.status}} will be one of: Succeeded, Failed, Error
    - name: exit-handler
      steps:
        - - name: notify
            template: send-email
          - name: celebrate
            template: celebrate
            when: "{{workflow.status}} == Succeeded"
          - name: cry
            template: cry
            when: "{{workflow.status}} != Succeeded"
    - name: send-email
      container:
        image: alpine:latest
        command: [sh, -c]
        # Tip: {{workflow.failures}} is a JSON list. If you're using bash to read it, we recommend using jq to manipulate
        # it. For example:
        #
        # echo "{{workflow.failures}}" | jq -r '.[] | "Failed Step: \(.displayName)\tMessage: \(.message)"'
        #
        # Will print a list of all the failed steps and their messages. For more info look up the jq docs.
        # Note: jq is not installed by default on the "alpine:latest" image, however it can be installed with "apk add jq"
        args: ["echo send e-mail: {{workflow.name}} {{workflow.status}} {{workflow.duration}}. Failed steps {{workflow.failures}}"]
    - name: celebrate
      container:
        image: alpine:latest
        command: [sh, -c]
        args: ["echo hooray!"]
    - name: cry
      container:
        image: alpine:latest
        command: [sh, -c]
        args: ["echo boohoo!"]

首先是通过 spec.onExit 字段配置了一个 template

spec:
  entrypoint: intentional-fail
  onExit: exit-handler

这个 template 内容如下:

    - name: exit-handler
      steps:
        - - name: notify
            template: send-email
          - name: celebrate
            template: celebrate
            when: "{{workflow.status}} == Succeeded"
          - name: cry
            template: cry
            when: "{{workflow.status}} != Succeeded"

内部包含 3 个步骤,每个步骤又是一个 template:

  • 1)发送邮件,无论成功或者失败
  • 2)若成功则执行 celebrate
  • 3)若失败则执行 cry

该 Workflow 不论执行结果如何,都会发送邮件,邮件内容包含了任务的执行信息,若是执行成功则会额外打印执行成功,若是执行失败则会打印执行失败。

为了简单,这里所有操作都使用 echo 命令进行模拟

由于在主 template 中最后执行的是 exit 1 命令,因此会判断为执行失败,会发送邮件并打印失败信息,Pod 列表如下:

[root@argo-1 lifecyclehook]# k get po
NAME                                              READY   STATUS      RESTARTS        AGE
exit-handlers-44ltf                               0/2     Error       0               2m45s
exit-handlers-44ltf-cry-1621717811                0/2     Completed   0               2m15s
exit-handlers-44ltf-send-email-2605424148         0/2     Completed   0               2m15s

各个 Pod 日志

[root@argo-1 lifecyclehook]# k logs -f exit-handlers-44ltf-cry-1621717811
boohoo!
time="2024-05-25T11:34:39.472Z" level=info msg="sub-process exited" argo=true error="<nil>"
[root@argo-1 lifecyclehook]# k logs -f exit-handlers-44ltf-send-email-2605424148
send e-mail: exit-handlers-44ltf Failed 30.435347. Failed steps [{"displayName":"exit-handlers-44ltf","message":"Error (exit code 1)","templateName":"intentional-fail","phase":"Failed","podName":"exit-handlers-44ltf","finishedAt":"2024-05-25T11:34:16Z"}]
time="2024-05-25T11:34:44.424Z" level=info msg="sub-process exited" argo=true error="<nil>"
[root@argo-1 lifecyclehook]# k logs -f exit-handlers-44ltf
intentional failure
time="2024-05-25T11:34:15.856Z" level=info msg="sub-process exited" argo=true error="<nil>"
Error: exit status 1

至此,这个 exitHandler 功能就可以满足我们基本的通知需求了,比如将结果以邮件发出,或者对接外部系统 Webhook,更加复杂的需求也可以实现。

不过存在一个问题,就是 exitHandler 是 Workflow 级别的,只能整个 Workflow 执行完成才会执行 exitHandler。

如果想要更细粒度的,比如 template 级别则做不到,v3.3 中提供的 LifecycleHook 则实现了更加细粒度的通知。

3. LifecycleHook

LifecycleHook 可以看做是一个比较灵活的 exit hander,官方描述如下:

Put differently, an exit handler is like a workflow-level LifecycleHook with an expression of workflow.status == "Succeeded" or workflow.status == "Failed" or workflow.status == "Error".

LifecycleHook 有两种级别:

  • Workflow 级别
  • template 级别

Workflow 级别

Workflow 级别的 LifecycleHook 和 exitHandler 基本类似。

下面就是一个 Workflow 级别的 LifecycleHook Demo,完整 Workflow 内容如下:

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: lifecycle-hook-
spec:
  entrypoint: main
  hooks:
    exit: # Exit handler
      template: http
    running:
      expression: workflow.status == "Running"
      template: http
  templates:
    - name: main
      steps:
      - - name: step1
          template: heads
    
    - name: heads
      container:
        image: alpine:3.6
        command: [sh, -c]
        args: ["echo \"it was heads\""]
    
    - name: http
      http:
        # url: http://dummy.restapiexample.com/api/v1/employees
        url: "https://raw.githubusercontent.com/argoproj/argo-workflows/4e450e250168e6b4d51a126b784e90b11a0162bc/pkg/apis/workflow/v1alpha1/generated.swagger.json"

首先是配置 hook

spec:
  entrypoint: main
  hooks:
    exit: # Exit handler
      template: http
    running:
      expression: workflow.status == "Running"
      template: http

可以看到,原有的 onExit 被 hooks 字段替代了,同时 hooks 字段支持指定多个 hook,每个 hook 中可以通过 expression 设置不同的条件,只有满足条件时才会执行。

这里的 template 则是一个内置的 http 类型的 template

    - name: http
      http:
        # url: http://dummy.restapiexample.com/api/v1/employees
        url: "https://raw.githubusercontent.com/argoproj/argo-workflows/4e450e250168e6b4d51a126b784e90b11a0162bc/pkg/apis/workflow/v1alpha1/generated.swagger.json"

该 Workflow 的主 template 比较简单,就是使用 echo 命令打印一句话,因此会执行成功,那么 hooks 中的两个 hooks 都会执行。

两个 hook 对应的都是同一个 template,因此会执行两遍。

template 级别

template 级别的 hooks 则是提供了更细粒度的配置,比如可能用户比较关心 Workflow 中某一个步骤的状态,可以单独为该 template 设置 hook。

下面是一个template 级别的 hooks demo,Workflow 完整内容如下:

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: lifecycle-hook-tmpl-level-
spec:
  entrypoint: main
  templates:
    - name: main
      steps:
        - - name: step-1
            hooks:
              running: # Name of hook does not matter
                # Expr will not support `-` on variable name. Variable should wrap with `[]`
                expression: steps["step-1"].status == "Running"
                template: http
              success:
                expression: steps["step-1"].status == "Succeeded"
                template: http
            template: echo
        - - name: step2
            hooks:
              running:
                expression: steps.step2.status == "Running"
                template: http
              success:
                expression: steps.step2.status == "Succeeded"
                template: http
            template: echo

    - name: echo
      container:
        image: alpine:3.6
        command: [sh, -c]
        args: ["echo \"it was heads\""]

    - name: http
      http:
        # url: http://dummy.restapiexample.com/api/v1/employees
        url: "https://raw.githubusercontent.com/argoproj/argo-workflows/4e450e250168e6b4d51a126b784e90b11a0162bc/pkg/apis/workflow/v1alpha1/generated.swagger.json"

内容和 Workflow 级别的 Demo 差不多,只是 hooks 字段的位置不同

spec:
  entrypoint: main
  templates:
    - name: main
      steps:
        - - name: step-1
            hooks:
              # ...
            template: echo
        - - name: step2
            hooks:
						  # ...
            template: echo

在 spec.templates 中我们分别为不同的步骤配置了 hooks,相比与 exiHandler 则更加灵活。

如何替代 exitHandler

LifecycleHook 可以完美替代 Exit Handler,就是把 Hook 命名为 exit,虽然 hook 的命名无无关紧要,但是如果是 exit 则是会特殊处理。

官方原文如下:

You must not name a LifecycleHook exit or it becomes an exit handler; otherwise the hook name has no relevance.

这个 exit 直接是写死在代码里的,具体如下:

const (
    ExitLifecycleEvent = "exit"
)

func (lchs LifecycleHooks) GetExitHook() *LifecycleHook {
    hook, ok := lchs[ExitLifecycleEvent]
    if ok {
       return &hook
    }
    return nil
}

func (lchs LifecycleHooks) HasExitHook() bool {
    return lchs.GetExitHook() != nil
}

那么我们只需要将 LifecycleHook 命名为 exit 即可替代 exit handler,就像这样:

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: lifecycle-hook-
spec:
  entrypoint: main
  hooks:
    exit: # if named exit, it'a an Exit handler
      template: http
  templates:
    - name: main
      steps:
      - - name: step1
          template: heads
    - name: http
      http:
        # url: http://dummy.restapiexample.com/api/v1/employees
        url: "https://raw.githubusercontent.com/argoproj/argo-workflows/4e450e250168e6b4d51a126b784e90b11a0162bc/pkg/apis/workflow/v1alpha1/generated.swagger.json"

4. 常见通知模板

通知一般支持 webhook、email、slack、微信通知等方式。

在 ArgoWorkflow 中则是准备对应的模板即可。

Webhook

这应该是最通用的一种方式,收到消息后具体做什么事情,可以灵活的在 webhook 服务调整。

对于 ArgoWorkflow 模板就是执行 curl 命令即可,因此只需要一个包含 curl 工具的容器

apiVersion: argoproj.io/v1alpha1
kind: ClusterWorkflowTemplate
metadata:
  name: step-notify-webhook
spec:
  templates:
    - name: webhook
      inputs:
        parameters:
          - name: POSITIONS # 指定什么时候运行,多个以逗号隔开,例如:Pending,Running,Succeeded,Failed,Error
            value: "Succeeded,Failed,Error"
          - name: WEBHOOK_ENDPOINT
          - name: CURL_VERSION
            default: "8.4.0"

      container:
        image: curlimages/curl:{{inputs.parameters.CURL_VERSION}}
        command: [sh, -cx]
        args: [
          "curl -X POST  -H \"Content-type: application/json\" -d '{
          \"message\": \"{{workflow.name}} {{workflow.status}}\",
          \"workflow\": {
                \"name\": \"{{workflow.name}}\",
                \"namespace\": \"{{workflow.namespace}}\",
                \"uid\": \"{{workflow.uid}}\",
                \"creationTimestamp\": \"{{workflow.creationTimestamp}}\",
                \"status\": \"{{workflow.status}}\"
              }
        }'
        {{inputs.parameters.WEBHOOK_ENDPOINT}}"
        ]

Email

对于邮件方式,这里简单提供一个使用 Python 发送邮件的 Demo。

# use golangcd-lint for lint
apiVersion: argoproj.io/v1alpha1
kind: ClusterWorkflowTemplate
metadata:
  name: step-notify-email
spec:
  templates:
    - name: email
      inputs:
        parameters:
          - name: POSITIONS # 指定什么时候运行,多个以逗号隔开,例如:Pending,Running,Succeeded,Failed,Error
            value: "Succeeded,Failed,Error"
          - name: CREDENTIALS_SECRET
          - name: TO # 收件人邮箱
          - name: PYTHON_VERSION
            default: "3.8-alpine"
      script:
        image: docker.io/library/python:{{inputs.parameters.PYTHON_VERSION}}
        command: [ python ]
        env:
          - name: TO
            value: '{{inputs.parameters.TO}}'
          - name: HOST
            valueFrom:
              secretKeyRef:
                name: '{{inputs.parameters.CREDENTIALS_SECRET}}'
                key: host
          - name: PORT
            valueFrom:
              secretKeyRef:
                name: '{{inputs.parameters.CREDENTIALS_SECRET}}'
                key: port
          - name: FROM
            valueFrom:
              secretKeyRef:
                name: '{{inputs.parameters.CREDENTIALS_SECRET}}'
                key: from
          - name: USERNAME
            valueFrom:
              secretKeyRef:
                name: '{{inputs.parameters.CREDENTIALS_SECRET}}'
                key: username
          - name: PASSWORD
            valueFrom:
              secretKeyRef:
                name: '{{inputs.parameters.CREDENTIALS_SECRET}}'
                key: password
          - name: TLS
            valueFrom:
              secretKeyRef:
                name: '{{inputs.parameters.CREDENTIALS_SECRET}}'
                key: tls
        source: |
          import smtplib
          import ssl
          import os
          from email.header import Header
          from email.mime.text import MIMEText

          smtp_server = os.getenv('HOST')
          port = os.getenv('PORT')
          sender_email = os.getenv('FROM')
          receiver_emails = os.getenv('TO')
          user = os.getenv('USERNAME')
          password = os.getenv('PASSWORD')
          tls = os.getenv('TLS')

          # 邮件正文,文本格式
          # 构建邮件消息
          workflow_info = f"""\
            "workflow": {{
              "name": "{{workflow.name}}",
              "namespace": "{{workflow.namespace}}",
              "uid": "{{workflow.uid}}",
              "creationTimestamp": "{{workflow.creationTimestamp}}",
              "status": "{{workflow.status}}"
            }}
          """
          msg = MIMEText(workflow_info, 'plain', 'utf-8')
          # 邮件头信息
          msg['From'] = Header(sender_email)  # 发送者
          msg['To'] = Header(receiver_emails)  # 接收者
          subject = '{{workflow.name}} {{workflow.status}}'
          msg['Subject'] = Header(subject, 'utf-8')  # 邮件主题
          if tls == 'True':
            context = ssl.create_default_context()
            server = smtplib.SMTP_SSL(smtp_server, port, context=context)
          else:
            server = smtplib.SMTP(smtp_server, port)

          if password != '':
            server.login(user, password)

          for receiver in [item for item in receiver_emails.split(' ') if item]:
            server.sendmail(sender_email, receiver, msg.as_string())

            server.quit()          

5. 小结

本文主要分析了 Argo 中的通知触发机制,包括旧版的 exitHandler 以及新版的 LifecycleHook,并提供了几个简单的通知模板。

最后则是推荐使用更加灵活的 LifecycleHook。