How to deploy Airflow in a Kubernetes cluster that uses Istio


I am trying to deploy Airflow on Kubernetes with Istio. Here is my VirtualService config:

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: myapp-virtualservice
  namespace: mynamespace
spec:
  hosts:
  - "myapp.example.com"
  gateways:
  - mygateway
  http:
  - match: 
    - uri:
        prefix: /api/v1/
    route: 
    - destination: 
        host: backend-service
        port:
          number: 8000
  - match:
    - uri:
        prefix: /airflow/home/
    rewrite:
      uri: /home
    route:
    - destination:
        host: airflow-service
        port:
          number: 8080
  
  - match:
    - uri:
        prefix: /
    route:
    - destination:
        host: frontend-service
        port:
          number: 443

When I access https://myapp.example.com/airflow/home/, the request reaches the Airflow webserver in the pod, and I can see this log entry:

10.196.182.95 - - [20/Mar/2024:15:51:39 +0530] "GET /home HTTP/1.1" 302 319 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36 Edg/122.0.0.0"

Airflow then tries to redirect to the login page, sending the header Location: https://myapp.example.com/login/?next=https%3A%2F%2Fmyapp.example.com%2Fhome, but I think that page cannot be found. The browser is then redirected to https://myapp.example.com/404?next=https:%2F%2Fmyapp.example.com%2Fhome, and that's it. I cannot reach the Airflow UI at all; I get a 404 error every time.

How can I fix the redirection in this case?

Here is my airflow.cfg for webserver:

[webserver]
base_url = https://myapp.example.com/airflow/
web_server_host = 0.0.0.0
web_server_port = 8080
web_server_worker_timeout = 1200
enable_proxy_fix = True
web_server_ssl_cert = /airflow/cert/tls.crt
web_server_ssl_key =  /airflow/cert/tls.key

I tried accessing the webserver without istio:

kubectl port-forward svc/airflow-service 8080:8080

and I was able to reach the Airflow UI and the login page on localhost:8080 from my machine, so it seems that Airflow is set up correctly and something is wrong on the Istio side. Any ideas?

EDIT: This approach worked, but it's not very clean, and honestly I would prefer something that works in a proper way:

  - match:
      - uri:
          prefix: /airflow/home/
      - uri:
          prefix: /airflow/home
    rewrite:
      uri: /home
    route:
      - destination:
          host: airflow-service
          port:
            number: 8080
  - match:
      - uri:
          prefix: /login
    rewrite:
      uri: /login
    route:
      - destination:
          host: airflow-service
          port:
            number: 8080

With this setup I can go to myapp.example.com/login and log in to Airflow first. It then redirects me to myapp.example.com/home (a page that belongs to my base app, not to Airflow). But since I am now logged in, I can open myapp.example.com/airflow/home, and Airflow no longer redirects me to myapp.example.com/login, which previously caused the 404, so I am able to use Airflow.

But it would be nice to have the /airflow/login and /airflow/home redirection working correctly, starting from a proper /airflow/login page rather than /login.


3
VonC On

You have:

 Internet
    |
 [Istio Ingress]
    |
 VirtualService
    |----------------|
    |  /api/v1/      |-----> [backend-service:8000]
    |  /airflow/home/|-----> [airflow-service:8080]
    |  /             |-----> [frontend-service:443]
    |----------------|

In your case, the Istio VirtualService does not handle the redirection that Apache Airflow issues after you access /airflow/home/.
Airflow redirects unauthenticated requests to /login, so that path must also be reachable through the gateway.
Your VirtualService only matches /airflow/home/ (rewritten to /home) and has no rule for /login, so the redirect falls through to the catch-all / route and ends in a 404.
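
To make that fall-through concrete, here is a minimal Python sketch of first-match routing over the three routes from the question. It is a simplified model, not Istio code: the `Route` class and `resolve` function are hypothetical names, and prefix matching is assumed to be plain string-prefix matching, with a rewrite replacing only the matched prefix.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Route:
    prefix: str                    # uri.prefix from the match block
    destination: str               # destination.host
    rewrite: Optional[str] = None  # rewrite.uri, if the route has one


# Routes in the same order as the VirtualService in the question.
ROUTES = [
    Route("/api/v1/", "backend-service"),
    Route("/airflow/home/", "airflow-service", rewrite="/home"),
    Route("/", "frontend-service"),
]


def resolve(path: str) -> Tuple[str, str]:
    """Return (destination, upstream path) for the first matching route."""
    for route in ROUTES:
        if path.startswith(route.prefix):
            if route.rewrite is not None:
                # Only the matched prefix is swapped for the rewrite value.
                path = route.rewrite + path[len(route.prefix):]
            return route.destination, path
    raise LookupError(f"no route matches {path}")


print(resolve("/airflow/home/"))  # ('airflow-service', '/home')
print(resolve("/login/"))         # ('frontend-service', '/login/')
```

In this model the redirect target /login/ never reaches airflow-service, which is consistent with the 404 page coming from the frontend application.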

So you would need to adjust your Istio VirtualService configuration to include a rule for handling /login or more broadly, any path under /airflow/ to be correctly forwarded to the airflow-service.

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: myapp-virtualservice
  namespace: mynamespace
spec:
  hosts:
  - "myapp.example.com"
  gateways:
  - mygateway
  http:
  - match: 
    - uri:
        prefix: /api/v1/
    route: 
    - destination: 
        host: backend-service
        port:
          number: 8000
  - match:
    - uri:
        prefix: /airflow/
    rewrite:
      uri: "/"
    route:
    - destination:
        host: airflow-service
        port:
          number: 8080
  - match:
    - uri:
        prefix: /
    route:
    - destination:
        host: frontend-service
        port:
          number: 443

That way, any request starting with /airflow/ (including /airflow/home/, /airflow/login/, etc.) has its /airflow/ prefix replaced with / and is routed to the airflow-service, so /airflow/login reaches Airflow as /login. That should make sure the login page Airflow redirects to is reachable through Istio and served by the Airflow service.


It didn't work. The issue is that when accessing /airflow/home this way, it will redirect to the /login page. But the /login page will not have the airflow prefix when being redirected, so it will end up on a 404 page.

To address this, you might need to implement a workaround since the Airflow redirection strips the desired path prefix. One strategy involves using an Istio EnvoyFilter to manipulate HTTP redirection headers, adding the necessary prefix to the redirection path. But that might be complex.
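
As an illustration of the transformation such a filter would need to apply, here is a small Python sketch. It is not EnvoyFilter code and not part of Istio or Airflow; `prefix_location` is a hypothetical helper, and the /airflow prefix is assumed from the question. It only shows how a Location header value could be rewritten so that its path carries the prefix.

```python
from urllib.parse import urlsplit, urlunsplit

PREFIX = "/airflow"  # external path prefix, assumed from the question


def prefix_location(location: str, prefix: str = PREFIX) -> str:
    """Prepend the prefix to the path of a Location header value."""
    parts = urlsplit(location)
    path = parts.path
    # Leave URLs alone if they already point under the prefix.
    if path == prefix or path.startswith(prefix + "/"):
        return location
    return urlunsplit(parts._replace(path=prefix + path))


original = "https://myapp.example.com/login/?next=https%3A%2F%2Fmyapp.example.com%2Fhome"
print(prefix_location(original))
# https://myapp.example.com/airflow/login/?next=https%3A%2F%2Fmyapp.example.com%2Fhome
```

Note that this sketch leaves the next query parameter untouched, so the URL used after login would still lack the prefix; a real filter would have to handle that as well, which is part of why this route is more complex.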

A simpler, more maintainable approach would be to make sure all Airflow-related paths explicitly include the /airflow prefix. That involves adjusting the Airflow configuration and possibly using additional rewrite rules in Istio to handle edge cases like this redirection issue.

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: myapp-virtualservice
  namespace: mynamespace
spec:
  hosts:
  - "myapp.example.com"
  gateways:
  - mygateway
  http:
  - match: 
    - uri:
        prefix: /api/v1/
    route: 
    - destination: 
        host: backend-service
        port:
          number: 8000
  - match:
    - uri:
        prefix: /airflow/login
    rewrite:
      uri: "/login"
    route:
    - destination:
        host: airflow-service
        port:
          number: 8080
  - match:
    - uri:
        prefix: /airflow/
    rewrite:
      uri: "/"
    route:
    - destination:
        host: airflow-service
        port:
          number: 8080
  - match:
    - uri:
        prefix: /
    route:
    - destination:
        host: frontend-service
        port:
          number: 443

That will handle the /airflow/login redirection by rewriting and directing it properly to the Airflow service. However, this solution assumes the redirection to /login can be intercepted and correctly rewritten by Istio, which may not always work depending on how Airflow constructs its redirection responses.

Ultimately, the most reliable solution may involve modifying the Airflow application to make sure it generates redirection URLs that include the necessary path prefix. That could be achieved by making sure that the base_url configuration in Airflow correctly reflects the desired external path, or by customizing the Airflow login process to generate the correct redirection URLs.


So I can reach /login and /home separately in my new setup but the automatic redirection from login to home is not working.

Instead of broadly rewriting paths, the Istio configuration should ideally preserve the original path structure as much as possible, facilitating the interaction with Airflow's expected routing logic.

Make sure the Istio VirtualService configuration matches and routes specific paths without overly broad rewrites. That can help maintain the integrity of Airflow's expected URL structure, especially for internal redirections.

Since automatic redirection from /login to /home post-login is not working as intended, consider defining explicit routes for common Airflow paths, making sure they are handled without unintended rewrites.

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: myapp-virtualservice
  namespace: mynamespace
spec:
  hosts:
  - "myapp.example.com"
  gateways:
  - mygateway
  http:
  - match:
    - uri:
        prefix: /api/v1/
    route:
    - destination:
        host: backend-service
        port:
          number: 8000
  - match:
    - uri:
        exact: /airflow/login
    - uri:
        exact: /airflow/home
    - uri:
        prefix: /airflow/
    route:
    - destination:
        host: airflow-service
        port:
          number: 8080
  - match:
    - uri:
        prefix: /
    route:
    - destination:
        host: frontend-service
        port:
          number: 443

That configuration now includes exact matches for /airflow/login and /airflow/home, aiming to make sure these specific paths are handled accurately.
A prefix match for /airflow/ makes sure other Airflow-related requests are routed appropriately without unintended path rewrites.

0
Mohamad Hashemian On

It looks like the issue might be related to the way Istio is handling the redirection of URLs in the VirtualService configuration. One thing you can try is to explicitly set the path prefix in the rewrite field for the /login path to /airflow/login instead of just /login. This way, when the redirection happens, it will go to the correct path within your airflow application.

Here is an updated version of your VirtualService configuration with this change:

  - match:
    - uri:
        prefix: /login
    rewrite:
      uri: /airflow/login
    route:
    - destination:
        host: airflow-service
        port:
          number: 8080

With this change, the redirection should now point to /airflow/login instead of just /login, which should resolve the 404 error you are facing.

Additionally, you may also want to check if the base URL in your airflow.cfg file (base_url = https://myapp.example.com/airflow/) matches the path prefixes in your VirtualService configuration to ensure consistency.
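
One way to run that consistency check is sketched below in Python. It is a rough illustration under stated assumptions: the config text is inlined from the question rather than read from a real airflow.cfg, and `base_path` and `covered` are hypothetical helper names, not Airflow or Istio APIs.

```python
from configparser import ConfigParser
from urllib.parse import urlsplit

# [webserver] section inlined from the question; a real check would read airflow.cfg.
AIRFLOW_CFG = """
[webserver]
base_url = https://myapp.example.com/airflow/
"""


def base_path(cfg_text: str) -> str:
    """Extract the path component of [webserver] base_url."""
    parser = ConfigParser(interpolation=None)  # airflow.cfg values may contain '%'
    parser.read_string(cfg_text)
    return urlsplit(parser.get("webserver", "base_url")).path


def covered(route_prefix: str, cfg_text: str = AIRFLOW_CFG) -> bool:
    """True if a VirtualService prefix lies under the base_url path."""
    base = base_path(cfg_text).rstrip("/")
    return route_prefix == base or route_prefix.startswith(base + "/")


print(base_path(AIRFLOW_CFG))  # /airflow/
print(covered("/airflow/"))    # True
print(covered("/login"))       # False
```

Here the /login prefix is not under the /airflow path from base_url, which is the mismatch the paragraph above suggests looking for.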

I hope this helps resolve the redirection issue and allows you to access the airflow UI correctly.