I'm trying to use httr2 in R to access an API behind an SSO server. I've been given a way to do it with Python, but I just can't get beyond the login page in R. This works in Python:
import requests
from bs4 import BeautifulSoup
# URLs
login_url = "https://auth.ABCDEFG.org/"
app_url = "https://myDestination.ABCDEFG.org/"
api_url = app_url + "api/"
sess = requests.session()
auth_req = sess.get(url=login_url)
# get auth form token
soup = BeautifulSoup(auth_req.text, "html.parser")
auth_form_token = soup.select_one("input#token")["value"]
# get credentials (not shown)
# post credentials plus token
auth_req = sess.post(
    url=login_url,
    data={"user": username, "password": pwd, "token": auth_form_token},
)
# initial GET to acquire csrf token
get_req = sess.get(api_url)
sess.headers["Referer"] = app_url
sess.headers["X-CSRFToken"] = get_req.cookies["csrftoken"]
With the approach above, the session authorizes, gives me the CSRF token and lets me request data from the API.
With httr2 there's no "session" (but the help suggests that's OK, as cookies are cached).
There also doesn't seem to be a way to pass the user/pwd credentials together with the auth_token. I've tried adding the token with req_auth_bearer_token, but that seems to replace the user/pwd settings rather than add to them.
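For reference, here is a minimal standard-library Python sketch of the difference between what the working script sends and what basic/bearer authentication sends. The credential and token values are placeholders, not real ones. It assumes nothing about the server beyond what the Python script shows: `requests` form-encodes a dict passed as `data=` into the body of the POST, whereas HTTP basic auth and bearer tokens both travel in the single `Authorization` header. That would be consistent with the bearer token appearing to replace the user/pwd settings, and with the response below reporting a GET rather than a POST.

```python
import base64
from urllib.parse import urlencode

# Placeholder values for illustration only (not real credentials or a real token).
username = "alice"
pwd = "s3cret"
auth_form_token = "abc123"

# What `sess.post(..., data={...})` transmits: form fields in the POST body,
# sent with Content-Type application/x-www-form-urlencoded.
form_body = urlencode({"user": username, "password": pwd, "token": auth_form_token})
print(form_body)  # user=alice&password=s3cret&token=abc123

# What basic auth and a bearer token transmit instead: an Authorization header.
basic_header = "Basic " + base64.b64encode(f"{username}:{pwd}".encode()).decode()
bearer_header = "Bearer " + auth_form_token

# Both schemes use the same header name, so a request can carry only one of them.
headers = {}
headers["Authorization"] = basic_header
headers["Authorization"] = bearer_header  # overwrites the basic auth value
print(headers)  # {'Authorization': 'Bearer abc123'}
```

In other words, the Python login is a form submission, so the R equivalent would presumably need to send the same three fields as a form-encoded POST body rather than as authentication headers.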
I've tried various versions of POST from both httr and httr2, and none seem to give me a successful login.
Here's my best attempt:
library(rvest)
library(httr2)
# URLs
login_url = "https://auth.ABCDEFG.org/"
app_url = "https://myDestination.ABCDEFG.org/"
api_url = paste0(app_url, "api/")  # R cannot concatenate strings with `+`
auth_page_data <- read_html(login_url)
# a hack, but it gets me the same token as in the python, above.
auth_form_token <- auth_page_data %>% html_elements("input") %>%
  html_attrs() %>% # get list of attributes
  .[[which(lapply(., '[', "id") =='token')]] %>% # get token line from list
  .["value"] # get token value
# get credentials (username, pwd), not shown
# login
req <- request(login_url) %>%
  req_auth_basic(username, pwd) %>%
  req_auth_bearer_token(auth_form_token)
resp <- req %>% req_perform()
# > resp
# <httr2_response>
# GET https://auth.ABCDEFG.org/
# Status: 200 OK
# Content-Type: text/html
# Body: In memory (5436 bytes)
# > resp_headers(resp)
# <httr2_headers>
# server: nginx/1.18.0
# date: Mon, 03 Jul 2023 16:38:49 GMT
# content-type: text/html
# x-xss-protection: 1; mode=block
# x-content-type-options: nosniff
# cache-control: no-cache, no-store, must-revalidate
# pragma: no-cache
# expires: 0
# access-control-allow-origin: *
# access-control-allow-credentials: true
# access-control-allow-headers: *
# access-control-allow-methods: POST,GET
# access-control-expose-headers: *
# access-control-max-age: 86400
# x-frame-options: DENY
# content-security-policy: default-src 'self';img-src 'self' data:;style-src 'self';font-src 'self';connect-src 'self';script-src 'self';form-action *;frame-ancestors 'none';
# content-encoding: gzip
That final response gives Status OK, but the headers and body suggest to me that it's just returning the login page again (i.e. the login failed).
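One way to test that suspicion, rather than judging by headers and body size, is to check whether the returned HTML still contains the login form's token input. Below is a minimal standard-library Python sketch of that check. The class name `TokenInputFinder`, the helper `find_login_token`, and the two sample HTML strings are hypothetical and made up for illustration; the only detail taken from the scripts above is that the login page has an `input` element with id `token`.

```python
from html.parser import HTMLParser


class TokenInputFinder(HTMLParser):
    """Records the value of the first <input id="token"> element seen."""

    def __init__(self):
        super().__init__()
        self.token = None

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "input" and attr_map.get("id") == "token" and self.token is None:
            self.token = attr_map.get("value")


def find_login_token(html):
    """Return the token value if the page has the login token input, else None."""
    finder = TokenInputFinder()
    finder.feed(html)
    return finder.token


# Made-up pages: one shaped like a login form, one without the token input.
login_page = '<form><input id="user"><input type="hidden" id="token" value="abc123"></form>'
other_page = "<html><body><p>Welcome</p></body></html>"

print(find_login_token(login_page))  # abc123
print(find_login_token(other_page))  # None
```

If the body returned after the login attempt still yields a token, that would suggest the server served the login form again instead of accepting the credentials.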
I've been trying to figure this out for a few days now but it's really got me stumped.
Thanks in advance for any advice.