Chromedp Golang scraper not executing correctly

I have created a Go module that uses chromedp to log in and download some reports from my company's CRM system. I pulled this repo onto our Ubuntu 20.04 server and created a bash file that enters the module's directory and runs the command: go run .

If I ssh into the server and run the bash file manually, it executes as expected. I also have it executed by a cron job that logs to a file in my selected directory. Under cron, only the first output of the bash script is logged, and then it seems to get stuck.

Here is an example of my bash file that is called by the cron job:

cd ~/projects/DNC_Bot/dnc-bot
go run .

Here is an example of my line in the crontab:

30 7 * * 1-5 bash ~/projects/DNC_Bot/dnc-bot > ~/cronLogs/output.log 2>&1

I expect this to run the same as when I ssh into the server and call bash {bashfile}, which runs with no problem. Guidance on this is very appreciated!

Update to show code example:

func downloadLists() {
    // Adding options to run in headless mode
    opts := append(chromedp.DefaultExecAllocatorOptions[:],
        // Change headless flag to false to see browser when executing
        chromedp.Flag("headless", true),
        // chromedp.UserDataDir(""),
    )

    allocCtx, cancel := chromedp.NewExecAllocator(context.Background(), opts...)
    defer cancel()

    // also set up a custom logger
    taskCtx, cancel := chromedp.NewContext(allocCtx, chromedp.WithLogf(log.Printf))
    defer cancel()
    // Loop through Logics sites
    for entity, url := range logicSites {
        log.Printf("Entity: %s | Logging in to: %s\n", entity, url)
        // Log in and open up cases link list
        err := chromedp.Run(taskCtx,
            // Set Download Behavior/Directory
            page.SetDownloadBehavior(page.SetDownloadBehaviorBehaviorAllow).WithDownloadPath("./downloads/"),
            chromedp.Navigate(url),
            // wait until the username field is visible (i.e., the login page is loaded)
            chromedp.WaitVisible(`#txtUsername2`, chromedp.ByID),
            // Login
            chromedp.SendKeys(`#txtUsername2`, BOT_USER, chromedp.ByID),
            chromedp.SendKeys(`#txtPassword2`, BOT_PASS, chromedp.ByID),
            chromedp.Click(`#btnLogin2`, chromedp.ByID),
            // Wait for Homepage to be viewable
            chromedp.WaitVisible(`#page-nav`, chromedp.ByID),
            // Click on Cases link
            chromedp.Click(`#page-tree > ul > li:nth-child(4)`, chromedp.ByQuery),
            chromedp.Sleep(1*time.Second),
            // Call func to download filtered views
            downloadFilters(),
        )
        if err != nil {
            log.Fatal(err)
        }
    }
}

Answer by Zeke Lu:

First of all, unless there is a special reason not to, you should build your Go app first and run the binary directly. The advantage is that your app can then run on a server that doesn't have Go installed.
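A built binary can also locate its own files without relying on whatever working directory the cron job starts in, which matters for a relative path such as the "./downloads/" in the question. Here is a minimal sketch, assuming the download folder should live next to the executable. resolveDir is a hypothetical helper, not something from the question or from chromedp. Note that under go run the executable sits in a temporary build directory, so this only makes sense for a built binary.

```go
package main

import (
	"log"
	"os"
	"path/filepath"
)

// resolveDir returns an absolute location for dir.
// A relative dir is joined onto base; an absolute dir is returned cleaned.
// (Hypothetical helper for illustration.)
func resolveDir(base, dir string) string {
	if filepath.IsAbs(dir) {
		return filepath.Clean(dir)
	}
	return filepath.Join(base, dir)
}

func main() {
	// os.Executable reports the path of the running binary.
	exe, err := os.Executable()
	if err != nil {
		log.Fatal(err)
	}

	// Assumption: downloads should be stored beside the binary.
	downloads := resolveDir(filepath.Dir(exe), "downloads")
	log.Println("download directory:", downloads)
}
```

The resulting absolute path could then be used wherever the program currently passes a relative download path.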

I have just run the following demo as a cron job, and it works. You can try this demo first to check if there is anything wrong.

Note: check your system log (most of the time, it's /var/log/syslog) to see whether any error message is logged and to make sure the cron job has actually run.
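Besides the system log, it can help to have the program itself log the environment it starts in, because cron usually runs jobs with a much smaller PATH than an interactive shell. The sketch below is only a diagnostic aid: envReport is a hypothetical helper, and the binary names passed to it ("go", "google-chrome", "chromium-browser") are guesses that may not match your install.

```go
package main

import (
	"fmt"
	"log"
	"os"
	"os/exec"
)

// envReport collects details that often differ between an interactive
// shell and cron: the working directory, PATH, HOME, and whether each
// named executable can be found on PATH. (Hypothetical helper.)
func envReport(binaries ...string) []string {
	var lines []string

	if wd, err := os.Getwd(); err != nil {
		lines = append(lines, fmt.Sprintf("cwd: error: %v", err))
	} else {
		lines = append(lines, "cwd: "+wd)
	}

	lines = append(lines, "PATH: "+os.Getenv("PATH"))
	lines = append(lines, "HOME: "+os.Getenv("HOME"))

	for _, name := range binaries {
		if p, err := exec.LookPath(name); err != nil {
			lines = append(lines, fmt.Sprintf("%s: not found on PATH", name))
		} else {
			lines = append(lines, fmt.Sprintf("%s: %s", name, p))
		}
	}
	return lines
}

func main() {
	// Assumption: these are the executables the job depends on.
	for _, line := range envReport("go", "google-chrome", "chromium-browser") {
		log.Println(line)
	}
}
```

Running this once from an ssh session and once from cron, then comparing the two logs, should show whether a missing command or a different working directory is involved.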

Files:

├── bot.bash
├── go.mod
└── main.go

bot.bash:

cd ~/temp/q74482280
go run .

go.mod:

module m

go 1.19

require github.com/chromedp/chromedp v0.8.6

require (
    github.com/chromedp/cdproto v0.0.0-20220924210414-0e3390be1777 // indirect
    github.com/chromedp/sysutil v1.0.0 // indirect
    github.com/gobwas/httphead v0.1.0 // indirect
    github.com/gobwas/pool v0.2.1 // indirect
    github.com/gobwas/ws v1.1.0 // indirect
    github.com/josharian/intern v1.0.0 // indirect
    github.com/mailru/easyjson v0.7.7 // indirect
    golang.org/x/sys v0.0.0-20220928140112-f11e5e49a4ec // indirect
)

main.go:

package main

import (
    "context"
    "log"

    "github.com/chromedp/chromedp"
)

func main() {
    log.Println("hello chromedp")
    ctx, cancel := chromedp.NewContext(context.Background(), chromedp.WithDebugf(log.Printf))
    defer cancel()

    if err := chromedp.Run(ctx, chromedp.Navigate("https://httpbin.org/status/200")); err != nil {
        log.Fatal(err)
    }
    log.Println("goodbye chromedp")
}

cron job:

40 * * * * bash ~/temp/q74482280/bot.bash > ~/temp/q74482280/log.txt 2>&1