Getting data from an updating website using Perl


I've been trying to write a Perl program that reports the water level of a river from a regularly updated website (https://www.vizugy.hu/?mapModule=OpGrafikon&AllomasVOA=73F7E310-985C-11D4-BB62-00508BA24287&mapData=Idosor), but my program can't access the website. I'm a beginner and completely stuck.

#!/usr/bin/perl -w

$url = "https://www.vizugy.hu/?mapModule=OpGrafikon&AllomasVOA=73F7E310-985C-11D4-BB62-00508BA24287&mapData=Idosor";

use LWP::Simple;

$site = get($url) or die "The webpage won't load";

if($site =~ /<strong>(\d+)<\/strong>/ig){
    $waterLevel= $1;
    }else{
    die "Can't find the water level (Vízállás (cm))";
}

if($site =~ /<strong>(\d+.\d+.\d+. \d+.:\d+)<\/strong>/){
    $date = $1;
    }else{
    die "Can't find date (Időpont)";
}


print("The water level in Komarom is $waterLevel cm (Date: $date)\n");

I'm doing this for a class and I have to use LWP. The website is in Hungarian, and so were the variable names, but I translated as much as I could.

Answer by Dave Cross (best answer)

Your code works as expected from my Linux command line. But I see exactly the same behaviour as you when using this online IDE.

The problem with LWP::Simple is that it's hard to debug what's going wrong: its get() returns undef on failure, with no status code or error message. So I've replaced the top part of your code so that it uses LWP::UserAgent instead.

#!/usr/bin/perl

# Always use these
use strict;
use warnings;

my $url = "https://www.vizugy.hu/?mapModule=OpGrafikon&AllomasVOA=73F7E310-985C-11D4-BB62-00508BA24287&mapData=Idosor";

use LWP::UserAgent;

print "Make a UA\n";
my $ua = LWP::UserAgent->new;

print "Request\n";
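# get() always returns a response object, even when the request fails,
# so the status is inspected below instead of testing the return value.
my $resp = $ua->get($url);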

print "Response\n";
print $resp->code, ': ', $resp->message, "\n";

my $site = $resp->content;

my ($waterLevel, $date);

if ($site =~ /<strong>(\d+)<\/strong>/ig) {
    $waterLevel= $1;
}else{
    die "Can't find the water level (Vízállás (cm))";
}

if ($site =~ /<strong>(\d+.\d+.\d+. \d+.:\d+)<\/strong>/) {
    $date = $1;
}else{
    die "Can't find date (Időpont)";
}

print("The water level in Komarom is $waterLevel cm (Date: $date)\n");

The response I see is:

Make a UA
Request
Response
500: Can't connect to www.vizugy.hu:443 (Temporary failure in name resolution)
Can't find the water level (Vízállás (cm)) at main.pl line 21

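The "Temporary failure in name resolution" message means the machine running your code couldn't look up the site's hostname, so it looks like your online IDE isn't set up to make outbound HTTP requests. You could contact the owners (the email address is on the front page of their site), or you could just report the problem to your lecturer.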
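
To make this kind of failure obvious before any parsing happens, the status can be checked right after the request. What follows is a minimal sketch, assuming LWP (and therefore HTTP::Response) is installed. The helper name describe_response is hypothetical, and the response is built by hand so the example runs without network access. It relies on LWP tagging errors it generates itself (DNS, connect, timeout) with a "Client-Warning: Internal response" header.

```perl
use strict;
use warnings;
use HTTP::Response;

# Hypothetical helper: summarise a response so that a network failure is
# not mistaken for a parsing failure later in the script.
sub describe_response {
    my ($resp) = @_;
    return 'OK: ' . $resp->status_line if $resp->is_success;

    # LWP marks responses it generated locally with this header.
    my $warning = $resp->header('Client-Warning') // '';
    my $origin  = $warning eq 'Internal response' ? 'local/network' : 'server';
    return "FAILED ($origin): " . $resp->status_line;
}

# ASSUMPTION: hand-built stand-in for what $ua->get($url) returns when the
# hostname cannot be resolved; the message text is illustrative.
my $resp = HTTP::Response->new(
    500,
    "Can't connect to www.vizugy.hu:443 (Temporary failure in name resolution)",
    [ 'Client-Warning' => 'Internal response' ],
);

print describe_response($resp), "\n";
```

In the real script the same helper could be called on the object returned by $ua->get($url), followed by a die when is_success is false, so the regex checks never run against an error page.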

Answer by Polar Bear

Please study the following demo code, which

  • downloads the web page
  • extracts the data from a JavaScript block
  • cleans up the extracted data for output
  • builds the hash %data
  • prints the data as a table using Perl formats (perlform)

use strict;
use warnings;
use feature 'say';

use Data::Dumper;
use LWP::UserAgent;

my $ua  = LWP::UserAgent->new;
my $url = 'https://www.vizugy.hu/?mapModule=OpGrafikon&AllomasVOA=73F7E310-985C-11D4-BB62-00508BA24287&mapData=Idosor';
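my $res = $ua->get($url);

if ($res->is_success) {
    say 'INFO: Success loading web page';
} else {
    die "Could not GET $url: " . $res->status_line;
}

# Collect each JavaScript array as name => "(...)"; the parentheses and
# quotes are stripped in the next step.
my %data = $res->decoded_content =~ /(\w+) = new Array(.*?);/g;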

$data{$_} =~ s/[()']//g for keys %data;

# Remove the <sup> and </sup> tags, then convert e.g. "2 620,00 m3/s" to "2620.00".
$data{Vizhozam} =~ s{</?sup>}{}g;
$data{Vizhozam} =~ s{(\d+) (\d{3}),(\d{2}) m3(?:/s)?}{$1$2.$3}g;
$data{Vizho}    =~ s/ \x{b0}//g;
$data{Vizho}    =~ s/(\d+),(\d+)C/$1.$2/g;

for my $key (keys %data) {
    my @array = split(',', $data{$key});
    s/^ // for @array;    # strip the space left after each comma, if any
    $data{$key} = \@array;
}

#say Dumper(\%data);

my $count = @{$data{Idopont}}-1;
my($date,$level,$flow,$temp);

$^ = "STDOUT_TOP";
$~ = "STDOUT";

for ( 0..$count ) {
    ($date,$level,$flow,$temp) = ($data{Idopont}[$_],$data{Vizallas}[$_],$data{Vizhozam}[$_],$data{Vizho}[$_]);
    write;
}

$~ = "STDOUT_BOTTOM";
write;

format STDOUT_TOP =
+-------------------+------------------+-------------------+----------------+
| Date              | Water level (cm) | Water flow (m3/s) | Water temp (C) |
+-------------------+------------------+-------------------+----------------+
.

format STDOUT =
| @<<<<<<<<<<<<<<<< |             @>>> |      @>>>>>>>>>>> |          @>>>> |
$date, $level, $flow, $temp
.

format STDOUT_BOTTOM =
+-------------------+------------------+-------------------+----------------+
.

Generated output

INFO: Success loading web page
+-------------------+------------------+-------------------+----------------+
| Date              | Water level (cm) | Water flow (m3/s) | Water temp (C) |
+-------------------+------------------+-------------------+----------------+
| 2022.01.06. 07:00 |              331 |           2620.00 |            5.8 |
| 2022.01.07. 07:00 |              334 |           2650.00 |            5.3 |
| 2022.01.08. 07:00 |              331 |           2620.00 |            4.9 |
| 2022.01.09. 07:00 |              289 |           2240.00 |            4.5 |
| 2022.01.10. 07:00 |              272 |           2100.00 |            4.4 |
| 2022.01.11. 07:00 |              260 |           2010.00 |            4.1 |
| 2022.01.12. 07:00 |              243 |           1880.00 |            3.8 |
| 2022.01.13. 07:00 |              228 |           1770.00 |            3.4 |
| 2022.01.14. 07:00 |              213 |           1660.00 |            3.3 |
| 2022.01.15. 07:00 |              196 |           1550.00 |            3.4 |
| 2022.01.16. 07:00 |              195 |           1540.00 |            3.4 |
| 2022.01.17. 07:00 |              183 |           1470.00 |            3.5 |
| 2022.01.18. 07:00 |              185 |           1480.00 |            3.1 |
| 2022.01.19. 07:00 |              173 |           1410.00 |            2.8 |
| 2022.01.23. 07:00 |              164 |           1360.00 |            2.2 |
| 2022.01.24. 07:00 |              154 |           1300.00 |            2.0 |
| 2022.01.25. 07:00 |              161 |           1340.00 |            2.1 |
| 2022.01.26. 07:00 |              173 |           1410.00 |            2.4 |
+-------------------+------------------+-------------------+----------------+
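
The extraction step in this answer can be tried without network access. What follows is a minimal sketch, assuming the page embeds its readings as JavaScript new Array('...', '...') calls, as the regex in the demo code implies. The $sample string is hand-written for illustration (its two readings are copied from the last rows of the table above), and the real markup may differ. Instead of stripping quotes and splitting on commas, it pulls out each quoted item directly, which sidesteps the decimal commas in the flow values.

```perl
use strict;
use warnings;

# ASSUMPTION: hand-written sample imitating the page's JavaScript arrays;
# the real page may format them differently.
my $sample = <<'END';
var Idopont = new Array('2022.01.25. 07:00', '2022.01.26. 07:00');
var Vizallas = new Array('161', '173');
END

# Capture each array name together with its raw argument list.
my %raw = $sample =~ /(\w+) = new Array\((.*?)\);/g;

# Turn "'a', 'b'" into ['a', 'b'] by extracting the quoted items.
my %data;
for my $name (keys %raw) {
    $data{$name} = [ $raw{$name} =~ /'([^']*)'/g ];
}

# Print the date and water level of each reading side by side.
for my $i (0 .. $#{ $data{Idopont} }) {
    printf "%s  %s cm\n", $data{Idopont}[$i], $data{Vizallas}[$i];
}
```

Once the arrays are in %data, they can be fed to the same format/write code shown in the answer.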