WWW::Mechanize::Firefox runs well: some attempts to make the

Post by unleash » 01 April 2012 14:33

Hello dear Perl friends,


I have a nice script that works as an image scraper. In the first trials and tests everything went well.

Here is the list of URLs in urls.txt that I run the script against. Note that this is only a short list: I need to run it against 2500 URLs, so it would be great if the script were more robust and kept running when some URLs are unavailable or take too long to load. I think the script runs into problems when a URL is unavailable, takes too long, or blocks MozRepl and WWW::Mechanize::Firefox for too long.

Do you think this is the likely cause of the issue? If so, how can we improve the script and make it robust enough that it does not stop too soon?

Love to hear from you.

Greetings


Code:
http://www.bez-zofingen.ch
http://www.schulesins.ch
http://www.schulen-turgi.ch/pages/bezirksschule/startseite.php
http://www.schinznach-dorf.ch
http://www.schule-seengen.ch
http://www.gilgenberg.ch/schule/bez/2005-06/
http://www.rheinfelden-schulen.ch/bezirksschule/
http://www.bezmuri.ch
http://www.moehlin.ch/schulen/
http://www.schule-mewo.ch
http://www.bez-frick.ch
http://www.bezendingen.ch
http://www.bezbrugg.ch
http://www.schule-bremgarten.ch/content/view/20/37/
http://www.bez-balsthal.ch
http://www.schule-baden.ch
http://bezaarau.educanet2.ch/info/.ws_gen/index.htm
http://www.benedict-basel.ch
http://www.institut-beatenberg.ch/
http://www.schulewilchingen.ch
http://www.ksuo.ch
http://www.international-school.ch
http://www.vsgtaegerwilen.ch/
http://www.vgk.ch/
http://www.vstb.ch

I would be very happy if the script were more robust than it is now.

Of course, WWW::Mechanize::Firefox drives a real browser, so it may be somewhat unstable, perhaps more so than other screen-scraping solutions. I sometimes get errors like the ones shown further down. I also had a closer look at the WWW::Mechanize::Firefox::Troubleshooting page on search.cpan.org, with its hints and workarounds for various bugs and problems.



Code:
  #!/usr/bin/perl
  use strict;
  use warnings;

  use WWW::Mechanize::Firefox;

  my $mech = WWW::Mechanize::Firefox->new();

  open my $urls, '<', 'urls.txt' or die $!;

  while (<$urls>) {
    chomp;
    next unless /^http/i;
    print "$_\n";
    $mech->get($_);
    my $png = $mech->content_as_png;
    my $name = $_;
    $name =~ s#^http://##i;
    $name =~ s#/##g;
    $name =~ s/\s+\z//;
    $name =~ s/\A\s+//;
    $name =~ s/^www\.//;
    $name .= ".png";
    open(my $out, '>', "/home/martin/images/$name") or die $!;
    binmode $out;
    print $out $png;
    close $out;
    sleep 5;
  }
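
The file name handling in the loop can be isolated and made a little safer. The following is a minimal sketch, not the script's actual code: `url_to_filename` and `output_path` are hypothetical helpers, unsafe characters are replaced with underscores rather than deleted (a deliberate change from the loop above), and the output directory is created if it is missing, since `open` for writing fails when the directory does not exist.

```perl
use strict;
use warnings;
use File::Path qw(make_path);
use File::Spec;

# Hypothetical helper: turn a URL into a flat, filesystem-safe PNG name.
sub url_to_filename {
    my ($url) = @_;
    my $name = $url;
    $name =~ s/\A\s+|\s+\z//g;          # trim surrounding whitespace
    $name =~ s{\Ahttps?://}{}i;         # drop the scheme
    $name =~ s{\Awww\.}{}i;             # drop a leading "www."
    $name =~ s{[^A-Za-z0-9._-]+}{_}g;   # replace slashes and other unsafe characters
    $name =~ s/\A_+|_+\z//g;            # no leading or trailing underscores
    return "$name.png";
}

# Hypothetical helper: make sure the output directory exists,
# then return the full path for this URL's screenshot.
sub output_path {
    my ($dir, $url) = @_;
    make_path($dir) unless -d $dir;     # creates missing parent directories too
    return File::Spec->catfile($dir, url_to_filename($url));
}
```

With these helpers, the `open` call in the loop could take `output_path('/home/martin/images', $_)` as its file name.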

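Here are the results, including the errors where it stops ("Datei oder Verzeichnis nicht gefunden" is the German system message for "No such file or directory").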

Code:
martin@linux-wyee:~/perl> perl test_10.pl
http://www.bez-zofingen.ch
Datei oder Verzeichnis nicht gefunden at test_10.pl line 24, <$urls> line 3.
martin@linux-wyee:~/perl> perl test_10.pl
http://www.bez-zofingen.ch
http://www.schulesins.ch
http://www.schulen-turgi.ch/pages/bezirksschule/startseite.php
http://www.schinznach-dorf.ch
http://www.schule-seengen.ch
http://www.gilgenberg.ch/schule/bez/2005-06/
http://www.rheinfelden-schulen.ch/bezirksschule/
Not Found at test_10.pl line 15
martin@linux-wyee:~/perl>
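
The second run stops at line 15, the `$mech->get` call, which dies on the "Not Found" response. One possible way to keep going is to wrap the per-URL work in `eval` and record the failures. This is a minimal sketch under assumptions: `process_urls` is a hypothetical helper, and the per-URL work is passed in as a callback so the example runs without a browser.

```perl
use strict;
use warnings;

# Hypothetical helper: run $handler on every URL and keep going after failures.
# Returns two array refs: URLs that succeeded, and [url, error] pairs that failed.
sub process_urls {
    my ($urls, $handler) = @_;
    my (@ok, @failed);
    for my $url (@$urls) {
        next unless $url =~ /^http/i;   # same filter as the original loop
        # eval catches the die, so one bad URL no longer ends the whole run
        if (eval { $handler->($url); 1 }) {
            push @ok, $url;
        }
        else {
            my $err = $@ || 'unknown error';
            chomp $err;
            warn "skipping $url: $err\n";
            push @failed, [$url, $err];
        }
    }
    return (\@ok, \@failed);
}
```

In the script, the callback would contain the existing `$mech->get($url)`, `$mech->content_as_png` and file-writing steps. The list of failed URLs can be written to a file and retried in a later run.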



What do you suggest? How can we make the script more robust so that it does not stop so early?
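
For pages that take too long, a per-URL time limit is one option. The sketch below uses Perl's built-in `alarm` with a hypothetical `with_timeout` helper. It is an assumption that `alarm` can interrupt a `get` that is blocked waiting on MozRepl; that would need testing against the real browser.

```perl
use strict;
use warnings;

# Hypothetical helper: run $code, but give up after $seconds.
# Returns 1 on success; dies with "timeout\n" or with the original error otherwise.
sub with_timeout {
    my ($seconds, $code) = @_;
    my $ok = eval {
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm $seconds;     # schedule SIGALRM
        $code->();
        alarm 0;            # finished in time, cancel the alarm
        1;
    };
    my $err = $@;
    alarm 0;                # make sure no alarm is left pending after a die
    die($err || "unknown error\n") unless $ok;
    return 1;
}
```

A call such as `with_timeout(60, sub { $mech->get($url) })` placed inside an `eval` would then let the loop skip a slow URL instead of hanging on it. The 60-second limit is only an illustrative value.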
unleash
 
Operating System: OpenSuse Linux 12.1
