HTML::TokeParser and then saving to a MySQL database


Post by salsa_experience » 13 October 2010 12:31

Hi, good day,

I want to run the code below as it is on XAMPP. It parses roughly 10,000 files.
It currently runs on openSUSE Linux 11.3.

It should also run on XAMPP, though!

The only thing is that I want to insert all of the data into the MySQL database. Is that possible?



Code:

#!/usr/bin/perl
use strict;
use warnings;

use HTML::TokeParser;

my $file = 'school.html';
my $p = HTML::TokeParser->new($file) or die "Can't open $file: $!";

my %school;
while (my $tag = $p->get_tag('div', '/html')) {
        # first move to the right div that contains the information
        last if $tag->[0] eq '/html';
        next unless exists $tag->[1]{'id'} and $tag->[1]{'id'} eq 'inhalt_large';
       
        $p->get_tag('h1');
        $school{'location'} = $p->get_text('/h1');
       
        while (my $tag = $p->get_tag('div')) {
                last if exists $tag->[1]{'id'} and $tag->[1]{'id'} eq 'fusszeile';
               
                # get the school name from the heading
                next unless exists $tag->[1]{'class'} and $tag->[1]{'class'} eq 'fm_linkeSpalte';
                $p->get_tag('h2');
                $school{'name'} = $p->get_text('/h2');
               
                # verify format for school type
                $tag = $p->get_tag('span');
                unless (exists $tag->[1]{'class'} and $tag->[1]{'class'} eq 'schulart_text') {
                        warn "unexpected format: parsing stopped";
                        last;
                }
                $school{'type'} = $p->get_text('/span');
               
                # verify format for address
                $tag = $p->get_tag('p');
                unless (exists $tag->[1]{'class'} and $tag->[1]{'class'} eq 'einzel_text') {
                        warn "unexpected format: parsing stopped";
                        last;
                }
                $school{'address'} = clean_address($p->get_text('/p'));
               
                # find the description
                $tag = $p->get_tag('p');
                $school{'description'} = $p->get_text('/p');
        }
}

print qq/$school{'name'}\n/;
print qq/$school{'location'}\n/;
print qq/$school{'type'}\n/;

foreach (@{$school{'address'}}) {
        print "$_\n";
}

print qq/\nDescription: $school{'description'}\n/;

sub clean_address {
        my $text = shift;
        my @lines = split "\n", $text;
        foreach (@lines) {
                s/^\s+//;
                s/\s+$//;
        }
        return \@lines;
}




Looking forward to any tips.

Best regards, SE

Re: HTML::TokeParser and then saving to a MySQL database

Post by Nobbie » 13 October 2010 16:10

salsa_experience wrote: Is that possible?


Why not? Anything is possible, you just have to program it.
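
As a starting point for "programming it", here is a minimal sketch of the database side using Perl DBI. It assumes DBI and DBD::mysql are installed and that a table named schools exists. The table layout, the DSN in the comment, and the helper names school_to_row and save_school are assumptions made for illustration; none of them come from the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# Assumed table (hypothetical, not from the thread):
#   CREATE TABLE schools (
#       name        VARCHAR(255),
#       location    VARCHAR(255),
#       type        VARCHAR(255),
#       address     TEXT,
#       description TEXT
#   );

# Flatten a parsed %school hash into the column order of the INSERT below.
# The address array reference is joined into one string; missing values
# become empty strings.
sub school_to_row {
    my ($school) = @_;
    my $address = join ', ', @{ $school->{address} || [] };
    return map { defined($_) ? $_ : '' }
        ( @{$school}{qw(name location type)}, $address, $school->{description} );
}

# Insert one school. Placeholders (?) leave the quoting to the driver.
sub save_school {
    my ($dbh, $school) = @_;
    my $sth = $dbh->prepare(
        'INSERT INTO schools (name, location, type, address, description) '
      . 'VALUES (?, ?, ?, ?, ?)'
    );
    return $sth->execute( school_to_row($school) );
}

# Only connect when DSN, user and password are given on the command line,
# e.g.  perl save_school.pl 'dbi:mysql:database=test;host=localhost' user secret
if (@ARGV == 3) {
    my ($dsn, $user, $password) = @ARGV;
    my $dbh = DBI->connect($dsn, $user, $password, { RaiseError => 1 });
    save_school($dbh, {
        name        => 'Example School',
        location    => 'Example Town',
        type        => 'Gymnasium',
        address     => ['Main Street 1', '12345 Example Town'],
        description => 'Placeholder record for a first test.',
    });
    $dbh->disconnect;
}
```

With RaiseError set, a failed prepare or execute dies with the driver's message instead of failing silently. For around 10,000 files it would probably be worth preparing the statement once and reusing the handle, which this sketch does not do.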

Re: HTML::TokeParser and then saving to a MySQL database

Post by salsa_experience » 13 October 2010 16:43

Hi Nobbie,

I'll take a closer look at this tonight. As I said, the script already works.
Maybe I should have a look at Perl DBI.

By the way: where would you put the database functionality?
Before the output is printed, I guess?

Regards,
SE
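
On the question of where the database call could go: one possible arrangement, sketched here under assumptions, is to wrap the TokeParser loop in a subroutine that returns the school hash, and to call the storing code at the point where the script currently prints. The name process_directory and the two callbacks $parse and $store are hypothetical; the stand-in callbacks at the bottom only print and do not touch a database.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Run $parse on every .html/.htm file in $dir and pass each result to $store.
# $parse is expected to return a hash reference with at least a 'name' key;
# files that fail to parse are reported and skipped, so one bad file does
# not stop a run over thousands of files.
sub process_directory {
    my ($dir, $parse, $store) = @_;
    opendir(my $dh, $dir) or die "Can't open directory $dir: $!";
    my @names = grep { /\.html?\z/i } readdir $dh;
    closedir $dh;

    my ($stored, $skipped) = (0, 0);
    for my $name (sort @names) {
        my $school = eval { $parse->( File::Spec->catfile($dir, $name) ) };
        if (ref($school) eq 'HASH' && defined $school->{name}) {
            $store->($school);    # this call takes the place of the print block
            $stored++;
        }
        else {
            warn "skipping $name: ", ($@ || "no school name found\n");
            $skipped++;
        }
    }
    return ($stored, $skipped);
}

# Demo with stand-in callbacks: perl process_dir.pl /path/to/html/files
if (@ARGV) {
    my ($stored, $skipped) = process_directory(
        $ARGV[0],
        sub { my ($file) = @_; return { name => $file } },    # stand-in parser
        sub { my ($school) = @_; print "would store: $school->{name}\n" },
    );
    print "$stored stored, $skipped skipped\n";
}
```

File::Spec is used to build the paths so that the same loop should behave the same on Linux and on a Windows XAMPP installation.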