Applied Web Tools




Purchase Advertising On AWT



Upload & Sell Your Software



Upload & Sell Your eBook



Enter The AWT Control Panel		Contact Applied Web Tools



Sell Your eBook @ The PDF Store		Password Troubles? Click Here!

KnowledgeCenter

Examine Essay or Document
Hosted & Distributed Free Of Charge

Contact Editor@AppliedWebTools.net To List Your Stuff

Back To Essay & Document Listing

From Flat File To Database
Learn to operate at a professional level with Perl & MySQL

By Thomas Valentine

Viewed 12345 Times

Posted Sept. 18, 2024

The other day I was tasked to come up with a solution to our web developer’s problem. She needed some 1200 html files put into a database, stripping the HTML markup and leaving behind only text files. She stripped the HTML code with a Regular Expression I gave her. It was the text files, formerly HTML documents, that needed to be put into the database.

The content we were working with was an online language reference for the HTML, CSS, DOM and JavaScript specifications. We presented the reference to our users via a few iFrames utilizing a bunch of document.write() statements. Using iFrames with a database in this way we could eliminate 1200 HTML files from our servers.

We decided to put the HTML files into a database because handling 1200 HTML files was becoming bothersome. Placing the informative text within a database to be served using just a few scripts was a better method of delivering the information to the user. Using a database driven solution such as this also makes edits easy, with just a few more Perl scripts used to edit the information.

I decided that each line of each file should be a table row, with the contents of each file being a table. I used the name of the file for the table name, and added that name to a lookup table for easy access later on.

The beauty and flexibility of Perl is demonstrated very well in this small construction. The entire process took some 3 minutes to perform on our little server and resulted in a database that is easily referenced due to the simplicity of the database model. By nesting three foreach loops, I was able to perform the step by step operation with a minimum of fuss, cycling first through @dirlist for the directory names, then through @filelist for each file, which was then opened and read. The file data was split on the new lines and placed into the @split_text array. For each $filename[$filename_count], we added a table and populated it if the if statement evaluated to true.

This is what I came up with:

#!/usr/bin/perl

use CGI;
use DBI;
use CGI::Carp (fatalsToBrowser);

$cgi = new CGI;

### make the directory list
@dirlist = (CssBackgroundProperties,
CssBorderProperties,
CssClassificationProperties,
CssFontProperties,
CssClassificationProperties,
CssFontProperties,
CssMarginProperties,
CssPaddingProperties,
CssTextProperties,
DomCollectionsAndArrays,
DomEventHandlers,
DomMethods,
DomObjects,
DomProperties,
HtmlAttributes,
HtmlEventHandlers,
HtmlTags,
JavaScriptControlStatements,
JavaScriptGlobalFunctions,
JavaScriptObjectMethods,
JavaScriptObjectProperties,
JavaScriptObjects,
JavaScriptOperators,
JavaScriptStatements);

### connect to the database
$dbh = DBI->connect('DBI:mysql:host=mysql2.domain.com;database=databasename', username', 'password', {'RaiseError' => 1}) or die "Cannot Connect to Database";

### create the master table
$query = qq{CREATE TABLE master (
id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
tablename VARCHAR (200) NOT NULL,
category VARCHAR (200)
)};
$sth = $dbh->prepare($query);
$sth->execute();

$count = "0";
$dirlist_count = "0";
$filelist_count = "0";
$type = "text";
foreach (@dirlist) {
while ($thisfile = <$dirlist[$dirlist_count]/*.txt>) {
push @filelist, $thisfile;
}
foreach (@filelist) {
open (FILEHANDLE, "$dirlist[$dirlist_count]/$filelist[$filelist_count]");
$text = "";
$newtext = "";
while (read (FILEHANDLE, $newtext, 1)) {
$text .= $newtext;
}
close FILEHANDLE;
@split_text = split "\n", $text;
$split_text_count = "0";
foreach (@split_text) {
$split_text[$split_text_count] =~ s/\n+//g;
$this_split = $split_text[$split_text_count];
$this_split =~ s/[ ]+//g;
if ($this_split != "") {
$query = qq{CREATE TABLE $filelist[$filelist_count] (
type VARCHAR (200) NOT NULL,
text BLOB
)};
$sth = $dbh->prepare($query);
$sth->execute();

$query = qq{INSERT INTO $filelist[$filelist_count] (
type,
text
) VALUES (
'$type',
'$split_text[$split_text_count]'
)};
$sth = $dbh->prepare($query);
$sth->execute();

$query = qq{INSERT INTO master (
tablename,
category
) VALUES (
'$filelist[$filelist_count]',
'$dirlist[$dirlist[$count]'
)};
$sth = $dbh->prepare($query);
$sth->execute();

++$split_text_count;
++$count;
}
++$filelist_count;
}
++$dirlist_count;
}

### print da shtuff
print qq{Content-type: text/html\n\n};
print qq{I’m Done!<br>};
print qq{I have created $count tables};
exit;

The @dirlist array holds the names of the directories the files are placed in, and each file within the directory is glob’d for the file names that are placed in @filelist. I then I used a foreach loop to open and read each file byte by byte, the contents of which is placed in the aptly name scalar $text.

I then split $text on the new lines and added each new line of text into @split_text. I removed the new line characters and proceeded to make a copy of each item within $this_split to later test if there is no data in the array item, as each original text file had several blank lines. We split on the new line characters then removed them, remember, so each file will have empty items (file rows) in @split_text that we don’t want to see as part of the database.

I removed every space from the copied array item. If the copied array item was empty, the script would not do anything but continue on to the next item in @split_text through the use of the aforementioned foreach loop.

If $this_split != "" evaluated to true, I then created a table using the file name as the table name and populated it with rows, each row of the text file being worked on being a row in the table. Each table name was then placed in the master lookup table.


I do not employ cookies or tracking devices of any kind