site-search



tkv
12-29-2007, 08:37 PM
This script doesn't use the database.

The first part of the file is the HTML to set up the page. You'll have to fill in your site logo etc. here.

The script has 2 user-defined functions:

search_dir() - This function goes thru all the files in the current directory. If it's a web page, the file name is passed to the next function, search_file(). If it's a directory, however, it's added to the current path, and this function calls itself to search that directory.

This makes search_dir() a recursive function. Don't worry, this is quite normal programming stuff. The computer keeps track of each instance of the function by putting all its data onto a stack every time it's called. When the function's finished going through a directory, it comes off the top of the stack, and the one that was running previously carries on where it left off. Stacks are a common programming device!

search_file() - This function searches through a file for the specified search term. It converts the whole file to lowercase so the match is not case sensitive.

It has to do a couple of tricks:
It keeps a copy of the file in mixed-case so we can get the title of the page and display it, capitals and all, in the search results.
If found, it displays the text following the search term. To do this it has to remove all HTML tags (formatting) from the text.
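
For what it's worth, the two tricks above can be sketched in one small function. This is only a sketch, not the script's own code: snippet_after_match() and its $limit parameter are made-up names, and it assumes PHP 5 or later for stripos(). It differs from the script in one respect: because stripos() ignores case by itself, the snippet is cut from the original mixed-case data instead of a lowercase copy.

```php
<?php
// Hypothetical helper (not part of the original script): find $term in
// $html ignoring case, and return up to $limit characters of the text
// that follows the match with HTML tags removed.
function snippet_after_match($html, $term, $limit = 200) {
    $pos = stripos($html, $term);        // case-insensitive position, or false
    if ($pos === false) {
        return false;                    // term not found in this page
    }
    $text = substr($html, $pos, $limit); // slice the mixed-case data
    $out = "";
    $in_tag = false;
    for ($c = 0; $c < strlen($text); $c++) {
        $ch = $text[$c];
        if ($ch === "<") { $in_tag = true; }        // entering a tag
        elseif ($ch === ">") { $in_tag = false; }   // leaving a tag
        elseif (!$in_tag) { $out .= $ch; }          // plain text: keep it
    }
    return $out;
}

echo snippet_after_match("<p>Hello <b>World</b> again</p>", "world");
// prints: World again
```

The loop stops at the end of the slice, so short pages don't cause reads past the end of the string.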

The current directory is stored as an array of directories: $cur_path. This is so you can add and remove directories from the list as the program traverses the directory tree.
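
To see the path array and the recursion working together without touching any real files, here's a minimal sketch. The walk() function and the $tree array are made up for illustration (arrays stand in for directories, strings for files), and unlike the script it passes the path as a parameter instead of using globals.

```php
<?php
// Hypothetical in-memory "directory tree": nested arrays are directories
// (keyed by name), plain strings are file names.
$tree = array(
    "index.php",
    "docs" => array("intro.php", "api" => array("ref.php")),
    "about.php",
);

// Walk the tree, returning the full path of every file found.
// $cur_path is the list of path components, as in the script.
function walk(array $dir, array $cur_path) {
    $found = array();
    $s_dir = implode("", $cur_path);            // e.g. "./docs/api/"
    foreach ($dir as $name => $entry) {
        if (is_array($entry)) {
            $cur_path[] = $name . "/";          // going down: add a component
            $found = array_merge($found, walk($entry, $cur_path));
            array_pop($cur_path);               // coming back up: remove it
        } else {
            $found[] = $s_dir . $entry;
        }
    }
    return $found;
}

print_r(walk($tree, array("./")));
// ./index.php, ./docs/intro.php, ./docs/api/ref.php, ./about.php
```

Each call to walk() gets its own $s_dir and loop position, which is exactly the "stack" of function instances described earlier.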

Here's the code:

The first line starts the definition for the search_dir() function.
function search_dir () {
This makes these variables - defined outside the function - available inside it.
global $cur_path, $dir_depth, $matches;
If there are already over 100 matches - don't do any more searching.
if ($matches > 100) { return; }
Create a string $s_dir containing the full current path. This is so the path is available in a form we can use.
$s_dir="";
for ($c=0; $c<=$dir_depth; $c++) { $s_dir .= $cur_path[$c]; }
Open the current directory using the $s_dir, then start a loop reading all the files in that directory...
$dhandle=opendir("$s_dir");
while (($file = readdir($dhandle)) !== false) {
Ignore the 'this' and 'parent' directory items which appear in every directory.
if (($file!=".") && ($file!="..")) {
The is_file() function returns TRUE if the file is a regular file. We create the full pathname for the file from the current directory plus the filename. (The '.' operator concatenates strings.)
if (is_file($s_dir.$file)) {
Get the last 3 characters of the filename into the variable $file_ext. Only process .php files - you may need to change this!
$file_ext = substr($file, strlen($file)-3, 3);
if (($file_ext == "php")
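Also, only process files if: They aren't this search script or any other navigation scripts. (You'll probably need to change this too!) basename() strips the path from $_SERVER["PHP_SELF"] so it can be compared with the bare file name. Call the search_file() function if the file fits the bill.
&& (strcmp ($file, basename($_SERVER["PHP_SELF"])) != 0)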
&& (strcmp ($file, "right-nav.php") != 0)
&& (strcmp ($file, "menu.php") != 0)) { search_file($s_dir.$file); }
Else, if the file's a directory, add the directory name plus '/' to the current path, and increase its depth by one. Then search the directory. (The full pathname is needed for the is_dir() test too.)
} elseif (is_dir($s_dir.$file)) {
$cur_path[++$dir_depth] = ($file."/");
search_dir();
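Once the search is complete, reduce the depth of the current directory back down by one. The function will then continue to loop thru the files in the original directory.
$dir_depth--;
Close the elseif, the if and the while loop. Then close the directory handle. End of function.
}
}
}
closedir($dhandle);
}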
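
If the chain of strcmp() tests gets long, the "should I search this file?" decision can be pulled out into its own function. This is just a sketch under a few assumptions: is_searchable() and the $skip list are made-up names, and "search.php" stands in for whatever your search script is actually called. It uses pathinfo() to get the extension, so a file that merely ends in the letters "php" with no dot is not picked up.

```php
<?php
// Hypothetical helper (not part of the original script): decide whether
// a file name should be passed to search_file().
// $skip lists the scripts to leave out; the names below are only examples.
function is_searchable($file, array $skip) {
    // pathinfo() returns "" when the name has no extension
    $ext = strtolower(pathinfo($file, PATHINFO_EXTENSION));
    return $ext === "php" && !in_array($file, $skip, true);
}

$skip = array("search.php", "right-nav.php", "menu.php");
var_dump(is_searchable("index.php", $skip));  // bool(true)
var_dump(is_searchable("menu.php", $skip));   // bool(false)
var_dump(is_searchable("notes.txt", $skip));  // bool(false)
```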
Start the definition for the search_file() function... Define the global variables we need access to.
function search_file ($file) {
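global $search_term, $results, $r_text, $r_title, $matches, $cur_path, $dir_depth;
Create a string $s_dir containing the full current path. (It isn't used again in this function, as $file already holds the full pathname, so you can leave this step out.)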
$s_dir="";
for ($c=0; $c<=$dir_depth; $c++) { $s_dir .= $cur_path[$c]; }
Open the file and read its contents into a variable $f_data.
$f_size = filesize($file); $f_handle = fopen($file, "r"); $f_data = fread($f_handle, $f_size);
Create a variable $f_dlc which is the data in lowercase.
$f_dlc = strtolower($f_data);
Create 2 variables for getting the de-tagged version of the text...
$t_text = "";
$in_tag = 0;
If the lowercase data contains the search term, strstr() will return the text from the match to the end of the file. Put this 'match text' into the variable $text. If it doesn't contain the text it'll return FALSE!
if ($text = strstr($f_dlc, $search_term)) {
Record the full pathname of the match.
$results[$matches] = $file;
Remove any HTML tags from the text. When $in_tag = 1 - we're 'in' an HTML tag - so no text is copied to the destination variable $t_text.
for ($c = 0; $c < 200; $c++) {
if (strcmp(substr($text, $c, 1), "<") == 0) { $in_tag=1; }
elseif (strcmp(substr($text, $c, 1), ">") == 0) { $in_tag=0; }
elseif ($in_tag == 0) { $t_text .= substr($text, $c, 1); }
}
Add the de-tagged text to the array of result text.
$r_text [$matches] = "...". $t_text. "...";
Get the position of the page's title text from the lowercase version of the data. Then get the actual text from the mixed-case version. Put it into the array of matched titles. $matches++ increments the number of matches found.
$t_start = strpos ($f_dlc, "<title>") + 7;
$t_end = strpos ($f_dlc, "</title>");
$r_title[$matches++] = substr($f_data, $t_start, $t_end-$t_start);
End the if block, then close the file. End of function.
}
fclose($f_handle);
}
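
One thing the title code doesn't allow for is a page with no <title> tag at all: strpos() returns FALSE and the substr() then picks up the wrong text. Here's a minimal sketch of a more defensive version. extract_title() and the "(untitled)" fallback are made up for illustration, and it assumes PHP 5 or later for stripos().

```php
<?php
// Hypothetical helper (not part of the original script): pull the page
// title out of $html, keeping its original capitalisation.
// Returns $fallback when no usable <title>...</title> pair is found.
function extract_title($html, $fallback = "(untitled)") {
    $start = stripos($html, "<title>");   // case-insensitive search
    $end   = stripos($html, "</title>");
    if ($start === false || $end === false || $end < $start) {
        return $fallback;
    }
    $start += strlen("<title>");          // skip past the opening tag
    return trim(substr($html, $start, $end - $start));
}

echo extract_title("<html><head><TITLE>My Page</TITLE></head></html>");
// prints: My Page
```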


To start the whole search running you need to initialise some variables, then call the search_dir() function on the base directory you want to search...
Initialise all the global variables.
$dir_depth=0;
$matches=0;
$cur_path = array("./");
$results = array(""); // Results = all the files that matched
$r_text = array(""); // ...the text they contained...
$r_title = array(""); // ...the titles of the pages...
Convert the search term to lowercase.
$search_term=strtolower($search_term);
As long as the search term isn't blank - start the search!
if (strcmp($search_term, "")!=0) { search_dir(); }
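
Once search_dir() returns, the three arrays hold everything needed to print the results. The post doesn't show that part, so the following is only a guess at how it might look: render_results() and the HTML layout are made up for illustration. Running the values through htmlspecialchars() stops page text or titles from breaking the results page.

```php
<?php
// Hypothetical helper (not part of the original script): turn the result
// arrays into a block of HTML, one paragraph per match.
function render_results(array $results, array $r_title, array $r_text, $matches) {
    $html = "";
    for ($i = 0; $i < $matches; $i++) {
        $html .= '<p><a href="' . htmlspecialchars($results[$i]) . '">'
               . htmlspecialchars($r_title[$i]) . "</a><br>"
               . htmlspecialchars($r_text[$i]) . "</p>\n";
    }
    return $html;
}

echo render_results(array("./index.php"), array("Home"), array("...welcome..."), 1);
// prints: <p><a href="./index.php">Home</a><br>...welcome...</p>
```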