Sunday, November 26, 2006

Search engine in Maze

With the Directory Server, we can collect information about the files shared by all users. Running the Indexer program on the collected information builds the index. Then, by running the Index Server program, we can provide a search service. However, this service is not aimed directly at ordinary users.
Search requests from the Maze client are sent to a CGI program (ftpsearch) on a specified httpd server. The CGI program formats the search request string and forwards the formatted request to the Index Server, so it is the Index Server that actually implements the search function. After the Index Server handles the request, it sends the result back to the CGI program (ftpsearch) in XML format. The CGI program then formats the output and sends the final result to the Maze client.
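
The flow above can be sketched in a few lines of Python. This is only an illustration of the division of labour, not the real ftpsearch code: the query parameter name "q", the XML element names "results" and "file", and all function names are assumptions made up for this sketch, and the Index Server is replaced by an in-process function instead of a network service.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs


def format_request(query_string):
    """CGI side: extract and normalize the keyword (hypothetical request format)."""
    params = parse_qs(query_string)
    return params.get("q", [""])[0].strip().lower()


def index_server_search(keyword, shared_files):
    """Stand-in for the Index Server: return the matches as XML (made-up schema)."""
    root = ET.Element("results")
    for name in shared_files:
        if keyword and keyword in name.lower():
            ET.SubElement(root, "file").text = name
    return ET.tostring(root, encoding="unicode")


def format_output(xml_text):
    """CGI side: parse the XML reply and turn it into a list for the client."""
    root = ET.fromstring(xml_text)
    return [node.text for node in root.findall("file")]


def handle_search(query_string, shared_files):
    """Full round trip: client request -> CGI -> Index Server -> CGI -> client."""
    keyword = format_request(query_string)
    xml_reply = index_server_search(keyword, shared_files)
    return format_output(xml_reply)
```

Only format_request and format_output know about the client-facing format, and only index_server_search knows how matching is done, which is the separation discussed next.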

Why are the CGI program and the Index Server separated? Why not write both as a single program?
I can think of one reason:
This is in line with the current trend in web development. So far, the MVC (Model-View-Controller) framework has proved to be a good architecture. By separating view formatting from transaction handling, we can modify either part without changing the other. As a result, the system is easier to maintain.

Detailed Setup Procedure:
First, we need to run an httpd server. Apache is a good choice. The CGI program ftpsearch must be built successfully. Then we configure Apache to specify the CGI directory, the root directory and so on, and put the CGI program in the specified CGI directory. Besides that, we should also write a configuration file for the CGI program to tell it the addresses of Index Server ... The configuration file looks like:
wwwpath=/home/webdoor/wwwroot/
confpath=/home1/maze/Maze/bin/
logpath=/home1/maze/Maze/log/
NoSumUp=ON
servernum=10
server0=162.105.146.85
serverport0=23007
serverproto0=
server1=162.105.146.86
serverport1=23005
serverproto1=
server2=162.105.146.87
serverport2=23000
serverproto2=
server3=162.105.146.3
serverport3=23003
serverproto3=
server4=162.105.146.46
serverport4=23001
serverproto4=
server5=162.105.146.85
serverport5=23009
serverproto5=
server6=162.105.146.86
serverport6=23006
serverproto6=
server7=162.105.146.87
serverport7=23004
serverproto7=
server8=162.105.146.3
serverport8=23008
serverproto8=
server9=162.105.146.46
serverport9=23002
serverproto9=

server: IP address of an Index Server
serverport: service port of that Index Server (this must match the port the Index Server listens on)
serverproto: communication protocol (ftp or http)
One more file, called "xmlresul.xml", is needed by the CGI program to generate the final result. The "wwwpath" option in the configuration file above indicates where this XML file is stored.
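
To make the layout of the configuration file concrete, here is a small Python sketch that reads the key=value lines and collects the numbered server entries. It is not how ftpsearch itself parses the file (I have not shown that code here); the function names are made up for illustration, and I assume that the server entries are numbered from 0 to servernum-1, as in the example above.

```python
def parse_ftpsearch_config(text):
    """Read key=value lines into a dict; blank or malformed lines are skipped."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        options[key.strip()] = value.strip()
    return options


def list_index_servers(options):
    """Return (address, port, protocol) for server0 .. server<servernum-1>."""
    servers = []
    for i in range(int(options.get("servernum", 0))):
        address = options[f"server{i}"]
        port = int(options[f"serverport{i}"])
        proto = options.get(f"serverproto{i}", "")  # empty in the example file
        servers.append((address, port, proto))
    return servers
```

For the file shown above, list_index_servers would return ten entries, starting with ("162.105.146.85", 23007, "").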

However, I found an interesting phenomenon.
When I search for the keyword "latex", two files are returned: "latex.doc" and "latex.txt".
When I search for the keyword "latex.doc", no files are returned.
My guess is that the Index Server or the CGI program does not handle special characters (such as ".") properly.
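
One way this behaviour could arise, purely as a guess, is that file names are split into terms at "." when the index is built, while the query string is looked up as a single term. The Python sketch below reproduces the symptom under that assumption. It is a toy inverted index written for this post, not the Maze Indexer, and all function names are hypothetical.

```python
import re


def tokenize(name):
    """Split on any run of non-alphanumeric characters, so '.' separates terms."""
    return [t for t in re.split(r"[^0-9a-z]+", name.lower()) if t]


def build_index(file_names):
    """Toy inverted index: term -> set of file names containing that term."""
    index = {}
    for name in file_names:
        for token in tokenize(name):
            index.setdefault(token, set()).add(name)
    return index


def search_whole_query(index, query):
    """Suspected behaviour: the query is looked up as one term, untokenized."""
    return sorted(index.get(query.lower(), set()))


def search_tokenized(index, query):
    """Possible fix: tokenize the query the same way and intersect the hits."""
    tokens = tokenize(query)
    if not tokens:
        return []
    return sorted(set.intersection(*(index.get(t, set()) for t in tokens)))
```

With an index built from "latex.doc" and "latex.txt", search_whole_query finds both files for "latex" but nothing for "latex.doc", because "latex.doc" was never stored as a single term. search_tokenized returns "latex.doc" for the same query. Whether the real Index Server works like this would need to be checked in its source.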
