Using the sample main program to process corpora

The simplest way to use the FreeLing libraries is via the provided analyzer sample main program, which processes an input text and produces several kinds of linguistic analysis.

Since it is impossible to write a program that fits everyone's needs, the analyzer program offers almost all the functionality included in FreeLing. If you want it to output more information, use a specific output format, or combine the modules in a different way, the right path is to build your own main program or adapt one of the existing ones, as described in section 4.2.

FreeLing also provides a pair of programs, analyzer_server and analyzer_client, that perform the same task. The difference is that the server remains loaded after serving each client's request, which reduces the start-up overhead when many small files have to be processed.

To ease the invocation of the program, a script named analyze (no final r) is provided. This script locates the default configuration files, defines the library search paths, and decides whether to run the client-server or the straight version.

The sample main program is called with the command:

 analyze [channel-name] [-f config-file] [options]

If channel-name is omitted, the analyzer is started in straight mode: it loads the configuration, reads input from stdin, and writes results to stdout.
E.g.:

 analyze -f en.cfg  <myinput  >myoutput

When the input file ends, the analyzer stops, and it has to be loaded again to process a new file.

If config-file is not specified, a file named analyzer.cfg is looked for in the current working directory. If config-file is specified but not found in the current directory, it is looked for in the FreeLing installation directory (/usr/local/share/FreeLing/config if you installed from source, and /usr/share/FreeLing if you used a binary .deb package).
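The lookup order described above can be sketched as a small shell function. This is only an illustration of the rule as stated, not the code of the analyze script: the function name resolve_config and the variable FREELING_CONFIG_DIR are hypothetical, and the default directory assumes an installation from source.

```shell
# Hypothetical helper that mimics the lookup order described in the text.
# FREELING_CONFIG_DIR is an assumed variable, not something FreeLing defines.
resolve_config() {
    requested="${1:-}"
    cfg="${requested:-analyzer.cfg}"      # default name when no file is given
    dir="${FREELING_CONFIG_DIR:-/usr/local/share/FreeLing/config}"
    if [ -f "$cfg" ]; then
        # Found relative to the current working directory.
        printf '%s\n' "$cfg"
    elif [ -n "$requested" ] && [ -f "$dir/$cfg" ]; then
        # Only an explicitly given file falls back to the installation directory.
        printf '%s\n' "$dir/$cfg"
    else
        echo "config file not found: $cfg" >&2
        return 1
    fi
}
```

For example, resolve_config en.cfg prints en.cfg if that file exists in the current directory, and otherwise the path under the assumed installation directory.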

Extra options may be given on the command line to override any setting in config-file. See section 5.2.1 for details.

If channel-name is specified (any valid filename is acceptable), a server is started, and named pipes with the given name are created.
E.g.:

 analyze mychannel -f en.cfg  <myinput  >myoutput

Clients can then request analysis from the server with:

 analyzer_client mychannel  <myinput  >myoutput

When the server finishes serving one request, a new client can be started, and the server will be ready to analyze its input without having to reload the analyzers.
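As a minimal sketch of how many small files could be sent through one running server, the function below calls the client once per file, one after another. The function name process_all, the .txt and .out suffixes, and the directory layout are assumptions made for illustration; only the client invocation itself follows the form shown above.

```shell
# Hypothetical helper: process_all CLIENT CHANNEL INDIR OUTDIR
# Sends every *.txt file in INDIR through the client, strictly one at a time.
process_all() {
    client="$1"; channel="$2"; indir="$3"; outdir="$4"
    mkdir -p "$outdir"
    for f in "$indir"/*.txt; do
        [ -e "$f" ] || continue          # skip if the pattern matched nothing
        "$client" "$channel" <"$f" >"$outdir/$(basename "$f" .txt).out"
    done
}
```

With a server already started on mychannel, a call could look like process_all analyzer_client mychannel texts results. Since each client finishes before the next one starts, the requests never run at the same time.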

The server does not synchronize its clients, which means that if two clients send data to the same server at the same time, their inputs will be intermixed in the output.

The user starting the clients must ensure that the requests do not overlap.
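One possible way to avoid overlapping requests when clients are started independently is to wrap each call in a file lock. The sketch below assumes the flock utility from util-linux is installed; the function name locked_client, the lock-file location, and the ANALYZER_CLIENT override are hypothetical, and the channel name is assumed to contain no slashes.

```shell
# Hypothetical wrapper: run one client at a time per channel.
# Requires flock (util-linux). The lock-file path is an assumption.
locked_client() {
    channel="$1"
    client="${ANALYZER_CLIENT:-analyzer_client}"   # assumed override, for testing
    # flock blocks until the lock is free, then runs the client while holding it.
    flock "${TMPDIR:-/tmp}/freeling-$channel.lock" "$client" "$channel"
}
```

It would be called in the same way as the client itself, for instance locked_client mychannel <myinput >myoutput. A second call on the same channel waits until the first one has finished.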



root 2009-09-28