DocOrigin Filter Editor provides an interactive way of creating rules for extracting data


An Intro.

You have spoken with the salesperson, witnessed a demonstration of the product, and decided to install an evaluation copy of DocOrigin. You have reviewed the sample designs and previewed the sample forms and output. Now you want to design your own DocOrigin form and generate an actual document.

Using DocOrigin Design, you have laid out your form and placed text, graphics, and data fields exactly where they need to be. Now you want to preview a PDF of your new document. However, when you select the Design menu option “PDF Preview,” you realize that you do not have data to preview. Sure, you can use a Design-generated sample file, but it does not have actual company data that you can show your friends and family. It just contains X’s and 0’s. Plus, it turns out that your company’s line-of-business software does not provide the type of data file that DocOrigin requires to generate a completed form (i.e., an XML file).

What to do? Enter DocOrigin Filter Editor.

DocOrigin Filter Editor is a GUI tool that provides an interactive way of creating rules for extracting data from the textual overlay/spool/print/report files that are produced by your existing business systems software. Filter Editor extracts the data based on a set of rules that you define, and produces an XML data file that can be used in DocOrigin Merge. The XML file can also be used to meet many of your own or your business partners’ XML data requirements.

There are two ways to develop the XML file definition used in Filter Editor.

Remember the sample data that could be used to preview your form in DocOrigin Design? That sample data is in the form of a well-structured XML file, and that file can be saved during the Design preview process. When you start your Filter Editor definition, you can import that XML file, which lets you use the names and structure of the file’s nodes. This is the preferred method of generating the XML file.

The second way is to open Filter Editor and specify the overlay file to use. As you work, you identify rules and data elements and create the XML file structure “on the fly.” This is the method you might use if you are creating the XML file and extraction before the actual form design process.

Filter Editor cannot automatically detect where fields are, how wide they are, or what their names are. Therefore, it is critical that you understand the source overlay file so that you can identify sentinel locations (identifying markers or values) that you, and the script you are producing, can reference to identify sections and data.

For example, each overlay/spool file probably has a header section, and in that section there will be a value that specifies what the file represents (Invoice, Purchase Order, Acknowledgment, etc.). You could use that value as a trigger (or sentinel) to identify each point at which the document reaches the beginning of that section of the page. Knowing which section of the page you have entered lets you identify and locate all of the pertinent data in that section. Another common use of the header section is the “Page #” or “Page Number” value. You can set a rule that says whenever the location that contains “Page #” is equal to “Page 1,” a new document (a new invoice, a new acknowledgment, etc.) has begun.
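The “Page 1” idea can be illustrated with a small Python sketch. This is not Filter Editor’s rule format or engine; the function name, the column positions, and the sample lines are all hypothetical and only show how a value at a fixed location can mark the start of a new document.

```python
def find_document_starts(lines, col_start, col_end, marker="Page 1"):
    """Return the indexes of lines whose fixed column range equals the marker."""
    starts = []
    for index, line in enumerate(lines):
        # Slice the fixed column range (0-based, end-exclusive) and trim padding.
        if line[col_start:col_end].strip() == marker:
            starts.append(index)
    return starts


# Hypothetical spool lines: the page marker always begins at column 32.
sample_lines = [
    "INVOICE".ljust(32) + "Page 1",
    "Customer: Acme Ltd",
    "INVOICE".ljust(32) + "Page 2",
    "Widget      2   10.00",
    "INVOICE".ljust(32) + "Page 1",
    "Customer: Beta Inc",
]

document_starts = find_document_starts(sample_lines, 32, 40)
print(document_starts)  # [0, 4]
```

Comparing the trimmed slice for equality, rather than testing whether the line merely contains “Page 1,” keeps a value such as “Page 10” from being mistaken for the start of a new document.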

Once you have defined a sentinel, you can define any number of data extractions relative to that sentinel’s location. You can also have the script continue to look for other sentinels, which, when found, can be used to perform more sentinel-relative data extractions. You can have as many sentinels as are necessary to identify all of the data in the overlay file.

In Filter Editor, you define Rules and Actions. Each Rule is a specification of something to look for in the coming lines of the source file. An Action is usually an extraction: it reads some data from the source file and puts it in the target XML file. An Action can also enhance the XML file further. Perhaps the overlay file does not contain the data needed for a node of the XML file. In that case, an Action can take a piece of information that is in the overlay file and execute some JavaScript that reads a separate file to retrieve the desired data.

The Rules are processed from top to bottom. The first one that finds a sentinel is executed, and the extractions related to that found sentinel are completed.
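A minimal Python sketch of this “first match wins” ordering follows. It assumes, purely for illustration, that a rule is a name paired with a sentinel string and that a match is a simple substring test; Filter Editor’s own rule definitions are not shown here.

```python
def first_matching_rule(rules, line):
    """Return the name of the first rule whose sentinel appears in the line."""
    for name, sentinel in rules:  # list order is the processing order
        if sentinel in line:
            return name
    return None  # no rule found its sentinel on this line


# Hypothetical rules, listed top to bottom.
rules = [
    ("invoice_header", "INVOICE"),
    ("total_line", "TOTAL"),
    ("invoice_total", "INVOICE TOTAL"),
]

matched = first_matching_rule(rules, "INVOICE TOTAL   125.00")
print(matched)  # invoice_header
```

The third rule would also match the sample line, but it is never reached because an earlier rule matched first. In a scheme like this one, more specific rules belong above more general ones.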

Once a Rule has successfully matched a line in the source file, you can add Extraction definitions, which tell Filter Editor how to take data fields from the source file and store them in the XML file.

Extractions are not restricted to only reaching forward from the current line, nor are they bound by one-line-at-a-time inspection. Extractions specify relative line and absolute column ranges. They can reach back to before where the sentinel was found. They can reach forward as many lines as necessary to extract the data of interest. Having found a sentinel, data can be read from anywhere in the document. Doing an extraction has no effect on Filter Editor’s notion of the current line.
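To make “relative line, absolute column” concrete, here is a short Python sketch. The node names, offsets, column ranges, and sample lines are hypothetical, and the code only mimics the concept; it is not how Filter Editor itself stores or runs extractions.

```python
import xml.etree.ElementTree as ET


def extract(lines, sentinel_index, line_offset, col_start, col_end):
    """Read a field at a line relative to the sentinel and an absolute column range."""
    target = sentinel_index + line_offset
    if target < 0 or target >= len(lines):
        return ""  # the requested line falls outside the file
    return lines[target][col_start:col_end].strip()


# Hypothetical source lines; the sentinel "INVOICE" was found on line index 1.
lines = [
    "Acme Ltd".ljust(20) + "2024-01-15",
    "INVOICE".ljust(20) + "INV-0042",
    "Widget".ljust(20) + "10.00",
]
sentinel_index = 1

# (node name, relative line, column start, column end)
extractions = [
    ("CustomerName", -1, 0, 20),    # reaches back before the sentinel
    ("InvoiceNumber", 0, 20, 30),   # same line as the sentinel
    ("FirstItemPrice", 1, 20, 30),  # reaches forward one line
]

invoice = ET.Element("Invoice")
for node_name, offset, start, end in extractions:
    ET.SubElement(invoice, node_name).text = extract(
        lines, sentinel_index, offset, start, end
    )

xml_text = ET.tostring(invoice, encoding="unicode")
print(xml_text)
```

Note that sentinel_index never changes while the extractions run, which mirrors the point that an extraction does not move the current line.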

In the next and subsequent editions of our blog, we will build a complete data extraction and generate the XML file. The Filter Editor definition will grow from relatively simple to quite complex. For more information about Filter Editor and DocOrigin, contact Eclipse Corporation at +1.678.408.1245, or visit the website at www.EclipseCorp.US.