How To Create a NabJet Submission

The NabJet engine indexes people-related data from web sites, allowing it to be searched from the NabJet home page.  To index a web site intelligently, NabJet needs two things: the URL of the web site and a Template definition.

The following directions will explain how to create a submission record and how to edit the template that tells NabJet how to index the web site.

Register With NabJet

The first thing you need to do is go to the My Account page and create a Username and Password.  This will create an area in the NabJet site for you to create and edit your submissions.

The email address field is not required when registering.  However, it is helpful to have an email address on file in case you lose your username and password.  We do not release email addresses to third-party companies.  Please see our Privacy Policy.

Create a Submission

Once you are logged in, click the "My Submissions" link on the left.  From here, click the "Add" button. You will be taken to a form with the following information:

Save Your Work

It is very important to know that if you enter information into the submission form and navigate away from the page, your changes will be lost.  This includes clicking on the "Secondary URLs" or "Index Results" tabs.

To make sure your data is saved, always click the "Save and Index" button.

Using One Template for Multiple URLs

Many web sites have multiple pages of information, all in the same format.  Let's say a particular web site has 5 pages of passenger records.  First, you would create a new submission and enter one of the URLs in the Web URL field.  Once you have the template text set up, click the "Save and Index" button, then review the results in the Index Results tab.

If everything looks good, go to the Secondary URLs tab.  Continuing with our example, enter the remaining 4 URLs into the "List of URLs" field.  Important:  make sure you set the Type of Date, Source Year, City, County, State and Country fields.  These URLs might contain different information than the original URL, even if it's in the same format.   For example, the first URL might be for a cemetery in one town, and the remaining 4 might be for another town.

Once you've entered the additional URLs and other information on the form, click the "Add URLs" button.  This will create new secondary submissions below the form.  You can then select these records and index them individually or all together.  To index secondary submissions, click the check box on the left for those you want to index, then click the "Index" button at the bottom of the page.  Once indexed, the page will be re-displayed, and the number of records indexed will be shown to the right of each secondary submission.  You can click this number to review the data.

Remember that secondary submissions will use the template from the original submission in the Primary URL tab.

Hint:  if you have multiple URLs to submit and each is from a different location, enter one URL at a time along with the geographic information, then click the Add URLs button.  Then repeat for each remaining URL. That will allow you to enter different geographic information for each secondary submission.

The Template Format

A Template is a set of commands that describes the contents of a particular web page.  This description is necessary for NabJet to know where the data begins and ends, and what all the different values mean.  A template is divided into four sections: [START], [END], [RECORD] and [FIELD].

Each section may have a number of commands to further explain how to index the data.

Probably the easiest way to understand a template is to look at an example.  Let's say a web site contains a section that looks like this:

The following headstones were found in Main Line Cemetery:

Smith, John, b 1832, d 12 Oct 1876
Smith, Betty, b. Nov 1835, d. 1885
Jones, Paul, 1818, 1891

For more information, contact Ed at ed@hotmail.com 

By looking at the web page, it's pretty easy to see there are cemetery records for three people.  For each person, the record contains the last name, first name, date of birth and date of death, all separated by commas.  This will be a piece of cake for NabJet to index!  Let's look at a template that could be used to index this:

Template:                       Explanation:

[START]
skippast Cemetery:              The data starts just after the word Cemetery:

[END]
skipto For more information     The data ends before the For more information line

[RECORD]
Type Separator                  These records have separator characters between them
Separator ,                     The separator character is a comma

[FIELD]
Fieldname LastName              The first part is the last name
Column 1

[FIELD]
Fieldname Firstname             The second part is the first name
Column 2

[FIELD]
Fieldname Birthyear             The third part contains birth year
Column 3

[FIELD]
Fieldname Deathyear             And the fourth part contains the death year
Column 4

That's it!  These 25 lines describe everything that NabJet needs to index these cemetery records.  That may seem like a lot of work for just three records, but what if there were 1,000 records on that page?  The template wouldn't need to change at all.
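To make the walkthrough concrete, here is a minimal Python sketch of how a separator-type template like this one could be applied to the sample page.  It only illustrates the idea and is not NabJet's actual implementation; the function name apply_template and the FIELDS dictionary are hypothetical.

```python
# Illustrative sketch only; NabJet's real indexing code is not shown in this document.
SAMPLE_PAGE = """The following headstones were found in Main Line Cemetery:

Smith, John, b 1832, d 12 Oct 1876
Smith, Betty, b. Nov 1835, d. 1885
Jones, Paul, 1818, 1891

For more information, contact Ed at ed@hotmail.com
"""

# Column number (1-based, as in the template) for each field name.
FIELDS = {"LastName": 1, "Firstname": 2, "Birthyear": 3, "Deathyear": 4}


def apply_template(page, start_after, end_before, separator, fields):
    """Return one dict per record, holding the raw text of each column."""
    # [START] skippast: keep everything after the start marker.
    _, _, rest = page.partition(start_after)
    # [END] skipto: keep everything before the end marker.
    data, _, _ = rest.partition(end_before)

    records = []
    for line in data.splitlines():
        if not line.strip():
            continue  # skip blank lines between the two markers
        columns = [part.strip() for part in line.split(separator)]
        records.append({name: columns[col - 1] for name, col in fields.items()})
    return records


records = apply_template(SAMPLE_PAGE, "Cemetery:", "For more information", ",", FIELDS)
for record in records:
    print(record)
```

The values in each record are the raw column text (for example "b 1832"), so the same four field definitions would work unchanged on a page with 1,000 records.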

It is important to note that the field names listed, like LastName, are keywords: only a fixed set of field names can be used.  The field names that NabJet currently recognizes are:

That should give you a basic understanding of what should go into a template.  What follows is a detailed explanation of each of the possible template commands for each section.

[START] section

The [START] section describes where to find the start of the data to be indexed.  This section is optional.  If it is not included in the template, the start is assumed to be the beginning of the web page.  The following commands can be used in the [START] section:

Skipto some text
This command skips from the current position to the string found after the Skipto command.  In this example, it will skip until it finds the words "some text".  Note that quotes are not necessary.  However, if you have a string that starts with spaces, you can enclose the string in double quotes.

Skippast some text
The same as the Skipto command, but sets the current position just past the text found.

Note that you can enter multiple skipto and skippast commands in the [START] and [END] sections.
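The difference between the two commands is only where the current position ends up.  The following Python sketch shows that difference; it is an illustration rather than NabJet's code, the function names are hypothetical, and leaving the position unchanged when the text is not found is an assumption.

```python
def skipto(text, pos, target):
    """Return the position where target begins, searching from pos."""
    found = text.find(target, pos)
    return pos if found == -1 else found  # assumption: stay put if not found


def skippast(text, pos, target):
    """Return the position just after target, searching from pos."""
    found = text.find(target, pos)
    return pos if found == -1 else found + len(target)


page = "Intro text. Records: Smith, John"
a = skipto(page, 0, "Records:")
b = skippast(page, 0, "Records:")
print(repr(page[a:]))  # 'Records: Smith, John'
print(repr(page[b:]))  # ' Smith, John'
```

Several commands in a row can be modelled by passing the position returned by one call as the starting position of the next.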

[END] section

The [END] section describes where to find the end of the data to be indexed.  You should use the same skipto and skippast commands as described above.  The first skipto and skippast will start from the position defined in the [START] section.

This section is optional.  If it is not included in the template, the end is assumed to be the end of the web page.

[RECORD] section

The [RECORD] section describes how each record is formatted.  It is important for NabJet to know if all the fields are in fixed columns, if they are separated by some character, or if they are in HTML tables.  The following are the commands that can be used in this section:

Type  typevalue
This command tells what type of record to index.  Replace typevalue with one of the following:

Special Note on hCard Data

If you specify the HCARD type, you do not need to include [START], [END] or [FIELD] sections.  By default, NabJet will index every hCard from a web page.  The minimum Template for an hCard would include just two lines:

[RECORD]
Type HCARD

You can still use [FIELD] definitions for HCARD types if you need to exclude some records.  For example, if you want to index only hCard records with a City, you could add the following:

[FIELD]
FieldName City
IgnoreIfBlank

NabJet will extract the following standard hCard fields: 'n' and 'fn' (name fields), 'bday' and 'adr' ('locality', 'region', 'country-name'), plus 'gender' and 'dday', which are proposed extensions to the hCard format.  The program will also extract 'county-name' from the address if it exists, even though it is a non-standard field.

Separator separatorvalue
If you use type SEPARATOR, you should specify what character separates the fields.  For example "Separator ,".  This line is optional and defaults to a comma separator.

Linebreak linebreakstring
This defines where NabJet should assume the line breaks are.  For example, "Linebreak <br>" will assume that each record (or line) starts after a <br> tag.  This is optional, and defaults to a newline character.
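As a rough illustration of the idea, the Python sketch below splits a block of data into records on a configurable line-break string.  The function name split_records is hypothetical, and dropping empty pieces is an assumption rather than documented NabJet behavior.

```python
def split_records(data, linebreak="\n"):
    """Split data into record lines on the configured line-break string."""
    # Assumption: empty pieces (for example from a trailing break) are dropped.
    return [piece.strip() for piece in data.split(linebreak) if piece.strip()]


html = "Smith, John, 1832, 1876<br>Smith, Betty, 1835, 1885<br>Jones, Paul, 1818, 1891"
for record in split_records(html, "<br>"):
    print(record)
```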

IgnoreBetween string1 string2
There may be special cases where a section of text should be ignored for every record, for example everything between the tags <H3> and </H3>.  This command removes everything between those two strings, including string1 and string2 themselves.
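The effect of this command can be sketched in Python as follows.  This is an illustration, not NabJet's implementation; the function name ignore_between is hypothetical, and leaving an unclosed span untouched is an assumption.

```python
def ignore_between(text, string1, string2):
    """Remove every span from string1 through string2, markers included."""
    result = []
    pos = 0
    while True:
        start = text.find(string1, pos)
        if start == -1:
            break
        end = text.find(string2, start + len(string1))
        if end == -1:
            break  # assumption: an unclosed span is left untouched
        result.append(text[pos:start])
        pos = end + len(string2)
    result.append(text[pos:])
    return "".join(result)


line = "<H3>Section A</H3>Smith, John, 1832, 1876"
print(ignore_between(line, "<H3>", "</H3>"))  # Smith, John, 1832, 1876
```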

[FIELD] section

The [FIELD] section differs from the other sections in that you will usually have more than one of them.  A separate [FIELD] section is required for each data field to be indexed.

Fieldname  fieldname
There is a fixed set of field names that NabJet will index.  See above for the complete list.

Note that NabJet does not store full dates.  Instead, it only stores years.  In general, that is sufficient for searching.  It also makes it easier to index since there are so many date formats out there.  Notice in our example, one of the death dates was "d 12 Oct 1876" and another was "d. 1885".  NabJet is able to pull out the 4-digit year from each of these.
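One simple way to pull a year out of free-form date text is a regular expression.  The Python sketch below shows the idea; it is not NabJet's actual rule, and treating the first standalone run of exactly four digits as the year is an assumption.

```python
import re

# Assumption: a "year" is the first standalone run of exactly four digits.
YEAR_PATTERN = re.compile(r"(?<!\d)\d{4}(?!\d)")


def extract_year(value):
    """Return the first 4-digit year found in value, or None."""
    match = YEAR_PATTERN.search(value)
    return match.group(0) if match else None


for raw in ["d 12 Oct 1876", "d. 1885", "b. Nov 1835", "1818", "b."]:
    print(repr(raw), "->", extract_year(raw))
```

With this rule the day number in "12 Oct 1876" is ignored because it is not four digits long, and a value with no year at all, such as "b.", yields no result.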

You should know there is a special field called nothing.  Its primary purpose is to provide a way to ignore records that could not be ignored any other way.  For example, let's say we have the following data:

BARTON, James McG.      b.              d. 1849
BAUM, Elizabeth         b,              d. 11-Aug-1867
BEATTY, James           b.              d. 1827
  (Cumberland Co. Militia - Rev. War)
BEATTY, Thomas          b.              d. 1830 
BEECH, Charles          b.              d. 23-Jul-1965 - 90 yrs

This is a FIXED format data file, so you would most likely pull the name fields from the first 24 characters.  But line #4 would cause a problem, with the program assuming "   (Cumberland Co. Militi" is the name field.  By using the following field definition, you can tell NabJet to skip any line where there isn't anything in the first two characters of the line:

[FIELD]
FieldName Nothing
start 1
length 2
IgnoreIfBlank
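The Python sketch below illustrates what this field definition achieves on the sample data.  It is an illustration only; the helper fixed_field is hypothetical, and it assumes that start positions are 1-based, as the "start 1" line suggests.

```python
DATA = """BARTON, James McG.      b.              d. 1849
BAUM, Elizabeth         b,              d. 11-Aug-1867
BEATTY, James           b.              d. 1827
  (Cumberland Co. Militia - Rev. War)
BEATTY, Thomas          b.              d. 1830
BEECH, Charles          b.              d. 23-Jul-1965 - 90 yrs"""


def fixed_field(line, start, length):
    """Slice a fixed-width field; start is 1-based, as in the template."""
    return line[start - 1:start - 1 + length]


kept = []
for line in DATA.splitlines():
    # FieldName Nothing / start 1 / length 2 / IgnoreIfBlank:
    # drop the whole record when the first two characters are blank.
    if not fixed_field(line, 1, 2).strip():
        continue
    # The name is taken from the first 24 characters of the line.
    kept.append(fixed_field(line, 1, 24).strip())

for name in kept:
    print(name)
```

The indented militia note is skipped because its first two characters are spaces, while the five name lines are kept.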

Column col_num
For record types of SEPARATOR or TABLE, this is the column number of the data for each record.

AfterColumn col_num
For record types of SEPARATOR, this extracts everything after the column number given to use for the field.  You would typically use this when a field separator is used in the middle of the data you want to extract.  Let's look at the following example:

Borkon, Louis Yale, 04 Jul 1895 - 20 Feb 1975, (contributed by Rich Boyer)
Borkon, Ruth Ashinsky, 18 Jan 1899 - 21 May 1990, (contributed by Rich Boyer)
Caplan, Jacob, (view 2 , 3), d. 26 Jan 1939, age 35Y, (contributed by Ellis Michaels)

For this example, we can use a separator of a comma, but notice that on the 3rd line, there are extra commas before the date range.  To get around this problem, we can use an "AfterColumn 2" command, combined with the BirthYearFromRange and DeathYearFromRange transform commands.  The following field definitions would correctly index this example:

[FIELD]
FieldName LastName
Column 1
IgnoreIfBlank

[FIELD]
FieldName FirstName
Column 2
IgnoreIfBlank

[FIELD]
FieldName BirthYear
AfterColumn 2
Transform BirthYearFromRange

[FIELD]
FieldName DeathYear
AfterColumn 2
Transform DeathYearFromRange
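To show what text the AfterColumn command hands to the transform, here is a small Python sketch.  It covers only the AfterColumn step, since the exact behavior of the two range transforms is not described here; the function name after_column is hypothetical.

```python
def after_column(line, separator, col_num):
    """Return everything after the first col_num separated columns."""
    parts = line.split(separator, col_num)  # split at most col_num times
    return parts[col_num].strip() if len(parts) > col_num else ""


lines = [
    "Borkon, Louis Yale, 04 Jul 1895 - 20 Feb 1975, (contributed by Rich Boyer)",
    "Caplan, Jacob, (view 2 , 3), d. 26 Jan 1939, age 35Y, (contributed by Ellis Michaels)",
]
for line in lines:
    print(after_column(line, ",", 2))
```

Because the remainder of the line is kept in one piece, the extra commas in the third record no longer shift the dates into a different column.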

Start position
This command is only used if the record type is FIXED.  It defines the starting position of this field.

Length position
This command is only used if the record type is FIXED.  It defines the length of the data for this field.

IgnoreIfEmpty
If the data for this field is empty, ignore the whole record and don't index it.  This is useful in cases where you may have a first name but no last name.  Put the ignoreifempty command on the Last Name field and only records with last names will be indexed.

Transform transform_option
This is a special command to transform the data for this field into something else.  The following transform options are available:

There are many cases where the Transform command is necessary.  For example, let's say you had the following data from a web page:

The following headstones were found in Main Line Cemetery:

John Smith , b 1832, d 12 Oct 1876
Betty Smith, b. Nov 1835, d. 1885
George P Jones, 1818, 1891

For more information, contact Ed at ed@hotmail.com 

Notice that the first and last names are not separated by commas.  Instead, use the transform command to pull the first word into the firstname field, and the last word into the lastname field.  The field sections might look like this:

[FIELD]
FieldName Firstname
Column 1
Transform Firstword

[FIELD]
Fieldname Lastname
Column 1
Transform Lastword

[Field]
Fieldname BirthYear
Column 2

[Field]
Fieldname DeathYear
Column 3

Notice that firstname and lastname both use column 1 for their data, but use the transform command to pull either the first word or last word.
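A minimal Python sketch of these two transforms, applied to column 1 of the sample lines, might look like this.  The function names are hypothetical, and splitting words on whitespace is an assumption about how the transforms behave.

```python
def first_word(value):
    """Return the first whitespace-separated word, or '' if there is none."""
    words = value.split()
    return words[0] if words else ""


def last_word(value):
    """Return the last whitespace-separated word, or '' if there is none."""
    words = value.split()
    return words[-1] if words else ""


for line in ["John Smith , b 1832, d 12 Oct 1876", "George P Jones, 1818, 1891"]:
    column1 = line.split(",")[0]  # both fields read the same column
    print(first_word(column1), "|", last_word(column1))
```

Under this assumption, the middle initial in "George P Jones" is simply left out of both fields.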

That's about it for now.  As more commands are made available, I'll update this documentation.