An Overview of the Graphical Search Engine
For most applications the BibleWorks Command Line interface provides all the search capabilities you will need. It does basic Boolean searches with ease and speed. But there are practical limits to how complex the searches can get when entered on a Command Line. The need was felt for a more powerful search engine that would permit the user to construct very complex searches using a graphical user interface, and so the Graphical Search Engine (GSE) was created.
Using the Graphical Search Engine (GSE) is quite simple. You construct word boxes that you can click and drag around the screen to arrange them in a logical order. You then draw lines to connect these word boxes to merge boxes which correspond roughly to the familiar AND, OR and NOT operators. To specify ordering and proximity you simply draw lines between the word boxes. You specify agreement by connecting the word boxes with agreement boxes. There are very few limitations on the complexity of the constructions you can put together with this interface. You can save queries for later use, e-mail them to other people, and even plug the results back into new queries. The GSE also supports punctuation-delimited searches, case sensitivity, and multiple version searches.
Users should keep in mind that the GSE is designed for complex queries. It would not make sense to use it to look up single words or phrases unless you need certain capabilities (like case sensitivity) that the Command Line search engine does not support. For that reason we recommend you familiarize yourself with Command Line operations before digging into the complexities of the GSE. This has the added advantage that any search you can type on the Command Line can be transferred and properly formatted for the Graphical Search Engine. The best way to begin learning about the GSE is to do some familiar Command Line searches and transfer them to the GSE interface (simply by opening the GSE with the search still on the Search Window Command Line).
Consider, for example, a simple phrase search for the phrase "in the beginning" in the NAS version. The search on the Command Line would look like the picture on the right.
If you enter this search on the Command Line and open the GSE, you will see something like this. BibleWorks will reformat the command and copy it to the GSE window.
For a case like this it does not make sense to use the GSE. But when the searches start to get more complex the Command Line quickly runs out of steam. The GSE on the other hand has very few significant limits on the complexity of the searches that you can perform.
The following section describes the different components of a GSE query. It is an overview and is intended only to explain basic concepts of the GSE. There are four main elements in a GSE query: word boxes, merge boxes, agreement boxes, and ordering boxes. These boxes are combined to build queries that would be otherwise impossible to express on a single, linear Command Line.
§ Word Boxes
The word box represents a word, a set of words, a wildcard, a set of wildcards, or a list of references. It is equivalent to the individual words that you type on the Command Line but can represent a much wider class of objects. For example, a single word box could represent one of the following:
A particular word like "love"
Any of a group of words matching a wildcard specification like "lov*"
Any word in one list of words but not in another list
Any word in a specific range or list of verses
Any arbitrary list of words, such as a list of synonyms
A GSE query can contain many word boxes. The relationships between the word boxes are defined by different connections which the user makes between word boxes by inserting various objects and connecting them to word boxes with lines.
There are three basic kinds of connections that can be made between word boxes:
§ Merge Boxes
One element to which word boxes connect is the merge box. A merge box represents a Boolean operation (such as AND or OR). When word boxes are connected to a merge box, the verses represented by the individual word boxes are combined using the operator specified in the merge box. If the merge box is set to AND, then the results of the underlying word boxes will be AND-ed together. If the merge box is set to OR, then the results of the underlying word boxes will be OR-ed together. In addition, multiple merge boxes may be combined and connected to each other, specifying very complex Boolean conditions. For example, you could AND several word boxes together, then OR several other word boxes together, and then AND the results of both operations together to form a sort of search tree.
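The search tree described above can be pictured with a small Python sketch. This is only an illustration of the idea, not how BibleWorks implements it: the `WordBox` and `MergeBox` classes, the `evaluate` function, and the verse references are all hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import List, Set, Union


@dataclass
class WordBox:
    """Leaf node: the set of verse references in which the box's word occurs."""
    verses: Set[str]


@dataclass
class MergeBox:
    """Inner node: combines the results of its inputs with AND or OR."""
    op: str
    inputs: List[Union[WordBox, "MergeBox"]]


def evaluate(node: Union[WordBox, MergeBox]) -> Set[str]:
    """Return the verse set produced by a word box or a tree of merge boxes."""
    if isinstance(node, WordBox):
        return set(node.verses)
    results = [evaluate(child) for child in node.inputs]
    if not results:
        return set()
    if node.op == "AND":
        return set.intersection(*results)
    if node.op == "OR":
        return set.union(*results)
    raise ValueError(f"unsupported operator: {node.op}")


# Toy verse references (hypothetical); a real word box is filled by a search.
love = WordBox({"v1", "v2", "v3"})
faith = WordBox({"v2", "v3"})
hope = WordBox({"v4"})
joy = WordBox({"v3", "v5"})

# AND two boxes, OR two others, then AND the two results together.
tree = MergeBox("AND", [MergeBox("AND", [love, faith]),
                        MergeBox("OR", [hope, joy])])
print(sorted(evaluate(tree)))  # prints ['v3']
```

Because a merge box can itself be an input to another merge box, a single recursive function is enough to evaluate a tree of any depth.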
§ Agreement Boxes
Word boxes may also be connected to agreement boxes. A given agreement box can be set to require agreement in gender, case, number, etc. All word boxes connected to an agreement box will be required to match in the selected agreement conditions. Multiple agreement boxes, each with different agreement conditions, may be used, and a given word box can connect to multiple agreement boxes. This allows very complex agreement conditions to be specified. For example, you could specify that a given word box must agree in gender, case, and number with two other word boxes, but that it must also agree in lemma and part of speech with some other set of word boxes.
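As a rough illustration of what an agreement box checks, the following Python sketch treats each matched word as a dictionary of morphological features. The feature names, the sample values, and the `agrees` helper are assumptions made up for this example; they do not reflect the GSE's internal data format.

```python
from typing import Dict, Iterable, List

Word = Dict[str, str]  # hypothetical representation: feature name -> value


def agrees(words: List[Word], features: Iterable[str]) -> bool:
    """True if every word has the same value for each listed feature."""
    return all(len({w[f] for w in words}) <= 1 for f in features)


# Toy words with made-up feature values.
article = {"pos": "article", "gender": "m", "case": "nom", "number": "sg"}
noun = {"pos": "noun", "gender": "m", "case": "nom", "number": "sg"}
other = {"pos": "noun", "gender": "f", "case": "gen", "number": "sg"}

# One agreement box for gender, case, and number.
print(agrees([article, noun], ["gender", "case", "number"]))
# A second agreement box with a different condition on another set of words.
print(agrees([noun, other], ["pos", "number"]))
print(agrees([noun, other], ["case"]))
```

A word box connected to several agreement boxes simply has to pass every one of these checks.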
§ Ordering Boxes
Finally, word boxes may be connected to each other using ordering boxes. When you draw an ordering connection between two word boxes, an ordering box is automatically inserted between the two word boxes. Ordering boxes are used to specify ordering requirements between two boxes. You can specify that a given word must be immediately before or after another word, and you can also specify the space that must occur between two words. Word boxes can have multiple ordering boxes connected to multiple word boxes, so complex phrases with multiple endings can be constructed using ordering boxes. For example, you could build a query that finds all passages where "[noun1] [verb1] ...[article1] [noun2] [verb2]" or "[noun1] [verb1] ...[verb2] [noun2]" occur (along with arbitrary agreement conditions set between the various word boxes).
If you have a search in mind for the GSE but cannot figure out how to build a query to express it, walk through the following steps:
§ Step 1: Build your word boxes and merge boxes
Figure out the various words, lemmas, or morphologies that should appear in the results. Build a word box for each one and connect them appropriately to merge boxes. At this point, you can run the query and see the initial list of verses that the GSE will use for the verse test phase.
§ Step 2: Add range filter word boxes or agreement condition word boxes
If your query involves range filter word boxes (e.g. All words before/after another word must meet a certain property) or agreement condition word boxes (e.g. If a certain word pattern occurs before/after another word, then apply a certain agreement condition), this is the time to add them to the query window. Don't forget that these types of word boxes aren't connected to merge boxes.
§ Step 3: Add ordering connections
Next, draw ordering connections between all words that have order relationships. This is also the step where you can set up punctuation filters and specify the distance between words. At this point, you can run the query and see the list of matching verses -- the list should contain all matching verses (without agreement tests).
§ Step 4: Add agreement boxes
In this step you add agreement boxes to the query and connect word boxes to the agreement boxes.
§ Step 5: Set options
Finally, if you want to cross verse boundaries, specify search limits, search qere/kethib, etc. set these options in the | Query | Properties | window.
Sometimes you will build a query and get results that don't seem right. Make sure you understand the query processing phases outlined in the previous section (see Query Processing Phases). If you don't understand the order in which the GSE performs each test, your queries will not run the way you expect them to.
If this doesn't help, incrementally re-build the query, one box at a time, testing each step by running the query and checking the results. First build the query using only the merge boxes and word boxes. If the query output looks right, add the agreement condition word boxes and range filter word boxes and ordering connections, one at a time, running the query after each addition. If the query output looks right, add the next box or condition.
§ How to open a GSE window using the keyboard or menus:
From the Command Line or Search Window, you can open a GSE in three ways: from the button to the right of the Command Line, from the main menu, or from the GSE Button on the Main Window Button Bar.
§ How to select, move, connect, and delete multiple objects:
You can select more than one box in the GSE window at a time. In selection mode, drag a rectangle around all boxes to select. Boxes can be added or removed from a selection by holding down Ctrl and clicking on them. When multiple objects are selected, you can move them by clicking and dragging any of the selected boxes. Likewise, selecting | Edit | Delete | from the menu will delete all selected boxes.
When multiple word boxes are selected, you can switch to connect mode or ordering mode and drag connections from all of the selected boxes to another box. However, if you need to specify a primary word box for an agreement condition, you should connect the word boxes to the agreement box individually.
§ How to disconnect boxes:
When you want to disconnect two or more boxes, simply select the boxes to be disconnected (must be at least two) and choose | Edit | Disconnect | from the menu. To select multiple boxes hold down the <Ctrl> key while you select them.
§ How to build queries more quickly:
The easiest way to build a GSE query is to start on the Command Line and export a query to a GSE window. That way, most of your word boxes and initial connections to merge boxes are already set up. From there, you can manually finish up the construction.
§ How to search on punctuation:
If you want to specify that a certain set of punctuation marks must appear or must not appear between two words, follow these steps:
1. In the Query Properties window, find the language group for the version you are using in the search. Type the punctuation marks that you want to use into the window for that language group.
2. Between all words where you want to require or eliminate punctuation, connect them with an ordering connection and check Require or None.
A Useful Tip: To search for an ordered string of words terminated by a period, add a "*" word box to the end of the query, set the punctuation group to be '.' only, turn on the punctuation flag in the ordering box before the "*", and turn on "Cross Verse Boundaries."
§ How to tell which object a window is connected to:
If you open the windows for several merge boxes or word boxes, you may forget which window belongs to which box. To find the box to which a window belongs, click on the window and the box will be highlighted.
§ How to decide between verse proximity and word proximity:
Verse proximity is best used in queries without ordering and agreement (Boolean operations only). Word proximity is better used in queries involving ordering or agreement.
§ How to use the status bar toggles:
The status bar at the bottom of a GSE window displays the state of several query options. The options can be toggled simply by double-clicking on the option in the status bar.
§ How to do topical searches using Louw-Nida Domain lists
If you have a Greek Morphological database (GNM, BNM, BGM, etc.) as your version for a word box, when you select Inclusion/exclusion list and select More>> | Add Louw-Nida Domain in the Word Box window, a window will open that allows you to easily add Louw-Nida Domain Word Lists to the GSE inclusion list. Don't miss the power that this gives you. In effect it allows you to search on domains the same way you search on words. The window has two list boxes. The one on the left is a display of the Louw-Nida domains and sub-domains. When you click on one of the domains, the corresponding lemmas will appear in the right-hand list box. You can select one or more (or all) of these words by clicking on them. As you change the domains you look at, you can select different words in each domain and a record will be kept of what you have selected.
If you want to find what domains include a particular Greek word, just type it in the "Show domains with this string" box and click on "Apply filter." You can even use wildcards. This will generate a more abbreviated domain list. If the "Accents" checkbox is activated, accents will be significant in the domain string that you supply; otherwise they will not.
There are buttons on the right to select all the displayed lemmas or to clear the words in the currently selected domain.
When you are ready to copy the list to the GSE inclusion list, just click on OK. Duplicates will be removed before copying, and Louw-Nida entries that are phrases rather than words will not be copied, because the inclusion list cannot currently handle phrases. Phrases will be grayed out in the "Words to export" List Box to remind you that they cannot be copied. We kept them there just to remind you of the fact that you may be losing some of the domain content (though not a significant amount).
In this section we will discuss what happens behind the scenes when a query is run. We will also discuss each of the screen objects in greater detail. With this understanding, you will have the information you need to build complex queries.
When processing a query, the Graphical Search Engine passes through several phases. If you understand the different phases and the strict order in which they occur, you can better understand how to build a complex query and how to interpret the results of a query. For complicated queries, it is important to know the order of steps the GSE follows; otherwise you will not know how to build queries to find the answers you want.
§ In the first phase, called the Verse List phase, the search engine constructs a verse list for each word box. These lists are based on the word, wildcards, or inclusion/exclusion lists specified in the word box. For each word box, every verse which contains a word matching the word box specification is collected into the verse list. Of course, if the word box represents a verse list from disk, no work is done for the word box. For word boxes specifying inclusion/exclusion lists, a verse is included in the verse list if some word in the verse matches the inclusion/exclusion strings.
§ The second phase uses the merge boxes to combine the verse lists of the word boxes from the first phase. This is the Boolean Operation phase. During this phase, all word boxes with connection arrows into the merge box have their verse lists converted to the Bible version specified in the merge box. After the verse lists are converted, the AND, OR, and NOT operations specified in the merge box are done on the input verse lists. Verse proximity specifications are also tested here. At the end of this phase, the search engine has generated a single list of verse references per merge box. If a merge box is an input for another merge box, its results (a verse list) are fed into the connected merge box and processed in the same way that a word box verse list is processed. At the very end of the entire phase, a single verse list is produced. This verse list contains all verses that can possibly contain a "hit". The verse list represents the results that you would get if you removed all ordering, agreement, range filters, and agreement conditions from the query.
§ The third phase, the Verse Test phase, walks through each verse in the verse list produced at the end of the second phase and examines the text of each verse. For each verse, the following tests and actions are performed:
1. Match List Construction
Which combinations of the words specified in the word and merge boxes are in this verse? A given query may have more than one possible combination of word boxes that may satisfy the query. For example, if a query specifies "(word A OR word B) AND word C", then "word A" and "word C" is a possible combination of word boxes that will satisfy the Boolean conditions. Likewise, in this example, "word B" and "word C" is another possible combination of word boxes that will satisfy the Boolean conditions. In this phase, all possible combinations of word boxes that will satisfy the Boolean conditions are tested. A match list is a single combination of word boxes that will satisfy the merge box Boolean conditions. All of the match lists for the query are collected into a list of match lists. The GSE must next see which, if any, of the match lists can be mapped to the text of the current verse. If each of the word boxes in a given match list can be mapped to a word in the text of the current verse, the match list is a possible hit.
For example, if a query specified "Find all words A or B or C, where A or B or C occur before D", the possible match lists are "A and D" or "B and D" or "C and D". A given verse may only contain "B and D" and "C and D", so only "B and D" and "C and D" would be put into the list of match lists for this given verse. When the next verse is examined, however, it may be that only "B and D" occurs, so for that verse, "B and D" would be the only match list in the list of match lists. Note that match lists are only collections of words and wildcards -- nothing about ordering is included. At this point, however, any absolute word position tests are also done (e.g. a word box specifies that a word must occur at the beginning or at the end of the verse).
2. Ordering Test
In this verse, are the ordering conditions satisfied by one of the match lists? For example, if the list of match lists for a given verse only contains "B and D", and an ordering box specifies that D must occur with exactly three words intervening before B, and the text of the verse is "A A D B A A D", the verse would fail this test. If the text of the verse contained "A A A D A A A B", the verse would pass this test.
3. Agreement Test
In this verse, are the agreement conditions satisfied by one of the match lists?
4. Punctuation Test
In this verse, are the punctuation conditions satisfied by one of the match lists?
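The first two tests above can be sketched in Python. This is a simplified model under stated assumptions, not the GSE's actual algorithm: a query is written as a nested tuple, a verse is a plain list of words, and the functions `match_lists`, `present`, and `ordered_with_gap` are hypothetical names introduced only for this illustration.

```python
from itertools import product


def match_lists(node):
    """Enumerate the word-box combinations that satisfy a Boolean tree.

    A node is either a word (str) or a tuple (op, children) with op
    "AND" or "OR". Each OR alternative yields its own match list.
    """
    if isinstance(node, str):
        return [(node,)]
    op, children = node
    child_lists = [match_lists(child) for child in children]
    if op == "OR":
        return [ml for lists in child_lists for ml in lists]
    # AND: pick one match list from every child and join them.
    return [sum(combo, ()) for combo in product(*child_lists)]


def present(match_list, words):
    """True if every word box in the match list maps to a word in the verse."""
    return all(box in words for box in match_list)


def ordered_with_gap(words, first, second, gap):
    """True if `second` occurs exactly `gap` words after some `first`."""
    return any(
        w == first and i + gap + 1 < len(words) and words[i + gap + 1] == second
        for i, w in enumerate(words)
    )


# "(word A OR word B) AND word C" gives two match lists.
print(match_lists(("AND", [("OR", ["A", "B"]), "C"])))

# "A or B or C before D", tested against a verse containing only B, C, D.
candidates = match_lists(("AND", [("OR", ["A", "B", "C"]), "D"]))
print([ml for ml in candidates if present(ml, "B C D".split())])

# Ordering test: D, then exactly three intervening words, then B.
print(ordered_with_gap("A A D B A A D".split(), "D", "B", 3))    # False
print(ordered_with_gap("A A A D A A A B".split(), "D", "B", 3))  # True
```

Note that `match_lists` knows nothing about word order; ordering is checked separately, mirroring the way match list construction and the ordering test are distinct steps.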
Let's use the following Granville Sharp query to illustrate the different phases (see Example 10: The Granville Sharp Rule):
In the first phase, the GSE builds four reference lists, one for each of the following word boxes: the "*@d*" word box (the one before the first noun), the two "*@+/-v{pr}..." word boxes, and the "kai" word box. The verse list for the "*@d*" word box contains all verses where a word matching "*@d*" occurs. The "*@+/-v{pr}..." word box contains multiple specifications. It specifies nouns and certain participles, so the verse lists for these word boxes contain all verses where a word matching "*@n*" or "*@v{pr}..." occurs. The verse list for the "kai" word box contains all verses that have the word "kai" in them.
In the second phase (the Boolean Operations phase), these four reference lists are AND‑ed together, producing a single reference list.
In the third phase, the verse test phase, the GSE looks at each verse and performs the following tests on each verse to decide whether to keep the verse or eliminate it. The Match List Construction is trivial for this query since there is only one possible match list ("*@d*, *@n*, kai, and *@n*").
§ Ordering Test: In this verse, do these words appear in an "article-noun-kai-noun" ordering (zero words between the first "*@d*" and first "*@n*", with at most two words between all other words)?
§ If so, do the three words agree in gender, case, and number? Also, if any articles appear between the kai and the second noun, do they all disagree in case with the second noun?
Understanding query processing phases prepares you to use subqueries, with which you can run multiple queries in a single GSE window and combine query results in another query (see GSE Examples, Example 15). Subqueries are useful for comparing phrases in different Bible versions. They can also be used to eliminate phrases in a query.
Each subquery is processed as a single query, using the query processing phases above. Subqueries are run one at a time and can be nested in subqueries. Subqueries lower in the tree run first. When an individual subquery is finished, the result is simply a list of verses. So the entire subquery can then be thought of as nothing more than a single word box using a reference list from disk. This verse list is passed up to the merge box above the subquery, and is processed during the parent query's Boolean operations phase.
This feature is especially useful if you want to run a query that checks ordering or agreement in more than one Bible version. For instance, to find all verses where the BGM has "o anqrwpoj" and where the NAS has "the man" and where the NIV has "the man", you would build a query composed of three subqueries: one subquery to find "o anqrwpoj" in the BGM, one subquery to find "the man" in the NAS, and one subquery to find "the man" in the NIV. The merge box for each subquery would have "Make subquery" checked. The three merge boxes would be joined with outgoing links to a fourth AND merge box. A sample query is included, entitled "subq1.qf."
A word box can represent a word/wildcard, a reference list saved on disk, or a set of words and wildcards to include or exclude. In addition, there are three different "flavors" of word boxes: normal word boxes, range filter word boxes, and agreement condition word boxes. In this section we will discuss the different types of word boxes.
The three different kinds of word boxes can be best understood if we limit the discussion to word/wildcard boxes (boxes that represent a single word or wildcard such as "Lord" or "faith*"). In these cases the normal word box (specified by checking the "Normal" option in the word box window) is used in the first and third query processing phases. In the first phase (the verse list phase), the word or wildcard specified in the word box is used to find all verses containing an occurrence of the word or wildcard. For instance, if the word box specified "faith*", all verses containing a word starting with "faith" would be put into the verse list in this phase. In the third phase, the search engine first determines if at least one word matching the word or wildcard occurs in the text of the given verse. If it does, the match lists containing the word box are candidates for producing a hit.
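To make the verse list phase concrete, here is a minimal Python sketch of how a normal word box such as "faith*" could be turned into a verse list. It uses Python's standard `fnmatch` wildcards as a stand-in; BibleWorks wildcard syntax may differ in detail, and the `verse_list` function and the sample verse texts are invented for illustration.

```python
from fnmatch import fnmatchcase

# Toy texts keyed by a made-up verse reference (not real Bible text).
verses = {
    "v1": "by faith we stand",
    "v2": "the faithful servant",
    "v3": "hope does not disappoint",
}


def verse_list(pattern, verses):
    """Collect every verse that contains at least one word matching pattern."""
    return {
        ref
        for ref, text in verses.items()
        if any(fnmatchcase(word, pattern) for word in text.split())
    }


print(sorted(verse_list("faith*", verses)))  # prints ['v1', 'v2']
```

The result is only a set of verse references. Which specific word in each verse matched is worked out again later, in the third phase.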
Range filter word boxes are only used in the third query processing phase. They are ignored in the first and second phase. Range filter word boxes are not even connected to merge boxes. This type of word box is used when we want to require that a range of words before, after, or between two normal word boxes must match a certain word or wildcard. It is also used to specify that a range of words must meet one or more agreement specifications. The word or wildcards in a range filter word box are only examined in the third query processing phase. For example, if we wanted to find all verses containing word A and word B where all words between A and B must be accusative, we would use a range filter word box between A and B to specify the accusative condition. Another example requiring a range filter is a query where word A occurs somewhere before word B and no nouns may occur between A and B. A range filter word box between A and B would be used to enforce the condition that no nouns occur between the two words. Finally, if you wanted to specify that all words between A and B must agree with word B in gender, case, and number, a range filter word box must be used between the word box for A and the word box for B. An example is given in Example 6 (see the GSE Examples section).
There are two types of range filter word boxes. The distinction between these two types is subtle, but important. The easiest way to explain the difference between the two types is to give an example. In this example, we have a normal word box describing, say, a noun. Preceding the word box, we have a range filter word box that specifies that the two words preceding the noun must not be an article. Now, this query may be interpreted in two ways. The first interpretation says, "Find all nouns and ensure that there are no articles in the two preceding words." The second interpretation says, "Find all nouns where there are two other words preceding the noun. Furthermore, the two words preceding the noun must not be articles." In the first interpretation, the two words represented by the range filter word box may or may not exist, but they must not be articles. For instance, a noun appearing at the beginning of the verse will not have any words preceding it at all, if the search is limited to single verses (when the "cross verse boundaries" option is off), so such nouns will always be considered hits. In the second interpretation, the two words represented by the range filter word box must exist (and they must not be articles). If a noun occurs at the very beginning of a verse, and the "cross verse boundaries" option is off, then this noun is not a hit, because it does not have two words preceding it.
Both of these interpretations illustrate a flexibility that the user needs to have. In order to tell the GSE how to interpret your query, you use one of the two types of range filter word boxes. These two types allow you to specify how to treat queries that have hits right at the beginning or end of a verse. When a query specifies that a search does not cross verse boundaries, the GSE will restrict the scope of its search to a single verse at a time. When a query uses a range filter word box to describe words that must appear at the very beginning or at the very end of a phrase and you want to insist that the words described by the range filter word box must exist, you use the range filter option labeled "Range filter (all specified words must match and must exist within the verse bounds setting)". When a query uses a range filter word box to describe a specification that you want to apply to words if they exist within the verse boundaries (depending on the setting of the "cross verse boundaries" option), then you should use the range filter option labeled "Range filter (all specified words must match IF they are within the verse bounds setting)."
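The difference between the two range filter options can be shown with a short Python sketch of the "no articles in the two words before the noun" example. It assumes the "cross verse boundaries" option is off, represents a verse as a list of part-of-speech tags, and uses a hypothetical `range_filter_ok` function with a `must_exist` flag standing in for the choice between the two options.

```python
from typing import List


def range_filter_ok(tags: List[str], idx: int, width: int,
                    forbidden: str, must_exist: bool) -> bool:
    """Test the `width` words preceding position `idx` within one verse.

    must_exist=True  models "must match and must exist".
    must_exist=False models "must match IF they are within the verse bounds".
    """
    start = idx - width
    if start < 0:
        if must_exist:
            return False  # fewer than `width` words precede the noun
        start = 0  # test only the words that actually exist
    return all(tag != forbidden for tag in tags[start:idx])


verse = ["noun", "verb", "article", "noun"]

# The noun at the start of the verse has no preceding words at all.
print(range_filter_ok(verse, 0, 2, "article", must_exist=False))  # True
print(range_filter_ok(verse, 0, 2, "article", must_exist=True))   # False

# The second noun is preceded by an article, so it fails either way.
print(range_filter_ok(verse, 3, 2, "article", must_exist=False))  # False
```

The two options give different answers only when the range would run past the edge of the verse; everywhere else they behave the same.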
It should be noted that the GSE does not cross Psalm chapters and ends of books, even if the "cross verse boundaries" option is on. In other words, a "hit" will not span two books or Psalm chapters.
§ Agreement Condition Word Boxes
An agreement condition word box is used when you want to specify that a connected agreement box is to be used only under certain conditions. Without an agreement condition box, an agreement box's condition will always be enforced. An agreement condition word box does not require the word or wildcard in the word box to exist -- it only specifies that if a word in the verse matching the word or wildcard condition in the word box exists in the correct ordering position, then the attached agreement boxes must be enforced. For instance, if you wanted to build a query that specified that word A must agree in gender, case, and number with the word immediately preceding it only if the word immediately preceding A is an article, you would have to use an agreement condition word box to represent the article. Another example requiring an agreement condition word box is a query where word A precedes word B and we want to require that if a word C or D occurs between A and B, then C or D must agree with A and B in person, but we do not want to require that any and all words between A and B have to have any agreement with A or B. We would use an agreement condition word box to represent C and D. Note that the agreement condition word box for C and D does not require words C or D to occur. It merely specifies that if C or D occurs, the connected agreement box must be satisfied. Agreement condition word boxes are only tested in the third query processing phase and are ignored during the first and second phase. Like range filter word boxes, agreement condition word boxes are not connected to merge boxes.
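The first example above (agreement with the preceding word only if it is an article) amounts to an "if present, then enforce" rule. A minimal Python sketch follows, assuming each word is a dictionary of features; the `conditional_agreement` function and the feature values are hypothetical.

```python
def conditional_agreement(words, idx, features=("gender", "case", "number")):
    """Enforce agreement between words[idx] and the word before it,
    but only when that preceding word is an article."""
    if idx == 0:
        return True  # nothing precedes the word, so nothing to enforce
    prev = words[idx - 1]
    if prev["pos"] != "article":
        return True  # condition not triggered; agreement is not required
    return all(prev[f] == words[idx][f] for f in features)


# Toy words with made-up feature values.
art_m = {"pos": "article", "gender": "m", "case": "nom", "number": "sg"}
art_f = {"pos": "article", "gender": "f", "case": "nom", "number": "sg"}
verb = {"pos": "verb", "gender": "-", "case": "-", "number": "sg"}
noun = {"pos": "noun", "gender": "m", "case": "nom", "number": "sg"}

print(conditional_agreement([art_m, noun], 1))  # article agrees
print(conditional_agreement([art_f, noun], 1))  # article disagrees
print(conditional_agreement([verb, noun], 1))   # no article, no requirement
```

Only the second case fails: the article is present, so the agreement box applies, and the genders differ.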
§ The Invert Results/NOT Operator
When you want to search for verses that do not contain a particular word, or when you want to search for phrases, but exclude words from the phrase, you will need to use the Invert results (NOT) option in the word box. This flag is easy to understand, but the details of how it works contain subtleties that we will explain in this section. When checked, this flag works in two different ways.
Generally there are two reasons for wanting to "NOT" a word box. Here's an example of each of the two types of queries:
1. Find all verses containing the word "Lord" but NOT the phrase "Lord God".
2. Find all verses containing the word "Lord" but NOT the word "God".
In each query you would want to NOT the word box containing "God", but notice the differences. In the first query a verse can contain the word "God" as long as "God" does not immediately follow "Lord". In the second query any verse containing the word "God" must be eliminated from the verse list.
Now let’s look at how the NOT option is processed during the different query processing phases.
1. During the Boolean Operation phase of the query processing, all verses containing words that match the specification in the invert/NOT word box are placed in the verse list for that word box. The verse list compiled at this invert/NOT word box is then inverted (all verses not in the list are included, and all verses in the list are excluded). The result is that for this invert/NOT word box, the GSE constructs a list of all verses that do not contain any occurrence of a word matching the specifications in the invert/NOT word box. There is, however, an exception to this: If the invert/NOT word box has ordering or agreement links, then the invert/NOT word box is completely skipped during the Boolean Operation phase. The reason for this exception is that an invert/NOT word box that appears in a phrase (ordering) or in agreement conditions should not eliminate a given verse if a word matching the specification in the invert/NOT word box appears somewhere else in the verse. There may be a combination of words in the given verse that satisfy the invert/NOT condition and the ordering or agreement conditions connected to the word box, in spite of the existence of other words in the verse that match the invert/NOT word box's specification.
This is somewhat complicated to describe, but the following example should help clarify the point. Say for example we have a query that searches for all verses containing "Lord" but not the phrase "Lord God". Hit verses can contain the word "God" as long as "God" does not immediately follow "Lord". The GSE query (see above) would contain two word boxes: one for "Lord" and one describing NOT "God". These two word boxes would be connected by an ordering link, specifying that the "Lord" word box must precede the NOT "God" word box. Thus, the NOT "God" word box will be ignored during the Boolean Operation phase, allowing verses that contain the word "Lord" to be examined. This is necessary since a verse that contains "God" should not be eliminated as long as "God" does not immediately follow "Lord". If the GSE were to eliminate all verses that contained the word "God", verses with possible hits would be unnecessarily eliminated (such as "And Abraham...called there on the name of the LORD, the everlasting God.").
2. During the third phase of query processing (the Verse Test phase), a match is made with a specific word in the text if the word box specification does not match the word in the text. In the case of inclusion/exclusion lists, the roles of the inclusion list and exclusion list are swapped. Note that in this phase, we are searching through the specific words in a verse. The goal during this phase is to map the word boxes to specific words in the text of a verse.
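The two behaviors of the NOT flag can be contrasted in a small Python sketch built on the "Lord" / "God" examples. It is a simplified model with invented function names and toy, lowercased verse texts. It also assumes that the NOT word box must map to an actual following word, which is one reading of the phase-three description above.

```python
def verses_without_word(verses, word):
    """Boolean-phase NOT: invert the verse list of the word box."""
    containing = {ref for ref, words in verses.items() if word in words}
    return set(verses) - containing


def has_word_not_followed_by(words, word, excluded):
    """Verse-test NOT with an ordering link: some occurrence of `word`
    is immediately followed by a word that is not `excluded`."""
    for i, w in enumerate(words):
        if w == word and i + 1 < len(words) and words[i + 1] != excluded:
            return True
    return False


# Toy verse texts (hypothetical references, simplified wording).
verses = {
    "v1": "the lord god made".split(),
    "v2": "the name of the lord the everlasting god".split(),
    "v3": "the lord is good".split(),
}

# Query 2: "Lord" but NOT the word "God" (no ordering link on the NOT box).
with_lord = {ref for ref, words in verses.items() if "lord" in words}
print(sorted(with_lord & verses_without_word(verses, "god")))  # ['v3']

# Query 1: "Lord" but NOT the phrase "Lord God" (NOT box has an ordering link).
print(sorted(ref for ref, words in verses.items()
             if has_word_not_followed_by(words, "lord", "god")))  # ['v2', 'v3']
```

Verse "v2" shows why the NOT box is skipped in the Boolean phase when it has an ordering link: the verse contains "god", yet it is still a valid hit for the phrase query.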