Artificial Intellect Information Retrieval Engine
We propose AIRE (Artificial Intellect Information Retrieval Engine),
a natural language processor
applicable to your document management system.
The Engine, together with special customized add-ons, can
- categorize documents that come from an input flow or are created by a user;
- assign tags and prepare the documents for use in the Customer's
- non-structured text documents;
- parameters of categorization, e.g. a structure of tags.
- a set of the tagged documents;
- statistical data.
- Universal search system across the corporate documents of the
Constitutional Court of the Russian Federation (in Russian); a working
model based on a subset of real documents is available on request.
How it works:
The AIRE natural language processor was developed recently, building
on many years of work in the field of Artificial Intelligence.
It provides grammatical and semantic analysis of input texts,
starting from each byte of the input stream placed on the agenda and
building binding and decoding hypotheses for each byte pair,
then for the results of the preceding binding cycle, and so on.
Progressively, a hierarchy of conceptual bindings is built:
from bytes to graphemes, then to phonemes, syllabic chains,
morphemes, morphemic complexes, sentences, paragraphs, whole
These bindings are made using so-called mindmaps of the corresponding
levels (mindmaps of character encoding, graphematics, phonology,
morphology, syntax, and finally, formal semantics and pragmatics rules
and concepts that are embedded into the UCO (Universal Conceptual
Simultaneously, the working binding hypotheses are rated according to
the coverage criterion, that is, by the number of bytes bound/decoded
by each hypothesis.
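The binding cycles and the coverage rating described above can be illustrated with a minimal sketch. It is not the AIRE implementation: the names `Hypothesis`, `TOY_MINDMAP` and `bind` are hypothetical, and the "mindmap" is reduced to a toy table that binds two adjacent units into one higher-level unit.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Hypothesis:
    label: str  # unit assigned to the covered span
    start: int  # index of the first covered byte
    end: int    # index one past the last covered byte

    @property
    def coverage(self) -> int:
        # Coverage criterion: number of bytes bound by this hypothesis.
        return self.end - self.start


# Hypothetical toy "mindmap": a pair of adjacent units binds into a new unit.
TOY_MINDMAP: Dict[Tuple[str, str], str] = {
    ("c", "a"): "ca",
    ("a", "t"): "at",
    ("ca", "t"): "cat",
}


def bind(data: bytes, mindmap: Dict[Tuple[str, str], str]) -> List[Hypothesis]:
    """Run binding cycles until no new hypothesis appears, then rank by coverage."""
    # Cycle 0: every byte is a hypothesis of its own (ASCII input assumed).
    found = {Hypothesis(chr(b), i, i + 1) for i, b in enumerate(data)}
    frontier = set(found)
    while frontier:
        new = set()
        for left in found:
            for right in found:
                adjacent = left.end == right.start
                fresh = left in frontier or right in frontier
                if adjacent and fresh:
                    label = mindmap.get((left.label, right.label))
                    if label is not None:
                        candidate = Hypothesis(label, left.start, right.end)
                        if candidate not in found:
                            new.add(candidate)
        found |= new
        frontier = new  # the next cycle works on the results of this one
    return sorted(found, key=lambda h: (-h.coverage, h.start, h.label))


ranked = bind(b"cat", TOY_MINDMAP)
best = ranked[0]  # the hypothesis that covers the most bytes
```

In this toy run the hypothesis that covers all three input bytes is ranked first, ahead of the two-byte and one-byte hypotheses.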
This process results in the construction of so-called Conceptual
Graphs, which are sets of identified concepts and the relations between
them. They are formed by a combination of routes that connect the
bytes covered by the hypothesis and pass through concepts at
higher levels of the hierarchy.
As applied to the problem of intelligent search, such conceptual graphs
are built both
for a collection of raw plain-text documents and for a search query.
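As an illustration only, a conceptual graph can be held as a set of concepts plus a set of labeled relations, with one graph per document and one per query. The class `ConceptualGraph`, its methods, and the sample concepts below are assumptions made for this sketch, not the engine's actual data model.

```python
from dataclasses import dataclass, field
from typing import Set, Tuple

Relation = Tuple[str, str, str]  # (source concept, relation name, target concept)


@dataclass
class ConceptualGraph:
    concepts: Set[str] = field(default_factory=set)
    relations: Set[Relation] = field(default_factory=set)

    def relate(self, source: str, relation: str, target: str) -> None:
        """Add a relation and register both of its concepts."""
        self.concepts.update((source, target))
        self.relations.add((source, relation, target))

    def contains(self, other: "ConceptualGraph") -> bool:
        """Naive exact check: every concept and relation of `other` is present here."""
        return other.concepts <= self.concepts and other.relations <= self.relations


# Graph built for a (hypothetical) document.
document = ConceptualGraph()
document.relate("court", "issues", "ruling")
document.relate("ruling", "concerns", "statute")

# Graph built for a (hypothetical) search query.
query = ConceptualGraph()
query.relate("court", "issues", "ruling")

is_match = document.contains(query)
```

The exact containment test is the simplest possible comparison; it does not account for equivalent graphs or for superclasses of concepts.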
Information retrieval can then be done by matching the graphs for
the query with those for the result; in
this case the so-called informational noise that is typical of other
search engines is completely
The search is also done over sets of equivalent graphs, or over graphs
with superclasses of concepts; these sets are built during the indexing
process. The measure of relevance between the query and a search
result depends inversely on the product of the lengths of the routes from the
superclasses in the query graph to the subclasses in the graph of the
result. So, the search by synonyms or by the closeness of meanings can
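The relevance measure described in this paragraph can be sketched as follows. The text only says that relevance depends inversely on the product of the route lengths, so the exact form 1 / product is an assumption, as is counting a route in concepts (both ends included) so that an exact match has length 1. The taxonomy `PARENT` and the function names are hypothetical.

```python
from math import prod
from typing import List, Optional

# Hypothetical toy taxonomy: concept -> its direct superclass.
PARENT = {
    "judge": "official",
    "official": "person",
    "ruling": "document",
}


def route_length(superclass: str, subclass: str) -> Optional[int]:
    """Concepts on the route from `subclass` up to `superclass`, both ends counted.

    Returns 1 for an exact match and None when no route exists.
    """
    length, node = 1, subclass
    while node != superclass:
        node = PARENT.get(node)
        if node is None:
            return None
        length += 1
    return length


def relevance(query_concepts: List[str], result_concepts: List[str]) -> float:
    """Assumed form of the measure: 1 / product of the shortest route lengths."""
    lengths = []
    for q in query_concepts:
        routes = [route_length(q, c) for c in result_concepts]
        routes = [r for r in routes if r is not None]
        if not routes:
            return 0.0  # a query concept with no route to the result: no match
        lengths.append(min(routes))
    return 1.0 / prod(lengths)


exact = relevance(["judge"], ["judge"])                      # route length 1
broader = relevance(["person", "document"], ["judge", "ruling"])  # lengths 3 and 2
```

Under these assumptions an exact match scores 1.0, and the score falls as the query concepts sit further above the concepts found in the result.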
The processing of input texts described above can be used to provide
automatic subject rubrication, that is, the
attribution of the input text to one or another predetermined
category; even automatic creation of
subject heading lists is possible, based on the a priori unknown
content and structure of the input
A. V. Dobrov, Technologies of Intellectual Information Retrieval and
Techniques for Evaluating Their Effectiveness (in Russian: Технологии интеллектуального поиска и
способы оценки их эффективности // Структурная и прикладная
лингвистика, вып. 8 - Издательство СПбГУ, 2010)