An intelligent search engine based on artificial intelligence and big data processing technologies
NAUMEN Intelligent Search reduces the time spent on daily information searches across diverse sources and gives business users accurate, detailed answers to complex questions about production processes, rendered services and applied research. The smart search engine is based on machine learning and natural language processing technologies. It is capable of processing vast volumes of data and delivers accumulated knowledge to the company’s employees.
Objectives you can achieve
Combining all information sources into a single search environment
Our solution enables you to conveniently search dozens of data sources connected to the search engine, e.g. network files and folders, the company’s systems and portals, electronic libraries, etc.
Receiving high-quality search results for analysis and decision-making
The search engine helps you quickly find the information you need in large volumes of unstructured data from various sources and provides a response targeted to the user.
Expanding knowledge management areas
You can expand your company’s knowledge management by connecting knowledge bases and other sources that contain important business information to the search engine. Your company’s employees can also be notified promptly about new materials of interest.
A single search box for all queries
Regardless of where the information is located, users can easily access it via a universal search interface: a search box, search result pages and applicable filters. Connectors from the search engine to internal data sources are specified at the deployment phase. A spider bot (web crawler) collects information from external data sources.
Searching files of different formats
A full-text search is carried out across all Microsoft Office files (doc, docx, xls, xlsx, ppt, pptx, etc.), web pages (html, htm), open document format files (odt) as well as text documents stored in graphic formats (pdf, djvu, jpeg, etc.).
Supported file formats
- Text documents: RTF, TXT, ODT;
- Microsoft Word documents: DOC, DOCX;
- Microsoft Word XML Document: XML;
- Web page: HTML, HTM;
- Microsoft PowerPoint Presentation: PPT, PPS, PPTX, PPSX;
- Microsoft Excel workbook: XLS, XLSX;
- Text documents presented in graphic formats: PDF, JPEG, JPG, BMP, GIF, TIF, TIFF, PNG, DJVU.
Understanding the meaning of documents and users’ queries
NAUMEN Intelligent Search understands the meaning of documents and provides users with advanced search results that are relevant to their search intent. It is also capable of understanding the acronyms and terminology used by the company’s employees and in the industry generally. Moreover, machine learning technologies constantly improve how well the search engine understands the meaning of documents.
The search engine uses extended document data obtained through semantic analysis. During the semantic analysis phase, the system determines document attributes that summarize each document; these attributes are used to group documents by content, highlight key words, assign tags, etc. Search algorithms that take these data into account significantly improve the quality of search results, even when the documents that answer a user’s query do not contain the words of the original query (fuzzy search).
When delivering search results, the system takes into account the user’s profile, areas of interest and query history, as well as some unique parameters generated by the system from an analysis of the user’s documents.
Self-Learning Search Engine
Machine learning maintains search quality and accuracy as the number of documents grows, new data sources are connected, new document versions are released and other changes occur in the company’s Information Storage and Processing Policy.
Delivers all features of modern search engines in one solution
Provides content-wise and attribute-based full-text search
When delivering search results, the engine searches for key words both in the document content and in the attributes (fields) of the document profile.
Uses Morphology-based and Exact Match Search
In morphology-based search, key words are matched not only in the exact form specified but also in all their morphological forms, such as gender, number and case inflections.
Provides Facet Filter Search
Users can narrow the set of documents in the search results using a group of filters (facets) that correspond to different document features (type, author, creation date, etc.).
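As a rough illustration of how facet filtering works in general, the Python sketch below filters a small in-memory document list and counts the values of one facet. The `Document` fields, the sample data and the function names are hypothetical and chosen for illustration only; they do not describe the actual NAUMEN Intelligent Search data model or API.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    # Hypothetical document profile; real attribute names may differ.
    title: str
    doc_type: str
    author: str
    year: int


def apply_facets(docs, selected):
    """Keep documents whose value for every selected facet is in the allowed set."""
    return [
        d for d in docs
        if all(getattr(d, facet) in allowed for facet, allowed in selected.items())
    ]


def facet_counts(docs, facet):
    """Count how many documents fall under each value of a facet."""
    return Counter(getattr(d, facet) for d in docs)


# Illustrative sample data (not taken from any real project).
DOCS = [
    Document("Well report", "report", "Ivanov", 2018),
    Document("Drilling manual", "manual", "Petrov", 2017),
    Document("Seismic report", "report", "Petrov", 2018),
]

filtered = apply_facets(DOCS, {"doc_type": {"report"}, "author": {"Petrov"}})
print([d.title for d in filtered])
print(facet_counts(DOCS, "doc_type"))
```

The counts per facet value are what a search interface would typically show next to each filter option so that users can see how many results a filter would leave.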
Enables Thesaurus Search
Searches use thesauri and semantic similarity data obtained by distributional semantics methods.
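One common way to use a thesaurus in search is query expansion: each query term is supplemented with its related terms before matching. The Python sketch below shows the idea under simple assumptions; the `expand_query` function and the thesaurus entries are hypothetical, and a hand-written dictionary stands in for similarity data that a real system would derive from a corpus.

```python
def expand_query(terms, thesaurus):
    """Return the query terms followed by their thesaurus entries, without duplicates."""
    expanded = []
    for term in terms:
        # The original term comes first, then its related terms.
        for candidate in [term, *thesaurus.get(term, [])]:
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


# Hypothetical thesaurus for illustration only.
THESAURUS = {
    "pump": ["compressor"],
    "failure": ["fault", "breakdown"],
}

print(expand_query(["pump", "failure"], THESAURUS))
```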
Enables Contextual Search
With contextual search, documents are matched when the key words occur within less than a specified distance of each other.
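A minimal Python sketch of this kind of proximity check follows. It assumes that distance is measured in word positions and that a match requires the distance to be strictly less than the limit; the function names are hypothetical and the tokenizer is deliberately simple.

```python
import re


def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


def within_distance(text, first, second, max_distance):
    """True if some occurrence of `first` is fewer than `max_distance`
    word positions away from some occurrence of `second`."""
    tokens = tokenize(text)
    positions_a = [i for i, tok in enumerate(tokens) if tok == first.lower()]
    positions_b = [i for i, tok in enumerate(tokens) if tok == second.lower()]
    return any(abs(a - b) < max_distance for a in positions_a for b in positions_b)


TEXT = "The pump pressure sensor failed"
print(within_distance(TEXT, "pump", "sensor", 3))  # positions 1 and 3, distance 2
print(within_distance(TEXT, "pump", "failed", 3))  # positions 1 and 4, distance 3
```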
Uses Unified Document Catalogue
Cataloging and categorization technologies are used to create a unified document catalogue from all data sources, with a user-friendly structure and convenient navigation.
Largest implementation projects
Cognitive search system
for Gazprom Neft Research and Engineering Center
users at the pilot project phase
electronic documents available for search
Previously, search queries returned excess information, but the cognitive search system makes it possible to refine a query, obtain focused responses and create filters by particular aspects.
Global CIO Union of Russian IT Directors
Project of the Year 2018 Contest Award Special Nomination: Global CIO Choice
Top 10 Oil&Gas IT Projects
Nomination: Corporate Information System
Stages of implementing an Intelligent Search System in a company
Implementing the Intelligent Search System in a company is a full-scale project that involves NAUMEN’s team of experts. As a rule, the project has several main phases, depending on the nature of the objectives set.
1. Analysis of Data Sources, Types and Formats
First, all data sources, document types, storage formats, contents and attributes are studied. This stage is the most time-consuming, because as many details and data management features as possible must be identified to minimize the risk of unnecessarily costly changes to data extraction and storage algorithms in the future.
2. Source Integration and Data Pre-Processing
At the second stage, the data sources are integrated and a unified search environment is created. To achieve this, our experts develop a data model that governs interaction with the data sources and create a data bank for several categories of data. The data uploaded into the data bank are pre-processed to improve the quality of scanned documents, solve encoding issues, delete garbage characters, etc.
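To give a sense of what deleting garbage characters can involve, here is a minimal Python sketch of one possible text clean-up step. It is an assumption about a typical approach rather than a description of NAUMEN’s pipeline: the `clean_text` function is hypothetical, and it only normalizes Unicode, drops control and invisible format characters, and collapses repeated spaces.

```python
import re
import unicodedata


def clean_text(raw):
    """Normalize Unicode, drop control/format characters, collapse spaces."""
    # NFC composes characters so the same letter has one representation.
    text = unicodedata.normalize("NFC", raw)
    # Unicode categories starting with "C" cover control, format,
    # surrogate, private-use and unassigned code points.
    text = "".join(
        ch for ch in text
        if ch in "\n\t" or not unicodedata.category(ch).startswith("C")
    )
    # Collapse runs of spaces and tabs into a single space.
    text = re.sub(r"[ \t]+", " ", text)
    return text.strip()


print(clean_text("pump\x00  pressure\u200b \tdata "))
```

Improving scanned documents and resolving encoding problems are separate tasks that need dedicated tools and are not covered by this sketch.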
3. Language Modelling
Third, a language model is constructed from the text data extracted from the documents. The model accounts for the specific features and wording standards of different types of documents: technical, scientific, etc. The language model then enables the search engine to understand the meaning of the documents.
4. Semantic Analysis and Document Structuring
After the machine learning phase, the system uses the language model to identify special attributes of documents that reflect their summaries. Finally, a semantic space is constructed. It serves as the basis for further analysis and for the system’s intelligence, including document structuring tasks: grouping documents by contents, identifying key words and assigning tags.
5. Configuring Search and Ranking Algorithms
The final stage is to configure the search and ranking algorithms. The model for ranking documents in the search results can be adjusted through numerous parameters that ensure high relevance of results: document relevance, different priorities for document content and attributes, specific features of query wording, etc. The filters and thesauri of the subject domain are also set up; these expand the search results by including documents with similar meaning.
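The idea of giving different priorities to document content and attributes can be illustrated with a small Python sketch. It uses plain term counts multiplied by a per-field weight, which is a deliberately simplified stand-in and not the ranking model used by the product; the field names, weights and function names are hypothetical.

```python
def score(doc, query_terms, weights):
    """Sum of term occurrences per field, multiplied by that field's weight."""
    total = 0.0
    for field, weight in weights.items():
        tokens = doc.get(field, "").lower().split()
        total += weight * sum(tokens.count(term.lower()) for term in query_terms)
    return total


def rank(docs, query_terms, weights):
    """Return documents ordered from highest to lowest score."""
    return sorted(docs, key=lambda d: score(d, query_terms, weights), reverse=True)


# Illustrative documents with one attribute ("title") and the body ("content").
DOC_A = {"title": "Pump maintenance", "content": "schedule for quarterly checks"}
DOC_B = {"title": "Site overview", "content": "pump stations and pump houses"}

title_first = rank([DOC_A, DOC_B], ["pump"], {"title": 3.0, "content": 1.0})
content_first = rank([DOC_A, DOC_B], ["pump"], {"title": 1.0, "content": 3.0})
print([d["title"] for d in title_first])
print([d["title"] for d in content_first])
```

Changing only the weights reverses the order of the two documents, which is the kind of adjustment that configuring the ranking model makes possible.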
The main components of NAUMEN Intelligent Search are written in the Scala programming language. Two DBMSs are used: relational PostgreSQL (indices) and non-relational MongoDB (content storage). The system also uses open source components: the Elasticsearch search engine and the Apache Spark framework for distributed processing of unstructured and semi-structured data.