Document Analysis

Selko Analytics has two modes of analysis for categorising text.

You can choose either mode, depending on the search terms you have available. Each mode works better in different situations.

See below to identify what works best for you.







1. Intelligent Keyword Search

• Category can be defined by a set of keywords (e.g. pressure gauge, control room)

• Perfect for fast analysis

2. User-trained Machine Learning

• The topic is broader or less well understood (e.g. mechanical engineering)

• A minimum of 100 example sentences is needed for analysis. Pre-define them, or generate them automatically while processing everyday documents

• All the benefits of machine learning, no AI infrastructure or technical knowledge needed







1. Intelligent Keyword Search


When there is very little training data, or the search topic is well understood, an intelligent keyword search can provide good results quickly and without much preparation.

 

You simply give the category a name, a description and some keywords. The analysis finds related topics and highlights them in the document. An intelligent keyword search can also be used to search for a specific word in the text.

 

The technology is based on exponentially weighted bags of words. Multiple bags of words are designed using Natural Language Processing (NLP) and expert knowledge from the domain. The exponential family of distributions is used to weight the bags for classifying the text.
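As a rough illustration of weighted bags of words, the sketch below scores a sentence against several keyword bags. It is not Selko's implementation: the function names, the example bags and the simple exponential decay used for the weights are all assumptions made for this example.

```python
import math
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score_sentence(sentence, bags, decay=1.0):
    """Sum the weights of every bag term found in the sentence.

    `bags` is ordered from core keywords (index 0) to looser related
    terms. Bag i gets the weight exp(-decay * i), an assumed scheme.
    """
    text = " ".join(tokenize(sentence))
    score = 0.0
    for i, bag in enumerate(bags):
        weight = math.exp(-decay * i)
        for term in bag:
            # Pad with spaces so terms only match on word boundaries.
            if f" {term} " in f" {text} ":
                score += weight
    return score


def highlight(sentences, bags, threshold=0.5):
    """Return the sentences whose score reaches the threshold."""
    return [s for s in sentences if score_sentence(s, bags) >= threshold]


# Hypothetical bags for a "control room" category.
CONTROL_ROOM_BAGS = [
    {"control room", "pressure gauge"},  # keywords supplied by the user
    {"operator", "console", "alarm"},    # related terms, weighted lower
]
```

With these bags, a sentence that mentions "control room" is highlighted, while a sentence that only contains one lower-weighted related term stays below the default threshold.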



2. User-trained machine learning


To find categories in a document, Selko uses text classification methods. A model is trained on a set of example sentences and can then be used to find similar sentences in new documents.


Example:

An energy company wants to identify all sections in their technical documents that are related to control rooms. They input example sentences from old documentation into Selko Analytics that are similar to the information they wish to find. The user starts the training and soon has a model ready for control rooms. The energy company can now import new documentation and identify all sentences related to control rooms.
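The workflow in this example can be sketched in a few lines. The code below is a deliberately simple stand-in, a bag-of-words classifier that compares sentences by cosine similarity, and not the transfer-learning model Selko uses. All function names, the example sentences and the threshold are assumptions.

```python
import math
import re
from collections import Counter


def vectorize(sentence):
    """Turn a sentence into a bag of lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(examples):
    """Sum the word counts of all example sentences per category."""
    model = {}
    for sentence, category in examples:
        model.setdefault(category, Counter()).update(vectorize(sentence))
    return model


def find_sentences(model, sentences, category, threshold=0.3):
    """Return the new sentences that resemble the category's examples."""
    profile = model[category]
    return [s for s in sentences if cosine(vectorize(s), profile) >= threshold]


# Example sentences taken from old documentation (hypothetical).
examples = [
    ("The control room operator monitors all alarms.", "control room"),
    ("Displays in the control room show turbine status.", "control room"),
]
model = train(examples)

# Sentences from newly imported documentation (hypothetical).
new_sentences = [
    "The control room must have emergency lighting.",
    "Pipes are inspected every year.",
]
matches = find_sentences(model, new_sentences, "control room")
```

Here `matches` contains only the sentence about the control room. A real model needs far more examples than the two used here.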

 

• Using example sentences, you can train your own model to identify categories.

 

• Selko’s easy interface allows you to input and manage these sentences, manually start the training and validate results. Mass import is also possible for pre-labelled data.


• Any changes the user makes will be stored and used to improve the model in the future.






Companies usually have very little domain-specific information about unseen documents, and machine learning models usually require large amounts of training data to make reasonable predictions. Selko Analytics uses transfer learning to overcome this limitation and can classify text using only 100 labels per category.




NOT ENOUGH EXAMPLES?




For best results, Selko uses transfer learning and fine-tuning of the transferred model, followed by prediction on unseen text. Predictions can be multi-class, multi-label, combined multi-class multi-label, or hierarchical.
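The prediction types differ mainly in how per-category scores are turned into a final answer. The sketch below shows three of them under simple assumptions: the scores, category names, hierarchy and threshold are invented for illustration and do not come from Selko Analytics.

```python
def predict_multiclass(scores):
    """Multi-class: exactly one category, the one with the highest score."""
    return max(scores, key=scores.get)


def predict_multilabel(scores, threshold=0.5):
    """Multi-label: every category whose score reaches the threshold."""
    return sorted(c for c, p in scores.items() if p >= threshold)


def predict_hierarchical(parent_scores, child_scores, hierarchy):
    """Hierarchical: pick the best parent, then the best child under it."""
    parent = max(hierarchy, key=lambda p: parent_scores[p])
    child = max(hierarchy[parent], key=lambda c: child_scores[c])
    return parent, child


# Hypothetical scores for a single sentence.
child_scores = {"control room": 0.7, "pressure gauge": 0.55, "pump": 0.1}
parent_scores = {"operations": 0.8, "mechanical engineering": 0.6}
hierarchy = {
    "operations": ["control room"],
    "mechanical engineering": ["pressure gauge", "pump"],
}

single = predict_multiclass(child_scores)
several = predict_multilabel(child_scores)
nested = predict_hierarchical(parent_scores, child_scores, hierarchy)
```

For this sentence, the multi-class result is one category, the multi-label result is the two categories that score at least 0.5, and the hierarchical result is a parent and child pair.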




MORE ON OUR TECHNOLOGY


 

 

Deep learning with less effort

Traditionally, high-quality results required a minimum of 10 000 example sentences. However, recent advances in machine learning make it possible to achieve similar results with as few as 100 sentences per category. Selko uses transfer learning to ease the collection of training data and deliver fast results.




100 too many? Turn daily document analysis into future training data

Selko Analytics learns from any validation changes made to the results. The user can add sentences to categories or delete them, so the model is adjusted based on the user's selections. All sentences added by the user are included as training data. The user can start building the model even without any example data.
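One way to picture this feedback loop is a store of example sentences that grows as the user validates results. The class below is a minimal sketch with hypothetical names. It assumes that a deleted sentence is simply dropped from the category's examples, and it uses the figure of 100 sentences per category quoted above as the point at which training is worthwhile.

```python
from collections import defaultdict

MIN_EXAMPLES = 100  # per-category figure quoted in the text


class TrainingData:
    """Collects validated sentences per category (hypothetical helper)."""

    def __init__(self):
        self._examples = defaultdict(set)

    def add(self, category, sentence):
        """The user added a sentence to a category while validating."""
        self._examples[category].add(sentence)

    def remove(self, category, sentence):
        """The user deleted a sentence from a category while validating."""
        self._examples[category].discard(sentence)

    def count(self, category):
        """Number of distinct example sentences stored for the category."""
        return len(self._examples[category])

    def ready(self, category):
        """True once the category has enough examples to train on."""
        return self.count(category) >= MIN_EXAMPLES


data = TrainingData()
data.add("control room", "The control room must have emergency lighting.")
data.add("control room", "Pipes are inspected every year.")
data.remove("control room", "Pipes are inspected every year.")
```

The store starts empty, which mirrors the point that a model can be built up without any pre-defined example data.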











