Cutting Through Nonsense in Large Text Documents

Sifting through mountains of irrelevant text just to locate important nuggets of information is as irritating as it is inefficient.

Yet, whether you’re a master’s student putting together your thesis or an engineer trying to categorize technical specifications, you’re going to have to do just that. Worst of all, these kinds of texts are rife with ambiguities and unclear language, which makes them hard to use. Furthermore, they usually aren’t formatted with navigation in mind.

Most of the time, extracting the important information out of text-dense documents is time sensitive, but how can 500 pages be analyzed for specific data in good time?

Improve Your Manual Process

Techniques such as speed reading can help you tackle these large documents, but there are faster, out-of-the-box methods you can try to work through all that data.

If you’re a lawyer, engineer, economist, or any other highly-educated professional, you’ve had to make your way through undergraduate and graduate programs. One of the main principles of studying and learning is knowing what not to read when trudging through arduous texts such as scholarly and academic journals.

That principle doesn’t change once you’re out of academia, and since every industry has deadlines, you can’t waste time absorbing superfluous information.

Here are some tips on how to get to the meat of larger documents:

1) If a document contains an abstract, read the “findings” and “conclusions” sections first, as they usually contain the pertinent information.

2) Read the first and last sentence of every paragraph in order to get the gist.

3) See if you can find a better-written source with similar information; it may make it easier to work through the more complex text.

4) Understand the style and format of the text you’re reading so you know what to skip.
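Tip 2 is easy to automate for a first pass. The following is a minimal sketch, assuming paragraphs are separated by blank lines and sentences end in a period, question mark, or exclamation mark; the function name `skim` is made up for this illustration, and the naive splitting will stumble over abbreviations such as “e.g.”.

```python
import re

# Naive sentence boundary: ., ! or ? followed by whitespace.
# Abbreviations ("e.g.", "Dr.") will be split incorrectly.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def skim(text):
    """Return a (first sentence, last sentence) pair for each paragraph."""
    gist = []
    # Paragraphs are assumed to be separated by one or more blank lines.
    for paragraph in re.split(r"\n\s*\n", text):
        sentences = [s.strip() for s in SENTENCE_END.split(paragraph.strip()) if s.strip()]
        if sentences:
            # A one-sentence paragraph yields the same sentence twice.
            gist.append((sentences[0], sentences[-1]))
    return gist
```

Printing the pairs for a 500-page specification will not replace reading it, but it can show quickly which paragraphs deserve a closer look.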

By mastering these more traditional methods of gathering the information you need, you will be well on your way to better efficiency, but harnessing modern technologies can take you the extra mile.

Technology to the Rescue

Modern advances in technologies such as natural language processing have given rise to a number of intelligent tools for different situations. Whether you face thousands of technical specifications, a large pile of CVs, or an uncontrollable amount of customer feedback, there are tools that can help process the data and provide extra insight.

Finding The Right Information In R&D

Collecting a reading list for a new problem takes industrial researchers an average of 3 weeks, and some of the material still turns out to be irrelevant to the project. Artificial intelligence software can speed up such mundane tasks. One exploration tool, for example, builds an interdisciplinary research map based on a problem statement or research paper of your choice and delivers a precise reading list in less than 2 days.

Accessing Emotional Insights

Many retailers need to understand what’s being said about them on social media. For anyone who has been lost down a Twitter or Facebook wormhole, social media can look like a bottomless pit of worthless text.

Even so, if you don’t assess the endless social media posts about your products or services, you won’t be able to optimize your response to issues or fully grasp how your company is performing.

Companies such as Lumoa allow you to track all your customer experience insights from an online platform: finding what drives the customer experience, comparing it with KPIs, and understanding what you are doing well and what you should be doing better.

A more do-it-yourself solution is also available from Amazon. Amazon Comprehend uses sentiment analysis to computationally determine whether a piece of writing is positive, negative, neutral, or mixed. It can be used as part of a serverless, event-driven architecture on Amazon Web Services.
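As a rough illustration of the do-it-yourself route, the sketch below wraps the `detect_sentiment` call of the boto3 Comprehend client. The wrapper name `classify_sentiment` and its `client` parameter are assumptions made for this example; running it against the real service requires the boto3 package, AWS credentials, and a configured region.

```python
def classify_sentiment(text, client=None, language_code="en"):
    """Return (label, scores) for `text` using Amazon Comprehend.

    `client` is any object exposing boto3's Comprehend `detect_sentiment`
    call. When it is omitted, a real client is created, which needs AWS
    credentials and a default region to be configured.
    """
    if client is None:
        import boto3  # imported lazily so the wrapper can be tested offline

        client = boto3.client("comprehend")
    response = client.detect_sentiment(Text=text, LanguageCode=language_code)
    # "Sentiment" is one of POSITIVE, NEGATIVE, NEUTRAL or MIXED;
    # "SentimentScore" holds the confidence for each of the four labels.
    return response["Sentiment"], response["SentimentScore"]
```

In an event-driven setup, a function like this could be called for each incoming piece of customer feedback, with the returned label stored alongside the original text.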

Engineering Document Analysis

On the engineering side, tools such as Selko Analytics extract vital information from text-dense engineering specifications. Engineering departments can easily set up their own machine learning model through the tool and pre-process technical text to spot contractual risks, streamline procurement activities, and categorize tasks by groups or architectural levels.

Industry standards generate a steady influx of data that needs processing, analysis, and implementation, particularly given the complexity of safety and engineering requirements. Time is often of the essence, and engineering specifications are large written documents that need to be processed before new projects can get underway.

Selko Analytics utilises its intelligent search features to locate items of text belonging to certain groups. With its user-trained machine learning technology, text can be categorized properly after analysing only 100 samples.

Requirements and regulations can be assessed with more efficiency and detail. All results can be verified by a person before export, and any changes made will be learned to produce better results next time.
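The general idea of a user-trained categorizer that keeps learning from a reviewer's corrections can be sketched with a small naive Bayes classifier. This is only an illustration of the concept, not Selko's implementation; the class name, the labels, and the example sentences are all made up.

```python
import math
import re
from collections import Counter, defaultdict


class RequirementClassifier:
    """Tiny multinomial naive Bayes text classifier that learns incrementally."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.label_counts = Counter()            # label -> number of examples
        self.vocabulary = set()

    @staticmethod
    def tokenize(text):
        return re.findall(r"[a-z0-9]+", text.lower())

    def learn(self, text, label):
        """Add one labelled example; also used to record a reviewer's correction."""
        words = self.tokenize(text)
        self.word_counts[label].update(words)
        self.label_counts[label] += 1
        self.vocabulary.update(words)

    def predict(self, text):
        """Return the most probable label for `text` (Laplace-smoothed)."""
        if not self.label_counts:
            raise ValueError("no training examples yet")
        words = self.tokenize(text)
        total_examples = sum(self.label_counts.values())
        vocab_size = len(self.vocabulary) or 1
        best_label, best_score = None, -math.inf
        for label, n_examples in self.label_counts.items():
            total_words = sum(self.word_counts[label].values())
            # Log prior plus smoothed log likelihood of every word.
            score = math.log(n_examples / total_examples)
            for word in words:
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Usage: train on a few samples, then feed a correction back in.
clf = RequirementClassifier()
clf.learn("The pump shall withstand a pressure of 16 bar", "mechanical")
clf.learn("The supplier shall pay liquidated damages for late delivery", "contractual")
print(clf.predict("Pressure rating of the pump"))
clf.learn("Warranty period is 24 months", "contractual")  # reviewer's correction
```

Because `learn` is the same call for initial samples and for later corrections, every verified result makes the next prediction slightly better informed, which mirrors the review-then-export loop described above.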

Solutions Are Out There

Locating the important information in large documents with impenetrable walls of text is frustrating for any professional with a need to get projects and initiatives off the ground.

Thankfully, there are plenty of methods to help you through this difficult process. The manual “scanning” methods employed by students help with large documents, but modern technology tools can also be utilised for improved efficiency and productivity.

Text-based data can be hard to keep consistent and to navigate, and walls of pointless text are frustrating, but plenty of solutions exist to cut through the jargon and get right to the point.

What is Natural Language Processing?

Have you ever wondered how services like Siri, Alexa, Cortana, and Google’s assistant work? Perhaps you’re content with a little mystery in your life, but if you want to learn about a new technology that is revolutionizing many industries, read on.

Natural Language Processing (NLP) is becoming increasingly ubiquitous across many devices with new uses emerging frequently. This article gives a quick look at the fundamentals of NLP, what it’s used for and different techniques that make it possible.

What is NLP?

NLP is the branch of artificial intelligence (AI) that is responsible for developing ways for machines to understand human language. Its development has been driven by the growth in big data, machine learning, computational linguistics, computer science and the desire to have more human to machine interaction.

A basic human-computer interaction with NLP may look like this:

• Human speaks to the computer
• The computer captures audio and converts it to text
• Computer processes “translated” text
• The computer converts it back to audio and “speaks” to human


With advances in technology, NLP is capable of analyzing large volumes of language-based data in a consistent way.

How is NLP Used?

Beyond requesting songs to be played or marked as your favorites, NLP can do much more. Some of the common ways NLP is currently used include:

• Power chatbots that automate customer service and ordering
• Improve search results
• Make text processing faster
• Create advertisements
• Provide suggested responses to texts and emails
• Extract information from websites
• Answer complex questions
• Translate text and analyze sentiment

At this stage of development, it’s likely that companies have only scratched the surface of what NLP is capable of. As machines and algorithms get more powerful and complex, it’s likely that uses of NLP will expand beyond what we can imagine today.

What are Different Techniques of NLP?

While language and communication, in general, rely on syntax, semantics and pragmatics analysis, NLP needs slightly different techniques to carry out its impressive accomplishments.


Classification

This technique separates records into different groups or categories based on labels or codes.


Summarization

Summarization is the process by which NLP can extract a key sentence or develop a short and accurate summary of a longer piece of text.
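The “extract a key sentence” variant can be sketched with simple word counts. The example below is a minimal illustration, assuming English text and a tiny hand-picked stopword list; production systems use far more sophisticated models, and the function name `summarize` is invented for this sketch.

```python
import re
from collections import Counter

# A deliberately tiny stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "are",
             "it", "that", "this", "for", "on", "with", "as", "be", "by"}


def content_words(text):
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def summarize(text, max_sentences=1):
    """Return the highest-scoring sentences of `text`, in original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]
    frequency = Counter(content_words(text))

    def score(sentence):
        tokens = content_words(sentence)
        # Average word frequency, so long sentences are not favoured automatically.
        return sum(frequency[w] for w in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:max_sentences])]
```

A sentence scores highly when it is dense in the words the document repeats most, which is a crude but often useful proxy for “what this text is about”.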


Clustering

This technique organizes documents or records within a classification group. It creates clusters within the broader labels.


Extraction

NLP also relies heavily on extracting data, keywords, keyphrases and other text.
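One common way to pull keywords out of a document is TF-IDF, which favours terms that are frequent in one document but rare across the rest of the collection. The following is a minimal sketch of that idea with made-up function names; it is one technique among many, not a description of any particular product.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def top_keywords(documents, index, k=3):
    """Return the k highest TF-IDF terms of documents[index]."""
    tokenized = [tokenize(d) for d in documents]
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    n_docs = len(documents)
    counts = Counter(tokenized[index])
    scores = {
        term: (count / len(tokenized[index])) * math.log(n_docs / doc_freq[term])
        for term, count in counts.items()
    }
    # Sort by score, then alphabetically so that ties are deterministic.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]
```

Words that appear in every document, such as “the” or “shall” in a set of specifications, receive a score of zero and drop out without needing a stopword list.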


Similarity

This technique is frequently used by search engines to match similar, duplicate or near-duplicate words or phrases. It’s a way NLP can be leveraged to find similarities between different records.
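Near-duplicate matching is often illustrated with Jaccard similarity over word “shingles” (overlapping word sequences). The sketch below assumes two-word shingles and invented function names; search engines use more scalable variants of the same idea.

```python
import re


def shingles(text, size=2):
    """Return the set of overlapping `size`-word sequences in `text`."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a, b, size=2):
    """Share of shingles the two texts have in common (0.0 to 1.0)."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa and not sb:
        return 1.0  # two empty texts are treated as identical
    return len(sa & sb) / len(sa | sb)
```

A score of 1.0 means the two records contain exactly the same shingles, while scores close to 1.0 flag likely near-duplicates, for example the same requirement repeated with one extra word.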

Sentiment Analysis

Sentiment analysis is the technique by which NLP can understand the nuance of and emotion behind what a human is trying to say.

Semantic Analysis

This technique helps machines learn and understand contextual clues humans give when they speak.

Does Your Business Need NLP?

Most likely. If you’re not currently using AI in any capacity, it may be time to catch up.

Many functions within your business – sales, marketing, finance, operations, etc. – could benefit from the adoption of AI and natural language processing.

Selko at Slush 2018 – We’re Excited, and You Should Be Too

Slush 2015, photo by Jussi Hellsten

Selko is excited to be joining Slush 2018, since our experience at last year’s event was nothing short of amazing. Out of thousands of startups, we were chosen as a top 10 finalist in the Slush 100 Showcase (their pitching competition). To say we got the most out of attending Slush 2017 is an understatement.

This time around, as a seasoned attendee and a more mature business overall, we can enjoy this year’s event even more than before. Slush started off as a 300-person event a few years ago and has now grown to 20,000 tech heads converging in Helsinki. This year we are focusing on meeting potential partners, industry reps and even possible recruits at the event. That kind of ambiance and energy not only brings great talent into one space but also provides a great environment for networking and making valuable contacts.

Tuomas Ritola from Selko pitching on stage, Slush 2017

For More Than Inspiration

Tech events are known for impressive lineups with largely successful names. The great thing about Slush is that not only do they deliver just that, but you also get to dig into real world practices and hands-on advice from these great founders, leaders and startup pioneers. Slush is all about bringing true changemakers to the stage. Here are a few speakers from this year’s lineup:

• Dr. Werner Vogels, CTO at Amazon
• Bill Ready, COO at PayPal
• Julia Hartz, Co-Founder and CEO of Eventbrite
• Katarina Berg, Chief Human Resources Officer at Spotify

The Journey Behind Development

Slush not only knows how to create a great on-stage experience, but off stage there are ample opportunities for companies to share their ideas, products and even their stories. Startup District, Founders Mingle and even Speed Mentoring and Roundtables are just some of the amazing encounters that can happen from Dec 4-5.

And because of all the great contacts and leads from last year’s event, we’ve been in a great position to develop our shiny new MVP (minimum viable product), and we’re excited to show it off at Slush!


In Good Company

Of course, we’d be remiss if we didn’t mention the extensive list of innovative AI and machine learning startups that will be at Slush this year. Once again, we are in great company. A few notable mentions include:

• Malls of Globe – World market offering smart retail platform as a service.
• BEAD – An AI system that analyzes, optimizes and operates a building’s energy management and operations by measuring real-time occupancy.
• Something Corporation – A personalized medicine company working on data-driven continuous care for chronic pain management.
• Selko Technologies – AI-based software that automates complicated requirements analysis in engineering and can save years’ worth of expensive, repetitive expert work.

Meet Us There

Slush is known for helping the next wave of tech entrepreneurs take their business to the next level. And with so many like-minded tech fanatics in one phenomenal atmosphere, it’s hard not to.

We’re looking forward to all the great new opportunities to connect and even possibly show off our shiny new MVP. We said it before but we’ll say it again: we’re excited to be joining Slush 2018! Hope to see you there!

Data Overload Providing New Opportunities for the EPCM Industry

Every industry is feeling the changes dealt by advancing technology. Everyone wants projects completed quickly, accurately, sustainably, and cost-efficiently. Engineering, procurement, and construction management companies are expected to meet these desires for other companies, ensuring projects are successfully finished on time and under budget.

To keep up with the demands of other companies, EPCM companies will need to embrace the technology that is the driving force behind change in their industry. Automation and AI tools can help EPCM companies meet the expectations of their clients, completing projects without errors and without missing the deadline.

Advancements in Technology Affecting the Industry

Many industries are beginning to rely on EPCM companies to help them with new projects. With a focus on efficiency, there is a demand for ever-increasing speed of project delivery. Earlier deadlines and faster turnaround requirements are forcing EPCM companies to move faster with less wiggle room in the budget.

Advances in technology are the driving force behind changes in the EPCM industry. Tasks can now be completed faster, with fewer errors, and sometimes even automated entirely. Robotics, artificial intelligence, the internet of things, and additive manufacturing are all impacting how business is conducted and will likely continue to change the business landscape.

For example, drones in a warehouse can monitor and track inventory much more efficiently than people. 3D printing can reduce the time it takes to create products.

Increasing efficiency and decreasing project timelines are goals that a company in any industry could demand. EPCM companies need to expect these demands and work to improve their own efficiency to stay ahead.

Sustainability is a growing concern felt through all industries as well. Construction produces carbon emissions and uses a large amount of resources. Finding a way to reduce emissions and use sustainable materials is a goal of many companies. These goals mean that the previous way of doing business could be changing. For example, cheap material suppliers may no longer be the best option if they are not using sustainable material. This alone can cause upheaval in vendor selection, with contracts and offer documents needing to be read closely for these new demands.

Text Data at the Core

Engineering, procurement, and construction management rely heavily on contract text, especially in procurement. EPCM companies have to go through the process of evaluating a selection of potential suppliers before deciding which ones will be the best choices for the project. They may receive multiple offer documents from different companies, and a decision needs to be made quickly for the project to continue.

These tender documents are often hundreds of pages long. When multiple companies submit offers, they all need to be read quickly and correctly to make the best decisions. If a section is missed or misunderstood, the chosen supplier may not really be the best candidate for the job, and a regulation or requirement may go unmet.

Because of the advances in technology, with automation and digitalization driving down the timeline of projects, EPCM companies need to implement automation and digitalization where they can to reduce their own project lengths.

There is so much data that is being created but not being analyzed or implemented to the extent that it could be. Companies that learn how to use the value their data possesses will continue to thrive. For EPCM companies, data can reveal information all through the value chain and can help improve efficiency.

Learning to harness new technologies can also open up new markets. With modern tools, it can become feasible to send an offer to a customer who was previously out of reach because of a high barrier to entry.

Working across borders or in multiple jurisdictions can also create issues and bring the burden of extra text data to process.

Different areas may have different regulations or requirements, which create a large amount of extra work before an offer can be made or a project can commence.

All in all, whether it is tenders in procurement, offers to customers or regulatory text, processing this text quickly and accurately is important. However, organizing and analyzing all of the technical text is a huge task. When dealing with very large documents, finding the important information and sharing sections with relevant engineers can be arduous, and in the worst case, vital information can be missed. Luckily, advancements in technology can also help EPCM companies move through project workflows without errors and without breaking the budget.

New Tools, New Opportunities


4castplus

4castplus allows an entire team – engineers, project managers, project controls, and procurement – to work together and collaborate within a single system. It includes a suite of tools to keep EPCM companies on top of projects, with a full lifecycle procurement system, project controls, and customizable notifications based on system events.

Selko Analytics

Selko Analytics has developed Artificial Intelligence software that can automate the processing and categorizing of text data documents. Machine learning allows the software to be customized to a specific company’s needs, and the software will learn from any changes or corrections a user makes.

This will reduce the amount of time and people needed to read through technical text documents; they simply need to verify the software’s results at the end. Any corrections made are “learned” by the program and will ensure more accurate results with each use.


Aconex

Aconex improves workflow processes by providing a single platform on which everyone involved in a project can collaborate. Automated workflows and a project archive reduce time spent duplicating work, allowing personnel to dedicate their time and resources elsewhere. Integrated search tools allow users to find information quickly, and cost and schedule information helps projects remain under budget.

There are many tools that can help engineering, procurement, and construction management companies meet the expectations of their clients. Technology is advancing so quickly that EPCM companies will need to embrace it and use technologies such as AI and automation to their advantage to do business faster, more accurately, more sustainably, and more cost-efficiently.
