Your business is making decisions based on only 20% of the information it has access to, because the other 80% is unstructured and, until now, could not be fully utilized. It is about time we start making decisions for our companies based on all the information we have, not only 20% of it. Anything else would be quite stupid, wouldn’t it?

Companies have tried to make sense of unstructured data for ages, but 78% state that they have little or no insight into their unstructured data.

It is an understatement to say that most of the world's information is completely unutilized and hidden in the dark.

You might think “I know my data” or “We can search our documents”, but that is not the same as getting value from the information.

What is unstructured data?

It might be clear to many, but just so we are all on the same page: unstructured data is images, video, sound and documents such as blog posts, news articles, Word documents etc.

An image is most often stored with only some metadata attached. That metadata tells us only the date and time the photo was taken and sometimes, if the camera has the feature, where it was taken. But what is in the photo? The most important information is completely hidden from us, and we need to look at the picture manually to decide what it contains. It is the same with video.

For text, it is hard to find entities, sentiment, emotions and categories, and also how they actually relate to each other. Which information is the most significant in a text, and how does it relate to a target entity?

But I have Google Search?

Yes, we all do, but let us try an example. If we let Google's algorithm read all the Harry Potter books and at the same time let a cognitive system like Watson read the books, what will the difference be?

A simple yet powerful result is that one of them will be able to answer this question:

“Which house in the Harry Potter books is evil?”


It is not stated in the books that the evil house is Slytherin, but we all know it because we have read, reasoned and decided that Slytherin is all bad and Gryffindor is good. That is an advanced example, but it still puts a finger on the difference.

Google can deliver this result as well, but only if someone has actually written in a text that Slytherin is evil.


In a company context then?

If we translate this to a company, we could have thousands of reviews of our products stored in documents and still not know which product is most appreciated and why. Most reviews rely on stars, numbers etc. to produce a rating, but that does not tell us anything about context.

If those reviews were enriched with cognitive information, the answer would be only a search away.

Other examples are accident reports for insurance companies, customer support, legal texts (laws, regulations etc.), social media, integrating unstructured data into business analytics and predictive analysis, medical research, product information, marketing and communication etc.

Examples – Getting value from your unstructured data

Example of getting value from unstructured text: a customer survey, customer feedback or similar. You receive a 2-star review, and that is not good. What you miss when you cannot pull value from your unstructured data is that the comment says “The product was a broken unit, but Julia really went the extra mile to fix it”. Or you receive a three-star review with the comment “Your opening hours make it impossible for me to contact you in any way, even though I love your product, I actually just bought 2 new ones”.

What a traditional system misses is the following:

  1. Julia did a great job
  2. Reason for the 2 star was that the product was broken
  3. Opening hours are bad which pulls down the stars
  4. The average rating in the three-star review had nothing to do with the product itself, which seemed to be a 5-star experience.
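To make the gap concrete, here is a minimal sketch in Python. It assumes the comments have already been enriched with a sentiment score per mentioned target (from -1 to 1). The `EnrichedReview` type, the `hidden_insights` helper and the scores are hypothetical illustration values, not output from a real service.

```python
from dataclasses import dataclass


@dataclass
class EnrichedReview:
    stars: int
    comment: str
    # Hypothetical enrichment: sentiment per mentioned target, from -1.0 to 1.0.
    target_sentiment: dict[str, float]


def hidden_insights(review: EnrichedReview, low_stars: int = 3) -> dict[str, list[str]]:
    """For a low-star review, split the mentioned targets into praised and criticized."""
    if review.stars > low_stars:
        return {"praised": [], "criticized": []}
    praised = [t for t, s in review.target_sentiment.items() if s > 0]
    criticized = [t for t, s in review.target_sentiment.items() if s < 0]
    return {"praised": praised, "criticized": criticized}


reviews = [
    EnrichedReview(
        2,
        "The product was a broken unit, but Julia really went the extra mile to fix it",
        {"Julia": 0.9, "product unit": -0.7},  # made-up scores
    ),
    EnrichedReview(
        3,
        "Your opening hours make it impossible for me to contact you in any way, "
        "even though I love your product, I actually just bought 2 new ones",
        {"opening hours": -0.8, "product": 0.9},  # made-up scores
    ),
]

for review in reviews:
    print(review.stars, hidden_insights(review))
```

The star rating alone says "bad"; the per-target view says who or what deserves the praise and the blame.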

That is without a doubt important information for any company working with customer experience.

Example of getting value from unstructured data in images: Your ad agency has taken a bunch of photos of your new products and it is time to add them to the product information data. Often the photos and the product data are disconnected, but no more; now the process can be streamlined. Put in an online perspective, this will not only make your process more efficient, it will also increase conversion and sales. Why?

  1. If a potential customer is looking for a yellow chair, they will find it immediately instead of browsing through pages of chairs of different colors and sizes.
  2. Since the unstructured data has become structured, your visibility in Google results will increase significantly and your customers will find the yellow chair much faster.
  3. Added value and upselling. Since we know the image contains a yellow chair, we can now automatically add value by showing products that fit well with the yellow chair, and not only additional chairs as is often the case today.
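As a rough sketch of points 1 and 3, assume an image-recognition step has already attached tags to each product. The `Product` type, the tag values and both helper functions below are hypothetical; they only illustrate what becomes possible once the image content is structured.

```python
from dataclasses import dataclass, field


@dataclass
class Product:
    name: str
    # Hypothetical tags produced by an image-recognition step.
    image_tags: set[str] = field(default_factory=set)


def search(catalogue: list[Product], *wanted: str) -> list[Product]:
    """Return products whose image tags contain every wanted term."""
    terms = {w.lower() for w in wanted}
    return [p for p in catalogue if terms <= p.image_tags]


def suggest_complements(catalogue: list[Product], product: Product, category: str) -> list[Product]:
    """Suggest products outside `category` that share at least one other tag."""
    shared = product.image_tags - {category}
    return [
        p for p in catalogue
        if p is not product and category not in p.image_tags and shared & p.image_tags
    ]


catalogue = [
    Product("Sunny chair", {"chair", "yellow"}),
    Product("Oak chair", {"chair", "brown"}),
    Product("Lemon cushion", {"cushion", "yellow"}),
    Product("Steel lamp", {"lamp", "grey"}),
]

hits = search(catalogue, "yellow", "chair")
print([p.name for p in hits])
print([p.name for p in suggest_complements(catalogue, hits[0], "chair")])
```

A real shop would use a search index rather than a list scan, but the principle is the same: the tags make the photo searchable.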

Example with internal company documents: You have thousands of customer reviews, but they are only available in document format and are poorly tagged; you only get product, date and some other basic metadata. You cannot get an overview of which products are having problems and with what, or which products are highly appreciated and why. Is a specific issue recurring? If you enrich all your reviews with cognitive capabilities you will get the following (please note that this is not a huge effort):

  1. Dashboards with a clear overview of all products and how they are perceived, with a score.
  2. If products have problems, the actual problem is identified for each affected product.
  3. Attached images can be analyzed. Products, color, model, issues etc. can now be identified.
  4. …well, you get it.
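The dashboard in point 1 and the problem overview in point 2 are, at their core, a small aggregation over enriched reviews. The sketch below assumes each review has already been reduced to a product, a sentiment score from -1 to 1 and an optional detected issue; the rows and the `product_overview` function are made up for illustration.

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical enriched reviews: (product, sentiment score in [-1, 1], detected issue or None)
reviews = [
    ("Chair A", 0.8, None),
    ("Chair A", -0.6, "broken leg"),
    ("Chair A", -0.4, "broken leg"),
    ("Lamp B", 0.9, None),
    ("Lamp B", 0.5, "late delivery"),
]


def product_overview(rows):
    """Aggregate per product: mean score, review count and most frequent issue."""
    scores = defaultdict(list)
    issues = defaultdict(Counter)
    for product, score, issue in rows:
        scores[product].append(score)
        if issue:
            issues[product][issue] += 1

    overview = {}
    for product, values in scores.items():
        top = issues[product].most_common(1)
        overview[product] = {
            "score": round(mean(values), 2),
            "reviews": len(values),
            "top_issue": top[0][0] if top else None,
        }
    return overview


for product, summary in product_overview(reviews).items():
    print(product, summary)
```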

How do I start to take control over my unstructured data?

All this must be complex, expensive and take ages to get up and running? Not really, the actual enrichment is very straightforward. For text, use the Watson Natural Language Understanding (NLU) service: send text through the API and enrich the document with the response from the API.

You can also bulk upload documents (in many different formats, including .doc, .pdf and HTML) to Watson Discovery Service (WDS) if you want the service to manage the processing for you (ingesting, converting, enriching, storing and also querying). Watson Discovery Service uses NLU for enrichment, but also adds an end-to-end solution. The actual enrichment is the same. Using WDS is a bit more complex, but on the other hand you will have your own cognitive search engine in a box (including a powerful query language) and intuitive tooling.
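The "send text, enrich the document with the response" step could look roughly like the sketch below. It does not call the real service: `analyze` is any function that takes text and returns an analysis, and `fake_analyze` is a hypothetical stand-in so the example runs offline. The response shape (document sentiment, entities, keywords) is an assumption loosely modelled on an NLU-style service; check the service documentation for the exact request and response format.

```python
from typing import Any, Callable

# Assumed response shape, loosely modelled on an NLU-style service.
Analysis = dict[str, Any]


def enrich(document: dict[str, Any], analyze: Callable[[str], Analysis]) -> dict[str, Any]:
    """Return a copy of the document with the analysis attached under 'enrichment'."""
    analysis = analyze(document["text"])
    return {
        **document,
        "enrichment": {
            "sentiment": analysis.get("sentiment", {}).get("document", {}),
            "entities": [e["text"] for e in analysis.get("entities", [])],
            "keywords": [k["text"] for k in analysis.get("keywords", [])],
        },
    }


def fake_analyze(text: str) -> Analysis:
    """Hypothetical stand-in for the API call; returns hand-written values."""
    return {
        "sentiment": {"document": {"label": "negative", "score": -0.4}},
        "entities": [{"type": "Person", "text": "Julia"}],
        "keywords": [{"text": "broken unit"}],
    }


review = {
    "id": 1,
    "stars": 2,
    "text": "The product was a broken unit, but Julia really went the extra mile to fix it",
}
print(enrich(review, fake_analyze))
```

In a real setup, `analyze` would wrap the actual API call, and the enriched document would be stored where your search or analytics tools can reach it.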

If you want to enrich documents with domain-specific information like your own products, domain language etc., it is possible to add custom ML models to both Watson NLU and WDS through Watson Knowledge Studio, which is an easy-to-use interface for building a custom ML model (done by subject matter experts, not programmers).

For images, the approach is similar: send the image to the Watson Visual Recognition API and enrich the image with the response from the API. It is also possible to build your own domain-specific classifiers so that Watson can recognize your products etc.


There are vast amounts of value hidden in unstructured information, and in this post I have tried to give a few simple examples. In each organization there will be easy wins but, as with everything, also more complex cases.

The best value is naturally gained when all information is integrated and put in context.

Today, only 20% of the information in companies is accessible. That is not the most reliable foundation for a business to rely on, so it is time to get started and gain value from ALL your information.