A Great Simian or just a Monkey

Google Natural Language vs Watson Natural Language Understanding

The competition in understanding natural language from unstructured text is thickening. Google just launched two new features for their Google Natural Language API, categories and sentiment. Those have been in the Watson Natural Language Understanding API for a while now, but let us see how the two APIs compare to each other overall.

Let us start with a head to head comparison with a real example.

Google Natural Language API vs Watson Natural Language Understanding Head-to-Head

I thought an article about another player in the game would be in order, so I entered a Fast Company article about Microsoft CEO Satya Nadella: “Satya Nadella Rewrites Microsoft’s Code”.

I will use the demo interfaces for both services; they can be found here for Google NL and here for Watson NLU. The only difference I could find in how you post information to the two services is that with Watson you can simply post a URL to the API and Watson does the rest. It is a simple feature that makes analyzing web pages much easier, but the result is the same, and someone has probably already built something similar for Google NL and put it on GitHub. If you do try the services, I suggest looking at the actual API results as well, not only the demo interfaces, since those show only parts of the results.
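To make the difference in how you post to the two services concrete, here is a minimal sketch of the two JSON request bodies. The field names follow my reading of the public REST documentation for each service and should be checked against the current API references; the helper names and the example URL are made up for illustration, and no network call is made.

```python
import json


def google_nl_request(text):
    """Body for a Google NL call: the text itself has to be supplied."""
    return {
        "document": {"type": "PLAIN_TEXT", "content": text},
        "encodingType": "UTF8",
    }


def watson_nlu_request(url, features=("entities", "categories", "sentiment")):
    """Body for a Watson NLU analyze call: a URL is enough."""
    return {
        "url": url,
        # Each requested feature maps to an (empty) options object.
        "features": {name: {} for name in features},
    }


# Hypothetical article URL, used only for illustration.
article_url = "https://example.com/some-article"
print(json.dumps(watson_nlu_request(article_url), indent=2))
print(json.dumps(google_nl_request("Article text fetched separately."), indent=2))
```

With Google NL you would first have to fetch the page and extract its text yourself, which is the extra step that the URL support in Watson NLU removes.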

So, how did they compare?

Document Sentiment: 

Watson NLU returns a 0.19 positive sentiment on document level
Google NL returns a 0 neutral sentiment

…so very similar, which is what I would expect: given the length and depth of the article, a strongly positive or negative score would have been strange.

Winner: Shared victory

Sentiment breakdown

Both services provide a breakdown of sentiment, so it is possible to determine the sentiment for individual entities and so on. Google NL also provides sentiment per sentence, which can come in handy since it puts the sentiment in context directly in the API result.


I started by listing a few entities to compare, but that does not give a great perspective on the capabilities, since those numbers need to be seen in context. Do run it yourself and check the result for details. Overall the two are very similar, naturally with some differences in the results. Watson NLU provides slightly better granularity, but Google NL has the sentence-level result, which is very good.

Winner: Shared victory


Categories

Again, very similar. The major difference is that Google added the news and business categories, while Watson was a bit more rigid and stuck to tech and software. Even though the entities in the article are mainly tech-related, I did like that Google NL classified the article as Business / Industrial with a 0.89 score. Watson NLU did not include any business-related category and classified the major category as /technology and computing/software with a 0.67 score.

Winner: Google


Entities

This one was a bit peculiar. Entity identification is naturally a difficult thing, but I was a bit surprised by the results from Google NL, while Watson NLU was quite solid. Let us just look at the top 4 from each:

Watson NLU (the score is relevance score)

  1. Microsoft, Company, 0.87
  2. Satya Nadella, Person, 0.81
  3. CEO, JobTitle, 0.55
  4. Steve Ballmer, Person, 0.38

Google NL (the score is salience score)

  1. Satya Nadella, Person, 0.47
  2. Microsoft, Organisation, 0.42
  3. learner, Person, 0.02
  4. CEO, Person, 0.01

Two things surprised me. The first was the drop in salience score right after the second entity; it stayed at or near 0 for all the remaining entities. The second was the type assigned to “learner” and “CEO”, and that “learner” was classified in that way at all. Even when I look through the entire list in Google NL, I cannot quite get my head around it.
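The top-4 lists above can also be compared mechanically. The sketch below uses only the numbers from this test; the function name is made up. It compares names and types only, since relevance (Watson NLU) and salience (Google NL) measure different things and are not directly comparable.

```python
# (name, type, score) tuples copied from the two top-4 lists above.
watson_top = [
    ("Microsoft", "Company", 0.87),
    ("Satya Nadella", "Person", 0.81),
    ("CEO", "JobTitle", 0.55),
    ("Steve Ballmer", "Person", 0.38),
]
google_top = [
    ("Satya Nadella", "Person", 0.47),
    ("Microsoft", "Organisation", 0.42),
    ("learner", "Person", 0.02),
    ("CEO", "Person", 0.01),
]


def compare_entities(a, b):
    """Return shared entity names with both types, plus names unique to each list."""
    types_a = {name: etype for name, etype, _ in a}
    types_b = {name: etype for name, etype, _ in b}
    shared = {n: (types_a[n], types_b[n]) for n in sorted(set(types_a) & set(types_b))}
    only_a = sorted(set(types_a) - set(types_b))
    only_b = sorted(set(types_b) - set(types_a))
    return shared, only_a, only_b


shared, only_watson, only_google = compare_entities(watson_top, google_top)
for name, (watson_type, google_type) in shared.items():
    print(f"{name}: Watson={watson_type}, Google={google_type}")
print("Only in Watson top 4:", only_watson)
print("Only in Google top 4:", only_google)
```

Three of the four names overlap. The interesting rows are “CEO”, typed as JobTitle by Watson NLU and as Person by Google NL, and “learner”, which appears only in the Google NL list.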

It also seems that Watson NLU is a bit stronger on business-related types, while Google NL is a bit more focused on consumer types. Watson NLU is clearly more structured.

Winner: Watson

Conclusions of the test

The main differences between the two are that Watson NLU supports more features, like emotions, and that it offers the opportunity to apply custom ML models. This gives Watson NLU the capability of learning entities and relations in your specific domain.

Google NL has the benefit of being straightforward and supporting all its features in all languages, as well as having a bit more granularity in its scores (salience and magnitude).

Is it actually working? I would say that both services are good at what they do, but at this stage I would give the win to Watson due to its more extensive features and the ability to add custom models. This is from an enterprise perspective; if you are in the consumer space it might be worth doing a POC on both. I like how IBM has started to be more modern in its approach with Watson, and I think the two APIs work very similarly. They are open, well documented and easy to work with (please note that I am not a developer).

Also, it is worth noting that much of Watson NLU has been around for a few years now (through IBM's acquisition of AlchemyAPI in 2015). Google has been in the game for many years as well, but not in the enterprise space with a packaged service for natural language. If Google continues to focus on this space, I think they will be a real threat to IBM unless IBM keeps up its pace (which I see as a risk, given that it is IBM).

As of this date I would say Watson NLU is the winner of the test, but I think Google is working hard to package its extreme knowledge in the space, and I expect a lot of progress quickly. So even if Watson is the leader today, it might not be tomorrow. The difference seems to be in the packaging, not the domain expertise.

For a bit of breakdown on pricing, terminology etc, keep reading.

What is Natural Language in this context?

Simply put it is the capability to do text analysis through natural language processing. It gives us the possibility to extract the following:

  • Entities
    Extract people, companies, places, landmarks, organisations and so on.
  • Categories
    Automatic categorization of the text. Both Google NL and Watson NLU have an impressive list of categories. Google states 700 in total; I have not counted Watson's, but it seems to be about the same.
    List of categories for Google NL.
    List of categories for Watson NLU.
  • Sentiment
    Is a text positive or negative? Nowadays it does not stop there; it is also possible to break it down further and target the sentiment at specific entities or words (this differs between Google and Watson, more on that later in the post).
  • Syntax / Semantic Roles
    Linguistic analysis of the text: splitting it into parts and identifying nouns and verbs as well as subject, action, object and so on. The Google Cloud Natural Language Syntax feature seems to be a bit more extensive than the Semantic Roles feature in Watson Natural Language Understanding.
  • Keywords, emotions, and concepts (Watson only)
    Emotions are, well, emotions: joy, anger, sadness and so on. A great feature for customer service or similar products.
    Keywords are words that are important in the text.
    Concepts are words that might or might not appear in the text but reflect a concept.


The two services use similar terminology. Google uses Syntax where Watson uses Semantic Roles, otherwise very similar terminology.

In Watson NLU all results are returned with a confidence score. Google has added two additional things to consider, magnitude and salience. Personally I like the simplicity in only using the confidence score, but naturally, the two other values can provide additional value in some cases.

Confidence Score: A score between 0 and 1; the closer to 1, the more confident the service is. A score above 0.75 is usually considered confident, but that naturally depends on the subject and domain. You do not want a car to be only 75% sure that it is OK to do something, but if a customer service representative gets a ticket that is classified as a Lost Password ticket with 75% confidence, that will do.
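As a small illustration of using a confidence threshold, here is a sketch of the customer service example. The function name and the "manual_review" fallback are made up; the 0.75 default is the rule of thumb mentioned above and should be tuned per domain.

```python
def route_ticket(label, confidence, threshold=0.75):
    """Accept the predicted label when confident enough, else fall back to a human."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    return label if confidence >= threshold else "manual_review"


print(route_ticket("lost_password", 0.75))  # accepted at the default threshold
print(route_ticket("lost_password", 0.60))  # falls back to manual review
# A stricter domain simply raises the threshold.
print(route_ticket("lost_password", 0.75, threshold=0.99))
```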

Sentiment Score: A score between -1 and +1. Close to 0 it is fairly neutral; the closer to +1 the more positive, and the closer to -1 the more negative. Watson also sends a positive/neutral/negative label in the API response, while Google sends only the score. Google Natural Language also sends a Magnitude parameter, a score that complements the sentiment score by telling us how strong the sentiment is.
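Since Google returns only the score, a label has to be derived on your side if you want one. The sketch below shows one way to do it. The width of the neutral band and the magnitude cut-off for calling a text "mixed" are assumptions picked for illustration, not values defined by either service.

```python
def sentiment_label(score, magnitude=None, neutral_band=0.1, mixed_magnitude=2.0):
    """Map a sentiment score (and optional magnitude) to a coarse label.

    neutral_band and mixed_magnitude are illustrative assumptions.
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError("score must be between -1 and +1")
    if score > neutral_band:
        return "positive"
    if score < -neutral_band:
        return "negative"
    # Near-zero score but lots of emotional content: positive and
    # negative passages are probably cancelling each other out.
    if magnitude is not None and magnitude >= mixed_magnitude:
        return "mixed"
    return "neutral"


print(sentiment_label(0.19))  # the Watson NLU document score from the test
print(sentiment_label(0.0))   # the Google NL document score from the test
```

With these assumed thresholds, the two document scores from the test come out as positive and neutral, matching the labels reported above.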

Salience: Shows how central an entity is to the entire provided text or document. It is a score between 0 and 1. This is a good feature if you need to see how “heavy” an entity is in a text. Only available in Google Natural Language.

To see explanations of Google Natural Language terminology as well as examples of JSON results for each of above, do visit Google Natural Language Basics.

To see explanations of Watson Natural Language Understanding terminology as well as examples of JSON results for each of the above, do visit the Watson Natural Language Understanding API reference documentation. There is also an API Explorer if you want to play with the API.

Custom ML-models?

If you are an enterprise this feature is usually very important, since it makes it possible to extract domain-specific entities and relations. If you have built an ML model it is very easy to deploy it to Watson Natural Language Understanding, but I could not find a way to do it with Google Natural Language. Since I am not entirely familiar with the Google APIs I might be mistaken here, so feel free to correct me and point me in the right direction.

It might also be as simple as IBM coming from the enterprise angle, where applying custom models is more of a prerequisite than it is for Google, which comes from the consumer space.

Supported Languages

In terms of AI / Cognitive / Machine Learning, language is always a tricky beast. I have written extensively about what languages Watson understands, and will in this context only compare Watson NLU with Google NL. I would say they are on par with each other on this topic. Watson supports Arabic and Russian, while Google NL supports Chinese (both traditional and simplified). As a Swede, I will give Watson the victory, since Watson NLU actually partially supports Swedish as well, but that is a very biased Watson victory.

Additionally, the comparison here is a bit difficult. As I interpret it, Google NL supports the listed languages for all features in the API, which is very good. Watson NLU has more features but does not support all features in all languages, so depending on your task, one or the other might support it.

Supported Languages for Watson Natural Language Understanding

Supported Languages for Google Natural Language

What is the price for Google Natural Language

Prices are monthly, per 1,000 text records. One text record can contain up to 1,000 Unicode characters. It might seem complicated, but if you have followed my earlier posts on pricing, it is clear that they are all equally complicated. Full details are available at the Google NL pricing site.
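To get a feel for the text-record unit, here is a minimal sketch that counts records for a document. It assumes that a document longer than one record is counted as several records, rounded up; the function name is made up, and actual prices per 1,000 records are left out on purpose since they vary by feature and volume.

```python
import math


def google_text_records(char_count, record_size=1000):
    """Number of text records for one document, assuming round-up per 1,000 characters."""
    return max(1, math.ceil(char_count / record_size))


print(google_text_records(800))     # fits in a single record
print(google_text_records(18_000))  # a long article spans many records
```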

Google NL pricing

What is the price for Watson Natural Language Understanding

Watson NLU is also charged on a per-“block”, per-month price model. They call the blocks units, and a unit is about 10,000 characters, so the units are bigger. IBM also charges per enrichment feature. As an example: if you want an 18,000-character text analysed for entities and categories, that is 4 NLU units (regardless of how many categories or entities are returned). The text spans two units, and each unit is charged once per feature, so two units times two features gives four. If you are looking for pricing for the rest of the Watson APIs, I have a post with a spreadsheet with the cost for all Watson APIs.
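The Watson example can be written out as a small calculation. This is a sketch under the assumption that text is rounded up to whole 10,000-character units and that each unit is charged once per enrichment feature; the function name is made up.

```python
import math


def watson_nlu_units(char_count, feature_count, unit_size=10_000):
    """NLU units for one request: text units (rounded up) times features."""
    text_units = max(1, math.ceil(char_count / unit_size))
    return text_units * feature_count


# The example from the text: 18,000 characters, entities + categories.
print(watson_nlu_units(18_000, 2))
```

For the 18,000-character example this gives the 4 units mentioned above. Under the same assumption, adding a third feature such as sentiment would bring it to 6.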

Watson NLU pricing

Given that the prices for Watson NLU are labeled in Swedish krona (since it is my live Bluemix account I have taken the screenshot from), I also attached a simplified model so it is easy to compare to USD as well.

Conclusion on pricing

This is a tough one, since these models are hard to interpret before you have worked with them live and actually been invoiced, which I have been by IBM Watson but not by Google.

Nevertheless, I get the impression that you get more bang for the buck with Watson in this case. I sense that the free tier is more generous as well. But it is hard for me to come to a clear conclusion here, so it is more of a sense than a fact that Watson gives more bang for the buck. The day I receive an invoice from Google with NL on it, I might update.

Disclaimer: I have been working with the Watson APIs for many years and know them pretty well; I am not as deep into Google's APIs. With that said, I am open to others complementing my analysis and / or conclusions.

Top Image: The image is a wallpaper from the game Crysis 2.



  1. Amit Meena

    Does the pricing differ if I impose a limit on extra features in an NLU item? For example, for an Entities NLU item, if I remove emotion and sentiment from the response, would it affect the pricing or would it remain the same?

    • Hi Amit,
      Yes, every enrichment feature is considered one item, so if you limit the enrichment features, pricing will decrease, if you add additional ones, the pricing increases etc. Having said that, it seems that pricing does not increase or decrease if you add / remove sub-features. In my tests, it seems this way anyway and it seems hard to find any documentation supporting this.

  2. Sophia

    Hi, Please I have a quick question regarding the Watson’s NLU ‘Categories’. Is there a way to tweak it for a situation where it gives a lower score for a right category and a higher score for a wrong category.
    Thank you.

  3. Hello sir,
    This is a really comprehensive study of both the options we have for NLU.
    I just wanted to point out for other readers that:
    1. The “salience score” returned by Google Cloud NLU is NOT the “confidence score or something”. It is the salience score (as you correctly mentioned) which implies the importance/significance of the entity with respect to the document.
    2. Learner is categorized as a person as it helps us build chat-centered apps which need to understand that when a user says something like “The learner left the course…”, the learner is identified as a PERSON.

    It is really great how you provided this blog. Thanks a lot sir.

  4. Dave O

    Loved the article, any chance you will be updating it? Just curious how different the comparison would be today. Or are the two pretty much the same still?

    • Hi Dave,

      Thanks for your comment.

      I agree that an update would be in order, since the space is moving at warp speed and this post is about 1.5 years old. I have not been that good at writing posts in general lately, but hopefully I will get up to speed again soon, and then an update might follow.
