Insights and Learning from the Semasio Team

Man Versus Machine

Written by Kornelija Daugalaite | March 27, 2020

In light of current global events, we at Semasio have been looking closely at how user behavior is changing online, not only to gain insights but also to optimize targets where needed.

While we were analyzing our target groups, Semasio’s platform once again proved to be self-learning and auto-adjusting, even in unprecedented circumstances such as the ones we are in now. You may ask how, and why it matters at all. Well, I thought it’d be a great idea to walk you through just how our system, combined with a little human elbow grease, is able to cope with situations just like this.

Let me give you some background theory first. 

 

The basis of it all: Semantic Approach

Semasio helps marketers to create highly flexible and unique targeting products using a variety of inputs. We believe the most precise and unbiased way to understand users is through the actual content they consume on a daily basis. This is why our Semantic Approach is grounded in dynamic Natural Language Processing and language-based probabilistic modeling, in combination with industry know-how. 

The Semasio Language Model, or rather its underlying algorithm, constantly scores all words and phrases in a given language by taking into account how frequently each keyword appears in editorial content in that language. The nature of any language, however, is that it changes: words, phrases, and buzzwords come into being and are propagated on a day-to-day basis. To accommodate this, the Semasio Language Model is dynamically and automatically maintained to ensure its representation of a given language is always up to date.

For every page analyzed, our system creates and stores a dynamic Semantic Page Profile that contains the most significant keywords, weighted according to their probability of appearing on the given page versus in the overall language model.
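To make the idea of weighting a keyword by its page frequency versus the language model concrete, here is a minimal sketch. It is not Semasio's actual algorithm: the function name `page_profile`, the log-ratio score, the `floor` value for unseen words, and the toy frequencies are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def page_profile(page_tokens, language_freq, top_n=5, floor=1e-6):
    """Hypothetical page profile: keep the words that are more frequent
    on this page than in the language overall, scored by log-ratio."""
    counts = Counter(page_tokens)
    total = sum(counts.values())
    scores = {}
    for word, count in counts.items():
        page_freq = count / total
        # Assumption: words missing from the language model get a tiny floor frequency.
        background = language_freq.get(word, floor)
        scores[word] = math.log(page_freq / background)
    # Only words that are over-represented on the page count as significant.
    significant = {w: s for w, s in scores.items() if s > 0}
    ranked = sorted(significant.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:top_n])


# Toy data (invented): relative frequencies of words in the language as a whole.
language_freq = {"the": 0.5, "beer": 0.01, "brewery": 0.001}
tokens = ["beer", "beer", "the", "the", "the", "brewery"]
profile = page_profile(tokens, language_freq)
```

In this toy example "the" is exactly as common on the page as in the language, so it drops out, while "brewery" and "beer" remain as significant terms.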

When a user consumes the page, its Semantic Page Profile is integrated into that user’s Semantic User Profile. With hundreds of pages consumed by each user every month, this Semantic User Profile grows and changes, capturing the essence of what is important to the user right now.
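One way to picture a user profile that "grows and changes" is to fade older interests slightly each time a new page is merged in. The sketch below does this with a simple decay factor. The function name `integrate_page` and the decay mechanism are assumptions for illustration only; the text above does not say how the integration is actually performed.

```python
def integrate_page(user_profile, page_profile, decay=0.9):
    """Hypothetical update: fade existing keyword weights, then add the
    weights from the page the user has just consumed."""
    updated = {word: weight * decay for word, weight in user_profile.items()}
    for word, weight in page_profile.items():
        updated[word] = updated.get(word, 0.0) + weight
    return updated


# A user reads a beer page, then a travel page (weights are invented).
user = {}
user = integrate_page(user, {"beer": 2.0})
user = integrate_page(user, {"travel": 1.0}, decay=0.5)
```

After the second page, the older "beer" interest has been halved while "travel" enters at full weight, so the profile leans toward what the user is reading right now.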

 

How our system has coped with the challenge  

Now, remember how I mentioned at the beginning that we had been following the progression of our target groups to understand how recent events (that is, the COVID-19 outbreak) were affecting content and user behavior online?

Well, at first we saw an absolute explosion of related keywords in users’ Semantic User Profiles, which was a result of everyone consuming pages containing those keywords. As such, most of the targets we observed had, initially, clear elements of coronavirus-related content.  

However, we soon noticed that the Semasio Language Model was automatically adjusting itself and re-weighting all the words and phrases it was analyzing. Because corona-related keywords were now everywhere, they received a very high frequency score in the model. And if you compare a high number to another high number (in this case, the frequency of a word on a specific page versus its frequency in the overall language model), the relevance of that word on the page decreases. As a result, all the targets we were reviewing returned to their true relevance and served their original purpose.
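The "high number versus high number" effect can be shown with a few lines of arithmetic. This is a sketch under assumptions: the log-ratio `relevance` function and all frequencies are invented for illustration and are not taken from Semasio's model.

```python
import math


def relevance(page_freq, language_freq):
    """Hypothetical relevance score: how over-represented a word is on a
    page compared with the language as a whole."""
    return math.log(page_freq / language_freq)


page_freq = 0.02  # the word makes up 2% of this page in both scenarios

# Before the topic took off: the word is rare in the language overall.
before = relevance(page_freq, language_freq=0.0001)  # log(200), about 5.3

# After the model re-weights: the word is now common everywhere.
after = relevance(page_freq, language_freq=0.01)  # log(2), about 0.69
```

The page itself has not changed, yet the word's relevance falls sharply once the language model registers that the word is common across all content.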

As a result of our model’s adjustment, only pages whose content contains statistically more corona-related words than the language as a whole will have them included as significant terms in their Semantic Page Profiles. And only users who have consumed markedly more of such content will have it dominate their Semantic User Profiles. Most importantly, only targets that are intended to have such content will include it.

 

But will machines always work? 

So we stay cool and proud. Our dynamic approach to page and user profiling is able to adapt to any new big topics and “normalize” user behavior. But what can we do when there is so much content out there treated as “bad,” now that there are corona-related terms all over the place? How do we “normalize” that?  

As astonishing as I find machine learning and artificial intelligence, the human mind is just as astonishing. Frankly speaking, both are equally imperfect. While machine learning is great at constantly aggregating and dynamically processing enormous amounts of information, it does not always know how to make sense of things; for example, it does not understand emotional or social factors. Humans, in turn, are prone to error and bias, and cannot comprehend as much information as quickly.

To respond to the current situation, you might have a brand safety measure in place that excludes every page that has “corona” mentioned in its content. But what if your target group ‘commuters’ is reading an article called “Should I still take the underground to work?” Not only is it OK to show an ad next to such an article; it might even hit the right nerve. Or imagine you want to target beer enthusiasts. In this case, “corona” is also the name of a well-known beer brand, and pages that use it in that context are exactly where beer enthusiasts can be found. Thus, by letting the system decide for you and exclude all corona-related content, you will lose some very relevant pages.
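The difference between a blanket exclusion and a context-aware one can be sketched as follows. Everything here is hypothetical: the function names, the keyword weights, and the rule that human-chosen context terms can outweigh a blocked term are assumptions for illustration, not a description of any real brand safety product.

```python
def blanket_exclude(page_profile, blocked_terms):
    """Exclude any page that mentions a blocked term at all."""
    return any(term in blocked_terms for term in page_profile)


def context_aware_exclude(page_profile, blocked_terms, allowed_context):
    """Exclude a page only if its blocked terms outweigh the context terms
    that a human has marked as acceptable for this target."""
    blocked = sum(w for t, w in page_profile.items() if t in blocked_terms)
    context = sum(w for t, w in page_profile.items() if t in allowed_context)
    return blocked > context


blocked_terms = {"corona"}
beer_context = {"beer", "lager", "brewery"}  # chosen by a human for this target

# Invented page profiles: keyword -> weight.
beer_page = {"corona": 1.0, "lager": 2.0, "brewery": 1.5}
news_page = {"corona": 3.0, "outbreak": 2.5, "beer": 0.2}

beer_blanket = blanket_exclude(beer_page, blocked_terms)
beer_contextual = context_aware_exclude(beer_page, blocked_terms, beer_context)
news_contextual = context_aware_exclude(news_page, blocked_terms, beer_context)
```

The blanket rule throws away the beer page, while the context-aware rule keeps it and still excludes the news page about the outbreak. The human contribution is the choice of `beer_context`, which the system could not be expected to know on its own.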

 

Machines plus humans is the best combo 

In my opinion, combining the two is where the magic happens. This is also what Semasio clients see as our greatest advantage. We let the system do the hard work of analyzing and building your custom target based on the terms pages contain and users consume. Then we fully equip you to become a secret data scientist who flexibly judges and fine-tunes this target to the highest semantic precision.  

Add complete control to the mix and you, not the system, get to decide where and how you want to speak to the target you just created – reaching the users directly, via the pages they are on, or combining the two to extend and unify your targeting strategy.