Semantix Killerapplix

Posted on: Mar 12, 2013 by Aljosa Pasic

I am a fan of the Asterix comics series, as are many Europeans. I tend to think it is part of a European (and not only a French) education, but only recently did I start re-reading the Asterix comics in different languages. In English, “Fulliautomatix” is the name of the smith of Asterix’s village. His father was “Semiautomatix”. In French he is “Cétautomatix” (c'est automatique, meaning "it is automatic"), very similar to “Esautomátix” in Spanish. In Dutch, however, he is called “Hoefnix”, which gives us the first example of what I call “enhanced semantics”: “hoef” means "hoof", and the Dutch phrase “ik hoef niks” means "I don't need/have to do anything". Even better, and funnier, is the name of Obelix’s dog: “Idéfix” (idée fixe, a "fixed idea" or "obsession"). This is also his name in Spanish, but in English they went for “enhanced semantics”: Dogmatix comes from dogma, which is very similar to a “fixed idea”, while also containing the word "dog". Personally, I find these names in the Asterix comics very funny because of their clever “double meaning” charm, a device also used in translations of other works (e.g. Harry Potter).

I decided to test the Google Translate service: with English, French and Spanish it passes the test of translating the character names correctly, although it fails with Dutch (e.g. it does not recognize Hoefnix). Let me make it harder! Typing “I was in Newcastle last year” and translating it into Spanish returns the correct translation. However, when I typed “I was in New Castle last year” (there is a city of that name in Delaware, USA), I got the wrong translation “Yo estaba en el castillo de nuevo el año pasado”, which treats “New” as an ordinary word instead of part of a place name (“de nuevo” actually means “again”, so the result reads roughly “I was in the castle again last year”).

Machine Translation (MT) research started soon after the invention of computers, with the first attempts at automatic translation dating back to the 1940s. Currently, most commercial MT systems use a traditional rule-based (RBMT), a statistical (SMT) or a hybrid RBMT-SMT approach, but they still have to cope with semantic problems, that is, with the system's capacity to assign the right meaning (or multiple meanings) to words or phrases. This is why back in 2000, when I first learned about the use of semantic web technologies in natural language processing and MT in the MKBEEM project (Multilingual Knowledge Based European Electronic Marketplace), I immediately thought: “MT will be the killer app for semantic web technologies”.

Fast forward to 2006. I found a video lecture from the 3rd European Semantic Web Conference in which I was asked about the killer app for Semantic Web technologies. My favorite “killer app” candidate at that time was a “semantic interoperability problem solver”. More killer app candidates appeared, such as web service discovery and composition, or opinion mining/sentiment analysis. However, semantic web technologies still remain rather “obscure” and unknown, which makes them hard to sell to clients.

And then in 2011 I got a new favorite Semantic Web “killer app”: future prediction!

Think about users such as financial analysts, investment managers, market regulators and financial advisors, and their need and ability to quickly identify and interpret relevant information. Their job is to identify dynamically evolving and potentially risky situations (e.g. shocks and crashes). If only we could predict how the stock exchange will behave tomorrow…

The problem essentially comes down to processing vast amounts of (unstructured) information and reasoning about it, something that, at least in theory, can be solved with semantic web technologies and powerful cloud computing resources. The “Big Data” project called FIRST tries to do exactly this. By gathering “open source” unstructured information, such as Twitter comments, and applying sentiment analysis technology, the solution is able to tell whether the general opinion about, for example, the Euro is positive or negative. But that is not all! By placing this information on a “sentiment timeline”, we get a diagram of the general “positive” or “negative” tendency, which can be used to predict, for example, stock exchange behavior in the near future. You can check out the demo, which is accessible from the project website. So, does it really work, and if it does, why aren't all the project members already rich? I spoke to Tomas Pariente, the project coordinator, and he gave me a few explanations.
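To make the “sentiment timeline” idea a bit more concrete, here is a minimal Python sketch. It assumes each message has already been scored by some sentiment classifier on a scale from -1 (negative) to +1 (positive); the function names, the per-day bucketing and the moving-average smoothing are illustrative assumptions, not a description of how FIRST actually works.

```python
from collections import defaultdict
from datetime import date


def sentiment_timeline(scored_messages):
    """Average the sentiment scores of all messages per day.

    scored_messages: iterable of (day, score) pairs, with score in [-1, 1]
    (assumed to come from an upstream sentiment classifier).
    Returns a list of (day, mean_score) pairs sorted by day.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for day, score in scored_messages:
        totals[day] += score
        counts[day] += 1
    return [(day, totals[day] / counts[day]) for day in sorted(totals)]


def moving_average(values, window=3):
    """Trailing moving average that smooths daily noise into a tendency."""
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


if __name__ == "__main__":
    # Hypothetical scores for messages mentioning "the Euro".
    messages = [
        (date(2013, 3, 1), 0.6), (date(2013, 3, 1), 0.2),
        (date(2013, 3, 2), -0.4), (date(2013, 3, 2), -0.8),
        (date(2013, 3, 3), -0.5),
    ]
    timeline = sentiment_timeline(messages)
    trend = moving_average([score for _, score in timeline])
    for (day, score), smoothed in zip(timeline, trend):
        print(day, round(score, 2), round(smoothed, 2))
```

The smoothing step is one simple choice among many; its only purpose here is to show how individual opinions turn into a tendency that can be plotted over time.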

The “perfect” correlation between sentiments/opinions on the Internet and stock exchange (SE) behavior (tested, e.g., against the Dow Jones) works in one direction: if an important SE event happens, there is practically always a very strong positive or negative peak in the FIRST sentiment timeline. Unfortunately, this is not always the case in the opposite direction, for three reasons. First, sentiment peaks are only one of many indicators that influence the SE. Second, there is reaction time: SE changes happen too often and/or too fast. Finally, there is misleading information which, even when it does not amount to a clearly forbidden practice (such as market abuse or pump-and-dump schemes), prevents sentiment analysis from being considered a truly risk-free prediction technology. As a security expert, I see these practices as so-called cognitive or semantic hacking attacks, an increasing threat to the credibility and openness of the Internet in general. Come to think of it, the new killer app might be detecting and fighting this type of cognitive attack. A solution based on semantic web technologies that supports reliability metrics and detects information provenance could be useful for supporting data sharing, determining information quality, or simply finding and chasing the bad guys who publish false or fake information on the Internet.
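The one-directional correlation can be expressed as two conditional frequencies: how often a market event coincides with a sentiment peak, versus how often a sentiment peak coincides with a market event. The Python sketch below computes both from a sentiment timeline and a set of event days. The peak threshold, the exact same-day matching and all names are assumptions chosen for illustration, and the example data is made up.

```python
def sentiment_peaks(timeline, threshold=0.5):
    """Return the days whose mean sentiment is strongly positive or negative.

    timeline: list of (day, mean_score) pairs; threshold is an assumed cut-off.
    """
    return {day for day, score in timeline if abs(score) >= threshold}


def directional_hit_rates(peak_days, event_days):
    """Return (P(peak | event), P(event | peak)) using same-day matching."""
    overlap = peak_days & event_days
    p_peak_given_event = len(overlap) / len(event_days) if event_days else 0.0
    p_event_given_peak = len(overlap) / len(peak_days) if peak_days else 0.0
    return p_peak_given_event, p_event_given_peak


if __name__ == "__main__":
    # Hypothetical data; days are plain integers for brevity.
    timeline = [(1, 0.7), (2, -0.9), (3, 0.1), (4, 0.6), (5, -0.6), (6, 0.0)]
    market_events = {1, 2}            # days with an important SE move
    peaks = sentiment_peaks(timeline)  # {1, 2, 4, 5}
    print(directional_hit_rates(peaks, market_events))  # (1.0, 0.5)
```

In this toy data every market event coincides with a peak (1.0), but only half of the peaks coincide with an event (0.5), which mirrors the asymmetry described above. Given the reaction-time problem, a real analysis would probably need a time window or lag rather than exact same-day matching.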

So, going back to “semantic promises” and potential killer apps: there is no doubt that machines are getting smarter thanks to these technologies. Processing unstructured data (e.g. text) and even knowledge itself, using processes similar to human deductive reasoning and inference, is changing the shape of the Internet as we know it. Today, machines not only understand synonyms and acronyms, but can also get the right meaning of homonyms and heteronyms. Tomorrow, they might be able to laugh alongside you while reading Asterix comics.

I started this post with my favorite comic and I will finish with it: “Ekonomikrisis” is one of the lesser-known characters, a Phoenician merchant who helps Asterix and Obelix travel to and from Rome. When I typed his name into Google Translate, I got the message “Did you mean economic crisis?”. Was the machine joking, or was it just testing me?


About Aljosa Pasic

Business Development Director
Aljosa Pasic's current position is Technology Transfer Director at Atos Research & Innovation (ARI), based in Madrid, Spain. He graduated in Information Technology from the Electrotechnical Faculty of the Technical University Eindhoven, The Netherlands, and worked for Cap Gemini (Utrecht, The Netherlands) until the end of 1998. In 1999 he moved to Sema Group (now part of Atos), where he has held different managerial positions. During this period he has participated in more than 50 international research, innovation or consulting projects, mainly related to information security or e-government. He is a member of the EOS (European Organisation for Security) Board of Directors, and collaborates regularly with organisations such as ENISA, IFIP, IARIA and others.
