Hi, I am Danny Merkx, long-time student and now finally an employee at this wonderful university. Having studied Public Administration and then Artificial Intelligence, I have moved between several faculties, finally ending up at yet another one here at the Faculty of Arts. It is a very natural place to end up, however, as the field of language and speech technology offers more than enough challenges for those interested in AI.
My work on computational modelling started with my thesis on modelling the use of durational information in speech recognition. While working on my thesis, I also got the opportunity to do an internship at the Jelinek Memorial Workshop on Speech and Language Technology at Carnegie Mellon University. There we worked on multi-modal learning of word embeddings using speech and images. Multi-modal learning builds on the idea that people’s representations of semantics are not based on text sources alone (as most conventional word embeddings are). In fact, we can all see our environment and understand speech long before we learn to read. This approach got me interested in vector semantics and became a starting point for my PhD research.
As a member of the Language in Interaction consortium involved in the Big Question 1 project, I will investigate computational models of the mental lexicon. Word vector representations such as the Skip-Gram model, GloVe and DeViSE all claim to produce vectors embedded in a rich semantic space. However, there is a lack of universal evaluation metrics for this claim. Understanding what it even means for a representation to be ‘semantic’, and being able to evaluate different approaches on an equal footing, is essential to understanding how we can improve our methods for creating semantic embeddings.
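To give a flavour of what evaluating embeddings ‘on an equal footing’ can look like in practice, here is a minimal sketch of one common approach: scoring word pairs by cosine similarity in an embedding space and checking whether the model ranks related pairs above unrelated ones. The three-dimensional vectors below are made up purely for illustration; real evaluations use trained embeddings and human similarity ratings (e.g. SimLex-style benchmarks).

```python
# Toy sketch of similarity-based embedding evaluation.
# All vectors here are hypothetical; real embeddings are high-dimensional
# and trained on large corpora (or, in multi-modal setups, speech and images).
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical 3-d embeddings for three words.
embeddings = {
    "two":   [0.9, 0.1, 0.0],
    "three": [0.8, 0.2, 0.1],
    "dog":   [0.0, 0.9, 0.4],
}

# A semantically sensible space should place 'two' closer to 'three'
# than to 'dog'.
pairs = [("two", "three"), ("two", "dog")]
model_sims = [cosine(embeddings[a], embeddings[b]) for a, b in pairs]
print(model_sims)
```

In a full evaluation, such model similarities are rank-correlated with human judgments over many word pairs, which makes it possible to compare different embedding methods with a single number.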
My first starting point will be the semantics of numbers. We know quite well what it means for there to be ‘two’ or ‘three’ of something and what ‘more’ and ‘less’ mean, and there are even relatively clear definitions of more abstract concepts such as ‘few’ and ‘many’. My goal is to build a system that can learn these concepts from combined visual and auditory information.