Jan Sedivy: Just some comments: Our projects

Recently I was asked to review the latest developments in our group, and I realised how much work we have done. I also noticed that I have been neglecting my blog. Let's fix that!

First, the best news: over the last year our group has grown to five PhD students and around ten MSc students working in machine learning.

Today I would like to start with part one and describe some of our progress in machine learning. In the second part I will describe our IoT effort. The main machine learning topics can be broken down into the following categories:

Natural Language Processing

Question answering (YodaQA)
Intelligent assistants
Sentence pair similarity
Multinomial classification

Information retrieval

Learning to rank

Information extraction

Focused crawling

Convolutional Neural Networks

Combining text and images
Image labeling

I'll start with our major achievement, the YodaQA question answering machine. It is an open-source question answering system that implements state-of-the-art methods of information extraction and natural language understanding to answer human-phrased questions! You can try the live demo.

Alongside YodaQA, we have also worked on simpler intelligent assistants that act on a smaller set of commands. They rely on simpler algorithms that find the most similar answer to a given query. Sentence pair similarity is another topic of interest. Such algorithms can help solve not only Answer Sentence Selection but also other interesting problems, such as Next Utterance Ranking, Semantic Textual Similarity, Paraphrase Identification and Recognizing Textual Entailment. We have developed and tested a series of algorithms based on word embeddings and different neural network architectures.
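To make sentence pair similarity concrete, here is a minimal sketch of one of the simplest embedding-based baselines: average the word vectors of each sentence and pick the candidate with the highest cosine similarity to the query. It is not necessarily one of the algorithms we developed. The tiny `EMBEDDINGS` table and the functions `sentence_vector`, `cosine` and `most_similar` are hypothetical; a real assistant would load pretrained embeddings and would typically use a trained neural network rather than a plain average.

```python
import math

# Hypothetical toy 3-dimensional word vectors; real systems load pretrained
# embeddings (e.g. word2vec or GloVe) with hundreds of dimensions.
DIM = 3
EMBEDDINGS = {
    "turn":   [0.9, 0.1, 0.0],
    "switch": [0.8, 0.2, 0.1],
    "on":     [0.7, 0.0, 0.2],
    "light":  [0.1, 0.9, 0.1],
    "lamp":   [0.2, 0.8, 0.1],
    "play":   [0.0, 0.1, 0.9],
    "music":  [0.1, 0.0, 0.8],
    "song":   [0.1, 0.1, 0.9],
}


def sentence_vector(sentence):
    """Average the vectors of known words; unknown words (e.g. 'the') are skipped."""
    vectors = [EMBEDDINGS[w] for w in sentence.lower().split() if w in EMBEDDINGS]
    if not vectors:
        return [0.0] * DIM
    return [sum(component) / len(vectors) for component in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(query, candidates):
    """Return the candidate whose averaged embedding is closest to the query's."""
    q = sentence_vector(query)
    return max(candidates, key=lambda c: cosine(q, sentence_vector(c)))


commands = ["turn on the light", "play some music"]
print(most_similar("switch the lamp on", commands))
```

Averaging ignores word order, which is one reason neural sentence encoders usually do better on tasks such as Paraphrase Identification or Recognizing Textual Entailment.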

The multinomial classification algorithm also belongs to the NLP category. The use case we are testing is categorizing products into a hierarchical directory structure. E-shops typically categorize products so that, for example, a "14 inch screen notebook" goes under notebooks, computers, electronics etc. This process is handled by people, and people make mistakes; our algorithm can find incorrectly categorized entries or suggest the correct category.
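As an illustration of flagging suspicious categories, the sketch below trains a multinomial Naive Bayes classifier on product titles and compares its prediction with the category an entry was filed under. The post does not say which model we use, so Naive Bayes is an assumed stand-in; the training titles, category paths such as `electronics/computers/notebooks` and the `TitleClassifier` class are all made up for this example. It also treats each full category path as one flat class instead of exploiting the hierarchy.

```python
import math
from collections import Counter, defaultdict


class TitleClassifier:
    """Multinomial Naive Bayes over bag-of-words product titles (add-one smoothing)."""

    def fit(self, titles, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.vocab = set()
        for title, label in zip(titles, labels):
            words = title.lower().split()
            self.word_counts[label].update(words)
            self.vocab.update(words)
        self.total_docs = len(labels)
        return self

    def log_score(self, title, label):
        """Unnormalized log P(label | title) = log prior + sum of log word likelihoods."""
        counts = self.word_counts[label]
        total = sum(counts.values())
        score = math.log(self.class_counts[label] / self.total_docs)
        for word in title.lower().split():
            # Add-one smoothing keeps unseen words from zeroing out a category.
            score += math.log((counts[word] + 1) / (total + len(self.vocab)))
        return score

    def predict(self, title):
        return max(self.class_counts, key=lambda label: self.log_score(title, label))


# Hypothetical training data: product title -> category path in the directory tree.
train = [
    ("14 inch screen notebook", "electronics/computers/notebooks"),
    ("15 inch gaming notebook", "electronics/computers/notebooks"),
    ("ultrabook notebook 13 inch", "electronics/computers/notebooks"),
    ("wireless optical mouse", "electronics/computers/accessories"),
    ("usb keyboard and mouse", "electronics/computers/accessories"),
    ("stainless steel kettle", "home/kitchen/appliances"),
    ("electric kettle 1.7 l", "home/kitchen/appliances"),
]
model = TitleClassifier().fit([t for t, _ in train], [c for _, c in train])

# Hypothetical catalogue entries with the category a person assigned.
catalogue = [
    ("13 inch notebook with ssd", "home/kitchen/appliances"),  # filed by mistake
    ("cordless kettle", "home/kitchen/appliances"),
]
for title, assigned in catalogue:
    suggested = model.predict(title)
    if suggested != assigned:
        print(f"check {title!r}: filed under {assigned}, suggested {suggested}")
```

Only the notebook entry is flagged here, together with a suggested category; in practice one would more likely flag an entry only when the classifier is confident, to keep the number of false alarms manageable.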

Adaptive ranking research holds a special position in our group. Web content, its relevance and authority, and users' interests change constantly. The goal of any search engine is to provide exactly what users are looking for. Our newly developed algorithm relies on users to find the currently best ranking: it constantly observes which links users click and adapts based on this feedback. This is highly relevant for information search and recommendation services.
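One well-known way to learn a ranking from clicks is a cascading bandit. The sketch below is in the spirit of CascadeUCB1 and assumes a cascade click model (users scan the list top-down and click the first attractive link). It illustrates learning from click feedback in general and is not our newly developed algorithm; the `CascadeRanker` class, the simulated user and the attraction probabilities are hypothetical.

```python
import math
import random


class CascadeRanker:
    """Click-driven ranking in the spirit of CascadeUCB1 (cascading bandits).

    Assumes a cascade click model: the user scans the list top-down and clicks
    the first attractive result, so results below a click are never examined.
    """

    def __init__(self, docs):
        self.docs = list(docs)
        self.examined = {d: 0 for d in self.docs}
        self.clicks = {d: 0 for d in self.docs}
        self.rounds = 0

    def _ucb(self, doc):
        n = self.examined[doc]
        if n == 0:
            return float("inf")  # unexamined documents go to the top
        ctr = self.clicks[doc] / n
        return ctr + math.sqrt(1.5 * math.log(self.rounds) / n)

    def rank(self):
        """Next ranking to show: optimistic scores favour under-explored documents."""
        return sorted(self.docs, key=self._ucb, reverse=True)

    def update(self, shown, clicked_pos):
        """Record one query; clicked_pos is the clicked index, or None for no click."""
        self.rounds += 1
        last = len(shown) - 1 if clicked_pos is None else clicked_pos
        for doc in shown[: last + 1]:  # only results down to the click were examined
            self.examined[doc] += 1
        if clicked_pos is not None:
            self.clicks[shown[clicked_pos]] += 1

    def estimated_ranking(self):
        """Order by observed click-through rate alone, without the exploration bonus."""
        return sorted(self.docs, key=lambda d: self.clicks[d] / max(self.examined[d], 1), reverse=True)


def simulate_user(ranking, attraction, rng):
    """Hypothetical cascade user: returns the index of the first click, or None."""
    for pos, doc in enumerate(ranking):
        if rng.random() < attraction[doc]:
            return pos
    return None


rng = random.Random(0)
attraction = {"a": 0.1, "b": 0.5, "c": 0.3}  # made up; unknown to the ranker
ranker = CascadeRanker(attraction)
for _ in range(5000):
    shown = ranker.rank()
    ranker.update(shown, simulate_user(shown, attraction, rng))
print(ranker.estimated_ranking())
```

With these made-up probabilities the learned order should converge toward b, c, a after a few thousand simulated queries. The exploration bonus in `rank` keeps rarely examined links in play, which is what allows the ranking to react when users' interests change, although a production system would likely also need to discount old feedback.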

Information extraction is the next topic, completing our portfolio. Initially we looked at the basics of focused crawling, a strategy for crawling the Internet to extract, for example, all mentions of AT&T and Linux. This led to the design of a crawler with a programmable search policy. We are currently working on an even more sophisticated algorithm for extracting content from e-shop pages: page segmentation and extraction of the price, product name, etc. These are known problems, typically solved by semi-manually constructing scripts and then running the extraction. Our goal is a high-accuracy, general algorithm that works for all e-shops without any customization or training.
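A crawler with a programmable search policy can be sketched as a best-first search whose priority function is supplied by the caller. In this minimal sketch, assuming a tiny in-memory `WEB` dictionary in place of real HTTP fetching and HTML parsing, the policy scores pages by mentions of AT&T and Linux, and each discovered link inherits the score of the page it was found on. All names (`WEB`, `keyword_policy`, `focused_crawl`) are hypothetical.

```python
import heapq
import itertools
from typing import Callable, Dict, List, Tuple

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
WEB: Dict[str, Tuple[str, List[str]]] = {
    "/": ("Tech news portal", ["/phones", "/linux", "/sports"]),
    "/phones": ("AT&T announces new phone plans", ["/att-linux"]),
    "/linux": ("Linux kernel release notes", ["/att-linux", "/distros"]),
    "/att-linux": ("AT&T contributes to Linux networking", []),
    "/distros": ("Comparing Linux distributions", []),
    "/sports": ("Football results", ["/tennis"]),
    "/tennis": ("Tennis results", []),
}

Policy = Callable[[str], float]


def keyword_policy(keywords: List[str]) -> Policy:
    """A pluggable policy: score a page by how many of the keywords it mentions."""
    def score(text: str) -> float:
        lowered = text.lower()
        return float(sum(kw.lower() in lowered for kw in keywords))
    return score


def focused_crawl(seed: str, policy: Policy, max_pages: int = 10) -> Tuple[List[str], List[str]]:
    """Best-first crawl; returns (relevant URLs, all fetched URLs) in visit order."""
    tie = itertools.count()  # tie-breaker keeps FIFO order among equal priorities
    frontier = [(0.0, next(tie), seed)]
    visited = set()
    fetched: List[str] = []
    relevant: List[str] = []
    while frontier and len(fetched) < max_pages:
        _, _, url = heapq.heappop(frontier)
        if url in visited:
            continue
        visited.add(url)
        text, links = WEB.get(url, ("", []))  # stand-in for an HTTP fetch + HTML parse
        fetched.append(url)
        score = policy(text)
        if score > 0:
            relevant.append(url)
        for link in links:
            if link not in visited:
                # heapq is a min-heap, so negate: links from higher-scoring pages go first
                heapq.heappush(frontier, (-score, next(tie), link))
    return relevant, fetched


relevant, fetched = focused_crawl("/", keyword_policy(["AT&T", "Linux"]), max_pages=5)
print(relevant)
print(fetched)
```

With a budget of five pages the crawler reaches all four relevant pages and never fetches `/sports`. Swapping `keyword_policy` for another scoring function, such as a trained relevance classifier, changes where the crawl goes without touching the crawler itself.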

In part two of this post I will review our efforts in the Internet of Things.



Tuesday, May 3, 2016
