Thursday, September 22, 2016

Do not look for a part-time job, work for the university

eClub is extending the Summer Camp to the whole academic year. This is an early announcement.

The eClub Summer Camp is in its third month. This year we have more than twenty members. That is slightly fewer than last year, but the projects are much more focused. The mentors are more effective; they have learned how to lead. Overall we have accomplished much more. The success and high quality of the projects have sparked the idea of extending eClub throughout the whole academic year.

We will offer students scholarships starting on October 1st for the winter semester, continuing into the summer semester. We are preparing joint labs with our partners Seznam, Jablotron, and many others, and we are in the process of signing the contracts. The joint labs are located in the new CVUT building on the Dejvice campus, and everybody will get a seat and a desk, so you will save commuting time. eClub and its partners are looking for innovative ideas and interesting projects. We also want the projects to count as semester projects or final theses so that you can earn credits. Working with leading companies is a great entry on your CV. eClub will provide the scholarships and a complete infrastructure, including large computer clusters. Each student will have a mentor to help with the usual problems so that nobody gets stuck.

We will offer projects similar to those in the Summer Camp, mainly from the fields of artificial intelligence, IoT, Industry 4.0, and Big Data. We do not want to limit our activities to these fields only, so come and ask. We are inviting students from all Czech universities. We will also support startups. Work for leading companies in a university joint lab.

Work on great projects. Follow our FB, eClub web pages, and blog, and stay tuned for further information.

Wednesday, September 7, 2016

Bots, Home control and many more projects, eClub

The students in the eClub Summer Camp have been working on new projects in IoT and bots for more than two months. Here is the first status report describing the projects under development.

The presentation has two major parts: the IoT projects and the conversational application projects. The IoT team has built a proof-of-concept application for intelligent lighting control, including the cloud infrastructure. The sensors are connected wirelessly, and a cloud application offers a simple dashboard. For conversational applications, we have built several modules as services running in Docker. By combining them we can implement simple conversational bots as well as a factoid answering machine.

In the presentation, you can find basic information and links to websites, demos, or GitHub repositories with the code. Some of the projects are in progress, and some are already finished. Some students will extend their projects into theses in the following academic year. Let us know what you think.

Friday, September 2, 2016

Bots and question answering in ESC 2016

A team of talented students has formed around conversational applications in the eClub Summer Camp 2016. They continue developing YodaQA and a simple Alquist bot. Both systems are built on top of a set of services ranging from dialog managers to NLP services.

We started with YodaQA, a factoid question answering system inspired by IBM Watson. It is a fairly sophisticated engine that builds on many NLP algorithms, Lucene search, RDF databases, etc. The architecture and technology description is available on GitHub along with a test website and an Android application.

The latest work concentrates on teaching YodaQA Czech. This requires replacing some of the components with Czech versions. The most important are the Stanford syntactic parser, the Named Entity Recognition (NER), and finally the answer classifier. For the syntactic parser, we use Google TensorFlow, SyntaxNet, and the Czech dependencies dataset. We achieve accuracy similar to that of the classical top-of-the-line algorithms. Currently, we are developing a basic version of the entity recognition algorithm based on Conditional Random Fields (CRF). We also plan to implement NER using neural networks.

The biggest problem in machine learning is the training sets. For the initial answer scoring algorithm, we have put together a set of question-answer pairs. To make the set as rich as possible, we have been enriching it with variables for entities and synonyms, which allows us to generate a large number of questions algorithmically. The live system logs questions and answers, which helps us create better training sets. The sets still require some manual processing, but it is worth the effort.
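To illustrate the idea of generating questions from variables, here is a minimal sketch. It is not our actual generator: the template, the slot names (`ask`, `role`, `entity`) and the values are all made up for this example, and the only assumption is that a template with slots is expanded into every combination of slot values.

```python
from itertools import product


def expand(template, slots):
    """Fill every slot of the template with every one of its values."""
    names = sorted(slots)  # fixed slot order keeps the output deterministic
    results = []
    for combo in product(*(slots[name] for name in names)):
        results.append(template.format(**dict(zip(names, combo))))
    return results


# Hypothetical template and slot values, for illustration only.
template = "{ask} the {role} of {entity}?"
slots = {
    "ask": ["Who is", "What is the name of"],     # synonymous openings
    "role": ["author", "writer"],                 # synonyms
    "entity": ["Hamlet", "Faust"],                # entity variable
}

questions = expand(template, slots)
for question in questions:
    print(question)
```

Even this tiny example turns one template into eight questions; the number grows multiplicatively with every added synonym or entity.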

The emergence of conversational bots caught our attention too. Initially, we tested Wit.ai, Microsoft LUIS, Meya, and Amazon Echo for English. We soon ran into many different limitations. Because YodaQA is put together from a set of independent services (NLU processors), we decided to use the same services to build simpler conversational bots. The bots rely on two essential parts: the intent recognizer and the entity recognizer. The bot processes the user's input query, and the extracted intent and entities are saved to a context object. The dialog manager (DM) uses the context to control the dialog flow.

Since we do not use a DM in YodaQA, we had to develop one. During our experiments with commercially available bots, we liked the Meya DM very much because of its simple dialog declaration in YAML. We decided to go in a similar direction and created our own version called Alquist. It allows us to write even complicated dialogs. The implementation was fast, and today we are running our first version of the Alquist DM.
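The flow described above (recognizers fill a context object, and the dialog manager reads it to decide what happens next) can be sketched in a few lines. This is only a toy under stated assumptions, not the Alquist implementation: the keyword lists, the intent names, and the functions `recognize_intent`, `recognize_entities`, `update_context`, and `next_action` are all hypothetical, and real recognizers would be trained models rather than keyword lookups.

```python
import re

# Hypothetical keyword tables standing in for trained recognizers.
INTENT_PHRASES = {
    "turn_on": ["turn on", "switch on"],
    "turn_off": ["turn off", "switch off"],
}
DEVICES = {"light", "heating"}
ROOMS = {"kitchen", "bedroom"}


def recognize_intent(text):
    """Return the first intent whose phrase occurs in the text, else None."""
    lowered = text.lower()
    for intent, phrases in INTENT_PHRASES.items():
        if any(phrase in lowered for phrase in phrases):
            return intent
    return None


def recognize_entities(text):
    """Pick out known device and room names from the text."""
    entities = {}
    for token in re.findall(r"[a-z]+", text.lower()):
        if token in DEVICES:
            entities["device"] = token
        elif token in ROOMS:
            entities["room"] = token
    return entities


def update_context(context, text):
    """Merge the intent and entities found in one user turn into the context."""
    intent = recognize_intent(text)
    if intent is not None:
        context["intent"] = intent
    context.setdefault("entities", {}).update(recognize_entities(text))
    return context


def next_action(context):
    """Dialog manager step: ask for what is missing, otherwise execute."""
    if "intent" not in context:
        return "ask_intent"
    if "device" not in context.get("entities", {}):
        return "ask_device"
    return "execute"


context = update_context({}, "Please turn on the light in the kitchen")
print(context, next_action(context))
```

Because the context survives between turns, a user can say "turn off" first and name the device in the next message; the dialog manager only executes once both pieces are present.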

All this work is done by about ten students, and the team is growing. During the last few weeks, we have made considerable progress. We have several applications in the works, so stay tuned to be the first to test them.

Wednesday, July 20, 2016

eClub Summer Camp IoT, Machine Learning

The eClub Summer Camp is in full swing and our lab is full of students. The projects can be divided into two groups: IoT and Machine Learning.

The IoT group is busy designing the architecture for connecting HUBs with cloud servers. We assume the cloud will have to serve millions of HUBs collecting information from sensors and controlling actuators. We are discussing and estimating how many events we will collect from sensors, how active the smartphone users will be, and how much administrative traffic (heartbeats, updates, etc.) we will have to support. The user profiles, sensors, and HUB configurations need to be maintained in databases. We also plan to save all logs to provide access to historical data. There will probably be two different systems, one for handling the incoming data and another for storing the logs. Hadoop with HDFS seems to be the choice for managing the logs, and Spark for filtering and managing the events. Of course, security is one of the most important features of the system, and we are busy studying communication protocols. It is a large project, and many students are working on the specification and testing parts of the design. Our goal is to create a proof of concept showing the HUB-to-cloud communication before the end of this year.
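The kind of back-of-the-envelope prediction mentioned above can be written down as a small calculation. Every number below is an assumption chosen only for illustration (one million HUBs, ten sensors per HUB, six events per sensor per hour, one heartbeat per HUB per minute); they are not measurements or design targets, and the function name `events_per_second` is made up for this sketch.

```python
def events_per_second(hubs, sensors_per_hub, events_per_sensor_per_hour,
                      heartbeat_interval_s):
    """Rough average load on the cloud: sensor events plus HUB heartbeats."""
    sensor_rate = hubs * sensors_per_hub * events_per_sensor_per_hour / 3600.0
    heartbeat_rate = hubs / heartbeat_interval_s
    return sensor_rate + heartbeat_rate


# Illustrative assumptions only.
estimate = events_per_second(
    hubs=1_000_000,
    sensors_per_hub=10,
    events_per_sensor_per_hour=6,   # one reading every 10 minutes
    heartbeat_interval_s=60,
)
print(f"about {estimate:,.0f} events per second on average")
```

With these assumptions the heartbeats alone generate as much traffic as the sensors, which is why the administrative traffic has to be part of the estimate.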

We also have a large group focusing on conversational systems. The work is centered around the open source YodaQA factoid answering engine, which was inspired by the Watson Jeopardy system. It already answers English questions. Our major task is to convert it to Czech and improve its functionality. We are working on the integration of the WikiData knowledge DB, and we have to retrain many of the NLU blocks for Czech. One of the students is working on a Czech parser model for the Google SyntaxNet parser.

We are also looking at bots, which are good for creating simple conversational apps, for example for controlling a simple home IoT system. The bot technology is based on an information retrieval (IR) approach: we search for the best answer to a particular question. In this field, we have been working on Sentence Pair Similarity algorithms, which can be trained to recognize the intent of a question. Students are also looking at new development packages such as wit.ai, api.ai, Microsoft LUIS, and others. We try to develop examples of small, simple applications. We believe the hands-on experience will help us understand where the limitations of the small, IR-based systems lie and where we need to opt for the YodaQA technology. The least interesting but essential part of our effort is creating training databases. Everybody is involved in the hard work of data set collection.
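To make the retrieval idea concrete, here is a much simpler, untrained stand-in for a sentence pair similarity model: it compares a question with a few stored example sentences using cosine similarity over word counts and returns the intent of the closest one. The example sentences, the intent labels, and the helper names are hypothetical; a trained model would replace `cosine` with a learned scoring function.

```python
import math
from collections import Counter


def vectorize(sentence):
    """Bag-of-words vector: word -> count."""
    return Counter(sentence.lower().split())


def cosine(a, b):
    """Cosine similarity of two bag-of-words vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical labelled examples for a small home-control bot.
EXAMPLES = [
    ("turn on the light", "light_on"),
    ("switch the light off", "light_off"),
    ("what is the temperature", "get_temperature"),
]


def best_intent(question):
    """Return the intent of the stored example most similar to the question."""
    query = vectorize(question)
    scored = [(cosine(query, vectorize(text)), intent) for text, intent in EXAMPLES]
    return max(scored)[1]


print(best_intent("what temperature is it"))
```

Word overlap breaks down quickly (it cannot tell "on" from "off" if the rest of the sentence matches), which is exactly the kind of limitation that motivates trained similarity models.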

If you are interested, join us or visit us; we can help you select an interesting project. There is still time to join.

Tuesday, May 31, 2016

X.GLU startup in eClub

Last week I visited the Pioneers festival in Vienna. It was also the first public presentation of X.GLU, the new eClub startup.

The X.GLU startup has developed a revolutionary glucometer, also called X.GLU. It is the smallest glucose meter: it is the size of a credit card and simply slips into your wallet. X.GLU requires no batteries and no wires to read the sugar level on your smartphone. As long as your smartphone is charged, the glucose meter works. No maintenance is required. X.GLU uses a standard connector for biomedical sensor paper. It comes in a convenient bag along with disinfection tissues, lancets, and testing strips. The readout is transmitted using NFC technology, which provides a secure wireless link between X.GLU and the smartphone. Unlike Bluetooth, Wi-Fi, and similar wireless technologies, NFC cannot be sniffed from a distance of more than several inches. The measured values are displayed and stored in the smartphone. An encrypted connection sends the X.GLU data to the cloud and makes it available to physicians, who can provide instant feedback on the treatment.

The smartphone app comes with a how-to video. It gives detailed instructions on how to treat the skin before taking the sample and how to take the blood sample properly. The app also conveniently reminds the user of the scheduled measurement time.

The mastermind of the new company is its inventor and owner, Marek Novak, who came up with the idea of the glucometer. Marek is one of the most active students in eClub. He has already worked on several IoT-related projects, but X.GLU is the first one we want to bring to production. eClub helped complement Marek’s knowledge and found experts in sales and marketing to create a functional company. They are starting their operation from our scientific incubator.

It is great news for eClub. We will all do our best to help the company onto a productive and successful path to market. We are looking for other student teams with startup ideas. Join us during the eClub Summer Camp.

Tuesday, May 10, 2016

Our projects part 2

This is the second part of “What we do”, this time about our IoT activities.

Our IoT effort can be roughly divided into two parts: SW infrastructure and sensors. We use the standard IoT architecture combining a HUB and a cloud server, a typical setup that makes it possible to collect sensor information and control actuators over the Internet. The HUB serves as a gateway to the Internet and a concentrator for the sensor data. It is a simple computer with roughly the power of a router, equipped with Ethernet, WiFi, or both to connect to the Internet. In addition, it may have several other radios for communication with sensors and actuators. The radios listen to the sensors continuously, which typically requires power; therefore, HUBs are usually not battery powered.

Our HUB is based on the Intel Edison dual-core 500 MHz Linux-based embedded computer with WiFi, BLE, and an 868MHz free-band radio. It also includes three USB sockets for additional peripherals. The HUB runs Zetta, a simple Node.js server, which handles the management and communication with servers. It allows a seamless connection to a similar Zetta server residing in the cloud. The linked Zetta servers communicate using Siren, a walkable, JSON-based hypermedia format. The cloud-based servers allow a simple connection to the smartphone. Siren allows the smartphone to set up its UI based on the configuration of the particular space covered by a HUB. We have designed and implemented an IoT control app for Android that configures itself based on location. In practice, this means that as soon as you enter a smart room or get into your car, the Android home page adapts to that environment, with the most frequently used controls on top.
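To show why a hypermedia format lets the app build its UI from the server response, here is a small sketch that reads a Siren entity and derives a list of controls from it. The field names (`class`, `properties`, `actions`, `links`, `rel`, `href`, `method`, `fields`) come from the Siren specification; the device, the property values, and the `hub.example` URLs are invented for this example and are not actual output of our HUB or of Zetta.

```python
import json

# A hypothetical Siren entity describing one device behind a HUB.
SIREN_DOC = json.loads("""
{
  "class": ["device", "light"],
  "properties": {"name": "Lab ceiling light", "state": "off"},
  "actions": [
    {
      "name": "turn-on",
      "method": "POST",
      "href": "http://hub.example/devices/light-1",
      "fields": [{"name": "action", "type": "hidden", "value": "turn-on"}]
    }
  ],
  "links": [
    {"rel": ["self"], "href": "http://hub.example/devices/light-1"},
    {"rel": ["up"], "href": "http://hub.example/"}
  ]
}
""")


def find_link(entity, rel):
    """Return the href of the first link carrying the given relation, else None."""
    for link in entity.get("links", []):
        if rel in link.get("rel", []):
            return link["href"]
    return None


def ui_controls(entity):
    """One (label, HTTP method, URL) triple per action the entity advertises."""
    return [
        (action["name"], action.get("method", "GET"), action["href"])
        for action in entity.get("actions", [])
    ]


print(SIREN_DOC["properties"]["name"], ui_controls(SIREN_DOC))
print("parent:", find_link(SIREN_DOC, "up"))
```

The client never hard-codes which buttons a device has: it shows whatever actions the entity currently advertises and follows links such as `up` to walk to other resources.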

We do not use WiFi for communicating with sensors. WiFi usually requires a lot of battery power, and it is primarily designed for TCP/IP, which may not be needed for very simple sensors such as thermometers. A thermometer samples the ambient temperature only, for example, every 10 minutes, so we can let the sensor sleep most of the time. The radio wakes up only for the shortest possible communication required to exchange information with the HUB. This approach allows us to design sensors with very low energy requirements.
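The effect of sleeping most of the time is easy to quantify with a time-weighted average. The currents and durations below are assumptions picked only to illustrate the arithmetic (2 µA asleep, 20 mA for 10 ms while transmitting, one reading every 10 minutes); they are not measurements of our sensors.

```python
def average_current_ua(sleep_ua, active_ma, active_ms, period_s):
    """Time-weighted average current in microamperes over one sampling period."""
    active_s = active_ms / 1000.0
    sleep_s = period_s - active_s
    # Charge in microcoulombs: current (uA) multiplied by time (s).
    charge_uc = sleep_ua * sleep_s + active_ma * 1000.0 * active_s
    return charge_uc / period_s


# Illustrative numbers only.
avg = average_current_ua(sleep_ua=2.0, active_ma=20.0, active_ms=10.0, period_s=600.0)
print(f"average current: {avg:.2f} uA (versus 20000 uA if the radio never slept)")
```

Under these assumptions the average stays within a few microamperes, which is the range where small energy harvesters become practical.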

The low consumption of the sensors allowed us to use one of the energy harvesting approaches, photovoltaic (PV) cells. We have designed and built a set of PV-powered, battery-less wireless sensors. We can measure temperature, humidity, and motion (with accelerometers and PIR). The PIR-equipped sensor is powered just by a fluorescent tube on the ceiling, and it has been sensing people coming to our lab for more than a year. We monitor the accumulated PV energy, and so far we have never run out of power. We use the accelerometer-equipped sensors to check for open windows; the outside light is also enough to power them. The sensors communicate with the HUB using 868MHz radios. We have found this band to be less affected by obstacles than WiFi or Bluetooth. Currently, we use a proprietary protocol, but we are looking at LoRa and MQTT, which we use in other projects with the same HUB.


Some of the described work is part of Bachelor’s theses written by my students. We are looking forward to pushing our work even further during eClub Summer Camp 2016.

Tuesday, May 3, 2016

Our projects

Recently I was asked to review the latest development in our group and I realised how much work we have done. I have also noticed I am forgetting about my blog. Let’s fix it!

First, the best news: during the last year our group has grown to five PhD and around 10 MSc students working in machine learning.

Today I would like to start with part one and mention some of our progress in machine learning. In the second part I will describe our IoT effort. The main machine learning topics can be broken into the following categories:
  • Natural Language Processing
    • Question answering YodaQA
    • Intelligent assistants
    • Sentence pair similarity
    • Multinomial classification
  • Information retrieval
    • Learning to rank 
  • Information extraction
    • Focused crawling
    • Convolutional Neural Networks
      • Combining text and images
      • Image labeling 
I’ll start with the major achievement, the YodaQA answering machine. It is an open source question answering system that implements state-of-the-art methods of information extraction and natural language understanding to answer human-phrased questions! You can try the live demo.

Along with YodaQA, we have also worked on simpler Intelligent Assistants that act on a smaller number of commands. They take advantage of simpler algorithms that find the most similar answer for a given query. Sentence pair similarity is another topic of interest. The algorithms can help solve not only Answer Sentence Selection, but also other interesting problems, such as Next Utterance Ranking, Semantic Textual Similarity, Paraphrase Identification, Recognizing Textual Entailment, etc. We have developed and tested a series of algorithms based on word embeddings and different neural network architectures.

The multinomial classification algorithm also belongs to the NLP category. The use case we are testing is the categorization of products into a hierarchical directory structure. Typically, e-shops categorize products, such as a “14 inch screen notebook”, under notebooks, computers, electronics, etc. This process is handled by people, and they make mistakes; our algorithm can find problematically categorized entries or suggest the correct category.

Adaptive ranking research holds a special position in our group. Web content, its relevance and authority, and the users' interests are changing constantly. The goal of any search engine is to provide exactly what the users are looking for. The newly developed algorithm relies on the users to find the currently best ranking. It constantly observes which links users click on and adapts based on this feedback. This is very relevant for information search and recommendation services.
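The feedback loop can be illustrated with a deliberately simple stand-in: count how often each link was shown and clicked, and rank by a smoothed click-through rate. This is not the algorithm developed in our group; the class `ClickRanker`, its smoothing priors, and the link names are assumptions made for this sketch, and a real system would also need to explore lower-ranked links and correct for position bias.

```python
from collections import defaultdict


class ClickRanker:
    """Rank links by smoothed click-through rate learned from user feedback."""

    def __init__(self, prior_clicks=1.0, prior_views=2.0):
        self.clicks = defaultdict(float)
        self.views = defaultdict(float)
        self.prior_clicks = prior_clicks
        self.prior_views = prior_views

    def score(self, link):
        # The priors keep unseen links at a neutral score of 0.5.
        return (self.clicks[link] + self.prior_clicks) / (
            self.views[link] + self.prior_views
        )

    def rank(self, links):
        # sorted() is stable, so ties keep their original order.
        return sorted(links, key=self.score, reverse=True)

    def feedback(self, shown, clicked):
        """Record one result page: which links were shown and which was clicked."""
        for link in shown:
            self.views[link] += 1
        if clicked is not None:
            self.clicks[clicked] += 1


ranker = ClickRanker()
links = ["a", "b", "c"]
initial = ranker.rank(links)
for _ in range(3):
    ranker.feedback(shown=links, clicked="c")
adapted = ranker.rank(links)
print(initial, "->", adapted)
```

After three clicks on the last link, it moves to the top of the ranking without any change to the underlying content scores.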

Information extraction is the next topic completing our portfolio. Initially we looked at the basics of focused crawling, a strategy for crawling the internet and extracting, for example, all mentions of AT&T and Linux. This led to the design of a crawler with a programmable search policy. Currently we are working on an even more sophisticated algorithm for extracting content from e-shop pages: page segmentation and extraction of the price, product name, etc. These are known problems, typically solved by semi-manually constructing scripts and then running the extraction. Our goal is a high-accuracy, general algorithm that works for all e-shops without any customization or training.
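A crawler with a programmable search policy can be sketched as a priority queue whose ordering is supplied by a pluggable scoring function. The sketch below is not our crawler: it runs over a tiny made-up in-memory "web" (`PAGES`) instead of real HTTP, and it uses one common heuristic as an assumption, namely that links found on a relevant page are worth visiting first.

```python
import heapq

# Hypothetical miniature web: url -> (page text, outgoing links).
PAGES = {
    "home": ("Welcome to our tech news site", ["linux", "sports"]),
    "linux": ("Linux kernel release notes", ["att", "sports"]),
    "att": ("AT&T and Linux history at Bell Labs", []),
    "sports": ("Football results", ["home"]),
}


def keyword_policy(keywords):
    """Build a policy that scores a page by how often the keywords occur in it."""
    def score(text):
        lowered = text.lower()
        return sum(lowered.count(keyword) for keyword in keywords)
    return score


def focused_crawl(start, policy, max_pages):
    """Visit up to max_pages pages, following links from relevant pages first."""
    frontier = [(0, start)]        # min-heap of (negated priority, url)
    seen = {start}
    visited = []
    while frontier and len(visited) < max_pages:
        _, url = heapq.heappop(frontier)
        text, links = PAGES[url]
        visited.append(url)
        priority = policy(text)    # links inherit the relevance of their parent
        for link in links:
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-priority, link))
    return visited


order = focused_crawl("home", keyword_policy(["linux", "at&t"]), max_pages=3)
print(order)
```

Swapping in a different policy function changes what the crawler chases without touching the crawl loop; here the page about AT&T is reached before the sports page even though the sports link was discovered first.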

In part two of this blog I will review our efforts in the Internet of Things.