Past Projects

  • Project Name: Deep Analysis for General Audio Signal Classification.
  • Supporter: TUM Graduate School, TUM-FGZ-EI, CMU.
  • Run Time: 01.2018 – 03.2018.
  • Role: Author of Proposal, Principal Investigator, Grant Holder.
  • Partners: Technical University of Munich, Carnegie Mellon University, Imperial College London, University of Augsburg.
  • Abstract: This project investigates state-of-the-art deep learning algorithms, e.g., deep belief networks, deep convolutional neural networks, deep recurrent neural networks, and generative adversarial networks, for the classification of general audio signals. Novel deep-learned acoustic features will be extracted from the sounds’ spectrograms and scalograms by transfer-learning architectures or expert-designed networks. When combined with conventional temporal and spectral features, these features are expected to improve the final predictions. The algorithms and models studied in this project can be applied to healthcare (e.g., snore/heart sounds), ecology (e.g., bird sounds), and daily-life surveillance (e.g., acoustic scenes).
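The feature-fusion idea above can be sketched in a few lines. This is a minimal NumPy-only illustration, not the project's actual model: the random ReLU projection merely stands in for a transfer-learned CNN, and the hand-crafted descriptors are reduced to per-bin statistics.

```python
import numpy as np

def log_spectrogram(signal, n_fft=256, hop=128):
    """Log-magnitude spectrogram via a framed FFT with a Hann window."""
    window = np.hanning(n_fft)
    frames = [signal[i:i + n_fft] * window
              for i in range(0, len(signal) - n_fft + 1, hop)]
    mags = np.abs(np.fft.rfft(np.stack(frames), axis=1))
    return np.log1p(mags)                       # (n_frames, n_fft//2 + 1)

def conventional_features(spec):
    """Simple hand-crafted descriptors: per-bin mean and std over time."""
    return np.concatenate([spec.mean(axis=0), spec.std(axis=0)])

def deep_features(spec, rng):
    """Stand-in for a transfer-learned network: a fixed random projection
    with ReLU and temporal average pooling (a real system would use a
    pretrained audio/image CNN here)."""
    w = rng.standard_normal((spec.shape[1], 64))
    return np.maximum(spec @ w, 0.0).mean(axis=0)

rng = np.random.default_rng(0)
audio = rng.standard_normal(16000)              # 1 s of noise as dummy input
spec = log_spectrogram(audio)
fused = np.concatenate([conventional_features(spec), deep_features(spec, rng)])
print(fused.shape)                              # one fused vector per clip
```

The key point is the last line: the deep and conventional descriptors are simply concatenated into one clip-level vector before classification.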
  • Project Name: Fast Recognition of Bird Sounds Data by High Performance Computing System.
  • Supporter: TUM Graduate School, Tokyo Tech.
  • Run Time: 04.2016 – 11.2016.
  • Role: Author of Proposal, Principal Investigator, Grant Holder.
  • Partners: Technical University of Munich, Tokyo Institute of Technology, Imperial College London, University of Passau.
  • Abstract: The dramatically growing volume of big audio data from the Internet, videos, and music, among other sources, brings huge opportunities, together with great challenges, to the relevant research and industry communities. In this study, we focus on how to implement state-of-the-art theories and techniques for large-scale audio data classification on a high performance computing (HPC) system. As a case study, we use the big bird sounds data provided by ‘xeno-canto’ (more than 272,360 audio recordings of 9,400 bird species from around the world, approximately 1 TB in size) together with our toolkits CURRENNT (building LSTMs (Long Short-Term Memory networks) for the segmentation of bird syllables), openSMILE (extracting large-scale acoustic features), and MXNet (a deep neural network (DNN) framework) to implement an automatic big bird sounds detection, learning, and classification paradigm on the advanced and energy-friendly TSUBAME 2.5, an HPC system designed and established by Tokyo Tech. The task aims to design and implement a feasible deep big audio data learning framework on a supercomputing system. The outputs of this study can also be applied to other academic tasks and industrial applications related to big audio data learning and processing.
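Because each recording is processed independently (segmentation, then feature extraction), the corpus parallelises naturally. The toy sketch below shows that fan-out pattern only; the worker body is a placeholder for the real CURRENNT/openSMILE pipeline, and on an HPC system like TSUBAME the pool would span many nodes rather than local threads.

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def extract_features(recording_id):
    """Placeholder for the real per-recording pipeline (LSTM-based
    syllable segmentation with CURRENNT, then openSMILE feature
    extraction); here the id just seeds a dummy feature vector."""
    rng = np.random.default_rng(recording_id)
    return rng.standard_normal(16)

# Recordings are independent, so they can be distributed freely.
recording_ids = range(100)  # stands in for the ~272,360 xeno-canto files
with ThreadPoolExecutor(max_workers=4) as pool:
    features = np.stack(list(pool.map(extract_features, recording_ids)))
print(features.shape)       # one feature row per recording
```

The resulting feature matrix is what the DNN classifier (MXNet in the project) would then be trained on.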
  • Project Name: Automatic Detection and Classification of Different Snore Related Sounds from Overall Night Audio Recordings.
  • Supporter: NJUST Graduate School, NTU Information Systems Research Laboratory.
  • Run Time: 11.2013 – 03.2014.
  • Role: Author of Proposal, Principal Investigator, Grant Holder.
  • Partners: Nanjing University of Science and Technology, Nanyang Technological University, Beijing Hospital.
  • Abstract: This project aims to establish a complete framework for automatically detecting and classifying different snore-related sounds from whole-night microphone audio recordings. The framework comprises the steps of signal detection, feature extraction, feature selection, and machine learning.
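The first step of such a framework, signal detection, can be illustrated with a simple energy threshold. This is only a sketch under the assumption of a quiet background; the project's actual detector, features, and classifier are not shown.

```python
import numpy as np

def detect_events(signal, frame=256, thresh=2.0):
    """Signal-detection stage: flag frames whose RMS energy exceeds
    `thresh` times the recording's median frame RMS."""
    n = len(signal) // frame
    rms = np.sqrt(np.mean(signal[:n * frame].reshape(n, frame) ** 2, axis=1))
    return np.nonzero(rms > thresh * np.median(rms))[0]

rng = np.random.default_rng(1)
night = 0.01 * rng.standard_normal(256 * 100)   # quiet background noise
t = np.arange(256 * 3)
night[256 * 40:256 * 43] += np.sin(2 * np.pi * 90 * t / 8000)  # loud burst
print(detect_events(night))  # frames 40-42 are flagged
```

Each flagged segment would then be passed on to the feature-extraction, feature-selection, and classification stages.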
  • Project Name: iHEARu – Intelligent Systems’ Holistic Evolving Analysis of Real-life Universal Speaker Characteristics.
  • Supporter: European Research Council (ERC).
  • Run Time: 01.01.2014 – 31.12.2018.
  • Role: Main Participant.
  • Partners: University of Augsburg, University of Passau, Technical University of Munich.
  • Abstract: Recently, automatic speech and speaker recognition has matured to the degree that it has entered the daily lives of thousands of Europe’s citizens, e.g., on their smart phones or in call services. In the coming years, speech processing technology will move to a new level of social awareness, making interaction more intuitive and speech retrieval more efficient, and lending additional competence to computer-mediated communication and speech-analysis services in the commercial, health, security, and further sectors. To reach this goal, rich speaker traits and states such as age, height, personality, and physical and mental state, as carried by the tone of the voice and the spoken words, must be reliably identified by machines. In the iHEARu project, ground-breaking methodology, including novel techniques for multi-task and semi-supervised learning, will deliver for the first time intelligent, holistic, and evolving analysis, in real-life conditions, of universal speaker characteristics which have so far been considered only in isolation. Today’s sparseness of annotated realistic speech data will be overcome by large-scale speech and meta-data mining from public sources such as social media, crowd-sourcing for labelling and quality control, and shared semi-automatic annotation. All stages, from pre-processing and feature extraction to statistical modelling, will evolve in “life-long learning” according to new data, by utilising feedback, deep, and evolutionary learning methods. Human-in-the-loop system validation and novel perception studies will analyse the self-organising systems and the relation of automatic signal processing to human interpretation in a previously unseen variety of speaker classification tasks.
The project’s work plan gives the unique opportunity to transfer current world-leading expertise in this field into a new de-facto standard of speaker characterisation methods and open-source tools ready for tomorrow’s challenge of socially aware speech analysis.