Machine learning (ML) forms the basis of all modern AI applications. It collects statistics from given data, finds and memorizes patterns in the data, and builds strategies that generalize to unseen data. As the amount of data and the number of tasks grow, ML calls for more innovations in both algorithmic and engineering aspects. The following topics are not specific to language processing and are generally applicable to all AI problems, but speech or text is always an interesting testbed to begin with.
One model cannot solve all cases: There are infinitely many domains, topics, and people for which a model should be optimized. We need a machine learning system that can effectively adapt to various data sources and efficiently store the adapted components. The issues here are how to reflect the locality of a data stream and how to make the best use of the domain hierarchy.
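One way to picture "efficiently storing the adapted components" is to keep small per-domain parameter deltas and resolve them along the domain hierarchy. The sketch below is a minimal illustration under assumed conventions: the `AdapterRegistry` class and the path-like domain names (e.g. `news/sports`) are hypothetical, not an existing API.

```python
class AdapterRegistry:
    """Maps hierarchical domain paths like 'news/sports' to small
    parameter deltas on top of a shared base model (illustrative sketch)."""

    def __init__(self, base_params):
        self.base = base_params   # shared parameters for all domains
        self.deltas = {}          # domain path -> small adapted delta

    def register(self, domain, delta):
        self.deltas[domain] = delta

    def resolve(self, domain):
        # Walk down the hierarchy ('news' before 'news/sports'),
        # so the most specific domain's delta wins on conflicts.
        parts = domain.split("/")
        merged = dict(self.base)
        for i in range(1, len(parts) + 1):
            prefix = "/".join(parts[:i])
            merged.update(self.deltas.get(prefix, {}))
        return merged
```

Storing only deltas keeps per-domain cost small, and unseen domains fall back to the nearest ancestor that has been adapted, which is one simple way to exploit the domain hierarchy.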
Can we translate an unknown language of historical scripts or ciphertexts? Yes! Unsupervised algorithms iteratively correct the model's predictions by exploiting the intrinsic structure of the task's modalities. The challenges here are to set up proper training objectives, find the right unit of learning, and bridge the dissimilarity between the input and output spaces.
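A toy instance of this setting is deciphering a shift (Caesar) cipher with no parallel data at all, using only the intrinsic statistics of the output language: every candidate decoding is scored against English letter frequencies, and the best-matching one is kept. This is a deliberately simplified sketch, not the full decipherment methods alluded to above; the frequency table and function names are assumptions.

```python
import string

# Approximate English letter frequencies (percent), used as a fixed prior.
ENG_FREQ = {
    'e': 12.7, 't': 9.1, 'a': 8.2, 'o': 7.5, 'i': 7.0, 'n': 6.7,
    's': 6.3, 'h': 6.1, 'r': 6.0, 'd': 4.3, 'l': 4.0, 'c': 2.8,
    'u': 2.8, 'm': 2.4, 'w': 2.4, 'f': 2.2, 'g': 2.0, 'y': 2.0,
    'p': 1.9, 'b': 1.5, 'v': 1.0, 'k': 0.8, 'j': 0.15, 'x': 0.15,
    'q': 0.1, 'z': 0.07,
}

def shift(text, k):
    """Shift every letter by k positions; leave other characters alone."""
    return "".join(
        chr((ord(c) - 97 + k) % 26 + 97) if c.isalpha() else c
        for c in text.lower()
    )

def score(text):
    """Chi-squared distance between observed and expected letter counts."""
    letters = [c for c in text if c.isalpha()]
    n = len(letters)
    total = 0.0
    for ch in string.ascii_lowercase:
        expected = ENG_FREQ[ch] / 100 * n
        observed = letters.count(ch)
        total += (observed - expected) ** 2 / expected
    return total

def decipher(ciphertext):
    """Try all 26 shifts; keep the decoding that looks most like English."""
    best = min(range(26), key=lambda k: score(shift(ciphertext, k)))
    return shift(ciphertext, best)
```

No key and no parallel text are needed: the structure of the output space alone (here, a unigram frequency prior) is enough supervision, which is the same principle that richer unsupervised decipherment and translation objectives build on.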
Self-supervised learning defines artificial tasks whose input and output both come from the same unlabeled data. Learning to solve such tasks is useful for obtaining informative representations without human supervision, which provides a decent starting point for real downstream tasks. To improve the quality and adaptability of the representations, new self-supervised tasks and pre-training schemes need to be devised.
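A minimal sketch of constructing such an artificial task from raw text, in the spirit of masked (cloze-style) prediction: tokens are hidden at random and become the prediction targets, so both input and output come from the same unlabeled sentences. The masking rate, seed, and function names below are illustrative choices, not a fixed recipe.

```python
import random

MASK = "[MASK]"

def make_masked_examples(sentences, mask_prob=0.15, seed=0):
    """Turn raw sentences into (masked_input, targets) training pairs,
    where targets maps each masked position to its original token."""
    rng = random.Random(seed)
    examples = []
    for sent in sentences:
        tokens = sent.split()
        masked, targets = [], {}
        for i, tok in enumerate(tokens):
            if rng.random() < mask_prob:
                masked.append(MASK)
                targets[i] = tok      # position -> original token
            else:
                masked.append(tok)
        if targets:                   # skip sentences with nothing masked
            examples.append((masked, targets))
    return examples
```

A model trained to fill in the targets never sees a human label, yet must learn distributional regularities of the text; varying what is masked and how defines new self-supervised tasks in this framework.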