About

What is Telize?

Overview

Telize focuses on developing analysis tools for audio and video streams, drawing on a broad range of techniques, from conventional statistical models and machine learning algorithms to the much-hyped deep learning.

At present, Telize bundles several programs and components for tasks such as object detection in low-quality recordings, face recognition, video clustering and other related processes.

History

In 2016, I started training speech-to-text models for the Romanian language using CMU Sphinx, covering both the language and the acoustic models. While spending a lot of time with numerous audio files and getting familiar with the structure of speech and with speech recognition methodologies, I ran into audio fingerprinting, a considerably simpler technique.

Audio fingerprinting is essentially useless for speech recognition, but quite powerful for finding repeated sequences in audio streams. This is how Telize started in 2017, as an open project aimed at giving the public visibility into the quality and exact duration of TV commercials running on the 5 most popular channels in Romania.
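To illustrate why fingerprinting suits repeated content, here is a minimal Python sketch. It is not the code Telize runs: the function names (`frame_energies`, `fingerprints`, `repeated_segments`) and parameters are hypothetical, and the fingerprint is a toy that only records whether the energy rises from one frame to the next, whereas real systems typically fingerprint spectral features. The idea it shows is the same, though: reduce short stretches of audio to compact codes, index them, and report any code that appears at more than one position.

```python
from collections import defaultdict


def frame_energies(samples, frame_size):
    """Sum of squared samples for each non-overlapping frame."""
    return [
        sum(s * s for s in samples[i:i + frame_size])
        for i in range(0, len(samples) - frame_size + 1, frame_size)
    ]


def fingerprints(samples, frame_size=4, bits=8):
    """Yield (frame_index, code) pairs.

    Each code packs `bits` flags: did the energy rise from one frame
    to the next? This is a toy stand-in for a spectral fingerprint.
    """
    energy = frame_energies(samples, frame_size)
    rises = [int(b > a) for a, b in zip(energy, energy[1:])]
    for i in range(len(rises) - bits + 1):
        code = 0
        for bit in rises[i:i + bits]:
            code = (code << 1) | bit
        yield i, code


def repeated_segments(samples, frame_size=4, bits=8):
    """Map each code seen more than once to the frame indices where it starts."""
    index = defaultdict(list)
    for pos, code in fingerprints(samples, frame_size, bits):
        index[code].append(pos)
    return {code: pos for code, pos in index.items() if len(pos) > 1}
```

Calling `repeated_segments` on a stream that contains the same jingle twice returns the shared code together with both starting positions, which is the kind of signal that can be used to spot a commercial airing again. Unlike speech recognition, nothing here needs to understand what is being said.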

While Telize initially used audio fingerprinting to recognize repeated patterns in audio channels, I began working on a similar model based on video fingerprinting, capable of recognizing repeated patterns in video streams. This strategy has proven very efficient and, combined with object detection for brand detection, has outperformed the audio analysis model in both accuracy and scalability.

The development of the object detection pipeline, used for commercial classification, opened an alternative path, and I have started to offer the detection models to a platform specialized in brand analysis in sports broadcasts.


Note: Since deep-learning-based speech recognition models are remarkably more efficient than HMM-based libraries, most of my work on speech recognition with CMU Sphinx is currently on pause.