Meet Gladia, a French artificial intelligence (AI) startup that wants to revolutionize how companies deal with audio data. The company has developed a new audio transcription application programming interface (API) that it says is more reliable and efficient than existing options. Gladia’s technology is meant to unlock new use cases around audio: the company promises an hour of audio transcription for just $0.61, with the entire transcription process taking roughly 60 seconds.
According to Jean-Louis Quéguiner, the co-founder and CEO of Gladia, there are three main pain points with existing transcription APIs. The first is price: transcribing an hour of audio usually costs between $1.50 and $2. The second is reliability: the output is uneven, as some languages work well while others are barely supported. The third is speed: existing transcription APIs take more than 15 minutes to transcribe an hour of audio.
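To put the quoted figures side by side, here is a minimal sketch of the arithmetic. It uses only the numbers cited in this article ($0.61 and roughly 60 seconds for Gladia, $1.50 to $2 and 15 minutes for existing APIs); the constant and function names are made up for illustration, and the 15-minute figure is a lower bound, so the real speedup may be larger.

```python
# Figures quoted in the article; all names below are illustrative.
GLADIA_PRICE_PER_HOUR = 0.61              # USD per hour of audio
INCUMBENT_PRICE_RANGE = (1.50, 2.00)      # USD per hour of audio
GLADIA_SECONDS_PER_AUDIO_HOUR = 60        # "roughly 60 seconds"
INCUMBENT_SECONDS_PER_AUDIO_HOUR = 15 * 60  # "more than 15 minutes" (lower bound)
AUDIO_HOUR_SECONDS = 3600


def savings_fraction(new_price: float, old_price: float) -> float:
    """Fraction of the old price saved by switching to the new price."""
    return 1 - new_price / old_price


def speedup(old_seconds: float, new_seconds: float) -> float:
    """How many times faster the new processing time is than the old one."""
    return old_seconds / new_seconds


# Savings against the cheap and expensive ends of the quoted price range.
savings_low, savings_high = (
    savings_fraction(GLADIA_PRICE_PER_HOUR, price) for price in INCUMBENT_PRICE_RANGE
)

# Speedup relative to existing APIs, and relative to real-time playback.
speedup_vs_incumbents = speedup(
    INCUMBENT_SECONDS_PER_AUDIO_HOUR, GLADIA_SECONDS_PER_AUDIO_HOUR
)
speedup_vs_real_time = speedup(AUDIO_HOUR_SECONDS, GLADIA_SECONDS_PER_AUDIO_HOUR)

if __name__ == "__main__":
    print(f"Price savings: {savings_low:.1%} to {savings_high:.1%}")
    print(f"Speedup vs. existing APIs: at least {speedup_vs_incumbents:.0f}x")
    print(f"Speedup vs. real time: about {speedup_vs_real_time:.0f}x")
```

Taken at face value, the quoted numbers work out to roughly 59% to 70% cheaper and at least 15 times faster than the existing APIs Quéguiner describes.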
Gladia’s solution is based on OpenAI’s open-source transcription model, Whisper, which has been modified to work faster and more responsively. Gladia also has some pre-processing and post-processing algorithms that improve the end results. The Gladia API can detect when there are multiple speakers, add timestamps, detect languages and switch from one to another if needed. It can also automatically add punctuation and casing.
The Gladia transcription API supports the SRT and VTT subtitle formats for companies that want to generate subtitles. Once an audio file has been transcribed with word-level timestamps, Gladia can also translate the text into another language, so companies can upload an audio file and get subtitles in dozens of languages in just a few minutes.
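To illustrate how word-level timestamps turn into subtitles, here is a minimal sketch that groups timed words into SRT cues. The `Word` structure, the function names, and the fixed words-per-cue rule are assumptions made for this example; they do not describe Gladia's actual response format or its subtitle logic.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    """A transcribed word with timing. Hypothetical shape, not Gladia's schema."""
    text: str
    start: float  # seconds from the beginning of the audio
    end: float    # seconds from the beginning of the audio


def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp used by SRT."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def words_to_srt(words: List[Word], max_words: int = 7) -> str:
    """Group words into cues of at most max_words and render them as SRT.

    Each cue spans from the start of its first word to the end of its last.
    """
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        index = i // max_words + 1  # SRT cue numbers start at 1
        timing = f"{srt_timestamp(chunk[0].start)} --> {srt_timestamp(chunk[-1].end)}"
        text = " ".join(word.text for word in chunk)
        cues.append(f"{index}\n{timing}\n{text}\n")
    # A blank line separates consecutive cues.
    return "\n".join(cues)


if __name__ == "__main__":
    sample = [Word("Hello", 0.0, 0.4), Word("world", 0.5, 0.9)]
    print(words_to_srt(sample))
```

A VTT file follows the same cue structure, but starts with a `WEBVTT` header line and uses a period instead of a comma before the milliseconds. A production subtitle generator would more likely split cues on punctuation, pauses, or line length rather than on a fixed word count.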
Gladia raised a $4 million seed round led by New Wave, with support from Sequoia, Cocoa, and various business angels, including Solomon Hykes, Pierre Betouin, Miroslaw Klaba, and Alexandre Berriche. The company currently works with call center companies, virtual meeting services, and video publishers, including Claap, Livestorm, and Selectra.
Moving forward, the company aims to build features on top of its technical foundation. For instance, it hopes to summarize the content of an audio file, categorize that content by topic, create chapters automatically, conduct sentiment analysis, and much more.
Overall, Gladia is positioning itself as one of the faster and cheaper transcription APIs on the market, based on its stated price and speed, and its developers believe that transcription will become a commodity. The company’s long-term vision is to augment audio with intelligence, moving from 2D to 3D data.