OpenAI recently added voice transcription to its official ChatGPT app for iPhone, a feature that has been called one of the most overlooked but brilliant tools in the app. It can change the way you transcribe audio, especially if you work with recordings such as interviews, podcasts, and videos and need to extract text from them. Users record the audio they hear, and ChatGPT transcribes it accurately, down to the punctuation.
The feature has flaws, such as not distinguishing between different speakers, but the quality of the transcription is remarkable: “you can almost hear the person speaking” in the ChatGPT version. OpenAI CEO Sam Altman explained that ChatGPT uses another OpenAI technology called Whisper, a speech recognition model trained on large amounts of audio data with only weak supervision rather than carefully hand-labeled transcripts, which helps it cope with a wide range of accents, languages, and recording conditions.
Whisper itself is not new. OpenAI released it as open source in September 2022, and apps built on it are already available as free downloads from the Mac App Store. With such technology at our disposal, there is room for Whisper-based transcription apps that can tell speakers apart, offer timestamps, and let users navigate an audio file via prompts. We might also see similar speech recognition services from OpenAI and other companies down the road, trained to handle a broad range of audio and speech input.
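To make the timestamp idea concrete, here is a minimal Python sketch of how a transcription app might turn timed segments into a readable transcript. It assumes the segments are dictionaries with "start", "end", and "text" keys, which is the shape returned by the open-source whisper package's transcribe() call. The helper names and the sample data are hypothetical, written only for illustration, and the sketch does not attempt speaker recognition.

```python
from typing import Dict, List


def format_timestamp(seconds: float) -> str:
    """Render a time offset in seconds as HH:MM:SS (fractions are dropped)."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def format_segments(segments: List[Dict]) -> List[str]:
    """Turn timed segments into '[start - end] text' transcript lines."""
    lines = []
    for seg in segments:
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        lines.append(f"[{start} - {end}] {seg['text'].strip()}")
    return lines


# Hand-written sample data (an assumption, not real model output).
# With the open-source package installed, segments could instead come from:
#   whisper.load_model("base").transcribe("interview.mp3")["segments"]
# where "interview.mp3" is a hypothetical file name.
sample_segments = [
    {"start": 0.0, "end": 4.2, "text": " Thanks for joining me today."},
    {"start": 4.2, "end": 9.8, "text": " Happy to be here."},
    {"start": 3725.5, "end": 3731.0, "text": " That wraps up the interview."},
]

for line in format_segments(sample_segments):
    print(line)
```

Each printed line pairs a time range with the text spoken in it, which is the kind of index an app would need before it could jump to a given moment in a recording.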
ChatGPT does not offer voice input on computers yet, but AI products like it might eventually add the functionality, especially on devices like Apple’s new Vision Pro spatial computer. That could change how we interact with generative AI services.