Who doesn’t enjoy anything built to their liking? That which is custom-made for themselves and seemingly consistent with what their personal demands are instead of relying on an external factor or agency.

I think this analogy fits quite well with what we’re aiming at Dubverse across our entire dubbing tech stack. From speech-to-text to machine translation to text-to-speech, everything is being researched and engineered, to make the systems more customised for our users use-cases.

In our previous blog post, we hinted that we are working on implementing custom machine translation.

NLLB with context

NLLB is a Sequence-to-Sequence translation model designed to provide translation across 200 languages!

It works quite well and comes close to the quality of... (read 550 more words →)

-1

Self Supervised Learning (SSL)

Varshul Gupta

Self Supervised Learning (SSL)

"Unlocking Powerful Representations: The Frontier of Self-Supervised Learning"

JASKARAN SINGH

AUG 9, 2023

With all that’s been happening in the AI/ML industry for the past few weeks, it is important we address the elephant in the room.

The Idea Behind SSL

SSL comes under the umbrella of Unsupervised Learning. One thing that worked for NNs is that they are able to fit a curated dataset with ease given they have labels to optimize for (Supervised Learning), but this dataset may not be large enough, instead, it would be very expensive or impossible to create such a dataset. Once NNs have good representations of the task-related work they can learn a new task even more... (read 575 more words →)

ChatGPT for translation

Varshul Gupta

This post is picking up from some of the points mentioned in our Q2’23 work post (and continuing our experiments with ChatGPT). One big hurdle we are currently facing is translations not bing contextual / not so vernacular.

(If you just want to jump on to the results: here is the ChatGPT translation based video VS our pre-ChatGPT translation video )

Current translation setup

Under the hood, we use google translate, and for one of the instance it converted the word sleek to `चिकना` in a tech video, and without human supervision and/or context, maybe that’s not enough to convey the meaning. If only the API had the option to pass the previous relevant texts,... (read 667 more words →)

Whisper's Word-Level Timestamps are Out

Varshul Gupta

Hello, fellow tech and language enthusiasts! Today, we embark on a captivating journey into the domain of speech-to-text technology, where the remarkable creation known as Whisper, brought forth by OpenAI, has recently unveiled a remarkable advancement: word-level timestamps.

Now allow me to simplify this for you. Whisper, an impressive automated speech recognition (ASR) technology, has brought about a profound transformation in the way our video subtitling and transcription service functions. Initially, it offered timestamps at the segment level, enabling us to precisely identify when specific audio segments were spoken. However, the advent of word-level timestamps has ushered in a significant shift, propelling Whisper into a realm of unparalleled advancement.

We're discussing a mind-boggling performance... (read 499 more words →)

-18

Case for Foundation Models beyond English

Varshul Gupta

We live in a world enthralled by technology, so precisely and skilfully woven into the very fabric of our lives that it's impossible to untangle. Each thread of code, each strand of data, is meticulously encoded with language that is then translated into technology that builds the structures of our digital lives.

And at this very moment, the most dominant language in this realm (shoutout to the Marvel fans)—the lingua franca of our evolving digital society, if you will—is English.

Subscribe to get a blog every week on our learnings in Generative AI.

Andrej Karpathy wrote, "The hottest new programming language is English." If the language a common man across the globe needs to learn... (read 710 more words →)

LESSWRONG
LW

LESSWRONG
LW

Varshul Gupta