Python Notebook on Speaker Embeddings for Identification

I've been wrestling with embeddings for speaker identification recently, and it's not an area I know well. To improve my own understanding, and help anyone else who needs to get up to speed on the practical details, I've put together a Python notebook on Colab. It walks you through how the embeddings work, with inline examples that build into a simple "is this audio from the same person or not?" system. I hope it's helpful to a few of you out there! https://lnkd.in/gE8qWATB
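For anyone skimming before opening the notebook, the heart of an "is this the same person?" check is comparing two embedding vectors. A minimal sketch (the function names and the 0.6 threshold here are illustrative assumptions, not the notebook's actual values):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two speaker embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def same_speaker(emb_a: np.ndarray, emb_b: np.ndarray, threshold: float = 0.6) -> bool:
    # Accept the pair as the same speaker if similarity clears the threshold.
    # 0.6 is a placeholder; a real system tunes this on labeled pairs.
    return cosine_similarity(emb_a, emb_b) >= threshold
```

The embeddings themselves would come from a pretrained speaker model; the comparison and thresholding logic is all that's sketched here.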

A former PhD student of mine and I recently completed some work on something similar. There is a (typically unused) part of the spectrum that carries information about speaker identity, speaker gender, and phonemic identity. I bet it would improve your embeddings, or you could even create parallel embeddings (and then concatenate them into a global embedding). Let me know if you'd like to chat about it. (It turned out to also very reliably discriminate between natural speech and computer-generated speech, e.g., AI-cloned voices.)

Sorta related: recently I've been looking into speech-to-text (S2T) models specifically trained on children's voices. Children are among those who could benefit the most from AI-enabled HCI, and yet I'm finding how rare it is for these models to recognize children's voices.

I enjoyed this simple yet useful workshop. The part where you used a histogram to determine the threshold is my favorite.
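For anyone curious about the histogram-style threshold selection the comment mentions, the idea can be sketched as: score many known same-speaker and different-speaker pairs, then sweep candidate thresholds across the score range and keep the one that misclassifies the fewest pairs. The function name and bin count below are my own assumptions, not the notebook's:

```python
import numpy as np

def pick_threshold(same_scores, diff_scores, n_candidates: int = 100) -> float:
    """Sweep candidate thresholds over the observed score range and pick
    the one that best separates same-speaker from different-speaker pairs."""
    same = np.asarray(same_scores)
    diff = np.asarray(diff_scores)
    candidates = np.linspace(min(same.min(), diff.min()),
                             max(same.max(), diff.max()), n_candidates)
    # Error = same-speaker pairs falling below t + different-speaker pairs at/above t.
    errors = [np.mean(same < t) + np.mean(diff >= t) for t in candidates]
    return float(candidates[int(np.argmin(errors))])
```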

Nice write-up. I would make a small correction: preprocessing is needed for non-16 kHz audio, and I would add non-WAV file support (MP3 and FLAC).
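On the resampling point: here is a naive numpy-only sketch of bringing arbitrary-rate audio to 16 kHz. In practice something like `librosa.load(path, sr=16000)` or `scipy.signal.resample_poly` is a better choice (and librosa will also decode MP3 and FLAC); this toy linear interpolator just shows the idea:

```python
import numpy as np

TARGET_SR = 16000  # sample rate the embedding model expects, per the comment

def resample_linear(audio: np.ndarray, orig_sr: int,
                    target_sr: int = TARGET_SR) -> np.ndarray:
    """Naive linear-interpolation resampler. Fine for a sketch; a proper
    polyphase or bandlimited resampler avoids aliasing on downsampling."""
    if orig_sr == target_sr:
        return audio
    duration = len(audio) / orig_sr
    n_out = int(round(duration * target_sr))
    t_in = np.linspace(0.0, duration, num=len(audio), endpoint=False)
    t_out = np.linspace(0.0, duration, num=n_out, endpoint=False)
    return np.interp(t_out, t_in, audio)
```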

dude this kinda thing is so fun. Thanks for posting

Hey Pete Warden, creator of pyannote here! We should talk!

Avi Tuschman - wondering if this might be useful for Crickit

Highly appreciated!
