walter's dietblog | AI Voice Cloning for Trivial Reasons

I am a bad Discord user. Except for a brief moment during the pandemic I don't bother much with Discord voice chat. I tend to play solitary games, and when I don't I prefer to just text chat. Perhaps this is due to being in meetings all day. When I do want to voice chat with friends I usually just call someone on the phone, or touch grass and meet up in person.

I dropped into a Discord voice chat last week because I am trying to be more flexible, and I remembered some pandemic-era hijinks that I think we can improve upon. My family tends to go to bed earlier than me, so when I do join a voice chat it tends to be late and I stay muted. During the pandemic, when I was on Discord more frequently, I got in the habit of responding to the voice channels by text, which would result in a stream of cluttery, contextless messages from myself to no one in particular. This is what inspired the first iteration of a scheme that had my friends wishing I had stuck with texting.

I discovered the /tts Discord command, which reads a text message out loud, but this also resulted in a stream of cluttery, contextless messages from myself. At the time I was using Ubuntu, so as a phase two I cobbled together some arcane PulseAudio tricks, the most intelligible (barely) free text-to-speech (TTS) voice I could find, and a shell script which would pipe text messages through TTS and out a virtual microphone right into a Discord voice channel. My very patient friends tolerated this for a long time.

Live Speech on macOS

Today I have a nice Apple Silicon MacBook Pro, and after some poking around I came across the Live Speech feature built into the OS.

With Live Speech you can use a professional TTS with a selection of built-in and downloadable voice options. Also right out of the box is all the piping necessary to send these messages through a virtual microphone device (which was an enormous headache on Ubuntu) into a variety of voice applications. Major bonus points to macOS for accessibility, though we'll revisit this shortly.

After some fine tuning I ended up with a TTS setup that was fairly intelligible, to the horror of some friends who were not yet aware of my previous research on this topic.

But we can do better.

Personal Voice on macOS

If you dig a little deeper in the Accessibility settings you may notice "Personal Voice." This is a really cool feature that has a much more humanist purpose than what I'm doing with it. It's a way for people with ALS or other diseases that rob people of their ability to speak to use an AI model to replicate their natural speaking voice for various accessibility use cases – or exasperate your friends at 11pm.

I quickly grabbed my Blue Yeti, hid in a closet, and read out loud 150 sample phrases which took about 20 minutes.

Afterwards I waited for about 4 hours for the model to process. What's nice about this feature is it's all on device and private, so there are minimal concerns about your voice being stolen.

[Audio clip: Dont Steal My Voice Bro]

I am not sure if the end result is more intelligible than the stock voice I was using before, but it is a lot funnier.

[Audio clip: Gettysburg Address]

The bugs tho

🤷

2025-03-06 – Update: IDK man. Last week I was going to have a friend run these scenarios, so I got started on my MacBook, and most of this was working again, and the plist was unchanged. I have no idea anymore. What follows is unsubstantiated conjecture.

2025-03-07 – Update: Still can't delete entries!

It wouldn't be me and niche features in an Apple product without encountering really frustrating bugs which raise the question "did anyone actually test this before it shipped?" Clearly a lot of cool research and development went into an AI model that can clone your voice, but the UI around it is, in my opinion, excessively spartan, and the only part that isn't completely broken is the "Type to speak..." part.

The play button submits the text-to-speech message, but its placement on the far side of what turns out to be a dropdown menu was a little strange. My first instinct was to just use the enter key, which also works.

Next, the dropdown gives you the choices of "Saved" and "Recent." I assumed these save a history of messages somewhere (they do, or rather they should; we'll come back to this), but initially could not figure out where. Later I inferred the bookmark button at the end of the overlay is supposed to show you a history of previous messages so you can re-send them, but it is broken and does nothing. It's also a bit confusing to use the same icon as the "Saved" option for a button that presumably also shows you the history for the "Recent" messages.

None of these have tooltips by the way.

This was just the beginning.

Going back to the Live Speech settings, you may have noticed these two categories before:

This is where the message history resides. When I eventually discovered this it was very frustrating, because there were now dozens of messages under Saved, and the only way to delete them is to manually press "–" over and over; there is no bulk delete option.

After deleting dozens of saved messages I noticed they immediately came back. Trying several times in a row would crash the dialog with a low-level error message I didn't manage to grab.

The Recent option does let you manually delete entries without errors. It also implies that you can auto-delete recent phrases – however this dropdown appears to also be broken and does nothing.

You may have also noticed the "Add Category..." button. I imagine this would be very useful to have a set of pre-written messages for a variety of day-to-day situations.

It is also broken.

Adding a category does not add it to the overlay dialog, and even if it did the bookmark button still doesn't work. If you attempt to move an entry from Saved to a new category, you're greeted with:

I'm not really sure why deleting the phrase you failed to move was their default action, but there you go I guess.

I did temporarily resolve most of these issues. I did some digging in the internals of my machine and figured out that these entries live in the com.apple.accessibility.livespeech plist. Taking a peek at it with defaults read com.apple.accessibility.livespeech.plist I noticed that the favoritePhrases array looked a little mangled. Recent entries have different attributes from the Saved entries, which may be why the UI is struggling to delete them and frequently failing.

{
    favoritePhrases =     (
                {
            shortcut = "";
            text = test;
        },
                {
            categoryID = Recents;
            creationDate = "2025-02-20 03:48:57 +0000";
            inputID = en;
            text = "a test of my voice";
        },
    );
    maxRecents = 1;
}
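To compare the two entry shapes without squinting at the dump, here is a minimal Python sketch. It assumes the preferences were first written to a file with defaults export com.apple.accessibility.livespeech livespeech.plist (the file name is my own choice), and that an entry counts as a Recent whenever it carries categoryID = Recents. That rule is only what the dump above suggests, not documented behavior, and load_prefs and split_phrases are hypothetical helper names.

```python
import plistlib
from pathlib import Path


def load_prefs(path):
    """Load an exported preferences file (plistlib detects XML vs binary)."""
    with Path(path).open("rb") as f:
        return plistlib.load(f)


def split_phrases(prefs):
    """Split favoritePhrases into (saved, recents).

    Assumption: entries with categoryID == "Recents" are Recent entries and
    everything else is a Saved entry. This is inferred from the dump only.
    """
    saved, recents = [], []
    for entry in prefs.get("favoritePhrases", []):
        if entry.get("categoryID") == "Recents":
            recents.append(entry)
        else:
            saved.append(entry)
    return saved, recents


def describe(prefs):
    """Return one line per entry, including its keys, since the key mismatch
    between Saved and Recent entries is the interesting part."""
    saved, recents = split_phrases(prefs)
    lines = []
    for label, entries in (("Saved", saved), ("Recent", recents)):
        for entry in entries:
            lines.append(f"{label}: {entry.get('text')!r} keys={sorted(entry)}")
    return lines
```

Calling describe(load_prefs("livespeech.plist")) and printing each line should make it obvious whether Saved and Recent entries are being stored with different attributes.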

I tried deleting the file several different ways, but the OS kept recreating it with the same content. I settled on defaults delete com.apple.accessibility.livespeech favoritePhrases, which stuck and seemed to fix most, perhaps almost all, of the issues above, at least for a few days. I did not exhaustively re-test everything, but the only thing I noticed at a glance that was still not working was the Recent auto-delete, so ultimately this workaround didn't really help with my original "don't keep all the messages" problem.
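If you want to try the same reset with a safety net, here is a rough Python sketch that backs up the domain before dropping the key. The backup step is my own addition, not something I did at the time; reset_commands and reset_phrases are hypothetical names, and the only things taken from above are the domain and the favoritePhrases key. It defaults to a dry run that only prints the commands.

```python
import subprocess
import sys

DOMAIN = "com.apple.accessibility.livespeech"
KEY = "favoritePhrases"


def reset_commands(backup_path):
    """Return the two defaults invocations: back up the domain, then drop the key."""
    return [
        ["defaults", "export", DOMAIN, backup_path],
        ["defaults", "delete", DOMAIN, KEY],
    ]


def reset_phrases(backup_path, dry_run=True):
    """Print each command, and run it only when dry_run is False."""
    for argv in reset_commands(backup_path):
        print(" ".join(argv))
        if dry_run:
            continue
        if sys.platform != "darwin":
            raise RuntimeError("the defaults tool only exists on macOS")
        # check=True raises if the backup fails, so the delete never runs.
        subprocess.run(argv, check=True)
```

Running reset_phrases("livespeech-backup.plist") prints the two commands; passing dry_run=False actually runs them. Given how this behaved for me, treat it as a way to clear the history, not as a fix.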

Curiously, I am fairly sure the format of the saved messages changed after I did this, but that is not what I saw when I queried them again tonight for this write-up. Everything is now broken again as it was before, so I guess it reversed itself, and resetting the file again doesn't seem to fix it. Regardless, I can just run defaults delete com.apple.accessibility.livespeech favoritePhrases to clear the history, since pressing the "–" button dozens of times sucks anyway.

Dear accessibility developers

This is a really neat, legitimately helpful, and humanizing feature for people who very much need it, but it is both unacceptably buggy and as-is very unintuitive.

My guess for why this may be falling apart, and this is a pattern I am not the first to notice, is that this was an iOS feature first, and the UI was hastily thrown together on macOS and never looked at again. I base that on the fact that all the reviews of it I could find on YouTube were iOS only. I did find one thread on the Apple Discussions forum that referenced this error, so it's not just my computer.

I am concerned that this is probably a widespread problem, but the population of users is so small that it's unlikely many would reach out to support about it. As I noted above, I could only find one example. This is a frequent problem I've seen with accessibility software in general, where developers spend a lot of time on the glamorous, headline-catching features and ignore the rest of it when they're done. Disabled people deserve much more effort expended on the maintenance and implementation of the software they rely on and pay for.
