#captions


Struggling to install Whisper models for Kdenlive’s smart transcription plugin? Try this…

First, credit where credit is due: the following solution was cobbled together from advice by the Kdenlive developers and a tutorial by Veronica Explains, linked below.

So, if you are having a hard time installing Whisper models on Kdenlive’s latest Flatpak package, here’s how I did it.

Stop Kdenlive and open a terminal window.

From the command line, run:

flatpak run --command=/bin/bash org.kde.kdenlive

That 👆 was blatantly copied from the Veronica Explains blog post mentioned above, which covers something related.

To quote her:

To break that down: flatpak run invokes the flatpak program to run an application. You can use that to run your Flatpak programs directly from the terminal (useful when running a window manager or building a startup script).

The program we’re running is org.kde.kdenlive, which is the application ID for the Kdenlive program.

In between flatpak run and org.kde.kdenlive, we have --command=/bin/bash, which will tell Flatpak that we want to run the bash prompt inside the Kdenlive Flatpak runtime, the sandboxed environment available to the Flatpak version of the Kdenlive application.

Hats off to Veronica for explaining things so well.
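
As an aside: if you ever need to find an application ID yourself, flatpak can list the installed apps (a standard flatpak subcommand, included here just in case):

flatpak list --app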

Anyway… You won’t see any difference when you jump into the sandboxed environment—no warning will be displayed and the prompt will remain unchanged.

Trust me, no message is a good message here: everything is working as it should.

From inside the environment, run Whisper on an audio or video file (any audio or video file will do):

$HOME/.var/app/org.kde.kdenlive/data/kdenlive/venv/bin/whisper [_some video or audio file_]

This will make Whisper automatically download the default model (which is turbo) and install it, before proceeding to transcribe the spoken bits of the file’s audio track.
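
If you want to double-check that the download worked, the model files should land in Whisper's cache directory. Inside the sandbox that's $XDG_CACHE_HOME/whisper which, going by Flatpak's defaults (an assumption on my part, so adjust if your setup differs), maps to this path on the host:

ls $HOME/.var/app/org.kde.kdenlive/cache/whisper

You should see one .pt file per downloaded model.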

Note that you can install other models with a slight variation:

$HOME/.var/app/org.kde.kdenlive/data/kdenlive/venv/bin/whisper --model large [_some video or audio file_]

Apart from turbo and large, you can choose from tiny, tiny.en, base, base.en, small, small.en, medium and medium.en. That said, turbo should serve you just fine in most cases.
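
If your footage is English-only, the .en variants can be a bit faster and more accurate. For example (--model and --language are flags of the upstream Whisper CLI; the path is the same as above):

$HOME/.var/app/org.kde.kdenlive/data/kdenlive/venv/bin/whisper --model small.en --language en [_some video or audio file_]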

Once the model is downloaded and the transcription is done, Whisper writes one or more text/subtitle files to the current directory. If you don't need them, you can safely delete them.
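
If you'd rather not litter the current directory in the first place, the upstream Whisper CLI also accepts --output_format and --output_dir, so you can send a single subtitle file somewhere disposable:

$HOME/.var/app/org.kde.kdenlive/data/kdenlive/venv/bin/whisper --output_format srt --output_dir /tmp [_some video or audio file_]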

You can leave the sandboxed environment by typing

exit

And then start Kdenlive normally.

Check the models under Settings > Configure Kdenlive… > Plugins > Speech to text by clicking the Model drop-down. Tada!

Interestingly, if you now try downloading more models from inside Kdenlive, it will work flawlessly.

I guess it just needs that little nudge.

One of the things that really holds me back from sharing #multimedia is having to think up the #captions and the #alttext

But I've decided I'm going to try to stop being so demanding and do things a bit more "however they turn out". Maybe I won't add a caption or alt text now, but I will in the future, so I'll ask people to be patient, because otherwise I keep kicking things further down the road and in the end I never do anything and it all comes to nothing.

For anyone working with Whisper or other ASR tools, the white paper from our grant project "Increasing Accessibility of Audiovisual Content Using Whisper" is now available on the Lyrasis repository: doi.org/10.48609/na33-1y19

Resources gathered include stats for processing times, editing times, accuracy (word error rate), and power consumption; style guide examples for caption editors; and a project workflow for caption creation and editing.

@sidereal I can't comment on the time taken to edit video.

As someone who occasionally transcribes, I can assure readers that the time required can be enormous, COLOSSAL even, compared to the duration of the audio or AV content, which has to be listened to repeatedly.

I can't speak for @howisyourdog, but I typically don't bother with content that lacks captions (subtitles). Substandard accessibility is an immediate turn-off.

Cc @grvsmth @bedast

I've been so annoyed that there still isn't a decent WebVTT parser for PHP (there are a few halfway-there ones, but they don't do much more than SRT parsers).

So this weekend I've started making one, based on the original Mozilla vtt.js parser. I already have the basics working, but the region and content parsing, as well as the test cases, still need quite a bit of work before I publish it.

The goal is a spec-compliant parser, validator, and writer.

#video #vtt #captions

Update on my attempts to make #captions work for #Jitsi:

I've heard back from the following, and none have plans to support Jitsi Meet: #OtterPilot, #MeetingBaas, #ReductVideo, Fireflies

#Tactiq said it's planned, but no timeline.

Tried reaching out to #8x8 for a quote, but they never followed up...

So far, my options are to run #CaptionNinja alongside Jitsi (free), or to embed a Jitsi-as-a-Service meeting iframe on a web page and enable captions (premium).

Not sure if this is a stretch, but has anyone managed to get automated #captions (or CART) and shareable live #transcriptions working with the public #Jitsi Meet instance, even via a third-party app?

meet.jit.si

I'm not interested in self-hosting, and I'm open to paid solutions. I know that there are some services that offer a bot that you can invite into your meeting to transcribe, but the ones I've tried don't work with Jitsi (list in 🧵)


@brucelawson Whisper doesn't just transcribe; it can also output SRT files, which you can import into programs like DaVinci Resolve to add closed captions to the video file (with timestamps and everything)!

Saved my ass a few days ago. But even though it's actually quite good, you still need to (and should anyway) check the output for errors.