Captions now added, if you like doing karaoke by yourself.
Struggling to install Whisper models for Kdenlive’s smart transcription plugin? Try this…
First, credit where credit is due: the following solution was cobbled together from advice by Kdenlive developers and a tutorial by Veronica Explains, linked below.
So, if you are having a hard time installing Whisper models on Kdenlive’s latest Flatpak package, here’s how I did it.
Stop Kdenlive and open a terminal window.
From the command line, run:
flatpak run --command=/bin/bash org.kde.kdenlive
That was blatantly copied from the Veronica Explains blog post mentioned above, which covers something related.
To quote her:
To break that down:
flatpak run invokes the flatpak program to run an application. You can use that to run your Flatpak programs directly from the terminal (useful when running a window manager or building a startup script).
The program we're running is org.kde.kdenlive, which is the application ID for the Kdenlive program.
In between flatpak run and org.kde.kdenlive, we have --command=/bin/bash, which will tell Flatpak that we want to run the bash prompt inside the Kdenlive Flatpak runtime, the sandboxed environment available to the Flatpak version of the Kdenlive application.
Hats off to Veronica for explaining things so well.
Anyway… You won’t see any difference when you jump into the sandboxed environment: no warning will be displayed and the prompt will remain unchanged.
Trust me, no message is a good message: everything will be working as it should.
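If you want to reassure yourself that you really are inside the sandbox, a quick check (relying on the environment variable and metadata file Flatpak normally provides) is:
echo $FLATPAK_ID   # should print org.kde.kdenlive inside the sandbox
ls /.flatpak-info   # sandbox metadata file; it won't exist outside the sandbox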
From inside the environment run Whisper on an audio or video file, any audio or video file:
$HOME/.var/app/org.kde.kdenlive/data/kdenlive/venv/bin/whisper [_some video or audio file_]
This will make Whisper automatically download the default model (which is turbo) and install it, before proceeding to transcribe the spoken bits of the file’s audio track.
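If you are curious where the weights end up, my assumption is that Whisper honours the sandbox’s XDG cache directory, so something along these lines should show the downloaded model:
ls $HOME/.var/app/org.kde.kdenlive/cache/whisper/   # assumed cache location inside the Kdenlive sandbox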
Note that you can install other models with a slight variation:
$HOME/.var/app/org.kde.kdenlive/data/kdenlive/venv/bin/whisper --model large [_some video or audio file_]
Apart from turbo and large, you have a choice between tiny, tiny.en, base, base.en, small, small.en, medium and medium.en. That said, turbo should serve you just fine for most cases.
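For instance, if disk space or memory is tight and your material is in English, a smaller English-only model can be pulled the same way (same command as above, just a different --model value):
$HOME/.var/app/org.kde.kdenlive/data/kdenlive/venv/bin/whisper --model small.en [_some video or audio file_]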
Once the model is downloaded and the transcription finishes, Whisper generates one or more text/subtitle files in the current directory. If you don’t need them, you can safely remove them.
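In my understanding, a default run writes one file per output format next to wherever you invoked Whisper, so the clean-up looks something like this (file names hypothetical):
rm myclip.txt myclip.srt myclip.vtt myclip.tsv myclip.json   # leftovers from transcribing myclip.mp4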
You can leave the sandboxed environment by typing
exit
and then start Kdenlive normally.
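For example, from a regular (non-sandboxed) terminal you can relaunch it with the same command from the start of this post, minus the --command override:
flatpak run org.kde.kdenlive   # start Kdenlive normally, as a Flatpak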
Check the models in Settings > Configure Kdenlive… > Plugins > Speech to text by clicking on the Model drop-down. Tada!
Interestingly, if you now try downloading more models from inside Kdenlive, it will work flawlessly.
I guess it just needs that little nudge.
One of the things that really holds me back from sharing #multimedia is having to think about the #captions and the #alttext.
But I’ve decided to try being less demanding with myself and doing things a bit more “however they come out”. Maybe I won’t add a caption or alt text right now, but I will later, so I’ll ask people to be patient, because otherwise I keep kicking things further down the road and in the end I never do anything and it all comes to nothing.
For anyone working with Whisper or other ASR tools, the white paper from our grant project "Increasing Accessibility of Audiovisual Content Using Whisper" is now available on the Lyrasis repository: https://doi.org/10.48609/na33-1y19
Resources gathered include stats for processing times, editing times, accuracy (word error rate), and power consumption; style guide examples for caption editors; and a project workflow for caption creation and editing.
IFLA Journal article has been published! It's part of the special issue on AI and discusses Whisper automated speech recognition, our work to make AV collections more searchable and accessible, and insights from our student caption editors on how using Whisper transforms their work. Thankful to my co-authors and our project team.
Lots of research, little implementation
Inclusion work is neither easy nor glamorous
A lot of research is being done in the field of hearing loss and deafness, and creative products are being developed - but very little of it ever reaches us, the people with hearing loss and deafness. Why is that?
https://doofe-ohren.de/index.php/2025/01/15/viel-forschung-wenig-umsetzung/
@sidereal I can't comment on the time taken to edit video.
As someone who occasionally transcribes: I assure readers that the time required to transcribe can be enormous — COLOSSAL — compared to the duration of the audio or AV content that must be listened to, repeatedly.
I cannot speak for @howisyourdog, but I typically don't bother with content that lacks captions (subtitles). Substandard accessibility is an immediate turn-off.
I've been so annoyed that there still isn't a decent WebVTT parser for PHP (there are a few halfway-there ones, but they don't do much more than SRT parsers).
So this weekend I've started making one, based on the original Mozilla vtt.js parser. I already have the basics working, but the region and content parsing, as well as the test cases, still need quite some work before I'll publish it.
Goal is to make a spec compliant parser, validator and writer.
Update on my attempts to make #captions work for #Jitsi:
I've heard back from the following re: no plans to support Jitsi Meet: #OtterPilot, #MeetingBaas, #ReductVideo, Fireflies
#Tactiq said it's planned, but no timeline.
Tried reaching out to #8x8 for a quote, but they never followed up...
So far, my options are to run #CaptionNinja alongside Jitsi (free), or to embed a Jitsi-as-a-Service meeting iframe on a web page and enable captions (premium).
Not sure if this is a stretch, but has anyone managed to get automated #captions (or CART) and shareable live #transcriptions working with the public #Jitsi Meet instance, even via a third-party app?
I'm not interested in self-hosting, and I'm open to paid solutions. I know there are some services that offer a bot you can invite into your meeting to transcribe, but the ones I've tried don't work with Jitsi (list in )
Now with added captions (an excellent suggestion from @pomCountyIrregs). Just finished "In Your Neighbourhood" and gradually adding lyrics for the older songs.
https://www.youtube.com/playlist?list=PLR5s-JDfPf5ivZvocPEHA5TNNZEuj2GqG
Psst. In case you missed it last week, we published the schedule for WordPress Accessibility Day. 24 hours of phenomenal talks - all free, on Zoom with live #captions and #ASL. Registration is open, won't you join us?
#WordPress #accessibility #a11y #conference #captioned #SignLanguage
https://2024.wpaccessibility.day/announcing-our-2024-schedule/
Does anyone have any insight into how #ClosedCaptions happen?
I've noticed issues in two shows recently, due to the strong accents and speech patterns (Northern Irish in Blue Lights on SBS, and Yorkshire in The Long Shadow on Stan).
Are the #captions generated by computer and checked by human? Are they done by the production/distribution company, or the broadcaster? #subtitles #question #accessibility
@brucelawson Whisper doesn't just transcribe; it can also output SRT files, which you can import into programs like DaVinci Resolve to add closed captions to the video file (with timestamps and everything)!
Saved my ass some days ago. But even though it's actually quite good, you still need to (and should anyway) check the output for errors.
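For example, on the command line that's just an output-format flag (the file name here is made up; the flag is part of the standard Whisper CLI):
whisper talk.mp4 --model turbo --output_format srt   # writes talk.srt, ready to import into an NLE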
Y’all. I just created my first video captions, and it makes me a little giddy. Accessibility is fun.
busybee: fluffy rambles: Please stop using open captions https://beesbuzz.biz/blog/1723-Please-stop-using-open-captions #Accessibility #VideoEditing #Captions #YouTube #Video #Blog #Rant #A11Y
People ask me how I follow conferences that are not accessible at NewCrafts. I use Microsoft Translator, which automatically captions what the speakers say. It's not perfect but it does the job.