AssemblyAI has introduced its latest speech recognition model, Universal-1, which Ruby applications can use through the AssemblyAI Ruby SDK to improve transcription accuracy and speed. According to AssemblyAI, Universal-1 is trained on millions of hours of audio data and achieves near-human accuracy, even in challenging conditions such as accented speech and background noise.
Why Use Universal-1 for Speech-to-Text?
According to AssemblyAI, Universal-1 surpasses its previous models and delivers 10% higher accuracy in English, Spanish, and German than leading commercial alternatives. It also reduces the hallucination rate by 30% compared with Whisper and processes audio files five times faster than Whisper Large-v3. These improvements make Universal-1 a strong option for developers who need reliable and efficient transcriptions.
Setting Up the AssemblyAI Ruby SDK
To integrate Universal-1 into Ruby applications, developers can use the AssemblyAI Ruby SDK. The setup process involves adding the AssemblyAI gem to the bundle and configuring an authenticated SDK client with an API key from the AssemblyAI dashboard. Here is a basic setup guide:
bundle add assemblyai
bundle install
require 'assemblyai'
client = AssemblyAI::Client.new(api_key: ENV['ASSEMBLYAI_API_KEY'])
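A missing API key is an easy mistake to make during setup, so it can help to fail early with a readable message. The sketch below assumes a small helper named assemblyai_api_key, which is hypothetical and not part of the AssemblyAI SDK. It only reads the same ASSEMBLYAI_API_KEY variable used above.

```ruby
# Hypothetical helper (not part of the AssemblyAI SDK): reads the API key
# from an environment-like hash and fails early when it is missing or blank.
def assemblyai_api_key(env = ENV)
  key = env['ASSEMBLYAI_API_KEY'].to_s.strip
  raise ArgumentError, 'ASSEMBLYAI_API_KEY is not set' if key.empty?

  key
end

# Usage with the SDK client shown above (requires the assemblyai gem):
#   client = AssemblyAI::Client.new(api_key: assemblyai_api_key)
```

Passing the environment as an argument keeps the helper easy to exercise with a plain hash instead of the real process environment.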
Transcribing Audio Files with Universal-1
Universal-1 is offered as AssemblyAI's Best model class, which is intended for the highest transcription accuracy. Developers can transcribe audio from a public URL or from a local file that is first uploaded to AssemblyAI. The following code transcribes an audio file from a URL:
transcript = client.transcripts.transcribe(audio_url: "https://storage.googleapis.com/aai-web-samples/5_common_sports_injuries.mp3")
raise transcript.error unless transcript.error.nil?
puts transcript.text
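The transcribe-then-check pattern above tends to repeat wherever audio is transcribed, so it can be wrapped once. This is a minimal sketch: TranscriptionError and transcribe_text are hypothetical names, not part of the SDK, and the helper assumes only the transcripts.transcribe call and the error and text readers used in the snippet above.

```ruby
# Hypothetical error type and helper, not part of the AssemblyAI SDK.
class TranscriptionError < StandardError; end

# `client` is expected to behave like the SDK client created earlier:
# client.transcripts.transcribe(audio_url:) returns an object that
# responds to `error` and `text`.
def transcribe_text(client, audio_url)
  transcript = client.transcripts.transcribe(audio_url: audio_url)
  # Surface a failed transcript as a dedicated exception class.
  raise TranscriptionError, transcript.error unless transcript.error.nil?

  transcript.text
end

# Usage (requires the assemblyai gem and a configured client):
#   puts transcribe_text(client, 'https://example.com/audio.mp3')
```

A dedicated exception class lets callers rescue transcription failures separately from unrelated errors.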
For local files, the file needs to be uploaded first:
uploaded_file = client.files.upload(file: './audio.mp3')
transcript = client.transcripts.transcribe(audio_url: uploaded_file.upload_url)
raise transcript.error unless transcript.error.nil?
puts transcript.text
Running the application requires setting the ASSEMBLYAI_API_KEY as an environment variable and executing the Ruby script:
ruby main.rb
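Because URLs and local files follow slightly different paths, a script that accepts either may want to normalize its input first. The sketch below assumes a hypothetical helper, resolve_audio_url, that passes http(s) URLs through and uploads anything else with the same client.files.upload call shown above. Treating every non-URL string as a local path is a simplifying assumption.

```ruby
# Hypothetical helper, not part of the AssemblyAI SDK.
# Returns a URL that can be passed as audio_url: remote URLs pass through
# unchanged, anything else is treated as a local path and uploaded first.
def resolve_audio_url(client, source)
  return source if source.match?(%r{\Ahttps?://}i)

  client.files.upload(file: source).upload_url
end

# Usage (requires the assemblyai gem and a configured client):
#   audio_url = resolve_audio_url(client, ARGV.fetch(0))
#   transcript = client.transcripts.transcribe(audio_url: audio_url)
```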
Nano: A Cost-Effective Alternative
For applications where cost is a concern, AssemblyAI offers the Nano model, supporting 99 different languages. Switching to Nano involves setting the speech_model parameter accordingly:
transcript = client.transcripts.transcribe(audio_url: "https://storage.googleapis.com/aai-web-samples/5_common_sports_injuries.mp3", speech_model: AssemblyAI::Transcripts::SpeechModel::NANO)
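When the model choice depends on configuration, the request arguments can be built in one place. The following sketch assumes a hypothetical helper, transcribe_options, that leaves speech_model out when no model is given, so the request looks like the earlier examples that omit the parameter.

```ruby
# Hypothetical helper, not part of the AssemblyAI SDK.
# Builds keyword arguments for client.transcripts.transcribe and only
# includes speech_model when the caller supplies one.
def transcribe_options(audio_url, speech_model: nil)
  options = { audio_url: audio_url }
  options[:speech_model] = speech_model unless speech_model.nil?
  options
end

# Usage (requires the assemblyai gem and a configured client):
#   nano = AssemblyAI::Transcripts::SpeechModel::NANO
#   transcript = client.transcripts.transcribe(**transcribe_options(url, speech_model: nano))
```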
Additional Features with Audio Intelligence
Beyond transcription, AssemblyAI provides additional features such as entity detection, content moderation, PII redaction, and the application of Large Language Models (LLMs) to audio data. These features enhance the utility and safety of transcriptions in various applications.
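As a rough illustration, such features are typically switched on through extra flags on the transcription request. The sketch below is an assumption rather than confirmed SDK usage: the helper audio_intelligence_options is hypothetical, and the flag names entity_detection and content_safety should be checked against the AssemblyAI API reference. PII redaction is left out because it usually also needs a list of redaction policies.

```ruby
# Assumed mapping from the feature names in this article to request flags.
# Verify these parameter names against the AssemblyAI API reference.
AUDIO_INTELLIGENCE_FLAGS = {
  entity_detection: :entity_detection,
  content_moderation: :content_safety
}.freeze

# Hypothetical helper, not part of the AssemblyAI SDK: turns a list of
# feature names into a hash of boolean request flags.
def audio_intelligence_options(*features)
  features.each_with_object({}) do |feature, options|
    flag = AUDIO_INTELLIGENCE_FLAGS.fetch(feature) do
      raise ArgumentError, "unknown feature: #{feature}"
    end
    options[flag] = true
  end
end

# Usage (requires the assemblyai gem and a configured client):
#   extras = audio_intelligence_options(:entity_detection, :content_moderation)
#   transcript = client.transcripts.transcribe(audio_url: url, **extras)
```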
For more details on Universal-1 and its capabilities, visit the official AssemblyAI blog.