Adding Speech-to-Text to a Django application lets users upload audio and read the transcript without leaving the app. According to AssemblyAI, developers can implement this feature with its API and Python SDK.
Setting Up the Project
To get started, create a new project folder and establish a virtual environment:
# Mac/Linux
python3 -m venv venv
. venv/bin/activate
# Windows
python -m venv venv
.\venv\Scripts\activate.bat
Next, install the necessary packages: Django, the AssemblyAI Python SDK, and python-dotenv:
pip install Django assemblyai python-dotenv
Creating the Django Project
Create a new Django project named ‘stt_project’ and a new app within it called ‘transcriptions’:
django-admin startproject stt_project
cd stt_project
python manage.py startapp transcriptions
Building the View
In the ‘transcriptions’ app, create a view to handle file uploads and transcriptions. Open transcriptions/views.py and add the following code:
from django.shortcuts import render
from django import forms
import assemblyai as aai


class UploadFileForm(forms.Form):
    audio_file = forms.FileField()


def index(request):
    context = None
    if request.method == 'POST':
        form = UploadFileForm(request.POST, request.FILES)
        if form.is_valid():
            file = request.FILES['audio_file']
            # Blocks until AssemblyAI returns the finished transcript
            transcriber = aai.Transcriber()
            transcript = transcriber.transcribe(file.file)
            file.close()
            # Pass either the text or the error message to the template
            context = {'transcript': transcript.text} if not transcript.error else {'error': transcript.error}
    return render(request, 'transcriptions/index.html', context)
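The last step of the view, turning the finished transcript into template context, can be isolated as a small helper so it can be checked without calling the API. This is a minimal sketch: `build_context` is a hypothetical name, and `SimpleNamespace` stands in for the SDK's transcript object, of which only the `text` and `error` attributes used above are assumed.

```python
from types import SimpleNamespace


def build_context(transcript):
    """Return the template context for a finished transcript.

    Hypothetical helper: mirrors the conditional expression in the view.
    """
    if transcript.error:
        return {'error': transcript.error}
    return {'transcript': transcript.text}


# Stand-in objects; the real ones come from aai.Transcriber().transcribe().
ok = SimpleNamespace(text='Hello world', error=None)
failed = SimpleNamespace(text=None, error='File could not be decoded')

print(build_context(ok))
print(build_context(failed))
```

Keeping this logic in one place means the template only ever receives a `transcript` key or an `error` key, never both.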
Defining URL Configuration
Map the view to a URL by creating transcriptions/urls.py:
from django.urls import path
from . import views

urlpatterns = [
    path('', views.index, name="index"),
]
Include this app URL pattern in the global project URL configuration in stt_project/urls.py:
from django.contrib import admin
from django.urls import include, path

urlpatterns = [
    path('', include('transcriptions.urls')),
    path('admin/', admin.site.urls),
]
Creating the HTML Template
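Inside the ‘transcriptions/templates/transcriptions’ directory, create an index.html file with the following content. The path has to match the 'transcriptions/index.html' name passed to render(), and the ‘transcriptions’ app must be listed in INSTALLED_APPS in stt_project/settings.py so that Django can find the template: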
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>AssemblyAI Django App</title>
</head>
<body>
<h1>Transcription App with AssemblyAI</h1>
<form method="post" enctype="multipart/form-data">
{% csrf_token %}
<input type="file" accept="audio/*" name="audio_file">
<button type="submit">Upload</button>
</form>
<h2>Transcript:</h2>
{% if error %}
<p style="color: red">{{ error }}</p>
{% endif %}
<p>{{ transcript }}</p>
</body>
</html>
Setting the API Key
Store the AssemblyAI API key in a .env file in the root directory:
ASSEMBLYAI_API_KEY=your_api_key_here
Load this environment variable in stt_project/settings.py:
from dotenv import load_dotenv
load_dotenv()
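A missing or misspelled key would typically surface only when the first transcription request fails, so it can help to check for it at startup. The sketch below assumes the SDK reads the key from the `ASSEMBLYAI_API_KEY` environment variable, as the setup above implies; `require_env` is a hypothetical helper, not part of Django or the SDK.

```python
import os


def require_env(name, environ=None):
    """Return the value of an environment variable, or raise if it is unset or empty."""
    environ = os.environ if environ is None else environ
    value = environ.get(name, '').strip()
    if not value:
        raise RuntimeError(f'{name} is not set; add it to your .env file')
    return value


# Example with an explicit mapping, so the sketch runs without a real key.
print(require_env('ASSEMBLYAI_API_KEY', {'ASSEMBLYAI_API_KEY': 'your_api_key_here'}))
```

In settings.py, a call such as `require_env('ASSEMBLYAI_API_KEY')` placed after `load_dotenv()` would stop the server from starting with an incomplete configuration.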
Running the Django App
Start the server using the following command:
python manage.py runserver
Visit the app in your browser (the development server listens on http://127.0.0.1:8000/ by default), upload an audio file, and see the transcribed text appear.
Non-blocking Implementations
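The view above waits for the transcription to finish before it responds, which holds up the request for long audio files. To avoid this, consider using webhooks or async functions. Webhooks notify you when the transcription is ready, while async calls allow the app to continue running during the transcription process.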
Using Webhooks
Set a webhook URL in the transcription config and handle the webhook delivery in a separate view function:
# Build an absolute URL (scheme and host included) that matches the 'webhook/' route
webhook_url = request.build_absolute_uri('/webhook/')
config = aai.TranscriptionConfig().set_webhook(webhook_url)
transcriber.submit(file.file, config)
Define the webhook receiver:
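import json
from django.http import HttpResponse
from django.views.decorators.csrf import csrf_exempt

@csrf_exempt  # the incoming POST from AssemblyAI carries no CSRF token
def webhook(request):
    if request.method == 'POST':
        data = json.loads(request.body)
        transcript_id = data['transcript_id']
        transcript = aai.Transcript.get_by_id(transcript_id)
    return HttpResponse(status=200)  # a Django view must always return a response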
Map this view to a URL:
urlpatterns = [
    path('', views.index, name="index"),
    path('webhook/', views.webhook, name="webhook"),
]
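A webhook receiver has to cope with malformed or unexpected deliveries, so the parsing step is worth separating from the view. The sketch below assumes the webhook body is JSON carrying `transcript_id` (as used above) and a `status` field; `parse_webhook_payload` is a hypothetical helper, and the exact payload shape should be checked against the AssemblyAI documentation.

```python
import json


def parse_webhook_payload(body):
    """Extract (transcript_id, status) from a webhook request body.

    Returns None when the body is not valid JSON or lacks a transcript ID.
    """
    try:
        data = json.loads(body)
    except (ValueError, TypeError):
        return None
    if not isinstance(data, dict) or 'transcript_id' not in data:
        return None
    # 'status' is an assumed field name; fall back to None if it is absent.
    return data['transcript_id'], data.get('status')


print(parse_webhook_payload(b'{"transcript_id": "abc123", "status": "completed"}'))
print(parse_webhook_payload(b'not json'))
```

A view could call this with `request.body` and return a 400 response when the result is `None`, instead of raising an unhandled exception.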
Using Async Functions
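Alternatively, call transcribe_async, which returns a future immediately instead of waiting for the transcript:
transcript_future = transcriber.transcribe_async(file.file)
# Later, check whether the transcription has finished
if transcript_future.done():
    transcript = transcript_future.result()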
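The future-based pattern can be tried without an API key by substituting a slow stand-in function. In this sketch, `fake_transcribe` and the thread pool are hypothetical stand-ins for `transcribe_async`, on the assumption that it returns a standard `concurrent.futures.Future`; the point is only to show how such a future is polled with `done()` and collected with `result()`.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_transcribe(audio_name):
    """Hypothetical stand-in for a slow transcription call."""
    time.sleep(0.2)
    return f'transcript of {audio_name}'


executor = ThreadPoolExecutor(max_workers=1)
future = executor.submit(fake_transcribe, 'meeting.mp3')

# The caller is free to do other work while the job runs;
# done() reports whether the result is available yet.
finished_immediately = future.done()

# result() blocks until the job finishes (or the timeout expires).
transcript = future.result(timeout=5)
executor.shutdown()

print(finished_immediately, transcript)
```

Checking `done()` once, right after submission, will usually find the job unfinished, so a real view would need to store the future or the transcript ID and check again later.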
Speech-to-Text Options for Django Apps
When implementing Speech-to-Text, consider cloud-based APIs like AssemblyAI or Google Cloud Speech-to-Text for high accuracy and scalability, or open-source libraries like SpeechRecognition and Whisper for greater control and privacy.
Conclusion
This guide shows how to integrate Speech-to-Text into Django apps using the AssemblyAI API. Developers can choose between blocking and non-blocking implementations and select the best Speech-to-Text solution based on their needs.
For more details, visit the AssemblyAI blog.