How to call the OpenAI Whisper model to get time information in Python


The speech-to-text API provides two endpoints, transcriptions and translations, based on OpenAI's state-of-the-art open-source large-v2 Whisper model. They can be used to:

Transcribe audio into whatever language the audio is in.
Translate and transcribe the audio into English.
File uploads are currently limited to 25 MB and the following input file types are supported: mp3, mp4, mpeg, mpga, m4a, wav, and webm.
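Before uploading, it can help to check the file against these limits locally, so a bad request never leaves your machine. A minimal sketch (the file path is illustrative):

import os

FILE_PATH = "./upload-whisper.mp4"  # hypothetical path to your audio file
MAX_BYTES = 25 * 1024 * 1024        # the 25 MB upload limit
ALLOWED_EXTENSIONS = {".mp3", ".mp4", ".mpeg", ".mpga", ".m4a", ".wav", ".webm"}

ext = os.path.splitext(FILE_PATH)[1].lower()
if ext not in ALLOWED_EXTENSIONS:
    raise ValueError("Unsupported file type: {}".format(ext))
if os.path.getsize(FILE_PATH) > MAX_BYTES:
    raise ValueError("File exceeds the 25 MB upload limit")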

Here is an example in Python:

import requests

# Define the API endpoint and headers
url = "https://api.openai.com/v1/audio/transcriptions"
headers = {
    "Authorization": "Bearer {}".format(open_ai_key)  # replace open_ai_key with your API key
}

# Location of your audio file; could be mp3, mp4, etc.
FILE_PATH = "./upload-whisper.mp4"

# Define the multipart form parameters
files = {
    'file': ('test.mp4', open(FILE_PATH, 'rb')),
    'model': (None, 'whisper-1'),
    'response_format': (None, 'srt'),
}

response = requests.post(url, headers=headers, files=files)
print(response.text)

The output is:

1
00:00:00,000 --> 00:00:02,600
First, I need you to go to the front desk to sign up for work.

Notice that, in the above code, we set the response_format to "srt", which includes timestamps.
The format of the transcript output can also be one of these options: json, text, srt, verbose_json, or vtt.
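Since the goal here is timing information, verbose_json is also worth a look: the JSON response includes a list of segments, each with start and end times in seconds. A minimal sketch reusing the same request setup (open_ai_key and FILE_PATH as defined above):

import requests

url = "https://api.openai.com/v1/audio/transcriptions"
headers = {
    "Authorization": "Bearer {}".format(open_ai_key)
}

files = {
    'file': ('test.mp4', open(FILE_PATH, 'rb')),
    'model': (None, 'whisper-1'),
    'response_format': (None, 'verbose_json'),
}

response = requests.post(url, headers=headers, files=files)
result = response.json()

# Each segment carries its start/end times (in seconds) and the transcribed text
for segment in result["segments"]:
    print("{:.2f}s --> {:.2f}s: {}".format(segment["start"], segment["end"], segment["text"]))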


Author: robot learner