Commit 01888d9

feat (provider/elevenlabs): add transcription provider (#5643)
Co-authored-by: Lars Grammel <[email protected]>
1 parent 96c7d1a commit 01888d9

27 files changed (+1156 -6 lines)

.changeset/sweet-turtles-kiss.md (new file, +5)

@@ -0,0 +1,5 @@
---
'@ai-sdk/elevenlabs': patch
---

feat (provider/elevenlabs): add transcription provider

content/docs/02-foundations/02-providers-and-models.mdx (+1)

@@ -40,6 +40,7 @@ The AI SDK comes with a wide range of providers that you can use to interact wit
 - [Cerebras Provider](/providers/ai-sdk-providers/cerebras) (`@ai-sdk/cerebras`)
 - [Groq Provider](/providers/ai-sdk-providers/groq) (`@ai-sdk/groq`)
 - [Perplexity Provider](/providers/ai-sdk-providers/perplexity) (`@ai-sdk/perplexity`)
+- [ElevenLabs Provider](/providers/ai-sdk-providers/elevenlabs) (`@ai-sdk/elevenlabs`)
 
 You can also use the [OpenAI Compatible provider](/providers/openai-compatible-providers) with OpenAI-compatible APIs:

content/docs/03-ai-sdk-core/36-transcription.mdx (+7 -5)

@@ -144,10 +144,12 @@ try {
 
 ## Transcription Models
 
-| Provider                                                          | Model                    |
-| ----------------------------------------------------------------- | ------------------------ |
-| [OpenAI](/providers/ai-sdk-providers/openai#transcription-models) | `whisper-1`              |
-| [OpenAI](/providers/ai-sdk-providers/openai#transcription-models) | `gpt-4o-transcribe`      |
-| [OpenAI](/providers/ai-sdk-providers/openai#transcription-models) | `gpt-4o-mini-transcribe` |
+| Provider                                                                  | Model                    |
+| ------------------------------------------------------------------------- | ------------------------ |
+| [OpenAI](/providers/ai-sdk-providers/openai#transcription-models)         | `whisper-1`              |
+| [OpenAI](/providers/ai-sdk-providers/openai#transcription-models)         | `gpt-4o-transcribe`      |
+| [OpenAI](/providers/ai-sdk-providers/openai#transcription-models)         | `gpt-4o-mini-transcribe` |
+| [ElevenLabs](/providers/ai-sdk-providers/elevenlabs#transcription-models) | `scribe_v1`              |
+| [ElevenLabs](/providers/ai-sdk-providers/elevenlabs#transcription-models) | `scribe_v1_experimental` |
 
 Above are a small subset of the transcription models supported by the AI SDK providers. For more, see the respective provider documentation.
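
For instance, a minimal sketch of calling one of the newly listed ElevenLabs models (the local file path here is hypothetical):

```ts
import { experimental_transcribe as transcribe } from 'ai';
import { elevenlabs } from '@ai-sdk/elevenlabs';
import { readFile } from 'fs/promises';

// transcribe a local audio file with the ElevenLabs scribe_v1 model
const { text } = await transcribe({
  model: elevenlabs.transcription('scribe_v1'),
  audio: await readFile('./audio.mp3'), // hypothetical path
});
console.log(text);
```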
New file (ElevenLabs provider documentation page, +131)

@@ -0,0 +1,131 @@
---
title: ElevenLabs
description: Learn how to use the ElevenLabs provider for the AI SDK.
---

# ElevenLabs Provider

The [ElevenLabs](https://elevenlabs.io/) provider contains transcription model support for the ElevenLabs transcription API.

## Setup

The ElevenLabs provider is available in the `@ai-sdk/elevenlabs` module. You can install it with:

<Tabs items={['pnpm', 'npm', 'yarn']}>
  <Tab>
    <Snippet text="pnpm add @ai-sdk/elevenlabs" dark />
  </Tab>
  <Tab>
    <Snippet text="npm install @ai-sdk/elevenlabs" dark />
  </Tab>
  <Tab>
    <Snippet text="yarn add @ai-sdk/elevenlabs" dark />
  </Tab>
</Tabs>

## Provider Instance

You can import the default provider instance `elevenlabs` from `@ai-sdk/elevenlabs`:

```ts
import { elevenlabs } from '@ai-sdk/elevenlabs';
```

If you need a customized setup, you can import `createElevenLabs` from `@ai-sdk/elevenlabs` and create a provider instance with your settings:

```ts
import { createElevenLabs } from '@ai-sdk/elevenlabs';

const elevenlabs = createElevenLabs({
  // custom settings, e.g.
  fetch: customFetch,
});
```

You can use the following optional settings to customize the ElevenLabs provider instance:

- **apiKey** _string_

  API key that is sent using the `Authorization` header.
  It defaults to the `ELEVENLABS_API_KEY` environment variable.

- **headers** _Record&lt;string,string&gt;_

  Custom headers to include in the requests.

- **fetch** _(input: RequestInfo, init?: RequestInit) => Promise&lt;Response&gt;_

  Custom [fetch](https://developer.mozilla.org/en-US/docs/Web/API/fetch) implementation.
  Defaults to the global `fetch` function.
  You can use it as middleware to intercept requests, or to provide a custom fetch implementation, e.g. for testing.
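
For example, a minimal sketch of a logging `fetch` middleware; the wrapper below is illustrative and not part of the SDK:

```ts
import { createElevenLabs } from '@ai-sdk/elevenlabs';

const elevenlabs = createElevenLabs({
  // log each outgoing request, then delegate to the global fetch
  fetch: async (input: RequestInfo, init?: RequestInit) => {
    console.log('ElevenLabs request:', input);
    return fetch(input, init);
  },
});
```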

## Transcription Models

You can create models that call the [ElevenLabs transcription API](https://elevenlabs.io/speech-to-text)
using the `.transcription()` factory method.

The first argument is the model id, e.g. `scribe_v1`.

```ts
const model = elevenlabs.transcription('scribe_v1');
```

You can also pass additional provider-specific options using the `providerOptions` argument. For example, supplying the input language as an ISO-639-1 code (e.g. `en`) can sometimes improve transcription performance if it is known beforehand.

```ts highlight="6"
import { experimental_transcribe as transcribe } from 'ai';
import { elevenlabs } from '@ai-sdk/elevenlabs';

const result = await transcribe({
  model: elevenlabs.transcription('scribe_v1'),
  audio: new Uint8Array([1, 2, 3, 4]),
  providerOptions: { elevenlabs: { languageCode: 'en' } },
});
```

The following provider options are available:

- **languageCode** _string_

  An ISO-639-1 or ISO-639-3 language code corresponding to the language of the audio file.
  Can sometimes improve transcription performance if known beforehand.
  Defaults to `null`, in which case the language is predicted automatically.

- **tagAudioEvents** _boolean_

  Whether to tag audio events like (laughter), (footsteps), etc. in the transcription.
  Defaults to `true`.

- **numSpeakers** _integer_

  The maximum number of speakers talking in the uploaded file.
  Can help with predicting who speaks when.
  The maximum number of speakers that can be predicted is 32.
  Defaults to `null`, in which case the number of speakers is set to the maximum value the model supports.

- **timestampsGranularity** _enum_

  The granularity of the timestamps in the transcription.
  Defaults to `'word'`.
  Allowed values: `'none'`, `'word'`, `'character'`.

- **diarize** _boolean_

  Whether to annotate which speaker is currently talking in the uploaded file.
  Defaults to `true`.

- **fileFormat** _enum_

  The format of the input audio.
  Defaults to `'other'`.
  Allowed values: `'pcm_s16le_16'`, `'other'`.
  For `'pcm_s16le_16'`, the input audio must be 16-bit PCM at a 16 kHz sample rate, single channel (mono), in little-endian byte order.
  Latency will be lower than when passing an encoded waveform.
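
As an illustration, a sketch that combines several of these options in a single call (the file path and speaker count are hypothetical):

```ts
import { experimental_transcribe as transcribe } from 'ai';
import { elevenlabs } from '@ai-sdk/elevenlabs';
import { readFile } from 'fs/promises';

const result = await transcribe({
  model: elevenlabs.transcription('scribe_v1'),
  audio: await readFile('./data/interview.mp3'), // hypothetical file
  providerOptions: {
    elevenlabs: {
      languageCode: 'en', // skip automatic language detection
      numSpeakers: 2, // at most two speakers expected
      timestampsGranularity: 'character', // finest-grained timestamps
      tagAudioEvents: false, // omit tags such as (laughter)
    },
  },
});
```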

### Model Capabilities

| Model                    | Transcription       | Duration            | Segments            | Language            |
| ------------------------ | ------------------- | ------------------- | ------------------- | ------------------- |
| `scribe_v1`              | <Check size={18} /> | <Check size={18} /> | <Check size={18} /> | <Check size={18} /> |
| `scribe_v1_experimental` | <Check size={18} /> | <Check size={18} /> | <Check size={18} /> | <Check size={18} /> |

examples/ai-core/package.json (+2 -1)

@@ -10,6 +10,7 @@
     "@ai-sdk/cohere": "1.2.7",
     "@ai-sdk/deepinfra": "0.2.10",
     "@ai-sdk/deepseek": "0.2.9",
+    "@ai-sdk/elevenlabs": "0.0.0",
     "@ai-sdk/fal": "0.1.7",
     "@ai-sdk/fireworks": "0.2.9",
     "@ai-sdk/google": "1.2.10",
@@ -49,4 +50,4 @@
     "tsx": "4.19.2",
     "typescript": "5.6.3"
   }
-}
+}
New file (ai-core example: transcription with base64-encoded audio, +20)

@@ -0,0 +1,20 @@
import { elevenlabs } from '@ai-sdk/elevenlabs';
import { experimental_transcribe as transcribe } from 'ai';
import 'dotenv/config';
import { readFile } from 'fs/promises';

async function main() {
  const result = await transcribe({
    model: elevenlabs.transcription('scribe_v1'),
    // pass the audio as a base64-encoded string
    audio: Buffer.from(await readFile('./data/galileo.mp3')).toString('base64'),
  });

  console.log('Text:', result.text);
  console.log('Duration:', result.durationInSeconds);
  console.log('Language:', result.language);
  console.log('Segments:', result.segments);
  console.log('Warnings:', result.warnings);
  console.log('Responses:', result.responses);
}

main().catch(console.error);
New file (ai-core example: transcription from a URL, +21)

@@ -0,0 +1,21 @@
import { elevenlabs } from '@ai-sdk/elevenlabs';
import { experimental_transcribe as transcribe } from 'ai';
import 'dotenv/config';

async function main() {
  const result = await transcribe({
    model: elevenlabs.transcription('scribe_v1'),
    // pass a URL to a remote audio file
    audio: new URL(
      'https://github.com/vercel/ai/raw/refs/heads/main/examples/ai-core/data/galileo.mp3',
    ),
  });

  console.log('Text:', result.text);
  console.log('Duration:', result.durationInSeconds);
  console.log('Language:', result.language);
  console.log('Segments:', result.segments);
  console.log('Warnings:', result.warnings);
  console.log('Responses:', result.responses);
}

main().catch(console.error);
New file (ai-core example: transcription from a local file, +20)

@@ -0,0 +1,20 @@
import { elevenlabs } from '@ai-sdk/elevenlabs';
import { experimental_transcribe as transcribe } from 'ai';
import 'dotenv/config';
import { readFile } from 'fs/promises';

async function main() {
  const result = await transcribe({
    model: elevenlabs.transcription('scribe_v1'),
    // pass the audio as a Node.js Buffer read from disk
    audio: await readFile('data/galileo.mp3'),
  });

  console.log('Text:', result.text);
  console.log('Duration:', result.durationInSeconds);
  console.log('Language:', result.language);
  console.log('Segments:', result.segments);
  console.log('Warnings:', result.warnings);
  console.log('Responses:', result.responses);
}

main().catch(console.error);
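
The three examples above pass audio as a base64-encoded string, a URL, and a Node.js Buffer, respectively. As a further sketch (assuming network access), bytes fetched at runtime can be passed as a `Uint8Array`, the form the provider documentation above also uses:

```ts
import { elevenlabs } from '@ai-sdk/elevenlabs';
import { experimental_transcribe as transcribe } from 'ai';

// download an audio file and pass its raw bytes directly
async function transcribeRemote(url: string): Promise<string> {
  const response = await fetch(url);
  const audio = new Uint8Array(await response.arrayBuffer());
  const { text } = await transcribe({
    model: elevenlabs.transcription('scribe_v1'),
    audio,
  });
  return text;
}
```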

packages/elevenlabs/README.md (+38)

@@ -0,0 +1,38 @@
# AI SDK - ElevenLabs Provider

The **[ElevenLabs provider](https://sdk.vercel.ai/providers/ai-sdk-providers/elevenlabs)** for the [AI SDK](https://sdk.vercel.ai/docs)
contains transcription model support for the ElevenLabs transcription API.

## Setup

The ElevenLabs provider is available in the `@ai-sdk/elevenlabs` module. You can install it with:

```bash
npm i @ai-sdk/elevenlabs
```

## Provider Instance

You can import the default provider instance `elevenlabs` from `@ai-sdk/elevenlabs`:

```ts
import { elevenlabs } from '@ai-sdk/elevenlabs';
```

## Example

```ts
import { elevenlabs } from '@ai-sdk/elevenlabs';
import { experimental_transcribe as transcribe } from 'ai';

const { text } = await transcribe({
  model: elevenlabs.transcription('scribe_v1'),
  audio: new URL(
    'https://github.com/vercel/ai/raw/refs/heads/main/examples/ai-core/data/galileo.mp3',
  ),
});
```

## Documentation

Please check out the **[ElevenLabs provider documentation](https://sdk.vercel.ai/providers/ai-sdk-providers/elevenlabs)** for more information.

packages/elevenlabs/package.json (+65)

@@ -0,0 +1,65 @@
{
  "name": "@ai-sdk/elevenlabs",
  "version": "0.0.0",
  "license": "Apache-2.0",
  "sideEffects": false,
  "main": "./dist/index.js",
  "module": "./dist/index.mjs",
  "types": "./dist/index.d.ts",
  "files": [
    "dist/**/*",
    "internal/dist/**/*",
    "CHANGELOG.md"
  ],
  "scripts": {
    "build": "tsup",
    "build:watch": "tsup --watch",
    "clean": "rm -rf dist && rm -rf internal/dist",
    "lint": "eslint \"./**/*.ts*\"",
    "type-check": "tsc --noEmit",
    "prettier-check": "prettier --check \"./**/*.ts*\"",
    "test": "pnpm test:node && pnpm test:edge",
    "test:edge": "vitest --config vitest.edge.config.js --run",
    "test:node": "vitest --config vitest.node.config.js --run",
    "test:node:watch": "vitest --config vitest.node.config.js --watch"
  },
  "exports": {
    "./package.json": "./package.json",
    ".": {
      "types": "./dist/index.d.ts",
      "import": "./dist/index.mjs",
      "require": "./dist/index.js"
    }
  },
  "dependencies": {
    "@ai-sdk/provider": "1.1.2",
    "@ai-sdk/provider-utils": "2.2.6"
  },
  "devDependencies": {
    "@types/node": "20.17.24",
    "@vercel/ai-tsconfig": "workspace:*",
    "tsup": "^8",
    "typescript": "5.6.3",
    "zod": "3.23.8"
  },
  "peerDependencies": {
    "zod": "^3.0.0"
  },
  "engines": {
    "node": ">=18"
  },
  "publishConfig": {
    "access": "public"
  },
  "homepage": "https://sdk.vercel.ai/docs",
  "repository": {
    "type": "git",
    "url": "git+https://github.com/vercel/ai.git"
  },
  "bugs": {
    "url": "https://github.com/vercel/ai/issues"
  },
  "keywords": [
    "ai"
  ]
}
