Superface

Speech to text conversion

speech/recognize@1.0.0
4 providers

Speech recognition

Real-time speech recognition.

Input: Audio content, Language code, Audio encoding, Maximum alternatives
Result: Results

1. Choose a provider

2. Use Recognize with mock in your code

The instructions below are for our Node.js SDK. For other languages, use OneService.
npm i @superfaceai/one-sdk
const { SuperfaceClient } = require('@superfaceai/one-sdk');

// You can manage tokens here: https://superface.ai/insights
const sdk = new SuperfaceClient({ sdkAuthToken: '<your SDK auth token>' });

async function run() {
  // Load the profile
  const profile = await sdk.getProfile('speech/recognize@1.0.0');

  // Use the profile
  const result = await profile
    .getUseCase('Recognize')
    .perform({
      audioContent: '<base64 encoded wav audio>',
      languageCode: 'en-US'
    }, {
      provider: 'mock'
    });

  // Handle the result
  try {
    const data = result.unwrap();
    console.log(data);
  } catch (error) {
    console.error(error);
  }
}

run();

Structure details

Input (object)

audioContent
Audio data in the encoding specified by the audioEncoding input parameter.
languageCode
The language (and potentially also the region) of the speech expressed as a BCP-47 language tag, e.g. 'en-US'.
audioEncoding
Encoding of the audio data being sent. This input is optional for WAV audio files and required for other audio formats.
maxAlternatives
Maximum number of recognition hypotheses to return. The server may return fewer than maxAlternatives. Valid values are 0 to 30; the default is 1.

Example

{
  "audioContent": "<base64 encoded wav audio>",
  "languageCode": "en-US"
}
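
The input rules above can be checked before calling the use case. The following is a minimal sketch, not part of the SDK: buildRecognizeInput is a hypothetical helper name, and it assumes the audio is available as a Node.js Buffer and that audioContent is expected as base64 text, as the example suggests. It detects WAV data by its RIFF/WAVE header, requires audioEncoding otherwise, and checks the documented 0 to 30 range for maxAlternatives. Valid audioEncoding values are not listed here, so the helper passes the value through unchanged.

```javascript
// Hypothetical helper (not part of @superfaceai/one-sdk): builds the
// Recognize input object from raw audio bytes held in a Buffer.
function buildRecognizeInput(audio, languageCode, options = {}) {
  const { audioEncoding, maxAlternatives } = options;

  // A WAV file starts with the ASCII bytes "RIFF" and has "WAVE" at offset 8.
  const isWav =
    audio.length >= 12 &&
    audio.toString('ascii', 0, 4) === 'RIFF' &&
    audio.toString('ascii', 8, 12) === 'WAVE';

  // audioEncoding is optional only for WAV audio.
  if (!isWav && audioEncoding === undefined) {
    throw new Error('audioEncoding is required for non-WAV audio');
  }

  // Documented valid range for maxAlternatives is 0 to 30.
  if (
    maxAlternatives !== undefined &&
    (!Number.isInteger(maxAlternatives) || maxAlternatives < 0 || maxAlternatives > 30)
  ) {
    throw new RangeError('maxAlternatives must be an integer from 0 to 30');
  }

  // Intl.getCanonicalLocales throws a RangeError for malformed BCP-47 tags
  // and normalizes casing (for example 'en-us' becomes 'en-US').
  const [canonicalLanguage] = Intl.getCanonicalLocales(languageCode);

  const input = {
    audioContent: audio.toString('base64'),
    languageCode: canonicalLanguage,
  };
  // Only include optional fields the caller actually set.
  if (audioEncoding !== undefined) input.audioEncoding = audioEncoding;
  if (maxAlternatives !== undefined) input.maxAlternatives = maxAlternatives;
  return input;
}
```

The returned object can be passed as the first argument to perform(). Leaving unset optional fields out of the object, rather than sending undefined, lets the documented defaults apply.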

Result (object)

results
Sequential list of transcription results corresponding to sequential portions of audio.
alternatives
Alternative hypotheses for each item in results.
transcript
Transcript text representing the words recognized in the audio input.
confidence
The confidence estimate, between 0.0 and 1.0. A higher number indicates a greater estimated likelihood that the recognized words are correct. The default of 0.0 is a sentinel value indicating that confidence was not set.

Example

{
  "results": [
    {
      "alternatives": [
        {
          "confidence": 0.8393012,
          "transcript": "hello world"
        }
      ]
    }
  ]
}
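
To turn a result like the one above into a single string, one option is to take the most confident alternative from each entry in results and join the transcripts in order. This is a minimal sketch under assumptions: bestTranscript is a hypothetical helper name, and it assumes that when confidence is unset (the 0.0 sentinel) the provider's first alternative is an acceptable fallback.

```javascript
// Hypothetical helper: joins the most confident alternative of each result.
function bestTranscript(data) {
  const parts = [];
  for (const result of data.results ?? []) {
    const alternatives = result.alternatives ?? [];
    if (alternatives.length === 0) continue;

    // Keep the earlier alternative on ties, so a list where every
    // confidence is the 0.0 sentinel falls back to the first entry.
    const best = alternatives.reduce((a, b) =>
      (b.confidence ?? 0) > (a.confidence ?? 0) ? b : a
    );
    parts.push(best.transcript);
  }
  // Results cover sequential portions of audio, so join them in order.
  return parts.join(' ');
}
```

Called with the example result above, this returns the transcript of the only alternative.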

Implementation details

Provider
mock
Use case
Recognize
Author
@superface
Source
Verified