com.rest.elevenlabs 3.4.0 (#100)
- com.utilities.rest -> 3.3.0
- com.utilities.encoder.ogg -> 4.0.2
- Added additional request properties for TextToSpeechRequest
  - `previous_text`, `next_text`, `previous_request_ids`, `next_request_ids`, `languageCode`, `withTimestamps`
  - `cacheFormat` which can be `None`, `Wav`, or `Ogg`
- Added support for transcription timestamps by @tomkail
- Added support for language code in TextToSpeechRequest by @Mylan719
- Refactored `VoiceClip`
  - clip samples and data are now prioritized over the `AudioClip`
    - audioClip will not be created until you access the `VoiceClip.AudioClip` property
    - if an audio clip is not loaded, you can load it with `LoadCachedAudioClipAsync`
- Refactored demo scene to use `OnAudioFilterRead` for better-quality stream playback

---------

Co-authored-by: Milan Mikuš <[email protected]>
Co-authored-by: Tom Kail <[email protected]>
5 people authored Nov 25, 2024
1 parent e7f175b commit 3fca11c
Showing 31 changed files with 926 additions and 455 deletions.
16 changes: 8 additions & 8 deletions .github/workflows/unity.yml
@@ -11,26 +11,24 @@ on:
workflow_dispatch:
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: ${{ ( github.event_name == 'pull_request' || github.event.action == 'synchronize' ) }}
permissions:
checks: write
pull-requests: write
cancel-in-progress: ${{ (github.event_name == 'pull_request' || github.event.action == 'synchronize') }}
jobs:
build:
env:
UNITY_PROJECT_PATH: ''
permissions:
checks: write
pull-requests: write
runs-on: ${{ matrix.os }}
strategy:
fail-fast: false
matrix:
os: [ubuntu-latest, windows-latest, macos-13]
os: [ubuntu-latest, windows-latest, macos-15]
unity-versions: [2021.x, 2022.x, 6000.x]
include:
- os: ubuntu-latest
build-target: StandaloneLinux64
- os: windows-latest
build-target: StandaloneWindows64
- os: macos-13
- os: macos-15
build-target: StandaloneOSX
steps:
- uses: actions/checkout@v4
@@ -46,11 +44,13 @@ jobs:
- uses: RageAgainstThePixel/unity-action@v1
name: '${{ matrix.build-target }}-Validate'
with:
build-target: ${{ matrix.build-target }}
log-name: '${{ matrix.build-target }}-Validate'
args: '-quit -nographics -batchmode -executeMethod Utilities.Editor.BuildPipeline.UnityPlayerBuildTools.ValidateProject -importTMProEssentialsAsset'
- uses: RageAgainstThePixel/unity-action@v1
name: '${{ matrix.build-target }}-Build'
with:
build-target: ${{ matrix.build-target }}
log-name: '${{ matrix.build-target }}-Build'
args: '-quit -nographics -batchmode -executeMethod Utilities.Editor.BuildPipeline.UnityPlayerBuildTools.StartCommandLineBuild'
- uses: actions/upload-artifact@v4
3 changes: 3 additions & 0 deletions ElevenLabs/.editorconfig
@@ -34,6 +34,9 @@ dotnet_style_predefined_type_for_locals_parameters_members = true
# Code Style
csharp_style_var_when_type_is_apparent = true

dotnet_diagnostic.IDE0051.severity = none
dotnet_diagnostic.CS0649.severity = none

#### Resharper/Rider Rules ####
# https://www.jetbrains.com/help/resharper/EditorConfig_Properties.html

41 changes: 19 additions & 22 deletions ElevenLabs/Packages/com.rest.elevenlabs/Documentation~/README.md
@@ -40,6 +40,7 @@ The recommended installation method is though the unity package manager and [Ope
- [com.utilities.extensions](https://github.com/RageAgainstThePixel/com.utilities.extensions)
- [com.utilities.audio](https://github.com/RageAgainstThePixel/com.utilities.audio)
- [com.utilities.encoder.ogg](https://github.com/RageAgainstThePixel/com.utilities.encoder.ogg)
- [com.utilities.encoder.wav](https://github.com/RageAgainstThePixel/com.utilities.encoder.wav)
- [com.utilities.rest](https://github.com/RageAgainstThePixel/com.utilities.rest)

---
@@ -59,7 +60,7 @@ The recommended installation method is though the unity package manager and [Ope
- [Text to Speech](#text-to-speech)
- [Stream Text To Speech](#stream-text-to-speech)
- [Voices](#voices)
- [Get Shared Voices](#get-shared-voices) :new:
- [Get Shared Voices](#get-shared-voices)
- [Get All Voices](#get-all-voices)
- [Get Default Voice Settings](#get-default-voice-settings)
- [Get Voice](#get-voice)
@@ -70,13 +71,13 @@ The recommended installation method is though the unity package manager and [Ope
- [Samples](#samples)
- [Download Voice Sample](#download-voice-sample)
- [Delete Voice Sample](#delete-voice-sample)
- [Dubbing](#dubbing) :new:
- [Dub](#dub) :new:
- [Get Dubbing Metadata](#get-dubbing-metadata) :new:
- [Get Transcript for Dub](#get-transcript-for-dub) :new:
- [Get dubbed file](#get-dubbed-file) :new:
- [Delete Dubbing Project](#delete-dubbing-project) :new:
- [SFX Generation](#sfx-generation) :new:
- [Dubbing](#dubbing)
- [Dub](#dub)
- [Get Dubbing Metadata](#get-dubbing-metadata)
- [Get Transcript for Dub](#get-transcript-for-dub)
- [Get dubbed file](#get-dubbed-file)
- [Delete Dubbing Project](#delete-dubbing-project)
- [SFX Generation](#sfx-generation)
- [History](#history)
- [Get History](#get-history)
- [Get History Item](#get-history-item)
@@ -265,8 +266,8 @@ Convert text to speech.
var api = new ElevenLabsClient();
var text = "The quick brown fox jumps over the lazy dog.";
var voice = (await api.VoicesEndpoint.GetAllVoicesAsync()).FirstOrDefault();
var defaultVoiceSettings = await api.VoicesEndpoint.GetDefaultVoiceSettingsAsync();
var voiceClip = await api.TextToSpeechEndpoint.TextToSpeechAsync(text, voice, defaultVoiceSettings);
var request = new TextToSpeechRequest(voice, text);
var voiceClip = await api.TextToSpeechEndpoint.TextToSpeechAsync(request);
audioSource.PlayOneShot(voiceClip.AudioClip);
```

@@ -284,18 +285,14 @@ Stream text to speech.
var api = new ElevenLabsClient();
var text = "The quick brown fox jumps over the lazy dog.";
var voice = (await api.VoicesEndpoint.GetAllVoicesAsync()).FirstOrDefault();
var partialClips = new Queue<AudioClip>();
var voiceClip = await api.TextToSpeechEndpoint.StreamTextToSpeechAsync(
text,
voice,
partialClip =>
{
// Note: Best to queue them and play them in update loop!
// See TextToSpeech sample demo for details
partialClips.Enqueue(partialClip);
});
// The full completed clip:
audioSource.clip = voiceClip.AudioClip;
var partialClips = new Queue<VoiceClip>();
var request = new TextToSpeechRequest(voice, text, model: Model.EnglishTurboV2, outputFormat: OutputFormat.PCM_44100);
var voiceClip = await api.TextToSpeechEndpoint.StreamTextToSpeechAsync(request, partialClip =>
{
// Note: check demo scene for best practices
// on how to handle playback with OnAudioFilterRead
partialClips.Enqueue(partialClip);
});
```

### [Voices](https://docs.elevenlabs.io/api-reference/voices)
@@ -1106,7 +1106,7 @@ private async void GenerateSynthesizedText()
Directory.CreateDirectory(downloadDir);
}

voiceClip = await api.TextToSpeechEndpoint.TextToSpeechAsync(speechSynthesisTextInput, currentVoiceOption, currentVoiceSettings, currentModelOption);
voiceClip = await api.TextToSpeechEndpoint.TextToSpeechAsync(new(currentVoiceOption, speechSynthesisTextInput, voiceSettings: currentVoiceSettings, model: currentModelOption));
voiceClip.CopyIntoProject(editorDownloadDirectory);
}
catch (Exception e)
@@ -1225,7 +1225,7 @@ private void RenderVoiceLab()
EditorGUILayout.Space(EndWidth);
EditorGUILayout.EndHorizontal();
EditorGUI.indentLevel++;

EditorGUILayout.BeginHorizontal();
{
EditorGUILayout.LabelField(voice.Id, EditorStyles.boldLabel);
@@ -1242,7 +1242,7 @@ private void RenderVoiceLab()
EditorGUILayout.Space(EndWidth);
EditorGUILayout.EndHorizontal();
EditorGUI.indentLevel++;

if (!voiceLabels.TryGetValue(voice.Id, out var cachedLabels))
{
cachedLabels = new Dictionary<string, string>();
@@ -0,0 +1,11 @@
// Licensed under the MIT License. See LICENSE in the project root for license information.

namespace ElevenLabs
{
public enum CacheFormat
{
None,
Ogg,
Wav
}
}

@@ -2,8 +2,12 @@

using ElevenLabs.Extensions;
using System;
using System.Threading;
using System.Threading.Tasks;
using UnityEngine;
using UnityEngine.Scripting;
using Utilities.Audio;
using Utilities.WebRequestRest;

namespace ElevenLabs
{
@@ -12,16 +16,30 @@ namespace ElevenLabs
public class GeneratedClip : ISerializationCallbackReceiver
{
[Preserve]
internal GeneratedClip(string id, string text, AudioClip audioClip, string cachedPath)
internal GeneratedClip(string id, string text, AudioClip audioClip, string cachedPath = null)
{
this.id = id;
this.text = text;
TextHash = $"{id}{text}".GenerateGuid();
textHash = TextHash.ToString();
this.audioClip = audioClip;
this.cachedPath = cachedPath;
SampleRate = audioClip.frequency;
}

[Preserve]
internal GeneratedClip(string id, string text, ReadOnlyMemory<byte> clipData, int sampleRate, string cachedPath = null)
{
this.id = id;
this.text = text;
TextHash = $"{id}{text}".GenerateGuid();
textHash = TextHash.ToString();
this.cachedPath = cachedPath;
ClipData = clipData;
SampleRate = sampleRate;
}

private readonly ReadOnlyMemory<byte> audioData;

[SerializeField]
private string id;

@@ -44,16 +62,73 @@ internal GeneratedClip(string id, string text, AudioClip audioClip, string cache
private AudioClip audioClip;

[Preserve]
public AudioClip AudioClip => audioClip;
public AudioClip AudioClip
{
get
{
// ClipData holds the raw PCM bytes supplied to the constructor.
if (audioClip == null && !ClipData.IsEmpty)
{
var pcmData = PCMEncoder.Decode(ClipData.ToArray());
audioClip = AudioClip.Create(Id, pcmData.Length, 1, SampleRate, false);
audioClip.SetData(pcmData, 0);
}

if (audioClip == null)
{
Debug.LogError($"{nameof(audioClip)} is null, try loading it with LoadCachedAudioClipAsync");
}

return audioClip;
}
}

[SerializeField]
private string cachedPath;

[Preserve]
public string CachedPath => cachedPath;

public ReadOnlyMemory<byte> ClipData { get; }

private float[] clipSamples;

public float[] ClipSamples
{
get
{
if (!ClipData.IsEmpty)
{
clipSamples ??= PCMEncoder.Decode(ClipData.ToArray());
}
else if (audioClip != null)
{
clipSamples = new float[audioClip.samples];
audioClip.GetData(clipSamples, 0);
}

return clipSamples;
}
}

public int SampleRate { get; }

public void OnBeforeSerialize() => textHash = TextHash.ToString();

public void OnAfterDeserialize() => TextHash = Guid.Parse(textHash);

public static implicit operator AudioClip(GeneratedClip clip) => clip?.AudioClip;

public async Task<AudioClip> LoadCachedAudioClipAsync(CancellationToken cancellationToken = default)
{
var audioType = cachedPath switch
{
var path when path.EndsWith(".ogg") => AudioType.OGGVORBIS,
var path when path.EndsWith(".wav") => AudioType.WAV,
var path when path.EndsWith(".mp3") => AudioType.MPEG,
_ => AudioType.UNKNOWN
};

return await Rest.DownloadAudioClipAsync($"file://{cachedPath}", audioType, cancellationToken: cancellationToken);
}
}
}
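
The lazy `AudioClip` and `ClipSamples` getters above rely on `PCMEncoder.Decode` to turn the raw clip bytes into float samples. The diff does not show that encoder, so the following is only a rough sketch of the idea, assuming the bytes are 16-bit little-endian signed PCM. `PcmSketch.Decode16Bit` is a hypothetical stand-in, not the `com.utilities.audio` implementation.

```csharp
using System;

public static class PcmSketch
{
    // Hypothetical helper (not the library's PCMEncoder): converts 16-bit
    // little-endian signed PCM bytes into floats in the range [-1, 1).
    public static float[] Decode16Bit(ReadOnlySpan<byte> pcm)
    {
        var sampleCount = pcm.Length / 2; // a trailing odd byte is ignored
        var samples = new float[sampleCount];

        for (var i = 0; i < sampleCount; i++)
        {
            // Combine low and high bytes explicitly so the result
            // does not depend on the endianness of the host machine.
            var value = unchecked((short)(pcm[2 * i] | (pcm[2 * i + 1] << 8)));
            samples[i] = value / 32768f;
        }

        return samples;
    }
}
```

Under these assumptions, the resulting array could be passed to `AudioClip.SetData` as the getter does, or copied into the output buffer inside `OnAudioFilterRead`.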

This file was deleted.

@@ -0,0 +1,44 @@
// Licensed under the MIT License. See LICENSE in the project root for license information.

using Newtonsoft.Json;
using UnityEngine.Scripting;

namespace ElevenLabs
{
/// <summary>
/// Represents timing information for a single character in the transcript
/// </summary>
[Preserve]
public class TimestampedTranscriptCharacter
{
[Preserve]
[JsonConstructor]
internal TimestampedTranscriptCharacter(string character, double startTime, double endTime)
{
Character = character;
StartTime = startTime;
EndTime = endTime;
}

/// <summary>
/// The character being spoken
/// </summary>
[Preserve]
[JsonProperty("character")]
public string Character { get; }

/// <summary>
/// The time in seconds when this character starts being spoken
/// </summary>
[Preserve]
[JsonProperty("character_start_times_seconds")]
public double StartTime { get; }

/// <summary>
/// The time in seconds when this character finishes being spoken
/// </summary>
[Preserve]
[JsonProperty("character_end_times_seconds")]
public double EndTime { get; }
}
}
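
One possible use of per-character timing like this is revealing subtitle text in step with playback. The sketch below is illustrative only: `CharTiming` and `TranscriptSketch.TextSpokenBy` are hypothetical names that mirror the three values exposed above (the real class has an internal constructor), and the timings are assumed to be ordered by start time.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical stand-in mirroring Character, StartTime and EndTime.
public readonly struct CharTiming
{
    public CharTiming(string character, double startTime, double endTime)
    {
        Character = character;
        StartTime = startTime;
        EndTime = endTime;
    }

    public string Character { get; }
    public double StartTime { get; }
    public double EndTime { get; }
}

public static class TranscriptSketch
{
    // Returns the characters whose start time is at or before the given
    // playback position in seconds. Assumes timings are sorted by StartTime.
    public static string TextSpokenBy(IReadOnlyList<CharTiming> timings, double seconds)
    {
        var builder = new StringBuilder();

        foreach (var timing in timings)
        {
            if (timing.StartTime > seconds)
            {
                break; // everything after this starts later
            }

            builder.Append(timing.Character);
        }

        return builder.ToString();
    }
}
```

In a Unity scene, the playback position would typically come from something like `AudioSource.time`, with the returned string assigned to a text component each frame.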
