esnya

To allow the Hugging Face version of Whisper to predict word-level timestamps, a new property, alignment_heads, must be added to the GenerationConfig object. This is a list of [layer, head] pairs that select the cross-attention heads whose attention patterns are highly correlated with word-level timing.
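To make the [layer, head] format concrete, here is a minimal Python sketch. It assumes the heads you want are recorded as a boolean mask with one row per decoder layer and one column per attention head, and it converts that mask into the list-of-pairs form, with a basic range check. The function names, the zero-based indexing and the mask values are assumptions chosen for illustration; they are not part of the transformers API and are not taken from any real checkpoint.

```python
from typing import Iterable, List, Sequence


def mask_to_alignment_heads(mask: Sequence[Sequence[bool]]) -> List[List[int]]:
    """Hypothetical helper: turn a (layers x heads) boolean mask into [layer, head] pairs.

    mask[layer][head] is True when that cross-attention head should be used
    for word-level timing. Indices are assumed to be zero-based, and pairs
    come out in row-major order (sorted by layer, then head).
    """
    return [
        [layer, head]
        for layer, row in enumerate(mask)
        for head, keep in enumerate(row)
        if keep
    ]


def validate_alignment_heads(
    heads: Iterable[Sequence[int]], num_layers: int, num_heads: int
) -> None:
    """Hypothetical helper: raise ValueError for malformed or out-of-range pairs."""
    for pair in heads:
        if len(pair) != 2:
            raise ValueError(f"expected a [layer, head] pair, got {pair!r}")
        layer, head = pair
        if not (0 <= layer < num_layers and 0 <= head < num_heads):
            raise ValueError(
                f"pair {pair!r} is out of range for {num_layers} layers x {num_heads} heads"
            )


# Made-up mask for a 4-layer decoder with 6 heads per layer.
mask = [
    [False] * 6,
    [False] * 6,
    [False, False, False, True, False, False],
    [False, True, False, False, True, False],
]
heads = mask_to_alignment_heads(mask)
validate_alignment_heads(heads, num_layers=4, num_heads=6)
print(heads)  # [[2, 3], [3, 1], [3, 4]]
```

The range check is worth keeping because a pair that points past the last layer or head would otherwise only fail later, when timestamps are requested.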

If your Whisper checkpoint does not have the alignment_heads property yet, it can be added in either of two ways.

Method 1. Change the model.generation_config property:

from transformers import WhisperForConditionalGeneration

# load the model
model = WhisperForConditionalGeneration.from_pretrained("your_checkpoint")
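Setting the property in memory only lasts for the current session. As a sketch of persisting it instead, the snippet below assumes the checkpoint is a local directory that already contains a generation_config.json file (the file name Hugging Face uses for saved generation configs). It adds or overwrites the alignment_heads key and leaves every other key untouched. The helper name add_alignment_heads and the pair values are hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import Iterable, Sequence


def add_alignment_heads(checkpoint_dir: str, heads: Iterable[Sequence[int]]) -> dict:
    """Hypothetical helper: write `alignment_heads` into generation_config.json.

    Assumes `checkpoint_dir` is a local checkpoint directory that already
    contains generation_config.json. Existing keys are preserved; an existing
    `alignment_heads` entry is overwritten.
    """
    path = Path(checkpoint_dir) / "generation_config.json"
    config = json.loads(path.read_text(encoding="utf-8"))
    # Store plain [layer, head] lists so the JSON matches the documented format.
    config["alignment_heads"] = [[int(layer), int(head)] for layer, head in heads]
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


# Demo on a throwaway directory standing in for a real checkpoint.
with tempfile.TemporaryDirectory() as tmp:
    # Minimal placeholder config; a real checkpoint's file has many more keys.
    (Path(tmp) / "generation_config.json").write_text('{"max_length": 448}', encoding="utf-8")
    updated = add_alignment_heads(tmp, [[2, 3], [3, 1], [3, 4]])
    print(updated)  # {'max_length': 448, 'alignment_heads': [[2, 3], [3, 1], [3, 4]]}
```

Once the file is updated, loading the checkpoint again with from_pretrained should expose the stored pairs as model.generation_config.alignment_heads, since the generation config is read from that file.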

	Shader "Amebient/Air"
	{
		Properties
		{
			_Color ("Color", Color) = (0.5,0.8,1,1)
		}
		SubShader
		{
			Tags { "RenderType"="Transparent" "Queue"="Transparent-525" "LightMode"="ForwardBase" "DisableBatching"="True" }
			LOD 100