Word-level timestamps
#1 by valbarriere - opened
Hello!
I have a naive question: does this model support word-level timestamps?
When I try, I get the following error:
```
>>> result = pipe(sample, return_timestamps="word")
WhisperModel is using WhisperSdpaAttention, but `torch.nn.functional.scaled_dot_product_attention` does not support `output_attentions=True` or `layer_head_mask` not None. Falling back to the manual attention implementation, but specifying the manual implementation will be required from Transformers version v5.0.0 onwards. This warning can be removed using the argument `attn_implementation="eager"` when loading the model.
From v4.47 onwards, when a model cache is to be returned, `generate` will return a `Cache` instance instead by default (as opposed to the legacy tuple of tuples format). If you want to keep returning the legacy format, please set `return_legacy_cache=True`.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/vbarrier/.venv/lib/python3.8/site-packages/transformers/pipelines/automatic_speech_recognition.py", line 283, in __call__
    return super().__call__(inputs, **kwargs)
  File "/home/vbarrier/.venv/lib/python3.8/site-packages/transformers/pipelines/base.py", line 1294, in __call__
    return next(
  File "/home/vbarrier/.venv/lib/python3.8/site-packages/transformers/pipelines/pt_utils.py", line 124, in __next__
    item = next(self.iterator)
  File "/home/vbarrier/.venv/lib/python3.8/site-packages/transformers/pipelines/pt_utils.py", line 269, in __next__
    processed = self.infer(next(self.iterator), **self.params)
  File "/home/vbarrier/.venv/lib/python3.8/site-packages/transformers/pipelines/base.py", line 1209, in forward
    model_outputs = self._forward(model_inputs, **forward_params)
  File "/home/vbarrier/.venv/lib/python3.8/site-packages/transformers/pipelines/automatic_speech_recognition.py", line 515, in _forward
    tokens = self.model.generate(
  File "/home/vbarrier/.venv/lib/python3.8/site-packages/transformers/models/whisper/generation_whisper.py", line 684, in generate
    ) = self.generate_with_fallback(
  File "/home/vbarrier/.venv/lib/python3.8/site-packages/transformers/models/whisper/generation_whisper.py", line 862, in generate_with_fallback
    seek_sequences, seek_outputs = self._postprocess_outputs(
  File "/home/vbarrier/.venv/lib/python3.8/site-packages/transformers/models/whisper/generation_whisper.py", line 963, in _postprocess_outputs
    seek_outputs["token_timestamps"] = self._extract_token_timestamps(
  File "/home/vbarrier/.venv/lib/python3.8/site-packages/transformers/models/whisper/generation_whisper.py", line 195, in _extract_token_timestamps
    weights = torch.stack([cross_attentions[l][:, h] for l, h in alignment_heads])
  File "/home/vbarrier/.venv/lib/python3.8/site-packages/transformers/models/whisper/generation_whisper.py", line 195, in <listcomp>
    weights = torch.stack([cross_attentions[l][:, h] for l, h in alignment_heads])
IndexError: list index out of range
```
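If I read the failing line correctly, here is a plain-Python sketch of what I suspect is happening (this is my assumption, not the actual transformers internals): word-level timestamps are computed from the cross-attention weights of specific "alignment heads", and `cross_attentions[l]` is indexed by decoder layer. If the `alignment_heads` in the generation config reference a layer the loaded checkpoint does not have, the lookup would fail exactly like in my traceback (the layer counts and head pairs below are hypothetical):

```python
# Hypothetical small checkpoint with only 4 decoder layers.
num_decoder_layers = 4
# Stand-in for the per-layer cross-attention tensors collected during generate().
cross_attentions = [f"attn_layer_{i}" for i in range(num_decoder_layers)]

# Hypothetical alignment_heads, e.g. copied from a larger model's config:
# (layer, head) pairs, where layer 5 does not exist in a 4-layer decoder.
alignment_heads = [(2, 3), (5, 1)]

try:
    # Mirrors the shape of the failing list comprehension in the traceback.
    weights = [cross_attentions[layer] for layer, head in alignment_heads]
except IndexError as err:
    print("IndexError:", err)  # prints: IndexError: list index out of range
```

So my guess is that the checkpoint's generation config either lacks valid `alignment_heads` or carries ones from a different model size. Does that sound right?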
Thanks in advance,
Valentin