Word level timestamps

#1
by valbarriere - opened

Hello!

I have a naive question: does this model support word-level timestamps?

I obtain the following error:

```
>>> result = pipe(sample, return_timestamps="word")
WhisperModel is using WhisperSdpaAttention, but `torch.nn.functional.scaled_dot_product_attention` does not support `output_attentions=True` or `layer_head_mask` not None. Falling back to the manual attention implementation, but specifying the manual implementation will be required from Transformers version v5.0.0 onwards. This warning can be removed using the argument `attn_implementation="eager"` when loading the model.
From v4.47 onwards, when a model cache is to be returned, `generate` will return a `Cache` instance instead by default (as opposed to the legacy tuple of tuples format). If you want to keep returning the legacy format, please set `return_legacy_cache=True`.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/vbarrier/.venv/lib/python3.8/site-packages/transformers/pipelines/automatic_speech_recognition.py", line 283, in __call__
    return super().__call__(inputs, **kwargs)
  File "/home/vbarrier/.venv/lib/python3.8/site-packages/transformers/pipelines/base.py", line 1294, in __call__
    return next(
  File "/home/vbarrier/.venv/lib/python3.8/site-packages/transformers/pipelines/pt_utils.py", line 124, in __next__
    item = next(self.iterator)
  File "/home/vbarrier/.venv/lib/python3.8/site-packages/transformers/pipelines/pt_utils.py", line 269, in __next__
    processed = self.infer(next(self.iterator), **self.params)
  File "/home/vbarrier/.venv/lib/python3.8/site-packages/transformers/pipelines/base.py", line 1209, in forward
    model_outputs = self._forward(model_inputs, **forward_params)
  File "/home/vbarrier/.venv/lib/python3.8/site-packages/transformers/pipelines/automatic_speech_recognition.py", line 515, in _forward
    tokens = self.model.generate(
  File "/home/vbarrier/.venv/lib/python3.8/site-packages/transformers/models/whisper/generation_whisper.py", line 684, in generate
    ) = self.generate_with_fallback(
  File "/home/vbarrier/.venv/lib/python3.8/site-packages/transformers/models/whisper/generation_whisper.py", line 862, in generate_with_fallback
    seek_sequences, seek_outputs = self._postprocess_outputs(
  File "/home/vbarrier/.venv/lib/python3.8/site-packages/transformers/models/whisper/generation_whisper.py", line 963, in _postprocess_outputs
    seek_outputs["token_timestamps"] = self._extract_token_timestamps(
  File "/home/vbarrier/.venv/lib/python3.8/site-packages/transformers/models/whisper/generation_whisper.py", line 195, in _extract_token_timestamps
    weights = torch.stack([cross_attentions[l][:, h] for l, h in alignment_heads])
  File "/home/vbarrier/.venv/lib/python3.8/site-packages/transformers/models/whisper/generation_whisper.py", line 195, in <listcomp>
    weights = torch.stack([cross_attentions[l][:, h] for l, h in alignment_heads])
IndexError: list index out of range
```
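In case it helps to see the failure mode in isolation: the line that raises indexes `cross_attentions` with the `(layer, head)` pairs from `alignment_heads`, so the `IndexError` suggests the alignment heads in the model's generation config point at a decoder layer (or head) the model doesn't actually have. A simplified plain-Python sketch of that indexing pattern (nested lists stand in for the real attention tensors, and the specific indices are made up for illustration):

```python
# Stand-in for the per-layer cross-attention outputs: here the model
# returns attentions for 4 decoder layers, each with 2 heads.
cross_attentions = [["h0", "h1"] for _ in range(4)]

# Stand-in for generation_config.alignment_heads, as (layer, head)
# pairs. The pair (5, 0) refers to a layer that does not exist.
alignment_heads = [(2, 1), (5, 0)]

try:
    # Same indexing shape as the failing line in the traceback
    # (torch.stack and tensor slicing omitted for clarity).
    weights = [cross_attentions[l][h] for l, h in alignment_heads]
except IndexError as e:
    print("IndexError:", e)
```

If that diagnosis is right, the fix would be making sure the checkpoint's `generation_config.alignment_heads` matches the model's actual layer/head counts — but that's speculation from the traceback alone.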

Thanks in advance,
Valentin
