Computing timestamps is not supported for canary_model?

by Nguyen667201 - opened 10 days ago

Discussion

Nguyen667201

10 days ago

This comment has been hidden (marked as Spam)

ankitapasad

NVIDIA org 9 days ago

Hi @Nguyen667201

Thanks for giving our models a try!

Please make sure that you use the latest NeMo main branch to run the above inference code. Timestamp support for Canary-Flash models was added in this PR.

Let us know if this doesn’t resolve the issue.

You may also refer to this discussion on the same issue resolved for another user.

Nguyen667201

9 days ago

Hi @ankitapasad

Thanks for the reply!

I have tried the latest version of NeMo, but as you can see, the timestamp issue still does not appear to be fixed. Every time I pass the 'timestamps' parameter to transcribe, I get the error: 'Computing timestamps is not supported for this model yet.' Please release a new version to resolve this issue.

ankitapasad

NVIDIA org 8 days ago

Hi @Nguyen667201

Please use the NeMo main branch, which supports the timestamp feature for Canary-Flash models.

You are seeing this error because v2.2.1, the version shown in the screenshot above, does not include the timestamp feature. We will include the feature in the next stable release. Until then, please use the main branch, which will resolve the issue.
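As a quick way to tell whether an environment is still on a release that predates the feature, here is a minimal sketch of a version check. The helper names `release_tuple` and `predates_timestamp_support` are hypothetical, and the only threshold used is v2.2.1, the version mentioned in this thread. It can only confirm that an install is too old. How a main-branch build reports its version is not covered here, so a passing check does not prove the feature is present.

```python
import re
from importlib import metadata

# Last release reported in this thread as lacking timestamp support.
LAST_RELEASE_WITHOUT_TIMESTAMPS = (2, 2, 1)


def release_tuple(version: str) -> tuple:
    """Return the leading numeric X.Y[.Z] part of a version string as ints."""
    match = re.match(r"v?(\d+)\.(\d+)(?:\.(\d+))?", version.strip())
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    # A missing patch component is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def predates_timestamp_support(version: str) -> bool:
    """True if the version is at or below the release named in the thread."""
    return release_tuple(version) <= LAST_RELEASE_WITHOUT_TIMESTAMPS


if __name__ == "__main__":
    try:
        # "nemo_toolkit" is the pip distribution name for NeMo.
        installed = metadata.version("nemo_toolkit")
    except metadata.PackageNotFoundError:
        print("NeMo is not installed in this environment.")
    else:
        if predates_timestamp_support(installed):
            print(f"NeMo {installed} predates timestamp support; use the main branch.")
        else:
            print(f"NeMo {installed} is newer than v2.2.1.")
```

Running the script in the environment used for inference shows which NeMo install Python actually picks up, which helps when a stable release and a source checkout coexist.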

Let us know if you are still facing issues.

Nguyen667201

3 days ago

edited 3 days ago

Thanks, @ankitapasad, I got it.
Could you explain the effect of "source_lang" and "target_lang"?
I think only "target_lang" is needed for inference: whatever "source_lang" is set to, it does not seem to affect the transcript as long as "target_lang" is "en".

When I trained from scratch, I got this error:
[rank0]: assert lang is not None, "Expected 'lang' to be set for AggregateTokenizer."
[rank0]: AssertionError: Expected 'lang' to be set for AggregateTokenizer.
How can I solve it?

Following this tutorial, an example manifest entry looks like this:
{"audio_filepath": "datasets/LibriLight/librispeech_finetuning/1h/2/clean/5778/12761/5778-12761-0000.flac", "duration": 14.56, "text": "continuation of fremont's account of the passage through the mountains we had hard and doubtful labor yet before us as the snow appeared to be heavier where the timber began further down with few open spots", "target_lang": "en", "source_lang": "en", "pnc": "False"}
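For anyone debugging the same assertion, here is a minimal sketch of a manifest check. It reads JSON-lines manifest entries and reports which expected keys are missing. The helper names are hypothetical, and the default key list is simply the set of fields in the entry above. The error message names `lang`, and the entry has no `lang` key, so checking for it is a cheap first diagnostic. Whether the data loader actually reads that key is an assumption, not something confirmed in this thread.

```python
import json

# Fields present in the manifest entry quoted above.
FIELDS_IN_EXAMPLE = (
    "audio_filepath", "duration", "text", "source_lang", "target_lang", "pnc",
)


def missing_fields(manifest_line: str, required=FIELDS_IN_EXAMPLE) -> list:
    """Return the required keys that are absent from one JSON manifest line."""
    entry = json.loads(manifest_line)
    return [key for key in required if key not in entry]


def check_manifest(lines, required=FIELDS_IN_EXAMPLE) -> dict:
    """Map 1-based line numbers to missing keys, skipping blank lines."""
    problems = {}
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue
        missing = missing_fields(line, required)
        if missing:
            problems[number] = missing
    return problems


# Shortened placeholder entry with the same keys as the example above.
example = (
    '{"audio_filepath": "a.flac", "duration": 14.56, "text": "hello", '
    '"target_lang": "en", "source_lang": "en", "pnc": "False"}'
)

# Assumption: also look for a "lang" key, since the assertion mentions it.
print(check_manifest([example], required=FIELDS_IN_EXAMPLE + ("lang",)))
```

To check a real manifest, pass an open file handle as `lines`, since iterating over a file yields one entry per line.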
