The recognition result is not ideal, and I am not sure where the problem lies.

#1, opened by wuzheng650

Environment: Windows, with the following packages installed:
Brotli==1.1.0
MarkupSafe==3.0.2
PySocks==1.7.1
PyYAML==6.0.2
accelerate==1.3.0
certifi==2025.1.31
cffi==1.17.1
charset-normalizer==3.4.1
cn2an==0.5.23
colorama==0.4.6
filelock==3.17.0
fsspec==2025.2.0
h2==4.2.0
hpack==4.1.0
huggingface-hub==0.28.1
hyperframe==6.1.0
idna==3.10
jinja2==3.1.5
kaldi-native-fbank==1.20.2
kaldiio==2.18.0
mpmath==1.3.0
networkx==3.4.2
numpy==2.2.3
packaging==24.2
peft==0.14.0
pillow==11.1.0
pip==25.0.1
proces==0.1.7
psutil==7.0.0
pycparser==2.22
regex==2024.11.6
requests==2.32.3
safetensors==0.5.2
sentencepiece==0.2.0
setuptools==75.8.0
sympy==1.13.1
tokenizers==0.21.0
torch==2.3.1
torchaudio==2.3.1
torchvision==0.18.1
tqdm==4.67.1
transformers==4.48.3
typing-extensions==4.12.2
urllib3==2.3.0
wheel==0.45.1
win-inet-pton==1.1.0
zstandard==0.23.0
autocommand==2.2.2
backports.tarfile==1.2.0
importlib-metadata==8.0.0
inflect==7.3.1
jaraco.collections==5.1.0
jaraco.context==5.3.0
jaraco.functools==4.0.1
jaraco.text==3.12.1
more-itertools==10.3.0
platformdirs==4.2.2
tomli==2.0.1
typeguard==4.3.0
zipp==3.19.2

The recognition results are unrelated to the speech content, as shown below:
[{'uttid': '0113680_0118640', 'text': '嗯', 'wav': 'G:\testAudio\0113680_0118640.wav', 'rtf': '18.8808'}]
[{'uttid': '0351510_0354420', 'text': '一二三四五六', 'wav': 'G:\testAudio\0351510_0354420.wav', 'rtf': '1.8096'}]
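One way to narrow this down is to confirm that the input files match the format the model expects, and to read the `rtf` values correctly. The sketch below uses only Python's standard `wave` module. It assumes (this is not stated in the thread) that FireRedASR expects 16 kHz, mono, 16-bit PCM WAV input, and that `rtf` is processing time divided by audio duration, so a value above 1 means decoding ran slower than real time. The helper names `inspect_wav`, `format_problems` and `real_time_factor` are hypothetical, not part of FireRedASR or sherpa-onnx.

```python
import wave

# Assumed input format (not confirmed in this thread): 16 kHz, mono, 16-bit PCM.
EXPECTED_RATE = 16000
EXPECTED_CHANNELS = 1
EXPECTED_SAMPLE_WIDTH = 2  # bytes per sample, i.e. 16-bit


def inspect_wav(path):
    """Return (sample_rate, channels, sample_width_bytes, duration_seconds)."""
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        channels = wf.getnchannels()
        width = wf.getsampwidth()
        duration = wf.getnframes() / rate
    return rate, channels, width, duration


def format_problems(path):
    """List every mismatch between the file and the assumed input format."""
    rate, channels, width, _ = inspect_wav(path)
    problems = []
    if rate != EXPECTED_RATE:
        problems.append(f"sample rate {rate} Hz, expected {EXPECTED_RATE} Hz")
    if channels != EXPECTED_CHANNELS:
        problems.append(f"{channels} channels, expected mono")
    if width != EXPECTED_SAMPLE_WIDTH:
        problems.append(f"{8 * width}-bit samples, expected 16-bit")
    return problems


def real_time_factor(decode_seconds, audio_seconds):
    """Assumed RTF definition: processing time / audio duration (>1 is slower than real time)."""
    return decode_seconds / audio_seconds
```

For example, `format_problems(r"G:\testAudio\0113680_0118640.wav")` returns an empty list for a file in the assumed format; any entries point to a sample-rate, channel, or bit-depth mismatch worth fixing before comparing results against sherpa-onnx.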

Can you use sherpa-onnx to test it?

Please see
https://k2-fsa.github.io/sherpa/onnx/FireRedAsr/index.html

By the way, you can also test it in the following Hugging Face space:

https://k2-fsa.github.io/sherpa/onnx/FireRedAsr/huggingface-space.html
