
English G2P token classification model

This is a non-autoregressive model for English grapheme-to-phoneme (G2P) conversion based on the BERT architecture. It predicts phonemes in CMU format. The initial data was built using CMUdict v0.07.

Intended uses & limitations

The input is expected to contain English words consisting of Latin letters and apostrophes, with all characters separated by single spaces.
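
The input format described above can be produced with a small helper. The following is a minimal sketch, not part of the model's tooling: the function name `word_to_model_input` is hypothetical, and lowercasing is an assumption based on the lowercase examples in this card.

```python
import re

# Characters the card says the model expects: Latin letters and apostrophe.
# Restricting to lowercase a-z is an assumption drawn from the card's examples.
_ALLOWED = re.compile(r"[a-z']+")


def word_to_model_input(word: str) -> str:
    """Lowercase a single word and separate its characters with spaces."""
    normalized = word.strip().lower()
    if not _ALLOWED.fullmatch(normalized):
        raise ValueError(f"unsupported characters in {word!r}")
    return " ".join(normalized)


if __name__ == "__main__":
    for w in ["Jocelyn", "marceca's"]:
        print(word_to_model_input(w))
```

Writing one such line per word to a text file gives a file in the same shape as the input example in this card.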

How to use

Install NeMo.

Download en_g2p.nemo (this model)

git lfs install
git clone https://huggingface.co/bene-ges/en_g2p_cmu_bert_large

Run inference:

python ${NEMO_ROOT}/examples/nlp/text_normalization_as_tagging/normalization_as_tagging_infer.py \
  pretrained_model=en_g2p_cmu_bert_large/en_g2p.nemo \
  inference.from_file=input.txt \
  inference.out_file=output.txt \
  model.max_sequence_len=64 \
  inference.batch_size=128 \
  lang=en

Example of input file:

g e f f e r t
p r o s c r i b e d
p r o m i n e n t l y
j o c e l y n
m a r c e c a ' s
s t a n k o w s k i
m u f f l e

Example of output file:

G EH1  F  ER0 T	               g e f f e r t           G EH1 <DELETE> F <DELETE> ER0 T   G EH1 <DELETE> F <DELETE> ER0 T   PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN
P R OW0 S K R AY1 B  D         p r o s c r i b e d	   P R OW0 S K R AY1 B <DELETE> D    P R OW0 S K R AY1 B <DELETE> D    PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN
P R AA1 M AH0 N AH0 N T L IY0  p r o m i n e n t l y   P R AA1 M AH0 N AH0 N T L IY0     P R AA1 M AH0 N AH0 N T L IY0     PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN
JH AO1 S  L IH0 N              j o c e l y n           JH AO1 S <DELETE> L IH0 N         JH AO1 S <DELETE> L IH0 N         PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN
M AA0 R S EH1 K AH0  Z         m a r c e c a ' s       M AA0 R S EH1 K AH0 <DELETE> Z	 M AA0 R S EH1 K AH0 <DELETE> Z    PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN
S T AH0 NG K AO1 F S K IY0     s t a n k o w s k i     S T AH0 NG K AO1 F S K IY0        S T AH0 NG K AO1 F S K IY0        PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN
M AH1  F AH0L                  m u f f l e	           M AH1 <DELETE> F AH0_L <DELETE>   M AH1 <DELETE> F AH0_L <DELETE>   PLAIN PLAIN PLAIN PLAIN PLAIN PLAIN

Note that the output tags to use are in the third column, and the input is in the second column. Tags correspond one-to-one to the input letters. If you remove the <DELETE> tags and replace _ with a space, you get a CMU-like transcription.
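
The post-processing rule above can be sketched in Python. This is an illustrative helper rather than code shipped with the model: the names `tags_to_cmu` and `check_alignment` are hypothetical, and both functions take the already-extracted column strings, since the exact column delimiter of the output file is not stated here.

```python
DELETE_TAG = "<DELETE>"


def tags_to_cmu(tags: str) -> str:
    """Turn a third-column tag string into a CMU-like transcription.

    Drops <DELETE> tags and replaces "_" with a space, as described above.
    """
    phonemes = []
    for tag in tags.split():
        if tag == DELETE_TAG:
            continue
        # A tag such as "AH0_L" carries two phonemes for one input letter.
        phonemes.extend(tag.split("_"))
    return " ".join(phonemes)


def check_alignment(letters: str, tags: str) -> bool:
    """Return True if there is exactly one tag per input letter."""
    return len(letters.split()) == len(tags.split())


if __name__ == "__main__":
    letters = "m u f f l e"
    tags = "M AH1 <DELETE> F AH0_L <DELETE>"
    print(check_alignment(letters, tags), tags_to_cmu(tags))
```

The alignment check is optional. It only confirms the one-to-one correspondence between letters and tags before the tags are collapsed into a transcription.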

How to use for TTS

See this script to run TTS directly from CMU phonemes.
