Punctuation and Timestamps?
Hi there!
Congrats on the work, I'm just testing it out and I have a question.
It seems that if I enable timestamps, punctuation disappears.
Is this normal behaviour?
Thanks!
As output, this is what I see with pnc='yes' and without timestamps:
{
"pred_text": "Today we're talking about one of the most jaw dropping and epic presentations that Steve Jobs gave in his career. I'm talking about his two thousand eight keynote address where he unveiled the MacBook. Air. And the reason why I'm going way back in the time machine for this one is because I think there's so many underappreciated things about this keynote that Steve Jobs gave that if we go understood those things when I first saw this in two thousand eight. And by going through it now and just pointing out the little tips and tricks that Steve uses in his presentation, I feel like you could become a better presenter. Just give me ten to fifteen minutes to point these things out and I guarantee you it'll be worth your while. So let's dive right into it, but before we do, I just want to let you know my name is John Yushai. I've been a marketing manager at YouTube. Instagram for the past seven years and I upload a video to this channel every single week with a tip related to marketing, creativity and productivity. So if you haven't done so already, make sure to hit that subscribe. This is a presentation about the world's thinnest notebook and it's one simple idea and I feel like so many times we try to cram four or five different ideas so much so that people forget. Get all of them at the end of our presentation. So it's a great note that when you're making and preparing your next presentation, think about one simple idea and mention that right off the bat. Let's keep going. So what does that mean? Well, we went out and looked at all the thin notebooks out there. Most people think of these the Sony TZ series they're good notebooks and they're thin. This is what they look like. Side view there. By the way, I love what he's doing here because he calls out the competition by name and this is exactly how great stories form right he's setting up the villain."
...
}
And this is what I see with pnc='yes' and timestamps='yes':
{
"pred_text": "Today we're talking about one of the most jaw dropping and epic presentations that Steve jobs gave in his career I'm talking about his two thousand eight keynote address where he unveiled the Macbook air and the reason why I'm going way back in the time machine for this one is because I think there's so many underappreciated things about this keynote that Steve Jobs gave that if we go <|endoftext|><|2|> those things when I first saw this in two thousand eight and by going through it now and just pointing out the little tips and tricks that Steve uses in his presentation I feel that you could become a better presenter Just give me ten to fifteen minutes to point these things out and I guarantee you it'll be worth your while so let's dive right into it but before we do I just want to let you know my name is John Youshai I've been a marketing manager at YouTube Instagram for the past seven years and I upload a video to this channel every single week with a tip related to marketing creativity and productivity so if you haven't done so already make sure to hit that subscribe <|0|> <|0|> Steve immediately does something that I feel like so many so few people do which is think about how can you sum up your entire presentation in one sentence and he did it right there this is a presentation about the world's thinnest notebook and it's one simple idea and I feel like so many times we try to cram four or five different ideas so much so that people get all of them at the end of our presentation so it's a great note that when you're making and preparing your next presentation think about one simple idea and mention that right off the bat let's keep going So what does that mean well we went out and looked at all the thin notebooks out there most people think of these the Sony tz series they're good notebooks and they're thin right this is what they look like side view there By way I love what he's doing here because he calls out the competition by name and this is 
exactly how great stories form right he's setting up the villain"
...
}
In the output with timestamps enabled I also see some special tokens: <|endoftext|><|2|>, <|0|> <|0|>
Thanks @eek for reporting. Do you have an audio sample and a code snippet you could share so we can reproduce this easily?
Thanks for letting us know, @eek! This is likely due to a lack of punctuated timestamp data in our English training data. How to properly model timestamps for punctuation is worth discussing; for now it is modeled only loosely, through our training data. Please be aware of this limitation; we will come up with a solution.
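As a stopgap for the stray special tokens reported above, here is a minimal post-processing sketch. It is not part of the model's toolkit: `strip_special_tokens` is a hypothetical helper, and it assumes every unwanted token has the `<|...|>` form seen in the output (`<|endoftext|>`, `<|0|>`, `<|2|>`). It only removes those tokens; it does not restore the missing punctuation.

```python
import re

# Assumption: unwanted tokens all look like <|...|>, as in the reported output
# (<|endoftext|>, <|0|>, <|2|>). Adjust the pattern if other forms appear.
SPECIAL_TOKEN = re.compile(r"<\|[^|<>]*\|>")


def strip_special_tokens(text: str) -> str:
    """Remove <|...|> tokens and collapse the whitespace they leave behind."""
    without_tokens = SPECIAL_TOKEN.sub(" ", text)
    return re.sub(r"\s+", " ", without_tokens).strip()


# Fragment taken from the pred_text reported above.
sample = "that if we go <|endoftext|><|2|> those things when I first saw this"
cleaned = strip_special_tokens(sample)
print(cleaned)
```

The same function can be applied to the `pred_text` field of each result before further processing.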