Why Wer

Here, the ASR system omits the information about seeing the dog. There are two clear ways to align the output with the reference, and they lead to different WERs. In Alignment 1, “and I saw a cat” is aligned with the end of the reference. This gives 5 deletions (“and I saw a dog”), which is a WER of 50%. Alignment 2 adds an extra error, the substitution of “dog” with “cat”, which gives a WER of 60%. Minimizing the edit distance ensures you get the alignment with the minimal WER (in this case Alignment 1).
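To make the edit-distance step concrete, here is a minimal sketch of a word-level WER calculation. The function name `wer` and the example sentences are assumptions chosen to reproduce the 50% figure above; they are not taken from any particular toolkit, and the inputs are assumed to be already normalized.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate via word-level Levenshtein distance (illustrative sketch)."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,              # deletion: reference word missing from output
                curr[j - 1] + 1,          # insertion: extra word in output
                prev[j - 1] + (r != h),   # match (cost 0) or substitution (cost 1)
            )
        prev = curr

    # Minimal number of errors divided by the number of reference words.
    return prev[-1] / len(ref)


# Hypothetical 10-word reference and 5-word ASR output.
reference = "and i saw a dog and i saw a cat"
hypothesis = "and i saw a cat"
print(f"{wer(reference, hypothesis):.0%}")  # prints 50%
```

Because the dynamic programme takes the minimum over all alignments, it never returns the 60% figure from the more expensive alignment.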

Pitfalls of WER

As mentioned above, the main pitfall is that WER weights every error equally, regardless of how much it actually matters.

Minor problems such as misspellings of names or the wrong number of repeated words – say when someone speaks with a stutter – are penalized just as heavily as errors that cause misinformation.

In addition, it is very difficult to get a test set that has no mistakes from human transcribers. This introduces an intrinsic floor, which means it is impossible for the system to reach 0% WER. On top of making mistakes, transcribers often write the transcript using a different specification than the ASR system. After all, there are many ways to write the same thing. Here are two examples:

  • Numeric entities – these can be in written or spoken form, and there are differences in formatting such as commas in large numbers. There are also different ways of saying a currency amount, such as 10 pounds, 10 quid and a tenner. All mean the same thing and will be £10 in written form.

  • Dates – there are countless ways of writing a date, including different orderings and shortened forms.

If the formatting of the reference does not match that of the ASR output, WER can increase significantly. It can be trivial to normalize some of these issues, but the problem is harder when comparing across vendors because they all have different ways of formatting.
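As a rough illustration of the "trivial" end of normalization, the sketch below maps the three spoken currency forms mentioned above onto one written form. The function name `canonical_currency` and the two rules are assumptions made for this example; a real normalizer would need far more rules, and each vendor's output would need its own.

```python
import re


def canonical_currency(text: str) -> str:
    """Rewrite a few spoken pound amounts as a written form (illustrative only)."""
    # Slang for a ten-pound note.
    text = re.sub(r"\ba tenner\b", "£10", text, flags=re.IGNORECASE)
    # "<digits> pound(s)" or "<digits> quid" becomes "£<digits>".
    text = re.sub(r"\b(\d+) (?:pounds?|quid)\b", r"£\1", text, flags=re.IGNORECASE)
    return text


for spoken in ("it cost 10 pounds", "it cost 10 quid", "it cost a tenner"):
    print(canonical_currency(spoken))  # each prints: it cost £10
```

Applying the same function to both the reference and the ASR output before scoring means these formatting differences no longer count as errors.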

Furthermore, in order to get an accurate alignment, you must strip all punctuation and convert characters to lower case. Punctuation and capitals at the start of sentences or for proper nouns are essential for readability, but WER doesn’t take any of that extra information into account.
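The stripping step described above can be sketched as follows. The function name `normalize` is an assumption for this example, and only ASCII punctuation is removed, so symbols such as £ and typographic quotes are left in place.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_STRIP_PUNCTUATION = str.maketrans("", "", string.punctuation)


def normalize(text: str) -> str:
    """Lower-case, drop ASCII punctuation and collapse whitespace."""
    stripped = text.lower().translate(_STRIP_PUNCTUATION)
    # split() with no arguments also removes leading, trailing and repeated spaces.
    return " ".join(stripped.split())


print(normalize("Hello, World!"))      # prints: hello world
print(normalize("I saw 1,000 dogs."))  # prints: i saw 1000 dogs
```

Note that this also removes apostrophes and hyphens inside words, which is exactly the kind of readability information that WER then ignores.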

Let’s see a particularly bad example of how WER can be completely misaligned with reality.
