# Welcome to the "Formosa Speech in the Wild (FSW)" Project

#### Yuan-Fu Liao, Taipei University of Technology, [yfliao@mail.ntut.edu.tw](mailto:yfliao@mail.ntut.edu.tw)

## [Formosa Speech Recognition Challenge 2018]

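To advance speech recognition technology developed in Taiwan and to highlight the distinct characteristics of local Taiwanese speech data, we are holding a separate evaluation focused solely on speech recognition. For details on the activities and rules, see: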

* https://sites.google.com/speech.ntut.edu.tw/fsw/home/challenge

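Highlights of the challenge:
* 126.6 hours of real-world Taiwanese radio broadcast speech data
* Kaldi scripts provided
* A dedicated Best Student Award

Registration is open now.

Everyone from academia and industry, and students in particular, is welcome to take part!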

## [New: Kaldi scripts for the reference system]
* https://github.com/yfliao/kaldi/tree/master/egs/formosa


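### Corpus status
This is the public project that gives an overview of the current status of the whole TSW corpus. If you have questions about TSW as a whole, please ask them here (use [issues](https://speech.nchc.org.tw/yfliao/Taiwanese-Speech-in-the-Wild/issues))!

For questions about an individual sub-corpus, please go to that sub-corpus's project page!

* `Reports of any corpus errors, problems, and suggestions are welcome!`
* `If you would like to help correct the corpus, please first notify Yuan-Fu Liao (yfliao@mail.ntut.edu.tw) so that the work can be coordinated and duplicated effort avoided!`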

### Known issues

* CA error

    > Running `git clone https://speech.nchc.org.tw/GrandChallenge/MATBN.git/` fails with a CA error:
    >
    > fatal: unable to access 'https://speech.nchc.org.tw/GrandChallenge/MATBN.git/': server certificate verification failed. CAfile: /etc/ssl/certs/ca-certificates.crt CRLfile: none

    Workaround

    > Disable CA certificate verification with `export GIT_SSL_NO_VERIFY=1`.
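
The workaround can also be limited to a single clone, so that certificate verification stays enabled for every other git command in the shell session. The following is a minimal sketch, not part of the project: the helper name `clone_no_verify` is hypothetical, and it relies only on git's `GIT_SSL_NO_VERIFY` environment variable.

```shell
# Hypothetical helper (not provided by the project): skip TLS certificate
# verification for one `git clone` only.
clone_no_verify() {
    # Setting the variable as a command prefix limits it to this single git
    # invocation; nothing is exported into the surrounding shell session.
    GIT_SSL_NO_VERIFY=1 git clone "$@"
}

# Usage (requires network access and read permission on the repository):
# clone_no_verify https://speech.nchc.org.tw/GrandChallenge/MATBN.git/
```

Git also accepts `git -c http.sslVerify=false clone <url>`, which likewise applies to one command only.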

### Announcements

*  The first wave of FSW corpora consists of 5 subsets (beta version, except MATBN) and was officially released on April 11, 2018!

    |Corpus|Abbreviation|Source|Hours|Remark|
    |:---|:---|:---:|---:|:--|
    |Mandarin Chinese Broadcast News corpus |MATBN|PTS|198.0|story and speaker boundaries|
    |NER Phonetic Annotation corpus Vol. 1|NER-PhA-Vol1 |NER|6.5 | phone, syllable, speaker and code-switching|
    |NER Manual Transcription corpus Vol. 1|NER-Trs-Vol1 |NER| 126.6 | manual, word sequences|
    |NER Manual Transcription corpus Vol. 2|NER-Trs-Vol2 |NER| 111.0 | manual, word sequences|
    |NER Automatic Transcription corpus Vol. 1|NER-Auto-Vol1 |NER| 309.6 | auto, word sequences with recognition error rate prediction (QE) and confidence measure (CM)|
    |PTS Manual Subtitling corpus Vol. 1 |PTS-MSub-Vol1 |PTS| 264.0 | manual subtitling with time-code|
    |Total|||980.0| excludes NER-PhA-Vol1|
  
    *   PTS: Taiwan Public Television Service
    *   NER: National Education Radio