# Welcome to the "Taiwanese Speech in the Wild (TSW)" Project

##### Yuan-Fu Liao, Taipei University of Technology, [yfliao@mail.ntut.edu.tw](mailto:yfliao@mail.ntut.edu.tw)

### Corpus Status

This is the public project that gives an overview of the current status of the entire TSW corpus. If you have questions about TSW as a whole, please ask them here (use [issues](https://speech.nchc.org.tw/yfliao/Taiwanese-Speech-in-the-Wild/issues)). For questions about an individual sub-corpus, please go to that sub-corpus's project page.

* `If you would like to help correct the corpus data, please notify Yuan-Fu Liao (yfliao@mail.ntut.edu.tw) first so that the work can be coordinated and duplicated effort avoided.`

### Bug Report

* CA error

> Running `git clone https://speech.nchc.org.tw/GrandChallenge/MATBN.git/` fails with a CA error:
>
> fatal: unable to access 'https://speech.nchc.org.tw/GrandChallenge/MATBN.git/': server certificate verification failed. CAfile: /etc/ssl/certs/ca-certificates.crt CRLfile: none

workaround

> Disable CA certificate verification with `export GIT_SSL_NO_VERIFY=1` (this applies to every Git command run in the current shell session).

### Announcements

* The first wave of TSW corpora consists of 5 subsets (beta versions, except MATBN) and was officially released on April 11, 2018!

|Corpus|Abbreviation|Source|Hours|Remark|
|:---|:---|:---:|---:|:--|
|Mandarin Chinese Broadcast News corpus|MATBN|PTS|198.0|story and speaker boundaries|
|NER Phonetic Annotation corpus Vol. 1|NER-PhA-Vol1|NER|6.5|phone, syllable, speaker and code-switching|
|NER Manual Transcription corpus Vol. 1|NER-Trs-Vol1|NER|107.4|manual, word sequences|
|NER Automatic Transcription corpus Vol. 1|NER-Auto-Vol1|NER|309.6|auto, word sequences|
|PTS Manual Subtitling corpus Vol. 1|PTS-MSub-Vol1|PTS|264.0|manual subtitling with time code|
|Total|||879.0|excludes NER-PhA-Vol1|

* PTS: Taiwan Public Television Service
* NER: National Education Radio
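
The Total row of the release table can be checked with a few lines of arithmetic. The sketch below is only an illustration: the `hours` dictionary and the variable names are hypothetical, and the figures are copied by hand from the table rather than read from any TSW file.

```python
# Hours per sub-corpus, copied from the release table.
# The dictionary and variable names are illustrative, not part of TSW.
hours = {
    "MATBN": 198.0,
    "NER-PhA-Vol1": 6.5,
    "NER-Trs-Vol1": 107.4,
    "NER-Auto-Vol1": 309.6,
    "PTS-MSub-Vol1": 264.0,
}

# The table's Total row states that NER-PhA-Vol1 is excluded.
excluded = {"NER-PhA-Vol1"}

# Round to one decimal place to match the precision used in the table.
total = round(sum(h for name, h in hours.items() if name not in excluded), 1)
print(total)  # 879.0
```

Adding the 6.5 hours of NER-PhA-Vol1 back in would give 885.5 hours across all five subsets.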