Speech for Social Good Workshop, 2022
Proceedings of the workshop are now available on the ISCA Archive.
Interspeech 2022 Satellite Event
Although speech is the most natural form of human communication, the reach of speech technologies has been limited by their availability and accessibility. Globally, speech technologies, including recognition and generation, are concentrated in regions of relative economic stability, which effectively excludes low-resource languages from mainstream research. Even where available, these systems exhibit gender and accent biases, and are largely unusable for individuals with speech impediments. Further, while modelling and data curation receive considerable attention in academic circles, the challenges of deploying these systems in real-world scenarios are rarely discussed. As speech technologies continue to be employed in a growing number of application areas despite these shortcomings, we aim to highlight research that makes speech technologies more inclusive and useful for all, and believe that this workshop will provide a platform to address the social problems of incumbent speech systems.
Though we highlight a few topics of interest below, we invite submissions of any applications of speech technologies to social good:
- Bias in Speech Technologies - Uncovering and mitigating gender, racial, and other biases in recognition or generation systems
- Impaired Speech - Developing better ASR, learning aids, and diagnostic tools for individuals with speech impediments or other medical conditions
- Support for Seniors - Synthesis and dialog systems that address the unique challenges brought on by age
- Improving Speech Intelligibility - Better synthesis systems for hearing-impaired individuals
- Low-Resource Languages - Developing data-efficient systems for low-resource languages; data collection and curation; challenges in building such systems; case studies in low-resource languages
- Affective Computing - ASR systems for emotion recognition; synthesis/dialog systems for positive human-computer interaction; identifying signs of depression and other mental health conditions
- Challenges of Speech for Social Good - Research and survey papers highlighting the challenges of designing such systems; privacy and ethical concerns; identifying and preventing potential misuse of speech systems
- Social Good, Justice, and Equity - Surveys of demographic usage of speech technologies, including identification of the most typical user types and of the groups currently most disadvantaged by these systems; how speech systems can be used to challenge existing power structures
- Recommendations on the Future of Speech Technologies - Papers that assess the current state of speech technologies and recommend critical considerations for the design of future systems
- Case Studies - Case studies of real-world speech system deployments; social and ethical issues in the use and deployment of speech systems
Invited Speakers
We are excited to announce the following speakers at S4SG 2022:
- Vikram Ramanarayanan, Modality AI and UCSF, "Multimodal Dialog Technologies for Neurological and Mental Health"
- Pratyush Kumar, IIT Madras, "What does it really take to build speech recognition systems for the next billion users?"
Submission Process
You can submit original research work or a summary of research activities done in the direction of Speech for Social Good. You can present original completed work, a case study, a negative result, an opinion piece, or an interesting application nugget.
The review process will be double-blind. Submissions must not identify authors or their affiliations; papers that do will be desk-rejected.
Accepted papers will be presented at the workshop as either oral presentations or posters, as determined by the program committee.
We encourage authors to also include a discussion of what “positive impact” means to them or to the field of NLP.
Submission Link
We will be using the SoftConf conference management system.
Venue
The (ISCA-endorsed) workshop will be a satellite event at Interspeech 2022, and will be organized virtually given the travel restrictions due to the ongoing COVID-19 pandemic.
Important Dates
Abstract Submission Deadline | |
Paper Submission Deadline | |
Notification of Acceptance | |
Camera-ready Papers Due | |
Workshop | September 24-25, 2022 |
Program
All times are in Korean Standard Time. Please make sure to convert to your own time zone. Here is a handy tool you can use for conversions.
Saturday, September 24th
Time | Title |
---|---|
11AM-12PM | Keynote : "What does it really take to build speech recognition systems for the next billion users?", Pratyush Kumar (Slides, Video) |
12:10PM - 12:30PM | "Annotated Speech Corpus for Low Resource Indian Languages: Awadhi, Bhojpuri, Braj and Magahi", Ritesh Kumar, Siddharth Singh, Shyam Ratan, Mohit Raj, Sonal Sinha, Sumitra Mishra, Bornini Lahiri, Vivek Seshadri, Kalika Bali, Atul Kr. Ojha (Paper, Slides, Video) |
12:30PM - 12:50PM | "Towards an Automatic Speech Recognizer for the Choctaw language", Jacqueline Brixey, David Traum (Paper, Slides, Video) |
12:50PM - 1:10PM | "Building TTS systems for low resource languages under resource constraints", Perez Ogayo, Graham Neubig, Alan W Black (Paper, Slides, Video) |
1:10PM - 1:30PM | Breakout Room |
Time | Title |
---|---|
10PM-11PM | Panel : "Data Collection, Bias, and Ethical Concerns in Speech Processing", Odette Scharenborg, Emily Ahn, Gopala Anumanchipalli, and Sakriani Sakti (Video) |
11:10PM - 11:30PM | "Can Smartphones be a cost-effective alternative to LENA for Early Childhood Language Intervention?", Satwik Dutta, Jacob C. Reyna, Jay F. Buzhardt, Dwight Irvin, John H.L. Hansen (Paper, Slides, Video) |
11:30PM - 11:50PM | "Comparing data augmentation and training techniques to reduce bias against non-native accents in hybrid speech recognition systems", Yixuan Zhang, Yuanyuan Zhang, Tanvina Patel, Odette Scharenborg (Paper, Slides, Video) |
11:50PM - 12:10AM | "Text Normalization for Speech Systems for All Languages", Athiya Deviyani, Alan W Black (Paper, Slides, Video) |
12:10AM - 12:30AM | Breakout Room |
Sunday, September 25th
Time | Title |
---|---|
11AM-12PM | Keynote : "Multimodal Dialog Technologies for Neurological and Mental Health", Vikram Ramanarayanan (Video) |
12:10PM - 12:30PM | "Assessing ASR Model Quality on Disordered Speech using BERTScore", Jimmy Tobin, Qisheng Li, Subhashini Venugopalan, Katie Seaver, Richard Cave, Katrin Tomanek (Paper, Slides, Video) |
12:30PM - 12:50PM | "Cross-Teager Cepstral Coefficients For Dysarthric Severity Level Classification", Anand Therattil, Aastha Kachhi, Hemant Patil (Paper, Slides, Video) |
12:50PM - 1:10PM | "Highly Intelligible Speech Synthesis for Spinal Muscular Atrophy Patients Based on Model Adaptation", Takuma Yoshimoto, Ryoichi Takashima, Chiho Sasaki, Tetsuya Takiguchi (Paper, Slides, Video) |
1:10PM - 1:30PM | Breakout Room |
Organizers
Anurag Katakkar | NVIDIA |
Alan W Black | Carnegie Mellon University |
Sunayana Sitaram | Microsoft Research India |
Shrimai Prabhumoye | NVIDIA |
Sai Krishna Rallabandi | Carnegie Mellon University |
Sakriani Sakti | JAIST |
Vikram Ramanarayanan | Modality AI |
Anirudh Koul |