September 22nd ABAIR Meeting
Highlights
- One post per week to be put up on the website/social media to promote ABAIR - needed for the Oireachtas.
- API meeting on Thursday at 2pm to discuss all the recent requests for access.
- New name for school - "Dept. of Linguistics".
- Storm technologies want an online meeting.
- Thursday at 2pm for API meeting.
- Adam from XXX put together another chatbot - not great.
- Prototype sent using the ElevenLabs API was not great.
- Údarás launching big drive to get people to contribute to CommonVoice.
- Asked whether word-level timing is available.
- Already Coiste Comharlachan (sp?).
- Talked about setting up data centres for Irish with Thomás previously.
- Oireachtas - 3pm Thursday, 12 noon Fri and 12 noon Sat.
- One piece of content per week should be posted. Need a pipeline put in place.
- A few students interested in doing speech projects - could use Liam's sean-nós recordings.
- Need to link auth with Fotheidil and put in average wait time.
- Agreed with Amanda that we need to post content about ABAIR to social media pages for Oireachtas.
- Can multiple accounts access and edit the same Fotheidil file? Answer - no.
- Doing more recordings. Working on website stuff with John and Oskar.
- Asked if Joey's girlfriend is available to do the promotion - might be better to spend time and energy on promotion instead of recordings at Oireachtas.
- For MGnanÓg - should we launch it with just our Caideanach stories?
- Phoneme timing can be added for Cormac's final year project.
- Working on transcriptions with Muireann. Meeting Fionait on Wed - trying to do as many as possible.
- Met on Fri - a lot of extra séimhiú (lenition) in the Ring dialect (atá vs. athá). Unsure how to transcribe.
- Trying to compress the model so that it can be made accessible over an API.
- Don't know how good it is, as benchmarks are different from real use.
- Thomas from Údarás met with Gerry Sweeney from UCC. Put in a proposal to create an AI for Irish committee. Got his name on the proposal. UCC developing a toolkit. Want to be the go-to data centre for AI tools for Irish (LLMs).
- Short form video content works best for attracting attention.
- For our API - provide keys to users. Need to restrict access. Check profit vs. not-for-profit.
- Compared our models with commercial models - Speechmatics & Azure. Two tests, in-domain and out-of-domain (OOD). In-domain dataset (CommonVoice) - 8.7% for Speechmatics, 7.1% for us. OOD - Azure & Speechmatics ~40%, us 13.5%.
- New model can transcribe sean-nós. Found 850 videos, 70 hours of sean-nós scraped with lyrics - high quality recordings, good for an undergraduate student who wants to do a project.
- Companies haven't trained new models in the past year.
- Need to make a tutorial for Fotheidil. Also maybe try to crowdsource it to get people involved.
- Asked whether phoneme timing is available.
- Have multiple transcriptions in the lexicon already. Ciadhla should leave spellings as standard.
- Liam asked, past tense "d'" or "dh'"? Said not to change it from standard.
- Asked Liam about doegen.ie (sp?) recordings, as they have a lot of noise.
- Issue with slender 'r' in Mayo dataset. When Bríd talks, it is sometimes slender and sometimes broad. It's a positional distinction.
- Some outdated phrasing in text for speakers to read. Recommended 'the Dialect of Ring'.
Tamás
- Applied for positions at SETU, going for interviews soon.
- Didn't speak
- Didn't speak
- Didn't speak
- Didn't speak
- Didn't speak