August 8th ABAIR Meeting
Highlights
- Meeting with TG4 pushed back from September to October.
- COGG have asked Amanda to come present to them.
- Julie heard from someone that Gaeltacht sites of UG want to use Geabaire and other ABAIR resources in their courses.
- Just 2 hours of child speech gathered so far. It is much harder to collect in volume than adult speech.
- Want to give kids some token of appreciation.
- Recording on iPad good.
- Afric up with Amanda.
- Need to polish up Míle Glór.
- Diversity needed in characters in MaO.
- COGG contacted Amanda to present to them.
- Julie heard from someone that Gaeltacht sites of UG want to use Geabaire and other ABAIR resources in their courses.
- Working with Amanda on Míle Glór na nÓg.
- Questions about whether Míle Glór is still going.
- Working with Transtool.
- Scraped some other data ?? RTÉ Archives.
- Collected Oireachtas Corpus, with 1.5B tokens EN and 27M GA. Helps with current GA LLM problems: short context text, cultural alignment. Working with Tung, a PhD researcher at UCC.
- Plans for the data set:
  - Instruction data set
  - Human feedback data set
  - Bilingual QA benchmark. Easy to make because written parliamentary questions are annotated by topic, e.g. 'Finance'.
  - Sociolinguistic analysis (long-term, future project): topics, code-switching EN/GA
- Questions on how much data TG4 will be providing us so we can plan storage.
- Fixed errors on S2s.
- Will fix fotheidil next week.
- Asked about progress on the new website.






