
June 6th ABAIR Meeting

Liam Lonergan
PhD Candidate
Andy Murphy
Research Fellow
John Sloan
Research Fellow
Dylan Martin
Undergraduate Student
Muireann Nic Corcrain
Postgraduate Student
Joey McInerney
Masters Student
Sai Kaustubh
Undergraduate Student

Highlights

  • DCU text corpus finally accessed, but data duplication is an issue.
  • Attempting to improve recognition by using LLMs to generate diverse sentences with Irish placenames.

Liam

  • Problems with the text corpus from DCU: duplication. Only half 38.9 million words
  • Proper nouns are a problem in recognition because they are not well covered in the corpora. Plan: take placenames from logainm and use LLMs to generate text containing them.
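
The duplication problem above can be sketched in code. This is a minimal illustration, not the method actually used on the DCU corpus: it assumes the corpus is available as a list of text lines, and it treats two lines as duplicates when they match after lowercasing and whitespace normalisation. The function names `normalize` and `deduplicate` are hypothetical.

```python
import hashlib


def normalize(line: str) -> str:
    # Collapse runs of whitespace and lowercase, so trivially
    # different copies of the same sentence compare equal.
    return " ".join(line.split()).lower()


def deduplicate(lines):
    """Keep the first occurrence of each line.

    Returns (unique_lines, words_before, words_after), where the
    word counts are taken on whitespace-separated tokens.
    """
    seen = set()
    unique = []
    words_before = 0
    words_after = 0
    for line in lines:
        key_text = normalize(line)
        if not key_text:
            continue  # skip empty lines entirely
        n_words = len(key_text.split())
        words_before += n_words
        # Hashing keeps memory use bounded for long lines (assumed design choice).
        digest = hashlib.sha1(key_text.encode("utf-8")).hexdigest()
        if digest in seen:
            continue
        seen.add(digest)
        unique.append(line.strip())
        words_after += n_words
    return unique, words_before, words_after
```

Comparing `words_before` with `words_after` gives a quick measure of how much of a corpus is duplicated. Exact matching will miss near-duplicates such as sentences that differ by one word.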

Joey

  • Chat with Master's supervisor. No issue with problematic data.
  • Liam suggests mixing some of the shorter and longer data so the model doesn't overfit. Varied lengths may help.
  • Used LLMs to generate sentences containing placenames.
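
One way to mix shorter and longer data, as suggested above, is sketched below. The notes do not say how the mixing would be done, so this is only an assumption: samples are represented as strings, length is measured in characters, and the hypothetical helper `mix_by_length` splits the data at the median length and interleaves the two halves.

```python
import random


def mix_by_length(samples, seed=0):
    """Interleave longer and shorter samples so lengths vary throughout."""
    rng = random.Random(seed)  # fixed seed for a reproducible order
    ordered = sorted(samples, key=len)
    mid = len(ordered) // 2
    short, long_ = ordered[:mid], ordered[mid:]
    # Shuffle within each half so the order is not strictly by length.
    rng.shuffle(short)
    rng.shuffle(long_)
    mixed = []
    # long_ is never shorter than short, so iterate over it.
    for i, item in enumerate(long_):
        mixed.append(item)
        if i < len(short):
            mixed.append(short[i])
    return mixed
```

Every sample is kept exactly once; only the order changes, so consecutive items alternate between the longer and shorter halves of the data.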

John