June 6th ABAIR Meeting

June 6, 2025

PhD Candidate

Research Fellow

Research Fellow

Undergraduate Student

Postgraduate Student

Masters Student

Undergraduate Student

The DCU text corpus has finally been accessed, but data duplication is an issue.
Attempting to improve recognition of Irish placenames by using LLMs to generate diverse sentences that contain them.
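
The duplication issue could be checked with something like the following. This is only a minimal sketch, assuming the corpus can be read line by line and that exact duplicates (after lowercasing and whitespace normalisation) are the main problem; the function names and the sample lines are hypothetical, not taken from the DCU data.

```python
import hashlib
import re


def normalise(line: str) -> str:
    """Lowercase and collapse whitespace so trivially different copies match."""
    return re.sub(r"\s+", " ", line.strip().lower())


def dedupe_lines(lines):
    """Yield each non-empty line the first time its normalised form is seen."""
    seen = set()
    for line in lines:
        norm = normalise(line)
        if not norm:
            continue  # skip blank lines
        # Store a hash rather than the full text to keep memory use down.
        key = hashlib.sha1(norm.encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            yield line


def word_count(lines) -> int:
    """Count whitespace-separated tokens across all lines."""
    return sum(len(line.split()) for line in lines)


# Hypothetical sample lines, for illustration only.
corpus = [
    "Tá an aimsir go maith.",
    "tá an  aimsir go maith. ",
    "Is as Gaillimh mé.",
    "",
]
unique = list(dedupe_lines(corpus))
print(word_count(corpus), "words before,", word_count(unique), "words after")
```

Comparing the word counts before and after gives a quick estimate of how much of the corpus is duplicated. Near-duplicates would need a fuzzier method than this.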

Problems with the text corpus from DCU: duplication. Only half 38.9 million words.
Issues with proper nouns in recognition: they are not well covered in the corpora. Take placenames from logainm and use LLMs to generate text.
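
One way the placename idea might be set up is sketched below. It only builds the prompts; the LLM call itself is left out because the notes do not say which model or API is used. The style list, the prompt wording, the function name, and the three sample placenames are all assumptions for illustration, not an export from logainm.

```python
import random

# Hypothetical styles, used to push the LLM towards varied sentences.
STYLES = [
    "a news report",
    "a casual conversation",
    "a weather forecast",
    "travel directions",
]


def build_prompts(placenames, styles=STYLES, per_place=2, seed=0):
    """Return per_place prompts for each placename, each in a different style."""
    rng = random.Random(seed)  # seeded so the prompt set is reproducible
    prompts = []
    for name in placenames:
        # sample() picks distinct styles, so one place never repeats a style.
        for style in rng.sample(styles, k=min(per_place, len(styles))):
            prompts.append(
                f"Write one natural sentence in Irish, in the style of {style}, "
                f'that mentions the placename "{name}".'
            )
    return prompts


# Illustrative placenames only; a real run would read them from a logainm export.
sample_places = ["Gaillimh", "Corcaigh", "Dún Chaoin"]
prompts = build_prompts(sample_places)
for p in prompts:
    print(p)
```

Varying the style per placename is one simple way to get diverse contexts. The generated sentences would still need checking before being added to training text.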

Chat with Master's supervisor. No issue with problematic data.
Liam suggests mixing some of the shorter and longer data so the model doesn't overfit. Varied lengths may help.
Generated sentences containing placenames using LLMs.
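
Liam's suggestion about mixing shorter and longer data could look roughly like this. It is a sketch under assumptions: the data is treated as text sentences, length is measured in words, and the function name and sample sentences are hypothetical. The same idea would apply to audio durations.

```python
import random
from itertools import zip_longest


def mix_by_length(sentences, seed=0):
    """Interleave the shorter and longer halves so lengths vary through the data."""
    rng = random.Random(seed)
    ordered = sorted(sentences, key=lambda s: len(s.split()))
    mid = len(ordered) // 2
    short, long_ = ordered[:mid], ordered[mid:]
    rng.shuffle(short)
    rng.shuffle(long_)
    mixed = []
    # Alternate one short and one long item; zip_longest pads the shorter
    # list with None when the halves are unequal.
    for s, l in zip_longest(short, long_):
        if s is not None:
            mixed.append(s)
        if l is not None:
            mixed.append(l)
    return mixed


# Hypothetical sample data, for illustration only.
data = [
    "Sea.",
    "Go raibh maith agat as an gcabhair ar fad.",
    "Dia duit.",
    "Tá mé i mo chónaí i gCorcaigh anois.",
]
mixed = mix_by_length(data)
print(mixed)
```

With this ordering, any consecutive pair of items contains one shorter and one longer sentence, so no batch is dominated by a single length.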