September 29th ABAIR Meeting
Highlights
- Application for supercomputing resources to be made on Wednesday.
- Need to discuss the call for computational resources identified by Tamás. The deadline is Wednesday.
- Joey mentioned training an LLM like Llama - also talked about leaning and training PDFs. 8 GPUs needed.
- Is access time-limited? Yes, 6 or 12 months.
- Waiting for large amounts of data from TG4.
- Want to put in 2 applications, one for ASR and one for LLMs.
- Question about the provision for some EU countries but not others.
- Should be launching Míle Glór na nÓg - will be the central piece. Fotheidil, new website.
Tamás
- Call is monthly, so it is not imperative to get it done this week.
- Process takes 2 weeks. Develop, benchmark and then scale.
- Extreme scale is very competitive.
- Go for smaller one before going for larger.
- Are we married to Nvidia GPUs? Some use AMD GPUs. Liam says the models are based on PyTorch, so OK.
- Suggests LUMI - 4500 node hours being given away.
- 1 node is technically 8 GPUs, so multiply by 8 to get total power.
- Need Liam and Joey to put together abstract, code and make estimates.
- None of nodes have hard drives - so need details on files to send.
- Maybe use another toolkit, e.g. ESPnet.
- Recommends going for one application for 2 projects, as they are not strict about what the compute is used for after it is assigned, so can use it for both.
- Creating and serving a GPU recognition model might be difficult.
- Mentioned Gradio as a possibility but doesn't have experience. Difficulty is opening right ports for access - might have to fight it.
- Currently using NVIDIA NeMo. All models based on PyTorch, so should be safe.
- Notes that lack of hard drives might be a limiting factor.
- Better to train an audio encoder - more manageable?
- Can scale up model size with this approach - make the case that this is a dry run and we intend to apply for more resources with more data later.
- Difficulty is creating something that is usable by numerous users (Gradio - host a model and interact with it. Not sure how well it would scale).
- Asked Joey if you can use local GPU - can't see a way to do it on your own GPUs.
- Asked about hosting our production services on the cloud GPUs, but it's not possible. Will free up resources in the lab however.
- Asked about social media posts that were discussed last week.
- Asked Liam to put guide for Fotheidil on Mol.
- Asked where to check the status of the language - low resource, under-resourced etc.? Tamás will find it.
- Need abstracts for what we are launching in the Oireachtas.
- Pressure on for doing transcriptions.
- Didn't receive anything to post for social media. Aifric wants to work on that.
- Will have limited content for MGNN with different dialects. It's difficult to retrieve and get permission.
- Can do Gradio demo in an hour. Takes no time to deploy.
- Going to train streaming ASR using Gradio.
- Transcriptions going on, getting through them. Met with Fianit last Wednesday - figuring out some dialectal differences.
- Meet with Andy about creating documentation for recordings.
- Some stuff is on Audacity, but will be using Fotheidil.
- Almost halfway through transcriptions.
- Noted regular changes made by Fiant, and asked her for clarifications - are they universal or context-dependent? Have a good idea of lexical variation for LTS rules. Should chat with Christoph.
- Wants to join the Gradio group.
- Didn't speak
- Wants to meet Ailbhe/Oskar about his final year project. Asks for time from Ailbhe - push it back to Wednesday afternoon.
- Didn't speak
- Will meet Ailbhe & Finn this week.
- Asked Liam for clarification on streaming model vs the best that's available which is not streaming. Liam said it is totally different. Would need to retrain current best system so that it can be streamable. Will take months.
- Wants to test the s2s system with native speakers.
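
For the compute estimates, the node-hour arithmetic from the LUMI discussion (4500 node hours, 1 node counted as 8 GPUs) can be sketched as below. This is only a rough illustration: the function names are made up for this note, and the assumption that a job occupies whole nodes continuously is ours, not something confirmed in the meeting.

```python
# Figures taken from the meeting notes.
GPUS_PER_NODE = 8            # "1 node is technically 8 GPUs"
ALLOCATION_NODE_HOURS = 4500  # node hours mentioned for LUMI


def node_hours_to_gpu_hours(node_hours, gpus_per_node=GPUS_PER_NODE):
    """Convert an allocation in node hours to GPU hours."""
    return node_hours * gpus_per_node


def wall_clock_hours(node_hours, nodes_in_use):
    """Hours of continuous running before the allocation is used up.

    Assumes (hypothetically) that the job holds `nodes_in_use` whole
    nodes for the entire run.
    """
    if nodes_in_use <= 0:
        raise ValueError("nodes_in_use must be positive")
    return node_hours / nodes_in_use


if __name__ == "__main__":
    gpu_hours = node_hours_to_gpu_hours(ALLOCATION_NODE_HOURS)
    print(f"{ALLOCATION_NODE_HOURS} node hours = {gpu_hours} GPU hours")

    # One node matches the "8 GPUs needed" figure from the highlights.
    hours = wall_clock_hours(ALLOCATION_NODE_HOURS, nodes_in_use=1)
    print(f"On 1 node: {hours} hours, about {hours / 24} days of continuous use")
```

On these numbers the allocation comes to 36000 GPU hours, or 187.5 days of continuous running on a single 8-GPU node.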










