Apply for community grant: Academic project (gpu and storage)

#1
by arthrod - opened
Cicero org

Hugging Face Community Grant Application

Project Name

Cicero-IM - Research on Multilingual Legal AI for Access to Justice

Project Description

Cicero-IM is a research initiative exploring how AI can improve access to legal services in non-English languages, with an initial focus on Brazilian Portuguese. We study how language models can be adapted to help individuals with limited resources understand and generate basic legal documents.

Our research involves analyzing legal document structures and creating accessible models that can simplify legal language for the average citizen. We've released a dataset of Portuguese legal clauses on Hugging Face and are committed to sharing our research findings with the community.

Current Work and Resource Needs

Our current research focuses on processing SEC agreements from EDGAR to better understand contract structures and clause-level elements. This work includes:

  1. Analyzing Qwen 2.5 7B performance on legal domain tasks
  2. Developing methods for legal information extraction
  3. Testing approaches for generating legally sound Portuguese text

These research tasks require computational resources beyond what our small team currently has access to.

Why Hugging Face?

Hugging Face provides an ideal environment for our research:

  1. Research Community: The platform connects us with other researchers working on legal NLP and low-resource languages.

  2. Technical Infrastructure: Hugging Face's GPU and storage infrastructure is well suited to the experiments we're conducting with transformer models.

  3. Open Science: Our datasets are already on Hugging Face, making it easier to connect our research outputs directly to the data.

  4. Accessibility: Hugging Face's user-friendly interfaces make our research more accessible to legal aid organizations interested in building on our work.

Requested Resources

For our research, we need:

  • GPU Resources: 2,500 GPU hours for model training and evaluation
  • Persistent Storage: 5TB for datasets and experimental results (sized to hold the SEC agreements corpus)
  • Duration: 12-month allocation for our current research phase

Community Benefits

This support will enable us to:

  1. Publish research on legal language adaptation for resource-constrained settings
  2. Release evaluation benchmarks for legal domain performance in Portuguese
  3. Share our methodology for processing and analyzing legal documents
  4. Document effective approaches for making legal language more accessible

Current Hugging Face Presence

Impact Measurement

With these resources, our research aims to:

  • Process, translate and analyze at least 500,000 SEC agreements
  • Evaluate 3 model architectures for legal understanding tasks
  • Develop methods to make legal text more accessible for non-experts
  • Support research that could help legal aid organizations improve their services
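
As a rough sanity check, the requested resources can be set against the processing target above. The sketch below uses only the figures stated in this application (2,500 GPU hours, 5TB, 12 months, 500,000 agreements). The function name `budget_summary` and the decimal conversion of 1 TB to 1,000,000 MB are illustrative assumptions, and the per-agreement GPU figure is an upper bound that assumes every GPU hour goes to document processing rather than model training and evaluation.

```python
# Figures taken from the application text; derived values are illustrative only.
GPU_HOURS = 2_500
STORAGE_TB = 5
DURATION_MONTHS = 12
AGREEMENTS = 500_000


def budget_summary(gpu_hours, storage_tb, months, documents):
    """Return per-month and per-document averages for a resource request."""
    # Assumption: decimal units, 1 TB = 1,000,000 MB.
    storage_mb = storage_tb * 1_000_000
    return {
        "gpu_hours_per_month": gpu_hours / months,
        # Upper bound: assumes all GPU time is spent on document processing.
        "gpu_seconds_per_document": gpu_hours * 3600 / documents,
        "storage_mb_per_document": storage_mb / documents,
    }


summary = budget_summary(GPU_HOURS, STORAGE_TB, DURATION_MONTHS, AGREEMENTS)
for name, value in summary.items():
    print(f"{name}: {value:.1f}")
```

Under these assumptions the request averages roughly 208 GPU hours per month, at most 18 GPU seconds per agreement, and 10 MB of storage per agreement, the last of which must also cover translations and experimental results.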

Commitment to Open Science

All research outputs will be published openly on Hugging Face with comprehensive documentation. We'll regularly share our findings through model cards and technical notes to benefit the broader research community working on access to justice.


Thank you for considering our application. Our research aims to understand how AI can help bridge the justice gap for those who cannot afford legal services, and Hugging Face's support would significantly advance this important work.
