
Intelligent system for case law analysis using modern deep learning techniques applied to natural language processing

Grant number: 20/09753-5
Support Opportunities: Research Grants - Innovative Research in Small Business - PIPE
Duration: March 01, 2021 - March 31, 2022
Field of knowledge: Engineering - Electrical Engineering
Principal Investigator: Rodrigo Frassetto Nogueira
Grantee: Rodrigo Frassetto Nogueira
Host Company: Neuralmind Inteligência Artificial Ltda
CNAE: Development of custom computer programs; Development and licensing of customizable computer programs
City: Campinas
Associated researchers: Fábio Capuano de Souza; Israel Campiotti; Roberto de Alencar Lotufo
Associated grant(s): 22/01640-2 - QUEST - a Zero-Shot Information retrieval and summarization system, AP.PIPE
Associated scholarship(s): 21/05099-1 - Implementation of a search engine for the jurisprudence analysis and construction of an annotation interface for the evaluation dataset, BP.TT
21/02480-6 - Implementation of a search engine for the jurisprudence analysis and construction of an annotation interface for the evaluation dataset, BP.TT

Abstract

In this project, we will investigate the automation of case law analysis, which consists of finding information that supports a favorable (or unfavorable) decision in a specific case. Case law analysis is fundamental to obtaining a good outcome in a legal proceeding: it informs strategic decisions and minimizes risks. However, the task is costly because of the large volume of documents that must be analyzed. An accurate system that partially or fully automates it could reduce time and procedural costs while also strengthening the case's argument, enabling the adoption of the best strategy for that specific case. We see modern natural language processing systems as a promising avenue for building such a system: they have made tremendous progress in recent years, mainly due to advances in deep learning methods. However, applying them to specialized tasks such as case law analysis is not trivial, because there is little Portuguese data, and especially little legal-domain data, with which to train these models. To address this, we will investigate methods for transferring knowledge from models pre-trained on general-domain English text (which is abundant) to case law analysis in Portuguese, for which training data are still very limited. More specifically, this project aims to answer the following question: given the variety of models pre-trained on English that perform excellently on general-domain tasks, what is the most effective way to adapt them to case law analysis in Brazilian Portuguese? The answer will enable us to develop and apply natural language processing systems to new legal tasks quickly and cheaply.
The knowledge transfer techniques that we will investigate include: 1) automatic translation of training datasets from English to Portuguese; 2) automatic translation, at inference time, of the inputs and outputs of models trained in English; 3) low-cost knowledge and vocabulary transfer; and 4) pre-training on a Brazilian legal corpus. These methods will be evaluated on the case law analysis task in both English and Portuguese. The Portuguese dataset will be built during this project and will be a by-product that can be used in future projects. The results of these experiments will guide us toward the best methodology for developing the system. From an economic point of view, the best scenario is one in which existing systems trained on abundant corpora and tasks perform well on specific corpora and tasks without any changes. The worst-case scenario is one in which a new system must be developed practically from scratch for each new task. We believe that, in practice, the effort needed to develop these systems lies between these two scenarios; determining the total effort required is one of the questions this project will answer. The knowledge acquired in this project will guide the development of future natural language processing products for the legal domain. This research can also be considered a first step toward the development of jurimetrics systems. (AU)


Scientific publications
(References retrieved automatically from Web of Science and SciELO through information on FAPESP grants and their corresponding numbers as mentioned in the publications by the authors)
BONIFACIO, LUIZ; ABONIZIO, HUGO; FADAEE, MARZIEH; NOGUEIRA, RODRIGO. InPars: Unsupervised Dataset Generation for Information Retrieval. Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '22), 6 pp. (22/01640-2, 20/09753-5)
