Friday, October 24, 2008
LORIA / INRIA Nancy Grand Est (French Consortium)
A General Overview
INRIA (National Institute for Research in Computer Science and Control) is a French public-sector scientific and technological institute operating under the dual authority of the Ministry in charge of Research and the Ministry in charge of Industry. Its headquarters are located at Rocquencourt, near Paris. INRIA has eight main research units, located in Bordeaux, Grenoble, Lille, Nancy (INRIA Nancy Grand Est), Rennes, Rocquencourt, Saclay, and Sophia-Antipolis. INRIA's missions are "to undertake basic and applied research, to design experimental systems, to ensure technology and knowledge transfer, to organize international scientific exchanges, to carry out scientific assessments, and to contribute to standardization".
The research carried out at INRIA brings together experts from the fields of computer science and applied mathematics covering the following areas: networks and systems; software engineering and symbolic computing; man-machine interaction; image processing, data management, knowledge systems; simulation and optimization of complex systems.
INRIA hosts around 2,100 people on its premises, including 1,300 scientists, many of whom belong to partner organizations (CNRS, industrial labs, universities) and are assigned to work in common research teams called "projects". INRIA's own budget accounts for around 500 full-time equivalent R&D positions. A large number of INRIA senior researchers are involved in teaching, and their PhD students (about 550) prepare their theses within the different INRIA research projects (about 90).
Its budget is roughly 90 MEuro, 25% of which comes from research and development contracts, royalties and sales.
Technology transfer is addressed at INRIA via industrial contracts (400 running), European projects (300 already completed), framework agreements with specific industrial companies, and the creation of technology companies (40 created since 1984). In 1998, INRIA launched a subsidiary to promote high-tech start-up companies: INRIA-TRANSFERT provides early-stage support to future companies and is the instigator of the I-Source Gestion company, which is in charge of setting up start-up funds in the field of Information Technology.
LORIA (Laboratoire Lorrain de Recherche en Informatique et ses Applications) is a joint laboratory of the CNRS, INRIA and the three Universities of Nancy, bringing together more than 400 researchers and PhD students. It is dedicated to fundamental and applied research in computer science, with topics including theoretical computer science, software engineering, parallelism, artificial intelligence, image synthesis and man-machine communication.
The TALARIS project (Traitement Automatique des LAngues : Représentations, Inférences et Sémantique) is devoted to Computational Linguistics, with special emphasis on research topics related to semantics and inference.
Our current research activities within the TALARIS project
Dealing with multilingual information is crucial for adapting the content or the behavior of elementary components to specific user needs or targets. This requires considering situations in which linguistic information is adapted on the fly to the linguistic needs of a user, as well as situations in which an additional process adapts an elementary component before it is presented to the user.
The Multi Lingual Information Framework (MLIF – ISO WD 24616) is being designed with the objective of providing a common abstract model from which several formats used in translation and localisation, such as TMX or XLIFF, can be generated.
MLIF also aims to provide a specification platform for representing multilingual content in a whole range of applications, such as localization and translation memory processing, interactive and HD TV, karaoke, subtitles, and accessibility. MLIF promotes the use of a common framework for the future development of several different formats, for example: TBX (TermBase eXchange Standard), TMX (Translation Memory eXchange Standard), XLIFF (XML Localization Interchange File Format), W3C’s SMILtext (Synchronized Multimedia Integration Language), etc. It does not create a completely new format from scratch, but suggests that the overlapping issues should be handled independently and separately. This will save time and energy for the different groups and encourage them to work in collaboration.
Presently, the groups involved (e.g. LISA, OASIS, W3C, ISO) work independently and have no mechanism for taking advantage of each other’s tools. MLIF proposes that each group concentrate only on the issues that are specific to its own format, which gives the groups’ developers a smaller domain to cover. This leaves them more time to concentrate on a subset of the problems they currently deal with, and it creates a niche in which better solutions to multilingual data handling and translation problems can be provided.
In MLIF, we deal with the issue of overlap between the existing formats. MLIF involves the development of an API through which all these formats will be integrated into the core MLIF structure. This is done through the identification and selection of data categories as stated in ISO 12620. MLIF can be considered as a parent of all the formats mentioned above. Since all these formats deal with multilingual data expressed in the form of segments or text units, they can all be stored, manipulated and translated in a similar manner.
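To make the idea of overlap more concrete, here is a minimal sketch in Python. It is not the MLIF API: the function names, the sample fragments and the dictionary-based "core" structure are all assumptions made for illustration, and the XML is a simplified, TMX-like and XLIFF-like subset (no namespaces, no headers) rather than schema-valid TMX or XLIFF. It only shows that both fragments reduce to the same text units and segments.

```python
import xml.etree.ElementTree as ET

# Clark-notation name of the predefined xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Simplified, hypothetical sample fragments (not schema-valid documents).
TMX_SAMPLE = """<tmx version="1.4"><body>
  <tu tuid="greeting">
    <tuv xml:lang="en"><seg>Hello</seg></tuv>
    <tuv xml:lang="fr"><seg>Bonjour</seg></tuv>
  </tu>
</body></tmx>"""

XLIFF_SAMPLE = """<xliff version="1.2">
  <file source-language="en" target-language="fr"><body>
    <trans-unit id="greeting">
      <source>Hello</source><target>Bonjour</target>
    </trans-unit>
  </body></file>
</xliff>"""


def read_tmx_like(xml_text):
    """Map each text unit id to a {language: segment text} dictionary."""
    units = {}
    for tu in ET.fromstring(xml_text).iter("tu"):
        units[tu.get("tuid")] = {
            tuv.get(XML_LANG): tuv.findtext("seg") for tuv in tu.iter("tuv")
        }
    return units


def read_xliff_like(xml_text):
    """Produce the same structure from an XLIFF-like fragment."""
    units = {}
    for file_el in ET.fromstring(xml_text).iter("file"):
        src_lang = file_el.get("source-language")
        tgt_lang = file_el.get("target-language")
        for tu in file_el.iter("trans-unit"):
            segments = {src_lang: tu.findtext("source")}
            target = tu.findtext("target")
            if target is not None:
                segments[tgt_lang] = target
            units[tu.get("id")] = segments
    return units


tmx_units = read_tmx_like(TMX_SAMPLE)
xliff_units = read_xliff_like(XLIFF_SAMPLE)
print(tmx_units == xliff_units)
```

Once both inputs are held in one structure, any storage, manipulation or translation step can be written once against that structure instead of once per format.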
Our approach is largely inspired by the methodology used to develop TMF (Terminological Mark-up Framework), ISO 16642. We identify and select a set of data categories as stated in ISO 12620 (data category register). The way these data categories are related to and associated with each other is described by a metamodel. This metamodel and the selected data categories form a high-level representation of multilingual content. From this high-level representation we are able to generate any specific format: we can thus ensure interoperability between several multilingual content formats and their applications.
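The generation step can be sketched as follows. This is only an illustration under stated assumptions, not the MLIF metamodel: the `Segment` and `TextUnit` classes and the two generator functions are hypothetical names, and the output is a reduced TMX-like and XLIFF-like subset (required headers, attributes and namespaces are omitted).

```python
from dataclasses import dataclass, field
from typing import List, Optional
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


@dataclass
class Segment:
    """One piece of text in one language (hypothetical data categories)."""
    lang: str
    text: str


@dataclass
class TextUnit:
    """A group of segments that are translations of each other."""
    unit_id: str
    segments: List[Segment] = field(default_factory=list)

    def in_lang(self, lang: str) -> Optional[str]:
        return next((s.text for s in self.segments if s.lang == lang), None)


def to_tmx_like(units: List[TextUnit], source_lang: str) -> ET.Element:
    """Generate a TMX-like tree: every language becomes a <tuv> element."""
    root = ET.Element("tmx", version="1.4")
    ET.SubElement(root, "header", srclang=source_lang)
    body = ET.SubElement(root, "body")
    for unit in units:
        tu = ET.SubElement(body, "tu", tuid=unit.unit_id)
        for seg in unit.segments:
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: seg.lang})
            ET.SubElement(tuv, "seg").text = seg.text
    return root


def to_xliff_like(units: List[TextUnit], source_lang: str,
                  target_lang: str) -> ET.Element:
    """Generate an XLIFF-like tree for a single source/target language pair."""
    root = ET.Element("xliff", version="1.2")
    file_el = ET.SubElement(root, "file", {
        "source-language": source_lang,
        "target-language": target_lang,
    })
    body = ET.SubElement(file_el, "body")
    for unit in units:
        source = unit.in_lang(source_lang)
        if source is None:
            continue  # nothing to translate from in this unit
        tu = ET.SubElement(body, "trans-unit", id=unit.unit_id)
        ET.SubElement(tu, "source").text = source
        target = unit.in_lang(target_lang)
        if target is not None:
            ET.SubElement(tu, "target").text = target
    return root


units = [TextUnit("greeting", [Segment("en", "Hello"),
                               Segment("fr", "Bonjour")])]
tmx = to_tmx_like(units, "en")
xliff = to_xliff_like(units, "en", "fr")
print(ET.tostring(tmx, encoding="unicode"))
print(ET.tostring(xliff, encoding="unicode"))
```

Both generators read the same in-memory representation, so supporting a further target format in this sketch would mean adding one more generator function without touching the data model.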
A first draft of MLIF has been developed, used and tested within the ITEA Passepartout Project (ITEA 04017). MLIF has now become a “Working Draft” (WD) of ISO’s TC37/SC4 “Linguistic Resources Management”. We have also been working on a new W3C SMIL 3.0 module called SMILtext (Synchronized Multimedia Integration Language - http://www.w3.org/AudioVideo/).
LORIA / INRIA Nancy Grand Est (TALARIS Project) and METAVERSE1
In the framework of the METAVERSE1 project, LORIA / INRIA Nancy Grand Est (TALARIS project) will mainly contribute to multilingual issues (e.g. standardisation, handling and representation of multilingual textual information). In addition, within the scenario specified by the METAVERSE1 French consortium, LORIA / INRIA Nancy Grand Est will work on the detection of emotions from the analysis of textual information (e.g. chat).
LORIA / INRIA Nancy Grand Est
Contact Person: Dr. Samuel Cruz-Lara
Campus Scientifique - LORIA Bâtiment « B » - BP 239
F-54506 Vandoeuvre-lès-Nancy Cedex
Phone (+33) 3 83 59 20 31
Fax (+33) 3 83 41 30 79