corpus

set of documents assembled for a specific purpose
Higher Edu - Research dev card
Development from the higher education and research community
  • Creation or important update: 11/09/13
  • Minor correction: 11/09/13

TreeCloud : building tree cloud visualizations from texts

This software was developed (or is under development) within the higher education and research community. Its stability can vary (see fields below) and its working state is not guaranteed.
  • Web site
  • System:
  • Current version: 1.3 - 13/12/2009
  • License(s): GPL
  • Status: under development
  • Support: maintained, ongoing development
  • Designer(s): Philippe Gambette; Jean Véronis
  • Contact designer(s): P. Gambette

  • Laboratory, service:

 

General software features

TreeCloud generates a tree cloud from a text: a word cloud whose words are arranged around a tree that reflects their semantic proximity within the text.
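To make the idea of a tree reflecting semantic proximity concrete, here is a minimal Python sketch. It is not TreeCloud's own code or algorithm: the co-occurrence distance (a Jaccard-style score over sliding windows), the average-linkage clustering, and all function names are assumptions chosen for illustration. Words that often appear close together end up in the same subtree, and the word frequencies could then drive font sizes as in an ordinary word cloud.

```python
from collections import Counter
from itertools import combinations
import re


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cooccurrence_distance(tokens, vocab, window=5):
    """Distance = 1 - (windows containing both words / windows containing either)."""
    windows = [set(tokens[i:i + window]) & vocab
               for i in range(max(1, len(tokens) - window + 1))]
    dist = {}
    for a, b in combinations(sorted(vocab), 2):
        both = sum(1 for w in windows if a in w and b in w)
        either = sum(1 for w in windows if a in w or b in w)
        dist[frozenset((a, b))] = 1.0 - both / either if either else 1.0
    return dist


def build_tree(vocab, dist):
    """Average-linkage agglomerative clustering; returns nested tuples of words."""
    # Each cluster is (subtree, list of the words at its leaves).
    clusters = [(w, [w]) for w in sorted(vocab)]

    def linkage(c1, c2):
        pairs = [(a, b) for a in c1[1] for b in c2[1]]
        return sum(dist[frozenset(p)] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        # Merge the two clusters with the smallest average distance.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        merged = ((clusters[i][0], clusters[j][0]),
                  clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]


def tree_cloud_sketch(text, n_words=10, window=5):
    """Return (tree, frequencies) for the n_words most frequent words of a non-empty text."""
    tokens = tokenize(text)
    freqs = dict(Counter(tokens).most_common(n_words))
    vocab = set(freqs)
    dist = cooccurrence_distance(tokens, vocab, window)
    return build_tree(vocab, dist), freqs


text = "cat dog cat dog cat dog sun moon sun moon sun moon"
tree, freqs = tree_cloud_sketch(text, n_words=4, window=2)
print(tree)   # nested tuples grouping words that co-occur
print(freqs)  # word counts, usable as font sizes
```

In this toy text, "cat" and "dog" always occur side by side, as do "sun" and "moon", so each pair forms its own subtree. A real tool would also filter stop words and draw the tree; both steps are left out here.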

Context in which the software is used

The main application of the tree clouds built by TreeCloud is to provide a quick overview of the content of a text. They can also support a deeper analysis of texts as part of a textometric approach (text analysis using software tools and statistical methods). In that case, the tree cloud helps the user formulate hypotheses or test them. It can therefore lead the user to other textometric tools to confirm these hypotheses, or serve to visualize the output of those tools.

Publications related to the software

Philippe Gambette and Jean Véronis: Visualising a Text with a Tree Cloud, in Locarek-Junge H. and Weihs C., editors, Classification as a Tool of Research, Proc. of IFCS'09 (11th Conference of the International Federation of Classification Societies), Studies in Classification, Data Analysis, and Knowledge Organization 40, p. 561-570, 2010.

Delphine Amstutz and Philippe Gambette (in French): Utilisation de la visualisation en nuage arboré pour l'analyse littéraire, Statistical Analysis of Textual Data (Proc. of JADT'10), p. 227-238, 2010.

Philippe Gambette, Nuria Gala and Alexis Nasr (in French): Longueur de branches et arbres de mots, Corpus 11, p. 129-146, 2012.

William Martinez and Philippe Gambette (in French): L'affaire du Médiator au prisme de la textométrie, Texto !, to appear, 2013.

Higher Edu - Research dev card
Development from the higher education and research community
  • Creation or important update: 24/03/09
  • Minor correction: 10/07/13

Unitex : corpus processing using finite state technology

This software was developed (or is under development) within the higher education and research community. Its stability can vary (see fields below) and its working state is not guaranteed.
  • Web site
  • System:
  • Current version: 3.0 stable - September 2012
  • License(s): LGPL. The language resources distributed with the software are licensed under LGPLLR, a license developed by the Université Paris-Est Marne-la-Vallée and validated by the FSF as the equivalent of the LGPL for linguistic data: http://igm.univ-mlv.fr/~unitex/lgpllr.html
  • Status: validated (according to PLUME), stable release, under development
  • Support: maintained, ongoing development
  • Designer(s): Sébastien Paumier
  • Contact designer(s): unitex@univ-mlv.fr
  • Laboratory, service:

 

General software features

The Unitex system provides tools for building language resources, such as electronic dictionaries and grammars, and for using them to run advanced searches in texts and to generate concordances.

The French index card for validated software (fiche PLUME) describes the software in detail.
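As a rough illustration of how an electronic dictionary can drive an advanced search and produce a concordance, here is a small Python sketch. It does not use Unitex or its API. The entry layout ("inflected,lemma.CODE") is only loosely modelled on dictionary lines of that kind, the pattern symbols "<lemma>" and "<CODE>" are simplifications, and every name below is hypothetical. A plain sequence of symbols stands in for a grammar.

```python
import re

# Toy dictionary entries in an assumed "inflected,lemma.CODE" layout.
ENTRIES = [
    "is,be.V", "are,be.V", "was,be.V",
    "cat,cat.N", "cats,cat.N",
    "sleeping,sleep.V",
]


def load_dictionary(lines):
    """Map each inflected form to a list of (lemma, code) analyses."""
    lexicon = {}
    for line in lines:
        form, rest = line.split(",", 1)
        lemma, code = rest.split(".", 1)
        lexicon.setdefault(form, []).append((lemma, code))
    return lexicon


def matches(token, symbol, lexicon):
    """Test a token against "<lemma>", "<CODE>" or a literal word."""
    if symbol.startswith("<") and symbol.endswith(">"):
        key = symbol[1:-1]
        analyses = lexicon.get(token.lower(), [])
        return any(key in (lemma, code) for lemma, code in analyses)
    return token.lower() == symbol.lower()


def concordance(text, pattern, lexicon, context=2):
    """Return (left context, match, right context) for each occurrence of the pattern."""
    tokens = re.findall(r"\w+", text)
    n = len(pattern)
    hits = []
    for i in range(len(tokens) - n + 1):
        span = tokens[i:i + n]
        if all(matches(t, s, lexicon) for t, s in zip(span, pattern)):
            left = " ".join(tokens[max(0, i - context):i])
            right = " ".join(tokens[i + n:i + n + context])
            hits.append((left, " ".join(span), right))
    return hits


lexicon = load_dictionary(ENTRIES)
# Any form of "be" followed by any word the dictionary tags as a verb.
hits = concordance("The cat is sleeping. The cats are sleeping too.",
                   ["<be>", "<V>"], lexicon)
for left, match, right in hits:
    print(f"{left:>12} [{match}] {right}")
```

The search is expressed over lemmas and grammatical codes rather than surface strings, so a single pattern finds both "is sleeping" and "are sleeping"; that is the benefit a dictionary-backed search has over plain string matching.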

Context in which the software is used

Unitex is used as an exploration tool for research by the language processing team of the computer science laboratory.
It is also used in several universities internationally as a tool for research and teaching in computational linguistics.

Publications related to the software
  • Sébastien Paumier. 2000. Nouvelles méthodes pour la recherche d'expressions dans de grands corpus. In A. Dister (ed.), Actes des 3èmes Journées INTEX. Revue Informatique et Statistique dans les Sciences Humaines, 36ème année, n° 1 à 4.
  • Sébastien Paumier. 2003. A Time-Efficient Token Representation for Parsers, Proceedings of the EACL Workshop on Finite-State Methods in Natural Language Processing, Budapest, pp. 83-90.
  • Other publications associated with the project can be found at its website.