Statistics
Higher Edu - Research dev card
Development from the higher education and research community
  • Creation or important update: 19/09/13
  • Minor correction: 19/09/13
  • Index card author: Eric Hivon (IAP)
  • Theme leader : Dirk Hoffmann (Centre de Physique des Particules de Marseille (CPPM-IN2P3))

HEALPix : data analysis, simulation and visualisation on the sphere

This software was developed (or is under development) within the higher education and research community. Its stability can vary (see fields below) and its working state is not guaranteed.
  • Current version: 3.11 - April 2013
  • License(s): GPL - GPLv2
  • Status: stable release
  • Support: maintained, ongoing development
  • Designer(s): Eric Hivon; Martin Reinecke; Krzysztof M. Gorski; Anthony J. Banday; Benjamin D. Wandelt; Emmanuel Joliet; William O'Mullane; Cyrille Rosset; Andrea Zonca
  • Contact designer(s): hivon at iap.fr
  • Laboratory, service: MPA (Garching, Germany), Caltech (Pasadena, CA, USA), TAC (Copenhagen, Denmark), ESAC (Madrid, Spain), JPL (Pasadena, CA, USA), ESO (Garching, Germany)

 

General software features

The HEALPix software implements the HEALPix (Hierarchical Equal Area iso-Latitude Pixelation) pixelation of the sphere. It was initially developed for the simulation and analysis of observations from the ESA Planck satellite, which is dedicated to the study of Cosmic Microwave Background (CMB) anisotropies and whose first results were delivered in March 2013. The software and its pixelation algorithm have since become standard tools for the simulation and analysis of data on the sphere. Users include the NASA WMAP satellite, also dedicated to CMB observation, and the Pierre Auger ground-based observatory for high-energy cosmic rays; the software is also used in other astrophysical and geological studies.

Main features of the pixelation

At a given resolution, all HEALPix pixels have the same surface area, even though their shapes vary slightly. Because the pixelation is hierarchical, upgrading to the next resolution level simply amounts to dividing each pixel into four sub-pixels of equal area. This allows existing maps to be upgraded and downgraded quickly and efficiently.
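
As a rough illustration of this hierarchy, the sketch below assumes the standard HEALPix relation Npix = 12 × Nside² and the NESTED numbering, in which the four sub-pixels of pixel p at the next resolution are numbered 4p to 4p+3. The function names are hypothetical, and plain Python is used rather than the HEALPix library itself.

```python
import math


def npix_from_nside(nside: int) -> int:
    """Number of pixels on the sphere for the resolution parameter nside."""
    return 12 * nside * nside


def pixel_area_sr(nside: int) -> float:
    """Area of each pixel in steradians (identical for all pixels)."""
    return 4.0 * math.pi / npix_from_nside(nside)


def children_nested(pixel: int) -> list[int]:
    """Indices of the four sub-pixels of `pixel` at the next resolution (NESTED)."""
    return [4 * pixel + k for k in range(4)]


def degrade_nested(values: list[float]) -> list[float]:
    """Halve Nside by averaging each group of four sibling pixels (NESTED)."""
    if len(values) % 4 != 0:
        raise ValueError("map length must be a multiple of 4")
    return [sum(values[i:i + 4]) / 4.0 for i in range(0, len(values), 4)]


fine_map = [float(i) for i in range(npix_from_nside(2))]  # 48 pixels at Nside = 2
coarse_map = degrade_nested(fine_map)                      # 12 pixels at Nside = 1
```

Because sibling pixels are contiguous in this numbering, changing resolution reduces to integer arithmetic on the pixel index, with no geometry involved.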

Since the pixels are regularly spaced on iso-latitude rings, Spherical Harmonics can be computed very efficiently. The cost of the synthesis or analysis up to multipole $L_{max}$ of a spherical data set containing $N_{pix}$ pixels is reduced from $N_{pix} L_{max}^2$ for a non-iso-latitude pixelation to $N_{pix}^{1/2} L_{max}^2$.
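
To give a feel for the size of this gain, the sketch below evaluates both scalings with all constant factors omitted. The values Nside = 1024 and Lmax = 2048 are illustrative assumptions, as is the function name.

```python
import math


def sht_cost_estimates(nside: int, lmax: int) -> tuple[float, float]:
    """Return (generic, iso_latitude) operation-count scalings, constants omitted."""
    npix = 12 * nside * nside
    generic = float(npix * lmax ** 2)            # Npix * Lmax^2
    iso_latitude = math.sqrt(npix) * lmax ** 2   # Npix^(1/2) * Lmax^2
    return generic, iso_latitude


generic, iso = sht_cost_estimates(nside=1024, lmax=2048)
speedup = generic / iso  # equals sqrt(Npix), a few thousand for this map size
```

The ratio of the two estimates is simply the square root of the number of pixels, so the advantage of iso-latitude rings grows with resolution.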

Features of the software package

The package represents data on the sphere as maps, and enables the analysis or simulation of these maps in (scalar or spin-weighted) Spherical Harmonics, as well as various kinds of statistical analysis and processing. Portable FITS files are used for input and output. The list of available functions includes:

  • generation of random maps (gaussian or not) from an arbitrary angular power spectrum,
  • computation of the angular power spectrum (or angular correlation function) of a map,
  • convolution of a spherical map with an arbitrary circular window,
  • tessellation of the sphere and pixel processing supported down to a pixel size of 0.4 milliarcseconds (equivalent to 3.5 × 10^18 pixels on the sphere),
  • median filtering of a map,
  • search of local extrema in a map,
  • query of pixels located in user defined disks, triangles, polygons, ...
  • processing of binary masks to identify 'holes' in order to fill them, or to apodize masks,
  • visualization of HEALPix sky maps either on the whole sky (using Mollweide or orthographic projections) or on a patch (gnomic or cartesian projections),
  • output in Google Map/Google Sky and DomeMaster format.
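
The pixel-size figure quoted in the list above can be checked with a few lines of arithmetic. This sketch assumes that the quoted limit corresponds to Nside = 2^29 and that the pixel size is defined as the square root of the pixel area; both are assumptions, not statements taken from this card.

```python
import math

ARCSEC_PER_RADIAN = 180.0 * 3600.0 / math.pi


def pixel_size_mas(nside: int) -> float:
    """Square root of the pixel area, expressed in milliarcseconds."""
    npix = 12 * nside * nside
    area_sr = 4.0 * math.pi / npix
    return math.sqrt(area_sr) * ARCSEC_PER_RADIAN * 1000.0


nside_max = 2 ** 29                    # assumed maximum resolution parameter
npix_max = 12 * nside_max ** 2         # 3458764513820540928, about 3.5e18
size_mas = pixel_size_mas(nside_max)   # about 0.39 milliarcseconds
```

Under these assumptions, both numbers agree with the values quoted in the list.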

The most expensive operations, such as Spherical Harmonics Transforms, have been carefully optimised and benefit from shared-memory parallelisation based on OpenMP.

Contents of the software package

The software is available in C, C++, Fortran90, IDL/GDL, Java and Python. The following modules are provided in each of these languages:

  • a library of tools (subroutines, functions, procedures, modules, classes, ... depending on the language) covering most of the functionalities described above, as well as supporting ancillary tools (e.g., parameter file parsing),
  • a set of stand-alone facilities based on the library above, each implementing one of HEALPix's major features (map generation or analysis, filtering, resolution upgrade or downgrade, visualization). These applications are generally run via an interactive dialog or an ASCII parameter file. Their source code can be used as a starting point for user-specific developments,
  • extensive PDF and/or HTML documentation describing in detail the API, inner workings and limitations of each tool and application.

Finally, some tools (an interactive script and a Makefile) are provided to manage and facilitate the compilation and installation of one or several of the libraries and facilities, for most combinations of hardware, operating systems, compilers, ...

Third Party Developments

One can distinguish two kinds of third party developments (defined as not (yet) being part of the official HEALPix package described above):

  • new functionalities: for instance, many tools based on Minkowski functionals, wavelets (iSAP, MRS, S2LET, SphereLab) or structure identification (DisPerSE), developed by various research teams, can be applied to data stored in HEALPix format,
  • translations, re-implementations or wrappings of (some of) the existing functionalities, available for instance in Matlab/Octave (Mealpix) and Yorick (YHeal). (See the (almost) exhaustive list.)

Context in which the software is used

Software used for the analysis of Planck satellite data.
Data format supported by Aladin visualisation software to represent diffuse astronomical data on the sky.

Publications related to the software

Higher Edu - Research dev card
Development from the higher education and research community
  • Creation or important update: 18/04/13
  • Minor correction: 18/04/13

SVDetect : a tool to detect genomic structural variations from paired-end and mate-pair sequencing data

  • Current version: 0.8 - 05/12/2011
  • License(s): GPL
  • Status: stable release
  • Support: maintained, no ongoing development
  • Designer(s): Bruno Zeitouni, Valentina Boeva
  • Contact designer(s): svdetect@curie.fr

 

General software features

Starting from NGS paired reads mapped onto a reference genome, SVDetect detects clusters of anomalously mapped pairs (with abnormal order, strand orientation or insert size) and predicts structural variants (SVs) such as large insertions, deletions, inversions, duplications or intra/inter-chromosomal translocations. SVDetect can also compare SV results from different samples and identify sample-specific SVs (tumoral DNA vs control DNA, for example).
SVDetect is compatible with any type of paired reads ("paired-end" or "mate-pair"), sequencing technology (Illumina, SOLiD, PGM, ...) or type of genome.
SVDetect can compute coverage profiles and reveal losses or gains of genomic regions from the copy-number information.
It is distributed as a Perl script and takes the BAM format as input.
SVDetect is also available at the Galaxy toolshed.
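
The notion of an anomalously mapped pair can be sketched as follows. This is not SVDetect's code (SVDetect is a Perl script); the `ReadPair` record, the labels, the expected forward-reverse orientation and the insert-size thresholds are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    """Hypothetical minimal record for a mapped read pair."""
    chrom1: str
    pos1: int
    strand1: str  # "+" or "-"
    chrom2: str
    pos2: int
    strand2: str


def classify_pair(pair, mean_insert=300.0, sd_insert=30.0, n_sd=3.0):
    """Label a mapped pair; the thresholds and labels are illustrative only."""
    if pair.chrom1 != pair.chrom2:
        return "inter-chromosomal"
    # Order the two mates by position so that orientation can be checked.
    (left_pos, left_strand), (right_pos, right_strand) = sorted(
        [(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)]
    )
    if left_strand == right_strand:
        return "abnormal orientation"
    if (left_strand, right_strand) == ("-", "+"):
        return "abnormal order"
    span = right_pos - left_pos
    if span > mean_insert + n_sd * sd_insert:
        return "insert too large"
    if span < mean_insert - n_sd * sd_insert:
        return "insert too small"
    return "normal"


label = classify_pair(ReadPair("chr1", 1000, "+", "chr1", 5000, "-"))
```

A real tool would then group pairs carrying the same label and overlapping coordinates into clusters before predicting a variant type.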

Context in which the software is used

SVDetect is an application for the isolation and the type prediction of intra- and inter-chromosomal rearrangements from paired-end/mate-pair sequencing data provided by the high-throughput sequencing technologies.
It was primarily tested in the context of whole genome resequencing projects from cancer cells, rich in chromosomal rearrangements.
SVDetect can also detect fusion genes from RNA-seq experiments.

Publications related to the software
  • SVDetect: a tool to identify genomic structural variations from paired-end and mate-pair sequencing data
    Bruno Zeitouni; Valentina Boeva; Isabelle Janoueix-Lerosey; Sophie Loeillet; Patricia Legoix-ne; Alain Nicolas; Olivier Delattre; Emmanuel Barillot, Bioinformatics 2010 26: 1895-1896, http://www.hal.inserm.fr/inserm-00508372
Higher Edu - Research dev card
Development from the higher education and research community
  • Creation or important update: 20/01/12
  • Minor correction: 20/01/12

QTLMap : detection of QTL from experimental designs in outbred population

  • Current version: 0.8.3 - 14 October 2010
  • License(s): CeCILL
  • Status: beta release
  • Support: maintained, ongoing development
  • Designer(s): Pascale Le Roy, Jean-Michel Elsen, Helene Gilbert, Carole Moreno, Andres Legarra, Olivier Filangi
  • Contact designer(s): olivier.filangi@rennes.inra.fr

 

General software features

Description

QTLMap is software dedicated to the detection of QTL from experimental designs in outbred populations. It is developed by the Animal Genetics Division at INRA (French National Institute for Agricultural Research). The statistical techniques used are linkage analysis (LA) and linkage disequilibrium linkage analysis (LDLA), both using interval mapping. Several versions of the LA are offered, ranging from a quasi Maximum Likelihood approach to a fully linear (regression) model. The LDLA is a regression approach (Legarra and Fernando, 2009). The population may consist of sets of half-sib families or a mixture of full- and half-sib families. The computation of phase and transmission probabilities is optimized to be fast and as exact as possible. QTLMap is able to deal with large numbers of markers (SNP) and traits (eQTL).

Functionalities

  • QTL detection in half-sib families or mixture of full- and half-sib families
  • One or several linked QTL segregating in the population
  • Single trait or multiple trait
  • Nuisance parameters (e.g. sex, batch, weight...) and their interactions with QTL can be included in the analysis
  • Gaussian, discrete or survival (Cox model) data
  • Familial heterogeneity of variances (heteroscedasticity)
  • Can handle eQTL analyses
  • Computation of transmission and phase probabilities adapted to high throughput genotyping (SNP)
  • Empirical thresholds are estimated using simulations under the null hypothesis or permutations of trait values
  • Computation of power and accuracy of your design or any simulated design
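
The permutation approach to empirical thresholds mentioned in the list above can be sketched generically as follows. This is not QTLMap's implementation: the test statistic (a squared correlation standing in for a likelihood-ratio or regression test), the toy genotype coding and all names are assumptions.

```python
import random


def pearson_r2(x, y):
    """Squared Pearson correlation; returns 0.0 when either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)


def max_statistic(genotypes, trait):
    """Largest statistic over all tested positions (one row per position)."""
    return max(pearson_r2(row, trait) for row in genotypes)


def permutation_threshold(genotypes, trait, n_perm=200, alpha=0.05, seed=0):
    """Empirical (1 - alpha) quantile of the maximum statistic under the null."""
    rng = random.Random(seed)
    shuffled = list(trait)
    null_stats = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any link between trait and genotype
        null_stats.append(max_statistic(genotypes, shuffled))
    null_stats.sort()
    index = min(n_perm - 1, int((1.0 - alpha) * n_perm))
    return null_stats[index]


rng = random.Random(1)
genotypes = [[rng.choice([0, 1]) for _ in range(40)] for _ in range(10)]
trait = [rng.gauss(0.0, 1.0) for _ in range(40)]
threshold = permutation_threshold(genotypes, trait)
```

Taking the maximum over positions inside each permutation is what makes the resulting threshold account for the number of positions tested.
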
Context in which the software is used

QTLMap source code is available under the CeCILL version 2.0 license, a GPL like license.

Users

This software is used by genetics researchers to detect regions of the genome that control an agronomic trait.

Cluster Infrastructures

Software dependencies

Installation

  • GCC suite (>= 4.4)
  • CMake 2.6.4

Support

Users mailing list: subscription

Publications related to the software

Legarra A, Fernando RL, 2009. Linear models for joint association and linkage QTL mapping. Genet Sel Evol., 41:43.

Elsen JM, Filangi O, Gilbert H, Le Roy P, Moreno C, 2009. A fast algorithm for estimating transmission probabilities in QTL detection designs with dense maps. Genet Sel Evol., 41:50.

Gilbert H., Le Roy P., Moreno C., Robelin D., Elsen J. M., 2008. QTLMAP, a software for QTL detection in outbred population. Annals of Human Genetics, 72(5): 694.

Gilbert H, Le Roy P., 2007. Methods for the detection of multiple linked QTL applied to a mixture of full and half sib families. Genet Sel Evol., 39(2):139-58.

Moreno C.R., Elsen J.M., Le Roy P., Ducrocq V., 2005. Interval mapping methods for detecting QTL affecting survival and time–to–event phenotypes. Genet. Res. Camb., 85 : 139-149.

Goffinet B, Le Roy P, Boichard D, Elsen JM, Mangin B, 1999. Alternative models for QTL detection in livestock. III. Heteroskedastic model and models corresponding to several distributions of the QTL effect. Genet. Sel. Evol., 31, 341-350.

Mangin B, Goffinet B, Le Roy P, Boichard D, Elsen JM, 1999. Alternative models for QTL detection in livestock. II. Likelihood approximations and sire marker genotype estimations. Genet. Sel. Evol., 31, 225-237.

Elsen JM, Mangin B, Goffinet B, Boichard D, Le Roy P, 1999. Alternative models for QTL detection in livestock. I. General introduction. Genet. Sel. Evol., 31, 213-224

Higher Edu - Research dev card
Development from the higher education and research community
  • Creation or important update: 28/03/11
  • Minor correction: 21/05/19

Monolix : analysis of non linear mixed effects models

  • Current version: 3.1 - 10/2009
  • License(s): CeCILL
  • Status: validated (according to PLUME), under development
  • Support: maintained, ongoing development
  • Designer(s): Marc Lavielle, Hector Mesa, Kaelig Chatel, Benoît Charles, Eric Blaudez
  • Contact designer(s): Marc.Lavielle@math.u-psud.fr

 

General software features

MONOLIX is free software dedicated to the analysis of non linear mixed effects models. It performs parameter estimation, model selection, goodness-of-fit plots and data simulation.
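
To make the model class concrete, here is a minimal sketch of data simulation from a non linear mixed effects model. It does not use MONOLIX or its API; the one-compartment structural model, the log-normal individual parameters, the proportional error model and all numeric values are assumptions chosen for illustration.

```python
import math
import random


def concentration(t, dose, volume, clearance):
    """One-compartment model, bolus input: C(t) = dose / V * exp(-(Cl / V) * t)."""
    return dose / volume * math.exp(-clearance / volume * t)


def simulate_population(n_subjects, times, dose=100.0, v_pop=10.0, cl_pop=2.0,
                        omega_v=0.2, omega_cl=0.3, sigma=0.1, seed=0):
    """Return a list of (subject, time, observation) records."""
    rng = random.Random(seed)
    data = []
    for subject in range(n_subjects):
        # Random effects: individual parameters are log-normally distributed
        # around the population (fixed effect) values.
        volume = v_pop * math.exp(rng.gauss(0.0, omega_v))
        clearance = cl_pop * math.exp(rng.gauss(0.0, omega_cl))
        for t in times:
            prediction = concentration(t, dose, volume, clearance)
            # Proportional residual error around the individual prediction.
            observation = prediction * (1.0 + sigma * rng.gauss(0.0, 1.0))
            data.append((subject, t, observation))
    return data


records = simulate_population(n_subjects=5, times=[0.5, 1.0, 2.0, 4.0, 8.0])
```

Parameter estimation is the inverse problem: recovering the population values and variances from records of this kind, where the individual parameters are never observed directly.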

Context in which the software is used
  • Research in statistics: University of Paris 5, 11 and 13
  • Research in pharmacology: INSERM - P7
  • Research in microbiology : INRA
Publications related to the software

SAEM algorithm

  • Delyon B., Lavielle M., and Moulines E. "Convergence of a stochastic approximation version of the EM algorithm" The Annals of Stat., vol 27, no. 1, pp 94-128, 1999.
  • Kuhn E., Lavielle M. "Coupling a stochastic approximation version of EM with a MCMC procedure" ESAIM P&S, vol.8, pp 115-131, 2004.
  • Kuhn E., Lavielle M. "Maximum likelihood estimation in nonlinear mixed effects models" Computational Statistics and Data Analysis, vol. 49, No. 4, pp 1020-1038, 2005.
  • Lavielle M., Meza C. "A Parameter Expansion version of the SAEM algorithm" Statistics and Computing, vol. 17, pp 121-130, 2007.
  • Donnet S., Samson A. "Estimation of parameters in incomplete data models defined by dynamical systems" Jour. of Stat. Planning and Inference, vol. 137, no. 9, pp 2815-2831, 2007.
  • Meza C., Jaffrezic F., Foulley J.L. "REML estimation of variance parameters in non linear mixed effects models using the SAEM algorithm" The Biometrical Journal 49, 1-13, 2007.
  • Donnet S., Samson A. "Parametric inference for mixed models defined by stochastic differential equations" ESAIM P&S, 12:196-218, (2008).

Applications

  • Makowski D., Lavielle M. "Using SAEM to estimate parameters of models of response to applied fertilizer" Journal of Agricultural, Biological, and Environmental Statistics, vol. 11, n. 1, pp. 45-60, 2006.
  • Samson A., Lavielle M., Mentré F. "Extension of the SAEM algorithm to left-censored data in non-linear mixed-effects model: application to HIV dynamics models" Computational Statistics and Data Analysis, vol. 51, pp. 1562-1574, 2006.
  • Jaffrezic F., Meza C., Lavielle M., Foulley J.L. "Genetics analysis of growth curves using the SAEM algorithm" Genetics Selection Evolution, vol. 38, pp. 583-600, 2006.
  • Lavielle M., Mentré F. "Estimation of population pharmacokinetic parameters of saquinavir in HIV patients and covariate analysis with the SAEM algorithm" Journal of Pharmacokinetics and Pharmacodynamics, vol. 34, pp. 229-49, 2007.
  • Comets E, Verstuyft C, Lavielle M, Jaillon P, Becquemont L, Mentré F. Modelling the influence of MDR1 polymorphism on digoxin pharmacokinetic parameters. European Journal of Clinical Pharmacology, 63, pp. 437-49, 2007.
  • Samson A., Lavielle M., Mentré F. "The SAEM algorithm for group comparison tests in longitudinal data analysis based on nonlinear mixed-effects model" Statistics in Medicine, vol. 26, pp 4860-4875, 2007.
Higher Edu - Research dev card
Development from the higher education and research community
  • Creation or important update: 26/10/10
  • Minor correction: 17/01/13

Mixmod : a software package for data supervised and unsupervised classification

  • Current version: mixmodLib 2.3.0 - mixmodGUI 0.9.6 - mixmodForMatlab 2.2.1 - RMixmod 1.1.3 - 2012
  • License(s): GPL, Proprietary licence
  • Status: validated (according to PLUME), stable release
  • Support: maintained, ongoing development
  • Designer(s): Florent Langrognet
  • Contact designer(s): contact@mixmod.org

 

General software features
  • Supervised Classification
  • Unsupervised Classification

for quantitative and qualitative data.

To address data classification problems, Mixmod uses mixture models, a powerful and flexible tool.

Among the many features and tools available in Mixmod:

  • Processing of quantitative data (with Gaussian mixture models) or qualitative data (with multinomial mixture models)
  • Parsimonious mixture models
  • Specific models for the treatment of high-dimensional data (individuals characterized by a large number of features)
  • EM, CEM, SEM Algorithms
  • Many initialization strategies
  • Criteria for selection of models and the number of classes suited to different purposes
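
As an illustration of the EM algorithm named in the list above, the sketch below fits a two-component Gaussian mixture to one-dimensional data. It is a textbook version written in plain Python, not Mixmod's code; the crude initialization, the fixed iteration count and the function names are assumptions.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_two_gaussians(data, n_iter=100):
    """EM for a two-component 1-D Gaussian mixture: (weights, means, variances)."""
    n = len(data)
    mean_all = sum(data) / n
    var_all = sum((x - mean_all) ** 2 for x in data) / n
    weights = [0.5, 0.5]
    means = [min(data), max(data)]  # crude initialization at the extremes
    variances = [var_all, var_all]
    for _ in range(n_iter):
        # E step: posterior probability of each component for each point.
        resp = []
        for x in data:
            dens = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in range(2)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M step: re-estimate proportions, means and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var_k, 1e-6)  # guard against a collapsing component
    return weights, means, variances


rng = random.Random(0)
sample = ([rng.gauss(0.0, 1.0) for _ in range(300)]
          + [rng.gauss(6.0, 1.0) for _ in range(300)])
weights, means, variances = em_two_gaussians(sample)
```

In the unsupervised setting, each point is then assigned to the component with the highest posterior probability. CEM and SEM modify the step that follows the E step, using a hard assignment or a random draw respectively.
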
Context in which the software is used

The Mixmod software package consists of:

  • a library (C++): mixmodLib
  • a graphical user interface developed with Qt: mixmodGUI
  • functions for R: RMixmod
  • functions for Matlab: mixmodForMatlab
Publications related to the software
  •  "MIXMOD : un logiciel de classification supervisée et non supervisée pour données quantitatives et qualitatives",
    F. Langrognet,
    La Revue de Modulad, numéro 40 (2009)
  •  "Le logiciel MIXMOD d'analyse de mélange pour la classification et l'analyse discriminante"
    C. Biernacki, G. Celeux, A. Echenim, G. Govaert, F. Langrognet,
    La Revue de Modulad 35, pp. 25-44. (2007)
  • "Model-Based Cluster and Discriminant Analysis with the MIXMOD Software",
    C. Biernacki, G. Celeux, G. Govaert, F. Langrognet,
    Computational Statistics and Data Analysis, vol. 51/2, pp. 587-600. (2006)
Higher Edu - Research dev card
Development from the higher education and research community
  • Creation or important update: 23/04/10
  • Minor correction: 23/04/10
  • Index card author: Florian Salipante (IGF - Contrôle de l'apoptose et de la prolifération dans les systèmes neuronaux et endocriniens)
  • Theme leader : Christelle Dantec (CRBM)

GAGG : algorithm (R code) allowing gene clustering

  • Current version: 1.1 - 12/01/2010
  • License(s): not yet chosen
  • Status: internal use
  • Support: maintained, ongoing development
  • Designer(s): Florian Salipante, Christelle Reynès, Robert Sabatier
  • Contact designer(s): florian.salipante@univ-montp1.fr
  • Laboratory, service: research team 'Laboratoire de Physique Industrielle et Traitement de l'Information'

 

General software features

GAGG (Genetic Algorithm for Gene Gathering) is a new statistical method for detecting differentially expressed genes and clustering them according to their expression profiles. It is a factorial method based on integer encoding of the projection variables, which makes it possible to take the multivariate nature of the data into account. It relies on a genetic algorithm and combines several statistical methods, such as PCA and k-means. The code is implemented in R and consists of 5 functions: a main function GAGG, three internal functions GAGG1, GAGG2 and GAGG3, and a function PlotProfiles for visualizing gene profiles.

 

Context in which the software is used

The GAGG algorithm is used to build clusters of genes according to their expression profiles.

It is primarily intended for biologists, statisticians and bioinformaticians who have minimal experience with R.

Statistical knowledge, in particular of principal component analysis, can make the graphics easier to understand, but it is not indispensable because the groups are generated in a self-organizing manner.

Likewise, default parameters are set for the genetic algorithm, but the Tpop and Ngene parameters, which correspond respectively to the population size and the number of generations, can be modified by the user. The higher these values, the greater the chance of converging to the optimal solution, but also the longer the computation time.
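
The role of these two parameters can be seen in a generic genetic algorithm over integer-encoded individuals. The sketch below is not GAGG's code (GAGG is written in R); the fitness function, the selection, crossover and mutation choices, and all names other than the two parameters are assumptions.

```python
import random


def run_genetic_algorithm(fitness, n_vars, n_levels, tpop=30, ngene=50,
                          mutation_rate=0.1, seed=0):
    """Maximize `fitness` over integer vectors; returns (best, best_history)."""
    rng = random.Random(seed)
    population = [[rng.randrange(n_levels) for _ in range(n_vars)]
                  for _ in range(tpop)]
    history = []
    for _ in range(ngene):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        parents = population[: max(2, tpop // 2)]  # keep the better half
        children = [population[0][:]]              # elitism: best kept unchanged
        while len(children) < tpop:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, n_vars)         # one-point crossover
            child = mother[:cut] + father[cut:]
            for i in range(n_vars):
                if rng.random() < mutation_rate:   # random integer mutation
                    child[i] = rng.randrange(n_levels)
            children.append(child)
        population = children
    return max(population, key=fitness), history


target = [2, 0, 1, 2, 1, 0]  # toy optimum used only to define a fitness


def matches(individual):
    """Toy fitness: number of positions equal to the target vector."""
    return sum(1 for a, b in zip(individual, target) if a == b)


best, history = run_genetic_algorithm(matches, n_vars=6, n_levels=3)
```

The fitness function is evaluated on the order of tpop × ngene times, which is why raising either parameter lengthens the run in proportion.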

The algorithm handles single-color and two-color microarrays alike. Data pre-processing is left to the user, who can choose the normalization (quantile normalization, loess, lowess, etc.) and standardization techniques.

The data matrix must be presented with genes in rows and experimental conditions in columns. If necessary, a pre-processing step will be added to the algorithm later.

The GAGG method gives good results for gene clustering, but it uses a genetic algorithm that is computationally greedy, which implies a long execution time (several hours) depending on the Tpop and Ngene parameters. At start-up, a message asks the user how many components to compute (some information is given to help with this choice); most of the time, two components are sufficient.

The source code may be downloaded.

Publications related to the software

An article will soon be published in the journal CSDA.
