Unitex/GramLab is an open source, cross-platform, multilingual, lexicon- and grammar-based corpus processing suite.

Unitex/GramLab project decision-making is based on a community meritocratic process. Anyone with an interest in Unitex/GramLab can join the community, contribute to the project design and participate in decisions.

Unitex

Unitex is the C++ Natural Language Processing (NLP) engine of Unitex/GramLab. It is distributed under the terms of the GNU Lesser General Public License version 2.1 (LGPLv2.1) and contains only a few third-party code dependencies (LibYAML, Pstdin, TRE, WinGetOpt), all licensed under more permissive licenses.

GramLab

GramLab is the project-oriented integrated development environment (IDE) of Unitex/GramLab. There is also a Classic IDE (Unitex.jar) that we are currently integrating with GramLab. Both are distributed under the terms of the GNU Lesser General Public License version 2.1 (LGPLv2.1) and contain only a few third-party dependencies (XAlign, Xerces2-j), licensed under equal or more permissive licenses.

Language resources

Language resources released with Unitex/GramLab are distributed under the terms of the Lesser General Public License For Linguistic Resources (LGPLLR). For authors and more information on these language resources, see here.

Documentation

The User’s Manual (in PDF format) is available in English and French (more translations are welcome). You can view and print it with Evince, downloadable here. The latest online version of the User’s Manual is accessible here.

Support

Support questions can be posted in the community support forum. You are welcome to ask to join at any time by following this link. Please feel free to submit any suggestions or requests for new features too. Some general advice about asking technical support questions can be found here.

Reporting Bugs

See the Bug Reporting Guide for information on how to report bugs.

Governance Model

Unitex/GramLab project decision-making is based on a community meritocratic process. Anyone with an interest in it can join the community, contribute to the project design and participate in decisions. The Unitex/GramLab Governance Model describes how this participation takes place and how to set about earning merit within the project community.

Spelling

Unitex/GramLab is spelled with capital "U", "G" and "L", and with everything else in lower case. Apart from the forward slash, do not put a space or any other character between the words. Where the forward slash is not allowed, you can simply write “UnitexGramLab”.

It's common to refer to the Unitex/GramLab Core as "Unitex", and to the Unitex project-oriented IDE as "GramLab". If you are referring to the whole distribution suite (Core, IDE, Linguistic Resources and other bundled tools), always use "Unitex/GramLab".

How to start?

Thank you for your interest in contributing to Unitex/GramLab development! You could start by downloading a binary release and getting familiar with Unitex/GramLab. The User's Manual is available here.

The Unitex/GramLab source code is hosted on https://github.com/UnitexGramLab. An overview of the C++ Core code (v3.0) is available here. For an overview of the Java IDE (v3.0), you could check this presentation. There are also some contribution rules here.

To start hacking on the code, check out the sources with git:

C++ Core

git clone https://github.com/UnitexGramLab/unitex-core.git

To compile under Linux, use:

cd unitex-core/build
make DEBUG=yes UNITEXTOOLLOGGERONLY=yes

On Windows:

cd unitex-core/build
make ADDITIONAL_CFLAG+=-DUNITEX_PREVENT_USING_WINRT_API DEBUG=yes UNITEXTOOLLOGGERONLY=yes
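The two make invocations above differ only in the extra ADDITIONAL_CFLAG setting used on Windows, so a small wrapper could select the right arguments automatically. The following is only a sketch: unitex_make_args is a hypothetical helper that is not part of the Unitex sources, and the uname patterns used to detect Windows (MSYS, MinGW or Cygwin shells) are an assumption about the build environment.

```shell
#!/bin/sh
# Sketch: print the make arguments for building the Unitex C++ Core.
# unitex_make_args is a hypothetical helper, not part of the Unitex sources.
unitex_make_args() {
    # $1: platform name as printed by `uname -s`; defaults to the current platform
    platform="${1:-$(uname -s)}"

    # Arguments shared by the Linux and Windows builds documented above
    args="DEBUG=yes UNITEXTOOLLOGGERONLY=yes"

    case "$platform" in
        # Assumption: Windows builds are run from an MSYS, MinGW or Cygwin shell
        MINGW*|MSYS*|CYGWIN*)
            args="ADDITIONAL_CFLAG+=-DUNITEX_PREVENT_USING_WINRT_API $args"
            ;;
    esac

    printf '%s\n' "$args"
}
```

After sourcing the function, it could be used from the build directory as make $(unitex_make_args); the command substitution is left unquoted on purpose so that each argument is passed to make separately.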

Java GramLab IDE

git clone https://github.com/UnitexGramLab/gramlab-ide

To compile, use:

ant

Language Resources

git clone https://github.com/UnitexGramLab/lingua

Where to start?

All contributions are welcome. If you are a newcomer and want to help with the Unitex/GramLab codebase, look at the GitHub issues labeled "good first issue".