Spanish stemmer

Stemmer-es: A Spanish stemmer

This is a PHP implementation of a stemmer for Spanish. It is based on the algorithm by Martin Porter. More specifically, it is based on the Spanish stemming algorithm of Snowball.

*How to use*

Download the package from GitHub: https://github.com/pragone/stemmer-es/releases. Include the file stemm_es.php in your PHP script and call its stemm method. The method is static, so the call would be: `$result = stemm_es::stemm('canciones');`

Note that the method only stems one word at a time, so you must split the text into words before feeding it to the stemmer.

*What's included*

The tgz file includes three files:

- stemm_es.php: This is the only file you really need.
- stemm_test.php: A script for testing the stemmer against the corpus found on the Spanish stemming algorithm page.
- stemm_test_corpus.txt: The test corpus. It is a list of words and stems (one per line).

*Does it work?*

Using the corpus provided, all the words were stemmed as expected. The test script also shows that the whole test corpus (28390 words) was stemmed in less than a second, so performance is good ;)

*License and other stuff*

This is a free library provided under the LGPL license and comes with no guarantees. If you found it useful, please send me an email at pragone at gmail dot com.

*Me*

My name is Paolo Ragone, and you can see the Open Source projects I'm working on here: https://github.com/pragone or on my blog: http://pragone.com/.
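*Example: stemming a whole text*

Since the stemmer takes one word at a time, a small wrapper can do the splitting. The sketch below is only an illustration and is not part of the package: `stem_text` is a hypothetical helper name, and it takes the stemmer as a callable so it does not depend on where stemm_es.php lives. It does not change letter case, because it is not stated here whether `stemm_es::stemm` lowercases its input; you may need to lowercase the text yourself first.

```php
<?php
// Hypothetical helper (not part of stemmer-es): splits a text into words
// and passes each word to a stemming callable, one word at a time.
function stem_text(string $text, callable $stem): array
{
    // Split on any run of characters that are not Unicode letters, so that
    // accented letters such as "á" or "ñ" stay inside their words.
    $words = preg_split('/[^\p{L}]+/u', $text, -1, PREG_SPLIT_NO_EMPTY);

    // preg_split returns false on failure, e.g. when $text is not valid UTF-8.
    if ($words === false) {
        return [];
    }

    // Apply the stemmer to each word and return the results in order.
    return array_map($stem, $words);
}

// Usage with the library (assumes stemm_es.php sits next to this script):
// require_once __DIR__ . '/stemm_es.php';
// $stems = stem_text('las canciones más bonitas', ['stemm_es', 'stemm']);
```

Passing `['stemm_es', 'stemm']` uses PHP's array form of a callable for a static method. The regular expression is one possible definition of a "word"; texts with digits, hyphens or apostrophes may call for a different pattern.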
