Class PorterStemmer

Description

PHP 5 implementation of the Porter Stemmer algorithm. Certain elements were borrowed from the (broken) implementation by Jon Abernathy.

Usage:

$stem = PorterStemmer::Stem($word);

How easy is that?

Located in /lib/utilities/class.porterstemmer.php (line 21)

Variable Summary
static string $regex_consonant
static string $regex_vowel
Method Summary
static bool cvc (string $str)
static bool doubleConsonant (string $str)
static int m (string $str)
static bool replace (string &$str, string $check, string $repl, [int $m = null])
static string Stem (string $word)
static void step1ab (string $word)
static void step1c (string $word)
static void step2 (string $word)
static void step3 (string $word)
static void step4 (string $word)
static void step5 (string $word)
Variables
static string $regex_consonant = '(?:[bcdfghjklmnpqrstvwxz]|(?<=[aeiou])y|^y)' (line 27)

Regex for matching a consonant

  • access: private
static string $regex_vowel = '(?:[aeiou]|(?<![aeiou])y)' (line 34)

Regex for matching a vowel

  • access: private
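
As a rough illustration of how the two patterns treat the letter y, the sketch below copies the vowel pattern into a hypothetical helper, sketch_ends_with_vowel(), which is not part of the class. Wrapping the pattern in # delimiters is an assumption; the listing above does not say how the class builds its expressions.

```php
<?php
// Hypothetical helper, not part of PorterStemmer.
// Reports whether the last letter of $word counts as a vowel
// under the $regex_vowel pattern documented above.
function sketch_ends_with_vowel($word)
{
    // Pattern copied verbatim from the $regex_vowel listing.
    $vowel = '(?:[aeiou]|(?<![aeiou])y)';

    // Anchor the pattern at the end of the word.
    return preg_match('#' . $vowel . '$#', $word) === 1;
}
```

With these patterns the y in "sky" counts as a vowel because it follows a consonant, while the y in "toy" does not because it follows a vowel.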
Methods
static cvc (line 398)

Checks for ending CVC sequence where second C is not W, X or Y

  • return: Result
  • access: private
static bool cvc (string $str)
  • string $str: String to check
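
One way such a check could be written is sketched below. The function name sketch_cvc() and its body are assumptions made for illustration and may differ from the code in class.porterstemmer.php; only the two patterns are taken from this page.

```php
<?php
// Hypothetical sketch of a trailing consonant-vowel-consonant test.
function sketch_cvc($str)
{
    // Patterns copied from the Variables section of this page.
    $c = '(?:[bcdfghjklmnpqrstvwxz]|(?<=[aeiou])y|^y)';
    $v = '(?:[aeiou]|(?<![aeiou])y)';

    // The string must end in consonant, vowel, consonant.
    if (preg_match('#' . $c . $v . $c . '$#', $str) !== 1) {
        return false;
    }

    // The final consonant must not be w, x or y.
    return !in_array(substr($str, -1), array('w', 'x', 'y'), true);
}
```

Under this sketch "hop" passes, while "snow", "box" and "tray" fail because of their final letter.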
static doubleConsonant (line 384)

Returns true/false as to whether the given string contains two of the same consonant next to each other at the end of the string.

  • return: Result
  • access: private
static bool doubleConsonant (string $str)
  • string $str: String to check
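
A minimal sketch of this test follows, assuming the consonant pattern from the Variables section. The name sketch_double_consonant() is hypothetical and the real method may be written differently.

```php
<?php
// Hypothetical sketch: does $str end in a doubled consonant?
function sketch_double_consonant($str)
{
    // Pattern copied from the $regex_consonant listing.
    $c = '(?:[bcdfghjklmnpqrstvwxz]|(?<=[aeiou])y|^y)';

    // Match exactly two consonants at the end of the string.
    if (preg_match('#' . $c . '{2}$#', $str, $matches) !== 1) {
        return false;
    }

    // Both consonants must be the same letter.
    return $matches[0][0] === $matches[0][1];
}
```

For example, "hopp" and "hiss" pass, while "kept" fails because its two final consonants differ.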
static m (line 363)

What, you mean it's not obvious from the name?

m() measures the number of consonant sequences in $str. If c is a consonant sequence and v a vowel sequence, and <..> indicates arbitrary presence,

  <c><v>        gives 0
  <c>vc<v>      gives 1
  <c>vcvc<v>    gives 2
  <c>vcvcvc<v>  gives 3

  • return: The m count
  • access: private
static int m (string $str)
  • string $str: The string to return the m count for
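
The count described above can be sketched as follows. This is one possible approach, not necessarily the one the class uses: strip the optional leading consonants and trailing vowels, then count the vowel-consonant runs that remain. The name sketch_m() is hypothetical.

```php
<?php
// Hypothetical sketch of the m count: the number of vc runs in <c>(vc)...<v>.
function sketch_m($str)
{
    // Patterns copied from the Variables section of this page.
    $c = '(?:[bcdfghjklmnpqrstvwxz]|(?<=[aeiou])y|^y)';
    $v = '(?:[aeiou]|(?<![aeiou])y)';

    // Drop the optional leading consonant sequence <c>.
    $str = preg_replace('#^' . $c . '+#', '', $str);

    // Drop the optional trailing vowel sequence <v>.
    $str = preg_replace('#' . $v . '+$#', '', $str);

    // Each remaining vowel run followed by a consonant run adds one to m.
    return (int) preg_match_all('#' . $v . '+' . $c . '+#', $str, $matches);
}
```

With this sketch, "tree" gives 0, "trouble" gives 1 and "troubles" gives 2.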
static replace (line 331)

Replaces $check with $repl when $check occurs at the end of $str. If the optional $m is given, the replacement is made only when the part of $str preceding $check meets that m() count.

  • return: Whether the $check string was at the end of the $str string. True does not necessarily mean that it was replaced.
  • access: private
static bool replace (string &$str, string $check, string $repl, [int $m = null])
  • string &$str: String to check (passed by reference)
  • string $check: Ending to check for
  • string $repl: Replacement string
  • int $m: Optional minimum number of m() to meet
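
The return value is easy to misread, so here is a hedged sketch of the behaviour described above. Both function names are hypothetical, and the sketch reads "at least" as a >= comparison, which is an assumption; the actual method may compare the m count differently.

```php
<?php
// Hypothetical helper: counts vowel-consonant runs, as described for m().
function sketch_stem_measure($str)
{
    $c = '(?:[bcdfghjklmnpqrstvwxz]|(?<=[aeiou])y|^y)';
    $v = '(?:[aeiou]|(?<![aeiou])y)';
    $str = preg_replace('#^' . $c . '+#', '', $str);
    $str = preg_replace('#' . $v . '+$#', '', $str);
    return (int) preg_match_all('#' . $v . '+' . $c . '+#', $str, $matches);
}

// Hypothetical sketch of the documented replace() behaviour.
function sketch_replace(&$str, $check, $repl, $m = null)
{
    $len = strlen($check);

    // Report false when $str does not end in $check.
    if ($len === 0 || substr($str, -$len) !== $check) {
        return false;
    }

    // Everything before the matched ending.
    $stem = substr($str, 0, strlen($str) - $len);

    // Assumption: "at least" is read as >= here.
    if ($m === null || sketch_stem_measure($stem) >= $m) {
        $str = $stem . $repl;
    }

    // True means the ending was found, not that it was replaced.
    return true;
}
```

Under these assumptions, calling sketch_replace($w, 'ational', 'ate', 1) with $w = 'relational' returns true and changes $w to 'relate'. With $w = 'rational' it also returns true but leaves $w unchanged, because the stem "r" has an m count of 0.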
static Stem (line 43)

Stems a word. Simple huh?

  • return: Stemmed word
  • access: public
static string Stem (string $word)
  • string $word: Word to stem
static step1ab (line 63)

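Steps 1a and 1b. In the Porter algorithm, step 1a reduces plural endings (sses, ies, s) and step 1b handles the eed, ed and ing endings.

  • access: private
static void step1ab (string $word)
  • string $word: Word to stem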
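
For orientation, the step 1a rules of the published Porter algorithm can be sketched as below. The function sketch_step1a() is hypothetical; it returns the new word, whereas the method documented here is declared void, so the real code presumably works differently. Step 1b is left out to keep the sketch short.

```php
<?php
// Hypothetical sketch of Porter step 1a (plural endings only).
function sketch_step1a($word)
{
    // sses -> ss
    if (substr($word, -4) === 'sses') {
        return substr($word, 0, -2);
    }
    // ies -> i
    if (substr($word, -3) === 'ies') {
        return substr($word, 0, -2);
    }
    // ss -> ss (left unchanged)
    if (substr($word, -2) === 'ss') {
        return $word;
    }
    // s -> (removed)
    if (substr($word, -1) === 's') {
        return substr($word, 0, -1);
    }
    return $word;
}
```

The rules are tried longest ending first, so "caresses" becomes "caress", "ponies" becomes "poni", "caress" stays as it is and "cats" becomes "cat".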
static step1c (line 111)

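Step 1c. In the Porter algorithm, this step turns a terminal y into i when the rest of the word contains a vowel.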

  • access: private
static void step1c (string $word)
  • string $word: Word to stem
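
The step 1c rule of the published Porter algorithm (a final y becomes i when the stem contains a vowel) might be sketched like this. The name sketch_step1c() is hypothetical, and the sketch returns the result instead of following the void signature documented above.

```php
<?php
// Hypothetical sketch of Porter step 1c.
function sketch_step1c($word)
{
    // Pattern copied from the $regex_vowel listing on this page.
    $vowel = '(?:[aeiou]|(?<![aeiou])y)';

    if (substr($word, -1) === 'y') {
        // Everything before the final y.
        $stem = substr($word, 0, -1);

        // Only rewrite when the stem contains at least one vowel.
        if (preg_match('#' . $vowel . '#', $stem) === 1) {
            return $stem . 'i';
        }
    }
    return $word;
}
```

Here "happy" becomes "happi", while "sky" is left alone because "sk" contains no vowel.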
static step2 (line 128)

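Step 2. In the Porter algorithm, this step maps longer suffixes to shorter ones (for example ational to ate) when the stem has an m count above 0.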

  • access: private
static void step2 (string $word)
  • string $word: Word to stem
static step3 (line 186)

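Step 3. In the Porter algorithm, this step shortens or removes suffixes such as icate, ful and ness when the stem has an m count above 0.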

  • access: private
static void step3 (string $word)
  • string $word: Word to stem
static step4 (line 224)

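Step 4. In the Porter algorithm, this step removes suffixes such as al, ance and ment when the stem has an m count above 1.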

  • access: private
static void step4 (string $word)
  • string $word: Word to stem
static step5 (line 295)

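Step 5. In the Porter algorithm, this step removes a final e and reduces a final ll to l where the m count allows.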

  • access: private
static void step5 (string $word)
  • string $word: Word to stem

Documentation generated on Sun, 13 Dec 2009 19:39:33 +0000 by phpDocumentor 1.4.3