TFIDF

Value Members

final def !=(arg0: Any): Boolean

Definition Classes
AnyRef → Any
final def ##(): Int

Definition Classes
AnyRef → Any
final def ==(arg0: Any): Boolean

Definition Classes
AnyRef → Any
final def asInstanceOf[T0]: T0

Definition Classes
Any
def clone(): AnyRef

Attributes
protected[java.lang]
Definition Classes
AnyRef
Annotations
@throws( ... )
final def eq(arg0: AnyRef): Boolean

Definition Classes
AnyRef
def equals(arg0: Any): Boolean

Definition Classes
AnyRef → Any
def finalize(): Unit

Attributes
protected[java.lang]
Definition Classes
AnyRef
Annotations
@throws( classOf[java.lang.Throwable] )
final def getClass(): Class[_]

Definition Classes
AnyRef → Any
def hashCode(): Int

Definition Classes
AnyRef → Any
final def isInstanceOf[T0]: Boolean

Definition Classes
Any
final def ne(arg0: AnyRef): Boolean

Definition Classes
AnyRef
final def notify(): Unit

Definition Classes
AnyRef
final def notifyAll(): Unit

Definition Classes
AnyRef
final def synchronized[T0](arg0: ⇒ T0): T0

Definition Classes
AnyRef
def tfIdfVector(countMap: Map[String, Long], words: Set[String], tfMode: String, smthTerm: Double, idfMode: String, termBoosts: Map[String, Double], numDocs: Int, maxTF: Long, dfMap: Map[String, Long]): Seq[Double]
def tfIdfVector(reader: IReader, field: String, docId: Int, words: Set[String] = Set.empty, tfMode: String = "n", a: Double = 0.4, idfMode: String = "t", termBoosts: Map[String, Double] = Map.empty): (Seq[String], Seq[Double])

Generate a tf-idf based feature vector from a document.
The tf and idf calculations can be varied with the "tfMode" and "idfMode" parameters. See http://nlp.stanford.edu/IR-book/html/htmledition/variant-tf-idf-functions-1.html for the theoretical background.
By default, when tfMode and idfMode are not given, the weight of each term is given by the basic tf-idf formula
(tf) * log(N / df)
where tf is the term frequency of the term in the given document, N is the total number of documents, and df is the document frequency of the term.
reader
the IReader instance
field
the field name for counting words
docId
the Lucene document id
words
the set of words (terms) to use as features. If an empty set is given, all words (terms) are used as features.
tfMode
tf calculation mode. Expected values are "n" (normal), "l" (logarithm), "m" (maximum normalization), "b" (boolean), "L" (log average), "w" (sublinear weighted). The default value is "n".
a
the smoothing term for tfMode "m". The default value is 0.4.
idfMode
idf calculation mode. Expected values are "n" (no idf), "t" (idf), "p" (probabilistic idf). The default value is "t".
returns
the pair of the word sequence and the feature vector
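The basic formula above, (tf) * log(N / df), can be illustrated with a small standalone sketch. This is not the library's implementation: the object name TfIdfSketch, the use of the natural logarithm, the sorted term order, and the zero weight for unseen terms are all assumptions made for illustration. The parameter names countMap, dfMap, numDocs and words are borrowed from the first tfIdfVector signature.

```scala
// Illustrative sketch only; TfIdfSketch is a hypothetical name, not part of the library.
object TfIdfSketch {

  /** Basic weight tf * log(N / df). Natural log is an assumption; the docs only say "log". */
  def weight(tf: Long, numDocs: Int, df: Long): Double =
    if (tf == 0 || df == 0) 0.0 // assumption: terms never seen get weight 0
    else tf * math.log(numDocs.toDouble / df)

  /** Builds (terms, weights). An empty `words` set means "use every term in countMap". */
  def vector(
      countMap: Map[String, Long], // term -> frequency in the document
      dfMap: Map[String, Long],    // term -> number of documents containing it
      numDocs: Int,
      words: Set[String] = Set.empty
  ): (Seq[String], Seq[Double]) = {
    // Sorting gives a stable feature order (an assumption of this sketch).
    val terms = (if (words.isEmpty) countMap.keySet else words).toSeq.sorted
    val weights = terms.map { t =>
      weight(countMap.getOrElse(t, 0L), numDocs, dfMap.getOrElse(t, 0L))
    }
    (terms, weights)
  }
}
```

A term that occurs in every document gets log(N / N) = 0, so its weight vanishes regardless of how often it appears in the document itself.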
def tfIdfVectors(reader: IReader, field: String, docIds: List[Int], words: Set[String] = Set.empty, tfMode: String = "n", a: Double = 0.4, idfMode: String = "t", termBoosts: Map[String, Double] = Map.empty): (Seq[String], Stream[Seq[Double]])

Generate tf-idf based feature vectors from the specified documents.
See also documentation for tfIdfVector().
reader
the IReader instance
field
the field name for counting words
docIds
the list of Lucene document ids
words
the set of words (terms) to use as features. If an empty set is given, all words (terms) are used as features.
tfMode
tf calculation mode. The default value is "n"
a
the smoothing term for tfMode "m". The default value is 0.4.
idfMode
idf calculation mode. The default value is "t"
returns
the pair of words and the feature vectors
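The tfMode and idfMode variants can be sketched as follows. The formulas are taken from the conventions in the IR book section linked from the tfIdfVector() documentation, not from this library's source, so whether TFIDF computes exactly these values is an assumption. The object name TfIdfModesSketch is hypothetical, the natural logarithm is assumed, and the "L" and "w" modes are left out because their exact definitions are not given here.

```scala
// Illustrative sketch only; formulas follow the IR-book conventions (an assumption).
object TfIdfModesSketch {

  /** tf component. `a` is the smoothing term used by mode "m". */
  def tf(mode: String, rawTf: Long, maxTf: Long, a: Double = 0.4): Double =
    if (rawTf <= 0) 0.0
    else mode match {
      case "n" => rawTf.toDouble                  // raw term frequency
      case "l" => 1.0 + math.log(rawTf.toDouble)  // logarithmic scaling
      case "m" => a + (1.0 - a) * rawTf / maxTf   // maximum tf normalization
      case "b" => 1.0                             // boolean: term present
      case other =>
        throw new IllegalArgumentException(s"tfMode not covered by this sketch: $other")
    }

  /** idf component. */
  def idf(mode: String, numDocs: Int, df: Long): Double =
    if (df <= 0) 0.0
    else mode match {
      case "n" => 1.0                                                   // no idf
      case "t" => math.log(numDocs.toDouble / df)                       // standard idf
      case "p" => math.max(0.0, math.log((numDocs - df).toDouble / df)) // probabilistic idf
      case other =>
        throw new IllegalArgumentException(s"idfMode not covered by this sketch: $other")
    }
}
```

With the default a = 0.4, mode "m" maps a term that occurs half as often as the most frequent term to 0.4 + 0.6 * 0.5 = 0.7, which dampens the effect of document length on tf.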
def tfVector(reader: IReader, field: String, docId: Int, words: Set[String] = Set.empty, tfMode: String = "n", termBoosts: Map[String, Double] = Map.empty): (Seq[String], Seq[Long])

Generate a simple tf-based feature vector from a document.
reader
the IReader instance
field
the field name for counting words
docId
the Lucene document id
words
the set of words (terms) to use as features. If an empty set is given, all words (terms) are used as features.
returns
the pair of the word sequence and the feature vector
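The shape of the result, a word sequence paired with a vector of Long counts, can be shown with a minimal sketch that works on an already tokenized document instead of an IReader. The object name TfSketch, the tokens parameter, and the sorted term order are assumptions for illustration; the library reads term counts from the Lucene index and may order terms differently.

```scala
// Illustrative sketch only; TfSketch and `tokens` are hypothetical, not part of the library.
object TfSketch {

  /** Counts term occurrences and returns (terms, counts) aligned by position. */
  def tfVector(tokens: Seq[String], words: Set[String] = Set.empty): (Seq[String], Seq[Long]) = {
    // term -> number of occurrences in the document
    val counts: Map[String, Long] =
      tokens.groupBy(identity).map { case (term, occ) => term -> occ.size.toLong }
    // An empty `words` set means every term in the document becomes a feature.
    val terms = (if (words.isEmpty) counts.keySet else words).toSeq.sorted
    (terms, terms.map(t => counts.getOrElse(t, 0L)))
  }
}
```

Passing the same words set for every document keeps the feature positions aligned across documents, which is what makes the vectors comparable.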
def tfVectors(reader: IReader, field: String, docIds: List[Int], words: Set[String] = Set.empty, tfMode: String = "n", termBoosts: Map[String, Double] = Map.empty): (Seq[String], Stream[Seq[Long]])

Generate simple tf-based feature vectors from the specified documents.
reader
the IReader instance
field
the field name for counting words
docIds
the list of Lucene document ids
words
the set of words (terms) to use as features. If an empty set is given, all words (terms) are used as features.
returns
the pair of words and the feature vectors
def toString(): String

Definition Classes
AnyRef → Any
final def wait(): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long, arg1: Int): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )

Related Doc: package stats

object TFIDF

