SHOGUN  v1.1.0
CCommWordStringKernel Class Reference

Detailed Description

The CommWordString kernel may be used to compute the spectrum kernel from strings that have been mapped into unsigned 16-bit integers.

These 16-bit integers correspond to k-mers. To be applicable in this kernel they need to be sorted (e.g. via the SortWordString pre-processor).
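As an illustration of the mapping described above, the following sketch packs each DNA k-mer into a uint16_t using 2 bits per symbol and then sorts the resulting words. The names `symbol_code` and `encode_kmers`, and the particular A/C/G/T code assignment, are hypothetical and chosen for this example only; they are not part of the Shogun API, and Shogun's own encoding may order symbols differently.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not part of Shogun): map A, C, G, T to 2-bit codes.
inline uint16_t symbol_code(char c)
{
    switch (c)
    {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        case 'T': return 3;
        default: throw std::invalid_argument("symbol outside the DNA alphabet");
    }
}

// Hypothetical helper: slide a window of length k over seq, pack each k-mer
// into the low 2*k bits of a word, and return the words in sorted order.
inline std::vector<uint16_t> encode_kmers(const std::string& seq, int k)
{
    if (k < 1 || k > 8)
        throw std::invalid_argument("k must be in [1, 8] for 16-bit words");

    std::vector<uint16_t> words;
    const std::size_t order = static_cast<std::size_t>(k);
    if (seq.size() < order)
        return words; // sequence too short to contain any k-mer

    const uint32_t mask = (1u << (2 * k)) - 1u; // keeps the last k symbols
    uint32_t word = 0;
    for (std::size_t i = 0; i < seq.size(); ++i)
    {
        word = ((word << 2) | symbol_code(seq[i])) & mask;
        if (i + 1 >= order)
            words.push_back(static_cast<uint16_t>(word));
    }

    // The kernel expects sorted input; std::sort stands in for the
    // SortWordString pre-processor here.
    std::sort(words.begin(), words.end());
    return words;
}
```

With this code assignment, `encode_kmers("ACGT", 2)` yields the three words for AC, CG and GT in ascending order.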

It uses essentially the algorithm of the Unix "comm" command (hence the name) to compute:

\[ k({\bf x},{\bf x'})= \Phi_k({\bf x})\cdot \Phi_k({\bf x'}) \]

where $\Phi_k$ maps a sequence ${\bf x}$ that consists of letters in $\Sigma$ to a feature vector of size $|\Sigma|^k$. In this feature vector each entry denotes how often the corresponding k-mer appears in ${\bf x}$.
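The dot product above can be evaluated without ever building the $|\Sigma|^k$-sized vectors. A minimal sketch, assuming both inputs are already sorted arrays of k-mer words: walk both arrays in lockstep as "comm" does, and whenever the same word appears in both, multiply the two run lengths. The function name `comm_dot` is hypothetical, and the sketch ignores normalization and the use_sign option.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Comm-style merge over two SORTED k-mer arrays (hypothetical helper).
// Returns the sum over all k-mers u of count_a(u) * count_b(u), which is
// the dot product of the two k-mer count vectors.
inline double comm_dot(const std::vector<uint16_t>& a,
                       const std::vector<uint16_t>& b)
{
    double result = 0.0;
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size())
    {
        if (a[i] < b[j])
            ++i; // word only in a: contributes nothing
        else if (a[i] > b[j])
            ++j; // word only in b: contributes nothing
        else
        {
            // Shared word: count how often it occurs on each side.
            const uint16_t sym = a[i];
            std::size_t run_a = 0, run_b = 0;
            while (i < a.size() && a[i] == sym) { ++i; ++run_a; }
            while (j < b.size() && b[j] == sym) { ++j; ++run_b; }
            result += static_cast<double>(run_a) * static_cast<double>(run_b);
        }
    }
    return result;
}
```

The cost is linear in the combined length of the two arrays, which is why the sorted order is required.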

Note that this representation is especially tuned to small alphabets (like the 2-bit alphabet DNA), for which it enables spectrum kernels of order up to 8.
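A quick check of the order-8 limit, assuming (as the text states) 2 bits per symbol and a 16-bit word:

\[ k_{\max} = \left\lfloor \frac{16}{\log_2 |\Sigma|} \right\rfloor = \left\lfloor \frac{16}{2} \right\rfloor = 8, \qquad |\Sigma|^{k_{\max}} = 4^{8} = 2^{16} = 65536 \]

At order 8 the feature vector therefore has exactly one entry per possible uint16_t value.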

For this kernel the linadd speedups are quite efficiently implemented using direct maps.
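To make the direct-map idea concrete, here is a minimal sketch of how a linadd-style evaluation could work. It assumes the normal vector $w = \sum_i \alpha_i \Phi_k({\bf x}_i)$ is stored as one weight per possible 16-bit word, so that $w \cdot \Phi_k({\bf x})$ needs only one table lookup per k-mer of ${\bf x}$. The struct `DirectMapNormal` is a hypothetical stand-in, not the Shogun class; its method names merely mirror the members listed on this page, and it ignores normalization and use_sign.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the linadd bookkeeping (not the Shogun class).
struct DirectMapNormal
{
    // One weight per possible 16-bit word: a direct map indexed by the k-mer.
    std::vector<double> weights = std::vector<double>(1u << 16, 0.0);

    // Add alpha * Phi(x) to the normal vector; x is a list of k-mer words.
    // A word occurring c times in x receives c * alpha in total.
    void add_to_normal(const std::vector<uint16_t>& x, double alpha)
    {
        for (uint16_t word : x)
            weights[word] += alpha;
    }

    // Evaluate w . Phi(x) with one table lookup per k-mer occurrence.
    double compute_optimized(const std::vector<uint16_t>& x) const
    {
        double sum = 0.0;
        for (uint16_t word : x)
            sum += weights[word];
        return sum;
    }

    // Reset the normal vector to zero.
    void clear_normal()
    {
        weights.assign(weights.size(), 0.0);
    }
};
```

Once all support vectors have been added, scoring a new sequence no longer depends on the number of support vectors, only on the length of the sequence.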

Definition at line 46 of file CommWordStringKernel.h.

[Figure: inheritance diagram for CCommWordStringKernel]

Public Member Functions

 CCommWordStringKernel ()
 CCommWordStringKernel (int32_t size, bool use_sign)
 CCommWordStringKernel (CStringFeatures< uint16_t > *l, CStringFeatures< uint16_t > *r, bool use_sign=false, int32_t size=10)
virtual ~CCommWordStringKernel ()
virtual bool init (CFeatures *l, CFeatures *r)
virtual void cleanup ()
virtual EKernelType get_kernel_type ()
virtual const char * get_name () const
virtual bool init_dictionary (int32_t size)
virtual bool init_optimization (int32_t count, int32_t *IDX, float64_t *weights)
virtual bool delete_optimization ()
virtual float64_t compute_optimized (int32_t idx)
virtual void add_to_normal (int32_t idx, float64_t weight)
virtual void clear_normal ()
virtual EFeatureType get_feature_type ()
void get_dictionary (int32_t &dsize, float64_t *&dweights)
virtual float64_t * compute_scoring (int32_t max_degree, int32_t &num_feat, int32_t &num_sym, float64_t *target, int32_t num_suppvec, int32_t *IDX, float64_t *alphas, bool do_init=true)
char * compute_consensus (int32_t &num_feat, int32_t num_suppvec, int32_t *IDX, float64_t *alphas)
void set_use_dict_diagonal_optimization (bool flag)
bool get_use_dict_diagonal_optimization ()
- Public Member Functions inherited from CStringKernel< uint16_t >
 CStringKernel (int32_t cachesize=0)
 CStringKernel (CFeatures *l, CFeatures *r)
virtual EFeatureClass get_feature_class ()
- Public Member Functions inherited from CKernel
 CKernel ()
 CKernel (int32_t size)
 CKernel (CFeatures *l, CFeatures *r, int32_t size)
virtual ~CKernel ()
float64_t kernel (int32_t idx_a, int32_t idx_b)
SGMatrix< float64_t > get_kernel_matrix ()
virtual SGVector< float64_t > get_kernel_col (int32_t j)
virtual SGVector< float64_t > get_kernel_row (int32_t i)
template<class T >
SGMatrix< T > get_kernel_matrix ()
virtual bool set_normalizer (CKernelNormalizer *normalizer)
virtual CKernelNormalizer * get_normalizer ()
virtual bool init_normalizer ()
void load (CFile *loader)
void save (CFile *writer)
CFeatures * get_lhs ()
CFeatures * get_rhs ()
virtual int32_t get_num_vec_lhs ()
virtual int32_t get_num_vec_rhs ()
virtual bool has_features ()
bool get_lhs_equals_rhs ()
virtual void remove_lhs_and_rhs ()
virtual void remove_lhs ()
virtual void remove_rhs ()
 takes all necessary steps if the rhs is removed from kernel
void set_cache_size (int32_t size)
int32_t get_cache_size ()
void list_kernel ()
bool has_property (EKernelProperty p)
EOptimizationType get_optimization_type ()
virtual void set_optimization_type (EOptimizationType t)
bool get_is_initialized ()
bool init_optimization_svm (CSVM *svm)
virtual void compute_batch (int32_t num_vec, int32_t *vec_idx, float64_t *target, int32_t num_suppvec, int32_t *IDX, float64_t *alphas, float64_t factor=1.0)
float64_t get_combined_kernel_weight ()
void set_combined_kernel_weight (float64_t nw)
virtual int32_t get_num_subkernels ()
virtual void compute_by_subkernel (int32_t vector_idx, float64_t *subkernel_contrib)
virtual const float64_t * get_subkernel_weights (int32_t &num_weights)
virtual void set_subkernel_weights (SGVector< float64_t > weights)
- Public Member Functions inherited from CSGObject
 CSGObject ()
 CSGObject (const CSGObject &orig)
virtual ~CSGObject ()
virtual bool is_generic (EPrimitiveType *generic) const
template<class T >
void set_generic ()
void unset_generic ()
virtual void print_serializable (const char *prefix="")
virtual bool save_serializable (CSerializableFile *file, const char *prefix="")
virtual bool load_serializable (CSerializableFile *file, const char *prefix="")
void set_global_io (SGIO *io)
SGIO * get_global_io ()
void set_global_parallel (Parallel *parallel)
Parallel * get_global_parallel ()
void set_global_version (Version *version)
Version * get_global_version ()
SGVector< char * > get_modelsel_names ()
char * get_modsel_param_descr (const char *param_name)
index_t get_modsel_param_index (const char *param_name)

Protected Member Functions

virtual float64_t compute (int32_t idx_a, int32_t idx_b)
virtual float64_t compute_helper (int32_t idx_a, int32_t idx_b, bool do_sort)
virtual float64_t compute_diag (int32_t idx_a)

Protected Attributes

int32_t dictionary_size
float64_t * dictionary_weights
bool use_sign
bool use_dict_diagonal_optimization
int32_t * dict_diagonal_optimization

Friends

class CVarianceKernelNormalizer
class CSqrtDiagKernelNormalizer
class CAvgDiagKernelNormalizer
class CRidgeKernelNormalizer
class CFirstElementKernelNormalizer
class CTanimotoKernelNormalizer
class CDiceKernelNormalizer

Additional Inherited Members

- Public Attributes inherited from CSGObject
SGIO * io
Parallel * parallel
Version * version
Parameter * m_parameters
Parameter * m_model_selection_parameters
- Static Protected Member Functions inherited from CKernel
template<class T >
static void * get_kernel_matrix_helper (void *p)

Constructor & Destructor Documentation

CCommWordStringKernel ( )

default constructor

Definition at line 22 of file CommWordStringKernel.cpp.

CCommWordStringKernel ( int32_t  size,
bool  use_sign 
)

constructor

Parameters
size      cache size
use_sign  if sign shall be used

Definition at line 28 of file CommWordStringKernel.cpp.
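The description of use_sign is terse. One plausible reading, which is an assumption of this example and is not stated on this page, is that with use_sign enabled each k-mer shared by both strings contributes exactly once, regardless of how often it occurs. Under that assumption the comm-style merge would look like the following sketch; `comm_dot_sign` is a hypothetical name.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Presence-only variant of the comm-style merge (hypothetical helper).
// ASSUMPTION: use_sign means "count each shared k-mer once".
// Both inputs must be sorted arrays of k-mer words.
inline double comm_dot_sign(const std::vector<uint16_t>& a,
                            const std::vector<uint16_t>& b)
{
    double result = 0.0;
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size())
    {
        if (a[i] < b[j])
            ++i;
        else if (a[i] > b[j])
            ++j;
        else
        {
            // Shared word: skip every repeat on both sides, count it once.
            const uint16_t sym = a[i];
            while (i < a.size() && a[i] == sym) ++i;
            while (j < b.size() && b[j] == sym) ++j;
            result += 1.0;
        }
    }
    return result;
}
```

The result is then the number of distinct k-mers the two strings have in common.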

CCommWordStringKernel ( CStringFeatures< uint16_t > *  l,
CStringFeatures< uint16_t > *  r,
bool  use_sign = false,
int32_t  size = 10 
)

constructor

Parameters
l         features of left-hand side
r         features of right-hand side
use_sign  if sign shall be used
size      cache size

Definition at line 35 of file CommWordStringKernel.cpp.

~CCommWordStringKernel ( )
virtual

Definition at line 57 of file CommWordStringKernel.cpp.

Member Function Documentation

void add_to_normal ( int32_t  idx,
float64_t  weight 
)
virtual

add to normal

Parameters
idx     where to add
weight  what to add

Reimplemented from CKernel.

Reimplemented in CWeightedCommWordStringKernel.

Definition at line 241 of file CommWordStringKernel.cpp.

void cleanup ( )
virtual

clean up kernel

Reimplemented from CKernel.

Reimplemented in CWeightedCommWordStringKernel.

Definition at line 79 of file CommWordStringKernel.cpp.

void clear_normal ( )
virtual

clear normal

Reimplemented from CKernel.

Definition at line 286 of file CommWordStringKernel.cpp.

virtual float64_t compute ( int32_t  idx_a,
int32_t  idx_b 
)
protected virtual

compute kernel function for features a and b; idx_{a,b} denote the indices of the feature vectors in the corresponding feature object

Parameters
idx_a  index a
idx_b  index b
Returns
computed kernel function at indices a,b

Implements CKernel.

Definition at line 215 of file CommWordStringKernel.h.

char * compute_consensus ( int32_t &  num_feat,
int32_t  num_suppvec,
int32_t *  IDX,
float64_t *  alphas 
)

compute consensus

Parameters
num_feat     number of features
num_suppvec  number of support vectors
IDX          IDX
alphas       alphas
Returns
computed consensus

Definition at line 498 of file CommWordStringKernel.cpp.

float64_t compute_diag ( int32_t  idx_a)
protected virtual

helper to compute only diagonal normalization for training

Parameters
idx_a  index a
Returns
unnormalized diagonal value

Definition at line 85 of file CommWordStringKernel.cpp.
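For a sorted word array the diagonal entry $k({\bf x},{\bf x})$ is simply the sum of squared run lengths, so it can be computed in a single pass. The sketch below shows this together with the usual square-root diagonal normalization $k({\bf x},{\bf x'})/\sqrt{k({\bf x},{\bf x})\,k({\bf x'},{\bf x'})}$. Both function names are hypothetical, and the normalization formula is an assumption about what a normalizer such as CSqrtDiagKernelNormalizer computes rather than something this page states.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Unnormalized diagonal value k(x, x) for a SORTED k-mer array
// (hypothetical helper): sum over distinct words of (run length)^2.
inline double diag_value(const std::vector<uint16_t>& x)
{
    double result = 0.0;
    std::size_t i = 0;
    while (i < x.size())
    {
        std::size_t run = 1;
        while (i + run < x.size() && x[i + run] == x[i])
            ++run;
        result += static_cast<double>(run) * static_cast<double>(run);
        i += run;
    }
    return result;
}

// ASSUMED normalization: divide by the geometric mean of the two diagonals.
// The caller must make sure that neither diagonal value is zero.
inline double normalize_sqrt_diag(double k_xy, double k_xx, double k_yy)
{
    return k_xy / std::sqrt(k_xx * k_yy);
}
```

After this normalization every non-empty string has kernel value 1 with itself.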

float64_t compute_helper ( int32_t  idx_a,
int32_t  idx_b,
bool  do_sort 
)
protected virtual

helper for compute

Parameters
idx_a    index a
idx_b    index b
do_sort  if sorting shall be performed
Returns
computed value

Reimplemented in CWeightedCommWordStringKernel.

Definition at line 129 of file CommWordStringKernel.cpp.

float64_t compute_optimized ( int32_t  idx)
virtual

compute optimized

Parameters
idx  index to compute
Returns
optimized value at given index

Reimplemented from CKernel.

Reimplemented in CWeightedCommWordStringKernel.

Definition at line 326 of file CommWordStringKernel.cpp.

float64_t * compute_scoring ( int32_t  max_degree,
int32_t &  num_feat,
int32_t &  num_sym,
float64_t *  target,
int32_t  num_suppvec,
int32_t *  IDX,
float64_t *  alphas,
bool  do_init = true 
)
virtual

compute scoring

Parameters
max_degree   maximum degree
num_feat     number of features
num_sym      number of symbols
target       target
num_suppvec  number of support vectors
IDX          IDX
alphas       alphas
do_init      if initialization shall be performed
Returns
computed scores

Reimplemented in CWeightedCommWordStringKernel.

Definition at line 375 of file CommWordStringKernel.cpp.

bool delete_optimization ( )
virtual

delete optimization

Returns
if deleting was successful

Reimplemented from CKernel.

Definition at line 318 of file CommWordStringKernel.cpp.

void get_dictionary ( int32_t &  dsize,
float64_t *&  dweights 
)

get dictionary

Parameters
dsize     dictionary size will be stored in here
dweights  dictionary weights will be stored in here

Definition at line 153 of file CommWordStringKernel.h.
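Both arguments of get_dictionary are output parameters passed by reference. The sketch below illustrates that calling convention on a hypothetical stand-in class, `DictionaryOwner`, which is not Shogun code. It assumes the returned pointer aliases the object's internal array rather than a copy, so the caller must not free it and must not use it after the owner is destroyed.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in class (not Shogun) showing the out-parameter style.
class DictionaryOwner
{
public:
    explicit DictionaryOwner(int32_t size)
        : weights_(static_cast<std::size_t>(size), 0.0) {}

    // Writes the size and a pointer to the internal array into the arguments.
    // ASSUMPTION: no copy is made; dweights aliases storage owned by *this.
    void get_dictionary(int32_t& dsize, double*& dweights)
    {
        dsize = static_cast<int32_t>(weights_.size());
        dweights = weights_.data();
    }

private:
    std::vector<double> weights_;
};
```

A caller declares an `int32_t` and a `double*`, passes both by name, and reads them after the call returns.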

virtual EFeatureType get_feature_type ( )
virtual

return feature type the kernel can deal with

Returns
feature type WORD

Reimplemented from CStringKernel< uint16_t >.

Reimplemented in CWeightedCommWordStringKernel.

Definition at line 146 of file CommWordStringKernel.h.

virtual EKernelType get_kernel_type ( )
virtual

return what type of kernel we are

Returns
kernel type COMMWORDSTRING

Implements CStringKernel< uint16_t >.

Reimplemented in CWeightedCommWordStringKernel.

Definition at line 95 of file CommWordStringKernel.h.

virtual const char* get_name ( ) const
virtual

return the kernel's name

Returns
name CommWordString

Reimplemented from CStringKernel< uint16_t >.

Reimplemented in CWeightedCommWordStringKernel.

Definition at line 101 of file CommWordStringKernel.h.

bool get_use_dict_diagonal_optimization ( )

get_use_dict_diagonal_optimization

Returns
true if diagonal optimization is on

Definition at line 201 of file CommWordStringKernel.h.

bool init ( CFeatures *  l,
CFeatures *  r 
)
virtual

initialize kernel

Parameters
l  features of left-hand side
r  features of right-hand side
Returns
if initializing was successful

Reimplemented from CStringKernel< uint16_t >.

Reimplemented in CWeightedCommWordStringKernel.

Definition at line 65 of file CommWordStringKernel.cpp.

bool init_dictionary ( int32_t  size)
virtual

initialize dictionary

Parameters
size  size

Definition at line 46 of file CommWordStringKernel.cpp.

bool init_optimization ( int32_t  count,
int32_t *  IDX,
float64_t *  weights 
)
virtual

initialize optimization

Parameters
count    count
IDX      index
weights  weights
Returns
if initializing was successful

Reimplemented from CKernel.

Definition at line 292 of file CommWordStringKernel.cpp.

void set_use_dict_diagonal_optimization ( bool  flag)

set_use_dict_diagonal_optimization

Parameters
flag  enable diagonal optimization

Definition at line 192 of file CommWordStringKernel.h.

Friends And Related Function Documentation

friend class CAvgDiagKernelNormalizer
friend

Definition at line 50 of file CommWordStringKernel.h.

friend class CDiceKernelNormalizer
friend

Definition at line 54 of file CommWordStringKernel.h.

friend class CFirstElementKernelNormalizer
friend

Definition at line 52 of file CommWordStringKernel.h.

friend class CRidgeKernelNormalizer
friend

Definition at line 51 of file CommWordStringKernel.h.

friend class CSqrtDiagKernelNormalizer
friend

Definition at line 49 of file CommWordStringKernel.h.

friend class CTanimotoKernelNormalizer
friend

Definition at line 53 of file CommWordStringKernel.h.

friend class CVarianceKernelNormalizer
friend

Definition at line 48 of file CommWordStringKernel.h.

Member Data Documentation

int32_t* dict_diagonal_optimization
protected

array to hold counters for all strings

Definition at line 253 of file CommWordStringKernel.h.

int32_t dictionary_size
protected

size of dictionary (number of possible strings)

Definition at line 242 of file CommWordStringKernel.h.

float64_t* dictionary_weights
protected

dictionary weights - array to hold counters for all possible strings

Definition at line 245 of file CommWordStringKernel.h.

bool use_dict_diagonal_optimization
protected

whether diagonal optimization shall be used

Definition at line 251 of file CommWordStringKernel.h.

bool use_sign
protected

if sign shall be used

Definition at line 248 of file CommWordStringKernel.h.


The documentation for this class was generated from the following files:

CommWordStringKernel.h
CommWordStringKernel.cpp

SHOGUN Machine Learning Toolbox - Documentation