SHOGUN  v1.1.0
CWeightedCommWordStringKernel Class Reference

Detailed Description

The WeightedCommWordString kernel computes the weighted spectrum kernel (i.e. a spectrum kernel over 1- to K-mers, where each k-mer length is weighted by a coefficient $\beta_k$) from strings that have been mapped into unsigned 16-bit integers.

These 16-bit integers correspond to k-mers. To be usable with this kernel they need to be sorted (e.g. via the SortWordString pre-processor).

It basically uses the algorithm in the unix "comm" command (hence the name) to compute:

\[ k({\bf x},{\bf x'})= \sum_{k=1}^K\beta_k\Phi_k({\bf x})\cdot \Phi_k({\bf x'}) \]

where $\Phi_k$ maps a sequence ${\bf x}$ that consists of letters in $\Sigma$ to a feature vector of size $|\Sigma|^k$. In this feature vector each entry denotes how often the k-mer appears in that ${\bf x}$.
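The formula and the "comm"-style merge can be illustrated with a small standalone sketch. It does not use the SHOGUN API: the names `base_code`, `sorted_kmers`, `comm_dot` and `weighted_spectrum` are hypothetical, a 2-bit DNA encoding is assumed, and the k-mers are re-encoded for every order purely for clarity, which is not necessarily how the class itself proceeds.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: map A, C, G, T to 2-bit codes (anything else is treated as A).
static uint16_t base_code(char c) {
    switch (c) {
        case 'C': return 1;
        case 'G': return 2;
        case 'T': return 3;
        default:  return 0;
    }
}

// Encode every k-mer of seq as one 16-bit word, then sort the words.
static std::vector<uint16_t> sorted_kmers(const std::string& seq, std::size_t k) {
    assert(k >= 1 && k <= 8);  // 8 symbols x 2 bits fill one 16-bit word
    std::vector<uint16_t> words;
    for (std::size_t i = 0; i + k <= seq.size(); ++i) {
        uint16_t w = 0;
        for (std::size_t j = 0; j < k; ++j)
            w = static_cast<uint16_t>((w << 2) | base_code(seq[i + j]));
        words.push_back(w);
    }
    std::sort(words.begin(), words.end());
    return words;
}

// comm-style merge: walk both sorted arrays once. For each word present in
// both, add (count in a) * (count in b), which is one term of Phi_k(x).Phi_k(x').
static double comm_dot(const std::vector<uint16_t>& a, const std::vector<uint16_t>& b) {
    double sum = 0.0;
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        if (a[i] < b[j]) {
            ++i;
        } else if (a[i] > b[j]) {
            ++j;
        } else {
            const uint16_t w = a[i];
            std::size_t ca = 0, cb = 0;
            while (i < a.size() && a[i] == w) { ++i; ++ca; }
            while (j < b.size() && b[j] == w) { ++j; ++cb; }
            sum += static_cast<double>(ca) * static_cast<double>(cb);
        }
    }
    return sum;
}

// k(x, x') = sum over k = 1..K of beta[k-1] * Phi_k(x).Phi_k(x'), with K = beta.size().
double weighted_spectrum(const std::string& x, const std::string& y,
                         const std::vector<double>& beta) {
    double result = 0.0;
    for (std::size_t k = 1; k <= beta.size(); ++k)
        result += beta[k - 1] * comm_dot(sorted_kmers(x, k), sorted_kmers(y, k));
    return result;
}
```

For x = ACGT, x' = ACGA and weights (1, 0.5), the order-1 term is 4 (A contributes 1*2, C and G contribute 1 each) and the order-2 term is 2 (AC and CG), so the sketch yields 4 + 0.5 * 2 = 5.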

Note that this representation is especially tuned to small alphabets (like the 2-bit alphabet DNA), for which it enables spectrum kernels of order up to 8.
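The order limit follows from simple arithmetic on the word size. The sketch below assumes that every symbol is packed into a fixed number of bits inside one 16-bit word. The function names `bits_per_symbol` and `max_order` are hypothetical and not part of SHOGUN.

```cpp
#include <cassert>

// Number of bits needed to distinguish alphabet_size symbols (ceil(log2)).
int bits_per_symbol(unsigned alphabet_size) {
    int bits = 0;
    while ((1u << bits) < alphabet_size) ++bits;
    return bits;
}

// Largest k such that a whole k-mer still fits into one 16-bit word.
int max_order(unsigned alphabet_size) {
    assert(alphabet_size >= 2);
    return 16 / bits_per_symbol(alphabet_size);
}
```

Under this assumption a 4-letter DNA alphabet needs 2 bits per symbol and allows order 8, while a 20-letter alphabet would need 5 bits per symbol and allow only order 3.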

For this kernel the linadd speedups are quite efficiently implemented using direct maps.
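The idea behind a direct map can be sketched as follows: because every k-mer is a small integer, the weighted sum of support vectors can be stored in a plain array indexed by the k-mer word, and scoring a new example costs one lookup per word. This is a simplified, single-order illustration under that assumption. The type `DirectMapNormal` is hypothetical, and the per-order weights $\beta_k$, normalization and the use_sign option are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the kernel's "normal": one weight per possible word.
struct DirectMapNormal {
    std::vector<double> dict;

    explicit DirectMapNormal(std::size_t num_words) : dict(num_words, 0.0) {}

    // Fold one example into the normal: dict[u] += alpha * (count of u in words).
    void add(const std::vector<uint16_t>& words, double alpha) {
        for (uint16_t u : words) {
            assert(u < dict.size());
            dict[u] += alpha;
        }
    }

    // Evaluate sum_i alpha_i * k(x_i, x) with one table lookup per word of x.
    double score(const std::vector<uint16_t>& words) const {
        double s = 0.0;
        for (uint16_t u : words) {
            assert(u < dict.size());
            s += dict[u];
        }
        return s;
    }
};
```

Adding the words {0, 1, 1, 5} with coefficient 0.5 and {1, 2} with coefficient -1 and then scoring {1, 1, 5, 7} gives 0.5, which matches evaluating the two spectrum dot products directly (0.5 * 5 - 1 * 2).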

Definition at line 50 of file WeightedCommWordStringKernel.h.

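Inheritance diagram for CWeightedCommWordStringKernel (derived class first):
CWeightedCommWordStringKernel -> CCommWordStringKernel -> CStringKernel< uint16_t > -> CKernel -> CSGObject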

Public Member Functions

 CWeightedCommWordStringKernel ()
 CWeightedCommWordStringKernel (int32_t size, bool use_sign)
 CWeightedCommWordStringKernel (CStringFeatures< uint16_t > *l, CStringFeatures< uint16_t > *r, bool use_sign=false, int32_t size=10)
virtual ~CWeightedCommWordStringKernel ()
virtual bool init (CFeatures *l, CFeatures *r)
virtual void cleanup ()
virtual float64_t compute_optimized (int32_t idx)
virtual void add_to_normal (int32_t idx, float64_t weight)
void merge_normal ()
bool set_wd_weights ()
bool set_weights (float64_t *w, int32_t d)
virtual EKernelType get_kernel_type ()
virtual const char * get_name () const
virtual EFeatureType get_feature_type ()
virtual float64_t * compute_scoring (int32_t max_degree, int32_t &num_feat, int32_t &num_sym, float64_t *target, int32_t num_suppvec, int32_t *IDX, float64_t *alphas, bool do_init=true)
- Public Member Functions inherited from CCommWordStringKernel
 CCommWordStringKernel ()
 CCommWordStringKernel (int32_t size, bool use_sign)
 CCommWordStringKernel (CStringFeatures< uint16_t > *l, CStringFeatures< uint16_t > *r, bool use_sign=false, int32_t size=10)
virtual ~CCommWordStringKernel ()
virtual bool init_dictionary (int32_t size)
virtual bool init_optimization (int32_t count, int32_t *IDX, float64_t *weights)
virtual bool delete_optimization ()
virtual void clear_normal ()
void get_dictionary (int32_t &dsize, float64_t *&dweights)
char * compute_consensus (int32_t &num_feat, int32_t num_suppvec, int32_t *IDX, float64_t *alphas)
void set_use_dict_diagonal_optimization (bool flag)
bool get_use_dict_diagonal_optimization ()
- Public Member Functions inherited from CStringKernel< uint16_t >
 CStringKernel (int32_t cachesize=0)
 CStringKernel (CFeatures *l, CFeatures *r)
virtual EFeatureClass get_feature_class ()
- Public Member Functions inherited from CKernel
 CKernel ()
 CKernel (int32_t size)
 CKernel (CFeatures *l, CFeatures *r, int32_t size)
virtual ~CKernel ()
float64_t kernel (int32_t idx_a, int32_t idx_b)
SGMatrix< float64_t > get_kernel_matrix ()
virtual SGVector< float64_t > get_kernel_col (int32_t j)
virtual SGVector< float64_t > get_kernel_row (int32_t i)
template<class T >
SGMatrix< T > get_kernel_matrix ()
virtual bool set_normalizer (CKernelNormalizer *normalizer)
virtual CKernelNormalizer * get_normalizer ()
virtual bool init_normalizer ()
void load (CFile *loader)
void save (CFile *writer)
CFeatures * get_lhs ()
CFeatures * get_rhs ()
virtual int32_t get_num_vec_lhs ()
virtual int32_t get_num_vec_rhs ()
virtual bool has_features ()
bool get_lhs_equals_rhs ()
virtual void remove_lhs_and_rhs ()
virtual void remove_lhs ()
virtual void remove_rhs ()
 takes all necessary steps if the rhs is removed from kernel
void set_cache_size (int32_t size)
int32_t get_cache_size ()
void list_kernel ()
bool has_property (EKernelProperty p)
EOptimizationType get_optimization_type ()
virtual void set_optimization_type (EOptimizationType t)
bool get_is_initialized ()
bool init_optimization_svm (CSVM *svm)
virtual void compute_batch (int32_t num_vec, int32_t *vec_idx, float64_t *target, int32_t num_suppvec, int32_t *IDX, float64_t *alphas, float64_t factor=1.0)
float64_t get_combined_kernel_weight ()
void set_combined_kernel_weight (float64_t nw)
virtual int32_t get_num_subkernels ()
virtual void compute_by_subkernel (int32_t vector_idx, float64_t *subkernel_contrib)
virtual const float64_t * get_subkernel_weights (int32_t &num_weights)
virtual void set_subkernel_weights (SGVector< float64_t > weights)
- Public Member Functions inherited from CSGObject
 CSGObject ()
 CSGObject (const CSGObject &orig)
virtual ~CSGObject ()
virtual bool is_generic (EPrimitiveType *generic) const
template<class T >
void set_generic ()
void unset_generic ()
virtual void print_serializable (const char *prefix="")
virtual bool save_serializable (CSerializableFile *file, const char *prefix="")
virtual bool load_serializable (CSerializableFile *file, const char *prefix="")
void set_global_io (SGIO *io)
SGIO * get_global_io ()
void set_global_parallel (Parallel *parallel)
Parallel * get_global_parallel ()
void set_global_version (Version *version)
Version * get_global_version ()
SGVector< char * > get_modelsel_names ()
char * get_modsel_param_descr (const char *param_name)
index_t get_modsel_param_index (const char *param_name)

Protected Member Functions

virtual float64_t compute_helper (int32_t idx_a, int32_t idx_b, bool do_sort)
- Protected Member Functions inherited from CCommWordStringKernel
virtual float64_t compute (int32_t idx_a, int32_t idx_b)
virtual float64_t compute_diag (int32_t idx_a)

Protected Attributes

int32_t degree
float64_t * weights
- Protected Attributes inherited from CCommWordStringKernel
int32_t dictionary_size
float64_t * dictionary_weights
bool use_sign
bool use_dict_diagonal_optimization
int32_t * dict_diagonal_optimization

Additional Inherited Members

- Public Attributes inherited from CSGObject
SGIO * io
Parallel * parallel
Version * version
Parameter * m_parameters
Parameter * m_model_selection_parameters
- Static Protected Member Functions inherited from CKernel
template<class T >
static void * get_kernel_matrix_helper (void *p)

Constructor & Destructor Documentation

CWeightedCommWordStringKernel ( )

default constructor

Definition at line 18 of file WeightedCommWordStringKernel.cpp.

CWeightedCommWordStringKernel ( int32_t  size,
bool  use_sign 
)

constructor

Parameters
size      cache size
use_sign  if sign shall be used

Definition at line 24 of file WeightedCommWordStringKernel.cpp.

CWeightedCommWordStringKernel ( CStringFeatures< uint16_t > *  l,
CStringFeatures< uint16_t > *  r,
bool  use_sign = false,
int32_t  size = 10 
)

constructor

Parameters
l         features of left-hand side
r         features of right-hand side
use_sign  if sign shall be used
size      cache size

Definition at line 32 of file WeightedCommWordStringKernel.cpp.

~CWeightedCommWordStringKernel ( )
virtual

Definition at line 43 of file WeightedCommWordStringKernel.cpp.

Member Function Documentation

void add_to_normal ( int32_t  idx,
float64_t  weight 
)
virtual

add to normal

Parameters
idx     where to add
weight  what to add

Reimplemented from CCommWordStringKernel.

Definition at line 191 of file WeightedCommWordStringKernel.cpp.

void cleanup ( )
virtual

clean up kernel

Reimplemented from CCommWordStringKernel.

Definition at line 59 of file WeightedCommWordStringKernel.cpp.

float64_t compute_helper ( int32_t  idx_a,
int32_t  idx_b,
bool  do_sort 
)
protected virtual

helper for compute

Parameters
idx_a    index a
idx_b    index b
do_sort  if sorting shall be performed

Reimplemented from CCommWordStringKernel.

Definition at line 96 of file WeightedCommWordStringKernel.cpp.

float64_t compute_optimized ( int32_t  idx)
virtual

compute optimized

Parameters
idx  index to compute
Returns
optimized value at given index

Reimplemented from CCommWordStringKernel.

Definition at line 253 of file WeightedCommWordStringKernel.cpp.

float64_t * compute_scoring ( int32_t  max_degree,
int32_t &  num_feat,
int32_t &  num_sym,
float64_t *  target,
int32_t  num_suppvec,
int32_t *  IDX,
float64_t *  alphas,
bool  do_init = true
)
virtual

compute scoring

Parameters
max_degree   maximum degree
num_feat     number of features
num_sym      number of symbols
target       target
num_suppvec  number of support vectors
IDX          IDX
alphas       alphas
do_init      if initialization shall be performed
Returns
computed score

Reimplemented from CCommWordStringKernel.

Definition at line 288 of file WeightedCommWordStringKernel.cpp.

virtual EFeatureType get_feature_type ( )
virtual

return feature type the kernel can deal with

Returns
feature type WORD

Reimplemented from CCommWordStringKernel.

Definition at line 134 of file WeightedCommWordStringKernel.h.

virtual EKernelType get_kernel_type ( )
virtual

return what type of kernel we are

Returns
kernel type WEIGHTEDCOMMWORDSTRING

Reimplemented from CCommWordStringKernel.

Definition at line 122 of file WeightedCommWordStringKernel.h.

virtual const char* get_name ( ) const
virtual

return the kernel's name

Returns
name WeightedCommWordString

Reimplemented from CCommWordStringKernel.

Definition at line 128 of file WeightedCommWordStringKernel.h.

bool init ( CFeatures *  l,
CFeatures *  r
)
virtual

initialize kernel

Parameters
l  features of left-hand side
r  features of right-hand side
Returns
if initializing was successful

Reimplemented from CCommWordStringKernel.

Definition at line 48 of file WeightedCommWordStringKernel.cpp.

void merge_normal ( )

merge normal

Definition at line 221 of file WeightedCommWordStringKernel.cpp.

bool set_wd_weights ( )

set weighted degree weights

Returns
if setting was successful

Definition at line 67 of file WeightedCommWordStringKernel.cpp.
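This page does not state which values set_wd_weights() assigns. As an illustration only, the sketch below computes the weights commonly used for the weighted degree kernel, $\beta_k = 2(K-k+1)/(K(K+1))$, which decrease linearly with the k-mer length and sum to 1. Whether this method uses exactly these values, or a differently normalized variant, is an assumption that should be checked against WeightedCommWordStringKernel.cpp. The function name `wd_weights` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// beta_k = 2 * (K - k + 1) / (K * (K + 1)) for k = 1..K, with K = degree.
// Assumed weighting scheme; not taken from the SHOGUN sources.
std::vector<double> wd_weights(int degree) {
    assert(degree >= 1);
    std::vector<double> beta(static_cast<std::size_t>(degree));
    for (int k = 1; k <= degree; ++k)
        beta[static_cast<std::size_t>(k - 1)] =
            2.0 * (degree - k + 1) / (degree * (degree + 1.0));
    return beta;
}
```

For degree 3 this gives 1/2, 1/3 and 1/6, so shorter k-mers receive the larger weights.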

bool set_weights ( float64_t *  w,
int32_t  d
)

set custom weights (swig compatible)

Parameters
w  weights
d  degree (must match number of weights)
Returns
if setting was successful

Definition at line 85 of file WeightedCommWordStringKernel.cpp.

Member Data Documentation

int32_t degree
protected

degree

Definition at line 168 of file WeightedCommWordStringKernel.h.

float64_t* weights
protected

weights for each of the subkernels of degree 1...d

Definition at line 171 of file WeightedCommWordStringKernel.h.


The documentation for this class was generated from the following files:

WeightedCommWordStringKernel.h
WeightedCommWordStringKernel.cpp

SHOGUN Machine Learning Toolbox - Documentation