SHOGUN  v1.1.0
CMultidimensionalScaling Class Reference

Detailed Description

The class CMultidimensionalScaling performs multidimensional scaling (with landmark approximation if requested).

Description of classical embedding is given on p.261 (Section 12.1) of Borg, I., & Groenen, P. J. F. (2005). Modern multidimensional scaling: Theory and applications. Springer.

Description of landmark MDS approximation is given in de Silva, V., & Tenenbaum, J. B. (2004). Sparse multidimensional scaling using landmark points. Technology, p. 1-4.

In this preprocessor the LAPACK routine DSYEVR is used for solving the eigenproblem. If the ARPACK library is available, its routines DSAUPD/DSEUPD are used instead.

Note that the target dimension should be set to a reasonable value (using set_target_dim). If it is higher than the intrinsic dimensionality of the dataset, the 'extra' features of the output might be inconsistent (essentially, they correspond to zero or negative eigenvalues). In this case a warning is shown.
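
To illustrate the condition above, here is a minimal sketch, not Shogun code, that counts how many of the requested output dimensions are backed by a strictly positive eigenvalue. The function name count_consistent_dims and the tolerance eps are assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Shogun): counts how many of the first
// target_dim eigenvalues are strictly positive, i.e. above the tolerance eps.
std::size_t count_consistent_dims(const std::vector<double>& eigenvalues,
                                  std::size_t target_dim,
                                  double eps = 1e-9)
{
    assert(target_dim <= eigenvalues.size());
    std::size_t count = 0;
    for (std::size_t i = 0; i < target_dim; ++i)
    {
        if (eigenvalues[i] > eps)
            ++count;
    }
    return count;
}
```

If the returned count is smaller than target_dim, the remaining output features correspond to zero or negative eigenvalues and should not be relied on.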

It is possible to apply multidimensional scaling to any given distance using the embed_distance method. By default the Euclidean distance is used (with its parallel instance replaced by the preprocessor's one).
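
For reference, the default case amounts to embedding a matrix of pairwise Euclidean distances. The sketch below builds such a matrix from plain vectors; it is independent of Shogun's CDistance classes, and the name euclidean_distance_matrix is an assumption made for this example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Shogun): builds the symmetric matrix of
// pairwise Euclidean distances. points[i] holds the coordinates of example i.
std::vector<std::vector<double> > euclidean_distance_matrix(
    const std::vector<std::vector<double> >& points)
{
    const std::size_t n = points.size();
    std::vector<std::vector<double> > dist(n, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < n; ++i)
    {
        for (std::size_t j = i + 1; j < n; ++j)
        {
            assert(points[i].size() == points[j].size());
            double sum = 0.0;
            for (std::size_t k = 0; k < points[i].size(); ++k)
            {
                const double diff = points[i][k] - points[j][k];
                sum += diff * diff;
            }
            // The matrix is symmetric with a zero diagonal.
            dist[i][j] = dist[j][i] = std::sqrt(sum);
        }
    }
    return dist;
}
```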

The faster landmark approximation is parallelized using pthreads. The number of landmarks should be at least 3 for proper triangulation. For reasonable embedding accuracy, larger values (30%-50% of the total number of examples) work well for most tasks.

Definition at line 59 of file MultidimensionalScaling.h.

Public Member Functions

 CMultidimensionalScaling ()
virtual ~CMultidimensionalScaling ()
virtual CSimpleFeatures< float64_t > * embed_distance (CDistance *distance)
virtual CFeatures * apply (CFeatures *features)
const char * get_name () const
SGVector< float64_t > get_eigenvalues () const
void set_landmark_number (int32_t num)
int32_t get_landmark_number () const
void set_landmark (bool landmark)
bool get_landmark () const
- Public Member Functions inherited from CEmbeddingConverter
 CEmbeddingConverter ()
virtual ~CEmbeddingConverter ()
virtual CSimpleFeatures< float64_t > * embed (CFeatures *features)
void set_target_dim (int32_t dim)
int32_t get_target_dim () const
void set_distance (CDistance *distance)
CDistance * get_distance () const
void set_kernel (CKernel *kernel)
CKernel * get_kernel () const
- Public Member Functions inherited from CConverter
 CConverter ()
virtual ~CConverter ()
- Public Member Functions inherited from CSGObject
 CSGObject ()
 CSGObject (const CSGObject &orig)
virtual ~CSGObject ()
virtual bool is_generic (EPrimitiveType *generic) const
template<class T >
void set_generic ()
void unset_generic ()
virtual void print_serializable (const char *prefix="")
virtual bool save_serializable (CSerializableFile *file, const char *prefix="")
virtual bool load_serializable (CSerializableFile *file, const char *prefix="")
void set_global_io (SGIO *io)
SGIO * get_global_io ()
void set_global_parallel (Parallel *parallel)
Parallel * get_global_parallel ()
void set_global_version (Version *version)
Version * get_global_version ()
SGVector< char * > get_modelsel_names ()
char * get_modsel_param_descr (const char *param_name)
index_t get_modsel_param_index (const char *param_name)

Protected Member Functions

virtual void init ()
SGMatrix< float64_t > classic_embedding (SGMatrix< float64_t > distance_matrix)
SGMatrix< float64_t > landmark_embedding (SGMatrix< float64_t > distance_matrix)
virtual SGMatrix< float64_t > process_distance_matrix (SGMatrix< float64_t > distance_matrix)

Static Protected Member Functions

static void * run_triangulation_thread (void *p)
static SGVector< int32_t > shuffle (int32_t count, int32_t total_count)

Protected Attributes

SGVector< float64_t > m_eigenvalues
bool m_landmark
int32_t m_landmark_number
- Protected Attributes inherited from CEmbeddingConverter
int32_t m_target_dim
CDistance * m_distance
CKernel * m_kernel

Additional Inherited Members

- Public Attributes inherited from CSGObject
SGIO * io
Parallel * parallel
Version * version
Parameter * m_parameters
Parameter * m_model_selection_parameters

Constructor & Destructor Documentation

CMultidimensionalScaling ( )

Definition at line 60 of file MultidimensionalScaling.cpp.

~CMultidimensionalScaling ( )
virtual

Definition at line 76 of file MultidimensionalScaling.cpp.

Member Function Documentation

CFeatures * apply ( CFeatures * features)
virtual

apply preprocessor to a feature matrix; changes the feature matrix to one having the target dimensionality

Parameters
features  features whose feature matrix should be processed
Returns
new feature matrix

Implements CEmbeddingConverter.

Definition at line 136 of file MultidimensionalScaling.cpp.

SGMatrix< float64_t > classic_embedding ( SGMatrix< float64_t > distance_matrix)
protected

classical embedding

Parameters
distance_matrix  distance matrix to be used for embedding
Returns
new feature matrix representing the given distance matrix

Definition at line 149 of file MultidimensionalScaling.cpp.
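
The core of a classical embedding can be sketched as follows. This is an illustration under stated assumptions, not the code in MultidimensionalScaling.cpp: it double-centers the squared distance matrix and then extracts only the leading eigenpair with plain power iteration, whereas Shogun uses DSYEVR or ARPACK. It also assumes an (approximately) Euclidean distance matrix, so that the centered matrix is positive semidefinite. The names double_center and leading_eigenpair are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

typedef std::vector<std::vector<double> > Matrix;

// Double centering: B = -1/2 * J * D2 * J, where D2 holds the squared
// distances and J = I - (1/n) * ones * ones^T. dist must be symmetric.
Matrix double_center(const Matrix& dist)
{
    const std::size_t n = dist.size();
    assert(n > 0);
    Matrix b(n, std::vector<double>(n, 0.0));
    std::vector<double> row_mean(n, 0.0);
    double total_mean = 0.0;
    for (std::size_t i = 0; i < n; ++i)
    {
        assert(dist[i].size() == n);
        for (std::size_t j = 0; j < n; ++j)
        {
            b[i][j] = dist[i][j] * dist[i][j];
            row_mean[i] += b[i][j] / n;
        }
        total_mean += row_mean[i] / n;
    }
    // By symmetry the column means equal the row means.
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            b[i][j] = -0.5 * (b[i][j] - row_mean[i] - row_mean[j] + total_mean);
    return b;
}

// Power iteration: returns the Rayleigh quotient of the final iterate and
// stores the corresponding unit vector in vec.
double leading_eigenpair(const Matrix& b, std::vector<double>& vec,
                         int iterations = 200)
{
    const std::size_t n = b.size();
    vec.assign(n, 0.0);
    double norm = 0.0;
    for (std::size_t i = 0; i < n; ++i)
    {
        vec[i] = 1.0 + i;  // deterministic, non-constant start vector
        norm += vec[i] * vec[i];
    }
    norm = std::sqrt(norm);
    for (std::size_t i = 0; i < n; ++i)
        vec[i] /= norm;

    double lambda = 0.0;
    for (int it = 0; it < iterations; ++it)
    {
        std::vector<double> next(n, 0.0);
        lambda = 0.0;
        norm = 0.0;
        for (std::size_t i = 0; i < n; ++i)
        {
            for (std::size_t j = 0; j < n; ++j)
                next[i] += b[i][j] * vec[j];
            lambda += vec[i] * next[i];
            norm += next[i] * next[i];
        }
        norm = std::sqrt(norm);
        if (norm == 0.0)
            break;  // start vector lies in the null space
        for (std::size_t i = 0; i < n; ++i)
            vec[i] = next[i] / norm;
    }
    return lambda;
}
```

The first embedding coordinate of example i is then sqrt(lambda) * vec[i]; further coordinates would come from the next eigenpairs, which this sketch does not compute.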

CSimpleFeatures< float64_t > * embed_distance ( CDistance * distance)
virtual

apply preprocessor to a CDistance

Parameters
distance  distance to embed (should be approximately Euclidean for consistent results)
Returns
new features whose pairwise distances are as similar as possible to the given distance

Definition at line 114 of file MultidimensionalScaling.cpp.

SGVector< float64_t > get_eigenvalues ( ) const

get eigenvalues of the last embedding

Returns
vector with the eigenvalues of the last embedding

Definition at line 81 of file MultidimensionalScaling.cpp.

bool get_landmark ( ) const

getter for landmark parameter

Returns
true if landmark embedding is used

Definition at line 104 of file MultidimensionalScaling.cpp.

int32_t get_landmark_number ( ) const

get number of landmarks

Returns
current number of landmarks

Definition at line 94 of file MultidimensionalScaling.cpp.

const char * get_name ( ) const
virtual

get name

Reimplemented from CEmbeddingConverter.

Reimplemented in CIsomap.

Definition at line 109 of file MultidimensionalScaling.cpp.

void init ( )
protected virtual

default initialization

Reimplemented from CEmbeddingConverter.

Reimplemented in CIsomap.

Definition at line 69 of file MultidimensionalScaling.cpp.

SGMatrix< float64_t > landmark_embedding ( SGMatrix< float64_t > distance_matrix)
protected

landmark embedding (approximate; accuracy varies with the m_landmark_number parameter)

Parameters
distance_matrix  distance matrix to be used for embedding
Returns
new feature matrix representing the given distance matrix

Definition at line 270 of file MultidimensionalScaling.cpp.
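
The triangulation step of landmark MDS can be sketched as below. It follows the formula from de Silva and Tenenbaum, x = -1/2 * L# * (delta - delta_mean), where row k of L# is eigenvector k of the landmarks' centered matrix divided by the square root of its eigenvalue. This is an illustration and not necessarily how landmark_embedding is coded in Shogun; the name triangulate_point and its parameters are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Shogun): embeds one non-landmark point
// from its squared distances to the landmarks.
//   eigvecs[k]       unit eigenvector k of the landmarks' centered matrix
//   eigvals[k]       matching eigenvalue, must be positive
//   mean_sq_dist[j]  mean squared distance from landmark j to all landmarks
//   sq_dist[j]       squared distance from the new point to landmark j
std::vector<double> triangulate_point(
    const std::vector<std::vector<double> >& eigvecs,
    const std::vector<double>& eigvals,
    const std::vector<double>& mean_sq_dist,
    const std::vector<double>& sq_dist)
{
    assert(eigvecs.size() == eigvals.size());
    assert(mean_sq_dist.size() == sq_dist.size());
    std::vector<double> coords(eigvecs.size(), 0.0);
    for (std::size_t k = 0; k < eigvecs.size(); ++k)
    {
        assert(eigvals[k] > 0.0);
        assert(eigvecs[k].size() == sq_dist.size());
        double dot = 0.0;
        for (std::size_t j = 0; j < sq_dist.size(); ++j)
            dot += eigvecs[k][j] * (sq_dist[j] - mean_sq_dist[j]);
        // Row k of L# is eigvecs[k] / sqrt(eigvals[k]).
        coords[k] = -0.5 * dot / std::sqrt(eigvals[k]);
    }
    return coords;
}
```

Each non-landmark point is processed independently of the others, which is what makes this step easy to split across threads.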

SGMatrix< float64_t > process_distance_matrix ( SGMatrix< float64_t > distance_matrix)
protected virtual

process distance matrix (reimplemented in CIsomap; for MDS it does nothing)

Parameters
distance_matrix  distance matrix
Returns
processed distance matrix

Reimplemented in CIsomap.

Definition at line 131 of file MultidimensionalScaling.cpp.

void * run_triangulation_thread ( void * p)
static protected

run triangulation thread for landmark embedding

Parameters
p  thread parameters

Definition at line 406 of file MultidimensionalScaling.cpp.

void set_landmark ( bool  landmark)

setter for landmark parameter

Parameters
landmark  true if landmark embedding should be used

Definition at line 99 of file MultidimensionalScaling.cpp.

void set_landmark_number ( int32_t  num)

set number of landmarks; it should be less than the number of examples and greater than 3 for consistent embedding, as triangulation is used

Parameters
num  number of landmarks to be set

Definition at line 86 of file MultidimensionalScaling.cpp.

SGVector< int32_t > shuffle ( int32_t count, int32_t total_count )
static protected

subroutine that picks count indexes out of total_count using the Fisher-Yates (also known as Knuth) shuffle algorithm

Parameters
count  number of indexes to be shuffled and returned
total_count  total number of indexes
Returns
sorted shuffled indexes for landmarks

Definition at line 448 of file MultidimensionalScaling.cpp.
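
A partial Fisher-Yates shuffle of this kind can be sketched as follows. This is a standalone illustration using the C++11 <random> facilities rather than Shogun's own random number routines; the name sample_sorted_indices and the explicit generator argument are assumptions made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <random>
#include <vector>

// Hypothetical helper (not part of Shogun): draws count distinct indexes from
// [0, total_count) with a partial Fisher-Yates shuffle and returns them sorted.
std::vector<int> sample_sorted_indices(int count, int total_count,
                                       std::mt19937& rng)
{
    assert(count >= 0 && count <= total_count);
    std::vector<int> idx(total_count);
    for (int i = 0; i < total_count; ++i)
        idx[i] = i;
    // Only the first count positions need to be shuffled: position i is
    // swapped with a uniformly chosen position from [i, total_count).
    for (int i = 0; i < count; ++i)
    {
        std::uniform_int_distribution<int> pick(i, total_count - 1);
        std::swap(idx[i], idx[pick(rng)]);
    }
    idx.resize(count);
    std::sort(idx.begin(), idx.end());
    return idx;
}
```

Stopping after count swaps keeps the cost proportional to the number of landmarks plus the initial fill, and sorting the result gives the landmark indexes in ascending order.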

Member Data Documentation

SGVector<float64_t> m_eigenvalues
protected

last embedding eigenvalues

Definition at line 140 of file MultidimensionalScaling.h.

bool m_landmark
protected

use landmark approximation?

Definition at line 143 of file MultidimensionalScaling.h.

int32_t m_landmark_number
protected

number of landmarks

Definition at line 146 of file MultidimensionalScaling.h.


The documentation for this class was generated from the following files:

MultidimensionalScaling.h
MultidimensionalScaling.cpp

SHOGUN Machine Learning Toolbox - Documentation