
Under consideration for publication in Knowledge and Information Systems

Semi-Supervised Learning by Disagreement

Zhi-Hua Zhou and Ming Li

National Key Laboratory for Novel Software Technology

Nanjing University, Nanjing 210093, China

Abstract. In many real-world tasks there are abundant unlabeled examples but the number of labeled training examples is limited, because labeling the examples requires human effort and expertise. So, semi-supervised learning, which tries to exploit unlabeled examples to improve learning performance, has become a hot topic. Disagreement-based semi-supervised learning is an interesting paradigm, where multiple learners are trained for the task and the disagreements among the learners are exploited during the semi-supervised learning process. This survey article provides an introduction to research advances in this paradigm.

Keywords: Machine Learning; Data Mining; Semi-Supervised Learning; Disagreement-Based Semi-Supervised Learning

1. Introduction

In traditional supervised learning, hypotheses are learned from a large number of training examples. Each training example has a label which indicates the desired output of the event described by the example. In classification, the label indicates the category into which the corresponding example falls; in regression, the label is a real-valued output such as temperature, height, price, etc.

Advances in data collection and storage technology enable the easy accumulation of a large amount of training instances without labels in many real-world applications. Assigning labels to those unlabeled examples is expensive because the labeling process requires human effort and expertise. For example, in computer-aided medical diagnosis, a large number of X-ray images can be obtained from routine examination, yet it is difficult to ask physicians to mark all focuses in all images.

Received October 16, 2008

Revised March 16, 2009

Accepted April 03, 2009

If we use traditional supervised learning techniques to build a diagnosis system, then only a small portion of the training data, on which the focuses have been marked, is useful. Due to the limited amount of labeled training examples, it may be difficult to get a strong diagnosis system. Then, a question arises: can we leverage the abundant unlabeled training examples with a few labeled training examples to generate a strong hypothesis? Roughly speaking, there are three major techniques for this purpose [82], i.e., semi-supervised learning, transductive learning and active learning.

Semi-supervised learning [21,92] deals with methods for automatically exploiting unlabeled data in addition to labeled data to improve learning performance, where no human intervention is assumed. Transductive learning is a cousin of semi-supervised learning, which also tries to exploit unlabeled data automatically. The main difference between them lies in the different assumptions on the test data. Transductive learning takes a "closed-world" assumption, i.e., the test data set is known in advance and the goal of learning is to optimize the generalization ability on this test data set, while the unlabeled examples are exactly the test examples. Semi-supervised learning takes an "open-world" assumption, i.e., the test data set is not known and the unlabeled examples are not necessarily test examples. In fact, the idea of transductive learning originated from statistical learning theory [69]. Vapnik [69] believed that one often wants to make predictions on the test examples at hand instead of on all potential examples, while inductive learning, which seeks the best hypothesis over the whole distribution, is a problem more difficult than what is actually needed; we should not try to solve a problem by solving a more difficult intermediate problem, and so transductive learning is more appropriate than inductive learning. Up to now there is still a debate in the machine learning community on this learning philosophy. Nevertheless, it is well recognized that transductive learning provides an important insight into the exploitation of unlabeled data.

Active learning deals with methods that assume that the learner has some control over the input space. In exploiting unlabeled data, it requires an oracle, such as a human expert, from which the ground-truth labels of instances can be queried. The goal of active learning is to minimize the number of queries for building a strong learner. Here, the key is to select those unlabeled examples whose labeling will convey the most helpful information to the learner. There are two major schemes, i.e., uncertainty sampling and committee-based sampling. Approaches of the former train a single learner and then query the unlabeled example on which the learner is least confident [45]; approaches of the latter generate multiple learners and then query the unlabeled example on which the learners disagree the most [1,63].
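To make the two sampling schemes concrete, the sketch below (Python; an illustrative simplification rather than code from any cited work, assuming scikit-learn-style classifiers with predict_proba and predict) selects a query index by uncertainty sampling and by committee-based sampling with vote entropy.

import numpy as np

def uncertainty_sampling(clf, X_unlabeled):
    """Pick the instance on which the single learner is least confident."""
    proba = clf.predict_proba(X_unlabeled)
    confidence = proba.max(axis=1)          # confidence of the predicted class
    return int(np.argmin(confidence))       # index of the least confident instance

def committee_sampling(committee, X_unlabeled):
    """Pick the instance on which the committee members disagree the most,
    measured here by vote entropy."""
    votes = np.array([clf.predict(X_unlabeled) for clf in committee])  # (n_learners, n_instances)
    disagreement = []
    for i in range(votes.shape[1]):
        _, counts = np.unique(votes[:, i], return_counts=True)
        p = counts / counts.sum()
        disagreement.append(-(p * np.log(p)).sum())                    # vote entropy per instance
    return int(np.argmax(disagreement))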

In this survey article, we will introduce an interesting and important semi-supervised learning paradigm, i.e., disagreement-based semi-supervised learning. This line of research started from Blum and Mitchell's seminal paper on co-training [13], which won the "ten years best paper award" at ICML'08. Different relevant approaches have been developed under different names, and recently the name disagreement-based semi-supervised learning was coined [83] to reflect the fact that they are actually in the same family, and that the key for the learning process to proceed is to maintain a large disagreement between the base learners. Although transductive learning or active learning may be involved in some places, we will not discuss them further. In the following we will start with a brief introduction to semi-supervised learning, and then we will go to the main theme to introduce representative disagreement-based semi-supervised learning approaches, theoretical foundations, and some applications to real-world tasks.

2. Semi-Supervised Learning

In semi-supervised learning, a labeled training data set L = {(x_1, y_1), (x_2, y_2), ..., (x_{|L|}, y_{|L|})} and an unlabeled training data set U = {x'_1, x'_2, ..., x'_{|U|}} are presented to the learning algorithm to construct a function f: X → Y for predicting the labels of unseen instances, where X and Y are respectively the input space and the output space, x_i, x'_j ∈ X (i = 1, 2, ..., |L|, j = 1, 2, ..., |U|) are d-dimensional feature vectors drawn from X, and y_i ∈ Y is the label of x_i; usually |L| ≪ |U|.

It is well known that semi-supervised learning originated from [64]. In fact, some straightforward use of unlabeled examples appeared even earlier [40,50,52,53,57]. Due to the difficulties in incorporating unlabeled data directly into conventional supervised learning methods (e.g., BP neural networks) and the lack of a clear understanding of the value of unlabeled data in the learning process, the study of semi-supervised learning attracted attention only after the middle of the 1990s. As the demand for automatic exploitation of unlabeled data increased and the value of unlabeled data was disclosed by some early analyses [54,78], semi-supervised learning has become a hot topic.

Most early studies did not provide insight or explanation into the reason why unlabeled data can be beneficial. Miller and Uyar [54] provided possibly the first explanation of the usefulness of unlabeled data, from the perspective of data distribution estimation. They assumed that the data came from a Gaussian mixture model with L mixture components, i.e.,

f(x \mid \theta) = \sum_{l=1}^{L} \alpha_l f(x \mid \theta_l),    (1)

where α_l is the mixture coefficient satisfying \sum_{l=1}^{L} \alpha_l = 1, while θ = {θ_l} are the model parameters. In this case, the label c_i can be considered as a random variable C whose distribution P(c_i | x_i, m_i) is determined by the mixture component m_i and the feature vector x_i. The optimal classification rule for this model is the MAP (maximum a posteriori) criterion, that is,

h(x) = \arg\max_k \sum_j P(c_i = k \mid m_i = j, x_i) P(m_i = j \mid x_i),    (2)

where

P(m_i = j \mid x_i) = \frac{\alpha_j f(x_i \mid \theta_j)}{\sum_{l=1}^{L} \alpha_l f(x_i \mid \theta_l)}.    (3)

Thus, the objective of learning is accomplished by estimating the terms P(c_i = k | m_i = j, x_i) and P(m_i = j | x_i) from the training data. It can be found that only the estimate of the first probability involves the class label. So, unlabeled examples can be used to improve the estimate of the second probability, and hence improve the performance of the learned classifier. Later, Zhang and Oles [78] analyzed the value of unlabeled data for parametric models. They suggested that if a parametric model can be decomposed as P(x, y | θ) = P(y | x, θ) P(x | θ), the use of unlabeled examples can help to reach a better estimate of the model parameters.
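To make Eqs. 2 and 3 above (Miller and Uyar's mixture-model view) concrete, the sketch below (Python with NumPy/SciPy; all parameter values are assumed for illustration and would in practice be estimated, e.g., by EM over labeled and unlabeled data) computes the component posterior of Eq. 3 and the MAP rule of Eq. 2.

import numpy as np
from scipy.stats import multivariate_normal

# assumed mixture parameters (illustrative only)
alphas = np.array([0.5, 0.5])                        # mixture coefficients, sum to 1
means  = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
covs   = [np.eye(2), np.eye(2)]
# P(c = k | m = j): rows are components, columns are classes (estimated from labeled data)
p_class_given_comp = np.array([[0.9, 0.1],
                               [0.2, 0.8]])

def component_posterior(x):
    """Eq. 3: P(m = j | x) for every mixture component j."""
    dens = np.array([a * multivariate_normal.pdf(x, mean=m, cov=c)
                     for a, m, c in zip(alphas, means, covs)])
    return dens / dens.sum()

def map_classify(x):
    """Eq. 2: h(x) = argmax_k sum_j P(c = k | m = j, x) P(m = j | x)."""
    post = component_posterior(x)                    # shape (n_components,)
    class_scores = post @ p_class_given_comp         # shape (n_classes,)
    return int(np.argmax(class_scores))

print(map_classify(np.array([2.5, 2.7])))            # -> 1 under these assumed parameters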

There are two basic assumptions in semi-supervised learning, that is, the cluster assumption and the manifold assumption. The former assumes that data within the same cluster should have the same class label; the latter assumes that data lying close to each other on a manifold should have similar outputs. The cluster assumption concerns classification, while the manifold assumption can also be applied to tasks other than classification. In some sense, the manifold assumption is a generalization of the cluster assumption. These assumptions are closely related to the idea of low-density separation, which has been adopted by many semi-supervised learning algorithms. No matter which assumption is taken, the common underlying belief is that the unlabeled data provide some helpful information on the ground-truth data distribution. So, a key to semi-supervised learning is to exploit the distributional information disclosed by unlabeled examples.

Many semi-supervised learning algorithms have been developed. Roughly speaking, they fall into four categories, i.e., generative methods [54,56,64], S3VMs (Semi-Supervised Support Vector Machines) [22,37,42,44], graph-based methods [7-9,80,93], and disagreement-based methods [13,16,36,48,55,85,88,89,91].

In generative approaches, both labeled and unlabeled examples are assumed to be generated by the same parametric model. Thus, the model parameters directly link the unlabeled examples and the learning objective. Methods in this category usually treat the labels of the unlabeled data as missing values of the model parameters, and employ the EM (expectation-maximization) algorithm [29] to conduct maximum likelihood estimation of the model parameters. The methods differ from each other in the generative models used to fit the data, for example, mixture of Gaussians [64], mixture of experts [54], Naïve Bayes [56], etc. The generative methods are simple and easy to implement, and may achieve better performance than discriminative models when learning with a very small number of labeled examples. However, methods in this category suffer from a serious deficiency: when the model assumption is incorrect, fitting the model using a large number of unlabeled data will result in performance degradation [23,26]. Thus, in order to make them effective in real-world applications, one needs to determine the correct generative model to use based on domain knowledge. There are also attempts to combine the advantages of generative and discriminative approaches [4,33].
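The following is a minimal sketch of this idea (Python/NumPy; it assumes Gaussian class-conditional densities with unit spherical covariance and integer class labels 0..K-1, a simplification rather than the exact models of [54] or [56]): the labels of the unlabeled data are treated as missing and filled in softly by EM, while the labeled examples keep their given labels.

import numpy as np

def semi_supervised_em(X_l, y_l, X_u, n_classes, n_iter=50):
    # initialize class priors and means from the labeled data only
    priors = np.array([(y_l == k).mean() for k in range(n_classes)])
    means = np.array([X_l[y_l == k].mean(axis=0) for k in range(n_classes)])
    for _ in range(n_iter):
        # E-step: soft labels for the unlabeled data under the current model
        # (unit-variance spherical Gaussians keep the sketch short)
        logp = -0.5 * ((X_u[:, None, :] - means[None]) ** 2).sum(-1) + np.log(priors)
        resp = np.exp(logp - logp.max(axis=1, keepdims=True))
        resp /= resp.sum(axis=1, keepdims=True)          # shape (n_unlabeled, n_classes)
        # M-step: re-estimate parameters from labeled data (hard labels)
        # plus unlabeled data (soft labels)
        hard = np.eye(n_classes)[y_l]                    # one-hot encoding of the given labels
        weights = np.vstack([hard, resp])
        X_all = np.vstack([X_l, X_u])
        priors = weights.mean(axis=0)
        means = (weights.T @ X_all) / weights.sum(axis=0)[:, None]
    return priors, means, resp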

S3VMs try to use unlabeled data to adjust the decision boundary learned from the small number of labeled examples, such that it goes through the less dense region while keeping the labeled data correctly classified. Joachims [42] proposed TSVM (Transductive Support Vector Machine). This algorithm first initializes an SVM using the labeled examples and assigns potential labels to the unlabeled data. Then, it iteratively maximizes the margin over both labeled and unlabeled data with their potential labels by flipping the labels of the unlabeled examples on different sides of the decision boundary. An optimal solution is reached when the decision boundary not only classifies the labeled data as accurately as possible but also avoids going through high-density regions. Chapelle and Zien [22] derived a special graph kernel using the low-density separation criterion, and employed gradient descent to solve the SVM optimization problem. The non-convexity of the loss function of TSVM leads to the fact that there are many local optima. Many studies have tried to reduce the negative influence caused by the non-convexity. Typical methods include: employing a continuation approach, which begins by minimizing an easy convex objective function and sequentially deforms it into the non-convex loss function of TSVM [20]; employing a deterministic annealing approach, which decomposes the original optimization problem into a series of convex optimization problems, from easy to hard, and solves them sequentially [65,66]; employing the convex-concave procedure (CCCP) [77] to directly optimize the non-convex loss function [25], etc.

The first graph-based semi-supervised learning method is possibly [11]. Blum and Chawla [11] constructed a graph whose nodes are the training examples (both labeled and unlabeled) and whose edges reflect certain relations, such as similarity, between the corresponding examples. Based on the graph, the semi-supervised learning problem can be addressed by seeking the minimum cut of the graph such that nodes in each connected component have the same label. Later, Blum et al. [12] perturbed the graph with some randomness and produced a "soft" minimum cut using majority voting. Note that the predictive function in [11] and [12] is discrete, i.e., the prediction on an unlabeled example should be one of the possible labels. Zhu et al. [93] extended the discrete prediction function to the continuous case. They modelled the distribution of the prediction function over the graph with Gaussian random fields and analytically showed that the prediction function with the lowest energy should have the harmonic property. They designed a label propagation strategy over the graph using this harmonic property. Zhou et al. [80] defined a quadratic loss of the prediction function over both the labeled and unlabeled data, and used a normalized graph Laplacian as the regularizer. They provided an iterative label propagation method yielding the same solution as the regularized loss function. Belkin and Niyogi [7] assumed that the data are distributed on a Riemannian manifold, and used the discrete spectrum and the eigenfunctions of a nearest neighbor graph to reformulate the learning problem as interpolation over the data points in Hilbert space. Then, Belkin et al. [8,9] further extended the idea of manifold learning to the semi-supervised learning scenario, and proposed the manifold regularization framework in Reproducing Kernel Hilbert Space (RKHS). This framework directly exploits the local smoothness assumption to regularize the loss function defined over the labeled training examples such that the learned prediction function is biased to give similar outputs to the examples in a local region. Sindhwani et al. [67] embedded the manifold regularization into a semi-supervised kernel defined over the overall input space. They modified the original RKHS by changing the norm while keeping the same function space. This leads to a new RKHS, in which learning supervised kernel machines with only the labeled data is equivalent to a certain manifold regularization over both labeled and unlabeled data in the original input space.
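As an illustration of the label propagation idea, the sketch below (Python/NumPy; the fully connected RBF-weighted graph and the binary 0/1 labels are simplifying assumptions, not the exact construction of [93]) computes the harmonic solution f_u = (D_uu - W_uu)^{-1} W_ul f_l over the graph.

import numpy as np

def harmonic_label_propagation(X_l, y_l, X_u, sigma=1.0):
    X = np.vstack([X_l, X_u])
    n_l = len(X_l)
    # RBF edge weights over all pairs of nodes (labeled and unlabeled)
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-sq_dists / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    D = np.diag(W.sum(axis=1))
    # split the graph Laplacian into labeled (l) and unlabeled (u) blocks
    Lap = D - W
    L_uu = Lap[n_l:, n_l:]
    W_ul = W[n_l:, :n_l]
    # harmonic solution: each unlabeled score is the weighted average of its neighbors
    f_u = np.linalg.solve(L_uu, W_ul @ y_l.astype(float))
    return (f_u > 0.5).astype(int)          # threshold the soft labels in the binary case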

Most of the previous studies on graph-based semi-supervised learning focus on how to conduct semi-supervised learning over a given graph. It is noteworthy that how to construct a graph which reflects the essential relationship between examples is a key issue that seriously affects the learning performance. Although graph construction might favor certain domain knowledge, some researchers have attempted to construct graphs of high quality using domain-knowledge-independent properties. Carreira-Perpinan and Zemel [19] generated multiple minimum spanning trees based on perturbation to construct a robust graph. Wang and Zhang [70] used the idea of LLE [60], that instances can be reconstructed from their neighbors, to obtain weights over the edges of the graph.

Zhang and Lee [79] selected a better RBF bandwidth to minimize the predictive error on labeled data using cross validation. Hein and Maier [39] attempted to remove noisy data and hence obtained a better graph. Note that, although graph-based semi-supervised learning approaches have been used in many applications, they suffer seriously from poor scalability. This deficiency has been noticed and some efforts have been devoted to this topic [34,76,94]. Recently, Goldberg et al. [35] proposed an online manifold regularization framework as well as efficient solutions, which improves the applicability of manifold regularization to large-scale and real-time problems.

The name disagreement-based semi-supervised learning was coined recently by Zhou [83], but this line of research started from Blum and Mitchell's seminal work [13]. In those approaches, multiple learners are trained for the same task and the disagreements among the learners are exploited during the learning process. Here, unlabeled data serve as a kind of "platform" for information exchange. If one learner is much more confident on a disagreed unlabeled example than the other learner(s), then this learner will teach the other(s) with this example; if all learners are comparably confident on a disagreed unlabeled example, then this example may be selected for query. Since methods in this category suffer neither from the model assumption violation, nor from the non-convexity of the loss function, nor from the poor scalability of the learning algorithms, disagreement-based semi-supervised learning has become an important learning paradigm. In the following sections, we will review studies of this paradigm in more detail.

3. Disagreement-Based Semi-Supervised Learning

A key to disagreement-based semi-supervised learning is to generate multiple learners, let them collaborate to exploit unlabeled examples, and maintain a large disagreement between the base learners. In this section, we roughly classify existing disagreement-based semi-supervised learning techniques into three categories, that is, learning with multiple views, learning with a single view and multiple classifiers, and learning with a single view and multiple regressors.

3.1. Learning with Multiple Views

In some applications, the data set has several disjoint subsets of attributes (each subset is called a view). For example, the web page classification task has two views, i.e., the text appearing on the web page itself and the anchor text attached to hyperlinks pointing to this page [13]. Naturally, we can generate multiple learners from these multiple views and then use the multiple learners to start disagreement-based semi-supervised learning. Note that there has been abundant research on multi-view learning, yet much of it is irrelevant to semi-supervised learning, and so it is not mentioned in this section.

The first algorithm of this paradigm is the co-training algorithm proposed by Blum and Mitchell [13]. They assumed that the data have two sufficient and redundant views (i.e., attribute sets), where each view is sufficient for training a strong learner and the views are conditionally independent of each other given the class label.

The co-training procedure, which is illustrated in Fig. 1, is rather simple. In co-training, each learner is generated using the original labeled data.


Fig. 1. An illustration of the co-training procedure

Then, each learner selects and labels some highly confident unlabeled examples for its peer learner. Later, the learners are refined using the newly labeled examples provided by their peers. With such a process, when the two learners disagree on an unlabeled example, the learner which misclassifies this example will be taught by its peer. The whole process is repeated until neither learner changes or a pre-set number of learning rounds has been executed. Blum and Mitchell [13] analyzed the effectiveness of the co-training algorithm, and showed that co-training can effectively exploit unlabeled data to improve the generalization ability, given that the training data are described by sufficient and redundant views which are conditionally independent of each other given the class label.
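As a concrete illustration, the sketch below (Python; an illustrative simplification rather than the exact procedure of [13], assuming scikit-learn-style estimators with fit, predict_proba and classes_) shows the mutual-teaching loop: each classifier labels its most confident unlabeled examples for its peer's view.

import numpy as np

def co_train(clf1, clf2, X1_l, X2_l, y_l, X1_u, X2_u, n_per_round=5, n_rounds=30):
    L1_X, L1_y = [X1_l], [y_l]              # labeled pool for the view-1 learner
    L2_X, L2_y = [X2_l], [y_l]              # labeled pool for the view-2 learner
    unlabeled = np.arange(len(X1_u))
    for _ in range(n_rounds):
        if len(unlabeled) == 0:
            break
        clf1.fit(np.vstack(L1_X), np.concatenate(L1_y))
        clf2.fit(np.vstack(L2_X), np.concatenate(L2_y))
        for teacher, X_view, peer_pool, peer_L_X, peer_L_y in (
                (clf1, X1_u, X2_u, L2_X, L2_y),
                (clf2, X2_u, X1_u, L1_X, L1_y)):
            if len(unlabeled) == 0:
                break
            proba = teacher.predict_proba(X_view[unlabeled])
            top = np.argsort(-proba.max(axis=1))[:n_per_round]   # most confident positions
            chosen = unlabeled[top]
            peer_L_X.append(peer_pool[chosen])                   # teacher labels them for its peer
            peer_L_y.append(teacher.classes_[proba[top].argmax(axis=1)])
            unlabeled = np.setdiff1d(unlabeled, chosen)
    return clf1, clf2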

Another famous multi-view semi-supervised learning algorithm, co-EM [55], combines multi-view learning with the probabilistic EM approach. This algorithm requires the base learners to be capable of estimating class probabilities, and so Naïve Bayes classifiers are generally used. By casting linear classifiers into a probabilistic framework, Brefeld and Scheffer [15] replaced the Naïve Bayes classifiers with support vector machines. The co-EM algorithm has also been applied to unsupervised clustering [10].

Brefeld et al. [14] constructed a hidden Markov perceptron [3] in each of the two views, where the two hidden Markov perceptrons were updated according to the heuristic that, if the two perceptrons disagree on an unlabeled example, each perceptron is moved towards that of its peer view. Brefeld et al. [14] did not mention how to extend this method to more than two views, but following the essence of their heuristic, it should be possible to move the perceptrons towards their median peer view when they disagree. However, the convergence of the process has not been proved even for the two-view case. Brefeld and Scheffer [16] extended SVM-2K [32], a supervised co-SVM that minimizes the training error as well as the disagreement between the two views, to semi-supervised learning and applied it to several tasks involving structured output variables such as multi-class classification, label sequence learning and natural language parsing.

In real-world applications, when the data have two views, it is rarely the case that the two views are conditionally independent given the class label. Even weak conditional independence [2] is difficult to meet in practice. In fact, the assumption of sufficient and redundant views which are conditionally independent of each other given the class label is so strong that, when it holds, a single labeled training example is able to launch successful semi-supervised learning [91].

Zhou et al. [91] effectively exploited the "compatibility" of the two views to turn some unlabeled examples into labeled ones. Specifically, given two sufficient and redundant views v_1 and v_2 (in this case, an instance is represented by x = (x^{(v_1)}, x^{(v_2)})), a prediction function f_{v_i} is learned from each view respectively. Since the two views are sufficient, the learned prediction functions satisfy f_{v_1}(x^{(v_1)}) = f_{v_2}(x^{(v_2)}) = y, where y is the ground-truth label of x. Intuitively, some projections in these two views should have strong correlation with the ground-truth. For either view, there should exist at least one projection which is strongly correlated with the ground-truth, since otherwise this view could not be sufficient. Since the two sufficient views are conditionally independent given the class label, the most strongly correlated pair of projections should be in accordance with the ground-truth. Thus, if such highly correlated projections of the two views can be identified, they can help induce the labels of some unlabeled examples. With those additional labeled examples, two learners can be generated, and then they can be improved using the standard co-training routine, i.e., if the learners disagree on an unlabeled example, the learner which misclassifies this example will be taught by its peer. To identify the correlated projections, Zhou et al. [91] employed kernel canonical correlation analysis (KCCA) [38] to find two sets of basis vectors in the feature space, one for each view, such that after projecting the two views onto the corresponding sets of basis vectors, the correlation between the projected views is maximized. Here the correlation strength (λ) of the projections is also given by KCCA. Instead of considering only the most highly correlated projection, they used the top m projections with the m highest correlation strengths. Finally, by linearly combining the similarity in each projection, they computed the confidence ρ_i of each unlabeled example x_i being of the same label as the single labeled positive example x_0, as shown in Eq. 4,

\rho_i = \sum_{j=1}^{m} \lambda_j \, sim_{i,j},    (4)

where

sim_{i,j} = \exp\left(-d^2\left(P_j(x_i^{(v_1)}), P_j(x_0^{(v_1)})\right)\right) + \exp\left(-d^2\left(P_j(x_i^{(v_2)}), P_j(x_0^{(v_2)})\right)\right)    (5)

and d(a, b) measures the Euclidean distance between a and b.

Thus, several unlabeled examples with the highest and lowest confidence values can be picked out and used as extra positive and negative examples, respectively. Based on this augmented labeled training set, standard co-training can be employed for semi-supervised learning. Again, intuitively, when the two learners disagree on an unlabeled example, the learner which misclassifies this example will be taught by its peer. This kind of method has been applied to content-based image retrieval [91], where there is only one example image in the first round of query.
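A rough sketch of the confidence computation of Eqs. 4-5 is given below (Python; scikit-learn's linear CCA is used here as a stand-in for the KCCA step of [91], and x0_index marks the assumed single labeled positive example).

import numpy as np
from sklearn.cross_decomposition import CCA

def correlation_confidence(X1, X2, x0_index, m=3):
    """X1, X2: the two views of all instances; x0_index: index of the single labeled positive example."""
    cca = CCA(n_components=m)
    P1, P2 = cca.fit_transform(X1, X2)               # projections of the two views
    # correlation strength of each pair of projected components (the lambdas in Eq. 4)
    lambdas = np.array([np.corrcoef(P1[:, j], P2[:, j])[0, 1] for j in range(m)])
    rho = np.zeros(len(X1))
    for j in range(m):
        d1 = (P1[:, j] - P1[x0_index, j]) ** 2       # squared distance to x0 in the view-1 projection
        d2 = (P2[:, j] - P2[x0_index, j]) ** 2       # squared distance to x0 in the view-2 projection
        sim = np.exp(-d1) + np.exp(-d2)              # Eq. 5
        rho += lambdas[j] * sim                      # Eq. 4
    return rho   # high rho: likely same label as x0; lowest rho: candidate negative examples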

3.2. Learning with Single View Multiple Classifiers

In most real-world applications the data sets have only one attribute set rather than two, so the effectiveness and usefulness of standard co-training are limited. To take advantage of the interaction between learners when exploiting unlabeled data, methods that do not rely on the existence of two views have been developed.

A straightforward way to tackle this problem is to partition the attribute set into two disjoint sets, and conduct standard co-training based on the manually generated views. Nigam and Ghani [55] empirically studied the performance of the standard co-training algorithm in this case. The experimental results suggested that when the attribute set is sufficiently large, randomly splitting the attributes and then conducting standard co-training may lead to good performance. However, many applications are not described by a large number of attributes, and co-training on randomly partitioned views is not always effective. Thus, a better way is to design single-view methods that can exploit the interaction between multiple learners, rather than tailoring the data sets for standard two-view co-training.

Goldman and Zhou [36] proposed a method that does not rely on two views. They employed different learning algorithms to train the two classifiers. It is required that each classifier is able to partition the instance space into a number of equivalence classes. In order to identify which unlabeled example to label, and to decide how to make the prediction when the two classifiers disagree, ten-fold cross validation is executed such that the confidences of the two classifiers, as well as the confidences of the equivalence classes that contain the concerned instance, can be estimated. Later, this idea was extended to involve more learning algorithms [81]. Note that although [36] does not rely on the existence of two views, it requires special learning algorithms to construct the classifiers. This prevents its application with other kinds of learning algorithms.

Zhou and Li [88] proposed the tri-training method, which requires neither the existence of two views nor special learning algorithms, and thus can be applied to more real-world problems. In contrast to the previous studies [13,36,55], tri-training attempts to exploit unlabeled data using three classifiers. Such a setting tackles the problem of how to efficiently select the most confidently predicted unlabeled examples to label, and how to produce the final hypothesis. Note that the essence of tri-training is extensible to more than three classifiers, which will be introduced later. The use of more classifiers also provides a chance to employ ensemble learning techniques [84] to improve the performance of semi-supervised learning.

Generally, tri-training works in the following way. First, three classifiers are initially trained from the original labeled data. Unlike [36], tri-training uses the same learning algorithm (e.g., C4.5 decision tree) to generate the three classifiers. In order to make the three classifiers diverse, the original labeled example set is bootstrap sampled [31] to produce three perturbed training sets, on each of which a classifier is then generated. The generation of the initial classifiers is similar to training an ensemble from the labeled example set using Bagging [17]. Then, intuitively, in each tri-training round, if two classifiers agree on the labeling of an unlabeled example while the third one disagrees, these two classifiers will teach the third classifier on this example. Finally, the three classifiers are combined by majority voting. Note that the "majority teaches minority" strategy serves as an implicit confidence measurement, which avoids the use of complicated, time-consuming approaches to explicitly measure the predictive confidence, and hence the training process is efficient.

Such an implicit measurement, however, might not be as accurate as an explicit estimation, since sometimes the "minority holds the truth". Thus, some additional control is needed to reduce the negative influence of incorrectly labeled examples. Zhou and Li [88] analytically showed that the negative influence can be compensated if the amount of newly labeled examples is sufficient under certain conditions.

Inspired by [36], Zhou and Li [88] derived the criterion based on theoretical results of learning from noisy examples [5]. In detail, if a sequence σ of m samples is drawn, where the sample size m satisfies Eq. 6,

m \ge \frac{2}{\epsilon^2 (1 - 2\eta)^2} \ln\left(\frac{2N}{\delta}\right),    (6)

where ε is the hypothesis worst-case classification error rate, η (< 0.5) is an upper bound on the classification noise rate, N is the number of hypotheses, and δ is the confidence, then a hypothesis H_i that minimizes disagreement with σ will have the PAC property, i.e.,

Pr[d(H_i, H^*) \ge \epsilon] \le \delta,    (7)

where d(·,·) is the sum over the probability of elements from the symmetric difference between the two hypothesis sets H_i and H^* (the ground-truth). Let c = 2μ ln(2N/δ), where μ makes Eq. 6 hold with equality. After some reformulation, Eq. 6 becomes Eq. 8:

u = \frac{c}{\epsilon^2} = m (1 - 2\eta)^2.    (8)

For each classifier, in order to keep improving the performance in the training process, the u value of the current round should be greater than that of the previous round. Let L_t and L_{t-1} denote the newly labeled data sets of a classifier in the t-th round and the (t-1)-th round, respectively. Then the training sets for this classifier in the t-th round and the (t-1)-th round are L ∪ L_t, of size |L ∪ L_t|, and L ∪ L_{t-1}, of size |L ∪ L_{t-1}|, respectively. Let \hat{e}_t and \hat{e}_{t-1} denote the upper bounds of the classification error rate of the hypothesis derived from the combination of the other two classifiers in the t-th round and the (t-1)-th round, respectively. By comparing Eq. 8 in subsequent rounds, the condition under which a classifier's performance can be improved through the refinement in the t-th round is

0 < \frac{\hat{e}_t}{\hat{e}_{t-1}} < \frac{|L_{t-1}|}{|L_t|} < 1.    (9)

Such a condition is used as the stopping criterion for the tri-training algorithm. If none of the three classifiers satisfies the condition shown in Eq. 9 in the t-th round, tri-training stops and outputs the learned classifiers. Note that Eq. 9 sometimes cannot be satisfied because |L_t| is far bigger than |L_{t-1}|, rather than because \hat{e}_t is higher than \hat{e}_{t-1}. When this happens, in order not to stop training before the error rate of the classifier becomes low, L_t is randomly subsampled to size s according to Eq. 10 to make Eq. 9 hold again,

s = \frac{\hat{e}_{t-1} |L_{t-1}|}{\hat{e}_t} - 1,    (10)

Table 1. Pseudo-code describing the tri-training algorithm [88]

tri-training(L, U, Learn)
Input:  L: Original labeled example set
        U: Unlabeled example set
        Learn: Learning algorithm

for i ∈ {1..3} do
    S_i ← BootstrapSample(L)
    h_i ← Learn(S_i)
    e'_i ← 0.5; l'_i ← 0
end of for
repeat until none of h_i (i ∈ {1..3}) changes
    for i ∈ {1..3} do
        L_i ← ∅; update_i ← FALSE
        e_i ← MeasureError(h_j & h_k)  (j, k ≠ i)
        if (e_i < e'_i)                              % otherwise Eq. 9 is violated
            then for every x ∈ U do
                     if h_j(x) = h_k(x)  (j, k ≠ i)
                         then L_i ← L_i ∪ {(x, h_j(x))}
                 end of for
                 if (l'_i = 0)                       % h_i has not been updated before
                     then l'_i ← e_i / (e'_i − e_i) + 1            % refer to Eq. 11
                 if (l'_i < |L_i|)                   % otherwise Eq. 9 is violated
                     then if (e_i |L_i| < e'_i l'_i)               % otherwise Eq. 9 is violated
                              then update_i ← TRUE
                          else if l'_i > e_i / (e'_i − e_i)        % refer to Eq. 11
                              then L_i ← Subsample(L_i, e'_i l'_i / e_i − 1)   % refer to Eq. 10
                                   update_i ← TRUE
    end of for
    for i ∈ {1..3} do
        if update_i = TRUE
            then h_i ← Learn(L ∪ L_i); e'_i ← e_i; l'_i ← |L_i|
    end of for
end of repeat
Output: h(x) ← arg max_{y ∈ label} Σ_{i: h_i(x) = y} 1

where L_{t-1} should satisfy Eq. 11 such that the size of L_t after subsampling, i.e., s, is still bigger than |L_{t-1}|:

|L_{t-1}| > \frac{\hat{e}_t}{\hat{e}_{t-1} - \hat{e}_t}.    (11)

The pseudo-code of the tri-training algorithm is shown in Table 1. The function MeasureError(h_j & h_k) estimates the classification error rate of the hypothesis derived from the combination of h_j and h_k. The function Subsample(L_t, s) randomly removes |L_t| − s examples from L_t, where s is computed according to Eq. 10.
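For readers who prefer code over pseudo-code, the condensed Python sketch below mirrors the main loop of Table 1; the Eq. 9-11 bookkeeping is simplified to its core test, so this is an illustration rather than a faithful reimplementation. It assumes scikit-learn-style classifiers, integer class labels 0..K-1, and a make_clf callable that returns a fresh classifier.

import numpy as np
from sklearn.utils import resample

def tri_train(make_clf, X_l, y_l, X_u, max_rounds=20):
    # initial classifiers from bootstrap samples of the labeled data
    clfs = [make_clf().fit(*resample(X_l, y_l)) for _ in range(3)]
    prev_err, prev_size = [0.5] * 3, [0] * 3
    for _ in range(max_rounds):
        updated = False
        for i in range(3):
            j, k = [t for t in range(3) if t != i]
            # error of the combined pair (h_j, h_k), estimated on the original labeled set
            pj_l, pk_l = clfs[j].predict(X_l), clfs[k].predict(X_l)
            agree = pj_l == pk_l
            err = ((pj_l != y_l) & agree).sum() / max(agree.sum(), 1)
            if err >= prev_err[i]:
                continue                      # Eq. 9 cannot hold in this round
            # unlabeled examples on which the two peers agree teach classifier i
            pj_u, pk_u = clfs[j].predict(X_u), clfs[k].predict(X_u)
            mask = pj_u == pk_u
            Li_X, Li_y = X_u[mask], pj_u[mask]
            # core of the Eq. 9 test: err * |L_i| < prev_err * |previous L_i|
            if prev_size[i] and not (err * len(Li_y) < prev_err[i] * prev_size[i]):
                continue
            clfs[i] = make_clf().fit(np.vstack([X_l, Li_X]),
                                     np.concatenate([y_l, Li_y]))
            prev_err[i], prev_size[i] = err, len(Li_y)
            updated = True
        if not updated:
            break
    def predict(X):
        votes = np.stack([c.predict(X) for c in clfs])            # majority voting
        return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
    return predict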

As mentioned before, the essence of tri-training is extensible to many more classifiers.

Li and Zhou [48] proposed the Co-Forest algorithm, which extends tri-training to the collaboration of many classifiers in the training process. By using an ensemble of classifiers, several immediate benefits can be achieved. First, for each classifier h_i, its concomitant ensemble H_i, i.e., the ensemble of all the other classifiers, is used to label several unlabeled examples for this classifier. As an ensemble of classifiers usually achieves better generalization than a single classifier [73,84], the labeling of unlabeled data becomes more reliable, and thus classifier refinement using these high-quality newly labeled data can lead to better performance. Second, without employing sophisticated and time-consuming methods to estimate the predictive confidence, the use of multiple classifiers enables an efficient estimation of the predictive confidence. Such an explicit estimate can be further exploited to guide the use of the corresponding unlabeled example in the training stage.

Although the advantages of using multiple classifiers seem promising, a straightforward extension suffers from a problem: the "majority teaches minority" process hurts the generalization ability of an ensemble of classifiers. It is known that the diversity of the learners is a key to a good ensemble [84]. During the "majority teaches minority" process, the behaviors of the learners become more and more similar, and thus the diversity of the learners decreases rapidly.

To address this problem, Li and Zhou [48] proposed to inject certain randomness into the semi-supervised learning process. They employed two strategies. First, randomness is injected into the classifier learning process, such that any two classifiers in the ensemble can be diverse even when their training data are similar. For implementation convenience, they used Random Forest [18] to construct the ensemble. Second, randomness is injected into the unlabeled data selection process. Instead of directly selecting the most highly confident unlabeled examples, some candidate examples for labeling are randomly subsampled from the original unlabeled training set to meet a condition similar to Eq. 9, and highly confident examples in the candidate pool are then selected and labeled. Thus, the learners encounter different training sets in each round. Such a strategy is helpful not only for the diversity, but also for reducing the chance of being trapped in local minima, just like a similar strategy adopted in [13].

3.3. Learning with Single View Multiple Regressors

Previous studies on semi-supervised learning mainly focus on classification tasks. Although regression is almost as important as classification, semi-supervised regression has rarely been studied. One reason is that for real-valued labels the cluster assumption is not applicable. Although methods based on the manifold assumption can be extended to regression, as pointed out by [92], these methods are essentially transductive instead of really semi-supervised, since they assume that the unlabeled examples are exactly the test examples.

Zhou and Li [87] first proposed a disagreement-based semi-supervised regression approach, Coreg, which employs two kNN regressors [27] to conduct the data labeling as well as the predictive confidence estimation. The use of kNN regressors as base learners enables efficient refinement of a regressor based on the newly labeled data from its peer regressor, since this lazy learning approach does not hold a separate training phase when updating the current regressor.

Moreover, the kNN regressor can be easily coupled with the predictive confidence estimation method.

In order to choose appropriate unlabeled examples for labeling in semi-supervised regression, the labeling confidence should be estimated such that the most confidently labeled example can be identified. In classification this is relatively straightforward because, when making classifications, many classifiers (e.g., a Naïve Bayes classifier) can also provide an estimated probability (or an approximation) for the classification. Therefore, the predictive confidence can be estimated by consulting the probabilities of the unlabeled examples being labeled to the different classes. Unfortunately, in regression there is no such estimated probability that can be used directly. This is because, in contrast to classification where the number of labels to be predicted is finite, the possible predictions in regression are infinite. Therefore, Zhou and Li [87] proposed a predictive confidence estimation criterion for their disagreement-based semi-supervised learning method.

Intuitively, the most confidently labeled example of a regressor should decrease the error of the regressor on the labeled example set the most, if this example is utilized. In other words, the most confidently labeled example should be the one which makes the regressor most consistent with the labeled example set. Thus, the mean squared error (MSE) of the regressor on the labeled example set can be evaluated first. Then, the MSE of the regressor utilizing the information provided by a newly labeled example (x_u, ŷ_u) can be evaluated on the labeled example set, where the real-valued label ŷ_u of the unlabeled instance x_u is generated by the regressor. Let Δ_u denote the result of subtracting the latter MSE from the former MSE. Note that the number of Δ_u values to be estimated equals the number of unlabeled examples. Finally, the (x_u, ŷ_u) associated with the biggest positive Δ_u can be regarded as the most confidently labeled example.

To avoid repeatedly measuring the MSE of the kNN regressor on the whole labeled training set in each iteration, an approximation is employed which computes the MSE based on only the k nearest labeled examples of an unlabeled instance. Let Ω_u denote the set of the k nearest labeled examples of x_u; then the most confidently labeled example x̃ is identified by maximizing the value of Δ_{x_u} in Eq. 12,

\Delta_{x_u} = \sum_{x_i \in \Omega_u} \left( (y_i - h(x_i))^2 - (y_i - h'(x_i))^2 \right),    (12)

where h denotes the original regressor while h' denotes the refined regressor which has utilized the information provided by (x_u, ŷ_u), with ŷ_u = h(x_u).

Another important aspect of Coreg is the diversity between the two regressors. Note that the labeling of an unlabeled example is obtained by averaging the real-valued labels of its k nearest neighbors in the labeled training set. As only a few examples are labeled at the early stage, the labeling of unlabeled data can be noisy. Zhou and Li [87,89] showed that using diverse regressors can help to reduce the negative influence of the noisy newly labeled data. Since the kNN regressor is used as the base learner, a natural way to make the kNN regressors different is to enable them to identify different vicinities, which can be achieved by manipulating the parameter settings of the kNN regressors. In [87], Minkowski distances of different orders were used to generate two diverse kNN regressors. This strategy was extended to a more general case, i.e., achieving diversity by using different distance metrics and/or different numbers of neighbors identified for a given example [89].

Table 2. Pseudo-code describing the Coreg algorithm [89]

Algorithm: Coreg
Input:  labeled example set L, unlabeled example set U,
        maximum number of learning iterations T,
        numbers of nearest neighbors k_1, k_2,
        distance metrics D_1, D_2
Process:
    L_1 ← L; L_2 ← L
    Create pool U' of size s by randomly picking examples from U
    h_1 ← kNN(L_1, k_1, D_1); h_2 ← kNN(L_2, k_2, D_2)
    Repeat for T rounds:
        for j ∈ {1, 2} do
            for each x_u ∈ U' do
                Ω_u ← Neighbors(x_u, L_j, k_j, D_j)
                ŷ_u ← h_j(x_u)
                h'_j ← kNN(L_j ∪ {(x_u, ŷ_u)}, k_j, D_j)
                Δ_{x_u} ← Σ_{x_i ∈ Ω_u} ( (y_i − h_j(x_i))² − (y_i − h'_j(x_i))² )
            end of for
            if there exists a Δ_{x_u} > 0
                then x̃_j ← arg max_{x_u ∈ U'} Δ_{x_u}; ỹ_j ← h_j(x̃_j)
                     π_j ← {(x̃_j, ỹ_j)}; U' ← U' − {x̃_j}
                else π_j ← ∅
        end of for
        L_1 ← L_1 ∪ π_2; L_2 ← L_2 ∪ π_1
        if neither of L_1 and L_2 changes then exit
        else
            h_1 ← kNN(L_1, k_1, D_1); h_2 ← kNN(L_2, k_2, D_2)
            Replenish U' to size s by randomly picking examples from U
    end of Repeat
Output: regressor h^*(x) ← (h_1(x) + h_2(x)) / 2

Additionally, such a setting brings another benefit: since it is usually difficult to decide the appropriate parameter settings of a kNN regressor for a specific task, combining regressors with different parameter settings can achieve a somewhat complementary effect.

The pseudo-code of Coreg is shown in Table 2, where kNN(L_j, k_j, D_j) is a function that returns a kNN regressor on the labeled training set L_j whose k value is k_j and whose distance metric is D_j. The learning process stops when the maximum number of learning iterations, i.e., T, is reached, or when there is no unlabeled example capable of reducing the MSE of any of the regressors on the labeled example set. A pool of unlabeled examples smaller than U is used, as was done in [13]. Note that in each iteration the unlabeled example chosen by h_1 won't be chosen by h_2, which is an extra mechanism for encouraging the diversity of the regressors. Thus, even when h_1 and h_2 are similar, the examples they label for each other will still be different.
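The sketch below (Python with scikit-learn's KNeighborsRegressor; all data and parameter values are illustrative assumptions) shows the Eq. 12 confidence computation and one mutual-teaching step of the Coreg idea.

import numpy as np
from sklearn.neighbors import KNeighborsRegressor

def most_confident_example(reg, X_l, y_l, X_pool, k, metric):
    """Return (index, pseudo-label, delta) of the pool example whose self-labeling
    most reduces the regressor's error on its k nearest labeled neighbors (Eq. 12)."""
    best = (None, None, 0.0)
    for u, x_u in enumerate(X_pool):
        y_hat = reg.predict(x_u.reshape(1, -1))[0]
        dists = np.linalg.norm(X_l - x_u, ord=metric, axis=1)   # neighbors among labeled data
        idx = np.argsort(dists)[:k]
        refined = KNeighborsRegressor(n_neighbors=k, p=metric).fit(
            np.vstack([X_l, [x_u]]), np.append(y_l, y_hat))
        delta = np.sum((y_l[idx] - reg.predict(X_l[idx])) ** 2
                       - (y_l[idx] - refined.predict(X_l[idx])) ** 2)
        if delta > best[2]:
            best = (u, y_hat, delta)
    return best

# one mutual-teaching step with two regressors that differ in their Minkowski order
X_l = np.random.rand(20, 3); y_l = np.random.rand(20); X_pool = np.random.rand(50, 3)
h1 = KNeighborsRegressor(n_neighbors=3, p=2).fit(X_l, y_l)
h2 = KNeighborsRegressor(n_neighbors=3, p=5).fit(X_l, y_l)
u, y_hat, delta = most_confident_example(h1, X_l, y_l, X_pool, k=3, metric=2)
if u is not None:   # h1's most confidently labeled example is handed to h2
    h2.fit(np.vstack([X_l, [X_pool[u]]]), np.append(y_l, y_hat))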

It is evident that the method introduced in this section is closely related to those introduced in Section 3.2. The key is to generate multiple diverse learners and then try to exploit their disagreements on unlabeled data to achieve the performance boost. Actually, "learning with multiple views" (Section 3.1) is a special case which uses multiple views to help generate multiple diverse learners.

3.4. The Combination with Active Learning

In disagreement-based semi-supervised learning approaches, the unlabeled examples that are labeled for a learner are examples on which most other learners agree but the concerned learner disagrees. If all learners disagree on the labeling of an unlabeled example, this example is simply neglected. However, it is highly probable that such an example cannot be learned well by the learning system itself. As mentioned in Section 1, active learning is another major technique for learning with labeled and unlabeled data. It actively selects some informative unlabeled examples and queries their labels from an oracle independent of the learning system. It is evident that the unlabeled example on which all learners disagree is a good candidate to query.

Zhou et al. [85,86] proposed a disagreement-based active semi-supervised learning method, Ssair, for content-based image retrieval. After obtaining a small number of labeled images from relevance feedback, they constructed two learners using the labeled images. Each learner attempts to assign a rank to all images in the imagebase; the smaller the rank, the higher the chance that the concerned image is relevant to the user query. The two most confidently irrelevant images of each learner are passed to the other learner as negative examples. Such a process is repeated and the two learners are refined. In previous relevance feedback methods, the user randomly picks some images from the retrieval result to give feedback. Zhou et al. [85,86] argued that letting the user give feedback on images that have already been learned well is not helpful for improving the performance. So, instead of passively waiting for user feedback, they actively prepare a pool of images for the user to give feedback on. The pool contains images on which the two learners make contradictory predictions with similar confidence, and images on which both learners have low confidence. Thus, in each round of relevance feedback, both semi-supervised learning and active learning are executed to exploit the images in the imagebase to the full. It is evident that although the combination of disagreement-based semi-supervised learning and active learning is simple, it provides good support for the interesting active semi-supervised relevance feedback scheme, which is useful in information retrieval tasks.
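The selection of such a feedback pool can be sketched as follows (Python; the scoring convention and thresholds are assumptions for illustration, not the exact Ssair criterion).

import numpy as np

def build_feedback_pool(scores1, scores2, pool_size=10, low_conf=0.1):
    """scores1, scores2: signed relevance scores from the two learners (positive = relevant);
    returns indices of images to present to the user for feedback."""
    contradict = np.sign(scores1) != np.sign(scores2)
    similar_conf = np.abs(np.abs(scores1) - np.abs(scores2)) < low_conf
    both_unsure = (np.abs(scores1) < low_conf) & (np.abs(scores2) < low_conf)
    candidates = np.where((contradict & similar_conf) | both_unsure)[0]
    # rank candidates by how undecided the pair of learners is overall
    order = np.argsort(np.abs(scores1[candidates]) + np.abs(scores2[candidates]))
    return candidates[order][:pool_size]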

4. Theoretical Foundations for Disagreement-Based Semi-Supervised Learning

Early theoretical analyses of the disagreement-based semi-supervised learning approaches mainly focus on the case where there exist two views.

Blum and Mitchell [13] analyzed the effectiveness of co-training. Let X_1 and X_2 denote the two sufficient and redundant views of the input space X, and hence an instance can be represented by x = (x_1, x_2) ∈ X_1 × X_2. Assume that f = (f_1, f_2) ∈ C_1 × C_2 is a target function defined over X, where C_1, C_2 are concept classes defined over X_1, X_2 respectively; then f(x) = f_1(x_1) = f_2(x_2) = y should hold due to the sufficiency of the two views, where y is the ground-truth label of x. Therefore, they defined compatibility between the target

function f = (f_1, f_2) and the unknown data distribution D, based on which co-training is analyzed. Here, f = (f_1, f_2) being compatible with D means that D assigns probability zero to any instance (x_1, x_2) such that f_1(x_1) ≠ f_2(x_2).

Fig. 2. The bipartite graph for instance distribution. Plot based on a similar figure in [13].

If the“compatibility”is satis?ed,as[13]pointed out,even if the concept classes C1and C2are large and complex(i.e.,in high VC-dimension),the set of target concepts that is compatible with D could be smaller and simpler.There-fore,unlabeled data can be used to verify which are the compatible target con-cepts,and hence lead to the reduction of the number of labeled data needed in learning.They illustrated this idea by an bipartite graph shown in Fig.2.In this graph,vertices in left hand side and right hand side denote the instances in X1 and X2,respectively.For any pair of vertices on each side of the graph,there exists an edge on between if and only if the corresponding instance(x1,x2)can be drawn with non-zero probability under the distribution D.The solid edges denote the instances observed in the?nite training set.Obviously,under this rep-resentation,the concepts that are compatible with D correspond to the graph partitions without any cross-edges.The instances in the same connected compo-nent share the same label,and only a labeled example is required to determine the labeling of this component.The value of unlabeled data is their assistance for identifying the connected components of the graph,and in fact identifying the distribution D.

Based on this bipartite representation, Blum and Mitchell [13] analytically showed that "if C_2 is learnable in the PAC model with classification noise, and if the conditional independence assumption is satisfied, then (C_1, C_2) is learnable in the co-training model from unlabeled data only, given an initial weakly useful predictor h(x_1)". This is a very strong conclusion, which implies that if the two views are conditionally independent, the predictive accuracy of an initial weak learner can be boosted to arbitrarily high with unlabeled data using co-training.

Later, Dasgupta et al. [28] analyzed the generalization bound for standard co-training. Let S be a set of i.i.d. samples. For any statement Φ[s], let S(Φ) be the subset of S that satisfies Φ. For two statements Φ and Ψ, the empirical estimate is \hat{P}(Φ | Ψ) = |S(Φ ∧ Ψ)| / |S(Ψ)|. They assumed that the data are drawn from some distribution over triples ⟨x_1, y, x_2⟩ with x_1 ∈ X_1 and x_2 ∈ X_2, and P(x_1 | y, x_2) = P(x_1 | y) and P(x_2 | y, x_1) = P(x_2 | y); in other words, the data have two views that are independent given the class label. Assume that there are k different classes, and if a learner h fails to classify x into one of the k classes, then h(x) = ⊥. Let |h| denote a complexity measure of h, and let h_1 and h_2 denote the learners constructed in X_1 and X_2, respectively. Then, they showed that with probability at least 1 − δ over the choice of S, for all pairs of learners h_1 and h_2 such that γ_i(h_1, h_2, δ/2) > 0 and b_i(h_1, h_2, δ/2) ≤ (k − 1)/k, the following inequality holds,

error(h_1) \le \left( \hat{P}(h_1 \ne \perp) - \epsilon(|h_1|, \delta/2) \right) \max_j b_j(h_1, h_2, \delta/2) + \frac{k-1}{k} \left( \hat{P}(h_1 = \perp) + \epsilon(|h_1|, \delta/2) \right)    (13)

where

\epsilon(k, \delta) = \sqrt{\frac{k \ln 2 + \ln(2/\delta)}{2|S|}}    (14)

b_i(h_1, h_2, \delta) = \frac{1}{\gamma_i(h_1, h_2, \delta)} \left( \hat{P}(h_1 \ne i \mid h_2 = i, h_1 \ne \perp) + \epsilon_i(h_1, h_2, \delta) \right)    (15)

\epsilon_i(h_1, h_2, \delta) = \sqrt{\frac{(\ln 2)(|h_1| + |h_2|) + \ln(2k/\delta)}{2|S(h_2 = i, h_1 \ne \perp)|}}    (16)

\gamma_i(h_1, h_2, \delta) = \hat{P}(h_1 = i \mid h_2 = i, h_1 \ne \perp) - \hat{P}(h_1 = i \mid h_2 \ne i, h_1 \ne \perp) - 2\epsilon_i(h_1, h_2, \delta)    (17)

The result of Dasgupta et al. [28] shows that when there are two sufficient and redundant views which are conditionally independent given the class label, the generalization error of co-training is upper-bounded by the disagreement between the two classifiers. This suggests that better learning performance can be obtained if the disagreement can be exploited in a better way.

Note that the analyses in [13] and [28] assumed that there exist two sufficient and redundant views that are conditionally independent given the class label. Since such a strong requirement is not often satisfied, analyses under more realistic assumptions are desired. Balcan et al. [6] pointed out that if a PAC learner can be obtained on each view, the conditional independence assumption or even the weak independence assumption [2] is unnecessary; a weaker assumption of "expansion" of the underlying data distribution is sufficient for iterative co-training to succeed. They consider the setting where the learning algorithm used in each view is confident about being positive and is able to learn from positive examples only, and the "expansion" is defined as follows. Let X^+ denote the positive region and D^+ the distribution over X^+. For S_1 ⊆ X_1 and S_2 ⊆ X_2, let S_i denote the event that an instance (x_1, x_2) has x_i ∈ S_i (i = 1, 2). Let P(S_1 ∧ S_2) denote the probability mass of examples on which the learners are confident in both views, and P(S_1 ⊕ S_2) the probability mass of examples on which they are confident in only one view. Let H_i ∩ X_i^+ = {h ∩ X_i^+ : h ∈ H_i}, where H_i (i = 1, 2) is the hypothesis class. If Eq. 18 holds for any S_1 ⊆ X_1 and S_2 ⊆ X_2, then D^+ is ε-expanding; if Eq. 18 holds for any S_1 ∈ H_1 ∩ X_1^+ and S_2 ∈ H_2 ∩ X_2^+, then D^+ is ε-expanding with respect to the hypothesis class H_1 × H_2.

P(S_1 \oplus S_2) \ge \epsilon \min\left[ P(S_1 \wedge S_2), P(\bar{S}_1 \wedge \bar{S}_2) \right]    (18)

If the data distribution satisfies the expansion assumption, then, starting with a small confident set S_j of the hypothesis in view j (j = 1, 2), iterative co-training can succeed in achieving a classifier whose error rate is smaller than ε with high probability.

All the previous theoretical studies investigated the standard two-view co-training. A theoretical foundation for the other disagreement-based semi-supervised learning approaches, in particular those working on a single view, had not been established, although the effectiveness of those approaches had been empirically verified. Wang and Zhou [71] presented a theoretical study of those approaches. Let H denote the hypothesis space and D the data distribution generated by the ground-truth hypothesis h^* ∈ H. Let d(h_i, h^*) denote the difference between the two classifiers h_i and h^*, which can be measured by Pr_{x∈D}[h_i(x) ≠ h^*(x)]. Let h_1^i and h_2^i denote the two classifiers in the i-th round of the iterative co-training process. Then, their main result is summarized in Theorem 1 as follows.

Theorem 1. Given the initial labeled data set L, which is clean, assume that the size of L is sufficient to learn two classifiers h_1^0 and h_2^0 whose upper bounds of the generalization error are a_0 < 0.5 and b_0 < 0.5, respectively, with high probability (more than 1 − δ) in the PAC model, i.e., l \ge \max\left[\frac{1}{a_0}\ln\frac{|H|}{\delta}, \frac{1}{b_0}\ln\frac{|H|}{\delta}\right]. Then h_1^0 selects u unlabeled instances from U to label and puts them into σ_2, which contains all the examples in L, and h_2^1 is trained from σ_2 by minimizing the empirical risk. If lb_0 \le e\sqrt[M]{M!} - M, then

Pr[d(h_2^1, h^*) \ge b_1] \le \delta,    (19)

where M = ua_0 and b_1 = \max\left[\frac{lb_0 + ua_0 - u\,d(h_1^0, h_2^1)}{l}, 0\right].

Such a theorem suggests that the key for the disagreement-based approaches to succeed is a large difference between the learners, which explains why the disagreement-based approaches still work well even when there are no two views. Note that in contrast to all previous studies, which assumed that the data are drawn from some distribution over triples ⟨x_1, y, x_2⟩ (that is, the data have two views), the above theorem does not assume that the data are drawn from a distribution over two views. Actually, from Theorem 1 we can see that the existence of two views is a sufficient rather than a necessary condition for disagreement-based approaches. This is because when there are two sufficient and redundant views, the learners trained from the two views respectively are of course diverse, and so the disagreement-based learning process can succeed. When there are no two views, it is also possible to get two diverse learners, and thus disagreement-based approaches are also able to succeed. It is worth mentioning that all previous studies, either theoretical or algorithmic, tried to maximize the consensus among the learners; in other words, they always tried to minimize the error on labeled examples and maximize the agreement on unlabeled examples, but never revealed that keeping a large disagreement among the learners is a necessary condition for the co-training process to proceed.
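To see how the bound of Theorem 1 behaves, consider some illustrative numbers (these are not taken from [71]): let l = 100, b_0 = 0.03, u = 1000 and a_0 = 0.2, so that roughly M = ua_0 = 200 of the newly labeled examples are expected to be noisy. If the disagreement is d(h_1^0, h_2^1) = 0.25, then

b_1 = \max\left[\frac{lb_0 + ua_0 - u\,d(h_1^0, h_2^1)}{l}, 0\right] = \max\left[\frac{3 + 200 - 250}{100}, 0\right] = 0,

whereas a smaller disagreement d(h_1^0, h_2^1) = 0.15 gives b_1 = 0.53, which is far worse than b_0 = 0.03. The bound therefore improves only when the disagreement term u\,d(h_1^0, h_2^1) outweighs the noise introduced by the newly labeled examples.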

Moreover, Wang and Zhou [71] analyzed the reason why the performance of the disagreement-based approaches cannot be improved further after a number of training rounds. Such a problem is frequently encountered in many practical applications of the disagreement-based approaches but could not be explained by previous theoretical results. Based on Theorem 1, Wang and Zhou [71] showed that as the learning process continues, the learners become more and more similar, and therefore the required diversity cannot be met and the learners cannot improve each other further. Based on this recognition, a preliminary method for roughly estimating the appropriate iteration at which to terminate the learning process was proposed.

Section 3.4 introduced that the combination of disagreement-based semi-supervised learning with active learning can lead to good performance. Recently, Wang and Zhou [72] analyzed this situation and obtained the result in Theorem 2.

Theorem 2. For a data distribution D that is α-expanding with respect to the hypothesis class H_1 × H_2, let ε and δ denote the final desired accuracy and confidence parameters. If

s = \frac{\log\frac{\alpha\epsilon}{8}}{\log\frac{1}{C}}, \quad m_0 = \frac{1}{L\epsilon}\left(4V\log\frac{1}{L\epsilon} + 2\log\frac{8(s+1)}{\delta}\right) \quad \text{and} \quad m_i = \frac{16}{\alpha\epsilon}\left(4V\log\frac{16}{\alpha\epsilon} + 2\log\frac{8(s+1)}{\delta}\right) \ (i = 1, 2, \cdots, s),

a classifier will be generated with error rate no more than ε with probability 1 − δ, according to a similar approach in [85]. Here, V = max[VC(H_1), VC(H_2)], where VC(H) denotes the VC-dimension of the hypothesis class H, constant C = \frac{\alpha/4 + 1/\alpha}{1 + 1/\alpha}, and constant L = \min\left[\frac{\alpha}{16}, \frac{1}{16 L_1 L_2}\right].

This theorem suggests that under the assumption of α-expansion on the hypothesis class H_1 × H_2, the sample complexity can be exponentially reduced by combining disagreement-based semi-supervised learning with active learning, in contrast to pure disagreement-based semi-supervised learning. This is the first theoretical analysis of the combination of semi-supervised learning with active learning, and it also contains the first analysis of multi-view active learning.

Both Theorem 1 and Theorem 2 provide theoretical explanations for the effectiveness of general disagreement-based semi-supervised learning approaches, whose effectiveness has been empirically verified in practice. Although some strong assumptions are still required in the analyses, the results serve as an important step towards the establishment of the whole theoretical foundation for the disagreement-based learning framework. However, note that all the current theoretical analyses are on the use of two learners, while theoretical analysis of disagreement-based semi-supervised learning with more than two learners remains an open problem.

5. Applications to Real-World Tasks

The disagreement-based semi-supervised learning paradigm has been successfully applied to many real-world tasks, particularly in natural language processing. In fact, by the middle of the 1990s it had been accepted that constructing prediction models based on different attribute sets of the problem may help to achieve a better result. Yarowsky [75] constructed, for word sense disambiguation, a classifier using the local context of the word together with a classifier based on the senses of other occurrences of that word in the same document. Riloff and Jones [59] considered both the noun phrase itself and the linguistic context in which the noun phrase appears for classifying noun phrases as geographic locations. Collins and Singer [24] utilized both the spelling of the entity and the context in which the entity appears for named entity classification.

Pierce and Cardie [58] applied standard co-training to named entity identification. They treated the current word and the k words immediately before it as the first view and, similarly, the current word and the k words immediately after it as the second view, as sketched in the snippet after this paragraph. Based on these two views, standard co-training was applied directly, with some necessary adaptations for multi-class classification. By exploiting unlabeled data with co-training, the identification error rate was reduced by 36% compared with identification using only the labeled data. Sarkar [62] decomposed a statistical parser into two sequentially related probabilistic models. The first model, called the tagging probability model, is responsible for selecting the most likely trees for each word by examining the local context, while the second model, called the parsing probability model, is responsible for attaching the selected trees together to provide a consistent bracketing of the sentences. In the learning process, these two models employ a disagreement-based approach to exploit the unlabeled examples, where each model uses its most confident predictions to help the other model reduce its uncertainty in statistical parsing, and hence a better performance is achieved in terms of both precision and recall. Later, Steedman et al. [68] addressed this problem from a different perspective. Unlike [62], they used two different statistical parsers for co-training. In the training process, each parser assigns scores to the unlabeled sentences it has parsed, using a scoring function that indicates the confidence of the parse results, and then passes the parsed sentences with the top scores to the other parser. They empirically showed that such a method could also improve the performance of statistical parsing. Hwa et al. [41] combined disagreement-based semi-supervised learning with active learning in statistical parsing, where each learner teaches the other learner with its most confidently parsed sentences, while the peer learner queries the user for the parse results of its least confidently parsed sentences and feeds them to this learner. By applying such a method, the amount of manual labeling can be greatly reduced.
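To illustrate how such two views can be constructed from a single description of the data, the snippet below sketches the symmetric context windows described in [58]: one view looks at the word and its left context, the other at the word and its right context. The concrete feature encoding is our own illustrative choice, not the one used in that work.

```python
# Sketch of the two context views of [58] for the word at position i:
# view 1 = the current word plus the k words immediately before it,
# view 2 = the current word plus the k words immediately after it.
# The bag-of-position-tagged-words encoding is an illustrative assumption.
def context_views(tokens, i, k=2):
    pad = ["<PAD>"] * k
    padded = pad + list(tokens) + pad
    j = i + k                                  # position of token i after padding
    left = {f"w[-{k - n}]={padded[j - k + n]}" for n in range(k)}
    right = {f"w[+{n + 1}]={padded[j + 1 + n]}" for n in range(k)}
    current = {f"w[0]={padded[j]}"}
    return left | current, right | current     # (view 1, view 2)

# Example: the two views for the word "Paris".
view1, view2 = context_views(["visited", "Paris", "last", "May"], i=1, k=2)
# view1 -> {'w[-2]=<PAD>', 'w[-1]=visited', 'w[0]=Paris'}
# view2 -> {'w[+1]=last', 'w[+2]=May', 'w[0]=Paris'}
```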

In addition to natural language processing, the disagreement-based semi-supervised learning paradigm has been applied to content-based image retrieval (CBIR). Given a query image, a CBIR system is required to return the images in the imagebase that are relevant to the query image. Due to the semantic gap between the high-level image semantics and the low-level image features, relevance feedback [61] is usually employed to bridge the gap. Since it is usually infeasible for the user to provide many rounds of feedback, the number of images with relevance judgements is insufficient to achieve a good performance. Thus, unlabeled images in the imagebase can be further exploited to improve the performance of CBIR based on semi-supervised learning, while CBIR itself becomes a good application of learning with labeled and unlabeled data [82]. In fact, semi-supervised learning in the CBIR scenario has been studied in [30,74].

Zhou et al. [85,86] first applied a disagreement-based semi-supervised learning method to exploit the unlabeled images in the imagebase of a CBIR system. The method actually combines semi-supervised learning with active learning (see Section 3.4 for details). This research also led to a new user interface design in CBIR. As shown in Fig. 3, the region above the dark line displays the retrieved images.
