SYSTRAN’s Chinese Word Segmentation
A Brief Introduction to Systrace
Overview: Systrace is a function-trace tool built on the Linux Ftrace facility.
It shows how system tasks ran over a period of time, which makes it convenient for analyzing problems.
Capturing a systrace: a systrace can be captured in two ways, through Android Monitor or through the command-line script.
1. Script: the -t option sets the capture duration, -o sets the output file name, and the arguments that follow are the trace TAGs (categories) to capture.
2. Monitor tool: clicking the Systrace button shown in the accompanying figure opens a trace-options dialog in which the duration, file size, trace TAGs, and so on can be set.
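The capture script is normally run from the command line; a minimal sketch of driving it from Python is shown below. The path to systrace.py and the TAG names are assumptions for illustration (available categories vary by device and SDK version); only the -t and -o flags are taken from the description above.

```python
# Hypothetical wrapper around the Android systrace.py capture script.
# -t = capture duration in seconds, -o = output file; the trailing arguments
# name the trace TAGs/categories to record (examples only).
import subprocess

def capture_systrace(systrace_py, seconds=10, out_html="mytrace.html",
                     tags=("sched", "freq", "idle", "gfx", "view", "wm", "am")):
    cmd = ["python", systrace_py, "-t", str(seconds), "-o", out_html, *tags]
    subprocess.run(cmd, check=True)  # raises CalledProcessError if the capture fails

# capture_systrace("/path/to/systrace.py")
```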
Adding trace points
1. Java layer: a) import the Trace class: import android.os.Trace; b) call the trace function wherever tracing is needed: the first argument is the TAG the trace belongs to, and the second is the label to show in the trace file.
c) A Java-layer trace can cover an arbitrary block of code.
2. Native layer: a) include the trace header: #include <...>; b) call the trace macro at the start of the function to be traced: native-layer traces can only be added per function.
Analyzing a systrace
A systrace exposes a lot of information; some commonly used views are described below.
1. The per-CPU rows show the running state of each CPU.
a) C-State means the CPU is idle; no process runs while a CPU is in the idle state.
b) Clicking a process running on a CPU shows information about it, as in the black box in the accompanying figure; stateWhenDescheduled shows the state the process was in when it was descheduled.
c) Clicking Clock Frequency shows the CPU frequency values.
2. Viewing the screen refresh state.
The interval between frames is tied to the refresh rate; if one frame takes longer than the interval the refresh rate allows, the result is jank.
For example, the trace in the accompanying figure was captured on a device with a frame rate of 57, so its frame interval is roughly 17 ms.
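As a quick check of the frame-interval arithmetic above (an illustration, not part of the original trace):

```python
# Frame interval = 1000 ms divided by the refresh rate.
for fps in (57, 60):
    print(f"{fps} fps -> {1000 / fps:.1f} ms per frame")
# 57 fps -> 17.5 ms per frame
# 60 fps -> 16.7 ms per frame
```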
Zooming in on that view shows the chain of calls made when SurfaceFlinger refreshes the screen contents.
The calls are ordered from top to bottom.
Preface, Contents
Bit Logic Instructions 1
Comparison Instructions 2
Conversion Instructions 3
Counter Instructions 4
Data Block Instructions 5
Logic Control Instructions 6
Integer Math Instructions 7
Floating-Point Math Instructions 8
Load and Transfer Instructions 9
Program Control Instructions 10
Shift and Rotate Instructions 11
Timer Instructions 12
Word Logic Instructions 13
Accumulator Instructions 14
Appendices: Overview of All Statement List Instructions A, Programming Examples B, Parameter Transfer C
SIMATIC S7-300 and S7-400 Statement List (STL) Programming Reference Manual, edition March 2006, A5E00706960-01. Index.
Safety Guidelines: this manual contains notices that must be observed to ensure personal safety and to prevent damage to property.
In this manual, notices relating to personal safety are highlighted by a safety alert symbol; notices relating only to property damage have no safety alert symbol.
The notices are graded by hazard level as follows. Danger indicates that death or severe personal injury will result if proper precautions are not taken.
Warning indicates that death or severe personal injury may result if proper precautions are not taken.
Caution, with a safety alert symbol, indicates that minor personal injury will result if proper precautions are not taken.
Caution, without a safety alert symbol, indicates that property damage will result if proper precautions are not taken.
Notice indicates that an unintended result or situation can occur if the corresponding information is not heeded.
If more than one hazard level applies, the notice for the highest hazard level is used.
A personal-injury warning with a safety alert symbol may also indicate possible property damage.
Qualified Personnel: only qualified personnel are permitted to commission and operate the device/system.
Qualified personnel are defined as persons who are authorized to commission, ground, and tag devices, systems, and circuits in accordance with established safety practices and standards.
Correct Usage: note the following. Warning: this device and its components may only be used for the applications described in the catalog or the technical description, and only together with devices or components from other manufacturers that have been approved or recommended by Siemens.
This product can function correctly and safely only if it is transported, stored, set up, and installed correctly, and operated and maintained with care.
Trademarks: all names marked with ® are registered trademarks of Siemens.
Some other designations in this document are also registered trademarks; use of them by third parties for their own purposes may infringe the rights of the trademark owners.
Disclaimer: we have checked that the contents of this manual are consistent with the hardware and software described.
Overlapping Segmentation of Long Text for Vector Generation: Overview and Explanation
1. Introduction
1.1 Overview
The overview introduces the topic and content of this article to the reader.
The article studies how to generate vector representations from long texts through overlapping segmentation.
The process involves three main steps: splitting the long text, overlapping the resulting chunks, and generating vectors from the chunks.
First, the long text is split into a number of sub-blocks (chunks).
The purpose is to turn one long text into several short texts so that their content can be processed and analyzed more effectively.
For the splitting algorithm, methods based on sentences, paragraphs, or other specific rules can be considered.
Second, on top of the splitting step, we introduce the notion of text overlap.
Text overlap means that adjacent chunks partially overlap, which increases the amount of information each chunk retains after splitting.
By introducing these overlapping regions, the semantics and contextual information of the long text are captured better.
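A minimal sketch of such an overlapping splitter is shown below; it is an illustration rather than the article's implementation, and the chunk size and overlap length are arbitrary example values.

```python
# Split a long text into fixed-size character windows in which each chunk
# overlaps the previous one by `overlap` characters.
def split_with_overlap(text: str, chunk_size: int = 500, overlap: int = 100):
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size]
            for i in range(0, max(len(text) - overlap, 1), step)]

chunks = split_with_overlap("a long document ... " * 200)
# consecutive chunks share `overlap` characters of context
```

Sentence- or paragraph-based variants would split on punctuation or blank lines first and then group units until the size budget is reached.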
Finally, the chunks are converted into vector representations.
Vector representation is one of the most widely used techniques in current text processing and analysis.
By mapping chunks into a vector space, the distances between vectors can be used to measure similarity between texts or to support other text-analysis tasks.
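As a toy illustration of this step (not the article's method), the sketch below builds simple bag-of-words count vectors and compares two chunks with cosine similarity; real systems would typically use learned embeddings instead.

```python
from collections import Counter
import math

def bow_vector(chunk: str) -> Counter:
    # Whitespace tokenization for the sake of the example;
    # Chinese text would first need a word segmenter.
    return Counter(chunk.split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

v1 = bow_vector("overlapping chunks preserve shared context")
v2 = bow_vector("chunks with shared context compare well")
print(round(cosine(v1, v2), 3))
```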
This article introduces the relevant methods and algorithms and discusses how they apply to long texts.
Through experiments and case studies, we evaluate the performance and effectiveness of the different methods and their suitability for real-world scenarios.
Finally, we summarize the main content of the article and outline directions for further research.
1.2 Article Structure
This article mainly describes the methods and applications of overlapping segmentation of long texts for vector generation.
Specifically, it is divided into three parts: introduction, main body, and conclusion, each consisting of several subsections.
The introduction first outlines the background and significance of generating vectors by overlapping segmentation of long texts.
As the demand for big-data processing and machine learning grows, handling long texts has become especially important.
However, because of the complexity of long texts and the sheer volume of data, traditional text-processing methods can no longer meet this need.
This article therefore presents a new approach: overlapping segmentation of long texts to generate vectors.
The main body describes the concrete steps and principles of text splitting, chunk overlap, and vector generation from chunks.
First, it presents methods for splitting long texts, including splitting by sentence, paragraph, or keyword.
It then explains in detail the concept and purpose of text overlap, and how overlap is implemented during splitting.
Gradient-Based Learning Appliedto Document RecognitionYANN LECUN,MEMBER,IEEE,L´EON BOTTOU,YOSHUA BENGIO,AND PATRICK HAFFNER Invited PaperMultilayer neural networks trained with the back-propagation algorithm constitute the best example of a successful gradient-based learning technique.Given an appropriate network architecture,gradient-based learning algorithms can be used to synthesize a complex decision surface that can classify high-dimensional patterns,such as handwritten characters,with minimal preprocessing.This paper reviews various methods applied to handwritten character recognition and compares them on a standard handwritten digit recognition task.Convolutional neural networks,which are specifically designed to deal with the variability of two dimensional(2-D)shapes,are shown to outperform all other techniques.Real-life document recognition systems are composed of multiple modules includingfield extraction,segmentation,recognition, and language modeling.A new learning paradigm,called graph transformer networks(GTN’s),allows such multimodule systems to be trained globally using gradient-based methods so as to minimize an overall performance measure.Two systems for online handwriting recognition are described. Experiments demonstrate the advantage of global training,and theflexibility of graph transformer networks.A graph transformer network for reading a bank check is also described.It uses convolutional neural network character recognizers combined with global training techniques to provide record accuracy on business and personal checks.It is deployed commercially and reads several million checks per day. Keywords—Convolutional neural networks,document recog-nition,finite state transducers,gradient-based learning,graphtransformer networks,machine learning,neural networks,optical character recognition(OCR).N OMENCLATUREGT Graph transformer.GTN Graph transformer network.HMM Hidden Markov model.HOS Heuristic oversegmentation.K-NN K-nearest neighbor.Manuscript received November1,1997;revised April17,1998.Y.LeCun,L.Bottou,and P.Haffner are with the Speech and Image Processing Services Research Laboratory,AT&T Labs-Research,Red Bank,NJ07701USA.Y.Bengio is with the D´e partement d’Informatique et de Recherche Op´e rationelle,Universit´e de Montr´e al,Montr´e al,Qu´e bec H3C3J7Canada. Publisher Item Identifier S0018-9219(98)07863-3.NN Neural network.OCR Optical character recognition.PCA Principal component analysis.RBF Radial basis function.RS-SVM Reduced-set support vector method. SDNN Space displacement neural network.SVM Support vector method.TDNN Time delay neural network.V-SVM Virtual support vector method.I.I NTRODUCTIONOver the last several years,machine learning techniques, particularly when applied to NN’s,have played an increas-ingly important role in the design of pattern recognition systems.In fact,it could be argued that the availability of learning techniques has been a crucial factor in the recent success of pattern recognition applications such as continuous speech recognition and handwriting recognition. 
The main message of this paper is that better pattern recognition systems can be built by relying more on auto-matic learning and less on hand-designed heuristics.This is made possible by recent progress in machine learning and computer ing character recognition as a case study,we show that hand-crafted feature extraction can be advantageously replaced by carefully designed learning machines that operate directly on pixel ing document understanding as a case study,we show that the traditional way of building recognition systems by manually integrating individually designed modules can be replaced by a unified and well-principled design paradigm,called GTN’s,which allows training all the modules to optimize a global performance criterion.Since the early days of pattern recognition it has been known that the variability and richness of natural data, be it speech,glyphs,or other types of patterns,make it almost impossible to build an accurate recognition system entirely by hand.Consequently,most pattern recognition systems are built using a combination of automatic learning techniques and hand-crafted algorithms.The usual method0018–9219/98$10.00©1998IEEE2278PROCEEDINGS OF THE IEEE,VOL.86,NO.11,NOVEMBER1998Fig.1.Traditional pattern recognition is performed with two modules:afixed feature extractor and a trainable classifier.of recognizing individual patterns consists in dividing the system into two main modules shown in Fig.1.Thefirst module,called the feature extractor,transforms the input patterns so that they can be represented by low-dimensional vectors or short strings of symbols that:1)can be easily matched or compared and2)are relatively invariant with respect to transformations and distortions of the input pat-terns that do not change their nature.The feature extractor contains most of the prior knowledge and is rather specific to the task.It is also the focus of most of the design effort, because it is often entirely hand crafted.The classifier, on the other hand,is often general purpose and trainable. One of the main problems with this approach is that the recognition accuracy is largely determined by the ability of the designer to come up with an appropriate set of features. This turns out to be a daunting task which,unfortunately, must be redone for each new problem.A large amount of the pattern recognition literature is devoted to describing and comparing the relative merits of different feature sets for particular tasks.Historically,the need for appropriate feature extractors was due to the fact that the learning techniques used by the classifiers were limited to low-dimensional spaces with easily separable classes[1].A combination of three factors has changed this vision over the last decade.First, the availability of low-cost machines with fast arithmetic units allows for reliance on more brute-force“numerical”methods than on algorithmic refinements.Second,the avail-ability of large databases for problems with a large market and wide interest,such as handwriting recognition,has enabled designers to rely more on real data and less on hand-crafted feature extraction to build recognition systems. 
The third and very important factor is the availability of powerful machine learning techniques that can handle high-dimensional inputs and can generate intricate decision functions when fed with these large data sets.It can be argued that the recent progress in the accuracy of speech and handwriting recognition systems can be attributed in large part to an increased reliance on learning techniques and large training data sets.As evidence of this fact,a large proportion of modern commercial OCR systems use some form of multilayer NN trained with back propagation.In this study,we consider the tasks of handwritten character recognition(Sections I and II)and compare the performance of several learning techniques on a benchmark data set for handwritten digit recognition(Section III). While more automatic learning is beneficial,no learning technique can succeed without a minimal amount of prior knowledge about the task.In the case of multilayer NN’s, a good way to incorporate knowledge is to tailor its archi-tecture to the task.Convolutional NN’s[2],introduced in Section II,are an example of specialized NN architectures which incorporate knowledge about the invariances of two-dimensional(2-D)shapes by using local connection patterns and by imposing constraints on the weights.A comparison of several methods for isolated handwritten digit recogni-tion is presented in Section III.To go from the recognition of individual characters to the recognition of words and sentences in documents,the idea of combining multiple modules trained to reduce the overall error is introduced in Section IV.Recognizing variable-length objects such as handwritten words using multimodule systems is best done if the modules manipulate directed graphs.This leads to the concept of trainable GTN,also introduced in Section IV. Section V describes the now classical method of HOS for recognizing words or other character strings.Discriminative and nondiscriminative gradient-based techniques for train-ing a recognizer at the word level without requiring manual segmentation and labeling are presented in Section VI. 
Section VII presents the promising space-displacement NN approach that eliminates the need for segmentation heuris-tics by scanning a recognizer at all possible locations on the input.In Section VIII,it is shown that trainable GTN’s can be formulated as multiple generalized transductions based on a general graph composition algorithm.The connections between GTN’s and HMM’s,commonly used in speech recognition,is also treated.Section IX describes a globally trained GTN system for recognizing handwriting entered in a pen computer.This problem is known as “online”handwriting recognition since the machine must produce immediate feedback as the user writes.The core of the system is a convolutional NN.The results clearly demonstrate the advantages of training a recognizer at the word level,rather than training it on presegmented, hand-labeled,isolated characters.Section X describes a complete GTN-based system for reading handwritten and machine-printed bank checks.The core of the system is the convolutional NN called LeNet-5,which is described in Section II.This system is in commercial use in the NCR Corporation line of check recognition systems for the banking industry.It is reading millions of checks per month in several banks across the United States.A.Learning from DataThere are several approaches to automatic machine learn-ing,but one of the most successful approaches,popularized in recent years by the NN community,can be called“nu-merical”or gradient-based learning.The learning machine computes afunction th input pattern,andtheoutputthatminimizesand the error rate on the trainingset decreases with the number of training samplesapproximatelyasis the number of trainingsamples,is a number between0.5and1.0,andincreases,decreases.Therefore,when increasing thecapacitythat achieves the lowest generalizationerror Mostlearning algorithms attempt tominimize as well assome estimate of the gap.A formal version of this is calledstructural risk minimization[6],[7],and it is based on defin-ing a sequence of learning machines of increasing capacity,corresponding to a sequence of subsets of the parameterspace such that each subset is a superset of the previoussubset.In practical terms,structural risk minimization isimplemented byminimizingisaconstant.that belong to high-capacity subsets ofthe parameter space.Minimizingis a real-valuedvector,with respect towhichis iteratively adjusted asfollows:is updated on the basis of a singlesampleof several layers of processing,i.e.,the back-propagation algorithm.The third event was the demonstration that the back-propagation procedure applied to multilayer NN’s with sigmoidal units can solve complicated learning tasks. 
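The gradient-based update loop sketched above, adjusting the parameters W in the direction that decreases the loss either over the whole training set or one sample at a time, can be illustrated with a minimal example. This is only a sketch of plain stochastic gradient descent on a made-up least-squares problem, not the systems described in this paper; the data and learning rate are invented for illustration.

```python
# Fit y ~ w*x by minimizing the squared error with per-sample (stochastic) updates:
#   w <- w - lr * d/dw (w*x - y)^2
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.1, 0.9, 2.1, 2.9]        # roughly y = x
w, lr = 0.0, 0.05

for epoch in range(100):
    for x, y in zip(xs, ys):     # one update per training sample
        grad = 2.0 * (w * x - y) * x
        w -= lr * grad

print(round(w, 3))               # converges close to 1.0
```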
The basic idea of back propagation is that gradients can be computed efficiently by propagation from the output to the input.This idea was described in the control theory literature of the early1960’s[16],but its application to ma-chine learning was not generally realized then.Interestingly, the early derivations of back propagation in the context of NN learning did not use gradients but“virtual targets”for units in intermediate layers[17],[18],or minimal disturbance arguments[19].The Lagrange formalism used in the control theory literature provides perhaps the best rigorous method for deriving back propagation[20]and for deriving generalizations of back propagation to recurrent networks[21]and networks of heterogeneous modules[22].A simple derivation for generic multilayer systems is given in Section I-E.The fact that local minima do not seem to be a problem for multilayer NN’s is somewhat of a theoretical mystery. It is conjectured that if the network is oversized for the task(as is usually the case in practice),the presence of “extra dimensions”in parameter space reduces the risk of unattainable regions.Back propagation is by far the most widely used neural-network learning algorithm,and probably the most widely used learning algorithm of any form.D.Learning in Real Handwriting Recognition Systems Isolated handwritten character recognition has been ex-tensively studied in the literature(see[23]and[24]for reviews),and it was one of the early successful applications of NN’s[25].Comparative experiments on recognition of individual handwritten digits are reported in Section III. They show that NN’s trained with gradient-based learning perform better than all other methods tested here on the same data.The best NN’s,called convolutional networks, are designed to learn to extract relevant features directly from pixel images(see Section II).One of the most difficult problems in handwriting recog-nition,however,is not only to recognize individual charac-ters,but also to separate out characters from their neighbors within the word or sentence,a process known as seg-mentation.The technique for doing this that has become the“standard”is called HOS.It consists of generating a large number of potential cuts between characters using heuristic image processing techniques,and subsequently selecting the best combination of cuts based on scores given for each candidate character by the recognizer.In such a model,the accuracy of the system depends upon the quality of the cuts generated by the heuristics,and on the ability of the recognizer to distinguish correctly segmented characters from pieces of characters,multiple characters, or otherwise incorrectly segmented characters.Training a recognizer to perform this task poses a major challenge because of the difficulty in creating a labeled database of incorrectly segmented characters.The simplest solution consists of running the images of character strings through the segmenter and then manually labeling all the character hypotheses.Unfortunately,not only is this an extremely tedious and costly task,it is also difficult to do the labeling consistently.For example,should the right half of a cut-up four be labeled as a one or as a noncharacter?Should the right half of a cut-up eight be labeled as a three?Thefirst solution,described in Section V,consists of training the system at the level of whole strings of char-acters rather than at the character level.The notion of gradient-based learning can be used for this purpose.The system is trained to minimize an 
overall loss function which measures the probability of an erroneous answer.Section V explores various ways to ensure that the loss function is differentiable and therefore lends itself to the use of gradient-based learning methods.Section V introduces the use of directed acyclic graphs whose arcs carry numerical information as a way to represent the alternative hypotheses and introduces the idea of GTN.The second solution,described in Section VII,is to eliminate segmentation altogether.The idea is to sweep the recognizer over every possible location on the input image,and to rely on the“character spotting”property of the recognizer,i.e.,its ability to correctly recognize a well-centered character in its inputfield,even in the presence of other characters besides it,while rejecting images containing no centered characters[26],[27].The sequence of recognizer outputs obtained by sweeping the recognizer over the input is then fed to a GTN that takes linguistic constraints into account andfinally extracts the most likely interpretation.This GTN is somewhat similar to HMM’s,which makes the approach reminiscent of the classical speech recognition[28],[29].While this technique would be quite expensive in the general case,the use of convolutional NN’s makes it particularly attractive because it allows significant savings in computational cost.E.Globally Trainable SystemsAs stated earlier,most practical pattern recognition sys-tems are composed of multiple modules.For example,a document recognition system is composed of afield loca-tor(which extracts regions of interest),afield segmenter (which cuts the input image into images of candidate characters),a recognizer(which classifies and scores each candidate character),and a contextual postprocessor,gen-erally based on a stochastic grammar(which selects the best grammatically correct answer from the hypotheses generated by the recognizer).In most cases,the information carried from module to module is best represented as graphs with numerical information attached to the arcs. For example,the output of the recognizer module can be represented as an acyclic graph where each arc contains the label and the score of a candidate character,and where each path represents an alternative interpretation of the input string.Typically,each module is manually optimized,or sometimes trained,outside of its context.For example,the character recognizer would be trained on labeled images of presegmented characters.Then the complete system isLECUN et al.:GRADIENT-BASED LEARNING APPLIED TO DOCUMENT RECOGNITION2281assembled,and a subset of the parameters of the modules is manually adjusted to maximize the overall performance. 
This last step is extremely tedious,time consuming,and almost certainly suboptimal.A better alternative would be to somehow train the entire system so as to minimize a global error measure such as the probability of character misclassifications at the document level.Ideally,we would want tofind a good minimum of this global loss function with respect to all theparameters in the system.If the loss functionusing gradient-based learning.However,at first glance,it appears that the sheer size and complexity of the system would make this intractable.To ensure that the global loss functionwithrespect towith respect toFig.2.Architecture of LeNet-5,a convolutional NN,here used for digits recognition.Each plane is a feature map,i.e.,a set of units whose weights are constrained to be identical.or other2-D or one-dimensional(1-D)signals,must be approximately size normalized and centered in the input field.Unfortunately,no such preprocessing can be perfect: handwriting is often normalized at the word level,which can cause size,slant,and position variations for individual characters.This,combined with variability in writing style, will cause variations in the position of distinctive features in input objects.In principle,a fully connected network of sufficient size could learn to produce outputs that are invari-ant with respect to such variations.However,learning such a task would probably result in multiple units with similar weight patterns positioned at various locations in the input so as to detect distinctive features wherever they appear on the input.Learning these weight configurations requires a very large number of training instances to cover the space of possible variations.In convolutional networks,as described below,shift invariance is automatically obtained by forcing the replication of weight configurations across space. 
Secondly,a deficiency of fully connected architectures is that the topology of the input is entirely ignored.The input variables can be presented in any(fixed)order without af-fecting the outcome of the training.On the contrary,images (or time-frequency representations of speech)have a strong 2-D local structure:variables(or pixels)that are spatially or temporally nearby are highly correlated.Local correlations are the reasons for the well-known advantages of extracting and combining local features before recognizing spatial or temporal objects,because configurations of neighboring variables can be classified into a small number of categories (e.g.,edges,corners,etc.).Convolutional networks force the extraction of local features by restricting the receptive fields of hidden units to be local.A.Convolutional NetworksConvolutional networks combine three architectural ideas to ensure some degree of shift,scale,and distortion in-variance:1)local receptivefields;2)shared weights(or weight replication);and3)spatial or temporal subsampling.A typical convolutional network for recognizing characters, dubbed LeNet-5,is shown in Fig.2.The input plane receives images of characters that are approximately size normalized and centered.Each unit in a layer receives inputs from a set of units located in a small neighborhood in the previous layer.The idea of connecting units to local receptivefields on the input goes back to the perceptron in the early1960’s,and it was almost simultaneous with Hubel and Wiesel’s discovery of locally sensitive,orientation-selective neurons in the cat’s visual system[30].Local connections have been used many times in neural models of visual learning[2],[18],[31]–[34].With local receptive fields neurons can extract elementary visual features such as oriented edges,endpoints,corners(or similar features in other signals such as speech spectrograms).These features are then combined by the subsequent layers in order to detect higher order features.As stated earlier,distortions or shifts of the input can cause the position of salient features to vary.In addition,elementary feature detectors that are useful on one part of the image are likely to be useful across the entire image.This knowledge can be applied by forcing a set of units,whose receptivefields are located at different places on the image,to have identical weight vectors[15], [32],[34].Units in a layer are organized in planes within which all the units share the same set of weights.The set of outputs of the units in such a plane is called a feature map. 
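As a concrete check of the weight-sharing idea, the figures quoted later in this section for LeNet-5's first convolutional layer C1 (six 28 by 28 feature maps, 5 by 5 receptive fields, 156 trainable parameters, 122,304 connections) follow from a short calculation; this is an arithmetic sketch, not code from the paper.

```python
# Shared weights: each of the 6 feature maps has one 5x5 kernel plus one bias,
# and every unit in its 28x28 map reuses those same 26 values.
maps, k, out = 6, 5, 28
trainable_params = maps * (k * k + 1)          # 6 * 26 = 156
connections = maps * out * out * (k * k + 1)   # 6 * 784 * 26 = 122,304
print(trainable_params, connections)
```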
Units in a feature map are all constrained to perform the same operation on different parts of the image.A complete convolutional layer is composed of several feature maps (with different weight vectors),so that multiple features can be extracted at each location.A concrete example of this is thefirst layer of LeNet-5shown in Fig.2.Units in thefirst hidden layer of LeNet-5are organized in six planes,each of which is a feature map.A unit in a feature map has25inputs connected to a5case of LeNet-5,at each input location six different types of features are extracted by six units in identical locations in the six feature maps.A sequential implementation of a feature map would scan the input image with a single unit that has a local receptive field and store the states of this unit at corresponding locations in the feature map.This operation is equivalent to a convolution,followed by an additive bias and squashing function,hence the name convolutional network.The kernel of the convolution is theOnce a feature has been detected,its exact location becomes less important.Only its approximate position relative to other features is relevant.For example,once we know that the input image contains the endpoint of a roughly horizontal segment in the upper left area,a corner in the upper right area,and the endpoint of a roughly vertical segment in the lower portion of the image,we can tell the input image is a seven.Not only is the precise position of each of those features irrelevant for identifying the pattern,it is potentially harmful because the positions are likely to vary for different instances of the character.A simple way to reduce the precision with which the position of distinctive features are encoded in a feature map is to reduce the spatial resolution of the feature map.This can be achieved with a so-called subsampling layer,which performs a local averaging and a subsampling,thereby reducing the resolution of the feature map and reducing the sensitivity of the output to shifts and distortions.The second hidden layer of LeNet-5is a subsampling layer.This layer comprises six feature maps,one for each feature map in the previous layer.The receptive field of each unit is a 232p i x e l i m a g e .T h i s i s s i g n i fic a n tt h e l a r g e s t c h a r a c t e r i n t h e d a t a b a s e (a t28fie l d ).T h e r e a s o n i s t h a t i t it h a t p o t e n t i a l d i s t i n c t i v e f e a t u r e s s u c h o r c o r n e r c a n a p p e a r i n t h e c e n t e r o f t h o f t h e h i g h e s t l e v e l f e a t u r e d e t e c t o r s .o f c e n t e r s o f t h e r e c e p t i v e fie l d s o f t h e l a y e r (C 3,s e e b e l o w )f o r m a 2032i n p u t .T h e v a l u e s o f t h e i n p u t p i x e l s o t h a t t h e b a c k g r o u n d l e v e l (w h i t e )c o ro fa n d t h e f o r e g r o u n d (b l ac k )c o r r e s p T h i s m a k e s t h e m e a n i n p u t r o u g h l y z e r o r o u g h l y o n e ,w h i c h a c c e l e r a t e s l e a r n i n g I n t h e f o l l o w i n g ,c o n v o l u t i o n a l l a y e r s u b s a m p l i n g l a y e r s a r e l a b e l ed S x ,a n d l a ye r s a r e l a b e l e d F x ,w h e r e x i s t h e l a y L a y e r C 1i s a c o n v o l u t i o n a l l a y e r w i t h E a c h u n i t i n e a c hf e a t u r e m a p i s c o n n e c t28w h i c h p r e v e n t s c o n n e c t i o n f r o m t h e i n p t h e b o u n d a r y .C 1c o n t a i n s 156t r a i n a b l 122304c o n n e c t i o n s .L a y e r S 2i s a s u b s a m p l i n g l a y e r w i t h s i s i 
z e 142n e i g h b o r h o o d i n t h e c o r r e s p o n d i n g f T h e f o u r i n p u t s t o a u n i t i n S 2a r e a d d e d ,2284P R O C E E D I N G S O F T H E I E E E ,V O L .86,N O .11,N O VTable 1Each Column Indicates Which Feature Map in S2Are Combined by the Units in a Particular Feature Map ofC3a trainable coefficient,and then added to a trainable bias.The result is passed through a sigmoidal function.The25neighborhoods at identical locations in a subset of S2’s feature maps.Table 1shows the set of S2feature maps combined by each C3feature map.Why not connect every S2feature map to every C3feature map?The reason is twofold.First,a noncomplete connection scheme keeps the number of connections within reasonable bounds.More importantly,it forces a break of symmetry in the network.Different feature maps are forced to extract dif-ferent (hopefully complementary)features because they get different sets of inputs.The rationale behind the connection scheme in Table 1is the following.The first six C3feature maps take inputs from every contiguous subsets of three feature maps in S2.The next six take input from every contiguous subset of four.The next three take input from some discontinuous subsets of four.Finally,the last one takes input from all S2feature yer C3has 1516trainable parameters and 156000connections.Layer S4is a subsampling layer with 16feature maps of size52neighborhood in the corresponding feature map in C3,in a similar way as C1and yer S4has 32trainable parameters and 2000connections.Layer C5is a convolutional layer with 120feature maps.Each unit is connected to a55,the size of C5’s feature maps is11.This process of dynamically increasing thesize of a convolutional network is described in Section yer C5has 48120trainable connections.Layer F6contains 84units (the reason for this number comes from the design of the output layer,explained below)and is fully connected to C5.It has 10164trainable parameters.As in classical NN’s,units in layers up to F6compute a dot product between their input vector and their weight vector,to which a bias is added.This weighted sum,denotedforunit (6)wheredeterminesits slope at the origin.Thefunctionis chosen to be1.7159.The rationale for this choice of a squashing function is given in Appendix A.Finally,the output layer is composed of Euclidean RBF units,one for each class,with 84inputs each.The outputs of each RBFunit(7)In other words,each output RBF unit computes the Eu-clidean distance between its input vector and its parameter vector.The further away the input is from the parameter vector,the larger the RBF output.The output of a particular RBF can be interpreted as a penalty term measuring the fit between the input pattern and a model of the class associated with the RBF.In probabilistic terms,the RBF output can be interpreted as the unnormalized negative log-likelihood of a Gaussian distribution in the space of configurations of layer F6.Given an input pattern,the loss function should be designed so as to get the configuration of F6as close as possible to the parameter vector of the RBF that corresponds to the pattern’s desired class.The parameter vectors of these units were chosen by hand and kept fixed (at least initially).The components of thoseparameters vectors were set to1.While they could have been chosen at random with equal probabilities for1,or even chosen to form an error correctingcode as suggested by [47],they were instead designed to represent a stylized image of the corresponding character class drawn on a7。
A Comparison of Computer Translation Tools by Translation Approach: Systran, Google Translate, and Trados as Examples
Wang Jing; Xie Cong
Journal: English Square (英语广场), late-month edition. Year (volume), issue: 2016 (000) 007
Abstract: Most research on computer translation tools has focused on analyzing or comparing the quality of their translation output, while the different machine translation approaches that the various tools use have been neglected. Starting from these approaches, this article compares computer translation tools that use different machine translation methods, weighing their advantages and disadvantages with Systran, Google Translate, and SDL Trados as examples, in order to give useful suggestions to users of computer translation tools.
SYSTRAN Chinese-English and English-Chinese Hybrid Machine Translation Systems
Jin Yang, Satoshi Enoue (SYSTRAN Software, Inc., 9333 Genesee Ave. Suite PL1, San Diego, CA 92121, USA, {jyang, enoue}@); Jean Senellart, Tristan Croiset (SYSTRAN SA, La Grande Arche, 1, Parvis de la Défense, 92044 Paris La Défense Cedex, France, {senellart, croiset}@systran.fr)
Abstract: This report describes SYSTRAN's Chinese-English and English-Chinese machine translation systems, both of which participated in the CWMT2009 machine translation evaluation tasks. The base systems are SYSTRAN rule-based machine translation systems, augmented with various statistical techniques. Starting from the translations produced by the rule-based systems, we perform statistical post-editing with the provided bilingual and monolingual training corpora. In this report, we describe the technology behind the systems, the training data, and finally the evaluation results in the CWMT2009 evaluation. Our primary systems were top-ranked in the evaluation tasks.
Keywords: Chinese-English Machine Translation, English-Chinese Machine Translation, Rule-Based Machine Translation System, Hybrid Approach, Statistical Post-Editing
SYSTRAN's Chinese Word Segmentation
Jin Yang (SYSTRAN Software, Inc., 9333 Genesee Ave., San Diego, CA 92121, USA, jyang@), Jean Senellart (SYSTRAN S.A., 1, rue du Cimetière, 95230 Soisy-sous-Montmorency, France, senellart@systran.fr), Remi Zajac (SYSTRAN Software, Inc., 9333 Genesee Ave., San Diego, CA 92121, USA, zajac@)
Proceedings of the Second SIGHAN Workshop on Chinese Language Processing, July 2003, pp. 180-183.

Abstract
SYSTRAN's Chinese word segmentation is one important component of its Chinese-English machine translation system. The Chinese word segmentation module uses a rule-based approach, based on a large dictionary and fine-grained linguistic rules. It works on general-purpose texts from different Chinese-speaking regions, with comparable performance. SYSTRAN participated in the four open tracks in the First International Chinese Word Segmentation Bakeoff. This paper gives a general description of the segmentation module, as well as the results and analysis of its performance in the Bakeoff.

1 Introduction
Chinese word segmentation is one of the pre-processing steps of the SYSTRAN Chinese-English Machine Translation (MT) system. The development of the Chinese-English MT system began in August 1994, and this is where the Chinese word segmentation issue was first addressed. The algorithm of the early version of the segmentation module was borrowed from SYSTRAN's Japanese segmentation module. The program ran on a large word list, which contained 600,000 entries at the time¹. The basic strategy was to list all possible matches for an entire linguistic unit, then solve the overlapping matches via linguistic rules. The development was focused on technical domains, and high accuracy was achieved after only three months of development. Since then, development has shifted to other areas of Chinese-English MT, including the enrichment of the bilingual word lists with part-of-speech, syntactic, and semantic features. In 2001, the development of a prototype Chinese-Japanese MT system began. Although the project only lasted three months, some important changes were made in the segmentation convention, regarding the distinction between words and phrases². Along with new developments of the SYSTRAN MT engine, the segmentation engine has recently been re-implemented. The dictionary and the general approach remain unchanged, but dictionary lookup and rule matching were re-implemented using finite-state technology, and linguistic rules for the segmentation module are now expressed using a context-free-based formalism, improving maintainability. The re-implementation generates multiple segmentation results with associated probabilities. This will allow for disambiguation at a later stage of the MT process, and will widen the possibility of using word segmentation for other applications.

2 System Description
2.1 Segmentation Standard
Our definition of words and our segmentation conventions are based on available standards, modified for MT purposes. The PRC standard (Liu et al., 1993) was initially used. Sample differences are listed as follows:

Type | Examples where SYSTRAN diverges from the PRC guidelines
NP | 中华民族, 中华人民共和国
CD | 31日
CD + M | 一个, 一排排
DI4 + CD | 第一
Name | 李白, 李清照
Table 1. Segmentation Divergences with the PRC Guidelines

2.2 Methodology
The SYSTRAN Chinese word segmentation module uses a rule-based approach and a large dictionary. The dictionary is derived from the Chinese-English MT dictionary. It currently includes about 400,000 words.
The basic segmentation strategy is to list all possible matches for a translation unit (typically, a sentence), then to solve overlapping matches via linguistic rules. The same segmentation module and the same dictionary are used to segment different types of text with comparable performance.
All dictionary lookup and rule matching are performed using a low-level Finite State Automaton library. The segmentation speed is 3,500 characters per second on a Pentium 4 2.4 GHz processor.

Dictionary
The Chinese-English MT dictionary currently contains 400,000 words (e.g., 中华) and 200,000 multi-word expressions (e.g., 中华人民共和国). Only words are used for the segmentation. Specialized linguistic rules are associated with the dictionary. The dictionary is general purpose, with good coverage of several domains. Domain-specific dictionaries are also available, but were not used in the Bakeoff.
The dictionary contains words from different Chinese-speaking regions, but the representation is mostly in simplified Chinese. Traditional characters are considered "variants", and they are not physically stored in the dictionary. For example, 意大利 and 义大利 are stored in the dictionary, and 義大利 can also be found via the character mapping 義→义.
The dictionary is encoded in Unicode (UTF-8), and all internal operations manipulate UTF-8 strings. Major encoding conversions are supported, including GB2312-80, GB13000, BIG-5, BIG5-HKSCS, etc.

Training
The segmentation module has been tested and fine-tuned on general texts, and on texts in the technical and military domains (because of specific customer requirements for the MT system). Due to the wide availability of news texts, the news domain has also recently been used for training and testing.
The training process reduces to the customization of a SYSTRAN MT system. In the current version of the MT system, customization is achieved by building a User Dictionary (UD). A UD supplements the main dictionary: any word that is not found in the main MT system dictionary is added to a User Dictionary.

Named-Entity Recognition and Unknown Words
Named-entity recognition is still under development. Recognition of Chinese persons' names is done via linguistic rules. Foreign name recognition is not yet implemented due to the difficulty of obtaining translations.
Because translations are unavailable even when an unknown word has been successfully recognized, we consider unknown word recognition to be part of the terminology extraction process. This feature was not integrated for the Bakeoff.

2.3 Evaluation
Our internal evaluation has focused on the accuracy of segmentation using our own segmentation standard. Our evaluation process includes large-scale bilingual regression testing for the Chinese-English system, as well as regression testing of the segmenter itself using a test database of over 5 MB of test items. Two criteria are used:
1. Overlapping Ambiguity Strings (OAS): the reference segmentation and the segmenter segmentation overlap for some string, e.g., AB-C and A-BC. As shown below, this typically indicates an error from our segmenter.
2. Covering Ambiguity Strings (CAS): the test strings that cover the reference strings (CAS-T: ABC and AB-C), and the reference strings that cover the test strings (CAS-R: AB-C and ABC). These cases arise mostly from a difference between equally valid segmentation standards.
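The "list all possible matches, then resolve overlapping matches" strategy described above can be sketched as follows. This is only a toy illustration with a hypothetical mini-dictionary and a naive longest-match-first resolution rule; SYSTRAN's actual module resolves overlaps with linguistic rules over a finite-state dictionary lookup.

```python
# Toy dictionary-driven segmentation: enumerate every dictionary match in the
# sentence, then resolve overlaps by preferring the longest, leftmost match.
def all_matches(sentence, dictionary):
    return [(i, i + len(w)) for i in range(len(sentence))
            for w in dictionary if sentence.startswith(w, i)]

def segment(sentence, dictionary):
    matches = sorted(all_matches(sentence, dictionary),
                     key=lambda m: (m[0], -(m[1] - m[0])))
    out, pos = [], 0
    for start, end in matches:
        if start >= pos:
            out.extend(sentence[pos:start])   # unknown span: fall back to single characters
            out.append(sentence[start:end])
            pos = end
    out.extend(sentence[pos:])
    return out

mini_dict = {"中华", "人民", "共和国", "中华人民共和国", "成立"}   # hypothetical entries
print(segment("中华人民共和国成立", mini_dict))
# ['中华人民共和国', '成立']
```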
No evaluation with other standards had been done before the Bakeoff.

Test | Reference | Type
崇文区 政府 | 崇文 区政府 | OAS
冰清玉洁 | 冰 清 玉 洁 | CAS-T
除夕之夜 | 除夕 之 夜 | CAS-T
擦泪 | 擦 泪 | CAS-T
精神 文明 | 精神文明 | CAS-R
1994 年 | 1994年 | CAS-R
不 怕 | 不怕 | CAS-R

3 Discussion of the Bakeoff
3.1 Results
SYSTRAN participated in the four open tracks in the First International Chinese Word Segmentation Bakeoff (/bakeoff2003/). Each track corresponds to one corpus with its own word segmentation standard, and each corpus's standard was significantly different from the others. The training process included building a User Dictionary that contains words found in the training corpora, but not in the SYSTRAN dictionary. Although each of these corpora was segmented according to its own standard, we made a single UD containing all the words gathered in all corpora.
Although the ranking of the SYSTRAN segmenter is different in the four open tracks, SYSTRAN's segmentation performance is quite comparable across the four corpora. This is to be compared to the scores obtained by other participants, where good performance was typically obtained on one corpus only. SYSTRAN's scores for the four tracks are shown in Table 3 (Sproat and Emerson, 2003).

Track | R | P | F | R_oov | R_iv
ASo | 0.915 | 0.894 | 0.904 | 0.426 | 0.926
CTBo | 0.891 | 0.877 | 0.884 | 0.733 | 0.925
HKo | 0.898 | 0.860 | 0.879 | 0.616 | 0.920
PKo | 0.905 | 0.869 | 0.886 | 0.503 | 0.934
Table 3. SYSTRAN's Scores in the Bakeoff

3.2 Discussion
The segmentation differences between the reference corpora and SYSTRAN's results were analyzed further. Table 4 shows the partition of divergences between OAS, CAS-T, and CAS-R strings³:

Track | Total | Same | OAS | CAS-T | CAS-R
ASo | 11,985 | 10,970 | 76 | 448 | 491
CTBo | 39,922 | 35,561 | 231 | 2,419 | 1,711
HKo | 34,959 | 31,397 | 217 | 1,436 | 1,909
PKo | 17,194 | 15,554 | 82 | 615 | 943
Table 4. Count of OAS and CAS Divergences

The majority of OAS divergences show incorrect segmentation from SYSTRAN. However, differences in CAS do not necessarily indicate incorrect segmentation results. The reasons can be categorized as follows: a) different segmentation standards, b) unknown words, c) named-entity recognition, and d) miscellaneous⁴. The distributions of the differences are analyzed further in Tables 5 and 6 for the ASo and PKo corpora, respectively.

CAS-R: Unique Strings = 334 (total = 491)
Type | Count | Percent | Examples
Different Standards | 184 | 55% | 感觉到 不能 第十三区 廿十五日
Unknown Words | 116 | 35% | 秋颱 中菜 哭骂 院庆
Named Entity | 30 | 9% | 川崎 津巴貝 台塑
Misc. | 4 | 1% | 一百余萬
CAS-T: Unique Strings = 137 (total = 448)
Type | Count | Percent | Examples
Different Standards | 134 | 98% | 喝酒 出了名 喝不喝酒
True Covering | 3 | 2% | 都會 有為
Table 5. Distribution of Divergences in the ASo Track

CAS-R: Unique Strings = 508 (total = 943)
Type | Count | Percent | Examples
Different Standards | 294 | 58% | 中共中央 这次 本届 不要 第一 2001年
Unknown Words | 90 | 18% | 攀岩 雪浴 拥堵
Named Entity | 61 | 12% | 奥佩蒂 福彩村
Misc. | 63 | 12% | 20% 3.9亿
CAS-T: Unique Strings = 197 (total = 615)
Type | Count | Percent | Examples
Different Standards | 194 | 98% | 中国人 大吼 不夜天 赤着膊
True Covering | 3 | 2% | 高过 雪洗
Table 6. Distribution of Divergences in the PKo Track

This analysis shows that the segmentation results are greatly affected by differences in segmentation standards. Other problems include, for example, the encoding of numbers using single bytes instead of the standard double-byte encoding in the PKo corpus, which accounts for about 12% of the differences in the PKo track scores.

4 Conclusion
For an open-track segmentation competition like the Bakeoff, we need to achieve a balance between the following aspects:
• Segmentation standards: differences between one's own standard and the reference standard.
• Adaptation to the other standards: whether one should adapt to other standards.
• Dictionary coverage: the coverage of one's own dictionary and of the dictionary obtained by training.
• Algorithm: the combination of segmentation, unknown word identification, and named-entity recognition.
• Speed: the time needed to segment the corpora.
• Training: the time and manpower used for training on each corpus and track.
Few systems participated in all open tracks: only SYSTRAN and one university participated in all four. We devoted about two person-weeks to this evaluation. We rank in the top three of three open tracks, and only the PKo track scores are lower, probably because of encoding problems for numbers in this corpus (we did not adjust our segmenter to cope with this corpus-specific problem). Our results are very consistent across all open tracks, indicating a very robust approach to Chinese segmentation.
Analysis of the results shows that SYSTRAN's Chinese word segmentation excels in the areas of dictionary coverage, robustness, and speed. The vast majority of divergences with the test corpora originate from differences in segmentation standards (over 55% for CAS-R and about 98% for CAS-T). True errors range between 0% and 2% only, the rest being attributable either to the lack of unknown-word processing or to the lack of a named-entity recognizer. Although not yet integrated, unknown word identification and named-entity recognition are under development as part of a terminology extraction tool.
For future Chinese word segmentation evaluations, some of the issues that arose in this Bakeoff would need to be addressed to obtain even more significant results, including word segmentation standards and encoding problems, for example. We would also welcome the introduction of a surprise track, similar to the surprise track of the DARPA MT evaluations, that would require participants to submit results within 24 hours on an unknown corpus.

References
Liu, Y., Tan, Q., & Shen, X. 1993. Segmentation Standard for Modern Chinese Information Processing and Automatic Segmentation Methodology.
Sproat, R., & Emerson, T. 2003. The First International Chinese Word Segmentation Bakeoff. In Proceedings of the Second SIGHAN Workshop on Chinese Language Processing. ACL03.

Footnotes
1. The word list only contained a Chinese-English bilingual dictionary without any syntactic or semantic features. It also contained many compound nouns, e.g., 北京大学.
2. Compound nouns are no longer considered as words; they were moved to the expression dictionary. For example, 北京大学 has become 北京 大学.
3. The number of words in the reference string is used when counting OAS and CAS divergences. For example, 除夕之夜's CAS count is three because the number of words in the reference string 除夕 之 夜 is three.
4. Word segmentation in SYSTRAN MT systems occurs after sentence identification and normalization. During word segmentation, Chinese numbers are converted into Arabic numbers.