
Long-term Recurrent Convolutional Networks for Visual Recognition and Description


…language text) and can model complex temporal dynamics; yet they can be optimized with backpropagation. Our recurrent long-term models are directly connected to modern visual convnet models and can be jointly trained to simultaneously learn temporal dynamics and convolutional perceptual representations. Our results show such models have distinct advantages over state-of-the-art models for recognition or generation which are separately defined and/or optimized.

1. Introduction

Recognition and description of images and videos is a fundamental challenge of computer vision. Dramatic progress has been achieved by supervised convolutional models on image recognition tasks, and a number of extensions to process video have been recently proposed. Ideally, a video model should allow processing of variable length input sequences, and also provide for variable length outputs, including generation of full-length sentence descriptions that go beyond conventional one-versus-all prediction tasks. In this paper we propose long-term recurrent convolutional networks (LRCNs), a novel architecture for visual recognition and description which combines convolutional layers and long-range temporal recursion and is end-to-end trainable (see Figure 1). We instantiate our architecture for specific video activity recognition, image caption generation, and video description tasks as described below.

To date, CNN models for video processing have successfully considered learning of 3-D spatio-temporal filters over raw sequence data [13, 2], and learning of frame-to-frame representations which incorporate instantaneous optic flow or trajectory-based models aggregated over fixed windows or video shot segments [16, 33]. Such models explore two extrema of perceptual time-series representation learning: either learn a fully general time-varying weighting, or apply simple temporal pooling. Following the same inspiration that motivates current deep convolutional models, we advocate for video recognition and description models which are also deep over temporal dimensions; i.e., have temporal recurrence of latent variables. RNN models are well known to be "deep in time"; e.g., explicitly so when unrolled, and form implicit compositional representations in the time domain. Such "deep" models predated deep spatial convolution models in the literature [31, 44].

Recurrent Neural Networks have been explored in perceptual applications for many decades, with varying results. A significant limitation of simple RNN models which strictly integrate state information over time is known as the "vanishing gradient" effect: the ability to backpropagate an error signal through a long-range temporal interval becomes increasingly difficult in practice. A class of models which enable long-range learning was first proposed in [12], and augments hidden state with nonlinear mechanisms to cause state to propagate without modification, be updated, or be reset, using simple memory-cell-like neural gates. While this model proved useful for several tasks, its utility became apparent in recent results reporting large-scale learning of speech recognition [10] and language translation models [38, 5].

We show here that long-term recurrent convolutional models are generally applicable to visual time-series modeling; we argue that in visual tasks where static or flat temporal models have previously been employed, long-term RNNs can provide significant improvement when ample training data are available to learn or refine the representation. Specifically, we show LSTM-type models provide for improved recognition on conventional video activity challenges and enable a novel end-to-end optimizable mapping from image pixels to sentence-level natural language descriptions. We also show that these models improve generation of descriptions from intermediate visual representations derived from conventional visual models.

We instantiate our proposed architecture in three experimental settings (see Figure 3). First, we show that by directly connecting a visual convolutional model to deep LSTM networks, we are able to train video recognition models that capture complex temporal state dependencies (Figure 3, left; Section 4). While existing labeled video activity datasets may not have actions or activities with extremely complex time dynamics, we nonetheless see improvements on the order of 4% on conventional benchmarks.

Second, we explore direct end-to-end trainable image-to-sentence mappings. Strong results for machine translation tasks have recently been reported [38, 5]; such models are encoder/decoder pairs based on LSTM networks. We propose a multimodal analog of this model, and describe an architecture which uses a visual convnet to encode a deep state vector, and an LSTM to decode the vector into a natural language string (Figure 3, middle; Section 5). The resulting model can be trained end-to-end on large-scale image and text datasets, and even with modest training provides competitive generation results compared to existing methods.

Finally, we show that LSTM decoders can be driven directly from conventional computer vision methods which predict higher-level discriminative labels, such as the semantic video role tuple predictors in [30] (Figure 3, right; Section 6). While not end-to-end trainable, such models offer architectural and performance advantages over previous statistical machine translation-based approaches, as reported below.

We have realized a generalized "LSTM"-style RNN model in the widely adopted open source deep learning framework Caffe [14], incorporating the specific LSTM units of [46, 38, 5].

2. Background: Recurrent Neural Networks (RNNs)

Traditional RNNs (Figure 2, left) can learn complex temporal dynamics by mapping input sequences to a sequence of hidden states, and hidden states to outputs, via the following recurrence equations:

h_t = g(W_{xh} x_t + W_{hh} h_{t-1} + b_h)

z_t = g(W_{hz} h_t + b_z)

where g is an element-wise non-linearity, such as a sigmoid or hyperbolic tangent, x_t is the input, h_t ∈ R^N is the hidden state with N hidden units, and z_t is the output at time t. For a length-T input sequence ⟨x_1, x_2, ..., x_T⟩, the updates above are computed sequentially as h_1 (letting h_0 = 0), z_1, h_2, z_2, ..., h_T, z_T.
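As an illustrative sketch (not the implementation used in our experiments), the recurrence above can be written in a few lines of NumPy; the toy dimensions and the choice of tanh for the nonlinearity g are assumptions made only for this example:

import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, W_hz, b_h, b_z, g=np.tanh):
    # One step of the vanilla RNN recurrence: hidden state h_t and output z_t.
    h_t = g(W_xh @ x_t + W_hh @ h_prev + b_h)
    z_t = g(W_hz @ h_t + b_z)
    return h_t, z_t

# Toy sizes, assumed for illustration: input D, hidden N, output K, sequence length T.
D, N, K, T = 8, 16, 5, 10
rng = np.random.default_rng(0)
W_xh = rng.normal(0, 0.1, (N, D))
W_hh = rng.normal(0, 0.1, (N, N))
W_hz = rng.normal(0, 0.1, (K, N))
b_h, b_z = np.zeros(N), np.zeros(K)

h = np.zeros(N)                      # h_0 = 0, as in the text
outputs = []
for x in rng.normal(size=(T, D)):    # a length-T input sequence
    h, z = rnn_step(x, h, W_xh, W_hh, W_hz, b_h, b_z)
    outputs.append(z)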

Though RNNs have proven successful on tasks such as speech recognition [42] and text generation [37], it can be difficult to train them to learn long-term dynamics, likely due in part to the vanishing and exploding gradients problem [12] that can result from propagating the gradients down through the many layers of the recurrent network, each corresponding to a particular timestep. LSTMs provide a solution by incorporating memory units that allow the network to learn when to forget previous hidden states and when to update hidden states given new information.

…of the architecture described in [9], which was derived from the LSTM initially proposed in [12].

In addition to a hidden unit h_t ∈ R^N, the LSTM includes an input gate i_t ∈ R^N, forget gate f_t ∈ R^N, output gate o_t ∈ R^N, input modulation gate g_t ∈ R^N, and memory cell c_t ∈ R^N. The memory cell unit c_t is a summation of two things: the previous memory cell unit c_{t−1}, which is modulated by f_t, and g_t, a function of the current input and previous hidden state, modulated by the input gate i_t. Because i_t and f_t are sigmoidal, their values lie within the range [0, 1], and i_t and f_t can be thought of as knobs that the LSTM learns to selectively forget its previous memory or consider its current input. Likewise, the output gate o_t learns how much of the memory cell to transfer to the hidden state. These additional cells enable the LSTM to learn extremely complex and long-term temporal dynamics that the RNN is not capable of learning. Additional depth can be added to LSTMs by stacking them on top of each other, using the hidden state of the LSTM in layer l−1 as the input to the LSTM in layer l.
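The gate arithmetic described above can likewise be sketched directly; the following is a minimal NumPy version of a single LSTM step in the common formulation with one stacked weight matrix over the concatenated input and previous hidden state (the exact parameterization used in our experiments may differ, so treat the shapes and layout here as assumptions):

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(x_t, h_prev, c_prev, W, b):
    # One LSTM step with input (i), forget (f), output (o) and input modulation (g) gates.
    # W maps the concatenated [x_t, h_prev] to the four stacked gate pre-activations.
    N = h_prev.shape[0]
    z = W @ np.concatenate([x_t, h_prev]) + b   # shape (4N,)
    i = sigmoid(z[0:N])            # input gate, in [0, 1]
    f = sigmoid(z[N:2 * N])        # forget gate, in [0, 1]
    o = sigmoid(z[2 * N:3 * N])    # output gate
    g = np.tanh(z[3 * N:4 * N])    # input modulation gate
    c_t = f * c_prev + i * g       # memory cell: gated sum of old cell and new input
    h_t = o * np.tanh(c_t)         # hidden state passed to the next timestep / layer
    return h_t, c_t

# Stacking LSTMs: the hidden state of layer l-1 becomes the input of layer l at each timestep.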

Recently, LSTMs have achieved impressive results on language tasks such as speech recognition [10] and machine translation [38, 5]. Analogous to CNNs, LSTMs are models that can learn to recognize and synthesize temporal dynamics for tasks involving sequential data (inputs or outputs), visual, linguistic, or otherwise. Figure 1 depicts the core of our approach. Our LRCN model works by passing each visual input v_t (an image in isolation, or a frame from a video) through a feature transformation φ_V(v_t) parametrized by V to produce a fixed-length vector representation φ_t ∈ R^d. Having computed the feature-space representation of the visual input sequence ⟨φ_1, φ_2, ..., φ_T⟩, the sequence model then takes over.

In its most general form, a sequence model parametrized by W maps an input x_t and a previous timestep hidden state h_{t−1} to an output z_t and updated hidden state h_t. Therefore, inference must be run sequentially (i.e., from top to bottom in the Sequence Learning box of Figure 1), by computing in order: h_1 = f_W(x_1, h_0) = f_W(x_1, 0), then h_2 = f_W(x_2, h_1), etc., up to h_T. Some of our models stack multiple LSTMs atop one another as described in Section 2.

The final step in predicting a distribution P(y_t) at timestep t is to take a softmax over the outputs z_t of the sequential model, producing a distribution over the (in our case, finite and discrete) space C of possible per-timestep outputs:

P(y_t = c) = \frac{\exp(W_{zc} z_{t,c} + b_c)}{\sum_{c' \in C} \exp(W_{zc} z_{t,c'} + b_{c'})}

The success of recent very deep models for object recognition [22, 34, 39] suggests that strategically composing many "layers" of non-linear functions can result in very powerful models for perceptual problems. For large T, the above recurrence indicates that the last few predictions from a recurrent network with T timesteps are computed by a very "deep" (T-layered) non-linear function, suggesting that the resulting recurrent model may have similar representational power to a T-layer neural network. Critically, however, the sequential model's weights W are reused at every timestep, forcing the model to learn generic timestep-to-timestep dynamics (as opposed to dynamics directly conditioned on t, the sequence index) and preventing the parameter size from growing in proportion to the maximum number of timesteps.

In most of our experiments, the visual feature transformation φ corresponds to the activations in some layer of a large CNN. Using a visual transformation φ_V(·) which is time-invariant and independent at each timestep has the important advantage of making the expensive convolutional inference and training parallelizable over all timesteps of the input, facilitating the use of fast contemporary CNN implementations whose efficiency relies on independent batch processing, and end-to-end optimization of the visual and sequential model parameters V and W.
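A minimal sketch of this factorization follows: because φ is the same function at every timestep, all T frames can be pushed through the feature extractor as one batch, and only the sequence model needs to run step by step. The "CNN" here is a stand-in linear map, and all sizes are illustrative assumptions:

import numpy as np

def phi(frames, W_feat):
    # Stand-in for the visual feature extractor phi_V: a single linear map per frame.
    # Time-invariance means all T frames can be processed as one independent batch.
    flat = frames.reshape(frames.shape[0], -1)   # (T, H*W*C)
    return flat @ W_feat                         # (T, d): one feature vector per timestep

T, H, W, C, d, N = 16, 8, 8, 3, 32, 64           # toy sizes (assumed)
rng = np.random.default_rng(0)
frames = rng.random((T, H, W, C))
W_feat = rng.normal(0, 0.01, (H * W * C, d))

feats = phi(frames, W_feat)                      # batched "convolutional" pass over all timesteps

# The sequence model, by contrast, must still be unrolled over time.
W_fh = rng.normal(0, 0.1, (N, d))
W_hh = rng.normal(0, 0.1, (N, N))
b_h = np.zeros(N)
h = np.zeros(N)
for t in range(T):
    h = np.tanh(W_fh @ feats[t] + W_hh @ h + b_h)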

We consider three vision problems (activity recognition, image description, and video description), each of which instantiates one of the following broad classes of sequential learning tasks:

1. Sequential inputs, fixed outputs (Figure 3, left): ⟨x_1, x_2, ..., x_T⟩ → y. The visual activity recognition problem can fall under this umbrella, with videos of arbitrary length T as input, but with the goal of predicting a single label like running or jumping drawn from a fixed vocabulary.

2. Fixed inputs, sequential outputs (Figure 3, middle): x → ⟨y_1, y_2, ..., y_T⟩. The image description problem fits in this category, with a non-time-varying image as input, but a much larger and richer label space consisting of sentences of any length.

3. Sequential inputs and outputs (Figure 3, right): ⟨x_1, x_2, ..., x_T⟩ → ⟨y_1, y_2, ..., y_T′⟩. Finally, it is easy to imagine tasks for which both the visual input and output are time-varying, and in general the number of input and output timesteps may differ (i.e., we may have T ≠ T′). In the video description task, for example, the input and output are both sequential, and the number of frames in the video should not constrain the length of (number of words in) the natural-language description.

In the previously described formulation, each instance has T inputs x_1, x_2, ..., x_T and T outputs y_1, y_2, ..., y_T. We describe how we adapt this formulation in our hybrid model to tackle each of the above three problem settings. With sequential inputs and scalar outputs, we take a late fusion approach to merging the per-timestep predictions y_1, y_2, ..., y_T into a single prediction y for the full sequence. With fixed-size inputs and sequential outputs, we simply duplicate the input x at all T timesteps, x_t := x (noting this can be done cheaply due to the time-invariant visual feature extractor). Finally, for a sequence-to-sequence problem with (in general) different input and output lengths, we take an "encoder-decoder" approach inspired by [46]. In this approach, one sequence model, the encoder, is used to map the input sequence to a fixed-length vector, then another sequence model, the decoder, is used to unroll this vector to sequential outputs of arbitrary length. Under this model, the system as a whole may be thought of as having T + T′ timesteps of input and output, wherein the input is processed and the decoder outputs are ignored for the first T timesteps, and the predictions are made and "dummy" inputs are ignored for the latter T′ timesteps.
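The first two adaptations are simple enough to state in code; the sketch below (with illustrative shapes, and no claim to match our exact implementation) shows late fusion of per-timestep predictions into a single sequence-level prediction, and duplication of a static input across output timesteps:

import numpy as np

def late_fusion(per_step_scores):
    # Sequential input, static output: average the per-timestep class scores
    # y_1..y_T into a single prediction for the whole sequence.
    return np.mean(per_step_scores, axis=0)

def duplicate_input(x, T):
    # Static input, sequential output: feed a copy of the same input x at every
    # timestep (cheap, since the time-invariant visual feature is computed once).
    return np.repeat(x[None, :], T, axis=0)

scores = np.random.rand(16, 101)                 # e.g. 16 timesteps of scores over 101 classes
video_label = int(np.argmax(late_fusion(scores)))

img_feat = np.random.rand(4096)                  # a single image feature vector
lstm_inputs = duplicate_input(img_feat, T=20)    # one copy per output timestep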

Under the proposed system, the weights (V, W) of the model's visual and sequential components can be learned jointly by maximizing the likelihood of the ground truth outputs y_t conditioned on the input data and labels up to that point, (x_{1:t}, y_{1:t−1}). In particular, we minimize the negative log likelihood L(V, W) = −log P_{V,W}(y_t | x_{1:t}, y_{1:t−1}) of the training data (x, y).

One of the most appealing aspects of the described system is the ability to learn the parameters "end-to-end," such that the parameters V of the visual feature extractor learn to pick out the aspects of the visual input that are relevant to the sequential classification problem. We train our LRCN models using stochastic gradient descent with momentum, with backpropagation used to compute the gradient ∇L(V, W) of the objective L with respect to all parameters (V, W).
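For concreteness, the objective and the parameter update can be sketched as below; the gradients themselves come from backpropagation through the unrolled network, which is omitted here, and the learning rate and momentum values are placeholders rather than the settings used in our experiments:

import numpy as np

def sequence_nll(log_probs, targets):
    # Negative log-likelihood of the ground-truth outputs, summed over timesteps.
    # log_probs: (T, |C|) array of per-timestep log P(y_t | x_{1:t}, y_{1:t-1});
    # targets: (T,) array of ground-truth class indices.
    return -float(np.sum(log_probs[np.arange(len(targets)), targets]))

def sgd_momentum_step(param, grad, velocity, lr=0.01, momentum=0.9):
    # One stochastic gradient descent update with momentum.
    velocity = momentum * velocity - lr * grad
    return param + velocity, velocity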

We next demonstrate the power of models which are both deep in space and deep in time by exploring three applications: activity recognition, image description, and video description.

4. Activity recognition

Activity recognition is an example of the first sequential learning task described above; T individual frames are input to T convolutional networks which are then connected to a single-layer LSTM with 256 hidden units. A large body of recent work has proposed deep architectures for activity recognition ([16, 33, 13, 2, 1]). [33, 16] both propose convolutional networks which learn filters based on a stack of N input frames. Though we analyze clips of 16 frames in this work, we note that the LRCN system is more flexible than [33, 16] since it is not constrained to analyzing fixed-length inputs and could potentially learn to recognize complex video sequences (e.g., cooking sequences as presented in Section 6). [1, 2] use recurrent neural networks to learn temporal dynamics of either traditional vision features ([1]) or deep features ([2]), but do not train their models end-to-end and do not pre-train on larger object recognition databases for important performance gains.

Figure 3: Task-specific instantiations of our LRCN model for activity recognition, image description, and video description.

We explore two variants of the LRCN architecture: one in which the LSTM is placed after the first fully connected layer of the CNN (LRCN-fc6) and another in which the LSTM is placed after the second fully connected layer of the CNN (LRCN-fc7). We train the LRCN networks with video clips of 16 frames. The LRCN predicts the video class at each time step and we average these predictions for final classification. At test time, we extract 16-frame clips with a stride of 8 frames from each video and average across clips. We also consider both RGB and flow inputs. Flow is computed with [4] and transformed into a "flow image" by centering the x and y flow values around 128 and multiplying by a scalar such that flow values fall between 0 and 255; a third channel for the flow image is created by calculating the flow magnitude (a sketch of this construction is given below). The CNN base of the LRCN is a hybrid of the Caffe [14] reference model, a minor variant of AlexNet [22], and the network used by Zeiler & Fergus [47]. The net is pre-trained on the 1.2M image ILSVRC-2012 [32] classification training subset of the ImageNet [7] dataset, giving the network a strong initialization to facilitate faster training and prevent over-fitting to the relatively small video datasets. When classifying center crops, the top-1 classification accuracy is 60.2% and 57.4% for the hybrid and Caffe reference models, respectively. In our baseline model, T video frames are individually classified by a CNN. As in the LSTM model, whole-video classification is done by averaging scores across all video frames.
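The flow-image construction mentioned above can be sketched as follows; the exact scaling constant is an assumption of this sketch, and the sketch simply clips to valid pixel values after scaling:

import numpy as np

def flow_to_image(flow, scale=16.0):
    # Convert a 2-channel optical flow field (H, W, 2) into a 3-channel "flow image":
    # x and y displacements are scaled and centered around 128, and the third channel
    # holds the (scaled) flow magnitude.
    fx, fy = flow[..., 0], flow[..., 1]
    img = np.zeros(flow.shape[:2] + (3,), dtype=np.float32)
    img[..., 0] = fx * scale + 128.0
    img[..., 1] = fy * scale + 128.0
    img[..., 2] = np.sqrt(fx ** 2 + fy ** 2) * scale
    return np.clip(img, 0, 255).astype(np.uint8)

flow = np.random.randn(240, 320, 2).astype(np.float32)   # toy flow field
flow_img = flow_to_image(flow)                            # uint8 image fed to the flow network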

4.1. Evaluation

We evaluate our architecture on the UCF-101 dataset [36], which consists of over 12,000 videos categorized into 101 human action classes. The dataset is organized into three splits, with a little under 8,000 videos in the training set for each split. We report accuracy for split 1.

Table 1, columns 2-3, compares video classification of our proposed models (LRCN-fc6, LRCN-fc7) against the baseline architecture for both RGB and flow inputs. Each LRCN network is trained end-to-end. To determine if end-to-end training is necessary, we also train an LRCN-fc6 network in which only the LSTM parameters are learned. The fully fine-tuned network increases performance from 70.47% to 71.12%, demonstrating that end-to-end fine-tuning is indeed beneficial. The LRCN-fc6 network yields the best results for both RGB and flow and improves upon the baseline network by 2.12% and 4.75%, respectively. RGB and flow networks can be combined by computing a weighted average of network scores as proposed in [33]. Like [33], we report two weighted averages of the predictions from the RGB and flow networks in Table 1 (right). Since the flow network outperforms the RGB network, weighting the flow network higher unsurprisingly leads to better accuracy. In this case, LRCN outperforms the baseline single-frame model by 3.88%.
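The fusion itself is a simple weighted average of class scores; a sketch (with toy scores, and the two weightings reported in Table 1) follows:

import numpy as np

def fuse_scores(rgb_scores, flow_scores, w_flow=2.0 / 3.0):
    # Weighted average of per-class scores from the RGB and flow networks.
    # w_flow = 1/2 and w_flow = 2/3 correspond to the two weightings in Table 1.
    return (1.0 - w_flow) * rgb_scores + w_flow * flow_scores

rgb = np.random.rand(101)      # toy class scores from the RGB network
flow = np.random.rand(101)     # toy class scores from the flow network
pred = int(np.argmax(fuse_scores(rgb, flow, w_flow=2.0 / 3.0)))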

The LRCN shows clear improvement over the baseline single-frame system and approaches the accuracy achieved by other deep models. [33] reports results on UCF-101 by computing a weighted average between flow and RGB networks (86.4% for split 1 and 87.6% averaging over all splits). Though [16] does not report numbers on the separate splits of UCF-101, the average split accuracy is 65.4%, which is substantially lower than our LRCN model.

                       Input Type        Weighted Average
Model                  RGB      Flow     1/2, 1/2   1/3, 2/3
Single frame           65.40    53.20    -          -
Single frame (ave.)    69.00    72.20    75.71      79.04
LRCN-fc6               71.12    76.95    81.97      82.92
LRCN-fc7               70.68    69.36    -          -

Table 1: Activity recognition: Comparing single-frame models to LRCN networks for activity recognition in the UCF-101 [36] dataset, with both RGB and flow inputs. Our LRCN model consistently and strongly outperforms a model based on predictions from the underlying convolutional network architecture alone.

5. Image description

In contrast to activity recognition, the static image description task requires only a single convolutional network since the input consists of a single image. A variety of deep and multi-modal models [8, 35, 19, 20, 15, 25, 18] have been proposed for image description; in particular, [20, 18] combine deep temporal models with convolutional representations. [20] utilizes a "vanilla" RNN as described in Section 2, potentially making learning long-term temporal dependencies difficult. Contemporaneous with and most similar to our work is [18], which proposes a different architecture that uses the hidden state of an LSTM encoder at time T as the encoded representation of the length-T input sequence. It then maps this sequence representation, combined with the visual representation from a convnet, into a joint space from which a separate decoder predicts words. This is distinct from our arguably simpler architecture, which takes as per-timestep input a copy of the static input image, along with the previous word. We present empirical results showing that our integrated LRCN architecture outperforms these prior approaches, none of which comprise an end-to-end optimizable system over a hierarchy of visual and temporal parameters.

We now describe our instantiation of the LRCN architecture for the image description task. At each timestep, both the image features and the previous word are provided as inputs to the sequential model, in this case a stack of LSTMs (each with 1000 hidden units), which is used to learn the dynamics of the time-varying output sequence, natural language. At timestep t, the input to the bottom-most LSTM is the embedded ground truth word from the previous timestep, w_{t−1}. For sentence generation, the input becomes a sample w̃_{t−1} from the model's predicted distribution at the previous timestep. The second LSTM in the stack fuses the outputs of the bottom-most LSTM with the image representation φ_V(x) to produce a joint representation of the visual and language inputs up to time t. (The visual model φ_V(x) used in this experiment is the base Caffe [14] reference model, very similar to the well-known AlexNet [22], pre-trained on ILSVRC-2012 [32] as in Section 4.) Any further LSTMs in the stack transform the outputs of the LSTM below, and the fourth LSTM's outputs are inputs to the softmax which produces a distribution over words p(w_t | w_{1:t−1}).
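A deliberately simplified, unfactored single-LSTM analogue of this generation step is sketched below to make the data flow concrete: the previous word is embedded, combined with the image feature and the previous hidden state, and a softmax produces the next-word distribution. All sizes, the greedy decoding, and the single-layer structure are assumptions of the sketch, not the stacked model described above:

import numpy as np

def softmax(a):
    e = np.exp(a - np.max(a))
    return e / e.sum()

def caption_step(prev_word, h_prev, img_feat, E, W_wh, W_ih, W_hh, b_h, W_hv, b_v):
    # One simplified caption-generation step: embed the previous word, fuse it with
    # the image feature and previous hidden state, and predict the next word.
    w_emb = E[prev_word]                                      # word embedding lookup
    h_t = np.tanh(W_wh @ w_emb + W_ih @ img_feat + W_hh @ h_prev + b_h)
    p_next = softmax(W_hv @ h_t + b_v)                        # P(w_t | w_{1:t-1}, image)
    return h_t, p_next

V, d_emb, d_img, N = 1000, 100, 4096, 256                     # toy vocabulary / feature sizes
rng = np.random.default_rng(0)
E = rng.normal(0, 0.1, (V, d_emb))
W_wh, W_ih = rng.normal(0, 0.01, (N, d_emb)), rng.normal(0, 0.01, (N, d_img))
W_hh, b_h = rng.normal(0, 0.01, (N, N)), np.zeros(N)
W_hv, b_v = rng.normal(0, 0.01, (V, N)), np.zeros(V)

h, word = np.zeros(N), 0                                      # word id 0 plays the start token
img = rng.random(d_img)
for _ in range(5):                                            # greedy decoding of a short caption
    h, p = caption_step(word, h, img, E, W_wh, W_ih, W_hh, b_h, W_hv, b_v)
    word = int(np.argmax(p))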

Following [19], we refer to the use of the bottom-most LSTM to exclusively process the language input (with no visual input) as the factored version of the model, and study the importance of this by comparing it to an unfactored variant. See Figure 4 for details on the variants we study.

Without any explicit language modeling or defined syntax structure, the described LRCN system learns mappings from pixel intensity values to natural language descriptions that are often semantically descriptive and grammatically correct.

5.1. Evaluation

We evaluate our image description model on both image retrieval and image annotation generation. We first show the effectiveness of our model by quantitatively evaluating it on the image retrieval task proposed by [26] and seen in [25, 15, 35, 8, 18]. Our model is trained on the combined training sets of the Flickr30k [28] (28,000 training images) and COCO 2014 [24] (80,000 training images) datasets. We report results on Flickr30k [28], with 30,000 images and five sentence annotations per image. We use 1,000 images each for test and validation and the remaining 28,000 for training.

Image retrieval results are recorded in Table 2; we report the median rank, Med r, of the first retrieved ground truth image, and Recall@K, the number of sentences for which the correct image is retrieved in the top K. Our model consistently outperforms the strong baselines from recent work [18, 25, 15, 35, 8], as can be seen in Table 2. We note that the new OxfordNet model in [18] outperforms our model on the retrieval task. However, OxfordNet [18] utilizes a better-performing convolutional network to get the additional edge over the base ConvNet [18]. The strength of our temporal model (and the integration of the temporal and visual models) can be more directly measured against the ConvNet [18] result, which uses the same base CNN architecture [22] pretrained on the same data.
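For reference, both retrieval metrics are straightforward to compute once the rank of the first correct item is known for each query; the toy ranks below are illustrative only:

import numpy as np

def retrieval_metrics(ranks, ks=(1, 5, 10)):
    # ranks: 1-based rank of the first correct item for each query.
    # Returns Recall@K (fraction of queries with a hit in the top K) and the median rank.
    ranks = np.asarray(ranks)
    recall_at_k = {k: float(np.mean(ranks <= k)) for k in ks}
    return recall_at_k, float(np.median(ranks))

ranks = [1, 3, 12, 2, 40, 7]                # toy ranks of the first retrieved ground truth
recalls, med_r = retrieval_metrics(ranks)   # e.g. recalls[5] and Med r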

In Table 3, we report image-to-caption retrieval results for each of the architectural variants in Figure 4, as well as a four-layer version (LRCN4f) of the factored model. Based on the facts that LRCN2f outperforms the LRCN4f model, and LRCN1u outperforms LRCN2u, there seems to be little to be gained from naively stacking additional LSTM layers atop an existing network. On the other hand, a comparison of the LRCN2f and LRCN2u results indicates that the "factorization" in the architecture is quite important to the model's retrieval performance.

Table 3: …of the LRCN architectures. See Figure 4 for diagrams of these architectures. The results indicate that the "factorization" is important to the LRCN's retrieval performance, while simply stacking additional LSTM layers does not seem to improve performance.

To evaluate sentence generation, we use the BLEU [27] metric, which was designed for automated evaluation of statistical machine translation. BLEU is a modified form of precision that compares N-gram fragments of the hypothesis translation with multiple reference translations. We use BLEU as a measure of similarity of the descriptions. The unigram scores (B-1) account for the adequacy of the translation (the information retained), while longer N-gram scores (B-2, B-3) account for the fluency. We compare our results with [25] (on Flickr30k) and two strong baselines, reported in Table 4. Additionally, we report results on the new COCO 2014 [24] dataset, which has 80,000 training images and 40,000 validation images. Similar to Flickr30k, each image is annotated with 5 or more image annotations. We isolate 5,000 images from the validation set for testing purposes and the results are reported in Table 4.
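As a concrete note on the metric, B-1, B-2, and B-3 correspond to BLEU computed with uniform weights over n-grams up to order 1, 2, and 3, respectively. A sketch using NLTK's implementation is shown below; the availability of NLTK and the whitespace tokenization are assumptions of the sketch, and our reported numbers were not necessarily produced with this tool:

from nltk.translate.bleu_score import sentence_bleu

references = [r.split() for r in [
    "a man riding a wave on top of a surfboard",
    "a surfer rides a large wave in the ocean",
]]
hypothesis = "a man riding a wave on a surfboard".split()

b1 = sentence_bleu(references, hypothesis, weights=(1.0,))              # unigrams only (B-1)
b2 = sentence_bleu(references, hypothesis, weights=(0.5, 0.5))          # up to bigrams (B-2)
b3 = sentence_bleu(references, hypothesis, weights=(1/3, 1/3, 1/3))     # up to trigrams (B-3)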

Based on the B-1 scores in Table 4, generation using LRCN performs comparably with m-RNN [25] in terms of the information conveyed in the description. Furthermore, LRCN significantly outperforms the baselines and the m-RNN with regard to the fluency (B-2, B-3) of the generation, indicating the LRCN retains more of the bigrams and trigrams from the human-annotated descriptions.

In addition to standard quantitative evaluations, we also employ Amazon Mechanical Turk (AMT) workers to evaluate the generated sentences. Given an image and a set of descriptions from different models, we ask Turkers to rank the sentences based on correctness, grammar, and relevance.

                    Correctness   Grammar   Relevance
TreeTalk [23]       4.08          4.35      3.98
OxfordNet [18]      3.71          3.46      3.70
NN [18]             3.44          3.20      3.49
LRCN fc8 (ours)     3.74          3.19      3.72
LRCN ft (ours)      3.47          3.01      3.50
Captions            2.55          3.72      2.59

Table 5: Image description: Human evaluator rankings from 1-6 (low is good) averaged for each method and criterion. We evaluated on 785 Flickr images selected by the authors of [18] for the purposes of comparison against this similar contemporary approach.

We compared sentences from our model to the ones made publicly available by [18]. As seen in Table 5, our fine-tuned (ft) LRCN model performs on par with the Nearest Neighbour (NN) approach on correctness and relevance, and better on grammar. We show example sentence generations in Figure 6.

6. Video description

In video description we must generate a variable-length stream of words, similar to Section 5. [11, 30, 17, 3, 6, 40, 41] propose methods for generating sentence descriptions for video, but to our knowledge we present the first application of deep models to the video description task.

The LSTM framework allows us to model the video as a variable-length input stream, as discussed in Section 3. However, due to limitations of available video description datasets, we take a different path. We rely on more "traditional" activity and video recognition processing for the input and use LSTMs for generating a sentence.

We first distinguish the following architectures for video description (see Figure 5). For each architecture, we assume we have predictions of objects, subjects, and verbs present in the video from a CRF based on the full video input. In this way, we observe the video as a whole at each time step, not incrementally frame by frame.

(a) LSTM encoder & decoder with CRF max (Figure 5(a)). The first architecture is motivated by the video description approach presented in [30]. They first recognize a semantic representation of the video using the maximum a posteriori (MAP) estimate of a CRF taking in video features as unaries. This representation, e.g. ⟨person, cut, cutting board⟩, is then concatenated to an input sentence (person cut cutting board), which is translated to a natural sentence (a person cuts on the board) using phrase-based statistical machine translation (SMT) [21]. We replace the SMT with an LSTM, which has shown state-of-the-art performance for machine translation between languages [38, 5]. The architecture (shown in Figure 5(a)) has an encoder LSTM (orange) which encodes the one-hot vector (binary index vector in a vocabulary) of the input sentence as done in [38]. This allows for variable-length inputs. (Note that the input sentence might have a different number of words than elements of the semantic representation.) At the end of the encoder stage, the final hidden unit must remember all necessary information before being input into the decoder stage (pink), in which the hidden representation is decoded into a sentence, one word at each time step. We use the same two-layer LSTM for encoding and decoding.

Architecture                       Input      BLEU
SMT [30]                           CRF max    24.9
SMT [29]                           CRF prob   26.9
(a) LSTM Encoder-Decoder (ours)    CRF max    25.3
(b) LSTM Decoder (ours)            CRF max    27.4
(c) LSTM Decoder (ours)            CRF prob   28.8

Table 6: Video description: Results on detailed description of TACoS multilevel [29], in %; see Section 6 for details.

(b) LSTM decoder with CRF max (Figure 5(b)). In this variant we exploit the fact that the semantic representation can be encoded as a single fixed-length vector. We provide the entire visual input representation at each time step to the LSTM, analogous to how an entire image is provided as an input to the LSTM in image description.

(c) LSTM decoder with CRF prob (Figure 5(c)). A benefit of using LSTMs for machine translation compared to phrase-based SMT [21] is that they can naturally incorporate probability vectors during training and test time, which allows the LSTM to learn uncertainties in visual generation rather than relying on MAP estimates. The architecture is the same as in (b), but we replace max predictions with probability distributions.
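The difference between variants (b) and (c) is only in how the CRF output is turned into the per-timestep visual input of the decoder; the sketch below (with hypothetical role vocabularies and toy marginals) shows the two options:

import numpy as np

def semantic_input(crf_marginals, use_max=True):
    # Build the fixed-length visual input to the LSTM decoder from CRF outputs.
    # crf_marginals: one probability vector per semantic role (e.g. subject, verb, object).
    # use_max=True collapses each vector to a one-hot MAP estimate (variant (b));
    # use_max=False concatenates the full distributions (variant (c)).
    parts = []
    for p in crf_marginals:
        if use_max:
            one_hot = np.zeros_like(p)
            one_hot[np.argmax(p)] = 1.0
            parts.append(one_hot)
        else:
            parts.append(p)
    return np.concatenate(parts)

subject = np.array([0.7, 0.2, 0.1])    # toy CRF marginals over 3 possible subjects
verb    = np.array([0.1, 0.8, 0.1])
obj     = np.array([0.3, 0.3, 0.4])

x_max  = semantic_input([subject, verb, obj], use_max=True)     # decoder input, variant (b)
x_prob = semantic_input([subject, verb, obj], use_max=False)    # decoder input, variant (c)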

6.1. Evaluation

We evaluate our approach on the TACoS multilevel [29] dataset, which has 44,762 video/sentence pairs (about 40,000 for training/validation). We compare to [30], who use max prediction, as well as a variant presented in [29] which takes CRF probabilities at test time and uses a word lattice to find an optimal sentence prediction. Since we use the max predictions as well as the probability scores provided by [29], we have an identical visual representation. [29] uses dense trajectories [43] and SIFT features as well as temporal context reasoning modeled in a CRF.

Table 6 shows the BLEU-4 scores. The results show that (1) the LSTM outperforms an SMT-based approach to video description; (2) the simpler decoder architectures (b) and (c) achieve better performance than (a), likely because the input does not need to be memorized; and (3) our approach achieves 28.8%, clearly outperforming the best reported number of 26.9% on TACoS multilevel by [29].

More broadly, these results show that our architecture … problems with time-varying visual input or sequential outputs, which these methods are able to produce with little input preprocessing and no hand-designed features.

A female tennis player in action on the court.
A group of young men playing a game of soccer.
A man riding a wave on top of a surfboard.
A baseball game in progress with the batter up to plate.
A brown bear standing on top of a lush green field.
A person holding a cell phone in their hand.
A close up of a person brushing his teeth.
A woman laying on a bed in a bedroom.
A black and white cat is sitting on a chair.
A large clock mounted to the side of a building.
A bunch of fruit that are sitting on a table.
A toothbrush holder sitting on top of a white sink.

Figure 6: Image description: images with corresponding captions generated by our finetuned LRCN model. These are images 1-12 of our randomly chosen validation set from COCO 2014 [24] (see Figure 7 for images 13-24). We used beam search with a beam size of 5 to generate the sentences, and display the top (highest likelihood) result above.

A close up of a hot dog on a bun.
A boat on a river with a bridge in the background.
A bath room with a toilet and a bath tub.
A man that is standing in the dirt with a bat.
A white toilet sitting in a bathroom next to a trash can.
Black and white photograph of a woman sitting on a bench.
A group of people walking down a street next to a traffic light.
An elephant standing in a grassy area with tree in the background.
A close up of a plate of food with broccoli.
A bus parked on the side of a street next to a building.
A group of people standing around a table.
A vase filled with flower sitting on a table.

Figure 7: Image description: images 13-24 (and LRCN-generated captions) from the set described in Figure 6.

Acknowledgements

The authors thank Oriol Vinyals for valuable advice and helpful discussion throughout this work. This work was supported in part by DARPA's MSEE and SMISC programs, NSF awards IIS-1427425, IIS-1212798, and IIS-1116411, Toyota, and the Berkeley Vision and Learning Center. Marcus Rohrbach was supported by a fellowship within the FITweltweit Program of the German Academic Exchange Service (DAAD).

References

[1] M. Baccouche, F. Mamalet, C. Wolf, C. Garcia, and A. Baskurt. Action classification in soccer videos with long short-term memory recurrent neural networks. In ICANN, 2010.
[2] M. Baccouche, F. Mamalet, C. Wolf, C. Garcia, and A. Baskurt. Sequential deep learning for human action recognition. In Human Behavior Understanding, 2011.
[3] A. Barbu, A. Bridge, Z. Burchill, D. Coroian, S. Dickinson, S. Fidler, A. Michaux, S. Mussman, S. Narayanaswamy, D. Salvi, L. Schmidt, J. Shangguan, J. M. Siskind, J. Waggoner, S. Wang, J. Wei, Y. Yin, and Z. Zhang. Video in sentences out. In UAI, 2012.
[4] T. Brox, A. Bruhn, N. Papenberg, and J. Weickert. High accuracy optical flow estimation based on a theory for warping. In ECCV, 2004.
[5] K. Cho, B. van Merriënboer, D. Bahdanau, and Y. Bengio. On the properties of neural machine translation: Encoder-decoder approaches. arXiv preprint arXiv:1409.1259, 2014.
[6] P. Das, C. Xu, R. Doell, and J. Corso. Thousand frames in just a few words: Lingual description of videos through latent topics and sparse object stitching. In CVPR, 2013.
[7] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In CVPR, 2009.
[8] A. Frome, G. Corrado, J. Shlens, S. Bengio, J. Dean, M. Ranzato, and T. Mikolov. DeViSE: A deep visual-semantic embedding model. In NIPS, 2013.
[9] A. Graves. Generating sequences with recurrent neural networks. arXiv preprint arXiv:1308.0850, 2013.
[10] A. Graves and N. Jaitly. Towards end-to-end speech recognition with recurrent neural networks. In ICML, 2014.
[11] S. Guadarrama, N. Krishnamoorthy, G. Malkarnenkar, S. Venugopalan, R. Mooney, T. Darrell, and K. Saenko. YouTube2Text: Recognizing and describing arbitrary activities using semantic hierarchies and zero-shot recognition. In ICCV, 2013.
[12] S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural Computation, 1997.
[13] S. Ji, W. Xu, M. Yang, and K. Yu. 3D convolutional neural networks for human action recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(1):221-231, 2013.
[14] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell. Caffe: Convolutional architecture for fast feature embedding. In ACM MM, 2014.
[15] A. Karpathy, A. Joulin, and L. Fei-Fei. Deep fragment embeddings for bidirectional image sentence mapping. In NIPS, 2014.
[16] A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar, and L. Fei-Fei. Large-scale video classification with convolutional neural networks. In CVPR, 2014.
[17] M. U. G. Khan, L. Zhang, and Y. Gotoh. Human focused video description. In Proceedings of the IEEE International Conference on Computer Vision Workshops (ICCV Workshops), 2011.
[18] R. Kiros, R. Salakhutdinov, and R. S. Zemel. Unifying visual-semantic embeddings with multimodal neural language models. arXiv preprint arXiv:1411.2539, 2014.
[19] R. Kiros, R. Salakhutdinov, and R. Zemel. Multimodal neural language models. In ICML, 2014.
[20] R. Kiros, R. Zemel, and R. Salakhutdinov. Multimodal neural language models. In Proc. NIPS Deep Learning Workshop, 2013.
[21] P. Koehn, H. Hoang, A. Birch, C. Callison-Burch, M. Federico, N. Bertoldi, B. Cowan, W. Shen, C. Moran, R. Zens, C. Dyer, O. Bojar, A. Constantin, and E. Herbst. Moses: Open source toolkit for statistical machine translation. In ACL, 2007.
[22] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In NIPS, 2012.
[23] P. Kuznetsova, V. Ordonez, T. L. Berg, U. C. Hill, and Y. Choi. TreeTalk: Composition and compression of trees for image descriptions. Transactions of the Association for Computational Linguistics, 2(10):351-362, 2014.
[24] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft COCO: Common objects in context. arXiv preprint arXiv:1405.0312, 2014.
[25] J. Mao, W. Xu, Y. Yang, J. Wang, and A. L. Yuille. Explain images with multimodal recurrent neural networks. arXiv preprint arXiv:1410.1090, 2014.
[26] M. Hodosh, P. Young, and J. Hockenmaier. Framing image description as a ranking task: Data, models and evaluation metrics. JAIR, 47:853-899, 2013.
[27] K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu. BLEU: a method for automatic evaluation of machine translation. In ACL, 2002.
[28] P. Young, A. Lai, M. Hodosh, and J. Hockenmaier. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions. TACL, 2:67-78, 2014.
[29] A. Rohrbach, M. Rohrbach, W. Qiu, A. Friedrich, M. Pinkal, and B. Schiele. Coherent multi-sentence video description with variable level of detail. In GCPR, 2014.
[30] M. Rohrbach, W. Qiu, I. Titov, S. Thater, M. Pinkal, and B. Schiele. Translating video content to natural language descriptions. In ICCV, 2013.
[31] D. E. Rumelhart, G. E. Hinton, and R. J. Williams. Learning internal representations by error propagation. Technical report, DTIC Document, 1985.
[32] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei. ImageNet Large Scale Visual Recognition Challenge, 2014.
[33] K. Simonyan and A. Zisserman. Two-stream convolutional networks for action recognition in videos. arXiv preprint arXiv:1406.2199, 2014.
[34] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
[35] R. Socher, Q. Le, C. Manning, and A. Ng. Grounded compositional semantics for finding and describing images with sentences. In NIPS Deep Learning Workshop, 2013.
[36] K. Soomro, A. R. Zamir, and M. Shah. UCF101: A dataset of 101 human actions classes from videos in the wild. arXiv preprint arXiv:1212.0402, 2012.
[37] I. Sutskever, J. Martens, and G. E. Hinton. Generating text with recurrent neural networks. In ICML, 2011.
[38] I. Sutskever, O. Vinyals, and Q. V. Le. Sequence to sequence learning with neural networks. In NIPS, 2014.
[39] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. arXiv preprint arXiv:1409.4842, 2014.
[40] C. C. Tan, Y.-G. Jiang, and C.-W. Ngo. Towards textually describing complex video contents with audio-visual concept classifiers. In MM, 2011.
[41] J. Thomason, S. Venugopalan, S. Guadarrama, K. Saenko, and R. J. Mooney. Integrating language and vision to generate natural language descriptions of videos in the wild. In COLING, 2014.
[42] O. Vinyals, S. V. Ravuri, and D. Povey. Revisiting recurrent neural networks for robust ASR. In ICASSP, 2012.
[43] H. Wang, A. Kläser, C. Schmid, and C. Liu. Dense trajectories and motion boundary descriptors for action recognition. IJCV, 2013.
[44] R. J. Williams and D. Zipser. A learning algorithm for continually running fully recurrent neural networks. Neural Computation, 1989.
[45] W. Zaremba and I. Sutskever. Learning to execute. arXiv preprint arXiv:1410.4615, 2014.
[46] W. Zaremba, I. Sutskever, and O. Vinyals. Recurrent neural network regularization. arXiv preprint arXiv:1409.2329, 2014.
[47] M. D. Zeiler and R. Fergus. Visualizing and understanding convolutional networks. In ECCV, 2014.
