
Research of P2P Traffic Identification Based on BP Neural Network

Shen Fuke
Computer Science Department, East China Normal University, Shanghai 200062, China
fkshen@https://www.doczj.com/doc/878251177.html,

Change Pan
Computer Science Department, East China Normal University, Shanghai 200062, China
pchang@https://www.doczj.com/doc/878251177.html,

Ren Xiaoli
Computer Science Department, East China Normal University, Shanghai 200062, China
xlren@https://www.doczj.com/doc/878251177.html,

Abstract

Today's P2P applications pose a major challenge to network traffic workloads. In contrast to first-generation P2P networks, which used well-defined port numbers, current P2P applications can disguise their existence by using arbitrary ports. Our goal is to present a new approach to P2P traffic identification that is based on a BP neural network and does not rely on keyword matching. This article introduces the BP algorithm, analyzes the characteristics of P2P traffic, and presents a BP network built on the connection patterns of P2P networks. The trained BPNN is applied as a P2P traffic identifier that can distinguish any kind of P2P application from non-P2P applications. This feasible solution has several advantages in P2P traffic identification. We believe our approach is the first method for characterizing P2P traffic using network dynamics based on a BP network rather than any user payload.

1. Introduction

Today's Peer-to-Peer (P2P) applications pose a major challenge to Internet traffic workloads, and P2P represents a formidable component of campus network traffic. Traffic identification at the campus network exit link or at a backbone node can support network resource planning and traffic control [5]. In contrast to first-generation P2P networks, which used well-defined port numbers, current P2P applications can disguise their existence by using arbitrary ports [3]. Most existing traffic identification technology is based on port numbers or on payload keyword matching, known as Deep Packet Inspection (DPI) [8]. Not only do most P2P applications now operate on top of nonstandard, custom-designed proprietary protocols, but current P2P clients can also easily operate on any port number, even the ports of HTTP, FTP, or other popular applications [9]. These circumstances portend a frustrating conclusion: robust identification of P2P traffic is only possible by inspecting user payload [10]. Yet packet payload capture and analysis poses a set of often insurmountable methodological landmines: legal, privacy, technical, and efficiency problems, among others. Further obfuscating workload characterization attempts is the increasing tendency of P2P protocols to support payload encryption. Indeed, the frequency with which P2P protocols are introduced and/or upgraded renders packet payload analysis not only impractical but also glaringly inefficient.

P2P flow inspection based on traffic flow characteristics can be viewed as a pattern recognition problem with two outputs, Yes and No [11]. Neural network technology can therefore be used for P2P traffic inspection based on traffic characteristics. In this paper we develop a BP neural network method to identify P2P flows at the campus exit link, based on the flow connection patterns of P2P traffic and without depending on packet payload. The significance of our algorithm lies in its ability to identify P2P protocols without relying on their underlying format, so it can also identify previously unknown P2P protocols.

2. BP Neural Network (BPNN)

The attraction of neural networks is that they are best suited to solving the problems that are the most difficult to solve by traditional computational methods. A BP neural network is a multi-layer network whose weights are trained through non-linear differentiable functions. The BPNN's structure is simple and flexible, and it is widely used in pattern recognition, voice recognition, image processing, compression, data mining, etc.

Suppose we have $P$ training samples, that is, $P$ input/output pairs $(I_p, T_p)$, $p = 1, 2, \ldots, P$, where

$$I_p = (i_{p1}, \ldots, i_{pm})^T \ \text{(input vector)}, \qquad T_p = (t_{p1}, \ldots, t_{pn})^T \ \text{(output vector)}.$$

Here $m$ is the dimension of the input vector and $n$ is the dimension of the output vector.

$O_p = (o_{p1}, \ldots, o_{pn})^T$ is the actual output vector corresponding to the input vector $I_p$. $W_{ij}$ is the weight connecting the $i$-th ($i = 1, \ldots, n$) output component to the $j$-th ($j = 1, \ldots, m$) input component. $\Delta W_{ij}$ denotes the correction applied to this weight in one incremental step, so one incremental update is:

$$W_{ij} + \Delta W_{ij} \Rightarrow W_{ij} \qquad (1)$$

Formula (1) means: add the correction $\Delta W_{ij}$ to $W_{ij}$ to obtain the new $W_{ij}$. The correction is

$$\Delta W_{ij} = \sum_{p=1}^{P} \eta\,(t_{pi} - o_{pi})\,i_{pj} = \sum_{p=1}^{P} \eta\,\delta_{pi}\,i_{pj}, \qquad (2)$$

where $\delta_{pi} = t_{pi} - o_{pi}$.

$\eta$ is the learning efficiency (learning rate). Formulas (1) and (2) are known as the Delta learning rule. A BP network learns by example: we must provide a learning set that consists of input examples and the known-correct output for each case. These input-output examples show the network what type of behavior is expected, and the BP algorithm allows the network to adapt. During training, the network's resulting output is compared with the desired output step by step, and the connection weights are changed according to the Delta learning rule. The cycle is repeated until the overall error drops below a predetermined threshold.
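The Delta learning rule in formulas (1) and (2) can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: it assumes a single layer with linear outputs ($o_{pi} = \sum_j W_{ij} i_{pj}$), and the function name `delta_rule_step` is hypothetical.

```python
def delta_rule_step(weights, inputs, targets, eta):
    """One batch update W_ij <- W_ij + sum_p eta * (t_pi - o_pi) * i_pj."""
    n_out = len(weights)
    n_in = len(weights[0])
    # Accumulate the correction Delta W_ij over all P training samples.
    delta_w = [[0.0] * n_in for _ in range(n_out)]
    for x, t in zip(inputs, targets):
        # Assumption: linear output units, o_pi = sum_j W_ij * i_pj.
        o = [sum(w * v for w, v in zip(row, x)) for row in weights]
        for i in range(n_out):
            err = t[i] - o[i]  # delta_pi = t_pi - o_pi
            for j in range(n_in):
                delta_w[i][j] += eta * err * x[j]
    # Formula (1): add the correction to the old weights.
    return [[weights[i][j] + delta_w[i][j] for j in range(n_in)]
            for i in range(n_out)]


# Two samples, one output unit, learning efficiency 0.5.
new_weights = delta_rule_step([[0.0, 0.0]],
                              [[1.0, 0.0], [0.0, 1.0]],
                              [[1.0], [0.0]],
                              0.5)
```

Calling the function repeatedly with the returned weights corresponds to the training cycle described above, which stops once the overall error falls below the chosen threshold.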

A BPNN with a single hidden layer can approximate any continuous function on a closed interval. One of the simplest BPNN topologies therefore consists of three layers: one input layer (the inputs of our network), one hidden layer, and one output layer (the outputs of our network). All neurons in one layer are connected to all neurons in the next layer. The number of input neurons is determined by the number of input features, and the number of output neurons equals the number of target classes. So, before setting up a BP neural network, we need to determine the features of the input data set (their number is m) and the number of output-layer nodes. If the BP neural network is used as a classifier and there are n classes, the output layer has n neurons. The number of hidden-layer neurons depends on the requirements of the problem, the number of input and output neurons, and so on. Too many hidden neurons lengthen training, do not necessarily give the lowest error, and can lead to poor fault tolerance and failure to identify previously unseen samples. There must therefore be a best number of hidden-layer neurons, neither very large nor very small. A reference formula for this number is [13]:

$$n_1 = \sqrt{n + m} + a \qquad (3)$$

Here $n_1$ is the number of hidden-layer neurons, $m$ is the number of input-layer neurons, $n$ is the number of output-layer neurons, and $a$ is a constant between 1 and 10.
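As a quick worked check of this rule of thumb, the sketch below assumes formula (3) reads $n_1 = \sqrt{n + m} + a$ with $a$ between 1 and 10, and evaluates it for the network sizes used later in this paper (m = 5, n = 2). The helper name `hidden_size_range` is hypothetical.

```python
import math


def hidden_size_range(m, n, a_min=1, a_max=10):
    """Return the lowest and highest n_1 allowed by n_1 = sqrt(n + m) + a."""
    base = math.sqrt(n + m)
    return base + a_min, base + a_max


low, high = hidden_size_range(5, 2)
# Whole-number hidden layer sizes that fall inside the range.
candidates = list(range(math.ceil(low), math.floor(high) + 1))
```

For m = 5 and n = 2 the range is roughly 3.65 to 12.65, so the integer candidates run from 4 to 12. The value of 4 chosen in Section 3 sits at the small end of that range.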

Another important parameter is the number of training epochs, that is, the number of times the training data set passes through the BP neural network. Increasing it can improve the accuracy of the BPNN model but costs more training time; decreasing it shortens training but can reduce accuracy. In practice, we can choose a value between 100 and 300 based on experience.

The learning efficiency $\eta$ can be chosen between 0.1 and 0.9. A lower learning efficiency needs more training iterations. A higher learning efficiency makes the BPNN converge quickly and yields a more non-linear result. If the hidden layer has too few neurons, the training error rate is higher and we must tolerate a higher generalization error rate. If the hidden layer has too many neurons, the training error rate falls, but the generalization error rate can still rise because the network exceeds the appropriate size.

3. BPNN in P2P Traffic Identification

Before traffic identification, we need to understand the traffic behavior of different applications, then analyze and select the characteristics of each kind of traffic. Table 1 lists some characteristics of popular Internet services. From Table 1, we can see that P2P flows are easy to distinguish from other services, except for FTP [7]. One way to distinguish P2P from FTP is to check whether user interaction exists. A P2P application has 2 states, a message association state and a data transfer state, and it alternates between them at a very high frequency. Message association is a form of user interaction, so we can distinguish P2P from FTP by this feature.

Feature selection means choosing the best subset of the input feature set. Feature extraction means converting or combining the original features. Two features can represent the user interaction of a data flow [6][12]:

• Mean Squared Deviation (MSD) of every flow's packet size

When the packet sizes of a data flow vary widely, the mean squared deviation of the flow's packet size is large. Although both WWW and P2P services have a large MSD, we can distinguish P2P from WWW by flow duration: a P2P flow lasts longer than a WWW flow.

• Switching Frequency of every flow's packet size

The switching frequency of a P2P flow's packet size is higher than that of other services.
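The two interaction features above could be computed from a flow's packet sizes roughly as follows. The paper does not give exact definitions, so both are assumptions: MSD is taken here as the population standard deviation of the packet sizes, and switching frequency as the number of large size jumps between consecutive packets per second of flow duration. The function names and the 500-byte threshold are hypothetical.

```python
import statistics


def packet_size_msd(sizes):
    """Spread of packet sizes, taken here as the population standard deviation."""
    return statistics.pstdev(sizes)


def switching_frequency(sizes, duration_s, threshold=500):
    """Size switches per second (assumed definition).

    A switch is counted whenever two consecutive packets differ in size
    by more than `threshold` bytes.
    """
    if len(sizes) < 2 or duration_s <= 0:
        return 0.0
    switches = sum(1 for a, b in zip(sizes, sizes[1:]) if abs(b - a) > threshold)
    return switches / duration_s


# A toy flow alternating between full-size data packets and small messages.
sizes = [1500, 60, 1500, 60]
msd = packet_size_msd(sizes)
freq = switching_frequency(sizes, duration_s=2.0)
```

A flow that keeps alternating between control messages and data packets scores high on both measures, which matches the behavior the text attributes to P2P.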

Table 1. Some characteristics of popular Internet services

So we choose MSD (mean squared deviation of the flow's packet size), Switching Freq (switching frequency of the flow's packet size), Rates (average packet size of the flow), Packets (number of packets in the flow) and Bytes (total bytes of the flow) as input features. That means m = 5.

We inspect each traffic flow only to decide whether it is P2P or not. The output therefore has 2 options, which means n = 2.

So we get: number of input neurons m = 5, number of output neurons n = 2, number of hidden-layer neurons $n_1$ = 4, number of training epochs = 250, and learning efficiency $\eta$ = 0.7.

Using MATLAB's Neural Network Toolbox, we analyzed data flows captured from the East China Normal University campus network and obtained 250 groups of flow samples (Table 2 lists 8 of them). The samples include both P2P and non-P2P flows. Samples 1 to 4 are P2P flows and samples 5 to 8 are non-P2P flows. We denote a P2P flow by (1,0) and a non-P2P flow by (0,1), so the target output of samples 1 to 4 is (1,0) and the target output of samples 5 to 8 is (0,1).
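To make the configuration concrete, here is a minimal pure-Python sketch of a 5-4-2 network trained with plain backpropagation for 250 epochs at a learning efficiency of 0.7. It is not the authors' MATLAB toolbox setup. The sigmoid activations, bias terms, online (per-sample) updates, random initial weights and max-scaling of each feature are all assumptions, and the eight rows are the values as read from Table 2.

```python
import math
import random

# Table 2 rows: MSD, Switching Freq, Rates, Packets, Bytes -> target.
SAMPLES = [
    ([464.1424, 1.4312, 1363.875, 152, 207309], (1, 0)),
    ([452.6442, 1.0364, 1397.611, 163, 227811], (1, 0)),
    ([534.8935, 4.4694, 893.5, 149, 133132], (1, 0)),
    ([558.6362, 3.8437, 1074.848, 154, 165527], (1, 0)),
    ([701.5207, 0.2592, 747.25, 97, 72483], (0, 1)),
    ([103.6535, 0.0069, 145.0, 124, 17980], (0, 1)),
    ([102.6203, 0.0086, 149.0357, 136, 20269], (0, 1)),
    ([103.1136, 0.0122, 151.0654, 131, 19790], (0, 1)),
]


def scale_columns(rows):
    """Divide each feature by its column maximum (assumed preprocessing)."""
    maxima = [max(col) for col in zip(*rows)]
    return [[v / m for v, m in zip(row, maxima)] for row in rows]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class BPNN:
    def __init__(self, n_in=5, n_hidden=4, n_out=2, eta=0.7, seed=0):
        rng = random.Random(seed)
        # Each weight row has one extra entry for the bias input.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                   for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
                   for _ in range(n_out)]
        self.eta = eta

    def forward(self, x):
        xb = list(x) + [1.0]
        hb = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in self.w1]
        hb.append(1.0)
        out = [sigmoid(sum(w * v for w, v in zip(row, hb))) for row in self.w2]
        return xb, hb, out

    def train_sample(self, x, t):
        xb, hb, out = self.forward(x)
        # Output-layer error terms: (t - o) times the sigmoid derivative.
        d_out = [(ti - oi) * oi * (1 - oi) for ti, oi in zip(t, out)]
        # Hidden-layer error terms, propagated back through the old w2.
        d_hid = [hb[j] * (1 - hb[j]) *
                 sum(d_out[i] * self.w2[i][j] for i in range(len(d_out)))
                 for j in range(len(hb) - 1)]
        for i, row in enumerate(self.w2):
            for j in range(len(row)):
                row[j] += self.eta * d_out[i] * hb[j]
        for j, row in enumerate(self.w1):
            for k in range(len(row)):
                row[k] += self.eta * d_hid[j] * xb[k]
        return sum((ti - oi) ** 2 for ti, oi in zip(t, out))

    def predict(self, x):
        out = self.forward(x)[2]
        return "P2P" if out[0] > out[1] else "Non-P2P"


def train(net, inputs, targets, epochs=250):
    """Return the summed squared error of every epoch."""
    return [sum(net.train_sample(x, t) for x, t in zip(inputs, targets))
            for _ in range(epochs)]


inputs = scale_columns([features for features, _ in SAMPLES])
targets = [target for _, target in SAMPLES]
net = BPNN()
errors = train(net, inputs, targets)
```

After training, `net.predict(inputs[k])` returns the label for sample `k`. With only eight samples this shows the mechanics rather than the reported accuracy; the experiment in the paper trains on all 250 samples.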

Now we have a BPNN. Before using it, we must train it; only after sufficient training can it be used as a data flow classifier. In our experiment, we set the satisfactory threshold to over 98%, which means no more than 5 misidentifications among the 250 training samples. To test our BPNN, we used 5 computers running FTP and WWW, with 2 of them also running BitTorrent and eMule and the other 3 running eDonkey and Tuotu (which supports encrypted transmission). We captured 48875 flows, 2340268 packets and 643.7M bytes with a sniffer. Our identification system reported 2787 P2P flows while the actual number was 2895, a discovery rate of 96.3%. Over 6 experiments in different time slots, the discovery rate ranged between 93% and 99%. Because this solution is based on traffic characteristics, it can identify encrypted and unknown P2P traffic and adapts very well.

We compare our solution with [1], [2] and [3] in Table 3. [1], solution 1 in Table 3, is based on Deep Packet Inspection. The discovery rate of 99.4% claimed in [1] applies to known P2P applications; that solution cannot identify unknown or encrypted P2P traffic. [2] is based on the network-layer and transport-layer connection patterns of P2P networks. [3] presents a solution based on Bayesian analysis techniques. From Table 3, we can see that our solution has advantages in flexibility, adaptability and discovery rate.
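The training threshold and the discovery rate reported for the experiment can be checked with simple arithmetic. The sketch below only reproduces the figures stated in the text; the function names are hypothetical.

```python
def max_misidentifications(total_samples, threshold_percent):
    """Largest error count that still meets the accuracy threshold."""
    required_correct = total_samples * threshold_percent // 100
    return total_samples - required_correct


def discovery_rate(identified, actual):
    """Share of the actual P2P flows that the classifier reported."""
    return identified / actual


allowed_errors = max_misidentifications(250, 98)   # 98% of 250 training samples
rate_percent = round(discovery_rate(2787, 2895) * 100, 1)
```

With 250 training samples, a 98% threshold leaves room for 5 misidentifications, and 2787 identified flows out of 2895 actual P2P flows gives the reported 96.3%.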

4. Conclusion

The application-layer signature approach is not practical, because it must enumerate all possible signatures and it is useless against unknown P2P applications. The trained BPNN is based on P2P traffic characteristics rather than any user payload, and it can be used to distinguish any kind of P2P application from non-P2P applications. This solution is independent of the type of P2P application, the port number, and whether the transmission is encrypted, so it has good adaptability. Because our experiment was offline, our next step is to test this solution online, which needs higher performance. We also need more data to verify whether the five characteristics are enough to identify unknown P2P.

Table 1. Some characteristics of popular Internet services

Service      Duration   Average Rate   Count of Bytes
HTTP         short      high           From lower to higher
VPN          long       low            high
Games        long       low            high
Streaming    long       mid            high
Telnet       long       low            mid
FTP/P2P      long       Mid to high    high

Table 2. Samples of flow captured from ECNU

Sample ID   MSD        Switching Freq   Rates      Packets   Bytes    Type
1           464.1424   1.4312           1363.875   152       207309   P2P
2           452.6442   1.0364           1397.611   163       227811   P2P
3           534.8935   4.4694           893.5      149       133132   P2P
4           558.6362   3.8437           1074.848   154       165527   P2P
5           701.5207   0.2592           747.25     97        72483    Non-P2P
6           103.6535   0.0069           145        124       17980    Non-P2P
7           102.6203   0.0086           149.0357   136       20269    Non-P2P
8           103.1136   0.0122           151.0654   131       19790    Non-P2P

Table 3. Comparison with other solutions

Solution       Solution Description                                              Encrypted or not   Forged port   Discovery Rate
Solution 1     Based on packet payload                                           No                 Yes           99.4%
Solution 2     Based on traffic behavior of network layer and transport layer    Yes                Yes           95%
Solution 3     Using Bayesian Analysis Techniques and FCBF                       Yes                Yes           55.18%
Our Solution   Based on traffic characters                                       Yes                Yes           96%

5. References

[1] S. Sen, O. Spatscheck, D. Wang. Accurate, Scalable In-Network Identification of P2P Traffic Using Application Signatures. May 2004.

[2] A. W. Moore, K. Papagiannaki. Toward the Accurate Identification of Network Applications. March 2005.

[3] T. Karagiannis, A. Broido, M. Faloutsos. Transport Layer Identification of P2P Traffic. October 2004.

[4] A. W. Moore, D. Zuev. Internet Traffic Classification Using Bayesian Analysis Techniques. June 2005.

[5] https://www.doczj.com/doc/878251177.html,/adsl2006/adsl2006/p2p.html

[6] Li JiangTao, Jiang YongLing. Telecom Science, 2005, 49(3).

[7] https://www.doczj.com/doc/878251177.html,/infocus/1843/1~3

[8] https://www.doczj.com/doc/878251177.html,/doc.aspx?nid=58&language=sc&mid=14&smid=58

[9] https://www.doczj.com/doc/878251177.html,

[10] https://www.doczj.com/doc/878251177.html,

[11] https://www.doczj.com/doc/878251177.html,/PUBLICATIONS/ABSTRACTS/EAGE98/eage98_hcd.pdf

[12] https://www.doczj.com/doc/878251177.html,

[13] FeiSi Science and Technology R&D Center. Beijing: Publishing House of Electronics Industry, Mar. 2005.
