Academic paper

      Identification method of encrypted traffic based on support vector machine

      Abstract:
      Existing encrypted traffic identification methods have difficulty distinguishing encrypted traffic from non-encrypted compressed file traffic. An analysis of encrypted traffic, txt traffic, doc traffic, jpg traffic, and compressed file traffic on the Internet shows that methods based on information entropy can effectively separate low-entropy flows from high-entropy flows. However, such methods cannot identify non-encrypted compressed file traffic, in which each individual byte appears random while the flow as a whole is only pseudo-random. Therefore, the relative entropy feature vector $\{h_0, h_1, h_2, h_3\}$ is used to separate low-entropy flows from high-entropy flows, and the error $p_{\mathrm{error}}$ of the value of π estimated by Monte Carlo simulation is used to distinguish locally random traffic from globally random traffic. Finally, a support vector machine (SVM)-based method for identifying encrypted and non-encrypted traffic, SVM-ID, is proposed; it takes the feature subspace $\Phi_{\mathrm{SVM}} = \{h_0, h_1, h_2, h_3, p_{\mathrm{error}}\}$ as input. SVM-ID is compared experimentally with the relative entropy method. The results show that the proposed method not only identifies encrypted traffic well but also distinguishes encrypted traffic from non-encrypted compressed file traffic.
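      The abstract does not say how $h_0, \dots, h_3$ are computed. The sketch below is a hypothetical reading only: it assumes each $h_k$ is the Shannon entropy of the payload split into symbols of 1, 2, 4, and 8 bits, divided by the symbol width so that every feature lies in [0, 1]. The names `SYMBOL_WIDTHS`, `split_symbols`, `normalized_entropy`, and `entropy_features` are illustrative and do not come from the paper.

```python
import math
from collections import Counter

# Hypothetical symbol widths in bits for h0, h1, h2, h3. The paper's own
# definition of these features is not given in the abstract.
SYMBOL_WIDTHS = (1, 2, 4, 8)


def split_symbols(payload, width):
    """Split bytes into consecutive `width`-bit symbols (width must divide 8)."""
    mask = (1 << width) - 1
    symbols = []
    for byte in payload:
        # Read each byte from its most significant bits downwards.
        for shift in range(8 - width, -1, -width):
            symbols.append((byte >> shift) & mask)
    return symbols


def normalized_entropy(symbols, width):
    """Shannon entropy of the symbol histogram divided by its maximum, `width` bits."""
    if not symbols:
        return 0.0
    total = len(symbols)
    counts = Counter(symbols)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return entropy / width


def entropy_features(payload):
    """Return the hypothetical feature vector [h0, h1, h2, h3], each in [0, 1]."""
    return [normalized_entropy(split_symbols(payload, w), w) for w in SYMBOL_WIDTHS]
```

      Under this assumption, ciphertext should push all four values toward 1, while a constant payload gives 0 for every width.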
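      The π-error feature can be illustrated with the Monte Carlo test used by the Fourmilab `ent` randomness tester: each successive 6-byte group is read as a pair of 24-bit (x, y) coordinates, and the fraction of points that fall inside a quarter circle estimates π/4. Whether SVM-ID groups bytes exactly this way is an assumption. The function name `monte_carlo_pi_error` is hypothetical, and its return value, the relative error |π̂ − π| / π, stands in for $p_{\mathrm{error}}$.

```python
import math
import os


def monte_carlo_pi_error(payload):
    """Relative error of a Monte Carlo estimate of pi computed from payload bytes.

    Each 6-byte group is read as two 24-bit big-endian integers (x, y), i.e. a
    point in a square of side 2**24 - 1. The fraction of points inside the
    quarter circle of that radius approximates pi / 4. Trailing bytes that do
    not fill a whole group are ignored.
    """
    radius = 2**24 - 1
    inside = 0
    points = 0
    for i in range(0, len(payload) - 5, 6):
        x = int.from_bytes(payload[i:i + 3], "big")
        y = int.from_bytes(payload[i + 3:i + 6], "big")
        points += 1
        # Exact integer comparison, so no floating-point rounding at the boundary.
        if x * x + y * y <= radius * radius:
            inside += 1
    if points == 0:
        raise ValueError("payload must contain at least 6 bytes")
    pi_estimate = 4.0 * inside / points
    return abs(pi_estimate - math.pi) / math.pi


if __name__ == "__main__":
    # Uniformly random bytes (a stand-in for ciphertext) should give a small error.
    print(f"{monte_carlo_pi_error(os.urandom(60_000)):.4f}")
```

      Byte-level entropy looks only at how often each value occurs. This test, by contrast, depends on how consecutive bytes combine into points, which is why the abstract uses it to separate locally random traffic from globally random traffic.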
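      The classification step can be sketched as a linear soft-margin SVM over the five-dimensional vector $\Phi_{\mathrm{SVM}}$. The abstract does not give the kernel, solver, or parameters that SVM-ID uses, so this sketch swaps in dual coordinate descent for a linear SVM, the approach behind LIBLINEAR's dual solvers. As with LIBLINEAR's bias option, the bias is folded into the weight vector as a constant feature. `train_linear_svm`, `predict`, and the defaults `C=1.0` and `epochs=1000` are hypothetical.

```python
import random


def train_linear_svm(samples, labels, C=1.0, epochs=1000, seed=0):
    """Train a linear soft-margin SVM by dual coordinate descent.

    samples: list of feature vectors, e.g. [h0, h1, h2, h3, p_error]
    labels:  +1 for encrypted traffic, -1 for non-encrypted traffic
    A constant 1.0 is appended to each vector, so the last weight acts as the bias.
    Returns the augmented weight vector.
    """
    rng = random.Random(seed)
    data = [list(x) + [1.0] for x in samples]
    w = [0.0] * len(data[0])
    alpha = [0.0] * len(data)  # one dual variable per sample, kept in [0, C]
    q_diag = [sum(v * v for v in x) for x in data]  # Q_ii = x_i . x_i (> 0)
    order = list(range(len(data)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = data[i], labels[i]
            # Gradient of the dual objective with respect to alpha_i.
            grad = y * sum(wj * xj for wj, xj in zip(w, x)) - 1.0
            new_alpha = min(max(alpha[i] - grad / q_diag[i], 0.0), C)
            delta = new_alpha - alpha[i]
            if delta != 0.0:
                alpha[i] = new_alpha
                # Keep w = sum_i alpha_i * y_i * x_i in sync with alpha.
                w = [wj + delta * y * xj for wj, xj in zip(w, x)]
    return w


def predict(w, x):
    """Return +1 (encrypted) or -1 (non-encrypted) for one feature vector."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0.0 else -1
```

      A linear model keeps the sketch short and dependency-free. A kernel SVM, as offered by LIBSVM, may fit these features better, and in practice the entropy features and $p_{\mathrm{error}}$ would usually be scaled to comparable ranges before training.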
      Authors: Cheng Guang [1], Chen Yuxiang [2]
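      Affiliations: School of Computer Science and Engineering, Southeast University, Nanjing 211189; Key Laboratory of Computer Network and Information Integration (Ministry of Education), Southeast University, Nanjing 211189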
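      Year, Volume (Issue): 2017, 47(4)
      CLC number: TP393.4
      Published online: August 15, 2017
      Funding: National High Technology Research and Development Program of China (863 Program); National Natural Science Foundation of China; ZTE Research Fund; Collaborative Innovation Center of Novel Software Technology and Industrialization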