Towards User Profiling for Web Recommendation
- Format: pdf
- Size: 240.62 KB
- Pages: 10
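Key Technologies for Personalized Recommendation in AI Recommender Systems
Artificial intelligence (AI) plays an important role in today's society, especially in recommender systems.
With the rapid growth of the Internet and the explosion of information, people increasingly need personalized recommendation services to help them filter and select information.
The key technologies behind personalized recommender systems are among the important applications of AI.
This article examines the key technologies involved in personalized recommender systems and describes their real-world applications.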
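1. Data collection and preprocessing
The core of a personalized recommender system is predicting a user's interests and needs by analyzing that user's past behavior and preferences.
Data collection and preprocessing are therefore fundamental to such a system.
Common data sources include user behavior logs, user profile information, and social networks.
Preprocessing mainly consists of data cleaning, data integration, and feature extraction.
With data collected and preprocessed, the system can build a user profile that captures the user's interests and needs accurately.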
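As a rough illustration of the cleaning and feature-extraction steps described above, the following minimal sketch turns a behavior log into a per-user interest profile. The log format, the action weights, and the function name `build_profiles` are all assumptions made for this example; the article does not prescribe any of them.

```python
from collections import Counter, defaultdict

# Hypothetical behavior log: (user_id, item_category, action).
RAW_EVENTS = [
    ("u1", "sports", "click"),
    ("u1", "sports", "purchase"),
    ("u1", "music", "click"),
    ("u2", "music", "click"),
    ("u2", None, "click"),  # missing category: dropped during cleaning
    ("u1", "sports", "click"),
]

# Assumed action weights (not taken from the article).
ACTION_WEIGHTS = {"click": 1.0, "purchase": 3.0}


def build_profiles(events):
    """Clean the events and return {user: {category: normalized interest}}."""
    scores = defaultdict(Counter)
    for user, category, action in events:
        if category is None or action not in ACTION_WEIGHTS:
            continue  # data cleaning: skip incomplete or unknown records
        scores[user][category] += ACTION_WEIGHTS[action]
    profiles = {}
    for user, counts in scores.items():
        total = sum(counts.values())
        # Feature extraction: interest share per category, summing to 1.
        profiles[user] = {cat: value / total for cat, value in counts.items()}
    return profiles


profiles = build_profiles(RAW_EVENTS)
```

Normalizing each profile to sum to 1 keeps heavy and light users comparable; a production system would likely also integrate other sources such as profile fields or social data.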
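2. Collaborative filtering
Collaborative filtering is a widely used method in personalized recommender systems.
It recommends items a user is likely to be interested in based on similarities among users or among items.
It comes in two forms: user-based and item-based.
User-based collaborative filtering recommends by comparing how similar users' interests are, whereas item-based collaborative filtering recommends by comparing how similar items are to one another.
The key design choices are how similarity is computed and how recommendation results are evaluated.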
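The user-based variant can be sketched as follows, assuming cosine similarity over raw ratings and a similarity-weighted average as the predictor. Both are common choices rather than anything the article specifies, and the ratings data and function names are hypothetical.

```python
import math

# Hypothetical ratings: {user: {item: rating}}.
RATINGS = {
    "alice": {"A": 5, "B": 3, "C": 4},
    "bob": {"A": 4, "B": 3, "C": 5, "D": 4},
    "carol": {"A": 1, "B": 5, "D": 2},
}


def cosine(u, v):
    """Cosine similarity between two sparse rating dicts."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = math.sqrt(sum(r * r for r in u.values()))
    norm_v = math.sqrt(sum(r * r for r in v.values()))
    return dot / (norm_u * norm_v)


def predict(ratings, user, item):
    """Similarity-weighted average of other users' ratings for `item`."""
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        sim = cosine(ratings[user], their)
        num += sim * their[item]
        den += abs(sim)
    return num / den if den else None


prediction = predict(RATINGS, "alice", "D")
```

Transposing the dictionary to `{item: {user: rating}}` and reusing `cosine` gives the item-based variant. Other similarity measures (for example Pearson correlation) slot in the same way.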
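3. Deep learning
Deep learning is one of the most active areas of AI and is widely applied in personalized recommender systems.
By building deep neural network models, it can automatically learn complex relationships between users and items.
This lets a recommender system model users' interests and needs more closely and return more accurate, more personalized recommendations.
However, deep learning places high demands on computing resources and data volume, so the scalability and stability of the system need careful consideration.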
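4. Combining and optimizing recommendation algorithms
Different recommendation algorithms have different strengths and suit different scenarios.
Combining and optimizing algorithms is therefore another key technology of personalized recommender systems.
Fusing several recommendation algorithms makes it possible to draw on the strengths of each and improve the accuracy and personalization of the results.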
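One simple way to fuse recommenders is a weighted sum of their normalized scores. The sketch below assumes min-max normalization and fixed weights; the article does not say which fusion scheme it has in mind, and the scores, weights, and function names are hypothetical.

```python
def min_max(scores):
    """Rescale a {item: score} dict to [0, 1] so recommenders are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {item: 0.0 for item in scores}
    return {item: (s - lo) / (hi - lo) for item, s in scores.items()}


def fuse(score_sets, weights):
    """Weighted sum of normalized scores; a missing item contributes 0."""
    fused = {}
    for scores, w in zip(score_sets, weights):
        for item, s in min_max(scores).items():
            fused[item] = fused.get(item, 0.0) + w * s
    # Highest fused score first; ties broken alphabetically.
    return sorted(fused, key=lambda item: (-fused[item], item))


# Hypothetical outputs of two recommenders that use different scales.
cf_scores = {"A": 4.5, "B": 3.0, "C": 1.5}
content_scores = {"B": 0.9, "C": 0.8, "D": 0.1}
ranking = fuse([cf_scores, content_scores], weights=[0.6, 0.4])
```

Here item B ranks first because both recommenders score it well, even though neither collaborative filtering alone nor content scoring alone would put it far ahead. In practice the weights would be tuned on held-out data.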
Research on Personalized Recommendation Algorithm Based on User Interest
SUN Kelei, CHEN Andong
(School of Computer Science and Engineering, Anhui University of Science and Technology, Huainan 232001, China)
Abstract: To address the difficulty that collaborative filtering has in capturing user interest, a personalized recommendation algorithm based on user interest drift and the attribute characteristics of items is proposed. A user interest preference factor is built from item attributes and user ratings within a sliding time window, and the user's preference degree for an item is then derived from the attribute characteristics of the recommended item. Finally, recommendations are produced by combining the item preference degree with the predicted rating from collaborative filtering. Experimental results show that the proposed algorithm accurately reflects both the drift in user interest and the attribute characteristics of items, and that recommendation quality improves compared with the classical UserCF method.
Keywords: user interest; collaborative filtering; time window; personalized recommendation
CLC number: TP391  Document code: A  Article ID: 2095-8382(2017)01-065-05
With the arrival of the Web 2.0 era, the volume of data has grown explosively, making it hard to find data of interest in the mass of information.
Key Features
A new frame of mind.
No other full frame, interchangeable-lens camera is this light or this portable. 24.3 MP of rich detail. A true-to-life 2.4 million dot OLED viewfinder. Wi-Fi sharing and an expandable shoe system. It’s all the full-frame performance you ever wanted in a compact size that will change your perspective entirely.

World’s smallest lightest interchangeable lens full-frame camera
Sony’s Exmor image sensor takes full advantage of the Full-frame format, but in a camera body less than half the size and weight of a full-frame DSLR.

Full Frame 24.3 MP resolution with 14-bit RAW output
A whole new world of high-quality images is realized through the 24.3 MP effective 35 mm full-frame sensor, a normal sensor range of ISO 100 – 25600, and a sophisticated balance of high resolving power, gradation and low noise. The BIONZ® X image processor enables up to 5 fps high-speed continuous shooting and 14-bit RAW image data recording.

Fast Hybrid AF w/ phase-detection for DSLR-like focusing speed
Enhanced Fast Hybrid auto focus combines speedy phase-detection AF with highly accurate contrast-detection AF, which has been accelerated through a new Spatial Object Detection algorithm, to achieve among the fastest autofocusing performance of any full-frame camera. First, phase-detection AF with 117 densely placed phase-detection AF points swiftly and efficiently moves the lens to bring the subject nearly into focus.
Then contrast-detection AF with wide AF coverage fine-tunes the focusing in the blink of an eye.

Fast Intelligent AF for responsive, accurate, and greater operability with full frame sensor
The high-speed image processing engine and improved algorithms combine with optimized image sensor read-out speed to achieve ultra high-speed AF despite the use of a full-frame sensor.

New Eye AF control
Even when capturing a subject partially turned away from the camera with a shallow depth of field, the face will be sharply focused thanks to extremely accurate eye detection that can prioritize a single pupil. A green frame appears over the prioritized eye when focus has been achieved for easy confirmation. Eye AF can be used when the function is assigned to a customizable button, allowing users to instantly activate it depending on the scene.

Fully compatible with Sony’s E-mount lens system and new full-frame lenses
To take advantage of the lightweight on-the-go body, the α7 is fully compatible with Sony’s E-mount lens system and expanded line of E-mount compact and lightweight full-frame lenses from Carl Zeiss and Sony’s premier G-series.

Direct access interface for fast, intuitive shooting control
Quick Navi Pro displays all major shooting options on the LCD screen so you can rapidly confirm settings and make adjustments as desired without searching through dedicated menus. When fleeting shooting opportunities arise, you’ll be able to respond swiftly with just the right settings.

High contrast 2.4M dot OLED EVF for eye-level framing
View every scene in rich detail with the XGA OLED Tru-Finder, which features OLED improvements and the same 3-lens optical system used in the flagship α99. The viewfinder faithfully displays what will appear in your recording, including the effects of your camera settings, so you can accurately monitor the results. You’ll enjoy rich tonal gradations and 3 times the contrast of the α99.
High-end features like 100% frame coverage and a wide viewing angle are also provided.

ILCE-7K/B α7 (Alpha 7) Interchangeable Lens Camera

3.0" 1.23M dot LCD tilts for high and low angle framing
The tiltable 3.0” (1,229k dots) Xtra Fine™ LCD Display makes it easy to photograph over crowds or low to capture pets eye to eye by swinging up approx. 84° and down approx. 45°. Easily scroll through menus and preview life thanks to WhiteMagic™ technology that dramatically increases visibility in bright daylight. The large display delivers brilliant-quality still images and movies while enabling easy focusing operation.

Simple connectivity to smartphones via Wi-Fi® or NFC
Connectivity with smartphones for One-touch sharing/One-touch remote has been simplified with Wi-Fi®/NFC control. In addition to Wi-Fi support for connecting to smartphones, the α7 also supports NFC (near field communication) providing “one touch connection” convenience when transferring images to Android™ smartphones and tablets. Users need only touch devices to connect; no complex set-up is required. Moreover, when using Smart Remote Control (a feature that allows shutter release to be controlled by a smartphone), connection to the smartphone can be established by simply touching compatible devices.

New BIONZ X image processing engine
Sony proudly introduces the new BIONZ X image processing engine, which faithfully reproduces textures and details in real time, as seen by the naked eye, via extra high-speed processing capabilities.
Together with front-end LSI (large scale integration) that accelerates processing in the earliest stages, it enables more natural details, more realistic images, richer tonal gradations and lower noise whether you shoot still images or movies.

Full HD movie at 24p/60i/60p w/uncompressed HDMI output
Capture Full 1920 x 1080 HD uncompressed clean-screen video files to external recording devices via an HDMI® connection in 60p and 60i frame-rates. Selectable in-camera AVCHD™ codec frame rates include super-smooth 60p, standard 60i or cinematic 24p. MP4 codec is also available for smaller files for easier upload to the web.

Up to 5 fps shooting to capture the decisive moment
When your subject is moving fast, you can capture the decisive moment with clarity and precision by shooting at speeds up to 5 frames per second. New faster, more accurate AF tracking, made possible by Fast Hybrid AF, uses powerful predictive algorithms and subject recognition technology to track every move with greater speed and precision.

PlayMemories™ Camera Apps allows feature upgrades
Personalize your camera by adding new features of your choice with PlayMemories Camera Apps. Find apps to fit your shooting style from portraits, detailed close-ups, sports, time lapse, motion shot and much more. Use apps that shoot, share and save photos using Wi-Fi that make it easy to control and view your camera from smartphone, and post photos directly to Facebook or backup images to the cloud without connecting to a computer.¹¹

4K Still image output by HDMI⁸ or Wi-Fi for viewing on 4K TVs
Enjoy Ultra High Definition slide shows directly from the camera to a compatible 4K television. The α7 converts images for optimized 4K image size playback (8MP). Enjoy expressive rich colors and amazing detail like never before.
Images can be viewed via an optional HDMI or WiFi.

Vertical Grip Capable
Enjoy long hours of comfortable operation in the vertical orientation with this sure vertical grip, which can hold two batteries for longer shooting and features dust and moisture protection.

Mount Adaptors
Both of these 35mm full-frame compatible adaptors let you mount the α7R with any A-mount lens. The LA-EA4 additionally features a built-in AF motor, aperture-drive mechanism and Translucent Mirror Technology to enable continuous phase-detection AF. Both adaptors also feature a tripod hole that allows mounting of a tripod to support large A-mount lenses.

Specifications
1. Among interchangeable-lens cameras with a full frame sensor as of October 2013
2. Records in up to 29 minute segments.
3. 99 points when an APS-C lens compatible with Fast Hybrid AF is mounted.
7. Actual performance varies based on settings, environmental conditions, and usage. Battery capacity decreases over time and use.
8. Requires compatible BRAVIA HDTV and cable sold separately.
9. Auto Focus function available with Sony E-Mount lenses and Sony A-mount SSM and SAM series lenses when using LA-EA2/EA4 lens adaptor.
Software Guide (软件导刊), Vol. 22 No. 3, Mar. 2023
Knowledge Graph Recommendation Model Integrating Attention Mechanism
LI Jun, NI Xiao-jun
(School of Computer Science, School of Software, and School of Cyberspace Security, Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu 210000, China)
Abstract: Knowledge graphs have received extensive attention in the recommendation field. They are usually embedded in recommendation models as auxiliary information to better alleviate the data sparsity and cold-start problems of traditional recommendation algorithms. However, the input vectors of some models are relatively sparse, and the feature interactions between users and items are not fully explored, which makes the representations of users and items less accurate and hurts model performance. A knowledge graph recommendation model with an attention mechanism, based on FGCNN and MKR (BAKR), is therefore proposed. First, the Feature Generation module of FGCNN extracts feature vectors for users and items. Second, the knowledge graph is used to obtain dependencies between entities and embed the implicit auxiliary information into the model, and an attention mechanism then reassigns the user's preference weights to better support the recommendation task and improve recommendation performance. Finally, simulation experiments are carried out on the MovieLens-1M and Book-Crossing datasets. The results show that the model significantly improves recommendation accuracy.
Keywords: recommendation model; knowledge graph; attention mechanism
DOI: 10.11907/rjdk.222429  Open Science (Resource Service) Identifier (OSID):
CLC number: TP391.3  Document code: A  Article ID: 1672-7800(2023)003-0118-07
0 Introduction
As the information society develops, the volume of data it produces keeps growing explosively [1]. The problem people face is no longer a lack of information but how to obtain the information a user needs (such as products, movies, or books) from massive amounts of data.
Computer Science and Application (计算机科学与应用), 2023, 13(4), 764-772
Published Online April 2023 in Hans. https:///journal/csa
https:///10.12677/csa.2023.134075
Spam Detection Based on Deep Learning
Yingmei Yu, Suping Yu, Wujun Xu, Hong Fan
College of Information Science and Technology, Donghua University, Shanghai
Received: Mar. 17th, 2023; accepted: Apr. 14th, 2023; published: Apr. 21st, 2023
Abstract
Email is an everyday communication tool, but spam causes serious trouble for users, so improving spam identification technology and raising its accuracy and efficiency has real practical value. Deep learning works well in text classification. This article therefore proposes a CNN-based BiGRU-Attention model that aims to make full use of the feature extraction ability of CNN and the global feature extraction ability of BiGRU. An attention mechanism highlights important text, and two bidirectional gated recurrent unit layers are applied, one before and one after, to extract email text features more comprehensively. The experiments use the Trec06c dataset and compare the model with other classification models; the results show a detection accuracy of 91.56%.
Keywords
Spam, Text Classification, Deep Learning, Bidirectional Gated Recurrent Unit (BiGRU), Attention Mechanism
Copyright © 2023 by author(s) and Hans Publishers Inc.
This work is licensed under the Creative Commons Attribution International License (CC BY 4.0). /licenses/by/4.0/
1. Introduction
In today's society, the rapid development of the Internet has given email a major role in daily life: it improves work efficiency, saves costs, and promotes communication between people.
DATA SHEET
FortiManager
Automation-Driven Centralized Management
Manage all your Fortinet devices in a single-console central management system. FortiManager provides full visibility of your network, offering streamlined provisioning and innovative automation tools.
Integrated with Fortinet’s Security Fabric, the security architecture and FortiManager’s Automation Driven Network Operations capabilities provide a foundation to secure and optimize network security, such as provisioning and monitoring SD-WANs.
Orchestrate security devices and systems on-premise or in the cloud to streamline network provisioning, security policy updates & change management.
Automate your time-intensive processes and accelerate workflows to offload NOC-SOC, reduce administrative tasks and address talent shortages.
Optimize Visibility to the entire digital attack surface and awareness of increasing cyber threats from one centralized location, through accurate detection, automated correlation and rapid response features.

Highlights

Single Pane Automation and Orchestration
Fortinet Security Fabric delivers sophisticated security management for unified, end-to-end protection. Deploying Fortinet-based security infrastructure to battle advanced threats, and adding FortiManager to provide single-pane-of-glass management across your entire extended enterprise provides insight into network-wide traffic and threats.
FortiManager offers enterprise-class features to contain advanced threats. FortiManager also delivers the industry’s best scalability to manage up to 100,000 Fortinet devices. FortiManager, coupled with the FortiAnalyzer family of centralized logging and reporting appliances, provides a comprehensive and powerful centralized management solution for your organization.

Centralized SD-WAN Deployment & Monitoring
Powerful SD-WAN management capabilities by using templates.
Enhanced SD-WAN monitoring for each SD-WAN link member with visibility of link status, application performance, and bandwidth utilization. The SLA targets are included in performance monitoring graphs for each WAN provider.

Configuration and Settings Management
Collectively configure the device settings - using the provisioning templates and advance CLI templates improves management of a large number of devices. Automatic device configuration backup with revision control and change audit make it easier for daily administrative tasks.

Central Management of Network Infrastructure
Centrally manage FortiGate, FortiSwitch, FortiExtender, FortiAP. The VPN manager simplifies the deployment and allows centrally-provisioned VPN community and monitoring of VPN connections on Google Map. FortiAP Manager allows configuring, deploying and monitoring FortiAPs from a single console with Google Map view. The FortiClient Manager allows centralized configuration, deployment and monitoring of FortiClients.

Multi-Tenancy & Role Based Administration
FortiManager equips admins with granular device and role based administration for deploying multi-tenancy architecture to large enterprises, with a hierarchical objects database to facilitate re-use of common configurations and serve multiple customers. The graphical interface makes it easy to view, create, clone and manage ADOMs. You can use ADOMs to manage independent security environments, each ADOM with its own security policies and configuration database. FortiManager enables you to group devices logically or geographically for flexible management, and the zero-touch deployment uses templates to provision devices for quick mass deployment and supports firmware version enforcement. Define global objects such as Firewall Objects, Policies and Security Profiles to share across multiple ADOMs.
Granular permissions allow assigning ADOMs, devices and policies to users based on role and duties.

API for Automation and Orchestration
RESTful API allows MSSPs/large enterprises to create customized, branded web portals for policy and object administration. Automate common tasks such as provisioning new FortiGates and configuring existing devices. Join Fortinet Developer Network (FNDN) to access exclusive articles, how-to content for automation and customization, community-built tools, scripts and sample code.

Security Policy Management
A set of commonly used security policies can now be grouped in a Policy Block and inserted as needed in different Policy Packages.
The global policy feature allows companies such as telecom, MSSP, and SaaS providers to apply a header and/or footer policy at the ADOM level to all the policy packages or to a selection of packages, as needed.

FortiManager VM
Fortinet offers the FortiManager VM in a stackable license model. This model allows you to expand your VM solution as your environment expands. Utilizing virtualization technology, FortiManager-VM is a software-based version of the FortiManager hardware appliance and is designed to run on many virtualization platforms.
It offers all the features of the FortiManager hardware appliance.
The FortiManager virtual appliance family minimizes the effort required to monitor and maintain acceptable use policies, as well as identify attack patterns that can be used to fine tune the security policy, thwarting future attackers.

Specifications
Model: FMG-VM-10-UG | FMG-VM-100-UG | FMG-VM-1000-UG | FMG-VM-5000-UG
Device add-on: 10 + | 100 + | 1,000 + | 5,000 +
Storage capacity: 200 GB | 1 TB | 4 TB | 8 TB
GB/Day of logs: 2 | 5 | 10 | 25
Supported platforms: VMware ESX/ESXi 5.0/5.1/5.5/6.0/6.5/6.7, Microsoft Hyper-V 2008 R2/2012/2012 R2/2016, Citrix XenServer 6.0+ and Open Source Xen 4.1+, KVM on Redhat 6.5+ and Ubuntu 17.04, Nutanix AHV (AOS 5.10.5), Amazon Web Services (AWS), Microsoft Azure, Google Cloud (GCP), Oracle Cloud Infrastructure (OCI), Alibaba Cloud (AliCloud)
vCPU Support (Minimum / Maximum): 2 / Unlimited
Network Interface Support (Min / Max): 1 / 4

Integration & Security Fabric
Integration with ITSM to mitigate security events and apply configuration changes and policy updates. Seamless integration with FortiAnalyzer appliances provides in-depth discovery, analysis, prioritization and reporting of network security events. Create fabric connectors to facilitate connections with third-party vendors via pxGrid, OCI, ESXi and others, to share and exchange data.
FortiManager’s workflow for audit and compliance enables you to review, approve and audit policy changes from a central place, including automated processes to facilitate policy compliance, policy lifecycle management, and enforced workflow to reduce risk for policy changes.

Monitor and Report for Deep Visibility
Access to vital security and network statistics, together with real-time monitoring and integrated reporting, provides visibility into network and user activity.
For more powerful analytics, combine with a FortiAnalyzer appliance for additional data mining and graphical reporting capabilities.

Network & Security Operations Visibility
Automated data exchanges between security (SOC) workflows and operational (NOC) workflows, creating a single, complete workflow that not only saves time, but also provides the capacity to complete additional incident response activities. FortiManager’s NOC-SOC delivers advanced data visualization to help Analysts quickly connect dots and identify threats, simplifying how organizations deliver security and remediate breaches, data exfiltration, and compromised hosts.

Specifications
Safety Certifications: cUL, CB | CE, BSMI, KC, UL/cUL, CB, GOST | FCC Part 15 Class A, C-Tick, VCCI, CE, UL/cUL, CB
1 Each Virtual Domain (VDOM) operating on a physical or virtual device counts as one (1) licensed network device. Global Policies and high availability support available on all models.
* Optional redundant AC power supply, not included

Specifications (FMG-2000E, FMG-3000F)
Safety Certifications: cUL, CB | CE, BSMI, KC, UL/cUL, CB, GOST | cUL, CB, GOST
4 + Indicates Device Add-On License available

Order Information
Product / SKU / Description

FortiManager
FMG-200F: Centralized management, log and analysis appliance: 2x RJ45 GE, 2x SFP, 8 TB storage, up to 30x Fortinet devices/virtual domains.
FMG-300F: Centralized management, log and analysis appliance: 4x GE RJ45, 2x SFP, 16 TB storage, up to 100x Fortinet devices/virtual domains.
FMG-1000F: Centralized management, log and analysis appliance: 2x RJ45 10G, 2x SFP+ slots, 32 TB storage, up to 1000x Fortinet devices/virtual domains.
FMG-2000E: Centralized management, log and analysis appliance: 4x GE RJ45, 2x 10 GE SFP+ slots, 36 TB storage, dual power supplies, manages up to 1,200 Fortinet devices/virtual domains.
FMG-3000F: Centralized management, log and analysis appliance: 4x GE RJ45, 2x 10 GE SFP+ slots, 48 TB storage, dual power supplies, manages up to 4,000 Fortinet devices/virtual domains.
FMG-3700F: Centralized management, log and analysis appliance: 2x 10GbE SFP+, 2x 1GbE RJ-45 slots, 240 TB storage, dual power supplies, manages up to 10,000 Fortinet devices/virtual domains.

FortiManager Device Upgrade
FMG-DEV-100-UG: FortiManager device upgrade license for adding 100 Fortinet devices/VDOMs (3000 series and above - hardware only)

FortiManager VM
Built-in Evaluation: Built-in 15-day EVAL license, no activation required.
Full Evaluation (60-days): EVAL license. License and activation required.
FMG-VM-Base: Base license for stackable FortiManager-VM. Manages up to 10 Fortinet devices/Virtual Domains, 1 GB/Day of Logs and 100 GB storage capacity. Designed for all supported platforms.
FMG-VM-10-UG: Upgrade license for adding 10 Fortinet devices/Virtual Domains; allows for total of 2 GB/Day of Logs and 200 GB storage capacity.
FMG-VM-100-UG: Upgrade license for adding 100 Fortinet devices/Virtual Domains; allows for total of 5 GB/Day of Logs and 1 TB storage capacity.
FMG-VM-1000-UG: Upgrade license for adding 1,000 Fortinet devices/Virtual Domains; allows for total of 10 GB/Day of Logs and 4 TB storage capacity.
FMG-VM-5000-UG: Upgrade license for adding 5,000 Fortinet devices/Virtual Domains; allows for total of 25 GB/Day of Logs and 8 TB storage capacity.

Additional FortiManager Items
FC-10-FDN1-139-02-12: 1 Year Subscription Renewal for 1 User to Fortinet Developer Network
FC-10-FDN2-139-02-12: 1 Year Subscription for Unlimited Users to Fortinet Developer Network
FMG-SDNS: License to operate FortiManager as a dedicated Secure DNS server appliance (3000 series and above - hardware only)

Copyright © 2019 Fortinet, Inc. All rights reserved. Fortinet®, FortiGate®, FortiCare® and FortiGuard®, and certain other marks are registered trademarks of Fortinet, Inc., and other Fortinet names herein may also be registered and/or common law trademarks of Fortinet. All other product or company names may be trademarks of their respective owners. Performance and other metrics contained herein were attained in internal lab tests under ideal conditions, and actual performance and other results may vary. Network variables, different network environments and other conditions may affect performance results.
Nothing herein represents any binding commitment by Fortinet, and Fortinet disclaims all warranties, whether express or implied, except to the extent Fortinet enters a binding written contract, signed by Fortinet’s General Counsel, with a purchaser that expressly warrants that the identified product will perform according to certain expressly-identified performance metrics and, in such event, only the specific performance metrics expressly identified in such binding written contract shall be binding on Fortinet. For absolute clarity, any such warranty will be limited to performance in the same ideal conditions as in Fortinet’s internal lab tests. Fortinet disclaims in full any covenants, representations, and guarantees pursuant hereto, whether express or implied. Fortinet reserves the right to change, modify, transfer, or otherwise revise this publication without notice, and the most current version of the publication shall be applicable.
FST-PROD-DS-FMG FMG-DAT-R47-201908
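Semantic Gaussian-Marginalization Index Path Selection for Deep Web Databases
GUO Hongtao
[Journal] Bulletin of Science and Technology (《科技通报》)
[Year (Volume), Issue] 2015(31)6
[Abstract] Optimizing the choice of index path is key to deep and secure access to Deep Web databases.
Traditional methods select paths in a Deep Web database by keyword search, listing every record that might match the keywords; when ambiguous features appear, index accuracy is low.
A database index path selection method based on semantic Gaussian marginalization is proposed.
A feature model of the Deep Web database is built and the matching degree between nodes and keywords is computed to obtain a Gaussian-marginalization path control objective function; semantic similarity is decomposed into a relevance-direction function of the user's query intent, which realizes Gaussian-marginalization path control.
The input sequence of predictive control instructions for the database is weighted by variable coupling and balanced against adjacent-order cross-layer links, and a complex activation function for the semantic Gaussian-marginalization index is set, which improves index performance and achieves path selection.
Simulation results show that the algorithm improves query precision, reduces query time, and provides efficient and secure access to Deep Web databases.
[Pages] 3 (pp. 73-75)
[Keywords] database; semantics; index; query
[Author] GUO Hongtao
[Affiliation] School of Software, North China University of Water Resources and Electric Power
[Language] Chinese
[CLC number] TP311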
FAQ
FortiWeb: Web Application Firewall (WAF)
Comprehensive, High-Performance Web Application Security

Can’t an IPS or Firewall provide protection for hosted web-based applications?
Next Generation and Application Aware IPS firewalls extend and enhance protection and add additional functionality but the majority of the ‘application aware’ functionality is focused on securing/restricting internal clients when accessing the internet but not securing internal applications from external threats. Web Application Firewalls are different as they protect internal web applications from sophisticated application layer external attacks. They provide both a positive and negative security model and protect against the major threats to applications today (SQL Injection, Cross Site Scripting, URL Access, CSRF, Injection attacks and more).

Why is FortiWeb’s AI-based Machine Learning threat detection superior to other threat detection methods?
Other vendors use application learning using an observational method to automate profile creation for web-based application protection. Application learning is a good detection method, but it has many drawbacks. These include:
- high false-positive detections
- labor-intensive to fine tune
- unobserved legitimate traffic creates anomalies
- aggressive tuning lets attacks slip through more easily
- changes to the application require substantial re-learning to prevent false-positive detections
FortiWeb’s behavioral detection uses two layers of AI-based machine learning and statistical probabilities to detect anomalies and threats separately. With machine learning FortiWeb is able to deliver near 100% application threat detection accuracy with virtually no resources required to manage it.
AI-based machine learning for FortiWeb creates nearly a “set and forget” web application firewall that doesn’t sacrifice accuracy for ease of management.

What size WAF do I need?
There are many factors that determine WAF sizing ranging from application throughput, numbers of users, and number of sites to be protected. We strongly recommend discussing your requirements with a Fortinet Partner to find the best option to meet your needs.

How does FortiWeb Cloud differ from an on-prem FortiWeb deployment?
FortiWeb Cloud is a ‘skinny’ WAF solution offering negative security model rules while the FortiWeb platform is a full blown WAF offering both positive and negative security models. Most customers using a Cloud WAF are looking for a set-it-and-forget type solution that they can quickly configure and use without having to manage daily. By offering a subset of what FortiWeb on-prem offers but with a simple, straightforward configuration and management FortiWeb Cloud addresses these requirements.

Do I need a WAF if I already have a Secure Web Gateway (SWG)?
Yes. A SWG protects users within the organization from accessing infected external websites or undesirable content hosted outside of the organization. A WAF protects hosted web-based applications from attacks that are initiated by external attackers. A simplified view is that SWGs protect users and WAFs protect applications.

Copyright © 2019 Fortinet, Inc. All rights reserved. Fortinet, FortiGate, FortiCare and FortiGuard, and certain other marks are registered trademarks of Fortinet, Inc., and other Fortinet names herein may also be registered and/or common law trademarks of Fortinet. All other product or company names may be trademarks of their respective owners. Performance and other metrics contained herein were attained in internal lab tests under ideal conditions, and actual performance and other results may vary. Network variables, different network environments and other conditions may affect performance results.
Nothing herein represents any binding commitment by Fortinet, and Fortinet disclaims all warranties, whether express or implied, except to the extent Fortinet enters a binding written contract, signed by Fortinet’s General Counsel, with a purchaser that expressly warrants that the identified product will perform according to certain expressly-identified performance metrics and, in such event, only the specific performance metrics expressly identified in such binding written contract shall be binding on Fortinet. For absolute clarity, any such warranty will be limited to performance in the same ideal conditions as in Fortinet’s internal lab tests. Fortinet disclaims in full any covenants, representations, and guarantees pursuant hereto, whether express or implied. Fortinet reserves the right to change, modify, transfer, or otherwise revise this publication without notice, and the most current version of the publication shall be applicable.
358956-0-0-EN

FortiWeb WAF vs. WAF in an ADC
A dedicated WAF appliance will not decrease performance, plus an appliance like FortiWeb has the processing power to perform behavior-based detection of application attacks. Most WAF modules on ADCs offer only basic WAF protection for applications.

Can a FortiWeb permanently patch application vulnerabilities?
Yes it can.
FortiWeb can provide temporary application patching until development teams are able deploy permanent patches forvulnerabilities, or it can permanently patch them. It is usually recommended to permanently fix a known vulnerability, however there are many situations where that isn’t possible or practical, such as inherited applications or older applications that are about to be retired.。
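Bayesian hyperparameter optimization is an optimization technique for automatically tuning the hyperparameters of a machine learning model.
It uses Bayesian probability theory to estimate the best hyperparameter values in order to optimize the model's performance.
A multilayer perceptron (MLP) is a commonly used neural network model consisting of one or more hidden layers, each containing multiple neurons.
An MLP can be used for classification, regression, and many other tasks.
When Bayesian hyperparameter optimization is used to tune an MLP, the hyperparameters chosen are usually common ones such as the learning rate, the batch size, and the number of training iterations.
The Bayesian optimizer uses the performance observed for the hyperparameter values tried so far to choose the next promising values.
It gains efficiency by evaluating only a small number of hyperparameter combinations, each chosen with the help of a model of the earlier results, instead of searching every possible combination.
In practice, Bayesian hyperparameter optimization often uses Gaussian process regression as a surrogate model, which estimates the model's performance at untried hyperparameter values together with the uncertainty of that estimate.
This information is then used to choose the next hyperparameter values so as to maximize the expected improvement in model performance.
Bayesian hyperparameter optimization tunes hyperparameters automatically, avoiding the difficulty and time cost of manual tuning.
In addition, it can help find better hyperparameter combinations, improving the model's performance and accuracy.
This matters for experimentation and development in machine learning because it helps find a good model configuration quickly.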
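The loop described above (fit a Gaussian process surrogate, then pick the candidate with the highest expected improvement) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a production optimizer: the objective `val_score` is a hypothetical stand-in for an MLP's validation score as a function of log10(learning rate), the kernel length scale is fixed at 1.0, and the search runs over a fixed one-dimensional grid.

```python
import math

def rbf(a, b, length=1.0):
    """Squared-exponential kernel between two scalars."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def gp_posterior(X, y, x_new, jitter=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP at x_new."""
    K = [[rbf(a, b) + (jitter if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    k_star = [rbf(a, x_new) for a in X]
    alpha = solve(K, y)
    v = solve(K, k_star)
    mean = sum(k * a for k, a in zip(k_star, alpha))
    var = max(rbf(x_new, x_new) - sum(k * w for k, w in zip(k_star, v)), 1e-12)
    return mean, math.sqrt(var)

def expected_improvement(mean, std, best):
    """Expected improvement over the best observed value (maximization)."""
    z = (mean - best) / std
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (mean - best) * cdf + std * pdf

def bayes_opt(objective, candidates, init, n_iter=10):
    """Evaluate n_iter candidates chosen by expected improvement."""
    X = list(init)
    y = [objective(x) for x in X]
    for _ in range(n_iter):
        # Standardize the observations so the zero-mean, unit-variance prior fits.
        mu = sum(y) / len(y)
        sd = (sum((v - mu) ** 2 for v in y) / len(y)) ** 0.5 or 1.0
        yn = [(v - mu) / sd for v in y]
        best = max(yn)
        # Never re-evaluate a point that has already been tried.
        pool = [c for c in candidates if all(abs(c - x) > 1e-9 for x in X)]
        if not pool:
            break
        nxt = max(pool, key=lambda c: expected_improvement(*gp_posterior(X, yn, c), best))
        X.append(nxt)
        y.append(objective(nxt))
    i = max(range(len(y)), key=y.__getitem__)
    return X[i], y[i]

def val_score(log_lr):
    """Hypothetical validation score, peaking at a learning rate of 1e-3."""
    return -(log_lr + 3.0) ** 2

grid = [-5.0 + 0.05 * i for i in range(81)]  # log10(learning rate) in [-5, -1]
best_x, best_y = bayes_opt(val_score, grid, init=[-5.0, -4.0, -1.0], n_iter=10)
```

In real use, `val_score` would train and validate the MLP, which is exactly why the number of evaluations is kept small; the grid and the three starting points are arbitrary choices for the illustration.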
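A plain-language explanation of the AutoFormer model

What is the AutoFormer model?
AutoFormer is an automatically designed neural network model that improves on and extends the Transformer.
The Transformer is widely used because of its excellent performance on natural language processing tasks, but training it and tuning its architecture by hand is time-consuming and requires substantial computing resources.
AutoFormer uses automated search to tune and optimize the architectural hyperparameters of a Transformer, improving the model's performance and efficiency.
AutoFormer was proposed in the paper "AutoFormer: Searching Transformers for Visual Recognition".
The authors use an automated search algorithm to select the best structure and hyperparameters for a Transformer and combine them into a new AutoFormer model.
AutoFormer keeps the Transformer's self-attention and multi-head attention mechanisms, but adapts the architecture so that it is better suited to visual recognition tasks.

How is the automated search in AutoFormer carried out?
AutoFormer uses an automated search algorithm to determine the model's architecture and hyperparameters.
Specifically, as described in the AutoFormer paper, a single weight-sharing supernet is trained first and an evolutionary search is then run over its sub-networks, rather than a reinforcement-learning controller.
The search repeatedly tries and evaluates different combinations of model structure and hyperparameters within a search space in order to find the best combination.
In AutoFormer, the search space covers architectural parameters such as the embedding dimension, the number of attention heads, the MLP ratio, and the depth of the network.
During the search, different parameter combinations are tried automatically, and each combination is evaluated by its performance on the task.
Step by step, the search identifies which combinations suit the current task best and, by repeatedly refining the candidates, converges on a good model structure and set of hyperparameters.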
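The try-evaluate-refine loop described above can be illustrated with a small evolutionary search. This is only a sketch under stated assumptions: the search-space values are illustrative, and `proxy_score` is a hypothetical stand-in for the validation accuracy that a real system would measure for each candidate architecture; none of the names below come from the AutoFormer code base.

```python
import random

# Illustrative search space; the concrete values are assumptions.
SEARCH_SPACE = {
    "embed_dim": [320, 384, 448],
    "depth": [12, 13, 14],
    "num_heads": [5, 6, 7],
    "mlp_ratio": [3.0, 3.5, 4.0],
}

def proxy_score(cfg):
    """Hypothetical fitness; a real search would measure validation accuracy."""
    return -(abs(cfg["embed_dim"] - 384) / 64
             + abs(cfg["depth"] - 13)
             + abs(cfg["num_heads"] - 6)
             + abs(cfg["mlp_ratio"] - 3.5) * 2)

def sample(rng):
    """Draw one random architecture from the search space."""
    return {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}

def mutate(cfg, rng, prob=0.3):
    """Resample each field independently with probability prob."""
    return {k: (rng.choice(SEARCH_SPACE[k]) if rng.random() < prob else v)
            for k, v in cfg.items()}

def crossover(a, b, rng):
    """Take each field from one of the two parents at random."""
    return {k: (a[k] if rng.random() < 0.5 else b[k]) for k in a}

def evolutionary_search(generations=10, pop_size=12, n_parents=4, seed=0):
    rng = random.Random(seed)
    population = [sample(rng) for _ in range(pop_size)]
    history = []  # best score seen at the start of each generation
    for _ in range(generations):
        population.sort(key=proxy_score, reverse=True)
        history.append(proxy_score(population[0]))
        parents = population[:n_parents]  # elitism: the parents always survive
        children = []
        while len(children) < pop_size - n_parents:
            if rng.random() < 0.5:
                children.append(mutate(rng.choice(parents), rng))
            else:
                children.append(crossover(*rng.sample(parents, 2), rng))
        population = parents + children
    best = max(population, key=proxy_score)
    history.append(proxy_score(best))
    return best, history

best_cfg, history = evolutionary_search()
```

Because the top candidates are carried over unchanged, the best score in `history` can never decrease from one generation to the next, which is the property that makes this kind of search a steady refinement rather than a random walk.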
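Why does the AutoFormer model perform well?
Compared with traditional manual tuning, AutoFormer has the following advantages. First, AutoFormer uses an automated search algorithm and can therefore find a good parameter combination automatically.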
S. Zhang and R. Jarvis (Eds.): AI 2005, LNAI 3809, pp. 415-424, 2005.
© Springer-Verlag Berlin Heidelberg 2005

Towards User Profiling for Web Recommendation

Guandong Xu (1), Yanchun Zhang (1), and Xiaofang Zhou (2)

(1) School of Computer Science and Mathematics, Victoria University, PO Box 14428, VIC 8001, Australia
{xu,yzhang}@.au
(2) School of Information Technology & Electrical Engineering, University of Queensland, Brisbane QLD 4072, Australia
zxf@.au

Abstract. Collaborative recommendation is one of the most widely used recommendation approaches; it recommends items to a visitor by referring to the preferences of other users who are similar to the current user. User profiling on Web transaction data can capture such informative knowledge about user tasks or interests. With the discovered usage patterns, it becomes possible to recommend more preferred content to Web users or to customize the Web presentation for visitors via collaborative recommendation. In addition, it helps to identify the underlying relationships among Web users, items, and latent tasks during Web mining. In this paper, we propose a Web recommendation framework based on user profiling. In this approach, we employ Probabilistic Latent Semantic Analysis (PLSA) to model the co-occurrence activities and develop a modified k-means clustering algorithm to build user profiles as the representatives of usage patterns. Moreover, the hidden task model is derived by characterizing the meaningful latent factor space. With the discovered user profiles, we then choose the best-matching profile, whose preferences are most similar to those of the current user, and make collaborative recommendations based on the corresponding page weights in the selected user profile.
The preliminary experimental results on real-world data sets show that the proposed approach is capable of making recommendations accurately and efficiently.

1 Introduction

In recent years, the massive influx of information onto the World Wide Web has helped users not only to retrieve information but also to discover knowledge. However, Web users usually suffer from information overload because of the rapidly growing amount of information on the Web. One approach to information overload is the recommendation system, which aims to help users locate the information they need or prefer. Typically, a Web recommendation system focuses on identifying Web users or objects, collecting information about users’ preferences or interests, and adapting its service to satisfy the users’ needs. In short, Web recommendation can be used to provide better-quality Web services and applications to users while they browse.

To date, the problem of recommending appropriate items from a data repository to users has been extensively studied, and two paradigms, content-based filtering and collaborative filtering, have emerged. Content-based filtering systems such as WebWatcher [8] try to recommend items that are similar to those visited by a given user in the past, whereas collaborative filtering systems try to identify the user category whose taste or preference is close enough to that of the given user and recommend items that those users have historically rated [6]. The former often uses traditional information filtering and information retrieval methods, while the latter employs user correlation or nearest-neighbor algorithms.
In particular, collaborative filtering has been gradually adopted in Web recommendation applications and has achieved great success in recent years [5, 9].

Web usage mining, which exploits data mining methods such as the k-Nearest Neighbor algorithm (kNN) [5], Web user or page clustering [4, 11, 12], association rule mining [1], and sequential pattern mining [2] to build models from the analysis of usage data, has recently been used to build Web recommendation systems. With the usage pattern knowledge discovered during Web usage mining, a Web recommendation system can generate usage-based user profiles as the representatives of aggregate user behavior for collaborative recommendation. As a result, a variety of research communities have addressed this topic, and Web usage mining is becoming a promising approach for Web recommendation. To reveal the underlying relationships among Web objects, Latent Semantic Analysis (LSA) has been incorporated into the Web usage mining process, and some LSA-based algorithms have been developed for Web recommendation [13, 14].

In this paper, we propose a Web recommendation framework based on user profiling. The usage pattern knowledge, in the form of user profiles derived from Web usage mining, is incorporated into the Web recommendation system to improve the efficiency of recommendation by predicting user-preferred content and customizing the presentation. During the pattern discovery stage, a probabilistic inference method based on the Probabilistic Latent Semantic Analysis (PLSA) model, a variant of LSA, is used to model the underlying relationships among the co-occurrence activities and to identify the latent task model in terms of latent semantic factors. Through Web user session clustering, we create user profiles as the representatives of usage patterns.
To make Web recommendations, we match the current active user's activity against the discovered patterns to find the most like-minded user category and, in turn, determine the pages of potential interest as the recommendation set, based on the visit probabilities exhibited by that type of user. We demonstrate the effectiveness of the proposed technique through experiments on real-world data sets. The evaluation results show that the usage-based approach is more applicable than some traditional techniques.

The rest of the paper is organized as follows. In Section 2, we introduce the Web usage mining process, focusing on how to model Web co-occurrence activities with PLSA. We present the algorithms for discovering usage-based user profiles and latent factors in Section 3. In Section 4, we propose the Web recommendation framework based on the user profiling approach. In Section 5, we conduct preliminary experiments on two real-world data sets and compare the results against traditional work. In Section 6, we conclude and outline future work.

2 Usage-Based User Profiling with PLSA

As discussed above, Web recommendation is the ultimate goal of Web usage mining conducted on the data collected in the Web server logs of a specific Web site. The whole procedure usually consists of three steps: data collection and preprocessing, pattern mining, and knowledge application. Figure 1 depicts the whole process.

Fig. 1. The process of Web Mining and Web Recommendation

2.1 Usage Data Representation

Before introducing the user profiling technique, we briefly discuss the construction of usage data. In general, a user's access interests are reflected by the varying degrees of visits to different Web pages during one session. Thus, we can represent a user session as a weighted vector of the pages visited by the user during a period.
In this paper, we use the following notation to model the co-occurrence activities of Web users and pages:

• $S = \{s_1, s_2, \ldots, s_m\}$: a set of $m$ user sessions.
• $P = \{p_1, p_2, \ldots, p_n\}$: a set of $n$ Web pages.
• For each user, the navigational session is represented as a sequence of visited pages with corresponding weights: $s_i = \{a_{i,1}, a_{i,2}, \ldots, a_{i,n}\}$, where $a_{i,j}$ denotes the weight of page $p_j$ visited in user session $s_i$. The weight is usually determined by the number of hits or the amount of time spent on the specific page. Here, we use both of them to construct usage data from two real-world data sets.
• $SP_{m \times n} = \{a_{i,j}\}$: the resulting usage data in the form of a weight matrix of dimensionality $m \times n$.

2.2 PLSA Model

The PLSA model is based on a statistical model called the aspect model, which can be used to identify the hidden semantic relationships among general co-occurrence activities. Similarly, we can conceptually view the user sessions over the Web page space as co-occurrence activities in the context of Web usage mining in order to discover latent usage patterns. For the given aspect model, suppose that there is a latent factor space $Z = \{z_1, z_2, \ldots, z_k\}$ and that each co-occurrence observation $\langle s_i, p_j \rangle$ is associated with each factor $z_k \in Z$ to a varying degree.

Based on these assumptions and Bayes' rule, we calculate the probability of an observed pair $\langle s_i, p_j \rangle$ by introducing the latent factor variable $z_k$:

$$P(s_i, p_j) = \sum_{z_k \in Z} P(z_k)\, P(s_i \mid z_k)\, P(p_j \mid z_k) \tag{1}$$

Following the likelihood principle, the total likelihood is determined as

$$L_i = \sum_{s_i \in S,\; p_j \in P} m(s_i, p_j) \log P(s_i, p_j) \tag{2}$$

where $m(s_i, p_j)$ is the element of the session-page matrix corresponding to session $s_i$ and page $p_j$.

In order to maximize the total likelihood, we use the Expectation Maximization (EM) algorithm to perform maximum likelihood estimation of $P(z_k)$, $P(s_i \mid z_k)$, and $P(p_j \mid z_k)$ in the latent variable model [3].
The E-step and M-step are repeated until $L_i$ converges to a local optimum, at which point the estimates can be taken as the final probabilities for the observed data. The computational complexity of this algorithm is $O(mnk)$, where $m$ is the number of user sessions, $n$ is the number of pages, and $k$ is the number of factors.

3 Discovery of Latent Factors and Usage-Based User Profiles

As discussed in Section 2, the estimated probabilities quantitatively measure the underlying relationships among Web users, pages, and latent factors (i.e. tasks). Therefore, it is reasonable to identify the latent factors and discover the related usage-based access patterns through probabilistic inference. In this section, we describe how to derive this usage information.

3.1 Characterizing Latent Factor

First, we discuss how to capture the latent factor associated with user navigational behavior. This is achieved by characterizing the “dominant” pages that contribute significantly to the factor. Note that $P(p_j \mid z_k)$ represents the conditional occurrence probability over the page space for a specific factor, whereas $P(z_k \mid p_j)$ reflects the conditional probability distribution over the factor space for a specific page. Thus, we may choose the pages whose conditional probabilities $P(z_k \mid p_j)$ and $P(p_j \mid z_k)$ are both greater than a predefined threshold to form the “dominant” page set. Exploring the contents of these pages characterizes the semantic meaning of each factor.
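The EM estimation of Section 2.2 and the “dominant” page selection of Section 3.1 can be sketched as follows. This is a minimal illustration in plain Python, not the authors' implementation: the function names `plsa` and `dominant_pages`, the random initialization, the fixed number of iterations, and the toy session-page matrix are all assumptions made for the example.

```python
import math
import random

def plsa(M, k, n_iter=20, seed=0):
    """Fit the aspect model of Eq. (1) to a session-page weight matrix M by EM."""
    rng = random.Random(seed)
    m, n = len(M), len(M[0])

    def random_dist(size):
        v = [rng.random() + 0.1 for _ in range(size)]
        total = sum(v)
        return [x / total for x in v]

    p_z = [1.0 / k] * k                          # P(z_k)
    p_s_z = [random_dist(m) for _ in range(k)]   # P(s_i | z_k), indexed [z][i]
    p_p_z = [random_dist(n) for _ in range(k)]   # P(p_j | z_k), indexed [z][j]
    pairs = [(i, j) for i in range(m) for j in range(n) if M[i][j] > 0]
    loglik = []
    for _ in range(n_iter):
        acc_z = [0.0] * k
        acc_s = [[0.0] * m for _ in range(k)]
        acc_p = [[0.0] * n for _ in range(k)]
        # E-step: posterior P(z | s_i, p_j), weighted by the observed m(s_i, p_j).
        for i, j in pairs:
            joint = [p_z[z] * p_s_z[z][i] * p_p_z[z][j] for z in range(k)]
            total = sum(joint)
            for z in range(k):
                r = M[i][j] * joint[z] / total
                acc_z[z] += r
                acc_s[z][i] += r
                acc_p[z][j] += r
        # M-step: re-normalize the accumulated expected counts.
        grand = sum(acc_z)
        for z in range(k):
            p_s_z[z] = [v / acc_z[z] for v in acc_s[z]]
            p_p_z[z] = [v / acc_z[z] for v in acc_p[z]]
        p_z = [v / grand for v in acc_z]
        # Total log-likelihood of Eq. (2) under the updated parameters.
        loglik.append(sum(
            M[i][j] * math.log(sum(p_z[z] * p_s_z[z][i] * p_p_z[z][j] for z in range(k)))
            for i, j in pairs))
    return p_z, p_s_z, p_p_z, loglik

def dominant_pages(p_z, p_p_z, z, threshold):
    """Pages whose P(p_j | z) and P(z | p_j) both exceed the threshold."""
    pages = []
    for j in range(len(p_p_z[z])):
        evidence = sum(p_z[c] * p_p_z[c][j] for c in range(len(p_z)))
        p_z_given_p = p_z[z] * p_p_z[z][j] / evidence if evidence > 0 else 0.0
        if p_p_z[z][j] > threshold and p_z_given_p > threshold:
            pages.append(j)
    return pages

# Toy usage matrix: 4 sessions x 4 pages, with two loosely separated tasks.
SP = [[3, 2, 0, 0],
      [2, 3, 0, 0],
      [0, 0, 4, 1],
      [0, 1, 3, 3]]
p_z, p_s_z, p_p_z, loglik = plsa(SP, k=2)
dom = dominant_pages(p_z, p_p_z, z=0, threshold=0.2)
```

Each iteration touches every nonzero session-page pair once per factor, which matches the $O(mnk)$ cost stated above, and EM guarantees that the recorded log-likelihood never decreases. The threshold of 0.2 is arbitrary; in practice it would be tuned to the size of the page space.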
In Section 5, we present examples of latent factors and their “dominant” pages derived from two real data sets.

3.2 Building Usage-Based User Profiles

Note that the set of $P(z_k \mid s_i)$ conceptually represents the probability distribution over the latent factor space for a specific user session $s_i$. We thus construct the session-factor matrix from the calculated probability estimates to reflect the relationship between Web users and latent factors, expressed as follows:

$$s'_i = (b_{i,1}, b_{i,2}, \ldots, b_{i,k}) \tag{3}$$

where $b_{i,s}$ is the occurrence probability of session $s_i$ on factor $z_s$. In this way, the distance between two session vectors reflects the similarity of the exhibited navigational behavior. We therefore define their similarity using the well-known cosine similarity:

$$sim(s'_i, s'_j) = \frac{(s'_i \cdot s'_j)}{\|s'_i\|_2 \, \|s'_j\|_2} \tag{4}$$

where $(s'_i \cdot s'_j) = \sum_{m=1}^{k} b_{i,m} b_{j,m}$ and $\|s'_i\|_2 = \sqrt{\sum_{l=1}^{k} b_{i,l}^2}$, and likewise for $\|s'_j\|_2$.

With the session similarity measure (4), we propose a modified k-means clustering algorithm [13] to partition user sessions into corresponding clusters. As each user session is represented as a weighted page vector, it is reasonable to take the centroid of each cluster as the usage pattern in the form of a user profile. In this work, we compute the mean vector to represent the centroid. The algorithm for clustering user sessions and constructing user profiles is as follows:

Algorithm 1. Building User Profiles
Input: the set of conditional probabilities $P(z_k \mid s_i)$
Output: a set of user session clusters $SCL = \{SCL_1, SCL_2, \ldots, SCL_p\}$ and a set of user profiles $PF = \{PF_1, PF_2, \ldots, PF_p\}$
1. For all user sessions, employ the modified k-means clustering algorithm and output a set of usage-based session clusters $SCL = \{SCL_t\}$.
2. For each user session cluster, calculate the centroid of the cluster as
$$Cid_t = \frac{1}{|SCL_t|} \sum_{s'_i \in SCL_t} s'_i \tag{5}$$
where $|SCL_t|$ is the number of sessions in the cluster.
3. Treat the centroid of the generated cluster as the aggregate user profile, and sort the normalized weights in descending order to reflect the relative “significance” contributed by the corresponding pages within the selected user profile, i.e.
$$PF_t = \{\langle p^t_1, w^t_1 \rangle, \langle p^t_2, w^t_2 \rangle, \ldots, \langle p^t_n, w^t_n \rangle\} \tag{6}$$
where $w^t_j = \frac{1}{|SCL_t|} \sum_{s_i \in SCL_t} a_{i,j}$, $w^t_1 > w^t_2 > \cdots > w^t_n$, and $p^t_j \in P$.
4. Output $PF = \{PF_t\}$.

4 Using PLSA for Web Personalization

Generally, we recommend Web items to users in a customized or preferred style based on an analysis of the interests exhibited by individuals or groups of users. In this work, we adopt a model-based technique in our Web recommendation framework. We consider the usage-based user profiles generated in Section 3.2 as the aggregated representatives of the common navigational behavior exhibited by all individuals in the same user category. For a new active user session, we use the cosine function to measure the similarity between it and each discovered user profile. We then choose the closest profile, the one with the highest similarity to the current user session, as the matched pattern for the current user. Finally, we generate the top-N recommendation pages based on the historical visit probabilities of pages by the other users in the selected profile. The detailed procedure is as follows:

Algorithm 2. Web Recommendation Based on User Profiling
Input: an active user session and a set of user profiles
Output: the top-N recommendation pages
1. Simplify the active session and the profiles into $n$-dimensional weight vectors $s^a$ and $s^p$ over the page space of a site, instead of the page-weight pair vectors generated by Algorithm 1, i.e. $s^p = [w^p_1, w^p_2, \ldots, w^p_n]$, where $w^p_i$ is the significance weight contributed by page $p_i$ in this profile; similarly, $s^a = [w^a_1, w^a_2, \ldots, w^a_n]$, where $w^a_i = 1$ if page $p_i$ has already been accessed, and $w^a_i = 0$ otherwise.
2. Measure the similarities between the active session and all derived usage profiles, and choose the profile with the maximum similarity as the best-matching pattern:
$$sim(s^a, s^p_{mat}) = \max_j \big( sim(s^a, s^p_j) \big) = \max_j \left( \frac{(s^a \cdot s^p_j)}{\|s^a\|_2 \, \|s^p_j\|_2} \right) \tag{7}$$
3. Incorporate the selected profile $s^p_{mat}$ with the active session $s^a$, then calculate the recommendation score $rs(p_i)$ for each page $p_i$:
$$rs(p_i) = w^{mat}_i, \quad w^{mat}_i \in s^p_{mat} \tag{8}$$
Thus, each page in the profile is assigned a recommendation score between 0 and 1. Note that the recommendation score is 0 if the page has already been visited in the current session.
4. Sort the recommendation scores calculated in step 3 in descending order, i.e. $rs = (w^{mat}_1, w^{mat}_2, \ldots, w^{mat}_n)$, and select the $N$ pages with the highest recommendation scores to construct the top-N recommendation set:
$$REC(N) = \{\, p^{mat}_j \mid rs(p^{mat}_j) > rs(p^{mat}_{j+1}),\ j = 1, 2, \ldots, N,\ p^{mat}_j \in P \,\} \tag{9}$$

5 Experiments and Evaluations

In order to evaluate the effectiveness of the proposed PLSA-based method and explore the discovered latent semantic factors, we have conducted preliminary experiments on two real-world data sets.

5.1 Data Sets

The first data set is downloaded from the KDDCUP Web site (/KDDCUP/). After data preparation, we have set up an evaluation data set of 9308 user sessions and 69 pages, where each session consists of 11.88 pages on average. We refer to this data set as the “KDDCUP data”. In this data set, the number of hits on a Web page by the given user determines the element of the session-page matrix associated with that page in the given session.

The second data set is from the log files of an academic Web site [10]. The data is based on a 2-week Web log file from April 2002. After the data preprocessing stage, the filtered data contains 13745 sessions and 683 pages. The entries in the usage data correspond to the amount of time (in seconds) spent on pages during a given session.
For convenience, we refer to this data set as the “CTI data”.

5.2 Latent Factors Based on PLSA Model

We conduct experiments on the two data sets to extract the latent factors by identifying the “dominant” page sets. Here, we present the latent factors derived from the two real data sets with the PLSA model. Table 1 shows one example of the factors extracted from the KDDCUP data set together with its “dominant” page set, whose probabilities are over the predefined threshold, whereas Table 2 presents an example from the CTI data set. From these tables, it is easy to conclude that factor #6 in the KDDCUP data set reflects an online shopping process, whereas factor #13 in the CTI data set stands for the activity of searching for postgraduate program information.

Table 1. Example of latent factor and its associated pages from KDDCUP

Factor                        Page #  Content                    Page #  Content
#6 (online shopping process)  27      main/login2                50      account/past_orders
                              32      main/registration          52      account/credit_info
                              42      account/your_account       60      checkout/thankyou
                              44      checkout/expresCheckout    64      account/create_credit
                              45      checout/confirm_order      65      main/welcome
                              47      account/address            66      account/edit_credit

Table 2. Example of latent factor and its associated pages from CTI

Factor                    Page #  Content                 Page #  Content
#13 (Postgrad-program)    386     /News                   588     /Prog/2002/Gradect2002
                          575     /Programs               590     /Prog/2002/Gradis2002
                          586     /Prog/2002/Gradcs2002   591     /Prog/2002/Gradmis2002
                          587     /Prog/2002/Gradds2002   592     /Prog/2002/Gradse2002

5.3 Evaluation Metric of User Session Clusters and Web Recommendation

In order to evaluate the quality of the clusters derived from the PLSA-based approach, we adopt one specific metric, the Weighted Average Visit Percentage (WAVP) [8]. This evaluation method assesses each user profile individually according to the likelihood that a user session which contains any page in the session cluster will include the rest of the pages in the cluster during the same session. Suppose $T$ is a set of sessions within the evaluation set, and for a specific cluster $C$, let $T_c$ denote the subset of $T$ whose elements contain at least one page from $C$. The WAVP is computed as:

$$WAVP = \left( \sum_{t \in T_c} \frac{t \cdot C}{|T_c|} \right) \Bigg/ \left( \sum_{p \in PF} wt(p, pf) \right)$$

On the other hand, we use a metric called hit precision [7] to measure precision in the context of top-N recommendation. Given a user session in the test set, we extract the first $j$ pages as an active user session to generate a top-N recommendation set via the procedure described in Section 4. Since the recommendation set is in descending order, we then obtain the rank of the $(j+1)$-th page in the sorted recommendation list. Furthermore, for each rank $r > 0$, we count the number of test sessions whose page is ranked exactly $r$-th as $Nb(r)$. Let $S(r) = \sum_{i=1}^{r} Nb(i)$ and $hitp = S(N) / |T|$, where $|T|$ represents the number of test sessions in the whole test set. Thus, $hitp$ stands for the hit precision of Web recommendation.

In order to compare our approach with other existing methods, we implement a baseline method that is based on the clustering technique [11]. This method generates usage-based session clusters by performing k-means clustering directly on the usage data. The cluster centroids are then treated as the aggregated access patterns.

Fig. 2. WAVP comparison for CTI
Fig. 3. Hitp comparison for CTI

Figures 2 and 3 show the comparison of WAVP and hitp on the CTI data set for the two methods discussed above. The results demonstrate that the proposed PLSA-based technique consistently outperforms the standard clustering-based algorithm in terms of WAVP and hit precision.
In this scenario, it can be concluded that our approach makes Web recommendations more accurately and effectively than the conventional method. In addition to recommendation, this approach is able to identify the hidden factors that explain why such user sessions or Web pages are grouped together in the same category.

6 Conclusion and Future Work

In this paper, we have developed a Web recommendation framework incorporating a user profiling technique based on the PLSA model. With the proposed probabilistic method, we can measure the co-occurrence activities (i.e. user sessions) in terms of probability estimates to capture the underlying relationships among Web users, pages, and latent tasks. Analysis of the estimated probabilities makes it possible to build usage-based user profiles and to identify the hidden factors associated with the corresponding interests or patterns. The discovered usage patterns, in the form of user profiles, are used to make collaborative recommendations, which in turn improves the precision and effectiveness of Web recommendation. We have demonstrated the efficiency of our technique through preliminary experiments on real-world data sets and comparisons with other existing work.

Our future work will focus on the following issues: we intend to identify the primitive task of the active user and incorporate Web page categories to predict the pages a user is likely to visit, and to carry out more experiments to validate the scalability of our approach.

References

1. R. Agarwal, C. Aggarwal and V. Prasad, A Tree Projection Algorithm for Generation of Frequent Itemsets, Journal of Parallel and Distributed Computing, 61 (1999), pp. 350-371.
2. R. Agrawal and R. Srikant, Mining Sequential Patterns, in P. S. Y. a. A. S. P. Chen, ed., Proceedings of the International Conference on Data Engineering (ICDE), IEEE Computer Society Press, Taipei, Taiwan, 1995, pp. 3-14.
3. A. P. Dempster, N. M. Laird and D. B. Rubin, Maximum likelihood from incomplete data via the EM algorithm, Journal Royal Statist. Soc. B, 39 (1977), pp. 1-38.
4. E. Han, G. Karypis, V. Kumar and B. Mobasher, Hypergraph Based Clustering in High-Dimensional Data Sets: A Summary of Results, IEEE Data Engineering Bulletin, 21 (1998), pp. 15-22.
5. J. Herlocker, J. Konstan, A. Borchers and J. Riedl, An Algorithmic Framework for Performing Collaborative Filtering, Proceedings of the 22nd ACM Conference on Research and Development in Information Retrieval (SIGIR'99), Berkeley, CA., 1999.
6. J. L. Herlocker, J. A. Konstan, L. G. Terveen and J. T. Riedl, Evaluating collaborative filtering recommender systems, ACM Transactions on Information Systems (TOIS), 22 (2004), pp. 5-53.
7. X. Jin, Y. Zhou and B. Mobasher, A Unified Approach to Personalization Based on Probabilistic Latent Semantic Models of Web Usage and Content, Proceedings of the AAAI 2004 Workshop on Semantic Web Personalization (SWP'04), San Jose, 2004.
8. T. Joachims, D. Freitag and T. Mitchell, Webwatcher: A tour guide for the world wide web, The 15th International Joint Conference on Artificial Intelligence (IJCAI'97), Nagoya, Japan, 1997, pp. 770-777.
9. J. Konstan, B. Miller, D. Maltz, J. Herlocker, L. Gordon and J. Riedl, Grouplens: Applying Collaborative Filtering to Usenet News, Communications of the ACM, 40 (1997), pp. 77-87.
10. B. Mobasher, Web Usage Mining and Personalization, in M. P. Singh, ed., Practical Handbook of Internet Computing, CRC Press, 2004.
11. B. Mobasher, H. Dai, M. Nakagawa and T. Luo, Discovery and Evaluation of Aggregate Usage Profiles for Web Personalization, Data Mining and Knowledge Discovery, 6 (2002), pp. 61-82.
12. M. Perkowitz and O. Etzioni, Adaptive Web Sites: Automatically Synthesizing Web Pages, Proceedings of the 15th National Conference on Artificial Intelligence, AAAI, Madison, WI, 1998, pp. 727-732.
13. G. Xu, Y. Zhang and X. Zhou, A Latent Usage Approach for Clustering Web Transaction and Building User Profile, The First International Conference on Advanced Data Mining and Applications (ADMA 2005), Springer, Wuhan, China, 2005, pp. 31-42.
14. G. Xu, Y. Zhang and X. Zhou, Using Probabilistic Semantic Latent Analysis for Web Page Grouping, 15th International Workshop on Research Issues on Data Engineering: Stream Data Mining and Applications (RIDE-SDMA'2005), Tokyo, Japan, 2005.
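
As a closing illustration, the profile construction of Algorithm 1 (steps 2 to 4) and the recommendation procedure of Algorithm 2 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions rather than the authors' implementation: the cluster labels are taken as given (the modified k-means step is not reproduced), profile weights are normalized by the largest weight in each profile, and the function names and toy data are hypothetical.

```python
import math

def build_profiles(sessions, labels):
    """Mean page-weight vector per cluster, scaled so the top page has weight 1."""
    clusters = {}
    for vec, lab in zip(sessions, labels):
        clusters.setdefault(lab, []).append(vec)
    profiles = {}
    for lab, members in clusters.items():
        mean = [sum(col) / len(members) for col in zip(*members)]
        top = max(mean) or 1.0  # assumed normalization: divide by the largest weight
        profiles[lab] = [w / top for w in mean]
    return profiles

def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0

def recommend(visited, profiles, top_n):
    """Top-N unvisited pages from the profile closest to the active session."""
    n_pages = len(next(iter(profiles.values())))
    # Step 1: binary vector for the active session.
    active = [1.0 if j in visited else 0.0 for j in range(n_pages)]
    # Step 2: best-matching profile by cosine similarity.
    best = max(profiles, key=lambda lab: cosine(active, profiles[lab]))
    # Step 3: profile weights as scores, zero for pages already visited.
    scores = [0.0 if j in visited else w for j, w in enumerate(profiles[best])]
    # Step 4: sort in descending order and keep the N highest-scoring pages.
    ranked = sorted(range(n_pages), key=lambda j: scores[j], reverse=True)
    return [j for j in ranked if scores[j] > 0][:top_n]

# Toy data: 4 sessions over 5 pages, already assigned to two clusters.
sessions = [[3, 2, 1, 0, 0],
            [1, 2, 3, 0, 0],
            [0, 0, 1, 4, 3],
            [0, 0, 0, 2, 5]]
labels = [0, 0, 1, 1]
profiles = build_profiles(sessions, labels)
recs = recommend({3}, profiles, top_n=2)
```

Here a visitor who has seen only page 3 is matched to the second profile, and the unvisited pages of that profile are returned in order of weight. Pages with a zero score are dropped, so the list can be shorter than N when the matched profile covers few pages.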