WEB RECOMMENDATION SYSTEM BASED ON A MARKOV-CHAIN MODEL
- Format: pdf
- Size: 85.65 KB
- Pages: 8
Research on a Hadoop-Based News Recommendation Algorithm
Published: 2023-02-01T05:32:28.525Z
Source: 《科学与技术》 (Science and Technology), 2022, Issue 16 (August)
Authors: YIN Tie-yuan, ZHANG Si-qi
(School of Information Science and Engineering, Shenyang University of Technology, Shenyang, Liaoning 110870)

Abstract: With the rise of online news reading, traditional news recommendation algorithms suffer from problems such as sparse features and a lack of diversity. To address these problems, this paper proposes a Hadoop-based recommendation algorithm built on a fusion interest model. First, to deal with feature sparsity, the feature words are expanded to obtain an interest expansion model. Second, taking into account the influence of news popularity and reading time on similarity, an improved similarity calculation method is proposed to obtain a model of the user's potential interest expansion. Finally, the two models are combined into a fusion interest model, which is used for news recommendation. The experimental results show that running the improved algorithm on Hadoop improves recommendation quality.

Keywords: news recommendation; Hadoop; content-based recommendation

1 Introduction
With the explosive growth of the Internet, more and more people prefer to read news online. However, because the number of news reports on the web runs into the thousands upon thousands, users get lost in the mass of news, which gives rise to the problem of "information overload" [1].
English Essays: Suggestions for Improving a Favorite Website
(10 sample essays in total, for readers' reference)

Essay 1
Dear website,
I am a primary school student and I really like visiting your website. It's super cool! But I have some ideas on how you can make it even better. Here are my suggestions:
Firstly, I think you could add more fun games and activities for kids to play. Maybe some puzzles or quizzes that are related to the topics on your website. That way, we can learn while having fun.
Secondly, I noticed that some of the articles are a bit long and difficult to understand. It would be helpful if you could break them down into smaller sections and use simpler language. This way, it will be easier for us to follow along and learn new things.
Thirdly, I think it would be great if you could have a section where we can share our own ideas and creations. It could be like a mini blog for kids to write about their interests and hobbies. This way, we can connect with other kids who have similar interests.
Overall, I really enjoy visiting your website and I hope you will consider my suggestions to make it even better. Thank you for providing such a fun and educational platform for kids like me to explore!
Yours sincerely,
[Your name]

Essay 2
Hello everyone! Today I want to share with you some suggestions on how to improve your favorite website. I hope you enjoy it!
First of all, I think the website could use some more interactive features. For example, it would be cool to have a chat room where users can talk to each other in real-time. It would also be fun to have some games or quizzes that users can play while they are on the site.
Secondly, I believe that the website could benefit from a better search function. Sometimes it can be hard to find what you are looking for, so having a more advanced search tool would make it easier for users to navigate the site.
Another suggestion is to add more personalized content. It would be nice if the website could recommend articles or videos based on the user's interests and browsing history. This way, users would feel more connected to the site and would be more likely to visit it regularly.
Lastly, I think it would be great if the website could incorporate more multimedia elements. For example, adding videos or podcasts to complement the written content would make the site more engaging and appealing to users.
Overall, I think these suggestions would make the website even more enjoyable for users. I hope the website developers will consider implementing some of these ideas in the future. Thank you for listening!

Essay 3
Hey everyone! Today I want to talk about my favorite website and some suggestions I have to make it even better. My favorite website is YouTube because I love watching videos of my favorite YouTubers, learning new things, and listening to music.
First, I think YouTube should have a feature where you can create playlists with your favorite videos. This way, you can easily find and watch all your favorite videos in one place. It would be so cool to have different playlists for different moods or topics!
Second, I think YouTube should have a "watch later" button so you can save videos to watch when you have more time. Sometimes I see a video I want to watch, but I don't have time right then. It would be so helpful to have a way to save it for later.
Third, I think YouTube should improve its recommendation system. Sometimes, the videos it suggests for me aren't things I'm interested in. It would be great if YouTube could recommend more videos that I would actually enjoy based on what I watch.
Overall, I love YouTube and I think these improvements would make it even better. What do you think? Do you have any suggestions for your favorite website? Let me know in the comments!

Essay 4
Dear website,
I really like your website because it has so many cool games and videos to watch. But I think there are some things that you could do to make it even better!
One thing you could do is add more games for different age groups. Some of the games on your website are a little too hard for me and my friends. It would be awesome if you could make some games that are easier for kids like us to play.
Another thing you could do is add a chat feature so that we can talk to our friends while we are playing games. It would be so much fun to be able to chat and play at the same time!
I also think it would be cool if you could have a section where we can write reviews of the games and videos on your website. That way, we can tell other kids what we like about them and help them decide what to play.
Overall, I really like your website and I hope you will consider my suggestions to make it even better. Thank you for reading this letter!
Sincerely,
[Your name]

Essay 5
Hey guys, have you ever thought about how cool it would be if our favorite website could be even better? Well, I have some ideas for making our favorite website even more awesome!
First of all, I think it would be super helpful if the website had a search bar at the top of the page. That way, if we want to find something specific, we can just type it in and it will show up right away. It would save us a lot of time scrolling through all the different pages.
Another suggestion I have is to add a section where we can write reviews for the things we like on the website. For example, if we read a really interesting article or watch a funny video, we could leave a comment about how much we liked it. It would be fun to see what other people think too!
Lastly, I think it would be really cool if the website had a chat feature so we could talk to our friends while we are on the site. We could share our thoughts about the latest news or gossip about our favorite celebrities. It would make the website feel more like a community.
I hope the people who run the website will consider my suggestions because I know it would make me and my friends really happy. Thanks for listening to my ideas!

Essay 6
Hey guys, today I want to talk about my favorite website and some suggestions for improvement. The website I really like is YouTube. I love watching videos of cute animals, funny skits, and cool science experiments. But sometimes there are things that I wish YouTube could do better.
First of all, I think YouTube should have a better system for recommending videos. Sometimes I watch one video about cats, and then my whole recommended list is just cat videos. It would be nice if the recommendations were more diverse and showed me videos on different topics that I might be interested in.
Secondly, I think YouTube should have more ways for me to interact with my favorite creators. I would love to be able to send them messages or even ask them questions during live streams. It would make me feel more connected to the people that I watch and admire.
Lastly, I think YouTube should do a better job of monitoring and removing inappropriate content. Sometimes I stumble upon videos that are not suitable for kids like me, and it can be really scary or upsetting. I think YouTube should make sure that all the videos on their site are safe and appropriate for everyone.
Overall, I really love YouTube and spend a lot of time on it. I just think that with a few small changes, it could be even better. Thanks for listening to my suggestions!

Essay 7
Hello everyone! Today I want to talk about my favorite website and give some suggestions for improvements. The website I really like is YouTube because I can watch funny videos, learn new things, and listen to music.
First of all, I think YouTube should have more videos for kids like me. Sometimes it's hard to find videos that are appropriate for my age. It would be great if they had a special section just for kids with fun and educational videos.
Secondly, I think YouTube should improve the quality of their videos. Sometimes the videos are blurry or take a long time to load. It would be nice if they could make the videos clearer and faster to watch.
Also, I wish YouTube could have more interactive features. It would be fun if I could play games or quizzes while watching videos. This way, I can learn and have fun at the same time.
Lastly, I think YouTube should have better parental controls. Sometimes I come across videos that are not suitable for kids. It would be good if parents could have more options to filter out inappropriate content.
In conclusion, I really love YouTube and I hope they can consider my suggestions to make the website even better for kids like me. Thank you for listening!

Essay 8
Hello everyone! Today I want to talk about a website that I really like and some suggestions for making it even better.
The website I’m talking about is called . It’s a super fun website where you can play all kinds of awesome games for free. I love playing games on because there are so many different ones to choose from. Whether you like puzzle games, action games, or even dress up games, you can find something you enjoy on this website.
One suggestion I have for is to add more new games more often. I love trying out new games and it would be really cool if there were new ones to play every week. It would keep things fresh and exciting for all the players.
Another suggestion I have is to make the website easier to navigate. Sometimes it’s a little tricky to find the games I want to play because there are so many of them. It would be awesome if there were categories or filters to help me quickly find the games I’m interested in.
Lastly, I think it would be great if had a chat feature so that players could talk to each other while they’re playing. It would be fun to make new friends and chat about the games we’re playing together.
Overall, I really love CoolGa and I think it’s already a fantastic website. But I believe these suggestions could make it even better. I hope the people who run the website will consider these ideas.
Thanks for reading my suggestions! Have a great day and keep on gaming!

Essay 9
Title: My Suggestions for Improving the Website I Love
Hi everyone! I want to talk about my favorite website today. It's so awesome, but I think it can be even better with a few changes. Here are my suggestions for improving the website:
First of all, I think the website should have more interactive features. Like maybe a chat room where users can talk to each other and make new friends. It would be super fun to chat with other people who love the website as much as I do.
Secondly, I think the website should have a section where users can share their own stories and experiences. It would be so cool to read about how the website has helped or inspired other people. Plus, it would make the website feel more personal and welcoming.
Another idea I have is to add more games and activities to the website. I love playing games online, and I think it would be really fun to have some games on the website that are related to the website's theme. It would make the website even more entertaining and engaging.
Lastly, I think the website could use a makeover in terms of design. Maybe some new colors or graphics would make the website more visually appealing. A fresh new look could attract more users and keep current users coming back for more.
Overall, I really love this website and I think these improvements could make it even better. I hope the website owners will consider my suggestions. Thank you for reading!

Essay 10
Title: My Suggestions for Improving My Favorite Website
Hi everyone! Today I want to talk about my favorite website and some ideas to make it even better. My favorite website is ABC Kids, a website full of fun games, cartoons and educational activities. I love spending time on it, but I think there are some ways it could be improved.
First of all, I think the website could add more new games and activities. While I love the games that are already on there, it would be great to have some new ones to keep things fresh and exciting. It would also be cool to be able to customize your own profile on the website with your favorite characters and colors.
Another idea is to have more interactive features on the website. For example, it would be fun to be able to chat with other kids who are on the website at the same time. We could share tips and tricks for the games, or just chat about our favorite cartoons.
I also think it would be nice to have a section on the website where kids can submit their own artwork, stories or videos. It would be so cool to see what other kids have created, and it would be a great way to showcase our talents.
Overall, I think ABC Kids is already a great website, but with a few tweaks and improvements, it could be even better. I hope the website developers will take my suggestions into consideration. Thanks for listening!
Part 1. Explanation of Terms, 30 points
NOTE: Give the definitions or explanations of the following terms, 5 points for each.

(1) Data Integrity
Assures that information and programs are changed only in a specified and authorized manner. In information security, integrity means that data cannot be modified undetectably. Integrity is violated when a message is actively modified in transit. Information security systems typically provide message integrity in addition to data confidentiality.

(2) Information Security Audit
An information security audit is an audit on the level of information security in an organization.

(3) PKI
PKI provides well-conceived infrastructures to deliver security services in an efficient and unified style. PKI is a long-term solution that can be used to provide a large spectrum of security protection.

(4) X.509
In cryptography, X.509 is an ITU-T standard for a public key infrastructure (PKI) for single sign-on (SSO) and Privilege Management Infrastructure (PMI). The ITU-T recommendation X.509 defines a directory service that maintains a database of information about users for the provision of authentication services…

(5) Denial-of-Service Attack
DoS (Denial of Service) is an attempt by attackers to make a computer resource unavailable to its intended users.

(6) SOA (Service-Oriented Architecture)
SOA is a flexible set of design principles used during the phases of systems development and integration in computing. A system based on a SOA will package functionality as a suite of interoperable services that can be used within multiple, separate systems from several business domains.

(7) Access Control
Access control is a system that enables an authority to control access to areas and resources in a given physical facility or computer-based information system. An access control system, within the field of physical security, is generally seen as the second layer in the security of a physical structure. (Access control refers to exerting control over who can interact with a resource. Often but not always, this involves an authority, who does the controlling. The resource can be a given building, group of buildings, or computer-based information system. But it can also refer to a restroom stall where access is controlled by using a coin to open the door.)

(8) Salted Value
In cryptography, a salt consists of random bits, creating one of the inputs to a one-way function. The other input is usually a password or passphrase. The output of the one-way function can be stored (alongside the salt) rather than the password, and still be used for authenticating users. The one-way function typically uses a cryptographic hash function. A salt can also be combined with a password by a key derivation function such as PBKDF2 to generate a key for use with a cipher or other cryptographic algorithm. The benefit provided by using a salted password is making a lookup-table-assisted dictionary attack against the stored values impractical, provided the salt is large enough. That is, an attacker would not be able to create a precomputed lookup table (i.e. a rainbow table) of hashed values (password + salt), because it would require a large computation for each salt.

(9) SOAP
SOAP is a protocol specification for exchanging structured information in the implementation of Web Services in computer networks. It relies on Extensible Markup Language (XML) for its message format, and usually relies on other Application Layer protocols, most notably Hypertext Transfer Protocol (HTTP) and Simple Mail Transfer Protocol (SMTP), for message negotiation and transmission.

(10) Confidentiality
Confidentiality is the term used to prevent the disclosure of information to unauthorized individuals or systems. Confidentiality is necessary (but not sufficient) for maintaining the privacy of the people whose personal information a system holds.

(11) Authentication
In computing, e-business, and information security, it is necessary to ensure that the data, transactions, communications, or documents (electronic or physical) are genuine. It is also important for authenticity to validate that both parties involved are who they claim to be.

(12) Kerberos
Kerberos is an authentication service developed at MIT which allows a distributed system to authenticate requests for service generated from workstations. Kerberos is a computer network authentication protocol which works on the basis of "tickets" to allow nodes communicating over a non-secure network to prove their identity to one another in a secure manner.

(13) SSL/TLS
SSL and TLS are cryptographic protocols that provide communication security over the Internet. TLS and its predecessor, SSL, encrypt the segments of network connections above the Transport Layer, using asymmetric cryptography for key exchange, symmetric encryption for privacy, and message authentication codes for message integrity.

(14) Man-in-the-Middle Attack
A man-in-the-middle attack is a form of active eavesdropping in which the attacker makes independent connections with the victims and relays messages between them, making them believe that they are talking directly to each other over a private connection, when in fact the entire conversation is controlled by the attacker.

(15) System Vulnerability
A vulnerability is a flaw or weakness in a system's design, implementation, or operation and management that could be exploited to violate the system's security policy (which allows an attacker to reduce a system's information assurance). Vulnerability is the intersection of three elements: a system susceptibility or flaw, attacker access to the flaw, and attacker capability to exploit the flaw.

(16) Non-Repudiation
Non-repudiation refers to a state of affairs where the author of a statement will not be able to successfully challenge the authorship of the statement or validity of an associated contract. The term is often seen in a legal setting wherein the authenticity of a signature is being challenged. In such an instance, the authenticity is being "repudiated".

(17) Bastion Host
A bastion host is a computer on a network, specifically designed and configured to withstand attacks. It is identified by the firewall admin as a critical strong point in the network's security. The firewalls (application-level or circuit-level gateways) and routers can be considered bastion hosts. Other types of bastion hosts include web, mail, DNS, and FTP servers.

(18) CSRF
Cross-Site Request Forgery (CSRF) is an attack that forces an end user to execute unwanted actions on a web application in which they're currently authenticated. CSRF attacks specifically target state-changing requests, not theft of data, since the attacker has no way to see the response to the forged request. With a little help of social engineering (such as sending a link via email or chat), an attacker may trick the users of a web application into executing actions of the attacker's choosing. If the victim is a normal user, a successful CSRF attack can force the user to perform state-changing requests like transferring funds, changing their email address, and so forth. If the victim is an administrative account, CSRF can compromise the entire web application.

Part 2. Brief Questions, 40 points
NOTE: Answer the following HOW TO questions in brief, 8 points for each.

(1) Asymmetric Cryptographic Method.
An asymmetric encryption algorithm requires two keys: a public key and a private key.
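The two-key idea in question (1) can be illustrated with a toy RSA key pair. This is a minimal sketch, assuming textbook RSA with tiny primes (p = 61, q = 53) and public exponent e = 17; these numbers and the function names are illustrative choices, not part of the exam answer. The scheme is far too small and has no padding, so it must not be used for real security.

```python
# Toy textbook RSA: illustrative only (tiny primes, no padding).
# All numbers and function names are assumptions chosen for this example.

def make_keypair(p: int, q: int, e: int):
    """Return ((e, n), (d, n)): the public key and the private key."""
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)  # modular inverse of e modulo phi (Python 3.8+)
    return (e, n), (d, n)

def encrypt(message: int, public_key) -> int:
    """Anyone holding the public key can encrypt."""
    e, n = public_key
    return pow(message, e, n)

def decrypt(ciphertext: int, private_key) -> int:
    """Only the holder of the private key can decrypt."""
    d, n = private_key
    return pow(ciphertext, d, n)

public_key, private_key = make_keypair(61, 53, 17)
ciphertext = encrypt(65, public_key)
recovered = decrypt(ciphertext, private_key)
print(ciphertext, recovered)
```

The public key can be published freely, while the private exponent stays with its owner; decrypting with the matching private key returns the original message.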
English References for Tourism Management Systems
This document lists English-language references on tourism management systems, including the following:
1. Ali, A., & Al-Sabaan, A. (2018). An efficient recommendation system for tourism management using social network analysis. Computers & Industrial Engineering, 125, 89-99.
2. Chang, H. H., & Chen, S. W. (2015). The effects of online travel reviews on consumer behavior: A perspective of social influence. Tourism Management, 47, 46-54.
3. Chen, C. C., & Tsai, D. C. (2016). An intelligent recommendation system for tourism planning. Journal of Hospitality and Tourism Technology, 7(1), 2-14.
4. Huang, Y., Li, X., & Li, J. (2017). A tourism recommendation system based on user preferences and behavior analysis. Journal of Tourism and Hospitality Management, 5(1), 26-37.
5. Li, X., Li, J., & Huang, Y. (2018). A personalized recommendation system for tourism based on big data analytics. Journal of Travel Research, 57(8), 1091-1105.
6. Lin, H. F., & Chen, Y. C. (2016). A decision support system for tourism management: A case study in Taiwan. Journal of Travel and Tourism Marketing, 33(1), 43-58.
7. Yang, Y., Lee, S. H., & Lee, S. H. (2015). An intelligent tourism recommendation system using sentiment analysis. Information Sciences, 325, 310-323.
8. Yu, B., Zhang, J., & Guo, X. (2018). A personalized tourism recommendation system based on hybrid collaborative filtering algorithm. Journal of Hospitality and Tourism Management, 36, 1-12.
9. Zhang, Y., Huang, W., & Zhang, X. (2017). A tourist behavior prediction and recommendation system based on big data. Journal of Hospitality and Tourism Technology, 8(4), 458-473.
10. Zhou, L., Zhang, Y., & Wang, Y. (2015). An intelligent tourism recommendation system based on cloud computing. Journal of Hospitality and Tourism Technology, 6(1), 2-12.
When Amazon recommends a product on its site, it is clearly not a coincidence. At root, the retail giant's recommendation system is based on a number of simple elements: what a user has bought in the past, which items they have in their virtual shopping cart, items they've rated and liked, and what other customers have viewed and purchased. Amazon (AMZN) calls this homegrown math "item-to-item collaborative filtering," and it's used this algorithm to heavily customize the browsing experience for returning customers. A gadget enthusiast may find Amazon web pages heavy on device suggestions, while a new mother could see those same pages offering up baby products.

Judging by Amazon's success, the recommendation system works. The company reported a 29% sales increase to $12.83 billion during its second fiscal quarter, up from $9.9 billion during the same time last year. A lot of that growth arguably has to do with the way Amazon has integrated recommendations into nearly every part of the purchasing process, from product discovery to checkout. Go to and you'll find multiple panes of product suggestions; navigate to a particular product page and you'll see areas plugging items "Frequently Bought Together" or other items customers also bought. The company remains tight-lipped about how effective recommendations are. ("Our mission is to delight our customers by allowing them to serendipitously discover great products," an Amazon spokesperson told Fortune. "We believe this happens every single day and that's our biggest metric of success.")

Amazon also doles out recommendations to users via email. Whereas the web site recommendation process is more automated, there remains to this day a large manual component. According to one employee, the company provides some staffers with numerous software tools to target customers based on purchasing and browsing behavior. But the actual targeting is done by the employees and not by machine. If an employee is tasked with promoting a movie to purchase like, say, Captain America, they may think up similar film titles and make sure customers who have viewed other comic book action films receive an email encouraging them to check out Captain America in the future.

Amazon employees study key engagement metrics like open rate, click rate, opt-out -- all pretty standard for email marketing channels at any company -- but lesser known is the fact that the company employs a survival-of-the-fittest-type revenue and mail metric to prioritize the Amazon email ecosystem. "It's pretty cool. Basically, if a customer qualifies for both a Books mail and a Video Games mail, the email with a higher average revenue-per-mail-sent will win out," this employee told Fortune. "Now imagine that on a scale across every single product line -- customers qualifying for dozens of emails, but only the most effective one reaches their inbox."

The tactic prevents email inboxes from being flooded, at least by Amazon. At the same time it maximizes the purchase opportunity. In fact, the conversion rate and efficiency of such emails are "very high," significantly more effective than on-site recommendations. According to Sucharita Mulpuru, a Forrester analyst, Amazon's conversion to sales of on-site recommendations could be as high as 60% in some cases, based off the performance of other e-commerce sites.

Still, although Amazon recommendations are cited by many company observers as a killer feature, analysts believe there's a lot of room for growth. "There's a collective belief within the e-commerce industry that Amazon's recommendation engine is a suboptimal solution," says Mulpuru. Trisha Dill, a Wells Fargo analyst, says it's hard to fault Amazon for their recommendations, but she also says the company has a lot of work to do in offering users items more relevant to them. As an example, she points to a targeted email she received pushing a chainsaw carrying case. (She doesn't own a chainsaw.)

Besides refining the accuracy of recommendations themselves, Amazon could explore more ways to reach customers. Already, the company has begun selling items previously sold in bulk that were too cost-prohibitive to ship individually like, say, a deck of cards or a jar of cinnamon. Customers may buy them, but only if they have an order totaling $25 or over. But the company could actively recommend these add-on products during check-out when an order crosses that pricing threshold, much like traditional supermarkets have impulse-purchase items like gum and candy bars at the register. At that point, the Amazon customer, just as they would in the supermarket, might think, "It's just a few more bucks. Why not?"
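The email prioritization rule quoted in the article (when a customer qualifies for several mailings, the one with the highest average revenue per mail sent wins) can be sketched in a few lines. This is a minimal illustration and not Amazon's system: the campaign names and revenue figures are made-up placeholders, and `pick_email` is a hypothetical helper.

```python
# Hypothetical sketch of "highest revenue-per-mail-sent wins".
# Campaign names and revenue figures are invented for illustration.

def pick_email(qualified, revenue_per_mail):
    """Return the single campaign to send, or None if the customer qualifies for none."""
    if not qualified:
        return None
    return max(qualified, key=lambda campaign: revenue_per_mail[campaign])

revenue_per_mail = {"Books": 0.12, "Video Games": 0.31, "Kitchen": 0.08}
winner = pick_email(["Books", "Video Games"], revenue_per_mail)
print(winner)
```

Sending only the winning campaign is what keeps the inbox from being flooded while still favoring the mailing most likely to produce revenue.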
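Abstract
With the rapid development of the Internet, a vast amount of information is produced every day. Faced with this ocean of data, people often feel at a loss, and recommendation systems emerged in response.
The purpose of a recommendation system is to proactively offer users items or resources they are interested in, without the users having to search for them.
After more than 20 years of development, recommendation systems have reached into every aspect of people's lives, such as e-commerce, news recommendation, and film and television recommendation.
Film and television recommendation is an important area of research on recommendation technology.
Existing film and television recommendation consists mainly of popularity-based recommendation and related-item recommendation. Popularity-based recommendation tends to cause a Matthew effect, while related-item recommendation matches user preferences to some extent but offers little personalization: different users on the same playback page usually see the same recommendation list.
Collaborative filtering is the most successful and most widely used recommendation strategy in the field and is commonly used for personalized recommendation.
This thesis improves on the user-based collaborative filtering strategy.
A user's rating expresses how much they like a movie, while the user's tagging behavior expresses the direction of their preferences; combining the two can effectively increase the personalization of the recommendation results.
First, in the user-behavior modeling stage, this thesis analyzes user behavior data and combines users' rating behavior and tagging behavior to build an initial user behavior data model.
Because user preferences are not fixed, a time decay factor modeled on Newton's law of cooling is introduced to simulate how user preferences change along the time axis, and the user behavior data model is offset accordingly.
The model is then used to compute the similarity between users and to obtain a candidate pool of movies to recommend.
In the rating prediction stage, since tags also reflect the content features of a movie to some extent, the idea of term frequency-inverse document frequency (TF-IDF) from information mining is used to establish the relationship between movies and tags and to improve rating prediction for the movies in the candidate pool.
Comparative experiments were then designed to verify the effectiveness of these improvements, using hit rate (Hit-rate) and hit rank (Hit-rank), two evaluation metrics commonly used in Top-N recommendation. The experiments show that, when the same number of movies is recommended, the improved algorithm achieves a higher Hit-rate and Hit-rank than the existing collaborative filtering algorithm.
Finally, based on the improved recommendation algorithm, a film and television recommendation system was designed and implemented: the system requirements were analyzed, the design was derived from them, and the system was implemented with the SS2H framework. The main data tables and functional interfaces of the system are presented.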
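The time decay factor mentioned in the abstract can be sketched as an exponential weight on each rating, in the spirit of Newton's law of cooling. This is a minimal sketch under stated assumptions: the abstract does not give the thesis's actual formula, so the exponential form, the decay rate `decay_rate`, the day-based time unit, and the function names are all illustrative choices.

```python
import math

# Hypothetical sketch: exponential time decay of a user's ratings.
# The form w = exp(-decay_rate * age) and all parameter values are assumptions.

def decay_weight(age_days: float, decay_rate: float = 0.01) -> float:
    """Weight in (0, 1]: 1.0 for a rating made now, smaller for older ratings."""
    return math.exp(-decay_rate * age_days)

def decayed_ratings(ratings, now_day, decay_rate=0.01):
    """ratings: {movie: (score, day_rated)} -> {movie: time-weighted score}."""
    return {
        movie: score * decay_weight(now_day - day, decay_rate)
        for movie, (score, day) in ratings.items()
    }

ratings = {"movie_a": (5.0, 0), "movie_b": (5.0, 100)}
weighted = decayed_ratings(ratings, now_day=100)
print(weighted)
```

With these numbers, the older rating of `movie_a` counts for less than the equally high but recent rating of `movie_b`, so recent preferences dominate the user model that feeds the similarity computation.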
Abstract
With the rapid development of the Internet and mobile communications and the strong rise of e-commerce, more and more people are inclined to shop online. How to filter the information that users are interested in out of massive Internet data has become a latent problem for Internet users worldwide. Recommendation system technology addresses this problem by searching through large volumes of dynamically generated information to provide users with personalized content and services. As a form of information filtering, a recommendation system attempts to predict users' preferences and their ratings of items.
In recent years, highly active Internet users have produced massive amounts of original content while consuming information. The main work of this paper is to mine user-generated review content in depth, extract the characteristics of users and items, and then predict ratings. A comment is a person's account of something and reflects that person's subjective feelings.
Based on users' text review data, the main work of this paper is as follows. First, data containing users, items, and users' text reviews is collected from the Internet; the dataset comes from Dianping (大众点评网). Then the review text is segmented into words and represented mathematically with word vectors to form a distribution table of topic words. Finally, ratings are predicted from the topic words of the users' reviews, using a linear regression model and an improved collaborative filtering algorithm. The experimental results show that the predicted ratings are objective and accurate, and that the combined prediction algorithm performs better.

Keywords: Recommendation System; Users' Comments; Linear Regression; Rating Prediction

Preface
Since the arrival of the Internet era, technology has advanced rapidly and the amount of information people obtain has grown sharply, from scarcity to today's overload, while the ways of obtaining information have also become increasingly diverse.
Resume (English essay)

Name: Li Ming
Gender: Male
Date of Birth: September 1st, 1995
Nationality: Chinese

Education:
Bachelor's degree in Computer Science, Shanghai Jiao Tong University, 2014-2018
Master's degree in Computer Science, Massachusetts Institute of Technology, 2018-2020

Skills:
Proficient in programming languages such as Java, Python, and C++
Experienced in software development, including web development and mobile app development
Familiar with machine learning and data analysis
Strong problem-solving and analytical skills

Experience:
Software Engineer, Google, 2020-present
Developed and maintained software systems for Google's search engine. Collaborated with cross-functional teams to improve the user experience and optimize search algorithms.
Software Development Intern, Microsoft, 2019
Worked on a team developing a mobile app for Microsoft's cloud platform. Contributed to the design and implementation of the app's user interface and backend.
Research Assistant, MIT Computer Science and Artificial Intelligence Laboratory, 2018-2020
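Research on the Application of Personalized Recommender Systems in E-Commerce Websites

I. Introduction
With the spread of the Internet, the era of information explosion has arrived. Massive amounts of information are presented at once, making it hard for users to find the parts that interest them, and leaving a large amount of rarely visited information as "dark information" on the network that ordinary users cannot reach.
Likewise, with the rapid development of e-commerce, websites offer users more and more choices while their structure becomes more complex; users often get lost in the vast space of product information and cannot readily find the products they need.
Personalized recommendation is regarded as one of the most effective tools for addressing information overload. Fundamentally, the recommendation problem is to evaluate, on the user's behalf and from the user's perspective, products the user has never seen, so that users are not merely passive web browsers but active participants.
An accurate and efficient recommender system can mine users' preferences and needs, and thereby discover their latent consumption tendencies and provide them with personalized services.
In an increasingly competitive environment, a personalized recommender system is no longer just a marketing tool; more importantly, it can increase user stickiness.
This literature review covers an overview of personalized recommender systems, an analysis of commonly used personalized recommendation algorithms, and the value that personalized recommender systems can bring to e-commerce websites.

II. Overview of Personalized Recommender Systems
A personalized recommender system recommends information and products of interest to users based on their interests and purchasing behaviour.
It is an advanced business-intelligence platform built on large-scale data mining, which helps e-commerce websites provide fully personalized decision support and information services for their customers' shopping.
A shopping website's recommender system recommends products to customers and automates the process of personalized product selection to meet customers' individual needs. Recommendations are based on the site's best-selling products, the customer's city, and the customer's past purchasing behaviour and purchase records, from which the customer's likely future purchases are inferred.
In March 1995, Robert Armstrong et al. of Carnegie Mellon University presented the personalized navigation system Web-Watcher at a meeting of the American Association for Artificial Intelligence, and Marko Balabanovic et al. of Stanford University presented the personalized recommender system LIRA at the same meeting.
In August of the same year, Henry Lieberman of the Massachusetts Institute of Technology presented the personalized navigation agent Letizia at the International Joint Conference on Artificial Intelligence (IJCAI).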
WEB RECOMMENDATION SYSTEM BASED ON A MARKOV-CHAIN MODEL

Francois Fouss, Stephane Faulkner, Manuel Kolp, Alain Pirotte, Marco Saerens

Information Systems Research Unit, IAG, Universite catholique de Louvain, Place des Doyens 1, B-1348 Louvain-la-Neuve, Belgium
{fouss, kolp, pirotte, saerens}@isys.ucl.ac.be

Information Systems Research Unit, Department of Management Science, University of Namur, Rempart de la Vierge 8, B-5000 Namur, Belgium
stephane.faulkner@fundp.ac.be

Keywords: Collaborative Filtering, Markov Chains, Multi Agent System.

Abstract: This work presents some general procedures for computing dissimilarities between nodes of a weighted, undirected graph. It is based on a Markov-chain model of random walk through the graph. This method is applied to the architecture of a Multi Agent System (MAS), in which each agent can be considered as a node and each interaction between two agents as a link. The model assigns transition probabilities to the links between agents, so that a random walker can jump from agent to agent. A quantity, called the average first-passage time, computes the average number of steps needed by a random walker for reaching agent $k$ for the first time, when starting from agent $i$. A closely related quantity, called the average commute time, provides a distance measure between any pair of agents. Yet another quantity of interest, closely related to the average commute time, is the pseudoinverse of the Laplacian matrix of the graph, which represents a similarity measure between the nodes of the graph. These quantities, representing dissimilarities (similarities) between any two agents, have the nice property of decreasing (increasing) when the number of paths connecting two agents increases and when the "length" of any path decreases. The model is applied to a collaborative filtering task where suggestions are made about which movies people should watch based upon what they watched in the past. For the experiments, we built a MAS architecture and instantiated the agents' belief sets from a real movie database. Experimental
results show that the Laplacian-pseudoinverse based similarity outperforms all the other methods.

1 Introduction

Gathering product information from large electronic catalogues on E-commerce sites can be a time-consuming and information-overloading process. As information becomes more and more available on the World Wide Web, it becomes increasingly difficult for users to find the desired product among the millions of products available. Recommender systems have emerged in response to these issues (Breese et al., 1998), (Resnick et al., 1994), (Shardanand and Maes, 1995). They use the opinions of members of a community to help individuals in that community identify the information or products most likely to be interesting to them or relevant to their needs. As such, recommender systems can help E-commerce convert web surfers into buyers by personalizing the web interface. They can also improve cross-sales by suggesting other products in which the consumer might be interested. In a world where an E-commerce site's competitors are only two clicks away, gaining consumer loyalty is an essential business strategy. In this way, recommender systems can improve loyalty by creating a value-added relationship between supplier and consumer.

One of the most successful technologies for recommender systems, called collaborative filtering (CF), has been developed and improved over the past decade. For example, the GroupLens Research system (Konstan et al., 1997) provides a pseudonymous CF application for Usenet news and movies.
Ringo (Shardanand and Maes, 1995) and MovieLens (Sarwar et al., 2001) are web systems that generate recommendations on music and movies respectively, suggesting that collaborative filtering is applicable to many different types of media. Moreover, some of the highest-profile commercial web sites have made use of CF technology.

Although CF systems have been developed with success in a variety of domains, important research issues remain to be addressed in order to overcome two fundamental challenges: performance (e.g., the CF system must deal with a great number of consumers in a reasonable amount of time) and accuracy (e.g., users need recommendations they can trust to help them find products they will indeed like).

This paper addresses both challenges by proposing a novel method for CF. The method includes a procedure based on a Markov-chain model used for computing dissimilarities between nodes of an undirected graph. This procedure is applied to the architecture of a Multi Agent System (MAS), in which each agent can be considered as a node and each interaction among agents as a link. Moreover, MAS architectures are gaining popularity over classic ones for building robust and flexible CF applications (Wooldridge and Jennings, 1994) by distributing responsibilities among autonomous and cooperating agents.

For illustration purposes, we consider in this work a simple MAS architecture which supports an E-commerce site selling DVD movies. The MAS architecture is instantiated with three sets of agents (user agent, movie agent, and movie-category agent) and two kinds of interactions: between user agent and movie agent (has_watched), and between movie agent and movie-category agent (belongs_to). The procedure then makes it possible to compute dissimilarities between any pair of agents. Computing similarities between user agents makes it possible to cluster them into groups with similar interests in bought movies. Computing similarities between user agents and movie agents makes it possible to suggest movies to buy or not to buy. Computing
similarities between user agents and movie-category agents makes it possible to attach the most relevant category to each user agent.

To compute the dissimilarities, we define a random-walk model through the architecture of the MAS by assigning a transition probability to each link (i.e., interaction instance). Thus, a random walker can jump between neighbouring agents, and each agent therefore represents a state of the Markov model.

From the Markov-chain model, we then compute a quantity, $m(k|i)$, called the average first-passage time (Kemeny and Snell, 1976), which is the average number of steps needed by a random walker for reaching state $k$ for the first time, when starting from state $i$. The symmetrized quantity, $n(i,k) = m(k|i) + m(i|k)$, called the average commute time (Gobel and Jagers, 1974), provides a distance measure between any pair of agents. The fact that this quantity is indeed a distance on a graph has been proved independently by Klein and Randic (Klein and Randic, 1993) and Gobel and Jagers (Gobel and Jagers, 1974).

These dissimilarity quantities have the nice property of decreasing when the number of paths connecting the two agents increases and when the "length" of any path decreases. In short, two agents are considered similar if there are many short paths connecting them.

To our knowledge, while being interesting alternatives to the well-known "shortest path" or "geodesic" distance on a graph (Buckley and Harary, 1990), these quantities have not been exploited in the context of collaborative filtering, with the notable exception of (White and Smyth, 2003) who, independently of our work, investigated the use of the average first-passage time as a similarity measure between nodes. The "shortest path" distance does not have the nice property of decreasing when connections between nodes are added, which facilitates communication between the nodes (it does not capture the fact that strongly connected nodes are at a smaller distance than weakly connected nodes). This fact has already been recognized in the field of mathematical chemistry
where there were attempts to use the "commute time" distance instead of the "shortest path" distance (Klein and Randic, 1993). Notice that there are many different ways of computing these quantities, by using pseudoinverses or iterative procedures; details are provided in a related paper.

Section 2 introduces the random-walk model (a Markov-chain model). Section 3 develops our dissimilarity measures as well as the iterative formulae to compute them. Section 4 specifies our experimental methodology. Section 5 illustrates the concepts with experimental results obtained on a MAS instantiated from the MovieLens database. Section 6 is the conclusion.

2 A Markov-chain model of MAS architecture

2.1 Definition of the weighted graph

A weighted graph is associated with a MAS architecture in the following obvious way: agents correspond to nodes of the graph and each interaction between two agents is expressed as an edge connecting the corresponding nodes.

In our movie example, this means that each instantiated agent (user agent, movie agent, and movie-category agent) corresponds to a node of the graph, and each has_watched and belongs_to interaction is expressed as an edge connecting the corresponding nodes.

The weight $w_{ij} > 0$ of the edge connecting node $i$ and node $j$ (say there are $n$ nodes in total) should be set to some meaningful value, with the following convention: the more important the relation between node $i$ and node $j$, the larger the value of $w_{ij}$, and consequently the easier the communication through the edge. Notice that we require that the weights be both positive ($w_{ij} > 0$) and symmetric ($w_{ij} = w_{ji}$).
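As a concrete illustration of this construction, here is a minimal sketch that builds a symmetric weight matrix for a toy version of the movie example. The agent names, the interaction lists, and the choice of a unit weight on every edge are assumptions made for the sketch, not data or settings taken from the paper.

```python
import numpy as np

# Hypothetical toy agents and interactions (illustrative only).
users = ["u1", "u2"]
movies = ["m1", "m2", "m3"]
categories = ["c1"]
has_watched = [("u1", "m1"), ("u1", "m2"), ("u2", "m2"), ("u2", "m3")]
belongs_to = [("m1", "c1"), ("m2", "c1"), ("m3", "c1")]


def build_weight_matrix(nodes, edges, weight=1.0):
    """Return a symmetric matrix W with W[i, j] = weight for each edge (i, j)."""
    index = {name: i for i, name in enumerate(nodes)}
    W = np.zeros((len(nodes), len(nodes)))
    for a, b in edges:
        i, j = index[a], index[b]
        W[i, j] = weight  # positive weight on the edge ...
        W[j, i] = weight  # ... mirrored so that w_ij = w_ji
    return W, index


nodes = users + movies + categories
W, index = build_weight_matrix(nodes, has_watched + belongs_to)
print(W)
```

Every agent becomes one row and column of `W`, so users, movies, and categories share a single graph; a different weighting scheme (for example one based on ratings) would only change the `weight` argument.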
The elements $a_{ij}$ of the adjacency matrix $\mathbf{A}$ of the graph are defined in the standard way as

$$a_{ij} = \begin{cases} w_{ij} & \text{if node } i \text{ is connected to node } j \\ 0 & \text{otherwise} \end{cases} \qquad (1)$$

where $\mathbf{A}$ is symmetric. We also introduce the Laplacian matrix $\mathbf{L}$ of the graph, defined in the usual manner:

$$\mathbf{L} = \mathbf{D} - \mathbf{A} \qquad (2)$$

where $\mathbf{D} = \mathrm{diag}(a_{i.})$ with $d_{ii} = [\mathbf{D}]_{ii} = a_{i.} = \sum_{j=1}^{n} a_{ij}$ (element $i,j$ of $\mathbf{L}$ is $l_{ij} = d_{ij} - a_{ij}$).

We also suppose that the graph is connected; that is, any node can be reached from any other node of the graph. In this case, $\mathbf{L}$ has rank $n-1$, where $n$ is the number of nodes (Chung, 1997). If $\mathbf{e}$ is a column vector made of 1s (i.e., $\mathbf{e} = [1, 1, \ldots, 1]^T$, where $T$ denotes the matrix transpose) and $\mathbf{0}$ is a column vector made of 0s, then $\mathbf{L}\mathbf{e} = \mathbf{0}$ and $\mathbf{e}^T\mathbf{L} = \mathbf{0}^T$ hold: $\mathbf{L}$ is doubly centered. The null space of $\mathbf{L}$ is therefore the one-dimensional space spanned by $\mathbf{e}$. Moreover, one can easily show that $\mathbf{L}$ is symmetric and positive semidefinite (Chung, 1997).

Because of the way the graph is defined, user agents who watch the same kind of movie, and therefore have similar taste, will have a comparatively large number of short paths connecting them. On the contrary, for user agents with different interests, we can expect that there will be fewer paths connecting them and that these paths will be longer.

2.2 A random walk model on the graph

The Markov chain describing the sequence of nodes visited by a random walker is called a random walk on a weighted graph. We associate a state of the Markov chain to every node (say $n$ in total); we also define a random variable, $s(t)$, representing the state of the Markov model at time step $t$. If the random walker is in state $i$ at time $t$, we say $s(t) = i$.

We define a random walk by the following single-step transition probabilities:

$$P(s(t+1) = j \mid s(t) = i) = \frac{a_{ij}}{a_{i.}} = p_{ij}, \quad \text{where } a_{i.} = \sum_{j=1}^{n} a_{ij}$$

In other words, to any state or node $i$, we associate a probability of jumping to an adjacent node, $s(t+1) = j$, which is proportional to the weight $w_{ij}$ of the edge connecting $i$ and $j$. The transition probabilities only depend on the current state and not on the past ones (first-order Markov chain). Since the graph is connected, the Markov chain is irreducible, that is, every state can be reached from any other state. If this is not the case, the Markov chain can be decomposed into closed sets of states which are
completely independent (there is no communication between them), each closed set being irreducible.

Now, if we denote the probability of being in state $i$ at time $t$ by $x_i(t) = P(s(t) = i)$ and we define $\mathbf{P}$ as the transition matrix whose entries are $p_{ij} = P(s(t+1) = j \mid s(t) = i)$, the evolution of the Markov chain is characterized by

$$x_i(0) = x_i^0, \qquad x_i(t+1) = \sum_{j=1}^{n} p_{ji}\, x_j(t)$$

Or, in matrix form,

$$\mathbf{x}(0) = \mathbf{x}^0, \qquad \mathbf{x}(t+1) = \mathbf{P}^T \mathbf{x}(t) \qquad (3)$$

where $T$ is the matrix transpose. This provides the state probability distribution $\mathbf{x}(t) = [x_1(t), x_2(t), \ldots, x_n(t)]^T$ at time $t$ once the initial probability density, $\mathbf{x}^0$, is known. For more details on Markov chains, the reader is invited to consult standard textbooks on the subject (Bremaud, 1999), (Kemeny and Snell, 1976), (Norris, 1997).

3 Average first-passage time and average commute time

In this section, we review two basic quantities that can be computed from the definition of the Markov chain, that is, from its probability transition matrix: the average first-passage time and the average commute time. Relationships that make it possible to compute these quantities are derived in a heuristic way (see, e.g., (Kemeny and Snell, 1976) for a more formal treatment).

3.1 The average first-passage time

The average first-passage time, $m(k|i)$, is defined as the average number of steps a random walker, starting in state $i \neq k$, will take to enter state $k$ for the first time (Norris, 1997). More precisely, we define the minimum time until absorption by state $k$ as $T_{ik} = \min(t \geq 0 \mid s(t) = k \text{ and } s(0) = i)$ for one realization of the stochastic process. The average first-passage time is the expectation of this quantity, when starting from state $i$: $m(k|i) = E[T_{ik} \mid s(0) = i]$.
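To make this definition concrete, the following sketch estimates an average first-passage time by simulating the random walk directly. It is only an illustration under stated assumptions: the three-node path graph, the function names, the number of walks, and the random seed are all choices made here, and simulation is not the computation method used in the paper.

```python
import numpy as np


def transition_matrix(A):
    """Row-normalise the adjacency matrix: p_ij = a_ij / a_i."""
    A = np.asarray(A, dtype=float)
    return A / A.sum(axis=1, keepdims=True)


def simulate_first_passage(P, start, target, n_walks=5000, seed=0):
    """Monte Carlo estimate of m(target | start): mean number of steps
    a random walker needs to reach `target` for the first time."""
    rng = np.random.default_rng(seed)
    n = P.shape[0]
    total_steps = 0
    for _ in range(n_walks):
        state = start
        while state != target:
            state = rng.choice(n, p=P[state])  # jump to a neighbour
            total_steps += 1
    return total_steps / n_walks


# Hypothetical toy graph: a path 0 - 1 - 2 with unit weights.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
P = transition_matrix(A)
estimate = simulate_first_passage(P, start=0, target=2)
print(estimate)  # close to the exact value, which is 4 for this graph
```

On this path graph the walker must first step to node 1 and then succeeds with probability 1/2 at each attempt, which gives an exact average of 4 steps; the simulated value fluctuates around it.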
We show in a related paper how to derive a recurrence relation for computing $m(k|i)$ by first-step analysis. We obtain

$$m(k|k) = 0, \qquad m(k|i) = 1 + \sum_{j=1}^{n} p_{ij}\, m(k|j), \quad \text{for } i \neq k \qquad (4)$$

These equations can be used in order to iteratively compute the first-passage times (Norris, 1997). The meaning of these formulae is quite obvious: in order to go from state $i$ to state $k$, one has to go to any adjacent state $j$ and proceed from there.

3.2 The average commute time

We now introduce a closely related quantity, the average commute time, $n(i,j)$, which is defined as the average number of steps a random walker, starting in state $i \neq j$, will take before entering a given state $j$ for the first time, and go back to $i$. That is, $n(i,j) = m(j|i) + m(i|j)$. Notice that, while $n(i,j)$ is symmetric by definition, $m(i|j)$ is not.

3.3 The average commute time is a distance

As shown by several authors (Gobel and Jagers, 1974), (Klein and Randic, 1993), the average commute time is a distance measure, since, for any states $i$, $j$, $k$:

$n(i,j) \geq 0$
$n(i,j) = 0$ if and only if $i = j$
$n(i,j) = n(j,i)$
$n(i,j) \leq n(i,k) + n(k,j)$

Another important point, not proved here, is that $\mathbf{L}^+$, the pseudoinverse of the Laplacian matrix, is a matrix whose elements are the inner products of the node vectors embedded in a Euclidean space preserving the Euclidean Commute Time Distance (ECTD) between the nodes: in this Euclidean space, the node vectors are exactly separated by the ECTD. $\mathbf{L}^+$ can therefore be considered as a similarity matrix between the nodes (as in the vector space model in information retrieval).

In summary, three basic quantities will be used to provide a dissimilarity/similarity measure between nodes: the average first-passage time, the average commute time, and the pseudoinverse of the Laplacian matrix.

4 Experimental methodology

Remember that each agent of the three sets corresponds to a node of the graph. Each node of the user-agent set is connected by an edge to the watched movies of the movie-agent set. In all these experiments we do not take the movie-category agent set into account, in order to perform fair comparisons between the different methods. Indeed, two scoring algorithms (i.e., the cosine and nearest-neighbours algorithms) cannot naturally use the movie-category set to rank the movies.

4.1 Data set

For these experiments, we
developed a MAS architecture corresponding to our movie example. The belief set of the user agents, movie agents, and movie-category agents has been instantiated from the real MovieLens database. Each week hundreds of users visit MovieLens to rate and receive recommendations for movies.

We used a sample of this database proposed in (Sarwar et al., 2002). Enough users were randomly selected to obtain 100,000 ratings (considering only users that had rated 20 or more movies). The database was then divided into a training set and a test set (which contains 10 ratings for each of 943 users). The training set was converted into a 2625 x 2625 matrix (943 user agents, and 1682 movie agents that were rated by at least one of the user agents). The results shown here do not take into account the ratings provided by the user agents (the experiments using the ratings gave similar results) but only the fact that a user agent has or has not interacted with a movie agent (i.e., the user-movie matrix is filled in with 0's or 1's).

We then applied the methods described in Section 4.2 to the training set and compared the results using the test set.

4.2 Scoring algorithms

Each method supplies, for each user agent, a set of similarities (called scores) indicating preferences about the movies, as computed by the method. Technically, these scores are derived from the computation of dissimilarities between the user-agent nodes and the movie-agent nodes. The movie agents that are closest to a user agent in terms of this dissimilarity score (and that have not been watched) are considered the most relevant.

The first four scoring algorithms are based on the average first-passage time and are computed from the probability transition matrix of the corresponding Markov model.

Average commute time (CT). We use the average commute time, $n(i,j)$, to rank the agents of the considered set, where $i$ is an agent of the user-agent set and $j$ is an agent of the set to which we compute the dissimilarity (the movie-agent set). For instance, if we want to
suggest movies to people for watching, we will compute the average commute time between user agents and movie agents. The lower the value is, the more similar the two agents are. In the sequel, this quantity will simply be referred to as "commute time".

Principal components analysis defined on average commute times (PCA CT). In a related paper, we showed that, based on the eigenvector decomposition, the node vectors can be mapped into a new Euclidean space (with 2625 dimensions in this case) that preserves the Euclidean Commute Time Distance (ECTD), or into an $m$-dimensional subspace keeping as much variance as possible, in terms of ECTD.

We varied the dimension of the subspace, $m$, from […] to […] in steps of […], and looked at the percentage of variance accounted for by the first $m$ principal components. After performing a PCA and keeping a given number of principal components, we recompute the distances in this reduced subspace. These Euclidean commute time distances between user agents and movie agents are then used in order to rank the movies for each user agent (the closest first). The best results were obtained for […] dimensions.

Notice that, in the related paper, we also show that this decomposition is similar to principal components analysis in the sense that the projection has maximal variance among all the possible candidate projections.
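The quantities behind these scoring algorithms can be sketched on a small graph as follows. This is a hedged illustration, not the authors' implementation: the first-passage times are obtained by solving the linear system of Equation 4 directly (rather than iterating it), and the commute times are cross-checked with the known identity $n(i,j) = V_G\,(l^+_{ii} + l^+_{jj} - 2\,l^+_{ij})$, where $V_G = \sum_{i,j} a_{ij}$ and $l^+_{ij}$ are the elements of $\mathbf{L}^+$; that identity is not spelled out in this text. The function names and the toy path graph are assumptions.

```python
import numpy as np


def first_passage_times(P):
    """M[i, k] = m(k|i), from m(k|k) = 0 and
    m(k|i) = 1 + sum_j p_ij m(k|j) for i != k (direct linear solve)."""
    n = P.shape[0]
    M = np.zeros((n, n))
    for k in range(n):
        keep = [i for i in range(n) if i != k]
        Q = P[np.ix_(keep, keep)]  # transitions among the states other than k
        M[keep, k] = np.linalg.solve(np.eye(n - 1) - Q, np.ones(n - 1))
    return M


def commute_times_from_laplacian(A):
    """Commute times via the pseudoinverse of the Laplacian L = D - A."""
    A = np.asarray(A, dtype=float)
    L_plus = np.linalg.pinv(np.diag(A.sum(axis=1)) - A)
    d = np.diag(L_plus)
    volume = A.sum()  # V_G, the sum of all a_ij
    return volume * (d[:, None] + d[None, :] - 2.0 * L_plus)


# Hypothetical toy graph: a path 0 - 1 - 2 with unit weights.
A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
P = A / A.sum(axis=1, keepdims=True)

M = first_passage_times(P)  # one-way times m(k|i)
N = M + M.T                 # commute times n(i,j) = m(j|i) + m(i|j)
print(M)
print(N)
```

In this sketch, row `i` of `N` would give the "commute time" scores of user `i`, row `i` of `M` the "one-way" scores, and column `i` of `M` the "return" scores; for a graph the size of the MovieLens sample, sparse or iterative solvers would be preferable to the dense solve used here.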
Average first-passage time (one-way). In a similar way, we use the average first-passage time, $m(j|i)$, to rank agent $j$ of the movie-agent set with respect to agent $i$ of the user-agent set. This provides a dissimilarity between agent $i$ and any agent $j$ of the considered set. This quantity will simply be referred to as "one-way time".

Average first-passage time (return). As a dissimilarity between agent $i$ of the user-agent set and agent $j$ of the movie-agent set, we now use $m(i|j)$ (the transpose of $m(j|i)$), that is, the average time used to reach $i$ (from the user-agent set) when starting from $j$. This quantity will simply be referred to as "return time".

We now introduce other standard collaborative filtering methods to which we will compare our algorithms based on first-passage time.

Nearest neighbours (KNN). The nearest neighbours method is one of the simplest and oldest methods for performing general classification tasks. It can be represented by the following rule: to classify an unknown pattern, choose the class of the nearest example in the training set as measured by a similarity metric. When choosing the $k$ nearest examples to classify the unknown pattern, one speaks about $k$-nearest neighbours techniques.

Using a nearest neighbours technique requires a measure of "closeness", or "similarity". There is often a great deal of subjectivity involved in the choice of a similarity measure (Johnson and Wichern, 2002). Important considerations include the nature of the variables (discrete, continuous, binary), scales of measurement (nominal, ordinal, interval, ratio), and subject matter knowledge.

                        Individual j
                        1          0          Totals
Individual i     1      a          b          a + b
                 0      c          d          c + d
Totals                  a + c      b + d      p = a + b + c + d

Table 1: Contingency table.

In the case of our MAS movie architecture, pairs of agents are compared on the basis of the presence or absence of certain features. Similar agents have more features in common than do dissimilar agents. The presence or absence of a feature is described mathematically by using a binary variable, which assumes the value 1 if the feature is present (if the person has watched the movie, that is, if the
user agent has an interaction with the movie agent) and the value 0 if the feature is absent (if the person has not watched the movie, that is, if the user agent has no interaction with the movie agent).

More precisely, each agent $i$ is characterized by a binary vector, $\mathbf{v}_i$, encoding the interactions with the movie agents (remember that there is an interaction between a user agent and a movie agent if the considered user has watched the considered movie). The nearest neighbours of agent $i$ are computed by taking the $k$ nearest $\mathbf{v}_j$ according to a given similarity measure between binary vectors, $\mathrm{sim}(i,j) = s(\mathbf{v}_i, \mathbf{v}_j)$. We performed systematic comparisons between eight different such measures (see (Johnson and Wichern, 2002), p. 674). Based on these comparisons, we retained the measure that provides the best results: […], where $a$, $b$, and $c$ are defined in Table 1. In this table, $a$ represents the frequency of 1-1 matches between $i$ and $j$, $b$ is the frequency of 1-0 matches, and so forth.

We also varied systematically the number of neighbours $k$ […]. The best score was obtained with […] neighbours. In Section 5, we only present the results obtained by the best $k$-nearest neighbours model (i.e., […] and […]).

Once the $k$ nearest neighbours are computed, the movie agents that are proposed to user agent $i$ are those that have the highest predicted values. The predicted value of user agent $i$ for movie agent $j$ is computed as a sum, weighted by $\mathrm{sim}(i,p)$, of the values (0 or 1) of item $j$ for the neighbours $p$ of user agent $i$:

$$\mathrm{pred}(i,j) = \frac{\sum_{p=1}^{k} \mathrm{sim}(i,p)\, a_{pj}}{\sum_{p=1}^{k} \mathrm{sim}(i,p)} \qquad (5)$$

where $a_{pj}$ is defined in Equation 1 and we keep only the $k$ nearest neighbours.

Cosine coefficient. The cosine coefficient between user agents $i$ and $j$, which measures the strength and the direction of a linear relationship between two variables, is defined by

$$\cos(i,j) = \frac{\mathbf{v}_i^T \mathbf{v}_j}{\lVert \mathbf{v}_i \rVert \, \lVert \mathbf{v}_j \rVert}$$

The predicted value of user agent $i$ for movie agent $j$, considering […] neighbours, is computed in a similar way as in the $k$-nearest neighbours method (see Equation 5).

Dunham gives an overview in (Dunham, 2003) of other similarity measures related to the cosine coefficient (i.e., Dice similarity, Jaccard similarity, and Overlap similarity).
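The neighbour-based scoring just described can be sketched as follows, using the cosine coefficient as the similarity between binary user vectors and a normalised, similarity-weighted sum over the $k$ nearest neighbours as the predicted value. The toy user-movie matrix, the function names, the value of `k`, and the normalisation by the sum of similarities are assumptions of this sketch.

```python
import numpy as np


def cosine_similarity_matrix(X):
    """Cosine coefficient between every pair of rows (user vectors) of X."""
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1)
    norms[norms == 0] = 1.0  # users without interactions get similarity 0
    Xn = X / norms[:, None]
    return Xn @ Xn.T


def predict_scores(X, sim, user, k):
    """Similarity-weighted average of the k nearest neighbours' 0/1 values."""
    X = np.asarray(X, dtype=float)
    s = sim[user].copy()
    s[user] = -np.inf  # a user is not their own neighbour
    neighbours = np.argsort(s)[::-1][:k]
    weights = sim[user, neighbours]
    total = weights.sum()
    if total == 0:
        return np.zeros(X.shape[1])
    return weights @ X[neighbours] / total


# Hypothetical user-movie matrix: rows are users, columns are movies.
X = np.array([[1, 1, 0, 0],
              [1, 1, 1, 0],
              [0, 0, 1, 1]])
sim = cosine_similarity_matrix(X)
scores = predict_scores(X, sim, user=0, k=2)

# Rank only the movies that user 0 has not watched, best score first.
unwatched = np.flatnonzero(X[0] == 0)
ranking = unwatched[np.argsort(-scores[unwatched], kind="stable")]
print(ranking)
```

Swapping `cosine_similarity_matrix` for another similarity between binary vectors (for instance one built from the counts in Table 1) leaves the prediction step unchanged.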
In Section 5, we only show the results for the cosine coefficient, the other methods giving very close results.

Katz. This similarity index has been proposed in the social sciences field. In his attempt to find a new social status index for evaluating status in a manner free from the deficiencies of popularity contest procedures, Katz proposed in (Katz, 1953) a method of computing similarities, taking into account not only the number of direct links between items but also the number of indirect links (going through intermediaries) between items. The similarity matrix is

$$\mathbf{T} = \alpha\mathbf{A} + \alpha^2\mathbf{A}^2 + \ldots + \alpha^k\mathbf{A}^k + \ldots = (\mathbf{I} - \alpha\mathbf{A})^{-1} - \mathbf{I}$$

where $\mathbf{A}$ is the adjacency matrix and $\alpha$ is a constant which has the force of a probability of effectiveness of a single link. A $k$-step chain or path, then, has probability $\alpha^k$ of being effective. In this sense, $\alpha$ actually measures the non-attenuation in a link, $\alpha = 0$ corresponding to complete attenuation and $\alpha = 1$ to absence of any attenuation. For the series to be convergent, $\alpha$ must be less than the inverse of the spectral radius of $\mathbf{A}$.

For the experiment, we varied systematically the value of $\alpha$ and we only present the results obtained by the best model (i.e., $\alpha$ = […] * (spectral radius)$^{-1}$). Once we have computed the similarity matrix, the closest movie agent representing a movie that has not been watched is proposed first to the user agent.

Dijkstra's algorithm. Dijkstra's algorithm solves a shortest path problem for a directed and connected graph which has nonnegative edge weights. As a distance between two agents of the MAS architecture, we compute the shortest path between these two agents. The closest movie agent representing a movie that has not been watched is proposed first to the user agent.

Pseudoinverse of the Laplacian matrix ($\mathbf{L}^+$).
The pseudoinverse of the Laplacian matrix provides a similarity measure, since $\mathbf{L}^+$ is the matrix containing the inner products of the vectors in the transformed space where the nodes are exactly separated by the ECTD (details are provided in a related paper). The predicted value of user agent $i$ for movie agent $j$, considering […] neighbours, is computed in a similar way as in the $k$-nearest neighbours method (see Equation 5).

4.3 Performance evaluation

The performance of the scoring algorithms will be assessed by a variant of Somers' D, the degree of agreement (Siegel and Castellan, 1988). For computing this degree of agreement, we consider each possible pair of movie agents and determine whether our method ranks the two agents of each pair in the correct order (in comparison with the test set, which contains watched movies that should be ranked first) or not. The degree of agreement is therefore the proportion of pairs ranked in the correct order with respect to the total number of pairs, without considering those for which there is no preference. A degree of agreement of 0.5 (50% of all the pairs are in correct order and 50% are in bad order) is similar to a completely random ranking. On the other hand, a degree of agreement of 1 means that the proposed ranking is identical to the ideal ranking.

5 Results

5.1 Ranking procedure

For each user agent, we first select the movie agents representing movies that have not been watched.
Then, we rank them according to one of the proposed scoring algorithms. Finally, we compare the proposed ranking with the test set (if the ranking procedure performs well, we expect watched movies belonging to the test set to be on top of the list) by using the degree of agreement.

Table 2: Results obtained by the ranking procedures without considering the movie-category set (columns: CT, PCA CT, One-way, Return, Katz, KNN, Dijkstra, Cosine).

5.2 Results and discussions

The results of the comparison are tabulated in Table 2 (where we display the degree of agreement for each method). We used the test set (which includes 10 movies for each of the 943 users) to compute the global degree of agreement.

Based on Table 2, we observe that the best degree of agreement is obtained by the $\mathbf{L}^+$ method ([…]). The next degrees of agreement are obtained by the Cosine ([…]) and the $k$-nearest neighbours method ([…]). It is also observed that the commute time and the average first-passage time (one-way) provide good results too, but are outperformed by the Cosine, the KNN, Katz' algorithm ([…]), and the PCA ([…]). They present a degree of agreement of […] and […] respectively. Notice, however, that the results of the $\mathbf{L}^+$ method, the Cosine, the KNN, and the PCA are purely indicative, since they highly depend on the appropriate number of neighbours or on the appropriate number of principal components, which are difficult to estimate a priori. The commute time and the average first-passage time (one-way) outperform the average first-passage time (return) ([…]). One method provides much worse results: Dijkstra's algorithm ([…]). It seems that, for Dijkstra's algorithm, nearly every movie agent can be reached from any user agent with a shortest path distance of […]. The degree of agreement is therefore close to […] because of the difficulty of ranking the movie agents.

5.3 Computational issues

In this section, we perform a comparison of the computing times (for a Pentium 4, 2.40 GHz) for all the implemented methods: the average commute time, the principal components analysis (we consider 10 components), the
average first-passage time one-way and return, the Katz method, the $k$-nearest neighbours (we consider […] neighbours), the Dijkstra algorithm, the Cosine method (we consider again […] neighbours), and the $\mathbf{L}^+$ method. Table 3 shows the times, in seconds (using the Matlab cputime function), needed by each method to provide predictions for all the non-watched movie agents and for each user agent (i.e., 943 user agents).

Table 3: Time (in sec) needed to compute predictions for all the non-watched movies and all the users (columns: CT, PCA CT, One-way, Return, Katz, KNN, Dijkstra, Cosine).

We observe, on the one hand, that the fastest method is the $k$-nearest neighbours method and, on the other hand, that the slowest methods are PCA and the Dijkstra algorithm. The method which provides the best degree of agreement (i.e., using the $\mathbf{L}^+$ matrix as similarity measure) takes much more time than the $k$-nearest neighbours method but is about as fast as the Markov-based algorithms.

6 Conclusions and further work

We introduced a general procedure for computing dissimilarities between agents of a MAS architecture.
It is based on a particular Markov-chain model of random walk through the graph. More precisely, we compute quantities (average first-passage time, average commute time, and the pseudoinverse of the Laplacian matrix) that provide dissimilarity measures between any pair of agents in the system.

We showed through experiments performed on a MAS architecture instantiated from the MovieLens database that these quantities perform well in comparison with standard methods. In fact, as already stressed by (Klein and Randic, 1993), the introduced quantities provide a very general mechanism for computing similarities between nodes of a graph, by exploiting its structure.

We are now investigating ways to improve the Markov-chain based methods. The main drawback of these methods is that they do not scale well for large MAS. Indeed, the Markov model has as many states as agents in the MAS. Thus, in the case of large MAS, we should rely on the sparseness of the data matrix as well as on iterative formulae (such as Equation 4).

REFERENCES

Breese, J., Heckerman, D., and Kadie, C. (1998). Empirical analysis of predictive algorithms for collaborative filtering. Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence.
Bremaud, P. (1999). Markov Chains: Gibbs Fields, Monte Carlo Simulation, and Queues. Springer-Verlag.
Buckley, F. and Harary, F. (1990). Distance in Graphs. Addison-Wesley Publishing Company.
Chung, F. R. (1997). Spectral Graph Theory. American Mathematical Society.
Dunham, M. (2003). Data Mining: Introductory and Advanced Topics. Prentice Hall.
Gobel, F. and Jagers, A. (1974). Random walks on graphs. Stochastic Processes and their Applications, 2:311–336.
Johnson, R. and Wichern, D. (2002). Applied Multivariate Statistical Analysis, 5th Ed. Prentice Hall.
Katz, L. (1953). A new status index derived from sociometric analysis. Psychometrika, 18(1):39–43.
Kemeny, J. G. and Snell, J. L. (1976). Finite Markov Chains. Springer-Verlag.
Klein, D. J. and Randic, M. (1993). Resistance distance. Journal of Mathematical
Chemistry, 12:81–95.
Konstan, J., Miller, B., Maltz, D., Herlocker, J., Gordon, L., and Riedl, J. (1997). Grouplens: Applying collaborative filtering to usenet news. Communications of the ACM, 40(3):77–87.
Norris, J. (1997). Markov Chains. Cambridge University Press.
Resnick, P., Neophytos, I., Mitesh, S., Bergstrom, P., and Riedl, J. (1994). Grouplens: An open architecture for collaborative filtering of netnews. Proceedings of the Conference on Computer Supported Cooperative Work, pages 175–186.
Sarwar, B., Karypis, G., Konstan, J., and Riedl, J. (2001). Item-based collaborative filtering recommendation algorithms. Proceedings of the International World Wide Web Conference, pages 285–295.
Sarwar, B., Karypis, G., Konstan, J., and Riedl, J. (2002). Recommender systems for large-scale e-commerce: Scalable neighborhood formation using clustering. Proceedings of the Fifth International Conference on Computer and Information Technology.
Shardanand, U. and Maes, P. (1995). Social information filtering: Algorithms for automating 'word of mouth'. Proceedings of the Conference on Human Factors in Computing Systems, pages 210–217.
Siegel, S. and Castellan, J. (1988). Nonparametric Statistics for the Behavioral Sciences, 2nd Ed. McGraw-Hill.
White, S. and Smyth, P. (2003). Algorithms for estimating relative importance in networks. Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 266–275.
Wooldridge, M. and Jennings, N. R. (1994). Intelligent agents: Theory and practice. Knowledge Engineering Review paper, 2:115–152.