
Integrating Semantic Knowledge with Web Usage Mining

for Personalization

Honghua Dai and Bamshad Mobasher

School of Computer Science, Telecommunication, and Information Systems

DePaul University

243 S. Wabash Ave.

Chicago, Illinois, 60604

Phone: 312-362-5174

Fax: 312-362-6116

Email: mobasher@cs.depaul.edu

Keywords: Web usage mining, personalization, data mining, domain knowledge, ontologies, semantic Web mining

Integrating Semantic Knowledge with Web Usage Mining

for Personalization

Abstract

Web usage mining has been used effectively as an approach to automatic personalization and as a way to overcome deficiencies of traditional approaches such as collaborative filtering. Despite their success, such systems, like more traditional ones, do not take into account the semantic knowledge about the underlying domain. Without such semantic knowledge, personalization systems cannot recommend different types of complex objects based on their underlying properties and attributes. Nor can these systems possess the ability to automatically explain or reason about the user models or user recommendations. The integration of semantic knowledge is, in fact, the primary challenge for the next generation of personalization systems. In this chapter we provide an overview of approaches for incorporating semantic knowledge into Web usage mining and personalization processes. In particular, we discuss the issues and requirements for successful integration of semantic knowledge from different sources, such as the content and the structure of Web sites, for personalization. Finally, we present a general framework for fully integrating domain ontologies with the Web usage mining and personalization processes at different stages, including the preprocessing and pattern discovery phases, as well as the final stage in which the discovered patterns are used for personalization.

INTRODUCTION

With the continued growth and proliferation of e-commerce, Web services, and Web-based information systems, personalization has emerged as a critical application which is essential to the success of a Web site. It is now common for Web users to encounter sites that provide dynamic recommendations for products and services, targeted banner advertising, and

individualized link selections. Indeed, nowhere is this phenomenon more apparent than in the business-to-consumer e-commerce arena. The reason is that, in today's highly competitive e-commerce environment, the success of a site often depends on the site's ability to retain visitors and turn casual browsers into potential customers. Automatic personalization and recommender system technologies have become critical tools, precisely because they help engage visitors at a deeper and more intimate level by tailoring the site's interaction with a visitor to her needs and interests.

Web personalization can be defined as any action that tailors the Web experience to a particular user, or set of users (Mobasher, Cooley, & Srivastava, 2000a). The experience can be something as casual as browsing a Web site or as (economically) significant as trading stocks or purchasing a car. Principal elements of Web personalization include modeling of Web objects (pages, etc.) and subjects (users), categorization of objects and subjects, matching between and across objects and/or subjects, and determination of the set of actions to be recommended for personalization. The actions can range from simply making the presentation more pleasing to anticipating the needs of a user and providing customized information.

Traditional approaches to personalization have included both content-based and user-based techniques. Content-based techniques use personal profiles of users and recommend other items or pages based on their content similarity to the items or pages that are in the user’s profile. The underlying mechanism in these systems is usually the comparison of sets of keywords representing pages or item descriptions. Examples of such systems include Letizia (Lieberman, 1995) and WebWatcher (Joachims, Freitag, & Mitchell, 1997). While these systems perform well from the perspective of the end user who is searching the Web for information, they are less useful in e-commerce applications, partly due to the lack of server-side control by site owners, and partly because techniques based on content similarity alone may miss other types of semantic relationships among objects (for example, the associations among products or services that are semantically different, but are often used together).

User-based techniques for personalization, on the other hand, primarily focus on the similarities among users rather than item-based similarities. The most widely used technology for user-based

personalization is collaborative filtering (CF) (Herlocker, Konstan, Borchers, & Riedl, 1999). Given a target user’s record of activity or preferences, CF-based techniques compare that record with the historical records of other users in order to find the users with similar interests. This is the so-called neighborhood of the current user. The mapping of a visitor record to its neighborhood could be based on similarity in ratings of items, access to similar content or pages, or purchase of similar items. The identified neighborhood is then used to recommend items not already accessed or purchased by the active user. The advantage of this approach over purely content-based approaches, which rely on content similarity in item-to-item comparisons, is that it can capture “pragmatic” relationships among items based on their intended use or based on similar tastes of the users.
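To make the neighborhood formation and recommendation steps concrete, the following is a minimal sketch of user-based collaborative filtering over a small ratings matrix. The data, the choice of cosine similarity, and the neighborhood size are illustrative assumptions, not the setup of any particular system discussed here.

```python
import numpy as np

def recommend_cf(ratings, target, k=2, top_n=3):
    """User-based CF sketch: ratings is a user x item matrix (0 = not rated)."""
    # Cosine similarity between the target user's record and all other users.
    norms = np.linalg.norm(ratings, axis=1) * np.linalg.norm(ratings[target]) + 1e-9
    sims = ratings @ ratings[target] / norms
    sims[target] = -1                      # exclude the target from its own neighborhood
    neighbors = np.argsort(sims)[-k:]      # the k most similar users

    # Predict scores as a similarity-weighted average of the neighbors' ratings,
    # then recommend items the target has not yet accessed or purchased.
    weights = sims[neighbors]
    scores = weights @ ratings[neighbors] / (weights.sum() + 1e-9)
    scores[ratings[target] > 0] = -1
    return np.argsort(scores)[::-1][:top_n]

# Toy usage: rows are users, columns are items (e.g., pages or products).
R = np.array([[5, 4, 0, 0, 1],
              [4, 5, 1, 0, 0],
              [0, 1, 5, 4, 0],
              [1, 0, 4, 5, 3]], dtype=float)
print(recommend_cf(R, target=0))
```

Note that both the similarity computation and the weighted prediction happen at recommendation time, which is precisely the source of the scalability concerns discussed next.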

The CF-based techniques, however, suffer from some well-known limitations (Sarwar, Karypis, Konstan, & Riedl, 2000). For the most part, these limitations are related to the scalability and efficiency of the underlying algorithms, which require real-time computation in both the neighborhood formation and the recommendation phases. The effectiveness and scalability of collaborative filtering can be dramatically enhanced by the application of Web usage mining techniques.

In general, Web mining can be characterized as the application of data mining to the content, structure, and usage of Web resources (Cooley, Mobasher, & Srivastava, 1997; Srivastava, Cooley, Deshpande, & Tan, 2000). The goal of Web mining is to automatically discover local as well as global models and patterns within and between Web pages or other Web resources. The goal of Web usage mining, in particular, is to capture and model Web user behavioral patterns. The discovery of such patterns from the enormous amount of data generated by Web and application servers has found a number of important applications. Among these applications are systems to evaluate the effectiveness of a site in meeting user expectations (Spiliopoulou, 2000), techniques for dynamic load balancing and optimization of Web servers for better and more efficient user access (Pitkow & Pirolli, 1999; Palpanas & Mendelzon, 1999), and applications for dynamically restructuring or customizing a site based on users’ predicted needs and interests (Perkowitz & Etzioni, 1998).

More recently, Web usage mining techniques have been proposed as another user-based approach to personalization that alleviates some of the problems associated with collaborative filtering (Mobasher et al., 2000a). In particular, Web usage mining has been used to improve the scalability of personalization systems based on traditional CF-based techniques (Mobasher, Dai, Luo, & Nakagawa, 2001; Mobasher, Dai, Luo, & Nakagawa, 2002).

However, the pure usage-based approach to personalization has an important drawback: the recommendation process relies on existing user transaction data; thus, items or pages recently added to a site cannot be recommended. This is generally referred to as the “new item problem”. A common approach to resolving this problem in collaborative filtering has been to integrate content characteristics of pages with the user ratings or judgments (Claypool et al., 1999; Pazzani, 1999). Generally, in these approaches, keywords are extracted from the content on the Web site and are used to either index pages by content or classify pages into various content categories. In the context of personalization, this approach would allow the system to recommend pages to a user, not only based on similar users, but also (or alternatively) based on the content similarity of these pages to the pages the user has already visited.

Keyword-based approaches, however, are incapable of capturing more complex relationships among objects at a deeper semantic level based on the inherent properties associated with these objects. For example, potentially valuable relational structures among objects, such as relationships between movies, directors, and actors, or between students, courses, and instructors, may be missed if one can only rely on the description of these entities using sets of keywords. To be able to recommend different types of complex objects using their underlying properties and attributes, the system must be able to rely on the characterization of user segments and objects, not just based on keywords, but at a deeper semantic level using the domain ontologies for the objects. For instance, a traditional personalization system on a university Web site might recommend courses in Java to a student, simply because that student has previously taken or shown interest in Java courses. On the other hand, a system that has knowledge of the underlying domain ontology might recognize that the student should first satisfy the prerequisite requirements for a recommended course, or be able to recommend the best instructors for a Java course, and so on.

An ontology provides a set of well-founded constructs that define significant concepts and their semantic relationships. An example of an ontology is a relational schema for a database involving multiple tables and foreign keys semantically connecting these relations. Such constructs can be leveraged to build meaningful higher-level knowledge in a particular domain. Domain ontologies for a Web site usually include concepts, subsumption relations between concepts (concept hierarchies), and other relations among concepts that exist in the domain that the Web site represents. For example, the domain ontology of a movie Web site usually includes concepts such as “movie,” “actor,” “director,” “theater,” etc. The genre hierarchy can be used to represent different categories of movie concepts. Typical relations in this domain may include “Starring” (between actors and movies), “Directing”, “Playing” (between theaters and movies), etc.
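The movie-domain ontology sketched above can be made concrete with a handful of concept and relation definitions. The following is a minimal, hypothetical encoding in plain Python; the class names, instances, and attribute values are purely illustrative and do not correspond to any particular ontology language.

```python
from dataclasses import dataclass, field

@dataclass
class Concept:
    name: str
    parent: "Concept | None" = None          # subsumption link (concept hierarchy)
    attributes: dict = field(default_factory=dict)

# Concept hierarchy: genres are subsumed by the Movie concept.
movie  = Concept("Movie")
action = Concept("Action", parent=movie)
scifi  = Concept("Sci-Fi", parent=movie)

actor    = Concept("Actor")
director = Concept("Director")
theater  = Concept("Theater")

# Relations are binary: (relation name, domain concept, range concept).
relations = [
    ("Starring",  movie, actor),       # between movies and actors
    ("Directing", director, movie),
    ("Playing",   theater, movie),     # between theaters and movies
]

# An individual (instance) of a concept with concrete attribute values.
matrix = Concept("Matrix", parent=scifi, attributes={"year": 1999})
print(matrix.name, "->", matrix.parent.name, "->", matrix.parent.parent.name)
```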

The ontology of a Web site can be constructed by extracting relevant concepts and relations from the content and structure of the site, through machine learning and Web mining techniques. But, in addition to concepts and relations that can be acquired from Web content and structure information, we are also interested in usage-related concepts and relations in a Web site. For instance, in an E-commerce Web site, we may be interested in the relations between users and objects which define different types of online activity, such as browsing, searching, registering, buying, and bidding. The integration of such usage-based relations with ontological information representing the underlying concepts and attributes embedded in a site allows for more effective knowledge discovery, as well as better characterization and interpretation of the discovered patterns.

In the context of Web personalization and recommender systems, the use of semantic knowledge can lead to deeper interaction of the visitors or customers with the site. Integration of domain knowledge allows such systems to infer additional useful recommendations for users based on more fine grained characteristics of the objects being recommended, and provides the capability to explain and reason about user actions.

In this chapter we present an overview of the issues related to and requirements for successfully

integrating semantic knowledge in the Web usage mining and personalization processes. We begin by providing some general background on the use of semantic knowledge and ontologies in Web mining, as well as an overview of personalization based on Web usage mining. We then discuss how the content and the structure of the site can be leveraged to transform raw usage data into semantically-enhanced transactions that can be used for semantic Web usage mining and personalization. Finally we present a framework for more systematically integrating full-fledged domain ontologies in the personalization process.

BACKGROUND

Semantic Web Mining

Web mining is the process of discovering and extracting useful knowledge from the content, usage, and structure of one or more Web sites. Semantic Web mining (Berendt, Hotho, & Stumme, 2002) involves the integration of domain knowledge into the Web mining process.

For the most part the research in Semantic Web mining has been focused in application areas such as Web content and structure mining. In this section, we provide a brief overview and some examples of related work in this area. Few studies have focused on the use of domain knowledge in Web usage mining. Our goal in this chapter is to provide a road map for the integration of semantic and ontological knowledge into the process of Web usage mining, and particularly, in its application to Web personalization and recommender systems.

Domain knowledge can be integrated into the Web mining process in many ways. This includes leveraging explicit domain ontologies or implicit domain semantics extracted from the content or the structure of documents or Web site. In general, however, this process may involve one or more of three critical activities: domain ontology acquisition, knowledge base construction, and knowledge-enhanced pattern discovery.

Domain Ontology Acquisition

The process of acquiring, maintaining and enriching the domain ontologies is referred to as “ontology engineering”. For small Web sites with only static Web pages, it is feasible to construct a domain knowledge base manually or semi-manually. In Loh, Wives, & de Oliveira (2000) a semi-manual approach is adopted for defining each domain concept as a vector of terms with the help of existing vocabulary and natural language processing tools.

However, manual construction and maintenance of domain ontologies require a great deal of effort on the part of knowledge engineers, particularly for large-scale Web sites or Web sites with dynamically generated content. In dynamically generated Web sites, page templates are usually populated based on structured queries performed against back-end databases. In such cases, the database schema can be used directly to acquire ontological information. Some Web servers send structured data files (e.g., XML files) to users and let client-side formatting mechanisms (e.g., CSS files) work out the final Web representation on client agents. In this case, it is generally possible to infer the schema from the structured data files.
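Where a site does serve structured data files, a rough approximation of the underlying schema can sometimes be read directly off the element structure. The snippet below is a simplified sketch assuming a hypothetical XML product feed; a real schema-inference step would also need to recover data types, keys, and cardinalities.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# A made-up structured data file of the kind a server might send to clients.
xml_data = """
<catalog>
  <book id="b1"><title>Web Mining</title><author>Smith</author><price>30</price></book>
  <book id="b2"><title>Ontologies</title><author>Jones</author><price>40</price></book>
</catalog>
"""

def infer_schema(xml_text):
    """Collect, per element name, the child elements and attributes observed."""
    schema = defaultdict(lambda: {"children": set(), "attributes": set()})
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        schema[elem.tag]["attributes"].update(elem.attrib)
        for child in elem:
            schema[elem.tag]["children"].add(child.tag)
    return dict(schema)

for tag, info in infer_schema(xml_data).items():
    print(tag, "->", info)
```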

When there is no direct source for acquiring domain ontologies, machine learning and text mining techniques must be employed to extract domain knowledge from the content or hyperlink structure of the Web pages. In Clerkin, Cunningham, & Hayes (2001) a hierarchical clustering algorithm is applied to terms in order to create concept hierarchies. In Stumme, Taouil, Bastide, Pasquier, & Lakhal (2000) a Formal Concept Analysis framework is proposed to derive a concept lattice (using a variation of an association rule algorithm). The approach proposed in Maedche & Staab (2000) learns generalized conceptual relations by applying association rule mining. All these efforts aim to automatically generate machine-understandable ontologies for Web site domains.

The outcome of this phase is a set of formally defined domain ontologies that precisely represent the Web site. A good representation should provide machine understandability, the power of reasoning, and computational efficiency. The choice of ontology representation language has a direct effect on the flexibility of the data mining phase. Common representation approaches include the vector-space model (Loh et al., 2000), description logics (such as DAML+OIL)

(Giugno & Lukasiewicz, 2002; Horrocks & Sattler 2001), first order logic (Craven et al., 2000), relational models (Dai & Mobasher, 2002), probabilistic relational models (Getoor, Friedman, Koller, & Taskar, 2001), and probabilistic Markov models (Anderson, Domingos, & Weld, 2002).

Knowledge Base Construction

The first phase generates the formal representation of concepts and relations among them. The second phase, knowledge base construction, can be viewed as building mappings between concepts or relations on the one hand, and objects on the Web on the other. The goal of this phase is to find the instances of the concepts and relations from the Web site’s domain, so that they can be exploited to perform further data mining tasks. Learning algorithms play an important role in this phase.

In Ghani & Fano (2002) a text classifier is learned for each “semantic feature” (somewhat equivalent to the notion of a concept) based on a small manually labeled data set. First, Web pages are extracted from different Web sites that belong to a similar domain, and then the semantic features are manually labeled. This small labeled data set is fed into a learning algorithm as the training data to learn the mappings between Web objects and the concept labels. In fact, this approach treats the process of assigning concept labels as filling in “missing” data. Craven et al. (2000) adopt a combined approach of statistical text classification and first-order text classification in recognizing concept instances. In that study, the learning process is based on both page content and linkage information.
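The general pattern of training a classifier on a small labeled set of page texts and then using it to assign concept labels to new pages can be sketched as follows. The tiny training set and the bag-of-words naive Bayes pipeline are illustrative assumptions, not the specific setup used by Ghani & Fano (2002) or Craven et al. (2000).

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Small manually labeled set: page text -> concept label ("semantic feature").
pages  = ["java programming course syllabus prerequisites",
          "introduction to databases sql course",
          "action movie starring famous actor",
          "science fiction film directed by famous director"]
labels = ["course", "course", "movie", "movie"]

clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(pages, labels)

# Map unseen Web objects (page texts) to concept labels.
new_pages = ["advanced java course offered next term",
             "new sci-fi movie opens in theaters"]
print(list(zip(new_pages, clf.predict(new_pages))))
```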

Knowledge-Enhanced Web Data Mining

Domain knowledge enables analysts to perform more powerful Web data mining tasks. The applications include content mining, information retrieval and extraction, Web usage mining, and personalization. On the other hand, data mining tasks can also help to enhance the process of domain knowledge discovery.

Domain knowledge can improve the accuracy of document clustering and classification and induce more powerful content patterns. For example, in Horrocks (2002), domain ontologies are employed in selecting textual features. The selection is based on lexical analysis tools that map

terms into concepts within the ontology. The approach also aggregates concepts by merging the concepts that have low support in the documents. After preprocessing, only necessary concepts are selected for the content clustering step. In McCallum, Rosenfeld, Mitchell, & Ng (1998) a concept hierarchy is used to improve the accuracy and the scalability of text classification.

Traditional approaches to content mining and information retrieval treat every document as a set or a bag of terms. Without domain semantics, we would treat “human” and “mankind” as different terms, or, “brake” and “car” as unrelated terms. In Loh et al. (2000) a concept is defined as a group of terms that are semantically relevant, e.g., as synonyms. With such concept definitions, concept distribution among documents is analyzed to find interesting concept patterns. For example, one can discover dominant themes in a document collection or in a single document; or find associations among concepts.

Ontologies and domain semantics have been applied extensively in the context of Web information retrieval and extraction. For example, the ARCH system (Parent, Mobasher, & Lytinen, 2001) adopts concept hierarchies because they allow users to formulate more expressive and less ambiguous queries when compared to simple keyword-based queries. In ARCH, an initial user query is used to find matching concepts within a portion of the concept hierarchy. The concept hierarchy is stored in an aggregated form with each node represented as a term vector. The user can select or unselect nodes in the presented portion of the hierarchy, and relevance feedback techniques are used to modify the initial query based on these nodes.
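The query-modification step in systems of this kind can be approximated with standard Rocchio-style relevance feedback over term vectors. The sketch below assumes toy term vectors for the query and for the selected and unselected concept nodes; the parameter values are conventional defaults, not those used in ARCH.

```python
import numpy as np

def rocchio(query, selected, unselected, alpha=1.0, beta=0.75, gamma=0.25):
    """Modify the initial query vector using the term vectors of concept nodes
    the user selected (relevant) and unselected (non-relevant)."""
    q = alpha * query
    if len(selected):
        q = q + beta * np.mean(selected, axis=0)
    if len(unselected):
        q = q - gamma * np.mean(unselected, axis=0)
    return np.clip(q, 0, None)            # keep term weights non-negative

# Toy 5-term vocabulary; each row is a term vector for a concept node.
query      = np.array([1.0, 0.0, 0.5, 0.0, 0.0])
selected   = np.array([[0.8, 0.2, 0.9, 0.0, 0.0]])   # nodes the user selected
unselected = np.array([[0.0, 0.9, 0.0, 0.7, 0.1]])   # nodes the user unselected
print(rocchio(query, selected, unselected))
```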

Similarly, domain-specific search and retrieval applications allow for more focused and accurate search based on specific relations inherent in the underlying domain knowledge. The CiteSeer system (Bollacker, Lawrence, & Giles, 1998) is a Web agent for finding interesting research publications, in which the relation “cited by” is the primary relation discovered among objects (i.e., among published papers). Thus, CiteSeer allows for comparison and retrieval of documents, not only based on similar content, but also based on inter-citation linkage structure among documents.

CiteSeer is an example of an approach for integrating semantic knowledge based on Web

structure mining. In general, Web structure mining tasks take as input the hyperlink structure of Web pages (either belonging to a Web site or relative to the whole Web), and output the underlying patterns (e.g., page authority values, linkage similarity, Web communities, etc.) that can be inferred from the hypertext co-citations. Another example of such an approach is the PageRank algorithm, which is the backbone of the Google search engine. PageRank uses the hyperlink structure of indexed pages (essentially, the number and importance of the pages referring to a page) in order to rank pages based on quality or authoritativeness. Such algorithms, which are based on the analysis of structural attributes, can be further enhanced by integrating content semantics (Chakrabarti et al., 1998). Web semantics can also enhance crawling algorithms by combining content or ontology with linkage information (Chakrabarti, van den Berg, & Dom, 1999; Maedche, Ehrig, Handschuh, Volz, & Stojanovic, 2002).
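To make the link-analysis step concrete, the following is a compact power-iteration sketch of a PageRank-style computation over a tiny link graph. The graph, damping factor, and iteration count are arbitrary illustrative choices.

```python
import numpy as np

def pagerank(adj, damping=0.85, iters=50):
    """Power iteration over a column-normalized adjacency (link) matrix."""
    n = adj.shape[0]
    out_degree = adj.sum(axis=0)
    out_degree[out_degree == 0] = 1                 # guard against dangling pages
    M = adj / out_degree                            # column j distributes page j's rank
    rank = np.full(n, 1.0 / n)
    for _ in range(iters):
        rank = (1 - damping) / n + damping * M @ rank
    return rank

# adj[i, j] = 1 if page j links to page i.
adj = np.array([[0, 1, 1, 0],
                [1, 0, 0, 0],
                [1, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
print(pagerank(adj).round(3))
```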

The use of domain knowledge can also provide tremendous advantages in Web usage mining and personalization. For example, semantic knowledge may help in interpreting, analyzing, and reasoning about usage patterns discovered in the mining phase. Furthermore, it can enhance collaborative filtering and personalization systems by providing concept-level recommendations (in contrast to item-based or user-based recommendations). Another advantage is that user demographic data, represented as part of a domain ontology, can be more systematically integrated into collaborative or usage-based recommendation engines. Several studies have considered various approaches to integrate content-based semantic knowledge into traditional collaborative filtering and personalization frameworks (Anderson et al., 2002; Claypool et al., 1999; Melville, Mooney, & Nagarajan, 2002; Mobasher, Dai, Luo, Sun, & Zhu, 2000b; Pazzani, 1999). Recently, we proposed a formal framework for integrating full domain ontologies with the personalization process based on Web usage mining (Dai & Mobasher, 2002).

Web Usage Mining and Personalization

The goal of personalization based on Web usage mining is to recommend a set of objects to the current (active) user, possibly consisting of links, ads, text, products, etc., tailored to the user’s perceived preferences as determined by the matching usage patterns. This task is accomplished by matching the active user session (possibly in conjunction with previously stored profiles for

that user) with the usage patterns discovered through Web usage mining. We call the usage patterns used in this context aggregate usage profiles since they provide an aggregate representation of the common activities or interests of groups of users. This process is performed by the recommendation engine, which is the online component of the personalization system. If the data collection procedures in the system include the capability to track users across visits, then the recommendations can represent a longer-term view of the user’s potential interests based on the user’s activity history within the site. If, on the other hand, aggregate profiles are derived only from user sessions (single visits) contained in log files, then the recommendations provide a “short-term” view of the user’s navigational interests. These recommended objects are added to the last page in the active session accessed by the user before that page is sent to the browser.

Figure 1: General Framework for Web Personalization Based on Web Usage Mining – The Offline Pattern Discovery Component.

The overall process of Web personalization based on Web usage mining consists of three phases: data preparation and transformation, pattern discovery, and recommendation. Of these, only the latter phase is performed in real-time. The data preparation phase transforms raw Web log files into transaction data that can be processed by data mining tasks. A variety of data mining techniques can be applied to this transaction data in the pattern discovery phase, such as

clustering, association rule mining, and sequential pattern discovery. The results of the mining phase are transformed into aggregate usage profiles, suitable for use in the recommendation phase. The recommendation engine considers the active user session in conjunction with the discovered patterns to provide personalized content.

Figure 2: General Framework for Web Personalization Based on Web Usage Mining – The Online Personalization Component.

The primary data sources used in Web usage mining are the server log files, which include Web server access logs and application server logs. Additional data sources that are also essential for both data preparation and pattern discovery include the site files (HTML, XML, etc.) and meta-data, operational databases, and domain knowledge. Generally speaking, the data obtained through these sources can be categorized into four groups (see also Cooley, Mobasher, & Srivastava, 1999 and Srivastava et al., 2000).

• Usage data: The log data collected automatically by the Web and application servers represents the fine-grained navigational behavior of visitors. Depending on the goals of the analysis, this data needs to be transformed and aggregated at different levels of abstraction. In Web usage mining, the most basic level of data abstraction is that of a pageview. Physically, a pageview is an aggregate representation of a collection of Web objects contributing to the display on a user’s browser resulting from a single user action (such as a clickthrough). These Web objects may include multiple pages (such as in a frame-based site), images, embedded components, or script and database queries that populate portions of the displayed page (in dynamically generated sites). Conceptually, each pageview represents a specific “type” of user activity on the site, e.g., reading a news article, browsing the results of a search query, viewing a product page, adding a product to the shopping cart, and so on. On the other hand, at the user level, the most basic level of behavioral abstraction is that of a server session (or simply a session). A session (also commonly referred to as a “visit”) is a sequence of pageviews by a single user during a single visit. The notion of a session can be further abstracted by selecting a subset of pageviews in the session that are significant or relevant for the analysis tasks at hand. We shall refer to such a semantically meaningful subset of pageviews as a transaction. It is important to note that a transaction does not refer simply to product purchases, but it can include a variety of types of user actions as captured by different pageviews in a session.

• Content data: The content data in a site is the collection of objects and relationships that are conveyed to the user. For the most part, this data is comprised of combinations of textual material and images. The data sources used to deliver or generate this data include static HTML/XML pages, image, video, and sound files, dynamically generated page segments from scripts or other applications, and collections of records from the operational database(s).

The site content data also includes semantic or structural meta-data embedded within the site or individual pages, such as descriptive keywords, document attributes, semantic tags, or HTTP variables. Finally, the underlying domain ontology for the site is also considered part of content data. The domain ontology may be captured implicitly within the site, or it may exist in some explicit form. The explicit representation of domain ontologies may include conceptual hierarchies over page contents, such as product categories, structural hierarchies represented by the underlying file and directory structure in which the site content is stored, explicit representation of semantic content and relationships via an ontology language such as RDF, or a database schema over the data contained in the operational databases.

• Structure data: The structure data represents the designer’s view of the content organization within the site. This organization is captured via the inter-page linkage structure among pages, as reflected through hyperlinks. The structure data also includes the intra-page structure of the content represented in the arrangement of HTML or XML tags within a page. For example, both HTML and XML documents can be represented as tree structures over the space of tags in the page. The structure data for a site is normally captured by an automatically generated “site map” which represents the hyperlink structure of the site. A site mapping tool must have the capability to capture and represent the inter- and intra-pageview relationships. This necessity becomes most evident in a frame-based site where portions of distinct pageviews may represent the same physical page. For dynamically generated pages, the site mapping tools must either incorporate intrinsic knowledge of the underlying applications and scripts, or must have the ability to generate content segments using a sampling of parameters passed to such applications or scripts.

• User data: The operational database(s) for the site may include additional user profile information. Such data may include demographic or other identifying information on registered users, user ratings on various objects such as pages, products, or movies, past purchase or visit histories of users, as well as other explicit or implicit representation of users’ interests. Obviously, capturing such data would require explicit interactions with the users of the site. Some of this data can be captured anonymously, without any identifying user information, so long as there is the ability to distinguish among different users. For example, anonymous information contained in client-side cookies can be considered a part of the user’s profile information, and can be used to identify repeat visitors to a site. Many personalization applications require the storage of prior user profile information. For example, collaborative filtering applications generally store prior ratings of objects by users, though such information can be obtained anonymously as well.

For a detailed discussion of preprocessing issues related to Web usage mining see Cooley et al. (1999). Usage preprocessing results in a set of n pageviews, P = {p_1, p_2, ···, p_n}, and a set of m user transactions, T = {t_1, t_2, ···, t_m}, where each t_i in T is a subset of P. Pageviews are semantically

meaningful entities to which mining tasks are applied (such as pages or products). Conceptually, we view each transaction t as an l-length sequence of ordered pairs:

t = ⟨(p_1^t, w(p_1^t)), (p_2^t, w(p_2^t)), ···, (p_l^t, w(p_l^t))⟩,

where each p_i^t = p_j for some j in {1, ···, n}, and w(p_i^t) is the weight associated with pageview p_i^t in transaction t, representing its significance (usually, but not exclusively, based on time duration).

For many data mining tasks, such as clustering and association rule discovery, as well as collaborative filtering based on the kNN technique, we can represent each user transaction as a vector over the n-dimensional space of pageviews. Given the transaction t above, the transaction vector is given by:

v(t) = ⟨w_{p_1}^t, w_{p_2}^t, ···, w_{p_n}^t⟩,

where each w_{p_i}^t = w(p_j^t), for some j in {1, ···, n}, if p_j appears in the transaction t, and w_{p_i}^t = 0 otherwise. Thus, conceptually, the set of all user

transactions can be viewed as an m × n transaction-pageview matrix, denoted by TP.
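The transaction-pageview matrix TP described above can be built directly from weighted transactions. The sketch below uses hypothetical viewing durations as the pageview weights; the pageview names and transactions are made up for illustration.

```python
import numpy as np

pageviews = ["p1", "p2", "p3", "p4", "p5"]            # the set P of n pageviews
index = {p: i for i, p in enumerate(pageviews)}

# Each transaction is a sequence of (pageview, weight) pairs, with weights
# based here on viewing time in seconds (an assumption for the example).
transactions = [
    [("p1", 30), ("p2", 120), ("p4", 10)],
    [("p2", 60), ("p3", 45)],
    [("p1", 15), ("p3", 90), ("p5", 30)],
]

# m x n transaction-pageview matrix TP; an entry is 0 if the pageview
# does not appear in the transaction.
TP = np.zeros((len(transactions), len(pageviews)))
for t, trans in enumerate(transactions):
    for p, w in trans:
        TP[t, index[p]] = w
print(TP)
```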

Given a set of transactions as described above, a variety of unsupervised knowledge discovery techniques can be applied to obtain patterns. Techniques such as clustering of transactions (or sessions) can lead to the discovery of important user or visitor segments. Other techniques such as item (e.g., pageview) clustering and association or sequential pattern discovery can be used to find important relationships among items based on the navigational patterns of users in the site. In each case, the discovered patterns can be used in conjunction with the active user session to provide personalized content. This task is performed by a recommendation engine.
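As one illustration of transaction clustering, the sketch below derives aggregate usage profiles as the centroids of transaction clusters computed over a small transaction-pageview matrix. The matrix values, number of clusters, and use of k-means are illustrative choices, not a prescribed method.

```python
import numpy as np
from sklearn.cluster import KMeans

# A small transaction-pageview matrix (rows: transactions, columns: pageviews p1..p5).
pageviews = ["p1", "p2", "p3", "p4", "p5"]
TP = np.array([[30, 120, 0, 10, 0],
               [0, 60, 45, 0, 0],
               [15, 0, 90, 0, 30],
               [25, 100, 0, 5, 0]], dtype=float)

# Cluster transactions; each centroid serves as an aggregate usage profile,
# i.e., a weighted set of pageviews characterizing a visitor segment.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(TP)
for c, centroid in enumerate(km.cluster_centers_):
    profile = {p: round(w, 1) for p, w in zip(pageviews, centroid) if w > 0}
    print(f"aggregate usage profile {c}: {profile}")
```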

REQUIREMENTS FOR SEMANTIC WEB USAGE MINING

In this section, we present and discuss the essential requirements in the integration of domain knowledge with Web usage data for pattern discovery. Our focus is on the critical tasks that

particularly play an important role when the discovered patterns are to be used for Web personalization. As a concrete example, in the last part of this section we discuss an approach for integrating semantic features extracted from the content of Web sites with Web usage data, and how this integrated data can be used in conjunction with clustering to perform personalization. In the next section, we go beyond keyword-based semantics and present a more formal framework for integrating full ontologies with the Web usage mining and personalization processes.

Representation of Domain Knowledge

Representing Domain Knowledge as Content Features

One direct source of semantic knowledge that can be integrated into mining and personalization processes is the textual content of Web site pages. The semantics of a Web site are, in part, represented by the content features associated with items or objects on the Web site. These features include keywords, phrases, category names, or other textual content embedded as meta-information. Content preprocessing involves the extraction of relevant features from text and meta-data.

During the preprocessing, usually different weights are associated with features. For features extracted from meta-data, feature weights are usually provided as part of the domain knowledge specified by the analyst. Such weights may reflect the relative importance of certain concepts. For features extracted from text, weights can normally be derived automatically, for example as a function of the term frequency and inverse document frequency (tf.idf) which is commonly used in information retrieval.
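A minimal example of deriving tf.idf feature weights from page text with a standard vectorizer is shown below; the documents are placeholders standing in for preprocessed page content.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy page texts; in practice these come from content preprocessing.
docs = ["java course prerequisites and syllabus",
        "database course syllabus",
        "action movie starring a famous actor"]

vec = TfidfVectorizer()
X = vec.fit_transform(docs)          # documents x features matrix of tf.idf weights

# Show the weighted features for the first page.
weights = dict(zip(vec.get_feature_names_out(), X[0].toarray().ravel()))
print({t: round(w, 2) for t, w in weights.items() if w > 0})
```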

Further preprocessing on content features can be performed by applying text mining techniques. This would provide the ability to filter the input to, or the output from, other mining algorithms. For example, classification of content features based on a concept hierarchy can be used to limit the discovered patterns from Web usage mining to those containing pageviews about a certain subject or class of products. Similarly, applying learning algorithms such as clustering,

formal concept analysis, or association rule mining on the feature space can lead to composite features representing concept categories or hierarchies (Clerkin, Cunningham, & Hayes, 2001; Stumme et al., 2000).

The integration of content features with usage-based personalization is desirable when we are dealing with sites where text descriptions are dominant and other structural relationships in the data are not easy to obtain, e.g., news sites or online help systems. This approach, however, is incapable of capturing more complex relations among objects at a deeper semantic level based on the inherent properties associated with these objects. To be able to recommend different types of complex objects using their underlying properties and attributes, the system must be able to rely on the characterization of user segments and objects, not just based on keywords, but at a deeper semantic level using the domain ontologies for the objects. We will discuss some examples of how integrated content features and usage data can be used for personalization later in this section.

Representing Domain Knowledge as Structured Data

In Web usage mining, we are interested in the semantics underlying a Web transaction or a user profile which is usually composed of a group of pageview names and query strings (extracted from Web server logs). Such features, in isolation, do not convey the semantics associated with the underlying application. Thus, it is important to create a mapping between these features and the objects, concepts, or events they represent.

Many E-Commerce sites generate Web pages by querying operational databases or semi-structured data (e.g., XML and DTDs), from which semantic information can be easily derived. For Web sites in which such structured data cannot be easily acquired, we can adopt machine learning techniques to extract semantic information [10]. Furthermore, the domain knowledge acquired should be machine understandable in order to allow for further processing or reasoning. Therefore, the extracted knowledge should be represented in some standard knowledge representation language.

DAML+OIL (Horrocks & Sattler, 2001) is an example of an ontology language that combines

the Web standards from XML and RDF with the reasoning capabilities of a description logic. The combination of relational models and probabilistic models is another common approach to enhancing Web personalization with domain knowledge and reasoning mechanisms.

Several approaches to personalization have used relational models, such as Relational Markov Models (Anderson et al., 2002). Both of these approaches provide the ability to represent knowledge at different levels of abstraction, and the ability to reason about concepts, including such relations as subsumption and membership.

In Dai & Mobasher (2002), we adopted the syntax and semantics of another ontology representation framework, SHOQ(D), to represent domain ontologies. In SHOQ(D), the notion of concrete datatype is used to specify literal values and individuals which represent real objects in the domain ontology. Moreover, concepts can be viewed as sets of individuals, and roles are binary relations between a pair of concepts or between concepts and data types. The detailed formal definitions for concepts and roles are given in Horrocks & Sattler (2001) and Giugno & Lukasiewicz (2002). Because our current work does not focus on reasoning tasks such as deciding subsumption and membership, we do not focus our discussion on these operations. The reasoning apparatus in SHOQ(D) can be used to provide more intelligent data mining services as part of our future work.

Building “Mappings” Between Usage-Level and Domain-Level Instances

During usage data preprocessing or post-processing, we may want to assign domain semantics to user navigational paths/patterns by mapping the pageview names or URLs (or queries) to the instances in the knowledge base. To be more specific, instead of describing a user’s navigational path as “a_1, a_2, ..., a_n” (where a_i is a URL pointing to a Web resource), we need to represent it using the instances from the knowledge base, such as: “movie(name=Matrix), movie(name=Spiderman), …, movie(name=Xman).” With the help of a pre-acquired concept hierarchy, we may, for example, be able to infer that the current user’s interest is in the category of “Action&Sci-Fi.” We refer to this “semantic” form of usage data as “Semantic User Profiles.” These profiles, in turn, can be used for semantic pattern discovery and online recommendations. In the context of personalization applications, domain-level (semantic) instances may also need to be mapped back to Web resources or pages. For example, a recommendation engine using semantic user profiles may result in recommendations in the form of a movie genre. This concept must be mapped back into specific pages, URLs, or sections of the site relating to this genre before recommendations can be relayed to the user.
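The two-way mapping described above, from pageviews to domain-level instances and from recommended concepts back to pages, can be sketched with simple lookup tables. The URL-to-instance map and the genre assignments here are entirely hypothetical.

```python
from collections import Counter

# Hypothetical mapping from URLs/pageviews to domain-level instances.
url_to_instance = {
    "/movies/matrix.html":    {"concept": "movie", "name": "Matrix",    "genre": "Action&Sci-Fi"},
    "/movies/spiderman.html": {"concept": "movie", "name": "Spiderman", "genre": "Action&Sci-Fi"},
    "/movies/xman.html":      {"concept": "movie", "name": "Xman",      "genre": "Action&Sci-Fi"},
    "/movies/amelie.html":    {"concept": "movie", "name": "Amelie",    "genre": "Comedy"},
}

# Usage-level path -> semantic user profile -> dominant interest category.
path = ["/movies/matrix.html", "/movies/spiderman.html", "/movies/xman.html"]
semantic_profile = [url_to_instance[u] for u in path]
dominant_genre = Counter(i["genre"] for i in semantic_profile).most_common(1)[0][0]
print("inferred interest:", dominant_genre)

# Mapping the recommended concept (a genre) back to concrete pages not yet visited.
recommendable = [u for u, inst in url_to_instance.items()
                 if inst["genre"] == dominant_genre and u not in path]
print("recommend pages:", recommendable)
```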

Using Content and Structural Characteristics

Classification algorithms utilizing content and structural features from pages are well-suited for creating mappings from usage data to domain-level instances. For example, in Craven et al. (2000) and Ghani & Fano (2002) classifiers are trained that exploit content or structural features (such as terms, linkage information, and term proximity) of the pageviews. From the pageview names or URLs we can obtain the corresponding Web content such as meta-data or keywords. With help from text classification algorithms, it is possible to efficiently map from keywords to attribute instances (Ghani & Fano, 2002).

Another good heuristic used in creating semantic mappings is based on the anchor text associated with hyperlinks. If we can reconstruct the complete user navigational path, we can acquire the anchor text for each URL or pageview name. We can include the anchor text as part of the content features extracted from the body of documents, or use it in isolation. However, whereas the text features in a document represent the semantics of the document itself, the anchor text represents the semantics of the document to which the associated hyperlink points.

Using Query Strings

So far, we have overlooked the enormous amount of information stored in databases or semi-structured documents associated with a site. Large information servers often serve content integrated from multiple underlying servers and databases (Berendt & Spiliopoulou, 2000). The dynamically generated pages on such servers are based on queries with multiple parameters attached to the URL corresponding to the underlying scripts or applications. Using the query strings recorded in the server log files, it is possible to reconstruct the response pages. For example, the following are query strings from a hypothetical online book-seller Web site:
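The sketch below shows made-up query strings of the kind described above, together with a simple parse of their name-value parameters; recovering these parameters is what makes it possible to tie a logged request back to the underlying object and action (and hence to a domain-level instance).

```python
from urllib.parse import urlparse, parse_qs

# Illustrative (made-up) query strings for a hypothetical online book-seller.
requests = [
    "/books/search.cgi?action=search&subject=data+mining&format=paperback",
    "/books/detail.cgi?action=view&isbn=0123456789&ref=search",
    "/cart/add.cgi?action=addtocart&isbn=0123456789&qty=1",
]

# Parsing the parameters reveals what the dynamically generated response page
# was about: the object (a book), the action (search, view, add to cart), and
# its attributes (subject, format, quantity).
for r in requests:
    url = urlparse(r)
    params = {k: v[0] for k, v in parse_qs(url.query).items()}
    print(url.path, params)
```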
