Data Provenance: Some Basic Issues
- Format: PDF
- Size: 121.18 KB
- Pages: 7
WeChat, a popular Chinese messaging and social media app, has become an integral part of daily life for many people around the world. However, with its widespread use, there are also potential dangers associated with it. Here are some points to consider when discussing the potential risks of using WeChat:

1. Privacy Concerns: WeChat collects a significant amount of user data, which can be a concern for users. Users should be aware of what information they are sharing and with whom.
2. Security Vulnerabilities: Like any digital platform, WeChat is not immune to security vulnerabilities. Users need to be vigilant about protecting their accounts with strong passwords and enabling two-factor authentication.
3. Misinformation Spread: The platform can be used to spread false information or rumors. Users should verify the credibility of the information they receive and share.
4. Cyberbullying: As with any social media platform, WeChat can be a place where cyberbullying occurs. It's important for users to report any abusive behavior and to educate themselves on how to deal with such situations.
5. Addiction: The constant connectivity that WeChat offers can lead to overuse and addiction, affecting users' mental health and social interactions in the real world.
6. Financial Scams: WeChat has payment features that can be exploited by scammers. Users should be cautious when making transactions and only deal with trusted contacts.
7. Political Censorship: WeChat is known to comply with Chinese government regulations, which can lead to censorship of certain topics or content. This can limit freedom of speech and access to information.
8. Data Localization Laws: In some countries, there are concerns about data localization laws that require companies like Tencent (WeChat's parent company) to store user data within the country's borders, potentially making it more accessible to local authorities.
9. Influence on Youth: The influence of social media, including WeChat, on young people's behavior and values can be significant. Parents and educators should be aware of the content young people are exposed to and engage in conversations about responsible use.
10. Business Risks: For businesses using WeChat for marketing or customer service, there's a risk of negative publicity if the platform is used improperly or if there are issues with the app's functionality.

To mitigate these risks, it's essential for users to stay informed about best practices for online safety, to be critical of the information they encounter, and to maintain a balance between online and offline life. Additionally, understanding the legal and regulatory environment in which WeChat operates can help users navigate potential challenges related to data privacy and security.
Cloud computing foreign-language reference (the original document contains both the English original and a Chinese translation). Original text:

Technical Issues of Forensic Investigations in Cloud Computing Environments
Dominik Birk
Ruhr-University Bochum, Horst Goertz Institute for IT Security, Bochum, Germany

Abstract—Cloud Computing is arguably one of the most discussed information technologies today. It presents many promising technological and economic opportunities. However, many customers remain reluctant to move their business IT infrastructure completely to the cloud. One of their main concerns is Cloud Security and the threat of the unknown. Cloud Service Providers (CSP) encourage this perception by not letting their customers see what is behind their virtual curtain. A seldom discussed, but in this regard highly relevant, open issue is the ability to perform digital investigations. This continues to fuel insecurity on the sides of both providers and customers. Cloud Forensics constitutes a new and disruptive challenge for investigators. Due to the decentralized nature of data processing in the cloud, traditional approaches to evidence collection and recovery are no longer practical. This paper focuses on the technical aspects of digital forensics in distributed cloud environments. We contribute by assessing whether it is possible for the customer of cloud computing services to perform a traditional digital investigation from a technical point of view. Furthermore, we discuss possible solutions and possible new methodologies helping customers to perform such investigations.

I. INTRODUCTION

Although the cloud might appear attractive to small as well as to large companies, it does not come along without its own unique problems. Outsourcing sensitive corporate data into the cloud raises concerns regarding the privacy and security of data. Security policies, companies' main pillar concerning security, cannot be easily deployed into distributed, virtualized cloud environments. This situation is further complicated by the unknown physical location of the company's assets. Normally, if a security incident occurs, the corporate security team wants to be able to perform their own investigation without dependency on third parties. In the cloud, this is not possible anymore: the CSP obtains all the power over the environment and thus controls the sources of evidence. In the best case, a trusted third party acts as a trustee and guarantees for the trustworthiness of the CSP. Furthermore, the implementation of the technical architecture and circumstances within cloud computing environments bias the way an investigation may be processed. In detail, evidence data has to be interpreted by an investigator in a proper manner, which is hardly possible due to the lack of circumstantial information. (The authors thank the reviewers for their helpful comments and Dennis Heinson, Center for Advanced Security Research Darmstadt - CASED, for the profound discussions regarding the legal aspects of cloud forensics.) For auditors, this situation does not change: questions about who accessed specific data and information cannot be answered by the customers if no corresponding logs are available. With the increasing demand for using the power of the cloud for processing also sensitive information and data, enterprises face the issue of Data and Process Provenance in the cloud [10]. Digital provenance, meaning meta-data that describes the ancestry or history of a digital object, is a crucial feature for forensic investigations.
In combination with a suitable authentication scheme, it provides information about who created and who modified what kind of data in the cloud. These are crucial aspects for digital investigations in distributed environments such as the cloud. Unfortunately, the aspects of forensic investigations in distributed environments have so far been mostly neglected by the research community. Current discussion centers mostly around security, privacy and data protection issues [35], [9], [12]. The impact of forensic investigations on cloud environments was little noticed, albeit mentioned by the authors of [1] in 2009: "[...] to our knowledge, no research has been published on how cloud computing environments affect digital artifacts, and on acquisition logistics and legal issues related to cloud computing environments." This statement is also confirmed by other authors [34], [36], [40], stressing that further research on incident handling, evidence tracking and accountability in cloud environments has to be done. At the same time, massive investments are being made in cloud technology. Combined with the fact that information technology increasingly transcends people's private and professional lives, thus mirroring more and more of people's actions, it becomes apparent that evidence gathered from cloud environments will be of high significance to litigation or criminal proceedings in the future. Within this work, we focus on the notion of cloud forensics by addressing the technical issues of forensics in all three major cloud service models and consider cross-disciplinary aspects. Moreover, we address the usability of various sources of evidence for investigative purposes and propose potential solutions to the issues from a practical standpoint. This work should be considered as a surveying discussion of an almost unexplored research area. The paper is organized as follows: we discuss the related work and the fundamental technical background information of digital forensics, cloud computing and the fault model in sections II and III. In section IV, we focus on the technical issues of cloud forensics and discuss the potential sources and nature of digital evidence as well as investigations in XaaS environments, including the cross-disciplinary aspects. We conclude in section V.

II. RELATED WORK

Various works have been published in the field of cloud security and privacy [9], [35], [30], focusing on aspects of protecting data in multi-tenant, virtualized environments. Desired security characteristics for current cloud infrastructures mainly revolve around isolation of multi-tenant platforms [12], security of hypervisors in order to protect virtualized guest systems, and secure network infrastructures [32]. Albeit digital provenance, describing the ancestry of digital objects, still remains a challenging issue for cloud environments, several works have already been published in this field [8], [10], contributing to the issues of cloud forensics. Within this context, cryptographic proofs for verifying data integrity, mainly in cloud storage offers, have been proposed, yet they lack practical implementations [24], [37], [23]. Traditional computer forensics already has well-researched methods for various fields of application [4], [5], [6], [11], [13]. The aspects of forensics in virtual systems have also been addressed by several works [2], [3], [20], including the notion of virtual introspection [25].
In addition, the NIST has already addressed Web Service Forensics [22], which has a huge impact on investigation processes in cloud computing environments. In contrast, the aspects of forensic investigations in cloud environments have mostly been neglected by both the industry and the research community. One of the first papers focusing on this topic was published by Wolthusen [40], after Bebee et al. had already introduced problems within cloud environments [1]. Wolthusen stressed that there is an inherent strong need for interdisciplinary work linking the requirements and concepts of evidence arising from the legal field to what can be feasibly reconstructed and inferred algorithmically or in an exploratory manner. In 2010, Grobauer et al. [36] published a paper discussing the issues of incident response in cloud environments; unfortunately, no specific issues and solutions of cloud forensics were proposed, which will be done within this work.

III. TECHNICAL BACKGROUND

A. Traditional Digital Forensics

The notion of Digital Forensics is widely known as the practice of identifying, extracting and considering evidence from digital media. Unfortunately, digital evidence is both fragile and volatile and therefore requires the attention of special personnel and methods in order to ensure that evidence data can be properly isolated and evaluated. Normally, the process of a digital investigation can be separated into three different steps, each having its own specific purpose:

1) In the Securing Phase, the major intention is the preservation of evidence for analysis. The data has to be collected in a manner that maximizes its integrity. This is normally done by a bitwise copy of the original media. As can be imagined, this represents a huge problem in the field of cloud computing, where you never know exactly where your data is and additionally do not have access to any physical hardware. However, the snapshot technology, discussed in section IV-B3, provides a powerful tool to freeze system states and thus makes digital investigations, at least in IaaS scenarios, theoretically possible.

2) We refer to the Analyzing Phase as the stage in which the data is sifted and combined. It is in this phase that the data from multiple systems or sources is pulled together to create as complete a picture and event reconstruction as possible. Especially in distributed system infrastructures, this means that bits and pieces of data are pulled together for deciphering the real story of what happened and for providing a deeper look into the data.

3) Finally, at the end of the examination and analysis of the data, the results of the previous phases will be reprocessed in the Presentation Phase. The report, created in this phase, is a compilation of all the documentation and evidence from the analysis stage. The main intention of such a report is that it contains all results and is complete and clear to understand.

Apparently, the success of these three steps strongly depends on the first stage. If it is not possible to secure the complete set of evidence data, no exhaustive analysis will be possible. However, in real-world scenarios often only a subset of the evidence data can be secured by the investigator. In addition, an important definition in the general context of forensics is the notion of a Chain of Custody. This chain clarifies how and where evidence is stored and who takes possession of it. Especially for cases which are brought to court, it is crucial that the chain of custody is preserved.

B. Cloud Computing
According to the NIST [16], cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications and services) that can be rapidly provisioned and released with minimal CSP interaction. This raw definition of cloud computing brought several new characteristics such as multi-tenancy, elasticity, pay-as-you-go and reliability. Within this work, the following three models are used:

In the Infrastructure as a Service (IaaS) model, the customer uses the virtual machine provided by the CSP to install his own system on it. The system can be used like any other physical computer, with a few limitations. However, the additional customer power over the system comes along with additional security obligations. Platform as a Service (PaaS) offerings provide the capability to deploy application packages created using the virtual development environment supported by the CSP. This service model can accelerate the software development process. In the Software as a Service (SaaS) model, the customer makes use of a service run by the CSP on a cloud infrastructure. In most cases this service can be accessed through an API or a thin client interface such as a web browser. Closed-source public SaaS offers such as Amazon S3 and GoogleMail can only be used in the public deployment model, leading to further issues concerning security, privacy and the gathering of suitable evidence.

Furthermore, two main deployment models, private and public cloud, have to be distinguished. Common public clouds are made available to the general public. The corresponding infrastructure is owned by one organization acting as a CSP and offering services to its customers. In contrast, the private cloud is exclusively operated for an organization but may not provide the scalability and agility of public offers. The additional notions of community and hybrid cloud are not exclusively covered within this work. However, independently of the specific model used, the movement of applications and data to the cloud comes along with limited control for the customer over the application itself, the data pushed into the applications and also the underlying technical infrastructure.

C. Fault Model

Be it an account for a SaaS application, a development environment (PaaS) or a virtual image of an IaaS environment, systems in the cloud can be affected by inconsistencies. Hence, for both customer and CSP it is crucial to have the ability to assign faults to the causing party, even in the presence of Byzantine behavior [33]. Generally, inconsistencies can be caused by the following two reasons:

1) Maliciously Intended Faults: Internal or external adversaries with specific malicious intentions can cause faults on cloud instances or applications. Economic rivals as well as former employees can be the reason for these faults and constitute a constant threat to customers and CSP. This model also includes a malicious CSP, albeit one assumed to be rare in real-world scenarios. Additionally, from the technical point of view, the movement of computing power to a virtualized, multi-tenant environment can pose further threats and risks to the systems. One reason for this is that if a single system or service in the cloud is compromised, all other guest systems and even the host system are at risk.
Hence, besides the need for further security measures, precautions for potential forensic investigations have to be taken into consideration.

2) Unintentional Faults: Inconsistencies in technical systems or processes in the cloud are not necessarily caused by malicious intent. Internal communication errors or human failures can lead to issues in the services offered to the customer (i.e., loss or modification of data). Although these failures are not caused intentionally, both the CSP and the customer have a strong interest in discovering the reasons and deploying corresponding fixes.

IV. TECHNICAL ISSUES

Digital investigations are about control of forensic evidence data. From the technical standpoint, this data can be available in three different states: at rest, in motion or in execution. Data at rest is represented by allocated disk space. Whether the data is stored in a database or in a specific file format, it allocates disk space. Furthermore, if a file is deleted, the disk space is de-allocated for the operating system, but the data is still accessible, since the disk space has not been re-allocated and overwritten. This fact is often exploited by investigators, who explore this de-allocated disk space on hard disks. In case the data is in motion, data is transferred from one entity to another; e.g., a typical file transfer over a network can be seen as a data-in-motion scenario. Several encapsulated protocols contain the data, each leaving specific traces on systems and network devices, which can in turn be used by investigators. Data can also be loaded into memory and executed as a process. In this case, the data is neither at rest nor in motion but in execution. On the executing system, process information, machine instructions and allocated/de-allocated data can be analyzed by creating a snapshot of the current system state. In the following sections, we point out the potential sources of evidential data in cloud environments, discuss the technical issues of digital investigations in XaaS environments and suggest several solutions to these problems.

A. Sources and Nature of Evidence

Concerning the technical aspects of forensic investigations, the amount of potential evidence available to the investigator strongly diverges between the different cloud service and deployment models. The virtual machine (VM), hosting in most cases the server application, provides several pieces of information that could be used by investigators. On the network level, network components can provide information about possible communication channels between the different parties involved. The browser on the client, often acting as the user agent for communicating with the cloud, also contains a lot of information that could be used as evidence in a forensic investigation. Independently of the model used, the following three components could act as sources of potential evidential data.

1) Virtual Cloud Instance: The VM within the cloud, where, e.g., data is stored or processes are handled, contains potential evidence [2], [3]. In most cases, it is the place where an incident happened and hence provides a good starting point for a forensic investigation. The VM instance can be accessed by both the CSP and the customer who is running the instance. Furthermore, virtual introspection techniques [25] provide access to the runtime state of the VM via the hypervisor, and snapshot technology supplies a powerful technique for the customer to freeze specific states of the VM.
Therefore, virtual instances can still be running during analysis, which leads to the case of live investigations [41], or can be turned off, leading to static image analysis. In SaaS and PaaS scenarios, the ability to access the virtual instance for gathering evidential information is highly limited or simply not possible.

2) Network Layer: Traditional network forensics is known as the analysis of network traffic logs for tracing events that have occurred in the past. Since the different ISO/OSI network layers provide several pieces of information on protocols and communication between instances within as well as with instances outside the cloud [4], [5], [6], network forensics is theoretically also feasible in cloud environments. In practice, however, ordinary CSP currently do not provide any log data from the network components used by the customer's instances or applications. For instance, in case of a malware infection of an IaaS VM, it will be difficult for the investigator to get any form of routing information and network log data in general, which is crucial for further investigative steps. This situation gets even more complicated in case of PaaS or SaaS. So again, the situation of gathering forensic evidence is strongly affected by the support the investigator receives from the customer and the CSP.

3) Client System: On the system layer of the client, it completely depends on the model used (IaaS, PaaS, SaaS) if and where potential evidence could be extracted. In most scenarios, the user agent (e.g. the web browser) on the client system is the only application that communicates with the service in the cloud. This especially holds for SaaS applications, which are used and controlled by the web browser. But also in IaaS scenarios, the administration interface is often controlled via the browser. Hence, in an exhaustive forensic investigation, the evidence data gathered from the browser environment [7] should not be omitted.

a) Browser Forensics: Generally, the circumstances leading to an investigation have to be differentiated: in ordinary scenarios, the main goal of an investigation of the web browser is to determine if a user has been the victim of a crime. In complex SaaS scenarios with high client-server interaction, this constitutes a difficult task. Additionally, customers make strong use of third-party extensions [17], which can be abused for malicious purposes. Hence, the investigator might want to look for malicious extensions, searches performed, websites visited, files downloaded, information entered in forms or stored in local HTML5 stores, web-based email contents and persistent browser cookies for gathering potential evidence data. Within this context, it is inevitable to investigate the appearance of malicious JavaScript [18] leading to, e.g., unintended AJAX requests and hence modified usage of administration interfaces. Generally, the web browser contains a lot of electronic evidence data that could be used to give an answer to both of the above questions - even if the private mode is switched on [19].

B. Investigations in XaaS Environments

Traditional digital forensic methodologies permit investigators to seize equipment and perform detailed analysis on the media and data recovered [11]. In a distributed infrastructure organization like the cloud computing environment, investigators are confronted with an entirely different situation. They no longer have the option of seizing physical data storage.
Data and processes of the customer are dispersed over an undisclosed number of virtual instances, applications and network elements. Hence, it is in question whether the preliminary findings of the computer forensic community in the field of digital forensics have to be revised and adapted to the new environment. Within this section, specific issues of investigations in SaaS, PaaS and IaaS environments will be discussed. In addition, cross-disciplinary issues, which affect several environments uniformly, will be taken into consideration. We also suggest potential solutions to the mentioned problems.

1) SaaS Environments: Especially in the SaaS model, the customer does not obtain any control of the underlying operating infrastructure such as the network, servers, operating systems or the application that is used. This means that no deeper view into the system and its underlying infrastructure is provided to the customer. Only limited user-specific application configuration settings can be controlled, contributing to the evidence which can be extracted from the client (see section IV-A3). In a lot of cases this forces the investigator to rely on high-level logs which are eventually provided by the CSP. Given the case that the CSP does not run any logging application, the customer has no opportunity to create any useful evidence through the installation of any toolkit or logging tool. These circumstances do not allow a valid forensic investigation and lead to the assumption that customers of SaaS offers do not have any chance to analyze potential incidents.

a) Data Provenance: The notion of Digital Provenance is known as meta-data that describes the ancestry or history of digital objects. Secure provenance that records ownership and process history of data objects is vital to the success of data forensics in cloud environments, yet it is still a challenging issue today [8]. Albeit data provenance is of high significance also for IaaS and PaaS, it poses a huge problem specifically for SaaS-based applications: currently, globally acting public SaaS CSP offer Single Sign-On (SSO) access control to the set of their services. Unfortunately, in case of an account compromise, most of the CSP do not offer any possibility for the customer to figure out which data and information has been accessed by the adversary. For the victim, this situation can have tremendous impact: if sensitive data has been compromised, it is unclear which data has been leaked and which has not been accessed by the adversary. Additionally, data could be modified or deleted by an external adversary or even by the CSP, e.g. due to storage reasons. The customer has no ability to prove otherwise. Secure provenance mechanisms for distributed environments can improve this situation but have not been practically implemented by CSP [10].

Suggested Solution: In private SaaS scenarios this situation is improved by the fact that the customer and the CSP are probably under the same authority. Hence, logging and provenance mechanisms could be implemented which contribute to potential investigations. Additionally, the exact location of the servers and the data is known at any time. Public SaaS CSP should offer additional interfaces for the purpose of compliance, forensics, operations and security matters to their customers. Through an API, the customers should have the ability to receive specific information such as access, error and event logs that could improve their situation in case of an investigation.
Furthermore, due to the limited ability of receiving forensic information from the server and proving the integrity of stored data in SaaS scenarios, the client has to contribute to this process. This could be achieved by implementing Proofs of Retrievability (POR), in which a verifier (client) is enabled to determine that a prover (server) possesses a file or data object and that it can be retrieved unmodified [24]. Provable Data Possession (PDP) techniques [37] could be used to verify that an untrusted server possesses the original data without the need for the client to retrieve it. Although these cryptographic proofs have not been implemented by any CSP, the authors of [23] introduced a new data integrity verification mechanism for SaaS scenarios which could also be used for forensic purposes.

2) PaaS Environments: One of the main advantages of the PaaS model is that the developed software application is under the control of the customer and, except for some CSP, the source code of the application does not have to leave the local development environment. Given these circumstances, the customer theoretically obtains the power to dictate how the application interacts with other dependencies such as databases, storage entities etc. CSP normally claim this transfer is encrypted, but this statement can hardly be verified by the customer. Since the customer has the ability to interact with the platform over a prepared API, system states and specific application logs can be extracted. However, potential adversaries who can compromise the application during runtime should not be able to alter these log files afterwards.

Suggested Solution: Depending on the runtime environment, logging mechanisms could be implemented which automatically sign and encrypt the log information before its transfer to a central logging server under the control of the customer (a sketch of such a pipeline follows at the end of this section). Additional signing and encrypting could prevent potential eavesdroppers from being able to view and alter log data information on the way to the logging server. Runtime compromise of a PaaS application by adversaries could be monitored by push-only mechanisms for log data, presupposing that the information needed to detect such an attack is logged. Increasingly, CSP offering PaaS solutions give developers the ability to collect and store a variety of diagnostics data in a highly configurable way with the help of runtime feature sets [38].

3) IaaS Environments: As expected, even virtual instances in the cloud get compromised by adversaries. Hence, the ability to determine how defenses in the virtual environment failed and to what extent the affected systems have been compromised is crucial, not only for recovering from an incident. Forensic investigations also gain leverage from such information and contribute to resilience against future attacks on the systems. From the forensic point of view, IaaS instances provide much more evidence data usable for potential forensics than PaaS and SaaS models do. This is due to the ability of the customer to install and set up the image for forensic purposes before an incident occurs. Hence, as proposed for PaaS environments, log data and other forensic evidence information could be signed and encrypted before it is transferred to third-party hosts, mitigating the chance that a maliciously motivated shutdown process destroys the volatile data. Although IaaS environments provide plenty of potential evidence, it has to be emphasized that the customer VM is in the end still under the control of the CSP.
The CSP controls the hypervisor, which is, e.g., responsible for enforcing hardware boundaries and routing hardware requests among different VM. Hence, besides the security responsibilities of the hypervisor, the CSP exerts tremendous control over how customers' VM communicate with the hardware and theoretically can intervene in executed processes on the hosted virtual instance through virtual introspection [25]. This could also affect encryption or signing processes executed on the VM, thereby leading to the leakage of the secret key. Although this risk can be disregarded in most cases, the impact on the security of high-security environments is tremendous.

a) Snapshot Analysis: Traditional forensics expects target machines to be powered down to collect an image (dead virtual instance). This situation completely changed with the advent of snapshot technology, which is supported by all popular hypervisors such as Xen, VMware ESX and Hyper-V. A snapshot, also referred to as the forensic image of a VM, provides a powerful tool with which a virtual instance can be cloned with one click, including the running system's memory. Due to the invention of snapshot technology, systems hosting crucial business processes do not have to be powered down for forensic investigation purposes. The investigator simply creates and loads a snapshot of the target VM for analysis (live virtual instance). This behavior is especially important for scenarios in which a downtime of a system is not feasible or practical due to existing SLA. However, the information whether the machine is running or has been properly powered down is crucial [3] for the investigation. Live investigations of running virtual instances become more common, providing evidence data that
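To make the "sign and encrypt before transfer" suggestion from the PaaS and IaaS sections concrete, the following is a minimal sketch, not the paper's own implementation. It assumes an HMAC key and a symmetric Fernet key provisioned out of band between the instance and a customer-controlled log server; the record fields and the example message are invented for illustration.

```python
# Minimal sketch: sign, then encrypt, each log record before it leaves a
# (potentially compromised) PaaS/IaaS instance. Keys are assumed to be
# provisioned out of band; here they are created inline only for the demo.
import hmac
import hashlib
import json
import time

from cryptography.fernet import Fernet  # pip install cryptography

HMAC_KEY = b"shared-secret-provisioned-out-of-band"  # hypothetical placeholder
FERNET_KEY = Fernet.generate_key()                   # in practice: pre-provisioned


def seal_log_record(message: str) -> bytes:
    """Serialize, sign, then encrypt a single log record on the instance."""
    record = {"ts": time.time(), "msg": message}
    payload = json.dumps(record, sort_keys=True).encode()
    # Sign first so the verifier can detect tampering after decryption.
    tag = hmac.new(HMAC_KEY, payload, hashlib.sha256).hexdigest()
    envelope = json.dumps({"payload": payload.decode(), "hmac": tag}).encode()
    return Fernet(FERNET_KEY).encrypt(envelope)


def verify_log_record(token: bytes) -> dict:
    """Decrypt and verify a record on the customer-controlled log server."""
    envelope = json.loads(Fernet(FERNET_KEY).decrypt(token))
    payload = envelope["payload"].encode()
    expected = hmac.new(HMAC_KEY, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, envelope["hmac"]):
        raise ValueError("log record failed integrity check")
    return json.loads(payload)


sealed = seal_log_record("admin login from 203.0.113.7")
print(verify_log_record(sealed))
```

Signing before encrypting lets the receiving log server detect tampering after decryption; a production pipeline would additionally need key rotation, a push-only transport, and an append-only store, as the paper's threat model implies.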
Methods for Handling Missing Values in Empirical Studies in Economics and Management

Missing values are a common issue in empirical studies in economics and management. These missing values can occur for a variety of reasons, such as data collection errors, non-response from survey participants, or incomplete information. Dealing with missing values is crucial for maintaining the quality and reliability of empirical findings. In this article, we will discuss some common methods for handling missing values in empirical studies in economics and management.

1. Complete Case Analysis
One common approach to handling missing values is to simply exclude cases with missing values from the analysis. This method is known as complete case analysis. While this method is simple and straightforward, it can lead to biased results if the missing values are not missing completely at random. In other words, if the missing values are related to the outcome of interest, excluding cases with missing values can lead to biased estimates.

2. Imputation Techniques
Imputation techniques are another common method for handling missing values. Imputation involves replacing missing values with estimated values based on the observed data. There are several methods for imputing missing values, including mean imputation, median imputation, and regression imputation. Mean imputation involves replacing missing values with the mean of the observed values for that variable. Median imputation involves replacing missing values with the median of the observed values. Regression imputation involves using a regression model to predict missing values based on other variables in the dataset.

3. Multiple Imputation
Multiple imputation is a more sophisticated imputation technique that involves generating multiple plausible values for each missing value and treating each set of imputed values as a complete dataset. This allows for uncertainty in the imputed values to be properly accounted for in the analysis. Multiple imputation has been shown to produce less biased estimates compared to single imputation methods.

4. Maximum Likelihood Estimation
Maximum likelihood estimation is another method for handling missing values that involves estimating the parameters of a statistical model by maximizing the likelihood function of the observed data. Missing values are treated as parameters to be estimated along with the other parameters of the model. Maximum likelihood estimation has been shown to produce unbiased estimates under certain assumptions about the missing data mechanism.

5. Sensitivity Analysis
Sensitivity analysis is a useful technique for assessing the robustness of empirical findings to different methods of handling missing values. This involves conducting the analysis using different methods for handling missing values and comparing the results. If the results are consistent across different methods, this provides more confidence in the validity of the findings.

In conclusion, there are several methods available for handling missing values in empirical studies in economics and management. Each method has its advantages and limitations, and the choice of method should be guided by the nature of the data and the research question. It is important to carefully consider the implications of missing values and choose the most appropriate method for handling them to ensure the validity and reliability of empirical findings. A brief code sketch of the simpler methods follows.
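The sketch below illustrates complete case analysis, mean/median imputation, and a simple regression imputation using pandas and scikit-learn. The DataFrame, column names (firm_size, leverage) and values are invented for the example; it is illustrative, not a recommendation for any particular dataset.

```python
# Toy illustration of the single-imputation methods described above.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.DataFrame({
    "firm_size": [120.0, np.nan, 85.0, 240.0, np.nan],
    "leverage":  [0.35, 0.42, np.nan, 0.51, 0.29],
})

# 1. Complete case analysis: drop every row that has any missing value.
complete_cases = df.dropna()

# 2. Mean / median imputation: fill each column with its own statistic.
mean_imputed = df.fillna(df.mean())
median_imputed = df.fillna(df.median())

# 3. Regression imputation (sketch): predict a column's missing values
#    from another column using a model fitted on the complete rows.
known = df.dropna()
model = LinearRegression().fit(known[["leverage"]], known["firm_size"])
mask = df["firm_size"].isna() & df["leverage"].notna()
df.loc[mask, "firm_size"] = model.predict(df.loc[mask, ["leverage"]])

print(df)
```

For multiple imputation, implementations such as scikit-learn's (still experimental) IterativeImputer or the R package mice follow the repeated-imputation logic described above.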
The Archival Principle of Provenance and Its Application to Image Representation Systems

Variously described as a "powerful guiding principle" (Dearstyne, 1993), and "the only principle" of archival theory (Horsman, 1994), the Principle of Provenance distinguishes the archival profession from other information professions in its focus on a document's context, use and meaning. This Principle, generally concerned with the origin of records, has three distinct meanings (Bellardo & Bellardo, 1992). First, and generally, it refers to the "office of origin" of records, or that office, administrative entity, person, family, firm, from which records, personal papers or manuscripts originate. Second, it refers to collecting information on successive transfers of ownership or custody of a particular paper or manuscript; and third, it refers to the idea that an archival collection of a given records creator must not be intermingled with those of other records creators. In this sense, the principle is often referred to by the French expression respect des fonds. A corollary principle, solemnly entitled "Principle of the Sanctity of Original Order," states that records should be kept in the order in which they were originally arranged.

The Principle of Provenance was independently developed by early modern French and Prussian archives managers in the nineteenth century, and had its origins in necessity, both theoretical and practical. Prior to the development of the Principle, archives were arranged and described according to the "principle of pertinence," where archives were arranged in terms of their subject content regardless of provenance and original order (Gränström, 1994). With the development of state-run archives in France and Prussia, the sheer volume of incoming records made working by this ethic impractical. Furthermore, historians of this era were, as they still are, concerned with the objectivity of their original source material. They wanted to be able to establish what really took place, and to do that, they felt that the written sources should be maintained in their original order, and not rearranged. So the Principle met both standards – it was much easier and faster to process collections if there was no need to assign subject headings to each document or fond; and it met the objectivity standards put forth by historians. Related to the historical standards, the Principle of Provenance also held with medieval diplomatic procedures, which were concerned with defining and evaluating records based on their authenticity and evidential, primarily legal, value.

However powerful, objective, and practical the Principle of Provenance might be, there is still a major complexity that bears some examination, namely the organic nature of archives. Peter Horsman has written two articles related to this problem. His essential argument is that an archival source (be it an administration, a person or a family) is a living organism, and its fonds grow and change with it, and there is rarely a time where one absolute, unchanging physical order for its documents exists. Rather, the fonds "are a complicated result of the activities of the creator, political decisions, organizational behavior, record-keeping methods and many other unexpected events" (Horsman, 1994).
The traditional inventory or finding aid is simply a snapshot of the records at one distinct moment in time, typically at the end of their useful life, and acts only as evidence that this certain set of inter-related documents was physically gathered together at some defined instant (Horsman, 1999). The real power of an archive, as yet underutilized, is the notion of providing context. Context is a more complicated concept than "original order," however, and in this case is concerned primarily with describing a continuum of relationships and inter-relationships over time and place. Preserving the physical original order of a fonds, which Horsman defines as the internal application of the Principle of Provenance, is merely a logistical artifact; valuable because it is, at least, "an original administrative artifact," not defined from outside. To comprehend context, Horsman argues that the archivist not only has to describe and define the structure of the fonds in its series and sub-series, but also to define and describe the relationships between the agency's characteristics or functions, and the records it has created throughout the range of its existence.

Unlikely though it may be, this idea of providing meaningful contextual information is also a problem being considered by art historians, in a quest to describe works of art from different cultures in significant and equivalent language. The most recent work is being done by David Summers, in his new tome, Real Spaces: World Art History and the Rise of Western Modernism (Summers, 2003). Although the two fields, archival science and the history of art, might at first glance seem to have little in common, on the first page of the introduction Summers states, "However the discipline of the history of art may have changed over the last few decades of theoretical and critical examination, it has continued to be an archival field, concerned with setting its objects in spatial and temporal order, and with relating them to appropriate documents and archaeological evidence." In trying to develop a new descriptive language for works of art, Summers focuses on the "organic nature" of the work – concentrating on the overarching theoretical construct of "facture," which embodies the idea that the object itself carries some record of its having been made. The value of this physical and format-based characteristic is primary and unassailable.[1] There is an obvious parallel here with the "organic character of records," discussed by Schellenberg (1961):

"Records that are the product of organic activity have a value that derives from the way they were produced. Since they were created in consequence of the actions to which they relate, they often contain an unconscious and therefore impartial record of the action. Thus the evidence they contain of the actions they record has a peculiar value. It is the quality of this evidence that is our concern here. Records, however, also have a value for the evidence they contain of the actions that resulted in their production.
It is the content of the evidence that is our concern here."

What Summers calls "facture," and Schellenberg calls "evidential value," are related, and I think not explicitly spelled out due to the varying nature of their tasks: Summers is presenting a highly theoretical descriptive language for works of art, and Schellenberg, while concerned with theoretical underpinnings, is primarily interested in providing a real framework within which real, physical organizations (namely archives) can arrange and describe their collections.

How does this relate to image content management systems? While Summers' framework, such as it is,[2] could be expanded to include descriptive languages for "anything that is made," it was developed first and foremost for cultural, artistic artifacts. He argues that access to and understanding of artifacts will improve if we could provide more complete information on a given artifact's facture (Winget, 2003) and provenance. Significantly, Summers is using the term "provenance" in an archival sense – he is concerned with documenting the name of the creator as well as the organization or entity for which the artifact was created, that creator or entity's functions, relationships, and predecessors; and the artifact's successive spaces and uses throughout the range of its life. The fact that a Renaissance triptych, for example, started out as a functional devotional device, lost that functionality, and was collected by a host of individuals for its monetary or artifactual value – let's say the last individual to collect the triptych was a German Jew, whose collection was perhaps stolen by the Nazis, and now it resides in an American museum collection – is all noteworthy and interesting information, and, Summers argues rather forcefully, significantly more valuable than simply providing subject access to that image.

[1] I think it's relevant here to point out that for Summers, a "work of art" is not limited to traditionally considered art objects. His definition is wider and more inclusive, and consists of "anything that is made."
[2] Real Spaces is a nine-hundred-page book. It doesn't put forth a "framework," so much as a dense theoretical construct.

Right now, image database managers, after worrying about quality and sustainability issues, seem to be primarily concerned with providing thematic or subject-oriented access to their collections. They are working with the "principle of pertinence," as it were, and they're running into the same problems that early-modern archivists had. It takes a very long time to provide robust subject access; it's not objective, and in worst cases, can hinder retrieval. If they could twist the Principle of Provenance to relate primarily to providing access through description, rather than focusing on its use in arrangement,[3] meaningful use of these image collections might rise, and retrieval problems might decline. The people in charge of image content management systems have a unique opportunity to develop a new system based principally on the user – providing facture and provenantial information without the difficulty of keeping a strict hierarchical structure that archives face.
What's more, for artifacts collected by museums at least, most of this information is already available: when acquiring a new work, curators research the artifact's provenance to ensure that it is authentic and not stolen; conservators keep deliberate records about the format, materials and processes inherent in an artifact, and they furthermore tend to document any changes that happen to the work over time. There are a multitude of administrative attributes that are noted within the course of owning and maintaining culturally significant artifacts. The only problem is that these artifacts aren't typically considered "important," and they're usually in paper form. If they are available digitally, access points are typically not provided (you can't search on these terms).

Summers' new framework now gives us the theoretical tools to recognize these attributes' importance, and the archival profession gives us a practical framework within which to work. Metadata initiatives like the Dublin Core and METS provide specific requirements for collecting information and describing these objects; the CIDOC-CRM provides an ontology that could be used to add semantic meaning (and hence understanding) between disparate attributes within these schemas; and OAIS provides frameworks within which information can be shared across space and disciplines. The pieces are all there (a toy record illustrating this follows the references below). Provenance has proved to be a powerful and uniquely user-centered concept for the archival profession. With the advent of ubiquitous digital technology, which tends to help transfer ideas across traditional professional boundaries, it's time to expand and translate that notion to other fields for other uses.

[3] I say that arrangement is not so important for image database structure because image databases generally don't rely on hierarchies to the same extent that traditional archives do.

References

Bellardo, L. J., & Bellardo, L. L. (1992). A glossary for archivists, manuscript curators, and records managers. Chicago: Society of American Archivists.
Dearstyne, B. W. (1993). The archival enterprise: Modern archival principles, practices, and management techniques. Chicago, IL: American Library Association.
Gränström, C. (1994). The Janus syndrome. The Principle of Provenance. Stockholm: Swedish National Archives.
Horsman, P. (1994). Taming the elephant: An orthodox approach to the Principle of Provenance. The Principle of Provenance. Stockholm: Swedish National Archives.
Horsman, P. (1999). Dirty hands: A new perspective on the original order. Archives and Manuscripts, 27(1), 42-53.
Schellenberg, T. R. (1961). Archival principles of arrangement. American Archivist, 24, 11-24.
Summers, D. (2003). Real spaces: World art history and the rise of Western modernism. New York: Phaidon.
Winget, M. (2003). Metadata for digital images: Theory and practice.
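As a toy illustration of the point above, and not any particular museum system's schema: the record below sketches how custody events could become searchable access points using Dublin Core-style terms (dcterms:provenance is the DCMI term for ownership and custody history). The triptych and its custody chain are invented, loosely echoing the article's own example.

```python
# A minimal sketch of recording archival provenance alongside descriptive
# metadata, loosely following Dublin Core / DCMI terms. All values are
# fictional and purely illustrative.
record = {
    "dc:title": "Triptych with Scenes of the Passion",
    "dc:creator": "Unknown Netherlandish workshop",
    "dc:date": "ca. 1510",
    "dcterms:medium": "Oil on oak panel",
    # dcterms:provenance holds the custody/ownership history as discrete,
    # indexable events rather than a single free-text note.
    "dcterms:provenance": [
        "Commissioned as a devotional object for a private chapel",
        "Private collection, Germany, until 1938",
        "Restituted to heirs, 1998",
        "Acquired by an American museum, 2001",
    ],
}

# Each custody event can now serve as a search access point.
for event in record["dcterms:provenance"]:
    print(event)
```

Storing each custody event as a distinct, indexable value is what turns provenance from a note buried in paper files into the user-centered access point the article argues for.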
An English Essay on Accepting Advice

In a world where information is readily accessible, it is becoming increasingly important to have the ability to critically evaluate and select reliable sources. Accepting advice can be a challenging task, especially when it comes from an unknown or unreliable source. However, by following a few simple guidelines, you can significantly improve your chances of making informed decisions based on trustworthy advice.

1. Consider the Source:
The first step in evaluating advice is to consider the source. What is their expertise in the area? Do they have a vested interest in the outcome? Are they known for providing accurate and unbiased information? If you can't find a clear answer to these questions, you should approach the advice with caution.

2. Look for Multiple Perspectives:
It is never a good idea to rely on a single source for advice. Seek out multiple perspectives to gain a broader understanding of the issue. This will help you identify any potential biases or limitations in the information you are receiving.

3. Evaluate the Evidence:
Once you have gathered advice from multiple sources, it is important to evaluate the evidence presented to support each claim. Are the sources credible? Is the data presented reliable? Are there any obvious flaws in the logic or reasoning? By critically examining the evidence, you can assess the validity of the advice.

4. Consider Your Own Values and Beliefs:
While it is important to consider the advice of others, you should also take into account your own values and beliefs. Does the advice align with your own principles and goals? If not, it may not be the best course of action for you.

5. Seek Professional Guidance if Needed:
In some cases, it may be necessary to seek professional guidance before making a decision. If the advice you are receiving is related to a sensitive or complex issue, it is recommended to consult with an expert in the field.
Grade 7 English: Multiple-Choice Questions on Data and Statistics (80 questions)

1. There are ______ students in our class.
A. twenty  B. twentys  C. twentyes  D. twentith
Answer: A. twenty is the correct spelling of 20; options B (twentys) and C (twentyes) are misspellings; option D (twentith) is intended as the ordinal "twentieth," which does not fit the sentence.
2. I have ______ apples.
A. three  B. third  C. the three  D. the third
Answer: A. three is the cardinal number "three," expressing quantity, so A is correct; B (third) is an ordinal; C (the three) is incorrect here; D (the third) means "the third."
3. The price of the shoes is ______.
A. fifty yuan  B. fiftieth yuan  C. the fifty yuan  D. the fiftieth yuan
Answer: A. fifty yuan means "fifty yuan," so A is correct; B (fiftieth yuan) and C (the fifty yuan) are incorrect expressions; D (the fiftieth yuan) means "the fiftieth yuan," which does not fit the sentence.
4. We need ______ books.
A. five  B. fifth  C. the five  D. the fifth
Answer: A. five is the cardinal number "five," expressing quantity, so A is correct; B (fifth) is an ordinal; C (the five) is incorrect here; D (the fifth) means "the fifth."
5. There are ______ days in a week.
A. seven  B. seventh  C. the seven  D. the seventh
Answer: A. seven is the cardinal number "seven," expressing quantity, so A is correct; B (seventh) is an ordinal; C (the seven) is incorrect here; D (the seventh) means "the seventh."
2025 Postgraduate Entrance Examination: English I (201) Self-Test Paper with Answer Guidance

Part I: Cloze (10 points)

Section I: Cloze Test
Directions: Read the following text carefully and choose the best answer from the four choices marked A, B, C, and D for each blank.

Passage:
In today's rapidly evolving digital landscape, the role of social media has become increasingly significant. Social media platforms are not just tools for personal interaction; they also serve as powerful channels for business promotion and customer engagement. Companies are now leveraging these platforms to reach out to their target audience more effectively than ever before. However, the effectiveness of social media marketing (1)___ on how well the company understands its audience and the specific platform being used. For instance, while Facebook may be suitable for reaching older demographics, Instagram is more popular among younger users. Therefore, it is crucial for businesses to tailor their content to fit the preferences and behaviors of the (2)___ demographic they wish to target.

Moreover, the rise of mobile devices has further transformed the way people consume content online. The majority of social media users now access these platforms via smartphones, which means that companies must ensure that their content is optimized for mobile viewing. In addition, the speed at which information spreads on social media can be both a boon and a bane. On one hand, positive news about a brand can quickly go viral, leading to increased visibility and potentially higher sales. On the other hand, negative publicity can spread just as fast, potentially causing serious damage to a brand's reputation. As such, it is imperative for companies to have a well-thought-out strategy for managing their online presence and responding to feedback in a timely and professional manner.

In conclusion, social media offers unparalleled opportunities for businesses to connect with customers, but it requires careful planning and execution to (3)___ the maximum benefits. By staying attuned to trends and continuously adapting their strategies, companies can harness the power of social media to foster growth and build strong relationships with their audiences.

1. [A] relies  [B] bases  [C] stands  [D] depends
2. [A] particular  [B] peculiar  [C] special  [D] unique
3. [A] obtain  [B] gain  [C] achieve  [D] accomplish

Answers:
1. D - depends
2. A - particular
3. C - achieve

This cloze test is designed to assess comprehension and vocabulary skills, as well as the ability to infer the correct usage of words within the context of the passage. Each question is crafted to require understanding of the sentence structure and meaning to select the best option.

Part II: Traditional Reading Comprehension (4 passages, 10 points each, 40 points in total)

Passage One

Passage:
In the 1950s, the United States experienced a significant shift in the way people viewed education. This shift was largely due to the Cold War, which created a demand for a highly educated workforce. As a result, the number of students pursuing higher education in the U.S. began to grow rapidly.

One of the most important developments during this period was the creation of the Master's degree program. The Master's degree was designed to provide students with advanced knowledge and skills in a specific field. This program became increasingly popular as more and more people realized the value of a higher education.

The growth of the Master's degree program had a profound impact on American society. It helped to create a more educated and skilled workforce, which in turn contributed to the nation's economic growth.
It also helped to improve the quality of life for many Americans by providing them with opportunities for career advancement and personal development.

Today, the Master's degree is still an important part of the American educational system. However, there are some challenges that need to be addressed. One of the biggest challenges is the rising cost of education. As the cost of tuition continues to rise, many students are unable to afford the cost of a Master's degree. This is a problem that needs to be addressed if we are to continue to provide high-quality education to all Americans.

1. What was the main reason for the shift in the way people viewed education in the 1950s?
A. The demand for a highly educated workforce due to the Cold War.
B. The desire to improve the quality of life for all Americans.
C. The increasing cost of education.
D. The creation of the Master's degree program.

2. What is the purpose of the Master's degree program?
A. To provide students with basic knowledge and skills in a specific field.
B. To provide students with advanced knowledge and skills in a specific field.
C. To provide students with job training.
D. To provide students with a general education.

3. How did the growth of the Master's degree program impact American society?
A. It helped to create a more educated and skilled workforce.
B. It helped to improve the quality of life for many Americans.
C. It caused the economy to decline.
D. It increased the cost of education.

4. What is one of the biggest challenges facing the Master's degree program today?
A. The demand for a highly educated workforce.
B. The rising cost of education.
C. The desire to improve the quality of life for all Americans.
D. The creation of new educational programs.

5. What is the author's main point in the last paragraph?
A. The Master's degree program is still an important part of the American educational system.
B. The cost of education needs to be addressed.
C. The Master's degree program is no longer relevant.
D. The author is unsure about the future of the Master's degree program.

Passage Two

Reading Comprehension (Traditional)

Passage:
The digital revolution has transformed the way we live, work, and communicate. With the advent of the internet and the proliferation of smart devices, information is more accessible than ever before. This transformation has had a profound impact on education, with online learning platforms providing unprecedented access to knowledge. However, this shift towards digital learning also poses challenges, particularly in terms of ensuring equitable access and maintaining educational quality.

While the benefits of digital learning are numerous, including flexibility, cost-effectiveness, and the ability to reach a wider audience, there are concerns about the potential for increased social isolation and the difficulty in replicating the dynamic, interactive environment of a traditional classroom. Moreover, not all students have equal access to the technology required for online learning, which can exacerbate existing inequalities. It's crucial that as we embrace the opportunities presented by digital technologies, we also address these challenges to ensure that no student is left behind.

Educators must adapt their teaching methods to take advantage of new tools while also being mindful of the need to foster a sense of community and support among students.
By integrating both digital and traditional approaches, it’s possible to create a learning environment that leverages the strengths of each, ultimately enhancing the educational experience for all students.Questions:1、What is one of the main impacts of the digital revolution mentioned in the passage?•A) The reduction of social interactions•B) The increase in physical book sales•C) The transformation of communication methods•D) The decline of online learning platformsAnswer: C) The transformation of communication methods2、According to the passage, what is a challenge associated with digital learning?•A) The inability to provide any form of interaction•B) The potential to widen the gap between different socioeconomic groups •C) The lack of available content for online courses•D) The complete replacement of traditional classroomsAnswer: B) The potential to widen the gap between different socioeconomic groups3、Which of the following is NOT listed as a benefit of digital learning in the passage?•A) Cost-effectiveness•B) Flexibility•C) Increased social isolation•D) Wider reachAnswer: C) Increased social isolation4、The passage suggests that educators should do which of the following in response to the digital revolution?•A) Abandon all traditional teaching methods•B) Focus solely on improving students’ technical skills•C) Integrate digital and traditional teaching methods•D) Avoid using any digital tools in the classroomAnswer: C) Integrate digital and traditional teaching methods5、What is the author’s stance on the role of digital technologies ineducation?•A) They are unnecessary and should be avoided•B) They offer opportunities that should be embraced, but with caution •C) They are the only solution to current educational challenges•D) They have no real impact on the quality of educationAnswer: B) They offer opportunities that should be embraced, but with cautionThis reading comprehension exercise is designed to test your understanding of the text and your ability to identify key points and arguments within the passage.第三题Reading PassageWhen the French sociologist and philosopher Henri Lefebvre died in 1991, he left behind a body of work that has had a profound influence on the fields of sociology, philosophy, and cultural studies. Lefebvre’s theories focused on the relationship between space and society, particularly how space is produced, represented, and experienced. His work has been widely discussed and debated, with scholars and critics alike finding value in his insights.Lefebvre’s most famous work, “The Production of Space,” published in 1974, laid the foundation for his theoretical framework. In this book, he argues that space is not simply a container for human activities but rather an active agent in shaping social relationships and structures. Lefebvre introduces the concept of “three spaces” to describe the production of space: the perceived space,the lived space, and the representative space.1、According to Lefebvre, what is the primary focus of his theories?A. The development of urban planningB. The relationship between space and societyC. The history of architectural designD. The evolution of cultural practices2、What is the main argument presented in “The Production of Space”?A. Space is a passive entity that reflects social structures.B. Space is a fundamental building block of society.C. Space is an object that can be easily manipulated by humans.D. Space is irrelevant to the functioning of society.3、Lefebvre identifies three distinct spaces. 
Which of the following is NOT one of these spaces?A. Perceived spaceB. Lived spaceC. Representative spaceD. Economic space4、How does Lefebvre define the concept of “three spaces”?A. They are different types of architectural designs.B. They represent different stages of the production of space.C. They are different ways of perceiving and experiencing space.D. They are different social classes that occupy space.5、What is the significance of Lefebvre’s work in the fields of sociology and philosophy?A. It provides a new perspective on the role of space in social relationships.B. It offers a comprehensive guide to urban planning and development.C. It promotes the idea that space is an unimportant aspect of society.D. It focuses solely on the history of architectural movements.Answers:1、B2、B3、D4、C5、A第四题Reading Comprehension (Traditional)Read the following passage and answer the questions that follow. Choose the best answer from the options provided.Passage:In recent years, there has been a growing interest in the concept of “smart cities,” which are urban areas that u se different types of electronic data collection sensors to supply information which is used to manage assets and resources efficiently. This includes data collected from citizens, devices, andassets that is processed and analyzed to monitor and manage traffic and transportation systems, power plants, water supply networks, waste management, law enforcement, information systems, schools, libraries, hospitals, and other community services. The goal of building a smart city is to improve quality of life by using technology to enhance the performance and interactivity of urban services, to reduce costs and resource consumption, and to increase contact between citizens and government. Smart city applications are developed to address urban challenges such as environmental sustainability, mobility, and economic development.Critics argue, however, that while the idea of a smart city is appealing, it raises significant concerns about privacy and security. As more and more aspects of daily life become digitized, the amount of personal data being collected also increases, leading to potential misuse or unauthorized access. Moreover, the reliance on technology for critical infrastructure can create vulnerabilities if not properly secured against cyber-attacks. There is also a risk of widening the digital divide, as those without access to the necessary technologies may be left behind, further exacerbating social inequalities.Despite these concerns, many governments around the world are moving forward with plans to develop smart cities, seeing them as a key component of their future strategies. 
They believe that the benefits of improved efficiency and service delivery will outweigh the potential risks, provided that adequate safeguards are put in place to protect citizen s’ data and ensure the resilience of thecity’s technological framework.Questions:1、What is the primary purpose of developing a smart city?•A) To collect as much data as possible•B) To improve the quality of life through efficient use of technology •C) To replace all traditional forms of communication•D) To eliminate the need for human interaction in urban services2、According to the passage, what is one of the main concerns raised by critics regarding smart cities?•A) The lack of available technology•B) The high cost of implementing smart city solutions•C) Privacy and security issues related to data collection•D) The inability to provide essential services3、Which of the following is NOT mentioned as an area where smart city technology could be applied?•A) Traffic and transportation systems•B) Waste management•C) Educational institutions•D) Agricultural production4、How do some governments view the development of smart cities despite the criticisms?•A) As a risky endeavor that should be avoided•B) As a temporary trend that will soon pass•C) As a strategic move with long-term benefits•D) As an unnecessary investment in technology5、What does the term “digital divide” refer to in the context of smart cities?•A) The gap between the amount of data collected and the amount of data analyzed•B) The difference in technological advancement between urban and rural areas•C) The disparity in access to technology and its impact on social inequality•D) The separation of digital and non-digital methods of service delivery Answers:1、B) To improve the quality of life through efficient use of technology2、C) Privacy and security issues related to data collection3、D) Agricultural production4、C) As a strategic move with long-term benefits5、C) The disparity in access to technology and its impact on social inequality三、阅读理解新题型(10分)Reading Comprehension (New Type)Passage:The rise of e-commerce has transformed the way people shop and has had aprofound impact on traditional brick-and-mortar retailers. Online shopping offers convenience, a wide range of products, and competitive prices. However, it has also raised concerns about the future of physical stores. This passage examines the challenges and opportunities facing traditional retailers in the age of e-commerce.In recent years, the popularity of e-commerce has soared, thanks to advancements in technology and changing consumer behavior. According to a report by Statista, global e-commerce sales reached nearly$4.2 trillion in 2020. This upward trend is expected to continue, with projections showing that online sales will account for 25% of total retail sales by 2025. As a result, traditional retailers are facing fierce competition and must adapt to the digital landscape.One of the main challenges for brick-and-mortar retailers is the shift in consumer preferences. Many shoppers now prefer the convenience of online shopping, which allows them to compare prices, read reviews, and purchase products from the comfort of their homes. This has led to a decrease in foot traffic in physical stores, causing many retailers to struggle to attract customers. 
Additionally, the ability to offer a wide range of products at competitive prices has become a hallmark of e-commerce, making it difficult for traditional retailers to compete.Despite these challenges, there are opportunities for traditional retailers to thrive in the age of e-commerce. One approach is to leverage the unique strengths of physical stores, such as the ability to provide an immersiveshopping experience and personalized customer service. Retailers can also use technology to enhance the in-store experience, such as implementing augmented reality (AR) to allow customers to visualize products in their own homes before purchasing.Another strategy is to embrace the digital world and create a seamless shopping experience that integrates online and offline channels. For example, retailers can offer online returns to brick-and-mortar stores, allowing customers to shop online and return items in person. This not only provides convenience but also encourages customers to make additional purchases while they are in the store.Furthermore, traditional retailers can leverage their established brand loyalty and customer base to create a competitive advantage. By focusing on niche markets and offering unique products or services, retailers can differentiate themselves from e-commerce giants. Additionally, retailers can invest in marketing and promotions to drive traffic to their physical stores, even as more consumers turn to online shopping.In conclusion, the rise of e-commerce has presented traditional retailers with significant challenges. However, by embracing the digital landscape, leveraging their unique strengths, and focusing on customer satisfaction, traditional retailers can adapt and thrive in the age of e-commerce.Questions:1.What is the main concern raised about traditional retailers in the age of e-commerce?2.According to the passage, what is one of the main reasons for the decline in foot traffic in physical stores?3.How can traditional retailers leverage technology to enhance the in-store experience?4.What strategy is mentioned in the passage that involves integrating online and offline channels?5.How can traditional retailers create a competitive advantage in the age of e-commerce?Answers:1.The main concern is the fierce competition from e-commerce and the shift in consumer preferences towards online shopping.2.The main reason is the convenience and competitive prices offered by e-commerce, which make it difficult for traditional retailers to compete.3.Traditional retailers can leverage technology by implementing augmented reality (AR) and offering online returns to brick-and-mortar stores.4.The strategy mentioned is to create a seamless shopping experience that integrates online and offline channels, such as offering online returns to brick-and-mortar stores.5.Traditional retailers can create a competitive advantage by focusing on niche markets, offering unique products or services, and investing in marketing and promotions to drive traffic to their physical stores.四、翻译(本大题有5小题,每小题2分,共10分)First QuestionTranslate the following sentence into Chinese. Write your translation on the ANSWER SHEET.Original Sentence:“Although technology has brought about nume rous conveniences in our daily lives, it is also true that it has led to significant privacy concerns, especially with the rapid development of digital communication tools.”Answer:尽管技术在我们的日常生活中带来了诸多便利,但也不可否认它导致了重大的隐私问题,尤其是在数字通信工具快速发展的情况下。
Data Provenance: Some Basic Issues

Peter Buneman, Sanjeev Khanna, and Wang-Chiew Tan
University of Pennsylvania

S. Kapoor and S. Prasad (Eds.): FST TCS 2000, LNCS 1974, pp. 87–93, 2000. © Springer-Verlag Berlin Heidelberg 2000

Abstract. The ease with which one can copy and transform data on the Web has made it increasingly difficult to determine the origins of a piece of data. We use the term data provenance to refer to the process of tracing and recording the origins of data and its movement between databases. Provenance is now an acute issue in scientific databases, where it is central to the validation of data. In this paper we discuss some of the technical issues that have emerged in an initial exploration of the topic.

1 Introduction

When you find some data on the Web, do you have any information about how it got there? It is quite possible that it was copied from somewhere else on the Web, which, in turn, may have also been copied; and in this process it may well have been transformed and edited. Of course, when we are looking for a best buy, a news story, or a movie rating, we know that what we are getting may be inaccurate, and we have learned not to put too much faith in what we extract from the Web. However, if you are a scientist, or any kind of scholar, you would like to have confidence in the accuracy and timeliness of the data that you are working with. In particular, you would like to know how it got there.

In its brief existence, the Web has completely changed the way in which data is circulated. We have moved very rapidly from a world of paper documents to a world of on-line documents and databases. In particular, this is having a profound effect on how scientific research is conducted. Let us list some aspects of this transformation:

– A paper document is essentially unmodifiable. To "change" it one issues a new edition, and this is a costly and slow process. On-line documents, by contrast, can be (and often are) frequently updated.
– On-line documents are often databases, which means that they have explicit structure. The development of XML has blurred the distinction between documents and databases.
– On-line documents/databases typically contain data extracted from other documents/databases through the use of query languages or "screen-scrapers".

Among the sciences, the field of Molecular Biology is possibly one of the most sophisticated consumers of modern database technology and has generated a wealth of new database issues [15]. A substantial fraction of research in genetics is conducted in "dry" laboratories using in silico experiments – analysis of data in the available databases. Figure 1 shows how data flows through a very small fraction of the available molecular biology databases.¹ In all but one case, there is a Lit – for literature – input to a database, indicating that the database is curated: it is not simply obtained by a database query or by on-line submission, but involves human intervention in the form of additional classification, annotation and error correction. An interesting property of this flow diagram is that there is a cycle in it. This does not mean that there is a perpetual loop of possibly inaccurate data flowing through the system (though this might happen); it means that the two databases overlap in some area and borrow on the expertise of their respective curators. The point is that it may now be very difficult to determine where a specific piece of data comes from. We use the term data provenance broadly to refer to a description of the origins of a piece of data and
the process by which it arrived in a database. Most implementors and curators of scientific databases would like to record provenance, but current database technology does not provide much help in this process, for databases are typically rather rigid structures that do not allow the kinds of ad hoc annotations often needed for recording provenance.

Fig. 1. The Flow of Data in Bioinformatics

¹ Thanks to Susan Davidson, Fidel Salas and Chris Stoeckert of the Bioinformatics Center at Penn for providing this information.

The databases used in molecular biology form just one example of why data provenance is an important issue. There are other areas in which it is equally acute [5]. It is an issue that is certainly broader than computer science, with legal and ethical aspects. The question that computer scientists, especially theoretical computer scientists, may want to ask is: what are the technical issues involved in the study of data provenance? As in most areas of computer science, the hard part is to formulate the problem in a concise and applicable fashion. Once that is done, it often happens that interesting technical problems emerge. This abstract reviews some of the technical issues that have emerged in an initial exploration.

2 Computing Provenance: Query Inversion

Perhaps the only area of data provenance to receive any substantial attention is that of provenance of data obtained via query operations on some input databases. Even in this restricted setting, a formalization of the notion of data provenance turns out to be a challenging problem. Specifically, given a tuple t in the output of a database query Q applied to some source data D, we want to understand which tuples in D contributed to the output tuple t, and whether there is a compact mechanism for identifying these input tuples. A natural approach is to generate a new query Q′, determined by Q, D and t, such that when Q′ is applied to D it generates the collection of input tuples that "contributed to" the output tuple t. In other words, we would like to identify the provenance by inverting the original query. Of course, we have to ask what we mean by "contributed to". This problem has been studied under various names, including "data pedigree" and "data lineage", in [1,9,7]. One way we might answer this question is to say that a tuple in the input database "contributes to" an output tuple if changing the input tuple causes the output tuple to change or to disappear from the output. This definition breaks down on the simplest queries (a projection or union). A better approach is to use a simple proof-theoretic definition. If we are dealing with queries that are expressible in positive relational algebra (SPJU) or, more generally, in positive datalog, we can say that an input tuple (a fact) "contributes to" an output tuple if it is used in some minimal derivation of that tuple. This simple definition works well and has the expected properties: it is invariant under query rewriting, and it is compositional in the expected way.
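To make the proof-theoretic definition concrete, the following is a minimal sketch, in Python, of why-provenance computed by query inversion for a positive selection-projection query. The employee relation, its attribute names, and the salary threshold are hypothetical illustrations invented for the example (a fixed constant stands in for the selection condition); they are not part of any system described in this paper.

# Why-provenance by query inversion, sketched for a positive
# (selection-projection) query. All data and names are hypothetical.

employee = [
    {"name": "John Doe",   "telephone": 12345, "salary": 65000},
    {"name": "Jane Roe",   "telephone": 54321, "salary": 48000},
    {"name": "Joe Bloggs", "telephone": 11111, "salary": 70000},
]

def query(rows):
    # Q: SELECT name, telephone FROM employee WHERE salary > 50000
    return {(r["name"], r["telephone"]) for r in rows if r["salary"] > 50000}

def why_provenance(rows, t):
    # The inverse query Q': for a positive query of this shape, a source
    # tuple lies in some minimal derivation of t exactly when it satisfies
    # the selection and projects onto t.
    return [r for r in rows
            if r["salary"] > 50000 and (r["name"], r["telephone"]) == t]

t = ("John Doe", 12345)
assert t in query(employee)
print(why_provenance(employee, t))  # -> the single "John Doe" source tuple

Note that Q′ is itself a query of the same form as Q with extra equality selections, which is consistent with the invariance and compositionality properties just described.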
Unfortunately, these desirable properties break down in the presence of negation or any form of aggregation. To see this, consider a simple SQL query:

SELECT name, telephone
FROM employee
WHERE salary > (SELECT AVG(salary) FROM employee)

Here, modifying any tuple in the employee relation could affect the presence of any given output tuple. Indeed, for this query, the definition of "contributes to" given in [9] makes the whole of the employee relation contribute to each tuple in the output. While this is a perfectly reasonable definition, the properties of invariance under query rewriting and compositionality break down, indicating that a more sophisticated definition may be needed.

Before going further it is worth remarking that this characterization of provenance is related to the topics of truth maintenance [10] and view maintenance [12]. The problem in view maintenance is as follows. Suppose a database (a view) is generated by an expensive query on some other database. When the source database changes, we would like to recompute the view without recomputing the whole query. Truth maintenance is the same problem in the terminology of deductive systems. What may make query inversion simpler is that we are only interested in what is in the database; we are not interested in updates that would add tuples to the database.

In [7] another notion of provenance is introduced. Consider the SQL query above, and suppose we see the tuple ("John Doe", 12345) in the output. What the previous discussion tells us is why that tuple is in the output. However, we might ask an apparently simpler question: given that the tuple appears in the output, where does the telephone number 12345 come from? The answer seems easy: from the "John Doe" tuple in the input. This seems to imply that, as long as there is some means of identifying tuples in the employee relation, one can compute where-provenance by tracing the variable of the query that emits 12345. However, this intuition is fragile and a general characterization is not obvious; it is discussed in [7].

We remark that this second form of provenance, where-provenance, is also related to the view update problem [3]: if John Doe decides to change his telephone number at the view, which data should be modified in the employee relation? Again, where-provenance seems simpler because we are only interested in modifications to the existing view; we are not interested in insertions to the view.
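To make the contrast concrete, here is a companion sketch of where-provenance, again in Python and again over invented data. For simplicity it uses a variant of the query above in which the aggregate is replaced by a constant threshold, so that each output field is visibly a copy of one input field; the tuple identifiers e1 and e2 are assumed to exist as the "means of identifying tuples" mentioned above.

# Where-provenance: each output field is annotated with the
# (tuple id, attribute) pair it was copied from. Ids are hypothetical.

employee = {
    "e1": {"name": "John Doe", "telephone": 12345, "salary": 65000},
    "e2": {"name": "Jane Roe", "telephone": 54321, "salary": 48000},
}

def query_with_where(rows):
    # Same selection-projection query, but every emitted value carries
    # a record of the input location it came from.
    out = {}
    for tid, r in rows.items():
        if r["salary"] > 50000:
            out[(r["name"], r["telephone"])] = {
                "name": (tid, "name"),
                "telephone": (tid, "telephone"),
            }
    return out

ann = query_with_where(employee)
print(ann[("John Doe", 12345)]["telephone"])  # -> ('e1', 'telephone')

This tracing is canonical only because each output field here is a straight copy of a single input field; once the same output can be produced by syntactically different rewritings of the query, the annotation ceases to be well defined, which is exactly the fragility noted above.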
Another issue in query inversion is to capture other query languages and other data models. For example, we would like to describe the problem in object-oriented [11] or semistructured [2] (XML) data models. What makes these models interesting is that we are no longer operating at the fixed level of tuples in the relational model: we may want to ask for the why- or where-provenance of some deeply nested component of some structure. To this end, [7] studies the issue of data provenance in a "deterministic" model of semistructured data in which every element has a canonical path or identifier. Work on view maintenance based on this model has also been studied in [14]. This leads us to our next topics, those of citing and archiving data.

3 Data Citation

A digital library is typically a large and heterogeneous collection of on-line documents and databases with sophisticated software for exploring the collection [13]. However, many digital libraries are also being organized so that they serve as scholarly resources. This being the case, how do we cite a component of a digital library? Surprisingly, this topic has received very little attention, and there appear to be no generally useful standards for citations. Well-organized databases are constructed with keys that allow us to uniquely identify a tuple in a relation. By giving the attribute name we can identify a component of a tuple, so there is usually a canonical path to any component of the database.

How we cite portions of documents, especially XML documents, is not so clear. A URL provides us with a universal locator for a document, but how are we to proceed once we are inside the document? Page numbers and line numbers – if they exist – are friable, and we have to remember that an XML document may now represent a database for which the linear document structure is irrelevant.

There are some initial notions of keys in the XML standard [4] and in the XML Schema proposals [16]. In the XML Document Type Descriptor (DTD) one can declare an ID attribute. Values for this attribute are to be unique in the document and can be used to locate elements of the document. However, the ID attribute has nothing to do with the structure of the document; it is simply a user-defined identifier.

In XML-Schema the definition of a key relies on XPath [8], a path description language for XML. Roughly speaking, a key consists of two paths through the data. The first is a path, for example Department/Employee, that describes the set of nodes upon which a key constraint is to be imposed; this is called the target set. The second is another path, for example IdCard/Number, that uniquely identifies nodes in the target set; this second part is called the key path, and the rule is that two distinct nodes in the target set must have different values at the end of their key paths. Apart from some details, and the fact that XPath is probably too complex a language for key specification, this definition is quite serviceable, but it does not take into account the hierarchical structure of keys that is common in well-organized databases and documents.

To give an example of what is needed, consider the problem of citing a part of a bible, organized by book, chapter and verse. We might start with the idea that books in the bible are keyed by name, so we use the pair of paths (Bible/Book, Name); we are assuming here that Bible is the unique root. Now we may want to indicate that chapters are specified by number, but it would be incorrect to write (Bible/Book/Chapter, Number), because this says that chapter numbers are unique within the whole bible. Instead we need to specify a relative key, which consists of a triple (Bible/Book, Chapter, Number). What this means is that the (Chapter, Number) key is to hold at every node specified by the path Bible/Book; a sketch of such a check appears below. A more detailed description of relative keys is given in [6]. While some basic inference results are known, there is a litany of open questions surrounding them: What are appropriate path languages for the various components of a key? What inference results can be established for these languages? How do we specify foreign keys, and what results hold for them? What interactions are there between keys and DTDs? These are practical questions that will need to be answered if we are to use keys, as we do in databases, as the basis for indexing and query optimization.
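The intended semantics of relative keys is easy to prototype. The following minimal sketch checks the relative key (Bible/Book, Chapter, Number) under the assumption that a document can be modelled as a toy tree of tagged nodes; the dictionary encoding and helper names are invented for the example, and a real implementation would of course work over XML with an XPath-like path language.

# Checking a relative key (context path, target tag, key attribute)
# on a toy tree. The encoding and all names are hypothetical.

tree = {"tag": "Bible", "children": [
    {"tag": "Book", "Name": "Genesis", "children": [
        {"tag": "Chapter", "Number": 1, "children": []},
        {"tag": "Chapter", "Number": 2, "children": []},
    ]},
    {"tag": "Book", "Name": "Exodus", "children": [
        {"tag": "Chapter", "Number": 1, "children": []},  # reuse across books is allowed
    ]},
]}

def nodes_on_path(node, path):
    # All nodes reached from `node` by a simple tag path such as "Bible/Book".
    head, _, rest = path.partition("/")
    if node["tag"] != head:
        return []
    if not rest:
        return [node]
    out = []
    for child in node["children"]:
        out.extend(nodes_on_path(child, rest))
    return out

def holds(root, context_path, target_tag, key_attr):
    # The (target_tag, key_attr) key must hold at every context node:
    # key values are unique among that node's target_tag children.
    for ctx in nodes_on_path(root, context_path):
        seen = set()
        for child in ctx["children"]:
            if child["tag"] == target_tag:
                if child[key_attr] in seen:
                    return False
                seen.add(child[key_attr])
    return True

print(holds(tree, "Bible/Book", "Chapter", "Number"))  # True
print(holds(tree, "Bible", "Book", "Name"))            # True

On this encoding, an absolute key such as (Bible/Book, Name) is just the special case whose context path is the root, as the second check shows.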
4 Archiving and Other Problems Associated with Provenance

Let us suppose that we have a good formulation, or even a standard, for data citation, and that document A cites a (component of a) document B. Whose responsibility is it to maintain the integrity of B? The owner of B may wish to update it, thereby invalidating the citation in A. This is a serious problem in scientific databases, and what is commonly done is to release successive versions of a database as separate documents. Since one version is – more or less – an extension of the previous version, this is wasteful of space, and the space overhead limits the rate at which one can release versions. Also, when the history of a database is kept in this form, it is difficult to trace the history of components of the database as defined by the key structure. There are a number of open questions:

– Can we compress versions so that the history of A can be efficiently recorded? (A sketch of one simple approach appears below.)
– Should keeping the cited data be the responsibility of A rather than B?
– Should B figure out what is being cited and keep only those portions?

In this context it is worth noting that, when we cite a URL, we hardly ever give a date for the citation. If we did, at least the person who follows the citation would know whether to question its validity by comparing the citation date with the timestamp on the URL.

Again, let us suppose that we have an agreed standard for citations and that, rather than computing provenance by query inversion (which is only possible when the data of interest is created by a query), we decide to annotate each element in the database with one or more citations that describe its provenance. What is the space overhead for doing this? Given that the citations have structure, and that this structure will in part be related to the structure of the data, one assumes that some form of compression is possible.
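Returning to the first of the open questions above, one simple approach, sketched minimally below with invented toy entries standing in for keyed database records, is to store the first release whole and each subsequent release as a key-wise delta. Nothing in the sketch addresses responsibility or citation integrity; it only illustrates why key structure makes component histories cheap to record and trace.

# Versions as key-wise deltas. Keys and values are invented toy data.

v1 = {"P12345": "MKTAYIAKQR", "P67890": "MNNQRKKTAR"}
v2 = {"P12345": "MKTAYIAKQR", "P67890": "MNNQRKKTGR", "P11111": "MSSSWLLLK"}

def delta(old, new):
    # Record only what changed between two keyed releases.
    return {
        "added":   {k: new[k] for k in new.keys() - old.keys()},
        "removed": sorted(old.keys() - new.keys()),
        "changed": {k: new[k] for k in old.keys() & new.keys()
                    if old[k] != new[k]},
    }

def apply_delta(old, d):
    # Reconstruct the next release from the previous one plus its delta.
    new = {k: v for k, v in old.items() if k not in d["removed"]}
    new.update(d["changed"])
    new.update(d["added"])
    return new

d = delta(v1, v2)
assert apply_delta(v1, d) == v2
# The history of a single keyed component (say "P67890") can be traced
# by scanning the stored deltas rather than diffing whole releases.
print(d["changed"])  # -> {'P67890': 'MNNQRKKTGR'}

Under such a scheme, the space cost of a release is proportional to the size of its delta rather than to the size of the whole database.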
Finally, one is tempted to speculate that we may need a completely different model of data exchange and databases to characterize and to capture provenance. One could imagine that data is exchanged in packages that are "self aware"² and somehow contain a complete history of how they moved through the system of databases, of how they were constructed, and of how they were changed. The idea is obviously appealing, but whether it can be formulated clearly, let alone be implemented, is an open question.

² A term suggested by David Maier.

References

[1] A. Woodruff and M. Stonebraker. Supporting fine-grained data lineage in a database visualization environment. In ICDE, pages 91–102, 1997.
[2] Serge Abiteboul, Peter Buneman, and Dan Suciu. Data on the Web: From Relations to Semistructured Data and XML. Morgan Kaufmann, 2000.
[3] T. Barsalou, N. Siambela, A. Keller, and G. Wiederhold. Updating relational databases through object-based views. In Proceedings of ACM SIGMOD, May 1991.
[4] Tim Bray, Jean Paoli, and C. M. Sperberg-McQueen. Extensible Markup Language (XML) 1.0. World Wide Web Consortium (W3C), February 1998. http://www.w3.org/TR/REC-xml.
[5] P. Buneman, S. Davidson, M. Liberman, C. Overton, and V. Tannen. Data provenance. /~wctan/DataProvenance/precis/index.html.
[6] Peter Buneman, Susan Davidson, Carmem Hara, Wenfei Fan, and Wang-Chiew Tan. Keys for XML. Technical report, University of Pennsylvania, 2000.
[7] Peter Buneman, Sanjeev Khanna, and Wang-Chiew Tan. Why and Where: A Characterization of Data Provenance. In International Conference on Database Theory, 2001. To appear.
[8] James Clark and Steve DeRose. XML Path Language (XPath). W3C Working Draft, November 1999. http://www.w3.org/TR/xpath.
[9] Y. Cui and J. Widom. Practical lineage tracing in data warehouses. In ICDE, pages 367–378, 2000.
[10] Jon Doyle. A truth maintenance system. Artificial Intelligence, 12:231–272, 1979.
[11] R. G. G. Cattell et al., editor. The Object Database Standard: ODMG 2.0. Morgan Kaufmann, 1997.
[12] A. Gupta and I. Mumick. Maintenance of materialized views: Problems, techniques, and applications. IEEE Data Engineering Bulletin, 18(2), June 1995.
[13] Michael Lesk. Practical Digital Libraries: Books, Bytes and Bucks. Morgan Kaufmann, July 1997.
[14] Hartmut Liefke and Susan Davidson. View maintenance for hierarchical semistructured data. In International Conference on Data Warehousing and Knowledge Discovery, 2000.
[15] Susan Davidson, Chris Overton, and Peter Buneman. Challenges in Integrating Biological Data Sources. Journal of Computational Biology, 2(4):557–572, Winter 1995.
[16] World Wide Web Consortium (W3C). XML Schema Part 0: Primer, 2000. http://www.w3.org/TR/xmlschema-0/.