Sigmetrics15

格式：pdf
大小：306.43 KB
文档页数：14

下载文档原格式

清华会议分级

2
（2）rank1.5 不在 rank1 中的会议，如果符合下面的条件，其 RANK 为 1.5 *IF >= 1.2,且领域不是非常窄 *IF >=0.9,且领域不是非常窄，且在作为参考的 3 所大学（MIT, UCLA, NTU）的列表中不只被列做 RANK2 *IF >= 0.6，且在至少一个参考列表中列为 RANK1
列表的生成采用量化标准＋定性分析的方法量化指标包括：Citeseer 给出的排序结果、清华大学计算机系计算的排序结果（按照 SCI 引用因子的计算方式计算的 IF 排序，计算方法见附录）、三所大学（MIT、UCLA、NUS）的排序结果。各个列表中的数据统计见下表。定性分析由计算机系组织各所代表进行讨论，主要进行了三次讨论，参加讨论的人员包括：孙茂松、陈文光、冯建华、唐杰、王建勇、徐明伟、任丰原、刘永进、张敏、崔勇和白晓颖。部分会议的引用因子满足进入 rank2 的条件，但参与前期讨论的老师都不熟悉该会议，暂时列为待确认 rank2 会议。最后 rank1, rank1.5, rank2 将最多只保留 300 个会议。 Rank1 MIT UCLA NUS Citeseer 覆盖的会议 THU（清华大学）覆盖的会议具体评估指标：（1）rank1 *IF 3.5 以上，且其领域不是非常窄 *IF >= 1.5, 且该领域没有其它明显强于此会议的其他会议 *0.8<=IF<1.5,且在作为参考的 3 所大学（MIT, UCLA, NTU）的列表中都列为 RANK1 的会议 38 91 59 965 1202 Rank2 0 93 96
5
3. 一流会议列表(Rank1.5)
会议缩写会议全称每年篇数影响因子 5.79

中国计算机学会推荐国际学术会议和期刊目录AB

IFIP International Conferences on Networking
International Conference on Network Protocols International Conference on Pervasive Computing and Communications Internet Measurement Conference International Symposium on Mobile Ad Hoc Networking and Computing International Workshop on Quality of Service
3 4 5 6
Eurocrypt
ESORICS CSFW RAID
7 8 9 10 11
NDSS DSN
ISOC Network and Distributed System Security Symposium The International Conference on Dependable Systems and Networks Theory of Cryptography Conference USENIX Security Symposium Workshop on Information Hiding
Annual Computer Security Applications Conference
CRYPTO 会议简称
ACSAC
B类
序号 1
2
ASIACRYPT
Annual International Conference on the Theory and Application of Cryptology and Information Security Annual International Conference on the Theory and Applications of Cryptographic Techniques European Symposium on Research in Computer Security IEEE Computer Security Foundations Workshop International Symposium on Recent Advances in Intrusion Detection

上海交大计算机系推荐投稿杂志及会议

(三) 人本计算 24. IEEE Transactions on Computational Biology and Bioinformatics 25. IEEE Transactions on Biomedical Engineering 26. IEEE Trans. on Information Technology in Biomedicine
(五) 智能系统与知识处理 36. Artificial Intelligence 37. IEEE Trans on Pattern Analysis and Machine Intl 38. Intl Jnl of Computer Vision 39. IEEE Trans Neural Networks 40. IEEE Trans Image Processing 41. Computational Linguistics 42. Machine Learning 43. Journal of Machine Learning Research
(七) 程序语言与软件工程 48. ACM Trans on Programming Languages & Systems 49. IEEE Trans on Software Engineering (TSE) 50. ACM Trans on S/W Eng and Methodology 51. ACM Transactions on Embedded Computing Systems (TECS) 52. ACM Transactions on Architecture and Code Optimization (TACO)
上海交通大学计算机科学与工程系日期： 2010 年 10 Nhomakorabea 25 日

IEEE会议排名

Rank 1:SIGCOMM：ACM Conf on Comm Architectures，Protocols ＆Apps INFOCOM: Annual Joint Conf IEEE Comp ＆Comm SocSPAA: Symp on Parallel Algms and ArchitecturePODC：ACM Symp on Principles of Distributed ComputingPPoPP：Principles and Practice of Parallel ProgrammingRTSS：Real Time Systems SympSOSP: ACM SIGOPS Symp on OS PrinciplesSOSDI: Usenix Symp on OS Design and ImplementationCCS: ACM Conf on Comp and Communications SecurityIEEE Symposium on Security and PrivacyMOBICOM: ACM Intl Conf on Mobile Computing and NetworkingUSENIX Conf on Internet Tech and SysICNP：Intl Conf on Network ProtocolsPACT：Intl Conf on Parallel Arch and Compil TechRTAS: IEEE Real-Time and Embedded Technology and Applications Symposium ICDCS：IEEE Intl Conf on Distributed Comp SystemsRank 2：CC: Compiler ConstructionIPDPS: Intl Parallel and Dist Processing SympIC3N: Intl Conf on Comp Comm and NetworksICPP: Intl Conf on Parallel ProcessingSRDS: Symp on Reliable Distributed SystemsMPPOI：Massively Par Proc Using Opt InterconnsASAP: Intl Conf on Apps for Specific Array ProcessorsEuro—Par：European Conf。

Mytrix GMS-15 RGB 游戏鼠标垫大辣椒粉说明书

Manuals+— User Manuals Simpliﬁed.Mytrix GMS-15 RGB Gaming Mousepad Large Cherry Pink User GuideHome » mytrix » Mytrix GMS-15 RGB Gaming Mousepad Large Cherry Pink User GuideContents1 Mytrix GMS-15 RGB Gaming Mousepad Large CherryPink2 INTERFACE INSTRUCTIONS3 RGB SETTINGS4 TECHNICAL SPECIFICATION5 Documents / Resources6 Related PostsMytrix GMS-15 RGB Gaming Mousepad Large Cherry PinkThank you for buying Mytrix RGB mouse pad GMS-15. It supports a plug-andplay USB interface conveniently and there are a total of 14 light modes to meet your preference.INTERFACE INSTRUCTIONS1. RGB Light Mode Selecting Button2. Breathable Waterproof Micro-Texture Cloth Surface3. Non-slip Rubber Base4. RGB Light EdgeCONNECTIONPlug the detachable Type-C to USB-A cable into the mouse pad and turn on the SWITCH button.RGB SETTINGSFunctions:Select the Light Mode:1. RGB Light Mode Selecting Button2. Breathable Waterproof Micro-Texture Cloth Surface3. Non-slip Rubber Base4. RGB Light EdgePlug the detachable Type-C to USB-A cable into the mouse pad and turn on the SWITCH button. Press the SWITCH button to switch from 14 light modes in sequenceAdjust the Brightness:Double press the ﬁngerprint icon to switch between the high and low brightness of the current light modeACCESSORY LISTThe product comes with a 1-year limited warranty. Our goal is to achieve total customer satisfaction, we will do our best to understand our customer’s requirements and always meet the requirements. If you have any questions, please do not hesitate to contact us via *****************. We will be more than happy to assist you.Company: Mytrix Technology LLCCustomer Service: +1-978-496-8821Email: *****************Web: Address: 13 Garabedian Dr. Unit C,Salem NH 03079Documents / ResourcesMytrix GMS-15 RGB Gaming Mousepad Large Cherry Pink [pdf] User GuideGMS-15 RGB Gaming Mousepad Large Cherry Pink, GMS-15, RGB Gaming Mousepad LargeCherry Pink, Gaming Mousepad Large Cherry Pink, Mousepad Large Cherry Pink, Large Cherry Pink, Cherry Pink, PinkManuals+,。

固态硬盘失效率报告

A Large-Scale Study of Flash Memory Failures in the FieldJustin Meza Carnegie Mellon University meza@Qiang WuFacebook,Inc.qwu@Sanjeev KumarFacebook,Inc.skumar@Onur MutluCarnegie Mellon Universityonur@ABSTRACTServers useﬂash memory based solid state drives(SSDs)as a high-performance alternative to hard disk drives to store per-sistent data.Unfortunately,recent increases inﬂash density have also brought about decreases in chip-level reliability.In a data center environment,ﬂash-based SSD failures can lead to downtime and,in the worst case,data loss.As a result, it is important to understandﬂash memory reliability char-acteristics overﬂash lifetime in a realistic production data center environment running modern applications and system software.This paper presents theﬁrst large-scale study ofﬂash-based SSD reliability in theﬁeld.We analyze data collected across a majority ofﬂash-based solid state drives at Facebook data centers over nearly four years and many millions of operational hours in order to understand failure properties and trends of ﬂash-based SSDs.Our study considers a variety of SSD char-acteristics,including:the amount of data written to and read fromﬂash chips;how data is mapped within the SSD address space;the amount of data copied,erased,and discarded by the ﬂash controller;andﬂash board temperature and bus power. Based on ourﬁeld analysis of howﬂash memory errors man-ifest when running modern workloads on modern SSDs,this paper is theﬁrst to make several major observations:(1) SSD failure rates do not increase monotonically withﬂash chip wear;instead they go through several distinct periods corresponding to how failures emerge and are subsequently detected,(2)the eﬀects of read disturbance errors are not prevalent in theﬁeld,(3)sparse logical data layout across an SSD’s physical address space(e.g.,non-contiguous data),as measured by the amount of metadata required to track logical address translations stored in an SSD-internal DRAM buﬀer, can greatly aﬀect SSD failure rate,(4)higher temperatures lead to higher failure rates,but techniques that throttle SSD operation appear to greatly reduce the negative reliability im-pact of higher temperatures,and(5)data written by the op-erating system toﬂash-based SSDs does not always accurately indicate the amount of wear induced onﬂash cells due to op-timizations in the SSD controller and buﬀering employed in the system software.We hope that theﬁndings of thisﬁrst large-scaleﬂash memory reliability study can inspire others to develop other publicly-available analyses and novelﬂash reliability solutions.Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for proﬁt or commercial advantage,and that copies bear this notice and the full ci-tation on theﬁrst page.Copyrights for third-party components of this work must be honored.For all other uses,contact the owner/author(s).Copyright is held by the author/owner(s).SIGMETRICS’15,June15–19,2015,Portland,OR,USA.ACM978-1-4503-3486-0/15/06./10.1145/2745844.2745848.Categories and Subject DescriptorsB.3.4.[Hardware]:Memory Structures—Reliability,Test-ing,and Fault-ToleranceKeywordsﬂash memory;reliability;warehouse-scale data centers1.INTRODUCTIONServers useﬂash memory for persistent storage due to the low access latency ofﬂash chips compared to hard disk drives. Historically,ﬂash capacity has lagged behind hard disk drive capacity,limiting the use ofﬂash memory.In the past decade, however,advances in NANDﬂash memory technology have increasedﬂash capacity by more than1000×.This rapid in-crease inﬂash capacity has brought both an increase inﬂash memory use and a decrease inﬂash memory reliability.For example,the number of times that a cell can be reliably pro-grammed and erased before wearing out and failing dropped from10,000times for50nm cells to only2,000times for20nm cells[28].This trend is expected to continue for newer gen-erations ofﬂash memory.Therefore,if we want to improve the operational lifetime and reliability ofﬂash memory-based devices,we mustﬁrst fully understand their failure character-istics.In the past,a large body of prior work examined the failure characteristics ofﬂash cells in controlled environments using small numbers e.g.,tens)of rawﬂash chips(e.g.,[36,23,21, 27,22,25,16,33,14,5,18,4,24,40,41,26,31,30,37,6,11, 10,7,13,9,8,12,20]).This work quantiﬁed a variety ofﬂash cell failure modes and formed the basis of the community’s un-derstanding ofﬂash cell reliability.Yet prior work was limited in its analysis because these studies:(1)were conducted on small numbers of rawﬂash chips accessed in adversarial man-ners over short amounts of time,(2)did not examine failures when using real applications running on modern servers and instead used synthetic access patterns,and(3)did not account for the storage software stack that real applications need to go through to accessﬂash memories.Such conditions assumed in these prior studies are substantially diﬀerent from those expe-rienced byﬂash-based SSDs in large-scale installations in the ﬁeld.In such large-scale systems:(1)real applications access ﬂash-based SSDs in diﬀerent ways over a time span of years, (2)applications access SSDs via the storage software stack, which employs various amounts of buﬀering and hence aﬀects the access pattern seen by theﬂash chips,(3)ﬂash-based SSDs employ aggressive techniques to reduce device wear and to cor-rect errors,(4)factors in platform design,including how many SSDs are present in a node,can aﬀect the access patterns to SSDs,(5)there can be signiﬁcant variation in reliability due to the existence of a very large number of SSDs andﬂash chips.All of these real-world conditions present in large-scalePlatform SSDs PCIePer SSDCapacity Age(years)Data written Data read UBERA1v1,×4720GB 2.4±1.027.2TB23.8TB5.2×10−10B248.5TB45.1TB2.6×10−9C1v2,×41.2TB 1.6±0.937.8TB43.4TB1.5×10−10D218.9TB30.6TB5.7×10−11E13.2TB0.5±0.523.9TB51.1TB5.1×10−11F214.8TB18.2TB1.8×10−10Table1:The platforms examined in our study.PCIe technology is denoted by vX,×Y where X=version and Y=number of lanes.Data was collected over the entire age of the SSDs.Data written and data read are to/from the physical storage over an SSD’s lifetime.UBER=uncorrectable bit error rate(Section3.2).rection and forwards the result to the host’s OS.Errors that are not correctable by the driver result in data loss.1Our SSDs allow us to collect information only on large er-rors that are uncorrectable by the SSD but correctable by the host.For this reason,our results are in terms of such SSD-uncorrectable but host-correctable errors,and when we refer to uncorrectable errors we mean these type of errors.We refer to the occurrence of such uncorrectable errors in an SSD as an SSD failure.2.3The SystemsWe examined the majority ofﬂash-based SSDs in Face-book’s serverﬂeet,which have operational lifetimes extend-ing over nearly four years and comprising many millions of SSD-days of usage.Data was collected over the lifetime of the SSDs.We found it useful to separate the SSDs based on the type of platform an SSD was used in.We deﬁne a platform as a combination of the device capacity used in the system,the PCIe technology used,and the number of SSDs in the system. Table1characterizes the platforms we examine in our study. We examined a range of high-capacity multi-level cell(MLC)ﬂash-based SSDs with capacities of720GB,1.2TB,and3.2TB. The technologies we examined spanned two generations of PCIe,versions1and2.Some of the systems in theﬂeet em-ployed one SSD while others employed two.Platforms A and B contain SSDs with around two or more years of operation and represent16.6%of the SSDs examined;Platforms C and D contain SSDs with around one to two years of operation and represent50.6%of the SSDs examined;and Platforms E and F contain SSDs with around half a year of operation and represent around22.8%of the SSDs examined.2.4Measurement MethodologyTheﬂash devices in ourﬂeet contain registers that keep track of statistics related to SSD operation(e.g.,number of bytes read,number of bytes written,number of uncorrectable errors).These registers are similar to,but distinct from,the standardized SMART data stored by some SSDs to monitor their reliability characteristics[3].The values in these regis-ters can be queried by the host machine.We use a collector script to retrieve the raw values from the SSD and parse them into a form that can be curated in a Hive[38]table.This process is done in real time every hour.The scale of the systems we analyzed and the amount of data being collected posed challenges for analysis.To process the many millions of SSD-days of information,we used a cluster of machines to perform a parallelized aggregation of the data stored in Hive using MapReduce jobs in order to get a set of 1We do not examine the rate of data loss in this work.lifetime statistics for each of the SSDs we analyzed.We then processed this summary data in R[2]to collect our results.2.5Analytical MethodologyOur infrastructure allows us to examine a snapshot of SSD data at a point in time(i.e.,our infrastructure does not store timeseries information for the many SSDs in theﬂeet).This limits our analysis to the behavior of theﬂeet of SSDs at a point in time(for our study,we focus on a snapshot of SSD data taken during November2014).Fortunately,the number of SSDs we examine and the diversity of their characteristics allow us to examine how reliability trends change with vari-ous characteristics.When we analyze an SSD characteristic (e.g.,the amount of data written to an SSD),we group SSDs into buckets based on their value for that characteristic in the snapshot and plot the failure rate for SSDs in each bucket. For example,if an SSD s is placed in a bucket for N TB of data written,it is not also placed in the bucket for(N−k)TB of data written(even though at some point in its life it did have only(N−k)TB of data written).When performing bucketing,we round the value of an SSD’s characteristic to the nearest bucket and we eliminate buckets that contain less than0.1%of the SSDs analyzed to have a statistically sig-niﬁcant sample of SSDs for our measurements.In order to express the conﬁdence of our observations across buckets that contain diﬀerent numbers of servers,we show the95th per-centile conﬁdence interval for all of our data(using a binomial distribution when considering failure rates).We measure SSD failure rate for each bucket in terms of the fraction of SSDs that have had an uncorrectable error compared to the total number of SSDs in that bucket.3.BASELINE STATISTICSWe willﬁrst focus on the overall error rate and error distri-bution among the SSDs in the platforms we analyze and then examine correlations between diﬀerent failure events.3.1Bit Error RateThe bit error rate(BER)of an SSD is the rate at which er-rors occur relative to the amount of information that is trans-mitted from/to the SSD.BER can be used to gauge the relia-bility of data transmission across a medium.We measure the uncorrectable bit error rate(UBER)from the SSD as:UBER=Uncorrectable errorsBits accessedForﬂash-based SSDs,UBER is an important reliability met-ric that is related to the SSD lifetime.SSDs with high UBERs are expected to have more failed cells and encounter more(se-vere)errors that may potentially go undetected and corruptdata than SSDs with low UBERs.Recent work by Grupp et al.examined the BER of raw MLCﬂash chips(without per-forming error correction in theﬂash controller)in a controlled environment[20].They found the raw BER to vary from 1×10−1for the least reliableﬂash chips down to1×10−8 for the most reliable,with most chips having a BER in the 1×10−6to1×10−8range.Their study did not analyze the eﬀects of the use of chips in SSDs under real workloads andsystem software.Table1shows the UBER of the platforms that we examine. Weﬁnd that forﬂash-based SSDs used in servers,the UBER ranges from2.6×10−9to5.1×10−11.While we expect that the UBER of the SSDs that we measure(which correct small errors,perform wear leveling,and are not at the end of their rated life but still being used in a production environment) should be less than the raw chip BER in Grupp et al.’s study (which did not correct any errors in theﬂash controller,exer-cisedﬂash chips until the end of their rated life,and accessed ﬂash chips in an adversarial manner),weﬁnd that in some cases the BERs were within around one order of magnitude of each other.For example,the UBER of Platform B in our study,2.6×10−9,comes close to the lower end of the raw chip BER range reported in prior work,1×10−8.Thus,we observe that inﬂash-based SSDs that employ error correction for small errors and wear leveling,the UBER ranges from around1/10to1/1000×the raw BER of similarﬂash chips examined in prior work[20].This is likely due to the fact that ourﬂash-based SSDs correct small errors,perform wear leveling,and are not at the end of their rated life.As a result,the error rate we see is smaller than the error rate the previous study observed.As shown by the SSD UBER,the eﬀects of uncorrectable errors are noticeable across the SSDs that we examine.We next turn to understanding how errors are distributed amonga population of SSDs and how failures occur within SSDs.3.2SSD Failure Rate and Error CountFigure2(top)shows the SSD incidence failure rate within each platform–the fraction of SSDs in each platform that have reported at least one uncorrectable error.Weﬁnd that SSD failures are relatively common events with between4.2% and34.1%of the SSDs in the platforms we examine reporting uncorrectable errors.Interestingly,the failure rate is much lower for Platforms C and E despite their comparable amounts of data written and read(cf.Table1).This suggests that there are diﬀerences in the failure process among platforms.We analyze which factors play a role in determining the occurrence of uncorrectable errors in Sections4and5.Figure2(bottom)shows the average yearly uncorrectable error rate among SSDs within the diﬀerent platforms–the sum of errors that occurred on all servers within a platform over12months ending in November2014divided by the num-ber of servers in the platform.The yearly rates of uncor-rectable errors on the SSDs we examined range from15,128 for Platform D to978,806for Platform B.The older Platforms A and B have a higher error rate than the younger Platforms C through F,suggesting that the incidence of uncorrectable errors increases as SSDs are utilized more.We will examine this relationship further in Section4.Platform B has a much higher average yearly rate of un-correctable errors(978,806)compared to the other platforms (the second highest amount,for Platform A,is106,678).We ﬁnd that this is due to a small number of SSDs having amuchA B C D E FSSDfailurerate..2.4.6.81.A B C D E FYearlyuncorrectableerrorsperSSDe+4e+58e+5Figure2:The failure rate(top)and average yearly uncorrectable errors per SSD(bottom),among SSDs within each platform.higher number of errors in that platform:Figure3quantiﬁes the distribution of errors among SSDs in each platform.The x axis is the normalized SSD number within the platform,with SSDs sorted based on the number of errors they have had over their lifetime.The y axis plots the number of errors for a given SSD in log scale.For every platform,we observe that the top 10%of SSDs with the most errors have over80%of all un-correctable errors seen for that platform.For Platforms B,C, E,and F,the distribution is even more skewed,with10%of SSDs with errors making up over95%of all observed uncor-rectable errors.We alsoﬁnd that the distribution of number of errors among SSDs in a platform is similar to that of a Weibull distribution with a shape parameter of0.3and scale parameter of5×103.The solid black line on Figure3plots this distribution.An explanation for the relatively large diﬀerences in errors per machine could be that error events are correlated.Exam-ining the data shows that this is indeed the case:during a recent two weeks,99.8%of the SSDs that had an error during theﬁrst week also had an error during the second week.We therefore conclude that an SSD that has had an error in the past is highly likely to continue to have errors in the future. With this in mind,we next turn to understanding the correla-tion between errors occurring on both devices in the two-SSD systems we examine.3.3Correlations Between Different SSDsGiven that some of the platforms we examine have twoﬂash-based SSDs,we are interested in understanding if the likeli-directly to ﬂash cells),cells are read,programmed,and erased,and unreliable cells are identiﬁed by the SSD controller,re-sulting in an initially high failure rate among devices.Figure 4pictorially illustrates the lifecycle failure pattern we observe,which is quantiﬁed by Figure 5.Figure 5plots how the failure rate of SSDs varies with theamount of data written to the ﬂash cells.We have groupedthe platforms based on the individual capacity of their SSDs.Notice that across most platforms,the failure rate is low whenlittle data is written to ﬂash cells.It then increases at ﬁrst(corresponding to the early detection period,region 1in theﬁgures),and decreases next (corresponding to the early failureperiod,region 2in the ﬁgures).Finally,the error rate gener-ally increases for the remainder of the SSD’s lifetime (corre-sponding to the useful life and wearout periods,region 3inthe ﬁgures).An obvious outlier for this trend is Platform C –in Section 5,we observe that some external characteristics ofthis platform lead to its atypical lifecycle failure pattern.Note that diﬀerent platforms are in diﬀerent stages in theirlifecycle depending on the amount of data written to ﬂashcells.For example,SSDs in Platforms D and F,which havethe least amount of data written to ﬂash cells on average,aremainly in the early detection or early failure periods.On theother hand,SSDs in older Platforms A and B,which havemore data written to ﬂash cells,span all stages of the lifecyclefailure pattern (depicted in Figure 4).For SSDs in PlatformA,we observe up to an 81.7%diﬀerence between the failurerates of SSDs in the early detection period and SSDs in thewearout period of the lifecycle.As explained before and as depicted in Figure 4,the lifecy-cle failure rates we observe with the amount of data writtento ﬂash cells does not follow the conventional bathtub curve.In particular,the new early detection period we observe acrossthe large number of devices leads us to investigate why this“early detection period”behavior exists.Recall that the earlydetection period refers to failure rate increasing early in life-time (i.e.,when a small amount of data is written to the SSD).After the early detection period,failure rate starts decreasing.We hypothesize that this non-monotonic error rate behav-ior during and after the early detection period can be ac-counted for by a two-pool model of ﬂash blocks:one poolof blocks,called the weaker pool ,consists of cells whose errorrate increases much faster than the other pool of blocks,called the stronger pool .The weaker pool quickly generates uncor-rectable errors (leading to increasing failure rates observed in the early detection period as these blocks keep failing).The cells comprising this pool ultimately fail and are taken out of use early by the SSD controller.As the blocks in the weaker pool are exhausted,the overall error rate starts decreasing (as we observe after the end of what we call the early detection period )and it continues to decrease until the more durable blocks in the stronger pool start to wear out due to typical use.We notice a general increase in the duration of the lifecycle periods (in terms of data written)for SSDs with larger capac-ities.For example,while the early detection period ends after around 3TB of data written for 720GB and 1.2TB SSDs (in Platforms A,B,C,and D),it ends after around 10TB of data written for 3.2TB SSDs (in Platforms E and F).Similarly,the early failure period ends after around 15TB of data written for 720GB SSDs (in Platforms A and B),28TB for 1.2TB SSDs (in Platforms C and D),and 75TB for 3.2TB SSDs (in Platforms E and F).This is likely due to the more ﬂexibility a higher-capacity SSD has in reducing wear across a larger number of ﬂash cells.4.2Data Read from Flash Cells Similar to data written ,our framework allows us to measure the amount of data directly read from ﬂash cells over each SSD’s lifetime.Prior works in controlled environments have shown that errors can be induced due to read-based access patterns [5,32,6,8].We are interested in understanding how prevalent this eﬀect is across the SSDs we examine.Figure 6plots how failure rate of SSDs varies with the amount of data read from ﬂash cells.For most platforms (i.e.,A,B,C,and F),the failure rate trends we see in Figure 6are similar to those we observed in Figure 5.We ﬁnd this similar-ity when the average amount of data written to ﬂash cells by SSDs in a platform is more than the average amount of data read from the ﬂash cells.Platforms A,B,C,and F show this behavior.22Though Platform B has a set of SSDs with a large amount of data read,the average amount of data read across all SSDs in this platform is less than the average amount of data written.0e+004e+138e+13Data written (B)0.000.501.00S S D f a i l u r e r a t e ●Platform B0.0e+00 1.0e+14Data written (B)0.000.501.00S S D f a i l u r e r a te ●Platform D0.0e+00 1.5e+14 3.0e+14Data written (B)0.000.501.00S S D f a i l u r e r a t e ●Platform E Platform F Figure 5:SSD failure rate vs.the amount of data written to ﬂash cells.SSDs go through several distinct phases throughout their life:increasing failure rates during early detection of less reliable cells (1),decreasing failure rates during early cell failure and subsequent removal (2),and eventually increasing error rates during cell wearout (3).0.0e+00 1.0e+14 2.0e+14Data read (B)0.000.501.00S S D f a i l u r e r a t e ●Platform B0.0e+00 1.0e+142.0e+14Data read (B)0.000.501.00S S D f a i l u r e r a te ●Platform C Platform D 0.0e+00 1.5e+14Data read (B)0.000.501.00S S D f a i l u r e r a t e ●Platform E Platform F Figure 6:SSD failure rate vs.the amount of data read from ﬂash cells.SSDs in Platform E,where over twice as much data is read from ﬂash cells as written to ﬂash cells,do not show failures dependent on the amount of data read from ﬂash cells.In Platform D,where more data is read from ﬂash cells thanwritten to ﬂash cells (30.6TB versus 18.9TB,on average),wenotice error trends associated with the early detection andearly failure periods.Unfortunately,even though devices inPlatform D may be prime candidates for observing the occur-rence of read disturbance errors (because more data is readfrom ﬂash cells than written to ﬂash cells),the eﬀects of earlydetection and early failure appear to dominate the types oferrors observed on this platform.In Platform E,however,over twice as much data is readfrom ﬂash cells as written to ﬂash cells (51.1TB versus 23.9TBon average).In addition,unlike Platform D,devices in Plat-form E display a diverse amount of utilization.Under theseconditions,we do not observe a statistically signiﬁcant diﬀer-ence in the failure rate between SSDs that have read the mostamount of data versus those that have read the least amountof data.This suggests that the eﬀect of read-induced errors inthe SSDs we examine is not predominant compared to othereﬀects such as write-induced errors.This corroborates priorﬂash cell analysis that showed that most errors occur due toretention eﬀects and not read disturbance [32,11,12].4.3Block ErasesBefore new data can be written to a page in ﬂash mem-ory,the entire block needs to be ﬁrst erased.3Each erasewears out the block as shown in previous works [32,11,12].Our infrastructure tracks the average number of blocks erasedwhen an SSD controller performs garbage collection .Garbagecollection occurs in the background and erases blocks to en-sure that there is enough available space to write new data.The basic steps involved in garbage collection are:(1)select-ing blocks to erase (e.g.,blocks with many unused pages),(2)copying pages from the selected block to a free area,and (3)erasing the selected block [29].While we do not have statisticson how frequently garbage collection is performed,we exam-ine the number of erases per garbage collection period metricas an indicator of the average amount of erasing that occurswithin each SSD.We examined the relationship between the number of erasesperformed versus the failure rate (not plotted).Across the3Flash pages (around 8KB in size)compose each ﬂash block.A ﬂash block is around 128×8KB ﬂash pages.SSDs we examine,we ﬁnd that the failure trends in terms of erases are correlated with the trends in terms of data written (shown in Figure 5).This behavior reﬂects the typical opera-tion of garbage collection in SSD controllers:As more data is written,wear must be leveled across the ﬂash cells within an SSD,requiring more blocks to be erased and more pages to be copied to diﬀerent locations (an eﬀect we analyze next).4.4Page Copies As data is written to ﬂash-based SSDs,pages are copied to free areas during garbage collection in order to free up blocks with unused data to be erased and to more evenly level wear across the ﬂash chips.Our infrastructure measures the num-ber of pages copied across an SSD’s lifetime.We examined the relationship between the number of pages copied versus SSD failure rate (not plotted).Since the page copying process is used to free up space to write data and also to balance the wear due to writes,its operation is largely dic-tated by the amount of data written to an SSD.Accordingly,we observe similar SSD failure rate trends with the number ofcopied pages (not plotted)as we observed with the amount ofdata written to ﬂash cells (shown in Figure 5).4.5Discarded Blocks Blocks are discarded by the SSD controller when they are deemed unreliable for further use.Discarding blocks aﬀects the usable lifetime of a ﬂash-based SSD by reducing the amount of over-provisioned capacity of the SSD.At the same time,dis-carding blocks has the potential to reduce the amount of errors generated by an SSD,by preventing unreliable cells from being accessed.Understanding the reliability eﬀects of discarding blocks is important,as this process is the main defense that SSD controllers have against failed cells and is also a major impediment for SSD lifetime.Figure 7plots the SSD failure rate versus number of dis-carded blocks over the SSD’s lifetime.For SSDs in most plat-forms,we observe an initially decreasing trend in SSD failure rate with respect to the number of blocks discarded followed by an increasing and then ﬁnally decreasing failure rate trend.We attribute the initially high SSD failure rate when few blocks have been discarded to the two-pool model of ﬂash cell failure discussed in Section 4.1.In this case,the weaker pool of ﬂash blocks (corresponding to weak blocks that fail。

国内外各领域顶级学术会议大全

文案大全
实用标准文档
序号 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
刊物名称(以期刊名称的拼音为序) 总被引频次
电子学报（英文版、中文版）
中文版 1676
高技术通讯（英文版、中文版）计算机辅助设计与图形学学报计算机工程计算机集成制造系统 J COMPUT SCI & TECH 计算机学报计算机研究与发展计算数学（英）科学通报（英文版、中文版）模式识别与人工智能软件学报系统仿真学报系统工程理论与实践
数学类第 4 名综合类第 1 名
计算机类第 10 名计算机类第 2 名电子类第 7 名信息类第 7 名信息类第 3 名计算机类第 16 名
综合类第 10 名
计算机类第 7 名计算机类第 6 名
综合类第 11 名
计算机类第 9 名计算机类第 18 名计算机类第 20 名计算机类第 21 名
28 IEEE/ACM Design Automation Conference
国际计算语言学会年会，是本领域最权威的国际学术会议之一，每年举办一次计算语言学会议，也是本领域最权威的国际学术会议之一，两年一次是语音和声学信号处理领域最权威的国际学术会议之一，也是图像、视频信号处理领域的权威会议之一，每年举办一次自然语言处理亚洲联盟主办的国际会议，是自然语言处理领域亚洲区域最有影响的学术会议，基本是每年举办一次
网络测量领域顶级的专业会议
网络测量
下一代互联网研究 rank1
中心
12 ICCV: IEEE International Conference on
Computer Vision
领域顶级国际会议，录取率 20%左右，2 年一次，中国计算机视觉，模式

国内外各领域顶级学术会议大全

ACM的旗舰会议之一，也是网络领域顶级学术会议，内容侧重于有线网络，每年举办一次，录用率约为10%左右。
网络通信领域
下一代互联网研究中心
rank1
2
IEEE INFOCOM: The Conference on Computer Communications
IEEE计算机和通信分会联合年会，由IEEE计算机通信技术委员会和IEEE通信协会联合举办，是信息通信领域规模最大的顶尖国际学术会议，录用率约为16%左右。这个每年一度的会议的主要议题是计算机通信，重点是流量管理和协议。
16
软件学报
1598
0．919
计算机类第2名
17
通信学报
581
0．343
电子类第7名
18
系统仿真学报
867
0．415
信息类第7名
19
系统工程理论与实践
1372
0．533
信息类第3名
20
小型微型计算机系统
746
0．275
计算机类第16名
21
中国科学
E辑 403
E辑 0.444
综合类第10名
22
中国图象图形学报
ECCV: European Conference on Computer Vision
领域顶级国际会议，录取率25%左右，2年一次，中国大陆每年论文数不超过20篇
模式识别，计算机视觉，多媒体计算
数字媒体中心
rank1
15
DCC: Data Compression Conference
领域顶级国际会议，录取率很低，每年一次，目前完全国内论文极少
附4：计算所的限定供博士生选择的相关刊物
序号

2015CCF推荐期刊及会议

中国计算机学会推荐国际学术会议和期刊目录（2015年）中国计算机学会中国计算机学会推荐国际学术期刊（计算机体系结构/并行与分布计算/存储系统）一、A类二、B类三、C类中国计算机学会推荐国际学术会议计算机体系结构/并行与分布计算/存储系统一、A类二、B类三、C类中国计算机学会推荐国际学术期刊（计算机网络）一、A类二、B类三、C类中国计算机学会推荐国际学术会议（计算机网络）一、A类二、B类三、C类中国计算机学会推荐国际学术期刊（网络与信息安全）一、A类二、B类三、C类中国计算机学会推荐国际学术会议（网络与信息安全）一、A类二、B类三、C类中国计算机学会推荐国际学术期刊（软件工程/系统软件/程序设计语言）一、A类二、B类三、C类中国计算机学会推荐国际学术会议（软件工程/系统软件/程序设计语言）一、A类二、B类三、C类中国计算机学会推荐国际学术期刊(数据库/数据挖掘/内容检索)一、A类二、B类三、C类中国计算机学会推荐国际学术会议(数据库／数据挖掘／内容检索)一、A类二、B类三、C类中国计算机学会推荐国际学术期刊（计算机科学理论）一、A类二、B类三、C类中国计算机学会推荐国际学术会议（计算机科学理论）一、A类二、B类三、C类中国计算机学会推荐国际学术期刊（计算机图形学与多媒体）一、A类二、B类三、C类中国计算机学会推荐国际学术会议（计算机图形学与多媒体）一、A类二、B类三、C类中国计算机学会推荐国际学术期刊（人工智能）一、A类。

1、下载文档前请自行甄别文档内容的完整性，平台不提供额外的编辑、内容补充、找答案等附加服务。
2、"仅部分预览"的文档,不可在线预览部分如存在完整性等问题,可反馈申请退款(可完整预览的文档不适用该条件!)。
3、如文档侵犯您的权益，请联系客服反馈,我们会尽快为您处理(人工客服工作时间：9:00-18:30)。

An Online Learning Approach to Improving the Quality ofCrowd-SourcingY ang Liu,Mingyan LiuDepartment of Electrical Engineering and Computer ScienceUniversity of Michigan,Ann ArborEmail:{youngliu,mingyan}@ABSTRACTWe consider a crowd-sourcing problem where in the process of la-beling massive datasets,multiple labelers with unknown annota-tion quality must be selected to perform the labeling task for each incoming data sample or task,with the results aggregated using for example simple or weighted majority voting rule.In this pa-per we approach this labeler selection problem in an online learn-ing framework,whereby the quality of the labeling outcome by a speciﬁc set of labelers is estimated so that the learning algorithm over time learns to use the most effective combinations of labelers. This type of online learning in some sense falls under the family of multi-armed bandit(MAB)problems,but with a distinct feature not commonly seen:since the data is unlabeled to begin with and the labelers’quality is unknown,their labeling outcome(or reward in the MAB context)cannot be directly veriﬁed;it can only be es-timated against the crowd and known probabilistically.We design an efﬁcient online algorithm LS_OL using a simple majority vot-ing rule that can differentiate high-and low-quality labelers over time,and is shown to have a regret(w.r.t.always using the optimal set of labelers)of O(log2T)uniformly in time under mild assump-tions on the collective quality of the crowd,thus regret free in the average sense.We discuss performance improvement by using a more sophisticated majority voting rule,and show how to detec-t andﬁlter out“bad”(dishonest,malicious or very incompetent) labelers to further enhance the quality of crowd-sourcing.Exten-sion to the case when a labeler’s quality is task-type dependent is also discussed using techniques from the literature on continuous arms.We present numerical results using both simulation and a re-al dataset on a set of images labeled by Amazon Mechanic Turks (AMT).Categories and Subject DescriptorsF.1.2[Modes of Computation]:Online computation;I.2.6[Artiﬁcial Intelligence]:Learning;H.2.8[Database Applications]:Data min-ingPermission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for proﬁt or commercial advantage and that copies bear this notice and the full cita-tion on theﬁrst page.Copyrights for components of this work owned by others than ACM must be honored.Abstracting with credit is permitted.To copy otherwise,or re-publish,to post on servers or to redistribute to lists,requires prior speciﬁc permission and/or a fee.Request permissions from permissions@.SIGMETRICS’15,June15–19,2015,Portland,OR,USA.Copyright c 2015ACM978-1-4503-3486-0/15/06...$15.00./10.1145/2745844.2745874.General TermsAlgorithm,Theory,ExperimentationKeywordsCrowd-sourcing,online learning,quality control1.INTRODUCTIONMachine learning techniques often rely on correctly labeled data for purposes such as building classiﬁers;this is particularly true for supervised discriminative learning.As shown in[19,22],the quali-ty of labels can signiﬁcantly impact the quality of the trained classi-ﬁer and in turn the system performance.Semi-supervised learning methods,e.g.[5,15,25]have been proposed to circumvent the need for labeled data or lower the requirement on the size of labeled data; nonetheless,many state-of-the-art machine learning systems such as those used for pattern recognition continue to rely heavily on su-pervised learning,which necessitates cleanly labeled data.At the same time,advances in instrumentation and miniaturization,com-bined with frameworks like participatory sensing,rush in enormous quantities of unlabeled data.Against this backdrop,crowd-sourcing has emerged as a viable and often favored solution as evidenced by the popularity of the A-mazon Mechanical Turk(AMT)system.Prime examples include a number of recent efforts on collecting large-scale labeled image datasets,such as ImageNet[7]and LabelMe[21].The concept of crowd-sourcing has also been studied in contexts other than pro-cessing large amounts of unlabeled data,see e.g.,user-generated map[10],opinion/information diffusion[9],and event monitor-ing[6]in large,decentralized systems.Its many advantages notwithstanding,the biggest problem with crowd-sourcing is quality control:as shown in several previous s-tudies[12,22],if labelers(e.g.,AMTs)are not selected careful-ly the resulting labels can be very noisy,due to reasons such as varying degrees of competence,individual biases,and sometimes irresponsible behavior.At the same time,the cost in having a large amount of data labeled(payment to the labelers)is non-trivial.This makes it important to look into ways of improving the quality of the crowd-sourcing process and the quality of the results generated by the labelers.In this paper we approach the labeler selection problem in an online learning framework,whereby the labeling quality of the la-belers is estimated as tasks are assigned and performed,so that an algorithm over time learns to use the more effective combinations of labelers for arriving tasks.This problem in some sense can be cast as a multi-armed bandit(MAB)problem,see e.g.,[4,16,23]. Within such a framework,the objective is to select the best of a set of choices(or“arms”)by repeatedly sampling different choices(referred to as exploration),and their empirical quality is subse-quently used to control how often a choice is used(referred to as exploitation).However,there are two distinct features that set our problem apart from the existing literature in bandit problems.First-ly,since the data is unlabeled to begin with and the labelers’quality is unknown,a particular choice of labelers leads to unknown quali-ty of their labeling outcome(mapped to the“reward”of selecting a choice in the MAB context).Whereas this reward is assumed to be known instantaneously following a selection in the MAB problem, in our model this remains unknown and at best can only be esti-mated with a certain error probability.This poses signiﬁcant tech-nical challenge compared to a standard MAB problem.Secondly, to avoid having to deal with a combinatorial number of arms,it is desirable to learn and estimate each individual labeler’s quality separately(as opposed to estimating the quality of different combi-nations of labelers).The optimal selection of labelers then depends on individual qualities as well as how the labeling outcome is com-puted using individual labels.In this study we will consider both a simple majority voting rule as well as a weighted majority voting rule and derive the respective optimal selection of labelers given their estimated quality.Due to its online nature,our algorithm can be used in real time, processing tasks as they arrive.Our algorithm thus has the advan-tage of performing quality assessment and adapting to better labeler selections as tasks arrive.This is a desirable feature because gen-erating and processing large datasets can incur signiﬁcant cost and delay,so the ability to improve labeler selection on theﬂy(rather than waiting till the end)can result in substantial cost savings and improvement in processing quality.Below we review the literature most relevant to the study presented in this paper in addition to the MAB literature cited above.Within the context of learning and differentiating labelers’ex-pertise in crowd-sourcing systems,a number of studies have looked into ofﬂine algorithms.For instance,in[8],methods are proposed to eliminate irrelevant users from a set of user-generated dataset;in this case the elimination is done as post-processing to clean up the data since the data has already been labeled by the labelers(tasks have been performed).In[13]an iterative algorithm is proposed that infers labeling quality using a process similar to belief propa-gation and it is shown that label aggregation based on this method outperforms simple majority voting.Another example is the fami-ly of matrix factorization or matrix completion based methods,see e.g.,[24],where labeler selection is implicitly done through the numerical process ofﬁnding the best recommendation for a partic-ipant.Again this is done after the labeling has already been done for all data by all(or almost all)labelers.This type of approaches is more appropriate when used in a recommendation system where data and user-generated labels already exist in large quantities. Recent study[14]has examined the fundamental trade-off be-tween labeling accuracy and redundancy in task assignment in crowd-sourcing systems.In particular,it is shown that a labeling accuracy of1−εfor each task can be achieved with a per-task assignment re-dundancy no more than O(K/q·log(K/ε));thus more redundancy can be traded for more accurate outcome.In[14]the task assign-ment is done in a one-shot fashion(thus non-adaptive)rather than sequentially with each task arrival as considered in our paper,thus the result is more applicable to ofﬂine settings similar to those cited in the previous paragraph.Within online solutions,the concept of active learning has been quite intensively studied,where the labelers are guided to make the labeling process more efﬁcient.Examples include[12],which uses a Bayesian framework to actively assign unlabeled data based on past observations on labeling outcomes,and[18],which uses a probabilistic model to estimate the labelers’expertise.However, most studies on active learning require either an oracle to verify the correctness of theﬁnished tasks which in practice does not exist,or ground-truth feedback from indirect but relevant experiments(see e.g.,[12]).Similarly,existing work on using online learning for task assignment also typically assumes the availability of ground truth(as in MAB problems).For instance,in[11]online learning is applied to sequential task assignment but ground-truth of the task performance is used to estimate the performer’s quality.Our work differs from the above as we do not require an ora-cle or the availability of ground-truth;we instead impose a mild assumption on the collective quality of the crowd(without which crowd-sourcing would be useless and would not have existed).Sec-ondly,our framework allows us to obtain performance bounds on the proposed algorithm in the form of regret with respect to the op-timal strategy that always uses the best set of labelers;this type of performance guarantee is lacking in most of the work cited above. Last but not least,our algorithm is broadly applicable to gener-ic crowd-sourcing task assignments rather than being designed for speciﬁc type of tasks or data.Our main contributions are summarized as follows.1.We design an online learning algorithm to estimate the qual-ity of labelers in a crowd-sourcing setting without ground-truth information but with mild assumptions on the quality of the crowd as a whole,and show that it is able to learn the optimal set of labelers under both simple and weighted ma-jority voting rules and attains no-regret performance guaran-tees(w.r.t.always using the optimal set of labelers).2.We similarly provide regret bounds on the cost of this learn-ing algorithm w.r.t.always using the optimal set of labelers.3.We show how our model and results can be extended to thecase where the quality of a labeler may be task-type depen-dent,as well as a simple procedure to quickly detect andﬁl-ter out“bad”(dishonest,malicious or incompetent)labelers to further enhance the quality of crowd-sourcing.4.Our validation includes both simulation and the use of a real-world AMT dataset.The remainder of the paper is organized as follows.We formu-late our problem in Section2.In Sections3and4we introduce our learning algorithm along with regret analysis under a simple major-ity and weighted majority voting rule,respectively.We extend our model to the case where labelers’expertise may be task-dependent in Section5.Numerical experiments are presented in Section6. Section7concludes the paper.2.PROBLEM FORMULATION AND PRE-LIMINARIES2.1The crowd-sourcing modelWe begin by introducing the following major components of the crowd-sourcing system we consider.er.There is a single user with a sequence of tasks(un-labeled data)to be performed/labeled.Our proposed online learning algorithm is to be employed by the user in making labeler selections.Throughout our discussion the terms task and unlabeled data will be used interchangeably.beler.There are a total of M labelers,each may be select-ed to perform a labeling task for a piece of unlabeled data.The set of labelers is denoted by M ={1,2,...,M }.A la-beler i produces the true label for each assigned task with probability p i independent of the task;a more sophisticated task-dependent version is discussed in Section 5.This will also be referred to as the quality or accuracy of this label-er.We will assume no two labelers are exactly the same,i.e.,p i =p j ,∀i =j and we consider non-trivial cases with 0<p i <1,∀i .These quantities are unknown to the user a pri-ori.We will also assume that the accuracy of the collection oflabelers satisﬁes ¯p :=∑M i =1p i M >12,and that M >log22(¯p −1/2)2.The justiﬁcation and implication of these assumptions are discussed in more detail in Section 2.3.Our learning system works in discrete time steps t =1,2,...,T .At time t ,a task k ∈K arrives to be labeled,where K could be either a ﬁnite or inﬁnite set.For simplicity of presentation,we will assume that a single task arrives at each time,and that the labeling outcome is binary:1or 0;however,both assumptions can be fairly easily relaxed 1.For task k ,the user selects a subset S t ⊆M to label it.The label generated by labeler i ∈S t for data k at time t is denoted by L i (t ).The set of labels {L i (t )}i ∈S t generated by the selected labelers then need to be combined to produce a single label for the data;this is often referred to as the information aggregation phase.Since we have no prior knowledge on the labelers’accuracy,we will ﬁrst apply the simple majority voting rule over the set of labels;later we will also examine a more sophisticated weighted majority voting rule.Mathematically,the majority voting rule at time t leads to the following label output:L ∗(t )=argmax l ∈{0,1}∑i ∈S tI L i(t )=l ,(1)with ties (i.e.,∑i ∈S t I L i (t )=0=∑i ∈S t I L i (t )=1,where I denotes the indicator function)broken randomly.Denote by π(S t )the probability of obtaining the correct label following the simple majority rule above,and we have:π(S t )=∑S :S ⊆S t ,|S |≥⌈|S t |+12⌉∏i ∈S p i ·∏j ∈S t \S(1−p j )Majority wins+∑S :S ⊆S t ,|S |=|S t |2∏i ∈S p i ·∏j ∈S t \S (1−p j )2Ties broken equally likely.(2)Denote by c i a normalized cost/payment per task to labeler i andconsider the following linear cost functionC (S )=∑i ∈Sc i ,S ⊆M .(3)Extensions of our analysis to other forms of cost functions are fea-sible though with more cumbersome notations.DenoteS ∗=argmax S ⊆M π(S ),(4)thus S ∗is the optimal selection of labelers given each individual’saccuracy.We also refer to π(S )as the utility for selecting the set of labelers S and denote it equivalently as U (S ).C (S ∗)will be referred to as the necessary cost per task.In most crowd-sourcing systems the main goal is to obtain high quality labels while the cost accrued is a secondary issue.For completeness,however,we will also analyze the tradeoff between the two.Therefore we shall 1Indeedin our experiment shown later in Section 6,our algorithmis applied to a non-binary multi-label case.adopt two objectives when designing an efﬁcient online algorithm:choosing the best set of labelers while keeping the cost low.It should be noted that one can also combine labeling accuracy and cost to form a single objective function,such as π(S )−C (S ).The resulting analysis is quite similar to and can be easily repro-duced from that presented in this paper .2.2Off-line optimal selection of labelersBefore addressing the learning problem,we will ﬁrst take a look at how to efﬁciently derive the optimal selection S ∗given accuracy probabilities {p i }i ∈M .This will be a crucial step repeatedly in-voked by the learning procedure we develop next,to determine the set of labelers to use given a set of estimated accuracy probabilities.The optimal selection is a function of the values {p i }i ∈M and the aggregation rule used to compute the ﬁnal label.While there is a combinatorial number of possible selections,the next two results combined lead to a very simple and linear-complexity procedure in ﬁnding the optimal S ∗.T HEOREM 1.Under the simple majority vote rule,the optimal number of labelers s ∗=|S ∗|must be an odd number.T HEOREM 2.The optimal set S ∗is monotonic,i.e.,if we have i ∈S ∗and j ∈S ∗then we must have p i >p j .Proofs of the above two theorems can be found in the ing these results,given a set of accuracy probabilities the optimal selection under the majority vote rule consists of the top s ∗(an odd number)labelers with the highest quality;thus we only need to compute s ∗,which has a linear complexity of O (M /2).A set that consists of the highest m labelers will henceforth be referred to as a m-monotonic set ,and denoted as S m ⊆M .2.3The lack of ground truthAs mentioned,a key difference between our model and many other studies on crowd-sourcing as well as the basic framework of MAB problems is that we lack ground-truth in our system;we e-laborate on this below.In a standard MAB setting,when a player (the user in our scenario)selects a set of arms (labelers)to activate,she immediately ﬁnds out the rewards associated with those select-ed arms.This information allows the player to collect statistics on each arm (e.g.,sample mean rewards)which is then used in her future selection decisions.In our scenario,the user sees the labels generated by each selected labeler,but does not know which ones are true.In this sense the user does not ﬁnd out about her reward immediately after a decision;she can only do so probabilistically over a period of time through additional estimation devices.This constitutes the main conceptual and technical difference between our problem and the standard MAB problem.Given the lack of ground-truth,the crowd-sourcing system is on-ly useful if the average labeler is more or less trustworthy.For instance,if a majority of the labelers produce the wrong label most of the time,unbeknownst to the user,then the system is effective-ly useless,i.e.,the user has no way to tell whether she could trust the outcome so she might as well abandon the system.It is there-fore reasonable to have some trustworthiness assumption in place.Accordingly,we have assumed that ¯p =∑M i =1p iM >0.5,i.e.,the av-erage labeling quality is higher than 0.5or a random guess;this is a common assumption in the crowd-sourcing literature (see e.g.,[8]).Note that this is a fairly mild assumption:not all labelers need to have accuracy p i >0.5or near 0.5;some labeler may have arbitrar-ily low quality (∼0)as long as it is in the minority.Denote by X i a binomial random variable with parameter p i to model labeler i ’s outcome on a given task:X i =1if her label iscorrect ing Chernoff Hoeffding’s inequality we haveP(∑M i=1X iM>1/2)=1−P(∑M i=1X iM≤1/2)=1−P(∑M i=1X iM−¯p≤1/2−¯p)≥1−e−2M·(¯p−1/2)2.Deﬁne a min:=P(∑M i=1X iM >1/2);this is the probability that a simplemajority vote over the M labelers is correct.Therefore,if¯p>1/2and further M>log22(¯p−1/2)2,then1−e−2M·(¯p−1/2)2>1/2,meaninga simple majority vote would be correct most of the time.Through-out the paper we will assume both these conditions are true.It also follows that we have:P(∑i∈S∗X is∗>1/2)≥P(∑M i=1X iM>1/2),where the inequality is due to the deﬁnition of the optimal set S∗.3.LEARNING THE OPTIMAL LABELERSELECTIONIn this section we present an online learning algorithm LS_OL that over time learns each labeler’s accuracy,which it then uses to compute an estimated optimal set of labelers using the properties given in the previous section.3.1An online learning algorithm LS_OLThe algorithm consists of two types of time steps,exploration and exploitation,as is common to online learning.However,the de-sign of the exploration step is complicated by the additional estima-tion needs due to the lack of ground-truth revelation.Speciﬁcally, a set of tasks will be designated as“testers”and may be repeated-ly assigned to the same labeler in order to obtain consistent results used for estimating her label quality.This can be done in one of t-wo ways depending on the nature of the tasks.For tasks like survey questions(e.g.those with multiple choices),a labeler may indeed be prompted to answer the same question(or equivalent variants with alternative wording)multiple times,usually not in succession, during the survey process.This is a common technique used by survey designers for quality control by testing whether a participant answers questions randomly or consistently,whether a participant is losing attention over time,and so on,see e.g.,[20].For tasks like labeling images,a labeler may be given identical images repeatedly or each time with small,added iid noise.With the above in mind,the algorithm conceptually proceeds as follows.A condition is checked to determine whether the algorithm should explore or exploit in a given time step.If it is to exploit, then the algorithm selects the best set of labelers based on current quality estimates to label the arriving task.If it is to explore,then the algorithm will either assign an old task(an existing tester)or the new arriving task(which then becomes a tester)to the entire set of labelers M depending on whether all existing testers have been labeled enough number of times.Because of the need to repeatedly assign an old task,some new tasks will not be immediately assigned (those arriving during an exploration step while an old task remains under-labeled).These tasks will simply be given a random label (e.g.,with error probability1/2)but their numbers are limited by the frequency of an exploration step(∼log2T),as we shall detail later.Before proceeding to a more precise description of the algorithm, a few additional notions are in order.Denote the n-th label outcome (via majority vote over M labelers in exploration)for task k by y k(n).Denote by y∗k(N)the label obtained using majority rule over the N label outcomes y k(1),y k(2),···,y k(N):y∗k(N)= 1,∑N n=1I y k(n)=1N>0.50,otherwise,(5)with ties broken randomly.It is this majority label after N tests on a tester task k that will be used to analyze different labeler’s perfor-mance.A tester task is always assigned to all labelers for labeling. Following our earlier assumption that the repeated assignments of the same task use identical and independent variants of the task,we will also take the repeated outcomes y k(1),y k(2),···,y k(N)to be independent and statistically identical.Denote by E(t)the set of tasks assigned to the M labelers dur-ing exploration steps up to time t.For each task k∈E(t)denote byˆN k(t)the number of times k has been assigned.Consider the following random variable deﬁned at each time t:O(t)=I|E(t)|≤D1(t)or∃k∈E(t)s.t.ˆN k(t)≤D2(t),whereD1(t)=1(1max m:m odd m·n(S m)−α)2·ε2·log t,D2(t)=1(a min−0.5)2·log t,and n(S m)is the number of all possible majority subsets(for exam-ple when|S m|=5,n(S m)is the number of all possible subset of size being at least3)of S m,εa bounded constant,andαa positive con-stant such thatα<1max m:m odd m·n(S m).Note that O(t)captures the event when either an insufﬁcient number of tester tasks have been assigned under exploration or some tester task has been assigned insufﬁcient number of times in exploration.Our online algorithm for labeler selection is formally stated in Figure1.The LS_OL algorithm can either go on indeﬁnitely or terminate at some time T.As we show below the performance bound on this algorithm holds uniformly in time so it does not mat-ter when it terminates.3.2Main resultsThe standard metric for evaluating an online algorithm in the MAB literature is regret,the difference between the performance of an algorithm and that of a reference algorithm which often as-sumes foresight or hindsight.The most commonly used is the weak regret measure with respect to the best single-action policy assum-ing a priori knowledge of the underlying statistics.In our problem context,this means to compare our algorithm to the one that always uses the optimal selection S∗.It follows that this weak regret,up to time T,is given byR(T)=T·U(S∗)−T∑t=1U(S t),R C(T)=T∑t=1C(S t)−T·C(S∗),where S t is the selection made at time t by our algorithm;if t hap-pens to be an exploration step then S t=M and U(S t)is either1/2 due to random guess of an arriving task that is not labeled,or a min when a tester task is labeled for theﬁrst time.R(T)and R C(T) respectively capture the regret of the learning algorithm in perfor-Online Labeler Selection:LS_OL1:Initialization at t =0:Initialize the estimated accuracy {˜p i }i ∈M to some value in [0,1];denote the initialization taskas k ,set E (t )={k }and ˆN k (t )=1.2:At time t a new task arrives:If O (t )=1,the algorithm ex-plores.2.1:If there is no task k ∈E (t )such that ˆNk (t )≤D 2(t ),then assign the new task to M and update E (t )to in-clude it and denote it by k ;if there is such a task,randomly select one of them,denoted by k ,to M .ˆNk (t ):=ˆN k (t )+1;obtain the label y k (ˆN k (t ));2.2:Update y ∗k (ˆN k (t ))(using the alternate indicator functionnotation I (·)):y ∗k (ˆN k (t ))=I (∑ˆNk(t )ˆt =1y k(ˆt )ˆNk (t )>0.5).2.3:Update labelers’accuracy estimate ∀i ∈M :˜p i =∑k ∈E (t ),k arrives at time ˆt I (L i (ˆt )=y ∗k (ˆN k (t )))|E (t )|.3:Else if O (t )=0,the algorithm exploits and computes:S t =argmax m ˜U(S m )=argmax S ⊆M ˜π(S ),which is solved using the linear search property,but with thecurrent estimates {˜p i }rather than the true quantities {p i },resulting in estimated utility ˜U()and ˜π().Assign the new task to those in S t .4:Set t =t +1and go to Step 2.Figure 1:Description of LS_OLmance and in cost.Deﬁne:∆max =max S =S∗U (S ∗)−U (S ),δmax =max i =j|p i −p j |,∆min =min S =S∗U (S ∗)−U (S ),δmin =min i =j|p i −p j |.εis a constant such that ε<min {∆min2,δmin2}.For analysis we as-sume U (S i )=U (S j )if i =j 2.Deﬁne the sequence {βn }:βn =∑∞t =11t n .Our main theorem is stated as follows.T HEOREM 3.The regret can be bounded uniformly in time:R (T )≤U (S ∗)(1maxm :m oddm ·n (S m )−α)2·ε2·(a min −0.5)2·log 2(T )+∆max (2M∑m =1m oddm ·n (S m )+M )·(2β2+1α·εβ2−z ),(6)2Thiscan be precisely established when p i =p j ,∀i =j .R C (T )≤∑i /∈S ∗c i(1maxm :m oddm ·n (S m )−α)2·ε2·log T+∑i ∈M c i(1maxm :m oddm ·n (S m )−α)2·ε2·(a min −0.5)2·log 2(T )+(M −|S ∗|)·(2M∑m =1m oddm ·n (S m )+M )·(2β2+1α·εβ2−z ),(7)where 0<z <1is a positive constant.First note that the regret is nearly logarithmic in T and therefore it has zero average regret as T →∞;such an algorithm is often re-ferred to as a zero-regret algorithm.Secondly the regret bound is inversely related to the minimum accuracy of the crowd (through a min ).This is to be expected:with higher accuracy (a larger a min )of the crowd,crowd-sourcing generates ground-truth outputs with higher probability and thus the learning process could be acceler-ated.Finally,the bound also depends on max m m ·n (S m )which isroughly on the order of O (2m√m √2π).3.3Regret analysis of LS_OLWe now outline key steps in the proof of the above theorem.This involves a sequence of lemmas;the proofs of most can be found in the appendices.There are a few that we omit for brevity;in those cases sketches are provided.Step 1:We begin by noting that the regret consists of that arising from the exploration phase and from the exploitation phase,denot-ed by R e (T )and R x (T ),respectively:R (T )=E [R e (T )]+E [R x (T )].The following result bounds the ﬁrst element of the regret.L EMMA 1.The regret up to time T from the exploration phase can be bounded as follows:E [R e (T )]≤U (S ∗)·(D 1(T )·D 2(T )).(8)We see the regret depends on the exploration parameters as product.This is because for tasks arriving in exploration steps,we assign it at least D 2(T )times to the labelers;each time when re-assignment occurs,a new arriving task is given a random label while under an optimal scheme each missed new task means a utility of U (S ∗).Step 2:We now bound the regret arising from the exploitation phase as a function of the number of times the algorithm uses a sub-optimal selection when the ordering of the labelers is correct,and the number of times the estimates of the labelers’accuracy re-sult in a wrong ordering.The proof of the lemma below is omitted as it is fairly straightforward.L EMMA 2.For the regret from exploitation we have:E [R x (T )]≤∆max E [T ∑t =1(E 1(t )+E 2(t ))].(9)Here E 1(t )=I S t =S ∗,conditioned on correct ordering of labelers,counts the event when a sub-optimal section (other than S ∗)was used at time t based on the current estimates {˜p i }.E 2(t )counts the event when at time t the set M is sorted in the wrong order because of erroneous estimates {˜p i }.Step 3:We proceed to bound the two terms in (9)separately.In this part of the analysis we only consider those times t when the algorithm exploits.。

Sigmetrics15

合集下载

清华会议分级

中国计算机学会推荐国际学术会议和期刊目录AB

上海交大计算机系推荐投稿杂志及会议

IEEE会议排名

Mytrix GMS-15 RGB 游戏鼠标垫大辣椒粉说明书

固态硬盘失效率报告

国内外各领域顶级学术会议大全

最新无线网络领域顶级国际期刊、会议档次划分和介绍

国内外各领域顶级学术会议大全

2015CCF推荐期刊及会议

文档推荐

最新文档