Sigmetrics15
Rank 1:
SIGCOMM: ACM Conf on Comm Architectures, Protocols & Apps
INFOCOM: Annual Joint Conf IEEE Comp & Comm Soc
SPAA: Symp on Parallel Algms and Architecture
PODC: ACM Symp on Principles of Distributed Computing
PPoPP: Principles and Practice of Parallel Programming
RTSS: Real Time Systems Symp
SOSP: ACM SIGOPS Symp on OS Principles
OSDI: Usenix Symp on OS Design and Implementation
CCS: ACM Conf on Comp and Communications Security
IEEE Symposium on Security and Privacy
MOBICOM: ACM Intl Conf on Mobile Computing and Networking
USENIX Conf on Internet Tech and Sys
ICNP: Intl Conf on Network Protocols
PACT: Intl Conf on Parallel Arch and Compil Tech
RTAS: IEEE Real-Time and Embedded Technology and Applications Symposium
ICDCS: IEEE Intl Conf on Distributed Comp Systems

Rank 2:
CC: Compiler Construction
IPDPS: Intl Parallel and Dist Processing Symp
IC3N: Intl Conf on Comp Comm and Networks
ICPP: Intl Conf on Parallel Processing
SRDS: Symp on Reliable Distributed Systems
MPPOI: Massively Par Proc Using Opt Interconns
ASAP: Intl Conf on Apps for Specific Array Processors
Euro-Par: European Conf
A Large-Scale Study of Flash Memory Failures in the Field

Justin Meza (Carnegie Mellon University), Qiang Wu (Facebook, Inc.), Sanjeev Kumar (Facebook, Inc.), Onur Mutlu (Carnegie Mellon University)
SIGMETRICS'15, June 15–19, 2015, Portland, OR, USA.

ABSTRACT
Servers use flash memory based solid state drives (SSDs) as a high-performance alternative to hard disk drives to store persistent data. Unfortunately, recent increases in flash density have also brought about decreases in chip-level reliability. In a data center environment, flash-based SSD failures can lead to downtime and, in the worst case, data loss. As a result, it is important to understand flash memory reliability characteristics over flash lifetime in a realistic production data center environment running modern applications and system software.

This paper presents the first large-scale study of flash-based SSD reliability in the field. We analyze data collected across a majority of flash-based solid state drives at Facebook data centers over nearly four years and many millions of operational hours in order to understand failure properties and trends of flash-based SSDs. Our study considers a variety of SSD characteristics, including: the amount of data written to and read from flash chips; how data is mapped within the SSD address space; the amount of data copied, erased, and discarded by the flash controller; and flash board temperature and bus power.

Based on our field analysis of how flash memory errors manifest when running modern workloads on modern SSDs, this paper is the first to make several major observations: (1) SSD failure rates do not increase monotonically with flash chip wear; instead they go through several distinct periods corresponding to how failures emerge and are subsequently detected, (2) the effects of read disturbance errors are not prevalent in the field, (3) sparse logical data layout across an SSD's physical address space (e.g., non-contiguous data), as measured by the amount of metadata required to track logical address translations stored in an SSD-internal DRAM buffer, can greatly affect SSD failure rate, (4) higher temperatures lead to higher failure rates, but techniques that throttle SSD operation appear to greatly reduce the negative reliability impact of higher temperatures, and (5) data written by the operating system to flash-based SSDs does not always accurately indicate the amount of wear induced on flash cells due to optimizations in the SSD controller and buffering employed in the system software. We hope that the findings of this first large-scale flash memory reliability study can inspire others to develop other publicly-available analyses and novel flash reliability solutions.

Categories and Subject Descriptors
B.3.4 [Hardware]: Memory Structures – Reliability, Testing, and Fault-Tolerance

Keywords
flash memory; reliability; warehouse-scale data centers

1. INTRODUCTION
Servers use flash memory for persistent storage due to the low access latency of flash chips compared to hard disk drives.
Historically, flash capacity has lagged behind hard disk drive capacity, limiting the use of flash memory. In the past decade, however, advances in NAND flash memory technology have increased flash capacity by more than 1000×. This rapid increase in flash capacity has brought both an increase in flash memory use and a decrease in flash memory reliability. For example, the number of times that a cell can be reliably programmed and erased before wearing out and failing dropped from 10,000 times for 50 nm cells to only 2,000 times for 20 nm cells [28]. This trend is expected to continue for newer generations of flash memory. Therefore, if we want to improve the operational lifetime and reliability of flash memory-based devices, we must first fully understand their failure characteristics.

In the past, a large body of prior work examined the failure characteristics of flash cells in controlled environments using small numbers (e.g., tens) of raw flash chips (e.g., [36, 23, 21, 27, 22, 25, 16, 33, 14, 5, 18, 4, 24, 40, 41, 26, 31, 30, 37, 6, 11, 10, 7, 13, 9, 8, 12, 20]). This work quantified a variety of flash cell failure modes and formed the basis of the community's understanding of flash cell reliability. Yet prior work was limited in its analysis because these studies: (1) were conducted on small numbers of raw flash chips accessed in adversarial manners over short amounts of time, (2) did not examine failures when using real applications running on modern servers and instead used synthetic access patterns, and (3) did not account for the storage software stack that real applications need to go through to access flash memories. Such conditions assumed in these prior studies are substantially different from those experienced by flash-based SSDs in large-scale installations in the field. In such large-scale systems: (1) real applications access flash-based SSDs in different ways over a time span of years, (2) applications access SSDs via the storage software stack, which employs various amounts of buffering and hence affects the access pattern seen by the flash chips, (3) flash-based SSDs employ aggressive techniques to reduce device wear and to correct errors, (4) factors in platform design, including how many SSDs are present in a node, can affect the access patterns to SSDs, (5) there can be significant variation in reliability due to the existence of a very large number of SSDs and flash chips. All of these real-world conditions present in large-scale [...]

Platform | SSDs | PCIe   | Per-SSD capacity | Age (years) | Data written | Data read | UBER
A        | 1    | v1, ×4 | 720 GB           | 2.4 ± 1.0   | 27.2 TB      | 23.8 TB   | 5.2×10^-10
B        | 2    |        |                  |             | 48.5 TB      | 45.1 TB   | 2.6×10^-9
C        | 1    | v2, ×4 | 1.2 TB           | 1.6 ± 0.9   | 37.8 TB      | 43.4 TB   | 1.5×10^-10
D        | 2    |        |                  |             | 18.9 TB      | 30.6 TB   | 5.7×10^-11
E        | 1    |        | 3.2 TB           | 0.5 ± 0.5   | 23.9 TB      | 51.1 TB   | 5.1×10^-11
F        | 2    |        |                  |             | 14.8 TB      | 18.2 TB   | 1.8×10^-10

Table 1: The platforms examined in our study. PCIe technology is denoted by vX, ×Y, where X = version and Y = number of lanes. Data was collected over the entire age of the SSDs. Data written and data read are to/from the physical storage over an SSD's lifetime. UBER = uncorrectable bit error rate (Section 3.2). Blank cells repeat the value above them.

[...] correction and forwards the result to the host's OS. Errors that are not correctable by the driver result in data loss. (We do not examine the rate of data loss in this work.)

Our SSDs allow us to collect information only on large errors that are uncorrectable by the SSD but correctable by the host. For this reason, our results are in terms of such SSD-uncorrectable but host-correctable errors, and when we refer to uncorrectable errors we mean these types of errors. We refer to the occurrence of such uncorrectable errors in an SSD as an SSD failure.
2.3 The Systems
We examined the majority of flash-based SSDs in Facebook's server fleet, which have operational lifetimes extending over nearly four years and comprising many millions of SSD-days of usage. Data was collected over the lifetime of the SSDs. We found it useful to separate the SSDs based on the type of platform an SSD was used in. We define a platform as a combination of the device capacity used in the system, the PCIe technology used, and the number of SSDs in the system. Table 1 characterizes the platforms we examine in our study.

We examined a range of high-capacity multi-level cell (MLC) flash-based SSDs with capacities of 720 GB, 1.2 TB, and 3.2 TB. The technologies we examined spanned two generations of PCIe, versions 1 and 2. Some of the systems in the fleet employed one SSD while others employed two. Platforms A and B contain SSDs with around two or more years of operation and represent 16.6% of the SSDs examined; Platforms C and D contain SSDs with around one to two years of operation and represent 50.6% of the SSDs examined; and Platforms E and F contain SSDs with around half a year of operation and represent around 22.8% of the SSDs examined.

2.4 Measurement Methodology
The flash devices in our fleet contain registers that keep track of statistics related to SSD operation (e.g., number of bytes read, number of bytes written, number of uncorrectable errors). These registers are similar to, but distinct from, the standardized SMART data stored by some SSDs to monitor their reliability characteristics [3]. The values in these registers can be queried by the host machine. We use a collector script to retrieve the raw values from the SSD and parse them into a form that can be curated in a Hive [38] table. This process is done in real time every hour.

The scale of the systems we analyzed and the amount of data being collected posed challenges for analysis. To process the many millions of SSD-days of information, we used a cluster of machines to perform a parallelized aggregation of the data stored in Hive using MapReduce jobs in order to get a set of lifetime statistics for each of the SSDs we analyzed. We then processed this summary data in R [2] to collect our results.

2.5 Analytical Methodology
Our infrastructure allows us to examine a snapshot of SSD data at a point in time (i.e., our infrastructure does not store timeseries information for the many SSDs in the fleet). This limits our analysis to the behavior of the fleet of SSDs at a point in time (for our study, we focus on a snapshot of SSD data taken during November 2014). Fortunately, the number of SSDs we examine and the diversity of their characteristics allow us to examine how reliability trends change with various characteristics. When we analyze an SSD characteristic (e.g., the amount of data written to an SSD), we group SSDs into buckets based on their value for that characteristic in the snapshot and plot the failure rate for SSDs in each bucket. For example, if an SSD s is placed in a bucket for N TB of data written, it is not also placed in the bucket for (N−k) TB of data written (even though at some point in its life it did have only (N−k) TB of data written). When performing bucketing, we round the value of an SSD's characteristic to the nearest bucket and we eliminate buckets that contain less than 0.1% of the SSDs analyzed to have a statistically significant sample of SSDs for our measurements. In order to express the confidence of our observations across buckets that contain different numbers of servers, we show the 95th percentile confidence interval for all of our data (using a binomial distribution when considering failure rates). We measure SSD failure rate for each bucket in terms of the fraction of SSDs that have had an uncorrectable error compared to the total number of SSDs in that bucket.
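The paper does not include its analysis scripts, but the bucketing procedure above is straightforward to express in code. The sketch below is a minimal Python rendering of it; the record field names, the bucket width, and the use of a normal-approximation binomial interval for the 95% confidence bounds are our own illustrative assumptions, not details taken from the paper.

# Sketch of the per-bucket failure-rate analysis described in Section 2.5.
import math
from collections import defaultdict

def bucket_failure_rates(ssds, bucket_width_tb=10.0, min_fraction=0.001):
    """ssds: list of dicts like {"data_written_tb": 37.8, "failed": True}."""
    buckets = defaultdict(list)
    for ssd in ssds:
        # Round the characteristic (here: data written) to the nearest bucket.
        b = round(ssd["data_written_tb"] / bucket_width_tb) * bucket_width_tb
        buckets[b].append(ssd["failed"])

    results = []
    for b, flags in sorted(buckets.items()):
        n = len(flags)
        if n < min_fraction * len(ssds):
            continue                      # drop statistically insignificant buckets
        k = sum(flags)                    # SSDs with at least one uncorrectable error
        rate = k / n
        # 95% binomial confidence interval (normal approximation).
        half = 1.96 * math.sqrt(rate * (1.0 - rate) / n)
        results.append((b, rate, max(0.0, rate - half), min(1.0, rate + half)))
    return results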
3. BASELINE STATISTICS
We will first focus on the overall error rate and error distribution among the SSDs in the platforms we analyze and then examine correlations between different failure events.

3.1 Bit Error Rate
The bit error rate (BER) of an SSD is the rate at which errors occur relative to the amount of information that is transmitted from/to the SSD. BER can be used to gauge the reliability of data transmission across a medium. We measure the uncorrectable bit error rate (UBER) from the SSD as:

$$\mathrm{UBER} = \frac{\text{Uncorrectable errors}}{\text{Bits accessed}}$$

For flash-based SSDs, UBER is an important reliability metric that is related to the SSD lifetime. SSDs with high UBERs are expected to have more failed cells and encounter more (severe) errors that may potentially go undetected and corrupt data than SSDs with low UBERs. Recent work by Grupp et al. examined the BER of raw MLC flash chips (without performing error correction in the flash controller) in a controlled environment [20]. They found the raw BER to vary from 1×10^-1 for the least reliable flash chips down to 1×10^-8 for the most reliable, with most chips having a BER in the 1×10^-6 to 1×10^-8 range. Their study did not analyze the effects of the use of chips in SSDs under real workloads and system software. Table 1 shows the UBER of the platforms that we examine.
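As a concrete reading of the UBER definition, the following small function computes it from lifetime counters of the kind the collector gathers. The counter names are hypothetical, and treating "bits accessed" as the sum of bits read and written is our assumption rather than a detail stated in the paper.

# Illustrative UBER computation from lifetime SSD counters (names are assumed).
def uber(uncorrectable_errors: int, bytes_read: int, bytes_written: int) -> float:
    """UBER = uncorrectable errors / bits accessed."""
    bits_accessed = 8 * (bytes_read + bytes_written)
    return uncorrectable_errors / bits_accessed if bits_accessed else 0.0

# Made-up counts, roughly at the scale of the platforms in Table 1.
print(uber(uncorrectable_errors=1_000_000,
           bytes_read=int(45e12), bytes_written=int(48e12)))  # ~1.3e-09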
We find that for flash-based SSDs used in servers, the UBER ranges from 2.6×10^-9 to 5.1×10^-11. While we expect that the UBER of the SSDs that we measure (which correct small errors, perform wear leveling, and are not at the end of their rated life but still being used in a production environment) should be less than the raw chip BER in Grupp et al.'s study (which did not correct any errors in the flash controller, exercised flash chips until the end of their rated life, and accessed flash chips in an adversarial manner), we find that in some cases the BERs were within around one order of magnitude of each other. For example, the UBER of Platform B in our study, 2.6×10^-9, comes close to the lower end of the raw chip BER range reported in prior work, 1×10^-8. Thus, we observe that in flash-based SSDs that employ error correction for small errors and wear leveling, the UBER ranges from around 1/10 to 1/1000× the raw BER of similar flash chips examined in prior work [20]. This is likely due to the fact that our flash-based SSDs correct small errors, perform wear leveling, and are not at the end of their rated life. As a result, the error rate we see is smaller than the error rate the previous study observed.

As shown by the SSD UBER, the effects of uncorrectable errors are noticeable across the SSDs that we examine. We next turn to understanding how errors are distributed among a population of SSDs and how failures occur within SSDs.

3.2 SSD Failure Rate and Error Count
Figure 2 (top) shows the SSD incidence failure rate within each platform – the fraction of SSDs in each platform that have reported at least one uncorrectable error. We find that SSD failures are relatively common events, with between 4.2% and 34.1% of the SSDs in the platforms we examine reporting uncorrectable errors. Interestingly, the failure rate is much lower for Platforms C and E despite their comparable amounts of data written and read (cf. Table 1). This suggests that there are differences in the failure process among platforms. We analyze which factors play a role in determining the occurrence of uncorrectable errors in Sections 4 and 5.

Figure 2 (bottom) shows the average yearly uncorrectable error rate among SSDs within the different platforms – the sum of errors that occurred on all servers within a platform over 12 months ending in November 2014 divided by the number of servers in the platform. The yearly rates of uncorrectable errors on the SSDs we examined range from 15,128 for Platform D to 978,806 for Platform B. The older Platforms A and B have a higher error rate than the younger Platforms C through F, suggesting that the incidence of uncorrectable errors increases as SSDs are utilized more. We will examine this relationship further in Section 4.

[Figure 2: The failure rate (top) and average yearly uncorrectable errors per SSD (bottom), among SSDs within each platform.]

Platform B has a much higher average yearly rate of uncorrectable errors (978,806) compared to the other platforms (the second highest amount, for Platform A, is 106,678). We find that this is due to a small number of SSDs having a much higher number of errors in that platform: Figure 3 quantifies the distribution of errors among SSDs in each platform. The x axis is the normalized SSD number within the platform, with SSDs sorted based on the number of errors they have had over their lifetime. The y axis plots the number of errors for a given SSD in log scale. For every platform, we observe that the top 10% of SSDs with the most errors have over 80% of all uncorrectable errors seen for that platform. For Platforms B, C, E, and F, the distribution is even more skewed, with 10% of SSDs with errors making up over 95% of all observed uncorrectable errors. We also find that the distribution of the number of errors among SSDs in a platform is similar to that of a Weibull distribution with a shape parameter of 0.3 and a scale parameter of 5×10^3. The solid black line on Figure 3 plots this distribution.
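The Weibull comparison can be sketched in a few lines of SciPy. The per-SSD error counts below are synthetic stand-ins generated from the same distribution, so this only illustrates the mechanics of the check; the shape (0.3) and scale (5×10^3) are the parameters quoted in the text, and everything else is an assumption for illustration.

# Sketch: compare an empirical distribution of per-SSD error counts against
# a Weibull distribution with shape 0.3 and scale 5e3 (parameters from the text).
import numpy as np
from scipy.stats import weibull_min

rng = np.random.default_rng(0)
errors_per_ssd = np.sort(rng.weibull(0.3, size=10_000) * 5e3)  # stand-in data

dist = weibull_min(c=0.3, scale=5e3)
empirical_cdf = np.arange(1, errors_per_ssd.size + 1) / errors_per_ssd.size
model_cdf = dist.cdf(errors_per_ssd)

# Kolmogorov-Smirnov-style distance between the model and the data.
print("max CDF gap:", np.max(np.abs(empirical_cdf - model_cdf)))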
An explanation for the relatively large differences in errors per machine could be that error events are correlated. Examining the data shows that this is indeed the case: during a recent two weeks, 99.8% of the SSDs that had an error during the first week also had an error during the second week. We therefore conclude that an SSD that has had an error in the past is highly likely to continue to have errors in the future. With this in mind, we next turn to understanding the correlation between errors occurring on both devices in the two-SSD systems we examine.

3.3 Correlations Between Different SSDs
Given that some of the platforms we examine have two flash-based SSDs, we are interested in understanding if the likelihood [...]

[...] directly to flash cells), cells are read, programmed, and erased, and unreliable cells are identified by the SSD controller, resulting in an initially high failure rate among devices. Figure 4 pictorially illustrates the lifecycle failure pattern we observe, which is quantified by Figure 5.

Figure 5 plots how the failure rate of SSDs varies with the amount of data written to the flash cells. We have grouped the platforms based on the individual capacity of their SSDs. Notice that across most platforms, the failure rate is low when little data is written to flash cells. It then increases at first (corresponding to the early detection period, region 1 in the figures), and decreases next (corresponding to the early failure period, region 2 in the figures). Finally, the error rate generally increases for the remainder of the SSD's lifetime (corresponding to the useful life and wearout periods, region 3 in the figures). An obvious outlier for this trend is Platform C – in Section 5, we observe that some external characteristics of this platform lead to its atypical lifecycle failure pattern.

Note that different platforms are in different stages in their lifecycle depending on the amount of data written to flash cells. For example, SSDs in Platforms D and F, which have the least amount of data written to flash cells on average, are mainly in the early detection or early failure periods. On the other hand, SSDs in older Platforms A and B, which have more data written to flash cells, span all stages of the lifecycle failure pattern (depicted in Figure 4). For SSDs in Platform A, we observe up to an 81.7% difference between the failure rates of SSDs in the early detection period and SSDs in the wearout period of the lifecycle.

As explained before and as depicted in Figure 4, the lifecycle failure rates we observe with the amount of data written to flash cells do not follow the conventional bathtub curve. In particular, the new early detection period we observe across the large number of devices leads us to investigate why this "early detection period" behavior exists. Recall that the early detection period refers to the failure rate increasing early in lifetime (i.e., when a small amount of data is written to the SSD). After the early detection period, the failure rate starts decreasing.

We hypothesize that this non-monotonic error rate behavior during and after the early detection period can be accounted for by a two-pool model of flash blocks: one pool of blocks, called the weaker pool, consists of cells whose error rate increases much faster than the other pool of blocks, called the stronger pool. The weaker pool quickly generates uncorrectable errors (leading to increasing failure rates observed in the early detection period as these blocks keep failing). The cells comprising this pool ultimately fail and are taken out of use early by the SSD controller. As the blocks in the weaker pool are exhausted, the overall error rate starts decreasing (as we observe after the end of what we call the early detection period) and it continues to decrease until the more durable blocks in the stronger pool start to wear out due to typical use.
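The two-pool hypothesis is easy to illustrate with a toy simulation: a small pool of weak blocks whose error probability grows quickly and which are retired when they fail, alongside a larger pool of strong blocks that wears out slowly. All pool sizes and growth rates below are arbitrary illustrative numbers, not measurements from the fleet.

# Toy simulation of the two-pool model described above.
import random

def simulate(writes=200, n_weak=50, n_strong=950):
    blocks = [{"pool": "weak", "wear": 0} for _ in range(n_weak)] + \
             [{"pool": "strong", "wear": 0} for _ in range(n_strong)]
    failure_rate = []
    for step in range(writes):
        errors = 0
        for b in blocks:
            if b.get("retired"):
                continue
            b["wear"] += 1
            growth = 0.004 if b["pool"] == "weak" else 0.00002  # arbitrary rates
            if random.random() < growth * b["wear"]:
                errors += 1
                if b["pool"] == "weak":
                    b["retired"] = True   # controller takes weak blocks out of use
        active = sum(1 for b in blocks if not b.get("retired"))
        failure_rate.append(errors / active)
    # Qualitatively: rises, then falls as the weak pool is exhausted,
    # then rises again as the strong blocks wear out.
    return failure_rate

print(simulate()[:10])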
We notice a general increase in the duration of the lifecycle periods (in terms of data written) for SSDs with larger capacities. For example, while the early detection period ends after around 3 TB of data written for 720 GB and 1.2 TB SSDs (in Platforms A, B, C, and D), it ends after around 10 TB of data written for 3.2 TB SSDs (in Platforms E and F). Similarly, the early failure period ends after around 15 TB of data written for 720 GB SSDs (in Platforms A and B), 28 TB for 1.2 TB SSDs (in Platforms C and D), and 75 TB for 3.2 TB SSDs (in Platforms E and F). This is likely due to the greater flexibility a higher-capacity SSD has in reducing wear across a larger number of flash cells.

4.2 Data Read from Flash Cells
Similar to data written, our framework allows us to measure the amount of data directly read from flash cells over each SSD's lifetime. Prior works in controlled environments have shown that errors can be induced due to read-based access patterns [5, 32, 6, 8]. We are interested in understanding how prevalent this effect is across the SSDs we examine.

Figure 6 plots how the failure rate of SSDs varies with the amount of data read from flash cells. For most platforms (i.e., A, B, C, and F), the failure rate trends we see in Figure 6 are similar to those we observed in Figure 5. We find this similarity when the average amount of data written to flash cells by SSDs in a platform is more than the average amount of data read from the flash cells. Platforms A, B, C, and F show this behavior. (Though Platform B has a set of SSDs with a large amount of data read, the average amount of data read across all SSDs in this platform is less than the average amount of data written.)

[Figure 5: SSD failure rate vs. the amount of data written to flash cells. SSDs go through several distinct phases throughout their life: increasing failure rates during early detection of less reliable cells (1), decreasing failure rates during early cell failure and subsequent removal (2), and eventually increasing error rates during cell wearout (3).]

[Figure 6: SSD failure rate vs. the amount of data read from flash cells. SSDs in Platform E, where over twice as much data is read from flash cells as written to flash cells, do not show failures dependent on the amount of data read from flash cells.]
In Platform D, where more data is read from flash cells than written to flash cells (30.6 TB versus 18.9 TB, on average), we notice error trends associated with the early detection and early failure periods. Unfortunately, even though devices in Platform D may be prime candidates for observing the occurrence of read disturbance errors (because more data is read from flash cells than written to flash cells), the effects of early detection and early failure appear to dominate the types of errors observed on this platform.

In Platform E, however, over twice as much data is read from flash cells as written to flash cells (51.1 TB versus 23.9 TB on average). In addition, unlike Platform D, devices in Platform E display a diverse amount of utilization. Under these conditions, we do not observe a statistically significant difference in the failure rate between SSDs that have read the most amount of data versus those that have read the least amount of data. This suggests that the effect of read-induced errors in the SSDs we examine is not predominant compared to other effects such as write-induced errors. This corroborates prior flash cell analysis that showed that most errors occur due to retention effects and not read disturbance [32, 11, 12].

4.3 Block Erases
Before new data can be written to a page in flash memory, the entire block needs to be erased first. (Flash pages, around 8 KB in size, compose each flash block; a flash block is around 128 × 8 KB flash pages.) Each erase wears out the block, as shown in previous works [32, 11, 12]. Our infrastructure tracks the average number of blocks erased when an SSD controller performs garbage collection. Garbage collection occurs in the background and erases blocks to ensure that there is enough available space to write new data. The basic steps involved in garbage collection are: (1) selecting blocks to erase (e.g., blocks with many unused pages), (2) copying pages from the selected block to a free area, and (3) erasing the selected block [29]. While we do not have statistics on how frequently garbage collection is performed, we examine the number of erases per garbage collection period metric as an indicator of the average amount of erasing that occurs within each SSD.
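To make the three garbage-collection steps above concrete, here is a highly simplified sketch over an abstract block model. The greedy victim-selection policy, the data structures, and the example values are illustrative choices on our part; real SSD controllers use considerably more elaborate policies.

# Minimal sketch of garbage collection: (1) pick a victim block,
# (2) copy its live pages to a free area, (3) erase the victim.
class Block:
    def __init__(self):
        self.live_pages = []      # payloads of still-valid pages
        self.erase_count = 0

def garbage_collect(blocks, free_block):
    # (1) Select the block with the fewest live (valid) pages.
    victim = min(blocks, key=lambda b: len(b.live_pages))
    # (2) Copy its live pages to a free area.
    free_block.live_pages.extend(victim.live_pages)
    copied = len(victim.live_pages)
    # (3) Erase the selected block, which wears it out a little more.
    victim.live_pages.clear()
    victim.erase_count += 1
    return copied, victim

blocks = [Block(), Block(), Block()]
blocks[0].live_pages = ["a", "b"]
blocks[1].live_pages = ["c"]
blocks[2].live_pages = ["d", "e", "f"]
copied, victim = garbage_collect(blocks, free_block=Block())
print(copied, victim.erase_count)  # -> 1 1: the block with one live page was erased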
We examined the relationship between the number of erases performed versus the failure rate (not plotted). Across the SSDs we examine, we find that the failure trends in terms of erases are correlated with the trends in terms of data written (shown in Figure 5). This behavior reflects the typical operation of garbage collection in SSD controllers: as more data is written, wear must be leveled across the flash cells within an SSD, requiring more blocks to be erased and more pages to be copied to different locations (an effect we analyze next).

4.4 Page Copies
As data is written to flash-based SSDs, pages are copied to free areas during garbage collection in order to free up blocks with unused data to be erased and to more evenly level wear across the flash chips. Our infrastructure measures the number of pages copied across an SSD's lifetime.

We examined the relationship between the number of pages copied versus SSD failure rate (not plotted). Since the page copying process is used to free up space to write data and also to balance the wear due to writes, its operation is largely dictated by the amount of data written to an SSD. Accordingly, we observe similar SSD failure rate trends with the number of copied pages (not plotted) as we observed with the amount of data written to flash cells (shown in Figure 5).

4.5 Discarded Blocks
Blocks are discarded by the SSD controller when they are deemed unreliable for further use. Discarding blocks affects the usable lifetime of a flash-based SSD by reducing the amount of over-provisioned capacity of the SSD. At the same time, discarding blocks has the potential to reduce the amount of errors generated by an SSD, by preventing unreliable cells from being accessed. Understanding the reliability effects of discarding blocks is important, as this process is the main defense that SSD controllers have against failed cells and is also a major impediment for SSD lifetime.

Figure 7 plots the SSD failure rate versus the number of discarded blocks over the SSD's lifetime. For SSDs in most platforms, we observe an initially decreasing trend in SSD failure rate with respect to the number of blocks discarded, followed by an increasing and then finally decreasing failure rate trend. We attribute the initially high SSD failure rate when few blocks have been discarded to the two-pool model of flash cell failure discussed in Section 4.1. In this case, the weaker pool of flash blocks (corresponding to weak blocks that fail [...]
Tier classification and introduction of top international journals and conferences in wireless networking

Wireless networking is an interdisciplinary area between computer science (CS) and electrical engineering (EE). Its conferences and journals usually do not have the kind of quantitative impact-factor rankings found in fields such as physics or biology. One point, however, is widely accepted: conferences carry far more influence than journals. There is also no quantitative comparison among conferences, only some unwritten tier divisions that people tacitly agree on. During my master's and PhD studies, my group's rough tier classification of conferences and journals was as follows (with some personal analysis and commentary mixed in).

Tier 1: SIGCOMM, MobiCom
SIGCOMM is a long-established ACM conference whose premier status is beyond question. Its acceptance rate is around 10%, and the accepted papers lean toward systems and protocol design, with very few theory papers. Moreover, of the twenty to thirty papers accepted each year, only about one fifth are on wireless networking, but the average quality and impact of those papers are very high. MobiCom focuses on wireless networking, accepting about 30 papers per year with an acceptance rate of 10%–15%. Before 2005, MobiCom papers were dominated by theory plus simulation, so one person with one computer was enough to produce a paper. Since then, with the opening of WiFi hardware interfaces and the rise of software-defined radio platforms, experiments have become much easier to run, so the bar for experimental validation has kept rising and the accepted papers have shifted toward system design and measurement. In 2010, a MobiCom committee member lamented: "MobiCom is becoming highly imbalanced. The systems guys clearly dominate the entire conference. Most of them don't have the patience to read theory papers… If you don't have any running code, it's very hard to get in."

Tier 2: MobiHoc, MobiSys, SIGMETRICS
MobiHoc and MobiSys can be described as two brothers standing side by side just below MobiCom. MobiHoc leans toward theory, while MobiSys leans toward systems and applications (especially mobile computing systems such as smartphones).
China Computer Federation (CCF) Recommended International Conferences and Journals (2015)
The CCF catalog lists Class A, Class B, and Class C international journals and conferences for each of the following areas: computer architecture / parallel and distributed computing / storage systems; computer networks; network and information security; software engineering / system software / programming languages; databases / data mining / content retrieval; theoretical computer science; computer graphics and multimedia; and artificial intelligence.
An Online Learning Approach to Improving the Quality of Crowd-Sourcing

Yang Liu, Mingyan Liu (Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor)
SIGMETRICS'15, June 15–19, 2015, Portland, OR, USA.

ABSTRACT
We consider a crowd-sourcing problem where, in the process of labeling massive datasets, multiple labelers with unknown annotation quality must be selected to perform the labeling task for each incoming data sample or task, with the results aggregated using, for example, a simple or weighted majority voting rule. In this paper we approach this labeler selection problem in an online learning framework, whereby the quality of the labeling outcome by a specific set of labelers is estimated so that the learning algorithm over time learns to use the most effective combinations of labelers. This type of online learning in some sense falls under the family of multi-armed bandit (MAB) problems, but with a distinct feature not commonly seen: since the data is unlabeled to begin with and the labelers' quality is unknown, their labeling outcome (or reward in the MAB context) cannot be directly verified; it can only be estimated against the crowd and known probabilistically. We design an efficient online algorithm LS_OL using a simple majority voting rule that can differentiate high- and low-quality labelers over time, and is shown to have a regret (w.r.t. always using the optimal set of labelers) of O(log² T) uniformly in time under mild assumptions on the collective quality of the crowd, thus regret free in the average sense. We discuss performance improvement by using a more sophisticated majority voting rule, and show how to detect and filter out "bad" (dishonest, malicious or very incompetent) labelers to further enhance the quality of crowd-sourcing. Extension to the case when a labeler's quality is task-type dependent is also discussed using techniques from the literature on continuous arms. We present numerical results using both simulation and a real dataset on a set of images labeled by Amazon Mechanical Turks (AMT).

Categories and Subject Descriptors
F.1.2 [Modes of Computation]: Online computation; I.2.6 [Artificial Intelligence]: Learning; H.2.8 [Database Applications]: Data mining

General Terms
Algorithm, Theory, Experimentation

Keywords
Crowd-sourcing, online learning, quality control

1. INTRODUCTION
Machine learning techniques often rely on correctly labeled data for purposes such as building classifiers; this is particularly true for supervised discriminative learning. As shown in [19, 22], the quality of labels can significantly impact the quality of the trained classifier and in turn the system performance. Semi-supervised learning methods, e.g. [5, 15, 25], have been proposed to circumvent the need for labeled data or lower the requirement on the size of labeled data; nonetheless, many state-of-the-art machine learning systems such as those used for pattern recognition continue to rely heavily on supervised learning, which necessitates cleanly labeled data.
At the same time, advances in instrumentation and miniaturization, combined with frameworks like participatory sensing, rush in enormous quantities of unlabeled data. Against this backdrop, crowd-sourcing has emerged as a viable and often favored solution, as evidenced by the popularity of the Amazon Mechanical Turk (AMT) system. Prime examples include a number of recent efforts on collecting large-scale labeled image datasets, such as ImageNet [7] and LabelMe [21]. The concept of crowd-sourcing has also been studied in contexts other than processing large amounts of unlabeled data, see e.g., user-generated map [10], opinion/information diffusion [9], and event monitoring [6] in large, decentralized systems.

Its many advantages notwithstanding, the biggest problem with crowd-sourcing is quality control: as shown in several previous studies [12, 22], if labelers (e.g., AMTs) are not selected carefully the resulting labels can be very noisy, due to reasons such as varying degrees of competence, individual biases, and sometimes irresponsible behavior. At the same time, the cost in having a large amount of data labeled (payment to the labelers) is non-trivial. This makes it important to look into ways of improving the quality of the crowd-sourcing process and the quality of the results generated by the labelers.

In this paper we approach the labeler selection problem in an online learning framework, whereby the labeling quality of the labelers is estimated as tasks are assigned and performed, so that an algorithm over time learns to use the more effective combinations of labelers for arriving tasks. This problem in some sense can be cast as a multi-armed bandit (MAB) problem, see e.g., [4, 16, 23].
Within such a framework, the objective is to select the best of a set of choices (or "arms") by repeatedly sampling different choices (referred to as exploration), and their empirical quality is subsequently used to control how often a choice is used (referred to as exploitation). However, there are two distinct features that set our problem apart from the existing literature on bandit problems. Firstly, since the data is unlabeled to begin with and the labelers' quality is unknown, a particular choice of labelers leads to unknown quality of their labeling outcome (mapped to the "reward" of selecting a choice in the MAB context). Whereas this reward is assumed to be known instantaneously following a selection in the MAB problem, in our model this remains unknown and at best can only be estimated with a certain error probability. This poses a significant technical challenge compared to a standard MAB problem. Secondly, to avoid having to deal with a combinatorial number of arms, it is desirable to learn and estimate each individual labeler's quality separately (as opposed to estimating the quality of different combinations of labelers). The optimal selection of labelers then depends on individual qualities as well as how the labeling outcome is computed using individual labels. In this study we will consider both a simple majority voting rule as well as a weighted majority voting rule and derive the respective optimal selection of labelers given their estimated quality.

Due to its online nature, our algorithm can be used in real time, processing tasks as they arrive. Our algorithm thus has the advantage of performing quality assessment and adapting to better labeler selections as tasks arrive. This is a desirable feature because generating and processing large datasets can incur significant cost and delay, so the ability to improve labeler selection on the fly (rather than waiting till the end) can result in substantial cost savings and improvement in processing quality. Below we review the literature most relevant to the study presented in this paper in addition to the MAB literature cited above.

Within the context of learning and differentiating labelers' expertise in crowd-sourcing systems, a number of studies have looked into offline algorithms. For instance, in [8], methods are proposed to eliminate irrelevant users from a set of user-generated data; in this case the elimination is done as post-processing to clean up the data since the data has already been labeled by the labelers (tasks have been performed). In [13] an iterative algorithm is proposed that infers labeling quality using a process similar to belief propagation, and it is shown that label aggregation based on this method outperforms simple majority voting. Another example is the family of matrix factorization or matrix completion based methods, see e.g., [24], where labeler selection is implicitly done through the numerical process of finding the best recommendation for a participant. Again this is done after the labeling has already been done for all data by all (or almost all) labelers. This type of approach is more appropriate when used in a recommendation system where data and user-generated labels already exist in large quantities.
A recent study [14] has examined the fundamental trade-off between labeling accuracy and redundancy in task assignment in crowd-sourcing systems. In particular, it is shown that a labeling accuracy of 1−ε for each task can be achieved with a per-task assignment redundancy no more than O(K/q · log(K/ε)); thus more redundancy can be traded for a more accurate outcome. In [14] the task assignment is done in a one-shot fashion (thus non-adaptive) rather than sequentially with each task arrival as considered in our paper, thus the result is more applicable to offline settings similar to those cited in the previous paragraph.

Within online solutions, the concept of active learning has been quite intensively studied, where the labelers are guided to make the labeling process more efficient. Examples include [12], which uses a Bayesian framework to actively assign unlabeled data based on past observations on labeling outcomes, and [18], which uses a probabilistic model to estimate the labelers' expertise. However, most studies on active learning require either an oracle to verify the correctness of the finished tasks, which in practice does not exist, or ground-truth feedback from indirect but relevant experiments (see e.g., [12]). Similarly, existing work on using online learning for task assignment also typically assumes the availability of ground truth (as in MAB problems). For instance, in [11] online learning is applied to sequential task assignment, but ground-truth of the task performance is used to estimate the performer's quality.

Our work differs from the above as we do not require an oracle or the availability of ground-truth; we instead impose a mild assumption on the collective quality of the crowd (without which crowd-sourcing would be useless and would not have existed). Secondly, our framework allows us to obtain performance bounds on the proposed algorithm in the form of regret with respect to the optimal strategy that always uses the best set of labelers; this type of performance guarantee is lacking in most of the work cited above. Last but not least, our algorithm is broadly applicable to generic crowd-sourcing task assignments rather than being designed for a specific type of tasks or data.

Our main contributions are summarized as follows.
1. We design an online learning algorithm to estimate the quality of labelers in a crowd-sourcing setting without ground-truth information but with mild assumptions on the quality of the crowd as a whole, and show that it is able to learn the optimal set of labelers under both simple and weighted majority voting rules and attains no-regret performance guarantees (w.r.t. always using the optimal set of labelers).
2. We similarly provide regret bounds on the cost of this learning algorithm w.r.t. always using the optimal set of labelers.
3. We show how our model and results can be extended to the case where the quality of a labeler may be task-type dependent, as well as a simple procedure to quickly detect and filter out "bad" (dishonest, malicious or incompetent) labelers to further enhance the quality of crowd-sourcing.
4. Our validation includes both simulation and the use of a real-world AMT dataset.

The remainder of the paper is organized as follows. We formulate our problem in Section 2. In Sections 3 and 4 we introduce our learning algorithm along with regret analysis under a simple majority and weighted majority voting rule, respectively. We extend our model to the case where labelers' expertise may be task-dependent in Section 5. Numerical experiments are presented in Section 6.
Section 7 concludes the paper.

2. PROBLEM FORMULATION AND PRELIMINARIES
2.1 The crowd-sourcing model
We begin by introducing the following major components of the crowd-sourcing system we consider.

User. There is a single user with a sequence of tasks (unlabeled data) to be performed/labeled. Our proposed online learning algorithm is to be employed by the user in making labeler selections. Throughout our discussion the terms task and unlabeled data will be used interchangeably.

Labeler. There are a total of M labelers, each of whom may be selected to perform a labeling task for a piece of unlabeled data. The set of labelers is denoted by M = {1, 2, ..., M}. A labeler i produces the true label for each assigned task with probability p_i, independent of the task; a more sophisticated task-dependent version is discussed in Section 5. This will also be referred to as the quality or accuracy of this labeler. We will assume no two labelers are exactly the same, i.e., p_i ≠ p_j, ∀ i ≠ j, and we consider non-trivial cases with 0 < p_i < 1, ∀ i. These quantities are unknown to the user a priori. We will also assume that the accuracy of the collection of labelers satisfies

$$\bar{p} := \frac{1}{M}\sum_{i=1}^{M} p_i > \frac{1}{2}, \quad \text{and that} \quad M > \frac{\log 2}{2(\bar{p}-1/2)^2}.$$

The justification and implication of these assumptions are discussed in more detail in Section 2.3.

Our learning system works in discrete time steps t = 1, 2, ..., T. At time t, a task k ∈ K arrives to be labeled, where K could be either a finite or infinite set. For simplicity of presentation, we will assume that a single task arrives at each time, and that the labeling outcome is binary: 1 or 0; however, both assumptions can be fairly easily relaxed. (Indeed, in our experiment shown later in Section 6, our algorithm is applied to a non-binary multi-label case.) For task k, the user selects a subset S_t ⊆ M to label it. The label generated by labeler i ∈ S_t for data k at time t is denoted by L_i(t). The set of labels {L_i(t)}, i ∈ S_t, generated by the selected labelers then needs to be combined to produce a single label for the data; this is often referred to as the information aggregation phase. Since we have no prior knowledge of the labelers' accuracy, we will first apply the simple majority voting rule over the set of labels; later we will also examine a more sophisticated weighted majority voting rule. Mathematically, the majority voting rule at time t leads to the following label output:

$$L^*(t) = \arg\max_{l \in \{0,1\}} \sum_{i \in S_t} I_{L_i(t)=l}, \qquad (1)$$

with ties (i.e., $\sum_{i \in S_t} I_{L_i(t)=0} = \sum_{i \in S_t} I_{L_i(t)=1}$, where I denotes the indicator function) broken randomly. Denote by π(S_t) the probability of obtaining the correct label following the simple majority rule above; we have:

$$\pi(S_t) = \underbrace{\sum_{S \subseteq S_t,\, |S| \ge \lceil \frac{|S_t|+1}{2} \rceil} \prod_{i \in S} p_i \prod_{j \in S_t \setminus S} (1-p_j)}_{\text{majority wins}} \; + \; \underbrace{\frac{1}{2} \sum_{S \subseteq S_t,\, |S| = \frac{|S_t|}{2}} \prod_{i \in S} p_i \prod_{j \in S_t \setminus S} (1-p_j)}_{\text{ties broken equally likely}}. \qquad (2)$$

Denote by c_i a normalized cost/payment per task to labeler i and consider the following linear cost function:

$$C(S) = \sum_{i \in S} c_i, \quad S \subseteq \mathcal{M}. \qquad (3)$$

Extensions of our analysis to other forms of cost functions are feasible, though with more cumbersome notation. Denote

$$S^* = \arg\max_{S \subseteq \mathcal{M}} \pi(S); \qquad (4)$$

thus S* is the optimal selection of labelers given each individual's accuracy. We also refer to π(S) as the utility for selecting the set of labelers S and denote it equivalently as U(S). C(S*) will be referred to as the necessary cost per task. In most crowd-sourcing systems the main goal is to obtain high quality labels while the cost accrued is a secondary issue. For completeness, however, we will also analyze the trade-off between the two. Therefore we shall adopt two objectives when designing an efficient online algorithm: choosing the best set of labelers while keeping the cost low. It should be noted that one can also combine labeling accuracy and cost to form a single objective function, such as π(S) − C(S); the resulting analysis is quite similar to and can be easily reproduced from that presented in this paper.
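Equation (1) is the standard majority vote with random tie-breaking. A direct transcription might look as follows; the function name and its input convention (a list of 0/1 labels from the selected labelers) are our own choices for illustration.

# Majority-vote aggregation of binary labels with random tie-breaking (Eq. (1)).
import random

def aggregate_majority(labels):
    """labels: list of 0/1 labels produced by the selected labelers S_t."""
    ones = sum(labels)
    zeros = len(labels) - ones
    if ones > zeros:
        return 1
    if zeros > ones:
        return 0
    return random.randint(0, 1)   # ties broken uniformly at random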
2.2 Off-line optimal selection of labelers
Before addressing the learning problem, we will first take a look at how to efficiently derive the optimal selection S* given accuracy probabilities {p_i}, i ∈ M. This will be a crucial step repeatedly invoked by the learning procedure we develop next, to determine the set of labelers to use given a set of estimated accuracy probabilities. The optimal selection is a function of the values {p_i} and the aggregation rule used to compute the final label. While there is a combinatorial number of possible selections, the next two results combined lead to a very simple and linear-complexity procedure for finding the optimal S*.

THEOREM 1. Under the simple majority vote rule, the optimal number of labelers s* = |S*| must be an odd number.

THEOREM 2. The optimal set S* is monotonic, i.e., if we have i ∈ S* and j ∉ S* then we must have p_i > p_j.

Proofs of the above two theorems can be found in the appendices. Using these results, given a set of accuracy probabilities, the optimal selection under the majority vote rule consists of the top s* (an odd number) labelers with the highest quality; thus we only need to compute s*, which has a linear complexity of O(M/2). A set that consists of the highest m labelers will henceforth be referred to as an m-monotonic set, and denoted as S_m ⊆ M.
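To make the procedure implied by Theorems 1 and 2 concrete, the sketch below evaluates π(S_m) only over odd-size monotonic sets and returns the best one. π is computed here by brute-force enumeration of majority subsets, which matches Equation (2) for odd-size sets (no tie term) but is only practical for small m; all names are ours.

# Offline optimal labeler selection (Section 2.2): scan odd-size prefixes of the
# labelers sorted by accuracy and keep the one maximizing pi(S).
from itertools import combinations

def majority_correct_prob(ps):
    """pi(S) from Eq. (2) for an odd-size set with accuracies ps."""
    n = len(ps)
    prob = 0.0
    for correct in range(n // 2 + 1, n + 1):        # strict majority correct
        for idx in combinations(range(n), correct):
            p = 1.0
            for i in range(n):
                p *= ps[i] if i in idx else 1.0 - ps[i]
            prob += p
    return prob

def optimal_selection(p):
    """p: accuracies of all M labelers. Returns the indices forming S*."""
    order = sorted(range(len(p)), key=lambda i: p[i], reverse=True)
    best, best_set = -1.0, []
    for m in range(1, len(p) + 1, 2):               # odd sizes only (Theorem 1)
        cand = order[:m]                            # monotonic set (Theorem 2)
        score = majority_correct_prob([p[i] for i in cand])
        if score > best:
            best, best_set = score, cand
    return best_set

print(optimal_selection([0.9, 0.6, 0.75, 0.55]))    # -> [0]: here one strong
                                                    # labeler beats the top-3 vote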
2.3 The lack of ground truth
As mentioned, a key difference between our model and many other studies on crowd-sourcing, as well as the basic framework of MAB problems, is that we lack ground-truth in our system; we elaborate on this below. In a standard MAB setting, when a player (the user in our scenario) selects a set of arms (labelers) to activate, she immediately finds out the rewards associated with those selected arms. This information allows the player to collect statistics on each arm (e.g., sample mean rewards) which is then used in her future selection decisions. In our scenario, the user sees the labels generated by each selected labeler, but does not know which ones are true. In this sense the user does not find out about her reward immediately after a decision; she can only do so probabilistically over a period of time through additional estimation devices. This constitutes the main conceptual and technical difference between our problem and the standard MAB problem.

Given the lack of ground-truth, the crowd-sourcing system is only useful if the average labeler is more or less trustworthy. For instance, if a majority of the labelers produce the wrong label most of the time, unbeknownst to the user, then the system is effectively useless, i.e., the user has no way to tell whether she could trust the outcome, so she might as well abandon the system. It is therefore reasonable to have some trustworthiness assumption in place. Accordingly, we have assumed that $\bar{p} = \frac{1}{M}\sum_{i=1}^{M} p_i > 0.5$, i.e., the average labeling quality is higher than 0.5, or a random guess; this is a common assumption in the crowd-sourcing literature (see e.g., [8]). Note that this is a fairly mild assumption: not all labelers need to have accuracy p_i > 0.5 or near 0.5; some labelers may have arbitrarily low quality (∼0) as long as they are in the minority.

Denote by X_i a binomial random variable with parameter p_i to model labeler i's outcome on a given task: X_i = 1 if her label is correct [and 0 otherwise]. Using the Chernoff–Hoeffding inequality we have

$$P\Big(\tfrac{\sum_{i=1}^{M} X_i}{M} > 1/2\Big) = 1 - P\Big(\tfrac{\sum_{i=1}^{M} X_i}{M} \le 1/2\Big) = 1 - P\Big(\tfrac{\sum_{i=1}^{M} X_i}{M} - \bar{p} \le 1/2 - \bar{p}\Big) \ge 1 - e^{-2M(\bar{p}-1/2)^2}.$$

Define $a_{\min} := P\big(\tfrac{\sum_{i=1}^{M} X_i}{M} > 1/2\big)$; this is the probability that a simple majority vote over the M labelers is correct. Therefore, if $\bar{p} > 1/2$ and further $M > \frac{\log 2}{2(\bar{p}-1/2)^2}$, then $1 - e^{-2M(\bar{p}-1/2)^2} > 1/2$, meaning a simple majority vote would be correct most of the time. Throughout the paper we will assume both these conditions are true. It also follows that we have

$$P\Big(\tfrac{\sum_{i \in S^*} X_i}{s^*} > 1/2\Big) \ge P\Big(\tfrac{\sum_{i=1}^{M} X_i}{M} > 1/2\Big),$$

where the inequality is due to the definition of the optimal set S*.

3. LEARNING THE OPTIMAL LABELER SELECTION
In this section we present an online learning algorithm LS_OL that over time learns each labeler's accuracy, which it then uses to compute an estimated optimal set of labelers using the properties given in the previous section.

3.1 An online learning algorithm LS_OL
The algorithm consists of two types of time steps, exploration and exploitation, as is common to online learning. However, the design of the exploration step is complicated by the additional estimation needs due to the lack of ground-truth revelation. Specifically, a set of tasks will be designated as "testers" and may be repeatedly assigned to the same labeler in order to obtain consistent results used for estimating her label quality. This can be done in one of two ways depending on the nature of the tasks. For tasks like survey questions (e.g., those with multiple choices), a labeler may indeed be prompted to answer the same question (or equivalent variants with alternative wording) multiple times, usually not in succession, during the survey process. This is a common technique used by survey designers for quality control, by testing whether a participant answers questions randomly or consistently, whether a participant is losing attention over time, and so on, see e.g., [20]. For tasks like labeling images, a labeler may be given identical images repeatedly, or each time with small, added iid noise.

With the above in mind, the algorithm conceptually proceeds as follows. A condition is checked to determine whether the algorithm should explore or exploit in a given time step. If it is to exploit, then the algorithm selects the best set of labelers based on current quality estimates to label the arriving task. If it is to explore, then the algorithm will either assign an old task (an existing tester) or the new arriving task (which then becomes a tester) to the entire set of labelers M, depending on whether all existing testers have been labeled enough times. Because of the need to repeatedly assign an old task, some new tasks will not be immediately assigned (those arriving during an exploration step while an old task remains under-labeled). These tasks will simply be given a random label (e.g., with error probability 1/2), but their numbers are limited by the frequency of an exploration step (∼ log² T), as we shall detail later.

Before proceeding to a more precise description of the algorithm, a few additional notions are in order. Denote the n-th label outcome (via majority vote over M labelers in exploration) for task k by y_k(n). Denote by y*_k(N) the label obtained using the majority rule over the N label outcomes y_k(1), y_k(2), ..., y_k(N):

$$y^*_k(N) = \begin{cases} 1, & \frac{\sum_{n=1}^{N} I_{y_k(n)=1}}{N} > 0.5 \\ 0, & \text{otherwise,} \end{cases} \qquad (5)$$

with ties broken randomly. It is this majority label after N tests on a tester task k that will be used to analyze different labelers' performance.
A tester task is always assigned to all labelers for labeling. Following our earlier assumption that the repeated assignments of the same task use identical and independent variants of the task, we will also take the repeated outcomes y_k(1), y_k(2), ..., y_k(N) to be independent and statistically identical.

Denote by E(t) the set of tasks assigned to the M labelers during exploration steps up to time t. For each task k ∈ E(t), denote by N̂_k(t) the number of times k has been assigned. Consider the following random variable defined at each time t:

$$O(t) = I_{\{|E(t)| \le D_1(t) \ \text{or} \ \exists k \in E(t) \ \text{s.t.} \ \hat{N}_k(t) \le D_2(t)\}},$$

where

$$D_1(t) = \frac{\log t}{\left(\frac{1}{\max_{m:\, m \ \text{odd}} m \cdot n(S_m)} - \alpha\right)^2 \varepsilon^2}, \qquad D_2(t) = \frac{\log t}{(a_{\min} - 0.5)^2},$$

and n(S_m) is the number of all possible majority subsets of S_m (for example, when |S_m| = 5, n(S_m) is the number of all possible subsets of size at least 3), ε is a bounded constant, and α is a positive constant such that $\alpha < \frac{1}{\max_{m:\, m \ \text{odd}} m \cdot n(S_m)}$. Note that O(t) captures the event when either an insufficient number of tester tasks have been assigned under exploration or some tester task has been assigned an insufficient number of times in exploration.

Our online algorithm for labeler selection is formally stated in Figure 1. The LS_OL algorithm can either go on indefinitely or terminate at some time T. As we show below, the performance bound on this algorithm holds uniformly in time, so it does not matter when it terminates.

Figure 1: Description of LS_OL (Online Labeler Selection).
1: Initialization at t = 0: Initialize the estimated accuracies {p̃_i}, i ∈ M, to some value in [0, 1]; denote the initialization task as k, set E(t) = {k} and N̂_k(t) = 1.
2: At time t a new task arrives: if O(t) = 1, the algorithm explores.
   2.1: If there is no task k ∈ E(t) such that N̂_k(t) ≤ D_2(t), then assign the new task to M, update E(t) to include it, and denote it by k; if there is such a task, randomly select one of them, denoted by k, and assign it to M. Set N̂_k(t) := N̂_k(t) + 1; obtain the label y_k(N̂_k(t)).
   2.2: Update y*_k(N̂_k(t)) (using the alternate indicator function notation I(·)): y*_k(N̂_k(t)) = I(∑_{t̂=1}^{N̂_k(t)} y_k(t̂) / N̂_k(t) > 0.5).
   2.3: Update the labelers' accuracy estimates, ∀ i ∈ M: p̃_i = ∑_{k ∈ E(t), k arrives at time t̂} I(L_i(t̂) = y*_k(N̂_k(t))) / |E(t)|.
3: Else if O(t) = 0, the algorithm exploits and computes S_t = argmax_m Ũ(S_m) = argmax_{S ⊆ M} π̃(S), which is solved using the linear search property, but with the current estimates {p̃_i} rather than the true quantities {p_i}, resulting in estimated utility Ũ(·) and π̃(·). Assign the new task to those in S_t.
4: Set t = t + 1 and go to Step 2.
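A compact, self-contained rendering of the Figure 1 loop is given below, mainly to show the explore/exploit structure. The accuracy-update rule is a simplification of step 2.3 (it normalizes by the total number of tester votes rather than by |E(t)|), labelers are simulated from their true accuracies purely to make the example runnable, and optimal_selection stands for the offline routine sketched after Section 2.2; the thresholds d1 and d2 are D1(t) and D2(t) passed in as functions.

# Illustrative skeleton of the LS_OL loop in Figure 1 (not the paper's code).
import random

def ls_ol(p_true, T, d1, d2, optimal_selection):
    M = len(p_true)
    p_hat = [0.5] * M                  # step 1: initialize accuracy estimates
    testers = []                       # tester tasks: per-task list of crowd votes
    agree = [0] * M                    # agreement of each labeler with tester majorities
    choices = []

    for t in range(1, T + 1):
        stale = [k for k, votes in enumerate(testers) if len(votes) <= d2(t)]
        if len(testers) <= d1(t) or stale:             # step 2: explore
            k = stale[0] if stale else len(testers)
            if k == len(testers):
                testers.append([])                     # the new task becomes a tester
            # All labelers label the tester; simulate labels with accuracies p_true
            # (the unknown true label is taken to be 1 without loss of generality).
            labels = [1 if random.random() < p_true[i] else 0 for i in range(M)]
            testers[k].append(1 if sum(labels) > M / 2 else 0)
            majority = 1 if sum(testers[k]) / len(testers[k]) > 0.5 else 0
            total_votes = sum(len(v) for v in testers)
            for i in range(M):                         # simplified step 2.3 update
                agree[i] += (labels[i] == majority)
                p_hat[i] = agree[i] / total_votes
        else:                                          # step 3: exploit
            choices.append(optimal_selection(p_hat))   # linear-search property
    return p_hat, choices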
constant such that ε<min {∆min2,δmin2}.For analysis we as-sume U (S i )=U (S j )if i =j 2.Define the sequence {βn }:βn =∑∞t =11t n .Our main theorem is stated as follows.T HEOREM 3.The regret can be bounded uniformly in time:R (T )≤U (S ∗)(1maxm :m oddm ·n (S m )−α)2·ε2·(a min −0.5)2·log 2(T )+∆max (2M∑m =1m oddm ·n (S m )+M )·(2β2+1α·εβ2−z ),(6)2Thiscan be precisely established when p i =p j ,∀i =j .R C (T )≤∑i /∈S ∗c i(1maxm :m oddm ·n (S m )−α)2·ε2·log T+∑i ∈M c i(1maxm :m oddm ·n (S m )−α)2·ε2·(a min −0.5)2·log 2(T )+(M −|S ∗|)·(2M∑m =1m oddm ·n (S m )+M )·(2β2+1α·εβ2−z ),(7)where 0<z <1is a positive constant.First note that the regret is nearly logarithmic in T and therefore it has zero average regret as T →∞;such an algorithm is often re-ferred to as a zero-regret algorithm.Secondly the regret bound is inversely related to the minimum accuracy of the crowd (through a min ).This is to be expected:with higher accuracy (a larger a min )of the crowd,crowd-sourcing generates ground-truth outputs with higher probability and thus the learning process could be acceler-ated.Finally,the bound also depends on max m m ·n (S m )which isroughly on the order of O (2m√m √2π).3.3Regret analysis of LS_OLWe now outline key steps in the proof of the above theorem.This involves a sequence of lemmas;the proofs of most can be found in the appendices.There are a few that we omit for brevity;in those cases sketches are provided.Step 1:We begin by noting that the regret consists of that arising from the exploration phase and from the exploitation phase,denot-ed by R e (T )and R x (T ),respectively:R (T )=E [R e (T )]+E [R x (T )].The following result bounds the first element of the regret.L EMMA 1.The regret up to time T from the exploration phase can be bounded as follows:E [R e (T )]≤U (S ∗)·(D 1(T )·D 2(T )).(8)We see the regret depends on the exploration parameters as product.This is because for tasks arriving in exploration steps,we assign it at least D 2(T )times to the labelers;each time when re-assignment occurs,a new arriving task is given a random label while under an optimal scheme each missed new task means a utility of U (S ∗).Step 2:We now bound the regret arising from the exploitation phase as a function of the number of times the algorithm uses a sub-optimal selection when the ordering of the labelers is correct,and the number of times the estimates of the labelers’accuracy re-sult in a wrong ordering.The proof of the lemma below is omitted as it is fairly straightforward.L EMMA 2.For the regret from exploitation we have:E [R x (T )]≤∆max E [T ∑t =1(E 1(t )+E 2(t ))].(9)Here E 1(t )=I S t =S ∗,conditioned on correct ordering of labelers,counts the event when a sub-optimal section (other than S ∗)was used at time t based on the current estimates {˜p i }.E 2(t )counts the event when at time t the set M is sorted in the wrong order because of erroneous estimates {˜p i }.Step 3:We proceed to bound the two terms in (9)separately.In this part of the analysis we only consider those times t when the algorithm exploits.。