Expected sample moments of concomitants of selected order statistics
- Format: PDF
- Size: 130.19 KB
- Pages: 16
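Title: Methods for synchronizing the lengths of sample time segments in Python
1. Introduction
In time-series analysis, one often encounters samples whose time segments have different lengths.
To compare and analyze such data more accurately, the lengths of the sample time segments need to be synchronized.
Python offers many ways to do this; this article introduces several commonly used methods.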
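2. Method 1: Resampling
Resampling is a common way of processing time-series data: the series are sampled again on a new time grid, so that the time segments of different series have the same length.
In Python, resampling can be done with the resample method of pandas objects (DataFrame.resample and Series.resample).
The steps are as follows:
1. Load the time-series data into a pandas DataFrame indexed by timestamps (resample works on a datetime-like index).
2. Call resample on the data, specifying the target frequency (e.g., daily, weekly, or monthly) and an aggregation method (e.g., the mean or the sum).
3. Process the resampled data further so that the time segments of the different series are synchronized.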
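The resampling steps above can be sketched as follows. This is a minimal illustration, not the article's own code. The two series, their sampling intervals, and the choice of a daily mean are hypothetical. The sketch assumes both series carry a DatetimeIndex.

```python
import pandas as pd

# Step 1: two hypothetical series covering the same two days at different rates.
hourly_index = pd.date_range("2024-01-01", periods=48, freq=pd.Timedelta(hours=1))
six_hourly_index = pd.date_range("2024-01-01", periods=8, freq=pd.Timedelta(hours=6))
a = pd.Series(range(48), index=hourly_index, dtype=float)
b = pd.Series(range(8), index=six_hourly_index, dtype=float)

# Step 2: resample both to a common daily frequency, aggregating by the mean.
daily_a = a.resample("D").mean()
daily_b = b.resample("D").mean()

# Step 3: combine on the shared daily index; both columns now have equal length.
synced = pd.concat({"a": daily_a, "b": daily_b}, axis=1)
print(synced)
```

Replacing `.mean()` with `.sum()` changes the aggregation. Downsampling to a coarser frequency always discards the within-period detail.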
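3. Method 2: Interpolation
Interpolation is a common numerical-analysis technique that estimates values at other positions from the values at known data points.
When processing time-series data, interpolation can be used to synchronize the time-segment lengths of different series.
In Python, interpolation can be done with the interpolate module of SciPy (scipy.interpolate).
The steps are as follows:
1. Load the time-series data into a pandas DataFrame.
2. Apply an interpolation function from scipy.interpolate to the series so that the time segments of the different series have the same length.
3. Process the interpolated data further so that the series can be compared and analyzed more accurately.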
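Here is a minimal sketch of the interpolation approach. It assumes the two segments span the same time interval and differ only in how many samples they contain. The helper name `resample_to_length` and the data are hypothetical. The sketch uses `scipy.interpolate.interp1d` with linear interpolation on plain arrays; a DataFrame column can be passed via `.to_numpy()`.

```python
import numpy as np
from scipy.interpolate import interp1d

def resample_to_length(values, n_points):
    """Linearly interpolate a 1-D segment onto n_points equally spaced positions.

    Assumes the samples are evenly spaced over the same normalized time
    interval [0, 1] as the segment they are being matched to.
    """
    values = np.asarray(values, dtype=float)
    t_old = np.linspace(0.0, 1.0, len(values))
    t_new = np.linspace(0.0, 1.0, n_points)
    return interp1d(t_old, values, kind="linear")(t_new)

short_segment = [0.0, 10.0, 20.0]          # 3 samples
long_segment = np.arange(9, dtype=float)   # 9 samples

short_synced = resample_to_length(short_segment, len(long_segment))
print(short_synced)
```

Recent SciPy releases describe `interp1d` as a legacy interface. For purely linear interpolation, `np.interp` gives the same result.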
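4. Method 3: Time alignment
Time alignment is a common way of processing time-series data: the time indexes of the series are adjusted so that their time segments are synchronized.
In Python, time alignment can be done with the align method of pandas objects (DataFrame.align and Series.align).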
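The sketch below shows time alignment with `Series.align`, using two hypothetical daily series whose timestamps only partly overlap. `join="inner"` keeps only the common timestamps. `join="outer"` keeps their union and fills the gaps with NaN; those gaps could then be handled with the resampling or interpolation methods described above.

```python
import pandas as pd

s1 = pd.Series([1.0, 2.0, 3.0],
               index=pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03"]))
s2 = pd.Series([10.0, 30.0],
               index=pd.to_datetime(["2024-01-02", "2024-01-04"]))

# Keep only timestamps present in both series.
left_inner, right_inner = s1.align(s2, join="inner")

# Keep the union of timestamps; values missing from one series become NaN.
left_outer, right_outer = s1.align(s2, join="outer")
print(pd.concat({"s1": left_outer, "s2": right_outer}, axis=1))
```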
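checkCIF problems and fixes
1 Absorption correction: "Info on Absorption Correction Method Missing ... ?"
Fix: in the CIF file, change the "?" after _exptl_absorpt_correction_type to multi-scan (the usual value). Also change the "?" after _exptl_absorpt_process_details to SADABS. If only the first item is changed, the following alert appears: "An _exptl_absorpt_correction_type has been given without a literature citation. This should be contained in the exptl_absorpt_process_details field. Absorption correction given as multi-scan".
2 Space group: "No _symmetry_space_group_name_H-M Given ........ ?"
Fix: look in the .ins or .res file. The TITL line contains the space group name, which is correct as long as no space-group transformation has been carried out.
The safest source is the generated checkCIF report page itself. It lists the space group (e.g., Space group C 2/m), and that value cannot be wrong.
3 "No su's on H-atoms, but refinement reported as . mixed": the report states that the H atoms were refined in mixed mode, but no standard uncertainties (su's) are listed for the H-atom parameters.
Fix: change mixed to constr. If the H atoms were only added to C atoms with Hadd, and no H atoms were placed on free O atoms, then _refine_ls_hydrogen_treatment should be constr (not mixed).
4 "Ratio of Maximum / Minimum Residual Density .... 2.72": the largest positive and negative residual peaks should have similar magnitudes, so this ratio should be close to 1. An alert like this indicates that the refinement is incomplete and that residual peaks may remain.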
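The CIF edits in item 1 and the ratio in item 4 can be scripted. The sketch below is hypothetical helper code, not part of checkCIF or any crystallographic package. It only handles single-line data items whose value is exactly `?`. It assumes the ratio is the maximum residual density divided by the absolute value of the minimum. The residual-density values in the example are made up to reproduce the 2.72 ratio.

```python
import re

def set_cif_value(cif_text, name, value):
    """Replace an unknown value ('?') of a single-line CIF data item."""
    pattern = re.compile(rf"^({re.escape(name)}[ \t]+)\?[ \t]*$", re.MULTILINE)
    # A function replacement avoids backslash interpretation inside `value`.
    return pattern.sub(lambda m: m.group(1) + value, cif_text)

def residual_density_ratio(max_peak, min_hole):
    """Largest positive residual peak divided by the magnitude of the deepest hole."""
    return max_peak / abs(min_hole)

cif = "_exptl_absorpt_correction_type ?\n_exptl_absorpt_process_details ?\n"
cif = set_cif_value(cif, "_exptl_absorpt_correction_type", "multi-scan")
cif = set_cif_value(cif, "_exptl_absorpt_process_details", "SADABS")
print(cif)
print(round(residual_density_ratio(0.68, -0.25), 2))  # hypothetical peak/hole values
```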
(Received: 2020-09-04; revised: 2021-08-29)
· Case Report ·
Paliperidone successfully treating a patient with schizophrenia who responded poorly to risperidone: a case report
Kong Di
Keywords: risperidone; paliperidone
CLC number: R749.3; Document code: B; Article ID: 1005-3220(2022)01-0012-01
1 Case
The patient, a 15-year-old female, presented on 1 August 2018 with "hearing voices out of thin air and feeling that she was being talked about, for one year".
3-2 (a) Sketch the naturally sampled PAM waveform that results from sampling a 1-kHz sine wave at a 4-kHz rate. (b) Repeat part (a) for the case of a flat-topped PAM waveform.
Solution:
3-4 (a) Show that an analog output waveform (which is proportional to the original input analog waveform) may be recovered from a naturally sampled PAM waveform by using the demodulation technique shown in Fig. 3-4. (b) Find the constant of proportionality, C, that is obtained with this demodulation technique, where w(t) is the original waveform and Cw(t) is the recovered waveform. Note that C is a function of n, where the oscillator frequency is nf_s.
Solution: The naturally sampled PAM waveform is $w_s(t) = w(t)s(t)$. With duty cycle $d = \tau/T_s$ and $\omega_s = 2\pi f_s$,
$s(t) = \sum_{k=-\infty}^{\infty}\Pi\left(\frac{t-kT_s}{\tau}\right) = \sum_{k=-\infty}^{\infty}c_k e^{jk\omega_s t}, \qquad c_k = d\,\frac{\sin(k\pi d)}{k\pi d}$
so that
$w_s(t) = d\,w(t) + 2d\,w(t)\sum_{k=1}^{\infty}\frac{\sin(k\pi d)}{k\pi d}\cos(k\omega_s t)$
Multiplying by the oscillator output $\cos(n\omega_s t)$ gives
$v(t) = d\,w(t)\cos(n\omega_s t) + 2d\,\frac{\sin(n\pi d)}{n\pi d}\,w(t)\cos^2(n\omega_s t) + 2d\,w(t)\sum_{k\ge1,\,k\ne n}\frac{\sin(k\pi d)}{k\pi d}\cos(k\omega_s t)\cos(n\omega_s t)$
where $\cos^2(n\omega_s t) = \frac12 + \frac12\cos(2n\omega_s t)$. After the LPF:
$w_o(t) = d\,\frac{\sin(n\pi d)}{n\pi d}\,w(t) = C\,w(t) \quad\therefore\quad C = d\,\frac{\sin(n\pi d)}{n\pi d}$
3-7 In a binary PCM system, if the quantizing noise is not to exceed ±P percent of the peak-to-peak analog level, show that the number of bits in each PCM word needs to be
$n \ge [\log_2 10]\left[\log_{10}\left(\frac{50}{P}\right)\right] = 3.32\log_{10}\left(\frac{50}{P}\right)$
(Hint: Look at Fig. 3-8c.)
Solution: Binary PCM has $M = 2^n$ levels and step size $\delta = V_{pp}/M$. The quantizing error is at most $\delta/2$, so we need
$\frac{\delta}{2} = \frac{V_{pp}}{2M} \le \frac{P}{100}V_{pp} \;\Rightarrow\; M \ge \frac{50}{P} \;\Rightarrow\; n \ge \log_2\left(\frac{50}{P}\right) = \frac{\log_{10}(50/P)}{\log_{10}2} = 3.32\log_{10}\left(\frac{50}{P}\right)$
using the change of base $\log_a x = \log_b x/\log_b a$.
3-8 The information in an analog voltage waveform is to be transmitted over a PCM system with a ±0.1% accuracy (full scale). The analog waveform has an absolute bandwidth of 100 Hz and an amplitude range of −10 to +10 V. (a) Determine the minimum sampling rate needed. (b) Determine the number of bits needed in each PCM word. (c) Determine the minimum bit rate required in the PCM signal. (d) Determine the minimum absolute channel bandwidth required for transmission of this PCM signal.
Solution:
(a) Determine the minimum sampling rate needed.
$f_s = 2B = 2(100) = 200$ samples/sec
(b) Determine the number of bits needed in each PCM word.
Using the results given in Prob. 3-7: $\pm\frac{\delta}{2} = \pm0.1\%\,(20\text{ V})$, so $\delta = 0.04$ V and $M = \frac{20\text{ V}}{\delta} = 500$. Since $2^9 = 512 > 500$, $n = 9$ bits per PCM word.
(c) Determine the minimum bit rate required in the PCM signal.
$R = nf_s = \left(9\,\frac{\text{bits}}{\text{word}}\right)\left(200\,\frac{\text{words}}{\text{sec}}\right) = 1.8$ kbits/sec
(d) Determine the minimum absolute channel bandwidth required for transmission of this PCM signal.
For binary PCM, $D = R$, so $B = \frac{D}{2} = 900$ Hz.
3-9 An 850-Mbyte hard disk is used to store PCM data. Suppose that a voice-frequency (VF) signal is sampled at 8 ksamples/s and the encoded PCM is to have an average SNR of at least 30 dB. How many minutes of VF conversation (i.e., PCM data) can be stored on the hard disk?
Solution: $\left(\frac{S}{N}\right)_{dB} = 10\lg M^2 = 6.02n \ge 30 \Rightarrow n \ge 4.98$, so $n = 5$.
$R = nf_s = \left(5\,\frac{\text{bits}}{\text{sample}}\right)\left(8\,\frac{\text{ksamples}}{\text{sec}}\right) = 40\text{ kbits/sec} = 5\text{ kbytes/sec}$
$T = \frac{850\text{ Mbytes}}{5\text{ kbytes/sec}} = 170\times10^3\text{ sec} = 2{,}833\text{ min} \approx 47\text{ hrs }13\text{ min}$
3-10 An analog signal with a bandwidth of 4.2 MHz is to be converted into binary PCM and transmitted over a channel. The peak-signal quantizing noise ratio at the receiver output must be at least 55 dB. (a) If we assume that $P_e = 0$ and that there is no ISI, what will be the word length and the number of quantizing steps needed? (b) What will be the equivalent bit rate? (c) What will be the channel null bandwidth required if rectangular pulse shapes are used?
Solution:
(a) If we assume that $P_e = 0$ and that there is no ISI, what will be the word length and the number of quantizing steps needed?
Using (3-18), $\left(\frac{S}{N}\right)_{peak,\,dB} = 4.77 + 6.02n \ge 55 \Rightarrow n \ge 8.34$. Use a word length of $n = 9$ bits, so $M = 2^n = 2^9 = 512$ quantizing steps.
(b) $f_s = 2B = 2(4.2\text{ MHz}) = 8.4$ Msamples/sec, and $R = nf_s = \left(9\,\frac{\text{bits}}{\text{sample}}\right)\left(8.4\,\frac{\text{Msamples}}{\text{sec}}\right) = 75.6$ Mbits/sec
(c) For a rectangular pulse shape, $B_{null} = R = 75.6$ MHz.
3-12 Given an audio signal with spectral components in the frequency band 300 to 3000 Hz, assume that a sampling rate of 7 kHz will be used to generate a PCM signal. Design an appropriate PCM system as follows: a. Draw a block diagram of the PCM system, including the transmitter, channel, and receiver. b. Specify the number of uniform quantization steps needed and the channel null bandwidth required, assuming that the peak signal-to-noise ratio at the receiver output needs to be at least 30 dB and that polar NRZ signaling is used. c. Discuss how nonuniform quantization can be used to improve the performance of the system.
Solution:
(a) Omitted.
(b) $\left(\frac{S}{N}\right)_{peak,\,dB} = 4.77 + 6.02n \ge 30 \Rightarrow n \ge 4.19$. Use a word length of $n = 5$ bits, so $M = 2^5 = 32$ quantizing steps.
$R = nf_s = \left(5\,\frac{\text{bits}}{\text{sample}}\right)\left(7\,\frac{\text{ksamples}}{\text{sec}}\right) = 35$ kbits/sec, $\therefore B_{null} = R = 35$ kHz
(c) Uniform quantizing: the quantizing noise power is the same for all samples, $N = \delta^2/12$. A big signal therefore has a large $S/N$, and a small signal a small $S/N$, so uniform quantizing is not good for small signals. Nonuniform quantizing: the samples are processed nonlinearly, so small signals are amplified and their $S/N$ increases. Equivalently, small signals are quantized with a smaller step size, which increases $S/N$.
3-14 In a PCM system, the bit error rate due to channel noise is $10^{-4}$.
Assume that the peak signal-to-noise ratio on the received analog signal needs to be at least 30 dB. (a) Find the minimum number of quantizing steps that can be used to encode the analog signal into a PCM signal. (b) If the original analog signal had an absolute bandwidth of 2.7 kHz, what is the null bandwidth of the PCM signal for the polar NRZ signaling case?
Solution:
(a) With $P_e = 10^{-4}$ and $\left(\frac{S}{N}\right)_{pk\,out} \ge 30\text{ dB} = 1000$:
$\left(\frac{S}{N}\right)_{pk\,out} = \frac{3M^2}{1 + 4(M^2-1)P_e} \ge 1000 \;\Rightarrow\; M \ge 19.6$; use $M = 2^n = 2^5 = 32$ ($n = 5$).
(b) $f_s = 2(2.7) = 5.4$ ksamples/sec, $R = nf_s = 5(5.4) = 27$ kbits/sec, $B_{null} = R = 27$ kHz
3-17 For a 4-bit PCM system, calculate and sketch a plot of the output SNR (in decibels) as a function of the relative input level, $20\lg(x_{rms}/V)$, for (a) a PCM system that uses $\mu = 10$ law companding and (b) a PCM system that uses uniform quantization. Which of these systems is better to use in practice? Why?
Solution: $n = 4$ bits per PCM word.
(a) $\left(\frac{S}{N}\right)_{dB} = 6.02n + 4.77 - 20\lg[\ln(1+\mu)] = 6.02(4) + 4.77 - 20\lg[\ln 11] = 21.25$ dB
(b) $\left(\frac{S}{N}\right)_{dB} = 6.02n + 4.77 + 20\lg(x_{rms}/V) = 6.02(4) + 4.77 + 20\lg(x_{rms}/V) = 28.85 + 20\lg(x_{rms}/V)$
3-19 A multilevel digital communication system sends one of 16 possible levels over the channel every 0.8 ms. (a) What is the number of bits corresponding to each level? (b) What is the baud rate? (c) What is the bit rate?
Solution:
(a) What is the number of bits corresponding to each level?
$L = 2^\ell = 16 \Rightarrow \ell = 4$ bits/level
(b) What is the baud rate?
$D = \frac{N}{T} = \frac{1\text{ symbol}}{0.8\times10^{-3}\text{ sec}} = 1{,}250$ baud
(c) What is the bit rate?
$R = \ell D = 4(1{,}250) = 5$ kbits/sec
3-20 A multilevel digital communication system is to operate at a data rate of 9,600 bits/s. (a) If 4-bit words are encoded into each level for transmission over the channel, what is the minimum required bandwidth for the channel? (b) Repeat part (a) for the case of 8-bit encoding into each level.
Solution:
(a) If 4-bit words are encoded into each level for transmission over the channel.
What is the minimum required bandwidth for the channel?
$D = \frac{R}{\ell} = \frac{9600}{4} = 2400$ baud, $B_{min} = \frac{D}{2} = 1200$ Hz
(b) Repeat part (a) for the case of 8-bit encoding into each level.
$D = \frac{R}{\ell} = \frac{9600}{8} = 1200$ baud, $B \ge \frac{D}{2} \Rightarrow B_{min} = \frac12(1200) = 600$ Hz
3-24 Consider a random data pattern consisting of binary 1's and 0's, where the probability of obtaining either a binary 1 or a binary 0 is 1/2. Calculate the PSD for the following types of signaling formats as a function of $T_b$, the time needed to send 1 bit of data: (a) Polar RZ signaling where the pulse width is $\tau = \frac12 T_b$. (b) Manchester RZ signaling where the pulse width is $\tau = \frac14 T_b$. What is the first-null bandwidth of these signals? What is the spectral efficiency for each of these signaling cases?
Solution:
(a) Polar RZ signaling where the pulse width is $\tau = \frac12 T_b$.
$F(f) = \mathcal{F}[f(t)] = \frac{T_b}{2}\,\frac{\sin(\pi fT_b/2)}{\pi fT_b/2}$
The data are independent from bit to bit: a binary 1 is sent as $a_n = +A$ and a binary 0 as $a_n = -A$, each with probability 1/2. Hence
$R(0) = \sum_{i}(a_na_n)_iP_i = A^2\cdot\tfrac12 + (-A)^2\cdot\tfrac12 = A^2$
and for $k \ne 0$, $a_na_{n+k} = A^2$ or $-A^2$, each with probability 1/2, so that
$R(k) = \begin{cases}A^2, & k = 0\\ 0, & k \ne 0\end{cases}$
By (3-36),
$P_s(f) = \frac{|F(f)|^2}{T_s}\sum_{k=-\infty}^{\infty}R(k)e^{j2\pi kfT_s} = \frac{A^2T_b}{4}\left[\frac{\sin(\pi fT_b/2)}{\pi fT_b/2}\right]^2$
The first-null bandwidth is $B_{null} = \frac{2}{T_b} = 2R$, and the bandwidth efficiency is $\eta = \frac{R}{B} = \frac12$.
(b) Manchester RZ signaling where the pulse width is $\tau = \frac14 T_b$. What is the first-null bandwidth of these signals? What is the spectral efficiency for each of these signaling cases?
Equation (3-36) can also be used to evaluate the PSD for RZ Manchester signaling, where the pulse shape is shown in the figure:
$F(f) = \tau\,\frac{\sin(\pi f\tau)}{\pi f\tau}\left[e^{j\omega\tau/2} - e^{-j\omega\tau/2}\right] \;\Rightarrow\; F(f) = 2j\tau\,\frac{\sin(\pi f\tau)}{\pi f\tau}\sin\left(\frac{\omega\tau}{2}\right)$
Using (3-40) and (3-36), the PSD for Manchester signaling is
$P(f) = \frac{4A^2\tau^2}{T_b}\left[\frac{\sin(\pi f\tau)}{\pi f\tau}\right]^2\sin^2(\pi f\tau)$
If $\tau = \frac14 T_b$, this becomes
$P(f) = \frac{A^2T_b}{4}\left[\frac{\sin(\pi fT_b/4)}{\pi fT_b/4}\right]^2\sin^2\left(\frac{\pi fT_b}{4}\right)$
The first-null bandwidth is $B_{null} = \frac{4}{T_b} = 4R$, and the spectral efficiency is $\eta = \frac14$ (bits/sec)/Hz.
3-29 The data stream 01101000101 appears at the input of a differential encoder.
Depending on the initial start-up condition of the encoder, find two possible differentially encoded data streams that can appear at the output.
Solution:
3-30 Create a practical block diagram for a differential encoding system. Explain how the system works by showing the encoding and decoding for the sequence 001111010001. Assume that the reference digit is a binary 1. Show that error propagation cannot occur.
Solution:
3-34 The information in an analog waveform is first encoded into binary PCM and then converted to a multilevel signal for transmission over the channel. The number of multilevels is eight. Assume that the analog signal has a bandwidth of 2700 Hz and is to be reproduced at the receiver output with an accuracy of ±1% (full scale). (a) Determine the minimum bit rate of the PCM signal. (b) Determine the minimum baud rate of the multilevel signal. (c) Determine the minimum absolute channel bandwidth required for transmission of this PCM signal.
Solution: $\pm\frac{\delta}{2} = \pm1\%\,V \Rightarrow \delta = \frac{2V}{100} \Rightarrow M = \frac{V}{\delta} = 50 \Rightarrow n = 6$ (since $2^6 = 64 \ge 50$)
(a) $R = nf_s = n(2B) = 6 \times 2 \times 2700 = 32.4$ kb/s
(b) $L = 8 = 2^\ell \Rightarrow \ell = 3$, $D = \frac{R}{\ell} = \frac{32.4}{3} = 10.8$ kBd
(c) $B_{min} = \frac{D}{2} = 5.4$ kHz
3-35 A binary waveform of 9600 bits/s is converted into an octal (multilevel) waveform that is passed through a channel with a raised cosine-rolloff Nyquist filter characteristic. The channel has a conditioned (equalized) phase response out to 2.4 kHz. (a) What is the baud rate of the multilevel signal? (b) What is the rolloff factor of the filter characteristic?
Solution:
(a) $L = 8 = 2^\ell \Rightarrow \ell = 3$, $D = \frac{R}{\ell} = \frac{9600}{3} = 3200$ baud
(b) $B = \frac{D}{2}(1+r) = 1.6(1+r)\text{ kHz} = 2.4\text{ kHz} \Rightarrow r = 0.5$
3-37 A binary communication system uses polar signaling. The overall impulse response is designed to be of the $(\sin x)/x$ type, as given by Eq. (3-67), so that there will be no ISI. The bit rate is $R = f_s = 300$ bits/s. (a) What is the baud rate of the polar signal? (b) Plot the waveform of the polar signal at the system output when the input binary data is 01100101. Can you discern the data by looking at this polar waveform?
Solution:
(a) For binary polar signaling the baud rate equals the bit rate: $D = f_s = \frac{1}{T_s} = 300$ baud. The $(\sin x)/x$ response occupies $B = \frac{f_s}{2} = 150$ Hz.
(b) $h_e(t) = \frac{\sin(\pi f_st)}{\pi f_st}$, $H_e(f) = \frac{1}{f_s}\Pi\left(\frac{f}{f_s}\right)$, with $f_s = \frac{1}{T_s} = D$
3-43 Using the results of Prob. 3-42, demonstrate that the following filter characteristics do or do not satisfy Nyquist's criterion for eliminating ISI ($f_s = 2f_0 = 2/T_0$).
(a) $H_e(f) = \frac{T_0}{2}\Pi\left(\frac12 fT_0\right)$ (b) $H_e(f) = \frac{T_0}{2}\Pi\left(\frac23 fT_0\right)$
Solution:
(a) $H_e(f) = \frac{T_0}{2}\Pi\left(\frac{fT_0}{2}\right) = \frac{T_0}{2}\Pi\left(\frac{f}{2f_0}\right)$. This is a rectangle of total width $2f_0 = f_s$, so the shifted replicas $H_e(f - kf_s)$ add up to a constant and the criterion is satisfied.
(b) $H_e(f) = \frac{T_0}{2}\Pi\left(\frac{2fT_0}{3}\right) = \frac{T_0}{2}\Pi\left(\frac{f}{\frac32 f_0}\right)$. Its total width is $\frac32 f_0 < f_s$, so the replicas leave gaps and the criterion is not satisfied.
3-45 An analog signal is to be converted into a PCM signal that is a binary polar NRZ line code. The signal is transmitted over a channel that is absolutely bandlimited to 4 kHz. Assume that the PCM quantizer has 16 steps and that the overall equivalent system transfer function is of the raised cosine-rolloff type with r = 0.5. (a) Find the maximum PCM bit rate that can be supported by this system without introducing ISI. (b) Find the maximum bandwidth that can be permitted for the analog signal.
Solution: The quantizer has $M = 16$ steps, so $n = 4$ bits per PCM word; $B_T = 4$ kHz and $r = 0.5$.
(a) $R = D = \frac{2B_T}{1+r} = \frac{2(4)}{1.5} = 5.33$ kb/s
(b) $R = nf_s \ge n\cdot2B_{analog} \;\Rightarrow\; B_{analog} \le \frac{R}{2n} = \frac{5.33\times10^3}{2(4)} = 667$ Hz
3-47 Multilevel data with an equivalent bit rate of 2,400 bits/s is sent over a channel using a four-level line code that has a rectangular pulse shape at the output of the transmitter. The overall transmission system (i.e., the transmitter, channel, and receiver) has an r = 0.5 raised cosine-rolloff Nyquist filter characteristic. (a) Find the baud rate of the received signal. (b) Find the 6-dB bandwidth for this transmission system. (c) Find the absolute bandwidth for the system.
Solution:
(a) Find the baud rate of the received signal.
$L = 2^\ell = 4 \Rightarrow \ell = 2$, $D = \frac{R}{\ell} = \frac{2400}{2} = 1200$ baud
(b) Find the 6-dB bandwidth for this transmission system.
$B_{6dB} = \frac12 D = \frac12(1200) = 600$ Hz
(c) Find the absolute bandwidth for the system.
$B_{absolute} = \frac12(1+r)D = \frac12(1+0.5)(1200) = \frac34(1200) = 900$ Hz
3-54 One analog waveform $w_1(t)$ is bandlimited to 3 kHz, and another, $w_2(t)$, is bandlimited to 9 kHz. These two signals are to be sent by TDM over a PAM-type system.
(a) Determine the minimum sampling frequency for each signal, and design a TDM commutator and decommutator to accommodate the signals. (b) Draw some typical waveforms for $w_1(t)$ and $w_2(t)$, and sketch the corresponding TDM PAM waveform.
Solution:
(a) Determine the minimum sampling frequency for each signal, and design a TDM commutator and decommutator to accommodate the signals.
$w_1(t)$: $B = 3$ kHz $\Rightarrow f_s = 6$ ksamples/sec; $w_2(t)$: $B = 9$ kHz $\Rightarrow f_s = 18$ ksamples/sec
(b) Draw some typical waveforms for $w_1(t)$ and $w_2(t)$, and sketch the corresponding TDM PAM waveform.
3-56 Twenty-three analog signals, each with a bandwidth of 3.4 kHz, are sampled at an 8-kHz rate and multiplexed together with a synchronization channel (8 kHz) into a TDM PAM signal. This TDM signal is passed through a channel with an overall raised cosine-rolloff Nyquist filter characteristic of r = 0.75. (a) Draw a block diagram for the system, indicating the $f_c$ of the commutator and the overall pulse rate of the TDM PAM signal. (b) Evaluate the absolute bandwidth required for the channel.
Solution: $D = 24 \times 8\text{k pulses/sec} = 192\text{k pulses/sec}$
$B = \frac12(1+r)D = \frac12(1+0.75)(192\text{k pulses/sec}) = 168$ kHz
3-58 Rework Prob. 3-56 for a TDM pulse code modulation system in which an 8-bit quantizer is used to generate the PCM words for each of the analog inputs and an 8-bit synchronization word is used in the synchronization channel.
Solution:
3-59 Design a TDM PCM system that will accommodate four 300-bit/s (synchronous) digital inputs and one analog input that has a bandwidth of 500 Hz. Assume that the analog samples will be encoded into 4-bit PCM words. Draw a block diagram for your design, analogous to Fig. 3-39, indicating the data rates at the various points on the diagram. Explain how your design works.
Solution:
3-60 Design a TDM PCM system that will accommodate two 2400-bit/s synchronous digital inputs and an analog input that has a bandwidth of 2700 Hz.
Assume that the analog input is sampled at 1.11111 times the Nyquist rate and converted into 4-bit PCM words. Draw a block diagram for your design, and indicate the data rates at the various points on your diagram. Explain how your TDM scheme works.
Solution:
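Several of the answers above can be cross-checked with a few lines of Python. This covers Probs. 3-7, 3-8, 3-10, 3-12, and 3-34, and the encoding asked for in Probs. 3-29 and 3-30. This is a sketch, not the textbook's method:

- The function names are hypothetical.
- `bits_for_peak_snr` uses the $4.77 + 6.02n$ form of (3-18) quoted in the solutions.
- The differential encoder assumes the modulo-2 rule $e_n = d_n \oplus e_{n-1}$. A complemented (XNOR) convention would invert every output bit.

```python
import math

def bits_for_accuracy(p_percent):
    """Prob. 3-7: smallest n with 2**n >= 50/P (error within +/-P % of full scale)."""
    return math.ceil(math.log2(50.0 / p_percent))

def bits_for_peak_snr(snr_db):
    """Smallest n with 4.77 + 6.02 n >= (S/N)_peak in dB (P_e = 0, no ISI)."""
    return math.ceil((snr_db - 4.77) / 6.02)

def differential_encode(data, reference):
    """Assumed rule e_n = d_n XOR e_(n-1); the output starts with the reference bit."""
    encoded = [reference]
    for bit in data:
        encoded.append(bit ^ encoded[-1])
    return encoded

def differential_decode(encoded):
    """d_n = e_n XOR e_(n-1): only the change between adjacent bits matters."""
    return [cur ^ prev for prev, cur in zip(encoded, encoded[1:])]

# Prob. 3-8: 0.1 % accuracy, B = 100 Hz.
n = bits_for_accuracy(0.1)
f_s = 2 * 100
print(n, n * f_s, n * f_s / 2)   # bits/word, bit rate (bits/s), minimum bandwidth (Hz)

# Prob. 3-29: the two possible encoded streams (reference bit 0 or 1).
d = [0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 1]
for ref in (0, 1):
    print(ref, differential_encode(d, ref))
```

The two encoded streams are bitwise complements of each other. Decoding either one returns the same data, which is why the channel polarity does not matter. A single flipped encoded bit corrupts at most two decoded bits instead of propagating.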
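3.5 Crosstabulation analysis
In practical analysis, besides examining the distribution of a single variable, one often needs to know how the data are distributed across combinations of values of several variables, in order to study how the variables influence and relate to each other. This kind of analysis is called crosstabulation (contingency table) analysis.
Sometimes the phenomenon being observed depends on two factors at once. For example, sales of a garment depend on its price and on residents' income, and the production cost of a product depends on raw-material prices and output. In such cases, crosstabulation analysis shows fairly well whether the two factors are associated and how each relates to the observed phenomenon.
Crosstabulation analysis therefore has two basic tasks: first, to build a two-way or multi-way contingency table from the collected sample data; second, to test, on the basis of that table, whether the two variables are associated.
Descriptive statistics alone are not enough to establish whether variables are associated. Statistics that measure the strength of association, together with some nonparametric tests, are also needed.
The statistic most commonly used to measure the association between variables is the simple (Pearson) correlation coefficient. In crosstabulation analysis, however, the row and column variables are usually not continuous, so the preconditions for computing it are not met.
Other coefficients must therefore be chosen according to the nature of the variables, such as Kendall's rank correlation coefficient or the Eta value.
SPSS offers a variety of association measures suited to different types of data. For all of the corresponding tests, the null hypothesis is that the row and column variables are independent, i.e., that there is no significant association between them.
Whether an association exists is judged from the concomitant probability (Concomitant Significance), i.e., the p-value that SPSS reports for the test.
If the p-value is smaller than the significance level 0.05, the null hypothesis is rejected and the row and column variables are taken to be associated. If it is larger than 0.05, the null hypothesis is not rejected: the data provide no significant evidence of association between the row and column variables.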
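In crosstabulation analysis, SPSS mainly provides the following three methods for testing association:
(1) Chi-square ($\chi^2$) test: commonly used to test whether the row and column variables are associated.
It is computed as
$\chi^2 = \sum\frac{(f_0 - f_e)^2}{f_e}$   (3.11)
where the sum runs over all cells of the table, $f_0$ is the observed frequency, and $f_e$ is the expected frequency under independence (row total × column total / grand total).
Under the null hypothesis, the chi-square statistic approximately follows a chi-square distribution with (number of rows − 1) × (number of columns − 1) degrees of freedom.
When computing the chi-square statistic, SPSS also reports the corresponding p-value, from which one judges whether the row and column variables are associated.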
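Here is a minimal sketch of the same chi-square test of independence outside SPSS. It runs `scipy.stats.chi2_contingency` on a made-up 2 × 3 table of counts. It also evaluates Eq. (3.11) directly, with expected frequencies computed from the row and column totals, to show that the two agree. The decision rule uses the 0.05 level from the text.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 2 x 3 crosstab of counts: rows = income group, columns = price band.
observed = np.array([[30, 20, 10],
                     [15, 25, 30]])

chi2, p_value, dof, expected = chi2_contingency(observed, correction=False)

# The same statistic computed directly from Eq. (3.11).
row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
f_e = row_totals * col_totals / observed.sum()
chi2_manual = ((observed - f_e) ** 2 / f_e).sum()

reject_independence = p_value < 0.05
print(chi2, chi2_manual, dof, p_value, reject_independence)
```

`correction=False` turns off Yates' continuity correction. SciPy applies that correction only when there is one degree of freedom, so here it makes no difference.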
Expected Sample Moments of Concomitants of Selected Order Statistics
Dirk V. Arnold and Hans-Georg Beyer
Department of Computer Science XI, University of Dortmund, 44221 Dortmund, Germany

Abstract
In this paper, the task of determining expected values of sample moments, where the sample members have been selected based on noisy information, is considered. Exact expressions for expected values of sums of products of concomitants of selected order statistics are derived. Then, using Edgeworth and Cornish-Fisher approximations, explicit results that depend on coefficients that can be determined numerically are obtained. While the results are exact only for normal populations, it is shown experimentally that including skewness and kurtosis in the calculations can yield greatly improved results for other distributions.

Keywords: Concomitants of order statistics, Gaussian noise, sample moments, Edgeworth approximation, Cornish-Fisher expansion

1 Introduction
Suppose that $(X_1, Y_1), \ldots, (X_\lambda, Y_\lambda)$ are $\lambda$ i.i.d. bivariate observations from a continuous population with c.d.f. $F(x, y)$ and with p.d.f. $f(x, y)$. Further, suppose that the points are ordered by their X variates.
The order statistics of X are denoted as usual by $X_{i:\lambda}$, $1 \le i \le \lambda$. Following David [4], the Y variate associated with $X_{i:\lambda}$ is called the concomitant of the $i$th order statistic and is denoted by $Y_{i:\lambda}$. We consider the special case that the population density is
$f(x, y) = f(x \mid y)\,g(y)$   (1)
where $g(y)$ is the density of a probability distribution that is without loss of generality assumed to be standardized to have zero mean and unit variance, and where
$f(x \mid y) = \frac{1}{\sqrt{2\pi}\,\vartheta}\exp\left(-\frac12\left(\frac{x-y}{\vartheta}\right)^2\right)$   (2)
for a $\vartheta > 0$. It is the purpose of this paper to obtain approximations for the expected values of the mean
$m_1 = \frac{1}{\mu}\sum_{i=1}^{\mu}Y_{\lambda+1-i:\lambda}$   (3)
and of the moments about the mean
$m_j = \frac{1}{\mu}\sum_{i=1}^{\mu}\left(Y_{\lambda+1-i:\lambda} - m_1\right)^j, \qquad j \ge 2$   (4)
of the sample consisting of those Y-values that are associated with the $\mu$ largest X-values.
The determination of expected values of the moments in Eqs. (3) and (4) is a recurring problem in the theory of evolution strategies [3, 8]. Evolution strategies are powerful heuristics for numerical search and optimization. At time step $t$, the state of an evolution strategy includes a set of $\mu \ge 1$ candidate solutions to the problem at hand. A set of $\lambda > \mu$ new candidate solutions is generated from the existing set by means of certain operations that have the purpose of introducing variation. Subsequently, the $\mu$ best of the $\lambda$ candidate solutions thus generated are selected to replace the original set of candidate solutions in time step $t+1$. This form of selection is usually called truncation selection. Therefore, we refer to the set $\{Y_{\lambda+1-i:\lambda} : i = 1, \ldots, \mu\}$ as the truncated sample, even though this form of selection is commonly referred to as Type II censoring in Statistics. The quality of a candidate solution is of course determined by that candidate solution's objective function value. As real-world optimization problems almost always include sources of noise, it is of particular interest to consider the case that the observed or measured objective function values $X_i$ do not properly reflect the candidate solutions' true quality $Y_i$. The assumption of Gaussian noise is almost universal in the optimization literature and motivates the choice of probability
density in Eq. (2). The quantity $\vartheta$ is referred to as the noise strength and determines the correlation coefficient
$a = \frac{1}{\sqrt{1+\vartheta^2}}$   (5)
of the bivariate distribution $F(x, y)$.
This paper is by no means the first to consider properties of concomitants of selected order statistics. In related work, Nagaraja [6] considered asymptotic properties of $m_1$, which he referred to as the induced selection differential. Yeo and David [9] developed a general expression for the probability that the $\mu$ objects that are selected include the $\nu \le \mu$ objects with the largest Y-values. Nagaraja and David [7] derived limit distributions for the maximum Y-value of the truncated sample for both the extreme and the quantile cases. A survey of work concerned with all aspects of concomitants of order statistics has been compiled by David and Nagaraja [5].
The remainder of this paper is organized as follows. In Section 2, integral expressions for the expected values of sums of products of the concomitants of the selected order statistics are derived. In Section 3, an Edgeworth approximation is used for expressing the distribution of the Y variates, making it possible to solve all but one of the integrals in the previously obtained expression. Even though only moments up to the fourth order are considered, there are no restrictions in principle that would prevent the inclusion of higher-order moments in the calculations. Then, a substitution is carried out with the goal of expressing the expected values of sums of products of the concomitants of the selected order statistics in terms of coefficients that can be obtained numerically. In the course of that substitution, a Cornish-Fisher expansion is used to express the inverse c.d.f. of the X variates.
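Finally, in Section 4, expected values of the mean and of moments about the mean of the truncated sample are obtained. The special cases of the normal distribution and the $\chi^2$-distribution are discussed. For the normal distribution the results are exact. For the $\chi^2$-distribution it is shown experimentally that considering skewness and kurtosis in the calculations greatly improves the quality of the approximation. Appendix A derives some identities that are used in Sections 2 and 3, and Appendix B contains a Mathematica program handling the tedious details of the calculations.

2 Sums of products of concomitants
In Section 4, we will express the moments of the truncated sample in terms of sums of products of the concomitants of the selected order statistics. Let $A = (\alpha_1, \ldots, \alpha_\nu)$ be a vector of $\nu \ge 1$ positive integers $\alpha_j$, $j = 1, \ldots, \nu$, and let
$S_A = \sum Y_{i_1:\lambda}^{\alpha_1}\cdots Y_{i_\nu:\lambda}^{\alpha_\nu}$   (6)
where the summation ranges over all indices $i_j \in \{\lambda-\mu+1, \lambda-\mu+2, \ldots, \lambda\}$ such that $i_j \ne i_k$ for any $j \ne k$. Note that for $\nu > \mu$, the summation is empty and $S_A = 0$. So as to restrict the indices in $S_A$ such that $i_1 < i_2 < \cdots < i_\nu$, let us formally write $\pi_A(y_1, \ldots, y_\nu)$ for the sum of products of powers of the $y_k$ with all permutations of the exponents $\alpha_1, \ldots, \alpha_\nu$. For example,
$\pi_{(2,2)}(y_1, y_2) = y_1^2y_2^2$
$\pi_{(1,1,1)}(y_1, y_2, y_3) = y_1y_2y_3$
$\pi_{(2,1,1)}(y_1, y_2, y_3) = y_1^2y_2y_3 + y_1y_2^2y_3 + y_1y_2y_3^2$
Then we can write
$S_A = \sum_{i_1=\lambda-\mu+1}^{\lambda-\nu+1}\;\sum_{i_2=i_1+1}^{\lambda-\nu+2}\cdots\sum_{i_\nu=i_{\nu-1}+1}^{\lambda}\pi_A\left(Y_{i_1:\lambda}, \ldots, Y_{i_\nu:\lambda}\right)$
The expected value of a sum of products of the concomitants of the selected order statistics with exponents prescribed by $A$ is thus
$E[S_A] = E\left[\sum_{i_1=\lambda-\mu+1}^{\lambda-\nu+1}\cdots\sum_{i_\nu=i_{\nu-1}+1}^{\lambda}\pi_A\left(Y_{i_1:\lambda}, \ldots, Y_{i_\nu:\lambda}\right)\right] = \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty}\sum_{i_1=\lambda-\mu+1}^{\lambda-\nu+1}\cdots\sum_{i_\nu=i_{\nu-1}+1}^{\lambda}\pi_A(y_1, \ldots, y_\nu)\,g_{i_1\ldots i_\nu:\lambda}(y_1, \ldots, y_\nu)\,dy_\nu\cdots dy_1$
where $g_{i_1\ldots i_\nu:\lambda}(y_1, \ldots, y_\nu)$ denotes the joint p.d.f. of the concomitants $Y_{i_k:\lambda}$, $k = 1, \ldots, \nu$, with $1 \le i_1 < i_2 < \cdots < i_\nu \le \lambda$. Using results quoted by Balakrishnan and Rao [2] and by David and Nagaraja [5], that joint p.d.f. can be written as
$g_{i_1\ldots i_\nu:\lambda}(y_1, \ldots, y_\nu) = \lambda!\int_{-\infty}^{\infty}\int_{x_1}^{\infty}\cdots\int_{x_{\nu-1}}^{\infty}\prod_{k=1}^{\nu}g(y_k)f(x_k \mid y_k)\prod_{k=0}^{\nu}\frac{\left[F(x_{k+1}) - F(x_k)\right]^{i_{k+1}-i_k-1}}{(i_{k+1}-i_k-1)!}\,dx_\nu\cdots dx_2\,dx_1$
Using Identity 1 from Appendix A it follows that $E[S_A] = \ldots$ Finally, letting $\Phi$ and $\varphi$ denote the c.d.f. and the p.d.f. of the standardized normal distribution, respectively, and substituting $z = \Phi(y)$ and exchanging the order of the integrations yields $E[S_A] = \ldots$
…
$g(y) = \varphi(y)\left[1 + \frac{\gamma_1}{6}\mathrm{He}_3(y) + \frac{\gamma_2}{24}\mathrm{He}_4(y) + \frac{\gamma_1^2}{72}\mathrm{He}_6(y)\right]$   (9)
where $\gamma_1$ and $\gamma_2$ are the coefficients of skewness and kurtosis of the distribution, respectively, and where $\mathrm{He}_k(y)$ denotes the $k$th Hermite polynomial. For the sake of brevity, we refrain from considering higher-order terms in the calculations. Note, however, that this is not a restriction in principle, and that additional terms could be considered. Introducing new variables $x_k$ … yields
$I_A(y) = \ldots\int_{x_0}^{\infty}\int_{x_1}^{\infty}\cdots\int_{x_{\nu-1}}^{\infty}J_A(x_1, \ldots, x_\nu)\,dx_\nu\cdots dx_1$   (10)
where $x_0 = F^{-1}(\Phi(y))/\sqrt{1+\vartheta^2}$ and
$J_A(x_1, \ldots, x_\nu) = \ldots\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty}\pi_A(y_1, \ldots, y_\nu)\prod_{k=1}^{\nu}g(y_k)\,\varphi\!\left(\frac{\cdots - y_k}{\vartheta}\right)dy_\nu\cdots dy_1$   (11)
From Identity 2 in Appendix A with the correlation coefficient from Eq. (5) it follows that … where the coefficients of the polynomial depend on $\gamma_1$, $\gamma_2$, and $a$ only. Using this result in Eq. (10), the remaining integrals can then be solved using Identities 3, 4, 5, and 6 from Appendix A. The final result is of the form
$I_A(y) = \sum_{i=0}^{\nu}\left[\text{polynomial in } x_0\right]_i\,\varphi(x_0)^{\nu-i}\left(1-\Phi(x_0)\right)^{i}$   (14)
where again the coefficients of the polynomials depend on $\gamma_1$, $\gamma_2$, and $a$ only. The calculations are not difficult but tedious and lengthy. Written out, they occupy far more space than is available here. Therefore, instead of presenting detailed steps, we have included a Mathematica program in Appendix B that takes over the task of determining the coefficients of the polynomials in Eq. (14).
The result for $I_A(y)$ obtained thus far does not require any more integrations. However, it does depend on $x_0 = F^{-1}(\Phi(y))/\sqrt{1+\vartheta^2}$ … its coefficient of skewness is $\gamma_1/(1+\vartheta^2)^{3/2} = a^3\gamma_1$, and its coefficient of kurtosis is $\gamma_2/(1+\vartheta^2)^2 = a^4\gamma_2$. Expanding $F^{-1}(\Phi(y))$ into a Cornish-Fisher series yields (Abramowitz and Stegun [1])
$x_0 = \frac{F^{-1}(\Phi(y))}{\sqrt{1+\vartheta^2}} = y + \frac{\gamma_1}{6}a^3(y^2-1) + \frac{\gamma_2}{24}a^4\,\mathrm{He}_3(y) - \frac{\gamma_1^2}{36}a^6(2y^3-5y) + \cdots$
Writing this as $x_0 = y + d$, it follows by Taylor expansion around $y$ that
$1-\Phi(x_0) = 1-\Phi(y) - \varphi(y)\,d + \frac{y\,d^2}{2}\varphi(y) + \cdots$
Binomially expanding powers of these quantities, it follows that
$x_0^k = y^k + ky^{k-1}d + \frac{k(k-1)}{2}y^{k-2}d^2 + \cdots$
$(1-\Phi(x_0))^k = (1-\Phi(y))^k - k(1-\Phi(y))^{k-1}\varphi(y)\left(d - \frac{y\,d^2}{2}\right) + \frac{k(k-1)}{2}(1-\Phi(y))^{k-2}\varphi(y)^2d^2 + \cdots$
$\varphi(x_0)^k = \varphi(y)^k\left[1 - k\left(y\,d - \frac{(y^2-1)\,d^2}{2}\right) + \frac{k(k-1)}{2}y^2d^2\right] + \cdots$
Note that $d^2 = \gamma_1^2a^6(y^2-1)^2/36 + \cdots$, and that all terms represented by dots consist of higher-order terms only. Inserting Eqs. (15), (16), and (17) in Eq. (14) results in
$I_A(y) = \sum_{i=0}^{\nu}\left[Z^A_{i0}(y) + \ldots\right]$   (18)
…
$m_1 = \frac{1}{\mu}S_{(1)}$, …
In some situations, it is also useful to know the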
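expected value of the square of the variance of the truncated sample. Squaring Eq. (4) and multiplying out, it follows that $m_2^2 = \ldots$
… the coefficient of kurtosis is $\gamma_2 = 1.2$. The value $n = 10$ is rather small, and more exact results are achieved for greater $n$. Nonetheless, it can be seen that including the skewness and the kurtosis of the population distribution in the calculations substantially improves the quality of the results. In particular, the values computed for the first and second moments very closely reflect the measured values.
[Figure]

Appendix A
Identity 1: … the identity
$\sum_{i_1=\lambda-\mu+1}^{\lambda-\nu+1}\;\sum_{i_2=i_1+1}^{\lambda-\nu+2}\cdots\sum_{i_\nu=i_{\nu-1}+1}^{\lambda}\;\prod_{k=0}^{\nu}\frac{(F_{k+1}-F_k)^{i_{k+1}-i_k-1}}{(i_{k+1}-i_k-1)!} = \frac{1}{(\lambda-\mu-1)!\,(\mu-\nu)!}\int_0^{F_1}z^{\lambda-\mu-1}(1-z)^{\mu-\nu}\,dz$
where formally $F_0 = 0$, $F_{\nu+1} = 1$, $i_0 = 0$, and $i_{\nu+1} = \lambda+1$ are assumed, holds.
Identity 2: For non-negative integer $k$ and real numbers $\mu$ and $\sigma > 0$, the identity
$\int_{-\infty}^{\infty}\mathrm{He}_k(x)\,\varphi(x)\,\frac{1}{\sigma}\varphi\!\left(\frac{x-\mu}{\sigma}\right)dx = \frac{1}{(1+\sigma^2)^{(k+1)/2}}\,\mathrm{He}_k\!\left(\frac{\mu}{\sqrt{1+\sigma^2}}\right)\varphi\!\left(\frac{\mu}{\sqrt{1+\sigma^2}}\right)$
holds.
Identity 3: For non-negative integer $k$ and real number $z$, the identity
$\int_z^{\infty}\mathrm{He}_k(x)\,\varphi(x)\,dx = \begin{cases}1-\Phi(z) & \text{if } k = 0\\ \mathrm{He}_{k-1}(z)\,\varphi(z) & \text{if } k > 0\end{cases}$
holds.
Identity 4: For positive integer $k$ and real numbers $z$ and $\beta > 0$, the identity
$\int_z^{\infty}\mathrm{He}_k(x)\,\varphi(x)^{\beta}dx = \begin{cases}\frac{1}{\beta}\varphi(z)^{\beta} & \text{if } k = 1\\ \frac{1}{\beta}\mathrm{He}_{k-1}(z)\,\varphi(z)^{\beta} - \frac{(\beta-1)(k-1)}{\beta}\int_z^{\infty}\mathrm{He}_{k-2}(x)\,\varphi(x)^{\beta}dx & \text{if } k \ge 2\end{cases}$
holds.
Identity 5: For non-negative integer $k$ and real number $z$, the identity
$\int_z^{\infty}\mathrm{He}_k(x)\,\varphi(x)\,(1-\Phi(x))\,dx = \begin{cases}\frac{(1-\Phi(z))^2}{2} & \text{if } k = 0\\ \mathrm{He}_{k-1}(z)\,\varphi(z)\,(1-\Phi(z)) - \int_z^{\infty}\mathrm{He}_{k-1}(x)\,\varphi(x)^2dx & \text{if } k > 0\end{cases}$
holds.
Identity 6: For positive integer $k$ and real number $z$, the identity
$\int_z^{\infty}\mathrm{He}_k(x)\,\varphi(x)^2(1-\Phi(x))\,dx = \begin{cases}\frac12\varphi(z)^2(1-\Phi(z)) - \frac12\int_z^{\infty}\varphi(x)^3dx & \text{if } k = 1\\ \frac12\mathrm{He}_{k-1}(z)\,\varphi(z)^2(1-\Phi(z)) - \frac{k-1}{2}\int_z^{\infty}\mathrm{He}_{k-2}(x)\,\varphi(x)^2(1-\Phi(x))\,dx - \frac12\int_z^{\infty}\mathrm{He}_{k-1}(x)\,\varphi(x)^3dx & \text{if } k \ge 2\end{cases}$
holds.
… the relations $\frac{d}{dx}\mathrm{He}_k(x) = k\,\mathrm{He}_{k-1}(x)$ and $\mathrm{He}_{k+1}(x) = x\,\mathrm{He}_k(x) - k\,\mathrm{He}_{k-1}(x)$ of Hermite polynomials are used. The proof of Identity 1 is a bit more involved. Let us write $\mathrm{lhs}(\mu,\lambda,\nu)$ and $\mathrm{rhs}(\mu,\lambda,\nu)$ for the left and right hand sides of Identity 1, respectively, and let $\mu < \lambda$. Then,
$\mathrm{lhs}(\mu,\lambda,1) = \sum_{i=\lambda-\mu+1}^{\lambda}\frac{F_1^{\,i-1}(1-F_1)^{\lambda-i}}{(i-1)!\,(\lambda-i)!}$
According to Eqs. 6.6.4 and 26.5.1 in [1], it follows in terms of the incomplete regularized Beta function that
$\mathrm{lhs}(\mu,\lambda,1) = \frac{1}{(\lambda-\mu-1)!\,(\mu-1)!}\int_0^{F_1}z^{\lambda-\mu-1}(1-z)^{\mu-1}\,dz = \mathrm{rhs}(\mu,\lambda,1)$
and the validity of the identity for $\nu = 1$ has been shown. For the inductive step, let us now assume that the identity holds for a particular value of $\nu$ and all values of $\mu$ and $\lambda$ that satisfy $\nu \le \mu < \lambda$. To show that the identity holds for $\nu+1$, let $\mu$ and $\lambda$ satisfy $\nu+1 \le \mu < \lambda$. The left hand side of the identity then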
reads
$\mathrm{lhs}(\mu,\lambda,\nu+1) = \sum_{i_1=\lambda-\mu+1}^{\lambda-\nu}\cdots\sum_{i_\nu=i_{\nu-1}+1}^{\lambda-1}\prod_{k=0}^{\nu-1}\frac{(F_{k+1}-F_k)^{i_{k+1}-i_k-1}}{(i_{k+1}-i_k-1)!}\sum_{i_{\nu+1}=i_\nu+1}^{\lambda}\frac{(F_{\nu+1}-F_\nu)^{i_{\nu+1}-i_\nu-1}(1-F_{\nu+1})^{\lambda-i_{\nu+1}}}{(i_{\nu+1}-i_\nu-1)!\,(\lambda-i_{\nu+1})!}$
$= \sum_{i_1=\lambda-\mu+1}^{\lambda-\nu}\cdots\sum_{i_\nu=i_{\nu-1}+1}^{\lambda-1}\prod_{k=0}^{\nu-1}\frac{(F_{k+1}-F_k)^{i_{k+1}-i_k-1}}{(i_{k+1}-i_k-1)!}\cdot\frac{(1-F_\nu)^{\lambda-i_\nu-1}}{(\lambda-i_\nu-1)!} = \mathrm{lhs}(\mu-1,\lambda-1,\nu)$
where the innermost sum over $i_{\nu+1}$ was evaluated with the binomial theorem. As $\nu \le \mu-1 < \lambda-1$ and as the identity holds for $\nu$, it follows that
$\mathrm{lhs}(\mu,\lambda,\nu+1) = \mathrm{rhs}(\mu-1,\lambda-1,\nu) = \mathrm{rhs}(\mu,\lambda,\nu+1)$

Appendix B
…
    Integrate1[A_, 0] := A;
    Integrate1[A_, i_] := Integrate1[Int1[HermiteExpand[A, y[i]], y[i], x[i]], i - 1];
    Int1[c_ expr_, y_, x_] := c Int1[expr, y, x] /; FreeQ[c, y];
    Int1[expr1_ + expr2_, y_, x_] := Int1[expr1, y, x] + Int1[expr2, y, x];
    Int1[He[k_, y_], y_, x_] := a^(k + 1) Hermite[k, x] g[x];
where Int1 implements the integration rule Eq. (12), taking into account that the terms involving exponential functions had been left out, and where the second argument of Integrate1 needs to be $\nu$ initially. Here and in what follows, f[y] and g[y] stand for $1-\Phi(y)$ and $\varphi(y)$, respectively. The integrand in Eq. (10) is then simply given by:
    MakeIntegrand2[A_] := Integrate1[MakeIntegrand1[A], Length[A]];
where again A stands for the exponent vector $A$. The $\nu$-fold integration in Eq. (10) is done by
    Integrate2[A_, 0] := A;
    Integrate2[A_, i_] := Integrate2[Int2[HermiteExpand[A, x[i]], x[i], x[i - 1]], i - 1];
    Int2[c_ expr_, y_, x_] := c Int2[expr, y, x] /; FreeQ[c, y];
    Int2[expr1_ + expr2_, y_, x_] := Int2[expr1, y, x] + Int2[expr2, y, x];
    Int2[He[0, y_] g[y_], y_, x_] := f[x];
    Int2[He[0, y_] f[y_] g[y_], y_, x_] := f[x]^2/2;
    Int2[He[1, y_] g[y_]^b_., y_, x_] := g[x]^b/b;
    Int2[He[1, y_] f[y_] g[y_]^2, y_, x_] := f[x] g[x]^2/2 - Int2[He[0, y] g[y]^3, y, x]/2;
    Int2[He[k_, y_] g[y_], y_, x_] := Hermite[k - 1, x] g[x];
    Int2[He[k_, y_] g[y_]^b_., y_, x_] := (Hermite[k - 1, x] g[x]^b/b - (b - 1) (k - 1) Int2[He[k - 2, y] g[y]^b, y, x]/b) /; k >= 2;
    Int2[He[k_, y_] f[y_] g[y_], y_, x_] := Hermite[k - 1, x] f[x] g[x] - Int2[He[k - 1, y] g[y]^2, y, x] /; k >= 1;
    Int2[He[k_, y_] f[y_] g[y_]^2, y_, x_] := (Hermite[k - 1, x] f[x] g[x]^2/2 - (k - 1) Int2[He[k - 2, y] f[y] g[y]^2, y, x]/2 - Int2[He[k - 1, y] g[y]^3, y, x]/2) /; k >= 2;
where Int2 implements the integration rules given by Identities 3, 4, 5, and 6 from Appendix A, and where the second argument of Integrate2 needs to be $\nu$ initially. The result of the steps so far is the representation of $I_A(y)$ given by Eq. (14). To do the substitution
prescribed by Eqs. (15), (16), and (17), we define:
    Substitution[c_] := c /; FreeQ[c, x[0]];
    Substitution[expr1_ + expr2_] := Substitution[expr1] + Substitution[expr2];
    Substitution[expr1_ * expr2_] := Substitution[expr1] * Substitution[expr2];
    d1 = g1 a^3 (y^2 - 1) + g2 a^4 (y^3 - 3 y) - g1^2 a^6 (2 y^3 - 5 y);
    d2 = g1^2 a^6 (y^2 - 1)^2;
    Substitution[x[0]^k_.] := y^k + k y^(k - 1) d1 + k (k - 1) y^(k - 2) d2/2;
    Substitution[f[x[0]]^k_.] := f[y]^k - k f[y]^(k - 1) g[y] (d1 - y*d2/2) + k (k - 1) f[y]^(k - 2) g[y]^2 d2/2;
    Substitution[g[x[0]]^k_.] := g[y]^k (1 - k (y*d1 - (y^2 - 1) d2/2) + k (k - 1) y^2 d2/2);
Finally, the representation of $I_A(y)$ given by Eq. (18) can be obtained by:
    MakeSum[A_] := HermiteExpand[Substitution[Integrate2[MakeIntegrand2[A], Length[A]]]/a^Length[A], y];
where the division by $a^\nu$ reflects the factors …

References
[2] BALAKRISHNAN, N. AND RAO, C. R. (1998). Order statistics: An introduction. In Handbook of Statistics 16, eds. N. Balakrishnan and C. R. Rao. Elsevier, Amsterdam, pp. 3–24.
[3] BEYER, H.-G. (2001). The Theory of Evolution Strategies. Springer, Berlin.
[4] DAVID, H. A. (1973). Concomitants of order statistics. Bull. Internat. Statist. Inst. 45, 295–300.
[5] DAVID, H. A. AND NAGARAJA, H. N. (1998). Concomitants of order statistics. In Handbook of Statistics 16, eds. N. Balakrishnan and C. R. Rao. Elsevier, Amsterdam, pp. 487–513.
[6] NAGARAJA, H. N. (1982). Some asymptotic results for the induced selection differential. J. Appl. Prob. 19, 253–261.
[7] NAGARAJA, H. N. AND DAVID, H. A. (1994). Distribution of the maximum of concomitants of selected order statistics. Ann. Statist. 22, 478–494.
[8] RUDOLPH, G. (1997). Convergence Properties of Evolutionary Algorithms. Dr. Kovač, Hamburg.
[9] YEO, W. B. AND DAVID, H. A. (1984). Selection through an associated characteristic, with applications to the random effects model. J. Amer. Statist. Assoc. 79, 399–405.
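The setting of Section 1 can be illustrated by simulation. The sketch below is not the authors' method, since they derive analytical approximations. It draws $\lambda$ pairs per trial, with Y standard normal (the case for which the results above are exact) and X = Y + ϑZ, Z standard normal, which gives the conditional density of Eq. (2). It then keeps the concomitants of the $\mu$ largest X-values and averages $m_1$ and $m_2$ from Eqs. (3) and (4). The function name, the seed, and the number of trials are arbitrary choices.

```python
import numpy as np

def truncated_sample_moments(lam, mu, theta, trials=20000, seed=0):
    """Monte Carlo estimates of E[m_1] and E[m_2] for the truncated sample.

    Y ~ N(0, 1) and X = Y + theta * Z with Z ~ N(0, 1), i.e. f(x | y) of Eq. (2).
    In each trial the mu largest X-values are selected and their Y-values kept.
    """
    rng = np.random.default_rng(seed)
    y = rng.standard_normal((trials, lam))
    x = y + theta * rng.standard_normal((trials, lam))
    # Column indices of the mu largest X-values per trial (their order is irrelevant).
    top = np.argpartition(x, lam - mu, axis=1)[:, lam - mu:]
    selected = np.take_along_axis(y, top, axis=1)
    m1 = selected.mean(axis=1)                              # Eq. (3)
    m2 = ((selected - m1[:, None]) ** 2).mean(axis=1)       # Eq. (4) with j = 2
    return m1.mean(), m2.mean()

m1_clean, m2_clean = truncated_sample_moments(lam=10, mu=3, theta=0.0)
m1_noisy, m2_noisy = truncated_sample_moments(lam=10, mu=3, theta=5.0)
print(m1_clean, m1_noisy)
```

For normal Y the regression of Y on X is linear. As a result, the noisy $E[m_1]$ should be close to $a$ times the noise-free value, with $a$ from Eq. (5); for ϑ = 5, $a = 1/\sqrt{26} \approx 0.196$. This gives a quick sanity check on the simulation.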