Polyphonic music transcription using note event modeling
- 格式:pdf
- 大小:137.36 KB
- 文档页数:4
音乐史英语词汇大全了解音乐发展的关键词Music History: Comprehensive English Vocabulary to Understand Key Terms in the Development of MusicIntroduction:Music has played a significant role in human culture for centuries, evolving and diversifying across different periods and genres. In order to fully comprehend the development of music, it is essential to have a comprehensive understanding of key terms and concepts in music history. This article aims to provide a complete English vocabulary list, enabling readers to grasp the essence of music's evolution.Ancient Music:1. Chant: A monophonic vocal music performed in religious ceremonies during the Middle Ages.2. Gregorian Chant: A form of medieval plainchant named after Pope Gregory I, characterized by a simple melody and monophonic texture.3. Lyre: A plucked string instrument used in ancient Greece and Rome.4. Panpipe: A wind instrument consisting of a series of pipes connected together, commonly used in ancient civilizations.Medieval Music:1. Troubadour: A medieval lyric poet who composed and performed vernacular songs during the Middle Ages.2. Minstrel: A musician or singer who entertained the nobility with music and storytelling.3. Polyphony: A musical texture where two or more melodic lines are simultaneously performed, popular in Medieval and Renaissance music.4. Motet: A vocal musical composition with sacred or secular texts, typically including multiple voice parts.Renaissance Music:1. Madrigal: A secular vocal music genre, popular during the Renaissance era, often featuring polyphonic textures and expressive texts.2. Lute: A plucked string instrument commonly used in Renaissance music.3. Minuet: A French dance form in triple meter, widely used in the Baroque era.4. Counterpoint: The art of combining different melodic lines harmonically, popularized by composers like Johann Sebastian Bach.Baroque Music:1. Concerto: A musical composition for a solo instrument accompanied by an orchestra.2. Oratorio: A large-scale musical composition, usually based on religious themes and performed without staging or costumes.3. Fugue: A contrapuntal composition technique, where a theme is introduced in one voice and imitated by others.4. Harpsichord: A keyboard instrument used in the Baroque period, producing sound by plucking string.Classical Music:1. Sonata: A musical composition for a solo instrument or a small ensemble, often consisting of multiple movements.2. Symphony: An extended musical composition for a full orchestra, typically consisting of several movements.3. Opera: A dramatic form of musical art combining singing and acting, with elaborate stage sets and costumes.4. Concerto Grosso: A musical composition in which a small group of soloists is set against a larger ensemble.Romantic Music:1. Lied: A German art song for solo voice and piano, usually conveying poetic texts.2. Nocturne: A musical composition inspired by the night, often conveying a reflective, dreamy atmosphere.3. Program Music: Instrumental music that tells a story or describes a scene or image.4. Symphonic Poem: A piece of orchestral music that tells a story or evokes a specific mood or emotion.Modern Music:1. Atonality: A compositional technique that rejects traditional tonal hierarchy, focusing on dissonance and lack of key center.2. Jazz: A genre of music characterized by improvisation, syncopation, and a distinctive rhythmic feel.3. Minimalism: A style of music characterized by the repetition of simple melodic and rhythmic motifs.4. Electronic Music: Music composed and produced using electronic devices and technology, often with unconventional sounds and effects.Conclusion:By familiarizing ourselves with these essential English vocabulary terms in music history, we can gain a deeper understanding of the various stages and genres that have shaped the development of music over time. Whether exploring ancient chants or modern electronic compositions, this comprehensive vocabulary list equips readers with the necessary tools to fully appreciate and analyze music's rich history.。
高三英语音乐类型变化趋势单选题30题1.The music of the Baroque period is characterized by _____.A.simple melodiesplex harmoniesC.minimal instrumentationD.spontaneous improvisation答案:B。
本题主要考查巴洛克时期音乐的特点。
选项A“simple melodies”( 简单的旋律)不符合巴洛克时期音乐的特点,巴洛克音乐通常旋律复杂华丽。
选项B“complex harmonies”(复杂的和声)是巴洛克时期音乐的特点之一。
选项C“minimal instrumentation”(极简的乐器配置)错误,巴洛克时期音乐乐器配置较为丰富。
选项D“spontaneous improvisation”( 即兴演奏)不是巴洛克时期音乐的主要特点。
2.In the Renaissance, music was often _____.A.performed only in churchesB.accompanied by electric guitarsC.written for large orchestrasD.sung in polyphony答案:D。
文艺复兴时期音乐特点的考查。
选项A“performed only in churches”( 只在教堂演奏)太绝对,文艺复兴时期音乐也在其他场合演奏。
选项B“accompanied by electric guitars” 由电吉他伴奏)错误,电吉他在文艺复兴时期不存在。
选项C“written for large orchestras”为大型管弦乐队而写)不准确,文艺复兴时期大型管弦乐队还未成熟。
选项D“sung in polyphony” 以复调演唱)是文艺复兴时期音乐的常见形式。
3.Early music was mainly passed down through _____.A.printed sheet musicB.oral traditionC.digital recordingsD.television broadcasts答案:B。
Chinese English装饰变奏曲Double艾尔曲ayre安魂曲Requiem八度octave八重奏Octet巴洛克Baroque巴洛克式Baroque半音semitone伴奏乐器accompanying instrument 悲歌elegy编钟chime bells变声break变奏vary变奏曲variation标题性音乐programmatic music表现主义expressionism表演performance拨弦乐器plucked musical instrument 插段episode插曲interlude常规弥撒ordinary mass唱诗班choir呈示部exposition持续音pedal抽象派abstract art初次登台debut传统地方戏traditional local opera打击乐Percussion打击乐器percussion instrument大管bassoon大号Tuba大弥撒high mass大师virtuoso大提琴Cello大提琴协奏曲cello concerto大提琴演奏家cellist大协奏曲concerto grosso带伴奏的赋格accompanied fugue单二部曲式simple binary form单簧管clarinet低音提琴contrabass电影歌曲movie song动机motif独唱音乐会recital独奏solo独奏协奏曲solo concerto独奏者soloist断奏staccato对呈示部counter-exposition对题countersubject对位counterpoint对位的contrapuntal二部曲式binary form二重对位double counterpoint 二重赋格double fugue二重奏duet duo泛音overtone分节的strophic分节式strophic form丰富的表现力rich expressive powers 复二部曲式compound复调polyphony复调的polyphonic赋格fugue赋格的fugal钢琴piano钢琴独奏音乐会piano recital钢琴协奏曲piano concerto钢琴作品piano work高潮climax高音区high register告别舞台say farewell to stage 歌词verse歌剧opera歌剧舞台operatic stage革命歌曲revolutionary song格里高利圣咏Gregorian chant共鸣箱resonating box古典主义classicism古钢琴Harpsichord古曲ancient tune古组曲partita固定低音ground bass固定旋律cantus firmus故事片音乐feature film music关系大小调relative minor/major key管风琴organ管乐器wind instrument管弦乐曲orchestral music管弦乐作品orchestral piece管弦系wind and string instrumentsdepartment广板largo国歌national anthem国际声乐比赛international vocal competition 国际艺术节international art festival行板andante合唱chorus合唱式的赋格choral fugue合唱音乐choral music合唱指挥chorus conducting合奏ensemble和声harmony哼唱hum花腔coloratura幻想曲fantasy回旋曲rondo回旋曲式rondo form回旋奏鸣曲式sonata-rondo form击弦乐器struck string instrument即兴曲impromptu急板presto纪录片音乐documentary film music技艺精湛的virtuosity假声falsetto间奏曲Intermezzo键盘keyboard交替唱in alternation交响乐团Orchestral交响曲symphony交响诗symphonic poem教堂唱诗班church choir教堂音乐church music节奏rhythm杰作masterpiece进步歌曲progressive song进行曲March近关系调nearby key京剧Peking Opera经文歌motet剧作家playwright康索特consort康塔塔cantata口琴Harmonica快板allegro快速有生气vivo浪漫主义romanticism乐队band乐队指挥conductor乐句phrase乐谱notation乐章movement力度dynamic练习曲etude琉特琴lute洛可可式Rococo慢板lento弥撒Mass弥撒mass民歌folksong民间艺术家folk artist民间音乐家folk musician民间音乐素材folk music material民族乐器national musical instrument民族器乐national instrumental music民族特色national feature模仿imitation模仿对位imitative counterpoint木管号cornett木琴Xylophone牧歌madrigal牧歌madrigal男低音bass男高音tenor男中音baritone女低音Alto女高音soprano排练rehearse配器instrumentation配曲setting普及音乐知识popularize musical knowledge齐唱in unison其它弦乐重奏other string ensembles 器乐音乐instrumental music前奏曲prelude切分syncopation琴弓bow琴弦string清唱剧oratorio曲库repertory曲式form曲调tune全音tone人声合唱Choir柔板adagio三部曲式ternary form三拍子triple meter三重奏Trio三重奏/唱trio沙哑的raucous声部voice part声乐vocal music声乐教师vocal teacher声乐教育家vocal educator声乐人才vocal talent声乐作品vocal work圣歌plainsong圣乐Sacred Music拾音器pickup世俗音乐secular music室内乐chamber music手风琴Accordion受难曲passion抒情的lyrical属调dominant key竖笛Recorder竖琴harp双簧管oboe四重奏Quartet素歌plainsong随想曲capriccio特别弥撒proper mass调的关系key relationship调律temperament调式mode调性tonality通奏低音continuo铜管总称Brass维奥尔琴viol尾声coda未来主义futurism文艺复兴Renaissance文艺界literary and artistic circles无伴奏合唱作品unaccompanied choralcomposition五线谱staff五重奏quintet舞曲dance西部民歌western folk song嬉游曲divertimento喜歌剧comic opera戏剧drama戏剧音乐dramatic music下属的subdominant弦乐器stringed instrument弦乐四重奏string quartet弦轴peg弦柱string pole现代音乐modern music现实主义realism小步舞曲minuet小行板andantino小号trumpet小快板allegretto小弥撒low mass小提琴violin小尾声codetta小夜曲serenade协奏曲concerto协奏曲Concerto谐谑曲scherzo新古典主义neoclassicism序曲overture叙事曲Ballade宣叙调recitative旋律melody学堂乐歌school song阉人歌手castrato演唱技巧vocalism演出季度制performance season system演奏技巧performing skill扬琴dulcimer摇篮曲lullaby业余交响乐团amateur symphony orchestra夜曲Nocturne艺术成就art achievement艺术感染力artistic appeal艺术团art troupe艺术形象artistic image音高pitch音阶scale音节式圣咏syllabic chant音孔hole音乐的戏剧和演出活动music and drama performanceactivities音乐风格musical idiom音乐教育musical education音乐评论家music critic音量volume音色color音诗tone-poem音箱sound box音域musical range音质sound quality引子部分introductory section印象主义impressionism英国管English Horn影坛movie circle咏叹调aria羽管键琴harpsichord圆号horn远关系调distant key运弓bowing再现部recapitulation赞美诗anthem展开部development长笛flute长笛Flute指挥系conducting department指挥作品conducting work中板moderato中国唱片总公司china record corporation中国音乐家协会China Musicians’ Association 中世纪medieval中提琴viola中央交响乐团central philharmonic orchestra 终乐章finale终止cadence众赞歌chorale重唱ensemble主题subject主题theme主题材料thematic material主调home key主修钢琴major in piano主修声乐major in vocal music专辑album转调modulate壮板grave自然主义naturalism宗教音乐sacred music宗教赞美歌motet奏鸣曲sonata奏鸣曲式sonata form组曲suite作曲composition作曲系department of composition叙述部分narrative section。
Pop music,often referred to as popular music,is a genre that has captivated audiences worldwide with its catchy tunes,relatable lyrics,and dynamic performances. Its a form of artistic expression that has evolved significantly over the decades,reflecting the cultural shifts and technological advancements of each era.Origins and EvolutionPop music traces its roots back to the early20th century,with the rise of jazz and blues.It gained momentum in the1950s and1960s with the advent of rock n roll,which was a blend of various musical styles including rhythm and blues,country,and gospel.The Beatles,often credited with leading the British Invasion,played a pivotal role in shaping the pop music landscape.CharacteristicsPop music is characterized by its accessibility and broad appeal.It often features simple, memorable melodies and lyrics that resonate with a wide audience.The genre is known for its use of hooks,which are short musical phrases designed to make a song instantly recognizable and catchy.Lyrical ThemesThe lyrics in pop songs often revolve around universal themes such as love,heartbreak, friendship,and personal growth.They are typically written in a way that listeners can easily relate to,regardless of their background or age.Musical ElementsPop music incorporates a variety of musical elements,including but not limited to: Melody:The tune that is most memorable and catchy in a song.Harmony:The chords that accompany the melody,providing a rich texture to the music. Rhythm:The beat and tempo that drive the song,often dictating how it makes listeners feel.Vocals:The human voice is a central component of pop music,with singers often delivering the emotional core of the song.Production TechniquesThe production of pop music has become increasingly sophisticated with the advent of digital technology.Producers use various techniques such as:Sampling:Incorporating portions of other songs or sounds into a new composition. Synthesizers:Electronic instruments that can mimic traditional instruments or create unique sounds.AutoTune:A software that corrects vocal pitch to ensure a polished sound.Impact on SocietyPop music has a profound impact on society,influencing fashion,language,and even social attitudes.It often serves as a mirror to the times,reflecting societal changes and providing a soundtrack to peoples lives.Notable Artists and BandsThroughout its history,pop music has seen the rise of numerous iconic artists and bands, such as:Michael Jackson:Known as the King of Pop,his music and dance moves have influenced countless artists.Madonna:A trailblazer for female artists,she has pushed boundaries in both music and image.Beyoncé:A modernday icon,she has used her platform to advocate for social issues and empower women.ConclusionPop music is more than just entertainment its a cultural phenomenon that shapes and reflects the zeitgeist.As an art form,it continues to evolve,incorporating new technologies and styles while maintaining its core appeal to a diverse global audience. Whether its through the emotional depth of a ballad or the infectious energy of a dance track,pop music remains a vital and everchanging form of artistic expression.。
常用音乐术语全集1、套曲 Cycle 一种由多乐章组合而成的大型器乐曲或声乐器2、组曲 Suite由几个具有相对独立性的器乐曲组成的乐曲3、奏鸣曲Sonata指类似组曲的器乐合奏套曲.自海顿.莫扎特以后,其指由3-4个乐章组成的器乐独奏套曲(钢琴奏鸣曲)或独奏乐器与钢琴合奏的器乐曲(小提琴奏鸣曲)4、交响曲 symphony大型管弦乐套曲,通常含四个乐章.其乐章结构与独奏的奏鸣曲相同5、协奏曲concerto由一件或多件独奏乐器与管弦乐团相互竞奏,并显示其个性及技巧的大型器乐套曲.分独奏协奏曲、大协奏曲、小协奏曲等6、交响诗 symphonic poem单乐章的标题****响音乐7、音诗 poeme单乐章管弦乐曲,与交响诗相类似8、序曲 overture歌剧、清唱剧、舞剧、其他戏剧作品和声乐、器乐套曲的开始曲。
十九世纪又出现独立的音乐会序曲前奏曲prelude带有即兴曲的性质、有独立的乐思、常放在具有严谨结构的乐曲或套曲之前作为序引的中、小型器乐曲。
10、托卡塔 toccata 节奏紧凑、快速触键的富有自由即兴性的键盘乐曲cantabile 一如歌地 con spirito 一有精神地deciso 一坚定地doice 一柔和地dolente 一怨诉地energico—精力充沛地fantastico 一幻想地grave 一沉重地grazioso ―优雅地giocoso—嬉戏地leggiero一轻巧地largamente一宽广地maestoso一庄严地marcato 一强调mesto 一忧伤地nobilmente —高雅地pathetic 一悲怆地passionate 一热情洋溢地pastoral 一田园地risoluto 一果断地Rubato 一节奏自由sempl ice 一朴素地sempre -继续地 sentimento 一多愁善感地sostenuto -持续地 vivace 一活泼地vivo 一活跃地scherzando 一幽默地spirito 一精神饱满地tranquillo一安静地触键术语glissando 滑奏legato连音legato assai 很连贯legatissimo 最连音non legato非连音portato次断音staccatostaccatissimo▲断音sempre slacc一直用断音tenuto保持Aart achievement [9? tfi:vmsnt]艺术成就1.artistic [a / tistik] appeal [a'pi: 1]艺术感染力artistic image 艺术形象2.amateur ['kmat。
音乐信息检索技术:音乐与人工智能的融合李伟;高智辉【摘要】音乐科技是一个典型的交叉学科领域,分为艺术部分和科技部分.近年来兴起的音乐信息检索技术(MIR)是音乐科技领域的重要组成部分.MIR领域包含数十个研究课题,可按照与各音乐要素的密切程度分为核心层和应用层.当前的MIR技术发展仍然面临诸多困难,但随着艺术与科技的不断融合,必将迎来其发展的辉煌时期.【期刊名称】《艺术探索》【年(卷),期】2018(032)005【总页数】5页(P112-116)【关键词】人工智能;音乐信息检索技术;音乐科技【作者】李伟;高智辉【作者单位】复旦大学计算机科学技术学院,上海201203;复旦大学信息科学与工程学院,上海200433【正文语种】中文【中图分类】J61一、音乐科技概况早在20世纪50年代,计算机刚刚产生,美国的一位化学博士就开始尝试运用计算机处理音乐。
随后几十年,欧美各国相继建立了多个大型音乐科技研究机构,如1975年建立的美国斯坦福大学的音乐及声学计算机研究中心(Center forComputer Research in Music and Acoustics,CCRMA)、1977 年建立的法国巴黎的声学与音乐研究与协调研究所(Institute for Research and Coordination Acoustic/Music,IRCAM)、1994年建立的西班牙巴塞罗那庞培法布拉(UPF)大学的音乐科技研究组(Music Technology Group,MTG)、2001年建立的英国伦敦女王大学数字音乐研究中心(Centrefor Digital Music,C4DM)等。
此外,在亚洲的日本、中国台湾等国家和地区也有多个该领域的公司(如雅马哈)和科研院所。
欧洲由于其浓厚的人文和艺术气息成了音乐科技的世界中心。
图1 音乐科技各领域关系图音乐科技是一个典型的交叉学科领域,分为艺术部分和科技部分。
关于“是对位还是复调”的再讨论赖朝师【摘要】专业复调音乐教学在中国不到百年的历史.20世纪50年代之前以“对位法”、“赋格”为课程名称,之后改用“复调音乐”延续至今.近年来,有些学者提出要恢复它原来的名称,但许多学者则认为这是时代和学科发展的必然结果.因此,学界展开了多年的关于课程名称问题的大讨论.本文针对杨勇教授的“是对位还是复调”的问题发表自己的一点见解,主旨是进一步思考我国复调音乐教学及其学理的研究.站在复调写作技术教学的发展角度,以全国“三次复调学术会议”为基点,在分析的基础上,认为欧美、苏俄复调音乐教学体系的形成总是与它的历史和文化紧密结合,而我国复调音乐课程体系经过几代人的努力,同样形成了自己独特的风格.【期刊名称】《浙江师范大学学报(社会科学版)》【年(卷),期】2013(038)001【总页数】8页(P113-120)【关键词】对位法;赋格;复调音乐;可动对位;塔涅耶夫【作者】赖朝师【作者单位】浙江师范大学音乐学院,浙江金华321004【正文语种】中文【中图分类】J614.2中央音乐学院特聘教授——杨勇,于2010年在该学院学报第1期上发表了《是对位还是复调》的文章(以下简称“杨文”);杨教授认为,我国目前专业音乐教学的“复调音乐”课程名称存在问题,并提出了更名为“对位法”、“赋格”的理由。
同年,该校的龚晓婷教授也在该杂志的第4期发表了题为《中央音乐学院复调教学之我见——中央音乐学院复调教学特色和我的实践体会》的文章,论证了中央音乐学院的复调音乐教学经过几代人的努力,已经建立起自己独特的体系,并通过自己的实践认为这套体系是严密的,是符合我国的教学实际规律的。
2011年,中国音乐学院博士生导师张韵璇教授也在该学报的第2期上发表了《对位与对位教学理念》的文章(以下简称“张文”)。
杨文与张文的观点基本相同,主张将现行的“复调音乐”课程更名为“对位法”或“对位法与赋格”,说这是符合“国际标准”。
POLYPHONIC MUSIC TRANSCRIPTION USING NOTE EVENT MODELINGMatti P.Ryyn¨a nen and Anssi KlapuriInstitute of Signal Processing,Tampere University of TechnologyP.O.Box553,FI-33101Tampere,Finland{matti.ryynanen,anssi.klapuri}@tut.fiABSTRACTThis paper proposes a method for the automatic transcription of real-world music signals,including a variety of musical genres. The method transcribes notes played with pitched musical instru-ments.Percussive sounds,such as drums,may be present but they are not transcribed.Musical notations(i.e.,MIDIfiles)are produced from acoustic stereo inputfiles using probabilistic note event modeling.Note events are described with a hidden Markov model(HMM).The model uses three acoustic features extracted with a multiple fundamental frequency(F0)estimator to calculate the likelihoods of different notes and performs temporal segmen-tation of notes.The transitions between notes are controlled with a musicological model involving musical key estimation and bigram models.Thefinal transcription is obtained by searching for several paths through the note models.Evaluation was carried out with a realistic music ing strict evaluation criteria,39%of all the notes were found(recall)and41%of the transcribed notes were correct(precision).Taken the complexity of the considered transcription task,the results are encouraging.1.INTRODUCTIONTranscription of music refers to the process of generating sym-bolic notations,i.e.,musical transcriptions,for musical perfor-mances.Conventionally,musical transcriptions have been written by hand,which is time-consuming and requires musical educa-tion.In addition to the transcription application itself,the compu-tational transcription methods facilitate automatic music analysis, automatic search and annotation of musical information in large music databases,and interactive music systems.The automatic transcription of real-world music performances is an extremely challenging task.Humans(especially musicians) are able to recognize time-evolving acoustic cues as musical notes and larger musical structures,such as melodies and chords.A mu-sical note is here defined by a discrete note pitch with a specific onset and an offset time.Melodies are consecutive note sequences with organized and recognizable shape whereas chords are combi-nations of simultaneously sounding notes.Transcription systems to date have either considered only lim-ited types of musical genres[1],[2],[3],or attempted only partial transcription[4].For a more complete review of different systems for polyphonic music transcription,see[5].To our knowledge, there exists no system transcribing music performances without setting any restrictions on the instruments,musical genre,maxi-mum polyphony,or the presence of percussive sounds or sound effects in the performances.The proposed method transcribes the pitched notes in music performances without limiting the target material in any of the above-mentioned ways.Goto has previ-Figure1:The block diagram of the transcription method. ously proposed methods for the partial transcription of such com-plex material[4].The proposed transcription method is based on probabilistic modeling of note events and their relationships.The approach was previously applied in monophonic singing transcription[6].Figure 1shows the block diagram of the method.First,both the left and the right channel of an audio recording are processed frame-by-frame with a multiple-F0estimator to obtain several F0s and their related features.The F0estimates are processed by a musicologi-cal model which estimates musical key and chooses between-note transition probabilities.Note events are described with HMMs which allow the calculation of the likelihoods for different note events.Finally,a search algorithmfinds multiple paths through the models to produce transcribed note sequences.We use the RWC(Real World Computing)music genre data-base which consists of stereophonic acoustic recordings sampled at44.1kHz from several musical genres,including popular,rock, dance,jazz,classical,and world music[7].For each recording, the database includes a reference MIDIfile which contains a man-ual annotation of the note events in the acoustic recording.The annotated note events are here referred to as the reference notes. Since there exist slight time deviations between the recordings and the reference notes,all the notes within one referencefile are col-lectively time-scaled to synchronize them with the acoustic signal. In particular,the synchronization can be performed more reliably for the beginning of a recording.Therefore,we use thefirst30 seconds of91acoustic recordings for training and testing our tran-scription system.The MIDI notes for drums,percussive instru-ments,and sound effects are excluded from the set of reference notes.2.MULTIPLE-F0ESTIMATIONThe front-end of the transcription method is a multiple-F0estima-tor proposed in[5],[8].The estimator applies an auditory model where an input signal is passed through a70-channel bandpass filterbank and the subband signals are compressed,half-wave rec-tified,and lowpassfiltered with a frequency response close to1/f. Short-time Fourier transforms are then computed within the bands and the magnitude spectra are summed across channels to obtain a summary spectrum where all the subsequent processing takes place.Periodicity analysis is carried out by simulating a bank of combfilters in the frequency domain.F0s are estimated one at a time,the found sounds are canceled from the mixture,and the estimation is repeated for the residual.In addition,the method performs detection of the onsets of pitched sounds by observing positive changes in the estimated strengths of different F0values.Here we used the estimator to analyze audio signal in92.9ms frames overlapped with11.6ms interval between the beginnings of successive frames.In each frame,the estimator producesfive distinct fundamental frequency values.Both the left and the right channels are independently analyzed from the input stereo signal, resulting in ten F0estimates at each analysis frame t.As an output, the estimator produces four matrices X,S,Y,and D of size10×t max(t max is the number of analysis frames):•F0estimates X and their salience values S.For a F0esti-mate x it=[X]it ,the salience value s it=[S]itroughlyexpresses how prominent x it is in the analysis frame t.•Onsetting F0estimates Y and their onset strengths D.If asound with F0estimate y it=[Y]it sets on in frame t,theonset strength value d it=[D]itis high.The F0values in both X and Y are expressed in units of unrounded MIDI note numbers byMIDI note number=69+12log2…F0440Hz«.(1)Logarithm is taken from the elements of S and D to compress their dynamic range.If an onset strength value d it is small,the onsetting F0estimate y it is random valued.Therefore,those y it values are set to zero for which the onset strength d it is below a fixed threshold.We empirically chose a threshold value of ln3.3.PROBABILISTIC MODELSThe transcription system applies three probabilistic models:a note event HMM,a silence model,and a musicological model.The note HMM uses the output of the multiple-F0estimator to calcu-late likelihoods for different notes,and the silence model corre-sponds to time regions where no notes are sounding.The musico-logical model controls transitions between note HMMs and the si-lence model,analogous to a“language model”in automatic speech recognition.Transcription is done by searching for disjoint paths through the note models and the silence model.3.1.Note event modelNote events are described with a three-state HMM.The note HMM is a state machine where state q i,1≤i≤3,represents the typical values of the features in the i:th temporal segment of note events. The model allocates one note HMM for each MIDI note number n=29,...,94,i.e.,from note F1to B♭6.Given the matrices X,S,Y,and D,the observation vector o n,t∈R3is defined for a note HMM with nominal pitch n at frame t aso n,t=(∆x n,t,s jt,d n,t),(2) where∆x n,t=x jt−n,(3)d n,t= d kt,if|y kt−n|≤10,otherwise,(4)j=arg mini{|x it−n|},1≤i≤10,(5)k=arg mini{|y it−n|},1≤i≤10.(6)The observation vectors thus consist of three features:the pitch difference∆x n,t between the measured F0and the nominal pitch n of the modeled note,the salience s jt,and the onset strength d n,t. For a note HMM n,the nearest F0estimate and its salience val-ues s jt are associated with the note by(3),(5).The onset strength feature is used only if its corresponding F0value is closer than one semitone to the nominal pitch n of the model(4),(6).Other-wise,onset strength feature value d n,t is set to zero,i.e.,no onset measurement is available for the particular note.We use the pitch difference as a feature instead of the absolute F0value so that only one set of note-HMM parameters needs to be trained.In other words,we have a distinct note HMM for each nominal pitch n=29,...,94but they all share the same trained parameters.This can be done since the observation vector itself is tailored to be different for each note model(2).The HMM parame-ters include(i)state-transition probabilities P(q j|q i),i.e.,the con-ditional probability that state q i is followed by state q j,and(ii)the observation likelihood distributions,i.e.,the likelihoods P(o n|q i) that observation o n is emitted by the state q i from note model n.For the time region of a reference note with nominal pitch n, the observation vectors by(2)form an observation sequence for training the acoustic note event model.The observation sequence is accepted for the training only if the median of the absolute pitch differences|∆x n,t|is smaller than one semitone during the ref-erence note.This type of selection is necessary,since for some reference notes there are no reliable F0measurements available in X.The note HMM parameters are then obtained using the Baum-Welch algorithm[9].The observation likelihood distributions are modeled with a four-component Gaussian mixture model.Figure2shows the parameters of the trained note HMM.On top,the HMM states are connected with arrows to show the HMM topology and the state-transition probabilities.Below each state, thefigure shows the observation likelihood distributions for the three features.Thefirst state is interpreted as the attack state where pitch difference has a larger variance,the salience feature gets lower values,and the onset strength has a prominent peak at 1.2,thus indicating note onsets.The second state,here called as the sustain state,includes small variance of the pitch difference feature and large salience values.Thefinal state is a noise state, where F0s with small salience values dominate.The sustain and the noise states are full-connected,thus allowing note HMM to visit the noise state and switch back to the sustain state,if a note event contains noisy regions.3.2.Silence modelWe use a silence model to skip the time regions where no notes are sounding.The silence model is a1-state HMM for which theP I T C H D I F F E R E N C E N O T E E V E N T Figure 2:The parameters of the trained note event HMM.observation likelihood at time t is defined asP (silence )t =1−max n,j{P (o n,t |q j )},(7)i.e.,the observation likelihood for the silence model is the nega-tion of the greatest observation likelihood in any state of any note model at time t .Here,the observation likelihood value P (o n,t |q j )is on linear scale,and its maximum value is unity.If a note state has a large observation likelihood,the observation likelihood for the silence model is small at that time.3.3.Musicological modelThe musicological model controls transitions between note HMMs and the silence model in a manner similar to [6].The musicologi-cal model is based on the fact that some note sequences are more common than others in a certain musical key.A musical key is roughly defined by the basic note scale used in a song.A major key and a minor key are called a relative-key pair if they consist of scales with the same notes (e.g.,the C major and the A minor).The musicological model first finds the most probable relative-key pair using a musical key estimation method proposed and eval-uated in [10].The method produces likelihoods for different major and minor keys from those F0estimates x it (rounded to the near-est MIDI note numbers)for which salience value is larger than a fixed threshold.Based on empirical investigation of the data,we chose a threshold value of ln 2.The most probable relative-key pair is estimated for the whole recording and this key pair is then used to choose transition probabilities between note models and the silence model.The transition probabilities between note HMMs are defined by note bigrams which were estimated from a large database of monophonic melodies,as reported in [6].As a result,given theprevious note and the most probable relative-key pair k ,the note bigram probability P (n t =j |n t −1=i,k )gives a transition prob-ability to move from note i to note j .This means that a possible visit in the silence model is skipped and only the previous note accounts.The musicological model assumes that it is more probable both to start and to end a note sequence with a note which is frequently occurring in the musical key.A silence-to-note tran-sition (if there is no previously visited note,e.g.,in the begin-ning of the piece)corresponds to starting a note sequence and a note-to-silence transition corresponds to ending a note sequence.Krumhansl reported the occurrence distributions of different notes with respect to musical key estimated from a large amount of clas-sical music [11,p.67].The musicological model applies these dis-tributions as probabilities for the note-to-silence and the silence-to-note transitions so that the most probable relative-key pair is taken into account.A transition from the silence model to itself is prohibited.3.4.Finding several note sequencesThe note models and the silence model constitute a network of models.The optimal path through the network can be found us-ing the Token-passing algorithm [12]after calculating the obser-vation likelihoods for note model states and the silence model,and choosing the transition probabilities between different mod-els.The Token-passing algorithm propagates tokens through the network.Each model state contributes to the overall likelihood of a token by the observation likelihood and the transition probabili-ties between the states.When a token is emitted out of a model,a note boundary is recorded.Between the models,the musicological model contributes to the likelihood of the token by considering the previous note,or the silence model.Eventually,the token with the greatest likelihood defines the optimal note sequence.In order to transcribe polyphonic music,we need to find sev-eral paths through the network.We apply the Token-passing al-gorithm iteratively as follows.As long as the desired number of paths has not yet been found and the found paths contain notes and not just silence,(i)find the optimal path with Token-passing,and (ii)prohibit the use of any model (except the silence model)on the found path during the following iterations.After each iteration,recalculate the observation likelihoods for the silence model with (7)by discarding the observation likelihoods for note models on the found paths.As a result,several disjoint note sequences have been transcribed.In the simulations,the maximum number of it-erations was set to 10,meaning that the system can transcribe at most 10simultaneously sounding notes.Figure 3shows a tran-scription example from the beginning of a jazz ballad,including piano,a contrabass,and drums.4.SIMULATION RESULTSThe proposed transcription system was evaluated using a three-fold cross validation.Three evaluation criteria were used:the re-call rate,the precision rate,and mean overlap ratio.Given that c (ref )is the number of reference notes,c (trans )is the total num-ber of transcribed notes,and c (cor )is the number of correctly tran-scribed notes,the rates are defined asrecall =c (cor )c (ref ),precision =c (cor )c (trans ).(8)Figure3:A transcription from the beginning of a jazz ballad,in-cluding piano,a contrabass,and drums.The grey bars indicate the reference notes and the black lines are the transcribed notes.A reference note is correctly transcribed by a note in the transcrip-tion if(i)their MIDI notes are equal,and(ii)the absolute differ-ence between their onset times is smaller than or equal to a given maximum onset intervalδ,and(iii)the transcribed note is not al-ready associated with another reference note.In other words,one transcribed note can transcribe only one reference note.The overlap ratio is defined for an individual correctly tran-scribed note asoverlap ratio=min{offsets}−max{onsets}max{offsets}−min{onsets},(9)where“onsets”refers to the onset times of both the reference and the corresponding transcribed note,and“offsets”accordingly to the offset times.The mean overlap ratio is then calculated as the average of overlap ratios for all the correctly transcribed notes within one transcription.Because of the timing deviations between the recordings and the manually annotated reference notes,δ=150ms was used.Al-though this is a rather large value,it is acceptable in this situation. For example,the reference note50in Fig.3is not correctly tran-scribed due to theδ=150ms criterion.The criteria are otherwise very strict.In addition,the reference notes with colliding pitch and onset(on the average,20%of reference notes)were required to be transcribed,although our transcription system performs no instru-ment recognition and is thus incapable of transcribing such unison notes.The recall rate,the precision rate,and the mean overlap ratio are calculated separately for the transcriptions of each recording. The average over all the transcriptions for each criterion are:recall 39%,precision41%,and the mean overlap ratio40%.5.CONCLUSIONSThis paper described a method for transcribing realistic polyphonic audio.The method was based on the combination of an acous-tic model for note events,a silence model,and a musicological model.The proposed transcription method and the presented eval-uation results give a reliable estimate of the accuracy of state-of-the-art music-transcription systems which address all music types and attempt to recover all the notes in them.The results are very encouraging and serve as a baseline for the further development of transcription systems for realistic music signals.Transcription examples are available athttp://www.cs.tut.fi/sgn/arg/matti/demos/polytrans.html6.REFERENCES[1]K.Kashino,K.Nakadai,T.Kinoshita,and H.Tanaka,“Orga-nization of hierarchial perceptual sounds:Music scene anal-ysis with autonomous processing modules and a quantitative information integration mechanism,”in Proc.International Joint Conference on Artificial Intelligence(IJCAI),vol.1, Aug.1995,pp.158–164.[2]K.D.Martin,“Automatic transcription of simple polyphonicmusic:Robust front end processing,”Massachusetts Insti-tute of Technology Media Laboratory Perceptual Computing Section,Tech.Rep.399,1996.[3]M.Davy and S.J.Godsill,“Bayesian harmonic models formusical signal analysis,”in Seventh Valencia International meeting(Bayesian Statistics7).Oxford University Press, 2002.[4]M.Goto,“A real-time music-scene-description system:Predominant-F0estimation for detecting melody and bass lines in real-world audio signals,”Speech Communication, vol.43,no.4,pp.311–329,2004.[5] A.Klapuri,“Signal processing methods for the automatictranscription of music,”Ph.D.dissertation,Tampere Univer-sity of Technology,2004.[6]M.P.Ryyn¨a nen and A.Klapuri,“Modelling of note eventsfor singing transcription,”in Proc.ISCA Tutorial and Re-search Workshop on Statistical and Perceptual Audio,Oct.2004.[7]M.Goto,H.Hashiguchi,T.Nishimura,and R.Oka,“RWCmusic database:Music genre database and musical instru-ment sound database,”in Proc.4th International Conference on Music Information Retrieval,Oct.2003.[8] A.Klapuri,“A perceptually motivated multiple-F0estima-tion method,”in Proc.IEEE Workshop on Applications of Signal Processing to Audio and Acoustics,Oct.2005. [9]L.R.Rabiner,“A tutorial on hidden markov models andselected applications in speech recognition,”Proc.IEEE, vol.77,no.2,pp.257–289,Feb.1989.[10]T.Viitaniemi,A.Klapuri,and A.Eronen,“A probabilis-tic model for the transcription of single-voice melodies,”in Proc.2003Finnish Signal Processing Symposium,May 2003,pp.59–63.[11] C.Krumhansl,Cognitive Foundations of Musical Pitch.Ox-ford University Press,1990.[12]S.J.Young,N.H.Russell,and J.H.S.Thornton,“Tokenpassing:a simple conceptual model for connected speech recognition systems,”Cambridge University Engineering Department,Tech.Rep.,July1989.。