Blackfin Online Learning & Development

Presentation Title: Advanced VisualAudio®
Presenter Name: Paul Beckmann

Chapter 1: Introduction
Subchapter 1a: Overview
Chapter 2: Advanced Features
Sub-chapter 2a: Module Variables
Sub-chapter 2b: Expression Language
Sub-chapter 2c: Presets
Chapter 3: The External Interface
Sub-chapter 3a: The External Interface
Sub-chapter 3b: MATLAB Interface
Sub-chapter 3c: Regression Testing
Chapter 4: Custom Audio Modules
Sub-chapter 4a: Standard vs. Custom
Sub-chapter 4b: Module Components
Sub-chapter 4c: Render Function
Sub-chapter 4d: Class Structure
Sub-chapter 4e: XML File
Chapter 5: Conclusion
Sub-chapter 5a: Summary
Sub-chapter 5b: Additional Information

Chapter 1: Introduction
Subchapter 1a: Overview
Hello and welcome. My name is Paul Beckmann. I'm an engineer with Analog Devices, and today I'm going to be talking about advanced features of VisualAudio.
This presentation provides training on the advanced features of VisualAudio, a tool for accelerating the development of audio products. Examples and demonstrations today will be based upon the Blackfin 533 EZ-KIT, although VisualAudio supports a number of Blackfin and SHARC processors. You will learn about advanced tool features such as high- and low-level variables, the expression language, and using presets. You will learn how to use the external interface to control VisualAudio from other applications such as MATLAB, and you will also learn the basics of writing audio modules. The target audience for this presentation is audio algorithm developers. You should be comfortable writing C code and have some familiarity with the Blackfin processors and the VisualDSP++ development environment.
The outline is as follows: first we're going to talk about advanced features of the VisualAudio Designer application, including high- and low-level parameters, the expression language and presets. Then we're going to talk about using the external interface, with particular emphasis on MATLAB. After that we're going to talk about writing custom audio modules, and then we'll conclude.

Chapter 2: Advanced Features
Sub-chapter 2a: Module Variables
Let's start with the advanced features of the VisualAudio Designer application. First of all, there are high-level and low-level module variables. The high-level variables are the ones which appear on a module's inspector. An inspector is the interface for adjusting the parameters of an audio module. What's shown here is the inspector for a bass tone control, where you can adjust the smoothing time, the gain in dB, and the frequency.
The three variables shown on the inspector are the high-level variables. If you look more closely at the data structure associated with the audio module, you'll see that there are a number of other parameters. These are the low-level variables, also called the render variables, and they appear within the module's data structures. So on the one hand we have the high-level variables shown on the inspector, and on the other we have the low-level variables in the data structure on the DSP.

Sub-chapter 2b: Expression Language
Converting between the high-level and the low-level parameters is the job of the expression language. The expression language is a simple scripting language that takes the high-level parameters, does some basic mathematical operations on them and uses those results to set the low-level parameters.
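To make this concrete, here is a small sketch in C of two mappings of the kind the expression language typically performs: converting a gain in dB to a linear gain, and converting a smoothing time in milliseconds to a one-pole smoothing coefficient. The formulas are standard audio-DSP practice and the function names are illustrative; this is not VisualAudio's actual expression-language syntax.

```c
#include <math.h>

/* Illustrative helpers only; names and formulas are assumptions based on
 * common DSP practice, not VisualAudio's expression-language syntax. */

/* High-level gain in dB -> low-level linear gain used by the render code. */
static float gain_db_to_linear(float gain_db)
{
    return powf(10.0f, gain_db / 20.0f);
}

/* High-level smoothing time in milliseconds -> one-pole smoothing
 * coefficient, given the sample rate in Hz. A coefficient near 1.0 gives
 * slow smoothing; near 0.0 gives fast smoothing. */
static float smoothing_time_to_coeff(float time_ms, float sample_rate_hz)
{
    return expf(-1000.0f / (time_ms * sample_rate_hz));
}
```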
The expression language is typically used for things like converting from a smoothing time in milliseconds to a smoothing coefficient, converting from dB units to linear units, or converting a single balance control into two separate gains. So again, there's the inspector, which contains the high-level variables; these go through the expression language and then get turned into low-level parameters in the audio module structures. In some cases you could, in fact, have high-level parameters that map directly to low-level parameters without any expression language in between.

Sub-chapter 2c: Presets
Another mechanism within VisualAudio is what's referred to as presets. Presets are a convenient mechanism for managing audio-module parameter sets. The steps in using a preset are as follows. First of all, you tune your system to a desired state using either the inspectors or the external interface. Shown here is one of the inspectors, and shown here is an external interface designed using MATLAB. So you tune up the system the way you want.
The next thing you do is tell VisualAudio to capture the preset. It will display a list of all the audio modules that you have in your layout. You can click on the checkboxes to select those, and then you name the preset, and everything that gets selected here gets saved with this preset.
After you've created a few presets, you can select the presets from the tool. There's a drop list here that gives you a list of all the presets you've created; you just select one and release the mouse button, and that sends all the audio module parameters associated with that preset down to the DSP. What you can also do, after you've created some presets and determined the ones that sound right, is select presets and optionally compile them with the executable. So presets live both in the PC application itself (you can have a large number of presets), and then you can select the presets you want to compile down onto the DSP. Once they're compiled on the DSP, you can enable those presets from the DSP code.
Some other things about presets: presets are written in Intel hex format, so we use a text file format for that. You can store them on the host and download them to the DSP, or you can store them on the DSP itself. Typical uses for presets are, for example, dealing with multiple sample rates. Your system may have to operate at, say, 32 kHz, 44.1 kHz and 48 kHz, so you'd create a preset for each of the sample rates your system needs to operate at. You might use them for preserving default EQ settings, or even to make A/B comparisons between different parameter sets in order to fine-tune the audio performance of a system.

Chapter 3: The External Interface
Sub-chapter 3a: The External Interface
Now, let's talk about the external interface. The external interface works both in design mode and in tuning mode. Recall that design mode is where you instantiate audio modules, wire them together, and set audio module parameters. Tuning mode, on the other hand, is when the executable is running on the target hardware and the changes you make in the application get sent down to the DSP in real-time. The external interface works in both design mode and tuning mode. In design mode, the changes happen to the audio module data structures residing on the PC. In tuning mode, the changes happen to the audio module data structures on the PC and they're also sent to the DSP in real-time.
Capabilities of the external interface include manipulating audio module parameters, so you can get and set audio module parameters through the external interface. There's basic control of the system, such as loading, saving, building, capturing presets and so forth. There's also advanced control, such as instantiating audio modules, naming them and wiring them together. And there's a mode that enables you to exchange audio data with the target processor, which is very useful for regression testing. Audio is exchanged block-by-block between the PC and the running DSP. It happens in non-real-time, and the speed of the data exchange is determined by the speed of the tuning interface. The external interface is implemented as a local COM server housed in an .EXE, and it's accessible by any COM-compliant language: C, C++, Excel, VisualBasic and so forth. There is a total of 53 APIs, and the program ID is VisualAudio Designer.
Typical uses of the external interface are creating custom audio module design functions; you might have some detailed filter design calculations. You could also use it for creating custom GUIs: maybe you want a simple control panel that controls multiple module parameters, or you may want to provide full or restricted access to certain audio module parameters. That can be done through the external interface.
You can also leverage existing design tools and methodologies that you might have. So instead of starting over with VisualAudio, you can use your existing tools and interface them to VisualAudio using the external interface. The external interface is also useful for automating system design and tuning, and for regression testing of audio modules and systems. For example, you could design an audio module, test it using the external interface to make sure it's working properly, embed it into a larger system and then test the larger system using the regression testing capabilities of the external interface.
The expression language is included in the external interface. For example, here you have a COM-aware application; it interfaces via the external interface, and the changes you make flow through the expression language. So you can access the high-level variables, and there's also another way to access the low-level render variables directly on the DSP.
When you make a change to a high-level variable, if there's an expression associated with it, the expression language is invoked and the low-level render variables are updated as well. The direct access to low-level render variables is also very useful. For example, you could reset state variables on a filter and so forth. That allows you to manipulate the DSP variables directly.

Sub-chapter 3b: MATLAB Interface
We also know that a number of customers use MATLAB as their preferred design environment, so we've created a special layer that simplifies usage of VisualAudio with MATLAB. When you use this, each audio module appears as a MATLAB object, and objects can be manipulated as if they were MATLAB structures. Again, using this approach, the expression language is included as well when you make changes through MATLAB.
Let me give you some examples here. Here's the audio processing layout: a simple stereo system. There are tone controls (a bass control and a treble control), there's a volume control with built-in loudness compensation, and then these three modules here form a peak limiter.
So the command here, "va_module", takes as its first argument the instance name of the module. In this case, it's VolumeFletcherMunson_S1; that's the name of this audio module. When you execute this command, it queries VisualAudio for information regarding this audio module: the variables, their data types, sizes and so forth. It then generates and returns a MATLAB object that represents the volume control.
Let me demonstrate this in MATLAB. I'm going to start by querying VisualAudio for a single audio module, the volume control. The name of the audio module is VolumeFletcherMunson_S1. What this command does is create a MATLAB object with the same name as the volume control.
Let's take a look at the fields of this data structure. The data structure has four fields: the smoothing time, gain, low frequency, and low Q. If I switch back to VisualAudio and open up the inspector for the volume control, you'll see that there's a one-to-one correspondence between the elements shown on the inspector and the fields of that data structure. What I can do then is treat this object simply as if it were a MATLAB structure and get and set module variables. For example, I'm going to set the gain of the volume control, which is currently at 0 dB; it's the FM gain field. I'm going to set it from 0 dB to -20 dB. It changes the value, and if I switch back to VisualAudio, you'll see that the gain here has been set to -20 dB.
What's really handy about this is that it's good for configuring the parameters of audio modules in a script and doing it in a repeatable fashion. In fact, I can access any of the parameters here. I can get and set them. I can use the full capabilities of MATLAB to design filters and so forth.
What's also nice is that MATLAB continues to work in tuning mode. Let me demonstrate that for you. I'll switch back to VisualAudio and generate code for this layout. I'm going to go to VisualDSP++. I have the associated project for this platform loaded; I'm going to halt the processor and build the executable. VisualDSP++ is now compiling and linking the project. It's going to download it to the DSP and we're going to get this running in real-time now. I'm going to switch back to VisualAudio and go into tuning mode. So now, in tuning mode, any changes that I make are going to happen in real-time on the DSP as well.
Let's switch back to MATLAB and I'm going to start the music here. You can hear the music. The volume control is at -20 dB. I'm going to set this to -10 dB, so it's going to get louder. Then all the way to full volume at 0 dB, and then back down; I'm going to mute it by going to -100 dB. So you can see that all the changes made in MATLAB also happen in real-time on the DSP.
In this example, I got the high-level module parameters of this volume control. VisualAudio also provides the ability to get to the low-level render variables through the external interface. I'm going to reissue this command and give it a second argument that selects either high-level or low-level mode. By default it gives me high-level mode; I'm going to request low-level mode.
Let's take a look at what this data structure looks like now. Instead of having the four fields shown on the high-level inspector, what we have now is all of the variables from the low-level data structure on the DSP.
So you can see that the four high-level variables map to about ten different DSP variables. You'll also see that there are these state variables, and you'll have to refer to the volume control documentation to understand exactly what each of these parameters does. But what I want to point out is that these state variables are, in fact, being grabbed from the DSP in real-time as it's running. So if I query that module structure again, you'll see that the state variables are changing, and they're changing in response to the audio as it's being processed. So MATLAB provides you access to the high-level inspector variables, and also to the low-level render variables.

Sub-chapter 3c: Regression Testing
Let me go back to my presentation. VisualAudio also provides regression testing capabilities. What happens is that you place a platform in demand render mode. At that point, an external application generates data and passes the data via the external interface, block by block, through the tuning interface. Basically, you specify the audio inputs, the data is passed through the audio processing layout, and then the audio outputs are returned to the external application. The external application analyzes the data for correctness. All the regression testing capabilities are also available through MATLAB, and we provide MATLAB examples on how to do this.
This is what it looks like to use the testing API through MATLAB. First of all you issue the va_demandrender command and tell it to begin. When you issue that command, it halts the real-time flow of audio and the platform waits for you to send data from the PC down to the DSP. What you do next is issue the va_demandrender('process') command and give it an input argument, DATA_IN, which is a matrix of the input data. There is one column per input channel, and the number of rows is TickSize, so its size is the block size times the number of inputs.
We have standard audio modules and custom audio modules. Standard audio modules are supplied with VisualAudio with separate libraries for SHARC and Blackfin. Custom modules, on the other hand, are written by the user. Standard and custom modules appear on separate tabs within the module palette. Here I’m showing the standard modules. There’s another tab here for custom audio modules. And, in fact, the only distinction between standard and custom modules is which tab it appears on. There’s no limitation or cost overhead associated with writing custom modules. We also provide source code for all the standard modules. What’s nice is that this source code serves as a starting point for writing your own customer audio modules.Sub-chapter 4b: Module ComponentsThere are three components of an audio module. There’s a header file that contains a module’s run-time interface and the description of the associated data structure. There’s a module’s run-time function, or we use the term “render function”. This can be written in C, in Assembly code, or provided as object or as a library. Lastly, there’s an XML file that describes a module in detail to VisualAudio. It contains, for example, elements of its data structure, it describes the inspector interface, containing both the high- and low-level variables, and any expressions. And there’s also memory allocation rules – should an array be allocated in internal memory, external memory, and so forth.Let’s look at the instance data structure. Each instance of an audio module has an associated C data structure. All data structures start with the same set of fields. These fields contain elements common to all audio modules. And it, for example, describes the base class of the audio object. After the common header, this is followed by module-specific fields. For example, this is what you’d find in the header file for the ScalerSmoothed control. This is a smoothly varying single input, single output gain control. What you have here is you have the common header, followed by the specific parameters for this module. There’s ampSmoothing, oneOverTickSize, and so forth. These are 32-bit fractional values. This is a 16-bit fractional value. Finally, there’s a typedef name. So this defines the type for this ScalerSmoothed.Sub-chapter 4c: Render FunctionNext you have an associated render function. Again, they can be written in C or Assembly code and most of the modules provided with VisualAudio are written in Assembly code. The example here is in C code just for readability, although the actual code provided with VisualAudio is in Assembly. So let’s start out with the function arguments.Each audio module takes three input arguments. First of all, there’s a pointer to the instance data structure. There’s an array of buffer pointers. And there’s an integer which specifics the TickSize: how many samples this processing function is receiving as its input. What’s happening here in the code is I’m pulling out some of the instance variables from the data structure into local variables. I then get a pointer to the input data and a pointer to the output data. What you need to remember is the input and output buffers, you get a single array of buffer pointers. The buffers are ordered as input buffers followed by output buffers, followed by any scratch buffers. In this module, there’s only a single input buffer, one output buffer and no scratch buffers.So, the first buffer pointer gives you the input, second one gives you the output data. 
Sub-chapter 4d: Class Structure

There's also a class structure. In addition to an instance structure, all audio modules of the same type share a single class structure. For example, if I had multiple volume controls within my system, each volume control would have its own instance structure, and there would be a single class structure. The class structure describes the behavior of the audio module to VisualAudio's run-time interface. It contains, for example, the number of inputs and outputs, and it specifies whether each is a mono input or a stereo input, and so forth. In this example, it says it's a mono input and a mono output, gives the name of the render function, and also describes the bypass behavior. In VisualAudio you can bypass a module, in which case the inputs are copied directly to the outputs. The class structure is typically declared within the module's C file. If the module's render function is written in Assembly, you would have two files: an Assembly file containing the render function, and a C file with the class structure declaration.

Take a look at the figure down here. In this case, I have three instances of the ScalerSmoothed. Each of those has a separate instance structure, and they all point to a single class structure for the ScalerSmoothed. And if I had delays in the system, say two instances of delay, there'd be separate instance structures, and they would both point to a separate delay class structure. So that's another part of an audio module.

This is what the class structure looks like for the particular ScalerSmoothed that we've been using. Again, there's a pointer to the render function. There's a field here that says to use the default bypass behavior; you can also specify a custom bypass function here. And then there's a descriptor for the inputs and outputs, which says there's a single input and it's mono, and a single output and it's mono.
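A rough sketch of what such a class structure might contain is shown below. The type names, field names, and initializer layout are illustrative assumptions, not the actual VisualAudio declarations.

```c
/* Illustrative sketch only: a class structure shared by every
 * ScalerSmoothed instance in a layout.  The type and field names here
 * are assumptions, not the actual VisualAudio declarations. */

typedef struct {
    int numPins;         /* number of input (or output) pins    */
    int channelsPerPin;  /* 1 = mono, 2 = stereo, and so forth  */
} PinDescriptor;

typedef struct {
    /* Render function called by the run-time framework for each block. */
    void (*renderFunction)(void *instance, float *buffers[], int tickSize);
    int useDefaultBypass;   /* nonzero: copy inputs to outputs when bypassed */
    PinDescriptor inputs;
    PinDescriptor outputs;
} AudioModuleClass;

/* Wrapper with the generic signature; a real module would cast the
 * instance pointer to its own type and run the render code shown in the
 * previous sketch. */
static void ScalerSmoothed_RenderWrapper(void *instance, float *buffers[],
                                         int tickSize)
{
    (void)instance;
    (void)buffers;
    (void)tickSize;
}

/* One shared class structure; every ScalerSmoothed instance points here. */
static const AudioModuleClass ScalerSmoothedClass = {
    .renderFunction   = ScalerSmoothed_RenderWrapper,
    .useDefaultBypass = 1,                                     /* default bypass */
    .inputs           = { .numPins = 1, .channelsPerPin = 1 }, /* mono in  */
    .outputs          = { .numPins = 1, .channelsPerPin = 1 }, /* mono out */
};
```

Splitting per-instance state from per-class behavior this way means that adding another ScalerSmoothed to a layout costs only one more instance structure; the class structure exists once regardless of how many instances there are.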
Sub-chapter 4e: XML File

And lastly, there's the audio module XML. This XML file describes the audio module to the VisualAudio Designer application. It contains, for example, the name of the audio module and where it should appear within the tree view in the module palette. It describes the input and output pins, what data types they are, and how many pins the module has. It lists which processors the module is compatible with; for example, is it a SHARC, is it a Blackfin, and if it's a Blackfin, which Blackfin processors are supported. It describes the instance data structure and the fields of that data structure. It lists out the high-level variables and any expressions. And it covers memory allocation rules and, finally, a few other miscellaneous usage rules.

Chapter 5: Conclusion

Sub-chapter 5a: Summary

So this concludes the presentation on VisualAudio's advanced design features. These design features simplify the development of advanced audio products. We discussed high- and low-level variables, we discussed the expression language, and we talked about how to use presets. There's also an open API that allows VisualAudio's capabilities to be extended by interfacing with external COM-compliant applications. There's a special layer that simplifies use with MATLAB, and finally, you can also write your own custom audio modules.

Sub-chapter 5b: Additional Information

You can get more information on VisualAudio from a number of places. First of all, a free download is available at the VisualAudio product page. This includes the VisualAudio Designer application, EZ Kit platforms, and audio module libraries for the SHARC and Blackfin. The download also includes the full product help, and that's another good source of information. You can get additional examples and tutorials at the VisualAudio Developers website, shown here. You can email specific technical questions to the VisualAudio Support email address, or you can click the "Ask A Question" button. This concludes my presentation on VisualAudio advanced features. I'd like to thank you for your time and attention.