Interconnect tuning strategies for high-performance ICs
P3701 | NVIDIA JETSON AGX ORIN SERIES DATASHEET

Discover the most powerful AI computer for energy-efficient autonomous machines.

NVIDIA® Jetson AGX Orin™ series modules deliver up to 275 TOPS of AI performance with power configurable between 15W and 60W. This gives you more than 8X the performance of Jetson AGX Xavier™ in the same compact form-factor for robotics and other autonomous machine use cases.

These system-on-modules support multiple concurrent AI application pipelines with an NVIDIA Ampere architecture GPU, next-generation deep learning and vision accelerators, high-speed IO, and fast memory bandwidth. Now, you can develop solutions using your largest and most complex AI models to solve problems such as natural language understanding, 3D perception, and multi-sensor fusion.

Jetson runs the NVIDIA AI software stack, and use case-specific application frameworks are available, including NVIDIA Isaac™ for robotics, DeepStream for vision AI, and Riva for conversational AI. You can also save significant time with NVIDIA Omniverse™ Replicator for synthetic data generation (SDG), and with NVIDIA TAO Toolkit for fine-tuning pretrained AI models from the NGC™ catalog.

Jetson ecosystem partners offer additional AI and system software, developer tools, and custom software development. They can also help with cameras and other sensors, as well as carrier boards and design services for your product.

Jetson Orin modules are unmatched in performance and efficiency for robots and other autonomous machines, and they give you the flexibility to create the next generation of AI solutions with the latest NVIDIA GPU technology.
Together with the world-standard NVIDIA AI software stack and an ecosystem of services and products, your road to market has never been faster.

Key Features

Jetson AGX Orin 32GB
> 1792-core NVIDIA Ampere architecture GPU with 56 tensor cores
> 2x NVDLA v2.0
> 8-core Arm® Cortex®-A78AE v8.2 64-bit CPU
> 32GB 256-bit LPDDR5
> 64GB eMMC 5.1
> PVA v2.0
Power
> Voltage input 5V, 7V-20V
> Module Power: 15W - 40W

Jetson AGX Orin 64GB
> 2048-core NVIDIA Ampere architecture GPU with 64 tensor cores
> 2x NVDLA v2.0
> 12-core Arm® Cortex®-A78AE v8.2 64-bit CPU
> 64GB 256-bit LPDDR5
> 64GB eMMC 5.1
> PVA v2.0
Power
> Voltage input 5V, 7V-20V
> Module Power: 15W - 60W

NVIDIA JETSON AGX ORIN SERIES MODULES TECHNICAL SPECIFICATIONS

* Virtual channel-related camera information for Jetson AGX Orin is not final and subject to change. Refer to the Software Features section of the latest NVIDIA Jetson Linux Developer Guide for a list of supported features.
A SINGLE PC SOLUTION FOR RAPID CONTROL PROTOTYPING IN WINDOWS®

QUARC generates real-time code directly from Simulink®-designed controllers and runs the generated code in real-time on the Windows® target - all on the same PC. The Data Acquisition Card seamlessly interfaces with Simulink® using Hardware-in-the-loop blocks provided in the QUARC Targets Library.

SPLIT SECOND CONTROL DESIGN – A DECADE IN THE MAKING

QUARC was built on the legacy of WinCon, the first real-time software to run Simulink®-generated code in Windows®. QUARC seamlessly integrates with Simulink® and redefines the traditional design-to-implementation interface toolset. Just click a button to enjoy more functionality and development flexibility, all geared towards improved real-time performance. Academics have successfully deployed many advanced control and mechatronic systems, ranging from intelligent unmanned systems to force-feedback-enabled virtual reality.

FOUR USES OF QUARC (ACADEMIA AND INDUSTRY)
• CONTROLS EDUCATION: Enhance your engineering courses with industry-relevant hands-on learning
• GRADUATE-LEVEL EXPLORATION: Explore practical solutions for real-life challenges with a synergistic approach
• INNOVATIVE RESEARCH: Conduct ground-breaking research in emerging areas such as Unmanned Vehicle Systems and haptics
• ADVANCED INDUSTRIAL R&D: Fast track time-to-market with an affordable rapid control prototyping solution

Choosing software for control system design and implementation is critical for timely, successful research and development. Quanser knows this because we’ve pioneered control engineering for over 20 years. That’s why we created QUARC – a powerful rapid control prototyping tool that significantly accelerates control design and implementation. Initially designed for industrial demands, QUARC is nonetheless ideal for advanced research, masters-level, and even undergraduate, teaching.
QUARC is an integral part of all Quanser control lab workstations and is used all over the world by thousands of educational institutions and organizations, including the Canadian Space Agency and Defense Research and Development Canada. Discover what QUARC can help you achieve in less time and effort than you might be spending now.

ACCELERATE CONTROLS EDUCATION

QUARC is an ideal tool to teach control concepts. It allows students to draw a controller, generate code and run it - all without Digital Signal Processing or without writing a single line of code. The capabilities of this powerful yet adaptable software are emphasized by the comprehensive curriculum that accompanies Quanser’s control lab equipment. The supplied Instructor and Student Workbooks feature lab exercises and projects based on Simulink®. They help focus students’ efforts on key control concepts rather than tedious code writing. The curriculum is developed by engineers for engineers to effectively demonstrate and teach the mechatronic design approach practised in industry. This includes modeling, controller design, simulation and implementation. An excellent low-cost rapid control prototyping system, QUARC is being used by thousands of institutions worldwide. It is an effective and efficient teaching tool for undergraduate and graduate-level courses in classical and modern control theory.

HOW QUARC FUSES MULTIPLE ENGINEERING COURSES

The Integrated Learning Centre at Queen’s University fuses all engineering disciplines into one modern lab. Quanser’s workstations, featuring a wide range of modular Quanser experiments, are used here to teach introductory, intermediate and advanced controls. QUARC software is an integral part of all those workstations.
An economical approach to outfitting a lab, it also keeps students motivated, providing access to even more hands-on learning.

CONTROLS EDUCATION

“... is done, allowing the students to focus more on the control design theory and less on the workings of MATLAB Simulink, thus improving the learning experience.”
Dr. Wen-Hua Chen, Loughborough University, United Kingdom

This Flexible Link module further expands your topics of study with the SRV02 workstation.

All on a Single PC
QUARC provides a single PC solution for rapid control prototyping in Windows XP® or Vista®. It generates real-time code directly from Simulink®-designed controllers – but for the same PC. This single PC solution for rapid control prototyping significantly accelerates control design and implementation. This helps students focus on the important aspects of the control design process and complete project-based assignments successfully.

Simple. Intuitive.
QUARC user interfaces are easy to understand without training. For example, QUARC’s “external mode” communications allow the Simulink® diagram to communicate with real-time code generated from the model. Tune parameters of the running model by changing block parameters in the Simulink® diagram. Want to view the status of a signal in the model? Simply open a Simulink® Scope (or any other Sink in the diagram) while the model runs on the target. Furthermore, data can be streamed to the MATLAB® workspace or to a file on disk for off-line analysis.

Low Maintenance
QUARC streamlines the process of maintaining and servicing a laboratory without sacrificing system performance or an excessive amount of your staff’s time. The extremely flexible host-target structure allows QUARC users to maximize limited resources (i.e. PC, laptop and hardware) with minimal effort or cost. Host (control design environment) and target (platform which executes the real-time code) can be on separate computers yet still communicate through a network connection. QUARC can sustain any possible multi-configuration.
Ask about License Server Architecture.

The Integrated Learning Center, Queen’s University, Canada.

BRING THEORIES TO LIFE

Whether you’re exploring emerging technologies or transforming knowledge into solutions for real-world challenges, count on Quanser to help you achieve your research goals. The power of QUARC software combined with Quanser’s innovative plants can help researchers test their theories in real-time, on real hardware. QUARC seamlessly integrates with Quanser’s research platforms to implement virtually any control algorithm. Combine QUARC with Quanser’s multi-function Data Acquisition card and plants to create a self-contained control workstation ideal for advanced research. Use it to design, simulate, implement, and test a variety of time-varying systems: communications, controls, signal processing, video processing, and image processing.

All this is achievable quickly, easily and affordably because the workstation is a fully integrated, open-architecture solution.

The set-up pictured below shows a 3 DOF Gyroscope workstation as one example of a Quanser workstation for high level research. This typical configuration entails:
• Plant
• Amplifier
• Data Acquisition Card
• Virtual Plant Simulation
• Rapid Control Prototyping Design Software
• Pre-designed Controllers

For more information about Quanser’s research platforms please visit /MCC.

3 DOF GYROSCOPE
Featuring three Degrees Of Freedom (DOF), this dynamically diverse experimental platform is ideal for teaching rotational dynamic challenges.

DATA ACQUISITION CARD
Measure and command real-time signals with high I/O sampling period. QUARC supports a wide range of Quanser and National Instruments data acquisition cards. For a complete list please visit /QUARC.

AMPLIFIER AMPAQ
Quanser’s multi-channel linear current amplifier is ideal for precision controls.
The AMPAQ connects to the DAQ terminal board and is connected to the 3-DOF Gyroscope with its easy-connect cables.

SOFTWARE TO ACCELERATE DESIGN
3-DOF Gyroscope models are designed to run in real-time with QUARC® software, which integrates seamlessly with MATLAB®/Simulink®.

“Using Quanser’s software, we can easily design control systems for many plants. We can apply complex control strategies quickly and effectively - and it is very easy to verify theory on the real plant.”
Kenichi Yano, Associate Professor, Gifu University, Japan

EFFORTLESS INTEGRATION FOR MECHATRONIC RESEARCH

QUARC is a powerful, flexible mechatronic integration tool, providing time-saving and simple solutions to those unique challenges encountered when you’re developing mechatronic systems. Whether you have custom-made research platforms or use manufactured equipment, QUARC is the only software that makes it easy to interface with all of them. QUARC offers a suite of third-party device blocks which help researchers seamlessly interface and control KUKA robots, PGR cameras and SensAble® PHANTOM devices, to name a few. These blocks not only allow a Simulink® model to communicate with external devices but also implement the mathematical framework for controlling them. All this is possible without the need to learn new tools or hand coding since the controller design and integration is performed in an environment most researchers are familiar with, such as Windows®, MATLAB®, Simulink®.

“QUARC’s support of TCP/IP has been a tremendous help for our research. It allowed us to develop a distributed sensing system that isn’t dependent on expensive I/O hardware or DAQ boards. Further, this allows for safety-critical redundancy when we are doing vehicle control tests.”
Sean Brennan, Department of Mechanical and Nuclear Engineering, Pennsylvania State University, USA

QUARC OFFERS OVER 10 BLOCKSETS
The table provides an overview.
At a glance, you can see specific research applications, unique attributes and technical specifications.

Now you can enjoy greater flexibility when implementing control schemes. QUARC expands the possibilities for complex control design by:

Multiple Operating Systems Support.
QUARC is designed so that code could be generated for multiple operating systems and hardware platforms while maintaining a common, seamless and easy-to-use interface. Simulink® models can run in real-time on a variety of targets - a target being a combination of operating system and processor for which QUARC generates code from a Simulink® diagram. Targets include Windows® and QNX®. The number of targets QUARC supports is continually increasing.

Support for Communications.
The QUARC Stream API offers a flexible and protocol-independent communications framework. Conduct standard communication between QUARC models and more: between a QUARC model and an external third-party application (e.g., graphical user interface) or even between two external third-party applications. The Stream API is independent of the development environment and can be used in C/C++, .NET, MATLAB®, LabVIEW™, etc. The Stream API enables the communication between multiple real-time models over the internet. This could be used for distributed control, teleoperation, device interfacing, etc. The Stream API natively supports the following protocols: TCP/IP, UDP, serial, shared memory, named pipes, ARCNET, and more.

For demos and tutorials on QUARC’s communication capabilities request a free trial of QUARC at /QUARC.

Increasing Number of Blocksets.
The number of interfaces QUARC supports is continually increasing over time to ensure easy integration with recent and popular third-party devices.
Here are a few more examples: • Nintendo Wiimote• Q bot- An Unmanned Ground Vehicle based on iRobot Create®• Schunk Grippers• SparkFun Electronics SerAccelGet an updated list of interfaces supported by QUARC at /QUARC/blocksetsDESCRIPTIONUsing the KUKA Robot Blockset you can control any KUKA robot equipped with RSI (Robot SensorInterface) through the interactive Simulink® environment without tedious hand coding and cumbersome hardware interfacing.This blockset is not included in the standard QUARC license and is sold separately.The Point Grey Research (PGR) Blockset is used to acquire images from some of the Point Grey Research cameras. QUARC also provides image processing blocksets that can be used to find objects of a given color within a source image or convert images from one format to another.This blockset is included in the standard QUARC license.The Wiimote (Wii Remote) block reads the state of the Wiimote and outputs the button, acceleration, and Infra Red (IR) camera information. Using this blockset you can easily interface the Wiimote into the controller. This blockset is included in the standard QUARC license.The Novint Falcon Blockset is used for implementing control algorithms for the Falcon haptic device. Using the Blockset significantly simplifies the task of designing controllers for the Falcon.This blockset is included in the standard Quarc license.TEChNICAL CAPABILITIES AND SPECIFICATIONS• E nables the deployment of real-time executables with GUI • S upport for setting and getting values (e.g., knobs, displays, scopes, and other inputs and outputs)Supported devices:• SensAble PHANTOM Omni • SensAble PHANTOM Desktop • SensAble PHANTOM Premium• SensAble PHANTOM Premium 6DOF Data provided as output,• GPS position (latitude, longitude, altitude)• Number of visible satellites (dilution of precision data)• Accuracy information (dilution of precision – DOP)Typical accuracy 1-3m (WAAS)SUGGESTED RESEARCh APPLICATIONS• GUI Design (e.g. 
Cockpit)• Force feedback virtual reality• Haptically-enabled medical simulations • Teleoperation• Precise robotic manipulation• Image-based control and localization • Autonomous navigation and control • Fault detection• Image-based control and localization• Autonomous navigation and control • Image recognition • Mapping• Obstacle detection and avoidance • Visual servoing and tracking • Vision feedback• Teleoperation• Robotic manipulation• Force feedback virtual reality• Haptically enabled medical simulations • Teleoperation• Localization• Autonomous navigation and control• M ission reconfiguration(e.g., for Unmanned Vehicle Systems)• Fault recovery • Safety watchdogDYNAMICRECONFIGURATIONkUkA ROBOT ALTIASENSABLEPhANTOM ® SERIESVISUALIzATIONPGR CAMERASWII REMOTENOVINT FALCONGPSNATURAL POINTOPTITRACkThe PHANTOM® Blockset lets you control the series of PHANTOM® haptic devices via Simulink®. For added flexibility researchers can combine the Phantom Blockset and Visualization Blockset to enjoy seamless haptics rendering of virtual environments.This blockset is not included in the standard QUARC license and is sold separately.The Visualization Blockset creates 3D visualizations of simulations or actual hardware in real-time. By combining meshes and textures, you can create objects to seamlessly integrate high-performance graphics with real-time controllers. Comprehensive documentation and examples along with additional content are provided to help new users get started and master this blockset quickly. QUARC Visualization blockset is used in the Virtual Plant Simulation of selected Quanser plants such as SRVO2 and Active Suspension. This blockset is included in the standard QUARC license.• Y coordinates of up to four IR points detected by the wiimote IR camera. 
Valid values range from 0 to 767 inclusive.• A compatible Bluetooth device must be installed on the PC• A bility to command either Cartesian or joint velocity set points • A bility to measure the Cartesian positions, joint angles and joint torques • A bility to set either Cartesian or the joint minimum and maximum velocity limits • K UKA built-in safety checks are still enabled for safe operation• S end forces and torques in Cartesian or joint space • Read encoder values, position, and joint angles• Send commands in two different work spaces to the Phantom device • T he block outputs the gimbal angles of the device plus the values associated with the buttons and the 7 DOF available on the device (thumb-pad or scissors)• R emotely connect to a visualization server with multiple clients • N o interference with the operation of your real-time controller• Plugins provided for Blender and Autodesk’s 3ds Max 2008, 2009 and 2010• S et different material properties such as diffuse color, opacity , specular color, shininess, and emissivity.• T exture map support for png, jpg, tiff, and bmp.• X 3D support• C onfigurable mouse and keyboard interface for manually navigating around the environment • P erformance far exceeds TMW’s Virtual Reality toolbox• U p to 16 cameras can be connected and configured for single or multiple capture volumes • C apture areass up to 400 square feet • S ingle point tracking for up to 80 markers, or 10 rigid-body objects • T ypical calibration time is under 5 minutes • P osition accuracy on the order of mm under typical conditions• U SB 2.0 connectivity to ground station PC• U p to 100 fps tracking• S upport for Draganflyer 2 HI-COL and the FireflyMV • F rame rate selection from 7.5 fps to 60 fps • R esolutions from 640 x 480 to 1024 x 768, • C olor or grayscale, and custom image (subimage) sizes supported for faster framerates• C ontinuity of states between the model being switched-out and the model being switched-in, as a necessary condition to 
the system stability
• Switching within one sampling interval, as a necessary condition to the system stability
• Dynamic reconfiguration can be triggered either automatically (e.g., from a supervisory model) or manually
• Dynamic Reconfiguration can be triggered either locally or remotely (i.e., on a remote target)

The OptiTrack Blockset allows motion capture and tracking by using 3 or more synchronized infrared (IR) cameras that capture images containing reflective markers within a workspace. The blockset can be used to track either individual markers or rigid bodies. This Blockset makes it easy to conduct vision-based control experiments in real-time, especially for objects that were previously difficult to track, such as indoor autonomous vehicles. This blockset is not included in the standard QUARC license and is sold separately.

The GPS Blockset allows GPS receivers to be easily accessed, thereby adding GPS localization to an experimental platform. This Blockset integrates with Ublox GPS devices as well as NMEA compliant GPS devices. This blockset is not included in the standard QUARC license and is sold separately.

The Altia Design Blockset enables the user to interact with the real-time code from Altia GUIs. Unlike the MATLAB® GUIs, MATLAB® and Simulink® are not required when using Altia GUIs. This blockset gives you the tools you need to generate complete production systems without writing a single line of code. This blockset is included in the standard QUARC license.

The Dynamic Reconfiguration Blockset lets you dynamically switch models on the target machine within a sampling interval. A running model may be replaced with another model while ensuring continuity of states between both with no interruptions (i.e. no skipped sample).
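The idea of "continuity of states" during a model switch can be illustrated with a small, generic sketch. This is not QUARC code and does not use any QUARC API: the `PIController` class, the `hand_off` helper, the gains, and the first-order plant are all hypothetical choices made for illustration. It shows one common way to keep a command signal continuous when one controller replaces another inside a single sample: preload the incoming controller's internal state from the outgoing controller's output.

```python
from dataclasses import dataclass


@dataclass
class PIController:
    """Discrete PI controller; `integral` is its only internal state."""
    kp: float
    ki: float
    dt: float
    integral: float = 0.0

    def output(self, error: float) -> float:
        return self.kp * error + self.integral

    def update(self, error: float) -> None:
        self.integral += self.ki * error * self.dt


def hand_off(old: PIController, new: PIController, error: float) -> None:
    """Preload the incoming controller so its output matches the outgoing one."""
    new.integral = old.output(error) - new.kp * error


def simulate(steps: int, switch_at: int, continuous: bool) -> list[float]:
    """Drive the plant x' = -x + u toward a setpoint of 1.0, swapping controllers once."""
    dt = 0.01
    active = PIController(kp=2.0, ki=1.0, dt=dt)
    standby = PIController(kp=5.0, ki=3.0, dt=dt)
    x = 0.0
    commands = []
    for k in range(steps):
        error = 1.0 - x
        if k == switch_at:
            # The swap happens inside one sample: no step is skipped.
            if continuous:
                hand_off(active, standby, error)
            active = standby
        u = active.output(error)
        active.update(error)
        x += dt * (-x + u)  # forward-Euler step of the assumed plant
        commands.append(u)
    return commands


def jump_at_switch(continuous: bool, switch_at: int = 100) -> float:
    """Size of the command discontinuity at the switching sample."""
    u = simulate(200, switch_at, continuous)
    return abs(u[switch_at] - u[switch_at - 1])


if __name__ == "__main__":
    print("jump with state hand-off:   ", jump_at_switch(True))
    print("jump without state hand-off:", jump_at_switch(False))
```

Running the script prints the size of the command jump at the switching sample for both cases; with the hand-off the jump is limited to the normal sample-to-sample change, while without it the incoming controller starts from a zero state and the command steps abruptly.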
For a demo and tutorial on the Dynamic Reconfiguration blockset request a free trial of QUARC at /QUARC.This blockset is not included in the standard QUARC license and is sold separately.Data provided as output:• P osition: X, Y, and Z position in Cartesian coordinates• Button information: Whether a button is currently pressed or not • F orce: X, Y, and Z forces applied by the Falcon end-effectormodel 1model 2* Please note that prices for blocksets may vary. For more information or to request a quote please contact sales@.• Payload 5 kg • Number of axes 6• Repeatability <±0.02 mm • Weight 28 kg• Mounting positions floor or ceiling • Controller KR C2sr • Max speed 8.2 m/sData provided as output:• X, Y, and Z axis accelerations • Button states • X coordinates of up to four IR points detected by the wiimote IR camera. Valid values range from 0 to 1023 inclusive• S upport for setting values (i.e. Meters and other outputs)• F eatures the Quanser Plot library for AltiaBLOCkSET* • Virtual reality rendering• Game and medical simulation• Simulation of mechanical components • Data fusion • R eal-time status displays of physical hardware• Virtual cockpit for aerial vehicles REQUEST A FREE 30 DAY TRIAL OF QUARC TODAY. VISIT /QUARC• Robotic manipulation • Teleoperation“The Host Computer System for the Challenging Environment Assessment Laboratory (CEAL) at the Toronto Rehabilitation Institute (TRI) was developed using Quanser’s QU ARC real-time software. The power of QU ARC, with Quanser’s engineering support, enabled TRI to create a flexible developmentenvironment for researchers to implement sophisticated real-time experiments, using a large-scale 11-ton, 6-DOF motion platform and high-performance audio-visual rendering systems”Dr. 
Geoff Fernie, Vice President, Toronto Rehabilitation Institute, Canada

QUARC ACCELERATES MECHATRONIC DEVELOPMENT WITH RAPID CONTROL PROTOTYPING

QUARC is a powerful Rapid Control Prototyping (RCP) platform that meets industrial research and development demands. This robust software helps manage the increasing complexity of control engineers’ tasks and accelerates their ability to test control strategies. Generating countless iterations of Simulink® control designs becomes almost effortless - a block diagram design is automatically implemented on the system and computed in real time, eliminating the need for manual coding. This RCP platform is adaptable to virtually any mechatronic interfaces and scalable for complex multi-input and multi-output systems.

Affordable Industrial-Grade Performance
For a fraction of the cost of comparable systems, Research and Development engineers can convert a PC into a powerful platform for control system development and deployment. When combined with a Quanser Power Amplifier and a Quanser Data Acquisition Card, QUARC software provides an ideal rapid prototyping and hardware-in-the-loop development environment. QUARC is also compatible with a wide range of commercially available data acquisition cards, including National Instruments boards.

QUARC evolved from experience with its predecessor WinCon. The Canadian Space Agency played an intricate role in defining and confirming many of the features of QUARC. This was done in the context of their micro-satellite development program on an early stage prototype. It has since been adopted by industries requiring the latest in performance and development flexibility such as the Aerospace, Defence and Medical device industries.

QUARC capabilities and features are designed to optimize the RCP process.
Below are a few samples of such features.
• Flexible and extensible communications blocks configurable for real-time TCP/IP, UDP, serial, shared memory and other protocols
• Performance Diagnostics
• RTW Code Optimization support
• Modularity and incremental builds via model referencing
• Control of thread priorities and CPU affinity
• Asynchronous execution (e.g., ideal for efficient communication)
• Run any number of models on one target – or simultaneously on multiple targets
• Self-booting models for embedded targets
• External Hardware-In-the-Loop card and communication interfacing provided in C/C++, MATLAB®, LabVIEW™, and .NET languages
• Multiprocessor (SMP) support, e.g., on a quad-core Windows target QUARC models can take advantage of all four cores.
• Simulink® 3D Animation (formerly known as Virtual Reality) Toolbox support
• Ability to interface with MATLAB® GUIs, LabVIEW™ panels, and Altia

“We have been using Quanser’s QUARC software to do real-time robot control. QUARC enables fast and easy prototyping of control algorithms with hardware in the loop and has been an invaluable tool for algorithm development, simulation, and verification.”
Paul Bosscher, Harris Corporation, USA

Challenging Environment Assessment Laboratory (CEAL) will be one of the most advanced rehabilitation research facilities in the world.

INNOVATE, RESEARCH AND EXPLOIT KNOWLEDGE.
[Diagram: Quanser consulting, software and hardware (plant, DAQ, amplifier)]

QUARC®: A POWERFUL ENGINE FOR ENGINEERING DEPARTMENTS

Three issues challenge university engineering departments everywhere: teaching, research and budget. One solution resolves them: QUARC software from Quanser!

For Teaching: Created by engineers for engineers, QUARC is an excellent low-cost rapid control prototyping system. Working seamlessly with Simulink®, QUARC helps students put ideas and theory into practice sooner. Plus curriculum is offered to help educators focus on what matters most.
With more hands-on learning, undergraduate and graduate students alike are captivated and motivated to study further.

For Research: Originally designed for industrial use, QUARC is ideal for advanced research. From the precise control of surgical robots to unmanned air vehicles and beyond, ideas can be tested in real-time - even ideas that are out of this world. Small wonder our client list includes NASA, the Canadian Space Agency and thousands of universities and colleges. (Look on your left.)

For your department’s budget: QUARC seamlessly integrates over 80 Quanser experiments - from introductory to very advanced. These are modular by design and maximize efficiencies, offering multiple uses for one workstation. Academics ourselves, Quanser appreciates your need for careful budgeting. So QUARC is competitively priced and available with single- or multiple-user licenses.

Learn more at /QUARC

Products and/or services pictured and referred to herein and their accompanying specifications may be subject to change without notice. Products and/or services mentioned herein are trademarks or registered trademarks of Quanser Inc. and/or its affiliates. Other product and company names mentioned herein are trademarks or registered trademarks of their respective owners. ©2010 Quanser Inc. All rights reserved. Rev 2.0
Juniper Networks NetScreen-ISG 2000 (1)

Maximum Performance and Capacity (2)
Firewall performance: 2 Gbps
3DES performance: 1 Gbps
Deep Inspection performance: 300 Mbps
Concurrent sessions: 512,000
New sessions/second: 30,000
Policies: 30,000
Interfaces: Up to 8 Mini GBIC (SX or LX), up to 28 10/100

Mode of Operation
Layer 2 mode (transparent mode) (5): Yes
Layer 3 mode (route and/or NAT mode): Yes
NAT (Network Address Translation): Yes
PAT (Port Address Translation): Yes
Policy-based NAT: Yes
Virtual IP: 8 (4)
Mapped IP: 8,192 (3)
Users supported: Unrestricted

Firewall
Number of network attacks detected: 31
Network attack detection: Yes
DoS and DDoS protections: Yes
TCP reassembly for fragmented packet protection: Yes
Malformed packet protections: Yes
Deep Inspection firewall: Yes
Stateful protocol signatures: Yes
Protocols supported: HTTP, FTP, SMTP, POP3, IMAP, DNS
Content Inspection: Yes
Malicious Web filtering: up to 128 URLs
External Web filtering (Websense): Yes
Integrated Web filtering: No

VPN
Concurrent VPN tunnels: up to 10,000 (3)
Tunnel interfaces: up to 1,024 (3)
DES (56-bit), 3DES (168-bit) and AES encryption: Yes
MD-5 and SHA-1 authentication: Yes
Manual Key, IKE, PKI (X.509): Yes
Perfect forward secrecy (DH Groups): 1, 2, 5
Prevent replay attack: Yes
Remote access VPN: Yes
L2TP within IPSec: Yes
IPSec NAT traversal: Yes
Redundant VPN gateways: Yes

Firewall and VPN User Authentication
Built-in (internal) database - user limit: 1,500 (3)
3rd Party user authentication: RADIUS, RSA SecurID, and LDAP
XAUTH VPN authentication: Yes
Web-based authentication: Yes

System Management
WebUI (HTTP and HTTPS): Yes
Command Line Interface (console): Yes
Command Line Interface (telnet): Yes
Command Line Interface (SSH): Yes, v1.5 and v2.0 compatible
NetScreen-Security Manager: Yes
All management via VPN tunnel on any interface: Yes
SNMP full custom MIB: Yes
Rapid deployment: No

Logging/Monitoring
Syslog (multiple servers): External, up to 4 servers
E-mail (2 addresses): Yes
NetIQ WebTrends: External
SNMP (v2): Yes
Traceroute: Yes
VPN tunnel monitor: Yes

Virtualization
Maximum number of Virtual Systems: 0 default, upgradeable to 50 (6)
Maximum number of security zones: 26 default, upgradeable to 126 (6)
Maximum number of virtual routers: 3 default, upgradeable to 53 (6)
Number of VLANs supported: 500 max

Routing
OSPF/BGP dynamic routing: up to 8 instances each (3)
RIPv2 dynamic routing: up to 50 instances supported (3)
Static routes: 20,000
Source-based routing: Yes

High Availability (HA)
Active/Active: Yes
Active/Passive: Yes
Redundant interfaces: Yes
Configuration synchronization: Yes
Session synchronization for firewall and VPN: Yes
Session failover for routing change: Yes
Device failure detection: Yes
Link failure detection: Yes
Authentication for new HA members: Yes
Encryption of HA traffic: Yes

IP Address Assignment
Static: Yes
DHCP, PPPoE client: No
Internal DHCP server: No
DHCP relay: Yes

PKI Support
PKI Certificate requests (PKCS 7 and PKCS 10): Yes
Automated certificate enrollment (SCEP): Yes
Online Certificate Status Protocol (OCSP): Yes

Certificate Authorities Supported
Verisign: Yes
Entrust: Yes
Microsoft: Yes
RSA Keon: Yes
iPlanet (Netscape): Yes
Baltimore: Yes
DOD PKI: Yes

Juniper Network’s Integrated Security Gateway, the NetScreen-ISG 2000, is a purpose-built, high-performance system designed to deliver scalable network and application security for large enterprise, carrier and data center networks. Integrating best-of-breed Deep Inspection firewall, VPN and DoS solutions, the Juniper Networks NetScreen-ISG 2000 enables secure, reliable connectivity along with network and application-level protection for key, high-traffic network segments.
The NetScreen-ISG 2000 is built on Juniper Network’s next-generation architecture which includes a fourth generation security ASIC, the GigaScreen 3, high speed microprocessors and add-on security modules to provide the predictable, multi-Gigabit performance needed for the most demanding network segments.

Administration
Local administrators database: 20
External administrator database: RADIUS/LDAP/SecurID
Restricted administrative networks: 6
Root Admin, Admin, and Read Only user levels: Yes
Software upgrades: TFTP/WebUI/NSM
Configuration Roll-back: Yes

Traffic Management
Guaranteed bandwidth: No
Maximum bandwidth: Yes, per physical interface
Priority-bandwidth utilization: No
DiffServ stamp: Yes, per policy

External Flash
CompactFlash™: Supports 128 or 512 MB Industrial-Grade SanDisk
Event logs and alarms: Yes
System config script: Yes
NetScreen ScreenOS Software: Yes

Dimensions and Power
Dimensions (H/W/L): 5.25/17.5/23 inches
Weight: 52 lbs.
Rack mountable: 19” standard, 23” optional
Power Supply (AC): 90 to 264 VAC, 250 watts
Power Supply (DC): -36 to -72 VDC, 250 watts

Licensing Options: The NetScreen-ISG 2000 is available with two licensing options to provide two different levels of functionality and capacity.

Advanced Models: The Advanced software license provides all of the features and capacities listed within this specsheet.

Baseline Models: The Baseline software license provides an entry-level solution for customer environments where features such as Deep Inspection™, OSPF and BGP dynamic routing, advanced High Availability, and full capacity are not critical requirements.
The following table shows the features and capacities that are different than the Advanced models:

NetScreen-ISG 2000          Baseline          Advanced
Sessions                    256,000           512,000
Concurrent VPN tunnels      1,000             10,000
Deep Inspection Firewall    No                Yes
VLANs                       100               500
OSPF/BGP                    No                Yes
High Availability (HA)      Active/Passive    Active/Active

Certifications
Safety Certifications: UL, CUL, CSA, CB
EMC Certifications: FCC class A, CE class A, C-Tick, VCCI class A

Environment
Operational temperature: 32° to 122°F, 0° to 50°C
Non-operational temperature: -4° to 158°F, -20° to 70°C
Humidity: 10 to 90% non-condensing
MTBF (Bellcore model): 7.6 years
Security: Pending

Ordering Information
Product                                                                      Part Number

NetScreen-ISG 2000 Bundles Advanced*
NetScreen-ISG 2000 system, 1 4-port 10/100 I/O Module                        NS-ISG-2000-P00A-S00
NetScreen-ISG 2000 system, 1 8-port 10/100 I/O Module                        NS-ISG-2000-P01A-S00
NetScreen-ISG 2000 system, 1 Dual-Port mini-GBIC I/O Module                  NS-ISG-2000-P02A-S00
NetScreen-ISG 2000 system, 1 dual-port 10/100/1000 Copper I/O Module         NS-ISG-2000-P03A-S00

NetScreen-ISG 2000 Bundles Baseline*
NetScreen-ISG 2000 system, 1 4-port 10/100 I/O Module                        NS-ISG-2000B-P00A-S00
NetScreen-ISG 2000 system, 1 8-port 10/100 I/O Module                        NS-ISG-2000B-P01A-S00
NetScreen-ISG 2000 system, 1 Dual-Port mini-GBIC I/O Module                  NS-ISG-2000B-P02A-S00
NetScreen-ISG 2000 system, 1 dual-port 10/100/1000 Copper I/O Module         NS-ISG-2000B-P03A-S00
*All systems include 2 AC power supplies and 0 virtual systems

NetScreen-ISG 2000 Virtual System Upgrades
VSYS Upgrade 0 to 5                                                          NS-ISG-2000-VSYS-5
VSYS Upgrade 5 to 25                                                         NS-ISG-2000-VSYS-25
VSYS Upgrade 25 to 50                                                        NS-ISG-2000-VSYS-50
VSYS Upgrade 0 to 25                                                         NS-ISG-2000-VSYS-025
VSYS Upgrade 0 to 50                                                         NS-ISG-2000-VSYS-050
Every Virtual System includes 1 virtual router and 2 security zones, usable in the virtual or root system

NetScreen-ISG 2000 Components
I/O Module - Dual Port Mini GBIC-SX                                          NS-ISG-2000-SX2
I/O Module - Dual Port Mini GBIC-LX                                          NS-ISG-2000-LX2
I/O Module - 4 Port 10/100 Fast Ethernet                                     NS-ISG-2000-FE4
I/O Module - 8 Port 10/100 Fast Ethernet                                     NS-ISG-2000-FE8
I/O Module - Dual Port 10/100/1000 Gig Ethernet                              NS-ISG-2000-TX2
SX transceiver (mini-GBIC)                                                   NS-SYS-GBIC-MSX
LX transceiver (mini-GBIC)                                                   NS-SYS-GBIC-MLX
AC power supply                                                              NS-ISG-2000-PWR-AC
DC power supply                                                              NS-ISG-2000-PWR-DC
Japan power cord option                                                      NS-ISG-2000-JAPAN
Fan module                                                                   NS-ISG-2000-FAN
Rack Mount Kit (19 in., all mounting hardware)                               NS-ISG-2000-RCK-01
Rack Mount Kit (23 in., all mounting hardware)                               NS-ISG-2000-RCK-02
Blank Interface Panel                                                        NS-ISG-2000-IPAN
Blank Power Supply Cover                                                     NS-ISG-2000-PPAN

(1) Performance, capacity and features listed are based upon systems running ScreenOS 5.0.0 and may vary with other ScreenOS releases. Actual throughput may vary based upon packet size and enabled features.
(2) Performance and capacity provided are the measured maximums under ideal testing conditions. May vary by deployment.
(3) Shared among all Virtual Systems
(4) Not available with Virtual Systems
(5) NAT, PAT, policy-based NAT, virtual IP, mapped IP, virtual systems, virtual routers, VLANs, OSPF, BGP, RIPv2, Active/Active HA, and IP address assignment are not available in layer 2 transparent mode
(6) Requires purchase of virtual system key. Every virtual system includes one virtual router and two security zones, usable in the virtual or root system.

1194 North Mathilda Avenue, Sunnyvale, CA 94089 USA
Phone: 888-JUNIPER (888-586-4737) or 408-745-2000
Fax: 408-745-2100

Copyright © 2004 Juniper Networks, Inc. All rights reserved. Juniper Networks, the Juniper Networks logo, NetScreen, NetScreen Technologies, GigaScreen, and the NetScreen logo are registered trademarks of Juniper Networks, Inc. NetScreen-5GT, NetScreen-5XP, NetScreen-5XT, NetScreen-25, NetScreen-50, NetScreen-100, NetScreen-204, NetScreen-208, NetScreen-500, NetScreen-5200, NetScreen-5400, NetScreen-Global PRO, NetScreen-Global PRO Express, NetScreen-Remote Security Client, NetScreen-Remote VPN Client, NetScreen-IDP 10, NetScreen-IDP 100, NetScreen-IDP 500, GigaScreen ASIC, GigaScreen-II ASIC, and NetScreen ScreenOS are trademarks of Juniper Networks, Inc. All other trademarks and registered trademarks are the property of their respective companies.
Part Number: 110011-003 Sept 2004
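1 Problems in implementing routed networking for the mobile 4G/5G VPDN service

A financial institution subscribed to an operator's VPDN service to support its business growth. Users could dial in directly from mobile office terminals, and the service worked normally. A new requirement then emerged: use VPDN dial-up to bring 4G/5G routers into branch offices as a redundant backup network for the branches' wired access. During the trial of the new service, the VPDN user's UIM card was inserted into a 4G/5G router; dial-up succeeded and the router obtained a randomly assigned service IP address. The customer, however, required that the address obtained at dial-up be a static, fixed IP address specified by the customer, and that the 4G/5G router assign the service IP addresses pre-planned by headquarters to the terminals attached to it. After the IP routing configuration was completed according to these requirements, testing showed that the attached terminals could not obtain the customer-specified service IP addresses through the 4G/5G router and so could not exchange data with the headquarters data center.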
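2 Network architecture and topology of the mobile 4G/5G VPDN service, and on-site test results

The network topology for the VPDN service was built according to the customer's networking requirements (see Figure 1). Branch offices use the operator's mobile 4G/5G network and dial in through the VPDN service to reach the customer's internal network. Using static IP addresses that headquarters allocates per service, they exchange data securely, conveniently, and efficiently with the headquarters database, file management, web application, mail, video, and network management servers. Headquarters can manage and authorize branch terminals securely and flexibly through a graphical network management system.

Figure 1. Network topology

The Design and Implementation of VPDN Service Private Network Based on the 4G/5G Network Application
FENG Ya-jun (Henan Branch of China Telecom, Zhengzhou 450000, China)

[Abstract] As information and communication technology (ICT) continues to integrate deeply with production and daily life, demand from government, finance, education, healthcare, industry, and other sectors for ubiquitous, high-speed, intelligent, and secure information networks is higher than ever. Operator VPDN services in 4G/5G scenarios can provide more secure, reliable, and convenient communications, and the number of such services and application scenarios keeps growing.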
vSAN ™Networking Done RightIncrease vSAN Efficiency with Mellanox Ethernet InterconnectsHigher EfficiencyEfficient Hardware OffloadsA variety of new workloads and technologies are increasing the load on CPU utilization. Overlay networks protocols, OVS processing, access tostorage and others are placing a strain on VMware environments. High performance workloads require intensive processing which can waste CPU cycles, and choke networks. The end result is that application efficiency is limited and virtual environments as a whole becomes inefficient. Because of thesechallenges, data center administrators now look to alleviate CPU loads by implementing, intelligent, network components that can ease CPU strain, increase networkbandwidth and enable scale and efficiency in virtual environments.Mellanox interconnects can reduce the burden byoffloading many networking tasks, thereby freeing CPU resources to serve more VMs and process more data. Side-by-side comparison shows over a 70% reduction in CPU resources and a 40% improvement in bandwidth.Without OffloadsWith Mellanox OffloadsvSphere 6.5, introduced Remote Direct MemoryAccess over Converged Ethernet (RoCE). RoCE allows direct memory access from one computer to another without involving the operating system or CPU. The transfer of data is offloaded to a RoCE-capable adapter, freeing the CPU from the data transferprocess and reducing latencies. For virtual machines a PVRDMA (para-virtualized RDMA) network adapter is used to communicate with other virtual machines. Mellanox adapters are certified for both in vSphere.RoCE dramatically accelerates communication between two network endpoints but also requires a switch that is configured for lossless traffic. RoCE v1 operates over lossless layer 2 and RoCE v2 supports layer 2 and layer 3. To ensure a lossless environment, you must be able to control the traffic flows. 
Mellanox Spectrum switches support Priority Flow Control (PFC) and Explicit Congestion Notification (ECN) whichenables a global pause across the network to support RDMA. Once RoCE is setup on vSphere close-to-local, predictable latency can be gained from networked storage along with line-rate throughput and linear scalability. This helps to accommodate dynamic, agile data movement between nodes.RoCE CertifiedReduce CPU OverheadWith RDMAWithout RDMA VMware Virtual SANVMware's Virtual SAN (vSAN) brings performance, low cost and scalability to virtual cloud deployments. An issue that cloud deployment model raises is the problem of adequate storage performance to virtualinstances. Spinning disks and limited bandwidth networks lower IO rates over local drives. VMware’s solution to this is vSAN which adds a temporary local storage “instance” in the form of a solid -state drive to each server. vSAN extends the concept of local instance storage to a shareable storage unit in each server, where additionally, the data can be accessed by other servers over a LAN. vSAN brings. The benefits of VSAN include:•Increased performance due to local server access to Flash storage•Lower infrastructure cost by removing the need for networked storage appliances •Highly scalable --simply add more servers to increase storage •Eliminate boot storms since data is stored locally•Unified management --no storage silo versus server silo separation problemsMellanox 10/25G Ethernet interconnect solutions enable unmatched competitive advantages in VMwareenvironments by increase efficiency of overall server utilization and eliminating I/O bottleneck to enable more virtual machines per server, faster migrations and speed access to storage. 
Explore this reference guide to learn more about how Mellanox key technologies can help improve efficiencies in your vSAN environment.Scalable from a half rack to multiple racksHalf Rack 12 nodesFull Rack 24 nodesPay As You Grow Switching10 Racks up to 240 nodesDeployment Config134411GbE link: 1GbE Transceiver125/10GbE link: QSFP to SFP+324100GbE link: QSFP to QSFP 100/40GbE link: QSFP to QSFP Provisioning & Orchestration▪Zero-touch provisioning ▪VLAN auto-provisioning▪Migrate VMs without manual configuration▪VXLAN/DCI support for VM migration across multiple datacenters for DRMonitoring▪Performance monitoring ▪Health monitoring ▪Detailed telemetry▪Alerts and notificationsAutomated Network▪½ 19” width, 1U height ▪18x10/25GbE + 4x40/100GbE ▪57W typical (ATIS)2Mellanox InterconnectsiSERStorage virtualization requires an agile and responsive network. iSER accelerates workloads by using an iSCSI extensions for RDMA. Using the iSER extension lowers latencies and CPU utilization to help keep pace with I/O requirements and provides a 70% improvement in throughput and 70% reduction in latencies through Mellanox Ethernet interconnects.Deliver 3X EfficiencyHyper-ConvergedReduce CapEx ExpenseHyper-Converged Infrastructure (HCI) is a demanding environment for networking interconnects. HCI consists of three software components: compute virtualization, storage virtualization and management, in which all three require an agile andresponsive network. Deploying on 10, or better, 25G network pipes assists as does network adapters and switches with offload capabilities to optimizeperformance and availability of synchronization and replication of virtualized workloads.CapEx Analysis: 10G vs. 25GMellanox adapters and switches accelerate VM resources toimprove performance, enhance efficiency and provide high-availability and are a must-have feature for any VMware environment. 
Ethernet AdaptersMellanox Connect-X adapters:▪Enable near-native performance for VMs thru Stateless offloads ▪Extend hardware resources to 64 PF, 512 VF w/ SR-IOV & ROCE ▪Accelerate virtualized networks with VXLAN, GENEVE & NVGRE ▪Align network services withcompute services for multitenant network supportIncreasing vSAN EfficiencyIncrease vSAN Efficiency with Mellanox Ethernet Interconnects。
with Bluetooth® connectorFEATURES• Dynamic full-range transducer for high-resolution,powerful monitoring sound• Reduces acoustic stress factors through natural anddistortion-free reproduction• 2 in 1 bundle: Bluetooth® module for wireless connec-tion to mobile devices, PCs or tablets, with a built-inmicrophone for calls or standard 3,5 mm jack-plugcable• Excellent shielding through optimized earpiece shapeand flexible silicone and foam attachmentsFor the stage. For massive sound. For the road.Developed for high expectations on live stages, the specially designed driver of the IE 100 PRO creates precise audio cla-rity for musicians in live sessions. Typical for the new type of membrane is a powerful, high-resolution and warm monito-ring sound. With the included Bluetooth® module, the in-ears become comfortable everyday companions for your mobile phone, PC or tablet. With the built-in mic, phone calls or Webcasts are also possible.Musicians and DJs choose the IE 100 PRO wireless set for its exceptional sound and high wearing comfort. Not only for live sessions, but also for producing on the road or as an everyday companion.The in-ears come with 4 earpiece adapters that optimize the fit for every ear size and shape. The setup is stage-safe from the connection to the cable conduit.Sophisticated monitoring sound for mixing on live stages, producing in the studio and everywhere in between.DELIVERY INCLUDES• IE 100 PRO (BLACK, CLEAR or RED)• Bluetooth connector• black cable for IE 100 PRO• USB-A to USB-C cable• soft pouch• cleaning tool• foam and silicone ear adapters• quick guide• safety guide• compliance sheetwith Bluetooth® connectorPRODUCT VARIANTSIE 100 PRO WIRELESS BLACKArt. no. 509171IE 100 PRO WIRELESS CLEAR Art. no. 509172IE 100 PRO WIRELESS RED Art. no. 
509173SPECIFICATIONS IE 100 PROFrequency response 20 - 18,000 Hz Impedance20 ΩSound pressure level (SPL)115 dB (1 kHz / 1 V rms )Total harmonic distortion (THD)< 0.1 % (1 kHz, 94 dB)Noise attenuation < 26 dB Magnetized field strength 1.63 mT Operating temperature Storage temperature –5 °C to +50 °C (23 °F to 122 °F)–20 °C to +70 °C (–4 °F to 158 °F)Relative humidity< 95 %Bluetooth ® ConnectorWearing style Bluetooth® neckband cable Microphone principle MEMS Microphone frequency response100 - 8,000 HzMicrophone sensitivity -42 dBV/Pa (ITU-T P.79)Microphone pick-up pattern (speech audio)omni-directional Power supply - built-in rechargeable lithium- polymer battery 3.7 V ⎓, 100 mAhUSB charging 5 V ⎓, 100 mA max.Operating time10 h (music playback via SBC) with rechargeable battery;240 h in standby mode Charging time ofrechargeable batteries approx. 2.5 hOperating temperature Charging temperature Storage temperature +5 °C to +40 °C ± 5 °C (41 °F to 104 °F ± 9 °F)+10 °C to +40 °C ± 5 °C (50 °F to 104 °F ± 9 °F)–20 °C to +70 °C (–4 °F to 158 °F)Relative humidity Operation: Storage:10 - 80 %, non-condensing 10 - 90 %Magnetized field strength1.63 mT (with IE 100 PRO)0.23 mT (without headphone)Weight approx. 13 gBluetooth®VersionBluetooth 5.0 compatible,class 1, BLETransmission frequency 2,402 - 2,480 MHz Modulation GFSK, π/4 DQPSK, 8DPSK Profiles HSP, HFP, AVRCP, A2DP Output power 10 mW (max)CodecSBC, aptX®, aptX LL®, AACThe Bluetooth® word mark and logos are registered trade-marks owned by Bluetooth SIG, Inc. and any use of such marks by Sennheiser electronic GmbH & Co. KG is under license.with Bluetooth® connectorSennheiser electronic GmbH & Co. KG · Am Labor 1 · 30900 Wedemark · Germany · ACCESSORIESIE PRO Bluetooth Connector Art. no. 508943IE PRO Mono cable Art. no. 508944Twisted cable Art. no. 507478Black straight cableArt. no. 508584。
ABSTRACT
As network speeds continue to grow, new challenges in network processing are emerging. In this paper we first study network processing from a hardware perspective and show that the I/O and memory systems have become the main bottlenecks to performance improvement. Based on this analysis, we conclude that conventional solutions for reducing I/O and memory access latencies are insufficient to address the problem.
Motivated by these studies, we propose an improved DCA combined with INIC solution that features an optimized architecture, new I/O data transfer schemes, and improved cache policies. Experimental results show that our solution reduces cycles by 52.3% and 14.3% on average for receiving and transmitting, respectively. I/O and memory traffic are also significantly decreased. In addition, we investigate the behavior of the I/O and cache systems during network processing and present some conclusions about the DCA method.

1. INTRODUCTION
Recently, many researchers have found that the I/O system has become the bottleneck for network performance improvement in modern computer systems [1][2][3]. Designed to support compute-intensive applications, the conventional I/O system has obvious disadvantages for fast network processing, in which bulk data transfer is performed. 
The lack of locality support and high latency are the two main problems of the conventional I/O system, and both have been widely discussed [2][4].
To overcome these limitations, Intel proposed an effective solution called Direct Cache Access (DCA) [1]. It delivers network packets from the Network Interface Card (NIC) into the cache instead of memory to reduce data access latency. Although the solution is promising, DCA has been shown to be insufficient for reducing access latency and memory traffic because of several limitations [3][5]. Another effective solution is the Integrated Network Interface Card (INIC), which is used in many academic and industrial processor designs [6][7]. INIC was introduced to reduce the heavy cost of I/O register accesses in network drivers and of interrupt handling. However, a recent report [8] shows that the benefit of INIC is insignificant for state-of-the-art 10GbE network systems.
In this paper, we focus on highly efficient I/O system design for network processing on general-purpose processors (GPPs). Based on an analysis of existing methods, we propose an improved DCA combined with INIC solution to reduce I/O-related data transfer latency.
The key contributions of this paper are as follows:
▪ Review network processing from a hardware perspective and point out that the I/O system and the associated last-level memory system have become the obstacle to performance improvement.
▪ Propose an improved DCA combined with INIC solution for I/O subsystem design to address the inefficiency of the conventional I/O system.
▪ Give a framework of the improved I/O system architecture and evaluate the proposed solution with micro-benchmarks.
▪ Investigate I/O and cache behavior during network processing based on the proposed I/O system.
The paper is organized as follows. In Section 2, we present the background and motivation. 
In Section 3, we describe the improved DCA combined with INIC solution and give a framework of the proposed I/O system implementation. In Section 4, we first describe the experimental environment and methods and then analyze the experimental results. In Section 5, we discuss related work. Finally, in Section 6, we compare our solution with existing technologies and draw some conclusions.

2. Background and Motivation
In this section, we first review network processing and the main bottlenecks to network performance improvement today. We then give an in-depth analysis of the network system from the perspective of computer architecture and present the motivation of this paper.

2.1 Network processing review
Figure 1 illustrates the network processing flow. Packets from the physical line are sampled by the Network Interface Card (NIC). The NIC performs address filtering and stream control operations, then sends the frames to the socket buffer and notifies the OS through interrupts to invoke network stack processing. When the OS receives an interrupt, the network stack accesses the data in the socket buffer and calculates the checksum. Protocol-specific operations are performed layer by layer during stack processing. Finally, data is transferred from the socket buffer to the user buffer, depending on the application. This operation is commonly done by memcpy, a system function in the OS.

Figure 1. Network Processing Flow

The time cost of network processing can be broken down mainly into the following parts: interrupt handling, NIC driver, stack processing, kernel routines, data copy, checksum calculation, and other overheads. The first four parts are considered packet costs, meaning that they scale with the number of network packets. The rest are considered bit costs (also called data touch costs), meaning that they are proportional to the total I/O data size. The proportion of these costs depends heavily on the hardware platform and the nature of the applications. 
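
The split between packet cost and bit cost described above can be sketched as a simple linear model. This is a minimal illustration, not a measured model: the class name `CostModel` and both cycle coefficients are hypothetical and are chosen only to show how the two terms scale.

```python
from dataclasses import dataclass


@dataclass
class CostModel:
    """Linear cost model; both coefficients are hypothetical, not measured."""
    per_packet_cycles: float  # interrupt handling, driver, stack, kernel routines
    per_byte_cycles: float    # data copy and checksum ("data touch")

    def packets(self, total_bytes: int, packet_bytes: int) -> int:
        # Ceiling division: a partial final packet still costs a full packet.
        return -(-total_bytes // packet_bytes)

    def total_cycles(self, total_bytes: int, packet_bytes: int) -> float:
        n = self.packets(total_bytes, packet_bytes)
        return n * self.per_packet_cycles + total_bytes * self.per_byte_cycles

    def packet_cost_share(self, total_bytes: int, packet_bytes: int) -> float:
        n = self.packets(total_bytes, packet_bytes)
        return n * self.per_packet_cycles / self.total_cycles(total_bytes, packet_bytes)


# Assumed coefficients for illustration only.
model = CostModel(per_packet_cycles=1000.0, per_byte_cycles=1.0)
for size in (64, 1500):
    share = model.packet_cost_share(total_bytes=1_500_000, packet_bytes=size)
    print(f"{size:>5}-byte packets: packet cost share = {share:.2f}")
```

With a fixed byte count, smaller packets raise the packet-cost share while the bit-cost term stays the same, which is why the balance between the two depends on the workload.
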
There are many measurements and analyses of network processing costs [9][10]. Generally, the kernel routine cost ranges from 10% to 30% of total cycles; the driver and interrupt handling costs range from 15% to 35%; the stack processing cost ranges from 7% to 15%; and the data touch cost takes up 20% to 35%. With the development of high-speed networks (e.g. 10/40 Gbps Ethernet), kernel routine, driver, and interrupt handling costs have been observed to increase [3].

2.2 Motivation
To reveal the relationship among the parts of network processing, we investigate the corresponding hardware operations. From the perspective of computer hardware architecture, network system performance is determined by three domains: CPU speed, memory speed, and I/O speed. Figure 2 depicts the relationship.

Figure 2. Network xxxx

The network subsystem can achieve its maximum performance only when these three domains are in balance, that is, when the throughput or bandwidth of each hardware domain matches that of the others. This is hard for hardware designers to achieve, because the characteristics and physical implementation technologies of CPU, memory, and I/O system (chipset) fabrication differ. The speed gap between memory and CPU, known as "the memory wall", has received special attention for more than ten years but is still not well addressed. The disparity between the data throughput of the I/O system and the computing capacity of the CPU has also been reported in recent years [1][2].
Meanwhile, the major time costs of network processing mentioned above are associated with I/O and memory speeds, e.g. driver processing, interrupt handling, and memory copy costs. The most important characteristic of network processing is the "producer-consumer locality" between every two consecutive steps of the processing flow. That is, the data produced by one hardware unit is immediately accessed by another unit, e.g. 
the data in memory that was transferred from the NIC will soon be accessed by the CPU. However, in conventional I/O and memory systems, the data transfer latency is high and this locality is not exploited.
Based on the analysis above, we observe that the I/O and memory systems are the limiting factors for network processing. Conventional DCA or INIC alone cannot address this problem, because each is inefficient in either I/O transfer latency or I/O data locality utilization (discussed in Section 5). To reduce these limitations, we present a combined DCA with INIC solution. The solution not only takes advantage of both methods but also makes several improvements in memory system policies and software strategies.

3. Design Methodologies
In this section, we describe the proposed DCA combined with INIC solution and give a framework of the implementation. First, we present the improved DCA technology and discuss the key points of incorporating it into I/O and memory system design. Then, we give the important software data structures and the details of the DCA scheme. Finally, we introduce the system interconnection architecture and the integration of the NIC.

3.1 Improved DCA
To reduce data transfer latency and memory traffic in the system, we present an improved Direct Cache Access solution. Unlike the conventional DCA scheme, our solution carefully considers the following points. The first is cache coherence. Conventionally, data sent from a device by DMA is stored in memory only, and a different copy of the data at the same address may be held in the cache, which usually requires an additional coherence unit to perform snoop operations [11]. When DCA is used, I/O data and CPU data are both stored in the cache, with one copy per memory address, as shown in Figure 3. Our solution therefore modifies the cache policy to eliminate the snooping operations. Coherence operations can be performed by software when needed. 
This greatly reduces memory traffic for systems with coherence hardware [12].

[Figure 3: (a) cache coherence with conventional I/O; (b) cache coherence with DCA I/O. In both panels the CPU writes *(addr) = a and the I/O device writes *(addr) = b.]
Figure 3. xxxx

The second is cache pollution. DCA is a mixed blessing for the CPU: on one hand, it accelerates data transfer; on the other, it harms the locality of other programs running on the CPU and causes cache pollution. Cache pollution depends heavily on the I/O data size, which is usually quite large. For example, one Ethernet packet contains at most 1492 bytes of normal payload, or at most 65536 bytes of large payload with Large Segment Offload (LSO). For a common network buffer (usually 50 to 400 packets in size), this means that a maximum of between 400 KB and 16 MB of data is sent to the cache. Such a large amount of data causes cache performance to drop dramatically. In this paper, we carefully investigate the relationship between the size of the I/O data sent by DCA and the size of the cache system. A DCA scheme for achieving the best cache performance is also suggested in Section 4. Scheduling the data sent with DCA is an effective way to improve performance, but it is beyond the scope of this paper.
The third is DCA policy. DCA policy refers to deciding when, and which part of, the data is transferred with DCA. Obviously, the scheme is application-specific and varies with user targets. In this paper, we reserve a specific memory address space in the system to receive the data transferred with DCA. The addresses of the data should be remapped to that area by the user or the compiler.

3.2 DCA Scheme and details
To accelerate network processing, many important software structures used in the NIC driver and the stack are coupled with DCA. NIC descriptors and the associated data buffers receive special attention in our solution. The former are the data transfer interface between DMA and the CPU, and the latter contain the packets. 
For further study, each packet stored in the buffer is divided into a header and a payload. Normally the headers are accessed frequently by the protocols, but the payload is accessed only once or twice (usually by memcpy) in a modern network stack and OS. The details of the related software data structures and of network processing can be found in previous work [13].
The process of transferring one packet from the NIC to the stack with the proposed solution is illustrated in Table 1. All the access latency parameters in Table 1 are based on a state-of-the-art multi-core processor system [3]. Note that the cache access latency from I/O is nearly the same as that from the CPU, but the memory access latency from I/O is about 2/3 of that from the CPU because of the complex hardware hierarchy above main memory.

[Table 1: caption and contents not recoverable from the source]

We can see that the DCA with INIC solution saves more than 95% of CPU cycles in theory and avoids all traffic to the memory controller. In this paper, we transfer the NIC descriptors and the data buffers, including both headers and payload, with DCA to achieve the best performance. When the cache is small, however, transferring only the descriptors and the headers with DCA is an alternative.
DCA performance depends heavily on the system cache policy. For the cache system, a write-back with write-allocate policy helps DCA achieve better performance than a write-through with no-write-allocate policy. Based on the analysis in Section 3.1, we do not use snooping cache technology to maintain coherence with memory. 
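
The choice described above, between sending everything with DCA and sending only descriptors and headers when the cache is small, can be sketched as a footprint check. This is a hypothetical sketch: the function `choose_dca_scope`, the `max_cache_fraction` threshold, and all sizes in the example are assumptions for illustration, and the paper does not specify a selection rule.

```python
from enum import Enum


class DcaScope(Enum):
    FULL = "descriptors + headers + payload"
    HEADERS_ONLY = "descriptors + headers"


def choose_dca_scope(cache_bytes: int, ring_entries: int, descriptor_bytes: int,
                     header_bytes: int, payload_bytes: int,
                     max_cache_fraction: float = 0.5) -> DcaScope:
    """Pick what to send with DCA so the footprint stays within a cache budget.

    max_cache_fraction is an assumed tuning knob, not a value from the paper.
    """
    full_footprint = ring_entries * (descriptor_bytes + header_bytes + payload_bytes)
    budget = cache_bytes * max_cache_fraction
    return DcaScope.FULL if full_footprint <= budget else DcaScope.HEADERS_ONLY


# Hypothetical sizes: 256-entry ring, 16-byte descriptors, 64-byte headers,
# 1500-byte payloads, compared against a 4 MiB and a 512 KiB cache.
large_cache = choose_dca_scope(4 * 1024 * 1024, 256, 16, 64, 1500)
small_cache = choose_dca_scope(512 * 1024, 256, 16, 64, 1500)
print(large_cache.value)
print(small_cache.value)
```

A fixed fraction of the cache is the simplest possible budget; a real design would more likely derive the limit from measured cache pollution, which the paper examines in its evaluation.
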
Cache coherence for other non-DCA I/O data transfers is guaranteed by software.

3.3 On-chip network and integrated NIC

7. REFERENCES
[1] R. Huggahalli, R. Iyer, S. Tetrick, "Direct Cache Access for High Bandwidth Network I/O", ISCA, 2005.
[2] D. Tang, Y. Bao, W. Hu et al., "DMA Cache: Using On-chip Storage to Architecturally Separate I/O Data from CPU Data for Improving I/O Performance", HPCA, 2010.
[3] Guangdeng Liao, Xia Zhu, Laxmi Bhuyan, "A New Server I/O Architecture for High Speed Networks", HPCA, 2011.
[4] E. A. León, K. B. Ferreira, and A. B. Maccabe, "Reducing the Impact of the Memory Wall for I/O Using Cache Injection", 15th IEEE Symposium on High-Performance Interconnects (HOTI'07), Aug. 2007.
[5] A. Kumar, R. Huggahalli, S. Makineni, "Characterization of Direct Cache Access on Multi-core Systems and 10GbE", HPCA, 2009.
[6] Sun Niagara 2, /processors/niagara/index.jsp
[7] PowerPC
[8] Guangdeng Liao, L. Bhuyan, "Performance Measurement of an Integrated NIC Architecture with 10GbE", 17th IEEE Symposium on High Performance Interconnects, 2009.
[9] A. Foong et al., "TCP Performance Re-visited", IEEE Int'l Symp. on Performance Analysis of Software and Systems, Mar. 2003.
[10] D. Clark, V. Jacobson, J. Romkey, and H. Salwen, "An Analysis of TCP Processing Overhead", IEEE Communications, June 1989.
[11] J. Doweck, "Inside Intel Core microarchitecture and smart memory access", Intel White Paper, 2006.
[12] Amit Kumar, Ram Huggahalli, "Impact of Cache Coherence Protocols on the Processing of Network Traffic".
[13] Wenji Wu, Matt Crawford, "Potential performance bottleneck in Linux TCP", International Journal of Communication Systems, Vol. 20, Issue 11, pages 1263–1283, November 2007.
[14] Weiwu Hu, Jian Wang, Xiang Gao, et al., "Godson-3: a scalable multicore RISC processor with x86 emulation", IEEE Micro, 2009, 29(2): pp. 17-29.
[15] Cadence Incisive Xtreme Series. /products/sd/xtreme_series.
[16] Synopsys GMAC IP. /dw/dwtb.php?a=ethernet_mac
[17] ler, P. M. Watts, A. W. Moore, "Motivating Future Interconnects: A Differential Measurement Analysis of PCI Latency", ANCS, 2009.
[18] Nathan L. Binkert, Ali G. Saidi, Steven K. Reinhardt, "Integrated Network Interfaces for High-Bandwidth TCP/IP", Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2006.
[19] G. Liao, L. Bhuyan, "Performance Measurement of an Integrated NIC Architecture with 10GbE", HotI, 2009.
[20] Intel Server Network I/O Acceleration. /technology/comms/perfnet/download/ServerNetworkIOAccel.pdf
Received: 2020-01-21

The Technical Solutions and Construction Strategies for 5G Private Networks

LI Liping, LI Zhendong, FANG Yanwei
(ZTE Corporation, Nanjing 210012, China)

Abstract: The technical characteristics of 5G, such as massive connections, high reliability and low latency, are well suited to the requirements of private networks. Typical private network applications for industries, government and enterprises can therefore be realized, such as live video broadcast, massive IoT device access, unmanned driving, telemedicine and intelligent manufacturing. For the application of 5G in the private network market, this paper analyzes the development and evolution of the 5G core side in the industry/private network direction from three aspects: the necessity of building 5G private networks independently, the latest developments in 3GPP 5G NPN (Non-Public Network) technology, and construction strategies for the 5G core network 2B (To Business) network. The aim is for 5G to deliver greater value in the private network field and for cross-sector "5G + industry" cooperation to form a virtuous circle.

Keywords: 5G; private network; 2B; vertical application; network slice

doi: 10.3969/j.issn.1006-1010.2020.03.002  CLC number: TN929.5  Document code: A  Article ID: 1006-1010(2020)03-0008-06
Citation: LI Liping, LI Zhendong, FANG Yanwei. The Technical Solutions and Construction Strategies for 5G Private Networks [J]. Mobile Communications, 2020, 44(3): 8-13.

0 Introduction
Unlike 2G/3G/4G, 5G not only serves personal communications but also meets the private network construction needs of diverse vertical industries, laying a foundation for future industrial applications. The communications industry has already carried out deep and broad research on private network communications.
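Research on LTE Handover Technology Based on OPNET (Part 1)

I. Introduction
With the rapid development of the mobile Internet, wireless communication has become an indispensable part of daily life. Long Term Evolution (LTE), with its high bandwidth and low latency, plays an important role in mobile communication networks. The handover of user terminals between cells is critical to network quality and user experience. Research on LTE handover based on the OPNET platform is therefore of great significance for improving wireless network performance.

II. Overview of OPNET
OPNET is a widely used network simulation package that can build complex network models and comprehensively analyze and evaluate network performance. In LTE handover research, OPNET can simulate signal propagation and user behavior under realistic conditions, so that the handover process can be observed end to end and analyzed in depth.

III. LTE Handover Technology
LTE handover is the process by which a user terminal switches from one base station to another. It involves several key techniques, including measurement reporting, handover decision, and handover execution. In an LTE network, how smoothly handover proceeds directly affects the user's communication quality and the network's performance.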
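The measurement-report and handover-decision steps named above can be sketched as follows. This is a minimal illustration, not the method used in the study: it assumes a simplified rule in the spirit of the 3GPP "neighbor becomes offset better than serving" event, and the names `Measurement`, `handover_trigger_time`, and the hysteresis, offset and time-to-trigger values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    """One measurement report sample; signal levels in dBm."""
    time_ms: int
    serving_rsrp: float
    neighbor_rsrp: float


def handover_trigger_time(reports, hysteresis_db=2.0, offset_db=1.0,
                          time_to_trigger_ms=160):
    """Return the report time at which handover is triggered, or None.

    Assumed decision rule: neighbor - hysteresis > serving + offset must
    hold continuously for at least time_to_trigger_ms.
    """
    entered_at = None  # time at which the condition first became true
    for r in reports:
        better = r.neighbor_rsrp - hysteresis_db > r.serving_rsrp + offset_db
        if not better:
            entered_at = None  # condition broken: restart the timer
            continue
        if entered_at is None:
            entered_at = r.time_ms
        if r.time_ms - entered_at >= time_to_trigger_ms:
            return r.time_ms
    return None


# Hypothetical report sequence for a terminal moving toward the neighbor cell.
reports = [
    Measurement(0, -95.0, -99.0),
    Measurement(80, -97.0, -95.0),    # only 2 dB better: below the 3 dB margin
    Measurement(160, -99.0, -94.0),   # condition first holds here
    Measurement(240, -100.0, -93.0),
    Measurement(320, -101.0, -92.0),  # condition has held for 160 ms
]
trigger = handover_trigger_time(reports)
print(trigger)
```

Hysteresis and time-to-trigger trade handover delay against ping-pong handovers, which is one reason a simulation sweep over these parameters is informative.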
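IV. LTE Handover Research Based on OPNET
Research on LTE handover based on OPNET mainly covers the following aspects:
1. Model construction: Build an LTE network model on the OPNET platform, including base stations, user terminals, and signal propagation. This step must account for the complexity of real network environments to ensure that the simulation results are accurate.
2. Simulation experiments: Run simulations on the model and observe how users hand over in different scenarios. By adjusting parameters and settings, handover under different channel conditions and different user behaviors can be simulated.
3. Data analysis: Analyze the simulation results using metrics such as handover success rate, delay, and packet loss rate. These metrics make it possible to evaluate handover performance, identify problems, and find directions for improvement.
4. Optimization: Improve the handover technology based on the analysis results. This includes improving the accuracy of measurement reports, optimizing the handover decision algorithm, and making handover execution more efficient. These measures improve LTE network performance and user experience.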
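The data-analysis step above can be sketched as a small metric computation. This is only an illustration under assumed inputs: the record layout (`success`, `delay_ms`) and the function name `handover_metrics` are hypothetical and do not reflect OPNET's actual output format.

```python
from statistics import mean


def handover_metrics(events, packets_sent, packets_received):
    """Summarize handover attempts from (hypothetical) simulation records.

    events: list of dicts with keys "success" (bool) and "delay_ms"
            (float, or None for failed attempts).
    """
    attempts = len(events)
    successes = [e for e in events if e["success"]]
    return {
        "success_rate": len(successes) / attempts if attempts else 0.0,
        # Delay is averaged over successful handovers only.
        "mean_delay_ms": mean(e["delay_ms"] for e in successes) if successes else None,
        "packet_loss_rate": 1 - packets_received / packets_sent if packets_sent else 0.0,
    }


# Made-up records, for illustration only.
events = [
    {"success": True, "delay_ms": 40.0},
    {"success": True, "delay_ms": 60.0},
    {"success": False, "delay_ms": None},
    {"success": True, "delay_ms": 50.0},
]
metrics = handover_metrics(events, packets_sent=1000, packets_received=990)
print(metrics)
```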
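V. Results and Discussion
From the OPNET-based study of LTE handover, we can draw the following conclusions:
1. Handover performance is affected by many factors, including channel conditions, user behavior, and base station distribution.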
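Interlaken Technology: A Next-Generation Packet Interconnect Protocol (White Paper)
2010-11-12 23:41:36 | Category: Interlaken

1.0 Abstract
Serial link technology has increased the device-to-device interconnect bandwidth of advanced communications equipment. Interlaken is an interconnect protocol optimized for high-bandwidth, reliable packet transfer. The protocol uses multiple serial links to build a logical connection between devices, and improves the performance of communications equipment through multiple channels, backpressure capability, and data integrity protection. This white paper outlines Interlaken's features and implementation case studies.

2.0 Design Goals
2.1 Protocol Description
Traditionally, devices with gigabit-class throughput used data buses running at roughly 100 Mbps per pin. Differential signaling raised this by nearly a factor of 10, to 800 Mbps per pin pair, bringing device throughput to 10 Gbps. Newer serial technology with clock and data recovery raised bandwidth by roughly another factor of 10, to 6 Gbps per pin pair, bringing device data rates to tens of Gbps. Compared with earlier protocols, this protocol can eliminate 90% of the IO pins and PCB traces. It uses state-of-the-art serial technology to provide a high-speed, robust, and flexible packet-based interface between the devices of a communications system.

2.2 Bandwidth Range
... connections. Such a wide bandwidth range makes the protocol suitable for many applications and allows backward compatibility across multiple generations of equipment. Interlaken is suited to implementation in the following devices: MACs with multiple 10 Gbps ports, OC-768 SONET framers, next-generation 100 Gb Ethernet integrated circuits, and 100 Gbps switch fabrics and packet processors.

2.3 Scalability
Interlaken can run over different numbers of lanes, which is what makes it scalable. Two parameters determine the bandwidth of a connection:
1. Number of serial lanes in the interface. An Interlaken interface can use any number of serial links (or "lanes"). Effective bandwidth is directly related to the number of lanes. For example, as shown in Figure 1, when running at the same per-lane speed, an 8-lane interface can carry twice the payload of a 4-lane interface.
2. Frequency of each lane. Effective bandwidth is also directly proportional to the bit rate of each lane.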
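The two scaling parameters above can be combined in a short calculation. This is a sketch under stated assumptions: the 64/67 coding efficiency reflects Interlaken's 64B/67B line encoding, which is not mentioned in this excerpt, and the function name `interface_bandwidth_gbps` is hypothetical. Overhead from control words and framing is ignored.

```python
def interface_bandwidth_gbps(lanes, lane_rate_gbps, coding_efficiency=64 / 67):
    """Approximate post-encoding bandwidth of a multi-lane serial interface.

    coding_efficiency defaults to 64/67 (assumed 64B/67B line code);
    pass 1.0 to get the raw aggregate line rate instead.
    """
    if lanes <= 0 or lane_rate_gbps <= 0:
        raise ValueError("lanes and lane_rate_gbps must be positive")
    return lanes * lane_rate_gbps * coding_efficiency


# Same per-lane rate (6 Gbps, the figure quoted in the text), different lane counts.
four_lane = interface_bandwidth_gbps(4, 6.0)
eight_lane = interface_bandwidth_gbps(8, 6.0)
print(f"4 lanes: {four_lane:.2f} Gbps, 8 lanes: {eight_lane:.2f} Gbps")
```

Doubling either the lane count or the per-lane rate doubles the result, which matches the proportionality described in items 1 and 2.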
Interconnect Tuning Strategies for High-Performance ICs
Andrew B. Kahng, Sudhakar Muddu, Egino Sarto and Rahul Sharma
Silicon Graphics, Inc., Mountain View, CA 94039
muddu,sarto,ashu@, abk@

Abstract
Interconnect tuning is an increasingly critical degree of freedom in the physical design of high-performance VLSI systems. By interconnect tuning, we refer to the selection of line thicknesses, widths and spacings in multi-layer interconnect to simultaneously optimize signal distribution, signal performance, signal integrity, and interconnect manufacturability and reliability. This is a key activity in most leading-edge design projects, but has received little attention in the literature. Our work provides the first technology-specific studies of interconnect tuning in the literature. We center on global wiring layers and interconnect tuning issues related to bus routing, repeater insertion, and choice of shielding/spacing rules for signal integrity and performance. We address four basic questions. (1) How should width and spacing be allocated to maximize performance for a given line pitch? (2) For a given line pitch, what criteria affect the optimal interval at which repeaters should be inserted into global interconnects? (3) Under what circumstances are shield wires the optimum technique for improving interconnect performance? (4) In global interconnect with repeaters, what other interconnect tuning is possible? Our study of question (4) demonstrates a new approach of offsetting repeater placements that can reduce worst-case cross-chip delays by over 30% in current technologies.

1 Introduction
With technology scaling, on-chip interconnect becomes an increasingly critical determinant of performance, manufacturability and reliability in high-end VLSI designs. Current and future designs are generally interconnect-limited, and the available routing resource must be carefully balanced among signal distribution, power/ground distribution, and clock distribution. Table 1 reproduces several technology projections from the 1997 SIA
National Technology Roadmap for Semiconductors [1]. A notable deviation from the original 1994 Roadmap is that maximum on-chip clock frequencies will reach the gigahertz range even in the 180 nm process generation. The implications of technology scaling, particularly for system interconnect, are very complicated. Example considerations for a 7-layer metal (7LM) process might include:

Local interconnect layers (e.g., M1-M3) should generally remain at near-minimum dimensions and pitch in order to achieve routing density (for an example analysis of interconnect density in 0.25 µm processes, see [10]). For short lines (e.g., several hundred microns or less), thinner metal offers less lateral coupling capacitance and driver loading, and thus locally improves circuit performance. At the same time, maximum wire width is limited by the aspect ratio upper bound. The resulting thin and narrow wires are highly resistive and also subject to reliability concerns; they are hence unsuitable for global interconnects, power distribution, etc. We also note that layers M2-M3 (and maybe M4) will support a mix of local and “near-global” wiring, e.g., long wires within a single block. The distribution of lengths and performance goals for these signals can vary considerably between designs; since shorter wires are better routed on thinner metal, these design-specific considerations will affect the interconnect ...

Footnote 1: When two parallel neighboring lines L1 and L2 switch simultaneously in opposite directions, the driver of L1 sees the grounded line capacitance plus twice the coupling capacitance of L1 to L2. If L2 is quiet when L1 switches, then the driver of L1 sees the grounded line capacitance plus the coupling capacitance to L2. And if L2 switches simultaneously in the same direction, the driver of L1 sees only the grounded line capacitance. (In leading-edge processes, each neighbor coupling is of the same (and possibly greater) magnitude as the area coupling to ground.) The “coupling factor” or “switching factor” is often given in the range [0, 2], and
since most lines have two neighbors, the total coupling factor is in the range [0, 4].

SIA National Technology Roadmap (1997): 1997, 2001, 2006
Minimum feature size, dense lines (nm): 180, 130, 70
(row label missing): 750, 1400, 2000
# Wiring layers: 6-7, 7, 8-9
(row label missing): 0.64, 0.40, 0.26
Metal height/width aspect ratio: 1.8:1, 2.1:1, 2.7:1
Table 1: Selected technology projections from the 1997 SIA NTRS.

... influence on vendors’ processes [10]. Nevertheless, this topic has received very little attention in the literature, with only a small handful of high-level treatments available (footnote 2). Our work is the first in the literature to attempt a wide-ranging study of interconnect tuning. We center on global wiring layers (e.g., M4 and M5 in a 6LM process), and interconnect tuning issues related to bus routing, repeater insertion, and choice of shielding/spacing rules for signal integrity and performance (footnote 3). (Of necessity, our studies are for now independent of several other issues, e.g., wire tapering and choice of wire thickness.)

Coupling Capacitance per µm (aF). Surviving column headings: Top, Total, Right Neighbor, (ground).
Width, Space (µm)
1.0, 2.2     25.61   46.84
1.2, 2.0     29.26   48.22
1.4, 1.8     33.11   51.53
1.6, 1.6     38.60   51.90
1.8, 1.4     44.12   51.52

Footnote 2: For example, [12] describes a characterization and analysis methodology and the need to break ideal scaling in deep submicron interconnect. [8] is another work that centers on analysis of a given multi-layer interconnect process, as opposed to the underlying interconnect tuning. [3] and [6] are examples of system-level treatments based on Rent’s rule for interconnect length distribution.

Footnote 3: Even though the results presented in this paper are for aluminum interconnects with SiO2 dielectric, similar techniques can be applied for copper interconnects and low-K dielectrics.

2 Allocation of Width and Spacing for Given Pitch
Our first study seeks to determine how width and spacing should be optimally allocated for a given line pitch. In practice, the actual line width used is considerably greater than the minimum line width achievable in lithography. Thus, there is freedom to tune the width and spacing once assumptions
are in place for line thickness and target line length. We note that because very long inter-block lines will have repeaters inserted regularly (see Section 3 below), the maximum line length of interest is equal to the optimum interval between repeaters; this length ranges between 2500 µm and 5000 µm for global interconnect layers in leading-edge technologies.

We have performed detailed studies of “fast” M3 interconnect with 3.2 µm pitch, assuming that M2 crossunders are dense (i.e., can be approximated as a ground plane) [9] and explicitly modeling M4 crossovers. Dielectric modeling is based on actual layer data for a representative 0.25 µm CMOS process. QuickCap was used to extract coupling and area capacitances, summarized in Table 2. As is typical in such analyses, we assume worst-case coupling, i.e., a total coupling factor of 4.0 (worst-case coupling factor of 2.0 to each of the left and right neighbors of the (victim) line under analysis). Table 3 shows HSPICE-computed line delays for M3 line lengths ranging from 4000 µm to 6000 µm. Again, dense M2 is assumed to be a ground plane, and M4 crossovers are modeled explicitly. Table 3 shows that (width, spacing) = (1.2, 2.0) µm gives the best performance for the given line pitch.

3 Bounding the Interval Between Repeaters
A very basic study (in some sense a prerequisite to all other interconnect tuning) asks how often repeaters should be inserted into global interconnects. This is of course a chicken-egg problem, in that the optimum repeater interval depends on the interconnect tuning, and the interconnect tuning depends on the maximum run ever made without an intervening repeater. However, the following can be noted.

A body of study shows that repeaters should be inserted at uniform intervals. In other words, there should be a constant interconnect length (or interconnect delay) between each pair of adjacent repeaters; the first and last segments of the path are exceptions because in practice the driver and receiver sizes may not be the same as the repeater
size. Actually, such theoretical results deviate from real-life practice. On any source-destination path the repeater sizes need not be the same. It may also be better to add repeaters in parallel in order to drive larger wire lengths. (This is not just for performance: repeaters locally affect device area and routing constraints. However, our studies have not yet addressed such layout [...].) Using the same principle (and with certain types of methodology and chip planning constraints), it can be better to increase the size of the drivers inside the block as much as possible, which would increase the first segment length.

Assuming that the driver size and the receiver size are the same as the size of the repeaters inserted along the path, we calculate the total delay, optimal number of repeaters and optimal distance ...

[Table 3, partially recovered. Columns per M3 length: Driver Load Delay, Total Delay. Only the 5000 µm heading survives; the 4000 µm and 6000 µm headings follow the range stated in the text. Rows without a Width, Space label are printed as they appear in the source.]
Width, Space (µm)     4000 µm              5000 µm              6000 µm
                      113.99               168.36               233.09
1.2, 2.0              115.00   215.73      143.76   293.02      172.51   379.65
                      92.80                138.04               192.10
1.6, 1.6              138.77   225.89      173.46   303.04      208.15   389.56
                      82.84                124.03               173.41

[Table 4, partially recovered. Surviving headings: (wp,wn), Width (µm), Length (µm), SF, Delay, Fall Time (ps). Every row has driver/receiver sizes (wp,wn) = (130,65)/(130,65). Each line below gives the value printed after the sizes in the source, followed by the five timing values (ps) of the rows that share it.]
1.14:  1679, 1421, 1187, 975, 623
1.13:  1405, 1193, 1001, 828, 538
1.12:  1131, 966, 817, 682, 456
1.64:  1123, 963, 818, 686, 465
1.63:  992, 854, 729, 615, 422
1.62:  862, 746, 640, 543, 382

[The per-stage delay expressions are garbled in the source; only the fragments “stage”, “K 1” and “T Rep” survive.]

The delay of the first stage is the total delay from the output of driver to the input of the first repeater, i.e., T_first. The total delay for the path is then given by Equation (1), of which only the symbols T_tot, K, T_gd, T_int, R_rep, α, c, L_p and C_rep survive in the source, where r, c are resistance and capacitance per unit length of the interconnect line. We compute the optimal number of repeaters that minimizes total delay by setting ∂T_tot ... [the remainder of the derivation, which involves r c L_p^2, is not recoverable].

Figure 1: Pitch-matched width-spacing rules. Rule 1 allows six lines per 13.2 µm; Rule 2 and the Single-VSS rule (Rule 1 width/spacing, but every third line grounded) both allow four signal lines per 13.2 µm; and Rule 3 and the Double-VSS rule (Rule 1 width/spacing, but every other line grounded) both allow three signal lines per 13.2 µm.

... one neighbor switching in the same direction with respect to the victim). We see that the M3 distance between repeaters has an upper bound of 5000 µm due to edge rate considerations alone. Separate studies show that this upper bound on distance between repeaters is essentially unaffected by changes to the driver/receiver sizing or the input slew time.

4 Benefits of Shield Wiring
Our third study addresses the question of whether shield wiring is an effective means of improving delay and signal integrity performance of long global interconnects. We consider various width-spacing rules for M3 interconnect, in order to evaluate the utility of spacing vs. shielding techniques. Our evaluations are with respect to delay only; for all of the configurations, the assumed slew time upper bounds of approximately 600 ps imply that noise coupling will not be problematic.

Figure 1 contrasts five pitch-matched width-spacing rules:
Rule 1: 1.2 µm width, 1.0 µm spacing
Single-VSS: 1.2 µm width, 1.0 µm spacing, with every third line grounded (i.e., every signal line has one grounded neighbor to shield it)
Rule 2: 1.2 µm width, 2.1 µm spacing
Rule 3: 2.2 µm width, 2.2 µm spacing
Double-VSS: 1.2 µm width, 1.0 µm spacing, with every other line grounded (i.e., every signal line has two grounded neighbors to shield it)

Again, QuickCap was used to extract capacitive couplings of a given victim line to its neighbor lines and the neighboring
top/bottom layers; these results are shown in Table 5. Notice that the Rule 1, Rule 2 and Rule 3 rules have worst-case coupling factors = 4. On the other hand, the Single-VSS rule has worst-case coupling factor = 3, and the Double-VSS rule has worst-case coupling factor = 2. Table 6 shows the delay performance for a 4000 µm M3 line, under various bottom ground and top plane configurations. We observe:

The Rule 3 rule provides 37% decrease in total delay, but since C_eff was not used in the gate load delay computation, actual delay reductions could be even greater.

The Single-VSS rule is less effective than the Rule 2 rule; note that the two rules are equivalent in terms of effective routing density. Our studies have not yet addressed the routing interactions that can potentially affect this analysis. In particular, shield lines may be added to bring power and ground connections to repeater blocks.

Figure 2: Reduction of worst-case Miller coupling by offsetting inverters. In (a), inverters on the left and right neighbor lines are at phase = 0 with respect to the inverters on the middle line. In (b), inverters on the left and right neighbors are at phase = 0.5.

The Double-VSS rule gives improved total delays compared with the Rule 3 rule, with the rules being equivalent in terms of effective routing density. However, the Rule 3 rule yields smaller interconnect delays, so that driver size reductions have greater potential for delay improvement. Thus, the Rule 3 rule seems preferable. When two buses have activity patterns such that each is quiet when the other is active, then their lines can be interleaved such that they effectively follow the Double-VSS rule. In such a case, interleaving is clearly superior to the Rule 3 rule, since the effective routing density is doubled.

Gate load delays are larger than interconnect delays, suggesting that it is preferable to decrease line widths and increase line spacings.

We also note that a dense M4 top layer decreases total delay, and a dense M2 bottom (ground plane) layer decreases total delay for smaller
line widths only.

5 New Repeater Offset Methodology for Global Buses
Finally, we study another form of tuning that is possible for global interconnects. Our motivations are three-fold: (i) global interconnect is increasingly dominated by wide buses; (ii) present methodology designs global interconnects for worst-case Miller coupling; and (iii) present methodology routes long global buses using repeater blocks, i.e., blocks of co-located inverters spaced every, say, 4000 µm.

We have proposed a simple method to improve global interconnect performance. The idea is to reduce the worst-case Miller coupling by offsetting the inverters on adjacent lines (see Figure 2). In the previous methodology (Figure 2(a)), the worst-case switching of a neighbor line (i.e., simultaneously and in the opposite direction to the switching of the victim line) persists through the entire chain of inverters. However, with offset inverter locations (Figure 2(b)), any worst-case simultaneous switching on a neighbor line persists only for half of each period between consecutive inverters, and furthermore becomes best-case simultaneous switching for the other half of the period!

To confirm the advantages of this method, the following experimental methodology was used.

[Table 5, partially recovered. Coupling Capacitance per µm (aF). Surviving column headings: Width, Space; Planes; Top; Total; Right Neighbor; (ground). Two values survive per row.]
Rule     Planes (bottom, top)
Rule 1   Substrate, M4 Line     68.15   14.79
Rule 1   M2, M4 Line            60.92   34.88
Rule 1   M2, –                  74.23   42.99
Rule 2   Substrate, M4 Line     34.37   18.07
Rule 2   M2, M4 Line            27.10   48.72
Rule 2   M2, –                  42.43   59.15
Rule 3   Substrate, M4 Line     36.50   22.14
Rule 3   M2, M4 Line            25.61   67.92
Rule 3   M2, –                  43.86   73.23

[Table 6, partially recovered. Surviving column headings: Width, Space (µm); Driver Load Delay; Total Delay; % Gain. Rows are interleaved in the source and are listed here in source order.]
Rule 1, Substrate, M4 Line: 116.88, –
1.2, 1.0: 167.84, 281.87
Rule 1, M2, –: 119.62, –
1.2, 2.1: 114.47, 199.22
Rule 2, M2, M4 Line: 83.66, 30
1.2, 1.0: 137.41, 234.75
Rule 1 with Single VSS, M2, M4 Line: 96.66, 17
1.2, 1.0: 139.14, 237.42
Rule 2, M2, –: 87.39, 27
2.2, 2.2: 126.91, 176.85
Rule 3, M2, M4 Line: 50.90, 36
2.2, 2.2: 130.40, 181.39
Rule 1 with Double VSS, Substrate, M4 Line: 78.11, 37
1.2, 1.0: 104.34, 185.17
Rule 1 with Double VSS, M2, –: 78.53, 29
Table 6: Delay estimates for a 4000 µm M3 line, under various interconnect tuning
configurations. Driver and receiver buffer sizes: (wp = 100 µm, wn = 50 µm). Delay is computed from input of driver to input of receiver.

We study systems of three parallel interconnect lines, with lengths either 10000 µm or 14000 µm. These lines are stimulated by a waveform with risetime = falltime = 200 ps. The middle line is considered the “victim” for analysis purposes.

We model two “technologies” representative of M3 and M4 in a 0.25 µm CMOS process. In each technology, line resistance is 50 Ω per 1000 µm. In Technology I, capacitive couplings to left neighbor, ground and right neighbor per 1000 µm are respectively 60 fF, 80 fF and 60 fF. In Technology II, capacitive couplings to left neighbor, ground and right neighbor per 1000 µm are respectively 80 fF, 160 fF and 80 fF.

We assume a period between inverters (repeaters) of 4000 µm. So that HSPICE cannot introduce any error in its RC analysis, we manually distributed the line and coupling parasitics into 40 µm segments, i.e., repeaters occurred every 100 segments, and line lengths were 250 or 350 segments. Each segment is modeled as a double-pi model (footnote 4).

We always place the inverters on the middle line with “phase = 0”, i.e., at positions 4000, 8000, ... microns along the line. Inverters on the left and right neighbors are placed according to all combinations of phase = 0, 0.1, 0.2, ..., 0.9 (again with respect to ...

[Table 7, partially recovered. The input waveform labels (left neighbor, victim, right neighbor) for the eight rows are lost; rows are listed in source order.]
Delay, neighbor phases (0, 0)     Delay, neighbor phases (0.5, 0.5)
0.361                             0.630
0.584                             0.697
0.994                             0.689
0.584                             0.697
0.584                             0.697
0.994                             0.689
0.584                             0.697
0.361                             0.630
Table 7: HSPICE delays (ns) for three lines of length 10000 µm, using Technology I, for all combinations of rising (R) and falling (F) initial transition on the input waveform. We show delays for inverter phases (0, 0) and (0.5, 0.5) on the left and right neighbors of the middle line (phase 0).

... possible. The worst-case delay is reduced by anywhere from 25% to 30% when the repeaters are placed with optimum phase. Finally, Table 9 shows the same worst-case delays for the
middle line, this time taken over all eight rise/fall combinations and all nine combinations of input waveform offsets. Again, even when the inputs do not switch perfectly simultaneously, the best phase combination is (0.5, 0.5) and the worst phase combination is the traditional (0.0, 0.0) methodology.

6 Conclusions
To our knowledge, this work has provided the first technology-specific studies of interconnect tuning in the literature. We have described experimental approaches to interconnect tuning issues related to bus routing, repeater insertion, and choice of shielding/spacing rules for signal integrity and performance. In particular, four questions have been addressed: allocation of width and spacing to maximize performance for a given pitch, finding the optimal interval for repeater insertion, assessing the potential benefits of shield wiring, and optimizing the insertion of repeaters in global buses. Our answers to these questions are at times surprising: in answering (3), we demonstrate that current shielding methodologies may be suboptimal when compared with alternate width/spacing rules, and in answering (4), we propose a new repeater offset technique that can reduce worst-case cross-chip delays by over 30% in current technologies. Ongoing efforts extend our interconnect tuning research to encompass layer thicknesses, more detailed analyses of noise coupling and tuning to meet noise margins, and the delay/noise behavior in emerging technology regimes (Cu interconnect and low-K dielectrics). Finally, we seek to develop more complete full-chip interconnect tuning approaches based on analyses of the interconnect structure, speed target, and power dissipation target for a given design.
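The intuition behind the repeater offset technique can be illustrated with a first-order capacitance model. This is a rough sketch, not the HSPICE methodology of the paper: it assumes that the victim's load over one repeater interval is simply the length-weighted average of the switching-factor-scaled coupling capacitance, and it ignores signal timing and distributed RC effects. The function names are hypothetical; the per-1000 µm capacitances are the Technology I values quoted in Section 5.

```python
def worst_case_coupling_factor(phase):
    """Worst-case average switching factor toward one neighbor.

    With the neighbor's repeaters offset by `phase` (fraction of the
    repeater interval), the relative polarity of victim and neighbor
    flips once inside each interval. Assumption: the longer portion
    switches in the opposite direction (factor 2) and the rest in the
    same direction (factor 0).
    """
    if not 0.0 <= phase < 1.0:
        raise ValueError("phase must be in [0, 1)")
    opposite_fraction = max(phase, 1.0 - phase)
    return 2.0 * opposite_fraction


def segment_capacitance_fF(ground_fF_per_mm, coupling_fF_per_mm,
                           segment_mm, left_phase, right_phase):
    """Worst-case effective victim capacitance over one repeater interval."""
    factor = (worst_case_coupling_factor(left_phase)
              + worst_case_coupling_factor(right_phase))
    return segment_mm * (ground_fF_per_mm + coupling_fF_per_mm * factor)


# Technology I: 80 fF ground and 60 fF per neighbor per 1000 µm; 4000 µm interval.
aligned = segment_capacitance_fF(80.0, 60.0, 4.0, 0.0, 0.0)
offset = segment_capacitance_fF(80.0, 60.0, 4.0, 0.5, 0.5)
print(f"aligned repeaters: {aligned:.0f} fF, offset by 0.5: {offset:.0f} fF")
```

Under this simplification, a phase of 0.5 on both neighbors brings the worst-case total coupling factor from 4 down to 2, the same as for quiet neighbors. The delay improvement measured with HSPICE in the paper is smaller than this capacitance ratio suggests, since delay also depends on driver and interconnect resistance and on where along the line the coupling occurs.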
REFERENCES
[1] Semiconductor Industry Association, National Technology Roadmap for Semiconductors, December 1997.
[2] A. Deutsch, G. V. Kopcsay, C. W. Surovic, B. J. Rubin, L. M. Terman, R. P. Dunne and T. Gallo, “Modeling and Characterization of Long On-chip Interconnections for High-Performance Microprocessors”, final report, ARPA HSCD Contract C-556003, September 1995. Also appeared in IBM Journal of Research and Development 39(5), Sept. 1995, pp. 547-567.
[3] P. D. Fisher, “Clock Cycle Estimations for Future Microprocessor Generations”, Proc. IEEE Innovative Systems in Silicon, Austin, October 1997.
[4] L. Gwennap, “IC Makers Confront RC Limitations”, Microdesign Resources Microprocessor Report, August 4, 1997, pp. 14-18.
[5] Potin, U. Ghoshal, E. Chiprout and S. R. Nassif, “Physical Design Challenges for Performance”, International Symposium on Physical Design, April 1997, pp. 225-226.
[6] J. Meindl, “GigaScale Integration: ‘Is the Sky the Limit’?”, keynote presentation slides, Hot Chips IX, Stanford, CA, August 25-26, 1997.
[7] L. Scheffer, “A Roadmap of CAD Tool Changes for Sub-micron Interconnect Problems”, International Symposium on Physical Design, April 1997, pp. 104-109.
[8] R. F. Sechler, “Interconnect design with VLSI CMOS”, IBM Journal of Research and Development, Jan.-March 1995, pp. 23-31.
[9] J. Cong, L. He, A. B. Kahng, D. Noice, N. Shirali and S. H.-C. Yen, “Analysis and Justification of a Simple, Practical 2 1/2-D Capacitance Extraction Methodology”, Proc. Design Automation Conference, June 1997.
[10] L. Gwennap, “IC Vendors Prepare for 0.25-Micron Leap”, Microprocessor Report, September 16, 1996, pp. 11-15.
[11] A. B. Kahng and S. Muddu, “Efficient Gate Delay Modeling for Large Interconnect Loads”, Proc. IEEE Multi-Chip Module Conf., Feb. 1996.
[12] S.-Y. Oh, K.-J. Chang, N. Chang and K. Lee, “Interconnect modeling and design in high-speed VLSI/ULSI systems”, Proc. International Conference on Computer Design: VLSI in Computers and
Processors, October 1992, pp. 184-189.

In the tables below, rows and columns are indexed by the repeater phases of the two neighbor lines.

A. Line length 10000 µm, Technology I
Phase    0.1     0.3     0.5     0.7     0.9
0        0.988   0.954   0.910   0.900   0.962
0.1      0.974   0.938   0.885   0.881   0.952
0.2      0.960   0.917   0.848   0.863   0.932
0.3      0.938   0.890   0.806   0.834   0.912
0.4      0.911   0.855   0.753   0.805   0.885
0.5      0.885   0.806   0.697   0.778   0.867
0.6      0.854   0.801   0.735   0.768   0.832
0.7      0.881   0.834   0.778   0.796   0.859
0.8      0.917   0.872   0.822   0.827   0.894
0.9      0.952   0.912   0.867   0.859   0.924

B. Line length 10000 µm, Technology II
Phase    0.1     0.3     0.5     0.7     0.9
0        1.422   1.370   1.299   1.300   1.388
0.1      1.405   1.347   1.258   1.278   1.372
0.2      1.379   1.315   1.206   1.247   1.347
0.3      1.347   1.274   1.144   1.208   1.314
0.4      1.306   1.223   1.075   1.161   1.273
0.5      1.258   1.144   1.015   1.124   1.239
0.6      1.234   1.158   1.069   1.120   1.209
0.7      1.278   1.208   1.124   1.160   1.250
0.8      1.324   1.261   1.180   1.203   1.293
0.9      1.372   1.314   1.239   1.250   1.339

C. Line length 14000 µm, Technology I
Phase    0.1     0.3     0.5     0.7     0.9
0        1.467   1.429   1.383   1.340   1.427
0.1      1.454   1.414   1.356   1.324   1.417
0.2      1.439   1.393   1.320   1.299   1.395
0.3      1.414   1.362   1.276   1.267   1.375
0.4      1.385   1.328   1.223   1.229   1.342
0.5      1.356   1.276   1.105   1.203   1.323
0.6      1.308   1.217   1.146   1.162   1.281
0.7      1.324   1.267   1.203   1.192   1.287
0.8      1.370   1.319   1.263   1.240   1.330
0.9      1.417   1.375   1.323   1.287   1.377

D. Line length 14000 µm, Technology II
Phase    0.1     0.3     0.5     0.7     0.9
0        2.108   2.052   1.983   1.938   2.056
0.1      2.092   2.029   1.943   1.913   2.039
0.2      2.064   1.996   1.889   1.878   2.012
0.3      2.029   1.952   1.823   1.833   1.977
0.4      1.985   1.898   1.743   1.778   1.932
0.5      1.943   1.823   1.590   1.744   1.903
0.6      1.876   1.765   1.664   1.686   1.843
0.7      1.913   1.833   1.744   1.741   1.867
0.8      1.974   1.903   1.823   1.801   1.925
0.9      2.039   1.977   1.903   1.867   1.989

Table 8: Worst-case middle line delays over all input rise/fall combinations, for each phase combination on left and right neighbors. Input offsets are all 0 ps.

A. Line length 10000 µm, Technology I
Phase    0.1     0.3     0.5     0.7     0.9
0        1.071   1.021   0.942   0.984   1.051
0.1      1.054   0.995   0.905   0.958   1.035
0.2      1.026   0.964   0.865   0.930   1.008
0.3      0.995   0.924   0.825   0.894   0.980
0.4      0.957   0.876   0.782   0.856   0.944
0.5      0.905   0.825   0.760   0.824   0.900
0.6      0.920   0.854   0.791   0.849   0.911
0.7      0.958   0.894   0.824   0.880   0.945
0.8      0.997   0.936   0.860   0.911   0.982
0.9      1.035   0.980   0.900   0.945   1.016

B. Line length 10000 µm, Technology II
Phase    0.1     0.3     0.5     0.7     0.9
0        1.502   1.430   1.335   1.379   1.476
0.1      1.475   1.396   1.284   1.345   1.449
0.2      1.440   1.350   1.229   1.305   1.416
0.3      1.396   1.295   1.171   1.258   1.373
0.4      1.343   1.231   1.114   1.205   1.321
0.5      1.284   1.171   1.074   1.175   1.279
0.6      1.292   1.200   1.124   1.190   1.281
0.7      1.345   1.258   1.175   1.234   1.328
0.8      1.398   1.315   1.226   1.280   1.376
0.9      1.449   1.373   1.279   1.328   1.425

C. Line length 14000 µm, Technology I
Phase    0.1     0.3     0.5     0.7     0.9
0        1.551   1.502   1.419   1.429   1.521
0.1      1.534   1.472   1.388   1.406   1.499
0.2      1.507   1.442   1.345   1.373   1.475
0.3      1.472   1.401   1.293   1.334   1.443
0.4      1.438   1.353   1.241   1.288   1.406
0.5      1.388   1.293   1.171   1.256   1.365
0.6      1.362   1.279   1.203   1.247   1.339
0.7      1.406   1.334   1.256   1.288   1.377
0.8      1.451   1.388   1.310   1.332   1.424
0.9      1.499   1.443   1.365   1.377   1.471

D. Line length 14000 µm, Technology II
Phase    0.1     0.3     0.5     0.7     0.9
0        2.190   2.116   2.031   2.027   2.147
0.1      2.161   2.081   1.982   1.991   2.119
0.2      2.125   2.035   1.920   1.946   2.084
0.3      2.081   1.980   1.846   1.893   2.041
0.4      2.029   1.913   1.775   1.831   1.989
0.5      1.982   1.846   1.666   1.804   1.955
0.6      1.930   1.818   1.730   1.773   1.901
0.7      1.991   1.893   1.804   1.830   1.957
0.8      2.053   1.965   1.879   1.892   2.015
0.9      2.119   2.041   1.955   1.957   2.079

Table 9: Worst-case delays with all combinations of input offsets.
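As a cross-check on the reported 25% to 30% improvement, the worst-case delays in Table 7 can be compared directly. The sketch below assumes that the first delay column of Table 7 corresponds to neighbor phases (0, 0) and the second to (0.5, 0.5), which is consistent with the text naming (0, 0) as the worst combination; the variable names are illustrative.

```python
# Table 7 delays (ns), 10000 µm lines, Technology I, one entry per
# rise/fall combination of the three input waveforms.
delays_aligned = [0.361, 0.584, 0.994, 0.584, 0.584, 0.994, 0.584, 0.361]  # phases (0, 0)
delays_offset = [0.630, 0.697, 0.689, 0.697, 0.697, 0.689, 0.697, 0.630]   # phases (0.5, 0.5)

worst_aligned = max(delays_aligned)
worst_offset = max(delays_offset)
reduction_pct = 100.0 * (worst_aligned - worst_offset) / worst_aligned

# Offsetting also narrows the spread between best and worst case.
spread_aligned = worst_aligned - min(delays_aligned)
spread_offset = worst_offset - min(delays_offset)

print(f"worst case: {worst_aligned:.3f} ns -> {worst_offset:.3f} ns "
      f"({reduction_pct:.1f}% lower)")
print(f"best-to-worst spread: {spread_aligned:.3f} ns -> {spread_offset:.3f} ns")
```

The worst-case delay at phases (0.5, 0.5) matches the 0.697 ns entry at row 0.5, column 0.5 of Table 8, panel A. The price of the lower worst case is a higher best-case delay, since the offset placement averages the coupling rather than removing it.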