cloudera-quickstart Installation and Usage Summary
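I. Installing cloudera-quickstart
(1) Download one edition of cloudera-quickstart from the official website. There are three editions, each built to run on a different virtualization platform.
(2) Install the virtualization software that matches the edition you downloaded (VMware or VirtualBox).
(3) Taking VirtualBox as an example, the basic configuration needed to run cloudera-quickstart is:
At least 8 GB of RAM
Two virtual processors
(4) Once the virtual machine is configured, there is no need to install a Linux operating system first. cloudera-quickstart ships everything, including the operating system, as a prebuilt package. You only need to attach the downloaded cloudera-quickstart virtual disk (for example, cloudera-quickstart-vm-5.1.0-1-virtualbox-disk1.vmdk) to the virtual machine's storage controller and boot the virtual machine from it. The complete system bundled with cloudera-quickstart is then ready to use, which is the convenience of the quickstart edition.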
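The detailed steps are as follows:
Open the settings page of the configured virtual machine, select "Storage", and click the icon circled in red in the figure below.
In the dialog box that appears, select "Use an existing virtual disk".
Browse to the location of the cloudera-quickstart virtual disk you downloaded and click "Open".
The virtual disk is now attached, as shown in the figure below.
Finally, start the system.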
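The same steps can also be scripted with VirtualBox's command-line tool instead of the GUI. The following is only a minimal sketch, not part of the original write-up: it builds the VBoxManage commands without executing them. The VM name, the controller name "SATA", the choice of a SATA controller, and the guest type RedHat_64 are assumptions made for illustration; the memory size and CPU count come from the requirements listed above.

```python
import shlex


def vbox_commands(vm_name, vmdk_path, memory_mb=8192, cpus=2):
    """Return VBoxManage invocations as argument lists; nothing is executed here."""
    controller = "SATA"  # hypothetical controller name, chosen for this sketch
    return [
        # Create and register an empty VM (guest type is an assumption).
        ["VBoxManage", "createvm", "--name", vm_name,
         "--ostype", "RedHat_64", "--register"],
        # At least 8 GB of RAM and two virtual processors, as stated in the text.
        ["VBoxManage", "modifyvm", vm_name,
         "--memory", str(memory_mb), "--cpus", str(cpus)],
        # Add a storage controller, then attach the downloaded virtual disk to it.
        ["VBoxManage", "storagectl", vm_name, "--name", controller, "--add", "sata"],
        ["VBoxManage", "storageattach", vm_name, "--storagectl", controller,
         "--port", "0", "--device", "0", "--type", "hdd", "--medium", vmdk_path],
        # Boot the VM from the attached disk.
        ["VBoxManage", "startvm", vm_name],
    ]


if __name__ == "__main__":
    disk = "cloudera-quickstart-vm-5.1.0-1-virtualbox-disk1.vmdk"
    for cmd in vbox_commands("cloudera-quickstart", disk):
        print(shlex.join(cmd))
```

Running the script only prints the commands, so they can be reviewed and adjusted before being pasted into a terminal.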
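On Windows 7, if you use VMware, starting the virtual disk may fail with an internal error (as shown in the figure below). Running the virtualization software as administrator resolves the problem.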
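II. The Cloudera Manager console
After the virtual machine boots into the cloudera-quickstart desktop, a browser opens automatically on the Cloudera console. The page may, however, fail to connect to the server, as shown in the figure below:
There are two ways to fix this: (1) click "Launch Cloudera Manager" on the desktop; (2) open a Linux terminal and run "sudo /home/cloudera/cloudera-manager --force". I usually prefer the second method.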
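After that, you can click "Cloudera Manager", "Hue", "Hadoop", "Spark", and so on in the browser window to open the corresponding console. You can also open a console directly by entering its port number in the browser's address bar. For example, Cloudera Manager listens on port 7180 on the server, so entering quickstart.cloudera:7180 in the browser takes you straight to it. In practice, I prefer switching consoles by changing the port number in the address bar.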
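When the browser cannot connect, it can help to check whether anything is listening on the console's port at all. The sketch below is an illustration, not something from the original write-up: it builds the console address from a host and port and attempts a plain TCP connection. Only port 7180 and the host name quickstart.cloudera come from the text; the function names and the use of plain HTTP are assumptions.

```python
import socket

CM_PORT = 7180  # Cloudera Manager port given in the text


def console_url(host, port):
    """Build the address to type into the browser (plain HTTP is assumed)."""
    return f"http://{host}:{port}"


def is_listening(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable host names.
        return False


if __name__ == "__main__":
    host = "quickstart.cloudera"
    print(console_url(host, CM_PORT), "listening:", is_listening(host, CM_PORT))
```

If the check reports that nothing is listening, the server has probably not been started yet, which is the situation the two fixes described earlier address.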
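The figure above shows the Cloudera Manager login page; the username and password are both: cloudera.
After logging in, the Cloudera Manager console looks as shown in the figure below.
The left-hand column in that figure lists the services monitored by Cloudera Manager, such as Hosts, Hive, and Hue. Each service has a circle in front of it whose color indicates the service's health: red means bad health, green means good health, and yellow means concerning ("not good") health. However, when you open the Cloudera Manager console you may sometimes see the situation shown in the figure below: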
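The page reports the error "Errors: Unable to issue query: the host monitor is not running". Because of this error, the health of the services cannot be monitored. The cause is that the mgmt service, circled in red in the figure above, has not been started. After the mgmt service is started, the circle in front of it turns green (as shown in the figure below); refresh the page and the monitoring data appears.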
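The color coding described above can be expressed as a small lookup. This is a sketch under stated assumptions: the status names GOOD, CONCERNING, and BAD follow the healthSummary field of the Cloudera Manager API rather than anything in the original write-up, and the sample service list is invented for illustration.

```python
# Colors as described in the text; status names are assumed to match the
# healthSummary values reported by the Cloudera Manager API.
HEALTH_COLORS = {"GOOD": "green", "CONCERNING": "yellow", "BAD": "red"}


def color_for(health_summary):
    """Map a health status to its circle color; anything else has no color."""
    return HEALTH_COLORS.get(health_summary, "unknown")


def unhealthy_services(services):
    """Return the names of services whose health is anything other than GOOD."""
    return [s["name"] for s in services if s.get("healthSummary") != "GOOD"]


if __name__ == "__main__":
    # Made-up sample data, shaped like a list of service entries.
    sample = [
        {"name": "hive", "healthSummary": "GOOD"},
        {"name": "hue", "healthSummary": "CONCERNING"},
        {"name": "mgmt", "healthSummary": "NOT_AVAILABLE"},
    ]
    for svc in sample:
        print(svc["name"], color_for(svc["healthSummary"]))
    print("needs attention:", unhealthy_services(sample))
```

A status outside the three colored ones, such as the one used for mgmt in the sample, loosely corresponds to the case where health cannot be determined because the monitoring service is not running.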
For further details, see:
(1) Cloudera-Manager-Installation-Guide
(2) Cloudera-Manager-Introduction
(3) CDH5-Release-Notes
(4) cloudera-director