2015-07-01
5

Strange behavior with spark-submit

I am running the following code in pyspark:

In [14]: conf = SparkConf() 

In [15]: conf.getAll() 

[(u'spark.eventLog.enabled', u'true'), 
(u'spark.eventLog.dir', 
    u'hdfs://ip-10-0-0-220.ec2.internal:8020/user/spark/applicationHistory'), 
(u'spark.master', u'local[*]'), 
(u'spark.yarn.historyServer.address', 
    u'http://ip-10-0-0-220.ec2.internal:18088'), 
(u'spark.executor.extraLibraryPath', 
    u'/opt/cloudera/parcels/CDH-5.3.3-1.cdh5.3.3.p0.5/lib/hadoop/lib/native'), 
(u'spark.app.name', u'pyspark-shell'), 
(u'spark.driver.extraLibraryPath', 
    u'/opt/cloudera/parcels/CDH-5.3.3-1.cdh5.3.3.p0.5/lib/hadoop/lib/native')] 

In [16]: sc 

<pyspark.context.SparkContext at 0x7fab9dd8a750> 

In [17]: sc.version 

u'1.4.0' 

In [19]: sqlContext 

<pyspark.sql.context.HiveContext at 0x7fab9de785d0> 

In [20]: access = sqlContext.read.json("hdfs://10.0.0.220/raw/logs/arquimedes/access/*.json") 

And everything runs smoothly (I can create tables in the Hive Metastore, etc.).

But when I try to run this code with spark-submit:

# -*- coding: utf-8 -*-                                                               

from __future__ import print_function 

import re 

from pyspark import SparkContext 
from pyspark.sql import HiveContext 
from pyspark.sql import Row 
from pyspark.conf import SparkConf 

if __name__ == "__main__": 

    sc = SparkContext(appName="Minimal Example 2") 

    conf = SparkConf() 

    print(conf.getAll()) 

    print(sc) 

    print(sc.version) 

    sqlContext = HiveContext(sc) 

    print(sqlContext) 

    # ## Read the access log file                                                             
    access = sqlContext.read.json("hdfs://10.0.0.220/raw/logs/arquimedes/access/*.json") 

    sc.stop() 

I run this code with:

$ spark-submit --master yarn-cluster --deploy-mode cluster minimal-example2.py 

and it runs (apparently) without error, but the logs contain "You must build Spark with Hive." Why? Here are the logs:

15/07/01 16:55:10 INFO client.RMProxy: Connecting to ResourceManager at ip-10-0-0-220.ec2.internal/10.0.0.220:8032 


Container: container_1435696841856_0027_01_000001 on ip-10-0-0-36.ec2.internal_8041 
===================================================================================== 
LogType: stderr 
LogLength: 21077 
Log Contents: 
SLF4J: Class path contains multiple SLF4J bindings. 
SLF4J: Found binding in [jar:file:/yarn/nm/usercache/nanounanue/filecache/133/spark-assembly-1.4.0-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class] 
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.3.3-1.cdh5.3.3.p0.5/jars/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class] 
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. 
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] 
15/07/01 16:54:00 INFO yarn.ApplicationMaster: Registered signal handlers for [TERM, HUP, INT] 
15/07/01 16:54:01 INFO yarn.ApplicationMaster: ApplicationAttemptId: appattempt_1435696841856_0027_000001 
15/07/01 16:54:02 INFO spark.SecurityManager: Changing view acls to: yarn,nanounanue 
15/07/01 16:54:02 INFO spark.SecurityManager: Changing modify acls to: yarn,nanounanue 
15/07/01 16:54:02 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yarn, nanounanue); users with modify permissions: Set(yarn, nanounanue) 
15/07/01 16:54:02 INFO yarn.ApplicationMaster: Starting the user application in a separate Thread 
15/07/01 16:54:02 INFO yarn.ApplicationMaster: Waiting for spark context initialization 
15/07/01 16:54:02 INFO yarn.ApplicationMaster: Waiting for spark context initialization ... 
15/07/01 16:54:03 INFO spark.SparkContext: Running Spark version 1.4.0 
15/07/01 16:54:03 INFO spark.SecurityManager: Changing view acls to: yarn,nanounanue 
15/07/01 16:54:03 INFO spark.SecurityManager: Changing modify acls to: yarn,nanounanue 
15/07/01 16:54:03 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yarn, nanounanue); users with modify permissions: Set(yarn, nanounanue) 
15/07/01 16:54:03 INFO slf4j.Slf4jLogger: Slf4jLogger started 
15/07/01 16:54:03 INFO Remoting: Starting remoting 
15/07/01 16:54:03 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://[email protected]:41190] 
15/07/01 16:54:03 INFO util.Utils: Successfully started service 'sparkDriver' on port 41190. 
15/07/01 16:54:04 INFO spark.SparkEnv: Registering MapOutputTracker 
15/07/01 16:54:04 INFO spark.SparkEnv: Registering BlockManagerMaster 
15/07/01 16:54:04 INFO storage.DiskBlockManager: Created local directory at /yarn/nm/usercache/nanounanue/appcache/application_1435696841856_0027/blockmgr-14127054-19b1-4cfe-80c3-2c5fc917c9cf 
15/07/01 16:54:04 INFO storage.DiskBlockManager: Created local directory at /data0/yarn/nm/usercache/nanounanue/appcache/application_1435696841856_0027/blockmgr-c8119846-7f6f-45eb-911b-443cb4d7e9c9 
15/07/01 16:54:04 INFO storage.MemoryStore: MemoryStore started with capacity 245.7 MB 
15/07/01 16:54:04 INFO spark.HttpFileServer: HTTP File server directory is /yarn/nm/usercache/nanounanue/appcache/application_1435696841856_0027/httpd-c4abf72b-2ee4-45d7-8252-c68f925bef58 
15/07/01 16:54:04 INFO spark.HttpServer: Starting HTTP Server 
15/07/01 16:54:04 INFO server.Server: jetty-8.y.z-SNAPSHOT 
15/07/01 16:54:04 INFO server.AbstractConnector: Started [email protected]:56437 
15/07/01 16:54:04 INFO util.Utils: Successfully started service 'HTTP file server' on port 56437. 
15/07/01 16:54:04 INFO spark.SparkEnv: Registering OutputCommitCoordinator 
15/07/01 16:54:04 INFO ui.JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter 
15/07/01 16:54:04 INFO server.Server: jetty-8.y.z-SNAPSHOT 
15/07/01 16:54:04 INFO server.AbstractConnector: Started [email protected]:37958 
15/07/01 16:54:04 INFO util.Utils: Successfully started service 'SparkUI' on port 37958. 
15/07/01 16:54:04 INFO ui.SparkUI: Started SparkUI at http://10.0.0.36:37958 
15/07/01 16:54:04 INFO cluster.YarnClusterScheduler: Created YarnClusterScheduler 
15/07/01 16:54:04 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 49759. 
15/07/01 16:54:04 INFO netty.NettyBlockTransferService: Server created on 49759 
15/07/01 16:54:05 INFO storage.BlockManagerMaster: Trying to register BlockManager 
15/07/01 16:54:05 INFO storage.BlockManagerMasterEndpoint: Registering block manager 10.0.0.36:49759 with 245.7 MB RAM, BlockManagerId(driver, 10.0.0.36, 49759) 
15/07/01 16:54:05 INFO storage.BlockManagerMaster: Registered BlockManager 
15/07/01 16:54:05 INFO scheduler.EventLoggingListener: Logging events to hdfs://ip-10-0-0-220.ec2.internal:8020/user/spark/applicationHistory/application_1435696841856_0027_1 
15/07/01 16:54:05 INFO cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as AkkaRpcEndpointRef(Actor[akka://sparkDriver/user/YarnAM#-1566924249]) 
15/07/01 16:54:05 INFO client.RMProxy: Connecting to ResourceManager at ip-10-0-0-220.ec2.internal/10.0.0.220:8030 
15/07/01 16:54:05 INFO yarn.YarnRMClient: Registering the ApplicationMaster 
15/07/01 16:54:05 INFO yarn.YarnAllocator: Will request 2 executor containers, each with 1 cores and 1408 MB memory including 384 MB overhead 
15/07/01 16:54:05 INFO yarn.YarnAllocator: Container request (host: Any, capability: <memory:1408, vCores:1>) 
15/07/01 16:54:05 INFO yarn.YarnAllocator: Container request (host: Any, capability: <memory:1408, vCores:1>) 
15/07/01 16:54:05 INFO yarn.ApplicationMaster: Started progress reporter thread - sleep time : 5000 
15/07/01 16:54:11 INFO impl.AMRMClientImpl: Received new token for : ip-10-0-0-99.ec2.internal:8041 
15/07/01 16:54:11 INFO impl.AMRMClientImpl: Received new token for : ip-10-0-0-37.ec2.internal:8041 
15/07/01 16:54:11 INFO yarn.YarnAllocator: Launching container container_1435696841856_0027_01_000002 for on host ip-10-0-0-99.ec2.internal 
15/07/01 16:54:11 INFO yarn.YarnAllocator: Launching ExecutorRunnable. driverUrl: akka.tcp://[email protected]:41190/user/CoarseGrainedScheduler, executorHostname: ip-10-0-0-99.ec2.internal 
15/07/01 16:54:11 INFO yarn.YarnAllocator: Launching container container_1435696841856_0027_01_000003 for on host ip-10-0-0-37.ec2.internal 
15/07/01 16:54:11 INFO yarn.ExecutorRunnable: Starting Executor Container 
15/07/01 16:54:11 INFO yarn.YarnAllocator: Launching ExecutorRunnable. driverUrl: akka.tcp://[email protected]:41190/user/CoarseGrainedScheduler, executorHostname: ip-10-0-0-37.ec2.internal 
15/07/01 16:54:11 INFO yarn.YarnAllocator: Received 2 containers from YARN, launching executors on 2 of them. 
15/07/01 16:54:11 INFO impl.ContainerManagementProtocolProxy: yarn.client.max-cached-nodemanagers-proxies : 0 
15/07/01 16:54:11 INFO yarn.ExecutorRunnable: Starting Executor Container 
15/07/01 16:54:11 INFO yarn.ExecutorRunnable: Setting up ContainerLaunchContext 
15/07/01 16:54:11 INFO impl.ContainerManagementProtocolProxy: yarn.client.max-cached-nodemanagers-proxies : 0 
15/07/01 16:54:11 INFO yarn.ExecutorRunnable: Setting up ContainerLaunchContext 
15/07/01 16:54:11 INFO yarn.ExecutorRunnable: Preparing Local resources 
15/07/01 16:54:11 INFO yarn.ExecutorRunnable: Preparing Local resources 
15/07/01 16:54:11 INFO yarn.ExecutorRunnable: Prepared Local resources Map(__spark__.jar -> resource { scheme: "hdfs" host: "ip-10-0-0-220.ec2.internal" port: 8020 file: "/user/nanounanue/.sparkStaging/application_1435696841856_0027/spark-assembly-1.4.0-hadoop2.6.0.jar" } s 
ize: 162896305 timestamp: 1435784032445 type: FILE visibility: PRIVATE, pyspark.zip -> resource { scheme: "hdfs" host: "ip-10-0-0-220.ec2.internal" port: 8020 file: "/user/nanounanue/.sparkStaging/application_1435696841856_0027/pyspark.zip" } size: 281333 timestamp: 1435784 
032613 type: FILE visibility: PRIVATE, py4j-0.8.2.1-src.zip -> resource { scheme: "hdfs" host: "ip-10-0-0-220.ec2.internal" port: 8020 file: "/user/nanounanue/.sparkStaging/application_1435696841856_0027/py4j-0.8.2.1-src.zip" } size: 37562 timestamp: 1435784032652 type: FIL 
E visibility: PRIVATE, minimal-example2.py -> resource { scheme: "hdfs" host: "ip-10-0-0-220.ec2.internal" port: 8020 file: "/user/nanounanue/.sparkStaging/application_1435696841856_0027/minimal-example2.py" } size: 2448 timestamp: 1435784032692 type: FILE visibility: PRIVA 
TE) 
15/07/01 16:54:11 INFO yarn.ExecutorRunnable: Prepared Local resources Map(__spark__.jar -> resource { scheme: "hdfs" host: "ip-10-0-0-220.ec2.internal" port: 8020 file: "/user/nanounanue/.sparkStaging/application_1435696841856_0027/spark-assembly-1.4.0-hadoop2.6.0.jar" } s 
ize: 162896305 timestamp: 1435784032445 type: FILE visibility: PRIVATE, pyspark.zip -> resource { scheme: "hdfs" host: "ip-10-0-0-220.ec2.internal" port: 8020 file: "/user/nanounanue/.sparkStaging/application_1435696841856_0027/pyspark.zip" } size: 281333 timestamp: 1435784 
032613 type: FILE visibility: PRIVATE, py4j-0.8.2.1-src.zip -> resource { scheme: "hdfs" host: "ip-10-0-0-220.ec2.internal" port: 8020 file: "/user/nanounanue/.sparkStaging/application_1435696841856_0027/py4j-0.8.2.1-src.zip" } size: 37562 timestamp: 1435784032652 type: FIL 
E visibility: PRIVATE, minimal-example2.py -> resource { scheme: "hdfs" host: "ip-10-0-0-220.ec2.internal" port: 8020 file: "/user/nanounanue/.sparkStaging/application_1435696841856_0027/minimal-example2.py" } size: 2448 timestamp: 1435784032692 type: FILE visibility: PRIVA 
TE) 
15/07/01 16:54:11 INFO yarn.ExecutorRunnable: Setting up executor with environment: Map(CLASSPATH -> {{PWD}}<CPS>{{PWD}}/__spark__.jar<CPS>$HADOOP_CLIENT_CONF_DIR<CPS>$HADOOP_CONF_DIR<CPS>$HADOOP_COMMON_HOME/*<CPS>$HADOOP_COMMON_HOME/lib/*<CPS>$HADOOP_HDFS_HOME/*<CPS>$HADOO 
P_HDFS_HOME/lib/*<CPS>$HADOOP_YARN_HOME/*<CPS>$HADOOP_YARN_HOME/lib/*<CPS>$HADOOP_MAPRED_HOME/*<CPS>$HADOOP_MAPRED_HOME/lib/*<CPS>$MR2_CLASSPATH, SPARK_LOG_URL_STDERR -> http://ip-10-0-0-37.ec2.internal:8042/node/containerlogs/container_1435696841856_0027_01_000003/nanounan 
ue/stderr?start=0, SPARK_YARN_STAGING_DIR -> .sparkStaging/application_1435696841856_0027, SPARK_YARN_CACHE_FILES_FILE_SIZES -> 162896305,281333,37562,2448, SPARK_USER -> nanounanue, SPARK_YARN_CACHE_FILES_VISIBILITIES -> PRIVATE,PRIVATE,PRIVATE,PRIVATE, SPARK_YARN_MODE -> 
true, SPARK_YARN_CACHE_FILES_TIME_STAMPS -> 1435784032445,1435784032613,1435784032652,1435784032692, PYTHONPATH -> pyspark.zip:py4j-0.8.2.1-src.zip, SPARK_LOG_URL_STDOUT -> http://ip-10-0-0-37.ec2.internal:8042/node/containerlogs/container_1435696841856_0027_01_000003/nanou 
nanue/stdout?start=0, SPARK_YARN_CACHE_FILES -> hdfs://ip-10-0-0-220.ec2.internal:8020/user/nanounanue/.sparkStaging/application_1435696841856_0027/spark-assembly-1.4.0-hadoop2.6.0.jar#__spark__.jar,hdfs://ip-10-0-0-220.ec2.internal:8020/user/nanounanue/.sparkStaging/applic 
ation_1435696841856_0027/pyspark.zip#pyspark.zip,hdfs://ip-10-0-0-220.ec2.internal:8020/user/nanounanue/.sparkStaging/application_1435696841856_0027/py4j-0.8.2.1-src.zip#py4j-0.8.2.1-src.zip,hdfs://ip-10-0-0-220.ec2.internal:8020/user/nanounanue/.sparkStaging/application_14 
35696841856_0027/minimal-example2.py#minimal-example2.py) 
15/07/01 16:54:11 INFO yarn.ExecutorRunnable: Setting up executor with environment: Map(CLASSPATH -> {{PWD}}<CPS>{{PWD}}/__spark__.jar<CPS>$HADOOP_CLIENT_CONF_DIR<CPS>$HADOOP_CONF_DIR<CPS>$HADOOP_COMMON_HOME/*<CPS>$HADOOP_COMMON_HOME/lib/*<CPS>$HADOOP_HDFS_HOME/*<CPS>$HADOO 
P_HDFS_HOME/lib/*<CPS>$HADOOP_YARN_HOME/*<CPS>$HADOOP_YARN_HOME/lib/*<CPS>$HADOOP_MAPRED_HOME/*<CPS>$HADOOP_MAPRED_HOME/lib/*<CPS>$MR2_CLASSPATH, SPARK_LOG_URL_STDERR -> http://ip-10-0-0-99.ec2.internal:8042/node/containerlogs/container_1435696841856_0027_01_000002/nanounan 
ue/stderr?start=0, SPARK_YARN_STAGING_DIR -> .sparkStaging/application_1435696841856_0027, SPARK_YARN_CACHE_FILES_FILE_SIZES -> 162896305,281333,37562,2448, SPARK_USER -> nanounanue, SPARK_YARN_CACHE_FILES_VISIBILITIES -> PRIVATE,PRIVATE,PRIVATE,PRIVATE, SPARK_YARN_MODE -> 
true, SPARK_YARN_CACHE_FILES_TIME_STAMPS -> 1435784032445,1435784032613,1435784032652,1435784032692, PYTHONPATH -> pyspark.zip:py4j-0.8.2.1-src.zip, SPARK_LOG_URL_STDOUT -> http://ip-10-0-0-99.ec2.internal:8042/node/containerlogs/container_1435696841856_0027_01_000002/nanou 
nanue/stdout?start=0, SPARK_YARN_CACHE_FILES -> hdfs://ip-10-0-0-220.ec2.internal:8020/user/nanounanue/.sparkStaging/application_1435696841856_0027/spark-assembly-1.4.0-hadoop2.6.0.jar#__spark__.jar,hdfs://ip-10-0-0-220.ec2.internal:8020/user/nanounanue/.sparkStaging/applic 
ation_1435696841856_0027/pyspark.zip#pyspark.zip,hdfs://ip-10-0-0-220.ec2.internal:8020/user/nanounanue/.sparkStaging/application_1435696841856_0027/py4j-0.8.2.1-src.zip#py4j-0.8.2.1-src.zip,hdfs://ip-10-0-0-220.ec2.internal:8020/user/nanounanue/.sparkStaging/application_14 
35696841856_0027/minimal-example2.py#minimal-example2.py) 
15/07/01 16:54:11 INFO yarn.ExecutorRunnable: Setting up executor with commands: List(LD_LIBRARY_PATH="/opt/cloudera/parcels/CDH-5.3.3-1.cdh5.3.3.p0.5/lib/hadoop/lib/native:$LD_LIBRARY_PATH", {{JAVA_HOME}}/bin/java, -server, -XX:OnOutOfMemoryError='kill %p', -Xms1024m, -Xmx 
1024m, -Djava.io.tmpdir={{PWD}}/tmp, '-Dspark.ui.port=0', '-Dspark.driver.port=41190', -Dspark.yarn.app.container.log.dir=<LOG_DIR>, org.apache.spark.executor.CoarseGrainedExecutorBackend, --driver-url, akka.tcp://[email protected]:41190/user/CoarseGrainedScheduler, --e 
xecutor-id, 1, --hostname, ip-10-0-0-99.ec2.internal, --cores, 1, --app-id, application_1435696841856_0027, --user-class-path, file:$PWD/__app__.jar, 1>, <LOG_DIR>/stdout, 2>, <LOG_DIR>/stderr) 
15/07/01 16:54:11 INFO yarn.ExecutorRunnable: Setting up executor with commands: List(LD_LIBRARY_PATH="/opt/cloudera/parcels/CDH-5.3.3-1.cdh5.3.3.p0.5/lib/hadoop/lib/native:$LD_LIBRARY_PATH", {{JAVA_HOME}}/bin/java, -server, -XX:OnOutOfMemoryError='kill %p', -Xms1024m, -Xmx 
1024m, -Djava.io.tmpdir={{PWD}}/tmp, '-Dspark.ui.port=0', '-Dspark.driver.port=41190', -Dspark.yarn.app.container.log.dir=<LOG_DIR>, org.apache.spark.executor.CoarseGrainedExecutorBackend, --driver-url, akka.tcp://[email protected]:41190/user/CoarseGrainedScheduler, --e 
xecutor-id, 2, --hostname, ip-10-0-0-37.ec2.internal, --cores, 1, --app-id, application_1435696841856_0027, --user-class-path, file:$PWD/__app__.jar, 1>, <LOG_DIR>/stdout, 2>, <LOG_DIR>/stderr) 
15/07/01 16:54:11 INFO impl.ContainerManagementProtocolProxy: Opening proxy : ip-10-0-0-37.ec2.internal:8041 
15/07/01 16:54:14 INFO yarn.ApplicationMaster$AMEndpoint: Driver terminated or disconnected! Shutting down. ip-10-0-0-99.ec2.internal:43176 
15/07/01 16:54:15 INFO yarn.ApplicationMaster$AMEndpoint: Driver terminated or disconnected! Shutting down. ip-10-0-0-37.ec2.internal:58472 
15/07/01 16:54:15 INFO cluster.YarnClusterSchedulerBackend: Registered executor: AkkaRpcEndpointRef(Actor[akka.tcp://[email protected]:49047/user/Executor#563862009]) with ID 1 
15/07/01 16:54:15 INFO cluster.YarnClusterSchedulerBackend: Registered executor: AkkaRpcEndpointRef(Actor[akka.tcp://[email protected]:36122/user/Executor#1370723906]) with ID 2 
15/07/01 16:54:15 INFO cluster.YarnClusterSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8 
15/07/01 16:54:15 INFO cluster.YarnClusterScheduler: YarnClusterScheduler.postStartHook done 
15/07/01 16:54:15 INFO storage.BlockManagerMasterEndpoint: Registering block manager ip-10-0-0-99.ec2.internal:59769 with 530.3 MB RAM, BlockManagerId(1, ip-10-0-0-99.ec2.internal, 59769) 
15/07/01 16:54:16 INFO storage.BlockManagerMasterEndpoint: Registering block manager ip-10-0-0-37.ec2.internal:48859 with 530.3 MB RAM, BlockManagerId(2, ip-10-0-0-37.ec2.internal, 48859) 
15/07/01 16:54:16 INFO hive.HiveContext: Initializing execution hive, version 0.13.1 
15/07/01 16:54:17 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore 
15/07/01 16:54:17 INFO metastore.ObjectStore: ObjectStore, initialize called 
15/07/01 16:54:17 INFO spark.SparkContext: Invoking stop() from shutdown hook 
15/07/01 16:54:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/metrics/json,null} 
15/07/01 16:54:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/kill,null} 
15/07/01 16:54:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/api,null} 
15/07/01 16:54:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/,null} 
15/07/01 16:54:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/static,null} 
15/07/01 16:54:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump/json,null} 
15/07/01 16:54:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump,null} 
15/07/01 16:54:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/json,null} 
15/07/01 16:54:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors,null} 
15/07/01 16:54:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment/json,null} 
15/07/01 16:54:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment,null} 
15/07/01 16:54:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd/json,null} 
15/07/01 16:54:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd,null} 
15/07/01 16:54:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/json,null} 
15/07/01 16:54:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage,null} 
15/07/01 16:54:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool/json,null} 
15/07/01 16:54:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool,null} 
15/07/01 16:54:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/json,null} 
15/07/01 16:54:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage,null} 
15/07/01 16:54:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/json,null} 
15/07/01 16:54:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages,null} 
15/07/01 16:54:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/job/json,null} 
15/07/01 16:54:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/job,null} 
15/07/01 16:54:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/json,null} 
15/07/01 16:54:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs,null} 
15/07/01 16:54:17 INFO ui.SparkUI: Stopped Spark web UI at http://10.0.0.36:37958 
15/07/01 16:54:17 INFO scheduler.DAGScheduler: Stopping DAGScheduler 
15/07/01 16:54:17 INFO cluster.YarnClusterSchedulerBackend: Shutting down all executors 
15/07/01 16:54:17 INFO cluster.YarnClusterSchedulerBackend: Asking each executor to shut down 
15/07/01 16:54:17 INFO yarn.ApplicationMaster$AMEndpoint: Driver terminated or disconnected! Shutting down. ip-10-0-0-99.ec2.internal:49047 
15/07/01 16:54:17 INFO yarn.ApplicationMaster$AMEndpoint: Driver terminated or disconnected! Shutting down. ip-10-0-0-37.ec2.internal:36122 
15/07/01 16:54:17 INFO ui.SparkUI: Stopped Spark web UI at http://10.0.0.36:37958 
15/07/01 16:54:17 INFO scheduler.DAGScheduler: Stopping DAGScheduler 
15/07/01 16:54:17 INFO cluster.YarnClusterSchedulerBackend: Shutting down all executors 
15/07/01 16:54:17 INFO cluster.YarnClusterSchedulerBackend: Asking each executor to shut down 
15/07/01 16:54:17 INFO yarn.ApplicationMaster$AMEndpoint: Driver terminated or disconnected! Shutting down. ip-10-0-0-99.ec2.internal:49047 
15/07/01 16:54:17 INFO yarn.ApplicationMaster$AMEndpoint: Driver terminated or disconnected! Shutting down. ip-10-0-0-37.ec2.internal:36122 
15/07/01 16:54:17 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped! 
15/07/01 16:54:17 INFO storage.MemoryStore: MemoryStore cleared 
15/07/01 16:54:17 INFO storage.BlockManager: BlockManager stopped 
15/07/01 16:54:17 INFO storage.BlockManagerMaster: BlockManagerMaster stopped 
15/07/01 16:54:17 INFO spark.SparkContext: Successfully stopped SparkContext 
15/07/01 16:54:17 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped! 
15/07/01 16:54:17 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon. 
15/07/01 16:54:17 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports. 
15/07/01 16:54:17 INFO yarn.ApplicationMaster: Final app status: SUCCEEDED, exitCode: 0, (reason: Shutdown hook called before final status was reported.) 
15/07/01 16:54:17 INFO yarn.ApplicationMaster: Unregistering ApplicationMaster with SUCCEEDED (diag message: Shutdown hook called before final status was reported.) 
15/07/01 16:54:17 INFO impl.AMRMClientImpl: Waiting for application to be successfully unregistered. 
15/07/01 16:54:17 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remoting shut down. 
15/07/01 16:54:17 INFO yarn.ApplicationMaster: Deleting staging directory .sparkStaging/application_1435696841856_0027 
15/07/01 16:54:17 INFO util.Utils: Shutdown hook called 
15/07/01 16:54:17 INFO util.Utils: Deleting directory /yarn/nm/usercache/nanounanue/appcache/application_1435696841856_0027/pyspark-215f5c19-b1cb-47df-ad43-79da4244de61 
15/07/01 16:54:17 INFO util.Utils: Deleting directory /yarn/nm/usercache/nanounanue/appcache/application_1435696841856_0027/container_1435696841856_0027_01_000001/tmp/spark-c96dc9dc-e6ee-451b-b09e-637f5d4ca990 

LogType: stdout 
LogLength: 2404 
Log Contents: 
[(u'spark.eventLog.enabled', u'true'), (u'spark.submit.pyArchives', u'pyspark.zip:py4j-0.8.2.1-src.zip'), (u'spark.yarn.app.container.log.dir', u'/var/log/hadoop-yarn/container/application_1435696841856_0027/container_1435696841856_0027_01_000001'), (u'spark.eventLog.dir', 
u'hdfs://ip-10-0-0-220.ec2.internal:8020/user/spark/applicationHistory'), (u'spark.org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.param.PROXY_HOSTS', u'ip-10-0-0-220.ec2.internal'), (u'spark.yarn.historyServer.address', u'http://ip-10-0-0-220.ec2.internal:18088' 
), (u'spark.ui.port', u'0'), (u'spark.yarn.app.id', u'application_1435696841856_0027'), (u'spark.app.name', u'minimal-example2.py'), (u'spark.executor.instances', u'2'), (u'spark.executorEnv.PYTHONPATH', u'pyspark.zip:py4j-0.8.2.1-src.zip'), (u'spark.submit.pyFiles', u''), 
(u'spark.executor.extraLibraryPath', u'/opt/cloudera/parcels/CDH-5.3.3-1.cdh5.3.3.p0.5/lib/hadoop/lib/native'), (u'spark.master', u'yarn-cluster'), (u'spark.ui.filters', u'org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter'), (u'spark.org.apache.hadoop.yarn.server.w 
ebproxy.amfilter.AmIpFilter.param.PROXY_URI_BASES', u'http://ip-10-0-0-220.ec2.internal:8088/proxy/application_1435696841856_0027'), (u'spark.driver.extraLibraryPath', u'/opt/cloudera/parcels/CDH-5.3.3-1.cdh5.3.3.p0.5/lib/hadoop/lib/native'), (u'spark.yarn.app.attemptId', u 
'1')] 
<pyspark.context.SparkContext object at 0x3fd53d0> 
1.4.0 
<pyspark.sql.context.HiveContext object at 0x40a9110> 
Traceback (most recent call last): 
    File "minimal-example2.py", line 53, in <module> 
    access = sqlContext.read.json("hdfs://10.0.0.220/raw/logs/arquimedes/access/*.json") 
    File "/yarn/nm/usercache/nanounanue/appcache/application_1435696841856_0027/container_1435696841856_0027_01_000001/pyspark.zip/pyspark/sql/context.py", line 591, in read 
    File "/yarn/nm/usercache/nanounanue/appcache/application_1435696841856_0027/container_1435696841856_0027_01_000001/pyspark.zip/pyspark/sql/readwriter.py", line 39, in __init__ 
    File "/yarn/nm/usercache/nanounanue/appcache/application_1435696841856_0027/container_1435696841856_0027_01_000001/pyspark.zip/pyspark/sql/context.py", line 619, in _ssql_ctx 
Exception: ("You must build Spark with Hive. Export 'SPARK_HIVE=true' and run build/sbt assembly", Py4JJavaError(u'An error occurred while calling None.org.apache.spark.sql.hive.HiveContext.\n', JavaObject id=o53)) 

The important part is the last line. I obtained the logs with:

$ yarn logs -applicationId application_1435696841856_0027

What am I doing wrong?

+0

Are you using a custom-built version of Spark or your vendor's version? Is spark.yarn.jar also set in your conf? – Holden

+0

@Holden It is a Spark 1.4 binary. I didn't use the vendor's version because it is too old (1.2). None of the examples set 'spark.yarn.jar'. – nanounanue

Answers

6

I recently ran into this problem, but it turned out that Spark's message was misleading: there were no missing jars. For me, the problem was that the Java class HiveContext, which PySpark calls, parses hive-site.xml when it is constructed, and an exception was being raised during that construction. (PySpark catches this exception and incorrectly suggests that it is caused by a missing jar.) It ended up being an error with the property hive.metastore.client.connect.retry.delay, which was set to 2s. The HiveContext class tries to parse this value as an integer, which fails. Change it to 2, and also remove the unit characters from the values of hive.metastore.client.socket.timeout and hive.metastore.client.socket.lifetime.

Note that you can get a more descriptive error by calling sqlContext._get_hive_ctx() directly.
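To catch the misconfiguration described in this answer before starting Spark, the relevant hive-site.xml values can be checked mechanically. The following is a minimal sketch, assuming that the three properties named in the answer are the ones that must be plain integers and that a trailing `s` means seconds; `find_suffixed_values` and `strip_unit_suffixes` are hypothetical helpers, not part of Spark or Hive.

```python
import re
import xml.etree.ElementTree as ET

# Properties named in the answer. Assumption: the HiveContext in this setup
# expects plain integers here, not values with a unit suffix such as "2s".
INTEGER_PROPS = (
    "hive.metastore.client.connect.retry.delay",
    "hive.metastore.client.socket.timeout",
    "hive.metastore.client.socket.lifetime",
)


def find_suffixed_values(xml_text):
    """Return {name: value} for listed properties whose value is not a plain integer."""
    root = ET.fromstring(xml_text)
    bad = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        if name in INTEGER_PROPS and not value.isdigit():
            bad[name] = value
    return bad


def strip_unit_suffixes(xml_text):
    """Rewrite values like '2s' to '2' for the listed properties; return new XML text.

    Only a trailing 's' (seconds) is stripped; other units are left untouched
    so that find_suffixed_values still reports them for a manual fix.
    """
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value_el = prop.find("value")
        if name in INTEGER_PROPS and value_el is not None and value_el.text:
            match = re.fullmatch(r"\s*(\d+)\s*s?\s*", value_el.text)
            if match:
                value_el.text = match.group(1)
    return ET.tostring(root, encoding="unicode")
```

For example, `find_suffixed_values(open(path).read())`, where `path` is the location of the hive-site.xml that Spark picks up, lists any offending properties; write the output of `strip_unit_suffixes` back only after reviewing it.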

+0

Thanks. This answer helps a lot. BTW, http://stackoverflow.com/a/34215330/1813988 is also helpful for anyone looking for a command-line solution. – phil

-1

It also says: 'An error occurred while calling None.org.apache.spark.sql.hive.HiveContext.\n'

So the problem seems to be that the Hive part is not provided in the spark-submit command, and the cluster fails to find the Hive dependencies. Just do as the message says:

Export 'SPARK_HIVE=true'

In principle, this should let you build your jar with the Hive dependencies included, so that Spark finds the library it reports as missing.
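Whether a given Spark assembly jar was actually built with Hive support can be checked by looking for the HiveContext class inside it, since a jar is a zip archive. This is a minimal sketch; `jar_has_hive_support` is a hypothetical helper, and the class path is derived from the name `org.apache.spark.sql.hive.HiveContext` in the error message. Note that the question's log already shows `hive.HiveContext: Initializing execution hive, version 0.13.1`, which suggests the class was present in that case.

```python
import zipfile

# Class named in the error message, converted to its entry path inside a jar.
HIVE_CONTEXT_ENTRY = "org/apache/spark/sql/hive/HiveContext.class"


def jar_has_hive_support(jar_file):
    """Return True if the jar (a path or binary file object) contains HiveContext."""
    with zipfile.ZipFile(jar_file) as jar:
        return HIVE_CONTEXT_ENTRY in jar.namelist()
```

For example, `jar_has_hive_support("spark-assembly-1.4.0-hadoop2.6.0.jar")` checks the assembly named in the log; its location on your machine will differ.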

2

Instead of a HiveContext, you should create a SQLContext instance:

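from pyspark.sql import SQLContext
# sc is the existing SparkContext; a plain SQLContext does not need Hive,
# but it also cannot use tables in the Hive metastore
sqlContext = SQLContext(sc)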