Strange behavior of spark-submit

I am running the following code in pyspark:
In [14]: conf = SparkConf()
In [15]: conf.getAll()
[(u'spark.eventLog.enabled', u'true'),
(u'spark.eventLog.dir',
u'hdfs://ip-10-0-0-220.ec2.internal:8020/user/spark/applicationHistory'),
(u'spark.master', u'local[*]'),
(u'spark.yarn.historyServer.address',
u'http://ip-10-0-0-220.ec2.internal:18088'),
(u'spark.executor.extraLibraryPath',
u'/opt/cloudera/parcels/CDH-5.3.3-1.cdh5.3.3.p0.5/lib/hadoop/lib/native'),
(u'spark.app.name', u'pyspark-shell'),
(u'spark.driver.extraLibraryPath',
u'/opt/cloudera/parcels/CDH-5.3.3-1.cdh5.3.3.p0.5/lib/hadoop/lib/native')]
In [16]: sc
<pyspark.context.SparkContext at 0x7fab9dd8a750>
In [17]: sc.version
u'1.4.0'
In [19]: sqlContext
<pyspark.sql.context.HiveContext at 0x7fab9de785d0>
In [20]: access = sqlContext.read.json("hdfs://10.0.0.220/raw/logs/arquimedes/access/*.json")
and everything runs smoothly (I can create tables in the Hive metastore, etc.).

But when I try to run this code with spark-submit:
# -*- coding: utf-8 -*-
from __future__ import print_function

import re

from pyspark import SparkContext
from pyspark.sql import HiveContext
from pyspark.sql import Row
from pyspark.conf import SparkConf

if __name__ == "__main__":
    sc = SparkContext(appName="Minimal Example 2")
    conf = SparkConf()
    print(conf.getAll())
    print(sc)
    print(sc.version)

    sqlContext = HiveContext(sc)
    print(sqlContext)

    # ## Read the access log file
    access = sqlContext.read.json("hdfs://10.0.0.220/raw/logs/arquimedes/access/*.json")

    sc.stop()
I run it with:
$ spark-submit --master yarn-cluster --deploy-mode cluster minimal-example2.py
it apparently runs without errors, but if you inspect the logs they contain "You must build Spark with Hive.". Here they are:
15/07/01 16:55:10 INFO client.RMProxy: Connecting to ResourceManager at ip-10-0-0-220.ec2.internal/10.0.0.220:8032
Container: container_1435696841856_0027_01_000001 on ip-10-0-0-36.ec2.internal_8041
=====================================================================================
LogType: stderr
LogLength: 21077
Log Contents:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/yarn/nm/usercache/nanounanue/filecache/133/spark-assembly-1.4.0-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.3.3-1.cdh5.3.3.p0.5/jars/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
15/07/01 16:54:00 INFO yarn.ApplicationMaster: Registered signal handlers for [TERM, HUP, INT]
15/07/01 16:54:01 INFO yarn.ApplicationMaster: ApplicationAttemptId: appattempt_1435696841856_0027_000001
15/07/01 16:54:02 INFO spark.SecurityManager: Changing view acls to: yarn,nanounanue
15/07/01 16:54:02 INFO spark.SecurityManager: Changing modify acls to: yarn,nanounanue
15/07/01 16:54:02 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yarn, nanounanue); users with modify permissions: Set(yarn, nanounanue)
15/07/01 16:54:02 INFO yarn.ApplicationMaster: Starting the user application in a separate Thread
15/07/01 16:54:02 INFO yarn.ApplicationMaster: Waiting for spark context initialization
15/07/01 16:54:02 INFO yarn.ApplicationMaster: Waiting for spark context initialization ...
15/07/01 16:54:03 INFO spark.SparkContext: Running Spark version 1.4.0
15/07/01 16:54:03 INFO spark.SecurityManager: Changing view acls to: yarn,nanounanue
15/07/01 16:54:03 INFO spark.SecurityManager: Changing modify acls to: yarn,nanounanue
15/07/01 16:54:03 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yarn, nanounanue); users with modify permissions: Set(yarn, nanounanue)
15/07/01 16:54:03 INFO slf4j.Slf4jLogger: Slf4jLogger started
15/07/01 16:54:03 INFO Remoting: Starting remoting
15/07/01 16:54:03 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://[email protected]:41190]
15/07/01 16:54:03 INFO util.Utils: Successfully started service 'sparkDriver' on port 41190.
15/07/01 16:54:04 INFO spark.SparkEnv: Registering MapOutputTracker
15/07/01 16:54:04 INFO spark.SparkEnv: Registering BlockManagerMaster
15/07/01 16:54:04 INFO storage.DiskBlockManager: Created local directory at /yarn/nm/usercache/nanounanue/appcache/application_1435696841856_0027/blockmgr-14127054-19b1-4cfe-80c3-2c5fc917c9cf
15/07/01 16:54:04 INFO storage.DiskBlockManager: Created local directory at /data0/yarn/nm/usercache/nanounanue/appcache/application_1435696841856_0027/blockmgr-c8119846-7f6f-45eb-911b-443cb4d7e9c9
15/07/01 16:54:04 INFO storage.MemoryStore: MemoryStore started with capacity 245.7 MB
15/07/01 16:54:04 INFO spark.HttpFileServer: HTTP File server directory is /yarn/nm/usercache/nanounanue/appcache/application_1435696841856_0027/httpd-c4abf72b-2ee4-45d7-8252-c68f925bef58
15/07/01 16:54:04 INFO spark.HttpServer: Starting HTTP Server
15/07/01 16:54:04 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/07/01 16:54:04 INFO server.AbstractConnector: Started [email protected]:56437
15/07/01 16:54:04 INFO util.Utils: Successfully started service 'HTTP file server' on port 56437.
15/07/01 16:54:04 INFO spark.SparkEnv: Registering OutputCommitCoordinator
15/07/01 16:54:04 INFO ui.JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
15/07/01 16:54:04 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/07/01 16:54:04 INFO server.AbstractConnector: Started [email protected]:37958
15/07/01 16:54:04 INFO util.Utils: Successfully started service 'SparkUI' on port 37958.
15/07/01 16:54:04 INFO ui.SparkUI: Started SparkUI at http://10.0.0.36:37958
15/07/01 16:54:04 INFO cluster.YarnClusterScheduler: Created YarnClusterScheduler
15/07/01 16:54:04 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 49759.
15/07/01 16:54:04 INFO netty.NettyBlockTransferService: Server created on 49759
15/07/01 16:54:05 INFO storage.BlockManagerMaster: Trying to register BlockManager
15/07/01 16:54:05 INFO storage.BlockManagerMasterEndpoint: Registering block manager 10.0.0.36:49759 with 245.7 MB RAM, BlockManagerId(driver, 10.0.0.36, 49759)
15/07/01 16:54:05 INFO storage.BlockManagerMaster: Registered BlockManager
15/07/01 16:54:05 INFO scheduler.EventLoggingListener: Logging events to hdfs://ip-10-0-0-220.ec2.internal:8020/user/spark/applicationHistory/application_1435696841856_0027_1
15/07/01 16:54:05 INFO cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as AkkaRpcEndpointRef(Actor[akka://sparkDriver/user/YarnAM#-1566924249])
15/07/01 16:54:05 INFO client.RMProxy: Connecting to ResourceManager at ip-10-0-0-220.ec2.internal/10.0.0.220:8030
15/07/01 16:54:05 INFO yarn.YarnRMClient: Registering the ApplicationMaster
15/07/01 16:54:05 INFO yarn.YarnAllocator: Will request 2 executor containers, each with 1 cores and 1408 MB memory including 384 MB overhead
15/07/01 16:54:05 INFO yarn.YarnAllocator: Container request (host: Any, capability: <memory:1408, vCores:1>)
15/07/01 16:54:05 INFO yarn.YarnAllocator: Container request (host: Any, capability: <memory:1408, vCores:1>)
15/07/01 16:54:05 INFO yarn.ApplicationMaster: Started progress reporter thread - sleep time : 5000
15/07/01 16:54:11 INFO impl.AMRMClientImpl: Received new token for : ip-10-0-0-99.ec2.internal:8041
15/07/01 16:54:11 INFO impl.AMRMClientImpl: Received new token for : ip-10-0-0-37.ec2.internal:8041
15/07/01 16:54:11 INFO yarn.YarnAllocator: Launching container container_1435696841856_0027_01_000002 for on host ip-10-0-0-99.ec2.internal
15/07/01 16:54:11 INFO yarn.YarnAllocator: Launching ExecutorRunnable. driverUrl: akka.tcp://[email protected]:41190/user/CoarseGrainedScheduler, executorHostname: ip-10-0-0-99.ec2.internal
15/07/01 16:54:11 INFO yarn.YarnAllocator: Launching container container_1435696841856_0027_01_000003 for on host ip-10-0-0-37.ec2.internal
15/07/01 16:54:11 INFO yarn.ExecutorRunnable: Starting Executor Container
15/07/01 16:54:11 INFO yarn.YarnAllocator: Launching ExecutorRunnable. driverUrl: akka.tcp://[email protected]:41190/user/CoarseGrainedScheduler, executorHostname: ip-10-0-0-37.ec2.internal
15/07/01 16:54:11 INFO yarn.YarnAllocator: Received 2 containers from YARN, launching executors on 2 of them.
15/07/01 16:54:11 INFO impl.ContainerManagementProtocolProxy: yarn.client.max-cached-nodemanagers-proxies : 0
15/07/01 16:54:11 INFO yarn.ExecutorRunnable: Starting Executor Container
15/07/01 16:54:11 INFO yarn.ExecutorRunnable: Setting up ContainerLaunchContext
15/07/01 16:54:11 INFO impl.ContainerManagementProtocolProxy: yarn.client.max-cached-nodemanagers-proxies : 0
15/07/01 16:54:11 INFO yarn.ExecutorRunnable: Setting up ContainerLaunchContext
15/07/01 16:54:11 INFO yarn.ExecutorRunnable: Preparing Local resources
15/07/01 16:54:11 INFO yarn.ExecutorRunnable: Preparing Local resources
15/07/01 16:54:11 INFO yarn.ExecutorRunnable: Prepared Local resources Map(__spark__.jar -> resource { scheme: "hdfs" host: "ip-10-0-0-220.ec2.internal" port: 8020 file: "/user/nanounanue/.sparkStaging/application_1435696841856_0027/spark-assembly-1.4.0-hadoop2.6.0.jar" } s
ize: 162896305 timestamp: 1435784032445 type: FILE visibility: PRIVATE, pyspark.zip -> resource { scheme: "hdfs" host: "ip-10-0-0-220.ec2.internal" port: 8020 file: "/user/nanounanue/.sparkStaging/application_1435696841856_0027/pyspark.zip" } size: 281333 timestamp: 1435784
032613 type: FILE visibility: PRIVATE, py4j-0.8.2.1-src.zip -> resource { scheme: "hdfs" host: "ip-10-0-0-220.ec2.internal" port: 8020 file: "/user/nanounanue/.sparkStaging/application_1435696841856_0027/py4j-0.8.2.1-src.zip" } size: 37562 timestamp: 1435784032652 type: FIL
E visibility: PRIVATE, minimal-example2.py -> resource { scheme: "hdfs" host: "ip-10-0-0-220.ec2.internal" port: 8020 file: "/user/nanounanue/.sparkStaging/application_1435696841856_0027/minimal-example2.py" } size: 2448 timestamp: 1435784032692 type: FILE visibility: PRIVA
TE)
15/07/01 16:54:11 INFO yarn.ExecutorRunnable: Prepared Local resources Map(__spark__.jar -> resource { scheme: "hdfs" host: "ip-10-0-0-220.ec2.internal" port: 8020 file: "/user/nanounanue/.sparkStaging/application_1435696841856_0027/spark-assembly-1.4.0-hadoop2.6.0.jar" } s
ize: 162896305 timestamp: 1435784032445 type: FILE visibility: PRIVATE, pyspark.zip -> resource { scheme: "hdfs" host: "ip-10-0-0-220.ec2.internal" port: 8020 file: "/user/nanounanue/.sparkStaging/application_1435696841856_0027/pyspark.zip" } size: 281333 timestamp: 1435784
032613 type: FILE visibility: PRIVATE, py4j-0.8.2.1-src.zip -> resource { scheme: "hdfs" host: "ip-10-0-0-220.ec2.internal" port: 8020 file: "/user/nanounanue/.sparkStaging/application_1435696841856_0027/py4j-0.8.2.1-src.zip" } size: 37562 timestamp: 1435784032652 type: FIL
E visibility: PRIVATE, minimal-example2.py -> resource { scheme: "hdfs" host: "ip-10-0-0-220.ec2.internal" port: 8020 file: "/user/nanounanue/.sparkStaging/application_1435696841856_0027/minimal-example2.py" } size: 2448 timestamp: 1435784032692 type: FILE visibility: PRIVA
TE)
15/07/01 16:54:11 INFO yarn.ExecutorRunnable: Setting up executor with environment: Map(CLASSPATH -> {{PWD}}<CPS>{{PWD}}/__spark__.jar<CPS>$HADOOP_CLIENT_CONF_DIR<CPS>$HADOOP_CONF_DIR<CPS>$HADOOP_COMMON_HOME/*<CPS>$HADOOP_COMMON_HOME/lib/*<CPS>$HADOOP_HDFS_HOME/*<CPS>$HADOO
P_HDFS_HOME/lib/*<CPS>$HADOOP_YARN_HOME/*<CPS>$HADOOP_YARN_HOME/lib/*<CPS>$HADOOP_MAPRED_HOME/*<CPS>$HADOOP_MAPRED_HOME/lib/*<CPS>$MR2_CLASSPATH, SPARK_LOG_URL_STDERR -> http://ip-10-0-0-37.ec2.internal:8042/node/containerlogs/container_1435696841856_0027_01_000003/nanounan
ue/stderr?start=0, SPARK_YARN_STAGING_DIR -> .sparkStaging/application_1435696841856_0027, SPARK_YARN_CACHE_FILES_FILE_SIZES -> 162896305,281333,37562,2448, SPARK_USER -> nanounanue, SPARK_YARN_CACHE_FILES_VISIBILITIES -> PRIVATE,PRIVATE,PRIVATE,PRIVATE, SPARK_YARN_MODE ->
true, SPARK_YARN_CACHE_FILES_TIME_STAMPS -> 1435784032445,1435784032613,1435784032652,1435784032692, PYTHONPATH -> pyspark.zip:py4j-0.8.2.1-src.zip, SPARK_LOG_URL_STDOUT -> http://ip-10-0-0-37.ec2.internal:8042/node/containerlogs/container_1435696841856_0027_01_000003/nanou
nanue/stdout?start=0, SPARK_YARN_CACHE_FILES -> hdfs://ip-10-0-0-220.ec2.internal:8020/user/nanounanue/.sparkStaging/application_1435696841856_0027/spark-assembly-1.4.0-hadoop2.6.0.jar#__spark__.jar,hdfs://ip-10-0-0-220.ec2.internal:8020/user/nanounanue/.sparkStaging/applic
ation_1435696841856_0027/pyspark.zip#pyspark.zip,hdfs://ip-10-0-0-220.ec2.internal:8020/user/nanounanue/.sparkStaging/application_1435696841856_0027/py4j-0.8.2.1-src.zip#py4j-0.8.2.1-src.zip,hdfs://ip-10-0-0-220.ec2.internal:8020/user/nanounanue/.sparkStaging/application_14
35696841856_0027/minimal-example2.py#minimal-example2.py)
15/07/01 16:54:11 INFO yarn.ExecutorRunnable: Setting up executor with environment: Map(CLASSPATH -> {{PWD}}<CPS>{{PWD}}/__spark__.jar<CPS>$HADOOP_CLIENT_CONF_DIR<CPS>$HADOOP_CONF_DIR<CPS>$HADOOP_COMMON_HOME/*<CPS>$HADOOP_COMMON_HOME/lib/*<CPS>$HADOOP_HDFS_HOME/*<CPS>$HADOO
P_HDFS_HOME/lib/*<CPS>$HADOOP_YARN_HOME/*<CPS>$HADOOP_YARN_HOME/lib/*<CPS>$HADOOP_MAPRED_HOME/*<CPS>$HADOOP_MAPRED_HOME/lib/*<CPS>$MR2_CLASSPATH, SPARK_LOG_URL_STDERR -> http://ip-10-0-0-99.ec2.internal:8042/node/containerlogs/container_1435696841856_0027_01_000002/nanounan
ue/stderr?start=0, SPARK_YARN_STAGING_DIR -> .sparkStaging/application_1435696841856_0027, SPARK_YARN_CACHE_FILES_FILE_SIZES -> 162896305,281333,37562,2448, SPARK_USER -> nanounanue, SPARK_YARN_CACHE_FILES_VISIBILITIES -> PRIVATE,PRIVATE,PRIVATE,PRIVATE, SPARK_YARN_MODE ->
true, SPARK_YARN_CACHE_FILES_TIME_STAMPS -> 1435784032445,1435784032613,1435784032652,1435784032692, PYTHONPATH -> pyspark.zip:py4j-0.8.2.1-src.zip, SPARK_LOG_URL_STDOUT -> http://ip-10-0-0-99.ec2.internal:8042/node/containerlogs/container_1435696841856_0027_01_000002/nanou
nanue/stdout?start=0, SPARK_YARN_CACHE_FILES -> hdfs://ip-10-0-0-220.ec2.internal:8020/user/nanounanue/.sparkStaging/application_1435696841856_0027/spark-assembly-1.4.0-hadoop2.6.0.jar#__spark__.jar,hdfs://ip-10-0-0-220.ec2.internal:8020/user/nanounanue/.sparkStaging/applic
ation_1435696841856_0027/pyspark.zip#pyspark.zip,hdfs://ip-10-0-0-220.ec2.internal:8020/user/nanounanue/.sparkStaging/application_1435696841856_0027/py4j-0.8.2.1-src.zip#py4j-0.8.2.1-src.zip,hdfs://ip-10-0-0-220.ec2.internal:8020/user/nanounanue/.sparkStaging/application_14
35696841856_0027/minimal-example2.py#minimal-example2.py)
15/07/01 16:54:11 INFO yarn.ExecutorRunnable: Setting up executor with commands: List(LD_LIBRARY_PATH="/opt/cloudera/parcels/CDH-5.3.3-1.cdh5.3.3.p0.5/lib/hadoop/lib/native:$LD_LIBRARY_PATH", {{JAVA_HOME}}/bin/java, -server, -XX:OnOutOfMemoryError='kill %p', -Xms1024m, -Xmx
1024m, -Djava.io.tmpdir={{PWD}}/tmp, '-Dspark.ui.port=0', '-Dspark.driver.port=41190', -Dspark.yarn.app.container.log.dir=<LOG_DIR>, org.apache.spark.executor.CoarseGrainedExecutorBackend, --driver-url, akka.tcp://[email protected]:41190/user/CoarseGrainedScheduler, --e
xecutor-id, 1, --hostname, ip-10-0-0-99.ec2.internal, --cores, 1, --app-id, application_1435696841856_0027, --user-class-path, file:$PWD/__app__.jar, 1>, <LOG_DIR>/stdout, 2>, <LOG_DIR>/stderr)
15/07/01 16:54:11 INFO yarn.ExecutorRunnable: Setting up executor with commands: List(LD_LIBRARY_PATH="/opt/cloudera/parcels/CDH-5.3.3-1.cdh5.3.3.p0.5/lib/hadoop/lib/native:$LD_LIBRARY_PATH", {{JAVA_HOME}}/bin/java, -server, -XX:OnOutOfMemoryError='kill %p', -Xms1024m, -Xmx
1024m, -Djava.io.tmpdir={{PWD}}/tmp, '-Dspark.ui.port=0', '-Dspark.driver.port=41190', -Dspark.yarn.app.container.log.dir=<LOG_DIR>, org.apache.spark.executor.CoarseGrainedExecutorBackend, --driver-url, akka.tcp://[email protected]:41190/user/CoarseGrainedScheduler, --e
xecutor-id, 2, --hostname, ip-10-0-0-37.ec2.internal, --cores, 1, --app-id, application_1435696841856_0027, --user-class-path, file:$PWD/__app__.jar, 1>, <LOG_DIR>/stdout, 2>, <LOG_DIR>/stderr)
15/07/01 16:54:11 INFO impl.ContainerManagementProtocolProxy: Opening proxy : ip-10-0-0-37.ec2.internal:8041
15/07/01 16:54:14 INFO yarn.ApplicationMaster$AMEndpoint: Driver terminated or disconnected! Shutting down. ip-10-0-0-99.ec2.internal:43176
15/07/01 16:54:15 INFO yarn.ApplicationMaster$AMEndpoint: Driver terminated or disconnected! Shutting down. ip-10-0-0-37.ec2.internal:58472
15/07/01 16:54:15 INFO cluster.YarnClusterSchedulerBackend: Registered executor: AkkaRpcEndpointRef(Actor[akka.tcp://[email protected]:49047/user/Executor#563862009]) with ID 1
15/07/01 16:54:15 INFO cluster.YarnClusterSchedulerBackend: Registered executor: AkkaRpcEndpointRef(Actor[akka.tcp://[email protected]:36122/user/Executor#1370723906]) with ID 2
15/07/01 16:54:15 INFO cluster.YarnClusterSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
15/07/01 16:54:15 INFO cluster.YarnClusterScheduler: YarnClusterScheduler.postStartHook done
15/07/01 16:54:15 INFO storage.BlockManagerMasterEndpoint: Registering block manager ip-10-0-0-99.ec2.internal:59769 with 530.3 MB RAM, BlockManagerId(1, ip-10-0-0-99.ec2.internal, 59769)
15/07/01 16:54:16 INFO storage.BlockManagerMasterEndpoint: Registering block manager ip-10-0-0-37.ec2.internal:48859 with 530.3 MB RAM, BlockManagerId(2, ip-10-0-0-37.ec2.internal, 48859)
15/07/01 16:54:16 INFO hive.HiveContext: Initializing execution hive, version 0.13.1
15/07/01 16:54:17 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
15/07/01 16:54:17 INFO metastore.ObjectStore: ObjectStore, initialize called
15/07/01 16:54:17 INFO spark.SparkContext: Invoking stop() from shutdown hook
15/07/01 16:54:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/metrics/json,null}
15/07/01 16:54:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/kill,null}
15/07/01 16:54:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/api,null}
15/07/01 16:54:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/,null}
15/07/01 16:54:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/static,null}
15/07/01 16:54:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump/json,null}
15/07/01 16:54:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump,null}
15/07/01 16:54:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/json,null}
15/07/01 16:54:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors,null}
15/07/01 16:54:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment/json,null}
15/07/01 16:54:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment,null}
15/07/01 16:54:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd/json,null}
15/07/01 16:54:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd,null}
15/07/01 16:54:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/json,null}
15/07/01 16:54:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage,null}
15/07/01 16:54:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool/json,null}
15/07/01 16:54:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool,null}
15/07/01 16:54:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/json,null}
15/07/01 16:54:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage,null}
15/07/01 16:54:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/json,null}
15/07/01 16:54:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages,null}
15/07/01 16:54:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/job/json,null}
15/07/01 16:54:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/job,null}
15/07/01 16:54:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/json,null}
15/07/01 16:54:17 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs,null}
15/07/01 16:54:17 INFO ui.SparkUI: Stopped Spark web UI at http://10.0.0.36:37958
15/07/01 16:54:17 INFO scheduler.DAGScheduler: Stopping DAGScheduler
15/07/01 16:54:17 INFO cluster.YarnClusterSchedulerBackend: Shutting down all executors
15/07/01 16:54:17 INFO cluster.YarnClusterSchedulerBackend: Asking each executor to shut down
15/07/01 16:54:17 INFO yarn.ApplicationMaster$AMEndpoint: Driver terminated or disconnected! Shutting down. ip-10-0-0-99.ec2.internal:49047
15/07/01 16:54:17 INFO yarn.ApplicationMaster$AMEndpoint: Driver terminated or disconnected! Shutting down. ip-10-0-0-37.ec2.internal:36122
15/07/01 16:54:17 INFO ui.SparkUI: Stopped Spark web UI at http://10.0.0.36:37958
15/07/01 16:54:17 INFO scheduler.DAGScheduler: Stopping DAGScheduler
15/07/01 16:54:17 INFO cluster.YarnClusterSchedulerBackend: Shutting down all executors
15/07/01 16:54:17 INFO cluster.YarnClusterSchedulerBackend: Asking each executor to shut down
15/07/01 16:54:17 INFO yarn.ApplicationMaster$AMEndpoint: Driver terminated or disconnected! Shutting down. ip-10-0-0-99.ec2.internal:49047
15/07/01 16:54:17 INFO yarn.ApplicationMaster$AMEndpoint: Driver terminated or disconnected! Shutting down. ip-10-0-0-37.ec2.internal:36122
15/07/01 16:54:17 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
15/07/01 16:54:17 INFO storage.MemoryStore: MemoryStore cleared
15/07/01 16:54:17 INFO storage.BlockManager: BlockManager stopped
15/07/01 16:54:17 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
15/07/01 16:54:17 INFO spark.SparkContext: Successfully stopped SparkContext
15/07/01 16:54:17 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
15/07/01 16:54:17 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
15/07/01 16:54:17 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
15/07/01 16:54:17 INFO yarn.ApplicationMaster: Final app status: SUCCEEDED, exitCode: 0, (reason: Shutdown hook called before final status was reported.)
15/07/01 16:54:17 INFO yarn.ApplicationMaster: Unregistering ApplicationMaster with SUCCEEDED (diag message: Shutdown hook called before final status was reported.)
15/07/01 16:54:17 INFO impl.AMRMClientImpl: Waiting for application to be successfully unregistered.
15/07/01 16:54:17 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
15/07/01 16:54:17 INFO yarn.ApplicationMaster: Deleting staging directory .sparkStaging/application_1435696841856_0027
15/07/01 16:54:17 INFO util.Utils: Shutdown hook called
15/07/01 16:54:17 INFO util.Utils: Deleting directory /yarn/nm/usercache/nanounanue/appcache/application_1435696841856_0027/pyspark-215f5c19-b1cb-47df-ad43-79da4244de61
15/07/01 16:54:17 INFO util.Utils: Deleting directory /yarn/nm/usercache/nanounanue/appcache/application_1435696841856_0027/container_1435696841856_0027_01_000001/tmp/spark-c96dc9dc-e6ee-451b-b09e-637f5d4ca990
LogType: stdout
LogLength: 2404
Log Contents:
[(u'spark.eventLog.enabled', u'true'), (u'spark.submit.pyArchives', u'pyspark.zip:py4j-0.8.2.1-src.zip'), (u'spark.yarn.app.container.log.dir', u'/var/log/hadoop-yarn/container/application_1435696841856_0027/container_1435696841856_0027_01_000001'), (u'spark.eventLog.dir',
u'hdfs://ip-10-0-0-220.ec2.internal:8020/user/spark/applicationHistory'), (u'spark.org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.param.PROXY_HOSTS', u'ip-10-0-0-220.ec2.internal'), (u'spark.yarn.historyServer.address', u'http://ip-10-0-0-220.ec2.internal:18088'
), (u'spark.ui.port', u'0'), (u'spark.yarn.app.id', u'application_1435696841856_0027'), (u'spark.app.name', u'minimal-example2.py'), (u'spark.executor.instances', u'2'), (u'spark.executorEnv.PYTHONPATH', u'pyspark.zip:py4j-0.8.2.1-src.zip'), (u'spark.submit.pyFiles', u''),
(u'spark.executor.extraLibraryPath', u'/opt/cloudera/parcels/CDH-5.3.3-1.cdh5.3.3.p0.5/lib/hadoop/lib/native'), (u'spark.master', u'yarn-cluster'), (u'spark.ui.filters', u'org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter'), (u'spark.org.apache.hadoop.yarn.server.w
ebproxy.amfilter.AmIpFilter.param.PROXY_URI_BASES', u'http://ip-10-0-0-220.ec2.internal:8088/proxy/application_1435696841856_0027'), (u'spark.driver.extraLibraryPath', u'/opt/cloudera/parcels/CDH-5.3.3-1.cdh5.3.3.p0.5/lib/hadoop/lib/native'), (u'spark.yarn.app.attemptId', u
'1')]
<pyspark.context.SparkContext object at 0x3fd53d0>
1.4.0
<pyspark.sql.context.HiveContext object at 0x40a9110>
Traceback (most recent call last):
File "minimal-example2.py", line 53, in <module>
access = sqlContext.read.json("hdfs://10.0.0.220/raw/logs/arquimedes/access/*.json")
File "/yarn/nm/usercache/nanounanue/appcache/application_1435696841856_0027/container_1435696841856_0027_01_000001/pyspark.zip/pyspark/sql/context.py", line 591, in read
File "/yarn/nm/usercache/nanounanue/appcache/application_1435696841856_0027/container_1435696841856_0027_01_000001/pyspark.zip/pyspark/sql/readwriter.py", line 39, in __init__
File "/yarn/nm/usercache/nanounanue/appcache/application_1435696841856_0027/container_1435696841856_0027_01_000001/pyspark.zip/pyspark/sql/context.py", line 619, in _ssql_ctx
Exception: ("You must build Spark with Hive. Export 'SPARK_HIVE=true' and run build/sbt assembly", Py4JJavaError(u'An error occurred while calling None.org.apache.spark.sql.hive.HiveContext.\n', JavaObject id=o53))
The important part is the last line of the traceback. For reference, I retrieved the logs above with:

$ yarn logs -applicationId application_1435696841856_0027

Why does it fail like this? What am I doing wrong?
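For completeness, one workaround commonly suggested for this error in yarn-cluster mode is to ship the Hive client configuration with the job, since the driver runs on an arbitrary NodeManager host that may not have hive-site.xml on its classpath. This is only a sketch: the path below assumes a typical CDH layout and has not been verified on this cluster.

```shell
# Hedged sketch, not a confirmed fix: in yarn-cluster mode the driver
# runs on a NodeManager host. If that host lacks the Hive client config,
# HiveContext cannot locate the metastore. Shipping hive-site.xml with
# the job makes it available in the container's working directory.
# /etc/hive/conf/hive-site.xml is an assumed (typical CDH) path.
spark-submit \
  --master yarn-cluster \
  --files /etc/hive/conf/hive-site.xml \
  minimal-example2.py
```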
Are you using a custom-built version of Spark or your vendor's version? Should you create an instance of SQLContext instead? Is spark.yarn.jar also set in your conf? – Holden
@Holden It is a binary build of Spark 1.4. I did not use the vendor's version because it is too old (1.2). None of the examples set 'spark.yarn.jar' – nanounanue
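A minimal sketch of the SQLContext fallback Holden mentions, assuming plain JSON reading needs no Hive features; it is untested on this cluster, but a plain SQLContext never initializes the Hive metastore, so the failing code path would not be exercised.

```python
# -*- coding: utf-8 -*-
# Sketch of the SQLContext variant suggested in the comments (assumption:
# no Hive-specific features such as metastore tables or HQL are needed).
# SQLContext reads JSON without touching the Hive metastore, so the
# "You must build Spark with Hive" initialization path is avoided.
from __future__ import print_function

from pyspark import SparkContext
from pyspark.sql import SQLContext

if __name__ == "__main__":
    sc = SparkContext(appName="Minimal Example 2 (no Hive)")
    sqlContext = SQLContext(sc)  # plain SQLContext: no Hive metastore involved
    access = sqlContext.read.json("hdfs://10.0.0.220/raw/logs/arquimedes/access/*.json")
    access.printSchema()
    sc.stop()
```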