I am getting the following error only on a large dataset (i.e. 15 TB compressed). If my dataset is small (~1 TB), I do not get this error.
It looks like the job cannot get past the shuffle stage. The approximate number of mappers is 150,000.
Spark configuration:
spark.sql.warehouse.dir hdfs:///user/spark/warehouse
spark.yarn.dist.files file:/etc/spark/conf/hive-site.xml
spark.executor.extraJavaOptions -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70 -XX:+CMSClassUnloadingEnabled -XX:OnOutOfMemoryError='kill -9 %p'
spark.driver.host 172.20.103.94
spark.history.fs.logDirectory hdfs:///var/log/spark/apps
spark.eventLog.enabled true
spark.ui.port 0
spark.driver.port 35246
spark.shuffle.service.enabled true
spark.driver.extraLibraryPath /usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native
spark.yarn.historyServer.address ip-172-20-99-29.ec2.internal:18080
spark.yarn.app.id application_1486842541319_0002
spark.scheduler.mode FIFO
spark.driver.memory 10g
spark.executor.id driver
spark.yarn.app.container.log.dir /var/log/hadoop-yarn/containers/application_1486842541319_0002/container_1486842541319_0002_01_000001
spark.driver.extraJavaOptions -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70 -XX:+CMSClassUnloadingEnabled -XX:OnOutOfMemoryError='kill -9 %p'
spark.submit.deployMode cluster
spark.master yarn
spark.ui.filters org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
spark.executor.extraLibraryPath /usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native
spark.sql.hive.metastore.sharedPrefixes com.amazonaws.services.dynamodbv2
spark.executor.memory 5120M
spark.driver.extraClassPath /usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*
spark.eventLog.dir hdfs:///var/log/spark/apps
spark.dynamicAllocation.enabled true
spark.executor.extraClassPath /usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*
spark.executor.cores 8
spark.history.ui.port 18080
spark.org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.param.PROXY_HOSTS ip-172-20-99-29.ec2.internal
spark.org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.param.PROXY_URI_BASES http://ip-172-20-99-29.ec2.internal:20888/proxy/application_1486842541319_0002
spark.app.id application_1486842541319_0002
spark.hadoop.yarn.timeline-service.enabled false
spark.sql.shuffle.partitions 10000
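For reference, a minimal sketch (assuming the Spark 2.x Scala API; the actual job code is not shown here) of how the shuffle-related settings above would look if set on the session instead of in spark-defaults.conf:

    import org.apache.spark.sql.SparkSession

    // Sketch only: mirrors the shuffle-related entries from the config above.
    val spark = SparkSession.builder()
      .appName("large-shuffle-job") // hypothetical name, not taken from the config
      .config("spark.shuffle.service.enabled", "true")   // external shuffle service
      .config("spark.dynamicAllocation.enabled", "true") // executors scale with load
      .config("spark.sql.shuffle.partitions", "10000")   // reduce-side partition count
      .getOrCreate()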
Error trace:
17/02/11 22:01:05 INFO ShuffleBlockFetcherIterator: Started 29 remote fetches in 2700 ms
17/02/11 22:03:04 ERROR TransportChannelHandler: Connection to ip-172-20-96-109.ec2.internal/172.20.96.109:7337 has been quiet for 120000 ms while there are outstanding requests. Assuming connection is dead; please adjust spark.network.timeout if this is wrong.
17/02/11 22:03:04 ERROR TransportResponseHandler: Still have 1 requests outstanding when connection from ip-172-20-96-109.ec2.internal/172.20.96.109:7337 is closed
17/02/11 22:03:04 ERROR OneForOneBlockFetcher: Failed while starting block fetches
java.io.IOException: Connection from ip-172-20-96-109.ec2.internal/172.20.96.109:7337 closed
at org.apache.spark.network.client.TransportResponseHandler.channelInactive(TransportResponseHandler.java:128)
at org.apache.spark.network.server.TransportChannelHandler.channelInactive(TransportChannelHandler.java:109)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:251)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:237)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:230)
at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
at io.netty.handler.timeout.IdleStateHandler.channelInactive(IdleStateHandler.java:257)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:251)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:237)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:230)
at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:251)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:237)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:230)
at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
at org.apache.spark.network.util.TransportFrameDecoder.channelInactive(TransportFrameDecoder.java:182)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:251)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:237)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:230)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1289)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:251)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:237)
at io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:893)
at io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:691)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:408)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:455)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:140)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
at java.lang.Thread.run(Thread.java:745)
I increased the timeout to 1200 s (i.e. spark.network.timeout=1200s). I am still getting the Netty errors. This time the error happened during block replication:
17/02/24 09:10:21 ERROR TransportResponseHandler: Still have 1 requests outstanding when connection from ip-172-20-101-120.ec2.internal/172.20.101.120:46113 is closed
17/02/24 09:10:21 ERROR NettyBlockTransferService: Error while uploading block rdd_24_2312
java.io.IOException: Connection from ip-172-20-101-120.ec2.internal/172.20.101.120:46113 closed
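For reference, the timeout override was a plain config change; a minimal sketch assuming it is set programmatically (spark-defaults.conf or --conf on spark-submit work the same way):

    import org.apache.spark.SparkConf

    // Sketch of the override described above. spark.network.timeout is the
    // fallback for the shuffle I/O idle timeout behind the
    // "has been quiet for 120000 ms" error earlier.
    val conf = new SparkConf()
      .set("spark.network.timeout", "1200s") // raised from the 120 s default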