java.lang.RuntimeException from a Hadoop Streaming job

I am trying to find out which node has the biggest circle in an ego network, and that circle's size (an example is below). The pipeline works fine when I run it locally, but the Hadoop job fails with the error below.

What the input files look like (there are 10 files):

circle0    475    373    461    391    376    524    348    436
circle1    378    412    513    475    438    669    553    373    514    558    651    431    683    614    461    506    544    668    363    400    542    637    391    566    559    395    428    500    606    604    591    567    607    374    465    580    496    376    492    370    524    641    423    601    394    676    107    348    515    590    674    563    483    434    436    561    556
circle2    649    558    594    173    428    427    604    567    607    107    348    563    667
circle3    611    603    597    579    592    684    677
circle4    647    583    661    578    576    615    600    595    582    599    500    635    632    675    662    670    628    658    643    659    577    665    681    640    650    627
circle5    631    584    602    639    678    682    660    616    679
circle6    622    631    621    611    596    636    584    680    625    619    620    609    588    618    573    629    666    603    597    637    672    612    602    589    579    639    664    678    575    685    623    644    592    682    684    574    617    626    641    655    605    601    653    630    654    598    107    590    677    674    616    633    483    679    638    422    663    657

Each line lists the members of one of a node's circles in the ego network; a circle's size is its number of members.
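
For example, the size of a circle is just the number of IDs after the label. A quick check in plain Python (the sample line is the first one above):

line = "circle0    475    373    461    391    376    524    348    436"
label, *members = line.split()
print(label, len(members))  # circle0 8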

My mapper code:

#!/usr/bin/python
import os
import sys

#try:
#    filename = os.environ['mapreduce_map_input_file']
#except KeyError:
#    filename = os.environ['map_input_file']

circleSizes = []
first = 1
for line in sys.stdin:
    # emit the sizes collected for the previous record before starting a new one
    if first == 0:
        print("filename", "\t", circleSizes)
    first = 0
    circleSizes = []

    line = line.strip()
    line = line.split()
    # circle size = number of members; the first field is the circle label
    val = len(line) - 1
    circleSizes.append(val)

# emit the last record as well
if first == 0:
    print("filename", "\t", circleSizes)
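
For reference, the input-file name that the commented-out block tries to read is exposed by Hadoop Streaming through the job configuration, with dots turned into underscores. A minimal sketch, assuming a Streaming environment (the "unknown" fallback is just illustrative):

import os

# mapreduce.map.input.file becomes mapreduce_map_input_file in the environment;
# older Hadoop versions exposed it as map_input_file
filename = os.environ.get("mapreduce_map_input_file",
                          os.environ.get("map_input_file", "unknown"))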

My reducer code:

#!/usr/bin/python
import sys

maxcircles = []
node = []
for l in sys.stdin:
    l = l.strip()
    file, circles = l.split("\t")

    # the mapper's value arrives as a printed Python list, e.g. "[8, 57, 13]";
    # strip the brackets and commas, then convert the sizes back to integers
    # (comparing them as strings would give a lexicographic, not numeric, max)
    circles = circles.strip()
    circles = circles.strip("[")
    circles = circles.strip("]")
    circles = circles.replace(",", "")
    circles = [int(c) for c in circles.split()]

    maxcircles.append(max(circles))
    node.append(file)

index = maxcircles.index(max(maxcircles))
print(node[index], " ", max(maxcircles))
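
A quick way to sanity-check the reducer by hand is to feed it one record in the mapper's format (the sizes below are the circle sizes of the sample file; the tab matters because the script splits on it):

echo -e "filename\t[8, 57, 13, 7, 26, 9, 58]" | python3 /home/aosaf/Documents/semester5/MMDS/reducer.py

which should print filename and 58.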

The command I'm using to run:

hadoop jar /usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-3.2.1.jar -file /home/aosaf/Documents/semester5/MMDS/mapper.py -mapper "python3 /home/aosaf/Documents/semester5/MMDS/mapper.py" -file /home/aosaf/Documents/semester5/MMDS/reducer.py -reducer "python3 /home/aosaf/Documents/semester5/MMDS/reducer.py" -input /a11/input/* -output /a11/out3
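
(The log below warns that -file is deprecated; the same job with the generic -files option should look something like the following, with the shipped scripts then referenced by their base names.)

hadoop jar /usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-3.2.1.jar -files /home/aosaf/Documents/semester5/MMDS/mapper.py,/home/aosaf/Documents/semester5/MMDS/reducer.py -mapper "python3 mapper.py" -reducer "python3 reducer.py" -input /a11/input/* -output /a11/out3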

The output I'm getting:

2022-10-01 18:45:01,400 WARN streaming.StreamJob: -file option is deprecated, please use generic option -files instead.
packageJobJar: [/home/aosaf/Documents/semester5/MMDS/mapper.py, /home/aosaf/Documents/semester5/MMDS/reducer.py, /tmp/hadoop-unjar2088578457222702206/] [] /tmp/streamjob5489278541060527988.jar tmpDir=null
2022-10-01 18:45:02,534 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
2022-10-01 18:45:02,713 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
2022-10-01 18:45:02,913 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/hadoopuser/.staging/job_1664614654855_0014
2022-10-01 18:45:03,035 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2022-10-01 18:45:03,130 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2022-10-01 18:45:03,153 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2022-10-01 18:45:03,239 INFO mapred.FileInputFormat: Total input files to process : 50
2022-10-01 18:45:03,451 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2022-10-01 18:45:03,872 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2022-10-01 18:45:03,879 INFO mapreduce.JobSubmitter: number of splits:50
2022-10-01 18:45:04,012 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2022-10-01 18:45:04,032 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1664614654855_0014
2022-10-01 18:45:04,032 INFO mapreduce.JobSubmitter: Executing with tokens: []
2022-10-01 18:45:04,259 INFO conf.Configuration: resource-types.xml not found
2022-10-01 18:45:04,260 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2022-10-01 18:45:04,339 INFO impl.YarnClientImpl: Submitted application application_1664614654855_0014
2022-10-01 18:45:04,400 INFO mapreduce.Job: The url to track the job: http://aosaf:8088/proxy/application_1664614654855_0014/
2022-10-01 18:45:04,402 INFO mapreduce.Job: Running job: job_1664614654855_0014
2022-10-01 18:45:11,514 INFO mapreduce.Job: Job job_1664614654855_0014 running in uber mode : false
2022-10-01 18:45:11,516 INFO mapreduce.Job:  map 0% reduce 0%
2022-10-01 18:45:25,727 INFO mapreduce.Job:  map 6% reduce 0%
2022-10-01 18:45:26,737 INFO mapreduce.Job:  map 12% reduce 0%
2022-10-01 18:45:40,866 INFO mapreduce.Job:  map 24% reduce 0%
2022-10-01 18:45:53,033 INFO mapreduce.Job:  map 26% reduce 0%
2022-10-01 18:45:54,040 INFO mapreduce.Job:  map 36% reduce 0%
2022-10-01 18:46:06,132 INFO mapreduce.Job:  map 44% reduce 0%
2022-10-01 18:46:07,137 INFO mapreduce.Job:  map 46% reduce 0%
2022-10-01 18:46:16,235 INFO mapreduce.Job:  map 52% reduce 0%
2022-10-01 18:46:17,242 INFO mapreduce.Job:  map 56% reduce 15%
2022-10-01 18:46:23,306 INFO mapreduce.Job:  map 56% reduce 19%
2022-10-01 18:46:27,338 INFO mapreduce.Job:  map 62% reduce 19%
2022-10-01 18:46:28,345 INFO mapreduce.Job:  map 66% reduce 19%
2022-10-01 18:46:29,361 INFO mapreduce.Job:  map 66% reduce 21%
2022-10-01 18:46:35,419 INFO mapreduce.Job:  map 66% reduce 22%
2022-10-01 18:46:38,454 INFO mapreduce.Job:  map 74% reduce 22%
2022-10-01 18:46:39,459 INFO mapreduce.Job:  map 76% reduce 22%
2022-10-01 18:46:41,473 INFO mapreduce.Job:  map 76% reduce 25%
2022-10-01 18:46:50,544 INFO mapreduce.Job:  map 86% reduce 25%
2022-10-01 18:46:53,559 INFO mapreduce.Job:  map 86% reduce 29%
2022-10-01 18:47:00,659 INFO mapreduce.Job:  map 92% reduce 29%
2022-10-01 18:47:01,664 INFO mapreduce.Job:  map 96% reduce 29%
2022-10-01 18:47:05,694 INFO mapreduce.Job:  map 98% reduce 32%
2022-10-01 18:47:06,697 INFO mapreduce.Job:  map 100% reduce 32%
2022-10-01 18:47:06,698 INFO mapreduce.Job: Task Id : attempt_1664614654855_0014_r_000000_0, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:326)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:539)
    at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:134)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:454)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:393)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)

2022-10-01 18:47:07,719 INFO mapreduce.Job:  map 100% reduce 0%
2022-10-01 18:47:11,744 INFO mapreduce.Job: Task Id : attempt_1664614654855_0014_r_000000_1, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:326)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:539)
    at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:134)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:454)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:393)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)

2022-10-01 18:47:16,776 INFO mapreduce.Job: Task Id : attempt_1664614654855_0014_r_000000_2, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:326)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:539)
    at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:134)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:454)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:393)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)

2022-10-01 18:47:22,818 INFO mapreduce.Job:  map 100% reduce 100%
2022-10-01 18:47:22,828 INFO mapreduce.Job: Job job_1664614654855_0014 failed with state FAILED due to: Task failed task_1664614654855_0014_r_000000
Job failed as tasks failed. failedMaps:0 failedReduces:1 killedMaps:0 killedReduces: 0

2022-10-01 18:47:22,928 INFO mapreduce.Job: Counters: 40
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=11498390
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=4812469
        HDFS: Number of bytes written=0
        HDFS: Number of read operations=150
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=0
        HDFS: Number of bytes read erasure-coded=0
    Job Counters
        Failed reduce tasks=4
        Launched map tasks=50
        Launched reduce tasks=4
        Data-local map tasks=50
        Total time spent by all maps in occupied slots (ms)=536039
        Total time spent by all reduces in occupied slots (ms)=81499
        Total time spent by all map tasks (ms)=536039
        Total time spent by all reduce tasks (ms)=81499
        Total vcore-milliseconds taken by all map tasks=536039
        Total vcore-milliseconds taken by all reduce tasks=81499
        Total megabyte-milliseconds taken by all map tasks=548903936
        Total megabyte-milliseconds taken by all reduce tasks=83454976
    Map-Reduce Framework
        Map input records=176824
        Map output records=0
        Map output bytes=0
        Map output materialized bytes=300
        Input split bytes=4730
        Combine input records=0
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=12411
        CPU time spent (ms)=42860
        Physical memory (bytes) snapshot=16349417472
        Virtual memory (bytes) snapshot=127486652416
        Total committed heap usage (bytes)=14432075776
        Peak Map Physical memory (bytes)=341741568
        Peak Map Virtual memory (bytes)=2555748352
    File Input Format Counters
        Bytes Read=4807739
2022-10-01 18:47:22,928 ERROR streaming.StreamJob: Job not successful!
Streaming Command Failed!

How I ran the job locally:

cat /home/aosaf/Documents/semester5/MMDS/facebook/* | python3 /home/aosaf/Documents/semester5/MMDS/mapper.py | python3  /home/aosaf/Documents/semester5/MMDS/reducer.py
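
One difference from the real job: Hadoop sorts the map output by key before handing it to the reducer, so a closer local simulation puts a sort between the two scripts (it changes nothing here, since the key is always "filename", but it matters once real keys are used):

cat /home/aosaf/Documents/semester5/MMDS/facebook/* | python3 /home/aosaf/Documents/semester5/MMDS/mapper.py | sort | python3 /home/aosaf/Documents/semester5/MMDS/reducer.py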

And what the output looks like:

filename    308

which shows the size of the biggest circle among all nodes.
