I am trying to write a code that would calculate average temperature (reducer.py) based on ncdc weather.
0057011060999991928010112004+67500+012067FM-12+001199999V0202001N012319999999N0500001N9+00281+99999102171ADDAY181999GF108991999999999999001001MD1710261+9999MW1801
0062011060999991928010206004+67500+012067FM-12+001199999V0201801N00931220001CN0200001N9+00281+99999100901ADDAA199002091AY121999GF101991999999017501999999MD1810461+9999
0108011060999991928010212004+67500+012067FM-12+001199999V0201601N009319999999N0100001N9+00111+99999100062ADDAY171999GF108991999011012501001001MD1810542+9999MW1681EQDQ01+000042SCOTLCQ02+100063APOSLPQ03+000542APC3
0087011060999991928010306004+67500+012067FM-12+001199999V0202001N022619999999N0100001N9+00501+99999098781ADDAA199001091AY161999GF108991999011004501001001MD1310061+9999MW1601EQDQ01+000042SCOTLC
0057011060999991928010312004+67500+012067FM-12+001199999V0202301N01541004501CN0040001N9+00001+99999098951ADDAY161999GF108991081061004501999999MD1210201+9999MW1601
#!/usr/bin/env python
import sys
(last_key, max_val) = (None, -sys.maxint)
for line in sys.stdin:
(key, val) = line.strip().split("\t")
if last_key and last_key != key:
print "%s\t%s" % (last_key, max_val)
(last_key, max_val) = (key, int(val))
else:
(last_key, max_val) = (key, max(max_val, int(val)))
if last_key:
print "%s\t%s" % (last_key, max_val)
First of all, your shown data has no tabs, so it's not clear why you've shown code that splits lines on tabs and finds the max. Not an average.
To find an average, you'll need to collect all seen values into a list (
values.append(int(val))), then you canfrom statistics import meanand callmean(values)at the end of the loopI'd highly suggest that you use
mrjoborpysparkinstead