Hi Joe,
Below is the Python code I came up. It uses a CSV file as the input and assumes that the first row of the CSV are the field names so it skips that row. Since I cannot attach a file here, I'll put the sample CSV file in a code block below the Python code.
Python Code:
#------------------------------------------------------------------------------- # Filname: calcmedian.py # Description: This script reads data in "bins" (groups or # ranges) and calculates a median. Sample data is # population by 5-year age groups from the US Census # American Community Survey. # # Sample file AgeBins.csv: contains age group data for # males in Maricopa County, Arizona from the # 2017 American Community Survey 5-year estimates # See table B01002 for median age comparison/check # # First column contains the starting # number for the range (i.e. 0-4 age group is 0; # 5-9 is 5; 10-14 is 10, etc) # Second column is the population for that group # # Author: Jami Dennis # Created: 1/10/2019 #------------------------------------------------------------------------------- from __future__ import division import csv # Function to read in the CSV containing bin data def read(filename): # initialize the directory agebins = {} with open(filename, 'U') as csvfile: reader = csv.reader(csvfile) next (csvfile) #skip first row headers # read in each row into the dictionary # make sure to set them as integers for row in reader: agebins[int(row[0])] = int(row[1]) # print(row) # this just prints your inputs - used for testing return agebins # have the user enter the filename agebins = read(raw_input('Enter the filename: ')) # intialize a running total to aggregate the values total = 0 # determin the half way point simply by divding the total # population by 2 median_index = (sum(agebins.values()))/2 keyList = sorted(agebins.keys()) # initialize a dictionary for the cumulative values for each bin aggList = {0:0} # step through the sorted dictionary until the cumulative value # exceeds the mid point (median_index) for i,value in enumerate(keyList): # calculate the cumulative value total += agebins[value] # add cumulative value to aggList aggList[i]= total # check if the cumulative value has exceeded median_index if total > median_index: # get the cumulative pop for the bin just before # we exceeded the median_index using aggList[i-1] # and subtract that from the median_index to find out # how much much into the bin it is to reach the # median_index howMany = median_index - aggList[i-1] # get the population in the bin (ageBin) for # the keyList[i] bin (the one before we exceeded # the median_index ageBin= agebins[keyList[i]] # determine the percent of how far into this bin # the median_index falls pctInto = howMany / ageBin # determine the span of the bin using # keyList[i+1]-keyList[i] # and multiply it by the percent into the bin multiSpan = (keyList[i+1]-keyList[i]) * pctInto # cacluate the median by adding the result to # the number for the beginning of the range median = keyList[i] + multiSpan print 'the median is: ', "{:.1f}".format(median) # we found the median so break out of the for loop break # mission accomplished This is the sample CSV file. It is bins of Age data.
BinStart,Amount 0,142040 5,146847 10,148778 15,144749 20,146136 25,155721 30,147226 35,135861 40,137356 45,133662 50,130913 55,118030 60,103880 65,91697 70,66802 75,47965 80,30810 85,26991
Again, I'm no expert, so if anyone has ideas for improving these scripts, please feel free to share!