GE实习之Target Cost

从csa.aviation.ge.com/tcta/index#/targetSummary网址下载TCT的Input数据,处理Input数据并画图保存至pdf文件。

下载Input数据

  • 步骤
    Analytics –> Target Cost Tool –> Target Baseline Summary Report –> Baseline Summary Criteria Selection (Fleet Model : GE90-Growth ; Fleet Year : 2017 ; Cost Type : All In Cost) –> Please Set The Configuration (SV Parameters : value : SHOP : TEXL offload/Wales ; SV Number : 1/2/3/4/5 ; Region : Non SEO/SEO/SEO Plus;Block:2012、2015、EIS Old、EIS New)
    共下载了100个.xlsx文件,命名格式为 :Region-SHOP-SV Number,例如 :Non SEO-TEXL offload-1

处理Input数据

  • 处理规则
    1.处理Input中的100个.xlsx文件,生成Output中的100个.xlsx文件;Input中的每个.xlsx文件有两个sheet,分别为NON-BUM Target Summary和BUM Target Summary,每个sheet的各列为Row labels(代表Sub Module)、II、III、IV、SA;将两个sheet的各Sub Module的对应level的花费相加,生成Output中的各.xlsx文件的Target工作表。
    2.将Output中的100个.xlsx文件合并为Output/Target_2017.xlsx;将TEXL 2017 60 PR SV Cost Analytics May 4.xlsx中的Target_Summary数据另存为Target_2016.xlsx文件;将Output/Target_2017和Target_2016.xlsx合并为Target.xlsx文件。
    3.逆转换列,将II、III、IV、SA逆转换为WS列和Module Target Cost列,存为Target_tran.xlsx文件。
    4.TCT.py
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    33
    34
    35
    36
    37
    38
    39
    40
    41
    42
    43
    44
    45
    46
    47
    48
    49
    50
    51
    52
    53
    54
    55
    56
    57
    58
    59
    60
    61
    62
    63
    64
    65
    66
    67
    68
    69
    import pandas as pd
    import os
    #Statistic Input file name
    file_name_Input = []
    for file in os.Listdir('Input'):
    file_name_Input.append(file)
    print('the number of file:',len(file_name_Input))

    #求和时若有一个为空无法相加时,令值为另外一个
    def NONBUM_add_BUM(data, level, NONBUM, BUM):
    data[level] = np.where(data[NONBUM].isnull(), data[BUM], np.where(data[BUM].isnull(), data[NONBUM], data[NONBUM] + data[BUM]))

    #Handle Input per Excel
    for excel_name in file_name_Input:
    data_NON_BUM = pd.read_excel('Input/' + excel_name, 'NON-BUM Target Summery')
    data_BUM = pd.read_excel('Input/' + excel_name, 'BUM Target Summery')
    data_NON_BUM = pd.DataFrame(data_NON_BUM[data_NON_BUM['Row labels'] != 'Grand Total'])
    data_BUM = pd.DataFrame(data_BUM[data_BUM['Row labels'] != 'Grand Total'])
    #merge
    data = pd.merge(data_NON_BUM, data_BUM, on = 'Row labels', how = 'left')
    NONBUM_add_BUM(DATA, 'II', 'II_x', 'II_y')
    NONBUM_add_BUM(DATA, 'III', 'III_x', 'III_y')
    NONBUM_add_BUM(DATA, 'IV', 'IV_x', 'IV_y')
    NONBUM_add_BUM(DATA, 'SA', 'SA_x', 'SA_y')
    columns = ['NA','II_x','II_y','III_x','III_y','IV_x','IV_y','SA_x','SA_y']
    for col in columns:
    del data[col]
    #Target Add New Columns
    writer = pd.Excelwriter('Output/Sub' + excel_name)
    data_NON_BUM.to_excel(writer, 'NON-BUM Target Summery', index = False)
    data_BUM.to_excel(writer, 'BUM Target Summery', index = False)
    data_Target = pd.DataFrame(data[data['Row labels'] != 'Grand Total'])
    index = ecxel_name.find('-')
    index2 = excel_name.find('-', index + 1)
    index3 = excel_name.find('-', index2 + 1)
    data_Target['Shop'] = excel_name[0:index]
    data_Target['Block'] = excel_name[index + 1:index2]
    data_Target['Region'] = excel_name[index2 + 1:index3]
    data_Target['SV Count'] = excel_name[index3 + 1:index3 + 2]
    data_Target['Year'] = 2017
    data_Target.to_excel(writer,'Target',index = False)
    writer.save()
    #statistic Output Sub File Name
    file_name_Output = []
    for file in os.listdir('Output/Sub'):
    file_name_Output.append(file)
    #Combine Output Sub File
    data_Target_Total = pd.DataFrame()
    for excel_name in file_name_Output:
    data_Target = pd.read_excel('Output/Sub/' + excel_name,'Target')
    data_Target_Total = pd.concat([data_Target_Total, data_Target])
    #Generate the final file
    writer = pd.ExcelWriter('Output/Target_2017.xlsx')
    data_Target_Tatol.to_excel(writer, 'Target', index = False)
    writer.save()
    #Combine the data of 2017 and 2016
    target_2017 = pd.read_excel('Output/Target_2017.xlsx')
    target_2016 = pd.read_excel('Target_2016.xlsx')
    target_2016['Year'] = 2016
    target = pd.concat([target_2017, target_2016])
    writer = pd.ExcelWriter('Target.xlsx')
    target.to_excel(writer,'Target', index = False)
    writer.save()
    #generated Target_tran.xlsx
    target = pd.read_excel('Target.xlsx')
    target_tran = pd.melt(target,id_vars = ['Roew labels', 'Shop', 'Block', 'Region', 'SV Count', 'Year'], var_name = 'WS', value_name = 'Module Target Cost')
    none_MTC = (target_tran['Module Target Cost'].isnull())|(targte_tran['Module Target Cost'].apply(lambda x:str(x).isspace()))
    target_tran = target_tran[~none_MTC]
    target_tran.to_excel('Target_tran.xlsx', index = False)

画图存入pdf文件

  • 简介:信息可视化(也叫绘图)是数据分析中最重要的工作之一。做一个可交互的数据可视化也许是工作的最终目标。
    matplotlib是一个用于创建出版质量图表的桌面绘图包(主要是2D方面)。matplotlib支持各种操作系统上许多不同的GUI后端,而且还能将图片导出为各种常见的矢量(vector)和光栅(raster)图:PDF、SVG、JPG、PNG、BMP、GIF等。
  • 任务目标:按Region、Shop、SV Count的组合画图分析,生成TCT.pdf文件的代码TCT_Draw_pdf.py如下所示:
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    33
    34
    35
    36
    37
    38
    39
    40
    41
    42
    43
    44
    45
    46
    47
    48
    49
    50
    51
    52
    53
    54
    55
    56
    57
    58
    59
    60
    61
    import pandas as pd
    import numpy as np
    import matplotlib.pyploy as plt
    from matplotlib.backends.backend_pdf import PdfPages
    target = pd.read_excel('Target_tran.xlsx')
    #strip
    def columns_strip(name_col):
    target[name_col] = target[name_col].map(str)
    target[name_col] = target[name_col].map(str.strip)
    columns = ['Block', 'Region', 'Shop']
    for nae_col in columns:
    columns_strip(name_col)
    #Draw
    Region_List = ['Non SEO', 'SEO', 'SEO Plus']
    Shop_List = ['TEXL', 'Wales']
    SV_Count_List = [1, 2, 3, 4, 5]
    #pdf
    with PdfPages('TCT.pdf') as pdf:
    for Region in Region_List:
    for Shop in Shop_List:
    for SV_Count in SV_Count_List:
    try:
    #Filter
    target_sub = target.loc[(targete['Region'] == Region)&(target['Shop'] == shop)&(target['SV Count'] == SV_Count)]
    picture_name = Region + '-' + 'Shop' + '-' + str(SV_Count)
    #plot
    data = pd.DataFrame(pd.pivot_table(target_sub, index = ["Row labels", "Year"], columns = ["Block"], values = ["Module Target Cost"], aggfunc = [np.mean]))
    data.plot(kind = 'bar')
    #x轴标签重构
    xlabels = []
    for i in range(len(data.index.labels[0])):
    if str(data.index.levels[0][data.index.labels[0][i]]) == str(data.index.levels[0][data.index.labels[0][i-1]]):
    xlabels.append(str(data.idex.levels[1][data.index,labels[1][i]]))
    else:
    xlabels.append(str(data.idex.levels[0][data.index,labels[0][i]]) + ' ' + str(data.idex.levels[1][data.index,labels[1][i]]))
    #修改边距,wspace和hspace用于控制宽度和高度的百分比
    plt.subplots_adjust(left = 0.20, wspace = 0.25, hspace = 0.25, bottom = 0.40, top = 0.92)#Adjust the display position of the graph to avoid incomplete display on abscissa
    #标题
    plt.title(picture_name)
    #图例
    plt.legend(["2012", "2015", "EIS New", "EIS Old"], title = "Block")
    #x轴名称
    plt.xlabel("Sub Module,Year")
    #y轴名称
    plt.ylabel("Arg Module Target Cost")
    plt.xticks(fontsize = 6)
    #获得当前的Axes对象ax
    ax = plt.gca()
    #设置x轴刻度标签
    ax.set_xticklabels(xlabels)
    #将y轴刻度标签改为货币格式
    fmt = '${x:,.0f}'
    tick = ticker.strMethodFormatter(fmt)
    ax.yaxis.set_major_formatter(tick)
    #画网格
    plt.grid(linestyle = 'dotted')
    #将当前图表保存到文件
    pdf.savefig()
    plt.close()
    except:
    pass

##计算2016年和2017年的差值
按Row labels、Shop、Block、Region、SV Count、WS计算2016年和2017年的花费差值

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
import pandas as pd
target_2016 = pd.read_excel('Target_2016.xlsx')
target_2017 = pd.read_excel('Output/Target_2017.xlsx')
#Unpivot target_2016 ws and Module Target Cost Columns
target_2016['Year'] = 2016
target_2016 = pd.melt(target_2016, id_vars = ['Row labels', 'Shop', 'Block', 'Region', 'SV Count', 'Year'], var_name = 'WS', value_name = 'Module Target Cost')
none_MTC = (target_2016['Module Target Cost'].isnull())|(targte_2016['Module Target Cost'].apply(lambda x:str(x).isspace()))
target_2016 = target_2016[~none_MTC]
target_2016.to_excel('Delta/Target_2016_unpivot.xlsx', index = False)
#Unpivot target_2017 ws and Module Target Cost Columns
target_2017 = pd.melt(target_2017, id_vars = ['Row labels', 'Shop', 'Block', 'Region', 'SV Count', 'Year'], var_name = 'WS', value_name = 'Module Target Cost')
none_MTC = (target_2017['Module Target Cost'].isnull())|(targte_2017['Module Target Cost'].apply(lambda x:str(x).isspace()))
target_2017 = target_2017[~none_MTC]
target_2017.to_excel('Delta/Target_2017_unpivot.xlsx', index = False)
#target_2016_unpivot and target_2017_unpivot
target_2016_u = pd.read_excel('Delta/Target_2016_unpivot.xlsx')
target_2017_u = pd.read_excel('Delta/Target_2017_unpivot.xlsx')
#strip
def columns_strip(target, name_col):
target[name_col] = target[name_col].map(str)
target[name_col] = target[name_col].map(str.strip)
columns = ['Row labels', 'Block', 'Region', 'Shop', 'WS']
for name_col in columns:
columns_strip(target_2016_u, name_col)
columns_strip(target_2016_u, name_col)
target = pd.merge(target_2016_u, target_2017_u, on = ['Row labels', 'Shop', 'Block', 'Region', 'SV Count', 'WS'], how = 'left')
target['Delta'] = target['Module Target Cost_y'] - target['Module Target Cost_x']
del target['Year_x']
del target['Year_y']
#Renamed
target.rename(columns = {'Row labels':'Sub Module'}, inplace = True)
target.rename(columns = {'Module Target Cost_x':'2016_cost'}, inplace = True)
target.rename(columns = {'Module Target Cost_y':'2017_cost'}, inplace = True)
target.to_excel('Delta/Target_Delta.xlsx', index = False)