Pandas crosstab: change the order of columns that are named as formatted dates (mmm yy)

I cannot find a way to order the columns of a pandas crosstab. Specifically, my columns are formatted dates (mmm yy), and I need them ordered by the underlying date values rather than sorted alphabetically on the 3-letter month name (mmm).

Python 3.3

pandas 0.12.0

f_dtflt is a pandas DataFrame.

Here are the details of my code:

pd.crosstab(f_dtflt.EW_REGIONCOLLSITE, f_dtflt.COLLECTION_DATE.apply(lambda x: x.strftime("%b %y")), margins=True)

The output is:

COLLECTION_DATE Apr 13 Aug 13 Dec 12 Feb 13 Jan 13 Jul 13 Jun 13 
EW_REGIONCOLLSITE               
EAST     1964 2092 2280 2272 2757 2113 1902 
WEST     2579 2011 1003 2351 2216 1506 1823 
All     4543 4103 3283 4623 4973 3619 3725 

COLLECTION_DATE Mar 13 May 13 Nov 12 Oct 12 Sep 13 All 
EW_REGIONCOLLSITE             
EAST     1682 1981 2108  825  975 22951 
WEST     2770 3014  407  42  888 20610 
All     4452 4995 2515  867 1863 43561

I would like the columns to be ordered by date, ascending: Oct 12, Nov 12, ... Jan 13, ... Sep 13.

f_dtflt.COLLECTION_DATE has dtype datetime64[ns].

I know that I could format the dates as yy-mm (e.g. 13-01) so that they sort correctly, but these labels will be used in a report and that is a compromise I would rather not make.

I am new to Python and pandas, so please help a newbie by adding any pointers to your responses! Thank you very much.

Edit: Method 1, in response to the first part of @Andy's answer

I tried to implement Andy's suggestion, and here is more information on that attempt. There is a problem with step 3.

1) I run the following line to see what the dates look like. It produces values such as '2012-10' for the collection date. (Are they "prettified" by print?)

print(pd.DatetimeIndex(f_dtflt['COLLECTION_DATE']).to_period('M'))

2) When the statement above is passed to crosstab, the column labels come out as integers such as 513, 514, etc. (Are the month values being converted to integers?)

table1=pd.crosstab(f_dtflt.EW_REGIONCOLLSITE, pd.DatetimeIndex(f_dtflt['COLLECTION_DATE']).to_period('M'), margins=True)

Here is the output:

col_0    513 514 515 516 517 518 519 520 521 522 
EW_REGIONCOLLSITE                
EAST    825 2108 2280 2757 2272 1682 1964 1981 1902 2113 
WEST    42 407 1003 2216 2351 2770 2579 3014 1823 1506 
All    867 2515 3283 4973 4623 4452 4543 4995 3725 3619 

col_0    523 524 All 
EW_REGIONCOLLSITE      
EAST    2092 975 22951 
WEST    2011 888 20610 
All    4103 1863 43561

3) When I run the following code, it throws an error that 'int' object has no attribute 'strftime':

table1.columns = table1.columns.map(lambda x: x.strftime("%b %y"))

I played with this a lot, and here are some of my notes:

# This runs and creates an array of strings: '513' etc. 
pd.to_datetime(table1.columns.map(str), unit='M') 

# The last entry in table1.columns is "All" and needs to be removed. Hence [:-1] slice. 
# This also runs but seems to give years in 1630's. 
pd.DatetimeIndex(table1.columns[:-1].map(str)).to_datetime('M') 

# This does not run because it says object is immutable 
table1.columns[:-1]=pd.DatetimeIndex(table1.columns[:-1].map(str)).to_datetime('M') 

# This also runs but the output is weird. It seems to give an array of both dates and -1 
table1.columns.reindex(pd.DatetimeIndex(table1.columns[:-1].map(str)).to_datetime('M')) 

# Does not run: DatetimeIndex() must be called with a collection of some kind, '513' was passed 
table1.columns = table1.columns.map(lambda x: pd.DatetimeIndex(str(x)).strftime("%b %y")) 

# Does not run: DatetimeIndex object is not callable 
table1.rename(columns=pd.DatetimeIndex(table1.columns[:-1].map(str)).to_datetime('M'))
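The integer labels in step 2 (513, 514, ...) look like monthly period ordinals, i.e. the number of months since January 1970, which makes 513 correspond to October 2012. A minimal sketch, assuming that interpretation, of turning such labels back into "mmm yy" strings while leaving the "All" margin untouched. The `int_labels` list is a hypothetical stand-in for `table1.columns`, and month abbreviations assume an English locale:

```python
import pandas as pd

# Hypothetical stand-in for table1.columns: period ordinals plus the margin label.
int_labels = [513, 514, 515, 524, "All"]

def ordinal_to_label(col):
    """Convert a monthly period ordinal to 'Mon yy'; pass other labels through."""
    if isinstance(col, int):
        return pd.Period(ordinal=col, freq="M").strftime("%b %y")
    return col

labels = [ordinal_to_label(c) for c in int_labels]
print(labels)  # ['Oct 12', 'Nov 12', 'Dec 12', 'Sep 13', 'All']
```

Assigning a list like this back with `table1.columns = labels` would replace the integer headers, since assigning a whole new list of column labels is allowed even though an Index cannot be modified in place.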

4) This works for labeling the columns in the crosstab:

table1.columns.name = 'COLLECTION_DATE'

Method 2

@Andy made a second suggestion. I played around with it but could not get it to work. A big part of the issue is my lack of familiarity with Python, pandas, and numpy. I made notes for myself as I tried to solve this. Here are my notes:

# Working with a new concept 
# This creates column titles of 12 10, 12 11, etc. 
table1=pd.crosstab(f_dtflt.EW_REGIONCOLLSITE, f_dtflt.COLLECTION_DATE.apply(lambda x: x.strftime("%y %m")), margins=True) 

# This throws an error that yb is not defined 
table1.columns.map(lambda yb: "%s %s" % (y, b) for y, b in yb.split()) 

# Tried to simplify and see what happens. Runs and creates an array of lists such as [['12, '10'], ['12', '11']...] 
table1.columns.map(lambda x: x.split()) 

# Trying a different approach. This creates a numpy array of datetimes. 
tempholder=table1.columns[:-1].map(lambda x: datetime.datetime(year=int(x[0:2]), month=int(x[3:]), day=1)) 

# Noted that f_dtflt['COLLECTION_DATE'] was a dtype of datetime64[ns] but tempholder was dtype object. So had issue. 
# Convert to datetime64 
# Get error: Out of bounds nanosecond timestamp: 12-10-01 00:00:00 
tempholder=pd.to_datetime(tempholder) 

# Tempholder is an array of datetimes from the datetime module. I used the pandas date function above. 
# Need to change that and use python datetime module function. 
# Does not work: 'numpy.ndarray' object has no attribute 'apply'... 
# this is a pandas function which does not work on a numpy array. 
tempholder.apply(lambda x: x.strftime('%b %y')) 

# This works for numpy array but I can't tell what it contains. 
# print(tempholder) gives <map object at 0x0000000026C04F28> 
# tempholder gives Out[169]: <builtins.map at 0x26c04f28> 
tempholder=map(lambda x: x.strftime('%b %y'), tempholder)
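Two things in the notes above can be untangled with the standard library alone. In Python 3, `map` returns a lazy iterator, which is why printing it shows `<map object ...>`; wrapping it in `list(...)` or using a list comprehension materializes the values. And `datetime.strptime` with the format `"%y %m"` parses the two-digit year directly, avoiding the year-12 problem behind the "Out of bounds nanosecond timestamp" error. A minimal sketch, where `labels` is a hypothetical stand-in for `table1.columns` and month abbreviations assume an English locale:

```python
from datetime import datetime

# Hypothetical stand-in for table1.columns built with strftime("%y %m") and margins=True.
labels = ["12 10", "12 11", "13 01", "All"]

def pretty(label):
    """Turn 'yy mm' into 'Mon yy'; leave anything else (such as 'All') unchanged."""
    try:
        return datetime.strptime(label, "%y %m").strftime("%b %y")
    except ValueError:
        return label

# A list comprehension yields a real list rather than a lazy map object.
pretty_labels = [pretty(label) for label in labels]
print(pretty_labels)  # ['Oct 12', 'Nov 12', 'Jan 13', 'All']
```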

Source

2013-10-21 Stacy L. Gardner

Why not sort your 'COLLECTION_DATE' column first and then run 'crosstab'? – EdChum

@EdChum, yes, I tried that. Crosstab forces its own sort order. I could not find an argument for crosstab that lets me control the sort order or turn it off. –

I approached this problem from a slightly different angle and created a function that can be used as a general method for ordering the columns of a crosstab in pandas. It may also work for a pivot table, but I have not tested that, nor have I looked into the details. I think it could also be used to order row labels, but I have not tried that either.

This creates a crosstab with column labels such as "12 10_Oct 12" and "12 11_Nov 12". The prefix effectively forces crosstab's alphabetical sorting to work in my favor. The function then strips the alphabetical section of each label, up to and including the "_", and keeps the label that I want to use.

table_1=pd.crosstab(f_dtflt.EW_REGIONCOLLSITE, f_dtflt.COLLECTION_DATE.apply(lambda x: x.strftime("%y %m_%b %y")), margins=True)

Output:

"COLLECTION_DATE 12 10_Oct 12 12 11_Nov 12 12 12_Dec 12 13 01_Jan 13 
EW_REGIONCOLLSITE               
EAST      825   2108   2280   2757 
WEST       42   407   1003   2216 
All       867   2515   3283   4973 

COLLECTION_DATE 13 02_Feb 13 13 03_Mar 13 13 04_Apr 13 13 05_May 13 
EW_REGIONCOLLSITE               
EAST      2272   1682   1964   1981 
WEST      2351   2770   2579   3014 
All      4623   4452   4543   4995 

COLLECTION_DATE 13 06_Jun 13 13 07_Jul 13 13 08_Aug 13 13 09_Sep 13 
EW_REGIONCOLLSITE               
EAST      1902   2113   2092   975 
WEST      1823   1506   2011   888 
All      3725   3619   4103   1863 

COLLECTION_DATE  All 
EW_REGIONCOLLSITE   
EAST    22951 
WEST    20610 
All    43561 "

Function and call:

def clean_label(label_list, margins=False):  # boolean default; the string 'False' would be truthy
    ''' This function takes the column index list from a crosstab (or pivot table?) in pandas and removes the 
    part of the label before and including the "_". This allows the user to order the columns manually by creating 
    an alphabetical index followed by "_" and then the label that they would like to use. For example, a label such as 
    ['a_Positive', 'b_Negative'] will be converted to ['Positive', 'Negative']. Another example would be to order dates 
    in a table from ['12 10_Oct 12', '12 11_Nov 12'] to ['Oct 12', 'Nov 12'] 

    margins = False if the crosstab was created without margins and therefore does not have an "All" at the end of the list 
    margins = True if the crosstab was created with margins and therefore has an "All" at the end of the list 
    ''' 
    corrected_list=list() 

    # If one creates margins in pivot/crosstab, will get the last column of "All" 
    # This has to be removed from the following code or it will throw an error. 
    if margins: 
     convert_list = label_list[:-1] 
    else: 
     convert_list = label_list 

    for l in convert_list: 
     x,y=l.split('_') 
     corrected_list.append(y) 

    if margins: 
     corrected_list.append('Total') # Renames "All" to "Total" 

    return corrected_list 

# Change the labels on the crosstab table 
table_1.columns=clean_label(table_1.columns, margins=True) 

# Change name of columns 
table_1.columns.name = 'Month of Collection' 

# Change name of rows 
table_1.index.name = 'Region'


Output (final table):

"Month of Collection Oct 12 Nov 12 Dec 12 Jan 13 Feb 13 Mar 13 Apr 13 
Region                   
EAST     825 2108 2280 2757 2272 1682 1964 
WEST      42  407 1003 2216 2351 2770 2579 
All      867 2515 3283 4973 4623 4452 4543 

Month of Collection May 13 Jun 13 Jul 13 Aug 13 Sep 13 Total 
Region                
EAST     1981 1902 2113 2092  975 22951 
WEST     3014 1823 1506 2011  888 20610 
All     4995 3725 3619 4103 1863 43561 "
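An alternative to prefixing the labels is to let crosstab sort alphabetically and then select the columns in chronological order afterwards. The sketch below is not the method used above, just one possible variant. It assumes a recent pandas version where the `.dt` accessor and `sort_values` are available (neither is guaranteed on 0.12.0), and `df` is a small hypothetical stand-in for `f_dtflt`:

```python
import pandas as pd

# Hypothetical stand-in for f_dtflt.
df = pd.DataFrame({
    "EW_REGIONCOLLSITE": ["EAST", "WEST", "EAST", "WEST", "EAST"],
    "COLLECTION_DATE": pd.to_datetime(
        ["2013-01-15", "2012-10-03", "2012-11-20", "2013-01-02", "2012-10-09"]
    ),
})

# Crosstab on the report labels; the columns come back alphabetically sorted.
labels = df["COLLECTION_DATE"].dt.strftime("%b %y")
table = pd.crosstab(df["EW_REGIONCOLLSITE"], labels, margins=True)

# Derive the chronological label order from the dates themselves.
ordered = (
    df.sort_values("COLLECTION_DATE")["COLLECTION_DATE"]
    .dt.strftime("%b %y")
    .drop_duplicates()
    .tolist()
)

# Select the columns in that order, keeping the "All" margin last.
table = table[ordered + ["All"]]
print(table)
```

This keeps the final labels clean from the start, at the cost of a second pass over the date column to work out the order.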

Source

2013-10-23 16:53:08

If you have the year-month as a string (and it is in the correct order), then you can reverse it:

In [1]: df = pd.DataFrame([['a', 'b']], columns=['12 Mar', '12 Jun']) 

In [2]: df.columns.map(lambda yb: ' '.join(reversed(yb.split()))) 
Out[2]: array(['Mar 12', 'Jun 12'], dtype=object) 

In [3]: df.columns = df.columns.map(lambda yb: ' '.join(reversed(yb.split())))

I had suggested that you could do this with time periods:

pd.DatetimeIndex(f_dtflt['COLLECTION_DATE']).to_period('M')

Then, once you are done, you can clean up the columns to the format you need:

df.columns = df.columns.map(lambda x: x.strftime("%b %y")) 
df.columns.name = 'COLLECTION_DATE'

But this seems to convert the PeriodIndex to int (possibly a bug?).
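For reference, here is a sketch of the period approach end to end. It assumes a recent pandas version in which crosstab keeps the monthly periods as column labels instead of converting them to integers; that is not the case on 0.12.0, as noted above. `df` is a hypothetical stand-in for `f_dtflt`, and margins are left out to keep the column labels uniform:

```python
import pandas as pd

# Hypothetical stand-in for f_dtflt.
df = pd.DataFrame({
    "EW_REGIONCOLLSITE": ["EAST", "WEST", "EAST", "WEST", "EAST"],
    "COLLECTION_DATE": pd.to_datetime(
        ["2013-01-15", "2012-10-03", "2012-11-20", "2013-01-02", "2012-10-09"]
    ),
})

# Monthly periods sort chronologically, so the crosstab columns do too.
months = df["COLLECTION_DATE"].dt.to_period("M")
table = pd.crosstab(df["EW_REGIONCOLLSITE"], months)

# Relabel each period as 'Mon yy' only after the ordering is fixed.
table.columns = [p.strftime("%b %y") for p in table.columns]
table.columns.name = "COLLECTION_DATE"
print(table)
```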

Source

2013-10-21 19:12:08

Andy, your answer helped me get really close. I have added an edit to my original post to show you where the hang-ups are. This is my first time posting, so hopefully this is the right way to do it. –

@StacyL.Gardner I can't see your edit! :( What is the issue? –

It is there now. –
