विल्सन स्कोर अंतराल के पायथन कार्यान्वयन?

How Not to Sort by Average Rating पढ़ने के बाद, मैं उत्सुक था कि अगर किसी के पास बर्नौली पैरामीटर के लिए विल्सन स्कोर आत्मविश्वास अंतराल के निचले बाउंड के पाइथन कार्यान्वयन है?विल्सन स्कोर अंतराल के पायथन कार्यान्वयन?

स्रोत

2012-04-05 Jeff Bauer

अधिक परिशुद्धता के लिए करता है, तो n * पी-कैप * (1-पी-कैप) एक निश्चित दहलीज से नीचे है, 30-35 कहें मैं एक सामान्य distr के बजाय डीएफ: (pos + neg) -2 के साथ एक टी-वितरण का उपयोग करता हूं। किसी भी तरह। बस मेरे दो सेंट हालांकि – luke14free

रेडिट टिप्पणी रैंकिंग, एक विवरण और अजगर कार्यान्वयन पाया जा सकता है here

#Rewritten code from /r2/r2/lib/db/_sorts.pyx 

from math import sqrt 

def confidence(ups, downs): 
    n = ups + downs 

    if n == 0: 
     return 0 

    z = 1.0 #1.44 = 85%, 1.96 = 95% 
    phat = float(ups)/n 
    return ((phat + z*z/(2*n) - z * sqrt((phat*(1-phat)+z*z/(4*n))/n))/(1+z*z/n))

स्रोत

2012-04-05 13:36:33 Steef

यदि आप सिर्फ एक लिंक पोस्ट करने जा रहे हैं, तो इसे एक टिप्पणी में करें। यदि आप इसे उत्तर के रूप में पोस्ट कर रहे हैं, तो सामग्री से अधिक जानकारी प्रदान करें और/या कोड खींचें ताकि सभी को लिंक का पालन करने की आवश्यकता न हो, और अगर लिंक मर जाता है तो भी जवाब का मूल्य होता है। – agf

मैंने स्टीफ के लिंक से कोड जोड़ा है, इसलिए मैं उसका जवाब स्वीकार कर सकता हूं। –

नीचे दिए गए संशोधन को शामिल करने के लिए यह उत्तर सही किया जाना चाहिए! – Vladtn

मुझे लगता है कि यह एक गलत विल्सन कॉल है, क्योंकि आप 1 अप 0 नीचे आप प्राप्त है, तो के लिए विल्सन स्कोर अंतराल का उपयोग करता है NaN क्योंकि आप ऋणात्मक मूल्य पर sqrt नहीं कर सकते हैं।

सही पाया जा सकता है जब लेख How not to sort by average page से गहरे लाल रंग का उदाहरण को देख:

return ((phat + z*z/(2*n) - z * sqrt((phat*(1-phat)+z*z/(4*n))/n))/(1+z*z/n))

स्रोत

2012-04-10 10:33:19 Gullevek

आप वास्तव में एक आत्मविश्वास बाध्य से सीधे जेड की गणना के लिए पसंद है और स्थापित करने से बचना चाहते हैं चाहते हैं, तो numpy/scipy , आप निम्नलिखित स्निपेट,

import math 

def binconf(p, n, c=0.95): 
    ''' 
    Calculate binomial confidence interval based on the number of positive and 
    negative events observed. Uses Wilson score and approximations to inverse 
    of normal cumulative density function. 

    Parameters 
    ---------- 
    p: int 
     number of positive events observed 
    n: int 
     number of negative events observed 
    c : optional, [0,1] 
     confidence percentage. e.g. 0.95 means 95% confident the probability of 
     success lies between the 2 returned values 

    Returns 
    ------- 
    theta_low : float 
     lower bound on confidence interval 
    theta_high : float 
     upper bound on confidence interval 
    ''' 
    p, n = float(p), float(n) 
    N = p + n 

    if N == 0.0: return (0.0, 1.0) 

    p = p/N 
    z = normcdfi(1 - 0.5 * (1-c)) 

    a1 = 1.0/(1.0 + z * z/N) 
    a2 = p + z * z/(2 * N) 
    a3 = z * math.sqrt(p * (1-p)/N + z * z/(4 * N * N)) 

    return (a1 * (a2 - a3), a1 * (a2 + a3)) 


def erfi(x): 
    """Approximation to inverse error function""" 
    a = 0.147 # MAGIC!!! 
    a1 = math.log(1 - x * x) 
    a2 = (
    2.0/(math.pi * a) 
    + a1/2.0 
) 

    return (
    sign(x) * 
    math.sqrt(math.sqrt(a2 * a2 - a1/a) - a2) 
) 


def sign(x): 
    if x < 0: return -1 
    if x == 0: return 0 
    if x > 0: return 1 


def normcdfi(p, mu=0.0, sigma2=1.0): 
    """Inverse CDF of normal distribution""" 
    if mu == 0.0 and sigma2 == 1.0: 
    return math.sqrt(2) * erfi(2 * p - 1) 
    else: 
    return mu + math.sqrt(sigma2) * normcdfi(p)

स्रोत

2014-06-03 19:51:57 duckworthd

का उपयोग निरंतरता सुधार के बिना विल्सन सीआई प्राप्त करने के लिए कर सकते हैं, तो आप statsmodels.stats.proportion में proportion_confint उपयोग कर सकते हैं। निरंतरता सुधार के साथ विल्सन सीआई प्राप्त करने के लिए, आप नीचे दिए गए कोड का उपयोग कर सकते हैं।

# cf. 
# [1] R. G. Newcombe. Two-sided confidence intervals for the single proportion, 1998 
# [2] R. G. Newcombe. Interval Estimation for the difference between independent proportions:  comparison of eleven methods, 1998 

import numpy as np 
from statsmodels.stats.proportion import proportion_confint 

# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # 
def propci_wilson_cc(count, nobs, alpha=0.05): 
    # get confidence limits for proportion 
    # using wilson score method w/ cont correction 
    # i.e. Method 4 in Newcombe [1]; 
    # verified via Table 1 
    from scipy import stats 
    n = nobs 
    p = count/n 
    q = 1.-p 
    z = stats.norm.isf(alpha/2.) 
    z2 = z**2 
    denom = 2*(n+z2) 
    num = 2.*n*p+z2-1.-z*np.sqrt(z2-2-1./n+4*p*(n*q+1))  
    ci_l = num/denom 
    num = 2.*n*p+z2+1.+z*np.sqrt(z2+2-1./n+4*p*(n*q-1)) 
    ci_u = num/denom 
    if p == 0: 
     ci_l = 0. 
    elif p == 1: 
     ci_u = 1. 
    return ci_l, ci_u 


def dpropci_wilson_nocc(a,m,b,n,alpha=0.05): 
    # get confidence limits for difference in proportions 
    # a/m - b/n 
    # using wilson score method WITHOUT cont correction 
    # i.e. Method 10 in Newcombe [2] 
    # verified via Table II  
    theta = a/m - b/n   
    l1, u1 = proportion_confint(count=a, nobs=m, alpha=0.05, method='wilson') 
    l2, u2 = proportion_confint(count=b, nobs=n, alpha=0.05, method='wilson') 
    ci_u = theta + np.sqrt((a/m-u1)**2+(b/n-l2)**2) 
    ci_l = theta - np.sqrt((a/m-l1)**2+(b/n-u2)**2)  
    return ci_l, ci_u 


def dpropci_wilson_cc(a,m,b,n,alpha=0.05): 
    # get confidence limits for difference in proportions 
    # a/m - b/n 
    # using wilson score method w/ cont correction 
    # i.e. Method 11 in Newcombe [2]  
    # verified via Table II 
    theta = a/m - b/n  
    l1, u1 = propci_wilson_cc(count=a, nobs=m, alpha=alpha) 
    l2, u2 = propci_wilson_cc(count=b, nobs=n, alpha=alpha)  
    ci_u = theta + np.sqrt((a/m-u1)**2+(b/n-l2)**2) 
    ci_l = theta - np.sqrt((a/m-l1)**2+(b/n-u2)**2)  
    return ci_l, ci_u 


# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # 
# single proportion testing 
# these come from Newcombe [1] (Table 1) 
a_vec = np.array([81, 15, 0, 1]) 
m_vec = np.array([263, 148, 20, 29]) 
for (a,m) in zip(a_vec,m_vec): 
    l1, u1 = proportion_confint(count=a, nobs=m, alpha=0.05, method='wilson') 
    l2, u2 = propci_wilson_cc(count=a, nobs=m, alpha=0.05) 
    print(a,m,l1,u1,' ',l2,u2) 

# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # 
# difference in proportions testing 
# these come from Newcombe [2] (Table II) 
a_vec = np.array([56,9,6,5,0,0,10,10],dtype=float) 
m_vec = np.array([70,10,7,56,10,10,10,10],dtype=float) 
b_vec = np.array([48,3,2,0,0,0,0,0],dtype=float) 
n_vec = np.array([80,10,7,29,20,10,20,10],dtype=float) 

print('\nWilson without CC') 
for (a,m,b,n) in zip(a_vec,m_vec,b_vec,n_vec): 
    l, u = dpropci_wilson_nocc(a,m,b,n,alpha=0.05) 
    print('{:2.0f}/{:2.0f}-{:2.0f}/{:2.0f} ; {:6.4f} ; {:8.4f}, {:8.4f}'.format(a,m,b,n,a/m-b/n,l,u)) 

print('\nWilson with CC') 
for (a,m,b,n) in zip(a_vec,m_vec,b_vec,n_vec): 
    l, u = dpropci_wilson_cc(a,m,b,n,alpha=0.05) 
    print('{:2.0f}/{:2.0f}-{:2.0f}/{:2.0f} ; {:6.4f} ; {:8.4f}, {:8.4f}'.format(a,m,b,n,a/m-b/n,l,u))

HTH

स्रोत

2015-12-14 23:08:37 batesbatesbates

स्वीकार कर लिया समाधान एक हार्ड-कोडेड z के मूल्य (प्रदर्शन के लिए सर्वश्रेष्ठ) का उपयोग करने लगता है।

घटना है कि आप the blogpost से गहरे लाल रंग का सूत्र का एक सीधा अजगर बराबर एक गतिशील z के मूल्य के साथ करना चाहता था में (विश्वास अंतराल के आधार पर):

import math 

import scipy.stats as st 


def ci_lower_bound(pos, n, confidence): 
    if n == 0: 
     return 0 
    z = st.norm.ppf(1 - (1 - confidence)/2) 
    phat = 1.0 * pos/n 
    return (phat + z * z/(2 * n) - z * math.sqrt((phat * (1 - phat) + z * z/(4 * n))/n))/(1 + z * z/n)

स्रोत

2017-08-30 16:29:51

विल्सन स्कोर अंतराल के पायथन कार्यान्वयन?

उत्तर

संबंधित मुद्दे