Search and replace a string in a very large file
I have a preference for shell commands to get things done. I have a very large file, about 2.8 GB, and its content is JSON. Everything is on one line, and I was told there are at least 1.5 million records in it.
I have to prepare the file for consumption. Each record must be on its own line. Sample:
{"RomanCharacters":{"Alphabet":[{"RecordId":"1",...]},{"RecordId":"2",...},{"RecordId":"3",...},{"RecordId":"4",...},{"RecordId":"5",...} }}
Or, use the following...
{"Accounts":{"Customer":[{"AccountHolderId":"9c585258-c94c-442b-a2f0-1ebbcc274795","Title":"Mrs","Forename":"Tina","Surname":"Wright","DateofBirth":"1988-01-01","Contact":[{"Contact_Info":"9168777943","TypeId":"Mobile Number","PrimaryFlag":"No","Index":"1","Superseded":"No" },{"Contact_Info":"9503588153","TypeId":"Home Telephone","PrimaryFlag":"Yes","Index":"2","Superseded":"Yes" },{"Contact_Info":"[email protected]","TypeId":"Email Address","PrimaryFlag":"No","Index":"3","Superseded":"No" },{"Contact_Info":"[email protected]","TypeId":"Email Address","PrimaryFlag":"Yes","Index":"4","Superseded":"Yes" }, {"Contact_Info":"[email protected]","TypeId":"Email Address","PrimaryFlag":"No","Index":"5","Superseded":"NO" },{"Contact_Info":"15482475584","TypeId":"Mobile_Phone","PrimaryFlag":"No","Index":"6","Superseded":"No" }],"Address":[{"AddressPtr":"5","Line1":"Flat No.14","Line2":"Surya Estate","Line3":"Baner","Line4":"Pune ","Line5":"new","Addres_City":"pune","Country":"India","PostCode":"AB100KP","PrimaryFlag":"No","Superseded":"No"},{"AddressPtr":"6","Line1":"A-602","Line2":"Viva Vadegiri","Line3":"Virar","Line4":"new","Line5":"banglow","Addres_City":"Mumbai","Country":"India","PostCode":"AB10V6T","PrimaryFlag":"Yes","Superseded":"Yes"}],"Account":[{"Field_A":"6884133655531279","Field_B":"887.07","Field_C":"A Loan Product",...,"FieldY_":"2015-09-18","Field_Z":"24275627"}]},{"AccountHolderId":"92a5788f-cd8f-423d-ae5f-4eb0ceb457fd","_Title":"Dr","_Forename":"Christopher","_Surname":"Carroll","_DateofBirth":"1977-02-02","Contact":[{"Contact_Info":"9168777943","TypeId":"Mobile Number","PrimaryFlag":"No","Index":"7","Superseded":"No" },{"Contact_Info":"9503588153","TypeId":"Home Telephone","PrimaryFlag":"Yes","Index":"8","Superseded":"Yes" },{"Contact_Info":"[email protected]","TypeId":"Email Address","PrimaryFlag":"No","Index":"9","Superseded":"No" },{"Contact_Info":"[email protected]","TypeId":"Email Address","PrimaryFlag":"Yes","Index":"10","Superseded":"Yes" 
}],"Address":[{"AddressPtr":"11","Line1":"Flat No.14","Line2":"Surya Estate","Line3":"Baner","Line4":"Pune ","Line5":"new","Addres_City":"pune","Country":"India","PostCode":"AB11TXF","PrimaryFlag":"No","Superseded":"No"},{"AddressPtr":"12","Line1":"A-602","Line2":"Viva Vadegiri","Line3":"Virar","Line4":"new","Line5":"banglow","Addres_City":"Mumbai","Country":"India","PostCode":"AB11O8W","PrimaryFlag":"Yes","Superseded":"Yes"}],"Account":[{"Field_A":"4121879819185553","Field_B":"887.07","Field_C":"A Loan Product",...,"Field_X":"2015-09-18","Field_Z":"25679434"}]},{"AccountHolderId":"4aa10284-d9aa-4dc0-9652-70f01d22b19e","_Title":"Dr","_Forename":"Cheryl","_Surname":"Ortiz","_DateofBirth":"1977-03-03","Contact":[{"Contact_Info":"9168777943","TypeId":"Mobile Number","PrimaryFlag":"No","Index":"13","Superseded":"No" },{"Contact_Info":"9503588153","TypeId":"Home Telephone","PrimaryFlag":"Yes","Index":"14","Superseded":"Yes" },{"Contact_Info":"[email protected]","TypeId":"Email Address","PrimaryFlag":"No","Index":"15","Superseded":"No" },{"Contact_Info":"[email protected]","TypeId":"Email Address","PrimaryFlag":"Yes","Index":"16","Superseded":"Yes" }],"Address":[{"AddressPtr":"17","Line1":"Flat No.14","Line2":"Surya Estate","Line3":"Baner","Line4":"Pune ","Line5":"new","Addres_City":"pune","Country":"India","PostCode":"AB12SQR","PrimaryFlag":"No","Superseded":"No"},{"AddressPtr":"18","Line1":"A-602","Line2":"Viva Vadegiri","Line3":"Virar","Line4":"new","Line5":"banglow","Addres_City":"Mumbai","Country":"India","PostCode":"AB12BAQ","PrimaryFlag":"Yes","Superseded":"Yes"}],"Account":[{"Field_A":"3288214945919484","Field_B":"887.07","Field_C":"A Loan Product",...,"Field_Y":"2015-09-18","Field_Z":"66264768"}]}]}}
The end result should be:
{"RomanCharacters":{"Alphabet":[{"RecordId":"1",...]},
{"RecordId":"2",...},
{"RecordId":"3",...},
{"RecordId":"4",...},
{"RecordId":"5",...} }}
Commands tried:
sed -e 's/,{"RecordId"/}]},\n{"RecordId"/g' sample.dat
awk '{gsub(",{\"RecordId\"",",\n{\"RecordId\"",$0); print $0}' sample.dat
The commands I tried work perfectly fine for small files, but they do not work for the 2.8 GB file that I actually need to process. sed quits midway after 10 minutes for no apparent reason, with nothing done. awk failed with a Segmentation fault (core dumped) after running for many hours. I also tried a Perl search and replace and got an "Out of memory" error.
Any help or ideas would be great!
Additional info on my machine:
- More than 105 GB of disk space available
- 8 GB memory
- 4-core CPU
- Ubuntu 14.04
We need some better sample data. It doesn't have to be a dump of your data, but it should illustrate the problem at hand. Also, have you considered using a parser? – Sobrique
The basic problem, I think, is that all 3 of those tools read one line at a time and are therefore choked by the "single huge line". Try pre-processing first with something like tr ',' '\012' to replace the commas with newlines. Then the line-at-a-time tools will work better.
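A minimal sketch of the pre-processing idea above, assuming the input really contains no newlines and that every record starts with {"RecordId" directly after a comma, as in the short sample. The function name split_records is made up for illustration. tr breaks the huge line at the commas, and awk puts the commas back while starting a new output line at each record, so neither tool ever holds more than one short piece in memory:

```shell
# Hypothetical helper: reads single-line JSON on stdin and writes one record
# per line on stdout. Assumes no newlines in the input and that each record
# begins with {"RecordId" right after a comma.
split_records() {
  # Step 1: turn every comma into a newline so awk sees many short lines.
  tr ',' '\n' |
  # Step 2: rejoin the pieces with commas, but emit ",\n" instead of ","
  # whenever the next piece starts a new record.
  awk -v key='{"RecordId"' '
    NR > 1 { printf "%s", (index($0, key) == 1 ? ",\n" : ",") }
    { printf "%s", $0 }
    END { if (NR > 0) printf "\n" }
  '
}
```

It could be run as split_records < sample.dat > sample.split, where sample.split is just a placeholder output name. For the second sample, the key would presumably be {"AccountHolderId" instead.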
Try again with perl, but set $/ to ",". Also try the "-u" parameter (--unbuffered) for sed. – neuhaus