Metal kernels not behaving properly on the new MacBook Pro (late 2016) GPUs

I'm working on a macOS project that uses Swift and Metal for image processing on the GPU. Last week I received my new 15-inch MacBook Pro (late 2016) and noticed something strange with my code: kernels that were supposed to write to a texture did not seem to do so...

After a lot of digging, I found that the problem depends on which GPU Metal uses for the computation (AMD Radeon Pro 455 or Intel(R) HD Graphics 530).

Initializing the MTLDevice with MTLCopyAllDevices() returns an array of devices representing the Radeon and the Intel GPUs (whereas MTLCreateSystemDefaultDevice() returns the default device, which is the Radeon). In any case, the code works as expected with the Intel GPU, but not with the Radeon GPU.
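To see which entry of the array is which GPU, the devices can be listed together with a property that distinguishes them. The following is a minimal sketch written against the current Metal Swift API (where `name` is a non-optional `String`), not the Swift 3 names used elsewhere in this question. The function name `describeMetalDevices` is hypothetical, and printing `isLowPower` is an addition that does not appear in the original code.

```swift
import Metal

/// Hypothetical helper: returns one description line per GPU that Metal
/// can see on this Mac, in the order MTLCopyAllDevices() reports them.
func describeMetalDevices() -> [String] {
    #if os(macOS)
    // isLowPower is reported by the device itself; on dual-GPU MacBook Pros
    // it is true for the integrated GPU.
    return MTLCopyAllDevices().enumerated().map {
        "[\($0.offset)] \($0.element.name) (isLowPower: \($0.element.isLowPower))"
    }
    #else
    // MTLCopyAllDevices() is only available on macOS.
    return []
    #endif
}

describeMetalDevices().forEach { print($0) }
```

Selecting a device by checking `isLowPower` rather than by a fixed array index avoids relying on the order in which the GPUs are returned.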

Let me show you an example.

To start, here is a simple kernel that takes an input texture and copies its colours to an output texture:

#include <metal_stdlib>
using namespace metal;

// Copies each texel of inTexture to the same position in outTexture.
kernel void passthrough(texture2d<uint, access::read> inTexture [[texture(0)]],
                        texture2d<uint, access::write> outTexture [[texture(1)]],
                        uint2 gid [[thread_position_in_grid]])
{
    uint4 out = inTexture.read(gid);
    outTexture.write(out, gid);
}

To use this kernel, I run the following piece of code:

let devices = MTLCopyAllDevices() 
    for device in devices { 
     print(device.name!) // [0] -> "AMD Radeon Pro 455", [1] -> "Intel(R) HD Graphics 530" 
    } 

    let device = devices[0] 
    let library = device.newDefaultLibrary() 
    let commandQueue = device.makeCommandQueue() 

    let passthroughKernelFunction = library!.makeFunction(name: "passthrough") 

    let cps = try! device.makeComputePipelineState(function: passthroughKernelFunction!) 

    let commandBuffer = commandQueue.makeCommandBuffer() 
    let commandEncoder = commandBuffer.makeComputeCommandEncoder() 

    commandEncoder.setComputePipelineState(cps) 

    // Texture setup 
    let width = 16 
    let height = 16 
    let byteCount = height*width*4 
    let bytesPerRow = width*4 
    let region = MTLRegionMake2D(0, 0, width, height) 
    let textureDescriptor = MTLTextureDescriptor.texture2DDescriptor(pixelFormat: .rgba8Uint, width: width, height: height, mipmapped: false) 

    // inTexture 
    var inData = [UInt8](repeating: 255, count: Int(byteCount)) 
    let inTexture = device.makeTexture(descriptor: textureDescriptor) 
    inTexture.replace(region: region, mipmapLevel: 0, withBytes: &inData, bytesPerRow: bytesPerRow) 

    // outTexture 
    var outData = [UInt8](repeating: 128, count: Int(byteCount)) 
    let outTexture = device.makeTexture(descriptor: textureDescriptor) 
    outTexture.replace(region: region, mipmapLevel: 0, withBytes: &outData, bytesPerRow: bytesPerRow) 

    commandEncoder.setTexture(inTexture, at: 0) 
    commandEncoder.setTexture(outTexture, at: 1) 
    commandEncoder.dispatchThreadgroups(MTLSize(width: 1,height: 1,depth: 1), threadsPerThreadgroup: MTLSize(width: width, height: height, depth: 1)) 

    commandEncoder.endEncoding() 
    commandBuffer.commit() 
    commandBuffer.waitUntilCompleted() 

    // Get the data back from the GPU 
    outTexture.getBytes(&outData, bytesPerRow: bytesPerRow, from: region , mipmapLevel: 0) 

    // Validation 
    // outData should be exactly the same as inData 
    for (i,outElement) in outData.enumerated() { 
     if outElement != inData[i] { 
      print("Dest: \(outElement) != Src: \(inData[i]) at \(i)")
     } 
    }

When running this code with let device = devices[0] (the Radeon GPU), outTexture is never written to (my supposition) and as a result outData remains unchanged. On the other hand, when running this code with let device = devices[1] (the Intel GPU), everything works as expected and outData is updated with the values in inData.
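When the write is missing, the validation loop above prints one line per byte (1024 lines for a 16x16 RGBA texture). A more compact check can be sketched as follows. The helper name `mismatchedIndices` is hypothetical and is not part of the original code; it only compares two byte arrays and does not touch Metal.

```swift
/// Hypothetical helper: returns the indices at which `output` differs
/// from `expected`. Both arrays must have the same length.
func mismatchedIndices(_ output: [UInt8], _ expected: [UInt8]) -> [Int] {
    precondition(output.count == expected.count, "buffers must have the same length")
    return zip(output, expected).enumerated().compactMap {
        // Keep the position only when the two bytes differ.
        $0.element.0 != $0.element.1 ? $0.offset : nil
    }
}

// Usage with stand-in data: one byte differs, at index 3.
let expectedBytes = [UInt8](repeating: 255, count: 8)
var actualBytes = expectedBytes
actualBytes[3] = 128
print(mismatchedIndices(actualBytes, expectedBytes)) // prints "[3]"
```

With the question's data, an empty result would mean outData matches inData, while a result containing all 1024 indices would mean outTexture was never visibly written.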


2016-11-24 Steve Begin

I think that whenever the GPU writes to an MTLStorageModeManaged resource such as a texture, and you then want to read that resource from the CPU (for example using getBytes()), you need to synchronize it using a blit encoder. Try inserting the following above the commandBuffer.commit() line:

let blitEncoder = commandBuffer.makeBlitCommandEncoder() 
blitEncoder.synchronize(outTexture) 
blitEncoder.endEncoding()

You may get away without this on an integrated GPU, because the GPU uses system memory for the resource and there is nothing to synchronize.
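The same idea can be wrapped in a small helper that only encodes the blit pass when the resource is actually managed. This is a sketch under assumptions: it is written against the current Metal Swift API, where the call is spelled `synchronize(resource:)` and encoder creation returns an optional, and the function name `encodeSynchronizeIfNeeded` is hypothetical rather than part of Metal.

```swift
import Metal

/// Hypothetical helper: encodes a blit pass that copies GPU-side changes to
/// `texture` back to its CPU-visible copy. Call it after ending the compute
/// encoder and before committing the command buffer.
func encodeSynchronizeIfNeeded(_ texture: MTLTexture, in commandBuffer: MTLCommandBuffer) {
    #if os(macOS)
    // Only managed resources keep separate CPU and GPU copies.
    guard texture.storageMode == .managed,
          let blitEncoder = commandBuffer.makeBlitCommandEncoder() else { return }
    blitEncoder.synchronize(resource: texture)
    blitEncoder.endEncoding()
    #endif
}
```

In the question's code this would be called as `encodeSynchronizeIfNeeded(outTexture, in: commandBuffer)` just above `commandBuffer.commit()`, so that `getBytes` reads the synchronized contents once `waitUntilCompleted()` returns.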


2016-11-25 04:08:46

Wow, that was the missing piece, thank you very much!!! I have been trying to learn Swift and Metal in parallel for the past few months, and I can't say it has been easy.
