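OpenSubdiv.OPENSUBDIV_VERSION.Osd.CudaEvaluator is a CUDA compute Evaluator that computes the vertex positions of a subdivision surface, and their barycenters, on the GPU.
Osd.CudaEvaluator implements the subdivision algorithms as CUDA kernels. The kernels run asynchronously on CUDA streams to make full use of the GPU and improve subdivision performance.
Osd.CudaEvaluator implements the Osd.MultiVertexBufferInterface interface: it can read vertex data from multiple buffers and write the subdivided vertex data back to the corresponding buffers.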
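Reading from several buffers usually comes down to describing where each attribute lives inside a float array. The following is a minimal CPU-side sketch of that idea, not the library's implementation: SketchBufferDescriptor and ReadAttribute are hypothetical names introduced here, and the offset/length/stride layout is an assumption about what a vertex buffer descriptor carries.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical descriptor, named for this sketch only: it locates one
// attribute inside an interleaved float buffer.
struct SketchBufferDescriptor {
    int offset;  // index of the attribute's first float within a vertex record
    int length;  // number of floats in the attribute (3 for an xyz position)
    int stride;  // number of floats in one full vertex record
};

// Copy the attribute of vertex `index` out of `buffer`.
std::vector<float> ReadAttribute(const std::vector<float>& buffer,
                                 const SketchBufferDescriptor& desc,
                                 std::size_t index) {
    const std::size_t begin = index * static_cast<std::size_t>(desc.stride)
                            + static_cast<std::size_t>(desc.offset);
    const std::size_t end = begin + static_cast<std::size_t>(desc.length);
    assert(end <= buffer.size());
    return std::vector<float>(
        buffer.begin() + static_cast<std::ptrdiff_t>(begin),
        buffer.begin() + static_cast<std::ptrdiff_t>(end));
}
```

With a record of five floats per vertex (xyz position followed by two varying values), a descriptor {0, 3, 5} selects the position and {3, 2, 5} selects the varying data from the same buffer.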
OsdCudaEvaluator();
Creates an empty CUDA compute Evaluator.
bool Compile(OsdVertexBufferDescriptor const& vertexDesc,
             OsdVertexBufferDescriptor const& varyingDesc,
             OsdVertexBufferDescriptor const& edgeDesc,
             OsdVertexBufferDescriptor const& vertexStencilDesc,
             OsdVertexBufferDescriptor const& varyingStencilDesc);
Compiles the CUDA kernels and creates the CUDA stream, preparing the evaluator to run the subdivision algorithm.
void ApplyBilinearFaceVerticesKernel(int numVertexElements, CUdeviceptr vertexData);
void ApplyBilinearEdgeVerticesKernel(int numVertexElements, CUdeviceptr vertexData);
void ApplyBilinearVertexVerticesKernel(int numVertexElements, CUdeviceptr vertexData);
void ApplyCatmarkFaceVerticesKernel(int numVertexElements, CUdeviceptr vertexData, CUdeviceptr varyingData);
void ApplyCatmarkEdgeVerticesKernel(int numVertexElements, CUdeviceptr vertexData);
void ApplyCatmarkVertexVerticesKernel(int numVertexElements, CUdeviceptr vertexData, CUdeviceptr varyingData);
void ApplyLoopEdgeVerticesKernel(int numVertexElements, CUdeviceptr vertexData);
void ApplyLoopVertexVerticesKernel(int numVertexElements, CUdeviceptr vertexData, CUdeviceptr varyingData);
These functions run the subdivision algorithm and generate the new vertex data. The supported subdivision schemes are Bilinear, Catmull-Clark, and Loop.
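To make the face and edge steps concrete, here is a small CPU reference for the textbook Catmull-Clark rules on a smooth interior mesh. It is a sketch under stated assumptions, not the kernels' actual code: Point, FacePoint, and EdgePoint are hypothetical names, and creases, boundaries, and varying data are ignored.

```cpp
#include <array>
#include <cassert>
#include <vector>

// Hypothetical 3-float position type used only in this sketch.
using Point = std::array<float, 3>;

// Face point: the centroid of the face's corner vertices.
Point FacePoint(const std::vector<Point>& corners) {
    assert(!corners.empty());
    Point sum{0.0f, 0.0f, 0.0f};
    for (const Point& p : corners) {
        for (int i = 0; i < 3; ++i) sum[i] += p[i];
    }
    for (int i = 0; i < 3; ++i) {
        sum[i] /= static_cast<float>(corners.size());
    }
    return sum;
}

// Smooth interior edge point: the average of the edge's two endpoints
// and the face points of the two faces that share the edge.
Point EdgePoint(const Point& v0, const Point& v1,
                const Point& f0, const Point& f1) {
    Point out{0.0f, 0.0f, 0.0f};
    for (int i = 0; i < 3; ++i) {
        out[i] = 0.25f * (v0[i] + v1[i] + f0[i] + f1[i]);
    }
    return out;
}
```

In the Bilinear scheme the face point is computed the same way, while the edge point is simply the midpoint of the two endpoints. A GPU kernel would typically evaluate one such output vertex per thread.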
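// Create the CUDA compute Evaluator
OsdCudaEvaluator evaluator;
// Compile the CUDA kernels and create the CUDA stream
evaluator.Compile(vertexDesc, varyingDesc, edgeDesc, vertexStencilDesc, varyingStencilDesc);
// Run the subdivision algorithm to generate the new vertex data
if (scheme == Scheme::Loop) {
    evaluator.ApplyLoopVertexVerticesKernel(numVertexElements, vertexData, varyingData);
} else if (scheme == Scheme::CatmullClark) {
    evaluator.ApplyCatmarkVertexVerticesKernel(numVertexElements, vertexData, varyingData);
} else if (scheme == Scheme::Bilinear) {
    evaluator.ApplyBilinearVertexVerticesKernel(numVertexElements, vertexData);
}
// Copy the subdivided vertex data from the device into the host output buffer.
// vertexData is a CUdeviceptr, so it is cast to the pointer type the runtime API expects.
cudaMemcpyAsync(vertexOutputData, reinterpret_cast<const void*>(vertexData),
                sizeof(float) * numVertexElements * 3, cudaMemcpyDeviceToHost, cudaStreamPerThread);
// The copy is asynchronous: wait for it to finish before reading vertexOutputData.
cudaStreamSynchronize(cudaStreamPerThread);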
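The copy above sizes the transfer as sizeof(float) * numVertexElements * 3, but the text does not say whether numVertexElements counts vertices or floats per vertex. The helper below is a sketch of the general size calculation under the assumption that both quantities are known separately; VertexBufferBytes and numVertices are hypothetical names that do not appear in the API above.

```cpp
#include <cassert>
#include <cstddef>

// Bytes needed for a tightly packed float vertex buffer.
// numVertices:       how many vertices the buffer holds (assumed known)
// numVertexElements: floats stored per vertex, e.g. 3 for an xyz position
std::size_t VertexBufferBytes(std::size_t numVertices,
                              std::size_t numVertexElements) {
    return numVertices * numVertexElements * sizeof(float);
}
```

Computing the byte count in one place keeps the host allocation and the device-to-host copy in agreement when the per-vertex layout changes.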