Deep learning is steadily moving toward embodied-intelligence tasks that require spatial awareness, in order to serve a wider range of real-world scenarios. Robot manipulation and navigation, for example, require an agent to perceive the 3D scene, understand human instructions, and make decisions through its own actions. For real-time use, a 3D perception model should have the following properties: (1) real-time operation: the input is a streaming RGB-D video rather than a pre-collected one, so visual perception must run in sync with data collection and therefore needs high inference speed; (2) fine granularity: it should recognize almost any object that appears in the scene; (3) strong generalization: a single model should work across different types of scenes and be compatible with different sensor parameters.
This article is an engineering practice and exploration of a 3D perception model for embodied vision, which serves as the foundation for various downstream tasks.
System environment:
sys.platform: linux
Python: 3.10.15 | packaged by conda-forge | [GCC 13.3.0]
CUDA available: True
MUSA available: False
numpy_random_seed: 1510685637
GPU 0: NVIDIA GeForce RTX 4090 Laptop GPU
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 11.7, V11.7.64
GCC: gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
PyTorch: 1.13.0+cu117
PyTorch compiling details: PyTorch built with:
- GCC 9.3
- C++ Version: 201402
- Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v2.6.0 (Git Hash 52b5f107dd9cf10910aaa19cb47f3abf9b349815)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- LAPACK is enabled (usually provided by MKL)
- NNPACK is enabled
- CPU capability usage: AVX2
- CUDA Runtime 11.7
- NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86
- CuDNN 8.9.7 (built against CUDA 11.8)
- Built with CuDNN 8.5
- Magma 2.6.1
- Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.7, CUDNN_VERSION=8.5.0, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -fabi-version=11 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wunused-local-typedefs -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.13.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,
TorchVision: 0.14.0+cu117
OpenCV: 4.10.0
MMEngine: 0.10.3
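For reference, an environment summary in this format can be regenerated with MMEngine's environment collection utility. A minimal sketch, assuming the collect_env function exposed under mmengine.utils.dl_utils in MMEngine versions around 0.10.x:

from mmengine.utils.dl_utils import collect_env

# Print each environment field (platform, CUDA, PyTorch build flags, ...) on its own line
for name, value in collect_env().items():
    print(f"{name}: {value}")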
The core capture code is shown below; everything is handled through the SDK's Python interface on Linux:
import os
from ctypes import pointer, string_at

import cv2
import numpy as np

# MV3D_RGBD_FRAME_DATA / MV3D_RGBD_IMAGE_DATA and the opened `camera` handle come from
# the camera SDK's Python bindings; `save_dir` is the output directory defined earlier.
while True:
    stFrameData = MV3D_RGBD_FRAME_DATA()
    ret = camera.MV3D_RGBD_FetchFrame(pointer(stFrameData), 5000)
    if ret == 0:
        for i in range(0, stFrameData.nImageCount):
            stData = stFrameData.stImageData[i]
            print("MV3D_RGBD_FetchFrame[%d]:enImageType[%d],nWidth[%d],nHeight[%d],nDataLen[%d],nFrameNum[%d],bIsRectified[%d],enStreamType[%d],enCoordinateType[%d]" % (
                i, stData.enImageType, stData.nWidth, stData.nHeight, stData.nDataLen, stData.nFrameNum,
                stData.bIsRectified, stData.enStreamType, stData.enCoordinateType))
            if i == 0:
                # Depth image: 16-bit values, saved as PNG to keep them lossless
                # (JPEG only supports 8-bit and would corrupt the uint16 depth).
                p_depth = string_at(stData.pData, stData.nDataLen)
                depth_img = np.frombuffer(p_depth, dtype=np.uint16)
                depth_img = depth_img.reshape((stData.nHeight, stData.nWidth))
                cv2.imwrite(os.path.join(save_dir, f'{stData.nFrameNum}_depth.png'), depth_img)
                # cv2.imshow("depth_img", depth_img)
                # cv2.waitKey(0)
                # cv2.destroyAllWindows()
                # Point cloud: let the SDK map the depth frame to a float32 XYZ buffer
                cloud_img_data = MV3D_RGBD_IMAGE_DATA()
                ret_cloud = camera.MV3D_RGBD_MapDepthToPointCloud(pointer(stData), pointer(cloud_img_data))
                cloud_buf = string_at(cloud_img_data.pData, cloud_img_data.nDataLen)
                cloud = np.frombuffer(cloud_buf, dtype=np.float32)
                np.save(os.path.join(save_dir, f"{stData.nFrameNum}_point.npy"), cloud)
                print(cloud.shape)
            elif i == 1:
                # Color image: raw YUYV (YUY2) buffer, 2 bytes per pixel, converted to BGR
                rgb_buf = string_at(stData.pData, stData.nDataLen)
                yuv = np.frombuffer(rgb_buf, dtype=np.uint8)
                yuv = yuv.reshape((stData.nHeight, stData.nWidth, 2))
                bgr = cv2.cvtColor(yuv, cv2.COLOR_YUV2BGR_YUYV)
                # cv2.imshow("rgb_img", bgr)
                # cv2.waitKey(0)
                # cv2.destroyAllWindows()
                print("bgr.shape:", bgr.shape)
                cv2.imwrite(os.path.join(save_dir, f'{stData.nFrameNum}_rgb.jpg'), bgr)
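As a quick sanity check after capture, the saved frames can be loaded back and inspected. A minimal sketch; the directory and frame number below are placeholders and the file names follow the {frame}_depth.png / {frame}_rgb.jpg convention used above:

import os

import cv2
import numpy as np

save_dir = "./captures"   # hypothetical output directory, match the capture script
frame_id = 0              # hypothetical frame number

# 16-bit depth map (IMREAD_UNCHANGED keeps the uint16 values intact)
depth = cv2.imread(os.path.join(save_dir, f"{frame_id}_depth.png"), cv2.IMREAD_UNCHANGED)
print("depth:", depth.shape, depth.dtype, "max:", depth.max())

# BGR color frame
rgb = cv2.imread(os.path.join(save_dir, f"{frame_id}_rgb.jpg"))
print("rgb:", rgb.shape)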
The key calibration code is as follows. First, write a small Python helper that makes it easy to read the calibration values out of the ctypes structure:
def extract_calib_info(calib_info):
    # Extract the 3x3 intrinsic matrix (9 floats, row-major)
    intrinsic_matrix = [calib_info.stIntrinsic.fData[i] for i in range(9)]
    # Extract the distortion coefficients (12 floats)
    distortion_coefficients = [calib_info.stDistortion.fData[i] for i in range(12)]
    return {
        'intrinsic_matrix': intrinsic_matrix,
        'distortion_coefficients': distortion_coefficients,
    }
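For later steps (undistortion with OpenCV, or projecting depth into 3D), it is convenient to turn these flat lists into NumPy arrays. A minimal sketch, assuming calib_dict is a dict returned by extract_calib_info above:

import numpy as np

def to_opencv_calib(calib_dict):
    # 9 floats (row-major) -> 3x3 camera matrix K
    K = np.array(calib_dict['intrinsic_matrix'], dtype=np.float64).reshape(3, 3)
    # 12 floats -> distortion coefficient vector usable by cv2.undistort
    dist = np.array(calib_dict['distortion_coefficients'], dtype=np.float64)
    return K, dist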
Next, call the parsing helper and save the camera parameters; they will be needed later when the model extracts SuperPoint keypoints for feature extraction (save_to_json is a small helper, sketched after this block):
import pprint

# Get the depth camera calibration (in this SDK, 1 selects the depth sensor)
depth_calib = MV3D_RGBD_CALIB_INFO()
camera.MV3D_RGBD_GetCalibInfo(1, pointer(depth_calib))
depth_calib_dict = extract_calib_info(depth_calib)
save_to_json(depth_calib_dict, "depth_calib.json")
print("depth_calib_dict:")
pprint.pprint(depth_calib_dict)
# Get the RGB camera calibration (2 selects the RGB sensor)
rgb_calib = MV3D_RGBD_CALIB_INFO()
camera.MV3D_RGBD_GetCalibInfo(2, pointer(rgb_calib))
rgb_calib_dict = extract_calib_info(rgb_calib)
save_to_json(rgb_calib_dict, "rgb_calib.json")
print("rgb_calib_dict:")
pprint.pprint(rgb_calib_dict)
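save_to_json is not shown above; a minimal sketch of such a helper, using only the standard library json module, could look like this:

import json

def save_to_json(data, path):
    # Dump the calibration dict to disk as pretty-printed JSON
    with open(path, 'w', encoding='utf-8') as f:
        json.dump(data, f, indent=2)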
Depth image:
Corresponding RGB image:
Depth camera and RGB camera parameter files (depth_calib.json first, then rgb_calib.json):
{
"intrinsic_matrix": [
669.1973266601562,
0.0,
630.7010498046875,
0.0,
669.1973266601562,
342.984375,
0.0,
0.0,
1.0
],
"distortion_coefficients": [
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0
]
}
{
"intrinsic_matrix": [
641.2483520507812,
0.0,
640.4540405273438,
0.0,
641.2787475585938,
358.39166259765625,
0.0,
0.0,
1.0
],
"distortion_coefficients": [
-0.2017442286014557,
0.03686028718948364,
1.0394737728347536e-05,
-8.135665120789781e-05,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0
]
}
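With the intrinsics above, a depth map can also be back-projected into a point cloud on the host side (the same pinhole-model math that MV3D_RGBD_MapDepthToPointCloud performs inside the SDK). A minimal sketch, under the assumption that the depth values are in millimeters and that fx, fy, cx, cy come from the depth camera's intrinsic matrix above:

import numpy as np

def depth_to_points(depth_mm, fx, fy, cx, cy):
    # Back-project a uint16 depth map (millimeters) to an N x 3 point cloud in meters
    h, w = depth_mm.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth_mm.astype(np.float32) / 1000.0          # mm -> m
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[z.reshape(-1) > 0]                   # drop invalid (zero-depth) pixels

# Example with the depth intrinsics reported above:
# points = depth_to_points(depth, 669.197, 669.197, 630.701, 342.984)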
Point cloud data:
To make model processing easier, the point cloud is saved directly in .npy format, organized with reference to the SceneNN dataset format. The depth maps, RGB images, and point clouds above still need to be converted into the input format the model accepts; this will be covered in detail in a follow-up article.
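As a small preview of that preprocessing, a saved point cloud can be loaded, reshaped to N x 3, and stripped of invalid points before being fed to a model. A minimal sketch; the file name is purely illustrative:

import numpy as np

cloud = np.load("0_point.npy").reshape(-1, 3)       # flat float32 buffer -> N x 3 XYZ
valid = np.isfinite(cloud).all(axis=1) & (np.linalg.norm(cloud, axis=1) > 0)
cloud = cloud[valid]                                 # drop zero / non-finite points
print("valid points:", cloud.shape[0])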