MULTI-MEDIA LAB

  • Yao Lu, Jiangtao Wen,
    US Patent
    An audio system comprising a plurality of speaker devices allows seamless transitioning of playing audio between speakers in different locations. A mobile device may be used to determine a location for which the associated speakers should be used for playback. The speaker devices at the different locations may be synchronized to a common clock in order to allow the audio playback to be transitioned seamlessly from one location to another.
  • Yuxing Han, Jiangtao Wen, Minhao Tang,
    US Patent
    The present invention discloses an optimized coding method for an omnidirectional video, computer readable storage medium and computer device to solve the technical problem that the video quality cannot be guaranteed under a low code rate in the prior art. The method includes: obtaining attribute information of each coding unit of an omnidirectional video file, wherein each coding unit is a storage and coding unit of the omnidirectional video file; determining a coding mode corresponding to each coding unit according to the attribute information of each coding unit; and coding each coding unit according to the coding mode corresponding to each coding unit.
  • Yu Zhang, Yuxing Han, Jiangtao Wen,
    Frontiers of Computer Science
    The number of IoT (Internet of Things) connected devices is increasing rapidly. These devices run different operating systems and therefore cannot communicate with each other. As a result, the data they collect is confined to their own platforms. Besides, IoT devices have very constrained resources, such as a weak MCU (microcontroller unit) and limited storage. Therefore, they need a direct communication method to cooperate with each other, or with the help of nearby resource-rich devices. In this paper, we propose a secure method to exchange resources (SMER) between heterogeneous IoT devices. To exchange resources among devices, SMER adopts a compensable mechanism for resource exchange and a series of security mechanisms to ensure the security of resource exchanges. Besides, SMER uses a smart-contract-based scheme to supervise resource exchange, which guarantees the …
  • Minhao Tang, Jiangtao Wen, Yuxing Han,
    arXiv preprint arXiv:1911.00639
    The High Efficiency Video Coding (HEVC/H.265) standard doubles the compression efficiency of the widely used H.264/AVC standard. For practical applications, rate control (RC) algorithms for HEVC need to be developed. Based on the R-Q, R-ρ or R-λ models, rate control algorithms aim at encoding a video clip/segment to a target bit rate accurately with high video quality after compression. Among the various models used by HEVC rate control algorithms, the R-λ model performs the best in both coding efficiency and rate control accuracy. However, compared with encoding with a fixed quantization parameter (QP), even the best rate control algorithm [1] still under-performs when comparing the video quality achieved at identical average bit rates. In this paper, we propose a novel generalized rate-distortion-λ (R-D-λ) model for the relationship between rate (R), distortion (D) and the Lagrangian multiplier (λ) in rate-distortion (RD) optimized encoding. In addition to the well-designed hierarchical in
  • Hsien-Yu Meng, Jiangtao Wen,
    Proceedings of SAI Intelligent Systems Conference
    We present a novel end-to-end framework for facial performance capture given a monocular video of an actor’s face. Our framework comprises two parts. First, we optimize a triplet loss to learn an embedding space in which semantically closer facial expressions lie closer together, so that the model can be transferred to distinguish expressions that are not present in the training dataset. Second, the embeddings are fed into an LSTM network to learn the deformation between frames. In the experiments, we demonstrate that, compared to other methods, our method can distinguish the delicate motion around the lips and significantly reduce artifacts between the tracked meshes.
  • Zhihong Shao, Minlie Huang, Jiangtao Wen, Wenfei Xu, Xiaoyan Zhu,
    arXiv preprint arXiv:1908.06605
    Existing neural methods for data-to-text generation are still struggling to produce long and diverse texts: they are insufficient to model input data dynamically during generation, to capture inter-sentence coherence, or to generate diversified expressions. To address these issues, we propose a Planning-based Hierarchical Variational Model (PHVM). Our model first plans a sequence of groups (each group is a subset of input items to be covered by a sentence) and then realizes each sentence conditioned on the planning result and the previously generated context, thereby decomposing long text generation into dependent sentence generation sub-tasks. To capture expression diversity, we devise a hierarchical latent structure where a global planning latent variable models the diversity of reasonable planning and a sequence of local latent variables controls sentence realization. Experiments show that our model outperforms state-of-the-art baselines in long and diverse text generation.
  • Xinyao Chen, Bichuan Guo, Minhao Tang, Yuxing Han, Jiangtao Wen,
    2019 IEEE International Conference on Multimedia and Expo (ICME)
    AV1, a next-generation open-source and royalty-free video coding standard, achieves high compression performance at high computational cost. To meet the requirements of HD and UHD video applications, extensive optimizations in both the algorithm and implementation of AV1 are required. In this paper, we analyze the similarities between the block structure decisions after rate-distortion (RD) optimized AV1 and HEVC encodings of the same input. Taking advantage of such similarities, we propose a conditional Bayesian inference model to perform early termination in block partition determination of AV1 based on HEVC encoding outputs. An estimation algorithm is designed to iteratively calculate the prior probability for Bayesian inference. Experiment results show that our proposed algorithm could realize an average time saving of 35.7% and negligible BD-rate loss (0.61%), with the pre-encoding time taken …
  • Jiawen Gu, Bichuan Guo, Jiangtao Wen,
    2019 IEEE International Conference on Multimedia and Expo (ICME)
    Efficient storage and delivery of light field (LF) information rely on high-performance compression. In this paper, we propose a high-efficiency light field compression algorithm that utilizes a hierarchical coding structure with synthetic virtual references. Specifically, an LF image is interpreted as a multi-view sequence that is efficiently compressed using the multi-view extension of High Efficiency Video Coding (MV-HEVC). Using deep neural networks, we synthesize virtual references from reconstructed neighbor frames, which serve as extra reference candidates in our novel hierarchical coding structure. Compared with previous work, the proposed algorithm further exploits the intrinsic similarities in LF images. Experimental results show that the proposed algorithm demonstrates superior performance, achieving up to 55.2% BD-rate reduction and 2.55 dB BD-PSNR improvement compared with the HEVC …

COOPERATORS