Deep Convolutional Neural Models for Picture Quality Prediction
Jongyoo Kim, H. Zeng, D. Ghadiyaram, S. Lee, L. Zhang, and A. C. Bovik
IEEE Signal Processing Magazine
[abs]
|
Deep Blind Image Quality Assessment by Employing FR-IQA
Jongyoo Kim and S. Lee
IEEE International Conference on Image Processing (ICIP) 2017
[abs]
|
Blind Deep S3D Image Quality Evaluation via Local to Global Feature Aggregation
H. Oh, S. Ahn, Jongyoo Kim, and S. Lee
IEEE Transactions on Image Processing
[abs]
|
Deep Learning of Human Visual Sensitivity in Image Quality Assessment Framework
Jongyoo Kim and S. Lee
IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017
[abs]
Since human observers are the ultimate receivers of digital images, image quality metrics should be designed from a human-oriented perspective. Conventionally, a number of full-reference image quality assessment (FR-IQA) methods have adopted various computational models of the human visual system (HVS) from psychological vision science research. In this paper, we propose a novel convolutional neural network (CNN)-based FR-IQA model, named Deep Image Quality Assessment (DeepQA), in which the behavior of the HVS is learned from the underlying data distribution of IQA databases. Unlike previous studies, our model seeks the optimal visual weights based on the database information itself, without any prior knowledge of the HVS. Through experiments, we show that the predicted visual sensitivity maps agree with human subjective opinions. In addition, DeepQA achieves state-of-the-art prediction accuracy among FR-IQA models.
|
Enhancement of Visual Comfort and Sense of Presence on Stereoscopic 3D Images
H. Oh, Jongyoo Kim, J. Kim, T. Kim, S. Lee, and A. C. Bovik
IEEE Transactions on Image Processing
[abs]
|
Quality Assessment of Perceptual Crosstalk on Two-View Auto-Stereoscopic Displays
Jongyoo Kim, T. Kim, S. Lee, and A. C. Bovik
IEEE Transactions on Image Processing (Accepted)
[abs]
Crosstalk is one of the most severe factors affecting the perceived quality of stereoscopic 3D (S3D) images. It arises from a leakage of light intensity between multiple views, as in auto-stereoscopic displays. Well-known determinants of crosstalk include the co-location contrast and disparity of the left and right images, which have been dealt with in prior studies. However, when a natural stereo image that contains complex naturalistic spatial characteristics is viewed on an auto-stereoscopic display, other factors may also play an important role in the perception of crosstalk. Here, we describe a new way of predicting the perceived severity of crosstalk, which we call the Binocular Perceptual Crosstalk Predictor (BPCP). BPCP uses measurements of three complementary 3D image properties (texture, structural duplication, and binocular summation) in combination with two well-known factors (co-location contrast and disparity) to make predictions of crosstalk on two-view auto-stereoscopic displays. The new BPCP model includes two masking algorithms and a binocular pooling method. We explore a new masking phenomenon that we call duplicated structure masking, which arises from structural correlations between the original and distorted objects. We also utilize an advanced binocular summation model to develop a binocular pooling algorithm. Our experimental results indicate that BPCP achieves high correlations against subjective test results, improving upon those delivered by previous crosstalk prediction models.
|
Fully Deep Blind Image Quality Predictor
Jongyoo Kim and S. Lee
IEEE Journal of Selected Topics in Signal Processing 2017
[abs]
In general, owing to the benefit of access to the original information, full-reference image quality assessment (FR-IQA) achieves higher prediction accuracy than no-reference image quality assessment (NR-IQA). By fully utilizing reference images, conventional FR-IQA methods have been developed to produce objective scores that are close to subjective scores. In contrast, NR-IQA does not consider reference images; thus, its performance is inferior to that of FR-IQA. To alleviate this accuracy discrepancy between FR-IQA and NR-IQA methods, we propose a blind image evaluator based on a convolutional neural network (BIECON). To imitate FR-IQA behavior, we exploit the strong representational power of a deep convolutional neural network to generate a local quality map, as FR-IQA methods do. To obtain the best results from the deep neural network, replacing hand-crafted features with automatically learned features is necessary. To apply the deep model to the NR-IQA framework, three critical problems must be resolved: 1) the lack of training data; 2) the absence of local ground-truth targets; and 3) the different purposes of feature learning. BIECON follows FR-IQA behavior by using local quality maps as intermediate targets for conventional neural networks, which leads to NR-IQA prediction accuracy comparable with that of state-of-the-art FR-IQA methods.
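The intermediate-target idea above — training patchwise against an FR-style local quality map rather than a single global score — can be sketched by computing such per-patch targets. A minimal sketch, assuming a simple MSE-based patch similarity in place of the FR-IQA metric actually used; the function and parameter names are illustrative, not the paper's implementation:

```python
import numpy as np

def local_quality_map(ref: np.ndarray, dist: np.ndarray, patch: int = 16) -> np.ndarray:
    """FR-style local quality targets (illustrative): per-patch MSE-based
    similarity between a reference and a distorted grayscale image, of the
    kind a patchwise CNN could be regressed onto before score pooling."""
    h, w = ref.shape
    H, W = h // patch, w // patch
    q = np.empty((H, W))
    for i in range(H):
        for j in range(W):
            r = ref[i * patch:(i + 1) * patch, j * patch:(j + 1) * patch]
            d = dist[i * patch:(i + 1) * patch, j * patch:(j + 1) * patch]
            q[i, j] = 1.0 / (1.0 + np.mean((r - d) ** 2))  # 1.0 = identical patch
    return q
```

A global quality estimate would then pool this map, e.g. by a simple mean, before the final regression onto subjective scores.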
|
An Identification Framework for Print-Scan Books in a Large Database
S. Lee, Jongyoo Kim, and S. Lee
Information Sciences 2017
[abs]
In this paper, we propose an identification framework to determine copyright infringement in the form of illegally distributed print-scan books in a large database. The framework comprises the following main stages: image pre-processing, feature vector extraction, clustering and indexing, and hierarchical search. The image pre-processing stage provides methods for alleviating the distortions induced by a scanner or digital camera. From the pre-processed image, we propose to generate feature vectors that are robust against distortion. To enhance clustering performance in a large database, we use a clustering method based on the parallel-distributed computing of Hadoop MapReduce. In addition, to store the clustered feature vectors efficiently and minimize the search time, we investigate an inverted index for feature vectors. Finally, we implement a two-step hierarchical search to achieve fast and accurate on-line identification. In simulations, the proposed identification framework proves accurate and robust in the presence of print-scan distortions. An analysis of processing time in a parallel computing environment demonstrates the extensibility of the proposed framework to massive data. In the matching performance analysis, we find, both empirically and theoretically, that in terms of query time the optimal number of clusters scales with O(√N) for N print-scan books.
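The two-step hierarchical search admits a simple cost model: with N items partitioned into k clusters, a query costs roughly k centroid comparisons plus a scan of the ~N/k items in the selected cluster, which is minimized near k = √N. A minimal sketch of this argument (illustrative only; the simplified cost model and function names are assumptions, not the paper's implementation):

```python
import math

def query_cost(k: int, n: int) -> float:
    """Two-step search cost: compare the query against all k cluster
    centroids, then scan the ~n/k items inside the selected cluster."""
    return k + n / k

def best_cluster_count(n: int) -> int:
    """Integer k minimizing k + n/k; the optimum lies near sqrt(n)."""
    return min(range(1, n + 1), key=lambda k: query_cost(k, n))

for n in (100, 10_000):
    print(n, best_cluster_count(n), round(math.sqrt(n)))
```

For both values of n the brute-force minimum coincides with round(sqrt(n)), matching the square-root scaling of the optimal cluster count.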
|
Blind Sharpness Prediction for Ultra-High-Definition Video Based on Human Visual Resolution
H. Kim, Jongyoo Kim, T. Oh, and S. Lee
IEEE Transactions on Circuits and Systems for Video Technology 2016
[abs]
We explore a no-reference sharpness assessment model for predicting the perceptual sharpness of ultra-high-definition (UHD) videos through analysis of visual resolution variation in terms of viewing geometry and scene characteristics. The quality and sharpness of UHD videos are influenced by viewer perception of the spatial resolution afforded by the UHD display, which depends on viewing geometry parameters including display resolution, display size, and viewing distance. In addition, viewers may perceive different degrees of quality and sharpness according to the statistical behavior of the visual signals, such as motion, texture, and edges, which vary over both the spatial and temporal domains. The model also accounts for the resolution variation associated with fixation and foveal regions, which is another important factor affecting the sharpness prediction of UHD video over the spatial domain, and which is caused by the nonuniform distribution of the photoreceptors. We calculate the transition of the visually salient statistical characteristics resulting from changing the display's screen size and resolution. Moreover, we calculate the temporal variation in sharpness over consecutive frames in order to evaluate the temporal sharpness perception of UHD video. We verify that the proposed model outperforms other sharpness models in both spatial and temporal sharpness assessments.
|
Perceptual Crosstalk Prediction on Autostereoscopic 3D Display
T. Kim, Jongyoo Kim, S. Kim, S. Cho, and S. Lee
IEEE Transactions on Circuits and Systems for Video Technology 2016
[abs]
Perceptual crosstalk prediction for autostereoscopic 3D displays is of fundamental importance in determining the level of quality perceived by humans in terms of the display performance and the 3D viewing experience. However, no robust framework exists to quantify perceptual crosstalk while taking into account the hardware structure of a display as well as its content characteristics via content analysis. In this paper, we present a 3D Perceptual Crosstalk Predictor (3D-PCP) that can be used to predict crosstalk in a unique way when viewing autostereoscopic 3D displays. 3D-PCP captures hardware features using an Optical Fourier transform - Light Measurement Device and content features through content analysis based on information theory. By deriving the disparity, luminance, color, and texture maps, this approach defines the visual entropy, mutual information, and relative entropy in order to investigate the influences of the 3D scene characteristics on perceptual crosstalk. The experimental results demonstrate that the 3D-PCP output is highly correlated with subjective scores.
|
No-Reference Perceptual Sharpness Assessment for Ultra-High-Definition Images
W. Kim, H. Kim, H. Oh, Jongyoo Kim, and S. Lee
IEEE International Conference on Image Processing (ICIP) 2016
[abs]
Since ultra-high-definition (UHD) displays have higher resolutions and come in various display sizes, it is necessary to measure image sharpness while accounting for variations in visual resolution caused by diverse viewing geometries. In this paper, we propose a no-reference perceptual sharpness assessment model for UHD images. The proposed model analyzes viewing geometry in terms of display resolution and viewing environment. Then, we measure a local adaptive sharpness score in accordance with motion blur, texture, and edges. In addition, we propose a spatial pooling method associated with foveal regions, which arise from the nonuniform distribution of photoreceptors on the human retina. Through rigorous experiments, we demonstrate that the proposed model can measure the sharpness of UHD images more accurately than other image sharpness assessment methods.
|
Human Gait Prediction Method Using Microsoft Kinect
J. Kim, D. Kim, I. Lee, Jongyoo Kim, H. Oh, and S. Lee
International Workshop on Advanced Image Technology (IWAIT) 2016
[abs]
Real-time monitoring of elderly movement can provide valuable information regarding an individual's degree of functional rehabilitation. Many laboratory-based studies have described gait detection systems built on various wearable inertial sensors, but only a limited number of papers have addressed the problem using non-wearable sensors. In this paper, a practical method of gait information detection and gait analysis is proposed using an inexpensive Microsoft Kinect fixed at the midpoint of a lower-extremity rehabilitation robot. The horizontal distances between the Kinect plane and each marker attached to the lower extremity are acquired. Taking the characteristics of the gait distance series into consideration, an autoregressive moving average (ARMA) model is established to capture the evolution of the gait status. Combined with a Kalman filter, gait information reflecting the rehabilitation status at the next moment is predicted accurately. Finally, the proposed gait detection and analysis method is verified through extensive gait experiments.
|
Implementation of an Omnidirectional Human Motion Capture System Using Multiple Kinect Sensors
J. Kim, I. Lee, Jongyoo Kim, and S. Lee
IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences 2015
[abs]
Due to its ease of implementation for various interactive applications, much research on motion recognition has been conducted using Kinect. However, one drawback of Kinect is that the skeletal information it provides assumes that the user faces the sensor. Thus, the skeletal information is likely incorrect when the user turns his or her back to Kinect, which may hinder motion recognition in the application. In this paper, we implement a highly accurate human motion capture system by installing six Kinect sensors covering 360 degrees. The proposed method obtains the skeleton more accurately by assigning higher weights to skeletons captured by the Kinect sensors that the user faces. Toward this goal, the user's front vector is traced over time to determine whether the user is facing a given Kinect. Then, the more reliable joint information is utilized to construct a skeletal representation of each user.
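The facing-direction weighting described above can be sketched with a simple cosine-based confidence scheme. A minimal sketch, assuming per-sensor skeletons already registered into a common world frame; the weighting function and all names are illustrative, not the paper's exact formulation:

```python
import numpy as np

def fuse_skeletons(skeletons, front_vectors, sensor_dirs):
    """Confidence-weighted fusion of per-sensor skeletons.

    skeletons    : (S, J, 3) joint positions from S Kinect sensors.
    front_vectors: (S, 3) the user's unit front vector as seen by each sensor.
    sensor_dirs  : (S, 3) unit vectors from the user toward each sensor.

    Sensors the user faces receive higher weights (illustrative scheme).
    """
    # Cosine between the front vector and the direction to the sensor:
    # near 1 when the user faces that sensor, negative when turned away.
    cos = np.einsum("sd,sd->s", front_vectors, sensor_dirs)
    w = np.clip(cos, 0.0, None)          # ignore sensors behind the user
    if w.sum() == 0.0:
        w = np.ones_like(w)              # fallback: uniform weights
    w = w / w.sum()
    return np.einsum("s,sjd->jd", w, np.asarray(skeletons, dtype=float))
```

With one sensor directly faced and one directly behind, the fused skeleton reduces to the front sensor's skeleton; with no faced sensor, it falls back to the plain average.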
|
Video Sharpness Prediction Based on Motion Blur Analysis
Jongyoo Kim, J. Kim, W. Kim, J. Lee, and S. Lee
IEEE International Conference on Multimedia and Expo (ICME) 2015
[abs]
For high-bit-rate video, it is important to acquire the content at high resolution, whose quality may nevertheless be degraded by motion blur from the movement of objects or the camera. However, conventional sharpness assessments are designed to detect focal blur caused either by defocusing or by compression distortion targeted at low bit rates. To overcome this limitation, we present a no-reference framework for visual sharpness assessment (VSA) of high-resolution video based on motion and scene classification. In the proposed framework, the accuracy of the sharpness estimation is improved via pooling weighted by the visual perception of object and camera movements and by the strong influence of the region with the highest sharpness. Based on the characteristics of motion blur, the variance and the contrast over the spectral domain are used to quantify the perceived sharpness. Moreover, for the VSA, we extract the highly influential sharper regions and emphasize them via scene-adaptive pooling.
|
3D Visual Discomfort Predictor Based on Neural Activity Statistics
H. Oh, Jongyoo Kim, S. Lee, and A. C. Bovik
IEEE International Conference on Image Processing (ICIP) 2015
[abs]
Visual discomfort assessment (VDA) on stereoscopic images is of fundamental importance for making decisions regarding visual fatigue caused by unnatural binocular alignment. Nevertheless, no solid framework exists to quantify this discomfort using models of the responses of visual neurons. Binocular vision is realized by means of neural mechanisms that subserve the sensorimotor control of eye movements. We propose a neuronal model-based framework called the Neural 3D Visual Discomfort Predictor (N3D-VDP) that automatically predicts the level of visual discomfort experienced when viewing stereoscopic 3D (S3D) images. The N3D-VDP model extracts features derived by estimating the neural activity associated with the processing of binocular disparities. In this regard, we deploy a model of disparity processing in the extra-striate middle temporal (MT) region of the occipital lobe. We compare the performance of N3D-VDP with other recent VDA algorithms using correlations against reported subjective visual discomfort, and show that N3D-VDP is statistically superior to the other methods.
|
Implementation of Human Action Recognition System Using Multiple Kinect Sensors
B. Kwon, D. Kim, J. Kim, I. Lee, Jongyoo Kim, H. Oh, H. Kim, and S. Lee
Advances in Multimedia Information Processing - PCM 2015
[abs]
Human action recognition is an important research topic with many potential applications, such as video surveillance, human-computer interaction, and virtual-reality combat training. However, much research on human action recognition has been performed with single-camera systems, which suffer low performance due to their vulnerability to partial occlusion. In this paper, we propose a human action recognition system using multiple Kinect sensors to overcome the limitations of conventional single-camera-based systems. To test the feasibility of the proposed system, we use snapshot and temporal features extracted from three-dimensional (3D) skeleton data sequences, and apply a support vector machine (SVM) to classify human actions. The experimental results demonstrate the feasibility of the proposed system.
|
Quality Assessment of Perceptual Crosstalk in Autostereoscopic Display
Jongyoo Kim, T. Kim, and S. Lee
IEEE International Conference on Image Processing (ICIP) 2014
[abs]
Crosstalk is one of the most annoying problems in an autostereoscopic display, causing perceptual quality degradation and visual discomfort. To predict the perceived crosstalk when viewing an autostereoscopic display, it is necessary to consider the characteristics of human perception, the display mechanism, the viewing environment, and so on. Therefore, we propose a novel metric for predicting perceptual crosstalk that is based on the human visual system (HVS): the nonlinear sensitivity to luminance and masking effects. The proposed model adopts duplicated structure masking, yielding predictive power that is statistically superior to prior models that rely on 2D quality metrics.
|
Ego Motion Induced Visual Discomfort of Stereoscopic Video
Jongyoo Kim, K. Oh, and S. Lee
Asia-Pacific Signal and Information Processing Association Annual Summit and Conference 2013
[abs]
When a video sequence is captured, inappropriate camera motion can be a crucial factor leading to visual discomfort and distortion. A well-known symptom, visually induced motion sickness (VIMS), is caused by the illusion of self-motion when perceiving video containing ego motion. In particular, for stereoscopic 3D video, it can easily be observed that viewers feel much more severe symptoms of visual discomfort. In this paper, we analyze the ego motion of stereoscopic video and predict its effects. Exploiting computer vision algorithms, we propose a novel method that estimates perceptual 3D ego motion from stereoscopic video, and then analyze the ego motion components to predict the visual discomfort of stereoscopic video.
|
Construction of Stereoscopic 3D Video Database
H. Oh, Jongyoo Kim, and S. Lee
Global 3D TECH Forum 2013
|
Effects on 3D Experience by Space Distortion in Stereoscopic Video
Jongyoo Kim and S. Lee
Global 3D TECH Forum 2012
|
Visual Stimuli Using 3D Graphic Software for 3D Quality Assessment
Jongyoo Kim and S. Lee
International Conference on 3D Systems and Applications (3DSA) 2012
|