Chinese Journal of Intelligent Science and Technology

Parallel philosophy and intelligent technology: dual equations and testing systems for parallel industries and smart societies

Fei-Yue WANG

2021, 3(3): 245-255. doi:10.11959/j.issn.2096-6652.202126

Starting from Karl Popper’s Three Worlds, the knowledge system facing the three worlds and the corresponding philosophical problems were proposed. The scope of traditional philosophy was extended from being and becoming to believing, and a new parallel philosophy was proposed as the foundation for intelligent science and technology. The dual equation systems and parallel testing were introduced through ACP and cyber-physical-social systems (CPSS). This will enable a true DAO approach for intelligent parallel industries and smart societies: trustable, reliable, usable, and effective + efficient (true); distributed + decentralized, autonomous + automated, organized + ordered (DAO); and the corresponding distributed autonomous organizations and operations (DAOs).

Retinal multi-disease screening and recognition method based on deep convolution ensemble network

Heyang WANG, Qiming YANG, Qi ZHU

2021, 3(3): 259-267. doi:10.11959/j.issn.2096-6652.202127

To address the diversity of retinal diseases and the uncertainty of lesion locations, a retinal multi-disease screening and recognition method based on a deep convolutional ensemble network was proposed. Firstly, the black borders on both sides of the retinal fundus image were cropped, and noise was removed to reduce interference and increase image clarity. Next, data augmentation methods such as cropping and rotation were applied to the retinal fundus images to enlarge the dataset. Then, a model based on a deep convolutional neural network was built for feature extraction, and the network was fine-tuned to complete the task of screening and identifying retinal diseases. Finally, the results of multiple models were ensembled. The experimental results show that the method performs well on both tasks: the accuracy of retinal disease screening is 96.05%, and the accuracy of retinal disease recognition is 72.55%.
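The abstract does not say how the results of the individual models were combined. As a minimal sketch, assuming soft voting (averaging the class probabilities of each model), the final ensembling step could look like the following. The function names and the probability layout are illustrative assumptions, not the authors' implementation.

```python
def ensemble_average(prob_sets):
    """Average class probabilities from several models (soft voting).

    prob_sets: one entry per model; each entry is a list of per-image
    probability vectors, with the same shape for every model.
    Soft voting is an assumption; the abstract only says "ensembled".
    """
    n_models = len(prob_sets)
    n_images = len(prob_sets[0])
    n_classes = len(prob_sets[0][0])
    averaged = []
    for i in range(n_images):
        row = [sum(model[i][c] for model in prob_sets) / n_models
               for c in range(n_classes)]
        averaged.append(row)
    return averaged


def predict(prob_sets):
    """Return the index of the highest averaged probability for each image."""
    return [max(range(len(row)), key=row.__getitem__)
            for row in ensemble_average(prob_sets)]
```

Averaging probabilities lets one confident model outweigh two weakly opposed ones, which hard majority voting would not allow.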

3D convolution-based image sequence feature extraction and self-attention for license plate recognition method

Ganxiong ZENG, Xiao KE

2021, 3(3): 268-279. doi:10.11959/j.issn.2096-6652.202128

In recent years, neural networks based on self-attention mechanisms have been widely used in computer vision tasks. As intelligent transportation systems become widespread, license plate recognition is becoming more difficult and the need for correct recognition more pressing in the face of complex and changing traffic scenes. Therefore, a rectification-free license plate recognition method based on self-attention, T-LPR, was proposed. Firstly, the images were sliced and sequenced, and 3D convolution was used for feature extraction of the sliced sequences to obtain a sequence of image embedding vectors. Secondly, the sequence of embedding vectors was fed into an encoder based on the Transformer Encoder, which learned the relationships between the individual embedding vectors and output the final encoding result. Finally, the final encoding result was classified by a classifier. Experimental results on several public datasets show that the proposed T-LPR is very effective at recognizing license plates in all kinds of difficult scenarios.
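The encoder step relies on self-attention to relate the embedding vectors to one another. The sketch below shows scaled dot-product self-attention over a short sequence. It assumes identity query, key and value projections and a single head for brevity; a real Transformer Encoder uses learned projections, multiple heads and feed-forward layers, and nothing here is taken from T-LPR itself.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(sequence):
    """Scaled dot-product self-attention with identity Q/K/V projections.

    sequence: list of embedding vectors (lists of floats) of equal length.
    Returns one output vector per input vector.
    """
    dim = len(sequence[0])
    outputs = []
    for query in sequence:
        # Similarity of this vector to every vector in the sequence.
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in sequence]
        weights = softmax(scores)
        # Each output is a weighted mix of all vectors in the sequence.
        outputs.append([sum(w * vec[c] for w, vec in zip(weights, sequence))
                        for c in range(dim)])
    return outputs
```

Each output vector is a convex combination of all inputs, so every slice of the plate can draw on information from every other slice.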

A method based on visual saliency for vehicle-mounted monocular camera ego-motion estimation and vehicle scale estimation

Mingxin AI, Tie LIU, Jing WANG, Jiali DING, Zejian YUAN, Yuanyuan SHANG

2021, 3(3): 280-293. doi:10.11959/j.issn.2096-6652.202129

A method based on visual saliency was proposed for ego-motion estimation and for scale estimation of the vehicle in front. Firstly, for the ego-motion estimation of the vehicle-mounted camera, a visual saliency calculation method was used to detect and remove moving objects in the noisy monocular image sequence. Taking both textured and smooth image regions into account, a weighted saliency map was used to retain useful feature points and improve the robustness of ego-motion estimation. Secondly, the distance to the vehicle in front was converted into a vehicle scale estimation, obtained by integrating descriptor matching and the strength of regularization matching of the Lie algebra to minimize the loss function. A visual attention mechanism was used to obtain textured image blocks without shadows, and the pixels in each image block were weighted to mitigate the effect of noise-corrupted pixels, so as to achieve robust and accurate scale estimation. Finally, several challenging datasets were used to analyze and verify the proposed method. The results show that the monocular camera ego-motion estimation method reaches the level of stereo camera-based methods, and the vehicle scale estimation method maintains prediction accuracy while remaining strongly robust.

Anchor free multispectral pedestrian detection algorithm based on differential feature attention mechanism

Jifeng SHEN, Yue LIU, Hao WEI, Xin ZUO, Wankou YANG

2021, 3(3): 294-303. doi:10.11959/j.issn.2096-6652.202130

Multispectral pedestrian detection systems suffer from low feature fusion quality, a large number of model hyper-parameters, and complex anchor matching algorithms. To deal with these problems, an anchor-free multispectral pedestrian detection algorithm based on a differential feature attention mechanism was proposed. Firstly, differential modality-aware fusion was used to obtain the complementary information between different modalities to optimize the channel features. Secondly, the CenterNet detection framework with its anchor-free mechanism was adopted to greatly reduce the computational complexity of the model and thus improve the detection speed. Finally, a differential feature guided attention mechanism was introduced to improve the quality of feature fusion and further enhance the detection accuracy. Experimental results on three open datasets, KAIST, CVC14 and FLIR, show that the proposed algorithm effectively improves detection accuracy and speed compared with current advanced methods, and has good prospects for practical application.

Plant leaf detection technology based on multi-scale CNN feature fusion

Ying LI, Long CHEN, Zhaohong HUANG, Yang SUN, Guorong CAI

2021, 3(3): 304-311. doi:10.11959/j.issn.2096-6652.202131

Plant leaf detection is one of the essential aspects of scientific plant breeding and precision agriculture. The traditional practice of plant leaf detection requires professional knowledge from the operators, incurs high labor costs, and is time-consuming. A plant leaf detection technology based on multi-scale CNN feature fusion (MCFF) was proposed. Starting from the needs of deep learning assisted plant cultivation, MCFF was used to detect the leaf count for three different types and resolutions of rosette model plants, Arabidopsis thaliana and tobacco. Compared with three other algorithms, MCFF achieves higher detection accuracy, with a mean average precision (mAP) of 0.662, and highly competitive performance (AP = 0.946), with each indicator close to the practical level.

Insulator detection based on SE-YOLOv5s

Qing TIAN, Rong HU, Zuoyong LI, Yuanzheng CAI, Zhaochai YU

2021, 3(3): 312-321. doi:10.11959/j.issn.2096-6652.202132

In large-scale environments where the power system needs to be inspected, traditional manual inspection is inconvenient and poses potential safety risks, whereas object detection from unmanned aerial vehicles has great application prospects for insulator detection and recognition. SE-YOLOv5s, a lightweight insulator detection network that performs efficient detection for this task, was presented. Firstly, the backbone of YOLOv5s was strengthened by fusing the SE attention module. Then, the position distribution of insulator objects was investigated, and predefined prior-box templates were generated by K-means clustering on prior coordinate vectors. Finally, the network was trained with a multitask loss function that combines the confidence and position regression tasks. Furthermore, Mosaic data augmentation was utilized to supply additional training samples. Experimental results demonstrate that the proposed SE-YOLOv5s significantly outperforms baseline methods on multiple criteria, including accuracy, recall rate, detection rate and mean average precision. In comparison with the baselines, the proposed network offers a flexible trade-off between robustness and memory overhead, and it is a potential approach to promoting power system development.
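The prior-box step can be illustrated with a small K-means routine over box sizes. This is a sketch under stated assumptions: boxes are reduced to (width, height) pairs, similarity is measured by IoU as is common in YOLO-style anchor clustering, and initialisation is deterministic so the run is repeatable. The abstract does not state which distance or initialisation the authors used, and the function names are hypothetical.

```python
def iou_wh(box, centroid):
    """IoU of two boxes given as (width, height), aligned at a shared corner."""
    inter = min(box[0], centroid[0]) * min(box[1], centroid[1])
    union = box[0] * box[1] + centroid[0] * centroid[1] - inter
    return inter / union


def kmeans_anchors(boxes, k, iterations=50):
    """Cluster (width, height) pairs into k prior-box templates."""
    # Deterministic start (an assumption): k boxes spread over the
    # area-sorted list instead of a random draw.
    ordered = sorted(boxes, key=lambda b: b[0] * b[1])
    step = len(ordered) / k
    centroids = [ordered[int(i * step + step / 2)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            # Assign each box to the centroid it overlaps most.
            best = max(range(k), key=lambda j: iou_wh(box, centroids[j]))
            clusters[best].append(box)
        updated = []
        for j, members in enumerate(clusters):
            if members:
                updated.append((sum(b[0] for b in members) / len(members),
                                sum(b[1] for b in members) / len(members)))
            else:
                updated.append(centroids[j])  # keep the centroid of an empty cluster
        if updated == centroids:
            break  # assignments are stable
        centroids = updated
    return sorted(centroids, key=lambda c: c[0] * c[1])
```

For example, `kmeans_anchors([(10, 40), (12, 44), (11, 42), (100, 30), (104, 28), (96, 32)], 2)` separates the tall, narrow boxes from the wide, flat ones and returns one template per group.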

Light weight rotating object detector based on angle sensitive spatial attention mechanism

Kaishi YIN, Meng YANG, Xi GU, Zhicheng WANG

2021, 3(3): 322-333. doi:10.11959/j.issn.2096-6652.202133

With the rapid development of deep learning, more and more anchor-based object detection algorithms have been applied to remote sensing images in recent years. However, their accuracy gains come at the cost of detection speed. Therefore, an anchor-free detection framework was chosen, and a rotated-box detection algorithm was proposed according to the characteristics of remote sensing scenes. A simple and effective representation of the rotated box was proposed based on the spatial relationship between the rotated box and its circumscribed rectangle. In addition, an angle-sensitive attention mechanism was designed to assist the detection of rotating targets; by introducing angle information, it improves the ability of the model to detect them. The proposed algorithm was tested on the open remote sensing dataset DOTA. The rotated-box detection network achieves a mean average precision of 68.5% at a detection speed of 17.4 frames per second.

Collaborative representation based classifier with maximum correntropy criterion and locality constraint

Qinru YU, Guifu LU

2021, 3(3): 334-341. doi:10.11959/j.issn.2096-6652.202134

A method that utilizes the maximum correntropy criterion and locality information, called the collaborative representation based classifier with maximum correntropy criterion and locality constraint (CRC/MCCLC), was proposed. On the one hand, because it uses the maximum correntropy criterion, CRC/MCCLC is not only more robust to outliers than the L₁ norm but can also be computed efficiently using the half-quadratic optimization technique. On the other hand, because it uses locality information, CRC/MCCLC can obtain more discriminative information from the training samples and leads to an approximately sparse representation. Extensive experimental results on several image databases demonstrate that CRC/MCCLC achieves state-of-the-art performance.
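To show why the maximum correntropy criterion resists outliers and how half-quadratic optimization turns it into a simple alternating scheme, the sketch below applies both to a toy problem: estimating a single location value from noisy samples. It is not the CRC/MCCLC classifier; the Gaussian kernel width `sigma`, the starting point and the function name are assumptions chosen for illustration.

```python
import math


def correntropy_mean(samples, sigma=1.0, iterations=20):
    """Estimate a location by maximising correntropy with a Gaussian kernel."""
    # Start from a median-like value so the first weights are sensible.
    estimate = sorted(samples)[len(samples) // 2]
    for _ in range(iterations):
        # Half-quadratic step 1: with the estimate fixed, compute the
        # auxiliary weights; large residuals receive weights near zero.
        weights = [math.exp(-((x - estimate) ** 2) / (2 * sigma ** 2))
                   for x in samples]
        # Half-quadratic step 2: with the weights fixed, solve the
        # resulting weighted least-squares problem in closed form.
        estimate = sum(w * x for w, x in zip(weights, samples)) / sum(weights)
    return estimate
```

With samples `[1.0, 1.1, 0.9, 1.05, 0.95, 50.0]`, the plain mean is pulled far toward the outlier at 50, while the correntropy estimate stays close to 1 because the outlier's weight is effectively zero.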

Robust non-negative supervised low-rank discriminant embedding algorithm

Yu YAO, Minghua WAN

2021, 3(3): 342-350. doi:10.11959/j.issn.2096-6652.202135

Non-negative matrix factorization (NMF) has been widely used. However, NMF pays more attention to the local information of the data and ignores its global representation. In image classification, the global information of the data is often more robust to noise than the local information. To improve robustness, a non-negative supervised low-rank discriminant embedding algorithm was proposed that combines the local and global representations of the data and takes the characteristics of low-rank representation into account. The algorithm assumes that the data contain noise, decomposes the data into clean data and noise data, and imposes a sparsity constraint on the noise matrix through the L₁ norm, so as to enhance robustness to noise. In addition, the algorithm uses low-rank representation to learn a low-rank matrix and then applies non-negative decomposition, which further enhances its robustness. Finally, combined with a graph embedding method, the local and global information of the data is retained at the same time. The algorithm was applied to various noisy databases, and its recognition rate is about 5% to 15% higher than that of the compared algorithms.
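An L₁ penalty on a noise matrix is commonly handled with the element-wise soft-thresholding operator, which shrinks small entries to zero and keeps only large, sparse deviations. The sketch below shows that operator only. Whether the authors' solver uses this exact step is an assumption, and `tau` is a hypothetical threshold parameter.

```python
import math


def soft_threshold(matrix, tau):
    """Element-wise soft-thresholding, the proximal operator of tau * L1 norm.

    Entries with magnitude below tau become zero; larger entries are
    shrunk toward zero by tau while keeping their sign.
    """
    return [[math.copysign(max(abs(v) - tau, 0.0), v) for v in row]
            for row in matrix]
```

Applied to a residual between the observed data and the current clean-data estimate, this keeps a few large entries as the sparse noise term and discards the rest.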

End-to-end speech enhancement based on ultra-lightweight channel attention

Yi HONG, Chengli SUN, Yan LENG

2021, 3(3): 351-358. doi:10.11959/j.issn.2096-6652.202136

The fully convolutional time-domain audio separation network (Conv-TasNet) is a recently proposed state-of-the-art end-to-end speech separation model. Conv-TasNet uses dilated convolution to expand the receptive field and fuse more speech features in space, which greatly improves its speech separation performance, but it ignores the importance of information across different convolution channels. An end-to-end speech enhancement method based on ultra-lightweight channel attention was proposed, which effectively combines Conv-TasNet with channel attention. In addition, a group of filters was added to the Conv-TasNet codec to improve the speech feature extraction ability of the network. This method enables the convolutional neural network to combine spatial and channel information more effectively and thereby improve speech enhancement. Experiments show that the proposed model effectively improves speech enhancement performance while increasing the model capacity by only about 0.02%.
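The abstract does not spell out the attention design. One way a channel attention module can stay very small is to gate each channel using a short 1D convolution across per-channel averages, so the only added parameters are the few kernel weights. The sketch below assumes that style of module, a (channels x time) feature layout, and zero padding at the channel edges; it is an illustration, not the authors' module.

```python
import math


def channel_attention(features, kernel):
    """Rescale each channel of a (channels x time) feature map.

    kernel: odd-length list of weights for a 1D convolution across the
    channel axis. These weights are the only parameters of the module.
    Returns the rescaled features and the per-channel gates.
    """
    n_channels = len(features)
    pad = len(kernel) // 2
    # Squeeze: one descriptor per channel (global average over time).
    descriptors = [sum(channel) / len(channel) for channel in features]
    padded = [0.0] * pad + descriptors + [0.0] * pad
    # Local cross-channel interaction followed by a sigmoid gate.
    gates = []
    for c in range(n_channels):
        z = sum(kernel[j] * padded[c + j] for j in range(len(kernel)))
        gates.append(1.0 / (1.0 + math.exp(-z)))
    scaled = [[g * v for v in channel] for g, channel in zip(gates, features)]
    return scaled, gates
```

A kernel of length 3 adds three weights regardless of the number of channels, which is consistent in spirit with a very small increase in model capacity.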

Sparse representation for image recognition based on semi-genetic algorithm in feature space

Linrui SHI, Yijing HUANG, Jinwu FU, Xinyue GUO, Zizhu FAN

2021, 3(3): 359-369. doi:10.11959/j.issn.2096-6652.202137

The typical sparse representation for classification (SRC) is usually based on an L₁ minimization problem. Conceptually, SRC is essentially an L₀ norm minimization problem solved in the original input space, which cannot capture the nonlinear information within the data well. To address this problem, a nonlinear mapping was applied to map the original input data into a new high-dimensional feature space, and a new representation approach based on the L₀ norm was proposed. In the proposed approach, the representing dictionary used to represent the test sample contains two parts. The first part is fixed to the neighbors of the test sample. The training samples of the second part are chosen by a variation of the genetic algorithm (GA), i.e., the semi-GA (SGA) algorithm, which exploits the representation error to determine the second part of the representing dictionary. In this approach, if a set of training samples, combined with the determined neighbors of the test sample, yields the least representation error, SGA selects these training samples as the second part of the representing dictionary. Experiments on several popular face databases and one handwritten digit dataset demonstrate that the proposed approach achieves better classification performance.

Research on anti-spoofing method of face recognition based on semi-supervised learning

Li LI, Weiliang ZENG, Yonghui HUANG, Weijun SUN

2021, 3(3): 370-380. doi:10.11959/j.issn.2096-6652.202138

Distinguishing real faces from fake ones in images is a long-standing challenge. When synthetic fake faces are very realistic, it is difficult for machines, and even for the naked eye, to tell the real from the fake. Supervised anti-spoofing methods often require a large number of labeled samples to perform well. An anti-spoofing method for face recognition based on semi-supervised learning was proposed to reduce the dependence on massive labeled samples. The method adopts an image inpainting model to learn the data distribution of face images. During training, a few labeled samples periodically provide supervised signals to train the classifier to distinguish real faces from fake ones. The proposed method can be used for face anti-spoofing in different scenarios, such as faces captured by cameras or generated by generative adversarial networks. Accordingly, it was evaluated on the NUAA and RMFD datasets. Experimental results show that the proposed method preserves the quality of restored images and achieves desirable classification accuracy. With only a few labeled samples, the proposed method outperforms Improved-GAN and common semi-supervised methods, and surpasses supervised learning methods based on support vector machines and convolutional neural networks.
