Artificial intelligence, machine learning, deep learning, and computer vision have become the bellwethers of a new era. This article covers five points. First, what basic knowledge do you need to get started with computer vision? Second, which reference books and open courses can you learn from? Third, and probably of interest to everyone: as a branch of artificial intelligence, computer vision is inevitably combined with deep learning, which is now woven into computer vision and image processing as well as natural language processing, so this article also briefly introduces how computer vision and deep learning fit together. Fourth, work in the computer field inevitably involves open source, so this article introduces some open source software. Fifth, to learn or research computer vision you must read the literature, so this article briefly covers how to start reading papers and gradually find your own direction in the field.

1. Basic knowledge

This section first explains what computer vision means, then covers some basic knowledge of images and video, including camera hardware and CPU and GPU computing. In computer vision we inevitably have to decide whether to compute on the CPU or the GPU. Next comes its intersection with other disciplines: computer vision overlaps with many fields, and interdisciplinary work increases its significance and practical value. Finally, some readers did not work in artificial intelligence before; they may be software developers who want to move into computer vision. How do you make that move?
What programming languages and mathematical foundations do you need to learn? These are all introduced in this first section.

1.0 What is computer vision

Computer vision is the science of making machines "see". More precisely, it uses cameras and computers in place of human eyes to identify, track, and measure targets, and then processes the images further so that they are better suited to human observation or to transmission to instruments for inspection. As a scientific discipline, computer vision studies the related theories and technologies and tries to build artificial intelligence systems that can obtain "information" from images or multi-dimensional data. The currently popular VR, AR, and 3D processing are all part of computer vision.

Computer vision applications

Now that we know what computer vision is, here is a list of current applications in the field. They are practically ubiquitous, and they cover the hottest start-up directions of the moment. They include the frequently mentioned autonomous driving, unmanned security, and face recognition. Face recognition is the most mature application area. Then come text recognition, vehicle license plate recognition, image search, VR/AR, and 3D reconstruction, as well as one currently promising field: medical image analysis. Medical image analysis was proposed very early and has been studied for a long time, but it is now seeing renewed development. More and more researchers, both image researchers and researchers in the medical field, are paying attention to the combination of computer vision, artificial intelligence, and medical images. Medical image analysis has also given rise to many start-up companies, and the prospects in this direction are well worth looking forward to.
Beyond these, computer vision technology is also applied in drones, autonomous driving, and other areas.

1.1 Images and videos, the concepts you need to know

Image

• Image depth: the number of bits used to store each pixel.
• Picture format and compression: common picture formats such as JPEG, PNG, and BMP are essentially different ways of compressing and encoding a picture. Example: JPEG compression.

Video

• I frame: a key frame, which can be understood as a complete copy of the picture; decoding needs only this frame's data, because it contains the whole picture.
• P frame: records the difference between this frame and the previous key frame (or P frame). To decode it, the difference defined by this frame is superimposed on the previously buffered picture to produce the final picture. In other words, a P frame is a difference frame: it holds no complete picture data, only the data that differs from the previous frame.
• B frame: a bidirectional difference frame that records the difference between this frame and both the previous and the following frames (the details are more complicated; there are 4 cases). To decode a B frame you need not only the previously buffered picture but also the decoded following picture, and the final picture is obtained by combining the preceding and following pictures with this frame's data. B frames achieve a high compression rate, but decoding them is more work.
• Bit rate: the higher the bit rate, the larger the file; the lower the bit rate, the smaller the file.
• Frame rate: affects how smooth the picture is, and smoothness increases with it: the higher the frame rate, the smoother the picture; the lower the frame rate, the choppier the picture. If the bit rate is variable, the frame rate also affects the file size.
The higher the frame rate, the more pictures pass per second, the higher the bit rate required, and the larger the file.
• Resolution: affects the image size and is directly proportional to it: the higher the resolution, the larger the image; the lower the resolution, the smaller the image.
• Sharpness: at a fixed bit rate, resolution and sharpness are inversely related: the higher the resolution, the less sharp the image; the lower the resolution, the sharper the image. At a fixed resolution, bit rate and sharpness are directly related: the higher the bit rate, the sharper the image; the lower the bit rate, the less sharp the image.
• Bandwidth and frame rate: for example, when transmitting images over an ADSL line, the upstream bandwidth is only 512 Kbps, but 4 channels of CIF-resolution images must be transmitted. By convention the recommended bit rate for CIF resolution is 512 Kbps, so by this calculation only one channel can be transmitted. Reducing the bit rate would inevitably hurt image quality, so to preserve image quality the frame rate must be reduced instead. That way the image quality is unaffected even though the bit rate is lower, but the continuity of the image suffers.

1.2 Camera

Classification of cameras: current camera hardware can be divided into surveillance cameras, cameras for specialized industry applications, smart cameras, and industrial cameras. Among surveillance cameras, two types are currently the most common: network cameras and analog cameras. They differ mainly in their imaging principles.
In addition, different industries have their own specialized cameras, such as ultra-wide dynamic cameras, infrared cameras, and thermal imaging cameras. Each may be used in a particular field, and the pictures they produce are completely different. When doing image processing and computer vision analysis, ask which kind of camera serves you best; we must learn to take advantage of the hardware. In research you can generally choose the camera you use. In real application scenarios you have less control, but you should still know that some problems are solved very well simply by changing the hardware. That is one way of thinking about it. Some problems resist a long algorithmic effort, and even then the result is inefficient and costly; yet after a small change to the hardware, the original problems disappear. This is what new hardware can do for you. The same goes for today's smart cameras and industrial cameras. Industrial cameras are generally more expensive because they are dedicated to industrial fields or to precision instruments, where high-precision, high-definition imaging is required.

1.3 CPU and GPU

Next, CPUs and GPUs. If you want to do computer vision and image processing, you cannot skip GPU computing, and it may well be the next thing you need to learn or teach yourself, because most current computer vision papers are implemented on GPUs. In deployed applications, however, GPUs are relatively expensive, so CPUs still account for the majority of use cases. What is the main difference between a CPU and a GPU?
The two can be compared mainly on two aspects: the first is performance and the second is throughput. The GPU is usually mentioned together with another term, parallel computing, which means it can run a large number of threads at the same time. Why are images particularly well suited to GPU computing? Because the GPU was originally designed as a graphics processing unit: each pixel can be assigned to its own thread, and each pixel needs only some simple calculation. This is the principle behind the original graphics processors. When rendering graphics, the GPU has to compute a transformation for every pixel, and the amount of computation per pixel is very small, perhaps a single formula, so it can be handed to a simple compute unit. That is the difference between a CPU and a GPU, and it guides the decision of when to use which. If the algorithm you are designing is not very parallel, that is, a complex computation that runs from top to bottom with little that can be done concurrently, then even a GPU will not improve its performance much. So do not use a GPU just because everyone else does. What we need to understand is why a GPU is used and under what circumstances it gives the best results.

1.4 The relationship between computer vision and other disciplines

Computer vision is closely related to many other disciplines, including robotics as well as the medicine, physics, imaging, and satellite image processing mentioned earlier, all of which make frequent use of computer vision.
The questions asked most often here concern three concepts: computer vision, machine vision, and image processing. What is the difference between the three? The answer varies from person to person, and every researcher understands it differently. Image processing is mostly about processing graphics and images at the pixel level; 3D processing is also usually regarded as image processing. Machine vision involves processing at the hardware level as well: graphics computing that combines hardware and software, and the ability to make that processing intelligent, is what we generally call machine vision. The computer vision discussed in this article leans towards processing at the software level, and it is more than image recognition. It also includes understanding images and even transforming them, and the image generation work we see today can also be placed in the computer vision field. Computer vision is therefore a very fundamental subject that intersects with many disciplines, and it can also be subdivided internally, for example into machine vision and image processing.

1.5 Programming languages and mathematical foundations

The content of this part can be found in "How to learn computer vision for non-computer majors".

2. Reference books and open classes

The first reference book is "Computer Vision: Models, Learning and Inference" by Simon J. D. Prince. It is well suited to beginners because it comes with a lot of code (Matlab code and C code), plus very detailed reference materials and documents.

The second is "Computer Vision: Algorithms and Applications" by Richard Szeliski. It is a classic and very authoritative reference. This book is not for reading cover to cover but for looking things up, like a reference manual; it is the most extensive of the references, so it is generally consulted rather than read through.

The third is "Introduction to OpenCV3 Programming" by Mao Xingyun and Leng Xuefei. If you want to get started quickly and implement some projects, look at this book: it teaches through practical examples and lets you learn OpenCV, the most classic and most widely used open source computer vision library.

Open class: Stanford CS231N

3. Deep learning knowledge to know

There is a great deal to deep learning, far more than can be covered here, so only one book is recommended. It was published at the end of last year and is the latest deep learning book. It is very comprehensive, covering everything from basic mathematics to the probability, statistics, machine learning, calculus, and linear algebra mentioned earlier.

4. Open source software that needs to be understood and learned

OpenCV
Caffe
TensorFlow
-Autonomous driving
-Unmanned security
-Face recognition
-Vehicle license plate recognition
-Search by image
-VR/AR
-3D reconstruction
-Medical image analysis
-Drone
-Other
An image carries the following attributes: dimensions, height, width, depth, number of channels, color format, data start address, end address, data size, and so on.
The more bits a pixel occupies, the more colors it can express and the richer the image.
For example: a 400*400 8-bit image, what is the original data volume of this image? If the pixel value is an integer, what is the value range?
1. Raw data size: 400 * 400 * (8/8) = 160,000 bytes (approximately 160 KB)
2. Value range: 2 to the 8th power gives 256 values, i.e. 0~255
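The calculation above can be sketched in a few lines of Python. This is a minimal illustration rather than part of the original article; the function names and the `channels` parameter are assumptions introduced for the example (the 400*400 image in the text has a single 8-bit channel).

```python
def raw_image_bytes(width, height, bit_depth, channels=1):
    """Uncompressed size in bytes of a width x height image.

    `channels` is a hypothetical extra parameter; the example in the
    text corresponds to channels=1.
    """
    total_bits = width * height * bit_depth * channels
    return (total_bits + 7) // 8  # round up to whole bytes


def value_range(bit_depth):
    """Smallest and largest unsigned integer a pixel of this depth can hold."""
    return 0, 2 ** bit_depth - 1


size = raw_image_bytes(400, 400, 8)
low, high = value_range(8)
print(size, low, high)
```

Changing `bit_depth` or `channels` shows how quickly the raw size grows, which is why compressed formats matter.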
1. Divide the original image into 8*8 blocks; each block has 64 pixels.
2. Apply the DCT to each 8*8 block in the image (the more complex the image, the harder it is to compress).
3. After an image is split up, the complexity of each block differs, so the final compression result differs from image to image.
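The block-splitting and DCT steps above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not a JPEG encoder: the function names and the `threshold` parameter are made up for the example, the image is a list of rows whose sides are multiples of 8, and the level shift, quantization, and entropy coding that a real encoder performs are left out.

```python
import math

N = 8  # JPEG block size


def split_into_blocks(image):
    """Split a 2-D list (sides are multiples of 8) into 8x8 blocks."""
    blocks = []
    for top in range(0, len(image), N):
        for left in range(0, len(image[0]), N):
            blocks.append([row[left:left + N] for row in image[top:top + N]])
    return blocks


def dct_2d(block):
    """Orthonormal 2-D DCT-II of one 8x8 block."""
    def alpha(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = alpha(u) * alpha(v) * total
    return out


def count_significant(coeffs, threshold=1.0):
    """Number of coefficients whose magnitude exceeds the threshold."""
    return sum(1 for row in coeffs for c in row if abs(c) > threshold)


flat = [[128] * N for _ in range(N)]                          # uniform block
busy = [[(x * 37 + y * 91) % 256 for y in range(N)] for x in range(N)]
print(count_significant(dct_2d(flat)), count_significant(dct_2d(busy)))
```

A uniform block collapses into a single significant coefficient, while a busy block spreads its energy over many, which is one way to see why complex images compress less well.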
Original video = picture sequence.
Each ordered picture in a video is called a "frame". When video is compressed, various algorithms are used to reduce the amount of data, of which IPB is the most common.
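The idea behind I frames and P frames described in section 1.1 can be illustrated with a toy sketch. It assumes a frame is just a flat list of pixel values and that a P frame stores a plain per-pixel difference; real codecs use motion compensation and transform coding, so this only shows the principle. All function names are hypothetical.

```python
def encode_p_frame(previous, current):
    """Store only the per-pixel difference from the previous picture."""
    return [cur - prev for prev, cur in zip(previous, current)]


def decode_p_frame(previous, delta):
    """Superimpose the stored difference on the buffered previous picture."""
    return [prev + d for prev, d in zip(previous, delta)]


def decode_sequence(i_frame, p_frames):
    """Rebuild every picture from one complete I frame plus P-frame deltas."""
    pictures = [list(i_frame)]
    for delta in p_frames:
        pictures.append(decode_p_frame(pictures[-1], delta))
    return pictures


frame0 = [10, 10, 10, 10, 200, 200]   # I frame: complete picture
frame1 = [10, 10, 10, 10, 200, 210]
frame2 = [10, 12, 10, 10, 200, 210]
deltas = [encode_p_frame(frame0, frame1), encode_p_frame(frame1, frame2)]
decoded = decode_sequence(frame0, deltas)
print(deltas)
```

Most entries in each delta are zero, which is what makes difference frames cheap to store; it also shows why a P frame cannot be decoded without the picture before it.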
The bit rate (code rate) is the number of data bits transmitted per unit of time. The usual unit is kbps, kilobits per second. It is often loosely described as a sampling rate, but the two are not equivalent: the unit of sampling rate is Hz, the number of samples per second. The more data per unit of time, the higher the accuracy and the closer the processed file is to the original. File size, however, is directly proportional to the bit rate, so almost all encoding formats focus on achieving the least distortion at the lowest bit rate. Around this core idea, CBR (constant bit rate) and VBR (variable bit rate) were derived. The higher the bit rate, the clearer the picture; otherwise the picture looks coarse and blocky.
The frame rate is the number of picture frames transmitted per second; it can also be understood as the number of times the graphics processor refreshes per second.
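The relationships between bit rate, frame rate, and file size can be checked with some simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, 1 kbps is taken as 1000 bits per second, and the 25 fps starting frame rate is an assumption that does not come from the article. The 512 Kbps uplink and the four CIF channels at a recommended 512 Kbps each are the figures from the ADSL example in section 1.1.

```python
def file_size_bytes(bitrate_kbps, duration_s):
    """Size of a constant-bit-rate stream: bits per second times seconds."""
    return bitrate_kbps * 1000 * duration_s // 8


def bits_per_frame(bitrate_kbps, fps):
    """Average number of bits available to encode one frame."""
    return bitrate_kbps * 1000 / fps


def channels_that_fit(uplink_kbps, per_channel_kbps):
    """How many streams fit in the uplink at the given per-channel bit rate."""
    return uplink_kbps // per_channel_kbps


def frame_rate_for_budget(bitrate_kbps, frame_bits):
    """Frame rate that keeps the per-frame bit budget at `frame_bits`."""
    return bitrate_kbps * 1000 / frame_bits


uplink, channels, cif_rate = 512, 4, 512          # figures from the ADSL example
assumed_fps = 25                                  # assumption, not from the article

fit = channels_that_fit(uplink, cif_rate)         # channels at full bit rate
share = uplink // channels                        # kbps left per channel
budget = bits_per_frame(cif_rate, assumed_fps)    # per-frame quality target
reduced_fps = frame_rate_for_budget(share, budget)
print(fit, share, budget, reduced_fps)
```

Under these assumptions only one channel fits at full bit rate, and sharing the link four ways means the frame rate must fall to a quarter of its starting value to keep the same number of bits per frame.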
-Surveillance cameras (network cameras and analog cameras)
-Cameras required by different industries (ultra-wide dynamic camera, infrared camera, thermal imaging camera, etc.)
-Smart camera
-Industrial cameras
Network cameras generally offer higher definition than traditional analog cameras. Analog cameras are being phased out; they can be regarded as the previous generation of surveillance cameras, while network cameras are now the mainstream. Around 2013, analog cameras probably held 70% to 80% of the market, but today network cameras probably account for 60% to 70%.
Performance can also be described with another word: latency. The better your performance, the more efficiently you process and analyze, which is equivalent to saying your latency is lower. That is performance. The other aspect is throughput, which is the amount of data you can process at the same time.
So what is the difference between a CPU and a GPU? It lies mainly in these two aspects. The CPU is a high-performance, ultra-low-latency processor: it can carry out complex calculations quickly. The GPU is built from a large number of small arithmetic units, so low latency is not its strength. It is not good at complex calculations, since each of its processors is very small and relatively weak, but it can have all of those weak processors work at the same time, which amounts to processing a large amount of data at once. Its throughput is therefore very large. In short, the CPU focuses on performance (latency) and the GPU focuses on throughput.
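Why per-pixel work suits many small processors can be seen in a toy sketch. It runs sequentially in plain Python and does not use a GPU; it only shows that, because each output pixel depends on its own input pixel alone, the image can be cut into any number of independent pieces (one per hypothetical worker) without changing the result. The brightening formula and all names are made up for the illustration.

```python
def brighten(pixel):
    """Simple per-pixel formula: scale by 6/5 and clamp to the 8-bit maximum."""
    return min(255, pixel * 6 // 5)


def process_whole(image):
    """Process every pixel in one pass, as a single worker would."""
    return [brighten(p) for p in image]


def process_in_chunks(image, workers):
    """Split the pixels into independent pieces, one per hypothetical worker."""
    chunk = max(1, (len(image) + workers - 1) // workers)
    pieces = [image[i:i + chunk] for i in range(0, len(image), chunk)]
    # No piece needs data from another, so each could run on its own small unit.
    results = [process_whole(piece) for piece in pieces]
    return [p for piece in results for p in piece]


image = list(range(0, 256, 5))
same = process_in_chunks(image, 4) == process_whole(image)
print(same)
```

An algorithm in which each step needs the result of the previous one cannot be cut up this way, which is the case where a GPU helps little.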
Stanford CS223B
This course covers the basics and suits students who are just getting started. It has relatively little to do with deep learning: rather than focusing on deep learning throughout, it concentrates on computer vision and covers all aspects of it.
CS231N hardly needs an introduction: many people know it as a course that combines computer vision and deep learning, and it can be watched on YouTube. It is taught by Li Feifei. If you have not heard of her, look her up; for anyone working in computer vision, she can be regarded as a leading figure in both industry and academia.
OpenCV is a classic computer vision library that implements many common computer vision algorithms and helps everyone get started quickly.
If you are doing computer vision, Caffe is the more recommended choice. Caffe is good at convolutional neural networks, which are the most widely used networks in computer vision.
So whatever other open source software you learn later, Caffe is unavoidable: once you understand Caffe, can use it, and are even able to modify its source code, you will find that your understanding of deep learning has taken a qualitative leap.
TensorFlow has been very popular recently, but its barrier to entry is not low: learning to use it takes much more time than any of the other software. It is also not yet particularly mature or stable, so there are very many update iterations between versions, compatibility is poor, and there is still a lot of room for improvement in runtime efficiency.