Automated start-up time measurement suffers from low accuracy when keyframes are recognized with conventional image comparison. This article briefly describes how a scikit-learn image classification model was optimized for start-up time measurement. A follow-up article will compare the recognition results of a TensorFlow CNN, transfer learning, and other algorithms.
App launch time and key page load time are common performance indicators and key metrics in competitive product comparisons. In time measurement tests, automatically identifying the key frames is crucial. Because the advertisements and homepage posters of a video app change from minute to minute during start-up, the traditional automated comparison based on a grayscale histogram and a threshold does not work for identifying key frames.
Ø Manual recognition: time-consuming, labor-intensive
Example (Aphone610 version): 3 competitors x 14 scenarios x 10 runs per scenario x 2 minutes per run = 840 minutes, about 14 hours, or roughly 2 person-days
Ø Image comparison: grayscale histogram + threshold (not feasible)
1) Whole-image comparison: advertisements and homepage posters change during the video app's start-up process
2) Partial comparison: the first screen after the app has fully launched is not always completely displayed, and its content may not be in the same place every time
Ø Instrumentation ("buried point") reporting: the accuracy of the results has always been questioned (not feasible)
1) Measure with adb shell am start -W [packageName]/[packageName.MainActivity]
2) In-app instrumentation: add tracking points in the code and report the data after the homepage has loaded
Recognizing start-up keyframes is in fact an ordinary image classification problem in machine learning. Image classification algorithms and open-source libraries are now mature and widely used. I previously came across an article online that mentioned using machine learning to automate keyframe recognition for start-up time measurement. Here I describe the implementation and optimization process.
As shown in the figure below, a screen-recording tool and an automated script capture the start-up process, and the resulting video is split into a sequence of frames. The trained machine learning model identifies the start-up stage that each frame belongs to. Counting the frames from the first frame to the frame where the app has stably started gives the final start-up time.
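The frame-counting step can be sketched as follows. This is a minimal illustration, not the author's implementation: the function name `startup_seconds`, the label names "ad" and "home", the frame rate, and the rule that the app counts as stable after `stable_run` consecutive frames of the final class are all assumptions.

```python
from typing import Optional, Sequence


def startup_seconds(labels: Sequence[str], fps: float,
                    first_label: str = "start",
                    stable_label: str = "home",
                    stable_run: int = 3) -> Optional[float]:
    """Estimate start-up time from per-frame class labels.

    Counts the frames between the first frame classified as `first_label`
    and the first frame of a run of `stable_run` consecutive `stable_label`
    frames, then converts that frame count to seconds.
    Returns None if either event is missing from the recording.
    """
    labels = list(labels)
    if first_label not in labels:
        return None
    begin = labels.index(first_label)
    run = 0
    for i in range(begin, len(labels)):
        # Extend the current run of stable frames, or reset it.
        run = run + 1 if labels[i] == stable_label else 0
        if run == stable_run:
            stable_at = i - stable_run + 1  # first frame of the stable run
            return (stable_at - begin) / fps
    return None


# Hypothetical per-frame predictions for a 30 fps recording.
frames = ["desk"] * 5 + ["start"] * 10 + ["ad"] * 30 + ["home"] * 20
print(startup_seconds(frames, fps=30.0))
```

Requiring several consecutive stable frames is one way to keep a single misclassified frame from ending the measurement early.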
Common image feature processing methods include:
1) Flattening of raw pixel features
2) Extracting color histograms (using cv2.calcHist to extract a 3D color histogram from the HSV color space, then normalizing it with cv2.normalize)
This scheme starts with method (1). The screen recording resolution is 480 x 720 pixels. After each side is downscaled by a factor of 8 (to 60 x 90), each pixel is represented by 3 values, so one image becomes a 16,200-dimensional list (60 x 90 x 3 = 16,200 features). A later comparison uses the 3D color histogram as the feature instead.
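As a rough sketch of method (1), the function below downscales a frame and flattens it into a feature vector. The name `flatten_features` is hypothetical, and two choices are assumptions because the article does not say how the author did them: the downscaling is done by block averaging in NumPy, and the values are scaled to [0, 1].

```python
import numpy as np


def flatten_features(frame: np.ndarray, factor: int = 8) -> np.ndarray:
    """Downscale an H x W x 3 frame by `factor` per side and flatten it."""
    h, w, c = frame.shape
    h2, w2 = h // factor, w // factor
    # Crop so both sides divide evenly, then average each factor x factor block.
    blocks = frame[:h2 * factor, :w2 * factor].reshape(h2, factor, w2, factor, c)
    small = blocks.mean(axis=(1, 3))
    # Flatten to one row of h2 * w2 * 3 values, scaled to [0, 1] (assumption).
    return small.reshape(-1) / 255.0


# A synthetic 480 x 720 frame (stored as 720 rows by 480 columns).
frame = np.random.default_rng(0).integers(0, 256, size=(720, 480, 3), dtype=np.uint8)
features = flatten_features(frame, factor=8)
print(features.shape)
```

With factor=8 the vector has 60 x 90 x 3 = 16,200 entries, matching the feature count above. Larger factors such as 12 or 16 shrink it further.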
The first batch of sample sets:
For algorithm selection, I followed the advice "Don't spend too much time on algorithm selection, let your model run first" and the algorithm selection guide on the official sklearn website. Because the number of samples is 1,000+ (well under 100k), I started with an SVM with a linear kernel.
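A first baseline along these lines might look like the sketch below. It is not the author's code: the data is synthetic (generated with `make_blobs`) rather than real frame features, and the 3:1 split and the `max_iter` value are assumptions.

```python
from sklearn.datasets import make_blobs
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Synthetic stand-in for the labeled frames: 8 classes, 50 features each.
X, y = make_blobs(n_samples=400, centers=8, n_features=50, random_state=0)

# Hold out a quarter of the samples for validation (train:valid = 3:1).
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Linear-kernel SVM with the default regularization strength C=1.0.
model = LinearSVC(C=1.0, max_iter=5000)
model.fit(X_train, y_train)

accuracy = accuracy_score(y_valid, model.predict(X_valid))
print(f"validation accuracy: {accuracy:.3f}")
```

Getting a simple model to run end to end first gives a reference point for judging every later change.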
In machine learning, when a model shows large errors, the common tuning methods are:
Add samples: addresses overfitting
Choose fewer features: addresses overfitting
Get more features: addresses underfitting
Adjust the model or the regularization parameter: can address either, depending on the direction of the change
Of course, in practice we need to diagnose the problem first, rather than blindly adding samples or reducing parameters. In general:
1) Implement a simple algorithm quickly
2) Plot the learning curve
3) Analyze the features of the misclassified samples and decide which measures to take
Model: LinearSVC (C=1.0). sklearn provides the learning_curve() function, so there is no need to implement it yourself.
1) 1,225 samples in total; 5 rounds using 10%, 25%, 50%, 75%, and 100% of the samples, with train:valid = 3:1
2) Plot the average error against the number of samples
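The two steps above can be sketched with `learning_curve` as follows. The data is synthetic, and the use of `ShuffleSplit` with 5 random 3:1 splits per training size is an assumption about how the rounds were run.

```python
from sklearn.datasets import make_blobs
from sklearn.model_selection import ShuffleSplit, learning_curve
from sklearn.svm import LinearSVC

# Synthetic stand-in for the labeled frames: 8 classes, 50 features each.
X, y = make_blobs(n_samples=400, centers=8, n_features=50, random_state=0)

# Each split keeps 25% of the samples for validation (train:valid = 3:1).
cv = ShuffleSplit(n_splits=5, test_size=0.25, random_state=0)

sizes, train_scores, valid_scores = learning_curve(
    LinearSVC(C=1.0, max_iter=5000), X, y,
    train_sizes=[0.10, 0.25, 0.50, 0.75, 1.00], cv=cv)

# Average over the splits and convert accuracy to error.
train_err = 1.0 - train_scores.mean(axis=1)
valid_err = 1.0 - valid_scores.mean(axis=1)
for n, tr, va in zip(sizes, train_err, valid_err):
    print(f"{n:4d} samples  train error={tr:.3f}  validation error={va:.3f}")
```

A large, persistent gap between the training error and the validation error is the usual sign of overfitting. Two curves that meet at a high error point to underfitting.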
The figure below shows that the current model is overfitting, so the next steps are to add samples, adjust parameters, reduce features, and so on.
Step 1: Adjust the LinearSVC parameters (such as C and class_weight). C=10 proved to be the most appropriate
Step 2: Add samples, giving priority to classes with few samples and classes with low test-set accuracy. The error is lowest at 1,610 samples
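One way to search these parameters is a cross-validated grid search, sketched below on synthetic data. The article does not say how the author arrived at C=10, so the use of `GridSearchCV` and the candidate values are assumptions.

```python
from sklearn.datasets import make_blobs
from sklearn.model_selection import GridSearchCV
from sklearn.svm import LinearSVC

# Synthetic stand-in for the labeled frames: 8 classes, 50 features each.
X, y = make_blobs(n_samples=400, centers=8, n_features=50, random_state=0)

# Candidate values are illustrative only.
param_grid = {
    "C": [0.1, 1.0, 10.0, 100.0],
    "class_weight": [None, "balanced"],
}

search = GridSearchCV(LinearSVC(max_iter=5000), param_grid, cv=3)
search.fit(X, y)

print("best parameters:", search.best_params_)
print(f"best cross-validated accuracy: {search.best_score_:.3f}")
```

The "balanced" option for class_weight reweights classes inversely to their frequency, which can help when some start-up stages have far fewer frames than others.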
To analyze problems better, you can use classification_report to get the precision and recall of each class:
from sklearn.metrics import classification_report
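A small usage sketch follows. The true and predicted labels are made up for illustration, and "home" is a hypothetical class name.

```python
from sklearn.metrics import classification_report

# Hypothetical true and predicted labels for six frames.
y_true = ["desk", "desk", "start", "start", "home", "home"]
y_pred = ["desk", "desk", "desk", "start", "home", "home"]

# Text table with precision, recall, F1, and support for each class.
print(classification_report(y_true, y_pred))

# The same numbers as a nested dict, convenient for finding weak classes.
report = classification_report(y_true, y_pred, output_dict=True)
weakest = min(("desk", "start", "home"), key=lambda c: report[c]["recall"])
print("class with the lowest recall:", weakest)
```

Sorting classes by recall or precision shows which ones most need additional samples.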
Step 3: Reduce features
1) RFECV was tried to find the optimal number of features, but the improvement was not obvious
a. The step size (how many features change per iteration) is chosen manually. If it is too large, key features may be dropped; if it is too small, the computation cost is too high
b. Even with the same step, the optimal number of features may differ from run to run
c. Little improvement
2) Increase image compression: from the original 8 times -> 12 times -> 16 times
a. Judging from the learning curve, overfitting still exists, but the overall test error is nevertheless reduced
b. For the class label=start, which has a serious deviation, the higher the compression ratio, the lower the accuracy
3) PCA: feature replacement, mapping the original features onto new features to reduce dimensionality. The main purpose of dimensionality reduction is to reduce computation, but some colleagues suggested trying it anyway. The results confirmed that using PCA to avoid overfitting is a bad use of PCA.
4) With the smoothed color histogram as the image feature, precision and recall were about 10 percentage points lower
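Two of the feature-reduction attempts above, RFECV and PCA, can be sketched as follows. The data is synthetic, and the step size, number of PCA components, and fold count are illustrative assumptions, not the author's settings.

```python
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.feature_selection import RFECV
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Synthetic stand-in for the labeled frames: 8 classes, 40 features each.
X, y = make_blobs(n_samples=400, centers=8, n_features=40, random_state=0)

# Recursive feature elimination with cross-validation:
# drop 5 features per iteration and keep the best-scoring subset size.
selector = RFECV(LinearSVC(max_iter=5000), step=5, cv=3)
selector.fit(X, y)
print("features kept by RFECV:", selector.n_features_)

# PCA followed by the same classifier, scored with 3-fold cross-validation.
pca_model = make_pipeline(PCA(n_components=10), LinearSVC(max_iter=5000))
pca_scores = cross_val_score(pca_model, X, y, cv=3)
print(f"PCA + LinearSVC accuracy: {pca_scores.mean():.3f}")
```

The `step` argument is the manually chosen step size discussed in item 1a. Smaller values examine more candidate subsets at a higher computation cost.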
Step 4: Adjust the result classification
As analyzed above, the recognition accuracy of the start class is very low. Comparing the pictures shows that the only difference between start and desk is that the app icon is grayed out. After evaluation, desk and start were merged into one class. This has little impact on the actual time measurement but noticeably improves the test-set accuracy.
After the 8 classes became 7, the learning curve converged and the overfitting was much reduced.
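Merging two classes only requires relabeling the samples before retraining. The sketch below uses assumed names: the merged class keeps the name "desk", and the six other class names are placeholders because the article does not list them.

```python
from typing import Dict, List, Sequence

# Map every label that should disappear onto the class that absorbs it.
MERGED: Dict[str, str] = {"start": "desk"}


def merge_labels(labels: Sequence[str],
                 mapping: Dict[str, str] = MERGED) -> List[str]:
    """Return the labels with merged classes replaced by their target class."""
    return [mapping.get(label, label) for label in labels]


# Placeholder names for the six classes that the article does not name.
classes = ["desk", "start"] + [f"class_{i}" for i in range(3, 9)]
merged_classes = sorted(set(merge_labels(classes)))
print(len(classes), "->", len(merged_classes))
```

Applying `merge_labels` to both the training and test labels keeps the two sets consistent when the model is retrained on 7 classes.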
Comparison before and after optimization