Log File Analysis for Corporate Websites: The SEO Insights Google Search Console Never Shows You

The Real-World Problem
Have you ever felt like this? Your SEO team works hard and fixes every issue Google Search Console (GSC) flags until everything is green, and yet the rankings of your most important pages stay completely flat. Or worse, freshly published content takes ages to get indexed, even after you press "Request Indexing".
You may be sitting with your head in your hands, wondering "What are we missing?" or "Why is our big corporate website losing to a competitor's small site on some keywords?" The answer to that frustration may not be in the reports you look at every day. It may be hidden in a raw data file you have probably never opened: your server's "log file".
Prompt for illustrations: A marketer or SEO team in a serious meeting. The computer screen shows a completely flat SEO graph with a Google Search Console icon on screen, conveying the effort to solve the problem with the usual tools without finding the answer yet.
Why Does This Problem Happen?
It happens because Google Search Console shows you the "summary" Google chooses to report, not the full "real behavior" that takes place on your site. For large corporate websites with tens of thousands of pages, Google allocates only limited resources to crawling each site, an allowance known as the "Crawl Budget".
Imagine Googlebot as a delivery driver with limited time and fuel each day. If your website is like a big city full of small alleys and dead ends (such as strange URLs generated by filters, parameters, and duplicate pages), the driver wastes the day getting lost in them instead of delivering to the customers' houses (your important pages). This idea is the key to managing the Crawl Budget, and it is exactly what GSC does not tell you directly.
Prompt for illustrations: An infographic comparing the Crawl Budget to the "fuel tank" of a Googlebot car, with several pipes leaking fuel from the tank toward icons representing "unimportant pages" such as filter pages, error pages, and duplicate pages, so that little fuel is left to reach the "Important Pages".
What Happens If You Ignore It?
Letting Googlebot get lost and burn through your Crawl Budget hurts the business far beyond "rankings that don't improve". The tangible impacts are:
- Important pages lose priority: new product pages, key service pages, or flagship articles may be visited less often because Googlebot spends its time on junk URLs, so these pages stop looking fresh in Google's eyes.
- Lost business opportunities: when a new landing page or promotion page is indexed slowly, you lose sales and competitiveness every single day it stays invisible.
- Wasted content budget: your team may pour effort into creating high-quality content, but if Googlebot rarely reaches it, or never does, that content generates almost no SEO results at all.
- Huge risk during a website migration: if you migrate without ever checking the log files, hidden problems can become severe and traffic can be hard to recover. A website-migration checklist is essential, and log file analysis makes that checklist far more complete.
Prompt for illustrations: A calendar page marked with a new marketing campaign, but the webpage is covered in cobwebs, alongside a flat traffic graph: a well-prepared launch left deserted because of invisible technical problems.
Is There a Solution? Where Should You Start?
The most direct solution is to stop "guessing" at Googlebot's behavior and look at the "truth" in your raw data instead, which means doing Log File Analysis.
Log File Analysis is the process of analyzing your server's access logs to see what search engine bots (especially Googlebot) actually do on your website. It tells you, down to the second, which pages were crawled, how often, which errors were encountered, and which parts of the site get the most or the least attention.
What you get from log file analysis:
- Discover URLs that Googlebot crawls often but that have no SEO value: for example, URLs generated by faceted navigation (product filters) that create millions of variations and should be managed in robots.txt.
- Find important pages that Googlebot rarely visits: then work out why the bot is not reaching them, perhaps because they receive too few internal links or the site structure is too deep.
- Check the real status codes: find 404 (Not Found) pages or misconfigured 301 redirects that Googlebot actually encounters; GSC may not report them completely.
- Confirm that technical fixes worked: after any change, the log file shows whether Googlebot has noticed it yet.
You can get started with tools like Screaming Frog Log File Analyser or Lumar (formerly Deepcrawl), which are designed for exactly this job. Log file analysis is also an important foundation for deeper techniques such as Edge SEO.
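If you are curious what these tools are doing under the hood, the core idea is simple: read each log line, keep the requests whose user agent identifies Googlebot, and count how often each URL was hit. Below is a minimal Python sketch of that idea. It assumes a standard "combined" access-log format and a hypothetical file name (access.log.gz); your server's format and file names may differ.

```python
import gzip
import re
from collections import Counter

# Assumed "combined" log format, e.g.:
# 66.249.66.1 - - [10/May/2024:06:25:01 +0000] "GET /products/shoes?color=red HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

crawl_counts = Counter()

# Hypothetical file name; access logs are usually delivered as .log or .gz files
with gzip.open("access.log.gz", mode="rt", errors="replace") as log:
    for line in log:
        match = LOG_LINE.match(line)
        if not match:
            continue
        # First-pass filter on the user-agent string only
        if "Googlebot" in match.group("agent"):
            crawl_counts[match.group("path")] += 1

# The URLs Googlebot requests most often -- compare this list with
# the pages you actually want crawled
for path, hits in crawl_counts.most_common(20):
    print(f"{hits:6d}  {path}")
```

Dedicated tools do far more than this (bot verification, charting, cross-referencing with a crawl), but the raw material they work with is exactly these log lines.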
Prompt for illustrations: A detective holding a magnifying glass over a log file full of long lines of code; inside the lens the data appears as easy-to-understand icons such as a Googlebot icon, a 404-page icon, and a crawl-frequency icon, conveying that the tools turn complex data into something understandable.
A Real Example That Worked
A large e-commerce company with more than 50,000 product SKUs found that the new collections they had invested heavily in never ranked at all, and organic search traffic had been flat for years, even though the team had fixed everything GSC suggested.
What they did: the team ran a Log File Analysis for the first time, and the findings were shocking. About 70% of the Crawl Budget was being spent crawling URLs generated by product "filters" (such as color, size, and price), which created countless URLs. On top of that, they found hidden Keyword Cannibalization between the category pages and these filter pages.
The result: after the team blocked these filter URLs in the robots.txt file and reworked the internal link structure, within only 3 months:
- Googlebot came back to crawl the main product and category pages 300% more often.
- The new collections started ranking on pages 1-2 within a few weeks.
- Overall organic traffic grew 25% in a single quarter.
This is the power of seeing what your competitors cannot see, something you could never get from Google Search Console alone.
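To make the case study concrete, here is a small, hypothetical Python sketch of the kind of check that surfaces this problem: given the URLs extracted from the log, it measures what share of Googlebot's requests went to faceted-navigation (filter) URLs. The parameter names (color, size, price, sort) are illustrative, not the actual ones the company used.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical faceted-navigation parameters; use the ones your platform generates
FILTER_PARAMS = {"color", "size", "price", "sort"}

def is_filter_url(path: str) -> bool:
    """True if the URL carries at least one faceted-navigation parameter."""
    query = urlparse(path).query
    return bool(FILTER_PARAMS & set(parse_qs(query)))

def crawl_waste(crawled_paths: list[str]) -> float:
    """Share of Googlebot requests that went to filter URLs (0.0 to 1.0)."""
    if not crawled_paths:
        return 0.0
    wasted = sum(1 for p in crawled_paths if is_filter_url(p))
    return wasted / len(crawled_paths)

# Example with paths extracted from the log (see the earlier sketch)
sample = [
    "/category/shoes",
    "/category/shoes?color=red",
    "/category/shoes?color=red&size=42",
    "/product/sku-12345",
]
print(f"{crawl_waste(sample):.0%} of sampled Googlebot requests hit filter URLs")
```

A number like "70% of requests hit filter URLs" is exactly the kind of evidence that justifies blocking those parameters in robots.txt and rebalancing internal links.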
Prompt for illustrations: A clear Before & After graph. On the left (Before), 70% of the Crawl Budget is shown in red as wasted and only 30% in green. On the right (After), 80% is green and only 20% is red (wasted), alongside a rising sales line.
Want to Try It Yourself? (Actionable Steps)
You can start doing log file analysis on your own with the following steps:
- Request the log files from your IT team or hosting provider: tell them you need the website's "Server Access Logs" and specify the period you want (at least the last 30 days is recommended). The files usually come as .log or .gz.
- Prepare a log file analyser: for beginners, Screaming Frog Log File Analyser is an excellent option because it is easy to use and has a free version to try.
- Import the log file into the program: follow the program's steps to import the logs, and also crawl your website with the regular Screaming Frog SEO Spider so you can compare against the complete list of URLs.
- Start digging for treasure:
  - Look at the "URLs" tab: find URLs that Googlebot crawls often (a high number of crawl events) but that you do not actually want to rank.
  - Look at the "Response Codes" tab: check whether Googlebot is hitting a lot of 404 (not found) or 5xx (server error) pages (see the sketch after this list).
  - Compare against data from GSC: put the URLs Googlebot crawls most often next to their impressions/clicks in GSC. A page that gets crawled constantly but never earns traffic can be a sign of a problem.
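As a rough illustration of the "Response Codes" check, the Python sketch below summarizes the status codes Googlebot received and lists the URLs answering with 4xx/5xx. The helper name and the sample rows are hypothetical; in practice you would feed it the (path, status) pairs extracted from your own logs.

```python
from collections import Counter

def status_summary(googlebot_hits):
    """Summarize the HTTP status codes Googlebot received.

    `googlebot_hits` is an iterable of (path, status) pairs, for example the
    fields extracted by the log-parsing sketch earlier in this article.
    """
    hits = list(googlebot_hits)
    by_status = Counter(status for _, status in hits)
    broken = sorted({path for path, status in hits
                     if status.startswith(("4", "5"))})
    return by_status, broken

# Hypothetical sample rows pulled from a parsed access log
rows = [
    ("/products/new-collection", "200"),
    ("/old-landing-page", "404"),
    ("/old-landing-page", "404"),
    ("/checkout", "500"),
]

codes, broken_urls = status_summary(rows)
print("Status code breakdown:", dict(codes))
print("URLs returning 4xx/5xx to Googlebot:")
for url in broken_urls:
    print(" -", url)
```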
Analyzing this data shows you the complete picture and lets you plan your site structure based on evidence. If you want to improve your corporate website structure so it can handle these challenges, data from the log files is the strongest possible foundation.
Prompt for illustrations: A 4-step, step-by-step infographic: 1) a log file icon, 2) a Screaming Frog icon, 3) an import arrow icon, 4) a dashboard icon with graphs and insights, showing that the process is not as complicated as it looks.
Frequently Asked Questions, Answered Clearly
Question: How is the data in a log file different from the Crawl Stats in Google Search Console?
Answer: GSC Crawl Stats is data that Google has "summarized and sampled" for you, while the log file is the 100% raw record of what actually hit your server. It is far more detailed and complete, which is why it can reveal problems GSC overlooks.
Question: How often should we do Log File Analysis?
Answer: For large corporate websites, at least once a quarter is recommended. But if there is a major change such as a site restructure or a migration, or traffic drops for no apparent reason, do it immediately.
Question: Besides Googlebot, can we see other bots too?
Answer: You see them all! You will see Bingbot, AhrefsBot, SemrushBot, and other bots, which helps you understand who is crawling your website. For SEO purposes, though, we focus mainly on Googlebot.
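If you want to see that breakdown for yourself, a simple approach is to group requests by the bot name found in the user-agent string. The sketch below is only illustrative: the token list is an assumption and user agents can be spoofed, but it shows the idea.

```python
from collections import Counter

# Substrings that identify common crawlers in the user-agent header
# (illustrative list; real user-agent strings vary and can be spoofed).
BOT_TOKENS = ["Googlebot", "Bingbot", "AhrefsBot", "SemrushBot"]

def bot_breakdown(user_agents):
    """Count requests per known bot; everything else is grouped as 'other'."""
    counts = Counter()
    for agent in user_agents:
        for token in BOT_TOKENS:
            if token.lower() in agent.lower():
                counts[token] += 1
                break
        else:
            counts["other"] += 1
    return counts

# Example with user-agent strings pulled from the parsed log
agents = [
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)",
    "Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ahrefs.com/robot/)",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/124.0 Safari/537.36",
]
print(bot_breakdown(agents))
```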
Question: Does Log File Analysis require a programmer?
Answer: Not necessarily. Tools like Screaming Frog or Lumar let SEO practitioners without a coding background do the analysis. You do need a basic understanding of Technical SEO, though, to interpret the data and act on it.
Prompt for illustrations: An icon of a person with a question mark floating above their head, with an arrow pointing to another icon of a glowing light bulb, conveying doubt turning into clear understanding.
Summary: Easy to Understand and Worth Trying
SEO for a large website is like maintaining a complex machine. Relying only on Google Search Console can cause you to overlook the small gears that are quietly failing.
Log file analysis is like opening the engine hood to watch every part work with your own eyes. It turns "guessing" into "knowing" exactly how Google sees and interacts with your website. Investing the effort to understand this data creates a sustainable competitive advantage and makes every budget you put into content and SEO count.
Do not let your website run below its potential. Start the conversation with your IT team today, request the Access Logs, and open up a deeper, more measurable world of SEO.
If you are ready to take your corporate website to the next level, whether you want to improve an existing website (Website Renovation) or build a new corporate website with a strong structure that is ready for SEO in the long run, the Vision X Brain team is ready to advise you.