{"id":244767,"date":"2022-03-21T11:41:21","date_gmt":"2022-03-21T11:41:21","guid":{"rendered":"https:\/\/www.webscale.com\/?p=244767"},"modified":"2023-12-29T15:44:45","modified_gmt":"2023-12-29T20:44:45","slug":"troubleshooting-web-app-slowdown-7-steps","status":"publish","type":"post","link":"https:\/\/www.webscale.com\/blog\/troubleshooting-web-app-slowdown-7-steps\/","title":{"rendered":"Troubleshooting Your Web App Slowdown in 7 Steps"},"content":{"rendered":"

In this article, I will walk you through the step-by-step process of diagnosing an increase in the average time to first byte (TTFB) for an application running on the\u00a0Webscale platform<\/a>. More specifically, we will use the data available in the\u00a0Webscale Traffic Viewer<\/a>\u00a0to identify a service within the distributed application infrastructure that caused the increase. These investigation techniques are a great example of how you can query the data available within our Traffic Viewer application to narrow the source of a problem within your application.<\/p>\n

Every now and then, one of our customers will bring an issue, such as the one I describe here, to my attention. In this case, the user jumped to an erroneous conclusion that the performance issue was within the CDN serving their application, when in fact there was an application issue.<\/p>\n

The application that we will explore was deployed using a typical Webscale model with\u00a0CloudEDGE CDN<\/a>\u00a0in front of a scalable ecommerce application. Traffic in this environment first arrives at a CDN PoP within CloudEDGE. On a cache miss, it will be routed towards the origin where the request will leave CloudEDGE, via a Proxy close to the origin, where it will be serviced by an origin application server.<\/p>\n

\"\"<\/p>\n

Figure 1:<\/strong>\u00a0The current standard architecture for Webscale.<\/em><\/p>\n

While investigating a slowdown the team recognized that there was a noticeable increase in TTFB on the customer application, as measured by a synthetic network monitoring tool. The tool tracks network transit times for resources on a fixed set of application web pages, and the team noticed a sizable discrepancy in the data captured over the last 24 hours. They brought the issue to my attention and we began using our Traffic Viewer tools to verify the slowdown and tease out the source of it. The following is a rough description of our investigation that demonstrates using the\u00a0Webscale Portal<\/a>\u00a0tools to isolate the source of an application issue.<\/p>\n

Step 1: Analyzing data on all successful requests in Traffic Viewer<\/strong><\/p>\n

Before looking for the cause of the slowdown, we had to determine whether it was observable in our data. To do this, we began by loading up the CDN traffic logs in the Traffic Viewer for the relevant time period and filtering out records that had a status code other than 200. The resulting data is plotted in the graph of Figure 2. Note that we limited our selection to requests that received a response status code of 200, allowing us to eliminate blocked requests or errors that were unrelated to our performance issue.<\/p>\n

\"\"<\/p>\n

Figure 2:<\/strong>\u00a0The CDN TTFB metric from the time period when the slowdown began. The center of the graph is where the slowdown started at about 06:00 on February 22nd.<\/em><\/p>\n

The user believed that there was a CDN issue and we first wanted to verify that in our logs. We can see that at 06:00 on February 22nd there is a noticeable increase in TTFB. From this zoomed-out view inside the CDN logs, we can measure an average TTFB of 104.7 ms before 06:00 on the 22nd, and an average TTFB of 198.3 ms after. So from the CDN logs, it does appear that the CDN performance halved during this time period. While 100ms may sound small, it is not. Recall that this includes data for all requests serviced by the CDN at this time, including a large number of requests that only take a few milliseconds to complete. But we need to look more closely at all of the data available to understand exactly what is slowing the application down.<\/p>\n

Step 2: Deep dive into CDN-cached request data using Traffic Viewer<\/b><\/p>\n

For this application, requests can be divided into two primary buckets from the\u00a0CDN<\/a>\u00a0perspective\u2014requests that are cached at the CDN and requests that are not cached at the CDN. In this case, we have two cache behaviors configured on our application, one for requests with paths that start with \u201c\/media\u201d and one for those that start with \u201c\/static.\u201d All other requests to the CDN are guaranteed cache misses that will be routed to origin for processing. If we limit our data view to just static and media requests with a 200 status code, then we get the graph shown in Figure 3.<\/p>\n

\"\"<\/p>\n

Figure 3:<\/strong>\u00a0The CDN TTFB metric from the time period when the slowdown began but limited to only requests that match request_status_code = 200 and ((request_path ~ \/static*) or (request_path ~ \/media*)). Nothing really interesting to see here, before and after the event cache performance looks pretty much the same.<\/em><\/p>\n

Notice that the increase we initially observed in the CDN logs is no longer apparent in Figure 3. From the figure, we can see that the CDN cache performance before and after 06:00 was approximately the same. Thus, we can rule out the CDN cached requests as the potential root cause and start investigating the uncached requests.<\/p>\n

Step 3: Excluding CDN-cached requests from the dataset in Traffic Viewer<\/strong><\/p>\n

Let\u2019s instead plot the opposite data, so requests that had a 200 status code but were neither media nor static assets. These results are in Figure 4.<\/p>\n

\"\"<\/p>\n

Figure 4:<\/strong>\u00a0The CDN TTFB metric from the time period when the slowdown began but without considering static assets or media, i.e., filtered with response_status_code = 200 and request_path !~ \/static* and request_path !~ \/media*.<\/em><\/p>\n

Now this is interesting. We can clearly see that the slowdown is real and does not involve cached assets at the\u00a0CDN.<\/a>\u00a0The graph in Figure 4 only plots requests that returned a status code 200 and were not cacheable at the CDN and we can see that at approximately 06:00 on Feb 22 non-cacheable requests significantly slowed down.<\/p>\n

Step 4: Identifying an individual problematic request in the CDN log<\/b><\/p>\n

Now, let\u2019s identify a specific slow request from this data, and then track it through the proxy logs to try and identify the source of the slowdown. Figure 5 has the details of a single request for the homepage of this site, that is in our data set, which had a TTFB of 4212 ms or well within the slowdown period.<\/p>\n

\"\"<\/p>\n

Figure 5:<\/strong>\u00a0Details of a request from the CDN log that experienced the slowdown.<\/em><\/p>\n

Right-clicking on the request, we can access the Show details popup menu item that will give us more detail on the request.<\/p>\n

\"\"<\/p>\n

Figure 6:<\/strong>\u00a0How to navigate to Show details.<\/em><\/p>\n

Specifically, we\u2019re looking for the request_id. It\u2019s important to know that a unique request_id is assigned to each request that arrives at the\u00a0Webscale data plane<\/a>\u00a0and that, as that request traverses the system, its request_id will not change. This means that we can copy the request_id from the details of the\u00a0CDN<\/a>\u00a0request log and search for that request in the proxy log. Recall that if the request is a miss against the CDN cache then, in this application, the request will egress to origin through a proxy that will log its handling.<\/p>\n

\"\"<\/p>\n

Figure 7:<\/strong>\u00a0In Show Details, scroll down to find the Request ID.<\/em><\/p>\n

With this request_id, we can switch from the CDN logs to the proxy logs to see where the slowdown might have occurred.<\/p>\n

\"\"<\/p>\n

Figure 8:<\/strong>\u00a0How to navigate from CDN logs to proxy logs.<\/em><\/p>\n

Step 5: Locating and investigating the problematic request in the proxy logs<\/strong><\/p>\n

Looking in the proxy logs, we filter on this request_id and find the exact record that we are interested in, as seen in Figure 5.<\/p>\n

\"\"<\/p>\n

Figure 9:<\/strong>\u00a0The proxy log record for the request matching the request_id taken from the CDN logs.<\/em><\/p>\n

Looking a little deeper, we can see details of the request that was processed. In this case, we\u2019re very interested in the server latency, a measure of the time it took the origin server to process this request.<\/p>\n

\"\"<\/p>\n

Figure 10:<\/strong>\u00a0Details of a request from the proxy log that experienced the slowdown.<\/em><\/p>\n

From the data, we can see that this request took the origin server about 4015 ms to process with no queue latency. This is important because we now know that the slowdown is related to the origin request handling and not anything in the\u00a0Webscale data plane<\/a>. Now we are confident that something must have happened to the application at this time.<\/p>\n

Step 6: Correlating the findings with data in the Events logs<\/strong><\/p>\n

Let\u2019s check to see if there is an event that has more data on the issue in the events logs. In Figure 6, we can see that a severe event did occur at the same time the application origin slowed down.<\/p>\n

\"\"<\/p>\n

Figure 11:<\/strong>\u00a0Comparing proxy TTFB data for successful homepage requests like our identified record as compared to severe events that have occurred within the application during this time period.<\/em><\/p>\n

From the event details in the log, we can see that a 500 status code was received from the application, to our synthetic application monitor at that time.<\/p>\n

\"\"<\/p>\n

Figure 12:<\/strong>\u00a0Details of the event that caused the slowdown.<\/em><\/p>\n

The recorded 500 status code indicates an internal application error occurred which is often correlated with an application deployment, where the deployment engineer did not flush the internal application cache.<\/p>\n

Step 7: Communicating the findings to the application developer<\/b><\/p>\n

At this point, we would begin working with the developer to understand what changes occurred in the application at this time, because one of these changes likely caused a degradation in the performance of HTML rendering in the application.<\/p>\n

Through this investigation, we were able to use our\u00a0Portal tools<\/a>\u00a0to quickly isolate a possible performance issue and to confidently associate it with the responsible system. Further, we were able to isolate the approximate time of the change and when it began impacting customers on the application. In this case, we helped our user get clear on where in the system they should be looking for the issue. This saved them time which they otherwise would waste on investigating the wrong service. You can use the same methodology to speed up your own root cause analysis of similar issues. For more information about using Webscale Traffic Viewer, visit our\u00a0Knowledge Base<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"

In this article, I will walk you through the step-by-step process of diagnosing an increase in the average time to first byte (TTFB) for an application running on the\u00a0Webscale platform. More specifically, we will use the data available in the\u00a0Webscale Traffic Viewer\u00a0to identify a service within the distributed application infrastructure that caused the increase. These […]<\/p>\n","protected":false},"author":4,"featured_media":255673,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"_et_pb_use_builder":"","_et_pb_old_content":"","_et_gb_content_width":"","rank_math_lock_modified_date":false,"_aioseo_description":"","_aioseo_keywords":"","_aioseo_og_article_section":"","_aioseo_og_article_tags":"","_aioseo_og_description":"","_aioseo_og_title":"","_aioseo_title":"","_aioseo_twitter_description":"","_aioseo_twitter_title":"","_author_photo":"field_6513304084a08","_doc_url":"","_dp_original":"","_et_autogenerated_title":"","_et_body_layout_enabled":"","_et_body_layout_id":"","_et_builder_dynamic_assets_loading_attr_threshold":"2","_et_builder_module_features_cache":null,"_et_builder_version":"","_et_default":"","_et_enabled":"","_et_footer_layout_enabled":"","_et_footer_layout_id":"","_et_header_layout_enabled":"","_et_header_layout_id":"","_et_pb_ab_current_shortcode":"","_et_pb_ab_subjects":"","_et_pb_built_for_post_type":"","_et_pb_custom_css":"","_et_pb_enable_shortcode_tracking":"","_et_pb_excluded_global_options":"","_et_pb_first_image":"","_et_pb_gutter_width":"","_et_pb_module_type":"","_et_pb_page_layout":"et_no_sidebar","_et_pb_page_z_index":"","_et_pb_post_hide_nav":"default","_et_pb_row_layout":"","_et_pb_show_page_creation":"","_et_pb_show_title":"on","_et_pb_side_nav":"off","_et_pb_static_css_file":"","_et_pb_truncate_post":"","_et_pb_truncate_post_date":"","_et_post_bg_color":"#ffffff","_et_post_bg_layout":"light","_et_template":[],"_et_theme_builder_marked_as_unused":"","_et_use_on":"","_gallery_link_target":"","_global_colors_info":"","_lh_copy_from_url-original_file":"","_version_history":"","_wp_old_date":["2022-03-29"],"_wpcode_auto_insert":"","_wpcode_auto_insert_number":"","_wpcode_conditional_logic":[],"_wpcode_conditional_logic_enabled":"","_wpcode_library_id":"","_wpcode_library_version":"","_wpcode_location_extra":"","_wpcode_note":"","_wpcode_priority":"","_wpcode_shortcode_attributes":[],"_wpmf_gallery_custom_image_link":"","ao_post_optimize":[],"author_photo":"255858","doc_url":"","et_enqueued_post_fonts":{"family":{"et-gf-lato":"Lato:100,100italic,300,300italic,regular,italic,700,700italic,900,900italic"},"subset":["latin","latin-ext"],"cache_key":"{\"gph\":0,\"divi\":\"4.24.1\",\"wp\":\"6.6.2\",\"enable_all_character_sets\":\"false\"}"},"rank_math_contentai_score":{"wordCount":"100","linkCount":"0","headingCount":"100","mediaCount":"62.22"},"rank_math_description":"","rank_math_facebook_image":"","rank_math_facebook_image_id":"","rank_math_internal_links_processed":["1"],"rank_math_og_content_image":{"check":"f406e2199fc6cbd71351b0547ee02ce9","images":[]},"rank_math_seo_score":"30","rank_math_title":"","version_history":"","wp-smpro-smush-data":[],"wp-smush-animated":"","wpmf_filetype":"","wpmf_order":"","wpmf_size":"","_":"","_bj_lazy_load_skip_post":[],"_divi_filters_post_type":"","_et_dynamic_cached_attributes":{"sticky_position":["top"],"use_custom_gutter":["on"],"fullwidth":["off"],"button_icon":["$||divi||400"],"social_network":["facebook","twitter","linkedin","youtube","last_fm"],"header_2_font":"|800|||||||","header_2_font_size":"34px","header_2_text_align":"center","animation_intensity_slide":"10%","animation_duration":"800ms","animation_delay":"15ms","animation_intensity_zoom":"15%","animation_intensity_flip":"15%","animation_intensity_fold":"15%","animation_intensity_roll":"15%","animation_direction":"center","animation_style":"none","background_color_gradient_start":"rgba(255,255,255,0)","background_color_gradient_end":"#fafafa","custom_padding":"120px||0px||false|false","background_color_gradient_stops":"rgba(255,255,255,0) 0%|#fafafa 100%","custom_padding_last_edited":"on|desktop","custom_padding_tablet":"||30px||false|false","custom_padding_phone":"60px||||false|false"},"_et_dynamic_cached_shortcodes":["et_pb_post_content","et_pb_contact_field","et_pb_signup_custom_field","et_pb_social_media_follow_network","et_pb_section","et_pb_row","et_pb_column","et_pb_blog","et_pb_blurb","et_pb_button","et_pb_code","et_pb_contact_form","et_pb_post_nav","et_pb_post_title","et_pb_signup","et_pb_social_media_follow","et_pb_text"],"_et_pb_ab_bounce_rate_limit":"","_et_pb_ab_stats_refresh_interval":[],"_et_pb_content_area_background_color":"","_et_pb_dark_text_color":"","_et_pb_light_text_color":"","_et_pb_section_background_color":"","_job_location":"","_job_locations":"","_links_to":"","_links_to_target":"","_product_image_gallery":"","_schema_code":"","_synced_version":"","_wp_attachment_context":"","_wp_attachment_image_alt":[],"_wpie_source_url":"","_yoast_wpseo_content_score":"30","_yoast_wpseo_focuskeywords":"[]","_yoast_wpseo_metadesc":"","_yoast_wpseo_opengraph-image":"","_yst_prominent_words_version":"1","inline_featured_image":["0"],"job_location":[],"job_locations":"","options":"","original-file":"","post_views_count":"","rank_math_analytic_object_id":"1846","rank_math_canonical_url":"","rank_math_focus_keyword":[],"rank_math_news_sitemap_robots":"index","rank_math_primary_category":"0","rank_math_primary_ccategory":"","rank_math_primary_job_locations":"","rank_math_primary_partners_category":"","rank_math_primary_pr_category":"","rank_math_primary_press_release_year":"","rank_math_rich_snippet":"","rank_math_robots":["index"],"rank_math_schema_Article":[],"rank_math_schema_Organization":[],"rank_math_schema_VideoObject":[],"rank_math_shortcode_schema_s-23675683-fff5-4300-88fe-da8afc8b1bb9":"","rank_math_shortcode_schema_s-307bbc91-c6b1-41aa-950d-c50d435a949c":"","rank_math_shortcode_schema_s-63a052dbc0384":"","rank_math_shortcode_schema_s-63a052dbc039d":"","rank_math_shortcode_schema_s-63a052dbc03a6":"","rank_math_shortcode_schema_s-63a052dbc03aa":"","rank_math_shortcode_schema_s-63a052dbc03b5":"","rank_math_shortcode_schema_s-63a052dbc03ba":"","rank_math_shortcode_schema_s-63a052dbc03bd":"","rank_math_shortcode_schema_s-63b6dd7d53a96":"","rank_math_shortcode_schema_s-63b6dd7d53a9f":"","rank_math_shortcode_schema_s-63b6dd7d53aa2":"","rank_math_shortcode_schema_s-63b6dd7d53aa4":"","rank_math_shortcode_schema_s-63b6dd7d53aa7":"","rank_math_shortcode_schema_s-63b6dd7d53aa9":"","rank_math_shortcode_schema_s-63b6dd7d53aab":"","rank_math_shortcode_schema_s-63b6dd7d53aad":"","rank_math_shortcode_schema_s-63b6dd7d53aaf":"","rank_math_shortcode_schema_s-63c15fcf43311":"","rank_math_shortcode_schema_s-63c15fcf43322":"","rank_math_shortcode_schema_s-63c15fcf43325":"","rank_math_shortcode_schema_s-63c15fcf43327":"","rank_math_shortcode_schema_s-63c15fcf43329":"","rank_math_shortcode_schema_s-63c15fcf4332a":"","rank_math_shortcode_schema_s-63c15fcf4332c":"","rank_math_shortcode_schema_s-63c15fcf4332e":"","rank_math_shortcode_schema_s-63c15fcf43330":"","rank_math_shortcode_schema_s-63f52c5ed40bb":"","rank_math_shortcode_schema_s-6409f40a9b7d5":"","rank_math_shortcode_schema_s-64354a3892419":"","rank_math_shortcode_schema_s-6440158136148":"","rank_math_shortcode_schema_s-6446d2f9353ee":"","rank_math_shortcode_schema_s-6446d2f9353f3":"","rank_math_shortcode_schema_s-6447c0fe4673c":"","rank_math_shortcode_schema_s-64e4d743542d7":"","schema_code":"","smush-complete":"","smush-info":"","smush-stats":[],"synced_version":"","wpmf_remote_video_link":"","_exp":"","_inc":"","_mc4wp_settings":[],"_post-subtitle":"","_pwh_dcfh_contact_email":"","_pwh_dcfh_contact_form_id":"","_pwh_dcfh_form_fields":"","_pwh_dcfh_ip_address":"","_pwh_dcfh_page_id":"","_pwh_dcfh_read_by":"","_pwh_dcfh_referer_url":"","_pwh_dcfh_user_agent":[],"_section1_col1":"","_section1_col2":"","_section1_col3":"","_section1_col4":"","_section2_col1":"","_section2_col2":"","_section2_col3":"","_section2_col4":"","_section2_col5":"","_section2_col6":"","_section3_col1":"","_section3_col2":"","_section3_col3":"","_section3_col4":"","_section3_col5":"","_section3_col6":"","_section4_col1":"","_section4_col2":"","_section4_col3":"","_section4_col4":"","_section4_col5":"","_section4_col6":"","_section5_col1":"","_section5_col2":"","_section5_col3":"","_section5_col4":"","_section5_col5":"","_section5_col6":"","_section6_col1":"","_section6_col2":"","_section6_col3":"","_section6_col4":"","_section6_col5":"","_section6_col6":"","_select_author":"","_test":"","_wp_attachment_backup_sizes":[],"_yoast_wpseo_estimated-reading-time-minutes":[],"_yoast_wpseo_focuskw":[],"_yoast_wpseo_focuskw_text_input":[],"_yoast_wpseo_linkdex":[],"_yoast_wpseo_meta-robots-nofollow":[],"_yoast_wpseo_meta-robots-noindex":[],"_yoast_wpseo_primary_category":[""],"_yoast_wpseo_title":[],"_yoast_wpseo_wordproof_timestamp":"","exp":"","inc":"","post-subtitle":[],"rank_math_schema_BlogPosting":[],"section1_col1":"","section1_col2":"","section1_col3":"","section1_col4":"","section2_col1":"","section2_col2":"","section2_col3":"","section2_col4":"","section2_col5":"","section2_col6":"","section3_col1":"","section3_col2":"","section3_col3":"","section3_col4":"","section3_col5":"","section3_col6":"","section4_col1":"","section4_col2":"","section4_col3":"","section4_col4":"","section4_col5":"","section4_col6":"","section5_col1":"","section5_col2":"","section5_col3":"","section5_col4":"","section5_col5":"","section5_col6":"","section6_col1":"","section6_col2":"","section6_col3":"","section6_col4":"","section6_col5":"","section6_col6":"","select_author":"","test":"","footnotes":""},"categories":[1,10],"tags":[],"class_list":["post-244767","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-blog","category-performance"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.webscale.com\/wp-json\/wp\/v2\/posts\/244767"}],"collection":[{"href":"https:\/\/www.webscale.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.webscale.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.webscale.com\/wp-json\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/www.webscale.com\/wp-json\/wp\/v2\/comments?post=244767"}],"version-history":[{"count":0,"href":"https:\/\/www.webscale.com\/wp-json\/wp\/v2\/posts\/244767\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.webscale.com\/wp-json\/wp\/v2\/media\/255673"}],"wp:attachment":[{"href":"https:\/\/www.webscale.com\/wp-json\/wp\/v2\/media?parent=244767"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.webscale.com\/wp-json\/wp\/v2\/categories?post=244767"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.webscale.com\/wp-json\/wp\/v2\/tags?post=244767"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}