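ANR is short for Application Not Responding. When an app hangs, the Android system uses the ANR mechanism to give the user a chance to force-quit it rather than leaving the device stuck on an unresponsive app. This improves the user experience and also acts as a self-protection mechanism for the Android system.
Based on Android 11.0, the sections below walk through how each of the four major components produces an ANR.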
Broadcast ANR

```java
final void processNextBroadcast(boolean fromMsg) {
    mService.updateCpuStats();

    // Parallel broadcasts are delivered first (loop body elided).
    while (mParallelBroadcasts.size() > 0) {
    }

    // Timeout handling for the current ordered broadcast.
    broadcastTimeoutLocked(false);

    performReceiveLocked(r.callerApp, r.resultTo, ...);

    // Cancel the pending timeout message.
    cancelBroadcastTimeoutLocked();
}
```
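Parallel broadcasts are handled first. They are one-way notifications that do not wait for a result, so parallel broadcasts never produce an ANR.
Ordered (serial) broadcasts are handled next:
Check whether a broadcast timeout message is already pending;
Queue the broadcast in the foreground or the background queue (selected by whether the intent carries Intent.FLAG_RECEIVER_FOREGROUND); the two queues have different timeouts;
Post a delayed ANR message whose delay depends on the queue; if the receiver finishes in time, the delayed message is cancelled; if it does not, the ANR is triggered;
Handling a broadcast ANR is relatively simple: check again whether the broadcast really timed out, write the logs, record the ANR count, and so on. processNextBroadcast is then called again to handle the next broadcast.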
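```java
// Timeout for the foreground broadcast queue: 10 seconds.
static final int BROADCAST_FG_TIMEOUT = 10 * 1000;
// Timeout for the background broadcast queue: 60 seconds.
static final int BROADCAST_BG_TIMEOUT = 60 * 1000;
```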
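The arm-a-timeout, cancel-on-completion pattern described for ordered broadcasts can be sketched with a small deterministic model. This is only an illustration under stated assumptions, not the framework code: the class `BroadcastTimeoutSketch`, its methods, the receiver names, and the virtual clock are all hypothetical; only the two timeout values come from the constants above.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative model only; the names here are hypothetical, not AOSP APIs. */
final class BroadcastTimeoutSketch {
    static final long BROADCAST_FG_TIMEOUT = 10 * 1000; // value from the article
    static final long BROADCAST_BG_TIMEOUT = 60 * 1000; // value from the article

    // Receiver name -> deadline on a virtual clock, in milliseconds.
    private final Map<String, Long> deadlines = new LinkedHashMap<>();

    /** Starts a delivery and arms a timeout that depends on the queue. */
    void startDelivery(String receiver, boolean foregroundQueue, long nowMs) {
        long timeout = foregroundQueue ? BROADCAST_FG_TIMEOUT : BROADCAST_BG_TIMEOUT;
        deadlines.put(receiver, nowMs + timeout);
    }

    /** The receiver finished in time: disarm its timeout. */
    void finishDelivery(String receiver) {
        deadlines.remove(receiver);
    }

    /** Advances the virtual clock and returns the receivers whose timeout fired. */
    List<String> advanceTo(long nowMs) {
        List<String> timedOut = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = deadlines.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (entry.getValue() <= nowMs) {
                timedOut.add(entry.getKey());
                it.remove();
            }
        }
        return timedOut;
    }

    public static void main(String[] args) {
        BroadcastTimeoutSketch queue = new BroadcastTimeoutSketch();
        queue.startDelivery("fastReceiver", true, 0);
        queue.startDelivery("slowReceiver", true, 0);
        queue.finishDelivery("fastReceiver");        // finished before the 10 s deadline
        System.out.println(queue.advanceTo(10_000)); // prints [slowReceiver]
    }
}
```

A virtual clock is used instead of real delayed messages so the behaviour can be checked without waiting for real time to pass.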
Service ANR

```java
public final class ActiveServices {
    // Timeout when the service executes on behalf of a foreground process: 20 seconds.
    static final int SERVICE_TIMEOUT = 20 * 1000;
    // Timeout for the background case: 200 seconds.
    static final int SERVICE_BACKGROUND_TIMEOUT = SERVICE_TIMEOUT * 10;

    ComponentName startServiceLocked(IApplicationThread caller, ...) {
        final boolean callerFg;
        if (caller != null) {
            final ProcessRecord callerApp = mAm.getRecordForAppLocked(caller);
            callerFg = callerApp.setSchedGroup != ProcessList.SCHED_GROUP_BACKGROUND;
        } else {
            callerFg = true;
        }
    }

    void scheduleServiceTimeoutLocked(ProcessRecord proc) {
        // Post the delayed timeout message; the delay depends on fg/bg execution.
        Message msg = mAm.mHandler.obtainMessage(ActivityManagerService.SERVICE_TIMEOUT_MSG);
        msg.obj = proc;
        mAm.mHandler.sendMessageDelayed(msg,
                proc.execServicesFg ? SERVICE_TIMEOUT : SERVICE_BACKGROUND_TIMEOUT);
    }

    void serviceTimeout(ProcessRecord proc) {
        String anrMessage = null;
        // Excerpt: the lookup of the timed-out service (`timeout`) is not shown.
        if (timeout != null && mAm.mProcessList.mLruProcesses.contains(proc)) {
            Slog.w(TAG, "Timeout executing service: " + timeout);
            anrMessage = "executing service " + timeout.shortInstanceName;
        } else {
        }
        if (anrMessage != null) {
            mAm.mAnrHelper.appNotResponding(proc, anrMessage);
        }
    }

    private void serviceDoneExecutingLocked(ServiceRecord r, boolean inDestroying,
            boolean finishing) {
        // No service is executing in this process any more: remove the timeout message.
        if (r.app.executingServices.size() == 0) {
            mAm.mHandler.removeMessages(ActivityManagerService.SERVICE_TIMEOUT_MSG, r.app);
        }
    }
}
```
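When a service is started, check whether the caller is in the foreground (its scheduling group is not SCHED_GROUP_BACKGROUND); this decides which timeout applies;
Post the delayed SERVICE_TIMEOUT_MSG message;
If the service finishes in time, the message is removed. Otherwise the service timeout logic runs: it logs the timeout and reports the ANR.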
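One detail of the code above is easy to miss: the timeout message is removed only once no service is executing in the process any more. The following sketch models that bookkeeping. It is a simplified illustration, and the class `ServiceTimeoutSketch` and its methods are hypothetical names; only the two constants and the foreground/background choice mirror the excerpt.

```java
/** Illustrative model only; the names here are hypothetical, not AOSP APIs. */
final class ServiceTimeoutSketch {
    static final int SERVICE_TIMEOUT = 20 * 1000;                       // value from the article
    static final int SERVICE_BACKGROUND_TIMEOUT = SERVICE_TIMEOUT * 10; // value from the article

    private int executingServices = 0;
    private boolean timeoutArmed = false;

    /** Picks the delay in the same way as the ternary in scheduleServiceTimeoutLocked. */
    static int timeoutFor(boolean execServicesFg) {
        return execServicesFg ? SERVICE_TIMEOUT : SERVICE_BACKGROUND_TIMEOUT;
    }

    /** A service in this process starts executing: make sure the timeout is armed. */
    void startExecuting() {
        executingServices++;
        timeoutArmed = true;
    }

    /** A service finished: disarm the timeout only when nothing is executing any more. */
    void doneExecuting() {
        if (executingServices > 0) {
            executingServices--;
        }
        if (executingServices == 0) {
            timeoutArmed = false;
        }
    }

    boolean isTimeoutArmed() {
        return timeoutArmed;
    }

    public static void main(String[] args) {
        ServiceTimeoutSketch proc = new ServiceTimeoutSketch();
        proc.startExecuting();
        proc.startExecuting();
        proc.doneExecuting();
        System.out.println(proc.isTimeoutArmed()); // true: one service is still executing
        proc.doneExecuting();
        System.out.println(proc.isTimeoutArmed()); // false: all services are done
    }
}
```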
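ContentProvider ANR

The ContentProvider publish timeout is CONTENT_PROVIDER_PUBLISH_TIMEOUT_MILLIS = 10 s. As the code below shows, when this timeout fires the system removes the process with the reason "timeout publishing content providers".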
```java
public abstract class ContentResolver implements ContentInterface {
    public static final int CONTENT_PROVIDER_PUBLISH_TIMEOUT_MILLIS = 10 * 1000;
}

public class ActivityManagerService {
    private boolean attachApplicationLocked(@NonNull IApplicationThread thread, int pid, ...) {
        // The app has providers being launched: post the delayed timeout message.
        if (providers != null && checkAppInLaunchingProvidersLocked(app)) {
            Message msg = mHandler.obtainMessage(CONTENT_PROVIDER_PUBLISH_TIMEOUT_MSG);
            msg.obj = app;
            mHandler.sendMessageDelayed(msg,
                    ContentResolver.CONTENT_PROVIDER_PUBLISH_TIMEOUT_MILLIS);
        }
    }

    public final void publishContentProviders(IApplicationThread caller,
            List<ContentProviderHolder> providers) {
        // Providers were published in time: remove the timeout message.
        if (wasInLaunchingProviders) {
            mHandler.removeMessages(CONTENT_PROVIDER_PUBLISH_TIMEOUT_MSG, r);
        }
    }

    private final void processContentProviderPublishTimedOutLocked(ProcessRecord app) {
        // Timed out: clean up and remove the process.
        cleanupAppInLaunchingProvidersLocked(app, true);
        mProcessList.removeProcessLocked(app, false, true,
                ApplicationExitInfo.REASON_INITIALIZATION_FAILURE,
                ApplicationExitInfo.SUBREASON_UNKNOWN,
                "timeout publishing content providers");
    }
}
```
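Activity ANR

The Activity ANR is the most complex of the four, and it is the case users most often see as an ANR dialog, because ANRs in background processes are handled silently (see isSilentAnr in the code further on). The end result is a dialog telling the user that an app is not responding, together with a large amount of ANR-related log output that helps developers diagnose the problem.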
InputDispatching:
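One of an Activity's main jobs is user interaction. Input events are dispatched by the native InputDispatcher, which belongs to InputManagerService (IMS), and reach the Activity's window through an InputChannel. Every input event must be answered; if the app does not handle one in time, IMS reports the timeout and hands it to AMS, which shows the ANR dialog.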
KeyDispatching:
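If the input is a key event, the timeout travels from IMS through ActivityRecord.Token.keyDispatchingTimeOut and then into AMS. The difference is that ActivityRecord absorbs the first unanswered key: the ANR dialog is shown only when key handling times out a second consecutive time.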
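The "second consecutive timeout" rule described above amounts to a small piece of state. The sketch below models only that rule as stated in the text; it is not the ActivityRecord implementation, and the class `KeyTimeoutSketch` and its method names are hypothetical.

```java
/** Illustrative model of the rule described in the text; not AOSP code. */
final class KeyTimeoutSketch {
    // True once a key dispatch timeout has been absorbed without showing a dialog.
    private boolean absorbedOneTimeout = false;

    /** Called when a key event times out. Returns true if the ANR dialog should be shown. */
    boolean onKeyDispatchTimeout() {
        if (!absorbedOneTimeout) {
            absorbedOneTimeout = true; // first miss: absorb it silently
            return false;
        }
        return true;                   // second consecutive miss: report the ANR
    }

    /** Called when a key event is handled in time; this resets the streak. */
    void onKeyHandled() {
        absorbedOneTimeout = false;
    }

    public static void main(String[] args) {
        KeyTimeoutSketch window = new KeyTimeoutSketch();
        System.out.println(window.onKeyDispatchTimeout()); // false
        System.out.println(window.onKeyDispatchTimeout()); // true
    }
}
```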
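Window focus:
An Activity always needs a current (focused) window to respond to events. If no window gains focus for too long, for example during an Activity switch where the old Activity has already gone through onPause but the new one still has not reached onResume, and this lasts more than 5 seconds, an ANR is raised. A slow app lifecycle, insufficient CPU resources, or a WMS fault can all cause this no-focused-window ANR.
1. Check whether there is a focused window and a focused application:
This case usually occurs during app startup: for example, startup takes too long and a KeyEvent or trackball MotionEvent arrives in the meantime.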
```cpp
void InputDispatcher::dumpDispatchStateLocked(String8& dump) {
    dump.appendFormat(INDENT "DispatchEnabled: %d\n", mDispatchEnabled);
    dump.appendFormat(INDENT "DispatchFrozen: %d\n", mDispatchFrozen);

    if (mFocusedApplicationHandle != NULL) {
        dump.appendFormat(INDENT "FocusedApplication: name='%s', dispatchingTimeout=%0.3fms\n",
                mFocusedApplicationHandle->getName().string(),
                mFocusedApplicationHandle->getDispatchingTimeout(
                        DEFAULT_INPUT_DISPATCHING_TIMEOUT) / 1000000.0);
    } else {
        dump.append(INDENT "FocusedApplication: <null>\n");
    }
    ...
}
```
This corresponds to:
Reason: Input dispatching timed out (Waiting because no window has focus but there is a focused application that may eventually add a window when it finishes starting up.)
```java
boolean inputDispatchingTimedOut(ProcessRecord proc, ...) {
    // Build the "Reason" text that later appears in the ANR log.
    final String annotation;
    if (reason == null) {
        annotation = "Input dispatching timed out";
    } else {
        annotation = "Input dispatching timed out (" + reason + ")";
    }

    if (proc != null) {
        synchronized (this) {
            // A process that is being debugged does not ANR.
            if (proc.isDebugging()) {
                return false;
            }

            // Under instrumentation, finish the instrumentation run instead.
            if (proc.getActiveInstrumentation() != null) {
                Bundle info = new Bundle();
                info.putString("shortMsg", "keyDispatchingTimedOut");
                info.putString("longMsg", annotation);
                finishInstrumentationLocked(proc, Activity.RESULT_CANCELED, info);
                return true;
            }
        }
        mAnrHelper.appNotResponding(proc, activityShortComponentName, aInfo,
                parentShortComponentName, parentProcess, aboveSystem, annotation);
    }
    return true;
}
```
2. Check whether earlier events were finished in time:
This corresponds to:
Reason: Input dispatching timed out (Waiting to send non-key event because the touched window has not finished processing certain input events that were delivered to it over 500.0ms ago. Wait queue length: 10. Wait queue head age: 5591.3ms.)
This means the main thread is busy with other, time-consuming work, so input events cannot be handled in time.
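When triaging many reports of this kind, it can help to pull the wait-queue figures out of the Reason line. The sketch below does this with a regular expression written against the sample message shown above. It is an assumption that other Android versions use the same wording, and the class `AnrReasonParser` and its members are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative helper; the pattern is derived from the sample Reason line only. */
final class AnrReasonParser {
    /** Wait-queue figures extracted from a Reason line. */
    static final class WaitQueue {
        final int length;
        final double headAgeMs;

        WaitQueue(int length, double headAgeMs) {
            this.length = length;
            this.headAgeMs = headAgeMs;
        }
    }

    private static final Pattern WAIT_QUEUE = Pattern.compile(
            "Wait queue length: (\\d+)\\. Wait queue head age: ([0-9]+(?:\\.[0-9]+)?)ms");

    /** Returns the wait-queue figures, or null if the reason does not contain them. */
    static WaitQueue parse(String reason) {
        Matcher matcher = WAIT_QUEUE.matcher(reason);
        if (!matcher.find()) {
            return null;
        }
        return new WaitQueue(Integer.parseInt(matcher.group(1)),
                Double.parseDouble(matcher.group(2)));
    }

    public static void main(String[] args) {
        String reason = "Input dispatching timed out (Waiting to send non-key event because "
                + "the touched window has not finished processing certain input events that "
                + "were delivered to it over 500.0ms ago. Wait queue length: 10. "
                + "Wait queue head age: 5591.3ms.)";
        WaitQueue queue = parse(reason);
        System.out.println(queue.length + " events queued, head waiting " + queue.headAgeMs + " ms");
    }
}
```

A head age above 5000 ms, as in the sample, indicates that the oldest undelivered event has been waiting longer than the 5-second input timeout mentioned earlier.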
Generating the ANR information

```java
final void appNotResponding(ProcessRecord app, ActivityRecord activity, ...) {
    long anrTime = SystemClock.uptimeMillis();
    if (isMonitorCpuUsage()) {
        mService.updateCpuStatsNow();
    }

    final boolean isSilentAnr;

    // Write the am_anr event to the event log.
    EventLog.writeEvent(EventLogTags.AM_ANR, app.userId, app.pid, ...);

    // Build the ANR header for the main log.
    StringBuilder info = new StringBuilder();
    info.setLength(0);
    info.append("ANR in ").append(app.processName);
    if (activity != null && activity.shortComponentName != null) {
        info.append(" (").append(activity.shortComponentName).append(")");
    }
    info.append("\n");
    info.append("PID: ").append(app.pid).append("\n");
    if (annotation != null) {
        info.append("Reason: ").append(annotation).append("\n");
    }
    if (parent != null && parent != activity) {
        info.append("Parent: ").append(parent.shortComponentName).append("\n");
    }

    StringBuilder report = new StringBuilder();
    report.append(MemoryPressureUtil.currentPsiState());
    ProcessCpuTracker processCpuTracker = new ProcessCpuTracker(true);

    // Dump the stack traces of the relevant processes to a trace file.
    File tracesFile = ActivityManagerService.dumpStackTraces(true, firstPids, ...);

    if (isMonitorCpuUsage()) {
        mService.updateCpuStatsNow();
        synchronized (mService.mProcessCpuTracker) {
            report.append(mService.mProcessCpuTracker.printCurrentState(anrTime));
        }
        info.append(processCpuTracker.printCurrentLoad());
        info.append(report);
    }
    info.append(processCpuTracker.printCurrentState(anrTime));

    // Print the collected information to the log.
    Slog.e(TAG, info.toString());

    if (tracesFile == null) {
        // No trace file was produced: send SIGQUIT so the process dumps its own stacks.
        Process.sendSignal(app.pid, Process.SIGNAL_QUIT);
    } else if (offsets[1] > 0) {
        mService.mProcessList.mAppExitInfoTracker.scheduleLogAnrTrace(
                pid, uid, getPackageList(), tracesFile, offsets[0], offsets[1]);
    }

    // Save the ANR record to DropBox.
    mService.addErrorToDropBox("anr", app, app.processName, ...);

    // A silent (background) ANR kills the process without showing a dialog.
    if (isSilentAnr() && !isDebugging()) {
        kill("bg anr", ApplicationExitInfo.REASON_ANR, true);
        return;
    }

    // Otherwise ask the UI thread to show the "not responding" dialog.
    Message msg = Message.obtain();
    msg.what = ActivityManagerService.SHOW_NOT_RESPONDING_UI_MSG;
    msg.obj = new AppNotRespondingDialog.Data(this, aInfo, aboveSystem);
    mService.mUiHandler.sendMessage(msg);
}
```
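The header that this method writes to the log ("ANR in ...", "PID: ...", "Reason: ...") is what developers usually search logcat for. As a rough illustration of its shape, the sketch below reproduces the first part of the StringBuilder logic in isolation. The class `AnrHeaderSketch` is a hypothetical name, and the process name, component name, and pid in the usage example are made up.

```java
/** Illustrative only: mirrors the header-building steps of the excerpt above. */
final class AnrHeaderSketch {
    static String buildHeader(String processName, String shortComponentName,
            int pid, String annotation) {
        StringBuilder info = new StringBuilder();
        info.append("ANR in ").append(processName);
        if (shortComponentName != null) {
            info.append(" (").append(shortComponentName).append(")");
        }
        info.append("\n");
        info.append("PID: ").append(pid).append("\n");
        if (annotation != null) {
            info.append("Reason: ").append(annotation).append("\n");
        }
        return info.toString();
    }

    public static void main(String[] args) {
        // Made-up values, for illustration only.
        System.out.print(buildHeader("com.example.app", "com.example.app/.MainActivity",
                1234, "Input dispatching timed out"));
    }
}
```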
```java
public class ActivityManagerService {
    public static final String ANR_TRACE_DIR = "/data/anr";

    File dumpStackTraces(ArrayList<Integer> firstPids, ...) {
        Slog.i(TAG, "dumpStackTraces pids=" + lastPids + " nativepids=" + nativePids);

        // Create the trace file under /data/anr.
        final File tracesDir = new File(ANR_TRACE_DIR);
        File tracesFile = createAnrDumpFile(tracesDir);

        Pair<Long, Long> offsets = dumpStackTraces(
                tracesFile.getAbsolutePath(), firstPids, nativePids, extraPids);
        return tracesFile;
    }

    public static Pair<Long, Long> dumpStackTraces(String tracesFile,
            ArrayList<Integer> firstPids, ArrayList<Integer> nativePids,
            ArrayList<Integer> extraPids) {
        Slog.i(TAG, "Dumping to " + tracesFile);

        // Total time budget for all dumps: 20 seconds.
        long remainingTime = 20 * 1000;

        // The three groups are dumped in this order (bodies elided).
        if (firstPids != null) {
        }
        if (nativePids != null) {
        }
        if (extraPids != null) {
        }
    }
}
```
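Collecting the ANR information takes at most 20 seconds;
The native method Debug.dumpJavaBacktraceToFileTimeout() is called to dump the stack traces, process by process in order of importance.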
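The 20-second limit works as a shared budget: each dump consumes part of it, and the more important processes go first so that they are the least likely to be cut off. A minimal sketch of such a budget loop follows. It uses simulated dump durations rather than real dumps, and the class `DumpBudgetSketch`, its method, and the pid values are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative model of a time-budgeted dump loop; not the AOSP implementation. */
final class DumpBudgetSketch {
    static final long TOTAL_BUDGET_MS = 20 * 1000; // value from the article

    /**
     * @param pids   processes in order of importance, most important first
     * @param costMs simulated time each dump would need, same order as pids
     * @return the pids whose dump completed within the shared budget
     */
    static List<Integer> dumpWithinBudget(int[] pids, long[] costMs) {
        long remaining = TOTAL_BUDGET_MS;
        List<Integer> dumped = new ArrayList<>();
        for (int i = 0; i < pids.length; i++) {
            // A single dump may use at most the time that is left.
            long spent = Math.min(costMs[i], remaining);
            remaining -= spent;
            if (spent == costMs[i]) {
                dumped.add(pids[i]); // finished before the budget ran out
            }
            if (remaining <= 0) {
                break;               // budget exhausted: give up on the rest
            }
        }
        return dumped;
    }

    public static void main(String[] args) {
        int[] pids = {1001, 1002, 1003};
        long[] cost = {5_000, 16_000, 1_000};
        System.out.println(dumpWithinBudget(pids, cost)); // prints [1001]
    }
}
```

In the usage example the second dump needs more time than is left, so it is cut off and the third process is never reached.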
```cpp
bool debuggerd_trigger_dump(pid_t tid, DebuggerdDumpType dump_type, unsigned int timeout_ms,
                            unique_fd output_fd) {
    // Excerpt: the declaration of pid is not shown.
    LOG(INFO) << TAG "started dumping process " << pid;

    // Java backtraces are requested with SIGQUIT; other dumps use the debugger signal.
    const int signal = (dump_type == kDebuggerdJavaBacktrace) ? SIGQUIT : BIONIC_SIGNAL_DEBUGGER;
    sigval val = {.sival_int = (dump_type == kDebuggerdNativeBacktrace) ? 1 : 0};

    if (sigqueue(pid, signal, val) != 0) {
        log_error(output_fd, errno, "failed to send signal to pid %d", pid);
        return false;
    }
    LOG(INFO) << TAG "done dumping process " << pid;
}
```
Every app process has a SignalCatcher thread dedicated to handling SIGQUIT. The code is in art/runtime/signal_catcher.cc:
```cpp
void* SignalCatcher::Run(void* arg) {
    SignalCatcher* signal_catcher = reinterpret_cast<SignalCatcher*>(arg);
    CHECK(signal_catcher != nullptr);

    Runtime* runtime = Runtime::Current();

    // Signals handled by this thread.
    SignalSet signals;
    signals.Add(SIGQUIT);
    signals.Add(SIGUSR1);

    // Excerpt: the declaration of self is not shown.
    while (true) {
        // Block until one of the signals arrives.
        int signal_number = signal_catcher->WaitForSignal(self, signals);
        if (signal_catcher->ShouldHalt()) {
            runtime->DetachCurrentThread();
            return nullptr;
        }

        switch (signal_number) {
            case SIGQUIT:
                // Dump the stacks of this process.
                signal_catcher->HandleSigQuit();
                break;
            case SIGUSR1:
                signal_catcher->HandleSigUsr1();
                break;
            default:
                LOG(ERROR) << "Unexpected signal %d" << signal_number;
                break;
        }
    }
}
```
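When an app hits an ANR, the system collects a set of processes and dumps their stacks to produce the ANR trace file. The first process collected, and the one that is always included, is the process in which the ANR occurred. The system then sends SIGQUIT to each of these processes, and each process dumps its stacks when it receives the signal.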
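To get a feel for what such a dump contains, the sketch below collects the name, state, and stack frames of every live thread using the standard Java API `Thread.getAllStackTraces()`. This is only an analogy: on Android the real trace is produced natively by ART when SIGQUIT arrives, with more detail (locks, native frames) than this. The class `ThreadDumpSketch` and its output format are hypothetical.

```java
import java.util.Map;

/** Illustrative analogy only; ART produces the real ANR trace natively on SIGQUIT. */
final class ThreadDumpSketch {
    /** Returns a text dump with one block per live thread. */
    static String dumpAllThreads() {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> entry
                : Thread.getAllStackTraces().entrySet()) {
            Thread thread = entry.getKey();
            out.append('"').append(thread.getName()).append('"')
                    .append(" state=").append(thread.getState()).append('\n');
            for (StackTraceElement frame : entry.getValue()) {
                out.append("  at ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(dumpAllThreads());
    }
}
```

In a real ANR trace, the block for the main thread is usually the first place to look, since it shows what the main thread was doing when the timeout fired.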