ssh-workbench/docs/SESSION_LIFECYCLE_AUDIT.md
jima 9c980bbea7 Session lifecycle: fix service death on unbind, ProxyJump ID, quick-connect history
Critical fixes:
- ensureStarted() counteracts previous stopSelf() before creating any session,
  preventing service destruction when Activity unbinds (screen off, Home, task switch)
- onTaskRemoved() re-starts service to survive swipe-from-recents
- Auto-reconnect no longer removes session from map (prevents empty-map window)
- Auto-reconnect clones password CharArray before disconnect zeroes the original
- checkStopSelf() now includes SFTP sessions in the empty check

ProxyJump fix:
- onConnect callback now passes savedConnectionId directly instead of re-looking
  up by host/port/username (LIMIT 1 picked wrong duplicate connection)
- Threaded through ConnectionListScreen → NavGraph → MainActivity → MainViewModel

UX improvements:
- Quick-connect history (pro): stores 100 entries with timestamps, shows 20 in
  dropdown with relative dates, X to remove, dismiss on focus loss
- Disconnect snackbar suppressed for active session (user already sees the bar)
- DisconnectedBar: muted dark background with teal/grey buttons instead of
  red/green/blue Christmas lights
- Session auto-switch works for Idle state (ProxyJump sessions start in Idle
  during jump chain build)

Refactoring:
- SshConnectionHelper extracted: shared auth lookup, TOFU, session factory
  (eliminates duplication between connectSSH, buildJumpChain, openSftpSession)
- Remove redundant stopForeground call from updateNotification

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 16:33:45 +02:00


Session Lifecycle Audit — 2026-04-02

Bug Report

Symptom: User backgrounds the app (Home, screen off, task switch), comes back, and sessions are GONE. The foreground notification shows "No active sessions". The service was destroyed while it had active connected sessions.

Evidence: Three log dumps from 2026-04-02 all show onDestroy sessions=1 firing 14-17ms after onStop — unbound from TerminalService. Screenshots show the "No active sessions" foreground notification.


Root Cause: CONFIRMED

The stopSelf() → late unbindService() race

Timeline from Log 2 (13:20 session):

  1. 13:23:22 - User closes failed session → checkStopSelf() → stopSelf() (service "started" component dies)
  2. 13:24:08 - User connects new session (Activity still foreground, no new onStart(), no new startForegroundService())
  3. 13:25:54 - User backgrounds app → onStop() → unbindService()
  4. 13:25:54 (14ms later) - Service destroyed with sessions=1 because:
    • Started component: DEAD (stopSelf called at 13:23:22, never restarted)
    • Binding: JUST REMOVED by unbindService()
    • Result: no started + no binding = Android destroys the service

Why startForegroundService() wasn't called between steps 1 and 2:

  • startForegroundService() is only called in MainActivity.onStart() (line 635)
  • onStart() runs when the Activity becomes visible
  • Between steps 1 and 2, the Activity was already visible — no new onStart() call
  • The BIND_AUTO_CREATE flag kept the service alive via binding, but the started lifecycle was dead
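
The race comes down to one rule: a created service is kept alive while it is started or while at least one client is bound. The sketch below is a minimal model of that rule that replays the Log 2 timeline. `ServiceLifetimeModel` and `replayLog2` are hypothetical names for illustration, not code from the app, and the model ignores everything else Android considers (process priority, low memory).

```kotlin
// Hypothetical model of the keep-alive rule for a service that is both
// started and bound: it survives while started OR while any client is bound.
class ServiceLifetimeModel {
    var started = false
        private set
    var bindings = 0
        private set
    var destroyed = false
        private set

    fun startForegroundService() { started = true; destroyed = false }
    fun bind() { bindings++; destroyed = false }
    fun stopSelf() { started = false; reevaluate() }
    fun unbind() { if (bindings > 0) bindings--; reevaluate() }

    // Neither started nor bound means the system tears the service down.
    private fun reevaluate() {
        if (!started && bindings == 0) destroyed = true
    }
}

// Replays the timeline steps; returns true if the service ends up destroyed.
fun replayLog2(restartOnNewSession: Boolean): Boolean {
    val service = ServiceLifetimeModel()
    service.startForegroundService()   // MainActivity.onStart()
    service.bind()                     // bindService(..., BIND_AUTO_CREATE)
    service.stopSelf()                 // 13:23:22 failed session closed
    // 13:24:08 new session: only the proposed fix restarts the service here
    if (restartOnNewSession) service.startForegroundService()
    service.unbind()                   // 13:25:54 onStop()
    return service.destroyed
}
```

Under this model, `replayLog2(false)` ends with the service destroyed, while `replayLog2(true)` leaves it running after the unbind.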

Fix Required

Call ensureStarted() (which calls startForegroundService()) every time a new session is created in TerminalService. This re-starts the started component, ensuring the service survives after unbindService().
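
A rough sketch of how such a gate could look, written without the Android framework so it can run on its own. `SessionServiceSketch`, its maps, and the injected `startService` callback are assumptions made for illustration; in the real service the callback would correspond to startForegroundService(). Calling ensureStarted() from the SFTP path is also an assumption of this sketch.

```kotlin
// Hypothetical stand-in for the real service: only the start/stop
// bookkeeping is modelled. startService is injected so the sketch runs
// without the Android framework.
class SessionServiceSketch(private val startService: () -> Unit) {
    private val terminalSessions = mutableMapOf<String, String>()
    private val sftpSessions = mutableMapOf<String, String>()
    var started = false
        private set

    // Safe to call repeatedly: issues a start request only when needed.
    fun ensureStarted() {
        if (!started) {
            startService()
            started = true
        }
    }

    fun connect(sessionId: String, target: String) {
        ensureStarted() // undo any earlier stop before the session exists
        terminalSessions[sessionId] = target
    }

    fun openSftp(sessionId: String, target: String) {
        ensureStarted() // assumption: SFTP sessions use the same gate
        sftpSessions[sessionId] = target
    }

    fun disconnect(sessionId: String) {
        terminalSessions.remove(sessionId)
        sftpSessions.remove(sessionId)
        checkStopSelf()
    }

    // Stops only when both maps are empty, so an SFTP-only service stays up.
    private fun checkStopSelf() {
        if (terminalSessions.isEmpty() && sftpSessions.isEmpty()) started = false
    }
}
```

The point of the design is that the start request lives next to session creation rather than in an Activity callback, so it no longer depends on onStart() happening to run at the right moment.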


All Session Open Paths

| # | Path | Trigger | Code |
|---|---|---|---|
| 1 | SSH from connection list | User tap | MainViewModel.connect() → TerminalService.connectSSH() |
| 2 | SSH from quick-connect | User typed user@host | Same as above |
| 3 | SSH duplicate session | Tab overflow → Duplicate | MainViewModel.duplicateSession() → connectSSH() |
| 4 | SSH auto-reconnect | Network restored | scheduleAutoReconnect() → connectSSH() (internal) |
| 5 | SSH profile auto-connect | Launch intent --es profile | handleProfileConnect() → connect() |
| 6 | Telnet from connection list | User tap | MainViewModel.connect() → TerminalService.connectTelnet() |
| 7 | Local shell from connection list | User tap | MainViewModel.connect() → TerminalService.startLocalShell() |
| 8 | SFTP from connection list | Context menu "New SFTP" | MainViewModel.openSftpTab() → TerminalService.openSftpSession() |
| 9 | SFTP from tab overflow | Overflow menu → SFTP | Same as above |

All Session Close Paths

| # | Path | User-initiated? | Code |
|---|---|---|---|
| 1 | Tab Close button | YES | MainViewModel.disconnectSession() → TerminalService.disconnectSession() |
| 2 | DisconnectedBar Close | YES | Same |
| 3 | Tab overflow → Close | YES | Same |
| 4 | Disconnect All (connection card) | YES | disconnectAllForConnection() |
| 5 | onDestroy() | NO (system) | disconnectAll() → disconnectSession() for each |
| 6 | Auto-reconnect sessions.remove() | NO (internal) | Line 1158: brief removal during reconnect cycle |
| 7 | connectSSHInternal safety | NO (internal) | Line 389: disconnectSession(sessionId) before creating entry |

All Android Lifecycle Events Affecting Sessions

| Event | What happens | Risk |
|---|---|---|
| Home button | onPause → onStop → unbindService | HIGH if stopSelf was called earlier |
| Back to home | Same as Home | Same |
| Screen off | Same as Home | Same |
| Task switcher (Recent) | onPause → onStop → unbindService | Same |
| Swipe from Recents | onTaskRemoved() (not overridden!) → potential service death | HIGH: no protection |
| Process death (low memory) | Service onDestroy or no callback at all | CRITICAL: sessions lost |
| Doze mode | Network restricted, keepalive fails → SSH disconnect | Expected: session stays in map as Disconnected |
| Network change (WiFi→mobile) | TCP socket dies → SSH disconnect | Expected: same |
| App update | Process killed | Sessions lost |

All Session State Transitions (SSH)

Idle → Connecting → Connected → Disconnected(reason)
                  → Error(msg)       ↑ (auto-reconnect)
                                     → Disconnected(timeout/lost)
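
One possible reading of the diagram above, expressed as a sealed type plus a transition check. The type and function names are illustrative and may not match the app's real state classes. Treating Error as terminal and allowing Disconnected → Connecting only for auto-reconnect are assumptions of this sketch.

```kotlin
// Illustrative state type; names mirror the diagram, not the app's code.
sealed interface SessionState {
    object Idle : SessionState
    object Connecting : SessionState
    object Connected : SessionState
    data class Disconnected(val reason: String) : SessionState
    data class Error(val message: String) : SessionState
}

// Returns true when the diagram, as read here, permits the transition.
fun isAllowed(from: SessionState, to: SessionState): Boolean = when (from) {
    is SessionState.Idle -> to is SessionState.Connecting
    is SessionState.Connecting ->
        to is SessionState.Connected || to is SessionState.Error
    is SessionState.Connected -> to is SessionState.Disconnected
    // Auto-reconnect re-enters the connect phase from Disconnected.
    is SessionState.Disconnected -> to is SessionState.Connecting
    // Assumption: an Error session is not revived; a new session is created.
    is SessionState.Error -> false
}
```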

Disconnect causes (7 paths in SSHSession):

  1. Transport DisconnectListener (WiFi off, server disconnect)
  2. Read loop EOF (clean shell exit)
  3. Read loop exception (socket break)
  4. Keepalive timeout (3 missed × 15s = 45s)
  5. Read silence + probe failure (90-120s)
  6. Write timeout (10s)
  7. Write exception

All seven paths correctly transition to Disconnected, and cleanupDisconnectedSession keeps the session in the maps.
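
The keepalive arithmetic in cause 4 (3 missed × 15s = 45s) can be illustrated with a small counter. This is a sketch only: `KeepaliveMonitor` and its method names are hypothetical, and it assumes that misses are counted consecutively and that any reply resets the count, which the audit does not state.

```kotlin
// Illustrative only: counts consecutive unanswered keepalive probes.
class KeepaliveMonitor(
    val intervalSeconds: Int = 15,
    val maxMissed: Int = 3
) {
    private var missed = 0

    // Worst-case time before the session is declared dead.
    val timeoutSeconds: Int
        get() = intervalSeconds * maxMissed

    // Call once per interval when no reply arrived; true means "disconnect".
    fun onProbeMissed(): Boolean {
        missed++
        return missed >= maxMissed
    }

    // Assumption: any reply proves the link is alive, so the count resets.
    fun onReply() {
        missed = 0
    }
}
```

With the defaults, `timeoutSeconds` evaluates to 45, matching the figure in the list.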

Bugs Found

BUG 1 — CRITICAL: stopSelf() not counteracted by new sessions

Impact: Service destroyed with active sessions when Activity unbinds. Fix: Call ensureStarted() in connectSSHInternal(), connectTelnet(), startLocalShell().

BUG 2 — HIGH: No onTaskRemoved() override

Impact: Swipe from Recents can kill service without onDestroy(). Fix: Override onTaskRemoved() — do NOT stop the service if sessions exist.

BUG 3 — MEDIUM: Auto-reconnect sessions.remove() gap

Impact: Brief window where session is absent from map during reconnect. Fix: Replace entry in-place instead of remove-then-add.
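
To make the gap in Bug 3 concrete, here is a small comparison of the two update styles on a ConcurrentHashMap. `SessionEntry` and both function names are hypothetical, and the real session objects are certainly richer. Each function returns whether the map could be observed empty, which is the moment an "is the map empty?" stop check would misfire.

```kotlin
import java.util.concurrent.ConcurrentHashMap

// Hypothetical session record used only for this illustration.
data class SessionEntry(val id: String, val generation: Int)

// Remove-then-add: the key is absent between the two calls, so a
// single-session map is briefly empty.
fun reconnectRemoveThenAdd(
    sessions: ConcurrentHashMap<String, SessionEntry>,
    id: String
): Boolean {
    val old = sessions.remove(id) ?: return false
    val sawEmpty = sessions.isEmpty()
    sessions[id] = old.copy(generation = old.generation + 1)
    return sawEmpty
}

// In-place replacement: the key never leaves the map.
fun reconnectInPlace(
    sessions: ConcurrentHashMap<String, SessionEntry>,
    id: String
): Boolean {
    sessions.computeIfPresent(id) { _, old ->
        old.copy(generation = old.generation + 1)
    }
    return sessions.isEmpty()
}
```

computeIfPresent performs the swap atomically per key, so concurrent readers see either the old entry or the new one, never a missing key.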

BUG 4 — LOW: Double stopForeground calls

Impact: Log noise, no functional issue. Fix: Gate second call.

BUG 5 — INFO: "No active sessions" notification text

Impact: User sees misleading notification. Note: This text should only appear transiently during the checkStopSelf → stopSelf flow. With Bug 1 fixed, the service won't linger with "No active sessions" because it will either have sessions or be stopped.