# Data Collection Audit — Kordant Android
> **Last updated:** 2026-06-01
> **Auditor:** Android Production Readiness
> **Target:** `com.kordant.android` (v1.0, target SDK 36)
> **Purpose:** Complete the Google Play Data Safety form accurately
---
## 1. Data Collected by the App
### 1.1 Personal Information
| Data Type | Collected? | Source | Purpose | Required? |
|-----------|-----------|--------|---------|-----------|
| Name | ✅ Yes | User registration/signup | Account creation, personalization | Yes |
| Email address | ✅ Yes | User registration/signup, Google Sign-In, password reset | Authentication, account recovery, notifications | Yes |
| Password | ✅ Yes | User registration, signup, password reset | Authentication (never stored locally in plaintext) | Yes |
| Phone number | ✅ Yes | User profile update, Call Screening, SpamShield | Caller ID verification, spam detection, user profile | No |
| Avatar/photo | ✅ Yes (optional) | User profile, Google Sign-In | Profile display | No |

**Sources:**
- `AuthRepository.kt` — signup, login, forgot/reset password
- `User.kt` data model — `name`, `email`, `phone`, `avatarUrl`
- `SettingsViewModel.kt` — updateProfile(name, phone)
- `GoogleSignInAccount` — idToken from Google Sign-In
### 1.2 Audio / Voice Data
| Data Type | Collected? | Source | Purpose | Required? |
|-----------|-----------|--------|---------|-----------|
| Voice recordings | ✅ Yes | VoicePrint enrollment | Voice biometric identification, spam call analysis | No |
| Voice analysis results | ✅ Yes | VoicePrint analysis API | Analyze incoming calls against enrolled voice prints | No |
| Audio samples | ✅ Yes | RECORD_AUDIO permission | Create voice fingerprint for caller identification | No |

**Sources:**
- `VoiceEnrollment.kt` — `sampleCount`, `status`
- `VoiceAnalysis.kt` — `confidence`, `result`
- `TRPCApiService.kt` — `voiceprint.createEnrollment`, `voiceprint.analyzeAudio`
- `AndroidManifest.xml` — `RECORD_AUDIO` permission
### 1.3 Phone Numbers & Call Data
| Data Type | Collected? | Source | Purpose | Required? |
|-----------|-----------|--------|---------|-----------|
| Incoming caller phone numbers | ✅ Yes | Call Screening Service | Spam detection, caller identification | Yes (for call screening feature) |
| Phone numbers to monitor | ✅ Yes | Watchlist (DarkWatch) | Alerts for data broker exposure of monitored numbers | No |
| Blocked/reported numbers | ✅ Yes | SpamShield rules | Community spam protection | No |
| Anonymized call logs (SHA-256 hashed numbers) | ✅ Yes | CallScreeningRepository | Analytics, false positive detection | No |

**Privacy protections:**
- All phone numbers are SHA-256 **hashed** before being stored in the local spam database.
- Raw phone numbers are never written to disk in the spam DB.
- Call logs store only hashed representations (`SpamDatabase.hashPhoneNumber()`).
- `logScreenedCall` stores `number_hash`, not the raw number.

**Sources:**
- `CallScreeningService.kt` — `onScreenCall()`, `extractPhoneNumber()`
- `SpamDatabase.kt` — `hashPhoneNumber()`, `TABLE_SPAM_NUMBERS`, `TABLE_CALL_LOG`
- `WatchlistItem.kt` — `type`, `value` (phone numbers being monitored)
- `SpamRule.kt` — blocking rules
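
The hashing step described above can be sketched as follows. This is a minimal illustration, not the contents of `SpamDatabase.kt`: the digit-only normalization in `normalizePhoneNumber` is a hypothetical helper, and the hex output encoding is an assumption.

```kotlin
import java.security.MessageDigest

// Hypothetical normalizer (assumption): keep digits only so that differently
// formatted versions of the same number produce the same hash.
fun normalizePhoneNumber(raw: String): String = raw.filter { it.isDigit() }

// SHA-256 of the normalized number, hex-encoded. Plays the role the audit
// attributes to SpamDatabase.hashPhoneNumber(); the real code may differ.
fun hashPhoneNumber(raw: String): String {
    val digest = MessageDigest.getInstance("SHA-256")
        .digest(normalizePhoneNumber(raw).toByteArray(Charsets.UTF_8))
    return digest.joinToString("") { "%02x".format(it) }
}

fun main() {
    val formatted = hashPhoneNumber("+1 (555) 010-0000")
    val plain = hashPhoneNumber("15550100000")
    println(formatted == plain) // same digits, so the hashes match
    println(formatted.length)   // a SHA-256 digest is 64 hex characters
}
```

Only the returned hash would be written to a column such as `number_hash`; the raw string never needs to leave memory.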
### 1.4 Device & Usage Information
| Data Type | Collected? | Source | Purpose | Required? |
|-----------|-----------|--------|---------|-----------|
| FCM device token | ✅ Yes | Firebase Cloud Messaging | Push notification delivery | Yes |
| App version | ✅ Yes | `X-Client-Version` header | API compatibility, debugging | Yes |
| Device platform | ✅ Yes | `X-Client-Platform: android` header | API routing, analytics | Yes |
| Unique request IDs | ✅ Yes | `X-Request-ID` header | Request tracing, debugging | Yes |
| Android OS version | ✅ Yes | `Build.VERSION.SDK_INT` (network requests) | Analytics, crash reporting | Yes |
| Device model | ✅ Yes | Crashlytics reports | Crash debugging | Yes |
| Device language/locale | ✅ Yes | User preferences | Localization | Yes |
| Boot completed events | ✅ Yes | `RECEIVE_BOOT_COMPLETED` permission | Re-schedule background sync after reboot | Yes |

**Sources:**
- `NetworkModule.kt` — request headers, logging interceptor
- `FCMService.kt` — `onNewToken()`, `registerDeviceToken()`
- `KordantApp.kt` — Crashlytics initialization
- `SecureStorageManager.kt` — `fcmDeviceToken`
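
As a rough sketch, the three client metadata headers in the table could be assembled like this. `clientHeaders` is a hypothetical helper, and generating `X-Request-ID` as a random UUID is an assumption; `NetworkModule.kt` may build them differently.

```kotlin
import java.util.UUID

// Builds the client metadata headers listed in section 1.4.
// appVersion is passed in by the caller (on Android it would typically come
// from the build configuration).
fun clientHeaders(appVersion: String): Map<String, String> = mapOf(
    "X-Client-Version" to appVersion,
    "X-Client-Platform" to "android",
    // Assumption: a fresh random UUID per request, so the ID traces a single
    // request and does not act as a stable device identifier.
    "X-Request-ID" to UUID.randomUUID().toString(),
)

fun main() {
    clientHeaders("1.0").forEach { (name, value) -> println("$name: $value") }
}
```

If the request ID is generated per request as sketched here, it does not need to be declared as a device identifier; a persistent ID would.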
### 1.5 App Activity & Analytics
| Data Type | Collected? | Source | Purpose | Required? |
|-----------|-----------|--------|---------|-----------|
| App startup timing | ✅ Yes | `StartupTracker.kt` | Performance monitoring, cold start optimization | Yes |
| Login/logout events | ✅ Yes | `AuthRepository.kt` | Authentication tracking | Yes |
| Feature usage API calls | ✅ Yes | All API endpoints via tRPC | Service functionality | Yes |
| Notification preferences | ✅ Yes | `UserPreferencesDataStore.kt` | Respect user notification choices | Yes |
| Theme preferences | ✅ Yes | `UserPreferencesDataStore.kt` | User personalization | No |

**Sources:**
- `StartupTracker.kt` — app cold start timing
- `TRPCApiService.kt` — all API endpoints
- `UserPreferencesDataStore.kt` — user settings & preferences
### 1.6 Crash & Performance Data
| Data Type | Collected? | Source | Purpose | Required? |
|-----------|-----------|--------|---------|-----------|
| Crash reports | ✅ Yes | Firebase Crashlytics | Bug fixing, app stability | Yes |
| ANR traces | ✅ Yes | Android system + Crashlytics | Performance debugging | Yes |
| Security violation reports | ✅ Yes | `KordantApp.reportCompromiseToBackend()` | Security monitoring | Yes |

**Sources:**
- `KordantApp.kt` — `initializeCrashlytics()`
- `build.gradle.kts` — `firebase-crashlytics` dependency
- `AndroidManifest.xml` — `firebase_crashlytics_collection_enabled=true`
### 1.7 Property & Data Broker Data
| Data Type | Collected? | Source | Purpose | Required? |
|-----------|-----------|--------|---------|-----------|
| Property addresses | ✅ Yes | HomeTitle feature | Property title monitoring, data broker listing detection | No |
| Owner names | ✅ Yes | Property records | Property ownership verification | No |
| Broker listing URLs | ✅ Yes | Remove Brokers feature | Track data broker removal requests | No |
| Data exposure details | ✅ Yes | DarkWatch feature | Dark web monitoring alerts | No |

**Sources:**
- `Property.kt` — `address`, `ownerName`, `county`
- `BrokerListing.kt` — `propertyAddress`, `brokerName`, `url`
- `Exposure.kt` — `type`, `source`, `details`
- `WatchlistItem.kt` — PII being monitored (email, phone, SSN, etc.)
---
## 2. Data NOT Collected
| Data Type | Confirmed Not Collected | Evidence |
|-----------|------------------------|----------|
| Precise/approximate location | ✅ Not collected | No `ACCESS_FINE_LOCATION` or `ACCESS_COARSE_LOCATION` permission in manifest |
| Health & fitness data | ✅ Not collected | No health-related APIs or permissions |
| SMS/MMS messages | ✅ Not collected | No `READ_SMS` or `RECEIVE_SMS` permission |
| Calendar | ✅ Not collected | No calendar permissions or APIs |
| Contacts | ✅ Not collected | No `READ_CONTACTS` permission |
| Photos/videos | ✅ Not collected | No `CAMERA` or `READ_MEDIA_IMAGES` permission; Coil only loads from URLs |
| Files & documents | ✅ Not collected | No file access permissions |
| Financial info (credit card numbers) | ✅ Not collected | Stripe Checkout is handled via web; no payment card data touches the app |
| Biometric data (fingerprint, face) | ✅ Not collected | Biometric auth uses the platform biometric prompt; the app never receives fingerprint or face data. Voice biometrics are collected separately (see 1.2) |
| Browsing history | ✅ Not collected | No web browsing functionality |
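
One way to back the permission-based evidence above is an automated check over the manifest. The sketch below is an assumption about how such a check might look, not an existing project script; `declaredPermissions`, `forbiddenPermissions`, and `forbiddenDeclared` are hypothetical names. It is most meaningful when run against the merged manifest, since library dependencies can contribute permissions of their own.

```kotlin
import javax.xml.parsers.DocumentBuilderFactory

// Permissions that section 2 states are absent from the manifest.
val forbiddenPermissions = setOf(
    "android.permission.ACCESS_FINE_LOCATION",
    "android.permission.ACCESS_COARSE_LOCATION",
    "android.permission.READ_SMS",
    "android.permission.RECEIVE_SMS",
    "android.permission.READ_CONTACTS",
    "android.permission.CAMERA",
    "android.permission.READ_MEDIA_IMAGES",
)

// Returns the android:name of every <uses-permission> element in the XML.
// The default parser is not namespace-aware, so the attribute is looked up
// by its literal qualified name.
fun declaredPermissions(manifestXml: String): Set<String> {
    val doc = DocumentBuilderFactory.newInstance()
        .newDocumentBuilder()
        .parse(manifestXml.byteInputStream())
    val nodes = doc.getElementsByTagName("uses-permission")
    return (0 until nodes.length)
        .mapNotNull { nodes.item(it).attributes.getNamedItem("android:name")?.nodeValue }
        .toSet()
}

// Any overlap means the "not collected" evidence no longer holds.
fun forbiddenDeclared(manifestXml: String): Set<String> =
    declaredPermissions(manifestXml) intersect forbiddenPermissions

fun main() {
    val sample = """
        <manifest xmlns:android="http://schemas.android.com/apk/res/android">
            <uses-permission android:name="android.permission.INTERNET"/>
            <uses-permission android:name="android.permission.RECORD_AUDIO"/>
        </manifest>
    """.trimIndent()
    println(forbiddenDeclared(sample).isEmpty()) // no forbidden permission in this sample
}
```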
---
## 3. Third-Party SDK Data Collection
### 3.1 Firebase Cloud Messaging (FCM)
- **Provider:** Google
- **Data collected by SDK:** Device token, IP address, push notification delivery status
- **Purpose:** Push notification delivery
- **Data shared with third parties:** Google (for notification delivery)
- **User control:** Can disable notifications in system settings or in-app preferences
### 3.2 Firebase Crashlytics
- **Provider:** Google
- **Data collected by SDK:** Crash traces, device model, OS version, app version, stack traces, timestamps, device locale
- **Purpose:** Crash reporting, app stability monitoring
- **Data shared with third parties:** Google (Firebase)
- **User control:** Crashlytics collection can be disabled; enabled by default in release builds
### 3.3 Google Sign-In
- **Provider:** Google
- **Data collected by SDK:** Google account email, profile name, avatar URL, OAuth tokens
- **Purpose:** Authentication, account creation
- **Data shared with third parties:** Google (OAuth flow)
- **User control:** User must explicitly tap "Sign in with Google" to initiate
### 3.4 OkHttp / Retrofit
- **Provider:** Square, Inc.
- **Data collected by SDK:** HTTP request/response data
- **Purpose:** API networking
- **Data shared with third parties:** None (logs are sanitized locally — tokens, emails, phones redacted)
- **User control:** N/A
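
The log redaction mentioned above might look roughly like the following. The patterns and placeholder strings are illustrative assumptions; the audit does not show the app's actual sanitizer, and `sanitize` is a hypothetical name.

```kotlin
// Ordered redaction rules (assumed patterns). Tokens go first so that digits
// inside a token are never mistaken for a phone number.
val redactions: List<Pair<Regex, String>> = listOf(
    Regex("(?i)(Bearer\\s+)[A-Za-z0-9._~+/=-]+") to "\$1[REDACTED_TOKEN]",
    Regex("(?i)(\"password\"\\s*:\\s*)\"[^\"]*\"") to "\$1\"[REDACTED]\"",
    Regex("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}") to "[REDACTED_EMAIL]",
    Regex("\\+?\\d[\\d\\s().-]{7,}\\d") to "[REDACTED_PHONE]",
)

// Applies every rule in order to a single log line.
fun sanitize(line: String): String =
    redactions.fold(line) { acc, (pattern, replacement) -> pattern.replace(acc, replacement) }

fun main() {
    println(sanitize("Authorization: Bearer abc.def.ghi"))
    println(sanitize("{\"email\":\"jane@example.com\",\"password\":\"hunter2\"}"))
    println(sanitize("caller=+1 555 010 0000"))
}
```

A function like this would be called on each line before it reaches the logging interceptor's output. The phone pattern is deliberately broad and can over-redact other long digit runs, which is the safer failure mode for logs.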
### 3.5 Stripe (via web backend)
- **Provider:** Stripe, Inc.
- **Data collected by SDK:** None directly on Android; payments handled via Stripe Checkout in web view
- **Purpose:** Payment processing
- **Data shared with third parties:** Stripe (when user initiates purchase via web view)
- **User control:** User initiates payment voluntarily
### 3.6 Coil Image Loader
- **Provider:** Coil (open source)
- **Data collected by SDK:** None (local image caching only)
- **Purpose:** Image loading and caching
- **Data shared with third parties:** None
- **User control:** N/A
---
## 4. Security Practices
| Practice | Status | Evidence |
|----------|--------|----------|
| **Encryption in transit** | ✅ TLS 1.2+ | `network_security_config.xml` disables cleartext traffic, so all connections use TLS |
| **Certificate pinning** | ✅ Implemented | SHA-256 pin hashes for `api.kordant.com` and `staging.api.kordant.com` |
| **Encryption at rest** | ✅ AES-256-GCM | `EncryptedSharedPreferences` with `MasterKey` in Android Keystore |
| **Auth token storage** | ✅ Encrypted | Access and refresh tokens in `EncryptedSharedPreferences` |
| **PII storage** | ✅ Encrypted | User profile JSON in `EncryptedSharedPreferences` |
| **Phone number storage** | ✅ SHA-256 hashed | Phone numbers hashed before SQLite storage in `SpamDatabase` |
| **API log sanitization** | ✅ Implemented | Tokens, emails, phone numbers, passwords redacted from logs |
| **Secure deletion** | ✅ Implemented | `secureOverwriteAndRemove()` overwrites keys before removal |
| **GDPR right to erasure** | ✅ Supported | `clearAllData()` removes all local data including preferences; server-side data is erased through account deletion (see Section 5) |
| **Root detection** | ✅ Implemented | `SecurityChecker.kt` — su binary, Magisk, Busybox, test-keys, emulator detection |
| **Input validation** | ✅ Server-side | Auth error messages mapped generically (`AuthErrorMapper`) |
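
For reviewers who need to confirm the pin hashes, a SHA-256 pin is conventionally the Base64-encoded SHA-256 digest of the certificate's SubjectPublicKeyInfo. The sketch below shows that computation on a throwaway key; `spkiPin` is a hypothetical helper, and whether the app configures its pins through `network_security_config.xml` or an HTTP client is not stated in this audit.

```kotlin
import java.security.KeyPairGenerator
import java.security.MessageDigest
import java.security.PublicKey
import java.util.Base64

// Pin = Base64(SHA-256(SubjectPublicKeyInfo)). For standard JCA keys,
// PublicKey.getEncoded() returns the X.509 SubjectPublicKeyInfo encoding.
fun spkiPin(publicKey: PublicKey): String {
    val hash = MessageDigest.getInstance("SHA-256").digest(publicKey.encoded)
    return Base64.getEncoder().encodeToString(hash)
}

fun main() {
    // Throwaway key pair for demonstration only; a real pin is computed from
    // the public key of the server certificate for the pinned host.
    val keyPair = KeyPairGenerator.getInstance("RSA")
        .apply { initialize(2048) }
        .generateKeyPair()
    println(spkiPin(keyPair.public)) // 44 Base64 characters for a 32-byte digest
}
```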
---
## 5. Data Retention & Deletion
| Data Type | Retention | Deletion Mechanism |
|-----------|-----------|-------------------|
| Auth tokens | Until logout or token expiry | `clearAllAuthData()` or `clearAllData()` |
| Cached user profile | Until logout or overwrite | `clearUserProfile()` or `clearAllData()` |
| FCM device token | Until logout | `clearAllData()` removes token |
| Spam database | Until user clears or app uninstall | `SpamDatabase.clearAll()` or app data clear |
| Call logs (anonymized) | 7-day stats window | Auto-purged; can clear via app settings |
| User preferences | Until changed or app uninstall | `clearAll()` on DataStore |
| Crashlytics data | Per Firebase retention policy | Deleted by the Kordant team through Firebase when a user requests it |
| Backend data | Per server retention policy | User can request account deletion via settings or `privacy@kordant.com` |
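
The 7-day window for anonymized call logs implies a purge step along these lines. This is a sketch under assumptions: `ScreenedCall` and `purgeExpired` are hypothetical, and the real purge presumably runs as a delete against `TABLE_CALL_LOG` rather than over an in-memory list.

```kotlin
import java.time.Duration
import java.time.Instant

// Hypothetical row shape: a hash and a timestamp only, matching the
// anonymized log described in section 1.3.
data class ScreenedCall(val numberHash: String, val screenedAt: Instant)

val retentionWindow: Duration = Duration.ofDays(7)

// Keeps only entries screened within the retention window before `now`.
fun purgeExpired(log: List<ScreenedCall>, now: Instant): List<ScreenedCall> {
    val cutoff = now.minus(retentionWindow)
    return log.filter { it.screenedAt.isAfter(cutoff) }
}

fun main() {
    val now = Instant.parse("2026-06-08T00:00:00Z")
    val log = listOf(
        ScreenedCall("hash-a", Instant.parse("2026-06-07T12:00:00Z")), // inside the window
        ScreenedCall("hash-b", Instant.parse("2026-05-30T12:00:00Z")), // older than 7 days
    )
    println(purgeExpired(log, now).map { it.numberHash })
}
```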
---
## 6. Permissions Justifications
| Permission | Purpose | Required for Core Feature? |
|-----------|---------|---------------------------|
| `INTERNET` | API communication | Yes |
| `ACCESS_NETWORK_STATE` | Network status checks | Yes |
| `POST_NOTIFICATIONS` | Android 13+ notification permission | Yes |
| `READ_PHONE_STATE` | Call screening, incoming call detection | Conditional (Call Screening) |
| `ANSWER_PHONE_CALLS` | Call screening service | Conditional (Call Screening) |
| `RECORD_AUDIO` | VoicePrint enrollment | Conditional (VoicePrint) |
| `RECEIVE_BOOT_COMPLETED` | Re-schedule background sync | Yes |
| `FOREGROUND_SERVICE` | Call screening foreground service | Yes |
| `WAKE_LOCK` | Background sync processing | Yes |
| `UPDATE_WIDGETS` | Home screen widget updates | Conditional (Widget) |
| `BIND_SCREENING_SERVICE` | Declared on the call screening service so that only the system can bind to it (Android 10+ call screening role) | Conditional (Call Screening) |
---
## 7. Google Play Data Safety Form Answers
### 7.1 Data Collection Overview
| Google Category | Collected? | Data Types | Purposes |
|----------------|-----------|-----------|----------|
| **Location** | ❌ No | — | — |
| **Personal info** | ✅ Yes | Name, email, phone, user ID | App functionality, personalization, account management |
| **Financial info** | ⚠️ Indirect | Payment method via Stripe web checkout | Payment processing (handled off-device) |
| **Health & fitness** | ❌ No | — | — |
| **Messages** | ❌ No | — | — |
| **Photos & videos** | ❌ No | — | — |
| **Audio files** | ✅ Yes | Voice recordings | App functionality (VoicePrint) |
| **Files & docs** | ❌ No | — | — |
| **Calendar** | ❌ No | — | — |
| **Contacts** | ❌ No | — | — |
| **App activity** | ✅ Yes | App interactions, search history, installed apps (security check) | Analytics, fraud prevention, security |
| **Web browsing** | ❌ No | — | — |
| **App info & performance** | ✅ Yes | Crash logs, diagnostics, other performance data | Analytics, fraud prevention |
| **Device & other IDs** | ✅ Yes | Device ID, FCM token | Analytics, fraud prevention |
### 7.2 Data Sharing
**Does the app share data with third parties?**
- ✅ Yes — Firebase (Google) for crash reporting and push notifications
- ✅ Yes — Stripe (when user visits billing portal web view)
- ❌ No — The app does not sell user data
### 7.3 Security Practices
| Question | Answer |
|----------|--------|
| Data encrypted in transit? | ✅ Yes — All API traffic uses TLS 1.2+ |
| Data encrypted at rest? | ✅ Yes — AES-256-GCM via EncryptedSharedPreferences |
| User can request data deletion? | ✅ Yes — Account deletion available in settings and via privacy@kordant.com |
| Independent security review? | ⚠️ Pending — External security audit planned before production launch |
---
## 8. Third-Party SDK Declaration
| SDK | Data Types | Purposes | Collected? |
|-----|-----------|---------|-----------|
| Firebase Cloud Messaging | Device ID, device token | Push notifications | Yes |
| Firebase Crashlytics | Crash logs, device info, app version | Crash analytics | Yes |
| Google Sign-In | Name, email, avatar | Authentication | Yes (user-initiated) |
| Stripe (via web) | Payment card info | Payment processing | No (off-device) |
---
## 9. Privacy Policy Requirements
The privacy policy must cover:
- [x] What data is collected (all types listed above)
- [x] How data is collected (registration, in-app, via SDKs)
- [x] Why data is collected (purposes listed per type)
- [x] How data is stored (encrypted at rest, encrypted in transit)
- [x] Third-party data sharing (Firebase, Stripe, Google)
- [x] User rights (access, correction, deletion, export)
- [x] Contact information (privacy@kordant.com)
- [x] Data retention policy
- [x] Children's privacy (COPPA compliance statement)
- [x] International transfers (GDPR compliance)
- [x] Policy update mechanism
- [x] Accessible without login
---
## 10. Validation Checklist
- [ ] Data Safety form answers match this audit
- [ ] Privacy policy URL is live and accessible without login
- [ ] Privacy policy covers all declared data types
- [ ] Third-party SDKs declared with correct data types
- [ ] Deletion request mechanism works (settings + email)
- [ ] TLS 1.2+ is active and cleartext traffic is disabled (verified via network_security_config.xml)
- [ ] All permissions are justified with in-app rationale dialogs
- [ ] Data collection is honest and accurate (no false claims)
- [ ] No location data is collected (no location permission is declared)
- [ ] Voice data collection is explicitly declared
- [ ] Analytics data collection is accurate
- [ ] Security practices documentation is complete