beginner · 21 min read · Mar 20, 2026

AI Voice Cloning Scams: When the Voice You Trust Isn't Real

#voice-cloning #ai-security #social-engineering #fraud #vishing

Key Takeaways

  • Voice cloning uses neural text-to-speech conditioned on speaker embeddings; three seconds of source audio is enough for a basic clone.
  • Virtual kidnapping fraud predates AI, but cloned voices have made it convincing enough to fool people who know the scam exists.
  • Real-time detection of a well-made clone is unreliable over compressed phone audio; the effective defenses are procedural, not perceptual.
  • A family safe word and the habit of hanging up and calling back on a known number defeat voice fraud regardless of how convincing the clone sounds.
  • Organizations should require dual authorization and callback verification for any voice-initiated financial request.
  • The current state of voice cloning fraud is not the ceiling: autonomous conversational AI and multimodal deepfakes are near-term threats.

The call came in at 11:47 PM. A mother picked up to hear her daughter's voice — panicked, crying, explaining she'd been in a car accident and been taken by someone who was demanding ransom. Then a man's voice took over with wire transfer instructions.

The daughter was asleep in her bedroom at home.

This is not a horror movie premise. It's a documented 2023 case from Arizona, one of dozens reported to the FBI that year, and one of thousands that the FTC estimates occurred across the United States in 2023 alone. AI voice cloning — the technology that made this possible — has been commercially available since 2022, requires less than three seconds of source audio to produce a basic clone, and is available to anyone with a credit card.

The FBI's Internet Crime Complaint Center (IC3) received 18,000+ reports of virtual kidnapping scams in 2023. That number doesn't reflect actual prevalence — most victims don't report. The FTC's Consumer Sentinel Network tracks hundreds of millions of dollars in annual losses to voice fraud. AI voice cloning has supercharged an already-running fraud category by making the impersonation so convincing that even people who "know about scams" fall for it.

Understanding how this technology works is no longer optional for anyone with a phone.

How Voice Cloning Actually Works

The Technical Stack

Voice cloning uses neural text-to-speech systems conditioned on speaker embeddings. The process has two components that, taken together, can synthesize speech that is perceptually indistinguishable from the target speaker in normal listening conditions.

Speaker embedding extraction: The system analyzes audio samples of the target voice and produces a high-dimensional vector representation — the speaker embedding — that encodes the distinctive acoustic characteristics of that voice. These characteristics include:

  • Fundamental frequency (F0): the baseline pitch and pitch variation patterns
  • Formant structure: the resonance frequencies of the target's vocal tract, which determine vowel quality and vocal timbre
  • Speaking rate and rhythm: the target's typical pace, pausing patterns, and emphasis
  • Accent features: phoneme-level articulation patterns, characteristic vowel realizations, consonant treatments
  • Voice quality parameters: breathiness, roughness, nasality, vocal fry patterns

Neural TTS synthesis: A text-to-speech model uses the speaker embedding as a conditioning input to generate speech. Rather than speaking in a generic synthetic voice, the model generates audio where all prosodic and acoustic parameters are shaped by the extracted speaker embedding. The output sounds like the target speaker, saying any arbitrary text.

Current state-of-the-art systems (including the commercial products available to anyone) use diffusion models or flow-based architectures that produce audio with far fewer artifacts than the GAN-based systems of 2019-2021. When the output is delivered through phone-quality cellular compression — which already degrades audio fidelity significantly — synthetic artifacts become even harder to detect.
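The embedding half of this pipeline can be illustrated by the comparison step alone. The sketch below assumes two speaker embeddings have already been extracted by an encoder (the 4-dimensional vectors are invented for illustration; real embeddings are typically 256-dimensional or larger) and scores their similarity with cosine similarity, the same measure speaker-verification systems use to decide whether two recordings come from the same voice.

```python
# Minimal sketch: comparing speaker embeddings with cosine similarity.
# Real systems extract ~256-dim embeddings with a trained neural encoder;
# the tiny vectors here are invented purely for illustration.
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 = identical direction, near 0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings from three recordings
target_sample_1 = [0.12, -0.48, 0.33, 0.80]   # target's voicemail greeting
target_sample_2 = [0.10, -0.51, 0.30, 0.79]   # target's TikTok clip
other_speaker   = [-0.70, 0.22, 0.55, -0.10]  # unrelated speaker

# Same speaker -> high similarity; different speaker -> low similarity
print(cosine_similarity(target_sample_1, target_sample_2))  # close to 1.0
print(cosine_similarity(target_sample_1, other_speaker))    # near or below 0
```

A TTS model conditioned on `target_sample_1`'s embedding would generate audio whose extracted embedding scores high against both target samples, which is exactly why the synthesized voice is perceptually so close to the original.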

The Three-Second Threshold

This is what makes voice cloning a mass-market threat rather than a targeted attack against public figures.

Early voice cloning systems required extensive high-quality audio. The first neural TTS systems needed hours of clean, studio-quality recordings to produce convincing output. This constrained attacks to public figures — politicians, celebrities, executives — who had substantial recorded audio available.

By 2022, systems like ElevenLabs, Resemble AI, Play.ht, and Microsoft's VALL-E had reduced the requirements dramatically:

  • 3 seconds: Basic clone. Recognizable as the target, but with audible quality issues
  • 30 seconds: Good quality. Convincing over phone-quality audio in most cases
  • 3 minutes: High quality. Difficult to distinguish from authentic audio even on high-fidelity playback

The three-second threshold is the operationally significant one. Three seconds of voice audio is:

  • A voicemail greeting ("Hi, you've reached [name], leave a message")
  • A brief social media video clip
  • A few seconds of background audio captured near an open microphone
  • A short TikTok, Instagram Reel, or YouTube Shorts appearance

For any adult who has posted video content on social media in the past five years, left a voicemail, appeared in any video recording shared publicly, or attended a public event that was recorded, sufficient source audio almost certainly exists in publicly accessible form.

Warning

If you have a TikTok account, YouTube channel, podcast, or have posted videos to any social platform — even once — your voice has been publicly captured in a format sufficient for cloning with current commercial tools. The realistic threat model is not "could my voice be cloned?" but "who would be motivated to do it and what would they do with it?" For most people, the motivation would be financial fraud targeting family or colleagues.

Commercial Tools: Built for Legitimate Use, Weaponized in Practice

Several commercial voice synthesis platforms exist with stated legitimate use cases — audiobook narration, accessibility tools, virtual assistants, video game characters. The same technology is directly accessible for fraud.

ElevenLabs: The platform that brought voice cloning to the mass market. The Instant Voice Cloning feature requires a voice sample (at least 1 minute for best quality, shorter for basic clones), produces output in under a minute, and offers API access. In January 2023, Vice's Motherboard published a demonstration using ElevenLabs to clone public figures' voices without consent. ElevenLabs added verification requirements and moderation for certain cloning targets, but the underlying capability remained accessible. By 2024, third-party applications built on the ElevenLabs API were enabling voice cloning with fewer friction points than the native platform.

Play.ht and Resemble AI: Similar commercial services with API access, competitive pricing, and the same accessibility profile as ElevenLabs. All three have been documented in security research as platforms used to generate fraudulent audio.

OpenVoice and open-source models: Open-source voice cloning models including OpenVoice v2 (from MIT and MyShell.ai), XTTS (Coqui TTS), and SV2TTS (from the original "Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis" paper) run locally with no API costs, no platform terms of service to violate, and no logging. For sophisticated fraudsters, running an open-source model locally is the preferred approach precisely because it leaves no records with a commercial provider.

Underground-specific tools: The fraudster ecosystem has produced dedicated voice fraud tools optimized specifically for phone-quality output, real-time voice conversion (to transform the attacker's live voice into the target's voice during a call), and integration with phone spoofing infrastructure.

Real-Time Voice Conversion

The most sophisticated attack variant uses real-time voice conversion rather than pre-synthesized audio. Instead of recording cloned audio in advance, the attacker speaks into a microphone while software converts their voice to the target's voice in near-real-time, with latency low enough for live phone calls.

Tools in this category include:

  • RVC (Retrieval-Based Voice Conversion): An open-source, widely used model that can be fine-tuned on a target voice with as little as 10 minutes of training audio and run in real-time with consumer hardware
  • W-Okada Voice Changer: A real-time voice conversion application popular in the streaming community, but applicable to fraud with appropriate voice models loaded
  • SO-VITS-SVC: Another open-source model with strong voice conversion quality

A criminal using RVC with a pre-trained model of a target speaker can conduct a full live phone conversation — responding naturally to unexpected questions, adjusting pacing and emotional register — all in the target's voice. The latency is typically 200-500ms, which is detectable in a high-quality audio environment but not over compressed cellular audio.

The Fraud Playbook: How Attacks Are Executed

Virtual Kidnapping: The Emotional Ambush

Virtual kidnapping is a fraud category that predates AI — scammers have been calling families claiming to have kidnapped a loved one for decades, relying on a combination of psychological pressure, social engineering, and the difficulty of immediately verifying your family member's safety.

AI voice cloning transformed this from a crude scam that victims could often see through ("that doesn't sound like my daughter") into a genuinely convincing attack that succeeds at high rates.

The current attack flow:

Target identification and research: The fraudster identifies a family unit through social media. The ideal target: a parent with a young adult child who posts frequently on TikTok, Instagram, or YouTube. Public profiles provide not just voice audio but also context — where the child goes to school, what activities they're involved in (provides plausible "abduction scenario" context), and the parent's contact information.

Voice data collection: Download 30-60 seconds of the "victim's" voice from their social media. Longer is better, but the minimum is achievable from a few short video clips.

Model training: Feed the collected audio to a voice cloning system (ElevenLabs, a local open-source model, or an underground tool). Generate test clips of distressed speech: crying, calling for a parent by name, asking for help. Review and iterate until the output is convincing.

Execution: Call the parent. Play the synthesized distressed audio. Then a human fraudster takes over: "Your daughter has been taken. If you call the police, I'll hurt her. I need $X wired to [account] in the next 30 minutes."

Pressure maintenance: Keep the parent on the phone. This serves two purposes: it prevents them from calling their child to verify safety, and the ongoing emotional stimulation degrades their ability to think critically. The urgency is not incidental — it is the core mechanism of the attack.

The 2023 Arizona case that opened this article was one of dozens documented by reporters and the FBI that year. A 2023 Washington Post investigation documented multiple similar cases across the US, with victims ranging from parents of college students to grandparents. The common element: every victim heard a voice they recognized saying things that only their loved one would say, in a voice that sounded unmistakably like their family member.

CEO Fraud: The Wire Transfer That Wasn't

Business Email Compromise (BEC) is already the highest-dollar cybercrime category tracked by the FBI ($2.9 billion in reported losses for 2023 in the US alone). AI voice cloning adds a new attack vector: the phone call impersonation that bypasses even employees who have been trained to verify unusual email requests.

The 2019 UK Energy Company CEO Fraud — The Baseline Case

This is the earliest widely documented voice cloning fraud. The CEO of a UK energy subsidiary received a call purportedly from his German parent company's CEO, a person he knew personally and whose voice he recognized. The caller — speaking in the executive's voice, with his authentic German accent and characteristic speech patterns — requested an urgent transfer of €220,000 to a Hungarian supplier, citing time pressure before a business deadline.

The UK CEO transferred the funds. The money was forwarded to Mexico within the hour and was unrecoverable.

The investigation revealed the attackers had cloned the German executive's voice from publicly available recordings of speeches and media appearances. Security researchers estimated the total preparation at roughly 35 minutes of voice data collection and model training. Total execution time: a 16-minute phone call. Proceeds: €220,000 (~$243,000 at 2019 exchange rates).

The $35 Million UAE Bank Fraud (2021)

Criminals used an AI-cloned voice of a company director to convince a bank manager at a UAE bank to authorize a $35 million wire transfer. The fraud involved multiple phone calls and an elaborate backstory about a company acquisition requiring emergency funding. The case was described in UAE court documents and reported by Forbes.

The scale — $35 million from a single fraud operation — represents the upper bound of documented voice cloning losses in a single incident. Law enforcement recovered a small fraction of the funds.

The Ferrari Executive Incident (2024)

Fraudsters using a voice clone of Ferrari CEO Benedetto Vigna contacted a Ferrari executive via WhatsApp in mid-2024. The caller spoke in Vigna's voice, complete with his characteristic regional Italian accent, and referenced a confidential acquisition deal requiring urgent action.

The targeted executive grew suspicious when the caller refused to answer a verification question about a recent conversation they had allegedly had. The executive asked the caller to confirm the title of a book Vigna had recently recommended — the fraudster could not answer. The fraud attempt was stopped before any money moved.

This case is notable for what worked as a defense: a specific personal question about a recent private interaction that a voice clone couldn't answer. It's also notable for how close it came to succeeding — the voice was described by the executive as genuinely convincing until the verification question.

The 2024 WPP CEO Deepfake (with Voice)

WPP CEO Mark Read was impersonated in a virtual meeting in early 2024 using a combination of a publicly available YouTube video playing on screen and an AI-generated voice. A WPP senior executive was directed to attend a meeting with someone they believed to be Read, along with another known figure. The attackers attempted to convince the executive to set up a new business and provide personal financial information.

The fraud was detected and no money transferred, but the incident demonstrated the operational deployment of multi-channel deepfake attacks (video + voice simultaneously) against major corporate targets.

The Grandparent Scam: Targeting Elderly Victims

The "grandparent scam" — in which a caller impersonates a grandchild claiming to be in trouble and needing immediate financial help — has existed for years. Pre-AI versions relied on generic emotional pressure and victims' willingness to believe their grandchild might be in trouble. AI voice cloning makes these attacks far more convincing.

The FTC's Consumer Sentinel Network data shows a consistent pattern: losses per grandparent scam victim average $9,000-$11,000, with victims overwhelmingly over 65 and primarily targeted via phone. The voice cloning layer adds a convincing impersonation that the traditional version couldn't achieve.

Why elderly individuals are specifically targeted: research on decision-making under stress shows that older adults have greater difficulty suppressing emotional responses to override deliberative reasoning when under time pressure. The "act now before something bad happens to your grandchild" framing creates exactly the psychological state most likely to produce compliance before verification occurs.

How to Verify — Practical Procedures

The critical insight about defending against voice cloning: detection in real time is unreliable. The audio quality of current clones, delivered over compressed cellular audio, defeats casual human detection at rates that make "listen carefully" an inadequate defense. The effective defenses are procedural, not perceptual.

The Family Safe Word System

This is the single most effective defense against virtual kidnapping and family fraud calls. It costs nothing, requires no technology, and is highly effective when properly implemented.

How to set it up:

  1. Choose a safe word or short phrase with your immediate family — ideally in person, not via text or email
  2. The safe word should be:
    • Something not derivable from public information about your family
    • Not a word that would come up organically in a real emergency scenario
    • Short enough to be remembered under stress (two or three words maximum)
    • Meaningfully random ("purple mongoose" is better than "family safety")
  3. Write it down on paper. Store physical copies at each household.
  4. Do not communicate the safe word digitally unless you are certain your communications are secure
  5. Establish the rule: if someone claims to be a family member in distress, they must provide the safe word before you take any action

How it defeats voice cloning: A voice clone can replicate how someone sounds. It cannot replicate what they know. The fraudster controlling the cloned voice does not know your family's privately established safe word. When you ask for it, they will either provide an incorrect answer or attempt to deflect ("I can't remember it right now, there's no time").

Optional enhancement — the duress word: A secondary code word that means "I am genuinely in danger and speaking under coercion; contact police immediately." This handles the scenario where someone might be forced to provide the real safe word under duress.
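Choosing a "meaningfully random" phrase is easier with a computer's help. The sketch below is a hypothetical helper (the wordlist is a tiny invented sample) that uses Python's `secrets` module, a cryptographically secure random source, to draw a two-word phrase. Generate it once, write it on paper, and don't store the output digitally.

```python
# Sketch: generating a random two-word family safe phrase.
# The wordlist here is a small invented sample; a real one should be
# a few thousand common, easy-to-say words (e.g. a diceware list).
import secrets

WORDS = [
    "purple", "mongoose", "anvil", "cactus", "harbor", "velvet",
    "glacier", "trumpet", "walnut", "meteor", "lantern", "badger",
]

def make_safe_phrase(n_words: int = 2) -> str:
    """Draw n_words uniformly at random using a CSPRNG (secrets.choice)."""
    return " ".join(secrets.choice(WORDS) for _ in range(n_words))

phrase = make_safe_phrase()
print(phrase)  # e.g. "glacier badger" -- write it down, then close the terminal
```

Using `secrets` rather than `random` matters in principle (the phrase should not be reproducible from a predictable seed), though in practice any phrase not derivable from your family's public footprint does the job.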

Tip

Establish the family safe word in person, not by text, email, or phone. Digital communication channels can be compromised, and you don't want the safe word to appear in a data breach. If you must communicate it digitally, use a disappearing-message-enabled encrypted messenger (Signal with disappearing messages) and have each family member confirm receipt and then delete the message. Write the word on paper as your primary record.

Hang Up and Call Back on an Independently Known Number

This procedure defeats virtually all voice cloning fraud attempts, regardless of how convincing the voice sounds. The key: the call-back number must be one you already have, not one provided during the suspicious call.

For family emergencies:

  1. Tell the caller you need one moment
  2. Hang up (or put them on hold)
  3. Call your family member on their regular phone number from your contacts
  4. Wait for them to answer and confirm their status
  5. If the original call was fraud, your family member will answer normally

For business financial requests:

  1. Tell the caller you'll call them back to confirm the request
  2. Find the requester through your company directory, a previously saved contact, or the organization's official website
  3. Call them on their known number — not any number provided in the original call
  4. Verify the request directly

The pressure tactics deployed by fraudsters — "there's no time," "if you hang up something bad will happen," "don't contact anyone else" — should be treated as confirmation that something is wrong. Real emergencies can survive a 60-second verification call. Real executives making real financial requests will not object to a callback verification. Resistance to verification is itself the red flag.

Challenge Questions for Live Conversations

A voice clone can reproduce how someone sounds but not what they know. Challenge questions using information that is not publicly accessible are effective verification tools for ongoing conversations where you cannot immediately hang up.

Effective challenge question design:

  • Ask about a specific recent private event: "What did we talk about at dinner last Sunday?"
  • Reference a detail only the real person would know: "What's the name of the restaurant we went to for your birthday?"
  • Ask about something shared privately: "What's the code for the safe at the office?"

Ineffective challenge question design:

  • Questions with publicly available answers (birthdays, anniversary dates visible on social media, home addresses)
  • Open-ended questions a skilled scammer could navigate with vague answers ("How have you been?" can be answered generically)
  • Questions about common knowledge rather than personal private knowledge

When you ask a challenge question, listen carefully for the response. A voice clone is a text-to-speech system reading scripted content. If the question is unexpected, the fraudster controlling the system must either:

  • Improvise an answer verbally (they'll deflect or give a wrong answer)
  • Type a response and wait for the TTS to speak it (there will be a noticeable delay)
  • Play the "there's no time for this" pressure tactic

Organizational Procedures for Financial Fraud

For business environments, the defense is institutional, not individual. No employee should bear sole responsibility for detecting a voice clone in real time.

Dual authorization requirements: Any wire transfer, external payment, or significant financial action above a defined threshold requires authorization from two separate individuals via two separate communication channels. A phone call from a voice-cloned CEO cannot complete a transaction that requires written authorization in the accounting system from a second approver.

Call-back verification for voice-initiated requests: Policy-level requirement: any financial instruction received via phone must be verified by callback to a number in the corporate directory before any action is taken. No exceptions for urgency. The policy handles the "but it sounded like the CEO" problem by removing individual judgment from the verification step.

Registered banking details: New payee accounts should require a multi-day approval process regardless of who requests the setup. The most common voice fraud attack involves urgent requests to pay "new" accounts. A policy that makes adding new payees a multi-day, multi-approver process defeats this class of attack entirely.

Wire transfer time delays: A 24-48 hour delay on wire transfer confirmation — during which the initiating employee can be verified by an independent approver — dramatically reduces the success rate of voice fraud. Fraudsters require speed; fraud controls require patience.
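These controls reduce to a small policy check. The sketch below is a hypothetical model (the field names, thresholds, and channel labels are invented for illustration) showing how a payment request can be blocked unless two distinct approvers have signed off through two distinct channels and the payee has cleared its waiting period. The point it makes concrete: a single phone call, however convincing the voice, can never satisfy the large-transfer rule.

```python
# Hypothetical sketch of a dual-authorization policy check.
# Thresholds, field names, and channels are invented for illustration.
from dataclasses import dataclass, field
from datetime import date, timedelta

DUAL_AUTH_THRESHOLD = 10_000          # USD; above this, two approvers required
NEW_PAYEE_HOLD = timedelta(days=3)    # waiting period for new payee accounts

@dataclass
class Approval:
    approver: str
    channel: str  # e.g. "accounting_system", "in_person", "callback"

@dataclass
class PaymentRequest:
    amount: float
    payee_registered_on: date
    approvals: list[Approval] = field(default_factory=list)

def is_authorized(req: PaymentRequest, today: date) -> bool:
    # New payees must clear the hold period regardless of approvals
    if today - req.payee_registered_on < NEW_PAYEE_HOLD:
        return False
    if req.amount <= DUAL_AUTH_THRESHOLD:
        return len(req.approvals) >= 1
    # Large amounts: two distinct approvers via two distinct channels
    approvers = {a.approver for a in req.approvals}
    channels = {a.channel for a in req.approvals}
    return len(approvers) >= 2 and len(channels) >= 2

# A voice call alone can never satisfy the large-transfer rule:
urgent = PaymentRequest(
    amount=220_000,
    payee_registered_on=date(2024, 6, 1),
    approvals=[Approval("cfo", "phone_call")],
)
print(is_authorized(urgent, today=date(2024, 7, 1)))  # False: one approver, one channel
```

In a real deployment this logic lives in the payment system itself, not in an employee's head, which is exactly what removes individual judgment from the verification step.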

Can You Detect a Cloned Voice?

The honest answer: often not, in real time, over phone-quality audio, when the clone is well-made.

Detection accuracy drops substantially when:

  • The audio is compressed (all cellular calls)
  • The clone is of someone whose voice you know well (you over-attribute authenticity to familiar-sounding audio)
  • You're in an emotionally heightened state (the call has already made you anxious)
  • The clone was trained on high-quality source audio (more training data = fewer artifacts)

That said, there are indicators worth knowing:

Acoustic Indicators (Probabilistic, Not Definitive)

Prosody uniformity: Human speech in emotional situations is highly variable — pitch rises and falls, speaking rate fluctuates, words trail off, genuine distress produces authentic acoustic markers (elevated pitch, reduced articulation precision, breath sounds). Current clones handle average speech well but struggle to authentically reproduce the acoustic signature of genuine extreme distress. A "distressed" voice clone often sounds uniformly stressed rather than authentically panicked.

Acoustic environment mismatch: A voice clone is generated in a clean acoustic environment. If the scenario describes someone in a car accident, a chaotic public space, or a stressful physical environment, the voice should have background noise, reverb from the environment, and reduced audio quality from a held or dropped phone. A clean, studio-quality sounding voice in a supposedly chaotic environment is suspicious.

Response latency: Real-time voice conversion adds processing latency — typically 200-500ms above natural speech. In rapid back-and-forth conversation, questions may be answered after a slightly longer pause than natural. Pre-recorded clips played in response to expected cues fail entirely when unexpected questions are asked.
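The latency signal can be made concrete. Assuming you have timestamps for each conversational turn (the numbers below are invented; real ones would come from call audio analysis), a sketch that flags answers arriving after an unnaturally long gap might look like this. Natural human turn-taking gaps average roughly 200-300 ms, so consistent gaps well beyond that, especially after unexpected questions, are worth treating as suspicious.

```python
# Sketch: flagging suspicious response latency in a call transcript.
# Timestamps (in seconds) are invented; real ones would come from call audio.
NATURAL_GAP = 0.3   # typical human turn-taking gap, ~200-300 ms
SUSPECT_GAP = 0.8   # assumed threshold: conversion latency + typing/processing

turns = [
    # (question_ends_at, answer_starts_at)
    (10.0, 10.25),  # expected question, quick answer
    (24.0, 24.30),
    (41.0, 42.10),  # unexpected challenge question, long pause
    (58.0, 59.05),
]

def suspicious_turns(turns, threshold=SUSPECT_GAP):
    """Return (question_time, answer_time, gap) for gaps above the threshold."""
    return [(q, a, a - q) for q, a in turns if a - q > threshold]

for q, a, gap in suspicious_turns(turns):
    print(f"question at {q:.1f}s answered after {gap * 1000:.0f} ms pause")
```

This is a probabilistic indicator, not proof: a nervous real person also pauses. The pattern to watch for is quick answers to expected questions and long pauses only on the unexpected ones.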

Inability to deviate from script: A voice clone is controlled by a human who must type responses or pre-record expected answers. Ask something unexpected. Ask something specific and recent. Ask about something trivial that only the real person would know. A real person can answer immediately with natural fluency; a clone requires processing time and may give a generic deflection.

Technical Detection Tools

For asynchronous analysis (analyzing a recording after the fact) rather than real-time detection:

Resemble Detect: Commercial API for detecting AI-generated audio. Available with a free tier for limited volume.

Azure AI Content Safety — Audio Deepfake Detection: Microsoft's API for detecting synthetic audio, integrated into the Azure cognitive services platform.

Pindrop Pulse: Enterprise-grade voice authentication and fraud detection platform used by financial institutions and call centers. Analyzes hundreds of acoustic features per call in real time. Deployed by dozens of major banks for call center fraud detection.

AASIST (Audio Anti-Spoofing using Integrated Spectro-Temporal Graph Attention Networks): Open-source model architecture with strong performance on the ASVspoof challenges. Requires technical implementation but available for self-hosted deployment.

# Example: Using Resemble AI's detection API for audio deepfake analysis.
# The endpoint and response shape shown here reflect Resemble's API at the
# time of writing; verify against their current API reference before use.
import base64
import os
from pathlib import Path

import requests


def check_audio_for_deepfake(audio_file_path: str) -> dict:
    """
    Submit audio to Resemble AI's detection API.
    Returns the detection result with a confidence score.
    Useful for forensic analysis after a suspected fraud call.
    """
    # Read the key from the environment rather than hardcoding it
    api_key = os.environ["RESEMBLE_API_KEY"]

    audio_data = Path(audio_file_path).read_bytes()
    audio_b64 = base64.b64encode(audio_data).decode()

    response = requests.post(
        "https://app.resemble.ai/api/v2/audios/deepfake_detection",
        headers={
            "Authorization": f"Token token={api_key}",
            "Content-Type": "application/json",
        },
        json={
            "audio_src": audio_b64,
            # Optional: specify the model version
            # "model": "latest"
        },
        timeout=30,
    )
    response.raise_for_status()

    result = response.json()
    # result['data']['is_ai_generated']: boolean
    # result['data']['score']: confidence score 0-1 (1.0 = definitely synthetic)
    return result


# Usage example
recording = "recorded_call_segment.wav"
detection = check_audio_for_deepfake(recording)
print(f"AI-generated: {detection['data']['is_ai_generated']}")
print(f"Confidence: {detection['data']['score']:.3f}")

Note

No current deepfake audio detector achieves reliably acceptable false-negative rates against all current-generation voice cloning tools in all conditions. Detection models trained on one generation of synthesis technology are regularly defeated by the next generation. These tools are useful for forensic analysis and for flagging suspicious audio at institutional scale, but they are not substitutes for procedural verification controls. The procedural defenses (safe word, hang up and call back, challenge questions) work against all voice fraud regardless of technical sophistication. Detection tools do not.

The Broader Threat Landscape: What's Coming

The current state of voice cloning fraud is not the ceiling. Several developments currently in research or early deployment will materially increase the threat in the next 12-24 months:

Real-Time Conversational AI

Current voice cloning fraud requires a human operator to conduct the conversation, using the cloned voice as an acoustic overlay. The human must know enough about the impersonated person to maintain the fiction under questioning. This is a skilled labor bottleneck.

Real-time conversational AI systems (OpenAI's Voice Mode, ElevenLabs' Conversational AI, and similar products) can conduct natural, responsive phone conversations autonomously. Combining a voice clone with a conversational AI backbone enables fully automated fraud calls that can:

  • Answer unexpected questions with information from the target's social media and public records
  • Maintain consistent backstory throughout a conversation
  • Handle emotional variation and conversational complexity without a human operator
  • Scale to hundreds of simultaneous calls

The capability exists in research and commercial form. Its operational deployment for fraud is a near-term probability.

Multimodal Attacks: Voice + Video + Messaging

The $25 million Hong Kong deepfake fraud (February 2024) placed multiple AI-generated participants on a single video call. The WPP case combined a YouTube video with AI voice. The attack surface is expanding toward fully synthetic identities that can communicate across multiple channels simultaneously.

A fraud operation that combines:

  • AI-generated personalized email (sets up context and trust)
  • Voice-cloned follow-up phone call (reinforces urgency and authority)
  • Fake video call if additional verification is demanded (defeats video verification)

...is a multi-layered attack that defeats each individual defense layer (email verification, phone verification, video verification) by deploying AI at each layer. The defense against this is the same: procedures that require verification through independently established channels that the attacker cannot fake — a hardware security key, a pre-established code known only in person, a callback to a number established before any of the fraudulent contacts occurred.

AI-Augmented Vishing Infrastructure

The underground services market is building infrastructure specifically optimized for AI voice fraud: voice cloning-as-a-service with pre-built models of common impersonation targets (CEOs, bank officials, government agents), real-time voice conversion API endpoints with sub-100ms latency, and automated phone call infrastructure that dials targets from spoofed local numbers and plays cloned audio.

The cost of running a 1,000-target voice fraud campaign with current commercial tools is approximately $500-1,000 (API costs, phone infrastructure rental, a few hours of operator time). The expected return on investment, given documented success rates against unprimed targets, makes this economically viable for criminal enterprises at almost any scale.

What to Do If You're Targeted

If the call is ongoing and you haven't sent money:

  1. Ask for the safe word if you have one established. If they can't provide it, the call is fraudulent.
  2. Tell the caller you need to put them on hold momentarily
  3. Use a second phone to call your family member or the person supposedly on the call
  4. If your family member is safe, hang up on the fraud call and report it

If you've already sent money:

  1. Contact your bank or wire transfer service immediately. Domestic wires can sometimes be recalled within hours; international wires rarely
  2. File a report with the FTC at ReportFraud.ftc.gov
  3. File an IC3 complaint at ic3.gov (FBI Internet Crime Complaint Center)
  4. File a report with local law enforcement for the official record
  5. If you used a wire transfer service (MoneyGram, Western Union), contact their fraud departments directly

Documentation to preserve:

  • The phone number that called you (even though it was almost certainly spoofed, it's part of the investigative record)
  • The time and duration of the call
  • Any account numbers, email addresses, or instructions provided
  • If you recorded any part of the call, preserve the recording

After the incident:

  • Establish a family safe word if you haven't already
  • Brief family members on how the attack works
  • Report the incident; you are almost certainly not the only target from this operation, and your report may help law enforcement identify and disrupt the group

The technology is advancing faster than public awareness of it. The vast majority of people who receive these calls have never heard that voice cloning with three seconds of audio is commercially available. That information asymmetry is what the fraudsters depend on. Close it in your family and in your organization before the call comes.
