Content filtering on AI girlfriend platforms determines which conversations proceed smoothly and which hit a wall. The rules shape user experience in ways that are not always transparent, especially when automated systems make split-second decisions about what crosses the line.
Janitor AI uses a multi-layered approach to content moderation. The platform scans user prompts before the AI generates a response, checking for keywords and semantic patterns that suggest prohibited themes. After the AI produces text or images, a second layer of classifiers reviews the output. This dual-stage process aims to catch violations early while minimizing false positives that frustrate users.
Prohibited Content Categories
The platform explicitly bans several types of material. Illegal activities top the list: violence, drug use, and child exploitation trigger immediate blocks and may result in reports to authorities where required by law. Hate speech, harassment, and discrimination based on race, gender, or other protected characteristics also fall under zero-tolerance policies.

Non-consensual themes represent another major category. Scenarios involving rape, incest, or coercion are filtered out during the prompt-scanning phase. Impersonation of real people is likewise prohibited: users cannot create characters based on celebrities or private individuals without consent, a rule designed to limit legal liability and address ethical concerns.
Explicit content that falls outside these categories may be allowed on the desktop version but restricted on mobile. App store policies force stricter filtering for mobile users, creating a two-tier experience. Desktop users access content that mobile users in restricted mode cannot, a discrepancy that has sparked complaints on forums like Reddit.
How the Filtering Mechanism Works
Janitor AI relies on a combination of keyword lists and machine learning models. Pre-generation scanning uses tools similar to OpenAI's Moderation API, which assigns risk scores to phrases and sentence structures. If a prompt exceeds a threshold, the system blocks it and displays a warning.
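The threshold check described above can be sketched in a few lines. This is a minimal illustration, not the platform's implementation: the term list, the weights, the 0.8 cutoff, and the names `scan_prompt` and `ScanResult` are all assumptions, and a simple keyword lookup stands in for the machine learning models. It does not call any real moderation API.

```python
from dataclasses import dataclass

# Hypothetical risk weights per term; a production system would use
# model-generated scores rather than a static lookup table.
KEYWORD_WEIGHTS = {
    "blocked_term_a": 0.9,
    "blocked_term_b": 0.6,
    "borderline_term": 0.3,
}
BLOCK_THRESHOLD = 0.8  # assumed cutoff, not a published value


@dataclass
class ScanResult:
    score: float
    blocked: bool
    matched: list[str]


def scan_prompt(prompt: str, threshold: float = BLOCK_THRESHOLD) -> ScanResult:
    """Score a prompt and decide whether it should be blocked."""
    tokens = prompt.lower().split()
    matched = [t for t in tokens if t in KEYWORD_WEIGHTS]

    # Combine weights as 1 - product(1 - w), so several weak signals
    # can add up without the score ever exceeding 1.0.
    remaining = 1.0
    for term in matched:
        remaining *= 1.0 - KEYWORD_WEIGHTS[term]
    score = 1.0 - remaining

    return ScanResult(score=score, blocked=score >= threshold, matched=matched)


if __name__ == "__main__":
    for text in ("hello there", "blocked_term_b borderline_term", "blocked_term_a"):
        print(text, "->", scan_prompt(text))
```

Combining weights this way means two moderate matches can stay under the cutoff while a single high-risk term crosses it, which is one plausible reason that rephrasing a prompt changes the outcome.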

Post-generation review involves automated classifiers trained to detect policy violations in completed responses. Human moderators step in for edge cases or when users appeal a decision. The platform promises to review reports within 24 hours, though actual response times vary based on volume.
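One way to picture how edge cases reach human moderators is a two-threshold router: clearly safe output passes, clearly violating output is blocked, and anything in between is queued for a person. The sketch below assumes that design. The band boundaries (0.3 and 0.8) and the names `Decision` and `route_output` are hypothetical and are not taken from the platform's documentation.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    HUMAN_REVIEW = "human_review"
    BLOCK = "block"


def route_output(score: float, allow_below: float = 0.3, block_at: float = 0.8) -> Decision:
    """Map a classifier risk score in [0, 1] to a moderation decision."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score >= block_at:
        return Decision.BLOCK
    if score < allow_below:
        return Decision.ALLOW
    # Ambiguous scores are the edge cases that go to a human moderator.
    return Decision.HUMAN_REVIEW


if __name__ == "__main__":
    for s in (0.1, 0.5, 0.95):
        print(s, route_output(s).value)
```

Widening the middle band sends more cases to humans and reduces automated false positives, at the cost of slower decisions.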
In April, I attended a webinar on the ethics of AI companionship where the speaker stressed that data privacy should be a top priority. Only a handful of companies provide transparency reports about how they use machine learning to refine emotional simulation. User experience hinges on whether these algorithms respect personal boundaries, a point that resonated as I compared different platforms' approaches to content moderation.
Data Handling and Privacy Implications
The platform collects user profile information, chat logs, voice recordings, image prompts, and payment data processed through third-party services. All data is encrypted at rest using AES-256 and in transit via TLS 1.3. The platform states that its data handling complies with the GDPR, and its servers are located in the EU and US.
Chat logs are retained for 90 days after account deletion, while anonymized analytics remain indefinitely. Third-party sharing occurs only with explicit user consent for personalization, and aggregated data may go to research partners. Users can access, rectify, or delete their data through account settings, and they may opt out of marketing-related processing.
This retention policy matters because conversations with AI companions often include personal details. Knowing that logs are purged 90 days after account deletion offers some reassurance, though the clock starts only once the account is deleted, and users should assume that anything typed could be reviewed by moderators or flagged by automated systems.
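The 90-day window is easy to work out as date arithmetic. The sketch below uses the retention period stated above. The function names are hypothetical, and treating the 90th day itself as the purge date is an assumption, since the policy does not say whether the boundary is inclusive.

```python
from datetime import date, timedelta

RETENTION_DAYS = 90  # retention period stated in the policy


def purge_date(deleted_on: date) -> date:
    """Earliest date on which chat logs would be eligible for removal."""
    return deleted_on + timedelta(days=RETENTION_DAYS)


def is_purgeable(deleted_on: date, today: date) -> bool:
    """True once the retention window has fully elapsed (boundary assumed inclusive)."""
    return today >= purge_date(deleted_on)


if __name__ == "__main__":
    deleted = date(2024, 1, 1)
    print(purge_date(deleted))                      # 2024-03-31
    print(is_purgeable(deleted, date(2024, 3, 1)))  # False
```

An account deleted on 1 January 2024 would therefore keep its logs until 31 March 2024 under this reading.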
Common Triggers for Access Restrictions
Users report sudden access restrictions for reasons that are not always clear. The most frequent trigger is repeated attempts to bypass filters, such as rephrasing prohibited prompts or using coded language. The system tracks patterns and escalates enforcement after multiple violations.
Billing disputes also lead to temporary locks. If a chargeback is filed or payment fails, the account may be suspended until the issue is resolved. Age verification failures represent another cause: users who cannot provide valid government-issued ID or whose documents raise red flags face permanent bans.
False positives occur when legitimate prompts are misclassified. A user discussing a character's backstory involving trauma might trigger the non-consent filter, even if the scenario is handled tastefully. The platform offers an appeal system, but resolution can take days.
Differences Between Desktop and Mobile
Mobile users face additional restrictions due to app store content policies. Apple and Google require stricter filtering for apps distributed through their platforms, which forces Janitor AI to implement a restricted mode on mobile devices. Desktop users enjoy lighter filtering because the website is not bound by these rules.
This split creates confusion. A conversation that works seamlessly on desktop may hit a block on mobile, leading users to believe the platform is inconsistent. The reality is that two separate rulesets are in play, one dictated by the company and one by external gatekeepers.
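The interaction between the two rulesets can be modeled as an intersection: on mobile, a category is allowed only if both the company and the app store allow it. The flags and names below are hypothetical placeholders chosen for illustration, since neither ruleset is published in this form.

```python
# Hypothetical rule flags: True means the category is allowed.
COMPANY_RULES = {"explicit_content": True, "illegal_content": False}
APP_STORE_RULES = {"explicit_content": False, "illegal_content": False}


def effective_rules(client: str) -> dict[str, bool]:
    """Return what is allowed for a 'desktop' or 'mobile' client."""
    if client == "desktop":
        # The website is bound only by the company's own rules.
        return dict(COMPANY_RULES)
    if client == "mobile":
        # Mobile must satisfy both rulesets, so take the stricter of the two.
        return {
            category: allowed and APP_STORE_RULES.get(category, False)
            for category, allowed in COMPANY_RULES.items()
        }
    raise ValueError(f"unknown client: {client!r}")


if __name__ == "__main__":
    print("desktop:", effective_rules("desktop"))
    print("mobile: ", effective_rules("mobile"))
```

Seen this way, the behavior is consistent: the same prompt is evaluated against a different effective policy depending on where it is sent from.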
User Reporting and Penalties
An in-app report button allows users to flag content they believe violates policies. Reports are reviewed by a mix of automated tools and human moderators, with a stated turnaround of 24 hours. In practice, complex cases take longer.
Penalties escalate based on severity and frequency. A first-time minor violation typically results in a warning and a temporary content block. Repeated offenses lead to temporary suspension, usually lasting 7 to 14 days. Serious violations, such as attempts to generate illegal material, result in permanent bans and possible legal action.
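The escalation ladder can be expressed as a small decision function. This is a sketch of the tiers as described here, not the platform's enforcement code. The names are hypothetical, and because the number of repeat offenses that triggers a suspension is not stated, the sketch assumes any repeat minor violation does.

```python
from enum import Enum


class Severity(Enum):
    MINOR = "minor"
    SERIOUS = "serious"


class Penalty(Enum):
    WARNING = "warning and temporary content block"
    SUSPENSION = "temporary suspension (7 to 14 days)"
    PERMANENT_BAN = "permanent ban"


def next_penalty(severity: Severity, prior_violations: int) -> Penalty:
    """Pick a penalty from the severity of the offense and the user's history."""
    if prior_violations < 0:
        raise ValueError("prior_violations cannot be negative")
    if severity is Severity.SERIOUS:
        # Serious violations skip the ladder entirely.
        return Penalty.PERMANENT_BAN
    if prior_violations == 0:
        return Penalty.WARNING
    # Assumption: any repeated minor offense leads to a suspension.
    return Penalty.SUSPENSION


if __name__ == "__main__":
    print(next_penalty(Severity.MINOR, 0).value)
    print(next_penalty(Severity.MINOR, 2).value)
    print(next_penalty(Severity.SERIOUS, 0).value)
```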
The platform publishes transparency reports quarterly, though these documents focus on aggregate statistics rather than individual case details. Users seeking more information about a specific restriction must contact customer support, which can be slow to respond during peak periods.
Adjusting Content Settings
Desktop users can modify some filtering preferences through account settings. Options include toggling explicit content warnings and adjusting the sensitivity of the automated classifiers. Mobile users have fewer controls due to app store requirements.
Customization is limited compared to open-source alternatives. Users cannot disable the filter entirely, and attempts to manipulate settings to bypass restrictions may trigger enforcement actions. The platform prioritizes compliance over flexibility, a trade-off that frustrates advanced users but protects the company from regulatory risk.
For those seeking a balance between safety and creative freedom, understanding these boundaries is essential. Knowing what triggers a block helps users craft prompts that stay within acceptable limits while still achieving the desired interaction. The system is not perfect, but it reflects the platform's effort to navigate a complex regulatory landscape while maintaining a functional service.