Emmanuel Cardenas

Imitation

Progress: Ongoing

Description:

This project is a voice-enabled chat interface with text-to-speech. Users can hold spoken conversations with an AI and customize the interaction by choosing from a variety of voices.

Key Features:

OpenAI ChatGPT:

The core of this project is OpenAI's ChatGPT language model, built on the GPT-3.5 architecture. Through OpenAI's API, users submit prompts and receive the model's responses.
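As a rough illustration of the prompt and response flow, here is a minimal sketch of a call to OpenAI's Chat Completions endpoint. The function names (`buildChatRequest`, `askChatGpt`), the injectable `fetchImpl` parameter, and the `gpt-3.5-turbo` model choice are assumptions made for this example, not necessarily how the project is wired.

```javascript
// Hypothetical helper names; only the endpoint URL and payload shape come from OpenAI's public API.
const OPENAI_URL = "https://api.openai.com/v1/chat/completions";

// Build the request body: earlier messages plus the new user prompt.
function buildChatRequest(history, prompt) {
  return {
    model: "gpt-3.5-turbo", // assumed model name for the GPT-3.5 family
    messages: [...history, { role: "user", content: prompt }],
  };
}

// Send the prompt and return the assistant's reply text.
// fetchImpl defaults to the global fetch (available in browsers and Node 18+).
async function askChatGpt(apiKey, history, prompt, fetchImpl = fetch) {
  const response = await fetchImpl(OPENAI_URL, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify(buildChatRequest(history, prompt)),
  });
  if (!response.ok) {
    throw new Error(`OpenAI request failed with status ${response.status}`);
  }
  const data = await response.json();
  return data.choices[0].message.content;
}
```

In a setup like this, the API key would normally stay on the server rather than in browser code, and `history` lets the model see earlier turns of the conversation.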

Speech Recognition:

User prompts are gathered through speech recognition. The Web Speech API, documented on MDN Web Docs, provides asynchronous speech recognition in the browser.
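The sketch below shows one way a spoken prompt could be captured with the Web Speech API's `SpeechRecognition` interface. The helper name `listenForPrompt` and the `en-US` language setting are assumptions for illustration; the recognizer constructor is passed in so the same function can work with the prefixed `webkitSpeechRecognition` variant.

```javascript
// Hypothetical helper: starts one recognition session and reports the transcript.
function listenForPrompt(Recognition, onPrompt) {
  const recognizer = new Recognition();
  recognizer.lang = "en-US";          // assumed language
  recognizer.interimResults = false;  // only deliver final results
  recognizer.maxAlternatives = 1;     // keep the single best guess

  // Results arrive asynchronously once the user stops speaking.
  recognizer.onresult = (event) => {
    const transcript = event.results[0][0].transcript;
    onPrompt(transcript.trim());
  };
  recognizer.onerror = (event) => {
    console.error("Speech recognition error:", event.error);
  };

  recognizer.start();
  return recognizer;
}

// Browser usage (some browsers expose only the webkit-prefixed constructor):
// const Recognition = window.SpeechRecognition || window.webkitSpeechRecognition;
// listenForPrompt(Recognition, (text) => console.log("Heard:", text));
```

Browser support for speech recognition varies, so a text input is a sensible fallback when neither constructor is available.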

Text-to-Speech:

To deliver an immersive user experience, the responses need to sound like natural human speech. For that reason, I chose Azure AI services for speech synthesis.
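Voice selection in Azure speech synthesis is commonly expressed through SSML, where a `<voice>` element names the voice to use. The sketch below builds such a document; `buildSsml` and `escapeXml` are hypothetical helpers, and the default voice `en-US-JennyNeural` is only an example choice.

```javascript
// Escape characters that would otherwise break the XML document.
function escapeXml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Wrap the reply text in SSML that selects a specific voice.
// The default voice and language are assumptions for this example.
function buildSsml(text, voiceName = "en-US-JennyNeural", lang = "en-US") {
  return (
    `<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="${lang}">` +
    `<voice name="${escapeXml(voiceName)}">${escapeXml(text)}</voice>` +
    `</speak>`
  );
}
```

A string produced this way could then be handed to the Azure Speech SDK for JavaScript, for instance through `SpeechSynthesizer.speakSsmlAsync`, with the voice name coming from whatever the user picked in the interface.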

Future Updates:

Planned updates include character models with lip sync driven by Azure viseme data. This will add a visual dimension to the conversations, making interactions with the AI more immersive and lifelike. Other features in the works include authentication with Passport.js, a MongoDB database to store user information, and payment methods using various APIs.

Tech Stack:

To bring this project to life, I used a blend of technologies. I built the frontend with HTML and CSS and used Babylon.js to render the character models. For the backend, I used Express.js, a Node.js web framework.
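To make the backend role more concrete, here is a minimal sketch of an Express-style route handler that accepts a prompt and returns a reply as JSON. The names `createChatHandler` and `getReply`, the `/api/chat` path, and the status codes are assumptions for illustration and are not taken from the project.

```javascript
// Hypothetical factory: getReply is any async function that turns a prompt into reply text.
function createChatHandler(getReply) {
  return async function chatHandler(req, res) {
    // Accept only a non-empty string prompt from the JSON body.
    const prompt =
      req.body && typeof req.body.prompt === "string" ? req.body.prompt.trim() : "";
    if (!prompt) {
      res.status(400).json({ error: "A non-empty prompt is required." });
      return;
    }
    try {
      const reply = await getReply(prompt);
      res.json({ reply });
    } catch (err) {
      // The upstream language model call failed.
      res.status(502).json({ error: "The language model request failed." });
    }
  };
}

// Wiring it into Express (assumes the express package is installed):
// const express = require("express");
// const app = express();
// app.use(express.json());
// app.post("/api/chat", createChatHandler(getReply));
// app.listen(3000);
```

Keeping the handler separate from the Express setup makes it easy to exercise with plain mock `req` and `res` objects.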