Need help setting up TensorBoard for RVC!

Hello all, I need some help with RunPod. I am trying to get TensorBoard to work on either a Secure or Community Cloud GPU pod, and I have no idea how to get it working. Do I need an SSH server? I was trying to follow this guide - https://blog.runpod.io/how-to-achieve-true-ssh-on-runpod/ but I have no idea where to get a public key. If anyone here is experienced with TensorBoard and knows how to make it work on RunPod, I'd be grateful. When I do get it installed and running on a GPU pod, it gives me a localhost link, which is something I cannot use on a remote server. I'll play around with it for a few more minutes before I give up.
justin · 11mo ago
https://discord.com/channels/912829806415085598/1194711850223415348 I just updated my resource for this! LOL, great timing @Smack Me Harder ❤. You can read the SSH section of my GitHub repo: skip the whole "How to Achieve True SSH" setup where you make your own private/public keys, and just use their CLI tool.
Smack Me Harder ❤ (OP) · 11mo ago
@justin Okay, I'm guessing it is this link - https://github.com/justinwlin/Runpod-Tips-and-Tricks/tree/main/SSH%20On%20Runpod. I'll read it and see if it works. This is all new to me and I am not very familiar with the process. I'll let you know whether it works. Thank you.
Smack Me Harder ❤ (OP) · 11mo ago
@justin Are you familiar with what TensorBoard is? If not, this might not help. Still reading through it.
Smack Me Harder ❤ (OP) · 11mo ago
@justin Not sure what to do here.
[screenshot]
Smack Me Harder ❤ (OP) · 11mo ago
This isn't working. I don't know what I am doing, so I'm just going to close the pod and wait for an answer from someone.
justin · 11mo ago
@Smack Me Harder ❤ That isn't your pod ID.
justin · 11mo ago
[screenshot]
justin · 11mo ago
Your pod ID is shown underneath your pod, as "ID: <POD ID>". And no, I'm not familiar with TensorBoard.
Smack Me Harder ❤ (OP) · 11mo ago
Yeah, I appreciate you helping. I need TensorBoard to monitor the training of an AI model. I get a localhost link that I cannot open because the connection is refused (I'm guessing for security reasons). Based on Google searches, I figured I needed an SSH server to host the TensorBoard link so I can actually use it. I guess TensorBoard can't be used on RunPod.
justin · 11mo ago
No, you can. What are you using as your template? Lol. You can definitely open localhost and connect to it: if TensorBoard can run on Linux, it can run on RunPod. If you know the steps to set it up on a fresh machine, you can use a RunPod PyTorch template; when you connect to it, it will open a Jupyter notebook on port 8888.
Smack Me Harder ❤ (OP) · 11mo ago
Normally, I use PyTorch 2.0.1.
justin · 11mo ago
Then run whatever terminal commands you need for the additional stuff. Yeah, use the RunPod template if you're not already.
justin · 11mo ago
[screenshot]
justin · 11mo ago
Then you get a nice web GUI to do whatever you need there.
Smack Me Harder ❤ (OP) · 11mo ago
Then do I try --port 8888?
justin · 11mo ago
Don't worry about that yet. First, just get it running, lol. Once it launches, you can restart the pod, get it running again, and do the port bindings later. https://discord.com/channels/912829806415085598/1207848538629742623 Here is an example I have for Ollama, which launches a backend server on some port that I bound to.
justin · 11mo ago
But in your current situation, I'd just start up a PyTorch template. When you click the Connect button, there will be a Jupyter Lab button you can connect to. If you know how to install it on some other machine (I'm guessing you used a terminal or Jupyter Lab, since this sounds related to TensorFlow), go through the same setup on RunPod, get it running somewhere, and worry about binding and hosting later.
justin · 11mo ago
https://ngrok.com/ If need be, ngrok is also an easier way to get a public link.
justin · 11mo ago
I saw someone else use it. Then you don't need to mess with port configurations; if you get something running on some port, it just tunnels your traffic in and out for you.
Smack Me Harder ❤ (OP) · 11mo ago
@justin It's sort of similar to this - https://stackoverflow.com/questions/38464559/how-to-locally-view-tensorboard-of-remote-server but it's for RVC - https://github.com/Mangio621/Mangio-RVC-Fork. If you look at the first link, you'll see some SSH stuff and where they talk about ports. I just added port 6006 under TCP (may not be right, idk). If I run tensorboard --logdir logs --port 6006 (in the directory where the folder is), I'm hoping it will open the page. I am basically doing this all blind. For months I was just training without TensorBoard (couldn't get it working), and now I am trying to see if it is actually possible. So while I don't strictly need it, it would be helpful for monitoring the training process. Otherwise, I just launch my Python script, open the RVC web UI, and start training.
Smack Me Harder ❤ (OP) · 11mo ago
I'll look at the exposed ports link real quick.
justin · 11mo ago
I see. To be honest, idk. If you can run a script and it binds to port 6006, then you would just follow RunPod's guide on exposing TCP ports, as long as it is bound to 0.0.0.0:6006 specifically. If it is bound to 127.0.0.1, it won't accept connections from outside networks. I'm not too familiar with networking; that's just what I know, lol.
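To illustrate the bind-address point: 127.0.0.1 is the loopback address, reachable only from inside the pod itself, while 0.0.0.0 is the "unspecified" address, meaning "listen on every interface", which is what an exposed port needs. A minimal sketch using Python's standard `ipaddress` module; the helper name `describe_bind_address` is hypothetical.

```python
import ipaddress


def describe_bind_address(host: str) -> str:
    """Classify a bind address by who can connect to it (hypothetical helper)."""
    if host == "localhost":
        # "localhost" normally resolves to a loopback address.
        return "loopback only"
    ip = ipaddress.ip_address(host)
    if ip.is_unspecified:
        # 0.0.0.0 (or ::) means "listen on every interface".
        return "all interfaces"
    if ip.is_loopback:
        # 127.0.0.0/8 (or ::1) is reachable only from the same machine.
        return "loopback only"
    return "single interface"


for host in ["127.0.0.1", "0.0.0.0", "localhost", "10.0.0.5"]:
    print(f"{host:>10} -> {describe_bind_address(host)}")
```

A server that only reports "loopback only" here will refuse connections coming through RunPod's port exposure, no matter which ports are configured on the pod.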
Smack Me Harder ❤ (OP) · 11mo ago
Lol, all good. I'm going to see if TCP port 6006 works, and if not, I'll just go back to how I normally use RunPod. Appreciate it.
Madiator2011 · 11mo ago
You do not need true SSH for TensorBoard.
ashleyk · 11mo ago
Yeah, TensorBoard is a web app, not a terminal app.
Smack Me Harder ❤ (OP) · 11mo ago
@Madiator2011 [EU] @ashleyk Well, I don't know how to launch it or get it working. If either of you does, I'd appreciate an explanation of how to get it working. Like I said before, I start TensorBoard, it gives me a localhost link, I click on it, and it tells me the connection was refused.
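"Connection refused" means nothing accepted a TCP connection at that host and port from where the request was made. A small standard-library check, run inside the pod (for example from a Jupyter Lab terminal), can confirm whether TensorBoard is actually listening before looking at the RunPod side. A sketch only; the helper name `is_listening` is hypothetical, and 6006 is assumed because it is TensorBoard's default port.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds.

    ConnectionRefusedError (nothing listening) and timeouts are both OSError
    subclasses, so either one yields False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Run this inside the pod while TensorBoard is supposed to be up.
    print("TensorBoard listening on 127.0.0.1:6006:", is_listening("127.0.0.1", 6006))
```

If this prints True inside the pod but the browser still reports a refused connection, the problem is how the port is exposed (bind address or pod port settings), not TensorBoard itself.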
Madiator2011 · 11mo ago
Did you add --bind_all?
Smack Me Harder ❤ (OP) · 11mo ago
I did. I tried changing the port too; same thing.
Madiator2011 · 11mo ago
tensorboard --logdir=path/to/your/log-directory --bind_all, then press the button on the Connect page.
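The same command can be assembled and started from Python, for example from a notebook cell in the PyTorch template's Jupyter Lab. A minimal sketch: `--logdir`, `--port`, and `--bind_all` are real TensorBoard flags, while the helper names `tensorboard_command` and `launch_tensorboard` are hypothetical.

```python
import subprocess
from typing import List, Optional


def tensorboard_command(logdir: str, port: Optional[int] = None, bind_all: bool = True) -> List[str]:
    """Build the argv for launching TensorBoard (hypothetical helper)."""
    cmd = ["tensorboard", f"--logdir={logdir}"]
    if port is not None:
        # Only needed when moving off TensorBoard's default port, 6006.
        cmd.append(f"--port={port}")
    if bind_all:
        # Listen on all interfaces, not just localhost, so RunPod's proxy can reach it.
        cmd.append("--bind_all")
    return cmd


def launch_tensorboard(logdir: str, port: Optional[int] = None) -> subprocess.Popen:
    """Start TensorBoard in the background; requires `pip install tensorboard` first."""
    return subprocess.Popen(tensorboard_command(logdir, port))


# prints: tensorboard --logdir=path/to/your/log-directory --bind_all
print(" ".join(tensorboard_command("path/to/your/log-directory")))
```

Run from the Mangio-RVC-Fork folder, `launch_tensorboard("logs")` would be the equivalent of `tensorboard --logdir logs --bind_all`.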
Smack Me Harder ❤ (OP) · 11mo ago
I can try that. I did go into the folder that contains the logs folder (in this case the Mangio-RVC-Fork folder) and ran tensorboard --logdir logs --bind_all, which gave me the same issue. I'll do a quick test to see if it works (it could be something I am doing wrong, so no promises it'll work).
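Once the page loads, an empty dashboard usually means `--logdir` points at the wrong folder. TensorBoard reads event files whose names start with `events.out.tfevents`, searching subdirectories of the log directory. A standard-library sketch that lists which run folders actually contain such files; the helper name `find_runs` is hypothetical, and the exact layout RVC writes under `logs/` is not confirmed here.

```python
from pathlib import Path
from typing import List


def find_runs(logdir: str) -> List[str]:
    """Return sorted run directories (relative to logdir) that contain TensorBoard event files."""
    root = Path(logdir)
    if not root.is_dir():
        return []
    runs = {
        event_file.parent.relative_to(root).as_posix()
        for event_file in root.rglob("events.out.tfevents*")
        if event_file.is_file()
    }
    return sorted(runs)


if __name__ == "__main__":
    runs = find_runs("logs")
    print(runs if runs else "No event files found; check the --logdir path.")
```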
Madiator2011 · 11mo ago
By the way, you should not use the link you get in the terminal.
Smack Me Harder ❤ (OP) · 11mo ago
I literally spent hours trying to get it to work, just to see if it's possible, because of RVC training. The only option I see for using TensorBoard is to go to Paperspace and follow this guide https://aihubdocs.github.io/en/rvc/cloud/training/paperspace/ (if you scroll down, you'll see the TensorBoard section). I did try this on RunPod; it didn't work.
Smack Me Harder ❤ (OP) · 11mo ago
@Madiator2011 [EU] Then how would I open it after starting it?
Madiator2011 · 11mo ago
I've done many RVC trainings.
Smack Me Harder ❤ (OP) · 11mo ago
And used TensorBoard on RunPod?
Madiator2011 · 11mo ago
I haven't used TensorBoard that much, but I have a few times.
Smack Me Harder ❤ (OP) · 11mo ago
I also tried Applio (GitHub), and whenever I try to start training, it tells me something about WAV files not being found or not being valid, even though they were placed where I told it. I do the training on RunPod (faster than Colab), but TensorBoard would be nice if possible. Like I said, not a big deal, but it does help.
Madiator2011 · 11mo ago
Are you using a custom template?
Smack Me Harder ❤ (OP) · 11mo ago
I mainly stick with this:
[screenshot]
Madiator2011 · 11mo ago
Works fine here.
[screenshot]
Madiator2011 · 11mo ago
You want to edit the pod or template and expose port 6006 like this:
[screenshot]
Smack Me Harder ❤ (OP) · 11mo ago
Ah, OK, let me try that real quick on a new pod.
Madiator2011 · 11mo ago
Then install TensorBoard with pip:
pip install tensorboard
Then, in the terminal, you can start it with this command:
tensorboard --logdir=path/to/your/log-directory --bind_all
Make sure to run it in tmux or screen so it won't get killed.
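The tmux/screen advice is about keeping TensorBoard alive after the web terminal tab is closed. A sketch that builds the tmux command for a detached session; `tmux new-session -d -s NAME COMMAND` is standard tmux usage, while the helper name `tmux_detached` and the session name `tensorboard` are assumptions.

```python
import shlex
from typing import List


def tmux_detached(session: str, command: List[str]) -> List[str]:
    """Build argv that runs `command` inside a new, detached tmux session (hypothetical helper)."""
    # tmux passes a single trailing command string to the shell, so quote each argument.
    return ["tmux", "new-session", "-d", "-s", session, shlex.join(command)]


if __name__ == "__main__":
    argv = tmux_detached(
        "tensorboard",
        ["tensorboard", "--logdir=path/to/your/log-directory", "--bind_all"],
    )
    # Paste this into the pod's terminal, or pass argv to subprocess.run.
    print(shlex.join(argv))
```

The printed line is `tmux new-session -d -s tensorboard 'tensorboard --logdir=path/to/your/log-directory --bind_all'`; reattach later with `tmux attach -t tensorboard` to see TensorBoard's output.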
Madiator2011 · 11mo ago
Then, on the Connect page, click the 6006 button.
[screenshot]
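Why the terminal link fails while the 6006 button works: TensorBoard prints an address on the pod's own network, which a browser on your machine cannot reach, whereas the Connect page button goes through RunPod's HTTP proxy. A sketch assuming the proxy URL pattern `https://<POD ID>-<PORT>.proxy.runpod.net` and assuming the pod ID is available inside the pod as the `RUNPOD_POD_ID` environment variable; check both against the URL the 6006 button actually opens. The helper name `runpod_proxy_url` is hypothetical.

```python
import os
from typing import Optional


def runpod_proxy_url(port: int = 6006, pod_id: Optional[str] = None) -> str:
    """Build the public proxy URL for an exposed HTTP port (assumed RunPod URL pattern)."""
    # Assumption: RUNPOD_POD_ID is set inside the pod; otherwise pass pod_id explicitly.
    pod_id = pod_id or os.environ.get("RUNPOD_POD_ID")
    if not pod_id:
        raise ValueError("pod ID unknown: pass pod_id or set RUNPOD_POD_ID")
    return f"https://{pod_id}-{port}.proxy.runpod.net"
```

For a hypothetical pod ID `abc123`, `runpod_proxy_url(6006, "abc123")` gives `https://abc123-6006.proxy.runpod.net`, which only works once port 6006 is exposed on the pod and TensorBoard is started with `--bind_all`.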
Smack Me Harder ❤ (OP) · 11mo ago
@Madiator2011 [EU] Alright, thank you. I'll let you know in a few minutes if it works. "Make sure to run it in tmux or screen" - could you explain this a bit more?
Madiator2011 · 11mo ago
[Linked YouTube video: NetworkChuck, "you need to learn tmux RIGHT NOW!!"]
Smack Me Harder ❤ (OP) · 11mo ago
Should I add --port 6006 as well, or just what you put?
Madiator2011 · 11mo ago
Nope, 6006 is the default port.
Smack Me Harder ❤ (OP) · 11mo ago
Same thing.
[screenshot]
[screenshot]
Madiator2011 · 11mo ago
Do not use the link from the terminal.
Smack Me Harder ❤ (OP) · 11mo ago
Got you, sorry. It's working! I'm just going to see if it loads data when I train, but this is perfect. Thank you.
justin · 11mo ago
Wow, congrats! @Madiator2011 [EU] is on 🔥 haha
Smack Me Harder ❤ (OP) · 11mo ago
Actual legend.
justin · 11mo ago
What is RVC?
Madiator2011 · 11mo ago
@Smack Me Harder ❤ It's late for me, but if you run into issues, remind me tomorrow.
Smack Me Harder ❤ (OP) · 11mo ago
Thank you, will do. It's AI voice-model training software (I don't know the right term). Basically, you install some stuff from GitHub plus some needed packages, and then you throw in .wav files of the voice you want to clone.
Madiator2011 · 11mo ago
I usually go with batch size 10 on an A6000 GPU and 250 epochs.
Smack Me Harder ❤ (OP) · 11mo ago
Batch size 8, 16, or 20, depending on the GPU. Alright, it's working. You can consider this solved. Again, thank you for helping me solve this.