SIP vs direct TCP sockets - Android

I am implementing a real-time, multi-user voice transfer application for Android.
I have read that, as a standard, RTP packets are encapsulated into SIP and then sent to the destination(s). What is the advantage of doing so?
My idea was to use a server just to receive control messages from nodes and open sockets. All these nodes would be in one group. Then I would send out the IP addresses of each of these nodes, so that a single sender can multicast its packets directly to the destinations.
Is there a fatal flaw here? (I am not concerned about power consumption.)
How does SIP do better? Or does it?
Thanks

RTP isn't "encapsulated into SIP packets". SIP is a signalling protocol. RTP is a media stream protocol. SIP is used to negotiate and set up (and tear down) media streams.
TCP is a horrible choice for media (RTP) packets; it's not clear from your writing if you're suggesting that.
Multicast is unlikely to work for many network paths/recipients.
NAT routers play merry hell with incoming data; you'll need more than just SIP to deal with users on the open Internet. See STUN, TURN, ICE, UPnP, etc.

SIP or Session Initiation Protocol is a protocol that was designed specifically to address the problem you're trying to solve. Generally, the reason you should reuse (rather than reinvent the wheel) is because other people have studied the same problem and presumably have come up with a better solution as a collective group than you could as an individual. Of course that's not always true, but generally speaking it holds!
If you want to learn about SIP you could study the RFC 3261 specification, or start with the Wikipedia entry if you want a quick overview.
That being said, if you don't need the overhead of a complete and carefully tested protocol you could roll your own but make sure that when you're making that decision you know what you're foregoing and have a good reason to do so.
SIP is a signaling protocol that usually runs over UDP or TCP (neither is mandated), and if you look at it closely you will see it is very similar to HTTP in many respects. Just like HTTP it uses text headers and can carry a wide variety of payloads, much as HTTP can be used to transport HTML, XML, plain text or any arbitrary binary payload.
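To make the "text headers, much like HTTP" point concrete, here is a rough sketch (in Java, with a placeholder server name and hard-coded branch/tag/Call-ID values that a real stack would generate and manage properly) that writes a bare-bones SIP OPTIONS request over a TCP socket and prints whatever textual response comes back:
    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.net.Socket;
    import java.nio.charset.StandardCharsets;

    public class SipOptionsProbe {
        public static void main(String[] args) throws IOException {
            String server = "sip.example.com";          // placeholder SIP server
            try (Socket socket = new Socket(server, 5060)) {
                // SIP is plain text: a request line plus headers, CRLF separated, just like HTTP.
                String request =
                    "OPTIONS sip:" + server + " SIP/2.0\r\n" +
                    "Via: SIP/2.0/TCP 192.0.2.10:5060;branch=z9hG4bK776asdhds\r\n" +
                    "Max-Forwards: 70\r\n" +
                    "From: <sip:alice@" + server + ">;tag=1928301774\r\n" +
                    "To: <sip:" + server + ">\r\n" +
                    "Call-ID: a84b4c76e66710@192.0.2.10\r\n" +
                    "CSeq: 1 OPTIONS\r\n" +
                    "Content-Length: 0\r\n" +
                    "\r\n";
                socket.getOutputStream().write(request.getBytes(StandardCharsets.US_ASCII));

                // Read and print the textual response (e.g. "SIP/2.0 200 OK" plus headers).
                BufferedReader in = new BufferedReader(
                        new InputStreamReader(socket.getInputStream(), StandardCharsets.US_ASCII));
                String line;
                while ((line = in.readLine()) != null && !line.isEmpty()) {
                    System.out.println(line);
                }
            }
        }
    }
Stripped of retransmission, authentication and routing logic, the wire format really is just a request line plus headers, the same shape as an HTTP exchange.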

In the most simple system, you could just use Voice packets over RTP over UDP.
But you'd have no way to turn audio off, and you would have to know the IP addresses, port numbers, the type of codec and its characteristics beforehand.
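For illustration, a rough sketch of that "most simple system" on Android, assuming both sides already know each other's address and port and have agreed on raw 16-bit PCM (no RTP framing, no codec negotiation; the class and constants are invented for the example, and the RECORD_AUDIO permission is required):
    import android.media.AudioFormat;
    import android.media.AudioRecord;
    import android.media.MediaRecorder;
    import java.net.DatagramPacket;
    import java.net.DatagramSocket;
    import java.net.InetAddress;

    // Captures microphone audio and pushes it straight out over UDP.
    public class RawUdpVoiceSender {
        public static void stream(String peerHost, int peerPort) throws Exception {
            int sampleRate = 8000;
            int bufSize = AudioRecord.getMinBufferSize(sampleRate,
                    AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT);
            AudioRecord recorder = new AudioRecord(MediaRecorder.AudioSource.MIC,
                    sampleRate, AudioFormat.CHANNEL_IN_MONO,
                    AudioFormat.ENCODING_PCM_16BIT, bufSize);

            DatagramSocket socket = new DatagramSocket();
            InetAddress peer = InetAddress.getByName(peerHost);
            byte[] buffer = new byte[bufSize];

            recorder.startRecording();
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    int read = recorder.read(buffer, 0, buffer.length);
                    if (read > 0) {
                        // One datagram per captured block; a lost packet is simply lost audio.
                        socket.send(new DatagramPacket(buffer, read, peer, peerPort));
                    }
                }
            } finally {
                recorder.stop();
                recorder.release();
                socket.close();
            }
        }
    }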
In an overly simple view, SIP is a way to:
1. find the IP address of another endpoint from a URI (this may need STUN, TURN, ICE, etc.)
2. agree on which codec to use and its options
There's a lot more to SIP than that; based on what you wrote, you may want to investigate SIP's conferencing features.
You may write your own signaling protocol, and if this is for a school project, that will work just fine.
But if you are doing a commercial project, bear in mind that there is a lot more to telephony than meets the eye. The original SIP spec was greatly revised and is now a cluster of RFCs which are still being modified and added to. I recommend that you take advantage of this work rather than possibly reinventing the mistakes others have made.

Related

Does WebRTC cause any load on the broadcaster side?

Imagine someone is broadcasting audio or video worldwide through WebRTC, i.e. one-to-many communication (an app like Periscope, which I think is not built on WebRTC). Will it be affected by the broadcaster's limited bandwidth? Will it increase the load on the broadcaster side, causing packet loss and degrading the quality of communication? As this topic is new and very little content is available on the net, please suggest some good books and online tutorials.
WebRTC facilitates peer-to-peer communication. Going by this logic, a simple WebRTC broadcast application will depend heavily on the broadcaster's bandwidth, because the number of outbound media streams equals the number of recipients.
This is one of the main reasons why WebRTC gateways or media servers (the terminology varies) have been developed. In this case the broadcaster simply sends a single stream to the intermediary gateway or server, and the other recipients then connect to that gateway or server and receive the stream from there.
To put it in simple terms, you basically add a central WebRTC client to which everyone connects.
You can read more at Janus, Kurento, Licode, etc.
Or find official RTCPeerConnection documentation here.

How does a SIP based app scheme work?

This is an abstract question on how the SIP protocol works. Let us say I have a SIP server (Asterisk/Yate). And I have two Android devices that wish to connect to each other to have an audio call. ( I am looking for a purely VoIP call, no need for telephone numbers or carrier information).
How would this work? Do the packets have to pass through the server, or does the connection happen between the endpoints? If the packets have to pass through the server, does the SIP server also provide profiles, or do profiles have to be created by a third party?
I need to understand how the scheme works in order to start planning and building the system.
I have read lots of technical documentation, but none of it shows an abstraction of the system. If you can provide me with resources, that would be great too.
Thanks
Since your devices don't know where each other is located (IP/port), they contact a SIP server or proxy.
The SIP server matches the request against its dialplan and sends it on to the other side, either modified (server) or unchanged (proxy).
In the INVITE request each peer sends its address/port and information about the RTP media stream.
If that information passes through unchanged (proxy), the peers can see each other's RTP details and send RTP packets directly to each other.
It is also possible to send another INVITE after the call is bridged, called a re-INVITE, carrying information about a new RTP stream (for example audio on a different IP/port, or video).
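To make the "each peer sends its address/port" point concrete: that information travels in the SDP body carried by the INVITE and its 200 OK. A rough sketch (the SDP text is an invented example) of pulling the RTP destination out of such a body:
    import java.net.InetSocketAddress;

    // Extracts the audio RTP destination from an SDP body, e.g. the one carried in a
    // SIP INVITE or its 200 OK. Illustrative only; a real parser must handle multiple
    // media sections, per-media c= lines, IPv6, and so on.
    public class SdpRtpAddress {
        public static InetSocketAddress audioDestination(String sdp) {
            String ip = null;
            int port = -1;
            for (String line : sdp.split("\r?\n")) {
                if (line.startsWith("c=IN IP4 ")) {
                    ip = line.substring("c=IN IP4 ".length()).trim();
                } else if (line.startsWith("m=audio ")) {
                    port = Integer.parseInt(line.split(" ")[1]);
                }
            }
            if (ip == null || port < 0) {
                throw new IllegalArgumentException("no audio stream in SDP");
            }
            return new InetSocketAddress(ip, port);
        }

        public static void main(String[] args) {
            String sdp =
                "v=0\r\n" +
                "o=- 0 0 IN IP4 198.51.100.7\r\n" +
                "s=-\r\n" +
                "c=IN IP4 198.51.100.7\r\n" +
                "t=0 0\r\n" +
                "m=audio 49170 RTP/AVP 0 8\r\n" +
                "a=rtpmap:0 PCMU/8000\r\n";
            // Prints the address and port the far end wants RTP sent to.
            System.out.println(audioDestination(sdp));
        }
    }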
There is nothing called "profiles" in the SIP standard, sorry.
Anyway, it seems like a bad idea to start planning a VoIP system if you have limited real experience with a SIP server.
There are a lot of articles (including Wikipedia) and YouTube videos on the topic of "how SIP works"; there is no way to put all of that into one answer here.

VOIP on my Android Application

I want to ask some questions regarding the development of my push-to-talk application for Android.
Recently, I've been developing a push-to-talk app for Android. I use DatagramSocket to send and receive voice as packets over a local wireless network (WLAN). I use a peer-to-peer network, so there is no server.
I don't have a problem with the code, but I don't understand the basics of VoIP theory, so I want to ask some questions; I hope somebody can give me simple answers :)
1. Is my push-to-talk app considered VoIP-based?
2. There are several VoIP protocols such as SIP, H.323 and many more. If my PTT app is considered VoIP-based and I use socket packets (UDP, am I right?) to exchange voice, then which VoIP protocol am I using? Is it considered the RTP protocol?
I would like to understand the theory behind my PTT app; I understand my Java code, but I don't have proper VoIP knowledge.
I've tried to find some information on Google, but I still don't understand the relation between my PTT app and VoIP technology.
Thanks in advance; I'm new here, and sorry for my English!
I. "VoIP" is a very broad term, but if your app transfers voice over an IP network, it's definitely a VoIP one, even though it may use totally proprietary protocols (as e.g. Skype does).
II. A VoIP stack is basically split into two meta-layers: 1) signaling and 2) media transport. Each of them in turn consists of multiple layers of its own (e.g. for SIP: the session, dialog, transaction and transport layers). Examples of signaling protocols are H.323, SIP and MGCP. The most standard media transport is RTP. You can use your own transport; RTP applies specific restrictions (such as the AVP profile) but is compatible with a variety of libraries and other implementations.
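As a concrete illustration of what RTP adds on top of a bare UDP payload, here is a rough sketch of packing the fixed 12-byte RTP header (RFC 3550) in front of an encoded audio frame; payload type 0 is the static assignment for G.711 µ-law, and a real implementation would also deal with RTCP, jitter buffers and clock handling:
    import java.nio.ByteBuffer;
    import java.util.Random;

    // Builds a minimal RTP packet: 12-byte fixed header followed by the payload.
    // The sequence number increments per packet; the timestamp advances in sampling units.
    public class RtpPacketizer {
        private short sequence = 0;
        private int timestamp = 0;
        private final int ssrc = new Random().nextInt();   // synchronization source identifier
        private final int payloadType;                      // e.g. 0 = PCMU (G.711 u-law)

        public RtpPacketizer(int payloadType) {
            this.payloadType = payloadType;
        }

        public byte[] packetize(byte[] payload, int samplesInPayload) {
            ByteBuffer buf = ByteBuffer.allocate(12 + payload.length);  // network byte order by default
            buf.put((byte) 0x80);                     // V=2, P=0, X=0, CC=0
            buf.put((byte) (payloadType & 0x7F));     // M=0, payload type
            buf.putShort(sequence++);                 // sequence number
            buf.putInt(timestamp);                    // RTP timestamp
            buf.putInt(ssrc);                         // stream identifier
            buf.put(payload);
            timestamp += samplesInPayload;            // advance by the samples carried
            return buf.array();
        }
    }
The resulting byte array is what you would hand to a DatagramSocket; the receiver uses the sequence number and timestamp to reorder, detect loss and play audio at the right pace.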
There are protocols which use the same socket and the same transport type for both signaling and media; the widely used one is IAX. Most others separate signaling and media, so the sockets are separate and likely of different types. A standard-compliant SIP implementation shall function over both UDP and TCP, and switch to TCP for large requests (>= 1300 bytes by default); SCTP is also suggested. The implementation details, such as retransmission policy and request timeout, are specified differently for each transport, but there is no fundamental problem with using any correct L4 protocol.
The media transport (RTP or an equivalent) is a totally different story. Here, the typical TCP behaviour of delivering all data at the cost of variable delays is really nasty for our ears. TCP is good for bulk traffic such as file transfer or database interaction. In interactive communication between humans, we prefer some noise and sporadic voice loss to a stalling voice. So TCP is a very bad choice here; a synchronous transport class should be used, and UDP is the good default choice. SCTP can also be used as a media transport, but only with the limited-retransmit option (not all stacks support it). (There are attempts to use TCP to punch through NAT points, but all of this is an act of despair.)
If your application involves sending a voice message to more than one recipient at a time (i.e. a kind of broadcast or multicast), that rules out connection-oriented media transports, effectively leaving only UDP. It also requires proper negotiation at the signaling level.
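If you do try the multicast route from the original question (which, in practice, only works on a cooperative LAN), the Java side looks roughly like this sketch; the group address is an arbitrary example from the administratively scoped range, and whether the packets reach anyone depends entirely on the network in between:
    import java.net.DatagramPacket;
    import java.net.InetAddress;
    import java.net.MulticastSocket;

    // Joins a multicast group and sends one datagram to it. On most WANs and many
    // Wi-Fi networks this traffic is filtered, which is why one-to-one UDP plus a
    // media server is the usual fallback.
    public class MulticastVoiceDemo {
        public static void main(String[] args) throws Exception {
            InetAddress group = InetAddress.getByName("239.255.0.1");  // example group address
            int port = 5004;

            try (MulticastSocket socket = new MulticastSocket(port)) {
                socket.joinGroup(group);
                byte[] frame = new byte[160];  // e.g. 20 ms of G.711 audio
                socket.send(new DatagramPacket(frame, frame.length, group, port));
                socket.leaveGroup(group);
            }
        }
    }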
III. The selection of a voice codec is very platform-specific, and I don't know which ones are native to Android. In "big" VoIP there is a licensed set and a free set, with a very small intersection (AFAIR, G.711 and GSM). That said, there are good codecs (e.g. Opus) which can be adapted to a wide range of requirements, including partial packet loss.

TCP-based RPC server (Erlang or something similar?) for iOS/Android app communication

I'm building native mobile applications for both iOS and Android. These apps require "realtime" updates from and to the server, the same as any other network-based application (Facebook, Twitter, social games like Words with Friends, etc.).
I think using HTTP long polling for this is overkill, in the sense that long polling can be detrimental to battery life, especially with a lot of TCP setup/teardown. It might make sense to have the mobile applications use persistent TCP sockets to establish a connection to the server, and send RPC-style commands to the server for all web service communication. This, of course, would require a server that can handle the long-lived TCP connections and make sense of the data passed down the TCP pipe before talking to a web service. I'm thinking of passing data in plain text using JSON or XML.
Perhaps an Erlang-based RPC server would do well for a network-based application like this. It would allow the mobile apps to send and receive data from the server over one connection, without the repeated setup/teardown that individual HTTP requests would incur using something like NSURLConnection on iOS. Since no web browser is involved, we don't need to deal with the nuances of HTTP at the mobile client level. A lot of these "COMET" and long-polling/streaming servers are built with HTTP in mind. I'm thinking that a plain-text protocol over TCP is good enough, will make the client more responsive, will allow the server to push updates, and will preserve battery life compared to the traditional long-polling and streaming models.
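To make the proposal concrete, here is a rough sketch of the client side of such a scheme: a single long-lived TCP connection carrying newline-delimited JSON commands. The host, port and message shapes are invented for the example, and a real client would add reconnection, heartbeats and TLS:
    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.io.OutputStreamWriter;
    import java.io.Writer;
    import java.net.Socket;
    import java.nio.charset.StandardCharsets;

    // One persistent connection; each RPC is one JSON object terminated by '\n',
    // and the server can push updates down the same socket at any time.
    public class PersistentRpcClient {
        public static void main(String[] args) throws Exception {
            try (Socket socket = new Socket("rpc.example.com", 9000)) {   // hypothetical endpoint
                socket.setKeepAlive(true);

                Writer out = new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8);
                BufferedReader in = new BufferedReader(
                        new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8));

                // Send one RPC-style command (hand-built JSON to keep the sketch dependency-free).
                out.write("{\"method\":\"subscribe\",\"params\":{\"channel\":\"game42\"}}\n");
                out.flush();

                // Read responses and server-initiated pushes, one JSON document per line.
                String line;
                while ((line = in.readLine()) != null) {
                    System.out.println("server says: " + line);
                }
            }
        }
    }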
Does anyone currently do this with their native iOS or Android app? Did you write your own server or is there something open sourced out there that I can begin working with today instead of reinventing the wheel? Is there any reason why using just a TCP based RPC service is a worse decision than using HTTP?
I also looked into HTTP pipelining, but it doesn't look to be worth the trouble when it comes to implementing it on the clients. Also, I'm not sure if it would allow for bi-directional communication in the client<->server communication channel.
Any insight would be greatly appreciated.
Using TCP sockets with your own protocol rolled on top works quite a bit better than HTTP, especially given the nature of resources on mobile devices. Erlang will do quite well; however, let's start with your protocol. Erlang excels at this, especially with its bit syntax expressions. Still, you can use plain text as you wish: JSON (you would need a parser, e.g. mochijson2.erl found in the Mochiweb library) or XML (you would need a parser, e.g. Erlsom).
I have personally worked on a project in which we used raw TCP sockets between our Erlang servers and mobile devices. However, depending on the port numbers you choose, routers along the way may block or drop packets depending on the security policies of service providers. That said, I still think HTTP can work. People chat on Facebook Mobile, send tweets etc. from their devices, and I'm sure these social engines use some kind of long polling or server push or whatever, but over HTTP. Mobile devices have advanced a lot in capability of late.
Rolling your own TCP-based protocol comes with a number of challenges: port selection, parsing of data at both the client and the server, security issues, etc. Using HTTP lets you think about the actual problem rather than spending time fixing protocol issues on the client or server. The devices you've mentioned (Android and iOS: iPad, iPhone, etc.) are perfectly capable of handling HTTP COMET (long polling). I'm sure that if you follow the standards for web applications on mobile devices, as well as the W3C Mobile Web Best Practices, your app will function well using HTTP.
Using HTTP will also speed up the work, since there are a lot of libraries in these devices' SDKs that would help you prototype the solution you want, compared to rolling your own TCP-based plain-text protocol. To back up this reasoning, look through these W3C findings.
Finally, let me mention the HTTP benefits on these devices. If you use web technologies for mobile devices, such as Opera Widgets, PhoneGap, Sencha Touch and jQuery Mobile, their SDKs and libraries already have optimisations done for you, or have well-documented ways in which your app can be made efficient. Furthermore, these technologies provide APIs to access the native devices' resources, such as battery level, SMS, MMS, GSM broadcast channels, contacts, lighting, GPS and memory, all as APIs in JavaScript classes. It would be harder (less flexible) to use native programming languages like J2ME, mobile Python or Symbian C++/Qt than to use web technologies like CSS3, HTML5 and the JavaScript tools mentioned above. In my experience, using the web tools mentioned above also makes your app easy to distribute via, say, the Ovi Store or the Apple Store.
Note that if you use HTTP, testing will be easy. All you need is a public domain so that the widgets on the mobile device can locate your servers over the Internet. If you roll your own TCP/IP protocol, network routers may be disruptive with regard to the port number you use unless you plan on using port 80 or another well-known port, and even then your server IP would have to be made public. There is a shortcut to this: if you put your TCP server behind the same ISP as your test mobile's Internet connection, the ISP routers will see both source and destination as being inside their network. But all in all, there are challenges with rolling your own protocol.
Edit: Using HTTP, you will also benefit from REST. Web servers implemented in Erlang (especially Yaws and Mochiweb) excel at REST services. Look at this article: RESTful services with Yaws. For Mochiweb, there is an interesting article, "A Million-User Comet Application with Mochiweb", which is broken into three parts. You could also look at the solution given to this question.
There are ZeroMQ builds for Android and iOS. Java and ObjC bindings exist as well.
HTTP was created for infrequent requests with large responses. It is highly inefficient for transferring large numbers of small data chunks; in a typical situation, the HTTP headers can be twice the size of the actual payload. The only strong side of HTTP is its ubiquity, its "one size fits all" karma.
If you want lightweight and fast solution, I guess ZeroMQ can be a perfect solution.
One reason to go with HTTP instead of a custom service is that it's widely supported on a transport level.
With mobile devices, a user might be on Wi-Fi at a hotel, airport, coffee shop, or corporate LAN. In some cases this means having to connect via proxy. Your application's users will be happiest if the application is able to use the device's proxy settings to connect. This provides the least surprise -- if web browsing works, then the application should work also.
HTTP is simple enough that it isn't difficult to write a server that will accept HTTP requests from a custom client. If you decide to go this route, the best solution is the one that you don't have to support. If you can write something in Erlang that is supportive of application changes, then it sounds like a reasonable solution. If you're not comfortable doing so then PHP or J2EE gets bonus points for the availability of cheap labor.
While HTTP does benefit from being widely supported, some successful projects are based on other protocols. The Sipdroid developers found that persistent TCP connections do greatly improve battery life. Their article on the topic doesn't address the server side but it does give a high-level description of their approach on the client.
Erlang is very well suited for your use case. I'd prefer using TCP over HTTP for the sake of saving battery life on the phone as you noted already.
Generally, getting the communication between device and server up and running will be very easy. The protocol you use between the two is what will require the most work. However, writing protocols in Erlang is strikingly straightforward when using gen_fsm.
You should check out metajack's talk at the Erlang Factory, which highlights his solution to a very similar use case for his iPhone game Snack Words.
I work on an application that connects mobile devices to a Microsoft HTTP server over long-lived HTTP/HTTPS connections, to allow push-type data to be sent to the mobile. It works, but there are lots of little gotchas on the mobile side.
For the client to get "packets" of data, we put the HTTP connection into chunked encoding mode so that each packet arrives as one chunk.
Not all native HTTP APIs on every mobile platform will call you back when a "chunk" of data has arrived; the ones that don't normally wait until all the data from the server has arrived before calling the application back with it. Platforms that support callbacks with partial data (that I have found):
Symbian
Windows Mobile
Platforms that don't support partial data callbacks:
iOS
BlackBerry
For the platforms that don't support partial callbacks, we have written our own HTTP connection code, with chunked encoding support, using the native socket support. It's actually not very hard.
Don't rely on one chunk being one of your packets; HTTP proxies or the native HTTP API implementations may break that assumption.
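For reference, the chunked wire format being described is simple enough to decode by hand over a raw socket: each chunk is a hex length line, CRLF, that many bytes, CRLF, with a zero-length chunk marking the end. A rough sketch of such a receive loop (error handling and trailer headers omitted):
    import java.io.DataInputStream;
    import java.io.IOException;
    import java.io.InputStream;

    // Reads HTTP chunked transfer encoding from a stream positioned just after the
    // response headers. Each decoded chunk is handed to the callback as it arrives,
    // which is the "partial data" behaviour some platform HTTP APIs lack.
    public class ChunkedReader {
        public interface ChunkHandler { void onChunk(byte[] data); }

        public static void readChunks(InputStream raw, ChunkHandler handler) throws IOException {
            DataInputStream in = new DataInputStream(raw);
            while (true) {
                String sizeLine = readLine(in);
                int size = Integer.parseInt(sizeLine.split(";")[0].trim(), 16);
                if (size == 0) {            // terminating zero-length chunk
                    readLine(in);           // trailing line (ignoring optional trailers)
                    return;
                }
                byte[] chunk = new byte[size];
                in.readFully(chunk);
                readLine(in);               // CRLF that follows every chunk body
                handler.onChunk(chunk);
            }
        }

        // Reads a CRLF-terminated ASCII line byte by byte, without buffering ahead.
        private static String readLine(DataInputStream in) throws IOException {
            StringBuilder sb = new StringBuilder();
            int b;
            while ((b = in.read()) != -1 && b != '\n') {
                if (b != '\r') sb.append((char) b);
            }
            return sb.toString();
        }
    }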
On iOS, with its background multitasking rules, you can't keep this connection going while your application is in the background. You really need to use Apple's Push Notification service and live with its limitations.
Never trust mobile cellular networks. I have seen the weirdest things, such as the server side seeing the HTTP connection drop and then reconnect (with a replay of the original HTTP request) while the mobile end doesn't see any drop in the connection at all. Basically, treat the connection as unreliable and assume data can go missing. We ended up implementing a TCP-like sequence number scheme to make sure we didn't lose data.
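A minimal sketch of the kind of sequence-number check being described, under the assumption (invented for the example) that each application message carries a monotonically increasing integer assigned by the server:
    // Tracks the last sequence number seen on the push channel and flags gaps,
    // so any lost messages can be re-requested over a normal HTTP call.
    public class SequenceTracker {
        private long expected = 1;

        /** Returns true if the message is in order; false means something was dropped. */
        public synchronized boolean accept(long seq) {
            if (seq == expected) {
                expected++;
                return true;
            }
            if (seq > expected) {
                // Gap detected: messages [expected, seq - 1] never arrived.
                expected = seq + 1;
                return false;
            }
            return true;   // duplicate or replayed message, already processed
        }
    }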
Using http/https makes it easier to get past firewall rules on customer sites.
I'm not sure using long-lived http/https connections was the wisest decision we ever made, but it was made long before I turned up, so I have to live with the fall-out of it.
As an alternative, we are looking at WebSockets as well, but with the WebSocket spec being in a state of flux at the moment and generally not too easy to follow, I don't know whether it will work out or not.
So that is my experience with using http/https as a long-lived realtime connection.
Your mileage may vary.
It all depends on what data you are sending - the size of it, the criticality of timeliness, frequency of update etc.
If you are looking for reasonably lazy updates and verbose data (JSON, say), then go with an HTTP comet pattern, as you will find it much easier to get through standard network gear, as other answers have highlighted. If you are behind a corporate firewall/proxy, for example, HTTP will be a much safer bet.
However, if you are doing fast things with small data sizes, then go with something homegrown and leverage a TCP connection. It's much more to the point and you'll find the real-world performance much better. Use simple data structures and fast operators to slice your data up as you need it.
Again, as other posters have noted, battery usage is a big concern. You will eat a battery, literally burning a hole in your pocket, if you are not careful. It is very easy to turn a battery that lasts 2 days into one that lasts 6 hours.
Lastly, don't trust the network if you are time-sensitive. If you are not, then a long poll over HTTP will be just fine for you. But if you are looking for high-performance messaging, be acutely aware that a mobile network is not an end-to-end TCP connection. Your requests will vary in trip time and latency.
So, back to what you want to do with the app. As you are building for iOS (native, obviously dictated) and Android, I would leverage the Apple Push Notification service and its framework. Build your back-end services to talk to that, and also provide interfaces for non-Apple devices (i.e. HTTP- or TCP-level listeners). That way you have one platform and multiple "gateways" for your apps. You could then also handle RIM via their push service if you wanted to.

Push to talk with Android

I want to add a Push To Talk feature for communication within my team in my application. Besides this I also need some kind of text messaging. But I want it to be able to work over GPRS. I found that the SIP API can be used for making voice calls, but it says that it requires Wi-Fi. I want to make it run over Wi-Fi as well as GPRS.
Can somebody give me some idea of where to start?
Push To Talk in SIP is just a regular call, with RTP doing the tricky floor control.
There's usually a media server involved broadcasting the voice bursts to all participants to save on the scarce upload bandwidth. The server usually has a public address simplifying NAT traversal for participants.
But if you are rolling your own, and don't need interoperability with other SIP services or IMS, and the whole thing resembles instant messaging more than phone calls, XMPP might be a simpler option.
I'm not sure about the Android aspect, but apart from the new, built-in SIP support which might be limited on purpose, there's always the SIP stack from SIPDroid, right?
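For what it's worth, the built-in android.net.sip API mentioned above boils down to roughly the following sketch; the account details and peer URI are placeholders, the SIP and Internet permissions must be declared in the manifest, and availability depends on the device:
    import android.content.Context;
    import android.net.sip.SipAudioCall;
    import android.net.sip.SipManager;
    import android.net.sip.SipProfile;

    // Registers a SIP account and places an audio call using the platform SIP stack.
    public class SimpleSipCall {
        public static void call(Context context) throws Exception {
            SipManager manager = SipManager.newInstance(context);
            if (manager == null) return;   // SIP not supported on this device

            SipProfile me = new SipProfile.Builder("alice", "sip.example.com")  // placeholder account
                    .setPassword("secret")
                    .build();
            manager.open(me);   // register with the SIP server

            SipAudioCall.Listener listener = new SipAudioCall.Listener() {
                @Override
                public void onCallEstablished(SipAudioCall call) {
                    call.startAudio();          // RTP starts flowing here
                    call.setSpeakerMode(true);
                }

                @Override
                public void onCallEnded(SipAudioCall call) {
                    call.close();
                }
            };

            // 30-second timeout for the INVITE transaction; the peer URI is a placeholder.
            manager.makeAudioCall(me.getUriString(), "sip:bob@sip.example.com", listener, 30);
        }
    }
Whether this stack is allowed to run over the cellular data connection rather than Wi-Fi depends on the platform version and device, which is exactly why SIPDroid's own stack is often suggested as the fallback.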
