首页 > 基础算法 > 字符串处理 > hdu 1377 A Well-Formed Problem-字符串-[解题报告]
2013
12-09

hdu 1377 A Well-Formed Problem-字符串-[解题报告]

A Well-Formed Problem

问题描述 :

XML, eXtensible Markup Language, is poised to become the lingua franca of structured data communication for the foreseeable future, due in part to its strict formatting requirements. XML parsers must report anything that violates the rules of a well-formed XML document. An XML document is said to be well-formed if it meets all of the wellformedness constraints as defined by the World Wide Web Consortium (W3C) XML specification.

XML documents are composed of units called elements, that contain either character data and/or other elements.

Elements may also contain within their declaration values called attributes. Consider the following XML document:

<?xml version="1.0"?>
<customer>
<name>
<first>John</first>
<last>Doe</last>
</name>
<address>
<street>
<number>15</number>
<direction>West</direction>
<name>34th</name>
</street>
<city>New York</city>
<state-code>NY</state-code>
<zip-code format="PLUS4">10001-0001</zip-code>
<country-code>USA</country-code>
</address>
<orders/>
</customer>

The bold identifiers contained within angle brackets are the elements of the document. The italicized identifier "format" within the "zip-code" element is an attribute of that element. All elements, with the exception of "orders", have a start and an end declaration, also called a tags. The "orders" element is an empty element, as indicated by the "/>" sequence that closes the element, and does not require a separate end-tag. The first line is a processing instruction for an XML parser and is not considered an element of the document.

The rules for a well-formed document are:

1. There is exactly one element that is not contained within any other element. This element is identified as the "root" or "document" element. In the example above, "customer" is the document element.

2. The structure of an XML document must nest properly. An element’s start-tag must be paired with a closing end-tag if it is a non-empty element.

3. The name in an element~{!/~}s end-tag must match the element type in the start-tag. For example, an element opened with <address> must be closed by </address>.

4. No attribute may appear more than once in the same start-tag or empty-element tag.

5. A parsed element must not contain a recursive reference to itself. For example, it is improper to include another address element within an address element.

6. A named attribute must have an associated value.

输入:

The input file will contain a series of XML documents. The start of each document is identified by a line containing only the processing instruction "<?xml version="1.0"?>". The end of the input is identified by a line containing only the text "<?end?>" (this is not a true XML processing instruction, just a sentinel used to mark the end of the input for this problem). As with all XML documents, white space between elements and attributes should be ignored. You may make the following assuptions with regard to the input.

The only processing instruction that will be present is the XML version rocessing instruction, and it will always appear only at the beginning of each document in the input.

Element and attribute names are case-sensitive. For example, <Address> and <address> are considered to be different.

Element and attribute names will use only alpha-numeric characters and the dash "-" character.

XML comments will not appear in the input.

Values for attributes will always be properly enclosed in double quotes.

输出:

For each input XML document, output a line containing the text ~{!0~}well-formed~{!1~} if the document is well-formed, ~{!0~}non well-formed~{!1~} otherwise.

样例输入:

<?xml version="1.0"?>
<acm-contest-problem>
        <title>A Well-Formed Problem</title>
        <text>XML, eXtensible Markup Language, is poised to become the lingua franca of
structured data communication for the foreseeable future. [...]</text>
        <input>probleme.in</input>
        <output>probleme.out</output>
</acm-contest-problem>
<?xml version="1.0"?>
<shopping-list>
        <items>
                <item quantity="1" quantity="1">Gallon of milk</item>
                <item>Frozen pizza
        </items>
</Shopping-list>
<errand-list>
        <errand>Get some cash at the ATM
                <errand>Pick up dry cleaning</errand>
        </errand>
</errand-list>
<?end?>

样例输出:

well-formed
non well-formed

ZOJ:http://acm.zju.edu.cn/onlinejudge/showProblem.do?problemCode=1116

题目大意:

(对XML没有了解,只是按照字面意思理解了一下题意)

<name>与</name>这样有起始结束声明的,name称为tag

<order/>是empty element, 不需要end-tag

第一行<?xml version="1.0"?>不算文档的element

<zip-code format="PLUS4">10001-0001</zip-code>这里的format是zip-code的属性

一个well-formed document要满足下面几条:

1、只有一个element不嵌套在别的element里,例如例子中的"customer" 
2、嵌套正确,不是<order/>那样的,如<name>与</name>要完整配对嵌套
3、如<name>与</name>这样的声明名字要配对
4、一个tag最多一个属性
5、不递归调用,如address element里不能调用address
6、有名字的属性,比如上面的format,要有对应的值

注意大小写不同,如<Address>和<address>应该认为是不同的

输入:

每组数据第一行是 "<?xml version="1.0"?>"。

"<?end?>" 表示全部输入的结束(只是在本题中用来标记输入结束)

输入保证:

 "<?xml version="1.0"?>"会且仅会在第一行出现

element和属性只用数字、字母和短线"-"表示

XML注释不出现

属性的值一定会放在“”里

输出

"well-formed"或者"non well-formed"

解题思路:

字符串处理,主要用到栈和集合。

每次读入到”<“的时候,开始读取tag和attribute,同时判断<>里头的内容有没有问题,比如"/"出现的位置和次数有没有问题,属性的表达有没有问题等。之后就只要进行一些判断即可。下面是针对well-formed document的各条件的解决方法:

1、针对“只有一个element不嵌套在别的element里,例如例子中的"customer" ”

设置bool root=false; 当栈空且有入栈操作时root=true;当root=true且在栈空情况下有入栈操作时返回non well-formed

2、针对“嵌套正确,不是<order/>那样的,如<name>与</name>要完整配对嵌套”

栈,非类似<order/>时入栈,入栈忽略属性,出栈时与栈顶元素

3、针对“如<name>与</name>这样的声明名字要配对”

同上

4、针对“一个tag最多一个属性”

若在<>里读到""(输入保证如果有属性,其值一定包括在""里), 则之后如果再出现非空字符,返回non well-formed

5、针对“不递归调用,如address element里不能调用address”

设置set<string>类型,存放栈内元素,入栈前判断set里是否有相同tag,如果是,返回non well-formed

6、针对“有名字的属性,比如上面的format,要有对应的值”

在<>内如果读到空格,空格后面有字符,则必须有紧跟的="XXX"

源代码:

又吐血了……”狗狗搞完40题“里的代码又好短,泪奔了,为什么每次写蘑菇题我都要写得很长╮(╯▽╰)╭

#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cctype>
#include <string>
#include <set>
#include <stack> 
#include <iostream>
#include <algorithm>
using namespace std;
#define ERROR non=true;return

set<string> used;
stack<string> s;
bool non, root;

bool gettag(char* &ch, string &tag, bool &end, bool &empty)
// 读到"<"触发gettag, 函数结束后">"已读过 
{
    int p1, p2, p3, p4;
    string attribute;
    
    end=false, empty=false;
    tag=attribute="";
    
    ch++;                                   //get tag
    if (*ch=='/'){end=true;ch++;}
    while (isalnum(*ch) || *ch=='-')      
    {
        tag+=*ch;
        ch++;
    }
    while (isspace(*ch)) ch++;
    if (*ch=='>')                            //end of tag?
        return 1;
    if (*ch=='/')
    {
        int cal=0;
        ch++;
        while (*ch!='>'){cal++;ch++;}
        if (cal>0) return -2;
        if (end) return -1;
        else empty=true;
    }
    while (isalnum(*ch) || *ch=='-' || *ch=='.' || *ch=='\"' || *ch=='=')        //get attribute
    {
        attribute+=*ch;
        ch++;
    }
    if (attribute.length()>0)
    {
        p1=attribute.find("=", 0);      if (p1==-1) return -3;
        p2=attribute.find("\"", 0);     if (p2<p1) return -4;
        p2=attribute.find("\"", p1+1);  if (p2==-1) return -5;
        p3=attribute.find("\"", p2+1);  if (p3==-1 || p3==p2+1) return -6;   //empty value
        p4=attribute.find("\"", p3+1);  if (p4!=-1) return -7;               //extra "
        p4=attribute.find("=", p1+1);   if (p4!=-1) return -8;               //extra =
    }
    while (isspace(*ch)) ch++;
    if (*ch=='>') return 1;
    if (*ch=='/')
    {
        int cal=0;
        ch++;
        while (*ch!='>'){cal++;ch++;}
        if (cal>0) return -10;
        if (end) return -9;
        else empty=true;
    }
    while (*ch!='>'){
        ch++;
        if (!isspace(*ch) && *ch!='>') return -11;
    }
    return 1;
}

void solve(char *ch)
{
    bool end, empty;
    string tag;
    int tmp;
    while (!non && *ch!='\0')
    {
        if (*ch=='<')
        {
            tmp=gettag(ch, tag, end, empty);
            if (tmp==1)
            {
                if (end && empty){
                    ERROR;
                }
                if (empty && used.count(tag)>0){
                    ERROR;
                }
                else if (!empty && !end)
                {
                    if (used.count(tag)>0){
                        ERROR;
                    }
                    if (s.empty())
                    {
                        if (root){ERROR;}
                        else root=true;
                    }
                    used.insert(tag);
                    s.push(tag);
                }
                else if (!empty && end)
                {
                    if (s.empty() || s.top()!=tag){
                    ERROR;
                }
                    used.erase(tag);
                    s.pop();
                }
            }
            else{ERROR;}
        }
        else ch++;
    }
}

int main()
{
    string tag, attribute;
    bool firstdata=true;
    bool end, empty;
    char buf[65536];
    do{
        while (gets(buf)!=NULL && strcmp(buf, "<?xml version=\"1.0\"?>")!=0
                && strcmp(buf, "<?end?>\n")!=0)
        {
            solve(buf);
        }
        if (strcmp(buf, "<?xml version=\"1.0\"?>")==0)
        {
            if (!firstdata)
            {
                if (non || !s.empty()) printf("non well-formed\n");
                else printf("well-formed\n");
            }
            else firstdata=false;
            non=false;
            root=false;
            used.clear();
            while (!s.empty()) s.pop();
        }
    }while (strcmp(buf, "<?end?>")!=0);
    if (non || !s.empty()) printf("non well-formed\n");
    else printf("well-formed\n");
    return 0;
}

解题转自:http://blog.csdn.net/program_shun/article/details/6581798


  1. 算法是程序的灵魂,算法分简单和复杂,如果不搞大数据类,程序员了解一下简单点的算法也是可以的,但是会算法的一定要会编程才行,程序员不一定要会算法,利于自己项目需要的可以简单了解。

  2. 第二种想法,我想来好久,为啥需要一个newhead,发现是把最后一个节点一直返回到嘴上面这层函数。厉害,这道题之前没样子想过。